New Year’s Resolutions for Internet Retail By @Papa_Fire @CloudExpo [#Cloud]

The technology space has evolved. So have a lot of technologists.

As the holiday rush winds down, I sit here reflecting on all the companies that lost business during the busiest time of the year. They lost that revenue not because of technology failure per se, although technology failure is always the visible symptom of a deeper problem, but because of process failures that kept them from remedying the failures of technology. I've offered tips before on preparing for holiday traffic from the system architecture perspective, but perhaps I should have concentrated on preparing for the rush from the organizational perspective.

Behind the extensive downtimes I witness every holiday season, I see a corporate failure to change archaic processes to match changed business models. The companies most prone to this are often those transitioning from a brick-and-mortar model, or from enterprise software to a web-based offering. The latter are actually transitioning from a B2B to a B2C model, often without realizing it. But they are not the only offenders; even web-only companies suffer from the same symptoms. Whatever the company type, the change must come from the top, because corporate inflexibility and complacency are the main drivers behind legacy processes that no longer reflect the state of modern operations.

This year, for example, I watched a large e-commerce site, which originated as a traditional catalog company, suffer a major revenue blow on Black Friday specifically because it devalued the principles of collaboration and shared responsibility while running a complex, business-critical web application. The owners made a conscious decision to separate the operations and development groups and to maintain a traditional software development lifecycle (SDLC), limiting each group's responsibilities to its own domain. Those choices are why they were unable to accept orders for over eight hours on Black Friday. Eight hours of no revenue. On an e-commerce site. During the busiest time of the year. Boom.

Shared accountability
Management of the application was a function of the operations group; however, the system administrators had no domain knowledge of the application or, perhaps even worse, of its deployment history or rollback procedures. On the flip side, the developers viewed operations solely as the system administrators' responsibility, so once they were done deploying the code, they assumed their job was done. This meant no developer was immediately available (it being a holiday and all) to troubleshoot the problem.
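
One lightweight way to close that knowledge gap is to make deployment history a shared, machine-readable artifact instead of tribal knowledge. The sketch below is a minimal illustration, not this company's actual tooling; the manifest location, field names, and the `record_deploy` helper are all assumptions for the example.

```python
#!/usr/bin/env python3
"""Minimal sketch: record every deploy in a shared, machine-readable
manifest so whoever is on call, ops or dev, can see what shipped and
how to back it out. Field names and manifest location are illustrative."""
import json
import getpass
from datetime import datetime, timezone

MANIFEST = "deploy-history.jsonl"  # in practice, a shared network path

def record_deploy(app: str, version: str, rollback_cmd: str) -> None:
    """Append one JSON record per deploy (JSON Lines format)."""
    entry = {
        "app": app,
        "version": version,
        "deployed_by": getpass.getuser(),
        "deployed_at": datetime.now(timezone.utc).isoformat(),
        "rollback": rollback_cmd,  # the exact command the on-call can run
    }
    with open(MANIFEST, "a") as f:
        f.write(json.dumps(entry) + "\n")

if __name__ == "__main__":
    record_deploy("storefront", "r2", "deploy storefront --version r1")
```

Even something this simple means the 2 a.m. responder no longer needs the one developer who happens to remember what shipped last.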

Instrumentation
Monitoring was likewise defined by the business units as a function of the operations team, so adding application-level monitors was never part of the development lifecycle. All of the system-level monitors you would expect from a traditional, systems-only operations team showed no anomalies. No metrics tracked application health or business rules, which made it difficult to pinpoint the problem in the application layer and, consequently, extended the troubleshooting (and outage) time.
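
To make the contrast concrete, here is a minimal sketch of the kind of application-level check that was missing: instead of watching CPU or disk, it watches a business metric, orders completed in the last few minutes, and alarms when that drops through the floor. The order-count function is a stub I've invented for illustration; in a real deployment it would query the application's database or metrics store.

```python
#!/usr/bin/env python3
"""Minimal sketch of a business-level monitor (Nagios-style exit codes).
System metrics were green during the outage; a check on a business rule
("are we taking orders?") would not have been."""
import sys

ORDER_FLOOR = 1  # illustrative threshold: expect at least 1 order per window

def orders_in_last_five_minutes() -> int:
    """Stub: a real check would query the order database or metrics store.
    Hardcoded here so the sketch runs as-is."""
    return 0  # simulates the Black Friday failure mode

def main() -> int:
    count = orders_in_last_five_minutes()
    if count < ORDER_FLOOR:
        print(f"CRITICAL: only {count} orders in the last 5 minutes")
        return 2  # Nagios convention: 2 = critical
    print(f"OK: {count} orders in the last 5 minutes")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

The point isn't the tooling; it's that a check like this only exists if developers, who know the business rules, treat monitoring as part of their job.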

Flexibility
The development group had a defined process for modifying and deploying code that they had to follow, which prevented them from deploying quick patches as needed. They were forced to follow the standard SDLC process for a critical bug fix instead of adjusting the process to shorten time-to-market for an issue affecting millions of users and millions of dollars.
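
Adjusting the process doesn't have to mean abandoning it. As a purely illustrative sketch, a release process can carry a pre-agreed fast lane for emergencies: the hotfix path runs only smoke tests and deploys, while the default path keeps the full SDLC. The stage names and placeholder bodies below are assumptions for the example, not anyone's actual pipeline.

```python
#!/usr/bin/env python3
"""Illustrative sketch of a two-lane release process: the full SDLC by
default, plus a shorter, pre-agreed lane for revenue-affecting fixes.
Stage bodies are placeholders; real stages would invoke CI jobs."""
import sys

def unit_tests():
    print("full lane: unit tests")

def integration_tests():
    print("full lane: integration tests")

def staging_soak():
    print("full lane: staging soak")

def smoke_tests():
    print("hotfix lane: smoke tests only")

def deploy():
    print("deploying to production")

FULL_PIPELINE = [unit_tests, integration_tests, staging_soak, deploy]
HOTFIX_PIPELINE = [smoke_tests, deploy]  # agreed on in advance, not ad hoc

def release(hotfix: bool = False) -> None:
    for stage in (HOTFIX_PIPELINE if hotfix else FULL_PIPELINE):
        stage()  # fail fast: any uncaught exception stops the release

if __name__ == "__main__":
    release(hotfix="--hotfix" in sys.argv)
```

The key is that the fast lane is defined and blessed before the crisis, so nobody has to choose between breaking process and losing eight hours of orders.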

It is also worth mentioning the lack of automation: packaging, testing, and deploying the patch took significant time because of the coordination and hand-offs required between the groups. Also missing was a rollback plan that would have allowed them to quickly back out of the last set of changes, letting users continue shopping while developers worked on the fix. One can argue that those oversights fall on the IT groups rather than the business groups, but they still fall within the domain of process failure.
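
A rollback plan can be as simple as keeping recent releases on disk and repointing a symlink, a common convention (tools like Capistrano deploy this way), not anything specific to this company. Here's a minimal sketch under that assumption; the directory layout is invented for the example.

```python
#!/usr/bin/env python3
"""Minimal sketch of symlink-based rollback: releases sit side by side
and `current` points at the live one, so backing out is one symlink
swap instead of a rebuild. Paths are illustrative assumptions."""
import os

RELEASES_DIR = "/var/www/releases"  # hypothetical: one dir per release
CURRENT_LINK = "/var/www/current"   # the web server serves from here

def rollback() -> str:
    """Repoint `current` at the release deployed immediately before it."""
    releases = sorted(os.listdir(RELEASES_DIR))  # names sort by timestamp
    live = os.path.basename(os.path.realpath(CURRENT_LINK))
    idx = releases.index(live)
    if idx == 0:
        raise RuntimeError("no earlier release to roll back to")
    previous = releases[idx - 1]
    tmp = CURRENT_LINK + ".tmp"
    os.symlink(os.path.join(RELEASES_DIR, previous), tmp)
    os.replace(tmp, CURRENT_LINK)  # atomically swap the symlink (POSIX)
    return previous

if __name__ == "__main__":
    print("rolled back to", rollback())
```

With something like this rehearsed in advance, "back out the last change" is a one-minute decision instead of an eight-hour debugging session with the site down.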

The technology space has evolved. So have a lot of technologists. Businesses, however, especially larger ones, have a natural aversion to change that is often justified by risk and cost factors. But processes are put in place for exactly those reasons: to save time and money. If they don't accomplish those two goals, or worse, contribute to the opposite, they need to be changed. My hope is that, in light of visible, high-profile failures, businesses will make New Year's resolutions to make these changes and begin to realize that the ROI of change in the right direction is worth it.

More Stories By Leon Fayer

Leon Fayer is Vice President at OmniTI, a provider of web infrastructures and applications for companies that require scalable, high-performance, mission-critical solutions. He has a proven background in both web application development and production deployment of complex systems, and in his current role he advises clients on critical aspects of project strategy and planning to help ensure project success. Leon can be contacted at [email protected]
