Welcome!

@DevOpsSummit Authors: Mamoon Yunus, Pat Romanski, XebiaLabs Blog, Automic Blog, Elizabeth White

Related Topics: @DevOpsSummit, Agile Computing, @CloudExpo

@DevOpsSummit: Blog Post

It’s the Big Game. Your Website Is Crashing. What Do You Do? [#DevOps]

It may seem like the game is over and there is no chance of getting the system running, but remember your training and keep calm

It's hard to believe, but the conclusion of the NFL football season is upon us. In the east coast vs. west coast championship rivalry, Neotys will be rooting for our own New England Patriots. It shouldn't come as a surprise, as our office is only 35 miles from Gillette Stadium. Not only will the Patriots take on the Seattle Seahawks in the big game, but many web-based companies will be taking on large-scale traffic challenges after airing their prime time commercials that day. After hearing this, you should ignore the pregame hype, put down the nachos and get your servers ready for maxed out user load.

You've Visualized the Win... But Not the Loss
Imagine a scenario in which your company, after capping off a great 2014, starts the new year with a large advertising budget. The CMO decides that running an advertisement during the big game is a just what the company needs, so a slot is booked. With costs up to $4.5M for a thirty- second spot, all precautions are taken. After tirelessly preparing for the traffic influx that will occur, the ad runs and it's better than anyone had imagined. Prospective customers realize the call-to-action and begin to hit the website in large quantities.

Then it happens... Just as it seems you are crossing into the end zone for a touchdown, you feel the football start to slip from your grip. Website performance metrics begin to fail and the entire system seems as if it's about to go down. CPUs are spiking and system memory is at its max. It is at this point you realize, the whole application is about to crash!

Where Did the Game Plan Go Wrong?
The first thing you need to do is take a step back and think about where you could have gone wrong in preparing. Did you miss a step somewhere along the line? How thorough was all your load and performance testing beforehand? Where may you have unearthed a fault in your site before this incident?

As you know, if your site wasn't optimally functional prior to the traffic increase, a small problem in your ecommerce engine, system configuration, or usability could exponentially compound in times of high use. If you've set up your live metrics and tracking data properly, hopefully you know that your site is failing before your users do. It is imperative to be one step ahead of them before problems become severe.

It's Time to Call an Audible
It may seem like the game is over and there is no chance of getting the system running, but remember your training and keep calm. You will need to get developers, testers, the operations team and other partners to be involved in the conversation immediately and form an action plan. This is not a good time for other minutiae. Get all hands on deck and delegate leadership for the reboot so everyone knows exactly who is in charge. The focus should be solely on figuring out what happened and fixing it.

The easiest way to narrow down the source of the problem is to utilize data. It's time to scour through your monitoring systems to see what went wrong. If you have been running simulated user monitoring like NeoSense, reference that system as it can give you valuable data. Once you have checked through other logs, you hopefully will have narrowed the problem down to only a few possible catalysts.

Recover and Make the Game Winning Drive
There are all sorts of possibilities as to where the problem lies. For example, you may have run into:

  • A programming error on the website
  • A DNS problem pointing users to expired domains
  • A larger networking error
  • A bottle-necked application
  • An entire server that has crashed

In any case, figure out if the entire site is down or if just a small portion is down and keep the working parts in order.

If problems are load-related, get some new servers up and running as soon as possible. Hopefully you are on an elastic platform that will allow you manage your load balance overflow. With so many active current users, also check your auto-scaling settings and make sure you are initiating the correct ones.

Run a Few Plays
Once it seems you have found where the problem lies, it is time to rectify it. Address disk space and memory problems with additional horsepower. Once you have found malfunctioning processes, restart those systems. Keep the right people close by in case the crash keeps happening and notify the rest of the team of the temporary fix. Even though the crisis has been averted for now, you will definitely want to do a deeper investigation once the dust settles and administer a more permanent fix at a later time.

Celebrate the Win
With a fully functioning website active again, you should be able to handle the rise in traffic for the foreseeable future. Your customers are now re-engaged with the user experience they expect from your web service and you will be able to finish watching the Patriots beat the Seahawks (fingers crossed!) in the final. Finally, remember everyone who helped and be sure to thank the whole team for being supportive and addressing the problem.

More Stories By Tim Hinds

Tim Hinds is the Product Marketing Manager for NeoLoad at Neotys. He has a background in Agile software development, Scrum, Kanban, Continuous Integration, Continuous Delivery, and Continuous Testing practices.

Previously, Tim was Product Marketing Manager at AccuRev, a company acquired by Micro Focus, where he worked with software configuration management, issue tracking, Agile project management, continuous integration, workflow automation, and distributed version control systems.

@DevOpsSummit Stories
The goal of Continuous Testing is to shift testing left to find defects earlier and release software faster. This can be achieved by integrating a set of open source functional and performance testing tools in the early stages of your software delivery lifecycle. There is one process that binds all application delivery stages together into one well-orchestrated machine: Continuous Testing. Continuous Testing is the conveyer belt between the Software Factory and production stages. Artifacts are moved from one stage to the next only after they have been tested and approved to continue. New code submitted to the repository is tested upon commit. When tests fail, the code is rejected. Subsystems are approved as part of periodic builds on their way to the delivery stage, where the system is being tested as production ready. The release process stops when tests fail. The key is to shift test c...
In IT, we sometimes coin terms for things before we know exactly what they are and how they’ll be used. The resulting terms may capture a common set of aspirations and goals – as “cloud” did broadly for on-demand, self-service, and flexible computing. But such a term can also lump together diverse and even competing practices, technologies, and priorities to the point where important distinctions are glossed over and lost.
Most companies are adopting or evaluating container technology - Docker in particular - to speed up application deployment, drive down cost, ease management and make application delivery more flexible overall. As with most new architectures, this dream takes a lot of work to become a reality. Even when you do get your application componentized enough and packaged properly, there are still challenges for DevOps teams to making the shift to continuous delivery and achieving that reduction in cost and increase in speed.
Enterprise architects are increasingly adopting multi-cloud strategies as they seek to utilize existing data center assets, leverage the advantages of cloud computing and avoid cloud vendor lock-in. This requires a globally aware traffic management strategy that can monitor infrastructure health across data centers and end-user experience globally, while responding to control changes and system specification at the speed of today’s DevOps teams. In his session at 20th Cloud Expo, Josh Gray, Chief Architect at Cedexis, covered strategies for orchestrating global traffic achieving the highest-quality end-user experience while spanning multiple clouds and data centers and reacting at the velocity of modern development teams.
"At the keynote this morning we spoke about the value proposition of Nutanix, of having a DevOps culture and a mindset, and the business outcomes of achieving agility and scale, which everybody here is trying to accomplish," noted Mark Lavi, DevOps Solution Architect at Nutanix, in this SYS-CON.tv interview at @DevOpsSummit at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
In his session at 20th Cloud Expo, Mike Johnston, an infrastructure engineer at Supergiant.io, discussed how to use Kubernetes to set up a SaaS infrastructure for your business. Mike Johnston is an infrastructure engineer at Supergiant.io with over 12 years of experience designing, deploying, and maintaining server and workstation infrastructure at all scales. He has experience with brick and mortar data centers as well as cloud providers like Digital Ocean, Amazon Web Services, and Rackspace. His expertise is in automating deployment, management, and problem resolution in these environments, allowing his teams to run large transactional applications with high availability and the speed the consumer demands.
SYS-CON Events announced today that Massive Networks will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Massive Networks mission is simple. To help your business operate seamlessly with fast, reliable, and secure internet and network solutions. Improve your customer's experience with outstanding connections to your cloud.
DevOps is under attack because developers don’t want to mess with infrastructure. They will happily own their code into production, but want to use platforms instead of raw automation. That’s changing the landscape that we understand as DevOps with both architecture concepts (CloudNative) and process redefinition (SRE). Rob Hirschfeld’s recent work in Kubernetes operations has led to the conclusion that containers and related platforms have changed the way we should be thinking about DevOps and controlling infrastructure. The rise of Site Reliability Engineering (SRE) is part of that redefinition of operations vs development roles in organizations.
All organizations that did not originate this moment have a pre-existing culture as well as legacy technology and processes that can be more or less amenable to DevOps implementation. That organizational culture is influenced by the personalities and management styles of Executive Management, the wider culture in which the organization is situated, and the personalities of key team members at all levels of the organization. This culture and entrenched interests usually throw a wrench in the works because of misaligned incentives.
SYS-CON Events announced today that Datera, that offers a radically new data management architecture, has been named "Exhibitor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Datera is transforming the traditional datacenter model through modern cloud simplicity. The technology industry is at another major inflection point. The rise of mobile, the Internet of Things, data storage and Big Data are challenging the existing design patterns of the Data Center. The increase in complexity of managing legacy systems alongside new systems is beyond the ability of most IT departments. This leads to multiple tiers of storage and high economic costs, during a time in which IT is expected to do more with less.
Given the popularity of the containers, further investment in the telco/cable industry is needed to transition existing VM-based solutions to containerized cloud native deployments. The networking architecture of the solution isolates the network traffic into different network planes (e.g., management, control, and media). This naturally makes support for multiple interfaces in container orchestration engines an indispensable requirement.
SYS-CON Events announced today that CA Technologies has been named "Platinum Sponsor" of SYS-CON's 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CA Technologies helps customers succeed in a future where every business - from apparel to energy - is being rewritten by software. From planning to development to management to security, CA creates software that fuels transformation for companies in the application economy. With CA software at the center of their IT strategy, organizations can leverage the technology that changes the way we live - from the data center to the mobile device. CA's software and solutions help customers thrive in the new application economy by delivering the means to deploy, monitor and secure their applications and infrastructure.
As many know, the first generation of Cloud Management Platform (CMP) solutions were designed for managing virtual infrastructure (IaaS) and traditional applications. But that’s no longer enough to satisfy evolving and complex business requirements. In his session at 21st Cloud Expo, Scott Davis, Embotics CTO, will explore how next-generation CMPs ensure organizations can manage cloud-native and microservice-based application architectures, while also facilitating agile DevOps methodology. He will explain how automation, orchestration and governance are fundamental to managing today’s hybrid cloud environments and are critical for digital businesses to deliver services faster, with better user experience and higher quality, all while saving money.
While some vendors scramble to create and sell you a fancy solution for monitoring your spanking new Amazon Lambdas, hear how you can do it on the cheap using just built-in Java APIs yourself. By exploiting a little-known fact that Lambdas aren’t exactly single-threaded, you can effectively identify hot spots in your serverless code. In his session at @DevOpsSummit at 21st Cloud Expo, Dave Martin, Product owner at CA Technologies, will give a live demonstration and code walkthrough, showing how to overcome the challenges of monitoring S3 and RDS. This presentation will provide an overview of necessary Amazon Lambda concepts and discus how to integrate the monitoring data with other tools.
SYS-CON Events announced today that CA Technologies has been named “Platinum Sponsor” of SYS-CON's 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CA Technologies helps customers succeed in a future where every business – from apparel to energy – is being rewritten by software. From planning to development to management to security, CA creates software that fuels transformation for companies in the application economy.
As more and more companies are making the shift from on-premises to public cloud, the standard approach to DevOps is evolving. From encryption, compliance and regulations like GDPR, security in the cloud has become a hot topic. Many DevOps-focused companies have hired dedicated staff to fulfill these requirements, often creating further siloes, complexity and cost. This session aims to highlight existing DevOps cultural approaches, tooling and how security can be wrapped in every facet of the build and release cycle and how to get sales and customer facing resources wrapped in.
Internet of @ThingsExpo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 21st Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devices - computers, smartphones, tablets, and sensors - connected to the Internet by 2020. This number will continue to grow at a rapid pace for the next several decades. With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo in Silicon Valley. Learn what is going on, contribute to the discussions, and ensure that your enterprise...
SYS-CON Events announced today that Calligo has been named “Bronze Sponsor” of SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Calligo is an innovative cloud service provider offering mid-sized companies the highest levels of data privacy. Calligo offers unparalleled application performance guarantees, commercial flexibility and a personalized support service from its globally located cloud platforms. Through its four pillars of focus, Calligo delivers a platform that businesses can trust to deliver the high level of service and protection they expect and is lacking in many cloud offerings.
SYS-CON Events announced today that Elastifile will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Elastifile Cloud File System (ECFS) is software-defined data infrastructure designed for seamless and efficient management of dynamic workloads across heterogeneous environments. Elastifile provides the architecture needed to optimize your hybrid cloud environment, by facilitating efficient data access across cloud and on-premises boundaries - with all the advantages of public IaaS everywhere.
As DevOps methodologies expand their reach across the enterprise, organizations face the daunting challenge of adapting related cloud strategies to ensure optimal alignment, from managing complexity to ensuring proper governance. How can culture, automation, legacy apps and even budget be reexamined to enable this ongoing shift within the modern software factory?
SYS-CON Events announced today that Cloudistics, an on-premises cloud computing company, has been named “Bronze Sponsor” of SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Launched in 2016, Cloudistics helps anyone bring the power of the cloud to the data center in an easy-to-use, on- premises cloud platform that automatically provides high performance resources for all types of applications: Docker, Splunk, Hadoop, Citrix® VDI, and many other high performance workloads. With no onsite controllers to install or maintain, it’s easy to scale across a large site or multiple locations – all from a single, centralized dashboard.
With Cloud Foundry you can easily deploy and use apps utilizing websocket technology, but not everybody realizes that scaling them out is not that trivial. In his session at 21st Cloud Expo, Roman Swoszowski, CTO and VP, Cloud Foundry Services, at Grape Up, will show you an example of how to deal with this issue. He will demonstrate a cloud-native Spring Boot app running in Cloud Foundry and communicating with clients over websocket protocol that can be easily scaled horizontally and coordinate communication between multiple instances by using an additional message broker.
@DevOpsSummit at Cloud Expo taking place Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center, Santa Clara, CA, is co-located with the 21st International Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long development cycles that produce software that is obsolete at launch. DevOps may be disruptive, but it is essential.
SYS-CON Events announced today that Golden Gate University will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Since 1901, non-profit Golden Gate University (GGU) has been helping adults achieve their professional goals by providing high quality, practice-based undergraduate and graduate educational programs in law, taxation, business and related professions. Many of its courses are taught by faculty actively working in their field of expertise, providing students with skills that can be applied immediately. The new MS in Business Analytics, like most of its programs, is available fully online or in-person in downtown SF.
SYS-CON Events announced today that Golden Gate University will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Since 1901, non-profit Golden Gate University (GGU) has been helping adults achieve their professional goals by providing high quality, practice-based undergraduate and graduate educational programs in law, taxation, business and related professions. Many of its courses are taught by faculty actively working in their field of expertise, providing students with skills that can be applied immediately. The new MS in Business Analytics, like most of its programs, is available fully online or in-person in downtown SF.