Welcome!

DevOps Journal Authors: Carmen Gonzalez, Liz McMillan, Lori MacVittie, Trevor Parsons, Elizabeth White

Blog Feed Post

Black Friday Horror Story Averted with Alerting and Monitoring

image_pdfimage_print

AppDynamics recently announced the launch of our Application Intelligence Platform, which is the underlying infrastructure that delivers our portfolio of products to customers. A key component of the Application Intelligence platform is the notion of extensibility – we can integrate with many of the existing tools you already have in place so you can leverage AppDynamics analytics in the tools and dashboards your team knows and loves with minimal effort. These extensions as we call them are available on the AppDynamics eXchange section of our Community for download, and customers even have the option to submit extensions they’ve written themselves to be included in the eXchange.

To illustrate the power of the 75+ extensions we’ve published in our community, I’ll walk you through two scenarios that involve several common technologies that are prevalent across our customer base.

____

Before AppDynamics:

Jerry has been tossing and turning all night long. In fact, he’s had difficulty sleeping the past three weeks. His sub-optimal sleep patterns are in large part a result of the production application environment he is responsible for. “Things always seem to break in the middle of the night,” Jerry complained to his wife earlier that day.

As the DevOps lead for their company’s mission critical ecom app, Jerry is copied on most urgent application related alerts so that he can help manually forward the details he gets from his current monitoring tools to the admin from his team who happens to be on call at the time. Tonight, he only received 5 such notifications which is less than normal, but still sufficient to wake him up throughout the night. As he squints in the darkness and his eyes adjust to the bright screen, he sees a new notification that troubles him… “SSL Certificate Expired?” he mumbles to himself. “How is that possible?”

He checks the clock – 5:30AM. The person who handles the SSL Certificate isn’t going to be awake for a few more hours. Jerry’s heart drops because he knows that for every hour his ecommerce application is down it costs his company about $10,000 of revenue. “Why wasn’t this on my radar?” Jerry says. “We could’ve planned for this.”

Jerry gets to work early and starts sending emails and calling stakeholders to schedule an 8:30AM conference call. By 9:15AM the action items and deliverables are clear. By 10:30AM the SSL Certificate is renewed and the ecom store is back online servicing customers. Whew. “That could’ve been a lot worse than just 5 hours of downtime and $50K of revenue impact,” Jerry reasons with a colleague.

Back at his desk, Jerry looks at his calendar, his next meeting is ‘testing & capacity planning’ which is a weekly recurring meeting with him and his team.

Jerry’s company is preparing for the holiday season (Black Friday, Cyber Monday, etc.) which is still a few months away but for ecommerce stores, these peak seasons are huge operational and business challenges. You know that $10K per hour of revenue metric?  During those peak days in the holiday season that quadruples to $40K of revenue per hour. The ecommerce store can’t have any hiccups during that time or the impact would be massive, and that’s why this particular recurring meeting leading up to the code freeze are very important.

Jerry greets his team and looks over the shoulder of one of his sys admins. She’s just got the application infrastructure diagram drawn on the white board and has the first load test done and now they are analyzing the results. Looks like most of the synthetic tests they’ve run completed with relatively few errors and utilization was within the acceptable range even as the load increased over the duration of the load test. So far so good.

Jerry moves on to peek over the shoulder of his DBA who is currently analyzing the Cassandra cluster metrics after the load test. Disk I/O looked good and memory looked OK. Over the course of the next hour Jerry’s team tests 6 different load testing and failover scenarios. Today’s tests are done – until next week.

“Everything looks good… a little too good,” Jerry says to himself. “My team and I understand things like utilization and throughput but how does that translate to things my boss and the rest of the business care about?”

If only there was another approach to monitoring that would save Jerry from the fire drills, cut down on the constant testing and debugging, and give him a real-time view into how customers were engaging with his ecommerce application…

Luckily for Jerry, AppDynamics does just that…  Let’s look at this same situation one year later.

After AppDynamics:

Jerry wakes up from a great night’s sleep and checks his email for the daily AppDynamics events digest that gets sent to him with all of the application events over the last 24 hours. Only one event in the digest. Ever since Jerry’s organization invested into AppDynamics’ products that are delivered on the Application Intelligence Platform, his dev team has gotten code-level visibility into the root cause of performance issues inside his ecommerce application and has substantially cut down the number of bugs in the software. That means less production issues for his team to deal with downstream.

Using the PagerDuty alerting extension, the one issue that was sent in Jerry’s digest triggered the creation of a help ticket and was automatically assigned to the technician on duty with no manual intervention on Jerry’s part.

By the time Jerry checked on the status, the issue was already resolved. Nice.

On his way to work, Jerry smiles and thinks about last year’s SSL Certificate debacle. Since installing the SSL Certificate Monitoring extension from AppDynamics, his team has been able to build a dashboard that shows the number of days left until the SSL Certificate expires. No more SSL Certs expiring without anyone knowing ahead of time.

Jerry arrives at work and goes to his recurring ‘testing and capacity planning’ meeting that his team sets up every year around this time. Since deploying AppDynamics and installing two additional AppDynamics extensions – the Cassandra monitoring extension and the Amazon Web Services (AWS) cloud connector extension – his testing and capacity planning work for the holiday season has gotten a lot easier.

First, AppDynamics has given him and his team a great topology view that has relieved them of their needs for Visio diagrams and whiteboarded architectures. Being able to have a real-time view of how the different components of an application interact with each other, and have that map update automatically as new code is released, was hugely valuable for Jerry’s team.

Screen Shot 2014-06-18 at 10.26.02 AM

Second, during Cassandra testing, in addition to getting basic metrics like disk I/O and memory, the Cassandra extension provides configurable metrics like:

  • Cache size, capacity, hit count, hit rate, request count

  • Total latency, statistics, timeout requests, unavailable requests

  • Bloom filter disk space used, false positives, false ratio

  • SSTables compression ratio, live tables, disk space, compacted row size

  • Row size histogram

  • Column count histogram

  • Memtable columns, data size, switch count

  • Pending tasks

  • Read latency

  • Write latency

  • Pending and completed tasks

  • Compaction tasks pending and completed

  • Timeouts

  • Dropped messages

  • Streams

  • Total disk space used

  • Thread pool tasks: active, completed, blocked, pending

By leveraging these metrics, Jerry’s team is able to get granular visibility into Cassandra performance and see exactly where performance bottlenecks occur. This visibility has cut down the time needed to test their Cassandra implementation drastically. Pinpointing exactly where the performance issues are and what caused them enable Jerry’s team to proactively address Cassandra performance issues before they affect end users.

Finally, while capacity planning, Jerry now leverages the Amazon Web Services (AWS) cloud connector extension which allows his team to easily scale up and scale down in the cloud automatically based on policies that can involve a number of rules including:

•       Overall application health (load, response time, number of slow calls, etc.)

•       Business transaction health (load, response time, number of slow calls, etc.)

•       End User Experience health (pages / iFrames / AJAX requests per minute, first byte time, DOM ready time, etc.)

•       Databases & Remote Services health (calls per minute, errors per minute, etc)

•       Error rates (exceptions, return codes, etc.)

This year, Jerry’s team is putting a few different health rules in place that will automatically scale up the AWS EC2 resources when certain load & response time metrics are breached and scale down when those metrics go back down to a normal level. Jerry has also added an authorization step to these workflows that will alert him and ask for permission before spinning instances up or down. That way, they only pay for the EC2 resources they need to use and Jerry still has full control.

Screen Shot 2014-06-12 at 3.49.26 PMScreen Shot 2014-06-12 at 3.49.51 PM

Screen Shot 2014-06-12 at 3.50.17 PM

Jerry leaves the testing meeting with full confidence that his team has a good grasp on the upcoming peak season and has the visibility in place that will allow his team to quickly deal with any performance issues as they arise.

_____

As you can see, Jerry is in a lot better spot this year than he was 1 year ago. By leveraging AppDynamics he has one platform that can easily connect to the rest of the technologies he already uses and provide him a single UI in which he can manage the performance of his environment.

If you’d like to try AppDynamics for free and test drive some of the extensions we’ve highlighted in this blog post, click here.

The post Black Friday Horror Story Averted with Alerting and Monitoring written by appeared first on Application Performance Monitoring Blog from AppDynamics.

Read the original blog entry...

More Stories By Sandi Mappic

Sandi Mappic has a passion for making apps go faster. She works with AppDynamics around the clock to help customers resolve performance pain and master application performance management. (This is AppDynamics blog feed written by several different AppDynamics bloggers.)

@DevOpsSummit Stories
The 4th International DevOps Summit, co-located with16th International Cloud Expo – being held June 9-11, 2015, at the Javits Center in New York City, NY – announces that its Call for Papers is now open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the world's largest enterprises – and delivering real results. Among the proven benefits, DevOps is correlated with 2...
We live in a time when seconds – even milliseconds – can have a dramatic economic impact on your company’s future. With technology being the primary conduit for consumer interaction, the user experience is at center stage. User experience will be a deciding factor in separating the future winners from the losers. By building more speed and agility into the application delivery process, DevOps promises great rewards. However, with this promise, also come significant ramifications for failure. In his session at DevOps Summit, Tom Lounibos, SOASTA CEO, will explore both the benefits and the som...
Our expectations for buying all sorts of consumer goods has gone through a radical transformation we now take for granted. Why should we not expect this same level of service from IT businesses? We accept the status quo for how software delivery exists today but would reject it without hesitation if it were applied to pretty much any other online consumer experience. Take pizza delivery as an example. Fifteen years ago ordering a pizza meant trying to choose an item from a grease-stained menu somebody shoved under your door. You'd make a phone call and end up speaking to somebody who sounded ...
When an enterprise builds a hybrid IaaS cloud connecting its data center to one or more public clouds, security is often a major topic along with the other challenges involved. Security is closely intertwined with the networking choices made for the hybrid cloud. Traditional networking approaches for building a hybrid cloud try to kludge together the enterprise infrastructure with the public cloud. Consequently this approach requires risky, deep "surgery" including changes to firewalls, subnets and other modifications to the corporate security infrastructure. Connecting a public cloud to the ...
SYS-CON Events announced today that Grid Dynamics, the leading provider of scalable eCommerce technology solutions, will exhibit at DevOps Summit Silicon Valley, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Grid Dynamics is a leading provider of open, scalable, next-generation commerce technology solutions for Tier 1 retail. Grid Dynamics has in-depth expertise in commerce technologies, wide involvement in the open source community and a modern, global workforce. Great companies, partnered with Grid Dynamics, gain a sustainable business...
Docker offers a new, lightweight approach to application portability. Applications are shipped using a common container format and managed with a high-level API. Their processes run within isolated namespaces that abstract the operating environment independently of the distribution, versions, network setup, and other details of this environment. This "containerization" has often been nicknamed "the new virtualization." But containers are more than lightweight virtual machines. Beyond their smaller footprint, shorter boot times, and higher consolidation factors, they also bring a lot of new fea...
For better or worse, DevOps has gone mainstream. All doubt was removed when IBM and HP threw up their respective DevOps microsites. Where are we on the hype cycle? It's hard to say for sure but there's a feeling we're heading for the "Peak of Inflated Expectations". What does this mean for the Enterprise? Should they avoid DevOps? Definitely not. Should they be cautious though? Absolutely. The truth is that DevOps and the Enterprise are at best strange bedfellows. The movement has its roots in the tech community's elite. Open source projects and methodologies driven by the alumni of companies ...
SYS-CON Events announced today that Serena Software will exhibit at DevOps Summit Silicon Valley, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Serena Software supports DevOps and Continuous Delivery by providing application deployment automation and software release management solutions to replace slow and error-prone manual processes. 2,500 enterprises around the world trust Serena to help them develop and deploy better software.
We are all here because we are sold on the transformative promise of The Cloud. But what good is all of this ephemeral, on-demand infrastructure if your usage doesn't actually improve the agility and speed of your business? How must Operations adapt in order to avoid stifling your Cloud initiative? In his session at DevOps Summit, Damon Edwards, co-founder and managing partner of the DTO Solutions, will highlight the successful organizational, process, and tooling patterns of high-performing companies that have reshaped their Operations to enable the business to get full value from their Clo...
SYS-CON Events announced today that O'Reilly Media has been named “Media Sponsor” of SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. O'Reilly Media spreads the knowledge of innovators through its books, online services, magazines, and conferences. Since 1978, O'Reilly Media has been a chronicler and catalyst of cutting-edge development, homing in on the technology trends that really matter and spurring their adoption by amplifying "faint signals" from the alpha geeks who are creating the future. An...
SYS-CON Events announced today that Gigaom Research has been named "Media Sponsor" of SYS-CON's 15th International Cloud Expo®, which will take place on November 4-6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Ashar Baig, Research Director, Cloud, at Gigaom Research, will also lead a Power Panel on the topic "Choosing the Right Cloud Option." Gigaom Research provides timely, in-depth analysis of emerging technologies for individual and corporate subscribers. Gigaom Research's network of 200+ independent analysts provides new content daily that bridges the gap between break...
Sanjeev Sharma is the latest author to join DevOps Journal. Sanjeev is a solution architect and DevOps Worldwide lead with Rational Software, an IBM brand and the author of 'DevOps for Dummies.' DevOps Journal is focused on this critical enterprise IT topic in the world of cloud computing. SYS-CON Media CEO Carmen Gonzalez is founder and publisher of DevOps Journal, and Roger Strukhoff, long-time SYS-CON editor and the conference chair of Cloud Expo is the editor of the world's leading DevOps resource.
SYS-CON Events announced today that IBM is holding a Bluemix Developer Playground on November 5, 10:30 am to 5:30 pm at 15th Cloud Expo. 15th Cloud Expo, co-located with @ThingsExpo, Big Data Expo, and DevOps Summit is taking place Nov. 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. The labs, for developers of all levels, will highlight the ease of use of Bluemix, its services and functionality and provide short-term introductory projects that developers can complete between sessions. Developers will be able spend as much time as they want working on specific DevOps pro...
When you set off to build an app that will change the world, designing your system architecture to be reliable and scalable is important but the stark reality is that, for your MVP, you probably had a “need for speed” (of development). You didn’t know what all the axes were to scale your application, where your stress points would be, and what weird and wonderful ways your customers would use it down the road. In a world of zero-downtime services, landing the plane to figure it out is not an option. In his session at DevOps Summit, Andrew Miklas, CTO of PagerDuty, will share lessons learned ...
SYS-CON Events announced today that SOASTA, the leader in cloud and mobile testing, will exhibit at DevOps Summit Silicon Valley, which will take place November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. SOASTA is the leader in cloud testing. Its web and mobile test automation and monitoring solutions, CloudTest, TouchTest and mPulse, enable developers, QA professionals and IT operations teams to test and monitor users with unprecedented speed, scale, precision and visibility. The innovative product set streamlines test creation, automates provisioning and execution, ...
Founded in 1997, ActiveState is a global leader providing software application development and management solutions. The Company's products include: Stackato, a commercially supported Platform-as-a-Service (PaaS) that harnesses open source technologies such as Cloud Foundry and Docker; dynamic language distributions ActivePerl, ActivePython and ActiveTcl; and developer tools such as the popular Komodo Edit and Komodo IDE. Headquartered in Vancouver, Canada, ActiveState is trusted by customers and partners worldwide, across many industries including telecommunications, aerospace, software, fina...
SYS-CON Events announced today that ElasticBox is holding a Hackathon at DevOps Summit, November 6 from 12 pm -4 pm at the Santa Clara Convention Center in Santa Clara, CA. You can enter as an individual or team of up to 10 developers. A New Star Is Born Every Month! All completed ElasticBoxes will then be sent to a judging panel - 12 winners will be featured on the ElasticBox website in 2015. All entrants will receive five full enterprise licenses for one year + ElasticBox headphones + ElasticBox T-shirt. Winners can also choose to interview with ElasticBox to join one of the fastest growi...
SYS-CON Events announced today that Calm.io has been named “Bronze Sponsor” of DevOps Summit Silicon Valley, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Calm.io is a cloud orchestration platform for AWS, vCenter, OpenStack, or bare metal, that runs your CL tools puppet, Chef, shell, git, Jenkins, nagios, and will soon support New Relic and Docker. It can run hosted, or on premise and provides VM automation / expiry, self-service portals, audit, approvals, and budgeting.
Blue Box has closed a $10 million Series B financing. The round was led by a strategic investor and included participation from prior investors including Voyager Capital and Founders Collective, as well as the Blue Box executive team. This round follows a $4.3 million Series A closed in December of 2012 and led by Voyager Capital. In May of this year, the company announced general availability of its private cloud as a service offering, Blue Box Cloud. Since that release, the company has demonstrated market validation through customer adoption, positive reviews from industry analysts and k...
The speed of product development has increased massively in the past 10 years. At the same time our formal secure development and SDL methodologies have fallen behind. This forces product developers to choose between rapid release times and security. In his session at DevOps Summit, Michael Murray, Director of Cyber Security Consulting and Assessment at GE Healthcare, will examine the problems and present some solutions for moving security in to the DevOps lifecycle to ensure that we get fast AND secure.