Welcome!

@DevOpsSummit Authors: Liz McMillan, Elizabeth White, Pat Romanski, Jason Bloomberg, Yeshim Deniz

Related Topics: @DevOpsSummit, Linux Containers, Containers Expo Blog

@DevOpsSummit: Article

Scaling Incident Management | @DevOpsSummit #DevOps #APM #Monitoring

Incident management is paramount to the success of any modern ITOps team

Scaling Incident Management
By Patrick O'Fallon

Incident management is paramount to the success of any modern ITOps team. However, much like growing a business, scaling incident management can also trigger growing pains. As the landscape of devices, applications, and systems grows - each requiring monitoring - so too, does the alert noise and complexity around management for on-call staff. With an increasing number of engineers on your team, it can be difficult to on-board and implement new notification policies and after-hours operations to ensure your team is efficient and load is fairly distributed. And the push towards hybrid models of IT and bimodal IT environments can also complicate incident management. Nevertheless, with a few tried and true techniques, you can scale incident management in a planned, deliberate, organized, and effective way.

Don't fall victim to your changing ITOps environment
Let's first understand the problem with an example where scaling becomes a serious issue.
You've finally dialed in your incident management process, only to shortly after learn that your company has bought a new business. Now your Ops team is taking over IT for the new environment, in addition to what you're already responsible for. At first glance, you think of the perfect scenario in which you can simply apply the same tools and methodology to this entirely new stack.

However, reality is rarely perfect - the new company may leverage a different tech stack and different incident management monitoring tools and methodologies. While this scenario is incredibly daunting, it's very similar to any growth scenario - whether it be growing your IT team, or adopting more agile and bimodal ITOps structures. Whichever scale scenario you may face, below are some ideas for any organization that is working on scaling their monitoring, incident management, and team.

Identify the main areas of scale
Are you implementing new hardware, software, or services? Are there new complexities within your future state ITOps environment? Has your engineering team just grown? Have you inherited an application in which code errors need to be reported? In all cases, you must identify the areas in which your ITOps team is being forced to scale your operations.

Monitoring Tools
Ensuring coverage of your monitoring tools across your entire stack is paramount to the success of scaling. To adopt to this change, don't be afraid to implement multiple or entirely new monitoring systems outside of your current stack. The goal of these systems is to gain full-stack visibility, and in many cases this requires implementing different monitoring tools in order to appropriately monitor disparate and new systems. But to truly support organized scale, there needs to be a way to normalize, de-dupe, correlate, and gain actionable insights from all this data. All the events generated by these monitoring tools must be centralized in a single hub, from which they can be triaged and routed to the right on-call engineer.

Noise Reduction
When monitoring is in place, the goal is then to understand the data for effective incident resolution. Adjusting the routing behavior across your monitoring tools and configuring the appropriate thresholding is a great next step to ensure your team does not experience alert fatigue once you have implemented new tools. Aggregating this data and suppressing or filtering out non-actionable alerts from paging within a common incident management system is critical to help reduce the noise and enrich the visibility of incidents across your entire stack.

Incident Management
A comprehensive incident management platform will help integrate data from all your tools and grow with you as you scale. It not only unifies all your disparate monitoring alerts into one common system, it supports growth in your engineering team without generating confusion around resource management. Moreover, it helps facilitate more accountability as well as more organized collaboration. As a bonus, you can leverage incident analytics to show your boss how well your ITOps team is managing and resolving outages.

Scale and complexity are not going away
The world of ITOps is evolving rapidly, but one thing is clear - IT teams are being ordered to scale their operations in almost every capacity. Legacy ITOps environments are transitioning to and adopting more hybrid and agile architectures and frameworks. Users are continually demanding faster and more reliable access to data across different devices. As a result, it's necessary for ITOps teams to be equipped with a plan for scaling. Incident management is now a necessity as the stakes of downtime get higher.

The post Scaling Incident Management appeared first on PagerDuty.

More Stories By PagerDuty Blog

PagerDuty’s operations performance platform helps companies increase reliability. By connecting people, systems and data in a single view, PagerDuty delivers visibility and actionable intelligence across global operations for effective incident resolution management. PagerDuty has over 100 platform partners, and is trusted by Fortune 500 companies and startups alike, including Microsoft, National Instruments, Electronic Arts, Adobe, Rackspace, Etsy, Square and Github.

@DevOpsSummit Stories
Many companies start their journey to the cloud in the DevOps environment, where software engineers want self-service access to the custom tools and frameworks they need. Machine learning technology can help IT departments keep up with these demands. In his session at 21st Cloud Expo, Ajay Gulati, Co-Founder, CTO and Board Member at ZeroStack, will discuss the use of machine learning for automating provisioning of DevOps resources, taking the burden off IT teams.
SYS-CON Events announced today that Cedexis will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Cedexis is the leader in data-driven enterprise global traffic management. Whether optimizing traffic through datacenters, clouds, CDNs, or any combination, Cedexis solutions drive quality and cost-effectiveness.
SYS-CON Events announced today that Enroute Lab will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Enroute Lab is an industrial design, research and development company of unmanned robotic vehicle system. For more information, please visit http://elab.co.jp/.
SYS-CON Events announced today that Mobile Create USA will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Mobile Create USA Inc. is an MVNO-based business model that uses portable communication devices and cellular-based infrastructure in the development, sales, operation and mobile communications systems incorporating GPS capability.
SYS-CON Events announced today that Suzuki Inc. will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Suzuki Inc. is a semiconductor-related business, including sales of consuming parts, parts repair, and maintenance for semiconductor manufacturing machines, etc. It is also a health care business providing experimental research for dementia, etc. For more information, visit http://www.e-suzuki.co.jp/en/.
With the rise of DevOps, containers are at the brink of becoming a pervasive technology in Enterprise IT to accelerate application delivery for the business. When it comes to adopting containers in the enterprise, security is the highest adoption barrier. Is your organization ready to address the security risks with containers for your DevOps environment? In his session at @DevOpsSummit at 21st Cloud Expo, Chris Van Tuin, Chief Technologist, NA West at Red Hat, will discuss: The top security risks with containers and how to manage these risks at scale including Images, Builds, Registry, Deployment, Hosts, Network, Storage, APIs, Monitoring/Logging, and Federation.
SYS-CON Events announced today that Massive Networks, that helps your business operate seamlessly with fast, reliable, and secure internet and network solutions, has been named "Exhibitor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. As a premier telecommunications provider, Massive Networks is headquartered out of Louisville, Colorado. With years of experience under their belt, their team of engineers can navigate the Carrier Ecosystem for your IT team acting as an extension of your business, producing a hassle-free experience.
SYS-CON Events announced today that Nihon Micron will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Nihon Micron Co., Ltd. strives for technological innovation to establish high-density, high-precision processing technology for providing printed circuit board and metal mount RFID tags used for communication devices. For more information, visit http://www.nihon-micron.co.jp/.
SYS-CON Events announced today that mruby Forum will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. mruby is the lightweight implementation of the Ruby language. We introduce mruby and the mruby IoT framework that enhances development productivity. For more information, visit http://forum.mruby.org/.
SYS-CON Events announced today that Ryobi Systems will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Ryobi Systems Co., Ltd., as an information service company, specialized in business support for local governments and medical industry. We are challenging to achive the precision farming with AI. For more information, visit http://www.ryobi-sol.co.jp/en/.
SYS-CON Events announced today that SIGMA Corporation will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. uLaser flow inspection device from the Japanese top share to Global Standard! Then, make the best use of data to flip to next page. For more information, visit http://www.sigma-k.co.jp/en/.
SYS-CON Events announced today that Daiya Industry will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Daiya Industry specializes in orthotic support systems and assistive devices with pneumatic artificial muscles in order to contribute to an extended healthy life expectancy. For more information, please visit https://www.daiyak.co.jp/en/.
Today traditional IT approaches leverage well-architected compute/networking domains to control what applications can access what data, and how. DevOps includes rapid application development/deployment leveraging concepts like containerization, third-party sourced applications and databases. Such applications need access to production data for its test and iteration cycles. Data Security? That sounds like a roadblock to DevOps vs. protecting the crown jewels to those in IT.
SYS-CON Events announced today that B2Cloud will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. B2Cloud specializes in IoT devices for preventive and predictive maintenance in any kind of equipment retrieving data like Energy consumption, working time, temperature, humidity, pressure, etc.
SYS-CON Events announced today that NetApp has been named “Bronze Sponsor” of SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. NetApp is the data authority for hybrid cloud. NetApp provides a full range of hybrid cloud data services that simplify management of applications and data across cloud and on-premises environments to accelerate digital transformation. Together with their partners, NetApp empowers global organizations to unleash the full potential of their data to expand customer touchpoints, foster greater innovation and optimize their operations.
Most of the time there is a lot of work involved to move to the cloud, and most of that isn't really related to AWS or Azure or Google Cloud. Before we talk about public cloud vendors and DevOps tools, there are usually several technical and non-technical challenges that are connected to it and that every company needs to solve to move to the cloud. In his session at 21st Cloud Expo, Stefano Bellasio, CEO and founder of Cloud Academy Inc., will discuss what the tools, disciplines, and cultural aspects are that enterprise companies are considering to get to the cloud and eventually transform the way they build software and services.
SYS-CON Events announced today that Interface Corporation will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Interface Corporation is a company developing, manufacturing and marketing high quality and wide variety of industrial computers and interface modules such as PCIs and PCI express. For more information, visit http://www.interface-amita.com/aboutus/interface_profile.asp.
SYS-CON Events announced today that Keisoku Research Consultant Co. will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Keisoku Research Consultant, Co. offers research and consulting in a wide range of civil engineering-related fields from information construction to preservation of cultural properties. For more information, visit http://www.krcnet.co.jp/eng_site/e_index.htm.
SYS-CON Events announced today that MIRAI Inc. will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. MIRAI Inc. are IT consultants from the public sector whose mission is to solve social issues by technology and innovation and to create a meaningful future for people.
SYS-CON Events announced today that Fusic will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Fusic Co. provides mocks as virtual IoT devices. You can customize mocks, and get any amount of data at any time in your test. For more information, visit https://fusic.co.jp/english/.
SYS-CON Events announced today that N3N will exhibit at SYS-CON's @ThingsExpo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. N3N’s solutions increase the effectiveness of operations and control centers, increase the value of IoT investments, and facilitate real-time operational decision making. N3N enables operations teams with a four dimensional digital “big board” that consolidates real-time live video feeds alongside IoT sensor data and analytics insights onto a single, holistic, display, focusing attention on what matters, when it matters.
Today most companies are adopting or evaluating container technology - Docker in particular - to speed up application deployment, drive down cost, ease management and make application delivery more flexible overall. As with most new architectures, this dream takes significant work to become a reality. Even when you do get your application componentized enough and packaged properly, there are still challenges for DevOps teams to making the shift to continuous delivery and achieving that reduction in cost and increase in speed. Sometimes in order to reduce complexity teams compromise features or change requirements
21st International Cloud Expo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Meanwhile, 94% of enterprises are using some form of XaaS – software, platform, and infrastructure as a service.
Agile has finally jumped the technology shark, expanding outside the software world. Enterprises are now increasingly adopting Agile practices across their organizations in order to successfully navigate the disruptive waters that threaten to drown them. In our quest for establishing change as a core competency in our organizations, this business-centric notion of Agile is an essential component of Agile Digital Transformation. In the years since the publication of the Agile Manifesto, the connection between building better software and business agility has been a tenuous one at best. But now that Agile is maturing and Digital Transformation is driving change across enterprises large and small, companies are realizing that their best bet for achieving business agility is to take the best of Agile and apply it across the entire organization.
While some developers care passionately about how data centers and clouds are architected, for most, it is only the end result that matters. To the majority of companies, technology exists to solve a business problem, and only delivers value when it is solving that problem. 2017 brings the mainstream adoption of containers for production workloads. In his session at 21st Cloud Expo, Ben McCormack, VP of Operations at Evernote, will discuss how data centers of the future will be managed, how the public cloud best suits your organization, and what the future holds for operations and infrastructure engineers in a post-container world. Is a serverless world inevitable?