Metrics and KPIs for Test Environment Stability

How often is an environment unavailable due to factors within your project’s control? How often is an environment unavailable due to external factors? Are the software and hardware in an environment up to date with the target Production systems? How often do you have to resort to manual workarounds due to an environment?

Metric: Availability and Uptime Percentage
QA and Staging environments seldom require the same level of uptime as Production, but tell that to a team of developers working 24/7 on a project that has an aggressive deadline. As a Test Environment Manager, you know that when a QA system is unavailable, you will get immediate calls from developers and managers.

As a Test Environment Manager, you will also want to understand the root cause of every outage. If you follow a problem management process for Production outages, you should follow a similar process for test environments. Understanding why an outage happened is critical for communicating with a development team. Very often a QA environment will become unavailable due to a factor far outside the control of a Test Environment Manager. If one team pushes bad code that interrupts the QA process for all teams, you need to be able to identify this clearly.

How to Measure Availability and Uptime
Keep track of system availability with a standard monitoring tool such as Zabbix or Nagios. If your systems are visible to the public internet, you can also use hosted platforms like Pingdom to measure system availability.
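As a rough illustration of the number such a tool reports, the sketch below derives an uptime percentage from a list of periodic health-check results. The function name and the sample data are hypothetical and are not taken from Zabbix, Nagios, or Pingdom. It assumes probes are evenly spaced, so each one stands for the same slice of time.

```python
def availability_percent(checks):
    """Return the share of successful checks as a percentage.

    `checks` is a sequence of booleans, one per evenly spaced probe
    (True = the environment responded, False = it did not).
    """
    if not checks:
        raise ValueError("need at least one check result")
    return 100.0 * sum(checks) / len(checks)


# Hypothetical data: one probe per minute for a day, with 72 failed probes.
samples = [True] * (1440 - 72) + [False] * 72
print(f"{availability_percent(samples):.1f}%")  # prints 95.0%
```

A real monitoring system will also account for missed probes and maintenance windows, which this sketch ignores.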

Example Metric: Goal for Availability
An uptime of 95% is usually sufficient for a QA or Staging environment.
If your development is limited to a few time zones, you can further qualify this by measuring availability only during development hours. While your Production availability commitment is often higher than 99% or 99.5%, you don’t have to treat every QA outage as an emergency. But your developers may have other opinions: 95% uptime still allows for more than eight hours of downtime a week. You may want to aim higher.
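To see what a given target really permits, it can help to turn the percentage into a downtime budget. The sketch below is one way to do that. The function name is hypothetical, and the 10-hour, 5-day development window is an assumed example, not a recommendation.

```python
def downtime_budget_hours(target_percent, window_hours):
    """Hours of downtime permitted in a window at a given availability target."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    return window_hours * (100.0 - target_percent) / 100.0


# A full calendar week at 95%.
print(f"{downtime_budget_hours(95, 24 * 7):.2f} h")    # prints 8.40 h
# Assumed development hours only: 10 hours a day, 5 days a week.
print(f"{downtime_budget_hours(95, 10 * 5):.2f} h")    # prints 2.50 h
# A Production-style 99.5% target over a full week, for comparison.
print(f"{downtime_budget_hours(99.5, 24 * 7):.2f} h")  # prints 0.84 h
```

Measuring only during development hours shrinks the window, so the same 95% target leaves far less room for outages that developers actually notice.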

How Does This Metric Motivate Concrete Action?
When you measure system availability and make these numbers public, you encourage Test Environment Managers to make a commitment to uptime. This results in fewer obstacles for QA and development, allowing them to deliver software faster. There’s nothing more debilitating to an organization than disruptions in QA and testing. Measuring this metric allows you to encourage movement toward always-available QA systems.

Metric: Mean Time Between Outages
If your system has 95% availability, then about seventy-two minutes of downtime is acceptable every day. If your system fails for ten minutes every hour during an eight-hour work day due to a build or deployment, those eighty minutes keep you close to the 95% figure over a 24-hour day, yet you’ll be creating a QA or Staging environment that steadily loses developer and QA confidence. To get an accurate picture of system availability, you need to couple an availability percentage metric with your mean time between outages (MTBO).

How to Measure MTBO
If you follow a process that keeps track of outages and strives to understand the root causes of these outages, you’ll develop a database of issues that you can use to derive your MTBO. If you have a monitoring system configured to calculate availability percentages automatically, you can use this same system to record your MTBO.
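A minimal sketch of deriving MTBO from such an outage log follows. It assumes each record is a (start, end) pair of timestamps, that outages do not overlap, and that MTBO means the average uptime between the end of one outage and the start of the next. The function name and the sample records are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_between_outages(outages):
    """Average gap between the end of one outage and the start of the next.

    `outages` is a list of (start, end) datetime pairs.
    Returns a timedelta, or None when there are fewer than two outages.
    """
    ordered = sorted(outages)
    if len(ordered) < 2:
        return None
    gaps = [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return sum(gaps, timedelta()) / len(gaps)


# Hypothetical outage log for one day.
day = datetime(2017, 6, 5)
outages = [
    (day.replace(hour=9, minute=0), day.replace(hour=9, minute=10)),
    (day.replace(hour=13, minute=10), day.replace(hour=13, minute=20)),
    (day.replace(hour=17, minute=20), day.replace(hour=17, minute=50)),
]
print(mean_time_between_outages(outages))  # prints 4:00:00
```

If your team defines MTBO from outage start to outage start instead, replace `prev_end` with the previous start time.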

Example Metric: Goal for MTBO
This depends on your availability goal. The lower your availability goal, the higher your MTBO should be. For example, if you have a 95% uptime commitment then your outages need to be spaced over a day or a week. You might have eight hours of downtime each weekend to perform system upgrades or a nightly build and deploy process that takes about an hour, but what you can’t have is an MTBO of 45–60 minutes. This will mean that QA and Staging systems will be unavailable for a few minutes every hour, which will result in dissatisfied customers.
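The two cases above can be checked with a little arithmetic. If MTBO is taken as the average uptime U between outages and D is the average outage length, then availability is U / (U + D), which gives D = U × (100 − A) / A for a target of A percent. The sketch below applies that formula. The function name is hypothetical, and the steady-state assumption is a simplification.

```python
def mean_outage_minutes(target_percent, mtbo_minutes):
    """Average outage length that exactly meets the target, given the MTBO.

    Assumes availability = uptime / (uptime + downtime), with MTBO measured
    as the uptime between two consecutive outages.
    """
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    return mtbo_minutes * (100.0 - target_percent) / target_percent


# An MTBO of 55 minutes at 95%: a few minutes of downtime every hour.
print(f"{mean_outage_minutes(95, 55):.1f} minutes")          # prints 2.9 minutes
# An MTBO of 160 hours at 95%: roughly one long weekend outage per week.
print(f"{mean_outage_minutes(95, 160 * 60) / 60:.1f} hours")  # prints 8.4 hours
```

Both schedules meet the same 95% target, which is why the availability percentage alone does not tell you how disruptive the outages are.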

How Does This Metric Motivate Concrete Action?
If your MTBO is very short, this suggests that build and deploy activity from a continuous integration environment is frequently interrupting both Development and QA. If your MTBO is very high, but your availability is very low (95% or lower), this means that your outages are rare but long, often an hour or more at a time. When you measure MTBO, you encourage your Release Engineers and Test Environment Managers to work together to create build and deployment scripts that don’t affect availability, and you encourage your staff to approach QA and Staging uptime with care. Without this metric, you run the risk of having teams grow complacent with frequent, low-level unavailability as long as they satisfy overall availability metrics.

Metric: Downtime Requirement for a Test Environment Build and Deploy
When software is deployed to any system, there is a natural tendency for disruption. If new code is being deployed to an application server, that server often requires a restart so that new code can be loaded. If a web server such as Apache or Nginx is being reconfigured, this often requires a fast restart measured in seconds.

Some of these build- and deploy-related disruptions can be avoided through the use of load balancers and clusters of machines. On the largest projects, this is essential in Production as well as in Staging and QA systems. An example is a QA system for a large bank’s transaction processing system. There are so many teams that depend on this system to be up and running 24/7 that causing any disruption would run the risk of freezing the QA process across the entire company.

Other build and deploy downtimes are unavoidable. A frequent example is changes to a database schema. Certain changes to tables and indexes require systems to be stopped and rebooted to reach a state where database activity isn’t competing with DDL statements.

The downtime requirement for a given build and deploy to a test environment is a central measure that is directly related to the availability metrics mentioned before in this section.

How to Measure Build/Deploy Downtime
It’s simple: run a build and deployment and keep track of the downtime that falls into the timespan of each build and deploy function. If you have a continuous integration system such as Jenkins or Bamboo, grab the timestamps of the last few builds and look at your monitoring metrics on QA and Staging to see if there is a system impact.
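One way to line up the two data sources is to intersect each build window with the outage windows reported by monitoring. The sketch below assumes you have already exported both as (start, end) timestamp pairs. The function names and sample data are hypothetical, and it does not call any Jenkins or Bamboo API.

```python
from datetime import datetime


def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two time windows (0 if none)."""
    latest_start = max(a_start, b_start)
    earliest_end = min(a_end, b_end)
    return max(0.0, (earliest_end - latest_start).total_seconds())


def deploy_downtime_seconds(builds, outages):
    """For each build window, total the outage time that falls inside it."""
    return [
        sum(
            overlap_seconds(build_start, build_end, out_start, out_end)
            for out_start, out_end in outages
        )
        for build_start, build_end in builds
    ]


# Hypothetical build windows (from CI) and outage windows (from monitoring).
day = datetime(2017, 6, 5)
builds = [
    (day.replace(hour=10, minute=0), day.replace(hour=10, minute=15)),
    (day.replace(hour=14, minute=0), day.replace(hour=14, minute=10)),
]
outages = [
    (day.replace(hour=10, minute=5), day.replace(hour=10, minute=12)),
    (day.replace(hour=16, minute=0), day.replace(hour=16, minute=30)),
]
print(deploy_downtime_seconds(builds, outages))  # prints [420.0, 0.0]
```

Here the first build coincides with seven minutes of downtime, while the second causes none. The 16:00 outage is unrelated to any build and would need its own root-cause analysis.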

Example Metric: Goal for Build/Deploy Downtime
Your goal for this metric depends on your level of availability. If you are working on a shared service, your build and deploy downtime requirement should be as close to zero as possible. If you are working on a less critical application, then your build and deploy downtime should be measured in minutes or seconds.

How Does This Metric Motivate Concrete Action?
This metric encourages your Release Engineers and Test Environment Managers to drive build and deploy downtime to zero. With the tools available to developers and DevOps professionals, it is possible to achieve zero-downtime deployments to QA and Staging systems. Doing this will give your internal customers more confidence in the systems you are delivering.

The post Metrics and KPIs for Test Environment Stability appeared first on Plutora.
