@DevOpsSummit Authors: Zakia Bouachraoui, Yeshim Deniz, Elizabeth White, Pat Romanski, Liz McMillan

@DevOpsSummit: Blog Post

Configuration Drift: The Cost of Complexity

Imagine this — you're rolling out a new version of your web app. Works great in the dev environment, and it's been signed off on in staging, so it gets rolled out to production. Things seem fine, so you call it a night.

Then the support requests begin flooding in. Something's broken somewhere, and it's not immediately obvious what. The performance monitor shows the machines are running well, so it can't be that. Ah well, better crack open one of those neon-colored energy drinks: it's time to roll back and log into these machines to look through logs and config files for a potential cause. "How could this be happening?" you ask. "I mean... these machines are all configured the same, right?"

Often, that's wrong.

Configuration drift is a very real and increasingly common problem, especially in growing environments. In a way, you can call it the "hidden cost of complexity," and there are a number of causes behind it.

  • Well-meaning team members could've updated something to a new version, installed a conflicting package or service, or applied a fix thought to be minor.
  • Software or OS updates applied here but not there could've thrown everything out of whack.
  • A tiny change in a far-flung config file could be the metaphorical butterfly that flapped its wings.
  • Changing settings or firmware on a network device may affect some or all clients connected through it.
  • A machine could've been compromised in a way that isn't obvious.
  • Space aliens.

And as wildly varied as the causes can be, the potential effects are just as varied and often far more painful. We're talking downtime, failed infrastructure, loss of data, loss of business, and even loss of customer trust.
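
Several of the causes listed above come down to the same thing: one machine's package versions or settings quietly differ from those of its peers. As a minimal sketch of how that kind of difference could be spotted, the Python below compares flat key/value snapshots taken from several hosts and reports which keys disagree. The host names, keys, values, and the helper names `find_drift` and `outliers` are all hypothetical and chosen for illustration; this is not how any particular product does it, and gathering the snapshots in the first place is left out entirely.

```python
from collections import Counter


def find_drift(snapshots):
    """Return {key: {host: value}} for every key that is not identical on all hosts.

    `snapshots` maps a host name to a flat dict of settings (package versions,
    config values, ...). A key missing on a host is reported as None.
    """
    all_keys = set()
    for settings in snapshots.values():
        all_keys.update(settings)

    drift = {}
    for key in sorted(all_keys):
        values = {host: settings.get(key) for host, settings in snapshots.items()}
        if len(set(values.values())) > 1:
            drift[key] = values
    return drift


def outliers(values):
    """Return the hosts whose value differs from the most common one."""
    most_common_value, _ = Counter(values.values()).most_common(1)[0]
    return sorted(host for host, value in values.items() if value != most_common_value)


# Hypothetical snapshots; real data would come from your own inventory tooling.
snapshots = {
    "web-1": {"openssl": "1.0.1g", "nginx": "1.6.0", "max_connections": "512"},
    "web-2": {"openssl": "1.0.1g", "nginx": "1.6.0", "max_connections": "512"},
    "web-3": {"openssl": "1.0.1f", "nginx": "1.6.0"},
}

for key, values in find_drift(snapshots).items():
    print(f"{key}: differs on {', '.join(outliers(values))}")
```

With the sample data, `web-3` is flagged twice: once for an older `openssl` and once for a missing `max_connections` setting. Treating the most common value as "correct" is only a heuristic; in practice you would more likely compare against a configuration you have explicitly verified as good.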

One reason the lurking configuration drift problem isn't more widely discussed in IT is probably the wide variation in its causes and effects: something with a thousand possible causes and a thousand possible effects is difficult to pin down as one phenomenon. It's not as easy to define and fight as, say, viruses or hardware failure. Viruses are things we can point to and say, "These are bad, here's how they proliferate, and here's how you protect yourself," and as for hardware failure, we all know what that looks like and know how to mitigate it when it happens.

Another reason for not discussing config drift is probably that, until recently, there hasn't been a single solution for preventing or dealing with it.

GuardRail directly combats configuration drift by continually scanning and monitoring your configs across practically every platform and device. It's a robust, collaborative platform with tools to graphically identify differences and potential hazards, and alert you when something goes awry. Reports can be exported to PDF for auditing or compliance purposes, and configs you verify as good can be exported to Chef, Docker, Ansible, and Puppet for automation.

And when we say "collaborative," we mean it. We designed GuardRail from the ground up to be simple enough to be a valuable tool for every stakeholder. Nodes and their differences are represented graphically, in an easy-to-navigate interface that's useful no matter your background.

Don't believe it's possible? We'd be happy to give you the grand tour and show you a live demo running on real devices. Or check out the product page and get started right away.

More Stories By ScriptRock Blog

ScriptRock makes GuardRail, a DevOps-ready platform for configuration monitoring.

Realizing we were spending way too much time digging up, cataloguing, and tracking machine configurations, we began writing our own scripts and tools to handle what is normally an enormous chore. Then we took the concept a step further, giving it a beautiful interface and making it simple enough for our bosses to understand. We named it GuardRail after its function — to allow businesses to move fast and stay safe.

GuardRail scans and tracks much more than just servers in a datacenter. It works with network hardware, cloud service providers, CloudFlare, Android devices, and more.
