DevOps and Apollo Mission Control
By Todd Vernon

Lately I have been reading the excellent book Digital Apollo. It explores the evolution of digital control systems and the man-machine interface that emerged during the development of space flight and, ultimately, the Apollo missions. It’s a fantastic book, more technical than most, but very approachable for those not familiar with flight control, embedded software, or the challenges of building such systems. As I read the book, I could not help but compare the way space missions were executed with the role of DevOps in modern SaaS businesses.

I started my career at NASA testing digital flight controls for an experimental aircraft, the X-29. The X-29 flight test program was just the latest in a series of one-off aircraft that started with the Bell X-1 and moved on to the X-15, which laid the groundwork for Apollo. As a result, flight test was executed in a very similar fashion in all these programs. Nearly every switch, surface, actuator, and probe was instrumented, and that data was downlinked in real time to a control room as the airplane or spacecraft flew.

As the vehicles became more fly-by-wire and had digital computers at their core, those computers also downlinked many of their internal state variables to the ground, where teams of engineers could keep track of every button push, flight mode, and acceleration in real time, helping the pilot watch for anything that could end the mission or end his life.

The world of space mission operations and the evolution of SaaS businesses are converging on the same pattern. While generally no one dies if your SaaS service fails to operate, the implications of downtime get more real every year. If you operate a platform that serves customers who collectively pay millions of dollars a day for your product or service, that is serious business.

Just as state variables were downlinked from Apollo, we now watch their equivalent using tools like New Relic as our systems support millions or billions of customer transactions through the services we have built. While Apollo’s AGC had to work for several hundred hours at a time, our SaaS services get turned on once, when we launch our company, and the mission goes on forever. As a result, we are replacing rooms of engineers who stayed on station for days with systems that connect them to the technology all the time.

Modern monitoring tools are starting to approach the immediacy of data we had back at NASA, and they already far surpass those relatively crude tools in the spontaneity of exploration and discovery they allow. Today, I get an alert on my iPhone when some part of our system is acting inconsistently, and I can interact with our engineers in real time regardless of location.

Like the rooms of engineers that supported an Apollo mission, today we are on the verge of supporting our complex systems with a virtual room of engineers using tools like VictorOps. As systems become more complex, it becomes more likely that a problem needs to be solved by the person who wrote the code in the first place. Very often only that person has (or ever had) intimate enough knowledge of how the system works to fix it or work around it and keep the mission (business) functioning.

On Apollo 14, while the spacecraft was in lunar orbit, an engineer noticed that a software bit buried deep in the guidance and navigation computer inside the Lunar Module (LM) indicated that an abort of the descent program had been initiated. The cause was a loose bead of solder that effectively kept “pushing” the abort button. It was not yet a problem, and the crew had not even noticed it, because the descent program that would land the astronauts on the moon was not running yet.

Had that program been initiated, as was scheduled only minutes later, the mission would have been aborted and quite likely the crew would have been lost. That engineer’s knowledge of how the system would misbehave mattered only because advanced monitoring connected the engineer to the software and made it possible to act on the data in real time. If you removed any part of that equation, Apollo 14 would have turned out very differently.
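
To make the idea concrete, here is a minimal Python sketch of that kind of check: scan downlinked frames for an abort flag that is set before the descent program is running. Everything in it is invented for illustration, including the `TelemetryFrame` fields, the `ABORT_BIT` position, and the function names. It does not reflect the real AGC downlink format or any particular monitoring product.

```python
from dataclasses import dataclass

# Hypothetical bit position for "abort requested"; not the real AGC layout.
ABORT_BIT = 0x01


@dataclass
class TelemetryFrame:
    timestamp: float        # seconds since monitoring began
    status_word: int        # hypothetical packed status bits
    descent_running: bool   # whether the descent program is active


def abort_requested(frame: TelemetryFrame) -> bool:
    """True if the abort bit is set in this frame's status word."""
    return bool(frame.status_word & ABORT_BIT)


def find_early_warnings(frames):
    """Timestamps where abort is flagged while descent is not yet running.

    These are the frames worth a human's attention: harmless right now,
    but fatal to the mission once the descent program starts.
    """
    return [
        f.timestamp
        for f in frames
        if abort_requested(f) and not f.descent_running
    ]


frames = [
    TelemetryFrame(0.0, 0b00, False),  # nominal
    TelemetryFrame(1.0, 0b01, False),  # abort bit set before descent
    TelemetryFrame(2.0, 0b10, False),  # a different bit, not abort
    TelemetryFrame(3.0, 0b01, True),   # abort during descent: a real abort
]
warnings = find_early_warnings(frames)
print(warnings)
```

The check itself is trivial. What made the difference on Apollo 14 was having someone connected to the data who knew what that bit would do a few minutes later.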

This is the basic DNA of how we look at our product at VictorOps. We connect engineers to the mission-critical machines that run your business. If you expect the unexpected and outfit your teams accordingly, you can be ready to respond to any problem faster and more accurately than your competition.

The post DevOps and Apollo Mission Control appeared first on VictorOps.