NASA's Juno Mission and IT Operations
By Ophir Ronen

I've always wanted to be a starship pilot traveling the stars. While there is only a slim chance of interstellar travel happening in my lifetime, we are starting to enter a fascinating era. We're doing incredible things like landing on comets, testing ion engines, and even exploring EM-drives. What's especially exciting right at this moment is humanity placing a probe in orbit around Jupiter, one of the most intense environments in the solar system.

Harsh Environments
The Juno spacecraft has to deal with an incredibly harsh environment. The biggest challenge is the intense radiation - 20,000 times greater than Earth's - which Juno will not survive but can only contend with for a brief time. "Once these electrons hit a spacecraft, they immediately begin to ricochet and release energy, creating secondary photons and particles, which then ricochet," Heidi Becker, leader of Juno's radiation-monitoring team, said during a news conference last month. "It's like a spray of radiation bullets."

Why am I bringing up the Jupiter mission in the context of IT Operations? How does all this relate to the human problems of operating an ITOps environment? The answer is simple - both are harsh environments that require planning, well-defined processes, and appropriate tooling in order to endure and thrive. The IT Operations version of a spray of radiation bullets is the at-times overwhelming flood of actionable and non-actionable alerts flowing in from the various management systems.

In the past, we called these non-actionable alerts "noise," but we're moving away from that nomenclature as we discover golden nuggets of leading- and trailing-edge indicators in the sea of IT Operations alert data.

Alert Suppression
When my former company, Event Enrichment HQ, was acquired by PagerDuty late last year, the expectation was to augment the existing excellent array of incident response capabilities with enhancements focused on event management. We initiated this effort by creating our PagerDuty common event format (PD-CEF), with which we normalize and structure alerts from your management systems. By doing so, we set the stage for building new and powerful tools to help you accelerate incident response. Building on that solid foundation of normalized event data, our new event rules engine allows you to classify groups of alerts and to act on them, starting with event and alert suppression. Suppression matters because our philosophy for dealing with the enormous load of alerts generated by today's infrastructure is not to drop those alerts but to suppress them, so that they are retained without demanding a response.

Why suppress alerts, you ask? Our research has shown that many of those so-called "noise" alerts are leading-edge indicators of much more severe issues. By sending more events rather than fewer to PagerDuty, you will gain a much deeper and more profound understanding of the event flows and alert clusters in your IT infrastructure using our new IT Operations visualization tools.
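
To make the normalize-then-suppress idea concrete, here is a minimal sketch in Python. It is not PagerDuty's implementation and does not use the PagerDuty API: the `Event` fields, the `normalize` and `apply_rules` helpers, and the sample alerts are all hypothetical names invented for illustration, and the field names are not taken from the actual PD-CEF schema. It assumes only what is described above: alerts from different tools are mapped onto one common shape, rules classify them, and matching alerts are flagged as suppressed rather than discarded.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    """Hypothetical normalized event; fields are illustrative, not PD-CEF."""
    source: str
    severity: str
    summary: str
    suppressed: bool = False


def normalize(raw: dict) -> Event:
    """Map a raw alert, whatever keys its tool uses, onto the common shape."""
    return Event(
        source=raw.get("host") or raw.get("source") or "unknown",
        severity=str(raw.get("level") or raw.get("severity") or "info").lower(),
        summary=raw.get("message") or raw.get("summary") or "",
    )


# A rule is a predicate over a normalized event.
Rule = Callable[[Event], bool]


def apply_rules(events: list[Event], rules: list[Rule]) -> list[Event]:
    """Flag events matching any rule as suppressed; never drop them."""
    for event in events:
        event.suppressed = any(rule(event) for rule in rules)
    return events  # every event is retained for later analysis


# Two alerts from differently shaped (made-up) monitoring tools.
raw_alerts = [
    {"host": "db-1", "level": "INFO", "message": "disk usage at 71%"},
    {"source": "web-3", "severity": "critical", "summary": "HTTP 500 rate high"},
]
rules = [lambda e: e.severity == "info"]

events = apply_rules([normalize(a) for a in raw_alerts], rules)
to_notify = [e for e in events if not e.suppressed]
```

In this sketch only the `web-3` event would reach a responder, while the `db-1` event stays in `events` with `suppressed` set, so it remains available if it later turns out to be an early sign of a larger problem.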

The Future
As you will see at PagerDuty Summit, these enhancements to PagerDuty's core offering will go far beyond what you have seen from us thus far. We are intensely focused on providing you with tooling that gives you a deeper understanding of, and specific context for, the issues and incidents that impact your company.

Now, a year in after the acquisition, I'm excited to report that PagerDuty has undergone an evolutionary leap into the future. We have always embraced, and will continue to embrace, lean and agile methodology as per Tim's earlier post; we're focused on learning and empathy as described by Jonny; and we're creating a profound fusion of event management (data) and incident management (people) capabilities. These are heady times here at PagerDuty.

We're now T-1 week away from PagerDuty Summit, where we'll kick off this wild ride and introduce you to all of these new capabilities. If you join us at The Village on Sept 13th, you will get to experience it firsthand. I'm looking forward to seeing you there!

The post NASA's Juno Mission and IT Operations appeared first on PagerDuty.
