Advice for the New On-Call Engineer By @VictorOps | @DevOpsSummit [#DevOps]

By Dan Hopkins

There is more to being on-call than just knowing how to type in the latest ChatOps commands, reboot EC2 instances and print out Java stack traces. There are life skills that come from being on-call for a while, and fortunately, those lessons can be taught.

Here at VictorOps we’re currently adding six new engineers to our on-call roster, so I’ve been thinking about the experience of being on-call and how to make the best of it.

The first day you go on-call can be frightening. The most important thing to remember is that you’ve already passed the first test. You have the trust and respect of your teammates and are providing them with a valuable commodity: peace of mind. No one wants to be on-call, so stepping up to the plate and taking shifts helps to improve the lives of everyone on your team.

1.) Make sure you have the tools you need to do your job, and that you understand how to use them. If you don’t know how to use them while you’re at work, there is no way you’ll remember at 2am. Here’s a list; your particular job will obviously vary:

* VPN
* SSH credentials
* sudo privileges
* RSA key fob
* Credentials to your support portal
* Phone numbers and escalation policies for components of the system that you’re responsible for
* Links to the runbooks or ChatOps commands

2.) Understand the expectations for being on-call, both implicit and explicit. Hopefully your company has taken the time to document how you’re expected to behave when you’re on-call. It’s always best to have things explicit, but looking through your chat rooms or timeline might give you an indication of the implicit rules that different team members follow. Some questions these rules, implicit or explicit, should answer:

* “How fast should you be responding to pages?”
* “When should you escalate incidents to more senior team members, other teams or customer support?”
* “How should you handle short periods of time where you need to be away from your computer, such as going out to dinner or a movie?”

3.) Remember to communicate. This is often a tricky one for people in our field, but communicating across teams (both engineering and non-engineering) is one of the key skills of an on-call engineer. In addition to fixing or diagnosing issues, you’re expected to keep the rest of your team(s) informed. There is definitely finesse in knowing when an issue needs to be run up the flagpole, so take care to learn from how others on your team communicate.

4.) Manage your life. If you’re not a full-time on-call engineer, you’re going to spend a lot of time balancing your “real duties” with being on-call and, most importantly, with having a life. This is a tricky balance to get good at. If you’re on-call for extended periods (longer than a few days), you’re going to notice a precipitous drop-off in “vigilance.” There are behaviors and a level of focus that you can only sustain for so long while on-call.

5.) What about sleeping? If you’re on-call overnight and will be sleeping during your shift, there is a quick “pre-sleep” checklist you should learn:

* Your “pager” should be set to “make lots of noise”
* Check your timeline for any warnings that could become incidents overnight (better to catch them early)
* You might save yourself a headache by keeping your computer close at hand (near your bed) so you don’t have to run through the house in your skivvies

6.) You’re not actually under house arrest. If you still want to have a life while on-call, you might, on occasion, leave the house. Consider doing a few of the following:

* Take your laptop and a phone that can tether
* Let your teammates know
* Trade on-call for a couple of hours

Hopefully your first night on-call won’t be the shitstorm you fear, and you’ll go on to become an integral part of the on-call team. If you’re looking for more helpful tips, check out our On-Call Firefight Survival Guide. Here’s to making on-call suck less!

The post Advice for the New On-call Engineer appeared first on VictorOps.

VictorOps is making on-call suck less with the only collaborative alert management platform on the market.

With easy on-call scheduling management, a real-time incident timeline that gives you contextual relevance around your alerts and powerful reporting features that make post-mortems more effective, VictorOps helps your IT/DevOps team solve problems faster.