
@DevOpsSummit Authors: Yeshim Deniz, Zakia Bouachraoui, Liz McMillan, Elizabeth White, Pat Romanski


Advice for the New On-Call Engineer By @VictorOps | @DevOpsSummit [#DevOps]

By Dan Hopkins

There is more to being on-call than just knowing how to type in the latest ChatOps commands, reboot instances and print out Java stack traces. There are life skills that come from being on-call for a while, and fortunately those are lessons that can be taught.

Here at VictorOps we’re currently adding six new engineers to our on-call roster, so I’ve been thinking about the experience of being on-call and how to make the best of it.

The first day you go on-call can be frightening. The most important thing to remember is that you’ve already passed the first test. You have the trust and respect of your teammates and are providing them with a valuable commodity: peace of mind. No one wants to be on-call, so stepping up to the plate and taking shifts helps to improve the lives of everyone on your team.

https://www.flickr.com/photos/zakh/

1.) Make sure you have, and understand, the tools you need to do your job. If you don’t know how to use them while you’re at work, there is no way you’ll remember at 2am. Here’s a starting list; your particular job may call for others:

* VPN
* SSH credentials
* sudo privileges
* RSA key fob
* Credentials to your support portal
* Phone numbers and escalation policies for components of the system that you’re responsible for
* Links to the runbooks or ChatOps commands
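
Part of this list can be checked by a script before your shift starts. Here is a minimal sketch in Python, assuming your tools are command-line programs on your PATH and your SSH key lives in a file. The command names and the key path are illustrative assumptions, not a description of any particular setup.

```python
import shutil
from pathlib import Path

# Hypothetical checklist: swap in the commands and files your own job needs.
REQUIRED_COMMANDS = ["ssh", "sudo"]
REQUIRED_FILES = [Path.home() / ".ssh" / "id_ed25519"]  # assumed key location


def preflight(commands, files, which=shutil.which):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for cmd in commands:
        # shutil.which returns None when the command is not on the PATH.
        if which(cmd) is None:
            problems.append(f"command not found: {cmd}")
    for path in files:
        if not Path(path).is_file():
            problems.append(f"file missing: {path}")
    return problems
```

Calling `preflight(REQUIRED_COMMANDS, REQUIRED_FILES)` during the workday and fixing whatever it reports is cheaper than discovering the gap at 2am. It cannot tell you whether you know how to use the tools, only whether they are there.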

2.) Understand the expectations for being on-call, both implicit and explicit. Hopefully your company has taken the time to document how you’re expected to behave when you’re on-call. It’s always best to have things explicit, but looking through your chat rooms or timeline might give you an indication of the implicit rules that different team members follow. Some questions that these rules, implicit or explicit, should answer:

* “How fast should you be responding to pages?”
* “When should you escalate incidents to more senior team members, other teams or customer support?”
* “How should you handle short periods of time where you need to be away from your computer, such as going out to dinner or a movie?”
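
Once a team has answered the first two questions, the answers can be written down as something a machine can check. A minimal sketch in Python, assuming a single acknowledgement deadline: the five-minute value and the function name are hypothetical placeholders, not a recommendation or any product’s behavior.

```python
from datetime import datetime, timedelta

# Hypothetical policy value; use whatever your team has actually agreed on.
ACK_DEADLINE = timedelta(minutes=5)


def should_escalate(paged_at, acknowledged, now, ack_deadline=ACK_DEADLINE):
    """True when a page has sat unacknowledged for longer than the deadline."""
    return (not acknowledged) and (now - paged_at) > ack_deadline


# Example: a page sent at 02:00 and still unacknowledged at 02:06.
paged = datetime(2015, 1, 1, 2, 0)
late = should_escalate(paged, acknowledged=False, now=paged + timedelta(minutes=6))
```

Writing the rule down this way forces the team to pick a number, which is most of the value; the code itself is trivial.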


3.) Remember to communicate. This is often a tricky one for people in our field, but communicating between teams (both engineering and non-engineering) is one of the key skills of an on-call engineer. In addition to diagnosing and fixing issues, you’re there to keep the rest of your team(s) informed. There is definitely finesse in understanding when an issue needs to be run up the flagpole, so take care to learn from how others on your team communicate.

4.) Manage your life. If you’re not a full-time on-call engineer, you’re going to spend a lot of time balancing your “real duties” with being on-call and, most importantly, with having a life. This is a tricky balance to get good at. If you’re on-call for extended periods (longer than a few days), you’re going to notice a precipitous drop-off in “vigilance.” There are behaviors and a level of focus that you can only sustain for so long while being on-call.


5.) What about sleeping? If you’ll be sleeping during an overnight on-call shift, there is a quick “pre-sleep” checklist that you should learn:

* Your “pager” should be set to “make lots of noise”
* Check your timeline for any warnings that will become incidents overnight (better to catch it early)
* You might save yourself a headache by having your computer at hand (close to your bed) so you don’t have to run through the house in your skivvies

6.) You’re not actually on house arrest. If you still want to have a life while on-call you might, on occasion, leave the house. Consider doing a few of the following:

* Take your laptop and a phone that can tether
* Let your teammates know
* Trade on-call for a couple of hours

Hopefully your first night on-call won’t be the shitstorm you fear and you’ll move on to be an integral part of the on-call team. If you’re looking for other helpful tips, check out our On-Call Firefight Survival Guide. Here’s to making on-call suck less!

The post Advice for the New On-call Engineer appeared first on VictorOps.


