Welcome!

@DevOpsSummit Authors: Elizabeth White, Yeshim Deniz, Pat Romanski, Aruna Ravichandran, Liz McMillan

Related Topics: @DevOpsSummit

@DevOpsSummit: Blog Feed Post

Checklist for Cloud Service Operations

I knew one software company that failed their SaaS transition because they chose to cut a few corners

I knew one software company that failed their SaaS transition because they chose to cut a few corners with the operations. Since they were software engineers, they did not really want to spend time on such mundane tasks as security, auditing, backups and so on. One day they let a disgruntled employee go, the guy went to an internet cafe, logged into the hosting account with the shared admin credentials, and deleted all customer data.

There were no backups or data replicas to bring the data back, no personal admin accounts or procedures to prevent such an incident from happening, and even no monitoring to learn about the issue before customers did. This was the end of this SaaS application – it just never recovered.

Agile DevOps in the Cloud - Session recording from WSO2Con Asia 2014

Cloud business is more than just putting some code online
(and collecting money ;)) Whether you are offering Software-as-a-Service (SaaS) web application, Platform-as-a-Service (PaaS) or Infrastructure-as-a-Service (IaaS) – what you are offering is more than just your code – it is your service.

Even if you do not offer a formal service level agreement (SLA) and have a statement in your Terms of Service that you are not liable for anything, your online application or platform is still a service so your customers expect it to be reliable and secure.

At our recent WSO2Con, Chamith Kumarage delivered an excellent session on how our Cloud DevOps team works. If you are delivering a service online (or considering doing so) – make sure to watch the recording (quick registration required).

Here’s my quick summary of Chamith’s advice:

1. Automate everything: repetitive tasks not only are inefficient and mundane, and eat your time. When done manually they are unreliable. Humans tend to do things slightly differently each time they do them, or not do them at all.

2. Tasks are really parts of processes: when you come up with something that needs to be done, ask yourself what is the process flow for this task? For example, a data backup is really a part of a process that includes:

  • Scheduled (e.g. at 1 a.m. every day) script which creates a backup,
  • Some sort of monitoring system which verifies that the script ran and the backup got created,
  • Notifications on failures and procedures that need to be followed in not,
  • Backup testing: automated and/or regular manual recovery drills (if manual then documented and performed by different team members).

3. Design for failure: everything will be failing so make sure that your system can sustain the failures. For example, if your system uses multiple virtual machines in the cloud, keep running a “chaos monkey” script which keeps randomly killing the instances and automated tests which ensure that these instance failures do not affect the overall system (by the way, see how Netflix does that.)

4. Self-healing and success verification are critical for all tasks. Any task and operation can fail (see above) so the system should not get “surprised” but should always automatically validate the action results and if something didn’t go right – implement the healing procedures (start new instances, retry, and so on).

5. Enforce discipline, processes, automation, checklists. Document everything. This will make your processes repeatable and reliable.

Bus monkey test” (related to the above) if one of your team members gets hit by a bus – all operations should keep working: everything needs to be documented and tried by other team members. (* This is a mental experiment – do not actually hit your team-members by busses :))

6. Monitoring and analytics: the key is not to collect and show tons of data and alerts, but be able to quickly detect abnormal behavior.

7. Communications: your dashboards should quickly and clearly give you the big picture and relevant details. Key metrics and system state should be something that everyone sees and understands, effective drill-downs should make it easy to understand and fix stuff.

8. Agile delivery: waterfall processes in the cloud are bad and stressful.The smaller the changes and the more often and in more automated fashion they are – the more mundane they become: which lowers the risks and improves the skills and reliability. Cloud and big-bang releases do not go well together.

9. Use standard tools and native systems of underlying platforms – do not reinvent the wheels. For example, if the platform gives you SQL-as-a-service (Amazon RDS, Azure SQL and so on) – use those and not your own MySQL running on a virtual machine.

10. Post-mortem analysis is a must. If something did get wrong after all, you need a formal investigation process:

  • What happened?
  • Why and what needs to be done to prevent this in the future?
  • If automated monitoring didn’t catch it, why and what needs to be done to prevent this in the future?
  • If validation and self-healing didn’t catch it, why and what needs to be done to prevent this?

Full session recording and slides are available here.

Read the original blog entry...

More Stories By Dmitry Sotnikov

Dmitry Sotnikov is VP of Cloud at WSO2, building the cloud business for this leading middleware provider. Check out the WSO2 Cloud platform at http://CloudPreview.WSO2.com

@DevOpsSummit Stories
SYS-CON Events announced today that Yuasa System will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Yuasa System is introducing a multi-purpose endurance testing system for flexible displays, OLED devices, flexible substrates, flat cables, and films in smartphones, wearables, automobiles, and healthcare.
DevOps is under attack because developers don’t want to mess with infrastructure. They will happily own their code into production, but want to use platforms instead of raw automation. That’s changing the landscape that we understand as DevOps with both architecture concepts (CloudNative) and process redefinition (SRE). Rob Hirschfeld’s recent work in Kubernetes operations has led to the conclusion that containers and related platforms have changed the way we should be thinking about DevOps and controlling infrastructure. The rise of Site Reliability Engineering (SRE) is part of that redefinition of operations vs development roles in organizations.
SYS-CON Events announced today that Dasher Technologies will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Dasher Technologies, Inc. ® is a premier IT solution provider that delivers expert technical resources along with trusted account executives to architect and deliver complete IT solutions and services to help our clients execute their goals, plans and objectives. Since 1999, we've helped public, private and nonprofit organizations implement technology solutions that speed and simplify their operations. As one of the fastest growing IT solution providers in the country, we have gained a reputation for effortless implementations with relentless follow-through and enduring support.
SYS-CON Events announced today that Massive Networks, that helps your business operate seamlessly with fast, reliable, and secure internet and network solutions, has been named "Exhibitor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. As a premier telecommunications provider, Massive Networks is headquartered out of Louisville, Colorado. With years of experience under their belt, their team of engineers can navigate the Carrier Ecosystem for your IT team acting as an extension of your business, producing a hassle-free experience.
We all know that end users experience the Internet primarily with mobile devices. From an app development perspective, we know that successfully responding to the needs of mobile customers depends on rapid DevOps – failing fast, in short, until the right solution evolves in your customers' relationship to your business. Whether you’re decomposing an SOA monolith, or developing a new application cloud natively, it’s not a question of using microservices – not doing so will be a path to eventual business failure.
SYS-CON Events announced today that TidalScale, a leading provider of systems and services, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. TidalScale has been involved in shaping the computing landscape. They've designed, developed and deployed some of the most important and successful systems and services in the history of the computing industry - internet, Ethernet, operating systems, programming languages and microprocessors. Their elite team has collectively earned dozens of patents, three film credits and grown record setting businesses. And collectively, they've shipped more than 2 billion licensed products. They are difference makers who have a reputation for delivering innovative products and accomplishing what many others don't believe is even possible. They are ...
SYS-CON Events announced today that Taica will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Taica manufacturers Alpha-GEL brand silicone components and materials, which maintain outstanding performance over a wide temperature range -40C to +200C. For more information, visit http://www.taica.co.jp/english/.
SYS-CON Events announced today that MIRAI Inc. will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. MIRAI Inc. are IT consultants from the public sector whose mission is to solve social issues by technology and innovation and to create a meaningful future for people.
SYS-CON Events announced today that IBM has been named “Diamond Sponsor” of SYS-CON's 21st Cloud Expo, which will take place on October 31 through November 2nd 2017 at the Santa Clara Convention Center in Santa Clara, California.
SYS-CON Events announced today that TidalScale will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. TidalScale is the leading provider of Software-Defined Servers that bring flexibility to modern data centers by right-sizing servers on the fly to fit any data set or workload. TidalScale’s award-winning inverse hypervisor technology combines multiple commodity servers (including their associated CPUs, memory storage and network) into one or more large servers capable of handling the biggest Big Data problems and most unpredictable workloads.
Join IBM November 1 at 21st Cloud Expo at the Santa Clara Convention Center in Santa Clara, CA, and learn how IBM Watson can bring cognitive services and AI to intelligent, unmanned systems. Cognitive analysis impacts today’s systems with unparalleled ability that were previously available only to manned, back-end operations. Thanks to cloud processing, IBM Watson can bring cognitive services and AI to intelligent, unmanned systems. Imagine a robot vacuum that becomes your personal assistant that knows everything and can respond to your emotions and verbal commands!
Infoblox delivers Actionable Network Intelligence to enterprise, government, and service provider customers around the world. They are the industry leader in DNS, DHCP, and IP address management, the category known as DDI. We empower thousands of organizations to control and secure their networks from the core-enabling them to increase efficiency and visibility, improve customer service, and meet compliance requirements.
In his session at 21st Cloud Expo, Michael Burley, a Senior Business Development Executive in IT Services at NetApp, will describe how NetApp designed a three-year program of work to migrate 25PB of a major telco's enterprise data to a new STaaS platform, and then secured a long-term contract to manage and operate the platform. This significant program blended the best of NetApp’s solutions and services capabilities to enable this telco’s successful adoption of private cloud storage and launching of virtual storage services to its enterprise market.
DevOps is often described as a combination of technology and culture. Without both, DevOps isn't complete. However, applying the culture to outdated technology is a recipe for disaster; as response times grow and connections between teams are delayed by technology, the culture will die. A Nutanix Enterprise Cloud has many benefits that provide the needed base for a true DevOps paradigm. In their Day 3 Keynote at 20th Cloud Expo, Chris Brown, a Solutions Marketing Manager at Nutanix, and Mark Lavi, a Nutanix DevOps Solution Architect, explored the ways that Nutanix technologies empower teams to react faster than ever before and connect teams in ways that were either too complex or simply impossible with traditional infrastructures.
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend 21st Cloud Expo October 31 - November 2, 2017, at the Santa Clara Convention Center, CA, and June 12-14, 2018, at the Javits Center in New York City, NY, and learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
Cloud Expo, Inc. has announced today that Andi Mann and Aruna Ravichandran have been named Co-Chairs of @DevOpsSummit at Cloud Expo Silicon Valley which will take place Oct. 31-Nov. 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. "DevOps is at the intersection of technology and business-optimizing tools, organizations and processes to bring measurable improvements in productivity and profitability," said Aruna Ravichandran, vice president, DevOps product and solutions marketing, CA Technologies. "It's this results-driven combination of technology and business that makes me so passionate about DevOps and its future in the industry. I am truly honored to take on this co-chair role, and look forward to working with the DevOps Summit team at Cloud Expo and attendees to advance DevOps."
SYS-CON Events announced today that mruby Forum will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. mruby is the lightweight implementation of the Ruby language. We introduce mruby and the mruby IoT framework that enhances development productivity. For more information, visit http://forum.mruby.org/.
The dynamic nature of the cloud means that change is a constant when it comes to modern cloud-based infrastructure. Delivering modern applications to end users, therefore, is a constantly shifting challenge. Delivery automation helps IT Ops teams ensure that apps are providing an optimal end user experience over hybrid-cloud and multi-cloud environments, no matter what the current state of the infrastructure is. To employ a delivery automation strategy that reflects your business rules, making real-time decisions based on a combination of real user monitoring, synthetic testing, APM, NGINX / local load balancers, and other data sources, is critical.
Digital transformation is changing the face of business. The IDC predicts that enterprises will commit to a massive new scale of digital transformation, to stake out leadership positions in the "digital transformation economy." Accordingly, attendees at the upcoming Cloud Expo | @ThingsExpo at the Santa Clara Convention Center in Santa Clara, CA, Oct 31-Nov 2, will find fresh new content in a new track called Enterprise Cloud & Digital Transformation.
Most technology leaders, contemporary and from the hardware era, are reshaping their businesses to do software. They hope to capture value from emerging technologies such as IoT, SDN, and AI. Ultimately, irrespective of the vertical, it is about deriving value from independent software applications participating in an ecosystem as one comprehensive solution. In his session at @ThingsExpo, Kausik Sridhar, founder and CTO of Pulzze Systems, will discuss how given the magnitude of today's application ecosystem, tweaking existing software to stitch various components together leads to sub-optimal solutions. This definitely deserves a re-think, and paves the way for a new breed of lightweight application servers that are micro-services and DevOps ready!
SYS-CON Events announced today that NetApp has been named “Bronze Sponsor” of SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. NetApp is the data authority for hybrid cloud. NetApp provides a full range of hybrid cloud data services that simplify management of applications and data across cloud and on-premises environments to accelerate digital transformation. Together with their partners, NetApp empowers global organizations to unleash the full potential of their data to expand customer touchpoints, foster greater innovation and optimize their operations.
In a recent survey, Sumo Logic surveyed 1,500 customers who employ cloud services such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). According to the survey, a quarter of the respondents have already deployed Docker containers and nearly as many (23 percent) are employing the AWS Lambda serverless computing framework. It’s clear: serverless is here to stay. The adoption does come with some needed changes, within both application development and operations. That means serverless is also changing the way we leverage public clouds. Truth-be-told, many enterprise IT shops were so happy to get out of the management of physical servers within a data center that many limitations of the existing public IaaS clouds were forgiven. However, now that we’ve lived a few years with public IaaS clouds, developers and CloudOps pros are giving a huge thumbs down to the ...
Enterprises are adopting Kubernetes to accelerate the development and the delivery of cloud-native applications. However, sharing a Kubernetes cluster between members of the same team can be challenging. And, sharing clusters across multiple teams is even harder. Kubernetes offers several constructs to help implement segmentation and isolation. However, these primitives can be complex to understand and apply. As a result, it’s becoming common for enterprises to end up with several clusters. This leads to a waste of cloud resources and increased operational overhead.
SYS-CON Events announced today that Avere Systems, a leading provider of enterprise storage for the hybrid cloud, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Avere delivers a more modern architectural approach to storage that doesn't require the overprovisioning of storage capacity to achieve performance, overspending on expensive storage media for inactive data or the overbuilding of data centers to house increasing amounts of storage infrastructure.
Containers are rapidly finding their way into enterprise data centers, but change is difficult. How do enterprises transform their architecture with technologies like containers without losing the reliable components of their current solutions? In his session at @DevOpsSummit at 21st Cloud Expo, Tony Campbell, Director, Educational Services at CoreOS, will explore the challenges organizations are facing today as they move to containers and go over how Kubernetes applications can deploy with legacy components, and also go over automated capabilities provided by operators to auto-update Kubernetes with zero downtime for current and secure deployments.