Checklist for Cloud Service Operations

I once knew a software company that failed its SaaS transition because it chose to cut a few corners on operations. Being software engineers, they did not really want to spend time on such mundane tasks as security, auditing, and backups. One day they let a disgruntled employee go. He went to an internet cafe, logged into the hosting account with the shared admin credentials, and deleted all customer data.

There were no backups or data replicas to bring the data back, no personal admin accounts or procedures to prevent such an incident, and not even any monitoring to reveal the issue before customers did. That was the end of this SaaS application: it never recovered.

Agile DevOps in the Cloud - Session recording from WSO2Con Asia 2014

Cloud business is more than just putting some code online (and collecting money ;))

Whether you are offering a Software-as-a-Service (SaaS) web application, Platform-as-a-Service (PaaS), or Infrastructure-as-a-Service (IaaS), what you are offering is more than just your code: it is your service.

Even if you do not offer a formal service level agreement (SLA) and your Terms of Service state that you are not liable for anything, your online application or platform is still a service, so your customers expect it to be reliable and secure.

At our recent WSO2Con, Chamith Kumarage delivered an excellent session on how our Cloud DevOps team works. If you are delivering a service online (or considering doing so) – make sure to watch the recording (quick registration required).

Here’s my quick summary of Chamith’s advice:

1. Automate everything: repetitive tasks are not only inefficient, mundane, and time-consuming; when done manually they are also unreliable. Humans tend to do things slightly differently each time, or not do them at all.

2. Tasks are really parts of processes: when you come up with something that needs to be done, ask yourself what the process flow for this task is. For example, a data backup is really part of a process that includes:

  • Scheduled (e.g. at 1 a.m. every day) script which creates a backup,
  • Some sort of monitoring system which verifies that the script ran and the backup got created,
  • Notifications on failures, and procedures to follow when the backup did not succeed,
  • Backup testing: automated and/or regular manual recovery drills (if manual then documented and performed by different team members).
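
The monitoring and notification steps of that backup process could be sketched roughly as follows. This is a minimal illustration, not something shown in the session: the function names, the 26-hour freshness window, the 1 KB minimum size, and the `notify` callback are all assumptions made for the example.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional

MAX_AGE_SECONDS = 26 * 3600  # assumed: daily job plus two hours of slack
MIN_SIZE_BYTES = 1024        # assumed: anything smaller is treated as a failed dump


def evaluate_backup(size: Optional[int], mtime: Optional[float], now: float) -> List[str]:
    """Return the problems found; an empty list means the backup looks healthy."""
    if size is None or mtime is None:
        return ["backup file is missing"]
    problems = []
    if now - mtime > MAX_AGE_SECONDS:
        problems.append("backup is older than expected")
    if size < MIN_SIZE_BYTES:
        problems.append("backup is suspiciously small")
    return problems


def verify_backup(path: Path, notify: Callable[[str], None]) -> bool:
    """Check the file at `path` and pass every problem to `notify`."""
    if path.is_file():
        stat = path.stat()
        problems = evaluate_backup(stat.st_size, stat.st_mtime, time.time())
    else:
        problems = evaluate_backup(None, None, time.time())
    for problem in problems:
        notify(f"{path}: {problem}")
    return not problems
```

A scheduler would call `verify_backup` some time after the backup job, with `notify` wired to whatever paging or chat channel the team uses. A check like this only confirms that a recent file exists; it does not replace the recovery drills in the last bullet.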

3. Design for failure: everything will fail at some point, so make sure that your system can sustain failures. For example, if your system uses multiple virtual machines in the cloud, keep running a “chaos monkey” script that randomly kills instances, along with automated tests that ensure these instance failures do not affect the overall system (by the way, see how Netflix does that).
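
To make the idea concrete, here is a toy simulation of a chaos round. It is not Netflix's tool and it does not call any cloud API: the `Pool` class, its `min_healthy` threshold, and `chaos_round` are hypothetical stand-ins for a real instance group and a real health check.

```python
import random
from typing import Dict, List, Optional


class Pool:
    """In-memory stand-in for a group of cloud instances (hypothetical)."""

    def __init__(self, names: List[str], min_healthy: int) -> None:
        self.state: Dict[str, bool] = {name: True for name in names}
        self.min_healthy = min_healthy

    def healthy(self) -> List[str]:
        """Names of the instances that are still running."""
        return [name for name, up in self.state.items() if up]

    def kill(self, name: str) -> None:
        """Simulate the loss of one instance."""
        self.state[name] = False

    def serves_traffic(self) -> bool:
        """Assumed health check: enough instances remain to carry the load."""
        return len(self.healthy()) >= self.min_healthy


def chaos_round(pool: Pool, rng: random.Random) -> Optional[str]:
    """Kill one randomly chosen healthy instance and return its name."""
    candidates = pool.healthy()
    if not candidates:
        return None
    victim = rng.choice(candidates)
    pool.kill(victim)
    return victim
```

In a real setup, the automated tests would run after every round and assert the equivalent of `pool.serves_traffic()`. A failing assertion then points at a part of the system that cannot yet sustain the loss of an instance.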

4. Self-healing and success verification are critical for all tasks. Any task or operation can fail (see above), so the system should not get “surprised”: it should always automatically validate the results of each action and, if something did not go right, run healing procedures (start new instances, retry, and so on).
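
One possible shape for such a verified operation is sketched below. The helper name `run_verified` and its three callbacks are assumptions for illustration; the point is only that success is checked explicitly rather than inferred from the absence of an error.

```python
from typing import Callable


def run_verified(
    action: Callable[[], None],
    verify: Callable[[], bool],
    heal: Callable[[], None],
    attempts: int = 3,
) -> bool:
    """Run `action`, confirm the result with `verify`, and heal between retries."""
    for attempt in range(1, attempts + 1):
        try:
            action()
            if verify():
                return True
        except Exception:
            # A raised error is treated the same way as a failed verification.
            pass
        if attempt < attempts:
            heal()  # e.g. start a new instance or reset state before retrying
    return False
```

When the function returns `False`, every attempt has failed, which is the moment to escalate to a human rather than keep retrying silently.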

5. Enforce discipline, processes, automation, checklists. Document everything. This will make your processes repeatable and reliable.

“Bus monkey test” (related to the above): if one of your team members gets hit by a bus, all operations should keep working. Everything needs to be documented and tried by other team members. (* This is a thought experiment: do not actually hit your team members with buses :))

6. Monitoring and analytics: the key is not to collect and show tons of data and alerts, but to be able to quickly detect abnormal behavior.
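
As one simple illustration of detecting abnormal behavior, a new reading can be compared with recent history instead of with a fixed limit. The sketch below uses a plain z-score; the function name and the threshold of 3 standard deviations are assumptions, and real systems often need something more robust (seasonality, percentiles, and so on).

```python
from statistics import mean, stdev
from typing import Sequence


def is_abnormal(history: Sequence[float], value: float, threshold: float = 3.0) -> bool:
    """Flag `value` if it lies more than `threshold` standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough data to know what normal looks like
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu  # perfectly flat history: any change is abnormal
    return abs(value - mu) / sigma > threshold
```

With a history of response times such as `[100, 102, 98, 101, 99]`, a reading of 101 passes quietly while a reading of 150 is flagged, so an alert fires only when behavior actually deviates.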

7. Communications: your dashboards should quickly and clearly give you the big picture and the relevant details. Key metrics and system state should be something that everyone sees and understands, and effective drill-downs should make it easy to understand and fix problems.

8. Agile delivery: waterfall processes in the cloud are bad and stressful. The smaller, more frequent, and more automated the changes are, the more mundane they become, which lowers the risks and improves skills and reliability. Cloud and big-bang releases do not go well together.

9. Use standard tools and the native systems of the underlying platform: do not reinvent the wheel. For example, if the platform gives you SQL-as-a-service (Amazon RDS, Azure SQL, and so on), use that rather than your own MySQL running on a virtual machine.

10. Post-mortem analysis is a must. If something did go wrong after all, you need a formal investigation process:

  • What happened?
  • Why and what needs to be done to prevent this in the future?
  • If automated monitoring didn’t catch it, why and what needs to be done to prevent this in the future?
  • If validation and self-healing didn’t catch it, why and what needs to be done to prevent this?

Full session recording and slides are available here.

More Stories By Dmitry Sotnikov

Dmitry Sotnikov is VP of Cloud at WSO2, building the cloud business for this leading middleware provider. Check out the WSO2 Cloud platform at http://CloudPreview.WSO2.com
