Click here to close now.

Welcome!

DevOps Journal Authors: Kathy Thomas, Pat Romanski, Mike Kavis, Elizabeth White, Carmen Gonzalez

Related Topics: DevOps Journal

DevOps Journal: Blog Feed Post

Checklist for Cloud Service Operations

I knew one software company that failed their SaaS transition because they chose to cut a few corners

I knew one software company that failed their SaaS transition because they chose to cut a few corners with the operations. Since they were software engineers, they did not really want to spend time on such mundane tasks as security, auditing, backups and so on. One day they let a disgruntled employee go, the guy went to an internet cafe, logged into the hosting account with the shared admin credentials, and deleted all customer data.

There were no backups or data replicas to bring the data back, no personal admin accounts or procedures to prevent such an incident from happening, and even no monitoring to learn about the issue before customers did. This was the end of this SaaS application – it just never recovered.

Agile DevOps in the Cloud - Session recording from WSO2Con Asia 2014

Cloud business is more than just putting some code online
(and collecting money ;)) Whether you are offering Software-as-a-Service (SaaS) web application, Platform-as-a-Service (PaaS) or Infrastructure-as-a-Service (IaaS) – what you are offering is more than just your code – it is your service.

Even if you do not offer a formal service level agreement (SLA) and have a statement in your Terms of Service that you are not liable for anything, your online application or platform is still a service so your customers expect it to be reliable and secure.

At our recent WSO2Con, Chamith Kumarage delivered an excellent session on how our Cloud DevOps team works. If you are delivering a service online (or considering doing so) – make sure to watch the recording (quick registration required).

Here’s my quick summary of Chamith’s advice:

1. Automate everything: repetitive tasks not only are inefficient and mundane, and eat your time. When done manually they are unreliable. Humans tend to do things slightly differently each time they do them, or not do them at all.

2. Tasks are really parts of processes: when you come up with something that needs to be done, ask yourself what is the process flow for this task? For example, a data backup is really a part of a process that includes:

  • Scheduled (e.g. at 1 a.m. every day) script which creates a backup,
  • Some sort of monitoring system which verifies that the script ran and the backup got created,
  • Notifications on failures and procedures that need to be followed in not,
  • Backup testing: automated and/or regular manual recovery drills (if manual then documented and performed by different team members).

3. Design for failure: everything will be failing so make sure that your system can sustain the failures. For example, if your system uses multiple virtual machines in the cloud, keep running a “chaos monkey” script which keeps randomly killing the instances and automated tests which ensure that these instance failures do not affect the overall system (by the way, see how Netflix does that.)

4. Self-healing and success verification are critical for all tasks. Any task and operation can fail (see above) so the system should not get “surprised” but should always automatically validate the action results and if something didn’t go right – implement the healing procedures (start new instances, retry, and so on).

5. Enforce discipline, processes, automation, checklists. Document everything. This will make your processes repeatable and reliable.

Bus monkey test” (related to the above) if one of your team members gets hit by a bus – all operations should keep working: everything needs to be documented and tried by other team members. (* This is a mental experiment – do not actually hit your team-members by busses :))

6. Monitoring and analytics: the key is not to collect and show tons of data and alerts, but be able to quickly detect abnormal behavior.

7. Communications: your dashboards should quickly and clearly give you the big picture and relevant details. Key metrics and system state should be something that everyone sees and understands, effective drill-downs should make it easy to understand and fix stuff.

8. Agile delivery: waterfall processes in the cloud are bad and stressful.The smaller the changes and the more often and in more automated fashion they are – the more mundane they become: which lowers the risks and improves the skills and reliability. Cloud and big-bang releases do not go well together.

9. Use standard tools and native systems of underlying platforms – do not reinvent the wheels. For example, if the platform gives you SQL-as-a-service (Amazon RDS, Azure SQL and so on) – use those and not your own MySQL running on a virtual machine.

10. Post-mortem analysis is a must. If something did get wrong after all, you need a formal investigation process:

  • What happened?
  • Why and what needs to be done to prevent this in the future?
  • If automated monitoring didn’t catch it, why and what needs to be done to prevent this in the future?
  • If validation and self-healing didn’t catch it, why and what needs to be done to prevent this?

Full session recording and slides are available here.

Read the original blog entry...

More Stories By Dmitry Sotnikov

Dmitry Sotnikov is VP of Cloud at WSO2, building the cloud business for this leading middleware provider. Check out the WSO2 Cloud platform at http://CloudPreview.WSO2.com

@DevOpsSummit Stories
In comparison to Infrastructure-as-a-Service (IaaS) and Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS) has experienced a slower rate of adoption. However, PaaS is coming into its own as evidenced by increased usage and top technology companies such as HP and IBM supporting open source projects like Cloud Foundry by bringing their own PaaS to market. PaaS is important because apps are an important part of our lives. With this relatively new dependency on web and mobile applications, organizations need to find a way to automate their app deployment and management process just to sta...
Modern Systems announced completion of a successful project with its new Rapid Program Modernization (eavRPMa"c) software. The eavRPMa"c technology architecturally transforms legacy applications, enabling faster feature development and reducing time-to-market for critical software updates. Working with Modern Systems, the University of California at Santa Barbara (UCSB) leveraged eavRPMa"c to transform its Student Information System from Software AG's Natural syntax to a modern application leveraging C# and SQL Server. The newly modernized system enabled UCSB to leverage agile development me...
Hosted PaaS providers have given independent developers and startups huge advantages in efficiency and reduced time-to-market over their more process-bound counterparts in enterprises. Software frameworks are now available that allow enterprise IT departments to provide these same advantages for developers in their own organization. In his workshop session at DevOps Summit, Troy Topnik, ActiveState’s Technical Product Manager, will show how on-prem or cloud-hosted Private PaaS can enable organizations to use this technology on their own infrastructure, and introduce efficient development wor...
In today's digital world, change is the one constant. Disruptive innovations like cloud, mobility, social media, and the Internet of Things have reshaped the market and set new standards in customer expectations. To remain competitive, businesses must tap the potential of emerging technologies and markets through the rapid release of new products and services. However, the rigid and siloed structures of traditional IT platforms and processes are slowing them down – resulting in lengthy delivery cycles and a poor customer experience.
Docker is an excellent platform for organizations interested in running microservices. It offers portability and consistency between development and production environments, quick provisioning times, and a simple way to isolate services. In his session at DevOps Summit at 16th Cloud Expo, Shannon Williams, co-founder of Rancher Labs, will walk through these and other benefits of using Docker to run microservices, and provide an overview of RancherOS, a minimalist distribution of Linux designed expressly to run Docker. He will also discuss Rancher, an orchestration and service discovery platf...
SYS-CON Events announced today that Akana, formerly SOA Software, has been named “Bronze Sponsor” of SYS-CON's 16th International Cloud Expo® New York, which will take place June 9-11, 2015, at the Javits Center in New York City, NY. Akana’s comprehensive suite of API Management, API Security, Integrated SOA Governance, and Cloud Integration solutions helps businesses accelerate digital transformation by securely extending their reach across multiple channels – mobile, cloud and Internet of Things. Akana enables enterprises to share data as APIs, connect and integrate applications, drive part...
WSM International has launched a DevOps services division that offers assessment, consulting and implementation to large enterprises and organizations with complex infrastructures. The concept of DevOps is to blend information technology (IT) software development with operations to optimize the computing infrastructure according to the specific needs of the organization. According to a recent press release from Gartner, "By 2016, DevOps will evolve from a niche strategy employed by large cloud providers to a mainstream strategy employed by 25 percent of Global 2000 organizations."
SYS-CON Events announced today that WSM International (WSM), the world’s leading cloud and server migration services provider, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. WSM is a solutions integrator with a core focus on cloud and server migration, transformation and DevOps services.
What’s inside the cloud? Hard work. Cloud operators know the world inside the datacenter is gritty. Vendor marketing speak and cloudwashing quickly melt in the heat of SLAs, uptime guarantees, and users who want it now. In his session at DevOps Summit, Hernan Alvarez, Chief Product Officer at Blue Box Group, will deliver an unvarnished look inside the world of cloud operators, from the perspective of someone who lives it. Attendees get a front-row look into the toolkits and processes that enable cloud operators to stand up and manage private clouds end to end. Bring your questions and learn h...
SYS-CON Events announced today that Solgenia will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY, and the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. Solgenia is the global market leader in Cloud Collaboration and Cloud Infrastructure software solutions. Designed to “Bridge the Gap” between Personal and Professional Social, Mobile and Cloud user experiences, our solutions help large and medium-sized organizations dr...
SYS-CON Events announced today that Liaison Technologies, a leading provider of data management and integration cloud services and solutions, has been named "Silver Sponsor" of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York, NY. Liaison Technologies is a recognized market leader in providing cloud-enabled data integration and data management solutions to break down complex information barriers, enabling enterprises to make smarter decisions, faster.
Cloud computing is changing the way we look at IT costs, according to industry experts on a recent Cloud Luminary Fireside Chat panel discussion. Enterprise IT, traditionally viewed as a cost center, now plays a central role in the delivery of software-driven goods and services. Therefore, companies need to understand their cloud utilization and resulting costs in order to ensure profitability on their business offerings. Led by Bernard Golden, this fireside chat offers valuable insights on how organizations can get a better handle on their use of cloud computing.
Sematext is a globally distributed organization that builds innovative Cloud and On Premises solutions for performance monitoring, alerting and anomaly detection (SPM), log management and analytics (Logsene), and search analytics (SSA). We also provide Search and Big Data consulting services and offer 24/7 production support for Solr and Elasticsearch.
The speed of software changes in growing and large scale rapid-paced DevOps environments presents a challenge for continuous testing. Many organizations struggle to get this right. Practices that work for small scale continuous testing may not be sufficient as the requirements grow. In his session at DevOps Summit, Marc Hornbeek, Sr. Solutions Architect of DevOps continuous test solutions at Spirent Communications, will explain the best practices of continuous testing at high scale, which is relevant to small scale DevOps, and if there is an expectation of growth as the number of build targe...
Cloud is not a commodity. And no matter what you call it, computing doesn’t come out of the sky. It comes from physical hardware inside brick and mortar facilities connected by hundreds of miles of networking cable. And no two clouds are built the same way. SoftLayer gives you the highest performing cloud infrastructure available. One platform that takes data centers around the world that are full of the widest range of cloud computing options, and then integrates and automates everything. Join SoftLayer on June 9 at 16th Cloud Expo to learn about IBM Cloud's SoftLayer platform, explore se...
DevOps is all the rage these days and with good reason as it promises to reduce the time-to-market for new applications. It also promises to improve change management, allowing teams to deploy changes to their applications quickly and efficiently. However, DevOps isn’t something you buy, install, or implement; rather it is the symptom of an appropriate organizational system. In his session at DevOps Summit, Mark Thiele, EVP, Data Center Technologies at SUPERNAP International, will discuss how to get to the right organizational model that will allow DevOps practices to flourish.
SYS-CON Events announced today Sematext Group, Inc., a Brooklyn-based Performance Monitoring and Log Management solution provider, will exhibit at SYS-CON's DevOps Summit 2015 New York, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Sematext is a globally distributed organization that builds innovative Cloud and On Premises solutions for performance monitoring, alerting and anomaly detection (SPM), log management and analytics (Logsene), search analytics (SSA), and search enhancement. They also provide exceptional Search and Big Data consulting services a...
SYS-CON Media announced today that 9 out of 10 " most read" DevOps articles are published by @DevOpsSummit Blog. Launched in October 2014, @DevOpsSummit Blog offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long development cycles that produce softw...
SYS-CON Events announced today Arista Networks will exhibit at SYS-CON's DevOps Summit 2015 New York, which will take place June 9-11, 2015, at the Javits Center in New York City, NY. Arista Networks was founded to deliver software-driven cloud networking solutions for large data center and computing environments. Arista’s award-winning 10/40/100GbE switches redefine scalability, robustness, and price-performance, with over 3,000 customers and more than three million cloud networking ports deployed worldwide.
SYS-CON Events announced today that the DevOps Institute has been named “Association Sponsor” of SYS-CON's DevOps Summit, which will take place on June 9–11, 2015, at the Javits Center in New York City, NY. The DevOps Institute provides enterprise level training and certification. Working with thought leaders from the DevOps community, the IT Service Management field and the IT training market, the DevOps Institute is setting the standard in quality for DevOps education and training.
The world's leading Cloud event, Cloud Expo has launched Microservices Journal on the SYS-CON.com portal, featuring over 19,000 original articles, news stories, features, and blog entries. DevOps Journal is focused on this critical enterprise IT topic in the world of cloud computing. Microservices Journal offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. Follow new article posts on Twitter at @MicroservicesE
SYS-CON Events announced today that Site24x7, the cloud infrastructure monitoring service, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Site24x7 is a cloud infrastructure monitoring service that helps monitor the uptime and performance of websites, online applications, servers, mobile websites and custom APIs. The monitoring is done from 50+ locations across the world and from various wireless carriers, thus providing a global perspective of the end-user experience. Site24x7 supports monitoring H...
Chef announced that James Casey has been appointed Vice President of Engineering. Casey has more than a decade of experience managing engineering and operations for CERN and is a three-year Chef veteran. Casey brings deep expertise in DevOps practices, as well as an innate understanding of the needs of Chef customers and the community. Casey will oversee the quality and cadence of product development for Chef's engineering and operations teams, and will report to Chef CEO Barry Crist.
SYS-CON Events announced today that SafeLogic has been named “Bag Sponsor” of SYS-CON's 16th International Cloud Expo® New York, which will take place June 9-11, 2015, at the Javits Center in New York City, NY. SafeLogic provides security products for applications in mobile and server/appliance environments. SafeLogic’s flagship product CryptoComply is a FIPS 140-2 validated cryptographic engine designed to secure data on servers, workstations, appliances, mobile devices, and in the Cloud.
DevOps tasked with driving success in the cloud need a solution to efficiently leverage multiple clouds while avoiding cloud lock-in. Flexiant today announces the commercial availability of Flexiant Concerto. With Flexiant Concerto, DevOps have cloud freedom to automate the build, deployment and operations of applications consistently across multiple clouds. Concerto is available through four disruptive pricing models aimed to deliver multi-cloud at a price point everyone can afford.