Why We Switched to Cassandra By @DanJones914 | @DevOpsSummit [#DevOps]

By Dan Jones

Due to the nature of our business, high availability is extremely important to VictorOps and something we take very seriously. We know our customers rely on our service to be always up so that we can process and deliver their alerts and notifications. One of the key components that is critical to the functioning and availability of any SaaS service is the datastore.

At VictorOps we have historically used MySQL in high availability Percona XtraDB Cluster deployments for operational and analytical uses. While MySQL is a mature and reliable relational database and has performed well, we had planned from early on to move to a more horizontally scalable datastore in order to meet our scalability and high availability requirements (including multi-datacenter failover capabilities).

Last fall we began to evaluate datastore alternatives, both relational and NoSQL, that could help improve scalability. After evaluating these options, we decided that Cassandra was the best choice to help deliver on these extreme high availability and reliability requirements.

Some of Cassandra’s strengths that influenced this decision include:

- High Availability – Cassandra is a distributed database where all nodes are equivalent (i.e. there is no master node, so clients can connect to any available node). Data is replicated to a configurable number of nodes, so that the failure of some number of nodes (depending on the replication factor) will not result in loss of data. From the CAP theorem perspective (Consistency, Availability, Partition tolerance), Cassandra’s design provides tunable consistency at the read/write request level, which allows you to increase availability at the expense of consistency where it makes sense.

- Scalability – Cassandra has been shown to be linearly scalable. Since each node adds processing power as well as data capacity, it is possible to scale incrementally to very large data volumes and high throughputs by simply adding new nodes to the cluster.

- “Self-healing” – Cassandra’s eventually consistent data model and node repair features ensure that the consistency of the cluster will be automatically maintained over time. This also makes it very easy to recover failed nodes, increase or decrease the size of the cluster as needed, and even do in-place version upgrades (in most cases).

- Multi-datacenter replication – Cassandra’s node replication and eventual consistency features are core to the functioning of this distributed system. They were designed in from the outset, have been improved and battle-tested throughout Cassandra’s lifetime, and are now considered highly reliable. They were therefore easily extended to clusters that contain nodes in different geographical locations, and because of the eventual consistency model this includes support for true Active-Active clusters. In fact, Cassandra has a reputation for having the most robust, reliable multi-datacenter replication of any datastore in the industry. This is an important part of our multi-datacenter failover capability at VictorOps and was one of the major factors in the decision to go with Cassandra.

- Large community – Cassandra is an Apache project with a very large, active community including influential companies like Netflix. In addition, DataStax continues to drive development and continual improvement of the Cassandra core as well as operational components (they also provide support subscriptions).
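
To make the availability and tunable consistency trade-off above more concrete, here is a minimal Python sketch of the arithmetic involved. It is not VictorOps code and does not talk to a cluster; the function names are hypothetical. It assumes the standard Cassandra rule that a QUORUM is a majority of replicas, and that a read is guaranteed to see the latest acknowledged write when the read and write replica counts overlap.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a majority (QUORUM) for a given RF."""
    return replication_factor // 2 + 1


def tolerable_failures(replication_factor: int, required_acks: int) -> int:
    """How many replicas can be down while a request needing
    `required_acks` responses can still succeed."""
    return replication_factor - required_acks


def reads_see_latest_write(replication_factor: int,
                           read_acks: int, write_acks: int) -> bool:
    """True when every read replica set must overlap every write replica set,
    i.e. read_acks + write_acks > replication_factor."""
    return read_acks + write_acks > replication_factor


if __name__ == "__main__":
    rf = 3  # illustrative replication factor, not taken from the article
    q = quorum(rf)
    print("QUORUM size:", q)
    print("Nodes that may fail at QUORUM:", tolerable_failures(rf, q))
    print("QUORUM reads + QUORUM writes overlap:",
          reads_see_latest_write(rf, q, q))
    print("ONE reads + ONE writes overlap:",
          reads_see_latest_write(rf, 1, 1))
```

With a replication factor of 3, requiring only one replica per request tolerates two node failures but gives up the overlap guarantee, while QUORUM on both reads and writes keeps the guarantee and still tolerates one failure. That is the kind of per-request choice the bullet on high availability refers to.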

While Cassandra has many advantages, including those described above, it is very different from most other datastores. Cassandra is not a relational database, and while the interface to retrieve data (CQL) is very similar to SQL, the underlying data storage and access model is very different. As a result, the performance and operational characteristics of Cassandra depend heavily on the application data model. It is therefore important to understand how data is accessed and to design the data model so that it performs well on the common queries that the application uses.

One data model on which Cassandra performs particularly well is log-structured (or time series) data. In this type of model, the data represents a series of measurements or events that happen over time, rather than a set of updates to existing data items. Cassandra allows storing these “immutable” events contiguously on disk, ordered by a clustering key (which is often insertion time). It is therefore very efficient to return a set of items based on this clustering key, using sequential rather than random disk I/O.

There are many parts of VictorOps’ data model that naturally map to this log-structured approach. For example, an incident’s lifecycle consists of a set of events that cause the state of the incident to change (e.g. a Critical alert, Creation of an Incident, a Paging escalation, an Acknowledgement, a Recovery, etc.). VictorOps surfaces this in the notion of the main Timeline as well as an Incident Timeline.
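
As a rough illustration of the log-structured idea, the following Python sketch models one partition per incident, with events kept sorted by a clustering key (here, an integer timestamp). It is an in-memory analogy only, not Cassandra and not VictorOps’ actual schema; `TimelineStore`, its methods, and the sample data are all hypothetical. The point it shows is that once rows are stored in clustering order, a time-range query becomes a single contiguous slice.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class TimelineStore:
    """In-memory stand-in for a table partitioned by incident id and
    clustered by event time (hypothetical, for illustration only)."""

    def __init__(self):
        # partition key -> list of (timestamp, event) kept in sorted order
        self._partitions = defaultdict(list)

    def append(self, incident_id, timestamp, event):
        """Record an immutable event; rows stay ordered by timestamp."""
        insort(self._partitions[incident_id], (timestamp, event))

    def slice(self, incident_id, start, end):
        """Return events with start <= timestamp < end as one contiguous run."""
        rows = self._partitions.get(incident_id, [])
        lo = bisect_left(rows, (start,))
        hi = bisect_left(rows, (end,))
        return rows[lo:hi]


if __name__ == "__main__":
    store = TimelineStore()
    # Events may arrive out of order; storage order follows the clustering key.
    store.append("inc-1", 30, "Acknowledgement")
    store.append("inc-1", 10, "Critical alert")
    store.append("inc-1", 20, "Paging escalation")
    store.append("inc-2", 15, "Critical alert")
    for timestamp, event in store.slice("inc-1", 10, 30):
        print(timestamp, event)
```

In a real Cassandra table the equivalent design choice would be to make the incident identifier the partition key and the event time a clustering column, so that the common query (the events of one incident in time order) reads a single ordered partition.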

The choice of a datastore is an important decision that has a major effect on the scalability, reliability, availability, maintainability and extensibility of a SaaS service. While Cassandra requires more awareness of the underlying data access patterns and operational characteristics when designing the system, we feel that the benefits it provides in terms of availability, linear scalability and seamless, reliable multi-datacenter replication are a great fit for our business requirements, and will scale to meet our needs in the future.

The post Why we switched to Cassandra appeared first on VictorOps.
