Why We Switched to Cassandra

By Dan Jones

Due to the nature of our business, high availability is extremely important to VictorOps and something we take very seriously. We know our customers rely on our service to be always up so that we can process and deliver their alerts and notifications. One of the key components that is critical to the functioning and availability of any SaaS service is the datastore.

At VictorOps we have historically used MySQL in high-availability Percona XtraDB Cluster deployments for operational and analytical workloads. While MySQL is a mature, reliable relational database that has performed well, we planned from early on to move to a more horizontally scalable datastore to meet our scalability and high availability requirements (including multi-datacenter failover).

Last fall we began evaluating alternative datastores, both relational and NoSQL, that could improve scalability. After evaluating these options, we decided that Cassandra was the best choice to help deliver on these extreme high availability and reliability requirements.

Some of Cassandra’s strengths that influenced this decision include:

- High Availability – Cassandra is a distributed database in which all nodes are equivalent (i.e., there is no master node, so clients can connect to any available node). Data is replicated to a configurable number of nodes, so the failure of some nodes (how many depends on the replication factor) does not result in data loss. In terms of the CAP theorem (Consistency, Availability, Partition tolerance), Cassandra provides tunable consistency at the level of individual read and write requests, which lets you trade consistency for availability where that makes sense.

- Scalability – Cassandra has been shown to scale linearly. Each node adds processing power as well as data capacity, so a cluster can grow incrementally to very large data volumes and high throughput simply by adding nodes.

- “Self-healing” – Cassandra’s eventually consistent data model and node repair features ensure that the consistency of the cluster is maintained automatically over time. This also makes it very easy to recover failed nodes, grow or shrink the cluster as needed, and even perform in-place version upgrades (in most cases).

- Multi-datacenter replication – Node replication and eventual consistency are core to how Cassandra functions as a distributed system. They were designed in from the outset, have been improved and battle-tested throughout the project’s lifetime, and are now considered highly reliable. As a result, they extended easily to clusters whose nodes sit in different geographical locations, and the eventual consistency model makes true Active-Active clusters possible. In fact, Cassandra has the reputation of having the most robust, reliable multi-datacenter replication of any datastore in the industry. This is an important part of our multi-datacenter failover capability at VictorOps and was one of the major factors in the decision to go with Cassandra.

- Large community – Cassandra is an Apache project with a very large, active community that includes influential companies like Netflix. In addition, DataStax continues to drive development of the Cassandra core and its operational components, and also offers support subscriptions.
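
The tunable-consistency trade-off in the first bullet can be made concrete with a little arithmetic. The sketch below is illustrative Python, not Cassandra client code, and the function names are hypothetical. It relies on two standard rules: a quorum is floor(RF / 2) + 1 replicas, and a read is guaranteed to overlap the latest acknowledged write when the number of read replicas plus write replicas exceeds the replication factor (RF).

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int,
                        write_replicas: int,
                        read_replicas: int) -> bool:
    """True when any read replica set must share a node with any write replica set."""
    return write_replicas + read_replicas > replication_factor


def tolerated_failures(replication_factor: int, required_replicas: int) -> int:
    """Replicas that can be down while a request needing `required_replicas` still succeeds."""
    return replication_factor - required_replicas


if __name__ == "__main__":
    rf = 3  # assumed replication factor, chosen only for illustration
    q = quorum(rf)
    print("quorum size:", q)
    print("QUORUM writes + QUORUM reads overlap:", read_overlaps_write(rf, q, q))
    print("ONE writes + ONE reads overlap:", read_overlaps_write(rf, 1, 1))
    print("failures tolerated at QUORUM:", tolerated_failures(rf, q))
    print("failures tolerated at ONE:", tolerated_failures(rf, 1))
```

Under these assumptions, quorum reads and writes at a replication factor of 3 stay consistent while tolerating one failed replica. Dropping both to a single replica tolerates two failures but no longer guarantees that a read sees the latest write, which is the availability-for-consistency trade described above.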

While Cassandra has many advantages, including those described above, it is very different from most other datastores. Cassandra is not a relational database, and while its query interface (CQL) is very similar to SQL, the underlying data storage and access model is not. As a result, Cassandra’s performance and operational characteristics depend heavily on the application’s data model. It is therefore important to understand how the data will be accessed and to design the data model so that it performs well on the application’s common queries.

One data model on which Cassandra performs particularly well is log-structured (or time-series) data. In this type of model, the data represents a series of measurements or events that happen over time, rather than a set of updates to existing data items. Within a partition, Cassandra stores these “immutable” events contiguously on disk, ordered by a clustering key (often the insertion time). Returning a range of items by this clustering key is therefore very efficient, because it uses sequential rather than random disk I/O.
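
To illustrate why clustering order makes time-range reads cheap, here is a small in-memory sketch in Python. It is a toy model, not Cassandra’s storage engine: the `Partition` class and its methods are hypothetical, and a pair of sorted lists stands in for rows laid out contiguously on disk. A time-range query becomes two binary searches followed by one contiguous slice.

```python
import bisect


class Partition:
    """Toy stand-in for one partition whose rows are kept sorted by clustering key."""

    def __init__(self):
        self.keys = []  # clustering keys (timestamps), always kept sorted
        self.rows = []  # payloads, stored in the same order as self.keys

    def insert(self, timestamp, payload):
        # Find the slot that keeps the keys sorted; equal timestamps keep arrival order.
        i = bisect.bisect_right(self.keys, timestamp)
        self.keys.insert(i, timestamp)
        self.rows.insert(i, payload)

    def slice(self, start, end):
        """Return (timestamp, payload) pairs with start <= timestamp < end."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, end)
        # One contiguous run of rows: no per-row lookups are needed.
        return list(zip(self.keys[lo:hi], self.rows[lo:hi]))


if __name__ == "__main__":
    events = Partition()
    # Events may arrive out of order; the partition keeps them ordered by time.
    events.insert(300, "recovery")
    events.insert(100, "critical alert")
    events.insert(200, "acknowledgement")
    print(events.slice(100, 300))
```

The design point is that the read path never has to hunt for individual rows: once the start of the range is located, everything up to the end of the range is adjacent.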

Many parts of VictorOps’ data model map naturally to this log-structured approach. For example, an incident’s lifecycle consists of a series of events that change the incident’s state (e.g., a Critical alert, Creation of an Incident, a Paging escalation, an Acknowledgement, a Recovery, etc.). VictorOps surfaces this in the notion of the main Timeline as well as an Incident Timeline.
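
As a rough sketch of this event-log view, the Python below derives an incident’s current state by replaying its events in time order. The event kinds mirror the examples in the text, but the state names, the `Event` class, and the transition table are assumptions made for illustration; they are not the actual VictorOps schema.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    """One immutable entry in an incident's event log (hypothetical shape)."""
    timestamp: int  # e.g. seconds since the epoch
    kind: str


# Hypothetical mapping from event kind to the state the incident is in afterwards.
STATE_AFTER = {
    "critical_alert": "alerting",
    "incident_created": "triggered",
    "paging_escalation": "escalated",
    "acknowledgement": "acknowledged",
    "recovery": "resolved",
}


def current_state(events: Iterable[Event]) -> str:
    """Replay events in time order; the last recognized event decides the state."""
    state = "none"
    for event in sorted(events, key=lambda e: e.timestamp):
        state = STATE_AFTER.get(event.kind, state)
    return state


def timeline(events: Iterable[Event]) -> List[str]:
    """Render the events as an ordered, human-readable timeline."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    return [f"{e.timestamp}: {e.kind}" for e in ordered]


if __name__ == "__main__":
    log = [
        Event(3, "acknowledgement"),
        Event(1, "critical_alert"),
        Event(2, "incident_created"),
    ]
    print(timeline(log))
    print(current_state(log))
```

Because events are only ever appended and never updated, the state is always a function of the log, which is what makes this shape of data a natural fit for a store ordered by a time-based clustering key.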

The choice of a datastore is an important decision with a major effect on the scalability, reliability, availability, maintainability and extensibility of a SaaS service. While Cassandra requires more awareness of the underlying data access patterns and operational characteristics when designing the system, we feel that the benefits it provides in availability, linear scalability, and seamless, reliable multi-datacenter replication are a great fit for our business requirements and will scale to meet our future needs.

The post Why we switched to Cassandra appeared first on VictorOps.
