Cache Tagging with Redis By @Stackify | @DevOpsSummit [#DevOps]

Implementing Cache Tagging with Redis on Azure-Based Applications

Redis has quickly become one of the top open source key-value stores around. We at Stackify had been using Windows Azure's Managed Cache but had a long history of intermittent problems with it. So we decided it was time to bite the bullet and give Redis a try, which Azure now supports and actually recommends over its previous Managed Cache. We had one big problem though: several important parts of our system relied on cache tags, and Redis has no out-of-the-box support for tags.

Summary (TL;DR)

We use a Lua script to create a Redis "SET" for each tag that contains all the keys associated with that tag. We have some logic to ensure the expiration of the tags corresponds properly with the keys, and we have some cleanup code to eliminate old tag keys that are no longer needed. See the sample project on GitHub, which is based on ServiceStack and C#.

How we use tags

We use tags for a few different purposes, but the most important is to track which users are currently logged in to our system and what they are currently looking at in the UI. Stackify collects monitoring metrics data as it flows into our system, and we use SignalR to update the UI in real time to show the latest monitoring data and alert status. It would be a huge waste of resources to send every result over SignalR to the UI if no one was there to see it.

For every logged-in user we store what they are viewing in cache, sort of like this:

Cache key: Client1-User1-UniquePageID
Cache value: true  (not really used)

Related Tags:

  • ServerID1
  • ServerID2
  • ServerID3

When results come in, we look for all cache keys that have a specific tag, "ServerID1" for example. The cache keys that carry that tag tell us which users are logged in and whether we should send SignalR data to the UI.

How we implemented tagging

We chose to implement tagging using Redis' SET datatype. Redis SETs are unordered and are true sets, meaning duplicate values are not allowed. Their rough analog in the .NET Framework is HashSet<T>. To achieve tagging functionality in Redis, we represent each tag as a SET whose values are the keys of the cache entries that have been associated with that tag. From our example, it looks something like this:

Key: tag:ServerID1

Value: Redis "SET"

  • Client1-User1-UniquePageID
  • Client1-User2-UniquePageID
  • Client1-User3-UniquePageID

The process of using the tag is then to get all members of the tag's SET and take action (MGET, DEL, EXISTS, etc.) on each member accordingly. In order to keep our operations atomic, we've taken advantage of Redis' built-in scripting capabilities. Redis guarantees that execution of Lua scripts will happen atomically (just as any other single command does), so we don't have to worry about leaving our key-tag associations in an indeterminate state.

Once our tagging Lua script is implemented in our cache library, adding cache items with tags is simple. Below is an example of the C# call to PUT a key-value pair with the tags "red" and "blue" associated with it.
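In rough form, it looks something like the snippet below. SetWithTags is an illustrative stand-in for the tag-aware helper; the sample project performs the same steps inside the Lua script shown in the next section so that they run atomically.

    using System;
    using ServiceStack.Redis;

    public static class RedisTaggingExtensions
    {
        // Illustrative helper (not the sample project's exact code): store the value
        // with a TTL, then add the key to each tag's SET and keep the tag alive at
        // least as long as the key. The real implementation does this in a Lua script.
        public static void SetWithTags(this IRedisClient redis, string key, string value,
                                       TimeSpan expireIn, params string[] tags)
        {
            redis.Set(key, value, expireIn);
            foreach (var tag in tags)
            {
                var tagKey = "tag:" + tag;              // tags are prefixed so they are easy to identify
                redis.AddItemToSet(tagKey, key);        // the tag SET holds the keys associated with it
                redis.ExpireEntryIn(tagKey, expireIn);  // refresh the tag's TTL
            }
        }
    }

    class Program
    {
        static void Main()
        {
            using (var redis = new RedisClient("localhost", 6379))
            {
                // PUT a key-value pair with the tags "red" and "blue" associated with it
                redis.SetWithTags("Client1-User1-UniquePageID", "true",
                                  TimeSpan.FromMinutes(20), "red", "blue");
            }
        }
    }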

Note: We are utilizing the ServiceStack library for connecting to Redis, and our sample project on GitHub also uses ServiceStack with some additional methods we have built around tags. The Lua scripts in our sample project could be used from any programming language.

Next is the Lua script to add the tags. Need to learn more about Redis and Lua? Check out this nice beginner's guide: Lua: A Guide for Redis Users.
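In outline, the script takes a shape like the sketch below (the argument layout here is illustrative; the exact script is in the sample project):

    -- KEYS[1]  = cache key to set
    -- ARGV[1]  = value to store
    -- ARGV[2]  = TTL in seconds
    -- ARGV[3+] = tag names (stored under the "tag:" prefix)
    local ttl = tonumber(ARGV[2])
    redis.call('SET', KEYS[1], ARGV[1], 'EX', ttl)
    for i = 3, #ARGV do
        local tagKey = 'tag:' .. ARGV[i]
        redis.call('SADD', tagKey, KEYS[1])
        -- extend the tag's TTL so it lives at least as long as this key
        -- (a brand-new SET reports a TTL of -1, so it always gets one here)
        if redis.call('TTL', tagKey) < ttl then
            redis.call('EXPIRE', tagKey, ttl)
        end
    end
    return 1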

Sample of getting related keys by tag from our sample project:
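The call below is a sketch of how it is used (the exact signature in the sample project may differ); the extension method itself is shown right after.

    using ServiceStack.Redis;

    public class MonitoringPusher
    {
        // Illustrative: decide which logged-in viewers should get a SignalR update
        public void PushUpdatesForServer(IRedisClient redis, string serverId)
        {
            var keys = redis.GetKeyByAnyTag(serverId);   // e.g. "ServerID1"
            foreach (var key in keys)
            {
                // key looks like "Client1-User1-UniquePageID"; push the latest
                // monitoring data to that viewer over SignalR
            }
        }
    }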

GetKeyByAnyTag is an extension method we added to ServiceStack that does the following. You could extend this to return not only the keys but also the objects themselves from cache. As part of that, you may want to ensure that those keys still exist and their TTLs have not expired. We have done several variations in our own libraries, but we opted to keep this sample project simple.
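A minimal version of that extension method might look like the following (the sample project's implementation may differ); GetUnionFromSets issues a Redis SUNION across the tag SETs.

    using System.Collections.Generic;
    using System.Linq;
    using ServiceStack.Redis;

    public static class RedisTagQueryExtensions
    {
        // Return every cache key that carries at least one of the given tags
        public static HashSet<string> GetKeyByAnyTag(this IRedisClient redis, params string[] tags)
        {
            // tags live under the "tag:" prefix (see "Tag Naming Consistency" below)
            var tagKeys = tags.Select(t => "tag:" + t).ToArray();

            // SUNION of the tag SETs gives the distinct cache keys across all of them
            return redis.GetUnionFromSets(tagKeys);
        }
    }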

Cleaning up the tag lists

One thing we do have to worry about in this scenario is tag-set growth. While Redis will happily persist a cache entry that never expires, because we use it purely as a cache, almost everything we persist to Redis is given an explicit TTL. This presents a problem for our chosen tagging strategy, because Redis only allows the expiration of entire entries - expiration cannot be set on a single value within any of Redis' collection data types. This means our tag SETs will eventually contain values that point to keys that have expired. For very active tags with an unbounded set of associated keys, this can result in a large amount of invalid data (up to the point of consuming all allocated memory) and time wasted attempting to operate on cache entries that don't exist. It's a gotcha type of problem without a highly elegant solution - you just have to choose the least-ugly option and make it work. For us, that turned out to be a tag-cleanup Lua script that is triggered probabilistically by other tag-related cache operations.
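In outline, the cleanup does something like the sketch below; the real script may differ in detail.

    -- Walk every tag SET, drop members whose cache keys have expired, and delete
    -- any tag SET that ends up empty.
    local tagKeys = redis.call('KEYS', 'tag:*')
    for _, tagKey in ipairs(tagKeys) do
        local members = redis.call('SMEMBERS', tagKey)
        for _, cacheKey in ipairs(members) do
            if redis.call('EXISTS', cacheKey) == 0 then
                redis.call('SREM', tagKey, cacheKey)  -- the key this member pointed to has expired
            end
        end
        if redis.call('SCARD', tagKey) == 0 then
            redis.call('DEL', tagKey)                 -- nothing references this tag any more
        end
    end
    return #tagKeys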

The cleanup is accomplished by iterating through every key in Redis that is a tag and evaluating whether it is still used. You will see advice elsewhere suggesting that Redis' KEYS operation should never be run in production. Our experience so far suggests that its performance is acceptable for this type of operation, which is only called a few times per day.
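The probabilistic trigger itself is nothing fancy: conceptually, each tag-related cache operation rolls the dice and occasionally fires the cleanup script, along these lines (the odds and the scripting call below are illustrative):

    using System;
    using ServiceStack.Redis;

    public static class TagCleanup
    {
        private static readonly Random Rng = new Random();

        // Call this from tag-related cache operations. Roughly 1 call in 10,000
        // actually runs the cleanup script; tune the odds to your traffic so it
        // only fires a few times per day.
        public static void MaybeCleanup(IRedisClient redis, string cleanupLuaScript)
        {
            if (Rng.Next(10000) == 0)
            {
                redis.ExecLuaAsInt(cleanupLuaScript);
            }
        }
    }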

Tag Naming Consistency

It is important that, within your caching framework around Redis, you use a consistent format for your tags. For example, our developers may use "blue" or "red" as a tag, but we actually prefix it with "tag:" so it becomes "tag:blue" within Redis. This is important so we know which Redis keys are actually tags and not normal keys, without the developers having to think about the formatting of the tag names. It is also important for cleaning up tags that are no longer used. You may also need to prefix your tags (and keys) with a customer number or some other identifier if you have a multi-tenant system.
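A tiny helper keeps that formatting in one place; the prefix and tenant handling below are just illustrative.

    public static class TagKeys
    {
        // Central place for tag-key formatting so developers never build the string by hand.
        // In a multi-tenant system, fold the customer identifier into the prefix as well.
        public static string Format(string tag, string tenantId = null)
        {
            return tenantId == null
                ? "tag:" + tag                      // "blue" -> "tag:blue"
                : "tag:" + tenantId + ":" + tag;    // "blue" -> "tag:Client1:blue"
        }
    }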

So... how well does it work?

We have been using this new implementation for over a couple of months now, and so far it has worked very well for us. We are doing thousands of Redis queries per minute looking up tags the way we implemented them, and we are seeing 3-4 ms response times on those queries.

