@DevOpsSummit: Blog Post

Performance Monitor All Your Apps By @Dynatrace | @DevOpsSummit [#DevOps]

A solution for monitoring large IT infrastructures that contain several hundred components

How to Performance Monitor All Your Applications on a Single Dashboard

It has become easy to monitor applications that are deployed on hundreds of servers, thanks to advances in application performance management tools. But the more data you collect, the harder it is to visualize the health state so that a single dashboard tells you both the overall status and the problematic component.

Eugene Turetsky (Dynatrace) and Stephan Levesque (SSQ Financial Group) shared their solution for monitoring large IT infrastructures that contain several hundred components that support SSQ's most-critical applications running on a variety of technology stacks including WebLogic, Oracle Databases, Ingres Databases, and WebSphere MQs. When Stephan showed me his SSQ dashboards, I knew I had to write a blog about this.

Stephan agreed to share these details with a larger audience - eventually uploading the plugins that Eugene Turetsky designed, developed and built for this to our Dynatrace GitHub Organization. Now check this out. All Dynatrace dashboards are designed for a wide audience - from senior management to the support engineering teams responsible for maintaining the health of specific components. For example, the following screenshot shows one of SSQ's dashboards: application health is arranged vertically; cluster, server and component health horizontally. The names of the apps and servers are sanitized for privacy reasons:

Each dot represents the health status of a component, aggregated to a cluster or an individual server and then aggregated on an application level. If an app goes red or yellow, it's easy to spot which component is causing it.

Stephan and his colleagues read this dashboard from top left to bottom right: The big red dot in the top left means that at least one of the applications is unhealthy. Spotting which apps are unhealthy is easy - just look for red. On those application rows it's easy to find the red dots that tell them which component (Web Server, App Server, Message Queue, etc.) to focus their root cause analysis on.

Let's look a little deeper into how he calculates the health status of each individual component and how he aggregates the data so that you can rebuild this for your own environment in case you find this useful:

Health Status of Components
A component can be an application server, a database, a message queue or a device such as a Load Balancer. Stephan uses Dynatrace to monitor each component and has one or more metrics for each component that tell him whether it's healthy or not. Here are some examples:

  • Application: Application status is red if one or more clusters or individual unclustered components are red. Application status is yellow (degraded) if some of an application's clustered components (i.e., nodes) are down but surviving nodes in the cluster can manage the application load. Otherwise the application status is green.
  • WebLogic: If all clustered WebLogic components are down (i.e., cluster is down) then the status of WebLogic is red. If some nodes in the cluster are down but surviving nodes can manage the application load, the status of WebLogic is yellow. Otherwise the status of WebLogic is green.
  • Database: If all clustered database components are down (i.e., cluster is down) then the status of the database is red. If some nodes in the cluster are down but surviving nodes can manage the application load, the status of the database is yellow. Otherwise the status of the database is green.
  • MQ: If all clustered MQ components are down (i.e., cluster is down) then the status of MQ is red. If some nodes in the cluster are down but surviving nodes can manage the application load, the status of MQ is yellow. Otherwise the status of MQ is green.
  • Dynatrace agents: The state, or availability, of the Dynatrace agents is also monitored. If a critical agent is unavailable, an alert will be triggered and a red dot will be shown.
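
The red/yellow/green rule that the WebLogic, Database and MQ bullets share can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the code of the SSQ plugins: the names `Health`, `cluster_health` and `min_nodes_for_load` are hypothetical, and treating "some nodes down and the survivors cannot carry the load" as red is an assumption, since the rules above do not spell out that case.

```python
from enum import Enum
from typing import Sequence


class Health(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


def cluster_health(nodes_up: Sequence[bool], min_nodes_for_load: int = 1) -> Health:
    """Derive a cluster's status from the up/down state of its nodes.

    min_nodes_for_load is a hypothetical parameter: the number of surviving
    nodes assumed to be enough to manage the application load.
    """
    if not nodes_up:
        raise ValueError("a cluster needs at least one node")
    up = sum(1 for node in nodes_up if node)
    if up == len(nodes_up):
        return Health.GREEN   # every node is up
    if up > 0 and up >= min_nodes_for_load:
        return Health.YELLOW  # degraded, but survivors can manage the load
    # All nodes down (cluster is down), or (assumption) too few survivors.
    return Health.RED
```

For example, `cluster_health([True, False])` yields `Health.YELLOW`, while `cluster_health([False, False])` yields `Health.RED`.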

Whether you use Dynatrace or other APM tools - make sure you capture both system metrics, such as Availability, CPU, and Memory, and performance-relevant metrics, such as Response Time, and combine these metrics into your health states.

Aggregating Performance Data from Component to Server to Application
Besides monitoring the health of each component individually, the dashboard also aggregates data "upwards." Stephan calculates an overall health state per component type, e.g., overall WebLogic health in the cluster is calculated based on the states of each individual WebLogic instance. The overall application health is then calculated from the application's availability as well as the aggregated state of all supporting components. The final overall system health shows whether any application is currently suffering an issue. The following screenshot shows how this works in a simple example.

Health States get aggregated to Health Groups which eventually end up being aggregated to the Application and the Overall System Status
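
If you want to rebuild this roll-up outside of a dashboard, a minimal sketch might look like the following. It assumes a simple "worst state wins" rule at every level and assumes that an unavailable application is red; both are assumptions made for illustration, and all names (`Health`, `worst`, `application_health`, `overall_health`, the app and group labels) are hypothetical rather than taken from the SSQ dashboards.

```python
from enum import IntEnum
from typing import Iterable, Mapping


class Health(IntEnum):
    # Ordered so that max() picks the worst state.
    GREEN = 0
    YELLOW = 1
    RED = 2


def worst(states: Iterable[Health]) -> Health:
    """Return the worst state; an empty collection counts as green."""
    return max(states, default=Health.GREEN)


def application_health(available: bool, group_states: Mapping[str, Health]) -> Health:
    """Combine an application's availability with its health groups."""
    if not available:
        return Health.RED  # assumption: an unavailable application is red
    return worst(group_states.values())


def overall_health(app_states: Mapping[str, Health]) -> Health:
    """Overall system status: the worst state of any application."""
    return worst(app_states.values())


# Hypothetical example data: two applications and their health groups.
apps = {
    "App A": application_health(True, {"WebLogic": Health.GREEN, "Database": Health.YELLOW}),
    "App B": application_health(True, {"WebLogic": Health.GREEN, "MQ": Health.RED}),
}
print(overall_health(apps).name)  # RED
```

Using an ordered enum keeps every aggregation step a one-line `max`, so adding another level (for example, a data center above applications) only needs one more call to `worst`.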


More Stories By Andreas Grabner

Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor within Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi


