Docker Monitoring Support By @seti321

by Stefan Thies

Containers and Docker are all the rage these days. In fact, containers, with Docker as the leading container implementation, have changed how we deploy systems, especially those composed of microservices. Despite all the buzz, however, Docker and other containers are still relatively new and not yet mainstream. That being said, even early Docker adopters need a good monitoring tool, so last month we added Docker monitoring to SPM. We built it on top of spm-agent, the extensible framework for Node.js-based agents, and ended up with spm-agent-docker.

Monitoring Docker environments is challenging. Why? Because each container typically runs a single process, has its own environment, uses virtual networks, and may manage storage in its own way. Traditional monitoring solutions take metrics from each server and the applications it runs. These servers and applications are typically very static, with very long uptimes. Docker deployments are different: a set of containers may run many applications, all sharing the resources of a single host. It’s not uncommon for Docker servers to run thousands of short-lived containers (e.g., for batch jobs) while a set of permanent services runs in parallel. Traditional monitoring tools were not designed for such dynamic environments and are poorly suited to these deployments. SPM, on the other hand, was built with this in mind. Moreover, because containers share resources, resource usage limits need stricter enforcement, which is one more thing you must watch carefully. To adjust resource quotas appropriately, you need good visibility into any limits containers have reached and any errors they have caused. We recommend setting alerts based on defined limits; this way you can adjust limits or resource usage before errors start happening.
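As a rough illustration of limit-based alerting, the Python sketch below flags containers whose memory usage reaches a chosen fraction of their configured limit. The `ContainerUsage` type, the `check_memory_limits` function, and the 80% default are assumptions made for this example; they are not part of SPM.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContainerUsage:
    """Hypothetical snapshot of one container's memory usage."""
    name: str
    memory_bytes: int  # current memory usage
    limit_bytes: int   # configured memory limit (0 means no limit set)


def check_memory_limits(containers: List[ContainerUsage],
                        warn_ratio: float = 0.8) -> List[str]:
    """Return one warning per container at or above warn_ratio of its limit."""
    warnings = []
    for c in containers:
        if c.limit_bytes <= 0:
            continue  # no limit configured, nothing to compare against
        ratio = c.memory_bytes / c.limit_bytes
        if ratio >= warn_ratio:
            warnings.append(f"{c.name}: memory at {ratio:.0%} of limit")
    return warnings


if __name__ == "__main__":
    sample = [
        ContainerUsage("web", 450 * 1024**2, 512 * 1024**2),
        ContainerUsage("batch-job", 100 * 1024**2, 1024**3),
    ]
    for line in check_memory_limits(sample):
        print(line)
```

Warning below the hard limit, rather than at it, leaves time to raise the quota or reduce usage before allocations start failing.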

How do we get detailed metrics for each container?
Docker provides a remote interface for container stats (by default exposed via a UNIX domain socket). The SPM agent for Docker uses this interface to collect Docker metrics.
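To give a feel for that interface, here is a minimal Python sketch that requests one stats snapshot over the UNIX domain socket and derives a CPU percentage from it. It is not how spm-agent-docker (a Node.js agent) is implemented. The helper names `UnixSocketConnection`, `fetch_stats`, and `cpu_percent` are made up for this example. The field names follow the stats response of the Docker Engine API; some fields, such as `online_cpus`, may be missing on older Docker versions, so the code falls back to defaults.

```python
import http.client
import json
import socket


class UnixSocketConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a UNIX domain socket instead of TCP."""

    def __init__(self, socket_path: str):
        super().__init__("localhost")  # host is only used for the Host header
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def fetch_stats(container_id: str,
                socket_path: str = "/var/run/docker.sock") -> dict:
    """Fetch a single (non-streaming) stats snapshot for one container."""
    conn = UnixSocketConnection(socket_path)
    try:
        conn.request("GET", f"/containers/{container_id}/stats?stream=false")
        response = conn.getresponse()
        if response.status != 200:
            raise RuntimeError(f"Docker API returned HTTP {response.status}")
        return json.loads(response.read())
    finally:
        conn.close()


def cpu_percent(stats: dict) -> float:
    """CPU usage between the previous and current sample, in percent."""
    cpu = stats.get("cpu_stats", {})
    pre = stats.get("precpu_stats", {})
    cpu_delta = (cpu.get("cpu_usage", {}).get("total_usage", 0)
                 - pre.get("cpu_usage", {}).get("total_usage", 0))
    system_delta = (cpu.get("system_cpu_usage", 0)
                    - pre.get("system_cpu_usage", 0))
    if cpu_delta < 0 or system_delta <= 0:
        return 0.0  # counters missing or reset, no meaningful value
    return cpu_delta / system_delta * cpu.get("online_cpus", 1) * 100.0
```

On a Docker host, `cpu_percent(fetch_stats("<container id>"))` would return the current value for one container; the process needs permission to read the socket.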

SPM for Docker

SPM Docker Agent monitoring other containers, itself running in a Docker container

How to deploy monitoring for Docker
There are several ways one can run a Docker monitor, including:

  1. run it directly on the host machine (“Server” in the figure above)
  2. run one agent for multiple servers
  3. run an agent in a container (alongside the containers it monitors) on each server

SPM uses approach 3, aka the “Docker Way”. Thus, SPM for Docker is provided as a Docker image. This makes installation easy: unlike approach 1, it requires no dependencies to be installed on the host machine, and unlike approach 2, it requires no server list to be configured to support multiple Docker servers.

How to install SPM for Docker
It’s very simple: create an SPM App of type “Docker” to get the SPM application token (used as $TOKEN below), and then run:

  1. docker pull sematext/spm-agent-docker
  2. docker run -d -v /var/run/docker.sock:/var/run/docker.sock -e SPM_TOKEN=$TOKEN -e HOSTNAME=$HOSTNAME sematext/spm-agent-docker

You’ll see your Docker metrics in SPM after about a minute.

SPM for Docker – Features
If you already know SPM then you’re aware that each SPM integration supports all SPM features.  If, however, you are new to SPM, this summary will help:

  1. Out-of-the-box Dashboards and unlimited custom Dashboards
  2. Multi-user support with role-based access control, application and account sharing
  3. Threshold-based Alerts on all metrics, including Custom Metrics
  4. Machine learning-based Anomaly Detection on all metrics, including Custom Metrics
  5. Alerting via email, PagerDuty, Nagios, and Webhooks (e.g., Slack, HipChat)
  6. Email subscriptions for scheduled Performance Reports
  7. Secure sharing of graphs and reports with your team, or with the public
  8. Correlation with logs shipped to Logsene
  9. Charting and correlation with arbitrary Events

Let’s continue with the Docker-specific part:

  1. Easy-to-install Docker agent
  2. Monitoring of multiple Docker Hosts and an unlimited number of Containers per ‘SPM Docker App’
  3. Predefined Dashboards for all Host and Container metrics
  • OS Metrics of the Docker Host
  • Detailed Container Metrics
    • CPU
    • Memory
    • Network
    • I/O Metrics
  • Limits of Resource Usage
    • CPU throttled times
    • Memory limits
  • Fail counters (e.g., for memory allocation and network packets)
  • Filters and aggregations by Hosts, Images, Container IDs, and Tags
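
To make the idea of aggregating by host or image concrete, the following Python sketch sums one metric over container samples that share the same value for a chosen field. The `aggregate_by` function and the sample field names (`host`, `image`, `container_id`, `memory_bytes`) are hypothetical and only illustrate the concept; they do not describe SPM's data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def aggregate_by(samples: Iterable[Mapping], key: str, metric: str) -> Dict[str, float]:
    """Sum one metric over all samples that share the same value for key."""
    totals: Dict[str, float] = defaultdict(float)
    for sample in samples:
        totals[sample[key]] += sample[metric]
    return dict(totals)


if __name__ == "__main__":
    samples = [
        {"host": "docker-1", "image": "nginx", "container_id": "a1", "memory_bytes": 100},
        {"host": "docker-1", "image": "redis", "container_id": "b2", "memory_bytes": 300},
        {"host": "docker-2", "image": "nginx", "container_id": "c3", "memory_bytes": 150},
    ]
    print(aggregate_by(samples, "image", "memory_bytes"))
    print(aggregate_by(samples, "host", "memory_bytes"))
```

Grouping by image answers "how much memory do all nginx containers use?", while grouping by host shows how loaded each Docker server is.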

SPM for Docker – Predefined Dashboard ‘Overview’

Containerized applications typically communicate with other applications via the exposed network ports; that’s why network metrics are definitely on the hot list of metrics to watch for Docker and a reason to provide such detailed Reports in SPM:

SPM for Docker – Network Metrics
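
For readers who want to see where such network numbers can come from, here is a small Python sketch that turns two snapshots of a container's cumulative network counters into per-second rates. It assumes each snapshot has the shape of the `networks` object in a Docker Engine API stats response (interface name mapped to counters such as `rx_bytes` and `tx_bytes`). The function names and the choice of counters are assumptions for illustration, not a description of how SPM computes its reports.

```python
from typing import Dict

# Cumulative counters to track; missing ones are treated as zero.
COUNTERS = ("rx_bytes", "tx_bytes", "rx_dropped", "tx_dropped",
            "rx_errors", "tx_errors")


def total_counters(networks: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum each counter across all network interfaces of one container."""
    return {
        name: sum(iface.get(name, 0) for iface in networks.values())
        for name in COUNTERS
    }


def network_rates(prev: Dict[str, Dict[str, int]],
                  curr: Dict[str, Dict[str, int]],
                  interval_seconds: float) -> Dict[str, float]:
    """Per-second rate of each counter between two snapshots."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    before = total_counters(prev)
    after = total_counters(curr)
    # A negative delta suggests the counters were reset (for example after
    # a restart), so it is clamped to zero instead of reported as a rate.
    return {
        name: max(after[name] - before[name], 0) / interval_seconds
        for name in COUNTERS
    }


if __name__ == "__main__":
    earlier = {"eth0": {"rx_bytes": 1000, "tx_bytes": 500}}
    later = {"eth0": {"rx_bytes": 3000, "tx_bytes": 1500, "rx_dropped": 5}}
    print(network_rates(earlier, later, interval_seconds=10))
```

Rates for dropped packets and errors are worth alerting on, since they tend to rise before an application becomes unreachable.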

Did you enjoy this little excursion on Docker monitoring? Then it’s time to practice it!

We appreciate feedback from early adopters, so please feel free to drop us a line, DM us on Twitter @sematext, or chat with us using the web chat in SPM or on our homepage. We are here to get your monitoring up and running. If you are a startup, get in touch: we offer discounts for startups!
