Welcome!

@DevOpsSummit Authors: Elizabeth White, Sematext Blog, Yeshim Deniz, Derek Weeks, Liz McMillan

Related Topics: @DevOpsSummit, Java IoT, Microservices Expo, Linux Containers, @CloudExpo, @BigDataExpo

@DevOpsSummit: Blog Post

Software Quality Metrics for Your Continuous Delivery Pipeline | Part 3

It is safe to say that insightful logging and performance are two opposite goals

Let me ask you a question: would you say that you have implemented logging correctly for your application? Correct in the sense that it will provide you with all the insights you require to keep your business going once your users are struck by errors? And in a way that does not adversely impact your application performance? Honestly, I bet you have not. Today I will explain why you should turn off logging completely in production because of its limitations:

  • Relies on Developers
  • Lacks Context
  • Impacts Performance

Intrigued? Bear with me and I will show you how you can still establish and maintain a healthy and useful logging strategy for your deployment pipeline, from development to production, guided by metrics.

What Logging Can Do for You
Developers, including myself, often write log messages because they are lazy. Why should I set a breakpoint and fire up a debugger if it is so much more convenient to dump something to my console via a simple println()? This simple yet effective mechanism also works on headless machines where no IDE is installed, such as staging or production environments:

System.out.println("Been here, done that.");

Careful coders would use a logger to prevent random debug messages from appearing in production logs and additionally use guard statements to prevent unnecessary parameter construction:

if (logger.isDebugEnabled()) {
logger.debug("Entry number: " + i + " is " + String.valueOf(entry[i]));
}

Anyways, the point about logging is that that traces of log messages allow developers to better understand what their program is doing in execution. Does my program take this branch or that branch? Which statements were executed before that exception was thrown? I have done this at least a million of times, and most likely so have you:

if (condition) {
logger.debug("7: yeah!")
} else {
logger.debug("8: DAMN!!!")
}

Test Automation Engineers, usually developers by trade, equally use logging to better understand how the code under test complies with their test scenarios:

class AgentSpec extends spock.lang.Specification {

def "Agent.compute()"() {
def agent = AgentPool.getAgent()

when:
def result = agent.compute(TestFixtures.getData())

then:
logger.debug("result: "  + result);
result == expected
}
}

Logging is, undoubtedly, a helpful tool during development and I would argue that developers should use it as freely as possible if it helps them to understand and troubleshoot their code.

In production, application logging is useful for tracking certain events, such as the occurrence of a particular exception, but it usually fails to deliver what it is so often mistakenly used for: as a mechanism for analyzing application failures in production. Why?

Because approaches to achieving this goal with logging are naturally brittle: their usefulness depends heavily on developers, messages are without context, and if not designed carefully, logging may severely slow down your application.

Secretly, what you are really hoping to get from your application logs, in the one or the other form, is something like this:

A logging strategy that delivers out-of-the-box using dynaTrace: user context, all relevant data in place, zero config

The Limits of Logging
Logging Relies on Developers
Let's face it: logging is, inherently, a developer-centric mechanism. The usefulness of your application logs stands and falls with your developers. A best practice for logging in production says: "don't log too much" (see Optimal Logging @ Google testing blog). This sounds sensible, but what does this actually mean? If we recall the basic motivation behind logging in production, we could equally rephrase this as "log just enough information you need to know about a failure that enables you to take adequate actions". So, what would it take your developers to provide such actionable insights? Developers would need to correctly anticipate where in the code errors would occur in production. They would also need to collect any relevant bits of information along an execution path that bear these insights and, last but not least, present them in a meaningful way so that others can understand, too. Developers are, no doubt, a critical factor to the practicality of your application logs.

Logging Lacks Context
Logging during development is so helpful because developers and testers usually examine smaller, co-located units of code that are executed in a single thread. It is fairly easy to maintain an overview under such simulated conditions, such as a test scenario:

13:49:59 INFO com.company.product.users.UserManager - Registered user ‘foo'.
13:49:59 INFO com.company.product.users.UserManager - User ‘foo' has logged in.
13:49:59 INFO com.company.product.users.UserManager - User ‘foo' has logged out.

But how can you reliably identify an entire failing user transaction in a real-life scenario, that is, in a heavily multi-threaded environment with multiple tiers that serve piles of distributed log files? I say, hardly at all. Sure, you can go mine for certain isolated events, but you cannot easily extract causal relationships from an incoherent, distributed set of log messages:

13:49:59 INFO com.company.product.users.UserManager - User ‘foo' has logged in.
13:49:59 INFO com.company.product.users.UserManager - User ‘bar' has logged in.
...
13:49:60 SEVERE org.hibernate.exception.JDBCConnectionException: could not execute query
at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:99)
...

After all, the ability to identify such contexts is key to deciding why a particular user action failed.

Logging Impacts Performance
What is a thorough logging strategy worth if your users cannot use your application because it is terribly slow? In case you did not know, logging, especially during peak load times, may severely slow down your application. Let's have a quick look at some of the reasons:

Writing log messages from the application's memory to persistent storage, usually to the file system, demands substantial I/O (see Top Performance Mistakes when moving from Test to Production: Excessive Logging). Traditional logger implementations wrote files by issuing synchronous I/O requests, which put the calling thread into a wait state until the log message was fully written to disk.

In some cases, the logger itself may cause a decent bottle-neck: in the Log4j library (up to version 1.2), every single log activity results in a call to an internal template method Appender.doAppend() that is synchronized for thread-safety (see Multithreading issues - doAppend is synchronised?). The practical implication of this is that threads, which log to the same Appender, for example a FileAppender, must queue up with any other threads writing logs. Consequently, the application spends valuable time waiting in synchronization instead of doing whatever the app was actually designed to do. This will hurt performance, especially in heavily multi-thread environments like web application servers.

These performance effects can be vastly amplified when exception logging comes into play: exception data, such as error message, stack trace and any other piggy-backed exceptions ("initial cause exceptions") greatly increase the amount of data that needs to be logged. Additionally, once a system is in a faulty state, the same exceptions tend to appear over and over again, further hurting application performance. We had once monitored a 30% drawdown on CPU resources due to more than 180,000 exceptions being thrown in only 5 minutes on one of our application servers (see Performance Impact of Exceptions: Why Ops, Test and Dev need to care). If we had written these exceptions to the file system, they would have trashed I/O, filled up our disk space in no time and had considerably increased our response times.

Subsequently, it is safe to say that insightful logging and performance are two opposite goals: if you want the one, then you have to make a compromise on the other.

For more logging tips click here for the full article.

More Stories By Martin Etmajer

Leveraging his outstanding technical skills as a lead software engineer, Martin Etmajer has been a key contributor to a number of large-scale systems across a range of industries. He is as passionate about great software as he is about applying Lean Startup principles to the development of products that customers love.

Martin is a life-long learner who frequently speaks at international conferences and meet-ups. When not spending time with family, he enjoys swimming and Yoga. He holds a master's degree in Computer Engineering from the Vienna University of Technology, Austria, with a focus on dependable distributed real-time systems.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@DevOpsSummit Stories
"We host and fully manage cloud data services, whether we store, the data, move the data, or run analytics on the data," stated Kamal Shannak, Senior Development Manager, Cloud Data Services, IBM, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
DevOps is often described as a combination of technology and culture. Without both, DevOps isn't complete. However, applying the culture to outdated technology is a recipe for disaster; as response times grow and connections between teams are delayed by technology, the culture will die. A Nutanix Enterprise Cloud has many benefits that provide the needed base for a true DevOps paradigm.
Interoute has announced the integration of its Global Cloud Infrastructure platform with Rancher Labs’ container management platform, Rancher. This approach enables enterprises to accelerate their digital transformation and infrastructure investments. Matthew Finnie, Interoute CTO commented “Enterprises developing and building apps in the cloud and those on a path to Digital Transformation need Digital ICT Infrastructure that allows them to build, test and deploy faster than ever before. The integration of Rancher software with Interoute Digital Platform gives developers access to a managed container platform that sits on a global privately networked cloud, enabling true distributed computing.”
Whether you like it or not, DevOps is on track for a remarkable alliance with security. The SEC didn’t approve the merger. And your boss hasn’t heard anything about it. Yet, this unruly triumvirate will soon dominate and deliver DevSecOps faster, cheaper, better, and on an unprecedented scale. In his session at DevOps Summit, Frank Bunger, VP of Customer Success at ScriptRock, discussed how this cathartic moment will propel the DevOps movement from such stuff as dreams are made on to a practical, powerful, and insanely valuable asset to enterprises. You may call it DevSecOps, or SecDevOps, or maybe even DevOpsSec. Choose your own adventure.
For organizations that have amassed large sums of software complexity, taking a microservices approach is the first step toward DevOps and continuous improvement / development. Integrating system-level analysis with microservices makes it easier to change and add functionality to applications at any time without the increase of risk. Before you start big transformation projects or a cloud migration, make sure these changes won’t take down your entire organization.
SYS-CON Events announced today that Ocean9will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Ocean9 provides cloud services for Backup, Disaster Recovery (DRaaS) and instant Innovation, and redefines enterprise infrastructure with its cloud native subscription offerings for mission critical SAP workloads.
Your homes and cars can be automated and self-serviced. Why can't your storage? From simply asking questions to analyze and troubleshoot your infrastructure, to provisioning storage with snapshots, recovery and replication, your wildest sci-fi dream has come true. In his session at @DevOpsSummit at 20th Cloud Expo, Dan Florea, Director of Product Management at Tintri, will provide a ChatOps demo where you can talk to your storage and manage it from anywhere, through Slack and similar services with Tintri's web services architecture and APIs. Impress your DevOps team with smart and autonomous infrastructure.
DevOps is often described as a combination of technology and culture. Without both, DevOps isn't complete. However, applying the culture to outdated technology is a recipe for disaster; as response times grow and connections between teams are delayed by technology, the culture will die. A Nutanix Enterprise Cloud has many benefits that provide the needed base for a true DevOps paradigm. In his Day 3 Keynote at 20th Cloud Expo, Chris Brown, a Solutions Marketing Manager at Nutanix, will explore the ways that Nutanix technologies empower teams to react faster than ever before and connect teams in ways that were either too complex or simply impossible with traditional infrastructures.
SYS-CON Events announced today that Linux Academy, the foremost online Linux and cloud training platform and community, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Linux Academy was founded on the belief that providing high-quality, in-depth training should be available at an affordable price. Industry leaders in quality training, provided services, and student certification passes, its goal is to change lives by teaching Linux and cloud technology to the tens of thousands of students that learn at the Linux Academy.
"delaPlex is a software development company. We do team-based outsourcing development," explained Mark Rivers, COO and Co-founder of delaPlex Software, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
SYS-CON Events announced today that SoftLayer, an IBM Company, has been named “Gold Sponsor” of SYS-CON's 18th Cloud Expo, which will take place on June 7-9, 2016, at the Javits Center in New York, New York. SoftLayer, an IBM Company, provides cloud infrastructure as a service from a growing number of data centers and network points of presence around the world. SoftLayer’s customers range from Web startups to global enterprises.
SYS-CON Events announced today that CA Technologies has been named “Platinum Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY, and the 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CA Technologies helps customers succeed in a future where every business – from apparel to energy – is being rewritten by software. From planning to development to management to security, CA creates software that fuels transformation for companies in the application economy.
SYS-CON Events announced today that Loom Systems will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Founded in 2015, Loom Systems delivers an advanced AI solution to predict and prevent problems in the digital business. Loom stands alone in the industry as an AI analysis platform requiring no prior math knowledge from operators, leveraging the existing staff to succeed in the digital era. With offices in San Francisco and Tel Aviv, Loom Systems works with customers across industries around the world.
What if you could build a web application that could support true web-scale traffic without having to ever provision or manage a single server? Sounds magical, and it is! In his session at 20th Cloud Expo, Chris Munns, Senior Developer Advocate for Serverless Applications at Amazon Web Services, will show how to build a serverless website that scales automatically using services like AWS Lambda, Amazon API Gateway, and Amazon S3. We will review several frameworks that can help you build serverless applications, such as the AWS Serverless Application Model (AWS SAM), Chalice, and ClaudiaJS.
SYS-CON Events announced today that HTBase will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. HTBase (Gartner 2016 Cool Vendor) delivers a Composable IT infrastructure solution architected for agility and increased efficiency. It turns compute, storage, and fabric into fluid pools of resources that are easily composed and re-composed to meet each application’s needs. With HTBase, companies can quickly provision resources and deploy unique, mission-critical, self-designed solutions to add-onto or create any type of infrastructure as per the business requirement. HTBase is the first company to enable a true multi-cloud strategy, enabling organizations to automate movement of data and workloads between private and public clouds. This means that organizations can now move data and workloads between pub...
SYS-CON Events announced today that T-Mobile will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. As America's Un-carrier, T-Mobile US, Inc., is redefining the way consumers and businesses buy wireless services through leading product and service innovation. The Company's advanced nationwide 4G LTE network delivers outstanding wireless experiences to 67.4 million customers who are unwilling to compromise on quality and value.
SYS-CON Events announced today that Cloud Academy will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Cloud Academy is the industry’s most innovative, vendor-neutral cloud technology training platform. Cloud Academy provides continuous learning solutions for individuals and enterprise teams for Amazon Web Services, Microsoft Azure, Google Cloud Platform, and the most popular cloud computing technologies. Get certified, manage the full lifecycle of your cloud-based resources, and build your knowledge based using Cloud Academy’s expert-created content, comprehensive Learning Paths, and innovative Hands-on Labs.
SYS-CON Events announced today that CrowdReviews.com has been named “Media Sponsor” of SYS-CON's 20th International Cloud Expo, which will take place on June 6–8, 2017, at the Javits Center in New York City, NY. CrowdReviews.com is a transparent online platform for determining which products and services are the best based on the opinion of the crowd. The crowd consists of Internet users that have experienced products and services first-hand and have an interest in letting other potential buyers their thoughts on their experience.
SYS-CON Events announced today that Infranics will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Since 2000, Infranics has developed SysMaster Suite, which is required for the stable and efficient management of ICT infrastructure. The ICT management solution developed and provided by Infranics continues to add intelligence to the ICT infrastructure through the IMC (Infra Management Cycle) based on mathematical analysis and forecasting Big Data Analyze and Control.
SYS-CON Events announced today that Interoute, owner-operator of one of Europe's largest networks and a global cloud services platform, has been named “Bronze Sponsor” of SYS-CON's 20th Cloud Expo, which will take place on June 6-8, 2017 at the Javits Center in New York, New York. Interoute is the owner-operator of one of Europe's largest networks and a global cloud services platform which encompasses 12 data centers, 14 virtual data centers and 31 colocation centers, with connections to 195 additional third-party data centers across Europe. Its full-service Unified ICT platform serves international enterprises and many of the world’s leading service providers, as well as governments and universities.
SYS-CON Events announced today that Cloudistics, an on-premises cloud computing company, has been named “Bronze Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Cloudistics delivers a complete public cloud experience with composable on-premises infrastructures to medium and large enterprises. Its software-defined technology natively converges network, storage, compute, virtualization, and management into a single platform to drive unprecedented simplicity in the data center. Customers can start with a base infrastructure and scale to multi-site and multi-geo infrastructures with predictable economics and performance.
SYS-CON Events announced today that SD Times | BZ Media has been named “Media Sponsor” of SYS-CON's 20th International Cloud Expo, which will take place on June 6–8, 2017, at the Javits Center in New York City, NY. BZ Media LLC is a high-tech media company that produces technical conferences and expositions, and publishes a magazine, newsletters and websites in the software development, SharePoint, mobile development and commercial UAV markets.
Building custom add-ons does not need to be limited to the ideas you see on a marketplace. In his session at 20th Cloud Expo, Sukhbir Dhillon, CEO and founder of Addteq, will go over some adventures they faced in developing integrations using Atlassian SDK and other technologies/platforms and how it has enabled development teams to experiment with newer paradigms like Serverless and newer features of Atlassian SDKs. In this presentation, you will be taken on a journey of Add-On and Integration development using popular tools.
Microservices are a very exciting architectural approach that many organizations are looking to as a way to accelerate innovation. Microservices promise to allow teams to move away from monolithic "ball of mud" systems, but the reality is that, in the vast majority of organizations, different projects and technologies will continue to be developed at different speeds. How to handle the dependencies between these disparate systems with different iteration cycles? Consider the "canoncial problem" in this scenario: microservice A (releases daily) depends on a couple of additions to backend B (releases quarterly).
After more than five years of DevOps, definitions are evolving, boundaries are expanding, ‘unicorns’ are no longer rare, enterprises are on board, and pundits are moving on. Can we now look at an evolution of DevOps? Should we? Is the foundation of DevOps ‘done’, or is there still too much left to do? What is mature, and what is still missing? What does the next 5 years of DevOps look like? In this Power Panel at DevOps Summit, moderated by DevOps Summit Conference Chair Andi Mann, panelists looked back at what DevOps has become, and forward at what it might create next.