Welcome!

@DevOpsSummit Authors: Pat Romanski, Zakia Bouachraoui, Liz McMillan, Elizabeth White, Yeshim Deniz

Related Topics: @DevOpsSummit, Java IoT, @DXWorldExpo

@DevOpsSummit: Article

Multi-Threaded Apps By @GrabnerAndi | @DevOpsSummit [#DevOps]

Why apps show high response time contribution to web requests coming from worker threads

How to Analyze Problems in Multi-Threaded Applications

As part of my Share Your PurePath and Performance Clinic initiatives I get to see lots of interesting problems out there. This time I picked two examples that just came in this week from Balasz and Daniel. Both wanted my opinion on why their apps show high response time contribution to their web requests coming from worker threads that seem to be either in I/O or in a Wait state. The question was what are these threads waiting for and whether this is something that could be optimized to speed up these slow response times they see on some of their critical web requests.

For both apps it turned out that the developers chose to "offload" work items to a pool of worker threads. This is of course a very valid design pattern. The way it was implemented though didn't fully leverage the advantage that multi-threading can give you. We identified two patterns that I will now describe in more detail in the hope that your multi-threaded applications are not suffering from these performance anti-patterns:

  1. Sequential instead of parallel execution of background threads
  2. Many parallel background threads using "non-shareable" resources

I also want to take the opportunity to give you some tips on how to "read" Dynatrace PurePaths. Over the years I've developed my own technique and I hope you find my approach worthwhile to test on your own data.

Pattern #1: Sequential Execution of Threads and a bit more ...
I already tweeted last week. I created the following slide for Balasz to present to his development team including tips on what I look for when analyzing a PurePath:

PurePath showing that the main web request thread executes 3 RMI calls followed by two background threads where the second one actually starts when the first is finished - that's sequential execution

The many callouts I put on this slide might be a bit overwhelming. Let me explain the highlights.

The overall request response time as measured on the web server is 9.896s. The contributors to this response time are pointed out in the callouts that I numbered. Here is short version of it:

  1. The main service request thread on the Java servlet container takes 3203s where 1.770s is almost entirely spent in I/O.
  2. The main service request thread makes 3 RMI calls. The "Elapsed Time" (=Timestamp on when that method was called in relation to the start of the request) shows us that the first one executes after 366ms followed by the second and third in sequential order
  3. After the third RMI call - at exactly 1.800s Elapsed Time the first background thread is started. This thread takes 1.429s to execute.
  4. The second background worker thread was supposed to run in parallel - at least based on what Balasz told me. The Elapsed Time column however shows that it was executed AFTER the first background thread was done. This was not intended that way. It also explains the 1.770s that show up as I/O time on the main service thread. It is mainly the time that the main thread waited until the first background thread was done just to kick off the second background thread.
  5. The second background thread was now really executed in parallel letting the main service request thread finish. This second background thread generates and writes the generated HTML using the HttpServletResponse Output Stream. This also explains the long waiting time on the Web Server as it was waiting 6.690s for that asynchronous background thread to write data to the output stream.

Lesson Learned: Verify if your threads execute as intended
Looking at the actual time stamp (=Elapsed Time) tells you when your threads really get started and whether they execute in parallel or not. It's a great way to verify your actual implementation and lets you learn which component is waiting for which other component.

For more patterns and lessons learned, click here for the full article.

More Stories By Andreas Grabner

Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor within Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@DevOpsSummit Stories
DXWorldEXPO LLC announced today that Nutanix has been named "Platinum Sponsor" of CloudEXPO | DevOpsSUMMIT | DXWorldEXPO New York, which will take place November 12-13, 2018 in New York City. Nutanix makes infrastructure invisible, elevating IT to focus on the applications and services that power their business. The Nutanix Enterprise Cloud Platform blends web-scale engineering and consumer-grade design to natively converge server, storage, virtualization and networking into a resilient, software-defined solution with rich machine intelligence.
When building large, cloud-based applications that operate at a high scale, it’s important to maintain a high availability and resilience to failures. In order to do that, you must be tolerant of failures, even in light of failures in other areas of your application. “Fly two mistakes high” is an old adage in the radio control airplane hobby. It means, fly high enough so that if you make a mistake, you can continue flying with room to still make mistakes. In his session at 18th Cloud Expo, Lee Atchison, Principal Cloud Architect and Advocate at New Relic, will discuss how this same philosophy can be applied to highly scaled applications, and can dramatically increase your resilience to failure.
"DevOps is set to be one of the most profound disruptions to hit IT in decades," said Andi Mann. "It is a natural extension of cloud computing, and I have seen both firsthand and in independent research the fantastic results DevOps delivers. So I am excited to help the great team at @DevOpsSUMMIT and CloudEXPO tell the world how they can leverage this emerging disruptive trend."
Digital transformation is about embracing digital technologies into a company's culture to better connect with its customers, automate processes, create better tools, enter new markets, etc. Such a transformation requires continuous orchestration across teams and an environment based on open collaboration and daily experiments. In his session at 21st Cloud Expo, Alex Casalboni, Technical (Cloud) Evangelist at Cloud Academy, explored and discussed the most urgent unsolved challenges to achieve full cloud literacy in the enterprise world.
CloudEXPO | DevOpsSUMMIT | DXWorldEXPO Silicon Valley 2019 will cover all of these tools, with the most comprehensive program and with 222 rockstar speakers throughout our industry presenting 22 Keynotes and General Sessions, 250 Breakout Sessions along 10 Tracks, as well as our signature Power Panels. Our Expo Floor will bring together the leading global 200 companies throughout the world of Cloud Computing, DevOps, IoT, Smart Cities, FinTech, Digital Transformation, and all they entail. As your enterprise creates a vision and strategy that enables you to create your own unique, long-term success, learning about all the technologies involved is essential. Companies today not only form multi-cloud and hybrid cloud architectures, but create them with built-in cognitive capabilities.