@DevOpsSummit: Blog Feed Post

Parsing and Centralizing Elasticsearch Logs By @Sematext | @DevOpsSummit [#DevOps]

How to use Logstash’s file input to tail the main Elasticsearch log and the slowlogs

No, it’s not an endless loop waiting to happen. The plan here is to use Logstash to parse Elasticsearch logs and send them to another Elasticsearch cluster or to a log analytics service like Logsene (which conveniently exposes the Elasticsearch API, so you can use it without having to run and manage your own Elasticsearch cluster).

If you’re looking for an ELK stack intro and you think you’re in the wrong place, try our 5-minute Logstash tutorial. Still, if you have non-trivial amounts of data, you might end up here again, because you’ll probably need to centralize Elasticsearch logs for the same reasons you centralize other logs:

  • to avoid SSH-ing into each server to figure out why something went wrong
  • to better understand issues such as slow indexing or searching (via slowlogs, for instance)
  • to search quickly in big logs

In this post, we’ll describe how to use Logstash’s file input to tail the main Elasticsearch log and the slowlogs. We’ll use grok and other filters to parse different parts of those logs into their own fields and we’ll send the resulting structured events to Logsene/Elasticsearch via the elasticsearch output. In the end, you’ll be able to do things like slowlog slicing and dicing with Kibana:

[Image: logstash_elasticsearch]

TL;DR note: scroll down to the FAQ section for the whole config with comments.

Tailing Files
First, we’ll point the file input to *.log from Elasticsearch’s log directory. This will work nicely with the default rotation, which renames old logs to something like cluster-name.log.SOMEDATE. We’ll use start_position => "beginning" to index existing content as well. We’ll add the multiline codec to parse exceptions nicely, telling it that every line not starting with a [ character belongs to the same event as the previous line.

input {
  file {
    path => "/var/log/elasticsearch/*.log"
    type => "elasticsearch"
    start_position => "beginning"
    codec => multiline {
      pattern => "^\["
      negate => true
      what => "previous"
    }
  }
}
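To make the multiline rule concrete, here is a minimal Python sketch of the same grouping logic: any line that does not start with [ is appended to the previous event. It only illustrates the idea and is not how Logstash implements the codec. The function name group_multiline and the second and third sample lines are made up for this example.

```python
import re


def group_multiline(lines, pattern=r"^\["):
    """Merge every line that does not match `pattern` into the previous event."""
    starts_event = re.compile(pattern)
    events = []
    for line in lines:
        if starts_event.search(line) or not events:
            events.append(line)        # a new event begins here
        else:
            events[-1] += "\n" + line  # continuation, e.g. a stack trace line
    return events


raw = [
    "[2015-01-13 15:42:24,624][INFO ][node ] [Atleza] starting ...",
    # The lines below are hypothetical, invented to show a multi-line event.
    "[2015-01-13 15:42:25,001][WARN ][hypothetical.source ] [Atleza] something failed",
    "java.lang.IllegalStateException: made-up exception for illustration",
    "\tat com.example.Made.up(Made.java:1)",
]
events = group_multiline(raw)
```

With this input, the exception and its stack trace line end up in the same event as the WARN line that precedes them.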

Parsing Generic Content
A typical Elasticsearch log comes in the form of:

[2015-01-13 15:42:24,624][INFO ][node ] [Atleza] starting ...

while a slowlog is a bit more structured, like:

[2015-01-13 15:43:17,160][WARN ][index.search.slowlog.query] [Atleza] [aa][3] took[19.9ms], took_millis[19], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"query":{"term":{"a":2}}}], extra_source[],

But fields from the beginning, like timestamp and severity, are common, so we’ll parse them first:

grok {
  match => [ "message", "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:severity}%{SPACE}\]\[%{DATA:log_source}%{SPACE}\]%{SPACE}\[%{DATA:node}\]%{SPACE}(?<message>(.|\r|\n)*)" ]
  overwrite => [ "message" ]
}

For the main Elasticsearch logs, the message field now contains the actual message, without the timestamp, severity, and log source, which are now in their own fields.
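If grok patterns are new to you, the following Python sketch shows roughly what this step extracts from the sample line above, using an ordinary regular expression. It is an approximation under stated assumptions: the timestamp sub-pattern is a narrow stand-in for TIMESTAMP_ISO8601 (which accepts more variants), and the names HEADER and parse_header are made up for illustration.

```python
import re

# Rough, hand-written equivalent of the grok expression (an assumption,
# not the pattern Logstash actually compiles).
HEADER = re.compile(
    r"\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]"
    r"\[(?P<severity>.*?)\s*\]"
    r"\[(?P<log_source>.*?)\s*\]\s*"
    r"\[(?P<node>.*?)\]\s*"
    r"(?P<message>.*)",
    re.DOTALL,  # let the message span several lines (multi-line exceptions)
)


def parse_header(line):
    """Return the common fields as a dict, or None if the line does not match."""
    match = HEADER.match(line)
    return match.groupdict() if match else None


fields = parse_header(
    "[2015-01-13 15:42:24,624][INFO ][node ] [Atleza] starting ..."
)
```

Here fields holds timestamp, severity, log_source, node and message as separate string values, with message reduced to "starting ...".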

Parsing Slowlogs
For slowlogs, the message field now looks like this:

[aa][3] took[19.9ms], took_millis[19], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"query":{"term":{"a":2}}}], extra_source[],

First we’ll parse the index name and the shard number via grok, then the kv filter will take care of the name-value pairs that follow:

if "slowlog" in [path] {
  grok {
    match => [ "message", "\[%{DATA:index}\]\[%{DATA:shard}\]%{GREEDYDATA:kv_pairs}" ]
  }
  kv {
    source => "kv_pairs"
    field_split => " \],"
    value_split => "\["
  }
}
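As an illustration of what these two filters pull out of the sample slowlog message, here is a small Python sketch. It approximates the result with regular expressions and does not emulate the kv filter's splitting rules (for instance, it keeps empty values, which the real filter may treat differently). The names SLOWLOG, PAIR and parse_slowlog are hypothetical.

```python
import re

# "[index][shard]" followed by the rest of the line.
SLOWLOG = re.compile(
    r"\[(?P<index>.*?)\]\[(?P<shard>.*?)\](?P<kv_pairs>.*)", re.DOTALL
)
# name[value], pairs as they appear in the sample message.
PAIR = re.compile(r"(\w+)\[(.*?)\],")


def parse_slowlog(message):
    """Split a slowlog message into index, shard and name/value fields."""
    match = SLOWLOG.match(message)
    if match is None:
        return None
    event = {"index": match.group("index"), "shard": match.group("shard")}
    event.update(PAIR.findall(match.group("kv_pairs")))
    return event


event = parse_slowlog(
    '[aa][3] took[19.9ms], took_millis[19], types[], stats[], '
    'search_type[QUERY_THEN_FETCH], total_shards[5], '
    'source[{"query":{"term":{"a":2}}}], extra_source[],'
)
```

Every value is still a string at this point, including shard and took_millis.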

Some Cleanup
Now our logs are fully parsed, but there are still some niggles to take care of. One is that each log’s timestamp (the time logged by the application) is in the timestamp field, while the standard @timestamp was added by Logstash when it read that event. If you want @timestamp to hold the application-generated timestamp, you can do it with the date filter:

date {
  match => [ "timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
  target => "@timestamp"
}
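The idea behind this step, turning the logged timestamp string into a real date value, can be sketched in Python as follows. Note that Python's strptime directives differ from the Joda-style letters Logstash uses, so the format string below is only the Python counterpart. The function name parse_es_timestamp is made up, and time zone handling is left out of this sketch.

```python
from datetime import datetime


def parse_es_timestamp(text):
    """Parse a timestamp such as '2015-01-13 15:42:24,624' (milliseconds after the comma)."""
    # %f accepts one to six fractional digits, so "624" becomes 624000 microseconds.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


parsed = parse_es_timestamp("2015-01-13 15:42:24,624")
```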

Other potentially annoying things:

  • at this point, timestamp contains the same data as @timestamp
  • the content of kv_pairs from slowlogs is already parsed by the kv filter
  • the log type (for example, index.search.slowlog.query) is in a field called log_source, to make room for a field called source which stores other things (the JSON query, in this case). I would rather store index.search.slowlog.query in source, especially if I’m using the Logsene UI, where I can filter on sources by clicking on them
  • the grok and kv filters parse all fields as strings, even though some of them, like took_millis, are numbers

To fix all of the above (remove, rename and convert fields) we’ll use the mutate filter:

mutate {
  remove_field => [ "kv_pairs", "timestamp" ]
  rename => {
    "source" => "source_body"
    "log_source" => "source"
  }
  convert => {
    "took_millis" => "integer"
    "total_shards" => "integer"
    "shard" => "integer"
  }
}
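To show the intended effect of this cleanup on a single event, here is a minimal Python sketch that applies the same renames, conversions and removals to a plain dictionary. It mirrors the outcome only, not Logstash's internals. The function name cleanup and the sample event values are assumptions for illustration.

```python
def cleanup(event):
    """Rename, convert and remove fields, returning a new dict."""
    event = dict(event)  # work on a copy
    # Rename: free up "source" for the log type, keep the query in "source_body".
    if "source" in event:
        event["source_body"] = event.pop("source")
    if "log_source" in event:
        event["source"] = event.pop("log_source")
    # Convert numeric fields, which arrive as strings.
    for name in ("took_millis", "total_shards", "shard"):
        if name in event:
            event[name] = int(event[name])
    # Remove fields that are no longer needed.
    for name in ("kv_pairs", "timestamp"):
        event.pop(name, None)
    return event


# Hypothetical parsed slowlog event, abbreviated.
cleaned = cleanup({
    "timestamp": "2015-01-13 15:43:17,160",
    "log_source": "index.search.slowlog.query",
    "source": '{"query":{"term":{"a":2}}}',
    "shard": "3",
    "took_millis": "19",
    "total_shards": "5",
    "kv_pairs": " took[19.9ms], took_millis[19],",
})
```

Events from the main log, which have no source or took_millis fields, pass through with only log_source renamed and timestamp removed.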

Sending Events to Logsene/Elasticsearch
Below is an elasticsearch output configuration that works well with Logsene and Logstash 1.5.0 beta 1. For an external Elasticsearch cluster, you can simply specify the host name and protocol (we recommend HTTP because it makes upgrading both Logstash and Elasticsearch easier):

output {
  elasticsearch {
    host => "logsene-receiver.sematext.com"
    ssl => true
    port => 443
    index => "LOGSENE-TOKEN-GOES-HERE"
    protocol => "http"
    manage_template => false
  }
}

If you’re using Logstash 1.4.2 or earlier, there’s no SSL support, so you’ll have to remove the ssl line and set port to 80.

FAQ

Q: Cool, this works well for logs. How about monitoring Elasticsearch metrics like how much heap is used or how many cache hits I get?
A: Check out our SPM, which can monitor lots of applications, including Elasticsearch. If you’re a Logsene user, too, you’ll be able to correlate logs and metrics.
Q: I find this logging and parsing stuff really exciting.
A: Me too. If you want to join us, we’re hiring worldwide.
Q: I’m here from the TL;DR note. Can I get the complete config?
A: Here you go (please check the comments for things you might want to change):

input {
  file {
    path => "/var/log/elasticsearch/*.log"  # tail ES log and slowlogs
    type => "elasticsearch"
    start_position => "beginning"  # parse existing logs, too
    codec => multiline {   # put the whole exception in a single event
      pattern => "^\["
      negate => true
      what => "previous"
    }
  }
}

filter {
  if [type] == "elasticsearch" {
    grok {  # parses the common bits
      match => [ "message", "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:severity}%{SPACE}\]\[%{DATA:log_source}%{SPACE}\]%{SPACE}\[%{DATA:node}\]%{SPACE}(?<message>(.|\r|\n)*)" ]
      overwrite => [ "message" ]
    }

    if "slowlog" in [path] {  # slowlog-specific parsing
      grok {  # parse the index name and the shard number
        match => [ "message", "\[%{DATA:index}\]\[%{DATA:shard}\]%{GREEDYDATA:kv_pairs}" ]
      }
      kv {    # parses named fields
        source => "kv_pairs"
        field_split => " \],"
        value_split => "\["
      }
    }

    date {  # use timestamp from the log
      match => [ "timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
      target => "@timestamp"
    }

    mutate {
      remove_field => [ "kv_pairs", "timestamp" ]  # remove unused stuff
      rename => {  # nicer field names (especially good for Logsene)
        "source" => "source_body"
        "log_source" => "source"
      }
      convert => {  # type numeric fields (they're strings by default)
        "took_millis" => "integer"
        "total_shards" => "integer"
        "shard" => "integer"
      }
    }
  }
}

output {
  elasticsearch {   # send everything to Logsene
    host => "logsene-receiver.sematext.com"
    ssl => true  # works with Logstash 1.5+
    port => 443  # use 80 for plain HTTP
    index => "LOGSENE-APP-TOKEN-GOES-HERE"  # fill in your token (click Integration from your Logsene app)
    protocol => "http"
    manage_template => false
  }
}


More Stories By Sematext Blog

Sematext is a globally distributed organization that builds innovative Cloud and On Premises solutions for performance monitoring, alerting and anomaly detection (SPM), log management and analytics (Logsene), and search analytics (SSA). We also provide Search and Big Data consulting services and offer 24/7 production support for Solr and Elasticsearch.
