Parsing and Centralizing Elasticsearch Logs By @Sematext | @DevOpsSummit [#DevOps]

How to use Logstash’s file input to tail the main Elasticsearch log and the slowlogs

No, it’s not an endless loop waiting to happen: the plan here is to use Logstash to parse Elasticsearch logs and send them to another Elasticsearch cluster, or to a log analytics service like Logsene (which conveniently exposes the Elasticsearch API, so you can use it without having to run and manage your own Elasticsearch cluster).

If you’re looking for an ELK stack intro and think you’re in the wrong place, try our 5-minute Logstash tutorial. Still, if you have non-trivial amounts of data, you might end up here again, because you’ll probably need to centralize Elasticsearch logs for the same reasons you centralize other logs:

  • to avoid SSH-ing into each server to figure out why something went wrong
  • to better understand issues such as slow indexing or searching (via slowlogs, for instance)
  • to search quickly in big logs

In this post, we’ll describe how to use Logstash’s file input to tail the main Elasticsearch log and the slowlogs. We’ll use grok and other filters to parse different parts of those logs into their own fields and we’ll send the resulting structured events to Logsene/Elasticsearch via the elasticsearch output. In the end, you’ll be able to do things like slowlog slicing and dicing with Kibana:

[Figure: Kibana dashboard slicing and dicing Elasticsearch slowlog data]

TL;DR note: scroll down to the FAQ section for the whole config with comments.

Tailing Files
First, we’ll point the file input to *.log from Elasticsearch’s log directory. This works nicely with the default rotation, which renames old logs to something like cluster-name.log.SOMEDATE. We’ll use start_position => "beginning" to index existing content as well, and we’ll add the multiline codec to parse exceptions nicely, telling it that every line not starting with a [ character belongs to the same event as the previous line.

input {
  file {
    path => "/var/log/elasticsearch/*.log"
    type => "elasticsearch"
    start_position => "beginning"
    codec => multiline {
      pattern => "^\["
      negate => true
      what => "previous"
    }
  }
}

Parsing Generic Content
A typical Elasticsearch log comes in the form of:

[2015-01-13 15:42:24,624][INFO ][node ] [Atleza] starting ...

while a slowlog is a bit more structured, like:

[2015-01-13 15:43:17,160][WARN ][index.search.slowlog.query] [Atleza] [aa][3] took[19.9ms], took_millis[19], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"query":{"term":{"a":2}}}], extra_source[],

But the fields at the beginning, like the timestamp and severity, are common to both, so we’ll parse them first:

grok {
  match => [ "message", "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:severity}%{SPACE}\]\[%{DATA:log_source}%{SPACE}\]%{SPACE}\[%{DATA:node}\]%{SPACE}(?<message>(.|\r|\n)*)" ]
  overwrite => [ "message" ]
}

For the main Elasticsearch logs, the message field now contains the actual message, without the timestamp, severity, and log source, which are now in their own fields.
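To make this concrete, the sample log line above would produce an event with fields roughly like the ones below (an illustrative sketch; the real event also carries Logstash-added fields such as @timestamp, host, and path, which are left out here):

  {
    "timestamp":  "2015-01-13 15:42:24,624",
    "severity":   "INFO",
    "log_source": "node",
    "node":       "Atleza",
    "message":    "starting ...",
    "type":       "elasticsearch"
  }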

Parsing Slowlogs
For slowlogs, the message field now looks like this:

[aa][3] took[19.9ms], took_millis[19], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"query":{"term":{"a":2}}}], extra_source[],

First we’ll parse the index name and the shard number via grok, then the kv filter will take care of the name-value pairs that follow:

if "slowlog" in [path] {
  grok {
    match => [ "message", "\[%{DATA:index}\]\[%{DATA:shard}\]%{GREEDYDATA:kv_pairs}" ]
  }
  kv {
    source => "kv_pairs"
    field_split => " \],"
    value_split => "\["
  }
}
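For the sample slowlog entry above, this combination of grok and kv should produce fields roughly like the following (an illustrative sketch; empty pairs such as types[] and stats[] are omitted, and every value is still a string at this point):

  index        : aa
  shard        : 3
  took         : 19.9ms
  took_millis  : 19
  search_type  : QUERY_THEN_FETCH
  total_shards : 5
  source       : {"query":{"term":{"a":2}}}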

Some Cleanup
Now our logs are fully parsed, but there are still some niggles to take care of. One is that each log’s timestamp (the time logged by the application) is in the timestamp field, while the standard @timestamp was added by Logstash when it read that event. If you want @timestamp to hold the application-generated timestamp, you can do it with the date filter:

date {
  match => [ "timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
  target => "@timestamp"
}
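One thing to keep in mind: the pattern has no timezone component, so the date filter will interpret the timestamp in the Logstash server’s local timezone. If your Elasticsearch nodes log in a different timezone, you can make it explicit with the timezone option (UTC below is just an example value, not something the logs themselves tell us):

  date {
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
    timezone => "UTC"        # example value; match it to your servers' timezone
    target => "@timestamp"
  }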

Other potentially annoying things:

  • at this point, timestamp contains the same data as @timestamp
  • the content of kv_pairs from slowlogs is already parsed by the kv filter
  • the log type (for example, index.search.slowlog.query) is in a field called log_source, to make room for a field called source which stores other things (the JSON query, in this case). I would rather store index.search.slowlog.query in source, especially if I’m using the Logsene UI, where I can filter on sources by clicking on them
  • the grok and kv filters parse all fields as strings, even though some of them, like took_millis, are numbers

To fix all of the above (remove, rename, and convert fields), we’ll use the mutate filter:

mutate {
  remove_field => [ "kv_pairs", "timestamp" ]
  rename => {
    "source" => "source_body"
    "log_source" => "source"
  }
  convert => {
    "took_millis" => "integer"
    "total_shards" => "integer"
    "shard" => "integer"
  }
}
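Put together, the slowlog entry from the beginning of this post would end up as an event roughly like this (again an illustrative sketch: the exact @timestamp depends on your timezone settings, and the untouched message field plus Logstash-added fields like host and path are omitted):

  {
    "@timestamp":   "2015-01-13T15:43:17.160Z",
    "severity":     "WARN",
    "source":       "index.search.slowlog.query",
    "node":         "Atleza",
    "index":        "aa",
    "shard":        3,
    "took":         "19.9ms",
    "took_millis":  19,
    "total_shards": 5,
    "search_type":  "QUERY_THEN_FETCH",
    "source_body":  "{\"query\":{\"term\":{\"a\":2}}}",
    "type":         "elasticsearch"
  }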

Sending Events to Logsene/Elasticsearch
Below is an elasticsearch output configuration that works well with Logsene and Logstash 1.5.0 beta 1. For an external Elasticsearch cluster, you can simply specify the host name and protocol (we recommend HTTP because it’s easier to upgrade both Logstash and Elasticsearch):

output {
  elasticsearch {
    host => "logsene-receiver.sematext.com"
    ssl => true
    port => 443
    index => "LOGSENE-TOKEN-GOES-HERE"
    protocol => "http"
    manage_template => false
  }
}

If you’re using Logstash 1.4.2 or earlier, there’s no SSL support, so you’ll have to remove the ssl line and set port to 80.
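For reference, a minimal Logstash 1.4.2 version of the same output would look something like this (same settings, minus SSL):

  output {
    elasticsearch {
      host => "logsene-receiver.sematext.com"
      port => 80
      index => "LOGSENE-TOKEN-GOES-HERE"
      protocol => "http"
      manage_template => false
    }
  }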

FAQ

Q: Cool, this works well for logs. How about monitoring Elasticsearch metrics like how much heap is used or how many cache hits I get?
A: Check out our SPM, which can monitor lots of applications, including Elasticsearch. If you’re a Logsene user, too, you’ll be able to correlate logs and metrics.
Q: I find this logging and parsing stuff really exciting.
A: Me too. If you want to join us, we’re hiring worldwide.
Q: I’m here from the TL;DR note. Can I get the complete config?
A: Here you go (please check the comments for things you might want to change):

input {
  file {
    path => "/var/log/elasticsearch/*.log"  # tail ES log and slowlogs
    type => "elasticsearch"
    start_position => "beginning"  # parse existing logs, too
    codec => multiline {   # put the whole exception in a single event
      pattern => "^\["
      negate => true
      what => "previous"
    }
  }
}

filter {
  if [type] == "elasticsearch" {
    grok {  # parses the common bits
      match => [ "message", "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:severity}%{SPACE}\]\[%{DATA:log_source}%{SPACE}\]%{SPACE}\[%{DATA:node}\]%{SPACE}(?<message>(.|\r|\n)*)" ]
      overwrite => [ "message" ]
    }

    if "slowlog" in [path] {  # slowlog-specific parsing
      grok {  # parse the index name and the shard number
        match => [ "message", "\[%{DATA:index}\]\[%{DATA:shard}\]%{GREEDYDATA:kv_pairs}" ]
      }
      kv {    # parses named fields
        source => "kv_pairs"
        field_split => " \],"
        value_split => "\["
      }
    }

    date {  # use timestamp from the log
      match => [ "timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
      target => "@timestamp"
    }

    mutate {
      remove_field => [ "kv_pairs", "timestamp" ]  # remove unused stuff
      rename => {  # nicer field names (especially good for Logsene)
        "source" => "source_body"
        "log_source" => "source"
      }
      convert => {  # type numeric fields (they're strings by default)
        "took_millis" => "integer"
        "total_shards" => "integer"
        "shard" => "integer"
      }
    }
  }
}

output {
  elasticsearch {   # send everything to Logsene
    host => "logsene-receiver.sematext.com"
    ssl => true  # works with Logstash 1.5+
    port => 443  # use 80 for plain HTTP
    index => "LOGSENE-APP-TOKEN-GOES-HERE"  # fill in your token (click Integration from your Logsene app)
    protocol => "http"
    manage_template => false
  }
}
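To take it for a spin, save the config to a file (the name elasticsearch-logs.conf below is just an example), check that it parses, and start Logstash:

  bin/logstash --configtest -f elasticsearch-logs.conf   # verify the config
  bin/logstash -f elasticsearch-logs.conf                # start tailing and shipping logs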

