Parsing and Centralizing Elasticsearch Logs By @Sematext | @DevOpsSummit [#DevOps]

How to use Logstash’s file input to tail the main Elasticsearch log and the slowlogs

No, it’s not an endless loop waiting to happen. The plan here is to use Logstash to parse Elasticsearch logs and send them to another Elasticsearch cluster or to a log analytics service like Logsene (which conveniently exposes the Elasticsearch API, so you can use it without having to run and manage your own Elasticsearch cluster).

If you’re looking for an ELK stack intro and you think you’re in the wrong place, try our 5-minute Logstash tutorial. Still, if you have non-trivial amounts of data, you might end up here again, because you’ll probably need to centralize Elasticsearch logs for the same reasons you centralize other logs:

  • to avoid SSH-ing into each server to figure out why something went wrong
  • to better understand issues such as slow indexing or searching (via slowlogs, for instance)
  • to search quickly in big logs

In this post, we’ll describe how to use Logstash’s file input to tail the main Elasticsearch log and the slowlogs. We’ll use grok and other filters to parse different parts of those logs into their own fields and we’ll send the resulting structured events to Logsene/Elasticsearch via the elasticsearch output. In the end, you’ll be able to do things like slowlog slicing and dicing with Kibana:

[Image: logstash_elasticsearch (Kibana view of parsed Elasticsearch slowlogs)]

TL;DR note: scroll down to the FAQ section for the whole config with comments.

Tailing Files
First, we’ll point the file input to *.log from Elasticsearch’s log directory. This works nicely with the default rotation, which renames old logs to something like cluster-name.log.SOMEDATE, so rotated files no longer match the pattern. We’ll use start_position => "beginning" to index existing content as well. We’ll also add the multiline codec to parse exceptions nicely, telling it that every line not starting with a [ sign belongs to the same event as the previous line.

input {
  file {
    path => "/var/log/elasticsearch/*.log"
    type => "elasticsearch"
    start_position => "beginning"
    codec => multiline {
      pattern => "^\["
      negate => true
      what => "previous"
    }
  }
}
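
To make the multiline rule concrete, here is a minimal Python sketch of the same grouping logic. It is not how Logstash implements the codec; the function name and the sample exception lines are made up for illustration.

```python
import re

# Same rule as the codec settings above: a line that does NOT start
# with "[" is appended to the previous event.
EVENT_START = re.compile(r"^\[")

def group_events(lines):
    """Merge continuation lines (e.g. stack traces) into the preceding event."""
    events = []
    for line in lines:
        if EVENT_START.match(line) or not events:
            events.append(line)          # a new event begins here
        else:
            events[-1] += "\n" + line    # continuation of the previous event
    return events

# Hypothetical sample: the second event carries a two-line stack trace
sample = [
    "[2015-01-13 15:42:24,624][INFO ][node ] [Atleza] starting ...",
    "[2015-01-13 15:42:25,001][WARN ][example ] [Atleza] something failed",
    "java.lang.IllegalStateException: example",
    "\tat com.example.Foo.bar(Foo.java:42)",
]
events = group_events(sample)
```

The four input lines collapse into two events, with the stack trace attached to the WARN line.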

Parsing Generic Content
A typical Elasticsearch log comes in the form of:

[2015-01-13 15:42:24,624][INFO ][node ] [Atleza] starting ...

while a slowlog is a bit more structured, like:

[2015-01-13 15:43:17,160][WARN ][index.search.slowlog.query] [Atleza] [aa][3] took[19.9ms], took_millis[19], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"query":{"term":{"a":2}}}], extra_source[],

But the fields at the beginning, like timestamp and severity, are common to both, so we’ll parse them first:

grok {
  match => [ "message", "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:severity}%{SPACE}\]\[%{DATA:log_source}%{SPACE}\]%{SPACE}\[%{DATA:node}\]%{SPACE}(?<message>(.|\r|\n)*)" ]
  overwrite => [ "message" ]
}

For the main Elasticsearch logs, the message field now contains the actual message, without the timestamp, severity, and log source, which are now in their own fields.
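
If you want to see what that pattern extracts without starting Logstash, the following Python sketch applies a rough regex equivalent to the sample line from above. The timestamp sub-pattern is a simplified stand-in for TIMESTAMP_ISO8601 (an assumption that only covers the format shown here), and the variable names are ours.

```python
import re

# Rough regex equivalent of the grok pattern. %{DATA} becomes a lazy ".*?",
# %{SPACE} becomes "\s*", and the timestamp part is simplified.
LINE = re.compile(
    r"\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]"
    r"\[(?P<severity>.*?)\s*\]"
    r"\[(?P<log_source>.*?)\s*\]\s*"
    r"\[(?P<node>.*?)\]\s*"
    r"(?P<message>(?:.|\r|\n)*)"   # rest of the event, including newlines
)

line = "[2015-01-13 15:42:24,624][INFO ][node ] [Atleza] starting ..."
fields = LINE.match(line).groupdict()
```

Here fields holds timestamp, severity, log_source, node and message as separate string values, which mirrors the fields the grok filter adds to the event.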

Parsing Slowlogs
For slowlogs, the message field now looks like this:

[aa][3] took[19.9ms], took_millis[19], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"query":{"term":{"a":2}}}], extra_source[],

First we’ll parse the index name and the shard number via grok, then the kv filter will take care of the name-value pairs that follow:

if "slowlog" in [path] {
  grok {
    match => [ "message", "\[%{DATA:index}\]\[%{DATA:shard}\]%{GREEDYDATA:kv_pairs}" ]
  }
  kv {
    source => "kv_pairs"
    field_split => " \],"
    value_split => "\["
  }
}
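
The two steps can be approximated in Python as below. This is only a sketch of the idea, not the kv filter’s actual implementation: it treats space, ] and , as field separators and [ as the key/value separator, and it simply skips empty values such as types[]. Note that a value containing spaces, commas or square brackets (a larger JSON query, for example) would be cut short by this simple splitting.

```python
import re

message = ('[aa][3] took[19.9ms], took_millis[19], types[], stats[], '
           'search_type[QUERY_THEN_FETCH], total_shards[5], '
           'source[{"query":{"term":{"a":2}}}], extra_source[],')

# Step 1: index name and shard number, then everything else as kv_pairs
head = re.match(r"\[(?P<index>.*?)\]\[(?P<shard>.*?)\](?P<kv_pairs>.*)", message)
fields = {"index": head.group("index"), "shard": head.group("shard")}

# Step 2: name[value] pairs. Keys stop at "[", values stop at the first
# space, "]" or ",". Pairs with an empty value never match.
PAIR = re.compile(r"([^ \],\[]+)\[([^ \],]+)")
for key, value in PAIR.findall(head.group("kv_pairs")):
    fields[key] = value
```

After this, fields contains index, shard, took, took_millis, search_type, total_shards and source, all as strings.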

Some Cleanup
Now our logs are fully parsed, but there are still some niggles to take care of. One is that each log’s timestamp (the time logged by the application) is in the timestamp field, while the standard @timestamp was added by Logstash when it read that event. If you want @timestamp to hold the application-generated timestamp, you can do it with the date filter:

date {
  # Joda-Time pattern: dd is day of month (DD is day of year)
  match => [ "timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
  target => "@timestamp"
}
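
As a quick sanity check of the timestamp layout (four-digit year, month, day of month, then time with comma-separated milliseconds), here is the equivalent parse in Python. It uses strptime directives rather than Logstash’s Joda-Time syntax and ignores time zones, so treat it as an illustration only.

```python
from datetime import datetime

# %d is day of month; %f reads the three millisecond digits after the comma
parsed = datetime.strptime("2015-01-13 15:42:24,624", "%Y-%m-%d %H:%M:%S,%f")
```

The result is a datetime for 13 January 2015, 15:42:24 and 624 milliseconds.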

Other potentially annoying things:

  • at this point, timestamp contains the same data as @timestamp
  • the content of kv_pairs from slowlogs is already parsed by the kv filter
  • the log type (for example, index.search.slowlog.query) is in a field called log_source, to make room for a field called source which stores other things (the JSON query, in this case). I would rather store index.search.slowlog.query in source, especially if I’m using the Logsene UI, where I can filter on sources by clicking on them
  • the grok and kv filters parse all fields as strings, even if some of them, like took_millis, are numbers

To fix all of the above (remove, rename and convert fields) we’ll use the mutate filter:

mutate {
  remove_field => [ "kv_pairs", "timestamp" ]
  rename => {
    "source" => "source_body"
    "log_source" => "source"
  }
  convert => {
    "took_millis" => "integer"
    "total_shards" => "integer"
    "shard" => "integer"
  }
}
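
To show the net effect on a slowlog event, the sketch below applies the same renames, conversions and removals to a plain Python dict. The dict is a hypothetical stand-in for a Logstash event and the cleanup function is ours; the order of operations inside the real mutate filter may differ.

```python
def cleanup(event):
    """Mimic the mutate settings on a plain dict (a stand-in for an event)."""
    out = dict(event)
    # rename: move the JSON query out of the way, then reuse "source"
    for old, new in (("source", "source_body"), ("log_source", "source")):
        if old in out:
            out[new] = out.pop(old)
    # convert: numeric fields arrive as strings
    for name in ("took_millis", "total_shards", "shard"):
        if name in out:
            out[name] = int(out[name])
    # remove_field: drop fields that are no longer needed
    for name in ("kv_pairs", "timestamp"):
        out.pop(name, None)
    return out

event = {
    "timestamp": "2015-01-13 15:43:17,160",
    "log_source": "index.search.slowlog.query",
    "index": "aa",
    "shard": "3",
    "kv_pairs": " took[19.9ms], took_millis[19],",
    "took": "19.9ms",
    "took_millis": "19",
    "total_shards": "5",
    "source": '{"query":{"term":{"a":2}}}',
}
clean = cleanup(event)
```

In clean, source now holds index.search.slowlog.query, the JSON query lives in source_body, and took_millis, total_shards and shard are integers.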

Sending Events to Logsene/Elasticsearch
Below is an elasticsearch output configuration that works well with Logsene and Logstash 1.5.0 beta 1. For an external Elasticsearch cluster, you can simply specify the host name and protocol (we recommend HTTP because it doesn’t tie the Logstash version to the Elasticsearch version, which makes both easier to upgrade):

output {
  elasticsearch {
    host => "logsene-receiver.sematext.com"
    ssl => true
    port => 443
    index => "LOGSENE-TOKEN-GOES-HERE"
    protocol => "http"
    manage_template => false
  }
}

If you’re using Logstash 1.4.2 or earlier, there’s no SSL support, so you’ll have to remove the ssl line and set port to 80.
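
For a feel of what travels over HTTP, here is a hedged Python sketch that builds a request body in the format of Elasticsearch’s bulk API: one action line plus one document line per event, newline-delimited. It only builds the payload and sends nothing; the helper name and the sample event are assumptions for illustration, and the exact requests Logstash produces may differ in detail.

```python
import json

def bulk_payload(events, index, doc_type):
    """Build a newline-delimited body for Elasticsearch's _bulk endpoint."""
    lines = []
    for event in events:
        # action line: index this document into the given index and type
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # source line: the event itself
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"   # the bulk API expects a trailing newline

# Hypothetical parsed event; the index is the Logsene token placeholder
event = {"severity": "INFO", "node": "Atleza", "message": "starting ..."}
payload = bulk_payload([event], "LOGSENE-TOKEN-GOES-HERE", "elasticsearch")
```

With Logsene, the index name is your application token, which is why the token goes into the index setting above.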

FAQ

Q: Cool, this works well for logs. How about monitoring Elasticsearch metrics like how much heap is used or how many cache hits I get?
A: Check out our SPM, which can monitor lots of applications, including Elasticsearch. If you’re a Logsene user, too, you’ll be able to correlate logs and metrics.
Q: I find this logging and parsing stuff really exciting.
A: Me too. If you want to join us, we’re hiring worldwide.
Q: I’m here from the TL;DR note. Can I get the complete config?
A: Here you go (please check the comments for things you might want to change):

input {
  file {
    path => "/var/log/elasticsearch/*.log"  # tail ES log and slowlogs
    type => "elasticsearch"
    start_position => "beginning"  # parse existing logs, too
    codec => multiline {   # put the whole exception in a single event
      pattern => "^\["
      negate => true
      what => "previous"
    }
  }
}

filter {
  if [type] == "elasticsearch" {
    grok {  # parses the common bits
      match => [ "message", "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:severity}%{SPACE}\]\[%{DATA:log_source}%{SPACE}\]%{SPACE}\[%{DATA:node}\]%{SPACE}(?<message>(.|\r|\n)*)" ]
      overwrite => [ "message" ]
    }

    if "slowlog" in [path] {  # slowlog-specific parsing
      grok {  # parse the index name and the shard number
        match => [ "message", "\[%{DATA:index}\]\[%{DATA:shard}\]%{GREEDYDATA:kv_pairs}" ]
      }
      kv {    # parses named fields
        source => "kv_pairs"
        field_split => " \],"
        value_split => "\["
      }
    }

    date {  # use timestamp from the log
      # Joda-Time pattern: dd is day of month (DD is day of year)
      match => [ "timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
      target => "@timestamp"
    }

    mutate {
      remove_field => [ "kv_pairs", "timestamp" ]  # remove unused stuff
      rename => {  # nicer field names (especially good for Logsene)
        "source" => "source_body"
        "log_source" => "source"
      }
      convert => {  # type numeric fields (they're strings by default)
        "took_millis" => "integer"
        "total_shards" => "integer"
        "shard" => "integer"
      }
    }
  }
}

output {
  elasticsearch {   # send everything to Logsene
    host => "logsene-receiver.sematext.com"
    ssl => true  # works with Logstash 1.5+
    port => 443  # use 80 for plain HTTP
    index => "LOGSENE-APP-TOKEN-GOES-HERE"  # fill in your token (click Integration from your Logsene app)
    protocol => "http"
    manage_template => false
  }
}


Sematext is a globally distributed organization that builds innovative Cloud and On Premises solutions for performance monitoring, alerting and anomaly detection (SPM), log management and analytics (Logsene), and search analytics (SSA). We also provide Search and Big Data consulting services and offer 24/7 production support for Solr and Elasticsearch.
