@DevOpsSummit: Blog Post

Log Analysis for Software-Defined Data Centers

Log data provides the most granular view into what is happening across your systems, applications, and end users

by Chris Riley

Modern infrastructure constantly generates log data faster than humans can analyze it. And now that data centers can be built and torn down with scripts, the amount of activity, and of the data it produces, grows rapidly.

The traditional practice of manually reviewing log files on a daily or weekly basis is inadequate for Software-Defined Data Centers (SDDC). The modern SDDC architecture, with its highly automated and dynamic deployment capabilities for multi-tier applications, necessitates real-time log analytics, which are key to complex troubleshooting, dynamic provisioning, high performance, and strong security.

In the software-defined data center you are looking at many more variables than servers alone. You want to see provisioning volume and time. You want to know the performance and IOPS of bare metal machines. You want to know how the data center's network and all of its individual virtual networks are performing, how secure they are, and where the weak spots might be. And IaaS and hosting providers may be managing many of these virtual data centers at one time.

Identifying the root causes of performance bottlenecks, finding security vulnerabilities, and optimizing the provisioning of SDDC resources are only possible with a comprehensive log management solution: one that takes log data from individual components and presents a consolidated view of the infrastructure's system log data. The resulting operational intelligence gives deep, enterprise-wide visibility to ensure optimized utilization of SDDC resources, along with advanced alerting that calls attention to pertinent and urgent issues.
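To make the idea of a consolidated view concrete, here is a minimal Python sketch that merges log lines from several components into one time-ordered stream. The line format (an ISO-8601 timestamp followed by a message), the component names, and the function names are all assumptions made for illustration; a real log management product would handle many formats and far larger volumes.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, Mapping, NamedTuple


class LogEvent(NamedTuple):
    timestamp: datetime
    component: str
    message: str


def parse_lines(component: str, lines: Iterable[str]) -> Iterator[LogEvent]:
    """Parse '<ISO-8601 timestamp> <message>' lines (an assumed format)."""
    for line in lines:
        stamp, _, message = line.strip().partition(" ")
        yield LogEvent(datetime.fromisoformat(stamp), component, message)


def consolidated_view(sources: Mapping[str, Iterable[str]]) -> Iterator[LogEvent]:
    """Merge per-component streams into one stream ordered by timestamp.

    Assumes each component's own lines are already in time order.
    """
    streams = [parse_lines(name, lines) for name, lines in sources.items()]
    return heapq.merge(*streams, key=lambda event: event.timestamp)


# Hypothetical sample data: one hypervisor and one virtual switch.
SAMPLE_SOURCES = {
    "hypervisor-01": [
        "2015-03-01T10:00:00 vm-17 provisioned",
        "2015-03-01T10:00:05 vm-17 powered on",
    ],
    "vswitch-a": [
        "2015-03-01T10:00:02 port 12 link up",
    ],
}

if __name__ == "__main__":
    for event in consolidated_view(SAMPLE_SOURCES):
        print(event.timestamp.isoformat(), event.component, event.message)
```

Because the streams are merged lazily, the sketch never needs to hold every log line in memory at once, which matters when many components are reporting.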

Without these capabilities, IT administrators have to rely exclusively on system metrics, which limits them to decisions based on performance alone, and possibly only on performance at the data center level. Metrics such as memory consumption, CPU utilization, and storage overlook the valuable diagnostic information stored in log files.

Here are some of the categories of information that log analysis in SDDC can provide.

  • Machine Provisioning, De-Provisioning, and Moves: In the modern data center, VMs move from physical machine to physical machine, sometimes even while running, with technologies like vMotion. Historical reporting on VM moves, provisioning, and de-provisioning helps teams understand where to optimize the processes for moving VMs to accommodate load, and when to add or remove bare metal machines.
  • Data Center to Bare Metal Utilization: Combine the advantages of cloud technologies, such as elasticity, on-demand availability, and flexibility, with the performance, consistency, and predictability of bare metal servers. Log analysis lets IT decision makers incorporate accurate information about machine efficiency when planning the overall provisioning, scaling, and utilization of SDDC environments.
  • Intrusion Monitoring and Management: Log data can be used to identify anomalous activity and to create automated alerts that point out areas of concern in real time. With traditional, manual log analysis practices, IT administrators can miss insights in log data that point to possible performance and security issues. A log analysis-based management solution automates these processes, frees IT administrators from tedious manual log analysis, and provides enhanced visibility into infrastructure operations to help prevent data breaches.
  • Audit Trails for Forensics Analysis and Compliance: Correlate log data to trace suspected intrusions or data loss, and maintain compliance with strict security regulations.
  • Incident Containment: Identify and isolate compromised or underperforming components with real-time alert configurations to prevent infrastructure-wide damage. Users can also analyze log data to identify causal links between seemingly independent outages and performance issues, spotting them before they grow.
  • Infrastructure Optimization: Active network log management allows IT decision makers to shape the infrastructure to meet diverse and evolving business demands. DevOps teams can also use log data in integrated test environments to correlate test results with log data generated by SDDC infrastructure and applications.
  • Reduced Cost: Fewer tools and less IT expertise are required to maintain and manage complex SDDC infrastructure.
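As a rough illustration of the first category above, historical reporting on VM moves, provisioning, and de-provisioning, the following Python sketch counts lifecycle events per host. The log line format and the event names (PROVISION, DEPROVISION, MOVE_IN, MOVE_OUT) are hypothetical; real hypervisor logs use their own formats and would need their own parser.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical line format: "<timestamp> <host> <EVENT> <vm-name>"
EVENT_PATTERN = re.compile(
    r"^\S+\s+(?P<host>\S+)\s+"
    r"(?P<event>PROVISION|DEPROVISION|MOVE_IN|MOVE_OUT)\s+"
    r"(?P<vm>\S+)$"
)


def lifecycle_report(lines: Iterable[str]) -> Counter:
    """Count lifecycle events per (host, event) pair, skipping other lines."""
    counts: Counter = Counter()
    for line in lines:
        match = EVENT_PATTERN.match(line.strip())
        if match:
            counts[(match["host"], match["event"])] += 1
    return counts


def busiest_hosts(counts: Counter, event: str, top: int = 3) -> list:
    """Return (host, count) pairs for the hosts with the most such events."""
    per_host = Counter(
        {host: n for (host, ev), n in counts.items() if ev == event}
    )
    return per_host.most_common(top)


# Hypothetical sample log lines.
SAMPLE_LINES = [
    "2015-03-01T10:00:00 esx-01 PROVISION vm-17",
    "2015-03-01T10:05:00 esx-01 MOVE_OUT vm-17",
    "2015-03-01T10:05:01 esx-02 MOVE_IN vm-17",
    "2015-03-01T10:06:00 esx-01 MOVE_OUT vm-09",
    "2015-03-01T10:06:01 esx-02 MOVE_IN vm-09",
    "2015-03-01T10:07:00 esx-02 heartbeat ok",
]

if __name__ == "__main__":
    report = lifecycle_report(SAMPLE_LINES)
    print(busiest_hosts(report, "MOVE_OUT"))
```

A host that keeps shedding VMs in a report like this may be a candidate for more capacity, while a host that rarely receives any may be a candidate for removal.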

And the implementation is easy. Just as with server monitoring, pulling logs from servers is as easy as installing an agent. In the case of the SDDC, the agent must already be part of the script or gold master VM used for all provisioning. In addition to the VMs, the agent also needs to be installed on every instance of your bare metal hypervisor, for example on each VMware ESX server. The only additional step beyond straight server logging is making sure the division between the hypervisor machines and their provisioned VMs is clear.
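One way to keep that division clear is to tag every log record with the role of its source. The Python sketch below assumes a hypothetical inventory that maps each VM name to the hypervisor it runs on; the class, function, and field names are illustrative and do not come from any particular logging agent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceTag:
    hostname: str
    role: str  # "hypervisor" or "vm"
    parent_hypervisor: Optional[str] = None  # set only for VMs


def tag_for(hostname: str, inventory: dict) -> SourceTag:
    """Classify a host using a hypothetical VM-name -> hypervisor mapping."""
    if hostname in inventory:
        return SourceTag(hostname, "vm", inventory[hostname])
    if hostname in inventory.values():
        return SourceTag(hostname, "hypervisor")
    raise KeyError(f"unknown log source: {hostname}")


def enrich(message: str, tag: SourceTag) -> dict:
    """Attach source metadata so hypervisor and VM logs stay distinguishable."""
    record = {"host": tag.hostname, "role": tag.role, "message": message}
    if tag.parent_hypervisor is not None:
        record["hypervisor"] = tag.parent_hypervisor
    return record


# Hypothetical inventory: two VMs running on one hypervisor.
INVENTORY = {"vm-17": "esx-01", "vm-09": "esx-01"}

if __name__ == "__main__":
    print(enrich("disk latency high", tag_for("vm-17", INVENTORY)))
    print(enrich("datastore nearly full", tag_for("esx-01", INVENTORY)))
```

With the hypervisor name carried on each VM record, a query can group VM events by the physical machine they ran on without confusing them with the hypervisor's own logs. Since VMs move between hosts, the inventory would have to be refreshed as they do.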

Extending log analysis beyond monitoring of individual components to management of the entire SDDC requires users to set up cloud-based log analysis solutions that are completely independent of the SDDC infrastructure in question. While IT professionals are accustomed to the traditional practice of monitoring log data for errors, DevOps teams running an SDDC must identify the underlying network components where a shift in system behavior occurs. And with advanced, machine learning-based log management solutions, DevOps teams can resolve issues and optimize performance more effectively.
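To show what finding the component where behavior shifts might look like, the Python sketch below compares each component's error rate in a baseline window against a recent window. This is a deliberately simple fixed-threshold comparison, not machine learning; the event shape, the "ERROR" level name, and the 0.2 threshold are all assumptions chosen for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Each event is (component, level); level strings such as "ERROR" are assumed.
Event = Tuple[str, str]


def error_rates(events: Iterable[Event]) -> Dict[str, float]:
    """Return the fraction of ERROR-level events for each component."""
    totals: Counter = Counter()
    errors: Counter = Counter()
    for component, level in events:
        totals[component] += 1
        if level == "ERROR":
            errors[component] += 1
    return {component: errors[component] / totals[component] for component in totals}


def shifted_components(
    baseline: Iterable[Event],
    recent: Iterable[Event],
    min_increase: float = 0.2,
) -> List[str]:
    """List components whose error rate rose by at least min_increase."""
    before = error_rates(baseline)
    after = error_rates(recent)
    return sorted(
        component
        for component, rate in after.items()
        if rate - before.get(component, 0.0) >= min_increase
    )


# Hypothetical windows of ten events per component.
BASELINE = (
    [("vswitch-a", "ERROR")] + [("vswitch-a", "INFO")] * 9
    + [("esx-01", "INFO")] * 10
)
RECENT = (
    [("vswitch-a", "ERROR")] * 5 + [("vswitch-a", "INFO")] * 5
    + [("esx-01", "ERROR")] + [("esx-01", "INFO")] * 9
)

if __name__ == "__main__":
    print(shifted_components(BASELINE, RECENT))
```

In this sample only vswitch-a crosses the threshold, since its error rate rises from 0.1 to 0.5 while esx-01 rises only from 0.0 to 0.1.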

More Stories By Trevor Parsons

Trevor Parsons is Chief Scientist and Co-founder of Logentries. Trevor has over 10 years of experience in enterprise software and, in particular, has specialized in developing enterprise monitoring and performance tools for distributed systems. He is also a research fellow at the Performance Engineering Lab Research Group and was formerly a Scientist at the IBM Center for Advanced Studies. Trevor holds a PhD from University College Dublin, Ireland.
