Is Data Science Really Science?

My son Max is home from college and that always leads to some interesting conversations.  Max is in graduate school at Iowa State University where he is studying kinesiology and strength training.  As part of his research project, he is applying physics to athletic training in order to understand how certain types of exercises can lead to improvements in athletic speed, strength, agility, and recovery.

Figure 1:  The Laws of Kinesiology

Max was showing me one drill designed to increase the speed and thrust associated with jumping (Max added 5 inches to his vertical leap over the past 6 weeks, and can now dunk over the old man).  When I asked him about the science behind the drill, he went into great detail about the interaction between the sciences of physics, biomechanics and human anatomy.

Max could explain to me how the laws of physics (the study of the properties of matter and energy), kinesiology (the study of human motion that mainly focuses on muscles and their functions) and biomechanics (the study of movement involved in strength exercise or in the execution of a sport skill) interacted to produce the desired outcomes.  He could explain why it worked.

And that is the heart of my challenges with treating data science as a science.  As a data scientist, I can predict what is likely to happen, but I cannot explain why it is going to happen.  I can predict when someone is likely to attrite, or respond to a promotion, or commit fraud, or pick the pink button over the blue button, but I cannot tell you why that’s going to happen.  And I believe that the inability to explain why something is going to happen is why I struggle to call “data science” a science.

Okay, let the hate mail rain down on me, but let me explain why this is an important distinction!

What Is Science?
Science is the intellectual and practical activity encompassing the systematic study of the structure and behavior of the physical and natural world through observation and experiment.

Science works within systems of laws such as the laws of physics, thermodynamics, mathematics, electromagnetism, aerodynamics, electricity (like Ohm’s law), Newton’s laws of motion, and chemistry.  Scientists can apply these laws to understand why certain actions lead to certain outcomes.  In many disciplines, it is critical (life and death critical in some cases) that the scientists (or engineers) know why something is going to occur:

  • In pharmaceuticals, chemists need to understand how certain chemicals can be combined in certain combinations (recipes) to drive human outcomes or results.
  • In mechanical engineering, building engineers need to know how certain materials and designs can be combined to support the weight of a 40-story building (that looks like it was made out of Lego blocks).
  • In electrical engineering, electrical engineers need to understand how much wiring, what type of wiring and which designs are required to support the electrical needs of buildings or vehicles.

Again, the laws that underpin these disciplines can be used to understand why certain actions or combinations lead to predictable outcomes.

Big Data and the “Death” of Why
An article by Chris Anderson in 2008 titled “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete” really called into question the “science” nature of the data science role.  The premise of the article was that massive amounts of data were yielding insights about human behaviors without requiring the heavy statistical modeling typically needed when using sampled data sets.  This is the quote that most intrigued me:

“Google conquered the advertising world with nothing more than applied mathematics. It didn’t pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right.”

With the vast amounts of detailed data available and high-powered analytic tools, it is possible to identify what works without having to worry about why it worked.  Maybe when it comes to human behaviors, there are no laws that can be used to understand (or codify) why humans take certain actions under certain conditions.  In fact, we already know that humans are illogical decision-making machines (see “Human Decision-Making in a Big Data World”).

However, there are some new developments that I think will require “data science” to become more like other “sciences.”

Internet of Things and the “Birth” of Why
The Internet of Things (IoT) will require organizations to understand and codify why certain inputs lead to predictable outcomes.  For example, it will be critical for manufacturers to understand and codify why certain components in a product break down most often, by trying to address questions such as:

  • Was the failure caused by the materials used to build the component?
  • Was the failure caused by the design of the component?
  • Was the failure caused by the use of the component?
  • Was the failure caused by the installation of the component?
  • Was the failure caused by the maintenance of the component?

As we move into the world of IoT, we will start to see increased collaboration between analytics and physics.  See what organizations like GE are doing with the concept of “Digital Twins”.

The Digital Twin involves building a digital model, or twin, of every machine – from a jet engine to a locomotive – to grow and create new business and service models through the Industrial Internet.[1]

Digital twins are computerized companions of physical assets that can be used for various purposes. Digital twins use data from sensors installed on physical objects to represent their real-time status, working condition or position.[2]

GE is building digital models that mirror the physical structures of their products and components.  This allows them not only to accelerate the development of new products, but also to test the products in a greater number of situations to determine metrics such as mean-time-to-failure, stress capability and structural loads.
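
To make the idea concrete, here is a minimal sketch of how a digital twin might pair a physical law with sensor data.  It is not GE's implementation; the class name, the resistance value, the tolerance and the sensor readings are all hypothetical.  The twin uses Ohm's law to compute the voltage it expects for a given current, and flags any reading that strays too far from that expectation.

```python
from dataclasses import dataclass


@dataclass
class ResistiveElementTwin:
    """Hypothetical digital twin of a simple resistive component."""

    resistance_ohms: float   # nominal resistance from the design spec (assumed)
    tolerance: float = 0.05  # allowed relative deviation before flagging (assumed)

    def expected_voltage(self, current_amps: float) -> float:
        # Ohm's law: V = I * R
        return current_amps * self.resistance_ohms

    def check(self, current_amps: float, measured_volts: float) -> bool:
        """Return True if the sensor reading agrees with the physical model."""
        expected = self.expected_voltage(current_amps)
        if expected == 0:
            return measured_volts == 0
        return abs(measured_volts - expected) / abs(expected) <= self.tolerance


twin = ResistiveElementTwin(resistance_ohms=10.0)

# (current in amps, measured voltage in volts); made-up sensor data
readings = [(2.0, 20.1), (2.0, 19.8), (2.0, 26.0)]
flags = [twin.check(amps, volts) for amps, volts in readings]
print(flags)  # [True, True, False]
```

The third reading is flagged because it is 30% above what the law predicts.  The point of the sketch is that the flag comes with a reason: the component is no longer behaving like a 10-ohm resistor, which is a "why" that a purely statistical anomaly score would not provide.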

As the worlds of physics and IoT collide, data scientists will become more like other “scientists” as their digital world will begin to be governed by the laws that govern disciplines such as physics, aerodynamics, chemistry and electricity.

Data Science and the Cost of Wrong
Another potential driver in the IoT world is the substantial cost of being wrong.  As discussed in my blog “Understanding Type I and Type II Errors”, the cost of being wrong (false positives and false negatives) has minimal impact when trying to predict human behaviors such as which customers might respond to which ads, or which customers are likely to recommend you to their friends.

However, in the world of IoT, being wrong (false positives and false negatives) can carry severe or even catastrophic financial, legal and liability costs.  Organizations cannot afford to have planes falling out of the skies or autonomous cars driving into crowds or pharmaceuticals accidentally killing patients.
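
A rough way to see the difference is to price the same number of errors in both worlds.  The sketch below is only an illustration: the function name, the error counts and every dollar figure are hypothetical assumptions, not measured values.

```python
def cost_of_wrong(false_positives, false_negatives, cost_per_fp, cost_per_fn):
    """Total cost of a model's errors, given error counts and per-error costs."""
    return false_positives * cost_per_fp + false_negatives * cost_per_fn


# The same error counts in both scenarios (hypothetical).
errors = {"false_positives": 50, "false_negatives": 5}

# Hypothetical per-error costs in dollars.
ad_targeting = cost_of_wrong(**errors, cost_per_fp=1, cost_per_fn=20)
engine_failure = cost_of_wrong(**errors, cost_per_fp=10_000, cost_per_fn=5_000_000)

print(ad_targeting)    # 150
print(engine_failure)  # 25500000
```

With identical model accuracy, the cost differs by five orders of magnitude under these assumed figures.  That gap is what pushes the IoT team to ask why a prediction is made, and not just whether it is usually right.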

Summary
Historically, big data was not concerned with understanding or quantifying “why” certain actions occurred because for the most part, organizations were using big data to understand and predict customer behaviors (e.g., acquisition, up-sell, fraud, theft, attrition, advocacy).  The costs associated with false positives and false negatives were relatively small compared to the financial benefit or return.

And while there may never be “laws” that dictate human behaviors, in the world of IoT where organizations are melding analytics (machine learning and artificial intelligence) with physical products, we will see “data science” advancing beyond just “data” science.  In IoT, the data science team must expand to include scientists and engineers from the physical sciences so that the team can understand and quantify the “why things happen” aspect of the analytic models.  If not, the costs could be catastrophic.

[1] https://www.ge.com/digital/blog/dawn-digital-industrial-era

[2] https://en.wikipedia.org/wiki/Digital_Twins

The post Is Data Science Really Science? appeared first on InFocus Blog | Dell EMC Services.

More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business”, is responsible for setting the strategy and defining the Big Data service line offerings and capabilities for the EMC Global Services organization. As part of Bill’s CTO charter, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He has written several white papers, is an avid blogger and is a frequent speaker on the use of Big Data and advanced analytics to power organizations’ key business initiatives. He also teaches the “Big Data MBA” at the University of San Francisco School of Management.

Bill has nearly three decades of experience in data warehousing, BI and analytics. Bill authored EMC’s Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehousing Institute’s faculty as the head of the analytic applications curriculum.

Previously, Bill was the Vice President of Advertiser Analytics at Yahoo and the Vice President of Analytic Applications at Business Objects.
