Welcome!

@DevOpsSummit Authors: Elizabeth White, Leon Fayer, Cloud Best Practices Network, Gaurav Pal, Carl J. Levine

Related Topics: @BigDataExpo, @CloudExpo, @ThingsExpo

@BigDataExpo: Blog Feed Post

Is Data Science Really Science? | @BigDataExpo #BigData #Analytics #DataScience

Science works within systems of laws such as the laws of physics, thermodynamics, mathematics, electromagnetism

My son Max is home from college and that always leads to some interesting conversations.  Max is in graduate school at Iowa State University where he is studying kinesiology and strength training.  As part of his research project, he is applying physics to athletic training in order to understand how certain types of exercises can lead to improvements in athletic speed, strength, agility, and recovery.

Figure 1:  The Laws of Kinesiology

Max was showing me one drill designed to increase the speed and thrust associated with jumping (Max added 5 inches to his vertical leap over the past 6 weeks, and can now dunk over the old man).  When I was asking him about the science behind the drill, he went into great details about the interaction between the sciences of physics, biomechanics and human anatomy.

Max could explain to me how the laws of physics (the study of the properties of matter and energy.), kinesiology (the study of human motion that mainly focuses on muscles and their functions) and biomechanics (they study of movement involved in strength exercise or in the execution of a sport skill) interacted to produce the desired outcomes.  He could explain why it worked.

And that is the heart of my challenges with treating data science as a science.  As a data scientist, I can predict what is likely to happen, but I cannot explain why it is going to happen.  I can predict when someone is likely to attrite, or respond to a promotion, or commit fraud, or pick the pink button over the blue button, but I cannot tell you why that’s going to happen.  And I believe that the inability to explain why something is going to happen is why I struggle to call “data science” a science.

Okay, let the hate mail rain down on me, but let me explain why this is an important distinction!

What Is Science?
Science
is the intellectual and practical activity encompassing the systematic study of the structure and behavior of the physical and natural world through observation and experiment.

Science works within systems of laws such as the laws of physics, thermodynamics, mathematics, electromagnetism, aerodynamics, electricity (like Ohm’s law), Newton’s law of motions, and chemistry.  Scientists can apply these laws to understand why certain actions lead to certain outcomes.  In many disciplines, it is critical (life and death critical in some cases) that the scientists (or engineers) know why something is going to occur:

  • In pharmaceuticals, chemists need to understand how certain chemicals can be combined in certain combinations (recipes) to drive human outcomes or results.
  • In mechanical engineering, building engineers need to know how certain materials and designs can be combined to support the weight of a 40 story building (that looks like it was made out of Lego blocks).
  • In electrical engineering, electrical engineers need to understand how much wiring, what type of wiring and the optimal designs are required to support the electrical needs of buildings or vehicles.

Again, the laws that underpin these disciplines can be used to understand why certain actions or combinations lead to predictable outcomes.

Big Data and the “Death” of Why
An article by Chris Anderson in 2006 titled “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete” really called into question the “science” nature of the data science role.  The premise of the article was that the massive amounts of data were yielding insights about the human behaviors without requiring the heavy statistical modeling typically needed when using sampled data sets.  This is the quote that most intrigued me:

“Google conquered the advertising world with nothing more than applied mathematics. It didn’t pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right.”

With the vast amounts of detailed data available and high-powered analytic tools, it is possible to identify what works without having to worry about why it worked.  Maybe when it comes to human behaviors, there are no laws that can be used to understand (or codify) why humans take certain actions under certain conditions.  In fact, we already know that humans are illogical decision-making machines (see “Human Decision-Making in a Big Data World”).

However, there are some new developments that I think will require “data science” to become more like other “sciences.”

Internet of Things and the “Birth” of Why
The Internet of Things (IOT) will require organizations to understand and codify why certain inputs lead to predictable outcomes.  For example, it will be critical for manufacturers to understand and codify why certain components in a product break down most often, by trying to address questions such as:

  • Was the failure caused by the materials used to build the component?
  • Was the failure caused by the design of the component?
  • Was the failure caused by the use of the component?
  • Was the failure caused by the installation of the component?
  • Was the failure caused by the maintenance of the component?

As we move into the world of IoT, we will start to see increased collaboration between analytics and physics.  See what organizations like GE are doing with the concept of “Digital Twins”.

The Digital Twin involves building a digital model, or twin, of every machine – from a jet engine to a locomotive – to grow and create new business and service models through the Industrial Internet.[1]

Digital twins are computerized companions of physical assets that can be used for various purposes. Digital twins use data from sensors installed on physical objects to represent their real-time status, working condition or position.[2]

GE is building digital models that mirror the physical structures of their products and components.  This allows them to not only accelerate the development of new products, but allows them to test the products in a greater number of situations to determine metrics such as mean-time-to-failure, stress capability and structural loads.

As the worlds of physics and IoT collide, data scientist will become more like other “scientists” as their digital world will begin to be governed by the laws that govern disciplines such as physics, aerodynamics, chemistry and electricity.

Data Science and the Cost of Wrong
Another potential driver in the IoT world is the substantial cost of being wrong.  As discussed in my blog “Understanding Type I and Type II Errors”, the cost of being wrong (false positives and false negatives) has minimal impact when trying to predict human behaviors such as which customers might respond to which ads, or which customers are likely to recommend you to their friends.

However in the world of IOT, the costs of being wrong (false positives and false negatives) can have severe or even catastrophic financial, legal and liability costs.  Organizations cannot afford to have planes falling out of the skies or autonomous cars driving into crowds or pharmaceuticals accidently killing patients.

Summary
Traditionally, big data historically was not concerned with understanding or quantifying “why” certain actions occurred because for the most part, organizations were using big data to understand and predict customer behaviors (e.g., acquisition, up-sell, fraud, theft, attrition, advocacy).  The costs associated with false positives and false negatives were relatively small compared to the financial benefit or return.

And while there may never be “laws” that dictate human behaviors, in the world of IOT where organizations are melding analytics (machine learning and artificial intelligence) with physical products, we will see “data science” advancing beyond just “data” science.  In IOT, the data science team must expand to include scientists and engineers from the physical sciences so that the team can understand and quantify the “why things happen” aspect of the analytic models.  If not, the costs could be catastrophic.

[1] https://www.ge.com/digital/blog/dawn-digital-industrial-era

[2] https://en.wikipedia.org/wiki/Digital_Twins

The post Is Data Science Really Science? appeared first on InFocus Blog | Dell EMC Services.

Read the original blog entry...

More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business”, is responsible for setting the strategy and defining the Big Data service line offerings and capabilities for the EMC Global Services organization. As part of Bill’s CTO charter, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He’s written several white papers, avid blogger and is a frequent speaker on the use of Big Data and advanced analytics to power organization’s key business initiatives. He also teaches the “Big Data MBA” at the University of San Francisco School of Management.

Bill has nearly three decades of experience in data warehousing, BI and analytics. Bill authored EMC’s Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute’s faculty as the head of the analytic applications curriculum.

Previously, Bill was the Vice President of Advertiser Analytics at Yahoo and the Vice President of Analytic Applications at Business Objects.

@DevOpsSummit Stories
DevOps is a hot topic. It seems that everyone is talking about it. Some have built business models around DevOps-related tools and themes. There are conferences and trade shows dedicated to DevOps-strategies and techniques. Some people have even made their careers around talking about it. In light of all of that, I find it chuckle-worthy that very few people actually know what DevOps is (just follow #devops on Twitter for proof.) I am not going to be one of many trying to create a buzzword-infested definition of DevOps to suit my particular agenda. Instead, I’d like to talk about what DevOps is not. So, without further ado, DevOps …
"Operations is sort of the maturation of cloud utilization and the move to the cloud," explained Steve Anderson, Product Manager for BMC’s Cloud Lifecycle Management, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
"We got started as search consultants. On the services side of the business we have help organizations save time and save money when they hit issues that everyone more or less hits when their data grows," noted Otis Gospodnetić, Founder of Sematext, in this SYS-CON.tv interview at @DevOpsSummit, held June 9-11, 2015, at the Javits Center in New York City.
SYS-CON Events announced today that delaPlex will exhibit at SYS-CON's @CloudExpo, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. delaPlex pioneered Software Development as a Service (SDaaS), which provides scalable resources to build, test, and deploy software. It’s a fast and more reliable way to develop a new product or expand your in-house team.
For organizations that have amassed large sums of software complexity, taking a microservices approach is the first step toward DevOps and continuous improvement / development. Integrating system-level analysis with microservices makes it easier to change and add functionality to applications at any time without the increase of risk. Before you start big transformation projects or a cloud migration, make sure these changes won’t take down your entire organization.
Updating DevOps to the latest production data slows down your development cycle. Probably it is due to slow, inefficient conventional storage and associated copy data management practices. In his session at @DevOpsSummit at 20th Cloud Expo, Dhiraj Sehgal, in Product and Solution at Tintri, will talk about DevOps and cloud-focused storage to update hundreds of child VMs (different flavors) with updates from a master VM in minutes, saving hours or even days in each development cycle. He will also discuss how the "Ops" side of DevOps is making their life easier and becoming invisible to developers for storage-related provisioning and application performance.
SYS-CON Events announced today that WineSOFT will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Based in Seoul and Irvine, WineSOFT is an innovative software house focusing on internet infrastructure solutions. The venture started as a bootstrap start-up in 2010 by focusing on making the internet faster and more powerful. WineSOFT’s knowledge is based on the expertise of TCP/IP, VPN, SSL, peer-to-peer, mobile browser, and live streaming solutions.
SYS-CON Events announced today that Dataloop.IO, an innovator in cloud IT-monitoring whose products help organizations save time and money, has been named “Bronze Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Dataloop.IO is an emerging software company on the cutting edge of major IT-infrastructure trends including cloud computing and microservices. The company, founded in the UK but now based in San Francisco, is developing the next generation of cloud monitoring required for microservices and DevOps.
"delaPlex is a software development company. We do team-based outsourcing development," explained Mark Rivers, COO and Co-founder of delaPlex Software, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
In his session at 20th Cloud Expo, Mike Johnston, an infrastructure engineer at Supergiant.io, will discuss how to use Kubernetes to setup a SaaS infrastructure for your business. Mike Johnston is an infrastructure engineer at Supergiant.io with over 12 years of experience designing, deploying, and maintaining server and workstation infrastructure at all scales. He has experience with brick and mortar data centers as well as cloud providers like Digital Ocean, Amazon Web Services, and Rackspace. His expertise is in automating deployment, management, and problem resolution in these environments, allowing his teams to run large transactional applications with high availability and the speed the consumer demands.
SYS-CON Events announced today that CA Technologies has been named “Platinum Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY, and the 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CA Technologies helps customers succeed in a future where every business – from apparel to energy – is being rewritten by software. From planning to development to management to security, CA creates software that fuels transformation for companies in the application economy.
SYS-CON Events announced today that Fusion, a leading provider of cloud services, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Fusion, a leading provider of integrated cloud solutions to small, medium and large businesses, is the industry’s single source for the cloud. Fusion’s advanced, proprietary cloud service platform enables the integration of leading edge solutions in the cloud, including cloud communications, cloud connectivity, and cloud computing. Fusion’s innovative, yet proven cloud solutions lower our customers’ cost of ownership, and deliver new levels of security, flexibility, scalability, and speed of deployment.
SYS-CON Events announced today that Cloud Academy will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Cloud Academy is the industry’s most innovative, vendor-neutral cloud technology training platform. Cloud Academy provides continuous learning solutions for individuals and enterprise teams for Amazon Web Services, Microsoft Azure, Google Cloud Platform, and the most popular cloud computing technologies. Get certified, manage the full lifecycle of your cloud-based resources, and build your knowledge based using Cloud Academy’s expert-created content, comprehensive Learning Paths, and innovative Hands-on Labs.
SYS-CON Events announced today that Outlyer, a monitoring service for DevOps and operations teams, has been named “Bronze Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Outlyer is a monitoring service for DevOps and Operations teams running Cloud, SaaS, Microservices and IoT deployments. Designed for today's dynamic environments that need beyond cloud-scale monitoring, we make monitoring effortless so you can concentrate on running a better service for your users.
Cloud Expo, Inc. has announced today that Andi Mann and Aruna Ravichandran have been named Co-Chairs of @DevOpsSummit at Cloud Expo 2017. The @DevOpsSummit at Cloud Expo New York will take place on June 6-8, 2017, at the Javits Center in New York City, New York, and @DevOpsSummit at Cloud Expo Silicon Valley will take place Oct. 31-Nov. 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
The best way to leverage your Cloud Expo presence as a sponsor and exhibitor is to plan your news announcements around our events. The press covering Cloud Expo and @ThingsExpo will have access to these releases and will amplify your news announcements. More than two dozen Cloud companies either set deals at our shows or have announced their mergers and acquisitions at Cloud Expo. Product announcements during our show provide your company with the most reach through our targeted audiences.
DevOps is being widely accepted (if not fully adopted) as essential in enterprise IT. But as Enterprise DevOps gains maturity, expands scope, and increases velocity, the need for data-driven decisions across teams becomes more acute. DevOps teams in any modern business must wrangle the ‘digital exhaust’ from the delivery toolchain, "pervasive" and "cognitive" computing, APIs and services, mobile devices and applications, the Internet of Things, and now even blockchain.
Cloud Expo, Inc. has announced today that Aruna Ravichandran, vice president of DevOps Product and Solutions Marketing at CA Technologies, has been named co-conference chair of DevOps at Cloud Expo 2017. The @DevOpsSummit at Cloud Expo New York will take place on June 6-8, 2017, at the Javits Center in New York City, New York, and @DevOpsSummit at Cloud Expo Silicon Valley will take place Oct. 31-Nov. 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
With 10 simultaneous tracks, keynotes, general sessions and targeted breakout classes, Cloud Expo and @ThingsExpo are two of the most important technology events of the year. Since its launch over eight years ago, Cloud Expo and @ThingsExpo have presented a rock star faculty as well as showcased hundreds of sponsors and exhibitors! In this blog post, I provide 7 tips on how, as part of our world-class faculty, you can deliver one of the most popular sessions at our events. But before reading these essential tips, please take a moment and watch this brief video from Sandy Carter.
20th Cloud Expo, taking place June 6-8, 2017, at the Javits Center in New York City, NY, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy.
Have you ever noticed how some IT people seem to lead successful, rewarding, and satisfying lives and careers, while others struggle? IT author and speaker Don Crawley uncovered the five principles that successful IT people use to build satisfying lives and careers and he shares them in this fast-paced, thought-provoking webinar. You'll learn the importance of striking a balance with technical skills and people skills, challenge your pre-existing ideas about IT customer service, and gain new insights into how to build your own satisfying and rewarding career by rising above the ordinary and mundane to build an extraordinary life and career as a world-class Compassionate Geek.
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend @CloudExpo | @ThingsExpo, June 6-8, 2017, at the Javits Center in New York City, NY and October 31 - November 2, 2017, Santa Clara Convention Center, CA. Learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend @CloudExpo | @ThingsExpo, June 6-8, 2017, at the Javits Center in New York City, NY and October 31 - November 2, 2017, Santa Clara Convention Center, CA. Learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
TechTarget storage websites are the best online information resource for news, tips and expert advice for the storage, backup and disaster recovery markets. By creating abundant, high-quality editorial content across more than 140 highly targeted technology-specific websites, TechTarget attracts and nurtures communities of technology buyers researching their companies' information technology needs. By understanding these buyers' content consumption behaviors, TechTarget creates the purchase intent insights that fuel efficient and effective marketing and sales activities for clients around the world.
@DevOpsSummit at Cloud taking place June 6-8, 2017, at Javits Center, New York City, is co-located with the 20th International Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long development cycles that produce software that is obsolete at launch. DevOps may be disruptive, but it is essential.