Dave Turbide, CFPIM, CIRM, CSCP, CMfgE | July/August 2013 | 23 | 4
Separate fact from fiction in big data analysis
Today, the information technology community is challenged by the volume, velocity, and variety of big data: the mass of information that is becoming available from numerous sources, including smart devices, sensors, and social media. Big data overwhelms traditional database technology, tools, processing, and storage capabilities, so new software and cloud facilities are being developed to keep pace with its growth.
The user community is both intrigued and tested by the potential of all this data to increase visibility into the wants and needs of customers and into the supply chain. Analytical tools and extensions of business intelligence utilities, including predictive analytics, are being packaged so that businesspeople, most of whom are not data scientists, can use them.
Without question, this access to new insights is a great benefit of big data analytics. However, users should approach big data with a critical eye and a healthy dose of caution. The trends, relationships, causes and effects, and evaluations you discover may not be as valid and unbiased as you think.
Finding proper representation
In polling and statistical analysis, data reflects the sample it was drawn from, and that sample may not be entirely representative of the population you are trying to understand. Say you are interested in knowing what customers think about a new product. You monitor social media and gather comments, and the data says the product is lacking in some way. Should you immediately redesign the product in response to that criticism?
Not so fast! First, take a close look at the population that commented. Perhaps your product works with BlackBerry phones, but 90 percent of the critics are iPhone users. Their scorn for your product comes from the context of their Apple-centric world. They are not potential buyers and therefore not the customers and prospects whose opinions you want to hear. You may gain valuable information from their comments, but they do not represent the “voice of the customer.”
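To make the point concrete, here is a minimal sketch of segmenting comments by platform before judging the product. The comment list is entirely hypothetical toy data chosen to match the 90 percent figure above, and `negative_share` is an illustrative helper, not part of any real social media tool.

```python
# Hypothetical toy data for illustration only: (platform, sentiment) pairs
# gathered from social media comments about a BlackBerry-compatible product.
comments = (
    [("iPhone", "negative")] * 9
    + [("BlackBerry", "negative")] * 1
    + [("BlackBerry", "positive")] * 4
)

def negative_share(rows):
    """Return the fraction of rows whose sentiment is 'negative' (0.0 if empty)."""
    if not rows:
        return 0.0
    return sum(1 for _, sentiment in rows if sentiment == "negative") / len(rows)

# Who is doing the criticizing? Share of critics who are iPhone users.
critics = [row for row in comments if row[1] == "negative"]
iphone_share_of_critics = sum(1 for p, _ in critics if p == "iPhone") / len(critics)

# Restrict to the population you actually sell to before drawing conclusions.
customers = [row for row in comments if row[0] == "BlackBerry"]

print(f"negative share, all comments:      {negative_share(comments):.2f}")
print(f"iPhone users among critics:        {iphone_share_of_critics:.2f}")
print(f"negative share, BlackBerry owners: {negative_share(customers):.2f}")
```

On this made-up data, about 71 percent of all comments are negative, yet only 20 percent of comments from BlackBerry owners are. The raw number describes the people who happened to comment, not your customers.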
The potential for big data analytics to mislead can also be seen when tracking the progress of influenza and other diseases based on Twitter posts or search engine queries. Again, it’s important to consider who uses those media. Certain geographies, age groups, and economic levels are over- or under-represented among technology users. If a particular disease is more prevalent among the elderly or the poor, social analysis is likely to underestimate its spread. Conversely, if the traveling public or those who frequent malls, offices, and crowded city streets are more affected by the disease, estimates are apt to run high, because these groups are likely to use Google and Twitter.
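The underestimation effect is simple arithmetic: an overall rate is a weighted average of group rates, and a skewed sample applies the wrong weights. The sketch below uses two age groups with invented shares and prevalence figures (none drawn from real flu data) to show the gap.

```python
# Hypothetical numbers for illustration only; none come from real disease data.
# Share of each age group in the general population vs. among social media
# posters, plus an assumed prevalence of the disease within each group.
population_share = {"under_65": 0.80, "65_plus": 0.20}
online_share = {"under_65": 0.95, "65_plus": 0.05}
prevalence = {"under_65": 0.02, "65_plus": 0.10}

def weighted_prevalence(shares, rates):
    """Overall prevalence when each group contributes in proportion to its share."""
    return sum(shares[group] * rates[group] for group in shares)

# What the whole population actually experiences.
true_rate = weighted_prevalence(population_share, prevalence)
# What you would infer if your data mirrors who posts online.
naive_rate = weighted_prevalence(online_share, prevalence)

print(f"true prevalence:            {true_rate:.1%}")
print(f"estimate from online posts: {naive_rate:.1%}")
```

With these assumed inputs the online-based estimate is 2.4 percent against a true 3.6 percent, because the group hit hardest is the one least present online. Reweighting group-level estimates by known population shares (post-stratification) is one common way analysts try to offset this, though it only works if you know how the sample differs from the population.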
Big data certainly offers unprecedented access to intelligence about customers, markets, tastes, preferences, and supply chain statuses and activities. However, be careful not to accept findings as fact without first taking a look at the source. It is critical to ensure you are measuring what you think you are measuring. Otherwise, it is easy to become blinded by the shining light of new insights and fail to recognize it as the headlight of a locomotive approaching at high speed.
Dave Turbide, CFPIM, CIRM, CSCP, CMfgE is a New Hampshire-based independent consultant, freelance writer, and president of the APICS Granite State chapter. He may be contacted at email@example.com.
Find the right solutions to your big data challenges. Read the APICS research report, “APICS 2012 Big Data Insights and Innovations,” at apics.org/research.