Stop using the term "big data"
One finding has been common in all the research I’ve done on big data over the last few years: Nobody likes the term. Many think it’s an important concept, and some think it’s revolutionary, but almost everybody wishes for a better, more descriptive name for it. Some managers are objecting only to the hype around the big data buzzword, but I believe many executives simply yearn for a better way to communicate what they are doing with data and analytics.
Part of the problem is that “big data” just doesn’t describe the phenomenon very effectively. Here are several reasons why the term is highly flawed:
- The term “big” is obviously relative—what’s big today won’t be so large tomorrow. I read today that a terabyte of disk storage, which we used to think of as big, will cost about $25 in 2014 “Black Friday” sales. This is one millionth of what it cost 30 years ago, not even correcting for inflation.
- “Big” is only one aspect of what’s distinctive about new forms of data, and the relatively unstructured nature of much contemporary data is more difficult to address than the size of it. Bigness just demands more powerful servers; lack of structure demands complicated programs to transform data into a form in which it can be analyzed.
- We are all by now familiar with the idea that big data involves several different attributes—the tiresome “3Vs” of volume, variety, and velocity come to mind—but what if your data has volume, but not variety? What if it’s fast-moving and changing, but not particularly large? If you have a multifaceted definition, you must face the issue of what you call the thing when you have only some of the facets. Partial big data? Medium data?
- Nobody seems to be comfortable with the opposing term to big data, “small data.” Anytime someone says to me that they are working with small data, they admit it rather sheepishly. I’m no linguist, but I’m pretty sure that if the opposite of a term isn't itself a valid thing, there is a problem with the original term.
- Too many people—and vendors in particular—are already using “big data” to mean any use of analytics, or in extreme cases even as a term for reporting and conventional business intelligence. Our society’s normal tendency is to take a hot term and throw as much old wine as possible into that new bottle, and we have clearly done so with big data.
For these reasons, and perhaps others, the term just doesn’t fit. In one survey I saw recently, over 80 percent of the executives surveyed thought that the term was overstated, confusing, or misleading. They liked the concept, but hated the phrase.
So why don’t we simply stop using it? One problem is that, as a society, we like to attach revolutionary new labels to things that have been around for a while but have suddenly become too big to ignore. We’ve had various forms of large-volume, unstructured data for a couple of decades now, but the world at large has only just noticed. The other problem is that there is no obvious alternative as an umbrella term for the relatively new types of data that are increasingly common today. We could try to call it “petabyte or more data,” “all data,” “size doesn’t matter data,” or just “data,” but none of these terms quite captures it, and they don’t trip off the tongue either.
What to call this phenomenon probably isn't your company’s most important problem now. However, the continued use of “big data” in conversations within and outside your organization is likely to be causing confusion. If you tell someone you are working on a big data project, you are not really providing much information about it other than that you know the latest, coolest corporate lingo. And it’s not as cool as it used to be anyway. I doubt that your employees, your board of directors, or your investors will take much notice if they hear your company is doing something with big data. Don’t count on it to give your stock price a bounce either.
One approach that is much more descriptive is to deconstruct “big data” a bit in order to signal to stakeholders what you are really interested in doing with these new types of data. You might describe in greater detail the following types of project attributes:
- Type of data you are using—text, voice, voice-to-text, video, genome, clickstream, etc.
- Where the data comes from—our call centers, open data, customer transaction data, etc.
- Problem you will solve—discover consumer sentiment, predict customer attrition, optimize fuel prices, etc.
- Where in the company you will solve it—marketing, finance, supply chain, etc.
If instead of saying, “We’re working on big data,” you say, “We’re extracting customer transaction data from our log files in order to help marketing understand the factors leading to customer attrition,” you will have used more syllables, but you will be much more informative. In addition to providing clarity about your intentions and strategies, this approach avoids endless discussions about whether the data involved are big or small. In fact, the most valuable applications will use a combination of both big and small data, structured and unstructured formats, internal and external sources, and so forth.
I don’t actually think that it’s criminal to use the term “big data,” and you may have noticed that I used it a few times in this essay. It’s not easy to avoid, and there isn’t a great substitute that captures everything embodied in the phrase. However, taking the time to use a few more descriptive terms will get your objective across much more effectively.