9 Ways to Spot Bogus Data

Good decisions should be "data-driven," but that's impossible when that data isn't valid. I've worked around market research and survey data for most of my career and, based on my experience, I've come up with touchstones to tell whether a set of business data is worth using as input to decision-making.

To separate bogus (and therefore useless) data from valid (and therefore potentially useful) data, ask the following nine questions. If the answer to any question is "yes," then the data is bogus:

1. Will the source of the data make money on it?

If the organization gathering the data receives financial benefit if the data is skewed, the data will be skewed. For example, I once heard a market researcher (an outside consultant) ask the marketer who hired him: "what do you want the data to say?" The data in the research report was carefully massaged to reflect that viewpoint.

2. Is the raw data unavailable?

Any study that publishes results but not raw data is bogus. The raw data isn't being released for one of these reasons:

The raw data actually proves something else entirely.
The raw data would reveal the study as having used weird definitions or biased questions. (See 3 and 5 below.)
The raw data doesn't exist because somebody "pulled the results out of their **s," as they say in the trade.

3. Does it warp normal definitions?

While human language is inherently imprecise, if a questionnaire or survey stretches the meaning of a term beyond its generally accepted definition, any data connected to that term is bogus. For example, a survey that defines "customer satisfaction" as "not returning the product" will give a misleading picture of how well you're serving your customers.

4. Were respondents not selected at random?

If a survey only asks questions of people who are guaranteed to provide a certain response, the data gathered will reflect that response. For example, I once saw an advertising firm measure "ad effectiveness" by polling the sales managers of the publications that bought the ad. Needless to say, the ad was rated "extremely effective."

5. Did the survey use leading questions?

How you ask a question often predisposes respondents to answer in a predictable way. To use an example from government, if a researcher asks retirees something like "Are you in favor of government assistance?" the answer will be the opposite of what you'd get by asking something like "Do you support Medicare?"

6. Do the results calculate an average?

Even otherwise good data can produce a misleading picture if the concept of "average" (the arithmetic mean) is used to analyze that data. For example, the people in a room containing one billionaire and 999 penniless paupers have an average wealth of $1 million. A valid analysis should use the "median," which is the middle value when all the values are arranged in order. In the example above, the median wealth is $0.
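The arithmetic in the billionaire example is easy to check for yourself. Here is a minimal sketch in Python, assuming the billionaire is worth exactly $1 billion and everyone else has exactly $0. The variable names are mine, chosen for illustration.

```python
from statistics import mean, median

# Assumed room: one person worth exactly $1 billion, 999 people worth $0.
wealth = [1_000_000_000] + [0] * 999

# The mean spreads the billionaire's money across everyone in the room.
average_wealth = mean(wealth)

# The median is the middle value once the list is sorted.
median_wealth = median(wealth)

print(f"mean:   ${average_wealth:,.0f}")
print(f"median: ${median_wealth:,.0f}")
```

The mean comes out at $1,000,000 and the median at $0, so only the median describes what a typical person in the room actually has.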

7. Were respondents self-selected?

Companies frequently run Web polls where people who are accessing the website decide whether or not they want to participate in the survey. However, any statistics based on these "self-selected" polls are automatically bogus. For example, if I stick a question on a website like, "How are we doing on customer service?" only people who have had very good or very bad customer service experiences will bother to answer. You end up with no idea what experience the typical customer has had.
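To see how self-selection distorts a poll, here is a small sketch in Python. The numbers are invented purely for illustration: I assume 100 customers rated on a 1-to-5 scale, and I assume that only those with a very bad (1) or very good (5) experience bother to respond.

```python
from statistics import median

# Hypothetical customer base, scores from 1 (very bad) to 5 (very good).
# These counts are made up for illustration, not taken from any real survey.
all_customers = [1] * 10 + [3] * 70 + [5] * 20

# Assumption: only customers with an extreme experience answer the web poll.
respondents = [score for score in all_customers if score in (1, 5)]

typical_score = median(all_customers)   # what the typical customer experienced
poll_score = median(respondents)        # what the self-selected poll reports
response_rate = len(respondents) / len(all_customers)

print(f"typical customer: {typical_score}")
print(f"poll result:      {poll_score}")
print(f"response rate:    {response_rate:.0%}")
```

Under these assumptions, the poll reports a middle score of 5 while the typical customer would have said 3. The 70 percent of customers in the middle never show up in the results at all.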

8. Does it assume causality?

Even if two data sets seem to be in lockstep, you have no idea whether that relationship is meaningful until you know for certain that one data set caused the other. For example, if sales revenue spikes upward after your salespeople attend a sales training class, the increased revenue MAY be the result of sales training or MAY be the result of something unrelated, like an improved economy. Correlation is not causality.

9. Does it lack independent confirmation?

Scientific studies are not considered valid until somebody other than the original researcher confirms the study independently. Unfortunately, most market research is sole-sourced, which makes it inherently unreliable.

Let's see how this works by looking at an actual market research report. Yesterday, the company Millward Brown released its list of the "100 Most Popular Global Brands." Since the point of the report is to attract attention and customers to Millward Brown, the answer to question No. 1 is "yes."

Millward Brown doesn't release its raw data to the public, so the answer to question No. 2 is "yes." The answers to questions 3-7 are unknown (because we don't have the raw data), but the answer to question No. 9 is "yes" because Millward Brown uses a "proprietary" methodology that would make independent confirmation impossible.

I'd take a heavy handful of salt with that report.

Adapted from "Business Without The Bullsh*t: 49 Secrets and Shortcuts That You Need to Know."
