We live in an era in which "big data" is supposed to solve lots of problems for businesses.  By looking at the buying patterns of a large number of individuals, retailers can get insights into how to sell to customers more effectively.  Credit card companies can detect fraud by exploring purchase patterns that have signaled bad transactions in the past.  Companies can learn about the types of employees who are likely to be effective by exploring characteristics that have been good indicators of success in those roles before.  Websites like fivethirtyeight.com use data from a variety of sources like polls to help make predictions about events like elections.

This approach signals a shift in which companies are taking an empirical approach to answering questions.  The idea is that data from the past can be used to determine what might be true in the future.

I am a fan of big data and (as a scientist) I am also happy to see people interested in answering questions by looking at sources of data in the world.

At the same time, I think it is important for people to be clear about the kinds of questions that data can and cannot answer.  That means that it is valuable to take a peek at the philosophical field of epistemology, which studies knowledge.

A key question in epistemology is how we can know whether something is true and whether a particular opinion is worth believing.  Data is often a good source of belief, but it is not useful for answering every question.

Data is most valuable for helping to settle matters of fact.  Science advances by using data to distinguish between theories that explain how aspects of the world work.  Different theories make different predictions about what we are likely to observe in a particular situation.  When the data support one theory over another, that increases our confidence that the supported theory is correct.  Of course, no theory can ever be proven definitively.  New data may always come along that contradict a strongly held theory.  But many theories (like the theory of evolution in biology) have enough support that it would be hard to supplant them.

There are many things that businesses need to know that are matters of fact of this type.  Drug companies, for example, can use the scientific method to explore whether a particular drug is effective at treating a disease.  Oil companies use data to determine whether injecting waste water back into the earth is likely to cause seismic activity in the region around the disposal well. 

Other questions cannot be settled by data. 

Preferences are not determined by data, for example.  Is Bruce Springsteen's "Born to Run" a great song?  That depends.  I like it.  Lots of people I know like it.  But that does not mean that some other person is guaranteed to like it.  Perhaps my particular preference is driven by factors like growing up in New Jersey or hearing that song during my childhood.  If I meet someone who hates that song, that doesn't make that person's preference wrong (even if I disagree with it).

This recognition was at the heart of Steve Jobs' comment that focus groups can be misleading when designing a product.  He pointed out that many people don't know what they want until they see it.

Data is also not decisive when settling questions of ethics, even when data is used persuasively.  Opponents of gay rights in the 1980s argued that homosexuality reflected a deviant choice.  Data did influence this debate when scientific studies demonstrated a strong genetic influence on people's sexual orientation.  That data did help some people to be more accepting of gays and lesbians in their community.  Still, it is important to recognize that society could elect to be open to people of any sexual orientation, regardless of whether that orientation was biologically determined or a choice. 

No amount of data will make the decision for a company about how much it should be willing to pollute the environment or whether it is appropriate to increase health care benefits for employees.  Data may help make projections about the influence of those decisions, but ultimately companies must determine for themselves how to navigate their responsibilities toward society.

As more companies wade into data collection and analysis as a way of gathering insights about their businesses, it is important to take a step back and understand how data relates to the question being addressed.  Data can often help distinguish among explanations for aspects of the way the world works.  It may also be valuable for making predictions about people's behavior--as long as the conditions in the future will be similar to those in the past.  Data may also be used to address objections people have in questions of ethics.

But if your business is focused on understanding preferences or making judgments about the ethical course of action, then data will not absolve leaders of the responsibility for finding answers to these questions.  In these cases, other principles will be needed to determine how a company should proceed.