Data and A.I. A.I. and data.

You almost always hear the two terms spoken in the same breath. Why is that?

If you're a founder trying to understand more about these topics, whether it's to improve your workflows or products or some aspect of your operations, here's a business owner's primer on what people mean when they insist on saying the two together.

A.I. needs data to do anything.

At its core, A.I. is an algorithm, which in plain English is a process that takes inputs and produces outputs. Much like your car, which is just a hunk of metal sitting in the garage until it has fuel to make it go, an algorithm on its own with no data to process can't make anything useful. In fact, it can't make anything at all.

This means that if you want your company to take advantage of A.I., the first task is getting your data together and in shape. This can be a real stumbling block, according to Phuong Nguyen, founder of data science consultancy Partners in Company. "From the clients we've worked with and talked to, the impediments to being more data-driven are usually the basics of having clean, consistent data and it being centralized and secure," she says.

That usually means either getting your data out of spreadsheets or bringing your data together from multiple platforms -- like a customer relationship management (CRM) platform and a marketing platform -- into a centralized repository, where the data can begin to be combined and compared for analysis. Typically, it will then still need to be cleaned and normalized in various ways to make sure it is consistent and in the right form before data teams can draw correct conclusions and then build on the data with A.I.
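To make that concrete, here is a minimal sketch of what "combine, clean, and normalize" can look like in Python. The two exports, their field names, and the sample customers are all invented for illustration; real CRM and marketing platforms will have their own formats, and a data team would more likely use a database or a dedicated tool than hand-written code like this.

```python
# Hypothetical exports from two platforms; field names and values are made up.
crm_rows = [
    {"Email": " Ana@Example.com ", "name": "Ana Diaz", "lifetime_value": "1,200.50"},
    {"Email": "bo@example.com", "name": "Bo Chen", "lifetime_value": "310"},
]
marketing_rows = [
    {"email": "ana@example.com", "last_campaign": "spring_sale"},
    {"email": "BO@EXAMPLE.COM", "last_campaign": "newsletter"},
    {"email": "cy@example.com", "last_campaign": "newsletter"},
]


def normalize_email(raw):
    """Trim whitespace and lowercase so the same address always matches."""
    return raw.strip().lower()


def parse_amount(raw):
    """Turn a string such as '1,200.50' into a number."""
    return float(raw.replace(",", ""))


def combine(crm, marketing):
    """Merge both sources into one record per customer, keyed by email."""
    customers = {}
    for row in crm:
        key = normalize_email(row["Email"])
        customers[key] = {
            "email": key,
            "name": row["name"],
            "lifetime_value": parse_amount(row["lifetime_value"]),
            "last_campaign": None,
        }
    for row in marketing:
        key = normalize_email(row["email"])
        # Customers seen only in the marketing platform get a partial record.
        record = customers.setdefault(
            key,
            {"email": key, "name": None, "lifetime_value": None, "last_campaign": None},
        )
        record["last_campaign"] = row["last_campaign"]
    return customers


combined = combine(crm_rows, marketing_rows)
for record in combined.values():
    print(record)
```

Notice that without the email cleanup, "Ana@Example.com" and "ana@example.com" would be counted as two different customers. Small inconsistencies like that are exactly what throws off an analysis later.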

What's more, most A.I. needs large amounts of data to produce reliable results, for the same reason that you need a large sample of anything in order to make a reasonable judgment. We're all familiar with political polls, where professionals usually claim 95 percent confidence that their estimate of how the larger population plans to vote is within roughly six percentage points of the true figure by sampling somewhere around 300 people.

That's for a simple choice between two options. If you're trying to create more complex predictions, such as differentiating between types of customer behavior in your marketing data, you'll want to start with many thousands of samples. Oftentimes, you'll use quite a lot more to get strong confidence in your results.
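For the curious, the polling arithmetic can be sketched in a few lines of Python. This uses the standard textbook formula for a margin of error, and it assumes a simple random sample and a two-option question; real pollsters apply weighting and other adjustments that are not shown here. The function names are my own.

```python
import math

Z_95 = 1.96  # standard multiplier for 95 percent confidence


def margin_of_error(sample_size, proportion=0.5, z=Z_95):
    """Margin of error for an estimated proportion (0.5 is the worst case)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


def required_sample_size(margin, proportion=0.5, z=Z_95):
    """Smallest sample that achieves the given margin of error."""
    return math.ceil(z ** 2 * proportion * (1 - proportion) / margin ** 2)


for n in (300, 1000, 10000):
    print(f"{n} people: +/- {margin_of_error(n) * 100:.1f} points")

print("People needed for +/- 3 points:", required_sample_size(0.03))
```

Under those assumptions, 300 people gets you to within about 5.7 points, and it takes more than 1,000 to get within 3. Precision improves slowly as the sample grows, which is part of why finer-grained questions call for so much more data.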

How much data are we talking about? A proper statistical analysis can give you a precise number for what you're trying to do, but as a general rule, hundreds of thousands of rows is usually on the low end for machine-learning-based analyses. "I'm not used to working with anything under a million rows," says Chantel Perry, a veteran data scientist at large companies and author of the book Data Newbie to Guru.

And for something like a marketing analysis, where the customer tendencies you're trying to understand can vary from day to day and month to month, you also want to gather data over a period long enough to make useful predictions: "You want to be in business for at least six months, and collecting data on your customers for at least six months," says Perry.

So now you understand why A.I. needs data. That dependency runs the other direction, too. The truth is, you can't have one without the other.

A lot of data comes out of A.I.

Just as A.I. algorithms need data as their input, their output is often a form of data. 

Let's say your marketing data gets crunched in such a way that you find you have eight major clusters of customers. You might further discover that different clusters of customers should receive different kinds of pitches or advertisements. Those outputs are data that you can feed into another algorithm, one where you can then use that labeling to predict which cluster a future customer will belong to and then have an automated process that assigns them the pitches or advertisements that are predicted to be the most effective.
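Here is a rough sketch of that hand-off in Python, using three clusters instead of eight to keep it short. The customer features, cluster names, and pitches are all hypothetical, and the prediction step uses a deliberately simple technique (assign a new customer to the cluster whose average member they most resemble) rather than whatever model a data team would actually choose.

```python
import math

# Hypothetical output of an earlier clustering step. Each customer has two
# made-up features (orders per month, average order value) and a cluster label.
labeled_customers = [
    ((1.0, 20.0), "bargain_hunters"),
    ((2.0, 25.0), "bargain_hunters"),
    ((8.0, 30.0), "frequent_buyers"),
    ((9.0, 35.0), "frequent_buyers"),
    ((1.0, 200.0), "big_spenders"),
    ((2.0, 220.0), "big_spenders"),
]

# Hypothetical mapping from cluster to the pitch that works best for it.
pitch_for_cluster = {
    "bargain_hunters": "discount code",
    "frequent_buyers": "loyalty program",
    "big_spenders": "premium bundle",
}


def compute_centroids(labeled):
    """Average the features of every customer in each cluster."""
    sums, counts = {}, {}
    for features, label in labeled:
        running = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            running[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(total / counts[label] for total in sums[label])
        for label in sums
    }


def predict_cluster(features, centroids):
    """Pick the cluster whose average customer is closest to this one."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


centroids = compute_centroids(labeled_customers)
new_customer = (7.0, 28.0)
cluster = predict_cluster(new_customer, centroids)
print(cluster, "->", pitch_for_cluster[cluster])
```

The labels in `labeled_customers` are the output of one algorithm and the input to the next. In practice the features would also be put on a comparable scale first, so that a large-valued feature such as order value doesn't drown out the others.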

When you think about it, all data exists as a result of some process akin to an algorithm. Sometimes A.I. is powering that data-gathering process, sometimes it isn't, and sometimes the distinction isn't all that clear. Take, for instance, data about average income and spending patterns in a geography you're targeting: It could come from a combination of surveys, government data, and data crunched by credit card companies and merchants, all of which is then crunched again into a single number for a single census block, which your marketing algorithms then might use to help you target different customers in different ways.

There's a common saying I often invoke when talking about data science: "Nobody believes in a model, other than the person who wrote it, and everyone believes in a given dataset, other than the person responsible for assembling it." Noodle on that for a minute.

We have a tendency to believe in data as necessarily true and not reliant on a human or A.I. process to be the way it is. But that's often untrue. If you want to arrive at meaningful outcomes, you need to scrutinize the data feeding your models -- as well as the models that produced the data that you're feeding your models.

"The biggest thing that I see issues with is data quality," says Perry. "Anything that's going into the decision-making process needs to be checked for cleanliness, bias, and other issues -- especially with machine learning models."

Understanding this feedback loop between data and A.I. will help you avoid relying on analyses that aren't quite as good as they might seem at first glance.