Last month, Facebook's "Trending" tab featured Megyn Kelly as a topic, displaying hoax news items claiming she had been fired by Fox News. Many critics pointed out that the mistake occurred shortly after the social network decided to shed the editorial team in charge of overseeing trending topics. The company had gone almost entirely algorithmic with the feature, and the move had almost immediately failed.
A Quartz postmortem ascribed the incident to an "inmates running the asylum" scenario. The algorithm that surfaced the news based its selection on Facebook users' activity, and for reasons of policy or practice human overseers had neglected to correct the mistake. Facebook did not respond to a request for comment for this story.
The issue of artificial intelligence algorithms failing to distinguish between hoax and reality, truth and falsehood, accuracy and inaccuracy isn't confined to a few isolated cases such as Facebook's Megyn Kelly snafu and Microsoft's Twitter bot Tay. (The bot spouted Nazi rhetoric it learned from the Twitter users with whom it interacted. Microsoft declined to comment for this story.) It's a problem many companies using such algorithms can reasonably expect to encounter, and the result may not always be restricted to one wonky blip.
"The thing with AI is a lot of people think it's magic, but it's not magic--it's math," says Stephen Pratt, CEO of enterprise AI startup Noodle.ai. "If you understand the math behind it, you can understand how it works" and avoid a case of AI overreach.
Going below the surface data
Pratt and Charles Jolley, CEO of virtual assistant startup Ozlo, say companies relying on AI algorithms may need to reach beyond the limited source data immediately available, and that diversity of data sources is not the only thing they should think about.
If you're using an algorithm that relies largely or entirely on data generated by a random sampling of people, those people can fairly easily misguide the algorithm, as issues with Microsoft's Tay and Facebook's trending topics tab have shown. But even with limited data, you can pinpoint certain features to filter data.
With news stories, if there is a high incidence of typos, that may be an indication the story is a hoax, says Pratt. You may want to set your algorithm to detect frequency of typos. Maybe you train the algorithm by feeding it both fake and factual stories, and telling it to detect and compare the frequency of typos in each.
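The typo idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything Pratt or Noodle.ai describes: the tiny KNOWN_WORDS vocabulary, the function names, and the midpoint rule for choosing a cutoff are all hypothetical, and a real system would use a full dictionary and a proper classifier.

```python
import re

# Hypothetical toy vocabulary; a real system would load a full dictionary.
KNOWN_WORDS = {
    "the", "anchor", "was", "fired", "by", "network",
    "today", "a", "statement", "said", "in",
}

def typo_rate(text, vocabulary=KNOWN_WORDS):
    """Return the fraction of word tokens not found in the vocabulary."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    unknown = sum(1 for token in tokens if token not in vocabulary)
    return unknown / len(tokens)

def fit_threshold(factual_stories, hoax_stories):
    """Assumed rule: cutoff is the midpoint of the two groups' mean typo rates."""
    mean_factual = sum(map(typo_rate, factual_stories)) / len(factual_stories)
    mean_hoax = sum(map(typo_rate, hoax_stories)) / len(hoax_stories)
    return (mean_factual + mean_hoax) / 2

def looks_suspicious(text, threshold):
    """Flag a story whose typo rate exceeds the learned cutoff."""
    return typo_rate(text) > threshold

# Usage: learn a cutoff from labeled examples, then score new stories.
threshold = fit_threshold(
    factual_stories=["The anchor said today in a statement"],
    hoax_stories=["Teh anchor wsa fired by teh netwrk"],
)
```

A high typo rate is a hint rather than proof, so a flag like this would more plausibly be one feature among several than a verdict on its own.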
If a story mentions a celebrity or well-known organization, you might want your algorithm to check whether the same story, or language similar to it, is appearing in other relevant sources. Is the subject of the story tweeting, blogging, or posting on Facebook about it? Does the organization mention the story or news on its website?
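One rough way to picture that cross-check is a word-overlap comparison between the story and posts from the subject's own channels. The sketch below assumes the posts have already been collected as plain strings; the function names, the Jaccard overlap measure, and the 0.3 cutoff are illustrative choices, not a method described by anyone quoted here.

```python
import re

def word_set(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard(text_a, text_b):
    """Share of distinct words the two texts have in common (0.0 to 1.0)."""
    words_a, words_b = word_set(text_a), word_set(text_b)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

def is_corroborated(story, subject_posts, min_similarity=0.3):
    """True if any post from the subject overlaps enough with the story.

    min_similarity is an assumed cutoff that would need tuning on real data.
    """
    return any(jaccard(story, post) >= min_similarity for post in subject_posts)
```

Word overlap is a crude stand-in for "similar language". A post denying the story could share many of its words, so a match here says only that the subject is talking about the topic.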
In the case of Noodle.ai, which makes custom apps based on AI models to help companies process their data, features are of particular importance. A generic app for processing retail data won't necessarily work for two different retail businesses. For example: In the case of some retailers, factors such as weather and the timing of a football game matter for predicting sales. For others, those details are irrelevant.
Another feature to look for: Quality of source material. Let's say you want to know when a particular film was released and what actors it features, so you ask an artificially intelligent virtual assistant you've downloaded to your phone. You're expecting one of two things to happen: Either your phone tells you the year, or it returns search results and you click on one of a series of links to find the information.
That first response has more going on behind it than you might think, Jolley says. The assistant still has to learn the material somewhere, and that somewhere is probably the web.
Some websites devoted to films might be accurate about cast listings but not always accurate about date of release, and others the reverse. So the virtual assistant not only must be able to identify reputable sources for film information, but also specifically what sorts of film information. This example may seem inconsequential, but you can extrapolate to instances where the stakes are higher.
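To make the per-topic trust idea concrete, here is a minimal Python sketch of a weighted vote in which each source carries a separate reliability score for each kind of fact. The site names, the scores, and the best_answer function are all hypothetical, and this is not a description of how Ozlo works.

```python
from collections import defaultdict

# Hypothetical reliability scores per source and per kind of fact.
RELIABILITY = {
    "site_a": {"cast": 0.9, "release_year": 0.4},
    "site_b": {"cast": 0.5, "release_year": 0.95},
}

def best_answer(field, claims, reliability=RELIABILITY, default_weight=0.1):
    """Pick the value with the most reliability-weighted support.

    claims maps a source name to the value it reports for this field.
    Sources with no score for the field get a small assumed default weight.
    """
    scores = defaultdict(float)
    for source, value in claims.items():
        scores[value] += reliability.get(source, {}).get(field, default_weight)
    if not scores:
        return None
    return max(scores, key=scores.get)

# Usage: the same two sites disagree, and the winner depends on the field.
year = best_answer("release_year", {"site_a": 1998, "site_b": 1999})
```

In this toy setup site_b wins on release year while site_a would win on cast, which is the point of scoring sources by topic rather than giving each site one overall grade.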
Jolley says an aim of Ozlo is to teach its AI to synthesize conflicting pieces of information so it can give accurate answers when information on the web is not immediately clear. "We've actually had to teach Ozlo to understand that there often is no right answer to things," he says.
Problems with distinguishing accurate information can spread quickly, Pratt says, for instance when one AI model makes a mistake and another AI model relies on its output.
For example, if a retail company's chatbot turned up false information about a competitor's prices, in addition to misinforming customers, it could feed the inaccurate data into another of the company's algorithms that decides when to offer a discount. Ultimately, the more AI proliferates, the more likely such scenarios will become, and the wider-reaching their consequences.