Remember (vaguely) how you learned to walk, talk, ride a bike, or drive? It was messy and full of mistakes, but the skills you learned that way stayed. Outside of living systems, it has been hard to design algorithms strong enough to take in "real-life experience" and turn it into sticky, adaptable behaviors for artificial intelligence.
Well, AlphaGo Zero just did it.
"It starts from a blank slate and figures out only for itself, only from self-play, and without any human knowledge, or any human data, or features, or examples, or intervention from humans. It discovers how to play the game of Go from first principles," says DeepMind's professor David Silver.
The AI has had several iterations, each smarter and more capable than the one before. The previous version used a huge database of previous games alongside a bunch of algorithms that pointed it toward winning. That approach led to the defeat of the reigning world champion professional Go player. In poker, the AI Libratus recently beat the world's top players by almost $2 million in chips, learning through self-play rather than human game data.
Now, in this latest version of AlphaGo, the artificial intelligence program taught itself how to play Go, with no human game data to learn from.
Playing millions of games against itself, it learned from scratch in 40 days how to beat the world-champion version of itself. That is truly game-changing, not only for Go, but also for how new knowledge is discovered. How accurate or complete is your domain expertise? This fascinating experiment in learning with AlphaGo Zero is telling us there's a lot more to discover.
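To make the idea of learning purely from self-play concrete, here is a minimal sketch on a toy game. It is not AlphaGo Zero's method, which pairs deep neural networks with tree search. It is an assumption-laden illustration using a simple lookup table and a tiny stone-taking game: players alternate taking 1 to 3 stones from a pile, and whoever takes the last stone wins. The names `train_self_play` and `best_move`, and every parameter value, are hypothetical choices made for this example.

```python
import random


def train_self_play(total_stones=10, max_take=3, episodes=50000,
                    epsilon=0.2, learning_rate=0.02, seed=0):
    """Learn the toy game from self-play only (illustrative sketch)."""
    rng = random.Random(seed)
    # q[(stones, take)] estimates the result for the player about to move:
    # +1 means that player eventually wins, -1 means that player loses.
    q = {(stones, take): 0.0
         for stones in range(1, total_stones + 1)
         for take in range(1, min(max_take, stones) + 1)}

    for _ in range(episodes):
        stones = total_stones
        history = []  # every (stones, take) played, with players alternating
        while stones > 0:
            legal = list(range(1, min(max_take, stones) + 1))
            if rng.random() < epsilon:
                take = rng.choice(legal)  # explore: try a random move
            else:
                take = max(legal, key=lambda a: q[(stones, a)])  # exploit
            history.append((stones, take))
            stones -= take

        # The player who made the last move won. Walk backward through the
        # game, flipping the sign because the players alternate turns.
        outcome = 1.0
        for move in reversed(history):
            q[move] += learning_rate * (outcome - q[move])
            outcome = -outcome
    return q


def best_move(q, stones, max_take=3):
    """Return the move the learned table currently rates highest."""
    legal = range(1, min(max_take, stones) + 1)
    return max(legal, key=lambda a: q[(stones, a)])


if __name__ == "__main__":
    table = train_self_play()
    for pile in (5, 6, 7, 9, 10):
        print(pile, "stones -> take", best_move(table, pile))
```

No human games or strategy hints go in. The program only sees who won each game it played against itself. If training goes as expected, the table should come to favor moves that leave the opponent a multiple of four stones, which is the known winning strategy for this small game.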
"The idea of AlphaGo is not to go out and defeat humans, but actually to discover what it means to do science--for a program to be able to learn from itself what knowledge is," according to Silver in a YouTube post about the achievement.
The DeepMind team behind AlphaGo Zero calls it first-principles, "tabula rasa" (blank slate) learning.
"If you can achieve tabula rasa learning, you have an agent that can be transplanted from the game of Go to any other domain," he says. Freed from the specifics of the game you're in, "you come up with an algorithm that is so general it can be applied anywhere." That's a provocative idea when you extend the concept. Just think what we could do with a set of strong learning algorithms that could systematically tackle tough problems and learn faster than our civilization's collective knowledge . . . in days, not decades.
For now, the big takeaway is that "algorithms matter much more than either computing or data available," said Silver. This alone is a game-changer in how we approach extending the known world. AlphaGo is not exactly a lightweight system, running on about $25 million in hardware, but the point about data stands out. AI gurus have long been working on creating cleaner, better data sets. Today, many big data sets are considered too noisy, meaning full of bad data, to accurately train an artificial intelligence. If the AI is learning from data, and the data is bad, it doesn't learn. Big problem.
What if you didn't need clean data, but just experience, and the artificial intelligence could train itself?
That's the exciting achievement in AlphaGo Zero. Even though it's in the niche, rule-based world of games, it has big implications for every industry that works from physical rules: think chemistry, traffic, biology, pharmacology, travel, logistics, and manufacturing. If we can design rules so flexible they can work from broader experience, and so directional that they always create stronger skill, as AlphaGo Zero does, then it's possible to achieve artificial intelligence that masterminds systems. These systems would need no outside data, have no data cleansing problems, and need no human-in-the-loop slowdowns. That's partly why Google's parent company, Alphabet, bet the company on artificial intelligence and is investing in it at a rapid rate. (Amazon is also investing in artificial intelligence, most recently with its acquisition of Body Labs.)
DeepMind's professor David Silver says, "the fact that we've seen a program achieve a high-level performance...should mean now we can start to tackle some of the most challenging and impactful problems for humanity."
This post has been updated to clarify that AI Libratus recently beat top poker players using a strategy that involves self-play rather than human-entered data.