The promise of A.I. is everywhere and in everything. From our homes to our cars and our refrigerators to our toothbrushes, it would seem that A.I. is finally ready to revolutionize our lives and our world. 

Not so fast.

While computing power and computer architectures have finally advanced to the point where machines can learn quickly enough to achieve human-level intelligence in narrow applications, there's still one part of A.I. that virtually nobody is talking about. And it's likely to be the single greatest limiting factor in the progress of A.I.

First, some background for perspective.

The Bottomless Cloud

We often hear that the amount of data being stored is increasing exponentially. For example, it's projected that by 2035 the world will have access to more than one yottabyte of data -- that's one billion petabytes, or more bytes than there are stars in the visible universe. In my new book, The Bottomless Cloud, we project that by 2200, at a rather conservative 40 percent compounded annual growth rate, the demand for storage will exceed the storage available even if we were to use every atom that makes up our planet!
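To get a feel for what a 40 percent compounded annual growth rate means, here is a minimal Python sketch. The function name `compound_growth` and the ten-year horizon are illustrative choices, not figures from the book; the only number taken from the text is the 40 percent rate.

```python
def compound_growth(base: float, annual_rate: float, years: int) -> float:
    """Return `base` grown at a fixed compounded annual rate for `years` years."""
    return base * (1.0 + annual_rate) ** years


# The 40 percent compounded annual growth rate cited in the text.
CAGR = 0.40

# Illustrative horizon (an assumption): how much does capacity multiply in a decade?
decade_multiple = compound_growth(1.0, CAGR, 10)
print(f"10-year multiple at 40% CAGR: {decade_multiple:,.1f}x")
```

At that rate, capacity multiplies roughly 29-fold every decade, which is why even a "conservative" growth rate runs into physical limits over a couple of centuries.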

That all sounds very cool and it makes for great graphics and PowerPoint slideware, but the math ignores the simple fact that while data is itself an infinite commodity, its storage has very real and severely limiting costs. Yes, that cost is going down as storage densities go up, in concert with Moore's law, but not fast enough. A petabyte in the cloud today costs about $400,000 a year on a traditional cloud such as Amazon's S3, or about $60,000 on a next-generation cloud such as Wasabi's Hot Cloud. If the storage cost trends of the past 60 years continue, that petabyte will cost pennies by 2060. So, what's the problem, right?
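How steep is the price decline implied by that trend? The sketch below works it out under two stated assumptions: "today" is 2019, the year this column was published, and "pennies" means one cent per petabyte. The function name `implied_annual_decline` is hypothetical and used only for illustration.

```python
def implied_annual_decline(start_price: float, end_price: float, years: int) -> float:
    """Constant yearly price decline (as a fraction) that takes start_price to end_price."""
    return 1.0 - (end_price / start_price) ** (1.0 / years)


# From the text: about $400,000 per petabyte on a traditional cloud today.
START_PRICE = 400_000.0
# Assumption: "pennies" is read as one cent per petabyte.
END_PRICE = 0.01
# Assumption: "today" is 2019, the publication year; the target year is from the text.
YEARS = 2060 - 2019

rate = implied_annual_decline(START_PRICE, END_PRICE, YEARS)
print(f"Implied price decline: {rate:.1%} per year")
```

Under those assumptions the price has to fall by roughly 35 percent a year, every year, for four decades.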

The problem is that when you start looking at just how much data is required for A.I. to achieve human-level intelligence, you soon start to realize that our current and near-term data storage alternatives will simply not work, pushing A.I. out by decades. For example, to fully encode the 40 trillion cells in a human body would require 60 zettabytes of digital storage. Since 60 zettabytes is 60 million petabytes, even at one penny per petabyte each digital twin would cost about $600,000, and creating one for every human on the planet would cost somewhere on the order of $4 quadrillion. That's about 200 times U.S. GDP.

The Real Price of Autonomy

An even better example, and one closer to home, is that of autonomous vehicles (AVs). One of the least discussed implications of AVs is that their relationship with data is radically different from that of almost any device in the past. (By the way, what I'm about to describe applies equally to any device that relies on A.I. and even rudimentary machine learning.)

The decisions an AV makes come with two critical requirements: First, they need to be made fast, typically in fractions of a second, and second, the AV needs to learn from its decisions as well as the decisions of other AVs. The implications of this are fascinating and unexpected.

Because of the speed with which decisions need to be made, the AV requires significant onboard computing power and data storage capability. The need for onboard data storage is driven by all of the sensor data, contextual data about the vehicle and its environment, and data gathered from communication with other AVs in its proximity. This onboard data is used for real-time decision making, since the latency of communicating with the cloud can be a severe impediment to the speed with which these decisions need to be made. It's one thing to drop a cell call with your co-worker and another altogether for an AV to not have access to the data needed to make a split-second decision.

The volumes of data that go into this sort of real-time decision making, along with all the contextual data gathered around each decision, then need to be uploaded to the cloud to fuel the ongoing learning that is so critical to future decisions. This creates a cycle of decision making and learning that dramatically accelerates the rate of both data capture and storage.

The net effect is that while an AV today may generate somewhere in the neighborhood of one to two terabytes of data per hour, the increase in onboard sensors as AVs progress to full autonomy will result in a dramatic increase in data storage requirements, with the potential for a single AV to generate dozens of terabytes hourly. Storing this all on board is well outside the scope of any technology available today or in the foreseeable future. Yet it is also well outside the cost-effective scope of the big three cloud storage solutions from Amazon, Google, and Microsoft.

For example, if an AV generates 20 terabytes of data a day (which is an incredibly conservative estimate), the storage requirements would amount to 7.3 petabytes yearly. At the current costs of cloud data storage, that would amount to approximately $3 million annually. That's 60 times the cost of the automobile! 
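The arithmetic behind those figures can be sketched in a few lines of Python. The per-petabyte prices are the ones quoted earlier in this column. Two things here are assumptions rather than figures from the text: those prices are treated as yearly rates applied to the full year's volume, and the $50,000 vehicle price is simply what "60 times the cost of the automobile" implies. All function and constant names are illustrative.

```python
TB_PER_PB = 1_000  # decimal units: 1 petabyte = 1,000 terabytes


def yearly_storage_pb(tb_per_day: float, days: int = 365) -> float:
    """Petabytes generated in a year at a constant daily rate."""
    return tb_per_day * days / TB_PER_PB


def yearly_cost(petabytes: float, price_per_pb_year: float) -> float:
    """Cost of holding that many petabytes for a year (simplified: no ramp-up)."""
    return petabytes * price_per_pb_year


TB_PER_DAY = 20.0          # from the text
TRADITIONAL = 400_000.0    # dollars per petabyte, traditional cloud (from the text)
NEXT_GEN = 60_000.0        # dollars per petabyte, next-generation cloud (from the text)
ASSUMED_CAR_PRICE = 50_000.0  # assumption implied by the "60 times" comparison

pb = yearly_storage_pb(TB_PER_DAY)
traditional_cost = yearly_cost(pb, TRADITIONAL)
next_gen_cost = yearly_cost(pb, NEXT_GEN)

print(f"Data per year:          {pb:.1f} PB")
print(f"Traditional cloud cost: ${traditional_cost:,.0f}")
print(f"Next-generation cost:   ${next_gen_cost:,.0f}")
print(f"Multiple of car price:  {traditional_cost / ASSUMED_CAR_PRICE:.0f}x")
```

Even at the lower next-generation price, a single vehicle's yearly data would cost several times the assumed price of the car itself.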

Fueling the Revolution

The bottom line is that A.I. is simply not affordable at these costs. Which brings up a fascinating analogy that's just as rarely talked about. 

In the very early part of the 20th century, as gasoline-powered cars were just beginning to make their appearance on roadways, the infrastructure of gas stations didn't exist. Early car owners would buy their gas at the general store or from modified heating oil trucks. A gallon of gas cost between $5 and $8 in today's dollars. It wasn't until the 1920s and '30s that gas prices dropped to an affordable rate of about $2 a gallon. That fueled the automotive industry. Without that affordability, personal transportation would not have taken off the way it did as quickly as it did.

The same applies to the evolution of A.I. And, although I'm using AVs as an example, the logic applies to any fully autonomous device. 

The challenge isn't proving that A.I. works. It's easy to do that as long as you don't have to worry about the cost of data storage at scale. For example, Google proved that DeepMind's AlphaGo could win at the 3,000-year-old game of Go against world champion Lee Sedol. Scaling A.I. so that it can be used broadly and affordably is where the challenge sets in.

This isn't a small problem. Many of the areas where A.I. promises to have revolutionary impact, such as health care, transportation, manufacturing, agriculture, and education, are desperately in need of quantum advances in order to scale to meet the needs of the seven, soon to be 10, billion inhabitants of the planet. 

So, is this a hard stop for A.I.? It may very well be if a few things don't happen. 

  • First, we will need some monumental improvements in storage technology. Don't discount this. In 1960, an IBM 350 disk drive held about 3.5 megabytes and weighed in at two tons. Today, we can store 300,000 times as much data on a device that is one-millionth of the weight.
  • Second, we need to challenge the antiquated and ridiculously complex cloud data storage pricing models of the big three, which use the industrial-era notion of tiered storage. Basically, it's a carryover from file cabinets and banker's boxes filled with paper. Locking digital data up in cold storage eliminates its value.
  • Third, the cloud itself is evolving. Having only three options for cloud storage is unlikely to provide the sort of competitive pressure and innovation needed to drive costs down quickly enough to meet the demand created by A.I. and machine learning applications.
  • Fourth, at some point this will become an issue of national importance. Whichever nation is first to fully develop A.I. is very likely to have an enormous advantage over other nations. (Check out my recent podcast in which I talk about Putin's quote on this topic.) In some ways, this is no different from the nuclear arms race, with the exception that you cannot police who owns A.I. and how they use it. If data is indeed the new oil, then we need to think about the value it has for a national competitive agenda.

The bottom line is that we need to challenge everything from the business models to the technologies used for data storage and make investments in data a national priority. In much the same way that the infrastructure for electric utilities was the foundation for 20th-century industry, the data utility will be the infrastructure for the 21st century.

A.I. may well hold the answers to many of the largest problems humanity will face as we move toward the inevitability of 10 billion global inhabitants. But it's only going to give those answers up if we are able to affordably capture and store the data needed to realize its promise.  

Published on: Feb 26, 2019