In 2010, I attended the IEEE (Institute of Electrical and Electronics Engineers) CVPR (Computer Vision and Pattern Recognition) conference at the Hyatt in downtown San Francisco. I didn't expect the conference to be as large as it was: it drew more than 1,500 attendees, to the best of my recollection. In size, it reminded me of the conferences held at the same hotel when the industry was arguing over competing Wi-Fi standards, with multi-billion-dollar markets at stake.
However, the atmosphere was different. At the Wi-Fi conferences, which focused on the practicalities of implementing a maturing technology, presentations were made mainly by engineers working for companies competing to assert their intellectual property rights into the standards. The CVPR presentations, by contrast, were made mainly by university researchers, along with researchers from the "deep-research" arms of some of the world's largest technology companies, who didn't expect the fruits of their research to reach maturity anytime soon.
One of the presentations I sat through struck a chord with me. The presenter showed a 30-second video taken from a dash camera. As a speed limit sign appeared in the field of view, the program identified it, extracted the speed limit from it, and displayed it to warn the driver. It was one of the coolest things I had ever seen. Until that point, I had to rely on my own eyes to spot those signs, and if I missed one, I could look at the navigation system, which would typically show the speed limit stored in its database. Of course, that data could be outdated, or simply inaccurate. I'm sure that telling a police officer who stopped you for speeding that the GPS said the speed limit was 40, when in reality it was 30, would not get you off the hook for a ticket.
Excited, I asked whether the program ran in real time. Running in real time would mean it could use a live video feed from the dash-cam and alert the driver to speed limit signs as they came into the field of view. "Not exactly," was the answer. As it turned out, that 30-second video had taken a week of constant processing to analyze and produce the appropriate alerts.
Needless to say, I was disappointed, but not too much. The underlying technology enabling this application (as well as self-driving cars) is called Computer Vision. The problem (and opportunity) with this field is that it is tremendously processor-intensive. As an example, analyzing a high-quality audio stream sampled at 44 kHz means that the processor must analyze 44,000 samples every second. Analyzing an HD video stream (1080p at 30 frames per second) means that the processor must handle more than 62 million samples every second, roughly a thousand times more intensive.
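The back-of-the-envelope comparison above can be verified with a few lines of arithmetic (a minimal sketch; the 44.1 kHz audio rate and 1920×1080 frame size are the standard figures behind the numbers quoted in the text):

```python
# Compare the per-second data volume of a high-quality audio stream
# with that of an HD video stream.

AUDIO_SAMPLE_RATE = 44_100      # CD-quality audio: 44.1 kHz

WIDTH, HEIGHT = 1920, 1080      # 1080p frame dimensions in pixels
FPS = 30                        # frames per second

# Every pixel of every frame is a "sample" the processor must handle.
video_samples_per_sec = WIDTH * HEIGHT * FPS
ratio = video_samples_per_sec / AUDIO_SAMPLE_RATE

print(f"Audio: {AUDIO_SAMPLE_RATE:,} samples/s")
print(f"Video: {video_samples_per_sec:,} samples/s")   # 62,208,000
print(f"Video is ~{ratio:,.0f}x the audio rate")       # ~1,411x
```

This is, of course, a crude measure of "intensiveness" (it ignores sample size, color channels, and the cost per operation), but it shows where the thousand-fold figure comes from.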
However, processing power is one of the fastest-growing technology parameters, along with storage density and a few others that I identified in my book Bowling with a Crystal Ball. This growth is generally (and inaccurately) referred to as "Moore's Law" (after an article published in 1965 by Gordon Moore, one of Intel's founders), under which processing power is estimated to double every two years (which, by the way, is not what the original article stated...). If so, then between the CVPR conference in 2010 and today, processing power has increased by an order of magnitude.
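The "order of magnitude" claim follows directly from the doubling rule. A minimal sketch of the extrapolation (assuming, as the text does, a two-year doubling period and a 2010-to-2016 window):

```python
# Extrapolate processing-power growth under a "doubles every two years" rule.
DOUBLING_PERIOD_YEARS = 2
years_elapsed = 2016 - 2010

growth_factor = 2 ** (years_elapsed / DOUBLING_PERIOD_YEARS)
print(growth_factor)  # 8.0 — three doublings, roughly an order of magnitude
```

Three doublings in six years yields an 8x increase, which is close enough to 10x to justify the "order of magnitude" phrasing.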
The other factor affecting the disruptive nature of Computer Vision is the development of algorithms capable of efficiently performing artificial vision functions. Together, both factors enabled the quick transformation from detecting speed limit signs (albeit not in real time) in 2010 to completely and autonomously controlling a driverless vehicle in 2016.
The Next Big Thing?
As those technological trends continue in their aggressive path, what other applications will they enable?
One of the first areas desperately in need of computer vision capabilities is unmanned aerial vehicles/systems (UAV/UAS). A 2013 FAA report, Integration of Civil UAS in the National Airspace System Roadmap, was concerned mainly with the ability of such systems to "Detect and Avoid" other traffic. The potentially catastrophic consequences of a mid-air collision between a passenger jet and a small UAV are incomprehensible. However, so was the thought of autonomous vehicles less than a decade ago, and yet look where that industry is now. Thanks to the continued fast growth in processing power and algorithms, computer vision has already reached the point where we trust passengers' (and pedestrians') lives to it. Next come UAVs, and the ability of Amazon (and others) to deliver packages "over the air" to their destinations quickly. Following that, I would expect to see passenger-carrying, pilot-less air taxis such as the Chinese E-Hang 184.
One more application I would expect to see growing as a result of progress in Computer Vision is smart security cameras that can detect unusual behavior and give advance warning of terror and other criminal activity.