It seems that voice-operated home automation has become the new front that all the major players have to compete on. Google, Microsoft, Apple, Amazon, and even Facebook have all introduced their respective products without delay.

When I was working on my MBA in 1999, I tried using Nuance's Dragon NaturallySpeaking voice-to-text software. For its current products, the company claims 99% accuracy. At the time, it claimed approximately 90%. While 90% reliability seems high, ask yourself this: would it be OK if one in ten commercial flights crashed? Obviously not. Neither was 90% accuracy in understanding what I said back in 1999. Dictating to my computer in a somewhat unnatural voice and cadence, and then manually correcting what the system thought it heard, was a pain. So I abandoned it.

Voice recognition and voice-to-text systems are much harder to implement than they seem, for many reasons, but the main one remains that the mechanism in our brain that understands speech is very complex, and relies on rich context (not just the sentence, but familiarity with the speaker as well). Bottom line: it takes a lot of processing power, storage capacity, and fast storage access to come anywhere near the human ability to understand speech.

Apple's Siri was perhaps the biggest leap in that direction, with a system that was so much more natural and accurate than anything available for a personal computer, and on an even smaller platform--your mobile phone! How can your mobile phone suddenly have the processing power, storage capacity, and storage-access speed to perform that well? It's simple--it doesn't. Siri "resides" in a massive computing center at Apple's facilities. That's where the processing is done. So when you "talk" to Siri, your voice is minimally analyzed, then sent to Apple's servers in the cloud, which perform the "heavy lifting" of understanding what you want and send the response back to your phone.
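For the technically curious, the round trip looks roughly like this. It's a minimal sketch of the "thin client" pattern in Python; the endpoint URL and response format are placeholders I made up for illustration--Apple's actual Siri protocol is proprietary:

```python
# A minimal sketch of the "thin client" pattern: capture audio locally,
# let a cloud service do the heavy lifting. The URL and JSON response
# shape below are hypothetical placeholders, not any vendor's real API.
import requests

CLOUD_STT_URL = "https://speech.example.com/v1/recognize"  # hypothetical

def recognize(wav_path: str) -> str:
    """Send captured audio to a cloud speech-to-text service; return the transcript."""
    with open(wav_path, "rb") as f:
        audio = f.read()  # the only local work: reading the recorded audio
    resp = requests.post(
        CLOUD_STT_URL,
        data=audio,
        headers={"Content-Type": "audio/wav"},
        timeout=10,
    )
    resp.raise_for_status()
    # The transcript was produced server-side, by hardware far more
    # powerful than anything in your phone.
    return resp.json()["transcript"]

print(recognize("call_home.wav"))
```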

It gets better in your home. Through local connectivity such as Wi-Fi, Bluetooth, or ZigBee (or even IoT connectivity through the cloud), your Siri-like device (Apple's Siri, Microsoft's Cortana, Google Assistant, Amazon's Alexa, and now Facebook's Jarvis) can communicate with other devices in your home and control them.
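What does "control them" look like under the hood? Many Wi-Fi smart devices expose a small HTTP interface on the local network. Here's a toy sketch--the device address and endpoint are invented for illustration, not any vendor's actual API:

```python
# A minimal sketch of local device control: the assistant hears
# "turn on the living room lamp," resolves it to a device on the
# home LAN, and issues a command over HTTP. The address and
# endpoint below are hypothetical stand-ins.
import requests

SMART_PLUG = "http://192.168.1.42"  # hypothetical device on the home LAN

def set_plug(on: bool) -> None:
    """Switch a (hypothetical) Wi-Fi smart plug on or off."""
    resp = requests.post(
        f"{SMART_PLUG}/state",
        json={"power": "on" if on else "off"},
        timeout=5,
    )
    resp.raise_for_status()

set_plug(True)
```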

So what about your car?

Somehow, my 2010 BMW understood me better than my 2015 Jeep. Maybe it liked me more... Both cars came equipped with voice recognition. Yet in four out of five attempts, whenever I said "call home" to my Jeep, it tried doing something else, from editing my phonebook, to calling towing assistance, to calling "David" (for the life of me, I could never understand how "David" sounds like "Home"). It frustrated me so much that I had to cancel my home phone line. OK, that wasn't the reason, but still...

The Problem.

One problem with in-car speech recognition as it is implemented today is that the cloud processing power available to Siri is not available to the car. The car has to rely on local processing resources, which are orders of magnitude more limited, because cars are not equipped with a high-speed, low-latency data connection the way phones are--unless the connection actually runs through the phone (hint...). Another problem is that the microphone in the car doesn't hear what we hear. Unfortunately, sometimes it hears more. Much more... It hears the driving sounds more than we do. If the A/C fan happens to blow in the microphone's general direction, the microphone will hear almost nothing else, while we hardly hear that fan at all.

The Solutions (and the Opportunities).

One obvious solution is to use what's already built into your phone. The phone already has the wireless link to the powerful voice-processing center in the cloud. It can understand you better than any car can with its limited local processing resources. However, the location of the microphone is going to be critical. Leave the phone in your pocket, and it will "hear" nothing. Put it on the dashboard, and it will "hear" the A/C fan and other noises. Opportunity 1: embedded car microphones that can filter noise and communicate with your phone (over Bluetooth, Wi-Fi, USB, or any other method).
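To give a flavor of the simplest possible noise filtering: HVAC fans and road rumble concentrate most of their energy below the speech band, so even a basic high-pass filter helps. Here's a toy sketch; real automotive microphones use far more sophisticated techniques, such as beamforming and adaptive noise cancellation:

```python
# A toy preprocessing step an embedded car microphone might apply
# before handing audio to the phone: strip low-frequency fan and
# road rumble, which sit below the core speech band (~300-3400 Hz).
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, lfilter

def suppress_rumble(wav_in: str, wav_out: str, cutoff_hz: float = 200.0) -> None:
    rate, audio = wavfile.read(wav_in)
    audio = audio.astype(np.float64)
    # 4th-order Butterworth high-pass filter: attenuates energy below
    # the cutoff while leaving the speech band largely untouched.
    b, a = butter(4, cutoff_hz, btype="highpass", fs=rate)
    filtered = lfilter(b, a, audio, axis=0)  # axis=0 handles stereo files too
    wavfile.write(wav_out, rate, filtered.astype(np.int16))

suppress_rumble("cabin_raw.wav", "cabin_clean.wav")
```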

Another problem is controlling the car from your phone, and it is twofold. First, there must be a standard protocol that allows controlling elements of the car (climate, seats, radio, navigation, and so on) from your phone. Any phone. Opportunity 2: develop such a standard. You would likely have to do it through a standards-developing organization. The other issue is that car (and car-electronics) manufacturers may become more "commoditized" while the main stage is taken by the phone. This reduces some of the differentiation that car manufacturers enjoy today with their unique control systems. That's a business issue, not a technical one.
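What might such a standard look like? At its core, a small, vendor-neutral command vocabulary that any phone could emit and any head unit could interpret. The schema below is invented purely for illustration--no such standard existed at the time of writing:

```python
# A sketch of a vendor-neutral car-control message format.
# The field names and vocabulary are hypothetical.
import json

def make_command(subsystem: str, action: str, **params) -> str:
    """Serialize one car-control command as JSON."""
    return json.dumps({
        "version": "1.0",
        "subsystem": subsystem,  # "climate" | "seats" | "radio" | "nav" ...
        "action": action,        # "set" | "increase" | "toggle" ...
        "params": params,
    })

# "Set the driver's side temperature to 72 degrees"
print(make_command("climate", "set", zone="driver", temperature_f=72))
# "Tune the radio to 101.5 FM"
print(make_command("radio", "set", band="FM", frequency_mhz=101.5))
```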

Another solution is to create a dedicated, high-speed, low-latency wireless network especially for cars. I will not even entertain this option as an opportunity: I have never seen evidence of building a new infrastructure succeeding, especially alongside an existing one that can serve similar requirements.

Opportunity 3: embed cellular data access in the car as standard equipment, and then let you choose which speech-recognition cloud service to connect to (Siri, Google, Jarvis, Alexa, etc.). Those providers would offer car-control applications running in your car (and the cloud), rather than on your phone. While easier to implement, this would hurt car-manufacturer differentiation the same way interfacing the phone to control the car would. Which brings Opportunity 4: car manufacturers developing their own cloud-based voice-recognition and car-control centers.
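Cloud-side, such a car-control application is essentially an intent dispatcher: the speech service decides what you meant, and the application maps that onto commands sent down to the vehicle. A minimal sketch, with every intent name, slot, and command field invented for illustration:

```python
# A toy cloud-side dispatcher: map recognized intents to car-control
# commands. Intent names, slots, and command format are hypothetical.
import json

def car_command(subsystem: str, action: str, **params) -> str:
    return json.dumps({"subsystem": subsystem, "action": action, "params": params})

INTENT_HANDLERS = {
    "set_temperature": lambda slots: car_command("climate", "set", temperature_f=slots["degrees"]),
    "navigate_home":   lambda slots: car_command("nav", "route", destination="home"),
}

def handle_utterance(intent: str, slots: dict) -> str:
    """Turn a recognized intent into a command for the vehicle."""
    handler = INTENT_HANDLERS.get(intent)
    if handler is None:
        # Better to ask again than to call David.
        return car_command("hmi", "prompt", text="Sorry, I didn't catch that.")
    return handler(slots)

print(handle_utterance("set_temperature", {"degrees": 70}))
```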

This leaves only one question--with the growing use of autonomous cars, at what point will we trust voice AI to take driving instructions? After all, if I say "Drive Home," I really don't want my car to take me to "David." Any David...

Published on: Jan 3, 2017