Voice recognition technology may be on its way to becoming standard in new gadgets, but accuracy will determine whether it becomes a can't-live-without feature.
That's one of the messages in Silicon Valley venture capitalist Mary Meeker's annual Internet Trends report. Meeker points out that voice input has the potential to be the most efficient form of computing: Humans can speak 150 words per minute on average but can type only 40. She argues that the timing is right for voice recognition to take over, since the technology is a logical fit with Internet of Things-connected devices such as the Amazon Echo or the Apple Watch.
What has kept speech recognition from becoming a dominant form of computing is its unreliability. Regional accents and speech impediments can throw off word recognition platforms, and background noise can be difficult to penetrate. Simply recognizing sounds isn't enough, either: To be effective, systems need to distinguish between homophones (words with the same pronunciation but different meanings) and learn new words and proper names.
But it's getting closer. Meeker's presentation cited Andrew Ng, former Stanford professor and current chief scientist at Chinese search engine Baidu, as saying that 99 percent is the key metric: As accuracy in low-noise environments rises from 95 to 99 percent, voice recognition technology will expand from limited usage to massive adoption.
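Neither the article nor the figures it cites spell out how "accuracy" was measured. A common yardstick in speech recognition is word error rate (WER): the number of substituted, deleted, and inserted words needed to turn a system's transcript into a reference transcript, divided by the number of words in the reference. The Python sketch below assumes the accuracy percentages in this article can be read roughly as 1 - WER; that mapping, and the helper names `word_error_rate` and `word_accuracy`, are illustrative assumptions, not the methodology behind Meeker's numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed with a word-level Levenshtein (edit distance) table.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = fewest edits to turn the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost is 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Hypothetical 'accuracy' score, assumed here to be 1 - WER, floored at 0."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


if __name__ == "__main__":
    ref = "write a letter to my aunt"
    hyp = "right a letter to my aunt"  # a homophone swap counts as one substitution
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
    print(f"accuracy: {word_accuracy(ref, hyp):.3f}")
```

Under that reading, moving from 95 to 99 percent accuracy means going from about five misrecognized words per 100 to about one, a fivefold cut in errors, which is one way to see why the last few percentage points matter so much.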
As recently as 2010, Meeker's presentation says, industry leaders were hovering around 70 percent accuracy. Now, some are approaching that key 99 percent threshold. Here are some of the best, in order of accuracy.
1. Baidu
The "Google of China" is the country's biggest search engine, and at 96 percent accuracy, its voice recognition is better than most humans at identifying spoken words. The software it uses, Deep Speech 2, was developed in Silicon Valley and learned to understand words by listening to thousands of hours of recordings while simultaneously reading their transcriptions. The system understands both English and Mandarin, and it's growing in popularity in China, where voice commands appeal in part because typing Chinese, with its thousands of characters, is slow, and, of course, where Google is blocked by the Chinese government.
2. Hound
The Hound app, Silicon Valley company SoundHound's flagship product, is a digital assistant that launched in March. It answers spoken questions and completes tasks such as calculations, correctly identifying 95 percent of words in the process. Nine years in the making, the app also has a Shazam-like feature that identifies songs, in some cases even ones hummed into it. Founder Keyvan Mohajer told TechCrunch that his company started working on the technology before industry leaders such as Apple did, giving it a head start in building some of the best voice recognition technology available.
3. Apple Siri
Apple's Siri can be frustrating when it comes to finding answers, but as far as voice recognition goes, America's most-used personal assistant is near the top. At 95 percent accuracy, Siri outpaces all its fellow Silicon Valley giants. As for those faulty or nonsensical answers, in 2014 the company hired a team of speech recognition experts trained in deep learning. The assistant's accuracy and intelligence should keep improving, making it less likely that Siri will answer a request for help with a gambling problem by listing casinos.
4. Google Now
Google's voice search is 92 percent accurate and can be used through the Google app or for voice dictation on Android phones. Baidu's Ng, who used to work at Google, predicted that 50 percent of web searches will be performed using speech or images by 2019, and you can expect Google to lead that charge. Google has also done more work lately to improve accuracy in loud places, a feature that could help put it over the top.
5. Wit.ai
The Palo Alto startup was just 18 months old and had recently closed a $3 million seed round when Facebook acquired it in early 2015. At the time, it had already surpassed some longer-established companies, with accuracy rates in the low 90s. It's unclear exactly what Facebook will do with the company, but voice-controlled posts or Messenger messages seem likely.
6. Microsoft Cortana
Cortana, the Microsoft phone assistant now built into Windows 10, composes messages, performs searches, and sets calendar events in response to voice commands. It has been measured at above 90 percent accuracy, quite an improvement considering Windows 95 had an error rate of close to 100 percent.
7. Amazon Alexa
The Amazon Echo can do a lot (play music, adjust lighting, read recipes) without needing a screen or any manual activation. While the company won't reveal its internal word error rates, many users judge its word recognition to be a shade behind other voice platforms. The good news is that Alexa adapts to your voice over time, helping offset any trouble it has with your particular dialect. And while other assistants require the speaker to be within a few feet of their microphones, Alexa can pick up commands from the next room.