Microsoft Invents Better-Than-Human Speech Recognition

Oct. 18, 2016

Microsoft has become the world’s first company to develop speech recognition software that is more accurate than human transcribers. In the paper “Achieving Human Parity in Conversational Speech Recognition,” published Monday, Microsoft researchers report that their system’s transcripts contained slightly fewer errors than those produced by professional human transcribers. That is a remarkable result, given how good people are at understanding speech. The breakthrough opens the door to A.I. assistants that understand speech more accurately than ever before.

It’s impressive how far the technology has come in such a short time. Not long ago, computer speech recognition was an odd niche, and human-level accuracy seemed hopelessly distant, as a 2006 demonstration of Windows Vista’s speech software illustrates.

The team evaluated its system on the National Institute of Standards and Technology (NIST) 2000 test set, an industry-standard benchmark for transcribing conversational speech. The test set consists of recorded telephone conversations between two participants. Each transcript is aligned with a reference transcript, and accuracy is measured as the word error rate: the share of words that were substituted, deleted, or inserted.
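Word error rate is the minimum number of word substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of words in the reference. The minimal Python sketch below computes it with a standard word-level edit distance. The function name `word_error_rate` is hypothetical, and official benchmark scoring typically normalizes the text first (for example, spelling variants and filler words), which this sketch does not attempt.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            mismatch = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + mismatch,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substitution (sat -> sit) plus one deletion (the), over 6 words.
    print(f"WER = {word_error_rate(ref, hyp):.3f}")  # prints WER = 0.333
```

Because insertions also count as errors, a transcript padded with extra words can score a word error rate above 100 percent, which is why the metric is always divided by the reference length rather than the transcript length.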

On the Switchboard portion, in which two strangers talk for the first time, professional human transcribers had an error rate of about 5.9 percent. On the CallHome portion, in which friends or family members talk, their error rate was about 11.3 percent. Microsoft’s software matched or came in slightly below those human error rates.

The breakthrough will help bring new forms of immersive A.I. In August, student Joshua Browder unveiled his DoNotPay chatbot, which helps homeless people get free legal advice. Combine tools like this with better speech recognition, and it’s easy to picture a future in which people get help with housing simply by having a regular conversation with a virtual assistant.

Researchers are now exploring how A.I.-powered speech recognition can produce smarter responses. Sensay, an anger-detecting A.I. from the lab behind Siri, uses advanced recognition capabilities to detect whether a user is angry or confused and adjusts its answers to suit the situation. Removing the barrier of error-prone voice recognition creates exciting new opportunities for virtual assistants.
