Microsoft's speech recognition tech is now more accurate than ever before

Geronimo Vena
August 21, 2017

Previous reports have put the human word error rate at 5.1%, which means Microsoft's speech recognition system is effectively as accurate as humans.

This milestone means that, for the first time, a computer can recognise the words in a conversation as well as a person would.

"Our research team reached that 5.1 percent error rate with our speech recognition system, a new industry milestone, substantially surpassing the accuracy we achieved a year ago", Microsoft said in a blog post late on Sunday.

An announcement from Xuedong Huang, a Microsoft Technical Fellow, reveals that the firm's latest tests show a word error rate of 5.1%, matching human parity and improving on the 5.9% previously announced, which was already better than that of a regular, casual human conversation and twice that of a Loose Women panellist.

Last October, the team from Microsoft Artificial Intelligence and Research reported a speech recognition system that makes the same or fewer errors than professional transcriptionists, with a word error rate (WER) of 5.9%. The new 5.1% figure is down from that.

Both systems are benchmarked against the Switchboard corpus, a dataset of recorded telephone conversations that speech research technologists have been using for over two decades to measure the capability of transcription systems.
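
For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. It is not Microsoft's scoring code, the function name and sample sentences are made up for illustration, and real benchmark scoring also normalises text (case, punctuation, hesitations) before counting.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programme).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # reference word missed (deletion)
                curr[j - 1] + 1,                  # extra hypothesis word (insertion)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences, not taken from the Switchboard corpus.
reference = "can you call me back after the meeting ends"
hypothesis = "can you call me back after a meeting ends"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

On this scale, a 5.1% rate corresponds to roughly one wrong word in every twenty.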

Additionally, Microsoft's investment in cloud compute infrastructure, specifically Azure GPUs, helped improve effectiveness and speed. Microsoft's speech recognition system is used in many of its services, such as Cortana, Presentation Translator, and Microsoft Cognitive Services. That's a pretty huge achievement on Microsoft's part, as it's been trying to reach human parity for the last 25 years.

"Moving from recognising to understanding speech is the next major frontier for speech technology", the post added.
