CEO of Startup Using AI to Transcribe Languages Tells Us How It Beat Google

Speechmatics has big plans.

Unsplash / Jason Rosewell

Speechmatics is going after Google. Last month, the British startup unveiled Automatic Linguist, a powerful artificial intelligence that can learn any language for speech-to-text transcription in a matter of days. The team wants to extend the technology to every one of the approximately 7,000 languages in the world, a goal with the potential to transform lives.

Since launching the linguist tool, Speechmatics has been working on Omniglot, a challenge to build a language a day. Last week, the company hit a big milestone: with 72 unique languages in total, it now supports more than Google.

The system uses machine learning to match audio data with its transcript counterpart. It then draws on linguistic patterns from other languages to make the process as straightforward as possible, identifying similarities between sounds and grammatical structures and applying them to new languages. The process is highly effective: Speechmatics' work on Hindi, for example, took just two weeks to reach 80 percent accuracy. When the final product was tested against Google's, it made 23 percent fewer mistakes.

Inverse spoke with Benedikt von Thüngen, CEO of Speechmatics, to find out more.

How did your team first start working on the project?

We started project Omniglot as a challenge to ourselves: to see how many languages we'd be able to build in six weeks. We realised a while back that the traditional approach of building each language individually is no longer viable when looking to scale at a rapid rate. Bearing that in mind, we had to rethink what a language is, how it is structured and what similarities there are between different languages. We have found a way to use those commonalities to identify patterns and help our A.I.-powered framework, Automatic Linguist (AL), build languages faster than ever: 46 in six weeks to be precise, or about one language a day!


How does this differ from Google’s efforts?

Our approach to language building is one of the main aspects that differentiates us from Google. While we assume that they build their languages individually (or by what we call ‘brute force’), we’re using the power of A.I. to streamline and speed up the language-building process. Additionally, while other services like Google focus on building dialects rather than unique languages, we’re proud to say that our efforts have been focused on unique languages from all over the world, including areas which have been previously underserved by the big technology companies.

The team's progress.


What are some of the real-world applications for this?

We now have the tech and knowledge to make our service more far-reaching than ever before and bring automatic speech recognition (ASR) to everyone. This is particularly relevant in countries with low literacy rates, where being able to communicate through previously unavailable speech-to-text technology can make all the difference. Other real-life cases where ASR technology can help involve accessibility: hearing- or vision-impaired people all over the world can now use a device as simple as a phone to interact with those around them.

Does this improve the accuracy of well-covered languages like English?

As we continue developing more languages, our A.I. framework will become increasingly adept at identifying linguistic features and patterns. We will use this knowledge to continue perfecting our current language base, including English.

Could this improve something like the Google Pixel Buds’ real-time translation tools?

We definitely see projects like Omniglot helping to improve real-time translation tools going forward. As more resources are invested in expanding the reach and accuracy of languages, we will see continuous improvement in the translation services sector.

The countries with a language that is covered by Speechmatics.

Does this work with any language, even constructed languages like Klingon?

We've yet to try building any conlangs, but we don't see any reason why they wouldn't work. Because these languages are still spoken by humans, they follow the same kinds of structural rules and constraints as natural languages (such as a limited number of phonemes), which would give AL enough to work with for a build.

Are you open sourcing the project?

No, we do not have plans for that in place.

How will licensing work?

The languages offered under project Omniglot are free of charge and can’t be used for commercial purposes. As such, there will not be any licensing attached to them for the foreseeable future.

What are the next steps from here?

Project Omniglot is just the start for us. We want to eventually build every language in the world, so we’ll be working hard towards that goal!
