A new recorder — When Google debuted the Pixel 4 back in October, one of its more interesting features was the Recorder app with live transcription. Now that the device is in people’s hands (and the app has made its way to older Pixels), Recorder has become a fan favorite, especially among journalists and anyone who takes meeting notes. But how does Google’s transcription work? And how does it work offline, without the cloud?
Machine model — In a new blog post, Google explains that it has improved an on-device machine learning model called an "RNN transducer," which first made its debut in Gboard back in March 2019. The company says it “made sure that this model can transcribe long audio recordings (a few hours) reliably, while also indexing conversation by mapping words to timestamps as computed by the speech recognition model,” which lets users jump playback to any point by tapping the corresponding word in the transcript.
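To make the word-to-timestamp idea concrete, here’s a minimal sketch of how such an index could power tap-to-seek. This is an illustration only, not Google’s implementation: the word list, timestamps, and function names are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical example: each transcribed word carries the timestamp (ms)
# at which the speech recognizer emitted it. Tapping a word seeks playback
# to that timestamp; a playback position maps back to the word being spoken.
words = [("okay", 0), ("so", 420), ("the", 800), ("quarterly", 1050), ("numbers", 1900)]

def seek_for_word(index: int) -> int:
    """Return the playback position (ms) for the tapped word."""
    return words[index][1]

def word_at(position_ms: int) -> str:
    """Return the word being spoken at a given playback position."""
    timestamps = [t for _, t in words]
    # Find the last word whose timestamp is at or before the position.
    i = bisect_right(timestamps, position_ms) - 1
    return words[max(i, 0)][0]

print(seek_for_word(3))  # 1050
print(word_at(1000))     # the
```

Because the timestamps are sorted, the lookup is a binary search, which stays fast even for the multi-hour recordings Google says the model supports.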
Deep classification — Google also explained another of the Recorder app’s abilities: live audio classification. The company is apparently using a convolutional neural network (CNN), an architecture originally developed for image classification, to determine whether incoming audio is music, speech, or, say, a dog barking in the background. The goal, according to Google, is to make long recordings easier to visually search through.
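A rough way to picture the visual-search benefit: if a classifier emits one label per short audio frame, merging consecutive identical labels yields segments that can be drawn as a color-coded timeline. The sketch below assumes that frame-label output and elides the classifier itself (Google’s CNN is not reproduced here); the labels and function names are hypothetical.

```python
from itertools import groupby

# Hypothetical per-frame output from an audio classifier, one label per
# ~1-second frame. The real classifier (a CNN in Google's case) is elided.
frame_labels = ["speech", "speech", "speech", "music", "music",
                "dog_bark", "speech", "speech"]

def to_segments(labels, frame_sec=1.0):
    """Collapse per-frame labels into (label, start_sec, end_sec) segments."""
    segments, t = [], 0.0
    for label, group in groupby(labels):
        duration = len(list(group)) * frame_sec
        segments.append((label, t, t + duration))
        t += duration
    return segments

for label, start, end in to_segments(frame_labels):
    print(f"{start:4.1f}-{end:4.1f}s  {label}")
```

Rendering those segments as colored bands under the transcript would let a user scan hours of audio for, say, the stretch where music played, without reading a word.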
If you’re a frequent user of the Recorder app, or you’re just into machine learning, give Google’s blog post a read.