A Canadian machine learning outfit has received ringing endorsements in the voices of Donald Trump, Hillary Clinton, and Barack Obama, all discussing its technology. Only it’s not actually them; it’s an artificial intelligence-powered imitation that’s scary-accurate.
Montreal-based LyreBird uses machine learning to produce realistic-sounding voice audio modeled on preexisting voices, and a listener might not know the result is synthetic. By analyzing the voices of all three U.S. political figures, it was able to (openly) produce all three fake quotes. And while America may need such an unlikely alliance of political ideologies, the public is anything but enthused.
LyreBird takes a sample of any voice, at least one full minute of speech, and looks at the aspects of the voice that define it, that differentiate it most reliably from any other voice in the world. It then takes these reliable deviations from the platonic ideal of an English voice and tells its voice-synthesis component to make those same adjustments to its audio “wave-forms” as those sound curves are generated. By doing this with enough specificity, the voice generated can have not just the general sound or accent of a person, but the quirks and minor tics, as well.
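To make that idea concrete, here is a rough sketch in Python of "measure how a voice deviates from an average voice, then apply those same deviations to a generic synthesizer." It is an illustration only, not LyreBird's actual method: the feature names (pitch_hz, rate_sps), the numbers, and every function name are assumptions invented for the example, and a real system would work with far richer acoustic features than two numbers.

```python
from statistics import mean

# Hypothetical per-voice features: average pitch in Hz and speaking rate in
# syllables per second. Names and values are illustrative assumptions only.

def average_voice(voices):
    """Average each feature across a reference population of voices."""
    keys = voices[0].keys()
    return {k: mean(v[k] for v in voices) for k in keys}

def speaker_deviation(sample, baseline):
    """How far one speaker sits from the baseline on each feature."""
    return {k: sample[k] - baseline[k] for k in baseline}

def apply_deviation(generic, deviation):
    """Shift generic synthesis settings by the speaker's deviation."""
    return {k: generic[k] + deviation[k] for k in generic}

# A made-up reference population standing in for "the average voice".
population = [
    {"pitch_hz": 110.0, "rate_sps": 4.0},
    {"pitch_hz": 210.0, "rate_sps": 5.0},
    {"pitch_hz": 160.0, "rate_sps": 4.5},
]
baseline = average_voice(population)

# A made-up target speaker, as if measured from a one-minute sample.
target = {"pitch_hz": 125.0, "rate_sps": 3.5}
deviation = speaker_deviation(target, baseline)

# Start from the generic voice and nudge it toward the target speaker.
settings = apply_deviation(baseline, deviation)
```

In this toy version the adjusted settings simply land back on the target speaker's measurements; the point is only that the speaker is stored as a set of offsets from a baseline, which can then be applied whenever new audio is generated.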
In the era of “fake news,” it should not be surprising that this ability to fake quotes from real people has drawn comment about its potential impacts on society. The company cannot possibly have been taken off guard by this (view their ethics statement here), but as of this writing Inverse has not received a response to an emailed request for comment.
It was only a few months ago that Inverse reported on the development of DeepVoice, a system for easy, novel voice synthesis. In that report, we noted that the ability to modulate the pitch of a word would allow the A.I. to produce the impression of emotion in its words. Though LyreBird doesn’t have the real-time speediness of DeepVoice, it does basically the same sort of pitch modulation to produce inflection in its quotes. As the company’s website says, LyreBird “can mimic a person’s voice and have it read any text with a given emotion.”
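For a sense of what pitch modulation means at the level of the audio itself, the Python sketch below generates a plain tone whose pitch glides up or down over half a second, roughly the difference between a question-like and a statement-like inflection. This is a toy under stated assumptions, not how DeepVoice or LyreBird works: the function names, the 120 to 180 Hz range, and the 8000 Hz sample rate are all chosen for illustration.

```python
import math

def pitch_contour_tone(start_hz, end_hz, duration_s=0.5, sample_rate=8000):
    """Synthesize a sine tone whose pitch glides linearly from start_hz to end_hz."""
    n = int(duration_s * sample_rate)
    samples = []
    phase = 0.0
    for i in range(n):
        # Interpolate the instantaneous frequency for this sample.
        freq = start_hz + (end_hz - start_hz) * i / max(n - 1, 1)
        # Accumulate phase so the waveform stays continuous as pitch changes.
        phase += 2 * math.pi * freq / sample_rate
        samples.append(math.sin(phase))
    return samples

def zero_crossings(samples):
    """Count sign changes, a crude proxy for how high the pitch is."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))

# Illustrative contours: an upward glide and a downward glide.
rising = pitch_contour_tone(120.0, 180.0)
falling = pitch_contour_tone(180.0, 120.0)
```

Counting zero crossings in each half of the rising tone shows more of them in the second half, which is the higher pitch a listener would hear as the voice lifting at the end of a phrase.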
The stated uses for the product are to create new voices for “people with disabilities, for animation movies or for video game studios.” All but the use for people with disabilities seem questionable. A person who has lost their ability to speak unassisted might very plausibly want to make a machine voice sound like their old speaking style, but how many video game companies would realistically find a use for this?
The real question is whether these machine-identified verbal tics could be exaggerated to make a comedic parody — machine impressions that apply ruthless comedic insight to find our most defining verbal qualities, and make fun of them.
Combined with the new FaceApp photo modification software, LyreBird marks the beginnings of trends that could change the very concept of evidence. At the very least, it could (finally) extend the threat of automation to Alec Baldwin and his terrible Trump impression.