OpenAI, an artificial intelligence research laboratory, has released a new project called Jukebox that generates music, including human singing, in a variety of genres and artist styles. In a blog post, the company shared a couple of samples of songs "in the style of" artists including Katy Perry and Elvis Presley.
Jukebox accepts genre, artist, and lyrics as input, and then creates a new sample from scratch. OpenAI trained the project using 1.2 million songs from across the web paired with lyrics and metadata from LyricWiki. You can fine-tune this metadata down to the year songs were released, which is how the project can create music specifically in the likeness of Katy Perry if you ask it to.
The samples don't always sound exactly like the artist named, but the similarity is impressive.
OpenAI didn't explain exactly why it thinks you'd want to create such synthetic songs. The music creation process has been upended over recent years by services like Splice that make it easy to purchase samples and loops and put together a song without having to do much of that work from scratch. "Old Town Road" was famously created using samples found online, and it is far from the only example. Who knows, maybe Jukebox will enable you to someday create a country voice for your track even if you don't have your own Billy Ray Cyrus. Or cut out the Billy Rays entirely.
Ethical questions — As the "deepfaked" videos of President Obama demonstrated, rapid advancements in AI are making it increasingly difficult to tell whether content online is fake or genuine. Online mobs can be swift and reactionary when they decide someone needs to be canceled for their sins. What happens if an audio deepfake is created to make it seem as though a celebrity or public figure has said something offensive?
Jay-Z recently had a video removed from YouTube that utilized AI to create a humorous track in his likeness.
As they make such deepfake technology, researchers including those at OpenAI are simultaneously developing algorithms to detect fake video and audio content. These algorithms look for inconsistencies that wouldn't appear in authentic content. The theory is that those researchers developing deepfake technology will have the best ability to counteract it. But the existence of synthetic content is yet another challenge in fighting misinformation online that we really don't need right now.
Legal thorniness — Likeness has also been the subject of many lawsuits in the music industry, with artists going to town on one another for making songs that sound similar to an existing hit track. Musicians' whole careers are based on their voice and their musical style, especially today, when many hit tracks aren't written by the artists themselves but rather by someone else entirely, using a formula. If Katy Perry's or Ed Sheeran's voice can be perfectly imitated by anyone, they basically lose the only thing that makes them sellable.
OpenAI says that it is conducting research into the intellectual property rights surrounding such generative audio. But it believes that collaboration between humans and computer models could unlock further creativity in the space, though the musicians it has shared the project with have said that they "did not find it immediately applicable" due to its current infancy and limitations.