Nobody expected the world’s first A.I.-generated pop song to be good, especially one titled “Daddy’s Car.” But the song, released in September by Sony CSL, is unexpectedly catchy: a swirling, surf-rock ditty with Magical Mystery Tour-era Beatles melodies and a vaguely Scandinavian vocal. In fact, it’s all but indistinguishable from human compositions. The song passed the creative Turing Test, meaning that listeners were left to feel duped, played by software.
Pop music has been evolving from its acoustic origins toward electronic semi-autonomy ever since synthesizers, arpeggiators, and autotune became mainstream instruments, but the release of “Daddy’s Car” marked the first time it had broken free of human control altogether and taken on a computer-driven life of its own. As A.I.-composed music becomes more sophisticated, the implications for human composers and listeners alike will only become more difficult to grapple with.
But we can rest assured for now, say Dartmouth College computer science and music professor Michael Casey, Ph.D., and Carnegie Mellon musician and computer scientist Gus Xia, because the music is good but isn’t perfect. Here, they tell Inverse the four telltale signs that a song was generated by a robotic songsmith.
From the Weeknd’s “Starboy” to the Chainsmokers’ “Closer,” every pop song on the Billboard Hot 100 can be broken up into discernible parts: the individually well-defined verse, pre-chorus, chorus, and bridge, together with the occasional intro and instrumental breakdown, are part of what makes pop so easy to identify and digest. The formulaic simplicity of a pop song may suggest that composing one would be an easy, paint-by-numbers sort of affair for an A.I., but Xia, who also composes classical music, explains that fitting together many discrete musical parts is actually really difficult.
“If you have a sonata, then we have a beginning, an introduction, a recap, a cadenza, sometimes,” he says. “This kind of music structure is very hard to automate.” The same idea, he explains, applies to pop music. In “Daddy’s Car,” it’s easy enough to parse out a verse, chorus, and pre-chorus because they involve different vocal melodies and are broken up by instrumentals, but a closer listen will reveal that the underlying chord progressions for each of them are simple, involving four chords at most, often in the same order.
The structure of a human-made pop song, Xia says, shouldn’t just be “clear” but also “complex.” “If there is not [a complex structure], just a simple chord progression following the tune, though it sounds very interesting, it may be created by an artificial program.” Make of that what you will: The entirety of Mark Ronson and Bruno Mars’s “Uptown Funk” revolves around two chords.
The lead singer on “Daddy’s Car” may sound a bit weird, but not to the point where his provenance is suspect. At worst, he sounds like Swedish folk-pop artist Jens Lekman with a bad cold, which is remarkable for a voice that was generated entirely by A.I. We already have a hard enough time discerning human from robotic voices, Casey notes, because we’ve gotten so used to over-processed vocals and autotune: Just think of Kanye West’s “Welcome to Heartbreak” or literally any song by T-Pain. “With all the roboticization of the human voice in pop music, it’s not surprising that robot voices can pass for human,” he laughs. (It’s worth noting, however, that research has shown that people are actually pretty good at discerning human voices from robotic ones.)
For now, we can listen carefully for the less polished details of a natural vocal performance, like vocal cracks, tics in pronunciation, or sharp intakes of breath (although those are apparent in “Daddy’s Car”). Distinguishing between the two isn’t going to get any easier, however, and, as Xia adds, it might not be long until we start to prefer robotic voices to human ones. Already, holographic pop stars, voiced by the Yamaha-backed vocal synthesizer known as Vocaloid, are making it big in Japan. “The young generation, in Japan, they are crazy about that,” Xia says. “But maybe a more interesting and actual problem — what does matter — is do humans really like 100% natural human sound, or do they like a little robotic sound?”
Although the lyrics of “Daddy’s Car” were written by a human musician named Benoît Carré, they’re only vaguely sensible. “In daddy’s car, it sounds so good / like something new, it turns me on” could very well have been written by an A.I., suggests Casey, because the words can mean anything. Our best bet for spotting robot lyrics, he explains, is paying attention to word choice and themes. While we’ve learned what ideas are common and appropriate to sing about in pop songs (love, loneliness, sex, and, occasionally, politics), the robotic default isn’t set to anything. Referring to Hong Kong University’s recently unveiled rap-generating robot, he notes, “There’s certain words you’d never use in a rap. Mundane words. Unless you were a really awesome rapper, how are you going to talk about the laundry?”
The greatest advantage human composers have over robotic ones, Casey concludes, is having experienced what it means to listen — and how it feels to get bored. “We intuitively can access when we are surprised or bored by repetition,” he says. “For a computer, it’s just repetition.”
A pop song can’t be catchy without repeated elements. A recurring chorus gives a song structure, in much the same way that certain melodic licks (think of the now-classic “dolphin sound” in Jack Ü and Justin Bieber’s “Where Are Ü Now”) can act as a song’s unifying element. But with too much repetition, a composer runs the risk of boring listeners, and this is something algorithms don’t yet know how to avoid.
The hallmark of human composers, Casey explains, is that they know how to play with the expectations that “you yourself as a listener are forming as you’re listening to the song.” While the types of musical surprises would be easy enough for an algorithm to identify (they can involve anything from an unexpected key change to a chorus that comes a beat too early), it isn’t easy to drop them into a song in a way that actually surprises. “It’s how you do that repeat, and the nuance of what you don’t exactly repeat,” he says. “That’s actually where the music lives.”