Will Virtual Reality Get Lost in the Uncanny Valley of Sound?

Audio engineers struggle to figure out how to build virtual realities with audio and wonder — is it even possible?

If you know anything about robotics or CGI or have ever seen an animated Robert Zemeckis movie, you know about the “Uncanny Valley,” the dip in the relatability of manufactured beings as they progress from being representational to photorealistic. Fortunately for filmmakers, the visual lowlands are relatively easy to avoid: Just film real life. But the geography of human reactions is complicated by virtual reality, which has more than one valley. According to acoustic engineer Francis Rumsey, truly immersive experiences will be difficult because the media has trained us to be comfortable with representational sound. When things get real, sound doesn’t.

Visuals get the front page (for obvious reasons), but we’re quietly entering an age of more advanced speaker technology. Binaural audio, for instance, records with a microphone close to each ear, while other approaches drastically increase the number of loudspeakers in a sound system. Rumsey, the technical chair of the Audio Engineering Society and the director of sound consulting company Logophon, tells Inverse that this represents progress, but not a breakthrough. “In many cases what we would call the timbral quality of sound (that’s the sound of color, and what novices think of as sound quality) has not necessarily increased correspondingly,” he explains. “And that led me to wonder whether there was something rather similar to what has happened in the video animation field.”

Rumsey believes there is. He hasn’t seen the valley, but, while working with sophisticated virtual reality systems, he’s heard the wind blow through it. The best systems out there right now use surround sound or wave field synthesis, both of which rely on a large number of loudspeakers to simulate movement and the audio effects of an environment. According to Rumsey, “the results are not as convincing as we’d hope them to be.”

The problem isn’t so much in our stars as in ourselves. In the past, traditional two-channel stereo has served us well; it’s unsophisticated but hard to break and well suited to being blasted at people who are all looking in the same direction and not moving. The only time this sort of audio causes a problem when paired with traditional video is when it falls out of sync with the picture. That is frustrating (or funny), but it is relatively easy to solve: it’s a matter of timing, not resonance.

Binaural audio and other virtual reality sound reproduction methods have the potential to be very good, Rumsey says, “but also potentially very odd and disturbing if they’re not done absolutely perfectly.” He likens it to Masahiro Mori’s theory that the level of realism ultimately determines whether something jumps from cartoonish to uncanny (Rumsey prefers “unnatural” for environmental audio and “uncanny” for human spatial voices, as similarly proposed by his AES colleague Glen Dickens).


The question facing Rumsey and a lot of audio engineers is whether or not we can build a bridge across the valley capable of suspending our disbelief.

As forward-looking as Rumsey is, he’s not optimistic here. The uncanny valley isn’t a completely mapped-out psychological landscape. Some researchers doubt its existence or think that it might disappear with conditioning, but other scientists have proposed an uncanny cliff, in which attempts to climb back up the other side toward the human ultimately fail. Zombies all the way down. For uncanny or unnatural sound, Rumsey falls into the latter camp.

“With many types of spatial sound synthesis,” he says, 100 percent accuracy is “not actually possible. You can never get the loudspeakers small enough, or close enough, so we may always be chasing an ever-decreasing sort of level of accuracy. It may be an asymptote, if you like.”

The virtual realities we visit may be profoundly realistic, but the sound will stay animated. Or not. It is possible that there is another side to the valley, but we don’t know. Audio engineers like Rumsey can only make noise and listen for echoes.

But there’s one reason to stay awhile, if not relax, in this descending pit of aural creepsounds: A wonderfully disturbing sense of horror. In the 2009 paper “The audio Uncanny Valley: Sound, Fear and the Horror Game,” music professor Mark Grimshaw, then at the University of Bolton in the U.K., proposes that freaky sounds are desirable in certain settings. He offers, by way of example, the sonic bumps in the night of games like Left 4 Dead, in which “a wolflike howl heralds the swarm’s attack and it is the predatory denotation and lycanthropic connotation that is designed to send a chill up the spine.” Such predatory and inhuman screeches, Grimshaw says, could be the reason why nails-on-a-chalkboard sounds are so jarring: It’s the sound of something going to eat you, or the cry of “proto-human” alarm. How to harness the uncanny? Grimshaw’s tips: exaggerated mouth movements during speech, strange twists on familiar sounds, and aural fidelity noticeably crappier than the visuals.

When it comes to 3D spatial sound, you could mess with the binaural phase so that “the sound to one ear is pushing where the other one is pulling,” according to Rumsey. Otherwise, intentionally altering voices to make them artificial or odd (the Bane effect) isn’t really a new idea. “People have done that forever,” Rumsey says. He’s worried more about our unintentional flubs, because the auditory equivalent of rubberized Tom Hanks is all the more disturbing when you’re least expecting it.
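For the curious, the “pushing where the other one is pulling” trick can be roughed out in a few lines of Python. This is only a minimal sketch, assuming Rumsey means a simple polarity flip (a 180-degree phase shift) between the two ears; the article does not spell out his method. The function names, the 440 Hz test tone and the 44.1 kHz sample rate are all illustrative choices, not anything taken from Rumsey.

```python
import math

def sine_wave(freq_hz, duration_s, sample_rate=44100):
    """Generate a mono sine tone as a list of floats in [-1, 1]."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def invert_phase(samples):
    """Flip the polarity of every sample (a 180-degree phase shift)."""
    return [-s for s in samples]

def out_of_phase_stereo(mono):
    """Return (left, right), where right is the inverted copy of left.

    Assumption: this is the simplest reading of 'one ear pushing while
    the other is pulling'; real binaural processing is far more involved.
    """
    return list(mono), invert_phase(mono)

# Hypothetical test signal: 10 milliseconds of a 440 Hz tone.
left, right = out_of_phase_stereo(sine_wave(440.0, 0.01))
```

At every instant the two channels are equal and opposite, so they cancel if summed to mono. Played over headphones, each ear gets a mirror image of the other's signal, which is the kind of intentionally odd cue the paragraph above describes.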