The Surprising Hazards of Sounding Like a Robot

The proliferation of A.I. has people more focussed than ever on the human voice.


Hillary Clinton is not a robot, but seeming like one might be her biggest political liability. There’s precedent for that: Marco Rubio’s pre-programmed demeanor doomed him to an onslaught of Donald Trump punches that all seemed to land. For Hillbot, the glitches — that characteristic pinching of her fingers into the Clinton thumb, the over-determined smiling, the heavily scripted policy speeches — are less of a problem than the UI. Her voice has been pounced upon as “robotic” with incredible frequency.

It’s a sexist criticism and should be called out as such. But it’s more than that. It’s indicative of our current cultural obsession with robot spotting, the same one fueling the enthusiasm for the new HBO series Westworld. That show is about androids, some disguised and some not, becoming increasingly human. There is a common theme here: The way beings express themselves governs the emotional reactions of humans. You are how you speak or, more precisely, you are how you are heard.

Juliana Schroeder is a social cognition expert at the University of California, Berkeley, who has studied gestures and how they affect our understanding of what it means to be human. This past August, she and behavioral scientist Nicholas Epley published a study in The Journal of Experimental Psychology that focused on how the human mind processes the voice. Schroeder likens her work to the recently viral study of how hard it is to detect sarcasm (or its absence) in text messages and email; the human voice carries more to the listener than just the communication of ideas.

“There’s something about the voice that can accurately convey complex reactions — maybe beyond conveying information, voice actually signals that they have a mental capacity, that a person seems more thoughtful and more emotional,” Schroeder says. She calls this quality of humanness “mindful”: the sense that a sentient being possesses thought, emotion, and intelligence, a “mindfulness” that indicates, to a certain extent, that there is an identity and soul within the person.

Schroeder and Epley ran an experiment using a bot-produced script and a human-written one, pairing each with a human voice. They found that voice was integral to a person’s conception of the script: People who heard a voice reading the bot-produced script could usually tell it was machine-made (“People can tell, the text is just so weird”) but hesitated just because a human was reading it. The researchers then tried a video in which a person read the text on mute while it ran as subtitles, and found that people weren’t fooled into thinking a bot was a human, or vice versa.

“Voice is humanizing,” Schroeder says simply.

That’s where this crazy election comes in. In an upcoming, unpublished experiment, Schroeder took Clinton and Trump supporters and had people from each camp either read statements or hear a person give those statements. “There are paralinguistic cues in a person’s voice,” she said. “There’s something about the variance of the tone of the voice that is giving these signals that there’s a mind behind those words.” And that’s important in understanding the other side: You probably won’t be persuaded to agree with the opposition, but you’ll view them as human.

Or, alternatively, you’ll accuse them of being robotic in an attempt to fundamentally disengage with their humanity. Hillbot and MechaTrump are easier to ignore than their fleshy counterparts.

It is interesting that Schroeder and Epley’s experiments hinge on the idea that a voice signals emotion and a more human-like presence, since today’s artificial intelligence has made voice a central aspect of our digital experience. Yet humans aren’t falling for it. The vice presidential debate moderator, Elaine Quijano, was teased for her almost robotically soothing voice, and critics have wrestled with the overabundance of female autobots: Siri, of course, but there’s also Amazon’s Alexa, Microsoft’s Cortana, and the nameless GPS guide.

But Schroeder and Epley’s experiments show that humans are smart enough to discern a human from a non-human, even if the non-human is trying to fool us with a “voice.” Voices that modulate and vary (whether we’re shouting, bellowing, laughing, heaving, or what have you) are clues to our audience that we’ve got emotions too, and our brain interprets that variation, rather than the monotony of artificial intelligence’s current “voice,” as an indicator of a human. This could be a defense mechanism inherited from our ancestors: We are programmed, so to speak, to recognize voices as those of our tribe and to be particularly clued in to what a voice is conveying, so we know if someone’s going to pull a Judas on us. Or it could just be that we’re inherently suspicious of machines and predictable movements, whether they come from a factory robot or C-3PO or Hillary Clinton.

In other words, we think of a being that sounds and acts like a human — whether it be on Westworld or on the campaign trail — as more convincing and thoughtful and less frightening than, well, a robot.
