Advanced Video Rendering Will Allow Virtual Assistants to Take Human Form

Virtual assistants may soon be able to smile, express concern, and even blush.

Bret Hartman / TED

When we queue up future virtual assistants to get advice on how we should dress for the day’s weather or which route to work will be quickest, the answers may soon look like they’re coming from a living, breathing human being thanks to advanced digital rendering techniques.

That’s according to Doug Roble, senior director of software research and development at Digital Domain, an Oscar-winning visual effects studio in California. In a 15-minute TED Talk, presented at TED’s annual conference in Vancouver this week, Roble explained how his team merged 3D motion capture and machine learning to generate hyper-realistic human models.

These models are a lot more expressive than a disembodied voice could ever be: they can mimic subtle facial mannerisms, map eyelash movement, and even recreate the appearance of blood flow to make their cheeks look flushed. Roble demonstrated the technology by stepping on stage fully suited in motion capture gear, which he then used to render a mirror image of himself, called DigiDoug, on the screen behind him.

Roble demoing DigiDoug on stage at TED2019.

TED 2019

Roble said being able to recreate such lifelike humans could revolutionize the film industry, bring faraway friends, family, and colleagues closer together, and give voice assistants digital flesh and blood.

“This is going to be used to give virtual assistants a body and a face, a humanity,” he said. “I already love it that when I talk to virtual assistants they answer back in a soothing human-like voice, now they’ll have a face and you’ll get all of those non-verbal cues that make communication so much easier.”

Past psychological studies have provided evidence that certain facial expressions elicit emotional responses, a facet of communication that voice assistants simply can’t access, at least for now. But Roble and his team at Digital Domain have created a method to give Siri and its counterparts a mug.

A neural network uses motion capture data from Roble's face to move a virtual model of his face in real time.

TED 2019

Using a device called the “light stage” at the University of Southern California, Roble captured thousands of images of his face under various lighting conditions. This allowed him to build a strikingly realistic 3D model of his face, which a deep neural network manipulates using real-time motion capture data.

So far, the database is only made up of Roble’s face, but with more volunteers, it could be used to render a body for Alexa. This way the assistant can connect with users on another level and potentially even be able to provide emotional support further down the line.

“You’re going to be able to tell when a virtual assistant is busy, confused, or concerned about something,” said Roble.

In other words, we might all soon be able to meet our virtual friends face-to-face, a step that could enable more intimate and fulfilling interactions with our technology than ever before.