Are humans better than AI at detecting deepfakes? It’s complicated.

In one study, the leading algorithm identified deepfakes with just 65% accuracy. In another, humans seem to perform better... sometimes.

Within the past decade, image-altering technology has thrust us into the unnerving world of deepfakes. The forged videos, named after a Redditor called ‘deepfakes’ who popularized the practice, rely on machine learning tools to produce startlingly convincing face swaps. Gone are the days when advanced forgery was limited to big movies with huge CGI budgets: now, anyone with a working knowledge of neural networks and a consumer-grade GPU can take part.

Some deepfakes are relatively harmless, such as Donald Trump’s face plastered onto Kevin’s character in The Office, an alternate Game of Thrones ending, or the very inexplicable “Dr. Phil but everyone is Dr. Phil.” But others are ethical catastrophes that amount to financial fraud, national security threats, fake celebrity porn, and more.

While industry and governments have put forth efforts to limit deepfake use, bans are almost impossible to enforce because, at this time, no one can accurately say whether a video is real or not. The best deepfakes leave no pixelated evidence of a messy edit; their artificiality is virtually undetectable.

Algorithmic detection — Tech companies, ill-equipped to judge the veracity of videos, desperately need a detection system. So in 2019, some of them teamed up to present their challenge to the world. The competition was called the Deepfake Detection Challenge, an attempt “to spur researchers around the world to build innovative new technologies that can help detect deepfakes and manipulated media.”

The Deepfake Detection Challenge released a massive public dataset of 100,000 total clips and sent participants off to the races. After entries by 2,114 teams, Belarusian engineer Selim Seferbekov claimed the first prize of $500,000 for his winning model, which was 82.56 percent accurate on the public dataset and, when presented with brand-new videos, achieved an accuracy of 65.18 percent.

How do humans stack up? The algorithmic detection system’s accuracy score is about the equivalent of a “D” grade in school: certainly not perfect. But what if computers aren’t the sole answer?

“There was all sorts of panic about deepfakes, and we wondered, maybe people are actually decent at detecting digital manipulations,” said Matt Groh, an MIT Ph.D. student who is one of the authors of a recent paper in the Proceedings of the National Academy of Sciences. While his previous research has harnessed AI neural networks for tasks like dermatologist-level classification, he wanted to tap into the facial recognition skills of humans, rather than AI, to see how we stacked up.

“The fascinating thing about deepfake manipulation compared to other forms of manipulation is that it involves faces. And humans — even babies — tend to be really good at identifying faces,” he told me over the phone.

So he and other researchers put together a series of media snippets for humans on the internet to judge. Try it out for yourself!

The findings — The research found that people are better than AI, though still not very good, at telling deepfakes from genuine videos. When 882 participants were shown side-by-side videos (one real, one deepfake), 82 percent outperformed the winning AI model. Way to go, humans! Interestingly, the research participants’ scores didn’t improve with more practice or more time spent.

In a more challenging task, participants viewed a single video and guessed whether it was a deepfake or not, moving a slider to report their response ranging from “100% confidence this is NOT a DeepFake” to “100% confidence this is a DeepFake.” Here, the results were different. Only some people, between 13 percent and 37 percent, performed better than the leading AI model.

Human-AI collaboration — Humans and AI both got a lot of the questions wrong, but each struggled with different aspects of deepfakes. The humble Homo sapiens struggled with dark or blurry settings, but unlike the AI, we navigated unfamiliar angles or zoom like pros. People could think critically about context clues (would Kim Jong-un really say that?) but couldn’t match the computer’s skills in detecting visual clues like geometric shadows, reflections, and perspective.

What’s the takeaway? Groh sees this research as support for content moderation systems that include people rather than exclude people. While deepfake detection is still in early stages, the best identification takes place when humans are in the loop. “Algorithms are cheap and convenient when they work, but they’re prone to unexpected errors that humans would not have. The path forward is human-AI collaboration.”