Three researchers at Cornell Tech in New York City have discovered that blurred and pixelated images are no match for artificial intelligence. Though obscured images remain incomprehensible to human eyes, and so seem to protect their sensitive content, neural networks can often tell exactly who’s who in the original image.

In other words, humans are no longer the litmus test. We can no longer ask merely whether something defeats all human brains. A.I.s — even simple A.I.s — can outperform humans, so defeating them, too, must always be part of the equation.

The Cornell Tech researchers’ study focused on testing privacy-preserving algorithms, which blur or pixelate sensitive information or parts of pictures. Previously, we trusted privacy-preserving software and algorithms implicitly, figuring that the information they obscured was secure because no human could tell who was behind the digital veil. The study shows that that era is over, and related anonymization methods won’t last long either. Neural networks, met with these privacy measures, are unfazed.

A pixelated image that, to humans but not necessarily machines, obscures identities.

Richard McPherson is a Ph.D. candidate in computer science at the University of Texas, Austin, who followed his professor, Vitaly Shmatikov, to Cornell Tech. Together, along with Reza Shokri, they demonstrated that simple neural networks could unmask common image obfuscation techniques. The technique is relatively unsophisticated, which makes the discovery more worrisome: These are common, accessible methods, and they were able to defeat the industry norms for obfuscation.

Neural networks are big, layered structures of nodes, or artificial neurons, that mimic the basic structure of the brain. They’re “based off of a simplified understanding of how neurons work,” McPherson tells Inverse. “Give it some input, and the neuron either fires or doesn’t fire.”
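That fire-or-don’t-fire behavior can be sketched in a few lines. This is an illustrative toy, not code from the study; the weights and threshold are made-up values.

```python
# A single artificial neuron: take a weighted sum of the inputs,
# then "fire" (output 1) only if the sum clears a threshold.
def neuron(inputs, weights, threshold=1.0):
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

print(neuron([0.5, 0.8], [1.0, 1.0]))  # sum is 1.3, above threshold: fires (1)
print(neuron([0.1, 0.2], [1.0, 1.0]))  # sum is 0.3, below threshold: silent (0)
```

A real network stacks many such units in layers and tunes the weights automatically during training.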

They’re also capable of “learning,” by a rough definition of the term. If you show a feral (completely uneducated) human something “red,” and tell them to pick out all “red” things from a bucket, they’ll struggle at first but improve over time. So too with neural networks. Machine learning just means teaching a computer to pick out the “red” things, for example, from a virtual bucket of variegated things.

That’s how McPherson and company trained their neural network. “In our system, we create a model — an architecture of neural networks, a structured set of these artificial neurons — and then we give them a large amount of obfuscated images,” he says. “For example, we might give them a hundred different pictures of Carol that have been pixelated, then a hundred different pictures of Bob that have been pixelated.”

The researchers then label these pixelated images, and in so doing tell the model who’s in each image. After processing this data set, the network functionally knows what Pixelated Bob and Pixelated Carol look like. “We can then give it a different pixelated picture of Bob or Carol, without the label,” McPherson explains, “and it can make a guess and say, ‘I think this is Bob with 95 percent accuracy.’”
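The train-then-guess workflow McPherson describes can be illustrated with a toy sketch. To keep it self-contained, a simple nearest-centroid classifier stands in for the paper’s neural network, and each “pixelated image” is just a short list of coarse block values with random noise; the names `fake_pixelated_image`, `BOB`, and `CAROL` are invented for this example.

```python
import random

random.seed(0)

# Stand-in for a pixelated photo: a base pattern of coarse block
# values plus per-image noise (real inputs would be actual images).
def fake_pixelated_image(base):
    return [v + random.uniform(-0.2, 0.2) for v in base]

BOB, CAROL = [0.2, 0.9, 0.4], [0.8, 0.1, 0.6]

# Training data: a hundred labeled pixelated images per person.
training = {"Bob":   [fake_pixelated_image(BOB) for _ in range(100)],
            "Carol": [fake_pixelated_image(CAROL) for _ in range(100)]}

# "Learn" each person's average pixelated appearance.
centroids = {name: [sum(col) / len(col) for col in zip(*imgs)]
             for name, imgs in training.items()}

# Classify a new, unlabeled pixelated image by nearest centroid.
def guess(image):
    def dist(name):
        return sum((a - b) ** 2 for a, b in zip(image, centroids[name]))
    return min(centroids, key=dist)

print(guess(fake_pixelated_image(BOB)))  # prints "Bob"
```

The point the sketch makes is the one in the study: the model never reconstructs the original face, it only learns what each person looks like *after* obfuscation, and that turns out to be enough to identify them.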

The model does not reconstruct the obfuscated image, but the fact that it’s able to defeat the most common and formerly most reliable anonymization methods is disconcerting in and of itself. “They’re able to figure out what’s being obfuscated, but they don’t know what it originally looked like,” McPherson says.

But the neural networks can still identify obscured faces far better than humans can. When the images were most heavily obfuscated using one industry-standard technique, the system was still over 50 percent accurate. For slightly less obfuscated images, the system proved remarkable, at around 70 percent accuracy. YouTube’s norm for blurring faces utterly failed: even the most blurred images were trounced by the neural network, which proved 96 percent accurate.

On the top, visual examples of two industry-standard obfuscation methods. On the bottom, the neural networks' accuracy. Each number corresponds with the obfuscation level above, left to right (or non-obfuscated to most-obfuscated).

Other previously untarnished anonymization techniques for data, text, and images are likewise unreliable. “There was a work over the summer that looked at anonymizing text using pixelation and blurring, and showed that they were able to be broken as well,” McPherson says. And other once-trustworthy methods may be on their way out the door as well. Though he doesn’t know the ins and outs of voice obfuscation techniques, like those used for anonymous TV interviews, he “wouldn’t be surprised” if neural networks could break the anonymization.

McPherson’s discovery, then, proves that “the privacy-preserving methods that we had in the past aren’t really up to snuff, especially with modern machine learning techniques.” In other words, we’re coding ourselves into irrelevance, training machines to outdo us in all realms.

“As the power of machine learning grows, this tradeoff will shift in favor of the adversaries,” the researchers wrote.

Photos via McPherson, Shokri, and Shmatikov, Unsplash