After Edward Snowden leaked data in 2013 that revealed the National Security Agency is, and has been, spying on the American public through mass metadata collection, there was outrage — but there wasn’t a revolution.
Part of the reason is that people wondered: So what? Metadata doesn’t include any substance, just who is talking, when, for how long, and sometimes where. A new study from researchers at Stanford University, however, shows that metadata reveals a lot more.
The study, called “Evaluating the privacy properties of telephone metadata,” collected metadata from 823 volunteers through an app called MetaPhone. That relatively small sample yielded more than 250,000 phone calls and 1.2 million texts. The researchers found that people who worry about their metadata aren’t tin-foil-hat conspiracy theorists after all.
In short, their findings can be broken down into two points:
“The first is that metadata is not totally anonymous and can be used to infer sensitive information,” Patrick Mutchler, a Ph.D. candidate at Stanford, tells Inverse. “Legal distinctions between metadata and content are perhaps not justified. The second is that it is essential to base public policy on sound science. Citizens and policymakers should be able to understand the consequences of policy.”
Researchers gathered and analyzed metadata in the same way the NSA does when it subpoenas metadata. The subpoenaed number (known as the “seed”) is the main target that can be legally investigated, but the NSA can also access the metadata of numbers connected to that seed. Each step away from the seed is called a “hop.” The NSA can leapfrog two hops from the seed, going back 18 months.
Those hops are a crucial part of the data collection, because hops can be the factor that turns metadata into content data. Hubs of heavy phone users, such as customer service lines, connect a large portion of the population. Think of each customer service hub as a spider nest. Each baby spider leaving the nest represents a user who can be reached by an NSA hop as it floats away, still connected by a web that the NSA can follow. Then, when each of those baby spiders has babies of its own, the NSA can follow those new spiders as well.
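To see why a single hub matters so much, here is a minimal sketch of a two-hop search over a toy call graph. Everything in it is made up for illustration: the names, the 100 "customers," and the helper functions `build_graph` and `reachable_within` are assumptions, not the study's code or the NSA's method. It simply counts how many numbers sit within two hops of a seed, with and without a customer service line in the mix.

```python
from collections import deque

def build_graph(call_pairs):
    """Build an undirected contact graph from (caller, callee) pairs."""
    graph = {}
    for a, b in call_pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def reachable_within(graph, seed, max_hops):
    """Breadth-first search: map each number within max_hops of seed to its hop count."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if dist[current] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(current, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist

# Hypothetical toy data: the seed calls one friend and one customer service hub.
calls = [("seed", "friend"), ("friend", "coworker"), ("seed", "hub")]
calls += [("hub", f"customer{i}") for i in range(100)]

# Numbers reachable in two hops, not counting the seed itself.
with_hub = len(reachable_within(build_graph(calls), "seed", 2)) - 1

# Same search after dropping every call that touches the hub.
calls_no_hub = [(a, b) for a, b in calls if "hub" not in (a, b)]
without_hub = len(reachable_within(build_graph(calls_no_hub), "seed", 2)) - 1

print(with_hub, without_hub)
```

In this toy graph, one call to the hub is the difference between reaching two people and reaching more than a hundred.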
In the Stanford study, heavy communicators represented “hubs that connect meaningful proportions of the entire participant population.” It’s an uncomfortable amount of data collection when extrapolated out on an NSA level.
“Applied to the NSA’s program, our results strongly suggest that until 2013, analysts had legal authority to access telephone records for the majority of the entire U.S. population,” the study authors write. Under the slightly stricter regulations of the USA FREEDOM Act, passed in 2015, “an analyst could in expectation access records for ~25,000 subscribers with a single seed.”
That’s roughly 25,000 people swept in by a subpoena for one person.
“The hub nodes make any ‘hop’ based restrictions on the NSA’s authority mostly useless and it’s essential that they be removed in some way before the NSA is able to access the metadata database,” Mutchler says.
Finding the face behind the metadata
Of course, you might argue that metadata is just metadata. It doesn’t have names, or as the NSA puts it, “personally identifiable information.” Stanford’s researchers found, however, that metadata doesn’t necessarily stay metadata.
A short list of things that can be determined from metadata includes health records, location histories, web search queries, web browsing activity, movie reviews, and social network graphs.
The study attempted to re-identify the people who willingly offered up their metadata through MetaPhone. Researchers randomly selected 30,000 numbers from their data, and then ran them through Yelp, Google Places, and Facebook. The search connected more than 9,500 of the numbers, or 32 percent, to names, faces, and businesses. That was done using free public databases, and the number would be much higher with commercial databases.
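The matching step can be pictured as a simple lookup. The sketch below is an illustration under loud assumptions: the two directories are invented in-memory dictionaries standing in for public listings, the phone numbers are fictional, and `reidentify` is a hypothetical helper, not the researchers' code or any real Yelp, Google Places, or Facebook API. It also checks the arithmetic behind the reported figure.

```python
def reidentify(numbers, directories):
    """Return {number: (source, name)} for each number found in any directory."""
    matches = {}
    for number in numbers:
        for source, listing in directories.items():
            if number in listing:
                matches[number] = (source, listing[number])
                break  # first directory that knows the number wins
    return matches

# Hypothetical stand-ins for public listings (all entries are fictional).
directories = {
    "directory_a": {"555-0101": "Example Pharmacy"},
    "directory_b": {"555-0101": "Example Pharmacy", "555-0102": "Example Locksmith"},
}
numbers = ["555-0101", "555-0102", "555-0103", "555-0104"]

matches = reidentify(numbers, directories)
toy_match_rate = len(matches) / len(numbers)

# The article's figures: more than 9,500 of 30,000 numbers matched.
reported_percent = round(9500 / 30000 * 100)

print(matches)
print(toy_match_rate, reported_percent)
```

Adding more directories to the lookup can only raise the match rate, which is the intuition behind the point about commercial databases.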
The researchers identified contacts that people had with specialized pharmacies, a cardiovascular medical center, an AR rifle dealer, Planned Parenthood, and one person who contacted “a hardware outlet, locksmiths, a hydroponics store, and a head shop in under three weeks.” No one is suggesting that last person is starting a dope marijuana grow-op in his house, but no one is suggesting he isn’t either.
All of this was found on a university research budget. The exact resources the NSA has at its web-tracing fingertips are unknown, but the total budget for spy agencies in the United States is somewhere in the ballpark of $52.6 billion.
Will people ever care?
Snowden is still fighting to make people care about metadata collection to this day. In return, he’s been labeled everything from a traitor to a Russian spy. The story that he helped tell through journalists Glenn Greenwald and Laura Poitras was world-changing, but it didn’t make people care.
The Stanford researchers’ findings empirically prove identification is possible.
“Our results attempt to show the legal and technical limitations on the metadata collection programs,” Mutchler says. “We cannot say that the NSA is actually performing any of the inferences mentioned in our paper or accessing as much data as we show is legally permissible in our paper. We can only say what the NSA can do, not what they are actually doing.”
“People’s opinions about the NSA programs are their own and I don’t want to force people to believe one thing or another. What our paper does is give people the facts they need to make an informed decision about the programs.”