In April, authorities in the Sacramento area captured the rapist and murderer known as the Golden State Killer, using publicly available genetic data. Shortly after the arrest of Joseph James DeAngelo, it was revealed that the consumer genomic database GEDmatch.com had helped bring the 40-year investigation to an end. Law enforcement linked DeAngelo to his string of crimes with the help of his relatives’ DNA data, which were available online. In a study published Thursday in the journal Science, researchers point out that consumer testing is only likely to become more prevalent and more powerful, but that this may come at the cost of our privacy: publicly available genetic databases can identify us even if we haven’t contributed to them.
As of April 2018, more than 15 million people had used direct-to-consumer autosomal genetic tests. These include services like 23andMe and Helix. In the new study, researchers explain that this deluge of participation means it’s becoming extremely easy to identify people by their DNA through public genetic genealogy databases, even if they have not undergone genetic testing themselves. The study’s authors report that at least 60 percent of individuals in the United States with European ancestry can be identified.
This study specifically focused on Americans of European descent, because this population is over-represented in DNA databases. The team analyzed a dataset of over 1.2 million anonymous individuals who had undergone commercial sequencing through a company called MyHeritage (study co-author Yaniv Erlich, Ph.D., is the company’s Chief Science Officer). They discovered that for about 60 percent of the people within the data set, the scientists could find a family member with matching DNA segments. Once that information is combined with publicly available genealogical records and one or more relatives are found, it’s not too hard to figure out which DNA belongs to whom.
“This technique,” the scientists write, “could implicate nearly any US-individual of European descent in the near future.”
That is seemingly good news for forensic scientists and police officers — and it underscores a privacy risk for people at large. The scientists emphasize that, because it’s so easy to track people down through DNA data, there need to be new policies designed to protect their privacy and halt the misuse of the information:
“Taken together, we posit that our results warrant a revaluation of the status quo regarding the identifiability of DNA data, especially of US individuals. While policymakers and the general public may be in favor of such enhanced forensic capabilities for solving crimes, it relies on databases and services that are open to everyone. Thus, the same technique could also be exploited for harmful purposes, such as re-identification of research subjects from their genetic data.”
The scientists also posit that if legal authorities combine crime scene DNA with publicly available DNA information, they can likely narrow their pool of suspects to about 17 people. As more people take consumer genetic tests, something these authors say is inevitable, this process is only going to get easier. With mathematical modeling, they determined that once genetic databases cover at least 2 percent of a target population, nearly any person within that group could be matched to a relative at the third-cousin level or closer.
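To see why such a small share of the population can go such a long way, here is a rough back-of-the-envelope sketch in Python. It is not the model used in the study: it simply assumes that each of a person's relatives ends up in the database independently, with a probability equal to the database's coverage. The figure of 200 relatives at the third-cousin level or closer is a hypothetical round number chosen for illustration, and the function name is ours.

```python
# Illustrative only: a simplified independence model, NOT the study's method.
# ASSUMED_RELATIVES is a hypothetical round number, not a figure from the paper.
ASSUMED_RELATIVES = 200


def match_probability(coverage: float, relatives: int) -> float:
    """Chance that at least one of `relatives` people is in a database
    that covers a fraction `coverage` of the target population."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    if relatives < 0:
        raise ValueError("relatives must be non-negative")
    # Probability that no relative is in the database, subtracted from 1.
    return 1.0 - (1.0 - coverage) ** relatives


if __name__ == "__main__":
    for coverage in (0.001, 0.005, 0.01, 0.02):
        p = match_probability(coverage, ASSUMED_RELATIVES)
        print(f"{coverage:.1%} coverage: {p:.1%} chance of at least one match")
```

Under these assumptions, 2 percent coverage already gives roughly a 98 percent chance that at least one relative is in the database, which is the intuition behind the researchers' threshold. Real family sizes vary widely, so the exact numbers should not be read as the study's results.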
This route offers a powerful alternative to the forensic database search authorities have relied on until now — but it also means that an innocent fourth cousin who doesn’t know their information is out there might get a knock on the door.
Abstract: Consumer genomics databases have reached the scale of millions of individuals. Recently, law enforcement authorities have exploited some of these databases to identify suspects via distant familial relatives. Using genomic data of 1.28 million individuals tested with consumer genomics, we investigated the power of this technique. We project that about 60% of the searches for individuals of European-descent will result in a third cousin or closer match, which can allow their identification using demographic identifiers. Moreover, the technique could implicate nearly any US-individual of European-descent in the near future. We demonstrate that the technique can also identify research participants of a public sequencing project. Based on these results, we propose a potential mitigation strategy and policy implications to human subject research.