We’re used to giving up our personal data for access to a free and open internet, but with growing amounts of genetic information being shared through direct-to-consumer genetic testing services like 23andMe and Ancestry.com, we might be giving away a lot more about ourselves than we think. In a study published Monday, researchers report that genomic data, once thought to be so complex that it couldn’t reveal a person’s identifying traits, is now alarmingly readable.
The study, published in the Proceedings of the National Academy of Sciences, was carried out by a team led by controversial geneticist J. Craig Venter, Ph.D., a man famous for revealing that much of the genome sequenced by his company in the race to map human DNA was his own. It reports that genetic information, which current insurance privacy regulations don't treat as personally identifying information, could be used to identify individuals without their knowledge or consent.
Venter and his team came to this conclusion after sequencing the genomes of 1,061 volunteers from diverse ethnic backgrounds to determine what exactly could be "read" from that data. In addition to the genomes, they collected physical data about the volunteers, such as height, weight, and eye color, and they made 3D scans of the volunteers' faces and recorded their voices.
Doing this gave them two sets of information: One set, the genomes, can’t be easily tied to any individual, while the other set, the people’s physical characteristics, literally defines them. Venter’s team wanted to see whether they could match up the two sets.
To do this, the researchers used the physical information and genomic data to train a machine learning algorithm to identify people’s traits based on their genomes. If you’ve read this far, you won’t be surprised by what they found.
Their study showed that it's possible to identify people with a fair degree of accuracy from their genetic data alone. The algorithm was even able to produce approximate reconstructions of many of the subjects' faces.
This is a big deal, say the researchers, because it means that even when you remove a person’s personally identifiable information from their genetic data, someone with the proper software could reconnect the dots.
“Using this algorithm, we have reidentified an average of >8 of 10 held-out individuals in an ethnically mixed cohort and an average of 5 of either 10 African Americans or 10 Europeans,” write the study’s authors. “This work challenges current conceptions of personal privacy and may have far-reaching ethical and legal implications.”
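In short, when the algorithm had to match genomes to a lineup of 10 held-out people from an ethnically diverse group, it picked the right person more than 80 percent of the time on average; within lineups made up only of African Americans or only of Europeans, it got about half right. The study's authors warn that their findings call into question how private anyone's genetic information really is.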
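To make the lineup idea concrete, here is a deliberately simplified Python sketch. It is not the study's algorithm. It assumes a model has already turned each genome into a set of predicted traits, uses made-up trait values, and introduces the hypothetical functions `match_genomes` and `reidentification_rate`, which simply pair each genome with the closest real profile in a small lineup.

```python
import math

# Hypothetical trait vectors: (height in cm, eye-color score, face-shape score).
# All numbers are invented for illustration; none come from the study.
observed = [(180.0, 0.9, 0.2), (165.0, 0.1, 0.7), (172.0, 0.5, 0.5), (171.0, 0.6, 0.4)]
# Traits a (hypothetical) model predicted from each person's genome;
# predicted[i] belongs to the same person as observed[i].
predicted = [(178.0, 0.8, 0.3), (167.0, 0.2, 0.6), (170.0, 0.4, 0.6), (175.0, 0.6, 0.4)]


def match_genomes(predicted, observed):
    """For each genome's predicted traits, return the index of the
    observed profile closest to it (simple nearest-neighbour matching)."""
    return [min(range(len(observed)), key=lambda j: math.dist(p, observed[j]))
            for p in predicted]


def reidentification_rate(predicted, observed):
    """Fraction of genomes matched to the correct person in the lineup."""
    matches = match_genomes(predicted, observed)
    correct = sum(1 for i, j in enumerate(matches) if i == j)
    return correct / len(predicted)


print(match_genomes(predicted, observed))          # [0, 1, 3, 2]
print(reidentification_rate(predicted, observed))  # 0.5
```

In this toy lineup, the two people with similar traits get swapped, so only half of the genomes are matched correctly, which hints at why re-identification becomes harder when the people in a lineup resemble one another.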
“If conducted for unethical purposes, this approach could compromise the privacy of individuals who contributed their genomes into a database,” they write.
Venter, the study's senior author, who played a major role in private industry's effort to map the human genome in the 1990s, has stirred up more than a little controversy with regard to genetic privacy. In 2002, not long after the first draft of the human genome was announced in 2000, Venter revealed that much of the DNA sequenced by his company, Celera Genomics, belonged to none other than Venter himself. This caused an uproar in the scientific community, as Venter had circumvented the rigorous privacy controls and selection process that typically accompanied DNA sample collections.
So if Venter, someone who’s exposed his own genetic information to the whole world, is saying that we should be worried about our genetic privacy, perhaps we should listen.