In early September, notorious geneticist and businessman J. Craig Venter published a controversial study claiming that faces and other personal details could be back-engineered from supposedly “private” genetic data, eliciting an uproar about the future of our genetic security. People had good reason to be scared: His study cast serious doubt on the privacy of the enormous amounts of data already collected from users of sites like 23andMe and Ancestry.com.
But as concerns brewed in public, many in the science community — some of them Venter’s colleagues — swiftly turned their backs on him.
They were not convinced that Venter’s claims in his Proceedings of the National Academy of Sciences paper were sound. In the article, he and a team of researchers reported that an individual whole-genome sequence, long considered to be so dense and unwieldy that it couldn’t possibly be used to identify someone, could in fact be used to single out a person with a fair degree of certainty. They said their work “challenges current conceptions of personal privacy and may have far-reaching ethical and legal implications.”
Multiple scientists quickly spoke out in Nature. They did not agree with the paper’s findings, and some accused Venter of having ulterior motives.
Yaniv Erlich, who is also chief scientific officer of the genealogy website MyHeritage.com, had reviewed the paper when Venter and his colleagues first tried submitting it to Science, where it was rejected. Meanwhile, data engineer Jason Piper, who co-authored the paper, distanced himself from Venter the day the paper was published, asserting on Twitter that its real aim was to prevent people from sharing genetic information.
If Piper’s claims are true, there is plenty for Venter to gain. What many outside the scientific community may not realize is that Venter owns a company called Human Longevity, Inc. (HLI), which aims to build a private genetic library and would certainly benefit from a world where genomic data were not public. Piper, who used to work for HLI, quit after the paper was published. He did not respond to Inverse’s request for comment.
Venter and HLI have a lot to gain by discouraging open sharing of genetic information, as his history in the field seems to foreshadow. In 1998, as the founder of Celera Genomics, Venter led the private industry effort to sequence the human genome. Celera’s interests butted up against those of the National Institutes of Health (NIH), where government-funded researchers were working to do the same under the Human Genome Project. Whereas the NIH project aimed to yield a publicly available map of the human genome, Celera’s data were meant to be available to paying customers only. Venter proposed to patent a large number of genes, which drew strong criticism from his peers at the NIH and beyond.
Venter later revealed that much of the genetic information Celera sequenced was actually his own, raising ethical concerns from colleagues who were appalled he circumvented the usual careful donor selection and anonymization process.
He left Celera and has since founded HLI. Today, HLI lets customers sequence their entire genome to discover their unique disease risk factors for $4,900, and its growing database of human genomes will only benefit from discouraging public sharing.
Venter did not respond to Inverse’s request for comment.
The fact that Venter was able to publish his paper in PNAS led Erlich to write a heated rebuttal. In it, Erlich claims that HLI’s paper doesn’t actually show the novel thing it claims to show — that whole-genome sequences can be used to identify differences between individuals of the same ethnic group.
Those sequences can reveal personal differences — but so can all the demographic information about a person that’s attached to that data, says Erlich. It turns out that the genomic data Venter is raising fears about is actually not that revealing at all.
“If you go to any genetic biobank, any online resource where you can download genomes, you always have some sort of auxiliary information,” says Erlich. “Ethnicity, sex, sometimes age. It’s very common to get those identifiers.” In a study in 2013, his team even used Venter’s genome to illustrate this point.
“We took the genome of Craig Venter and showed that we could predict the name Venter from his genome,” Erlich says. “This was in 2013. We didn’t know he would publish a paper in 2017.”
Yet despite the holes in the science, Venter got his study published in PNAS, a widely respected journal. This is problematic, says Erlich, because policymakers who cite his research in the future won’t necessarily know the controversial lengths he went to in order to get it published. Venter, he says, chose the reviewers for his paper.
“He selected people that are experts in the bioethics of genetic privacy, experts in general data privacy,” says Erlich. “They’re not experts in statistical genetics. The mistakes in the paper are very obvious to anyone who’s a statistical geneticist. He created for himself a very favorable review process, to begin with.”
“This track has been abused by the conflict of interest that the authors have,” says Erlich.
On September 11, days after Erlich published his rebuttal, Venter and his team — minus Piper — finally responded to the community’s criticisms on bioRxiv.
In this paper, Venter and his colleagues defend their work, arguing that Erlich’s claim — that the HLI algorithm using whole-genome sequences doesn’t perform any better than a simulation using simple demographic data — is “misleading.” As a secondary argument, they assert that it’s important to separate the study from its potential implications.
The paper wasn’t exactly convincing. Ran Blekhman, a microbiome genomics researcher not linked to either paper, responded quickly on Twitter.
Blekhman declined further comment, but other scientists spoke up. Mark Shriver, a professor of anthropology at Penn State University who also reviewed the paper before it was rejected by Science, publicly disagreed with Venter and HLI’s claims.
“Calling it predicting from the genome is what’s wrong,” Shriver told MIT Technology Review. “The main message is way overstated. They just didn’t have enough people to find the genes that distinguish people. This is not the paper that is going to convince people that this is going to affect privacy or help forensics.”
Erlich says the most significant problem with the PNAS paper is that it could provide a basis for legislation, and if it does, then those changes will be based on inaccurate science.
“Real policy change can be made based on those papers, and if you feed them inaccurate information, they have the wrong policy,” he says.
“You cannot just give them some fake science.”