Here’s Exactly What Can — And Can’t — Be Gleaned From Your Genetic Data
Imagine you agreed to be part of a new and exciting long-term research study to better understand human health and behavior. For the past few years, you’ve been visiting a collection site where you fill out some questionnaires about your health and daily activities. Research assistants take your height, weight, and some other physical characteristics about you. Because you agreed to contribute your genetic data to the study, you also provided a saliva sample during your first visit.
Later, you see a news article reporting that researchers analyzing data from the study you’re participating in have found genetic variants that predict the likelihood of someone completing college. You remember reading a long form when you consented to give your data, but you can’t quite remember all the details. You know the study was about health, but how do these findings about genes and education have anything to do with health? Did they analyze your data specifically? What did they find?
What are biobanks?
Many scientific research studies collect data meant to answer a specific research question. For example, to study the genetics of diabetes, researchers might collect data on your blood pressure and lipid levels in addition to genetic data. But increasingly, scientists are collecting large amounts of data to be kept in biobanks: repositories that store genetic data and other biospecimens like blood, urine, or tumor tissue to be used in a wide range of future studies.
Some biobanks, like the UK Biobank, link biospecimen data to other collected data, such as sexual behavior, medical history, weight, diet, and lifestyle. Private companies like 23andMe also obtain consent from their customers to have their data used in research efforts.
As a researcher interested in the intersection between social behaviors and genetics, I frequently have conversations with people who aren’t aware of how their genetic data is being used. They’re often surprised that genetic data they agreed to share for research, whether by using a private company’s DNA testing kit or by enrolling in a biobank while visiting their local clinic, might be used to study the genetics of same-sex sexual behavior or risk-taking.
In our newly published research, my colleagues and I found that when genetic data is available, even choosing not to respond to survey questions can reveal information about the population: not responding to survey questions is correlated with a person’s education, health, and income levels.
Genetic data and informed consent
The research that can be done with biobank data might sound scary, but it shouldn’t. Genetic data, like the data used in our study, is de-identified. This means that it cannot be linked back to individual research participants, who remain anonymous. Further, genetic data in these sorts of studies is used at the aggregate level, meaning it isn’t used to predict or evaluate any one particular individual’s responses or behaviors.
Researchers aren’t using genetic data to target individuals with certain genetic profiles. Almost all genetic research is used to better understand how health behaviors and other factors affect health and to figure out ways to improve outcomes. This goal is why most research participants agree to contribute their data to research in the first place: to help the world through science.
The problem is whether research participants really understand how their data can be used. Many of the original ideas behind the informed consent process and Institutional Review Boards, or IRBs, which are intended to protect research participants from direct harm or privacy violations, were based on the expectation that research studies would address particular questions about a single subject, like cardiovascular disease or lung cancer. These protections were designed to prevent a repeat of unethical research atrocities like the infamous Tuskegee Syphilis Study, where researchers did not tell participants, who were all Black men, that they had syphilis and withheld treatment that was already widely available and known to be highly effective.
But since genetic data is de-identified, it is often considered exempt from full IRB review, the process that ensures studies meet ethical standards and institutional policies. And the broad range of research questions that can be explored with biobanks, along with the amount and types of data collected, has made these original protections insufficient to ensure truly informed consent.
Improving informed consent
To be clear, biobanks are enormously important for public health research. They allow researchers to link many different outcomes and variables together to paint a critical overall picture of human health and behavior. And in contrast with the personally identifiable online or phone data that companies collect to show you targeted ads, biobanks collect de-identified data that is evaluated in aggregate.
In the age of vast data collection, ensuring that participants are aware of how their data can and cannot be used is necessary to ensure that biobanks are a transparent tool for the global good. Biobanks can’t predict how a participant’s data will be used in the future, so it can be difficult for researchers and ethicists to bring back the “informed” part of “informed consent.” Even so, more needs to be done to earn the trust of the valuable research participants who contribute the data to improve science and the world.