Ground-Breaking Genetics Library Finally Represents the Whole of Humanity

This collection of human genomes will ensure that no genetic variation is left out.

An illustration of a human genome stretching over the globe.
Darryl Leja, NHGRI

Twenty years ago last month, scientists sequenced the first human genome in the landmark Human Genome Project. Among the many things they discovered was that while any two humans have 99.6 percent of their genome in common, the remaining 0.4 percent leaves plenty of room for variation. That tiny fraction is also likely responsible for many diseases and conditions. Two decades later, scientists are still trying to unpack how that 0.4 percent influences us.

Now, researchers around the world are digging into that remaining, highly variable part of human DNA. In a paper published today in the journal Nature, with authors from Canada, Denmark, Germany, Italy, Japan, Spain, the UAE, the UK, and the US, the Human Pangenome Reference Consortium presents a draft human pangenome reference, a collection of sequenced human genomes that aims to eventually represent as many of the DNA sequences found across our species as possible. Their work offers a starting point for comparing genetic variation so that we may better understand how genes vary and mutate across our species.

The new pangenome reference is a collection of different genomes against which to compare an individual genome sequence.


What is the pangenome?

This burgeoning genetic library will represent genomes across Homo sapiens so that we may better understand genetic variation among ourselves. By comparing the myriad changes, we can fill in gaps to help treat those of different ancestry who are prone to certain conditions. The pangenome is a composite of genome sequences from 47 people compiled into one data structure.

As an extremely simplified analogy, the pangenome might function as a collection of names that includes all the many ways to spell Eric. Conveniently, the press conference for this research featured three speakers who fit the bill: Eric Green, director of the National Human Genome Research Institute; Erich Jarvis, a genetics professor at Rockefeller University and a Howard Hughes Medical Institute investigator; and Erik Garrison, a genetics professor at University of Tennessee’s College of Medicine.

These three first names share a good deal of overlap (each variant is 75 to 80 percent identical to Eric) but differ in crucial ways. A pangenome reference for names, for instance, could include all the many ways to spell this name, highlighting what’s shared and what diverges.

Instead of looking at the letters in a name, the pangenome looks at the base pairs in DNA. This library compares the 47 genomes to show what’s identical and what differs. The more genomes it contains, the more variation it can account for, just like with these three homophonous names. If there’s only one genome on record (Eric), then all variations on that genome (or name) are left out. If the only sequenced genomes on record come from healthy white people of Western ancestry, then the record doesn’t account for the myriad variations among the rest of the world’s population.

While the Human Genome Project only looked at the genomic equivalent of the name Eric, the pangenome accounts for different ways to spell the name.
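For readers who like to see an analogy run, here is a toy sketch of the name comparison in Python. It is only an illustration: the function names and the letter-by-letter comparison are assumptions made for this example, and real pangenome references rely on sequence alignment and graph structures that this sketch ignores.

```python
from itertools import zip_longest

GAP = "-"  # hypothetical placeholder meaning "no letter at this position"


def identity(reference, other):
    """Fraction of positions where two names carry the same letter."""
    pairs = list(zip_longest(reference, other, fillvalue=GAP))
    matches = sum(a == b for a, b in pairs)
    return matches / len(pairs)


def build_name_pangenome(names):
    """For each position, record every letter seen across all the names."""
    columns = zip_longest(*names, fillvalue=GAP)
    return [sorted(set(column)) for column in columns]


names = ["Eric", "Erich", "Erik"]
pangenome = build_name_pangenome(names)

# Positions where every name agrees, and positions where they diverge.
shared = [i for i, letters in enumerate(pangenome) if len(letters) == 1]
variable = [i for i, letters in enumerate(pangenome) if len(letters) > 1]

for other in names[1:]:
    print(f"Eric vs {other}: {identity('Eric', other):.0%} identical")
print("letters seen at each position:", pangenome)
print("shared positions:", shared, "variable positions:", variable)
```

With only Eric on record, the fourth and fifth positions would show no variation at all. Adding Erich and Erik is what reveals that those positions can differ, which is the point of sequencing more genomes.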

Why is it important?

The Human Genome Project sourced genetic material from about 20 people, though the majority of its information came from just one person. If that one person were to serve as the singular genetic blueprint, think of all the types of people whose genes would go unaccounted for. The pangenome creates a composite of different genomes to highlight where variations occur.

“This pangenome reference represents an incredible scientific achievement,” said Green in a press conference, “providing an expanding view of humanity's DNA blueprint with a significantly greater human diversity than previous reference sequences.”

The genome is a map for genetic researchers. The 3.2 billion base pairs (all some combination of guanine, adenine, cytosine, and thymine) provide instructions for how every cell encodes and builds the proteins that we need to function every day. Among those base pairs also lie the changes and mutations that cause genes to code differently, resulting in different diseases or conditions.
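To get a feel for the scale involved, the article's two headline figures can be combined in a quick back-of-the-envelope calculation. This sketch takes the 3.2 billion base pairs and the 99.6 percent shared fraction at face value. The variable names are made up for the example, and the result is a rough order of magnitude, not a figure reported by the researchers.

```python
GENOME_BASE_PAIRS = 3_200_000_000  # "3.2 billion base pairs" from the article
SHARED_PER_THOUSAND = 996          # "99.6 percent" written as parts per thousand

# The remaining 0.4 percent, kept in integer arithmetic to avoid rounding noise.
differing_per_thousand = 1000 - SHARED_PER_THOUSAND
differing_base_pairs = GENOME_BASE_PAIRS * differing_per_thousand // 1000

print(f"{differing_base_pairs:,} base pairs")  # prints "12,800,000 base pairs"
```

In other words, that "tiny fraction" still leaves roughly 12.8 million positions where two people's genomes may differ.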

The pangenome is a game changer for all genetic researchers. Wendy Chung, a molecular geneticist at Columbia University Irving Medical Center, who was not involved in the research, recognizes that the genome can vary in slight but potent ways depending on one’s origins. As this research helps us move forward in treating diseases, she tells Inverse it’s important to her that “we’re not leaving anyone behind.”

While Chung recognizes that this first draft isn’t perfect, she calls it “an important step forward” in incorporating genes from those around the world.

How will the pangenome help researchers?

This wealth of data will be a boon for genetics and diagnostics research. The authors acknowledged that conditions ranging from coronary heart disease to schizophrenia are linked to genetic mutations that still aren’t fully understood. Co-author Evan Eichler, professor of genome sciences at the University of Washington School of Medicine, said that these more complete sequences capture genetic variation that may increase the risk for these conditions. That should make it easier for diagnostics researchers to pinpoint the genetic mutation responsible for a disease.

“The mechanics in terms of how we’re building this reference are essentially going to transform the discovery of rare diseases or genetic causes,” he said.

Some populations are more prone to particular genetic conditions than others. For example, people of Ashkenazi Jewish descent are at higher risk for Tay-Sachs disease, and African Americans are at higher risk for heart disease. A comparative genomic collection creates a more complete portrait of the human genome, accounting for how genomes may diverge depending on a person’s origins.

“We now understand that having one map with a single human genome cannot adequately represent all of humanity,” said co-author Karen Miga, a biomolecular engineering professor at the University of California, Santa Cruz, during the press conference. “It really is understanding and cataloging these differences between genomes that allow us to understand how cells operate.”

What’s next?

By 2024, the pangenome is expected to grow from 47 individuals to 350. Miga said that important next steps involve bringing in more partnerships and stakeholders, eventually establishing an international consortium called the Human Pangenome Project. Researchers currently work alongside the Global Alliance for Genomics and Health, an international nonprofit that supports and shares genomic research.

As the pangenome reference grows, it will account for more genetic variation. The goal, researchers say, is for no variation to be left out.
