The New, Expanded Genetic Alphabet Rewrites Life

Two synthetic nucleotides add unprecedented flexibility to the genetic code.


Until very recently, life on Earth was dictated by strings of just four letters: G, A, T, and C. These bases shuffle together to form sequences, short and long, that determine everything from the shape of chicken eggs to the color of our skin. Deciphering nature’s secret language gave scientists unprecedented insight into the ways DNA encodes life.

But as they became fluent in DNA, scientists realized how much its limited vocabulary restricted its potential. They wanted to expand nature’s abilities to fit the needs of humankind. So they took the natural next step: they added more letters.

Working in a laboratory just off the Pacific coast at the Scripps Research Institute in La Jolla, California, Floyd Romesberg, Ph.D., and his team created two new bases, X and Y, and successfully incorporated them permanently into the DNA of the bacterium E. coli. This achievement, which was documented in a breakthrough paper published in PNAS in January, took Romesberg one step closer to his goal of manipulating the DNA of organisms to make them create proteins the Earth has never seen.

“When you consider the properties that a protein can have, the sorts of disease it can treat, it must be limited in some way by the amino acid components it’s made of,” he tells Inverse, referring to the building blocks of the proteins that DNA encodes.

Romesberg inserted X and Y into the DNA of the bacterium E. coli, where they were retained over multiple generations.


The “central dogma” of biology describes how the G-A-T-C sequence in our genes encodes a very specific progression of amino acids, which in turn gives rise to one very specific kind of protein. As far as we know, there are only 20 amino acids that can lock together in different patterns to form proteins, and to chemists, many of them are “redundant” and “really not interesting,” according to Romesberg. Rewriting the code of life will allow scientists to produce “unnatural” amino acids, which can join together to form proteins with unprecedented functions.
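To get a rough sense of how much room two extra letters create, consider that cells read DNA three letters at a time, in units called codons. The sketch below simply counts the three-letter "words" that can be spelled with four letters versus six. It is a back-of-the-envelope illustration, not Romesberg's method: the names `NATURAL`, `EXPANDED`, and `codons` are made up for this example, and it assumes that any triplet containing X or Y could in principle be read, which is precisely the step the team has yet to achieve.

```python
from itertools import product

NATURAL = "GATC"           # the four natural DNA letters
EXPANDED = NATURAL + "XY"  # X and Y stand in for the two synthetic bases


def codons(alphabet, length=3):
    """Return every ordered triplet (codon) that can be spelled with the alphabet."""
    return ["".join(letters) for letters in product(alphabet, repeat=length)]


natural_codons = codons(NATURAL)
expanded_codons = codons(EXPANDED)

# Codons that only exist because of the synthetic letters.
new_codons = [c for c in expanded_codons if "X" in c or "Y" in c]

print(len(natural_codons))   # 4 ** 3 = 64
print(len(expanded_codons))  # 6 ** 3 = 216
print(len(new_codons))       # 216 - 64 = 152
```

Nature spreads its 64 codons across just 20 amino acids plus stop signals, which is why many of them are redundant. Under the assumption above, a six-letter alphabet would leave 152 additional three-letter words that could, in theory, be assigned to unnatural amino acids.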

The possibilities are limitless, but customizing proteins for medicine is Romesberg’s top priority. Each year, drug companies spend billions of dollars testing protein-based therapies, which have the potential to treat illnesses that regular small-molecule drugs cannot. For some immune diseases and several types of cancer, Romesberg says, “protein drugs have provided the first significant impact on disease progression, extending lifetimes.” It’s no wonder that the majority of the drugs in the FDA’s investigational new drug pipeline are proteins, and that some experts estimate the plasma protein industry will be worth $31 billion by 2024.

But the problem with proteins — which are derived from natural sources — is that they’re not as customizable as small molecules are. This characteristic is crucial to drug development, a science that involves fitting tiny pegs snugly into oddly shaped holes. After all, what good is a DNA-binding drug if it doesn’t latch on tightly? Here’s where the expanded genetic alphabet can help: In theory, Romesberg can spell out a DNA sequence, incorporating X and Y, to manipulate the pattern of natural and unnatural amino acids in the protein he intends to create. Computer modeling can tell him which amino acids will produce a protrusion here, a cavern there. In this way, Romesberg can custom-fit a protein to its target site.

“We could develop proteins that have whole new properties — whole new abilities to treat different types of diseases better — by virtue of having these unnatural amino acids in them,” Romesberg says.

Before they master protein design, Romesberg’s team will have to figure out how to get their semi-synthetic organisms to make the proteins encoded in their genes in the first place — a difficult step that his team is “continuing to work on.” He reminds us that synthetic biology is only a fledgling science, and simply getting an organism to retain its unnatural Xs and Ys over multiple generations is already difficult. Now that his team has that figured out, however, they can look toward protein customization — and changing the “fundamental chemistry” of organisms themselves.

Romesberg envisions a far-off future in which human manipulation of another organism’s DNA gives way to a more symbiotic relationship. “Instead of tricking them to produce proteins you want to steal from them, you could get cells to make proteins that they use for something,” he says. For example, if introducing the X and Y bases gives rise to proteins that allow bacteria to eat only crude oil, then we are spared the work of harvesting their proteins and spreading them in the sea; we can just drop the bacteria themselves into an oil spill, and everyone will be happy. And no, there’s no chance they’ll go rogue, he says; the bacteria can’t create their own unnatural Xs and Ys, so if humans don’t supply them, they won’t be able to reproduce and will die.

The expanded genetic code could give rise to organisms with unprecedented behaviors, like bacteria that can eat oil.

In a way, Romesberg is to the genetic code as Shakespeare and James Joyce were to the English language. Bored with what language let them express, they simply coined new words based on old structures to convey what needed to be said. Over 12 years, Romesberg crafted his unnatural bases using nature’s nucleotide structure as a template. And in the past four years, he has proudly watched his expanded alphabet find permanent places in nature’s lexicon. Once the buzz about the birth of the world’s first semisynthetic organism dies down, he’s going to put it to work.

“This is not a theoretical exercise,” he says. “We want to create something that has applications.”
