
UNC researcher Charles Carter, PhD, and Peter Wills, PhD, of the University of Auckland, show how genes were first translated into proteins, offering insight into a long-standing scientific mystery.


CHAPEL HILL, NC – All living things use the genetic code to “translate” DNA-based genetic information into proteins, which are the main working molecules in cells. Precisely how the complex process of translation arose in the earliest stages of life on Earth more than four billion years ago has long been mysterious, but two theoretical biologists have now made a significant advance in resolving this mystery.

Charles Carter, PhD, professor of biochemistry and biophysics at the UNC School of Medicine, and Peter Wills, PhD, an associate professor of biochemistry at the University of Auckland, used advanced statistical methods to analyze how modern translational molecules fit together to perform their job – linking short sequences of genetic information to the protein building blocks they encode.

The scientists’ analysis, published in Nucleic Acids Research, reveals previously hidden rules by which key translational molecules interact today. The research suggests how the much-simpler ancestors of these molecules began to work together at the dawn of life.

“I think we have clarified the underlying rules and the evolutionary history of genetic coding,” Carter said. “This had been unresolved for 60 years.”

Wills added, “The pairs of molecular patterns we have identified may be the first that nature ever used to transfer information from one form to another in living organisms.”

The discoveries center on a cloverleaf-shaped molecule called transfer RNA (tRNA), a key player in translation. A tRNA is designed to carry a simple protein building block, known as an amino acid, onto the assembly line of protein production within tiny molecular factories called ribosomes. When a copy, or “transcript,” of a gene, called a messenger RNA (mRNA), emerges from the cell nucleus and enters a ribosome, it is bound by tRNAs carrying their amino acid cargoes.

The mRNA is essentially a string of genetic “letters” spelling out protein-making instructions, and each tRNA recognizes a specific three-letter sequence on the mRNA. This sequence is called a “codon.” As the tRNA binds to the codon, the ribosome links its amino acid to the amino acid that came before it, elongating the growing peptide. When completed, the chain of amino acids is released as a newly born protein.
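The codon-by-codon reading described above can be sketched in a few lines of Python. This is only an illustration, not part of the published analysis: the function name `translate` and the example mRNA string are assumptions made here, and the lookup table is the standard genetic code written in the conventional U, C, A, G base order, with `*` marking a stop codon.

```python
from itertools import product

# Standard genetic code, one letter per amino acid, listed in U, C, A, G order
# for each of the three codon positions. '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each three-letter codon to its amino acid (64 entries in total).
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read an mRNA three letters at a time and return the encoded peptide.

    Reading stops at the first stop codon; trailing bases that do not
    form a complete codon are ignored.
    """
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the finished chain
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical example transcript: AUG (Met), GCU (Ala), UGG (Trp), UAA (stop).
print(translate("AUGGCUUGGUAA"))  # prints "MAW"
```

In the cell, the dictionary lookup is performed physically: each tRNA pairs with its codon on the mRNA and delivers the matching amino acid.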


Proteins in humans and most other life forms are made from 20 different amino acids. Thus there are 20 distinct types of tRNA molecules, each capable of linking to one particular amino acid. Partnering with these 20 tRNAs are 20 matching helper enzymes known as synthetases (aminoacyl-tRNA synthetases), whose job it is to load their partner tRNAs with the correct amino acid.

“You can think of these 20 synthetases and 20 tRNAs collectively as a molecular computer that evolution has designed to make gene-to-protein translation happen,” Carter said.

Biologists have long been intrigued by this molecular computer and the puzzle of how it originated billions of years ago. In recent years, Carter and Wills have made this puzzle their principal research focus. They have shown, for example, how the 20 synthetases, which fall into two structurally distinct classes of 10 synthetases each, likely arose from just two simpler, ancestral enzymes.
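For readers who want the two classes spelled out, the short Python sketch below lists the amino acids by the class of synthetase that loads them. The assignment is the standard textbook one and is not taken from the study itself; the names `CLASS_I`, `CLASS_II`, and `synthetase_class` are chosen here for illustration. One known wrinkle: lysine is listed under Class II, as in most organisms, although some archaea and bacteria use a Class I enzyme for it.

```python
# Textbook division of the 20 amino acids by synthetase class
# (three-letter abbreviations). Not derived from the study discussed here.
CLASS_I = frozenset(
    {"Arg", "Cys", "Gln", "Glu", "Ile", "Leu", "Met", "Trp", "Tyr", "Val"}
)
CLASS_II = frozenset(
    {"Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"}
)


def synthetase_class(amino_acid: str) -> str:
    """Return 'I' or 'II' for a three-letter amino acid abbreviation."""
    if amino_acid in CLASS_I:
        return "I"
    if amino_acid in CLASS_II:
        return "II"
    raise ValueError(f"Unknown amino acid: {amino_acid}")


print(len(CLASS_I), len(CLASS_II))  # prints "10 10"
print(synthetase_class("Leu"))      # prints "I"
```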

A similar class division exists for amino acids, and Carter and Wills have argued that the same class division must apply to tRNAs. In other words, they propose that at the dawn of life on Earth, organisms contained just two types of tRNA, which would have worked with two types of synthetases to perform gene-to-protein translation using just two different kinds of amino acids.

The idea is that over the course of eons this system became ever more specific, as each of the original tRNAs, synthetases, and amino acids was augmented or refined by new variants until there were distinct classes of 10 in place of each of the two original tRNAs, synthetases, and amino acids.

In their most recent study, Carter and Wills examined modern tRNAs for evidence of this ancient duality. To do so they analyzed the upper part of the tRNA molecule, known as the acceptor stem, where partner synthetases bind. Their analysis showed that just three RNA bases, or letters, at the top of the acceptor stem carry an otherwise hidden code specifying rules that divide tRNAs into two classes – corresponding exactly to the two classes of synthetases.

“It is simply the combinations of these three bases that determine which class of synthetase binds to each tRNA,” Carter said.

The study serendipitously found evidence for another proposal about tRNAs. Each modern tRNA has at its lower end an “anticodon” that it uses to recognize and stick to a complementary codon on an mRNA. The anticodon is relatively distant from the synthetase binding site, but scientists since the early 1990s have speculated that tRNAs were once much smaller, combining the anticodon and synthetase binding regions in one. Wills and Carter’s analysis shows that the rules associated with one of the three class-determining bases – base number 2 in the overall tRNA molecule – effectively imply a trace of the anticodon in an ancient, truncated version of tRNA.

“This is a completely unexpected confirmation of a hypothesis that has been around for almost 30 years,” Carter said.

These findings strengthen the argument that the original translational system had just two primitive tRNAs, corresponding to two synthetases and two amino acid types. As this system evolved to recognize and incorporate new amino acids, new combinations of tRNA bases in the synthetase binding region would have emerged to keep up with the increasing complexity – but in a way that left detectable traces of the original arrangement.

“These three class-defining bases in contemporary tRNAs are like a medieval manuscript whose original texts have been rubbed out and replaced by newer texts,” Carter said.

The findings narrow the possibilities for the origins of genetic coding. Moreover, they narrow the realm of future experiments scientists could conduct to reconstruct early versions of the translational system in the laboratory – and perhaps even make this simple system evolve into more complex, modern forms of the same translation system. This would further show how life evolved from the simplest of molecules into cells and complex organisms.

The National Institute of General Medical Sciences and the John Templeton Foundation funded this research.

The opinions expressed in this publication are those of the author(s) and do not necessarily reflect the views of the John Templeton Foundation.