Clues to the Last Common Ancestor

Table with the DNA code.
Credit: Access Excellence

Molecular detectives have traced human ancestry back to the so-called Mitochondrial Eve, the last female common ancestor. More recent research has posited a Y-chromosome Adam, the last male common ancestor.

Monica Riley, a microbiologist specializing in molecular evolution, is aiming farther back. She and her colleagues at The Marine Biological Laboratory in Woods Hole, Massachusetts, are looking for molecular traces of precursors to the last common ancestor of all modern cells.

Riley seeks to estimate how many proteins, and what kinds of proteins, developed into the set of proteins that the first primitive cell may have used to carry out its life processes. Sets, or families, of contemporary proteins may provide some important clues. "My goal is to amalgamate contemporary proteins into sets that could have arisen from single ancestral proteins in the distant past," Riley says.

Proteins, while individually unique in each contemporary species, fall into families, the members of which are present in most modern living species. Proteins within a family can be related by amino acid sequence, 3-D shape, function or a combination of these properties. Different research groups define protein families differently, some postulating hundreds of families and others more than a thousand. The existence of families, however, is not disputed.

Riley begins with public DNA databases now containing the complete genomes of nearly 50 microorganisms. An examination of the DNA sequence of a gene, read out of the genome database and translated using the familiar three-letter code, yields the amino acid sequence that constitutes the basic molecular structure of a protein. Special software helps researchers find amino acid sequence similarities. (See "Genomics Meets Geology" <http://nai.arc.nasa.gov/genomic.htm>)
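To make the translation step concrete, here is a minimal Python sketch. It assumes the standard genetic code and a gene that starts at the first base of the string. The function names `translate` and `percent_identity` are illustrative only and are not taken from any software Riley's group uses. The identity measure is a deliberately crude stand-in for real sequence-comparison software, which also aligns sequences of different lengths.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Reads three bases at a time from the first base and stops at the
    first stop codon (assumption: the reading frame starts at index 0).
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def percent_identity(seq_a, seq_b):
    """Percentage of positions that match in two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        return 0.0
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical toy gene: Met-Ala-Lys followed by a stop codon.
toy_protein = translate("ATGGCCAAATAA")
```

Two proteins with a high percent identity are candidates for membership in the same family. Distant relatives may score low on this measure even when their 3-D shapes are similar.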

After finding sets of protein sequences that appear to represent distinct families, Riley investigates their 3-D shapes. Similar 3-D shapes can be found in more than one sequence-related family. In fact, proteins can be related by shape even when they are only distantly related by sequence.

The 3-D shape of a protein can be deduced if its sequence corresponds closely to the sequence of a protein whose 3-D shape already is known. A variety of algorithms are available that "thread" the unknown sequence along the backbone of the known structure.

Riley likens the 3-D structure of a protein to a wadded-up ball of thread. Spots on the thread that lie far apart when the thread is stretched out may come into close proximity when it’s wadded.
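The wadded-thread picture can be sketched in a few lines of Python. Given one 3-D coordinate per amino acid, the code lists pairs that are far apart along the chain but close together in space. The coordinates, the 8-angstrom cutoff, and the function name `long_range_contacts` are all hypothetical choices made for illustration; they are not measurements of a real protein.

```python
import math


def long_range_contacts(coords, min_separation=4, cutoff=8.0):
    """Return index pairs (i, j) that are at least `min_separation`
    positions apart along the chain but within `cutoff` angstroms
    of each other in space."""
    contacts = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.append((i, j))
    return contacts


# Hypothetical toy chain (angstroms) that doubles back on itself
# like a hairpin, so its two ends lie side by side.
toy_chain = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (9.5, 3.3, 0.0),
    (7.6, 6.6, 0.0),
    (3.8, 6.6, 0.0),
    (0.0, 6.6, 0.0),
]

contacts = long_range_contacts(toy_chain)
```

In this toy chain the first and last positions are six steps apart along the thread but only 6.6 angstroms apart in space.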

"The protein, in order to do its job, has to have certain amino acids within certain distances of each other at pockets of activity." Riley says. During protein synthesis, a string of amino acids folds into a precise 3-D shape, creating a pocket with particular amino acids on its surface. The molecule the protein affects, called the substrate, fits into this pocket, attracted to the amino acids. That unique fit accelerates, or catalyzes, a chemical reaction in the substrate.

X-ray crystallography is an experimental technique that exploits the fact that X-rays are diffracted by crystals.
Credit: UC Irvine

Labs can determine the 3-D shape of a protein using two methods. The first is X-ray crystallography, the method Rosalind Franklin used to provide the information on which Watson and Crick based their 3-D structure of DNA. The second is nuclear magnetic resonance (NMR). Each method has limitations, with X-ray diffraction working only on proteins that crystallize and NMR working only on small proteins. This leaves some holes in the picture.

"Our work concentrates now on enzymes as the best characterized kinds of proteins in a cell. Eventually we’ll have to leave this comfortable turf to turn to membrane-bound proteins and other more difficult types of proteins," Riley says. "We do not expect to find in current databases 3-D structures for all the kinds of proteins known to be present in a cell. Still, we can start now assembling familial information on [some] proteins.

Within some families of enzymes, the only differences among the functions of individual proteins are differences in the molecules they act upon. Members of one family, for example, all transfer the amino group of an amino acid to another molecule. The reactions of these related proteins are the same, differing only in what the donor and acceptor molecules are.

Members of other kinds of enzyme families perform a wide variety of functions, harnessing a similar chemistry to give rise to a variety of reactions.

Members of the crotonase family exhibit wide variety in both function and specific chemistry. Not all members of this family share enough of the amino acid sequence to be recognized by sequence-analysis software alone, but a close look at the 3-D structures of the active sites reveals many of the same important amino acids in much the same relative positions. There are enough 3-D structural similarities to categorize all crotonases into one family, Riley says. "The crotonase family includes proteins that are related distantly by sequence, but quite closely by structure. And depending on which amino acids are active at the catalytic site, they are able to effect different reactions."

Differences in rates of protein evolution.
Credit: BioMed Central

Riley’s research aims not just at finding family relationships among proteins. Eventually, she hopes to enumerate a set of proteins that were the ancestors of the proteins that performed all of the cellular functions in the Last Common Ancestor. "Last Common Ancestor" is the term used by scientists to describe the ancient cell that ultimately gave rise to the three great domains of life: Archaea, Bacteria and Eukarya.

"This is not a new concept with me at all," Riley says. "It’s just that we are now tackling it for the protein families that are most universally held by all of [the currently sequenced] microorganisms. How many ancestral proteins were there? If we end up with a hundred families, we will postulate a hundred ancestor proteins. Or a thousand families, a thousand ancestors."

S. Blair Hedges, a genomicist at Pennsylvania State University, sees Riley’s work as important. "What she does is quite interesting because the fossil record for these earliest events [in the origin and early evolution of life] is very poor. When you get back before the Cambrian explosion, almost everything in the fossil record is microscopic, if it’s at all present. So that’s why molecular data, protein data like this, are very useful, because you can figure out a lot of information about the evolution of life without having to rely on the very sparse fossil record."

Hedges uses the same data that Riley uses, but mines it with a different aim. "We start with the same data and I extract time information from it to time splitting events of organisms, like dating the last common ancestor, for example." He uses so-called molecular clocks to construct family trees of archaeal, bacterial and eukaryotic cells.
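The basic arithmetic behind a molecular clock can be sketched as follows. This is a textbook simplification and not a description of Hedges's actual method. It assumes a single constant substitution rate and uses the Poisson correction for repeated changes at the same site. The function names and the example numbers are hypothetical.

```python
import math


def poisson_distance(fraction_different):
    """Estimate substitutions per site from the observed fraction of
    differing sites, correcting for multiple changes at the same site
    (Poisson correction: d = -ln(1 - p))."""
    if not 0.0 <= fraction_different < 1.0:
        raise ValueError("fraction_different must be in [0, 1)")
    return -math.log(1.0 - fraction_different)


def divergence_time(distance, rate_per_site_per_year):
    """Years since two lineages split, assuming a constant rate.

    The factor of 2 reflects that both lineages have been accumulating
    changes independently since the split.
    """
    return distance / (2.0 * rate_per_site_per_year)


# Hypothetical example: two proteins differ at 10% of their sites, and
# the assumed rate is one substitution per site per billion years.
distance = poisson_distance(0.10)
years = divergence_time(distance, 1e-9)
```

Under these made-up numbers the split falls roughly 53 million years in the past. Real analyses must also cope with rates that differ among proteins and among lineages.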

What’s Next?

Proteins are chemically interesting molecules, of course, but it’s their functions, such as moving molecules in and out of cells and breaking down food molecules (the process of metabolism), that interest researchers outside of the organic chemistry community. "If you know the complement of genes and proteins present in these early organisms then you can kind of extrapolate what they might have been doing," Hedges says.

Could Riley’s ancestral protein set lead to a picture of the metabolic functions that the last common ancestor could carry out? "Actually I hadn’t thought of undertaking that, but if you had it in front of you how many families there were and what their characteristics were you probably could make up a metabolic story," Riley says. "But that isn’t something that I would concentrate on because it’s just too speculative. I would be satisfied if we could list the unique types of proteins, one of each, that is all you would have to have to develop the metabolic diversity you have today."

"We won’t have all these answers, but it’s a path that we’re taking, to describe the first cell, to figure out from the raw beginnings what its limited capabilities would have been. … to try and define early stages of life on Earth and then hopefully to apply that to figuring out other possible origins of life elsewhere."