Unpuzzling Proteins

SGI Origin 3000
The Origin 3000. This new supercomputer is helping scientists determine the structure of proteins.
Credit: SGI

Thanks to a new supercomputer, scientists may be a step closer to understanding one of nature’s more difficult puzzles. Researchers at NASA Ames Research Center are using SGI's 512-processor Origin 3000, the most powerful parallel supercomputer of its kind, to try to determine the structure of proteins.

Proteins play a fundamental role in living cells, acting as catalysts for most of the cell's chemical reactions. Proteins also act as a kind of "nervous system" for a cell, transmitting signals from the outside environment. They assist in transporting nutrients into the cell, and also help convert that food into energy.

The function of a particular protein is determined by its shape. A protein is built from amino acids joined end to end; the link between each pair of amino acids is called a peptide bond. The linked carbon, oxygen and nitrogen atoms form a backbone, and a side chain that differs from one amino acid to the next projects from each unit along it. This chainlike molecule, which may contain from 50 to several hundred amino acids, is called a polypeptide. Some proteins consist of only one polypeptide chain, while others consist of several chains held together by weak molecular bonds. The sequence of amino acids determines the shape of the protein.

"Starting with information about the amino acid sequence, it should be possible, using computational methods, to discover the structure of a protein from the sequence alone," says Andrew Pohorille, a scientist at the Ames Research Center.

But scientists have found it very difficult to determine the structure of a protein even when they know the sequence of its amino acids, because each amino acid can take any of five possible orientations. Because proteins contain at least 50 amino acids, even the shortest proteins can be arranged in an astronomical number of possible combinations: five choices at each of 50 positions gives 5 to the 50th power, or roughly 10 to the 35th.
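The size of this search space can be checked with a short calculation. The sketch below takes the article's figures (five orientations per amino acid, at least 50 amino acids) and assumes each amino acid picks its orientation independently of the others, which is a simplification. The function name is illustrative only.

```python
def count_conformations(num_amino_acids: int, orientations_per_residue: int = 5) -> int:
    """Count arrangements, assuming every amino acid chooses its orientation independently."""
    return orientations_per_residue ** num_amino_acids


# The shortest proteins mentioned in the article: 50 amino acids.
shortest = count_conformations(50)
print(f"{shortest:.2e}")  # prints 8.88e+34
```

Each additional amino acid multiplies the count by five, so the number grows exponentially with chain length.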

Amino Acid sequence
The amino acid sequence of a protein determines the higher levels of structure of the molecule. A single change in the primary structure (the amino acid sequence) can have a profound biological effect on the overall structure and function of the protein.
Credit: IACR

Chains of amino acids loop about each other in a variety of ways, folding into a distinctive shape. How they fold into each other determines both the structure and function of the protein. The difficulty in determining the structure of a protein from its constituent amino acids is known as "the protein-folding problem."

Pohorille says there are a variety of computational methods that try to address the protein-folding problem. The most direct method tries to determine the series of small steps that led to the protein being folded into a certain shape.

This method is extremely time-consuming, however, because there are approximately a thousand trillion steps. A computer capable of calculating one million of these steps per second would have to work round-the-clock for 32 years to complete this task. The largest computer simulations performed so far have extended to only a trillion steps.
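The 32-year figure follows directly from the two numbers quoted above. A minimal check, assuming a constant rate of one million steps per second and a year of 365.25 days (the function and constant names are illustrative):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_simulate(total_steps: float, steps_per_second: float) -> float:
    """Return wall-clock years needed to compute total_steps at a fixed rate."""
    return total_steps / steps_per_second / SECONDS_PER_YEAR


# A thousand trillion steps (1e15) at one million steps per second (1e6).
years = years_to_simulate(1e15, 1e6)
print(f"{years:.1f} years")  # prints 31.7 years
```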

"We hope that by a combination of new, powerful computers, efficient parallel programming and novel algorithms, we will be able to achieve our goal," says Pohorille. "The techniques of 3D visualization and structure manipulation that are being developed by Chris Henze and his group should greatly aid our efforts."

"What used to take a year to calculate on a single processor might be done in less than a day on a 512-processor machine," says Henze, another scientist at Ames who is working on the simulations of protein formation with Pohorille. "Nevertheless, with current supercomputer power it’ll take months or years of calculations to simulate how even a small protein molecule folds into a certain shape."
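Henze's estimate is consistent with simple division. The sketch below assumes ideal linear scaling, meaning the work divides evenly across all 512 processors with no communication overhead; real parallel programs usually fall short of that, so this is a best case rather than a prediction.

```python
def ideal_parallel_days(serial_days: float, processors: int) -> float:
    """Best-case run time if the work splits perfectly across processors."""
    return serial_days / processors


# One year of single-processor work spread over 512 processors.
days = ideal_parallel_days(365, 512)
print(f"{days:.2f} days, or {days * 24:.1f} hours")  # prints 0.71 days, or 17.1 hours
```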

Glycophorin-A protein molecule. New supercomputers and efficient algorithms will allow scientists to determine the structure of proteins such as this one from their constituent amino acids.
Credit: NASA

This research is a part of a broader project funded by NASA’s Astrobiology Institute to build laboratory models of protocells (ancestors of the first cells) through a combination of experimental and computational studies. The proteins in modern cells are much more complicated than they would have been in the primitive protocells. Since protocells do not exist today, the only way to understand how they might have worked is to reconstruct them in a laboratory.

"One of the goals of astrobiology is to understand the origin of life on Earth and elsewhere in the Universe," says Pohorille. "Undoubtedly, proteins played an important role in this process. So far, attempts to construct simple, but still functional proteins have been largely unsuccessful. We want to design such proteins."

What Next?

Pohorille and his team are collaborating with the Harvard Medical School on a new experimental technique to create proteins. Pohorille says the Harvard researchers have developed small proteins that perform certain desired functions. The research results will be published in an upcoming issue of Nature.

"However, the structure of this new protein remains unknown," says Pohorille. "We want to be able to fold this protein and other, similar proteins that will be evolved soon."

Pohorille is also working on a project that involves modeling a small protein. This protein is capable of inserting itself into the cell membrane and forming channels that transport protons into the cell. Such proton transport is an essential step in converting energy to drive the cell’s chemistry.

"We already gained some understanding how this protein works – it was a subject of a short article in New Scientist last year," says Pohorille. "However, to make it useful for protocells, the protein needs to be somewhat redesigned. The new computer should make it possible."