Proteins: Higher Orders of Structure - Structures & Functions of Proteins & Enzymes - Harper’s Illustrated Biochemistry, 29th Edition (2012)

SECTION I. Structures & Functions of Proteins & Enzymes

Chapter 5. Proteins: Higher Orders of Structure

Peter J. Kennelly, PhD & Victor W. Rodwell, PhD

OBJECTIVES

After studying this chapter, you should be able to:

Indicate the advantages and drawbacks of several approaches to classifying proteins.

Explain and illustrate the primary, secondary, tertiary, and quaternary structure of proteins.

Identify the major recognized types of secondary structure and explain supersecondary motifs.

Describe the kind and relative strengths of the forces that stabilize each order of protein structure.

Describe the information summarized by a Ramachandran plot.

Indicate the present state of knowledge concerning the stepwise process by which proteins are thought to attain their native conformation.

Identify the physiologic roles in protein maturation of chaperones, protein disulfide isomerase, and peptidylproline cis-trans-isomerase.

Describe the principal biophysical techniques used to study tertiary and quaternary structure of proteins.

Explain how genetic and nutritional disorders of collagen maturation illustrate the close linkage between protein structure and function.

For the prion diseases, outline the overall events in their molecular pathology and name the life forms each affects.

BIOMEDICAL IMPORTANCE

In nature, form follows function. In order for a newly synthesized polypeptide to mature into a biologically functional protein capable of catalyzing a metabolic reaction, powering cellular motion, or forming the macromolecular rods and cables that provide structural integrity to hair, bones, tendons, and teeth, it must fold into a specific three-dimensional arrangement, or conformation. In addition, during maturation, post-translational modifications may add new chemical groups or remove transiently needed peptide segments. Genetic or nutritional deficiencies that impede protein maturation are deleterious to health. Examples of the former include Creutzfeldt–Jakob disease, scrapie, Alzheimer’s disease, and bovine spongiform encephalopathy (“mad cow disease”). Scurvy represents a nutritional deficiency that impairs protein maturation.

CONFORMATION VERSUS CONFIGURATION

The terms configuration and conformation are often confused. Configuration refers to the geometric relationship between a given set of atoms, for example, those that distinguish L- from D-amino acids. Interconversion of configurational alternatives requires breaking (and reforming) covalent bonds. Conformation refers to the spatial relationship of every atom in a molecule. Interconversion between conformers occurs without covalent bond rupture, with retention of configuration, and typically via rotation about single bonds.

PROTEINS WERE INITIALLY CLASSIFIED BY THEIR GROSS CHARACTERISTICS

Scientists initially approached structure–function relationships in proteins by separating them into classes based upon properties such as solubility, shape, or the presence of nonprotein groups. For example, the proteins that can be extracted from cells using aqueous solutions of physiologic pH and ionic strength are classified as soluble. Extraction of integral membrane proteins requires dissolution of the membrane with detergents. Globular proteins are compact, roughly spherical molecules that have axial ratios (the ratio of their longest to shortest dimensions) of not over 3. Most enzymes are globular proteins. By contrast, many structural proteins adopt highly extended conformations. These fibrous proteins possess axial ratios of 10 or more.

Lipoproteins and glycoproteins contain covalently bound lipid and carbohydrate, respectively. Myoglobin, hemoglobin, cytochromes, and many other metalloproteins contain tightly associated metal ions. While more precise classification schemes have emerged based upon similarity, or homology, in amino acid sequence and three-dimensional structure, many early classification terms remain in use.

PROTEINS ARE CONSTRUCTED USING MODULAR PRINCIPLES

Proteins perform complex physical and catalytic functions by positioning specific chemical groups in a precise three-dimensional arrangement. The polypeptide scaffold containing these groups must adopt a conformation that is both functionally efficient and physically strong. At first glance, the biosynthesis of polypeptides composed of tens of thousands of individual atoms would appear to be extremely challenging. When one considers that a typical polypeptide can adopt ≥10⁵⁰ distinct conformations, folding into the conformation appropriate to its biologic function would appear to be even more difficult. As described in Chapters 3 and 4, synthesis of the polypeptide backbones of proteins employs a small set of common building blocks or modules, the amino acids, joined by a common linkage, the peptide bond. Similarly, a stepwise modular pathway simplifies the folding and processing of newly synthesized polypeptides into mature proteins.

FOUR ORDERS OF PROTEIN STRUCTURE

The modular nature of protein synthesis and folding is embodied in the concept of orders of protein structure: primary structure, the sequence of the amino acids in a polypeptide chain; secondary structure, the folding of short (3- to 30-residue), contiguous segments of polypeptide into geometrically ordered units; tertiary structure, the assembly of secondary structural units into larger functional units such as the mature polypeptide and its component domains; and quaternary structure, the number and types of polypeptide units of oligomeric proteins and their spatial arrangement.

SECONDARY STRUCTURE

Peptide Bonds Restrict Possible Secondary Conformations

Free rotation is possible about only two of the three covalent bonds of the polypeptide backbone: the α-carbon (Cα) to the carbonyl carbon (Co) bond, and the Cα to nitrogen bond (Figure 3–4). The partial double-bond character of the peptide bond that links Co to the α-nitrogen requires that the carbonyl carbon, carbonyl oxygen, and α-nitrogen remain coplanar, thus preventing rotation. The angle about the Cα—N bond is termed the phi (Φ) angle, and that about the Co—Cα bond the psi (Ψ) angle. For amino acids other than glycine, most combinations of phi and psi angles are disallowed because of steric hindrance (Figure 5–1). The conformations of proline are even more restricted due to the absence of free rotation of the N—Cα bond.

FIGURE 5–1 Ramachandran plot of the main chain phi (Φ) and psi (Ψ) angles for approximately 1000 nonglycine residues in eight proteins whose structures were solved at high resolution. The dots represent allowable combinations, and the spaces prohibited combinations, of phi and psi angles. (Reproduced, with permission, from Richardson JS: The anatomy and taxonomy of protein structures. Adv Protein Chem 1981;34:167. Copyright © 1981. Reprinted with permission from Elsevier.)

Regions of ordered secondary structure arise when a series of aminoacyl residues adopt similar phi and psi angles. Extended segments of polypeptide (eg, loops) can possess a variety of such angles. The angles that define the two most common types of secondary structure, the α helix and the β sheet, fall within the lower and upper left-hand quadrants of a Ramachandran plot, respectively (Figure 5–1).
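As a rough illustration of how a Ramachandran plot is read, the sketch below names the quadrant in which a (Φ, Ψ) pair falls, taking Φ as the horizontal axis and Ψ as the vertical axis. The function name and the β-strand angles are assumptions chosen for illustration; only the α-helix angles (approximately –57° and –47°) are quoted in this chapter.

```python
def ramachandran_quadrant(phi: float, psi: float) -> str:
    """Name the quadrant of a Ramachandran plot for one residue.

    phi is plotted on the horizontal axis and psi on the vertical axis,
    both in degrees between -180 and 180.
    """
    if not (-180.0 <= phi <= 180.0 and -180.0 <= psi <= 180.0):
        raise ValueError("phi and psi must lie between -180 and 180 degrees")
    horizontal = "left" if phi < 0 else "right"
    vertical = "upper" if psi >= 0 else "lower"
    return f"{vertical} {horizontal}"


# Alpha-helix angles as quoted in this chapter.
ALPHA_HELIX = (-57.0, -47.0)
# Assumed, illustrative beta-strand angles (not taken from this chapter).
BETA_STRAND = (-120.0, 130.0)

print("alpha helix:", ramachandran_quadrant(*ALPHA_HELIX))
print("beta strand:", ramachandran_quadrant(*BETA_STRAND))
```

A quadrant label is far coarser than the allowed regions of Figure 5–1; it only restates where the two common secondary structures sit on the plot.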

Alpha Helix

The polypeptide backbone of an α helix is twisted by an equal amount about each α-carbon with a phi angle of approximately –57° and a psi angle of approximately –47°. A complete turn of the helix contains an average of 3.6 aminoacyl residues, and the distance it rises per turn (its pitch) is 0.54 nm (Figure 5–2). The R groups of each aminoacyl residue in an α helix face outward (Figure 5–3). Proteins contain only L-amino acids, for which a right-handed α helix is by far the more stable, and only right-handed α helices are present in proteins. Schematic diagrams of proteins represent α helices as coils or cylinders.

FIGURE 5–2 Orientation of the main chain atoms of a peptide about the axis of an α helix.

FIGURE 5–3 View down the axis of an α helix. The side chains (R) are on the outside of the helix. The van der Waals radii of the atoms are larger than shown here; hence, there is almost no free space inside the helix. (Slightly modified and reproduced, with permission, from Stryer L: Biochemistry, 3rd ed. Freeman, 1995. Copyright © 1995 W.H. Freeman and Company.)
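The helix dimensions quoted above (3.6 residues per turn, a pitch of 0.54 nm) fix two derived quantities: the rise along the axis per residue and the rotation about the axis per residue. The short sketch below works through that arithmetic. The function names are hypothetical, and the calculation assumes an ideal, perfectly regular helix.

```python
RESIDUES_PER_TURN = 3.6   # from the text
PITCH_NM = 0.54           # rise per complete turn, from the text


def rise_per_residue_nm() -> float:
    """Axial distance advanced by one residue in an ideal alpha helix."""
    return PITCH_NM / RESIDUES_PER_TURN


def rotation_per_residue_deg() -> float:
    """Angle swept about the helix axis by one residue."""
    return 360.0 / RESIDUES_PER_TURN


def helix_length_nm(n_residues: int) -> float:
    """Approximate axial length of an ideal helix of n_residues."""
    return n_residues * rise_per_residue_nm()


print(f"rise per residue: {rise_per_residue_nm():.2f} nm")
print(f"rotation per residue: {rotation_per_residue_deg():.0f} degrees")
print(f"length of a 20-residue helix: {helix_length_nm(20):.1f} nm")
```

Real helices deviate from these ideal values, so the results are estimates rather than measurements.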

The stability of an α helix arises primarily from hydrogen bonds formed between the oxygen of the peptide bond carbonyl and the hydrogen atom of the peptide bond nitrogen of the fourth residue down the polypeptide chain (Figure 5–4). The ability to form the maximum number of hydrogen bonds, supplemented by van der Waals interactions in the core of this tightly packed structure, provides the thermodynamic driving force for the formation of an α helix. Since the peptide bond nitrogen of proline lacks a hydrogen atom to contribute to a hydrogen bond, proline can only be stably accommodated within the first turn of an α helix. When present elsewhere, proline disrupts the conformation of the helix, producing a bend. Because of its small size, glycine also often induces bends in α helices.

FIGURE 5–4 Hydrogen bonds (dotted lines) formed between H and O atoms stabilize a polypeptide in an α-helical conformation. (Reprinted, with permission, from Haggis GH, et al, (1964), “Introduction to Molecular Biology”. Science 146:1455-1456. Reprinted with permission from AAAS.)
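The hydrogen-bonding pattern described above can be sketched as a pairing rule: the carbonyl oxygen of residue i accepts a hydrogen bond from the amide N–H of residue i + 4. The minimal example below lists those pairs for a one-letter sequence and marks pairs in which the donor is proline, which has no amide hydrogen to contribute. The function name and the test sequence are hypothetical, and the model ignores every other factor that affects helix stability.

```python
def helix_hbonds(sequence: str) -> list[tuple[int, int, bool]]:
    """List i -> i+4 backbone hydrogen bonds for a sequence assumed to be helical.

    Each tuple is (acceptor_position, donor_position, can_form), using
    1-based residue numbers. The acceptor supplies the carbonyl oxygen,
    the donor supplies the amide hydrogen. can_form is False when the
    donor is proline ("P"), whose peptide nitrogen carries no hydrogen.
    """
    bonds = []
    for i in range(len(sequence) - 4):
        donor = sequence[i + 4]
        bonds.append((i + 1, i + 5, donor != "P"))
    return bonds


for acceptor, donor, can_form in helix_hbonds("AAAAPAAA"):
    status = "forms" if can_form else "cannot form (proline donor)"
    print(f"C=O of residue {acceptor} ... H-N of residue {donor}: {status}")
```

In this pairing scheme residues 1 through 4 never act as donors, which is consistent with the statement that proline can be stably accommodated only within the first turn.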

Many α helices have predominantly hydrophobic R groups on one side of the axis of the helix and predominantly hydrophilic ones on the other. These amphipathic helices are well adapted to the formation of interfaces between polar and nonpolar regions such as the hydrophobic interior of a protein and its aqueous environment. Clusters of amphipathic helices can create a channel, or pore, that permits specific polar molecules to pass through hydrophobic cell membranes.
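One way to picture an amphipathic helix is to project its residues onto a circle viewed down the helix axis, advancing 100° per residue (360° divided by 3.6 residues per turn), and to ask whether the hydrophobic residues cluster on one side. The sketch below sums one vector per residue, pointing outward for hydrophobic residues and inward for polar ones; a large resultant indicates a one-sided distribution. The two-class hydrophobicity assignment is a deliberately crude assumption, not a published scale, and the function names are hypothetical.

```python
import cmath
import math

DEGREES_PER_RESIDUE = 100          # 360 degrees / 3.6 residues per turn
HYDROPHOBIC = set("AVLIMFWC")      # assumed crude two-class assignment


def wheel_moment(sequence: str) -> float:
    """Magnitude of the summed side-chain vectors on a helical wheel.

    Hydrophobic residues contribute +1 and all others -1 along the
    direction in which their side chain points.
    """
    total = 0j
    for i, residue in enumerate(sequence):
        weight = 1.0 if residue in HYDROPHOBIC else -1.0
        angle = math.radians(i * DEGREES_PER_RESIDUE)
        total += weight * cmath.exp(1j * angle)
    return abs(total)


def idealized_amphipathic(length: int = 18) -> str:
    """Build a toy sequence with Leu on one face of the wheel and Lys on the other."""
    return "".join(
        "L" if (i * DEGREES_PER_RESIDUE) % 360 < 180 else "K"
        for i in range(length)
    )


uniform = "L" * 18
amphipathic = idealized_amphipathic()
print(amphipathic)
print(f"uniform helix moment:     {wheel_moment(uniform):.2f}")
print(f"amphipathic helix moment: {wheel_moment(amphipathic):.2f}")
```

Eighteen residues span exactly five turns, so the vectors of a uniformly hydrophobic 18-residue helix cancel, whereas the segregated sequence gives a large resultant.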

Beta Sheet

The second (hence “beta”) recognizable regular secondary structure in proteins is the β sheet. The amino acid residues of a β sheet, when viewed edge-on, form a zigzag or pleated pattern in which the R groups of adjacent residues point in opposite directions. Unlike the compact backbone of the α helix, the peptide backbone of the β sheet is highly extended. But like the α helix, β sheets derive much of their stability from hydrogen bonds between the carbonyl oxygens and amide hydrogens of peptide bonds. However, in contrast to the α helix, these bonds are formed with adjacent segments of the β sheet (Figure 5–5).

FIGURE 5–5 Spacing and bond angles of the hydrogen bonds of antiparallel and parallel pleated β sheets. Arrows indicate the direction of each strand. Hydrogen bonds are indicated by dotted lines with the participating α-nitrogen atoms (hydrogen donors) and oxygen atoms (hydrogen acceptors) shown in blue and red, respectively. Backbone carbon atoms are shown in black. For clarity in presentation, R groups and hydrogen atoms are omitted. Top: Antiparallel β sheet. Pairs of hydrogen bonds alternate between being close together and wide apart and are oriented approximately perpendicular to the polypeptide backbone. Bottom: Parallel β sheet. The hydrogen bonds are evenly spaced but slant in alternate directions.

Interacting β sheets can be arranged either to form a parallel β sheet, in which the adjacent segments of the polypeptide chain proceed in the same direction amino to carboxyl, or an antiparallel sheet, in which they proceed in opposite directions (Figure 5–5). Either arrangement permits the maximum number of hydrogen bonds between segments, or strands, of the sheet. Most β sheets are not perfectly flat but tend to have a right-handed twist. Clusters of twisted strands of β sheet form the core of many globular proteins (Figure 5–6). Schematic diagrams represent β sheets as arrows that point in the amino-to-carboxyl terminal direction.

FIGURE 5–6 Examples of the tertiary structure of proteins. Top: The enzyme triose phosphate isomerase complexed with the substrate analog 2-phosphoglycerate (red). Note the elegant and symmetrical arrangement of alternating β sheets (light blue) and α helices (green), with the β sheets forming a β-barrel core surrounded by the helices. (Adapted from Protein Data Bank ID no. 1o5x.) Bottom: Lysozyme complexed with the substrate analog penta-N-acetyl chitopentaose (red). The color of the polypeptide chain is graded along the visible spectrum from purple (N-terminal) to tan (C-terminal). Notice how the concave shape of the domain forms a binding pocket for the pentasaccharide, the lack of β sheet, and the high proportion of loops and bends. (Adapted from Protein Data Bank ID no. 1sfb.)

Loops & Bends

Roughly half of the residues in a “typical” globular protein reside in α helices or β sheets, and half in loops, turns, bends, and other extended conformational features. Turns and bends refer to short segments of amino acids that join two units of secondary structure, such as two adjacent strands of an antiparallel β sheet. A β turn involves four aminoacyl residues, in which the first residue is hydrogen-bonded to the fourth, resulting in a tight 180° turn (Figure 5–7). Proline and glycine often are present in β turns.

FIGURE 5–7 A β turn that links two segments of antiparallel β sheet. The dotted line indicates the hydrogen bond between the first and fourth amino acids of the four-residue segment Ala-Gly-Asp-Ser.

Loops are regions that contain residues beyond the minimum number necessary to connect adjacent regions of secondary structure. Irregular in conformation, loops nevertheless serve key biologic roles. For many enzymes, the loops that bridge domains responsible for binding substrates often contain aminoacyl residues that participate in catalysis. Helix-loop-helix motifs provide the oligonucleotide-binding portion of many DNA-binding proteins such as repressors and transcription factors. Structural motifs such as the helix-loop-helix motif that are intermediate in scale between secondary and tertiary structures are often termed supersecondary structures. Since many loops and bends reside on the surface of proteins and are thus exposed to solvent, they constitute readily accessible sites, or epitopes, for recognition and binding of antibodies.

While loops lack apparent structural regularity, many adopt a specific conformation stabilized through hydrogen bonding, salt bridges, and hydrophobic interactions with other portions of the protein. However, not all portions of proteins are necessarily ordered. Proteins may contain “disordered” regions, often at the extreme amino or carboxyl terminal, characterized by high conformational flexibility. In many instances, these disordered regions assume an ordered conformation upon binding of a ligand. This structural flexibility enables such regions to act as ligand-controlled switches that affect protein structure and function.

Tertiary & Quaternary Structure

The term “tertiary structure” refers to the entire three-dimensional conformation of a polypeptide. It indicates, in three-dimensional space, how secondary structural features (helices, sheets, bends, turns, and loops) assemble to form domains and how these domains relate spatially to one another. A domain is a section of the protein structure sufficient to perform a particular chemical or physical task such as binding of a substrate or other ligand. Most domains are modular in nature, and contiguous in both primary sequence and three-dimensional space (Figure 5–8). Simple proteins, particularly those that interact with a single substrate, such as lysozyme or triose phosphate isomerase (Figure 5–6) and the oxygen storage protein myoglobin (Chapter 6), often consist of a single domain. By contrast, lactate dehydrogenase is composed of two domains, an N-terminal NAD⁺-binding domain and a C-terminal binding domain for the second substrate, pyruvate (Figure 5–8). Lactate dehydrogenase is one of the family of oxidoreductases that share a common N-terminal NAD(P)⁺-binding domain known as the Rossmann fold. By fusing the Rossmann fold domain to a variety of C-terminal domains, a large family of oxidoreductases has evolved that utilize NAD(P)⁺/NAD(P)H for the oxidation and reduction of a wide range of metabolites. Examples include alcohol dehydrogenase, glyceraldehyde-3-phosphate dehydrogenase, malate dehydrogenase, quinone oxidoreductase, 6-phosphogluconate dehydrogenase, D-glycerate dehydrogenase, formate dehydrogenase, and 3α,20β-hydroxysteroid dehydrogenase.

FIGURE 5–8 Polypeptides containing two domains. Top: Shown is the three-dimensional structure of a monomer unit of the tetrameric enzyme lactate dehydrogenase with the substrates NADH (red) and pyruvate (blue) bound. Not all bonds in NADH are shown. The color of the polypeptide chain is graded along the visible spectrum from blue (N-terminal) to orange (C-terminal). Note how the N-terminal portion of the polypeptide forms a contiguous domain, encompassing the left portion of the enzyme, responsible for binding NADH. Similarly, the C-terminal portion forms a contiguous domain responsible for binding pyruvate. (Adapted from Protein Data Bank ID no. 3ldh.) Bottom: Shown is the three-dimensional structure of the catalytic subunit of the cAMP-dependent protein kinase (Chapter 42) with the substrate analogs ADP (red) and peptide (purple) bound. The color of the polypeptide chain is graded along the visible spectrum from blue (N-terminal) to orange (C-terminal). Protein kinases transfer the γ-phosphate group of ATP to protein and peptide substrates (Chapter 9). Note how the N-terminal portion of the polypeptide forms a contiguous domain rich in β sheet that binds ADP. Similarly, the C-terminal portion forms a contiguous, α helix-rich domain responsible for binding the peptide substrate. (Adapted from Protein Data Bank ID no. 1jbp.)

Not all domains bind substrates. Hydrophobic membrane domains anchor proteins to membranes or enable them to span membranes. Localization sequences target proteins to specific subcellular or extracellular locations such as the nucleus, mitochondria, secretory vesicles, etc. Regulatory domains trigger changes in protein function in response to the binding of allosteric effectors or covalent modifications (Chapter 9). Combining domain modules provides a facile route for generating proteins of great structural complexity and functional sophistication (Figure 5–9).

FIGURE 5–9 Some multidomain proteins. The rectangles represent the polypeptide sequences of a forkhead transcription factor; 6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase, a bifunctional enzyme whose activities are controlled in a reciprocal fashion by allosteric effectors and covalent modification (Chapter 20); phenylalanine hydroxylase (Chapters 27 and 29), whose activity is stimulated by phosphorylation of its regulatory domain; and the epidermal growth factor (EGF) receptor (Chapter 41), a transmembrane protein whose intracellular protein kinase domain is regulated via the binding of the peptide hormone EGF to its extracellular domain. Regulatory domains are colored green, catalytic domains dark blue and light blue, protein–protein interaction domains light orange, DNA binding domains dark orange, nuclear localization sequences medium blue, and transmembrane domains yellow. The kinase and bisphosphatase activities of 6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase are catalyzed by the N- and C-terminal proximate catalytic domains, respectively.

Proteins containing multiple domains can also be assembled through the association of multiple polypeptides, or protomers. Quaternary structure defines the polypeptide composition of a protein and, for an oligomeric protein, the spatial relationships between its protomers or subunits. Monomeric proteins consist of a single polypeptide chain. Dimeric proteins contain two polypeptide chains. Homodimers contain two copies of the same polypeptide chain, while in a heterodimer the polypeptides differ. Greek letters (α, β, γ, etc) are used to distinguish different subunits of a hetero-oligomeric protein, and subscripts indicate the number of each subunit type. For example, α₄ designates a homotetrameric protein, and α₂β₂γ a protein with five subunits of three different types.
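The subunit notation described above can be read mechanically: each Greek letter names a subunit type, and the number that follows it (1 if absent) gives the count. The sketch below, with a hypothetical function name, tallies subunits from such a formula and accepts the count written either as ordinary digits or as Unicode subscripts.

```python
import re

# Map Unicode subscript digits onto ordinary digits so both spellings parse.
_SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")


def subunit_counts(formula: str) -> dict[str, int]:
    """Count subunits of each type in a formula such as 'α2β2γ' or 'α₄'."""
    normalized = formula.translate(_SUBSCRIPTS)
    counts: dict[str, int] = {}
    # A lowercase Greek letter optionally followed by a count.
    for letter, number in re.findall(r"([α-ω])(\d*)", normalized):
        counts[letter] = counts.get(letter, 0) + (int(number) if number else 1)
    return counts


for formula in ("α4", "α2β2γ"):
    counts = subunit_counts(formula)
    print(formula, counts, "total subunits:", sum(counts.values()))
```

The total for the second formula is five subunits of three types, matching the example in the text.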

Since even small proteins contain many thousands of atoms, depictions of protein structure that indicate the position of every atom are generally too complex to be readily interpreted. Simplified schematic diagrams thus are used to depict the key features of a protein’s tertiary and quaternary structure. Ribbon diagrams (Figures 5–6 and 5–8) trace the conformation of the polypeptide backbone, with cylinders and arrows indicating regions of α helix and β sheet, respectively. In an even simpler representation, line segments that link the α carbons indicate the path of the polypeptide backbone. These schematic diagrams often include the side chains of selected amino acids that emphasize specific structure-function relationships.

MULTIPLE FACTORS STABILIZE THE TERTIARY & QUATERNARY STRUCTURE

Higher orders of protein structure are stabilized primarily—and often exclusively—by noncovalent interactions. Principal among these are hydrophobic interactions that drive most hydrophobic amino acid side chains into the interior of the protein, shielding them from water. Other significant contributors include hydrogen bonds and salt bridges between the carboxylates of aspartic and glutamic acid and the oppositely charged side chains of protonated lysyl, argininyl, and histidyl residues. While individually weak relative to a typical covalent bond of 80–120 kcal/mol, collectively these numerous interactions confer a high degree of stability to the biologically functional conformation of a protein, just as a Velcro fastener harnesses the cumulative strength of a multitude of tiny plastic loops and hooks.

Some proteins contain covalent disulfide (S—S) bonds that link the sulfhydryl groups of cysteinyl residues. Formation of disulfide bonds involves oxidation of the cysteinyl sulfhydryl groups and requires oxygen. Intrapolypeptide disulfide bonds further enhance the stability of the folded conformation of a peptide, while interpolypeptide disulfide bonds stabilize the quaternary structure of certain oligomeric proteins.

THREE-DIMENSIONAL STRUCTURE IS DETERMINED BY X-RAY CRYSTALLOGRAPHY OR BY NMR SPECTROSCOPY

X-Ray Crystallography

Following the solution in 1960 by John Kendrew of the three-dimensional structure of myoglobin, x-ray crystallography revealed the structures of thousands of biological macromolecules ranging from proteins to many oligonucleotides and a few viruses. For the solution of its structure by x-ray crystallography, a protein is first precipitated under conditions that form large, well-ordered crystals. Crystallization trials use a few microliters of protein solution and a matrix of variables (temperature, pH, presence of salts or organic solutes such as polyethylene glycol) to establish optimal conditions for crystal formation. Crystals mounted in quartz capillaries are first irradiated with monochromatic x-rays of approximate wavelength 0.15 nm to confirm that they are protein, not salt. Protein crystals may then be frozen in liquid nitrogen for subsequent collection of a high-resolution data set. The patterns formed by the x-rays that are diffracted by the atoms in their path are recorded on a photographic plate or its computer equivalent as a circular pattern of spots of varying intensity. The data inherent in these spots are then analyzed using a mathematical approach termed a Fourier synthesis, which summates wave functions. The wave amplitudes are related to spot intensity, but since the waves are not in phase, the relationship between their phases must next be determined.
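Why phases matter can be seen in a toy one-dimensional example. The sketch below treats a short list of numbers as an "electron density," computes its Fourier coefficients, and then rebuilds the density twice: once from the full coefficients (amplitude and phase) and once from the amplitudes alone, which is all that spot intensities supply. The density values and function names are invented for illustration, and this is not how crystallographic software is implemented.

```python
import cmath


def dft(values: list[float]) -> list[complex]:
    """Discrete Fourier transform: one complex coefficient per frequency."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * h * x / n) for x, v in enumerate(values))
        for h in range(n)
    ]


def inverse_dft(coefficients: list[complex]) -> list[float]:
    """Fourier synthesis: sum the waves back into a real-space profile."""
    n = len(coefficients)
    return [
        sum(c * cmath.exp(2j * cmath.pi * h * x / n) for h, c in enumerate(coefficients)).real / n
        for x in range(n)
    ]


# Toy density: a heavy "atom" at position 2 and a lighter one at position 6.
density = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 2.0, 0.0]

coefficients = dft(density)
amplitudes = [abs(c) for c in coefficients]   # measurable; phases are lost

with_phases = inverse_dft(coefficients)
without_phases = inverse_dft([complex(a, 0.0) for a in amplitudes])

print("original:      ", [round(v, 2) for v in density])
print("with phases:   ", [round(v, 2) for v in with_phases])
print("amplitude only:", [round(v, 2) for v in without_phases])
```

The synthesis with phases recovers the original profile, whereas the amplitude-only synthesis places density where there was none.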

The traditional approach to solution of the “phase problem” employs isomorphous replacement. Prior to irradiation, an atom with a distinctive x-ray “signature” is introduced into a crystal at known positions in the primary structure of the protein. Heavy atom isomorphous replacement generally uses mercury or uranium, which bind to cysteine residues. An alternative approach uses the expression of plasmid-encoded recombinant proteins in which selenium replaces the sulfur of methionine. Expression uses a bacterial host auxotrophic for methionine biosynthesis and a defined medium in which selenomethionine replaces methionine. Alternatively, if the unknown structure is similar to one that has already been solved, molecular replacement on an existing model provides an attractive way to phase the data without the use of heavy atoms. Finally, the results from the phasing and Fourier summations provide an electron density profile or three-dimensional map of how the atoms are connected or related to one another.

Laue X-Ray Crystallography

The ability of some crystallized enzymes to catalyze chemical reactions strongly suggests that structures determined by crystallography are indeed representative of the structures present in free solution. Classic crystallography provides, however, an essentially static picture of a protein that may undergo significant structural changes in vivo, such as those that accompany enzymic catalysis. The Laue approach uses diffraction of polychromatic x-rays and many crystals. The time-consuming process of rotating the crystal in the x-ray beam is avoided, which permits the use of extremely short exposure times. Detection of the motions of residues or domains of an enzyme during catalysis uses crystals that contain an inactive or “caged” substrate analog. An intense flash of visible light cleaves the caged precursor to release free substrate and initiate catalysis in a precisely controlled manner. Using this approach, data can be collected over time periods as short as a few nanoseconds.

Nuclear Magnetic Resonance Spectroscopy

Nuclear magnetic resonance (NMR) spectroscopy, a powerful complement to x-ray crystallography, measures the absorbance of radio frequency electromagnetic energy by certain atomic nuclei. “NMR-active” isotopes of biologically relevant elements include ¹H, ¹³C, ¹⁵N, and ³¹P. The frequency, or chemical shift, at which a particular nucleus absorbs energy is a function of both the functional group within which it resides and the proximity of other NMR-active nuclei. Once limited to metabolites and relatively small macromolecules, ≤30 kDa, today proteins and protein complexes of >100 kDa can be analyzed by NMR. Two-dimensional NMR spectroscopy permits a three-dimensional representation of a protein to be constructed by determining the proximity of these nuclei to one another. NMR spectroscopy analyzes proteins in aqueous solution. Not only does this obviate the need to form crystals (a particular advantage when dealing with difficult-to-crystallize membrane proteins), it renders possible real-time observation of the changes in conformation that accompany ligand binding or catalysis. It also offers the possibility of perhaps one day being able to observe the structure and dynamics of proteins (and metabolites) within living cells.

Molecular Modeling

A valuable adjunct to the empirical determination of the three-dimensional structure of proteins is the use of computer technology for molecular modeling. When the three-dimensional structure is known, molecular dynamics programs can be used to simulate the conformational dynamics of a protein and the manner in which factors such as temperature, pH, ionic strength, or amino acid substitutions influence these motions. Molecular docking programs simulate the interactions that take place when a protein encounters a substrate, inhibitor, or other ligand. Virtual screening for molecules likely to interact with key sites on a protein of biomedical interest is extensively used to facilitate the discovery of new drugs.

Molecular modeling is also employed to infer the structure of proteins for which x-ray crystallographic or NMR structures are not yet available. Secondary structure algorithms weigh the propensity of specific residues to become incorporated into α helices or β sheets in previously studied proteins to predict the secondary structure of other polypeptides. In homology modeling, the known three-dimensional structure of a protein is used as a template upon which to erect a model of the probable structure of a related protein. Scientists are working to devise computer programs that will reliably predict the three-dimensional conformation of a protein directly from its primary sequence, thereby permitting determination of the structures of the many unknown proteins for which templates currently are lacking.

PROTEIN FOLDING

Proteins are conformationally dynamic molecules that can fold into their functionally competent conformation in a time frame of milliseconds, and often can refold if their conformation becomes disrupted, or denatured. How is this remarkable process achieved? Folding into the native state does not involve a haphazard search of all possible structures. Denatured proteins are not just random coils. Native contacts are favored, and regions of the native structure persist even in the denatured state. Discussed below are factors that facilitate folding and refolding, and current concepts and proposed mechanisms based on more than 40 years of largely in vitro experimentation.

Native Conformation of a Protein is Thermodynamically Favored

The number of distinct combinations of phi and psi angles specifying potential conformations of even a relatively small—15 kDa—polypeptide is unbelievably vast. Proteins are guided through this vast labyrinth of possibilities by thermodynamics. Since the biologically relevant—or native—conformation of a protein generally is the one that is most energetically favored, knowledge of the native conformation is specified in the primary sequence. However, if one were to wait for a polypeptide to find its native conformation by random exploration of all possible conformations, the process would require billions of years to complete. Clearly, in nature, protein folding takes place in a more orderly and guided fashion.
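The scale of a purely random search can be estimated with simple arithmetic. The sketch below assumes a mean residue mass of 110 Da (so a 15 kDa polypeptide has roughly 136 residues), three accessible conformations per residue, and a sampling rate of 10¹³ conformations per second. All three values are illustrative assumptions rather than figures from this chapter, and the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
MEAN_RESIDUE_MASS_DA = 110.0          # assumed average mass of one residue


def residues_from_mass(mass_da: float) -> int:
    """Approximate number of residues in a polypeptide of the given mass."""
    return round(mass_da / MEAN_RESIDUE_MASS_DA)


def random_search_years(
    n_residues: int,
    conformations_per_residue: int = 3,   # assumed
    samples_per_second: float = 1e13,     # assumed
) -> float:
    """Years needed to visit every conformation once by random sampling."""
    total_conformations = conformations_per_residue ** n_residues
    return total_conformations / samples_per_second / SECONDS_PER_YEAR


n = residues_from_mass(15_000)
print(f"residues: {n}")
print(f"exhaustive search time: {random_search_years(n):.1e} years")
```

Even with generous assumptions the estimate exceeds billions of years by many orders of magnitude, which is the point of the argument: folding cannot proceed by exhaustive search.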

Folding Is Modular

Protein folding generally occurs via a stepwise process. In the first stage, as the newly synthesized polypeptide emerges from the ribosome, short segments fold into secondary structural units that provide local regions of organized structure. Folding is now reduced to the selection of an appropriate arrangement of this relatively small number of secondary structural elements. In the second stage, the hydrophobic regions segregate into the interior of the protein away from solvent, forming a “molten globule,” a partially folded polypeptide in which the modules of secondary structure rearrange until the mature conformation of the protein is attained. This process is orderly, but not rigid. Considerable flexibility exists in the ways and in the order in which elements of secondary structure can be rearranged. In general, each element of secondary or supersecondary structure facilitates proper folding by directing the folding process toward the native conformation and away from unproductive alternatives. For oligomeric proteins, individual protomers tend to fold before they associate with other subunits.

Auxiliary Proteins Assist Folding

Under appropriate laboratory conditions, many proteins will spontaneously refold after being denatured (ie, unfolded) by treatment with acid or base, chaotropic agents, or detergents. However, refolding under these conditions is slow—minutes to hours. Moreover, many proteins fail to spontaneously refold in vitro. Instead they form insoluble aggregates, disordered complexes of unfolded or partially folded polypeptides held together predominantly by hydrophobic interactions. Aggregates represent unproductive dead ends in the folding process. Cells employ auxiliary proteins to speed the process of folding and to guide it toward a productive conclusion.

Chaperones

Chaperone proteins participate in the folding of over half of all mammalian proteins. The hsp70 (70 kDa heat shock protein) family of chaperones binds short sequences of hydrophobic amino acids that emerge while a new polypeptide is being synthesized, shielding them from solvent. Chaperones prevent aggregation, thus providing an opportunity for the formation of appropriate secondary structural elements and their subsequent coalescence into a molten globule. The hsp60 family of chaperones, sometimes called chaperonins, differs in sequence and structure from hsp70 and its homologs. Hsp60 acts later in the folding process, often together with an hsp70 chaperone. The central cavity of the donut-shaped hsp60 chaperone provides a sheltered environment in which a polypeptide can fold until all hydrophobic regions are buried in its interior, thus preempting any tendency toward aggregation.

Protein Disulfide Isomerase

Disulfide bonds between and within polypeptides stabilize tertiary and quaternary structures. However, disulfide bond formation is nonspecific. Under oxidizing conditions, a given cysteine can form a disulfide bond with the -SH group of any accessible cysteinyl residue. By catalyzing disulfide exchange, the rupture of an S-S bond and its reformation with a different partner cysteine, protein disulfide isomerase facilitates the formation of disulfide bonds that stabilize a protein’s native conformation.
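
The consequence of nonspecific pairing can be illustrated by counting how many distinct sets of disulfide bonds a given number of cysteines could form. The sketch below assumes, for simplicity, that every cysteine is equally accessible to every other, which ignores the steric constraints of a real protein; the function name is hypothetical.

```python
from math import factorial


def disulfide_pairings(n_cys: int, n_bonds: int) -> int:
    """Number of distinct ways n_cys cysteines can form n_bonds disulfide bonds.

    Assumes any cysteine can pair with any other (an idealization).
    """
    if n_bonds < 0 or 2 * n_bonds > n_cys:
        raise ValueError("each disulfide bond needs two cysteines")
    return factorial(n_cys) // (
        factorial(n_cys - 2 * n_bonds) * factorial(n_bonds) * 2 ** n_bonds
    )


# Fully paired proteins with 2, 4, 6, and 8 cysteines
for n_cys in (2, 4, 6, 8):
    print(n_cys, disulfide_pairings(n_cys, n_cys // 2))
```

For a protein with eight cysteines and four disulfide bonds, only one of the 105 possible pairings corresponds to the native arrangement, so random oxidation alone would mostly yield mispaired products.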

Proline-cis, trans-Isomerase

All X-Pro peptide bonds—where X represents any residue—are synthesized in the trans configuration. However, of the X-Pro bonds of mature proteins, approximately 6% are cis. The cis configuration is particularly common in β turns. Isomerization from trans to cis is catalyzed by the enzyme proline-cis, trans-isomerase (Figure 5–10).

FIGURE 5–10 Isomerization of the N1 prolyl peptide bond from a cis to a trans configuration relative to the backbone of the polypeptide.

Folding Is a Dynamic Process

Proteins are conformationally dynamic molecules that can fold and unfold hundreds or thousands of times in their lifetime. How do proteins, once unfolded, refold and restore their functional conformation? First, unfolding rarely leads to the complete randomization of the polypeptide chain inside the cell. Unfolded proteins generally retain a number of contacts and regions of the secondary structure that facilitate the refolding process. Second, chaperone proteins can “rescue” unfolded proteins that have become thermodynamically trapped in a misfolded dead end by unfolding hydrophobic regions and providing a second chance to fold productively. Glutathione can reduce inappropriate disulfide bonds that may be formed upon exposure to oxidizing agents such as O2, hydrogen peroxide, or superoxide (Chapter 52).

PERTURBATION OF PROTEIN CONFORMATION MAY HAVE PATHOLOGIC CONSEQUENCES

Prions

The transmissible spongiform encephalopathies, or prion diseases, are fatal neurodegenerative diseases characterized by spongiform changes, astrocytic gliosis, and neuronal loss resulting from the deposition of insoluble protein aggregates in neural cells. They include Creutzfeldt–Jakob disease in humans, scrapie in sheep, and bovine spongiform encephalopathy (mad cow disease) in cattle. A variant form of Creutzfeldt–Jakob disease (vCJD) that afflicts younger patients is associated with early-onset psychiatric and behavioral disorders. Prion diseases may manifest themselves as infectious, genetic, or sporadic disorders. Because no viral or bacterial gene encoding the pathologic prion protein could be identified, the source and mechanism of transmission of prion disease long remained elusive.

Today it is recognized that prion diseases are protein conformation diseases transmitted by altering the conformation, and hence the physical properties, of proteins endogenous to the host. Human prion-related protein (PrP), a glycoprotein encoded on the short arm of chromosome 20, normally is monomeric and rich in α helix. Pathologic prion proteins serve as the templates for the conformational transformation of normal PrP, known as PrPc, into PrPSc. PrPSc is rich in β sheet with many hydrophobic aminoacyl side chains exposed to solvent. As each new PrPSc molecule is formed, it triggers the production of yet more pathologic variants in a conformational chain reaction. Because PrPSc molecules associate strongly with one another through their exposed hydrophobic regions, the accumulating PrPSc units coalesce to form insoluble protease-resistant aggregates. Since one pathologic prion or prion-related protein can serve as template for the conformational transformation of many times its number of PrPc molecules, prion diseases can be transmitted by the protein alone without involvement of DNA or RNA.
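
The self-propagating character of this templated conversion can be pictured with a toy simulation. The model below is a deliberately simplified sketch, not a description of measured prion kinetics: it assumes a fixed pool of PrP, ignores synthesis, clearance, and aggregation, and uses an arbitrary conversion rate. All names and parameter values are hypothetical.

```python
def simulate_conversion(prp_c: float, prp_sc: float, rate: float, steps: int) -> list:
    """Toy autocatalytic model of templated conversion.

    At each step the amount converted is proportional to the pathologic pool
    (the template) and to the fraction of protein still in the normal form.
    """
    total = prp_c + prp_sc
    history = [prp_sc]
    for _ in range(steps):
        converted = min(prp_c, rate * prp_sc * prp_c / total)
        prp_c -= converted
        prp_sc += converted
        history.append(prp_sc)
    return history


# One pathologic molecule introduced into a pool of a million normal ones
history = simulate_conversion(prp_c=1_000_000, prp_sc=1, rate=0.5, steps=60)
for step in (0, 10, 20, 30, 40, 60):
    print(step, round(history[step]))
```

In this sketch a single template molecule eventually converts essentially the entire pool, with slow early growth followed by rapid takeover, which mirrors the qualitative idea of a conformational chain reaction.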

Alzheimer’s Disease

Refolding or misfolding of another protein endogenous to human brain tissue, β-amyloid, is a prominent feature of Alzheimer’s disease. While the main cause of Alzheimer’s disease remains elusive, the characteristic senile plaques and neurofibrillary bundles contain aggregates of the protein β-amyloid, a 4.3 kDa polypeptide produced by proteolytic cleavage of a larger protein known as amyloid precursor protein. In Alzheimer’s disease patients, levels of β-amyloid become elevated, and this protein undergoes a conformational transformation from a soluble α helix-rich state to a state rich in β sheet and prone to self-aggregation. Apolipoprotein E has been implicated as a potential mediator of this conformational transformation.

Beta-Thalassemias

Thalassemias are caused by genetic defects that impair the synthesis of one of the polypeptide subunits of hemoglobin (Chapter 6). During the burst of hemoglobin synthesis that occurs during erythrocyte development, a specific chaperone called α-hemoglobin-stabilizing protein (AHSP) binds to free hemoglobin α-subunits awaiting incorporation into the hemoglobin multimer. In the absence of this chaperone, free α-hemoglobin subunits aggregate, and the resulting precipitate has cytotoxic effects on the developing erythrocyte. Investigations using genetically modified mice suggest a role for AHSP in modulating the severity of β-thalassemia in human subjects.

COLLAGEN ILLUSTRATES THE ROLE OF POSTTRANSLATIONAL PROCESSING IN PROTEIN MATURATION

Protein Maturation Often Involves Making & Breaking of Covalent Bonds

The maturation of proteins into their final structural state often involves the cleavage or formation (or both) of covalent bonds, a process of post-translational modification. Many polypeptides are initially synthesized as larger precursors called proproteins. The “extra” polypeptide segments in these proproteins often serve as leader sequences that target a polypeptide to a particular organelle or facilitate its passage through a membrane. Other segments ensure that the potentially harmful activity of a protein such as the proteases trypsin and chymotrypsin remains inhibited until these proteins reach their final destination. However, once these transient requirements are fulfilled, the now superfluous peptide regions are removed by selective proteolysis. Other covalent modifications may take place that add new chemical functionalities to a protein. The maturation of collagen illustrates both of these processes.

Collagen Is a Fibrous Protein

Collagen is the most abundant of the fibrous proteins that constitute more than 25% of the protein mass in the human body. Other prominent fibrous proteins include keratin and myosin. These fibrous proteins represent a primary source of structural strength for cells (ie, the cytoskeleton) and tissues. Skin derives its strength and flexibility from an intertwined mesh of collagen and keratin fibers, while bones and teeth are buttressed by an underlying network of collagen fibers analogous to steel strands in reinforced concrete. Collagen also is present in connective tissues such as ligaments and tendons. The high degree of tensile strength required to fulfill these structural roles requires elongated proteins characterized by repetitive amino acid sequences and a regular secondary structure.

Collagen Forms a Unique Triple Helix

Tropocollagen, the repeating unit of a mature collagen fiber, consists of three collagen polypeptides, each containing about 1000 amino acids, bundled together in a unique conformation, the collagen triple helix (Figure 5–11). A mature collagen fiber forms an elongated rod with an axial ratio of about 200. Three intertwined polypeptide strands, which twist to the left, wrap around one another in a right-handed fashion to form the collagen triple helix. The opposing handedness of this superhelix and its component polypeptides makes the collagen triple helix highly resistant to unwinding, a principle also applied to the steel cables of suspension bridges. A collagen triple helix has 3.3 residues per turn and a rise per residue nearly twice that of an α helix. The R groups of each polypeptide strand of the triple helix pack so closely that, in order to fit, the R group of every third residue must be H. Thus, every third amino acid residue in collagen is a glycine residue. Staggering of the three strands provides appropriate positioning of the requisite glycines throughout the helix. Collagen is also rich in proline and hydroxyproline, yielding a repetitive Gly-X-Y pattern (Figure 5–11) in which Y generally is proline or hydroxyproline.
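
The Gly-X-Y repeat is regular enough to be checked mechanically. The sketch below tests whether a one-letter sequence has glycine at every third position and reports its proline plus hydroxyproline content. The function names are hypothetical, the example peptides are invented for illustration rather than taken from a real collagen, and the use of "O" for hydroxyproline is an assumed shorthand.

```python
def glycine_register(sequence: str):
    """Return the offset (0, 1, or 2) at which every third residue is Gly, else None."""
    seq = sequence.upper()
    for offset in range(3):
        positions = range(offset, len(seq), 3)
        if len(positions) > 0 and all(seq[i] == "G" for i in positions):
            return offset
    return None


def proline_fraction(sequence: str) -> float:
    """Fraction of residues that are proline (P) or hydroxyproline (written O here)."""
    seq = sequence.upper()
    return sum(seq.count(code) for code in "PO") / len(seq)


collagen_like = "GPOGPAGPOGEOGPO"   # hypothetical peptide with an intact repeat
interrupted = "GPOGPAAPOGEO"        # hypothetical peptide with one Gly replaced by Ala

print(glycine_register(collagen_like))
print(glycine_register(interrupted))
print(round(proline_fraction(collagen_like), 2))
```

A sequence in which even one glycine of the repeat is replaced returns no valid register, reflecting the requirement that every third residue be glycine for the three strands to pack.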

FIGURE 5–11 Primary, secondary, and tertiary structures of collagen.

Collagen triple helices are stabilized by hydrogen bonds between residues in different polypeptide chains, a process helped by the hydroxyl groups of hydroxyprolyl residues. Additional stability is provided by covalent cross links formed between modified lysyl residues both within and between polypeptide chains.

Collagen Is Synthesized as a Larger Precursor

Collagen is initially synthesized as a larger precursor polypeptide, procollagen. Numerous prolyl and lysyl residues of procollagen are hydroxylated by prolyl hydroxylase and lysyl hydroxylase, enzymes that require ascorbic acid (vitamin C; see Chapters 27 and 44). Hydroxyprolyl and hydroxylysyl residues provide additional hydrogen bonding capability that stabilizes the mature protein. In addition, glucosyl and galactosyl transferases attach glucosyl or galactosyl residues to the hydroxyl groups of specific hydroxylysyl residues.

The central portion of the precursor polypeptide then associates with other molecules to form the characteristic triple helix. This process is accompanied by the removal of the globular amino terminal and carboxyl terminal extensions of the precursor polypeptide by selective proteolysis. Certain lysyl residues are modified by lysyl oxidase, a copper-containing protein that converts ε-amino groups to aldehydes. The aldehydes can either undergo an aldol condensation to form a C=C double bond or form a Schiff base (eneimine) with the ε-amino group of an unmodified lysyl residue, which is subsequently reduced to form a C-N single bond. These covalent bonds cross-link the individual polypeptides and imbue the fiber with exceptional strength and rigidity.

Nutritional & Genetic Disorders Can Impair Collagen Maturation

The complex series of events in collagen maturation provides a model that illustrates the biologic consequences of incomplete polypeptide maturation. The best-known defect in collagen biosynthesis is scurvy, a result of a dietary deficiency of vitamin C required by prolyl and lysyl hydroxylases. The resulting deficit in the number of hydroxyproline and hydroxylysine residues undermines the conformational stability of collagen fibers, leading to bleeding gums, swelling joints, poor wound healing, and ultimately death. Menkes’ syndrome, characterized by kinky hair and growth retardation, reflects a deficiency of the copper required by lysyl oxidase, which catalyzes a key step in the formation of the covalent cross-links that strengthen collagen fibers. In Menkes’ syndrome the copper deficiency arises from an inherited defect in copper transport rather than from inadequate dietary intake.

Genetic disorders of collagen biosynthesis include several forms of osteogenesis imperfecta, characterized by fragile bones. In the Ehlers–Danlos syndrome, a group of connective tissue disorders that involve impaired integrity of supporting structures, defects in the genes that encode α collagen-1, procollagen N-peptidase, or lysyl hydroxylase result in mobile joints and skin abnormalities (see also Chapter 48).

SUMMARY

Proteins may be classified based on their solubility, shape, or function or on the presence of a prosthetic group, such as heme.

The gene-encoded primary structure of a polypeptide is the sequence of its amino acids. Its secondary structure results from folding of polypeptides into hydrogen-bonded motifs such as the α helix, the β pleated sheet, β bends, and loops. Combinations of these motifs can form supersecondary motifs.

Tertiary structure concerns the relationships between secondary structural domains. Quaternary structure of proteins with two or more polypeptides (oligomeric proteins) concerns the spatial relationships between various types of polypeptides.

Primary structures are stabilized by covalent peptide bonds. Higher orders of structure are stabilized by weak forces: multiple hydrogen bonds, salt (electrostatic) bonds, and association of hydrophobic R groups.

The phi (Φ) angle of a polypeptide is the angle about the Cα-N bond; the psi (Ψ) angle is that about the Cα-Co bond. Most combinations of phi-psi angles are disallowed due to steric hindrance. The phi-psi angles that form the α helix and the β sheet fall within the lower and upper left-hand quadrants of a Ramachandran plot, respectively.

Protein folding is a poorly understood process. Broadly speaking, short segments of newly synthesized polypeptide fold into secondary structural units. Forces that bury hydrophobic regions from solvent then drive the partially folded polypeptide into a “molten globule” in which the modules of the secondary structure are rearranged to give the native conformation of the protein.

Proteins that assist folding include protein disulfide isomerase, proline-cis, trans-isomerase, and the chaperones that participate in the folding of over half of mammalian proteins. Chaperones shield newly synthesized polypeptides from solvent and provide an environment for elements of secondary structure to emerge and coalesce into molten globules.

X-ray crystallography and NMR are key techniques used to study higher orders of protein structure.

Prions (protein particles that lack nucleic acid) cause fatal transmissible spongiform encephalopathies such as Creutzfeldt–Jakob disease, scrapie, and bovine spongiform encephalopathy. Prion diseases involve an altered secondary-tertiary structure of a naturally occurring protein, PrPc. When PrPc interacts with its pathologic isoform PrPSc, its conformation is transformed from a predominantly α-helical structure to the β-sheet structure characteristic of PrPSc.

Collagen illustrates the close linkage between protein structure and biologic function. Diseases of collagen maturation include Ehlers–Danlos syndrome and the vitamin C deficiency disease scurvy.
