CHEMICAL BIOLOGY
Chemical Properties of Amino Acids
Donard S. Dwyer, Louisiana State University Health Sciences Center, Shreveport
doi: 10.1002/9780470048672.wecb007
A complete understanding of protein folding, protein recognition, enzyme catalysis, and allosteric regulation will require intimate knowledge about the fundamental physico-chemical properties of amino acids. This article will summarize the chemistry of the 20 naturally occurring amino acids with special emphasis on the relationship to protein folding and biologic activity. The major features of the amino acids are classified into three categories, which are as follows: 1) physico-chemical properties, 2) biologic properties, and 3) electronic properties. Cross-correlations among more than 34 different parameters revealed that these measures segregate into four main properties that are largely independent, as follows: 1) steric effects (polarizability), 2) hydrophilicity (Hp index), 3) inductive effects, and 4) field effects. Advances in quantum chemistry, nuclear magnetic resonance (NMR) analysis, and theoretic physics encourage efforts to derive all-electronic expressions for the fundamental properties of amino acids that will provide mechanistic insights into protein structure and function.
Proteins are complex polymers composed mainly of the 20 naturally occurring amino acids arranged in a series of peptide linkages. The precise sequence of amino acids determines the folding of a polypeptide chain and its ultimate 3-dimensional (3-D) structure. In addition, the amino acid sequence and the geometry of amino acid side chains specify protein binding sites and functional activities. Based on a comparison of heavy atoms, the 20 amino acids are on average greater than 50% identical in terms of their composition. Yet, it is the differences (sometimes very subtle) among individual amino acids that cause the variety of proteins in nature: from heat-stable DNA polymerases of thermal vent bacteria to antifreeze proteins of arctic fish, and from highly conserved structural proteins of microtubules to highly diverse signaling molecules of the G-protein-coupled receptor family. These differences are the focus of this review. The article is divided into three main sections that highlight the overall biologic relevance, the specific chemical properties of amino acid side chains, and methods for additional characterization of these properties.
Biologic Background
What is the biologic significance of diversity in amino acid side chains? The amino acid sequence is the blueprint for protein structure. Consequently, the complexity of protein structures is a function of the variety and the length of the sequences of polypeptide chains. In fact, multiple amino acid sequence alignments have greatly improved the accuracy of secondary structure prediction and homology modeling. Nevertheless, it is not known exactly how the properties of a single amino acid or a short stretch of amino acids determine the probability of that residue or sequence assuming a particular secondary structure. This issue is further complicated by the fact that identical sequences of five or more amino acids assume different secondary structures in proteins depending on the context (1, 2). A deeper appreciation of the chemical properties of amino acid side chains may improve modeling efforts and the prediction of secondary structure.
Whereas the 3-D structure of a protein is specified by its amino acid sequence, the active sites of receptors and enzymes are determined largely by the topological arrangement of noncontiguous amino acid side chains. Typically, ligand binding and enzyme catalysis require a precise geometry of functional groups of side chains to achieve specificity and catalytic activity. Moreover, amino acid side chains with distinct chemical properties are well suited for specialized tasks. For example, the presence of a cysteine residue in the active site defines a family of proteases (the caspases) that cleave at aspartic acid motifs in substrate proteins involved in apoptosis.
Introduction of mutations into proteins to study the effects on protein structure and function is now a routine application of molecular biology. In many cases, the goal is to evaluate the contribution of a single residue or small segment of a protein to overall activity through site-directed mutagenesis. One issue with this approach concerns the nature of conservative amino acid substitutions. For instance, serine is often considered a conservative replacement for threonine (3), yet the propensity of these two residues for secondary structure is distinct, with serine preferring coil or loop structures and threonine preferring β-strand conformations. A second goal of mutagenesis studies is to mimic posttranslational modifications, such as phosphorylation of serine, by substitution with aspartic acid to create a constitutively active form of a signaling molecule. Therefore, precise information about the physico-chemical properties of the side chains may be essential to decide which amino acid to use as a substitute in site-directed mutagenesis.
More detailed analysis of the chemistry of amino acid side chains is required to understand protein recognition and the binding of small molecules, including drugs. The 3-D structures of proteins in the apo form and with ligand bound are used increasingly as the starting point for structure-based drug discovery (4). Whereas early studies viewed ligand-binding sites of proteins as rigid structures (a "lock and key" arrangement), it is now clear that these sites are flexible and that examples of induced fit abound. This realization has complicated efforts to develop automated docking software to analyze the fit and the orientation of small molecules in a defined protein-binding site. More advanced docking methods now incorporate side chain flexibility into the computational program (4). In addition, local electric fields may play a significant role in the selectivity of protein binding sites and the stabilization of ligand binding (5). Additional characterization of salient side chain properties will enhance docking analysis and the investigation of electric fields at protein active sites. As a starting point, the physico-chemical properties of the 20 naturally occurring amino acids are summarized in the next section.
Chemistry
Before discussing the chemistry of amino acid side chains, it is worthwhile to consider briefly the unique structure of amino acids and its functional implications. During evolution, amino acids likely were among the first chemical compounds to emerge on primitive earth (6, 7). Early experiments on the origins of life sought to recreate primordial atmospheric conditions with hydrogen, methane, ammonia, and water, and then to introduce a source of energy (e.g., electric discharges to mimic lightning strikes, or ultraviolet light) to catalyze chemical reactions. Over time, these reactions yielded amino acids and other simple organic molecules that served as building blocks for the eventual synthesis of proteins, polynucleotides, and complex carbohydrates (6). Amino acids seem to have been formed from an initial reaction between aldehyde and ammonia, followed by additional conversion in the Strecker synthesis (7). From a chemical standpoint, it is interesting that the first two building blocks of amino acids have opposite properties with respect to electron affinity. The amino group derived from ammonia releases electrons, whereas the aldehyde group tends to withdraw electrons when acting as a substituent group. This point is important because the bipolar construction of amino acids confers two of their most significant features. First, amino acids in aqueous solutions at physiologic pH are zwitterions (i.e., they carry a positive charge at the amino group and a negative charge at the carboxyl group). A single Cα carbon separates these two oppositely charged groups. Second, amino acids combine readily in condensation reactions to form polymers. This feature enabled the modular assembly of proteins from a diverse collection of interchangeable units that differ only in the side chain attached to the Cα atom.
Electron delocalization in peptides
The bipolar nature of amino acids has additional ramifications. Amino acids are about 1000 times stronger acids than comparable aliphatic carboxylic acids because of electron withdrawal by the charged amino group. Thus, significant electronic effects (inductive and field effects) exist among the main chain atoms of an amino acid. Several findings support the existence of electron delocalization along the main chain of proteins. The peptide bond in proteins is planar, with partial double-bond character that clearly reflects short-range electron delocalization. Measurement of the pKas of dipeptides, tripeptides, and tetrapeptides (8) and nuclear magnetic resonance (NMR) studies of inductive effects (9) demonstrate that the delocalization is not restricted to the peptide bond, but extends over a span of 3-4 residues. In addition, electron tunneling in proteins enables charge migration over very long distances and proceeds more efficiently through bonds than through space (10). Charges can migrate across the peptide bond (11), and, in fact, proteins behave as semiconductors under appropriate conditions (12). Finally, the bipolar nature of amino acids generates a distinct dipole in α-helical segments of proteins that is powerful enough to stabilize the binding of cofactors and ligands of opposite charge (13).
The Cα atom is located in a unique position along the main chain because of electron delocalization between the amino and carbonyl groups. As discussed elsewhere, amino acid side chains can be considered as substituents along the peptide backbone that affect resonance and electron density at main chain atoms (14, 15). In turn, this effect will alter bond lengths and rotational flexibility, the ultimate determinants of secondary structure. The chemical features of the side chains modulate the properties of localized segments of a protein in the same way that different substituent groups affect the reactivity and orientation of reactions that involve substituted molecules in classic chemistry. The idea that amino acid side chains affect the electron density along a polymer composed of repetitive units is consistent with observations of the effects of side chain composition on the conductance of semiconducting materials (16). We will return to this important notion of side chains as substituent groups along the peptide backbone during discussion of the electronic properties of amino acids. The next three sections will summarize the physico-chemical, biologic, and electronic properties of amino acid side chains.
Physico-chemical properties
For the purpose of this article, the various properties of amino acid side chains have been classified into three separate categories. The physico-chemical properties are represented by values that can be measured directly for each amino acid or that can be calculated directly from the behavior of component atoms or chemical groups. The biologic properties reflect indirect measures or context-dependent behavior of the amino acids [e.g., their preference for coil or helical conformations in proteins, and their partitioning into different solvents (hydrophobicity scales)]. Finally, the electronic properties refer to a mixture of measured and calculated parameters that attempt to describe fundamental electronic effects of amino acid side chains. The electronic properties of an amino acid ultimately determine its physico-chemical properties; however, these two categories are discussed separately here to highlight the need for better characterization of these electronic effects. Summaries of these various properties are presented in Tables 1-3.
The physico-chemical properties of amino acids are summarized in Table 1. These include a wide array of measures, from refractivity and melting point to the pKa at the amino group. Some of the parameters span a narrow range of values (e.g., the molecular weights and melting points). Other parameters, which include the pKas at the amino group and solubility, differ by a factor of 100 or more. The AAindex database compiled by Kawashima et al. (21) is an excellent source of additional information on the physico-chemical properties of amino acids.
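To make the pKa columns of Table 1 more concrete, the following minimal sketch applies the standard Henderson-Hasselbalch relation to the side-chain pKa values listed there. The pH of 7.4, the grouping of side chains into acids and bases, and the function names are assumptions introduced for illustration; they are not part of the original tabulation, and free amino acid pKas can shift considerably inside a folded protein.

```python
# Side-chain pKa values copied from Table 1 (CRC Handbook column).
SIDE_CHAIN_PKA = {
    "asp": 3.71, "glu": 4.15, "his": 6.04, "cys": 8.14,
    "tyr": 10.10, "lys": 10.67, "arg": 12.10,
}

# Assumption: these side chains are neutral when protonated and negative
# when deprotonated; the others are positive when protonated.
ACIDIC = {"asp", "glu", "cys", "tyr"}


def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: fraction of the group that has lost its proton."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def mean_side_chain_charge(residue, ph=7.4):
    """Average side-chain charge at the given pH (7.4 is an assumed default)."""
    f = fraction_deprotonated(SIDE_CHAIN_PKA[residue], ph)
    return -f if residue in ACIDIC else 1.0 - f


if __name__ == "__main__":
    for name in sorted(SIDE_CHAIN_PKA, key=SIDE_CHAIN_PKA.get):
        print(f"{name}: {mean_side_chain_charge(name):+.3f}")
```

Under these assumptions, aspartic acid and glutamic acid are almost fully negative, lysine and arginine almost fully positive, and histidine only slightly charged, which is why histidine is so sensitive to small changes in local pH.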
Notable cross-correlations exist between physico-chemical properties and biologic and electronic parameters (Table 4). Molecular weight, refractivity, and free energy of solution are correlated highly with various measures of the steric effects of amino acids (r = 0.66-0.98), which include the bulk scale of Kidera et al. (22), the gyration scale of Levitt (23), and polarizability (15). Previous work has established the relationship between polarizability and steric effects with polarizability serving as a measure of molecular deformability (15). It is known that molar refraction is proportional to polarizability and that both are additive properties, which explains the correlation with molecular weight. However, these observations also illustrate the fact that many of the properties considered here are not pure, but rather they are interrelated or perhaps different facets of a common underlying property. It is tempting to consider polarizability as the fundamental descriptor of steric effects and other measures as surrogates for this parameter. However, polarizability neglects significant contributions of hyperconjugation to the phenomenon of steric hindrance (24).
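The proportionality between molar refraction and polarizability mentioned above is usually written as the Lorentz-Lorenz relation, R = (4π/3)·N_A·α. The sketch below is an illustration only: the relation is the textbook form, not a formula given in this article, and the function name and unit choices (polarizability in Å³, refraction in cm³/mol) are assumptions. The refractivity and polarizability columns in Tables 1 and 2 come from different sources and methods, so they should not be expected to obey this conversion exactly.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1 (exact SI value)


def molar_refraction_cm3(polarizability_A3):
    """Lorentz-Lorenz molar refraction (cm^3/mol) from polarizability (A^3)."""
    alpha_cm3 = polarizability_A3 * 1e-24  # 1 cubic angstrom = 1e-24 cm^3
    return (4.0 * math.pi / 3.0) * AVOGADRO * alpha_cm3


if __name__ == "__main__":
    # Both quantities are additive: the refraction of a sum of group
    # polarizabilities equals the sum of the group refractions.
    parts = [2.0, 3.0]
    print(molar_refraction_cm3(sum(parts)))
    print(sum(molar_refraction_cm3(p) for p in parts))
```

Because the relation is linear, additivity of polarizability carries over directly to molar refraction, which is the point the text relies on when explaining the correlation with molecular weight.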
The data in Table 4 reveal additional cross-correlations between heat of formation, free energy of solution, and electronic (field and VHSE5) effects of amino acids (r = 0.61-0.65). The meaning of these relationships is uncertain, although field effects are proportional to the number of nonmethylene groups in the side chain, which would correspond to an increase in both the heat of formation and hydrogen-bond formation with water. Other significant correlations were observed among the following: 1) melting point and dipole moment, average hydrophobicity (P-P), and long-range inter-residue interactions (IILONG), 2) solubility and pKa, and 3) pKa at the amino group and the z3 score of Hellberg et al. (25). The last relationship likely reflects the fact that pKa was one of the multiple components used to derive the z3-score, which is a composite of different electronic effects. An inverse relationship exists between free energy of solution and field effects (Table 4). This relationship may result from the polarity of the amino acid side chains.
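The cross-correlations discussed here are simple linear regressions between pairs of scales. As a hedged illustration, the sketch below computes a Pearson correlation coefficient between the molecular weights of Table 1 and the polarizabilities of Table 2. The helper name `pearson_r` and the choice of this particular pair are assumptions made for the example; the values in Table 4 were obtained by the author and are not reproduced here.

```python
import math

# Residue order shared by both lists (same order as Tables 1 and 2).
RESIDUES = ["ala", "arg", "asn", "asp", "cys", "glu", "gln", "gly", "his", "ile",
            "leu", "lys", "met", "phe", "pro", "ser", "thr", "trp", "tyr", "val"]

# Molecular weights from Table 1.
MOL_WEIGHT = [89.09, 174.20, 132.12, 133.10, 121.16, 147.13, 146.15, 75.07,
              155.16, 131.17, 131.17, 146.19, 149.21, 165.19, 115.13, 105.09,
              119.12, 204.23, 181.19, 117.15]

# Polarizabilities from Table 2.
POLARIZABILITY = [1.1, 8.5, 3.7, 3.0, 2.7, 4.1, 4.8, 0.03, 6.3, 4.3,
                  4.2, 5.2, 5.1, 8.0, 4.3, 1.6, 2.7, 12.1, 8.8, 3.2]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    print(f"r(M.W., polarizability) = {pearson_r(MOL_WEIGHT, POLARIZABILITY):.2f}")
```

The same helper can be applied to any two columns of Tables 1-3, provided residues with missing entries are dropped from both lists before the calculation.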
Table 1. Physico-chemical properties of amino acids
Amino acid | M.W. | Refractivity¹ | tm² (°C) | Heat of formation² (kJ/mol) | Solubility² (g/kg) | ∆G of solution³ | pKa (NH)⁴ | pKa (s.c.)²
ala A | 89.09 | 4.34 | 297 | -604.0 | 165.0 | -350 | 9.69 | —
arg R | 174.20 | 26.6 | 244 | -623.5 | 182.6 | — | 9.04 | 12.10
asn N | 132.12 | 13.28 | 235 | -789.4 | 25.1 | 950 | 8.80 | —
asp D | 133.10 | 12.00 | 270 | -973.3 | 4.9 | 2550 | 9.60 | 3.71
cys C | 121.16 | 35.77 | 240 | -534.1 | v.s. | — | 10.28 | 8.14
glu E | 147.13 | 17.26 | 160 | -1009.7 | 8.6 | 2300 | 9.67 | 4.15
gln Q | 146.15 | 17.56 | 185 | -826.4 | 42.0 | — | 9.13 | —
gly G | 75.07 | 0 | 290 | -528.5 | 250.9 | -650 | 9.60 | —
his H | 155.16 | 21.81 | 287 | -466.7 | 43.5 | — | 9.17 | 6.04
ile I | 131.17 | 19.06 | 284 | -637.8 | 34.2 | 700 | 9.68 | —
leu L | 131.17 | 18.78 | 293 | -637.4 | 22.0 | 950 | 9.60 | —
lys K | 146.19 | 21.29 | 224 | -678.7 | 5.8 | — | 8.95 | 10.67
met M | 149.21 | 21.64 | 281 | -577.5 | 56.0 | 850* | 9.21 | —
phe F | 165.19 | 29.40 | 283 | -466.9 | 27.9 | 1000 | 9.13 | —
pro P | 115.13 | 10.93 | 221 | -515.2 | 1623.0 | -1450 | 10.60 | —
ser S | 105.09 | 6.35 | 228 | -732.7 | 50.2 | 450* | 9.15 | —
thr T | 119.12 | 11.01 | 256 | -807.2 | 98.1 | — | 9.10 | —
trp W | 204.23 | 42.53 | 289 | -415.3 | 13.2 | 1750 | 9.39 | —
tyr Y | 181.19 | 31.53 | 343 | -685.1 | 0.5 | 3550 | 9.11 | 10.10
val V | 117.15 | 13.92 | 315 | -617.9 | 88.5 | 200 | 9.62 | —
1 The refractivity data are from Jones (17).
2 The melting point (tm), heat of formation, solubility, and pKa of the side chain (s.c.) data were obtained from the CRC Handbook of Chemistry and Physics (18).
3 Free energy of solution (∆G) values were obtained from Greenstein and Winitz (19).
4 The pKas at the amino group were published by Edsall (20).
Table 2. Biologic properties of amino acids
Amino acid | Pα¹ | Pβ¹ | Pcoil¹ | C-F Pα² | C-F Pβ² | C-F Pcoil² | Bulk³ | Gyration⁴ | Polariz.⁵ (Å³) | K-D⁶ | RF⁷ | Hp⁵ | M-P⁸ | P-P⁹ | IIMED¹⁰ | IILONG¹⁰
ala A | 1.44 | 0.76 | 0.87 | 1.45 | 0.97 | 0.66 | -1.67 | 0.77 | 1.1 | 1.8 | 9.9 | -3.42 | 12.97 | 0.05 | 2.11 | 3.92
arg R | 1.25 | 0.92 | 1.01 | 0.79 | 0.90 | 1.20 | 1.27 | 2.38 | 8.5 | -4.5 | 4.6 | 0.14 | 11.72 | -1.05 | 1.94 | 3.78
asn N | 1.15 | 0.75 | 1.77 | 0.73 | 0.65 | 1.33 | -0.07 | 1.45 | 3.7 | -3.4 | 5.4 | 1.35 | 11.42 | -0.74 | 1.84 | 3.64
asp D | 1.24 | 0.66 | 1.87 | 0.98 | 0.80 | 1.09 | -0.22 | 1.43 | 3.0 | -3.5 | 2.8 | -0.42 | 10.85 | -1.04 | 1.80 | 2.85
cys C | 0.53 | 1.35 | 0.62 | 0.77 | 1.30 | 1.07 | -0.89 | 1.22 | 2.7 | 2.5 | 2.8 | -1.34 | 14.63 | 0.62 | 1.88 | 5.55
glu E | 1.45 | 0.61 | 0.96 | 1.53 | 0.26 | 0.87 | 0.19 | 1.77 | 4.1 | -3.5 | 3.2 | -2.65 | 11.89 | -0.99 | 2.09 | 2.72
gln Q | 1.14 | 0.88 | 0.87 | 1.17 | 1.23 | 0.79 | 0.24 | 1.75 | 4.8 | -3.5 | 9.0 | -0.97 | 11.76 | -0.73 | 2.03 | 3.06
gly G | 0.54 | 0.66 | 1.68 | 0.53 | 0.81 | 1.42 | -1.96 | 0 | 0.03 | -0.4 | 5.6 | -1.02 | 12.43 | -0.26 | 1.53 | 4.31
his H | 0.83 | 0.67 | 1.08 | 1.24 | 0.71 | 0.92 | 0.52 | 1.78 | 6.3 | -3.2 | 8.2 | -1.84 | 12.16 | -0.21 | 1.98 | 3.77
ile I | 1.30 | 1.85 | 0.70 | 1.00 | 1.60 | 0.78 | -0.16 | 1.56 | 4.3 | 4.5 | 17.1 | -10.33 | 15.67 | 1.14 | 1.77 | 5.58
leu L | 1.18 | 1.04 | 0.55 | 1.34 | 1.22 | 0.66 | 0 | 1.54 | 4.2 | 3.8 | 17.6 | -10.31 | 14.90 | 0.99 | 2.19 | 4.59
lys K | 1.15 | 0.81 | 0.95 | 1.07 | 0.74 | 1.05 | 0.82 | 2.08 | 5.2 | -3.9 | 3.5 | -4.70 | 11.36 | -1.15 | 1.96 | 2.79
met M | 1.13 | 1.18 | 0.40 | 1.20 | 1.67 | 0.61 | 0.18 | 1.80 | 5.1 | 1.9 | 14.9 | -6.90 | 14.39 | 0.70 | 2.27 | 4.14
phe F | 0.92 | 1.54 | 0.76 | 1.12 | 1.28 | 0.81 | 0.98 | 1.90 | 8.0 | 2.8 | 18.8 | -7.60 | 14.00 | 1.19 | 1.98 | 4.53
pro P | 0.45 | 0.65 | 1.28 | 0.59 | 0.62 | 1.45 | -0.33 | 1.25 | 4.3 | -1.6 | 14.8 | -6.22 | 11.37 | -0.17 | 1.32 | 3.57
ser S | 0.78 | 0.74 | 1.21 | 0.79 | 0.72 | 1.27 | -1.08 | 1.08 | 1.6 | -0.8 | 6.9 | -0.32 | 11.23 | -0.43 | 1.57 | 3.75
thr T | 0.76 | 1.29 | 0.88 | 0.82 | 1.20 | 1.05 | -0.70 | 1.24 | 2.7 | -0.7 | 9.5 | -2.63 | 11.69 | -0.30 | 1.57 | 4.09
trp W | 1.05 | 1.25 | 0.82 | 1.14 | 1.19 | 0.82 | 2.10 | 2.21 | 12.1 | -0.9 | 17.1 | -5.95 | 13.93 | 1.13 | 1.90 | 4.89
tyr Y | 1.09 | 1.32 | 0.82 | 0.61 | 1.29 | 1.19 | 1.48 | 2.13 | 8.8 | -1.3 | 15.0 | -4.55 | 13.42 | 0.44 | 1.67 | 4.93
val V | 0.88 | 1.89 | 0.60 | 1.14 | 1.65 | 0.66 | -0.71 | 1.29 | 3.2 | 4.2 | 14.3 | -8.02 | 15.71 | 0.78 | 1.63 | 5.43
1 The preferences for α-helix, β-strand, and coil structures, Pα, Pβ, and Pcoil, respectively, were taken from Dwyer (14).
2 Secondary structural preference data were obtained by Chou and Fasman (C-F) (26).
3 The bulk measure was derived by Kidera et al. (22).
4 Levitt (23) calculated the gyration index.
5 Polarizability and hydrophilicity (Hp) measures were derived from QM calculations by Dwyer (15, 32). Polarizability refers to the α-component calculated with the PM3 QM method.
6 The hydropathy scores from Kyte and Doolittle (K-D index) (30) are presented.
7 The RF data were obtained from Zimmerman et al. (31). These values were based on the average mobility of the amino acids in a series of solvents determined by paper chromatography.
8 The hydrophobicity index of Manavalan and Ponnuswamy (M-P) (33) was calculated from average surrounding hydrophobicity based on 3-D structures of proteins.
9 Palliser and Parry (P-P) calculated a hydrophobicity index based on the average normalized values of 127 scales (35).
10 Gromiha and Selvaraj (34) determined the average number of medium and long-range inter-residue interactions (IIMED and IILONG, respectively) for each of the 20 amino acids based on crystallographic data.
Biologic properties
The three major groupings in the biologic properties correspond to preference for secondary structure (Pα, Pβ, and Pcoil), steric or bulk effects (bulk, gyration, and polarizability), and hydrophilicity (K-D, RF, Hp, M-P, P-P, and IILONG) (see Table 2). Preferences for secondary structure (Px) were derived by Dwyer (14) and Chou and Fasman (26) from statistical analysis of large databases of nonredundant protein structures. These data are in close agreement with similar statistical analyses of structural propensities of amino acids, for example that of Williams et al. (27). A second method to evaluate the preference of amino acids for secondary structure is host-guest analysis of short synthetic peptides (13). In these studies, amino acids are substituted into peptides that assume α-helical or β-strand structures and the effects of the substitution on structural stability are assessed. Several limitations to this approach exist; for instance, an overemphasis on preference at central positions within a segment leads to underestimates of the structural propensity of amino acids that are found commonly at the ends of secondary structures (e.g., asparagine and aspartic acid, which are excellent N-cap residues in α-helices) (28). Nevertheless, the α-helix preferences summarized in Table 2 show significant correlation (r ~ 0.7) with indices derived from host-guest analysis, such as that of O'Neil and DeGrado (29). Finally, secondary structural preferences show significant correlations with electronic properties, including dipole, γLOCAL and γNON-LOCAL, and NMR shift (Table 4).
Some amino acids show a clear preference for a particular secondary structure. For example, glutamic acid and alanine show a very high propensity for α-helices and are found at much lower frequencies in other structures. Similarly, valine, phenylalanine, cysteine, and threonine mainly prefer β-strands, whereas glycine, proline, and serine strongly favor coil or turn conformations. Other amino acids, such as arginine, glutamine, and lysine, do not show an overwhelming preference for a single structure and seem to be stable in many conformations. However, all structural preference scales suffer a common shortcoming, namely that the data are context-dependent. For example, few membrane-resident proteins (e.g., ion channels and G-protein-coupled receptors) are included in the structural databases used for statistical analysis, and host-guest studies largely reflect aqueous phase preferences. Nevertheless, many of the trends of amino acids for particular secondary structures seem to be valid and reflect fundamental properties of the side chains.
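The preferences described above can be read directly from the Pα, Pβ, and Pcoil columns of Table 2. The following sketch, which is an illustration rather than a prediction method from this article, picks the structure with the highest propensity for a residue and averages the propensities over a short segment. Only a subset of residues is included for brevity, and the function names and the simple "largest value wins" rule are assumptions; the rule ignores how close the three values are, so residues without a dominant preference (e.g., lysine) still receive a label.

```python
# (P_alpha, P_beta, P_coil) from Table 2 (Dwyer columns); subset only.
PROPENSITY = {
    "ala": (1.44, 0.76, 0.87),
    "glu": (1.45, 0.61, 0.96),
    "val": (0.88, 1.89, 0.60),
    "thr": (0.76, 1.29, 0.88),
    "gly": (0.54, 0.66, 1.68),
    "pro": (0.45, 0.65, 1.28),
    "ser": (0.78, 0.74, 1.21),
    "lys": (1.15, 0.81, 0.95),
}
LABELS = ("helix", "strand", "coil")


def preferred_structure(residue):
    """Return the label whose propensity is largest for this residue."""
    scores = PROPENSITY[residue]
    return LABELS[scores.index(max(scores))]


def segment_means(residues):
    """Average (helix, strand, coil) propensities over a list of residues."""
    n = len(residues)
    return tuple(sum(PROPENSITY[r][i] for r in residues) / n for i in range(3))


if __name__ == "__main__":
    for name in PROPENSITY:
        print(name, preferred_structure(name))
    print(segment_means(["ala", "glu", "lys"]))
```

Averaging over a segment, as in `segment_means`, is in the spirit of statistical approaches such as that of Chou and Fasman, but the context dependence noted in the text means that such averages are at best a rough guide.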
The fact that amino acids prefer certain secondary structures does not address the question of why these structural preferences are observed. This issue will be discussed in greater detail in a later section. Here, the role of steric effects and hydrophobicity will be considered. The composite bulk scale of Kidera et al. (22) and the gyration scale of Levitt (23) represent faithfully the bulk or the steric effects of amino acids. Therefore, it was interesting to observe such a striking correlation between these scales and polarizability, which is based on the calculation of a single, defined electronic feature of a molecule. Data in Table 4 suggest that the scales for steric effects are reasonably pure, and show no correlation with secondary structure. That is not to say that steric effects are unimportant in protein folding. Rather, it seems safe to conclude that they are not a primary driving force for the formation of secondary structure.
Two of the hydrophilicity scales in Table 2 were derived from experimental measures of the behavior of amino acids in various solvents, namely partitioning coefficients [K-D index of Kyte and Doolittle (30)] or mobility in paper chromatography [RF index of Zimmerman et al. (31)]. By contrast, the Hp index was obtained from quantum mechanics (QM) calculations of electron densities of side chain atoms in comparison with water (32). The Hp index is correlated highly with these two established hydrophobicity scales (Table 4). Therefore, like the polarizability index, it is possible to represent fundamental chemical properties of amino acids (hydrophilicity, Hp) with parameters derived from ab initio calculations of electronic properties. However, in contrast to polarizability (steric effects), hydrophilicity shows significant correlation with preference for secondary structure. Thus, hydrophobic amino acids prefer β-strands (and β-sheet conformations) and typically are buried in protein structures, whereas hydrophilic residues are found commonly in turns (coil structure) at the protein surface.
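A common way to apply a hydropathy scale such as the K-D index is to average it over a sliding window along a sequence. The sketch below does this with the K-D values exactly as listed in Table 2. The window length of 3, the function name, and the peptide "IVLKDE" are hypothetical choices made for illustration; they do not come from this article.

```python
# K-D hydropathy values as listed in Table 2, keyed by one-letter code.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.4, "D": -3.5, "C": 2.5,
    "E": -3.5, "Q": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=3):
    """Mean K-D score for every full window along the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical peptide: a hydrophobic stretch followed by charged residues.
    for start, value in enumerate(hydropathy_profile("IVLKDE", window=3)):
        print(start, round(value, 2))
```

Positive window averages mark hydrophobic stretches that would tend to be buried, and negative averages mark hydrophilic stretches expected at the surface, in line with the trend described in the text.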
Several scales have been developed to quantify the degree to which a residue is buried in the native protein (related to hydrophobicity) and the number of inter-residue interactions it forms (33, 34, 36). Three such indices derived by Manavalan and Ponnuswamy (M-P) (33) and Gromiha and Selvaraj (IIMED and IILONG) (34) are presented in Table 2. In addition, a mean hydropathy index from Palliser and Parry (P-P) (35) is included, which represents the average normalized values from an analysis of 127 individual hydrophobicity scales. The M-P and P-P scales show a high degree of correlation with the Hp index (r ~ 0.8). The M-P, P-P, and IILONG scales predict secondary structural preferences of amino acids effectively, in particular β-strand conformations (Table 4). The IIMED index correlates with preference for α-helices (r = 0.66-0.77).
The NMR chemical shift values of the amide proton in coil conformations (Table 3), an electronic measure, also show a high degree of correlation (r = 0.70-0.89) with hydrophilicity scales and with strand versus coil conformations (Table 4). NMR studies reveal that the amide proton is shielded to a greater extent in coil conformations as compared with extended (β) structures (37); that is, increased electron density exists at this atom in the coil conformation. Taken together, the data suggest strong interactions between hydrophilicity and electronic parameters in folding and provide support for additional refinement of the Hp index.
Electronic properties
The electronic properties of amino acid side chains are summarized in Table 3, and they represent a wide spectrum of measures. The NMR data are derived experimentally (37). The dipole (38), CαMULL, inductive, field, and resonance effects were derived from QM calculations (15). The VHSE5 (39) and z3 (25) scales were developed for use in quantitative structure-activity relationship analysis of the biologic activity of natural and synthetic peptides. Both were derived from principal components analysis of assorted physico-chemical properties, which included NMR chemical shift data, electron-ion interaction potentials, charges, and isoelectric points. Therefore, these scales are composites rather than primary measures of electronic effects. The validity of these measures is indicated by their lack of overlap with hydrophobicity and steric parameters and by their ability to predict biologic activity of synthetic peptide analogs (25, 39). Finally, coefficients of electrostatic screening by amino acid side chains (γLOCAL and γNON-LOCAL) were derived from an empirical data set (40), and they represent a composite of electronic effects.
Table 3. Electronic properties of amino acids
Amino acid | Dipole¹ | γLOCAL² | γNON-LOCAL² | NMR shift³ | VHSE5⁴ | z3⁵ | CαMULL⁶ | Inductive⁶ | Resonance⁶ | Field⁶
ala A | 0 | 0.163 | 0.236 | 8.12 | 0.02 | 0.09 | 4.5978 | 0.05 | 0 | 0.05
arg R | 5.78 | 0.220 | 0.233 | 8.23 | 1.55 | -3.44 | 4.5381 | -0.26 | -0.49 | 0.27
asn N | 4.06 | 0.124 | 0.189 | 8.33 | -0.55 | 0.84 | 4.5431 | -0.14 | -0.06 | -0.56
asp D | 4.33 | 0.212 | 0.168 | 8.38 | -2.68 | 2.36 | 4.3934 | 0.51 | 1.29 | -1.77
cys C | 1.78 | 0.316 | 0.259 | 8.18 | 0 | 4.13 | 4.6375 | -0.01 | 0.01 | 0.06
glu E | 6.13 | 0.212 | 0.306 | 8.40 | -2.16 | -0.07 | 4.4447 | 0.68 | 0.57 | -1.14
gln Q | 3.89 | 0.274 | 0.314 | 8.19 | 0.09 | -1.14 | 4.6050 | -0.10 | 0.03 | -0.35
gly G | — | 0.080 | -0.170 | 8.36 | -0.53 | 0.30 | 4.7053 | 0 | 0 | 0
his H | 4.04 | 0.315 | 0.256 | 8.36 | 0.51 | 1.11 | 4.5323 | -0.01 | 0.22 | -0.58
ile I | 0.07 | 0.474 | 0.391 | 7.99 | 0.30 | -1.03 | 4.4995 | 0.06 | 0.02 | 0.04
leu L | 0.09 | 0.315 | 0.293 | 7.99 | 0.22 | -0.98 | 4.5929 | 0.02 | 0.05 | -0.03
lys K | 9.07 | 0.255 | 0.231 | 8.29 | 1.64 | -3.14 | 4.5119 | -0.16 | -0.95 | 0.51
met M | 1.80 | 0.356 | 0.367 | 8.12 | 0.23 | -0.41 | 4.6201 | 0.08 | -0.12 | -0.30
phe F | 0.29 | 0.410 | 0.328 | 7.93 | 0.25 | 0.45 | 4.5783 | 0.04 | 0.02 | -0.45
pro P | 1.47 | — | — | — | -0.01 | 2.23 | — | 0 | 0.10 | 0.02
ser S | 1.83 | 0.290 | 0.202 | 8.30 | -0.32 | 0.57 | 4.6620 | -0.03 | -0.02 | -0.38
thr T | 1.79 | 0.412 | 0.308 | 8.17 | -0.06 | -1.40 | 4.6438 | -0.05 | 0.02 | -0.44
trp W | 1.79 | 0.325 | 0.197 | 8.03 | 0.75 | 0.85 | 4.5755 | 0.06 | 0.09 | -0.24
tyr Y | 1.44 | 0.354 | 0.223 | 8.10 | 0.53 | 0.01 | 4.5836 | 0.05 | -0.03 | -0.42
val V | 0.06 | 0.515 | 0.436 | 8.08 | 0.22 | -1.29 | 4.6039 | 0.01 | 0.08 | -0.04
1 The dipole data were derived by Chipot et al. (38) from QM calculations.
2 γLOCAL and γNON-LOCAL represent composite coefficients calculated by Avbelj (40).
3 The NMR shift data refer to the chemical shifts measured by NMR for the amide proton in the coil conformation (37).
4 Mei et al. (39) derived the VHSE5 composite index of electronic effects. This scale (Vectors of Hydrophobic, Steric, and Electronic properties) was derived from principal components analysis of 50 different physico-chemical variables.
5 The z3 electronic index was taken from Hellberg et al. (25). This scale was derived from principal components analysis of 29 variables. The z3 index was completely independent from hydrophobic (z1 scale) and steric (z2 scale) effects.
6 CαMULL refers to the Mulliken population at the Cα atom calculated from QM analysis of the amino acids. These values and the inductive, resonance, and field effects were published previously (15).
Additional analysis of these electronic parameters reveals that they fall into two major categories. The NMR chemical shift data for the amide proton, the γLOCAL and γNON-LOCAL coefficients, and to some degree the dipole index are related closely to the hydrophilicity of the amino acid side chains. These findings are shown in Table 4, and they include significant correlations with K-D, Hp, M-P, and P-P indices. Given the extent of cross-correlation, it is probably best to consider the first three electronic scales in particular as surrogate indicators of hydrophilicity. The parameters that remain in Table 4 bear some relationship to each other and are considered, for the purpose of this article, to represent conventional electronic effects of amino acid side chains. These include CαMULL, inductive and field effects, z3, and VHSE5 parameters. The correlation between VHSE5 and inductive, field, and resonance effects is remarkable. Whereas VHSE5 is a composite from principal components analysis, the inductive and resonance effects are derived directly from simple QM calculations (15). Thus, the cross-correlations between empirically derived electronic parameters (VHSE5) and QM-derived calculations suggest that the QM measures are valid expressions of the electronic properties of amino acids.
Table 4. Correlations among the various properties of amino acids*
*Linear regression analysis was performed to compare the various scales listed in Tables 1-3. The r values shown here are only for the significant correlations where p < 0.01 or p < 0.05 as noted by an asterisk. The (-) signs indicate the slope of the regression line.
Closer examination of the electronic properties of individual amino acids reveals good agreement with expectations. Thus, arginine and lysine with positively charged ammonium groups are the strongest electron-withdrawing side chains (inductive effects), whereas the side chains of aspartic acid and glutamic acid are the strongest electron donors. Conversely, field effects (with respect to the amide proton) are opposite in direction for the acidic versus basic side chains. Serine, threonine, and histidine show weak inductive (through-bond) effects, but intermediate to strong field (through-space) effects. The aliphatic side chains of alanine, leucine, isoleucine, and valine produce minimal field or inductive effects, which is consistent with the properties of alkyl groups. In the future, analysis of electronic effects over short segments of amino acid sequence may reveal patterns related to preferences for secondary structure or functional sites in proteins.
Substituent effects of amino acid side chains
The concept of amino acid side chains as substituent groups along the peptide backbone has been developed previously (14, 15, 32). According to this idea, the amino acid side chains affect the electron density at main chain atoms as a function of their physico-chemical properties (e.g., the degree to which they donate or withdraw electrons from the peptide backbone). The side chains modulate local electron density, and thus the bond lengths and rotation along the main chain (15, 32). Summed together, the various electronic effects (e.g., inductive effects and Hp) determine the preference of a protein segment for a particular secondary structure. Of course, this preference is modified by solvent effects, electrostatic screening, and ultimately by interactions between residues that are only brought into contact through the folding process. These long-range interactions determine both the folding rate and the stability of the folded protein (33, 34, 41).
Theoretic and experimental studies of charge migration in proteins support the notion of gating effects of amino acid side chains (8-12). Thus, movement of charge between adjacent residues through the peptide bond depends on the molecular motion and orientation of the side chains (11). Moreover, ab initio analysis of the electronic features of amino acids reveals that electronic effects are conformation sensitive (42). Therefore, different side chain rotamers will produce distinct electronic effects at the main chain, although the rotational preference of a side chain is also a function of its fundamental physico-chemical (electronic) properties. Studies of quantum effects in molecular electronic devices reveal that side chain groups affect electron density and current flow through main chain (nonpeptide) atoms such that current transmission is blocked at eigenvalues of the side chains (16). The effects of various side chains are additive in this system. Therefore, main chain structure and rotational flexibility (i.e., folding) are linked inextricably to the electronic properties of amino acid side chains and to the propagation of electronic effects along the peptide backbone.
Nearest-neighbor effects
Nearest-neighbor effects refer to the reciprocal influence of adjacent amino acids on protein folding. In some cases, this term also refers to amino acids that are close in the 3-D structure of the protein (<8-10 Å away), but distant in the sequence. Early studies found a nonrandom assorting of amino acids in secondary structures by pair-wise analysis of protein sequences (43). More recently, it was reported that the preference of pairs of amino acids for secondary structure was determined, in part, by the electronic properties of the neighboring residues (14, 32). Thus, adjacent pairs of amino acids that act as strong electron donors preferred α-helical conformations, whereas adjacent residues with ambivalent electron affinity strongly preferred coil conformations. The existence of nearest-neighbor electronic effects in proteins is confirmed by NMR studies (9, 37), pKa measurements (8), and QM analysis of electron densities in dipeptides (32). Finally, nearest-neighbor interactions in the final folded state contribute to the stability of a protein (44).
The electronic properties of neighboring amino acids can affect protein folding in complex ways. Theoretically, the electronic effects of adjacent residues may be additive, opposing, or neutral. Some effects extend over 3-4 residues in a peptide or protein (8, 9), which corresponds roughly to a loop, short β-strand, or the first turn of an α-helix. The summation of these various nearest-neighbor effects will then determine the electron density along the peptide backbone, bond lengths, and rotational flexibility. Consequently, segments of a protein where strong electronic effects are exerted on the main chain atoms will tend to form different secondary structures than segments where the electronic effects are weak.
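One simple way to explore the summation of nearest-neighbor effects over 3-4 residues is a sliding-window sum of a per-residue electronic scale along a sequence. The sketch below is an illustration only: the function name `windowed_effect` is hypothetical, the simple additive model is an assumption (the text notes that effects may also be opposing or neutral), and the numbers in `toy_scale` are placeholders chosen to have opposite signs for acidic and basic residues, not values taken from Table 3.

```python
def windowed_effect(sequence, scale, window=4):
    """Sum a per-residue scale over each run of `window` consecutive residues.

    `sequence` is a string of one-letter residue codes and `scale` maps each
    code to a number. Returns one sum per window position, left to right.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [scale[res] for res in sequence]
    return [
        sum(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


# Placeholder values for illustration only (NOT measured or calculated data).
toy_scale = {"K": 1.0, "D": -1.0, "A": 0.0}

# Adjacent like residues reinforce each other; unlike residues cancel.
profile = windowed_effect("KKAD", toy_scale, window=2)
```

With these placeholder values, windows dominated by one kind of residue give sums of large magnitude (strong net effect on the main chain), whereas mixed windows give sums near zero.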
Chemical Tools and Refinements
A compelling case can be made for replacing empirically derived scales of amino acid properties with parameters either measured directly (e.g., chemical shift data or infrared spectra) or calculated from basic principles. The goal would be to develop all-electronic expressions for the physico-chemical properties of amino acids based on computational methods that include QM calculations. A start in this direction was provided by the successful description of steric effects in terms of polarizability and hydrophilicity as a function of electron density (32). Application of more sophisticated computational approaches will speed progress toward this objective.
As discussed here and elsewhere, NMR chemical shift data reveal details about the secondary structure environment of amino acids in proteins and thus are measures of protein folding. The pioneering work of Oldfield and colleagues (45) and other groups demonstrated that QM calculations on model peptides estimated NMR chemical shifts accurately. Furthermore, chemical shifts at the amide proton are excellent indicators of β-strand and coil conformation (Table 4). Therefore, the fact that QM calculations can be used to derive chemical shifts in peptides suggests that these calculations also provide insight into protein folding. For some of the QM analysis of amino acids, older semiempirical methods have been used (15). These methods are sufficiently accurate for the relative assessment of electronic properties (i.e., for comparisons between amino acids). However, higher-level theory will be needed to obtain more precise quantitative values for various electronic parameters, for example, through the application of density functional theory with correction for electron correlation effects (46). Alternatively, perturbation methods such as Møller-Plesset have also proven useful to derive the electronic properties of amino acids (46).
Several scales presented in Table 3 show promise as measures of fundamental electronic properties of amino acids, as does the Hp index of hydrophilicity. Nevertheless, additional improvements are desirable. The polarizability index of steric effects should include hyperconjugation as a component. Clearly, the movement of electrons into antibonding orbitals contributes to molecular deformation (24). The Hp index is based on PM3 calculations of electron densities for the component atoms of amino acid side chains. A more integrative approach with higher-order theory is likely to refine this measure further.
Of the other electronic parameters described here, inductive effects and field effects seem to represent distinct properties as calculated from QM analysis and equations of localized substituent effects. However, inductive and resonance effects need to be better isolated from each other. QM calculation of these effects in a series of different host molecules may improve discrimination between these two parameters. Finally, it will be important to characterize the electronic properties of amino acid side chains in short dipeptides and tripeptides, which will lead to a better understanding of nearest-neighbor and context effects in proteins.
Conclusions
A better understanding of the chemical biology of amino acids is key to clarifying issues related to protein structure and function. Interactions that involve amino acid side chains contribute to the rate of protein folding, the stability of the protein fold, molecular recognition (e.g., ligand binding), and catalysis. Participation of amino acids in these processes is a function of the properties of the side chain groups. For example, the hydrophilicity of a side chain largely determines whether it is found at the protein surface or buried in the interior. Similarly, the propensity of an amino acid for medium- to long-range inter-residue interactions is related to its preference for secondary structure. Electronic properties of side chain groups contribute to folding preferences and create electric fields involved in recognition and catalysis. Many empirical measures of the properties of amino acids are now available. An important goal for the future will be to replace these empirical measures with fundamental parameters derived from QM calculations. Finally, the conceptualization of amino acid side chains as substituent groups that affect electron density along the main chain through gating effects may provide insight into how the amino acid sequence specifies the 3-D structure of a protein.
References
1. Kabsch W, Sander C. On the use of sequence homologies to predict protein structure: identical pentapeptides can have completely different conformations. Proc. Natl. Acad. Sci. U.S.A. 1984; 81:1075-1078.
2. Minor DL Jr, Kim PS. Context-dependent secondary structure formation of a designed protein sequence. Nature 1996; 380:730-734.
3. Gromiha MM, An J, Kono H, Oobatake M, Uedaira H, Sarai A. ProTherm: thermodynamic database for proteins and mutants. Nucl. Acids Res. 1999; 27:286-288.
4. Waszkowycz B. Structure-based approaches to drug design and virtual screening. Curr. Opin. Drug Discov. Devel. 2002; 5:407-413.
5. Suydam IT, Snow CD, Pande VS, Boxer SG. Electric fields at the active site of an enzyme: direct comparison of experiment with theory. Science 2006; 313:200-204.
6. Miller SL. A production of amino acids under possible primitive earth conditions. Science 1953; 117:528-529.
7. Dickerson RE. Chemical evolution and the origin of life. Sci. Am. 1978; 239:70-86.
8. Ellenbogen E. Dissociation constants of peptides. I. A survey of the effect of optical configuration. J. Am. Chem. Soc. 1952; 74:5198-5201.
9. Gmeiner WH, Facelli JC. Quantum mechanical and experimental measurement of N-terminal charge effects on 1HN and 1HCα chemical shifts in peptides. Biopolymers 1996; 38:573-581.
10. Beratan DN, Onuchic JN, Winkler JR, Gray HB. Electron-tunneling pathways in proteins. Science 1992; 258:1740-1741.
11. Patten F, Gordy W. Temperature effects on free radical formation and electron migration in irradiated proteins. Proc. Natl. Acad. Sci. U.S.A. 1960; 46:1137-1144.
12. Eley DS, Spivey DI. Semiconductivity in hydrated hemoglobin. Nature 1960; 188:725.
13. Chakrabartty A, Baldwin RL. Stability of α-helices. Adv. Prot. Chem. 1995; 46:141-176.
14. Dwyer DS. Electronic properties of the amino acid side chains contribute to the structural preferences in protein folding. J. Biomol. Struct. Dyn. 2001; 18:881-892.
15. Dwyer DS. Electronic properties of amino acid side chains: quantum mechanics calculations of substituent effects. BMC Chem. Biol. 2005; 5:2.
16. Ernzerhof M, Zhuang M, Rocheleau P. Side-chain effects in molecular electronic devices. J. Chem. Phys. 2005; 123:134704.
17. Jones DD. Amino acid properties and side-chain orientation in proteins: a cross correlation approach. J. Theor. Biol. 1975; 50:167-183.
18. Lide DR, ed. CRC Handbook of Chemistry and Physics. 85th ed. 2004. CRC Press, Boca Raton, FL.
19. Greenstein JP, Winitz M. Chemistry of the Amino Acids, Volume 1. 1961. John Wiley and Sons, New York.
20. Edsall JT. Dipolar ions and acid-base equilibria. In Proteins, Amino Acids and Peptides as Dipolar Ions. Cohn EJ, Edsall JT, eds. 1965. Hafner Publishing, New York. pp. 75-115.
21. Kawashima S, Ogata H, Kanehisa M. AAindex: amino acid index database. Nucl. Acids Res. 1999; 27:368-369.
22. Kidera A, Konishi Y, Oka M, Ooi T, Scheraga HA. Statistical analysis of the physical properties of the 20 naturally occurring amino acids. J. Prot. Chem. 1985; 4:23-55.
23. Levitt M. A simplified representation of protein conformations for rapid simulation of protein folding. J. Mol. Biol. 1976; 104:59-107.
24. Weinhold F. A new twist on molecular shape. Nature 2001; 411:539-541.
25. Hellberg S, Sjöström M, Skagerberg B, Wold S. Peptide quantitative structure-activity relationships, a multivariate approach. J. Med. Chem. 1987; 30:1126-1135.
26. Chou PY, Fasman GD. Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry 1974; 13:211-245.
27. Williams RW, Chang A, Juretic D, Loughran S. Secondary structure predictions and medium range interactions. Biochim. Biophys. Acta 1987; 916:200-204.
28. Richardson JS, Richardson DC. Principles and patterns of protein conformation. In Prediction of Protein Structure and the Principles of Protein Conformation. Fasman GD, ed. 1989. Plenum Press, New York. pp. 2-99.
29. O’Neil KT, DeGrado WF. A thermodynamic scale for the helix-forming tendencies of the commonly occurring amino acids. Science 1990; 250:646-651.
30. Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 1982; 157:105-132.
31. Zimmerman JM, Eliezer N, Simha R. The characterization of amino acid sequences in proteins by statistical methods. J. Theor. Biol. 1968; 21:170-201.
32. Dwyer DS. Nearest-neighbor effects and structural preferences in dipeptides are a function of the electronic properties of amino acid side-chains. Proteins 2006; 63:939-948.
33. Manavalan P, Ponnuswamy PK. Hydrophobic character of amino acid residues in globular proteins. Nature 1978; 275:673-674.
34. Gromiha MM, Selvaraj S. Influence of medium and long range interactions in protein folding. Prep. Biochem. Biotechnol. 1999; 29:339-351.
35. Palliser CC, Parry DA. Quantitative comparison of the ability of hydropathy scales to recognize surface β-strands in proteins. Proteins 2001; 42:243-255.
36. Cid H, Bunster M, Arriagada E, Campos M. Prediction of secondary structure of proteins by means of hydrophobicity profiles. FEBS Lett. 1982; 150:247-254.
37. Wishart DS, Sykes BD, Richards FM. Relationship between nuclear magnetic resonance chemical shift and protein secondary structure. J. Mol. Biol. 1991; 222:311-333.
38. Chipot C, Maigret B, Rivail J-L. Modeling amino acid side chains. 1. Determination of net atomic charges from ab initio self-consistent-field molecular electrostatic properties. J. Phys. Chem. 1992; 96:10276-10284.
39. Mei H, Liao ZH, Zhou Y, Li SZ. A new set of amino acid descriptors and its application in peptide QSARs. Biopolymers 2005; 80:775-786.
40. Avbelj F. Amino acid conformational preferences and solvation of polar backbone atoms in peptides and proteins. J. Mol. Biol. 2000; 300:1335-1359.
41. Plaxco KW, Simons KT, Baker D. Contact order, transition state placement and the refolding of single domain proteins. J. Mol. Biol. 1998; 277:985-994.
42. Gronert S, O’Hair RAJ. Ab initio studies of amino acid conformation. 1. The conformers of alanine, serine, and cysteine. J. Am. Chem. Soc. 1995; 117:2071-2081.
43. Cserzo M, Simon I. Regularities in the primary structure of proteins. Int. J. Pept. Prot. Res. 1989; 34:184-195.
44. Yi TM, Lander ES. Protein secondary structure prediction using nearest neighbor methods. J. Mol. Biol. 1993; 232:1117-1129.
45. de Dios AC, Pearson JG, Oldfield E. Secondary and tertiary structural effects on protein NMR chemical shifts: an ab initio approach. Science 1993; 260:1491-1496.
46. Friesner RA. Ab initio quantum chemistry: methodology and applications. Proc. Natl. Acad. Sci. U.S.A. 2005; 102:6648-6653.
Further Reading
Finkelstein AV, Galzitskaya OV. Physics of protein folding. Phys. Life Rev. 2004; 1:23-56.
Gromiha MM, Selvaraj S. Inter-residue interactions in protein folding and stability. Prog. Biophys. Mol. Biol. 2004; 86:235-277.
Matta CF, Bader RFW. Atoms-in-molecules study of the genetically-encoded amino acids. II. Computational study of molecular geometries. Proteins 2002; 48:519-538.
Muñoz V, Serrano L. Elucidating the folding problem of helical peptides using empirical parameters. II. Helix macrodipole effects and rational modification of the helical content of natural peptides. J. Mol. Biol. 1995; 245:275-296.
Oldfield E. Quantum chemical studies of protein structures. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2005; 360:1347-1361.
Topsom RD. Some theoretical studies of electronic substituent effects in organic chemistry. Prog. Phys. Org. Chem. 1987; 16:125-191.
Wang Y, Jardetzky O. Investigation of the neighboring residue effects of protein chemical shifts. J. Am. Chem. Soc. 2002; 124:14075-14084.
Wishart DS, Case DA. Use of chemical shifts in macromolecular structure determination. Methods Enzymol. 2001; 338:3-34.
See Also
Chemistry and Chemical Reactivity of Proteins, Matthew Francis
Energetics of Protein Folding, Robert Baldwin
NMR to Study Proteins, Angela Gronenborn
Physical Chemistry in Biology, Allan Cooper
Synthetic Peptides and Proteins to Elucidate Biological Functions, Roger Goody