Watson-Crick Base Pairs, Character and Recognition of - CHEMICAL BIOLOGY


Wilma K. Olson, Andrew V. Colasanti and Xiang-Jun Lu, Rutgers the State University of New Jersey, New Brunswick

Victor B. Zhurkin, National Institutes of Health, Bethesda, Maryland

doi: 10.1002/9780470048672.wecb452

The unique Watson-Crick arrangement of hydrogen-bonded bases in DNA accommodates two different, complementary purine-pyrimidine pairs, A∙T = T∙A and G∙C = C∙G, in a common spatial setting. Nature takes advantage of these isomorphous structures, which store genetic information in terms of the proton donor and acceptor atoms that hold the bases in place. As outlined here, the Watson-Crick base pairs carry other chemical signals that are used to recognize and to process specific sequences of bases. The relative stabilities of GC versus AT pairs reflect their different electronic structures. The distributions of electronic charge on the exposed major-groove and minor-groove edges of the base pairs present unique motifs for direct sequence recognition, and the deformations of the paired bases from ideal, planar configurations provide subtle, indirect recognition elements. The biologic significance of the latter signals is not fully understood but is becoming clearer as more and more high-resolution structures of DNA and RNA are determined.

The simple, yet elegant structure of double-helical DNA—two sugar-phosphate strands wrapped along antiparallel right-handed pathways around a central core of stacked and hydrogen-bonded base pairs—provides the molecular basis to interpret the storage, duplication, and rearrangement of genetic information. The same type of base pairing persists in double-stranded RNA, DNA-RNA hybrid duplexes, and synthetic multi-stranded polymers, such as PNA (1), which allow the chemical message to be duplicated, transcribed, blocked, and so on. The information reported below draws on the three-dimensional spatial arrangements of Watson-Crick base pairs and bound ligands in the many DNA and RNA structures now stored in the Nucleic Acid Database (NDB) (2).

Complementarity

Classic Watson-Crick base pairs are formed by unique hydrogen-bonding interactions between the nitrogenous bases of DNA and RNA. The purine adenine associates specifically with the pyrimidine thymine in DNA (or the related unmethylated analog, uracil, in RNA), and the purine guanine interacts with the pyrimidine cytosine. These complementarity rules, A∙T or A∙U and G∙C pairs, make it possible to build regular double-stranded structures of arbitrary base sequence and to provide a mechanism to copy the genetic code. That is, if the sequence of one strand of DNA is known, the sequence of the other strand is determined automatically. Therefore, if the strands are separated and new DNA is synthesized, two double-stranded DNA molecules are obtained, each an exact copy of the original.
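
The pairing rules amount to a simple lookup, as the following minimal sketch suggests. The function name `complementary_strand` is an assumption of this example, not part of any established tool. The partner strand is returned in the 5'→3' direction, so the order of bases is reversed to reflect the antiparallel alignment of the two strands.

```python
# Watson-Crick pairing rules for DNA: A pairs with T, and G pairs with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the partner of a DNA strand, with both strands written 5'->3'."""
    # Pair each base, reading the input backward because the strands are antiparallel.
    return "".join(DNA_PAIRS[base] for base in reversed(strand.upper()))


template = "ATGCGT"
partner = complementary_strand(template)
print(partner)
# Copying the copy regenerates the original sequence.
print(complementary_strand(partner) == template)
```

Applying the function twice returns the starting sequence, which is the sense in which each daughter duplex is an exact copy of the original.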

These complementarity rules owe their discovery to the chemical analysis of DNA by Chargaff and associates (3). The DNA from many different organisms shows the same patterns of base composition, namely A and T are always present in equal quantities, as are G and C. The immediate corollary of this observation, that a purine base (R) exists for every pyrimidine base (Y) and vice versa, led Watson and Crick to propose that the two helical strands in DNA are held together by specific, intermolecular purine-pyrimidine (R∙Y) interactions (4). In turn, this unique chemical complementarity of the double-helical structure proved to be a major breakthrough in understanding the self-recognition and self-reproduction of DNA, and it forms the cornerstone of structural biology as we know it today, more than half a century later.
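
The compositional equalities follow directly from the pairing rules, which a short count over both strands of a duplex can illustrate. This is a sketch only; the function name `duplex_composition` and the test sequence are hypothetical.

```python
from collections import Counter

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def duplex_composition(strand: str) -> Counter:
    """Count the bases over both strands of the duplex built on `strand`."""
    partner = "".join(PAIRS[base] for base in strand)
    # Strand direction does not affect composition, so no reversal is needed.
    return Counter(strand + partner)


counts = duplex_composition("AAGCGTTAGGA")  # arbitrary illustrative sequence
purines = counts["A"] + counts["G"]
pyrimidines = counts["T"] + counts["C"]
print(counts["A"] == counts["T"], counts["G"] == counts["C"], purines == pyrimidines)
```

The single strand used as input need not obey the equalities; they hold only for the duplex as a whole.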

By contrast, the proportion of GC versus AT base pairs is highly variable in the DNA from different organisms, with over-representation and under-representation of residues found at the dimeric level (i.e., adjacent base-pair steps) and at higher levels (5). The factors that may underlie the observed compositional patterns are not yet understood.
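
One simple way to quantify such compositional biases is to compare the observed frequency of a dimer step with the frequency expected from the base composition alone. The sketch below uses a plain single-strand odds ratio. That choice is an assumption made for illustration and is not necessarily the statistic used in the cited analysis. The function names and the test sequence are likewise hypothetical.

```python
from collections import Counter


def gc_fraction(seq: str) -> float:
    """Fraction of G and C residues in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def dinucleotide_odds_ratio(seq: str, step: str) -> float:
    """Observed/expected frequency of a dimer step on one strand.

    Values above 1 suggest over-representation and values below 1
    under-representation. Raises ZeroDivisionError if either base of
    the step is absent from the sequence.
    """
    n = len(seq)
    base_counts = Counter(seq)
    step_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    observed = step_counts[step] / (n - 1)
    expected = (base_counts[step[0]] / n) * (base_counts[step[1]] / n)
    return observed / expected


sequence = "CGCGAATT"  # arbitrary illustrative sequence
print(gc_fraction(sequence))
print(dinucleotide_odds_ratio(sequence, "CG"))
```

Sequences this short give noisy ratios; meaningful estimates require genome-scale counts.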

Isomorphous Base Pairs

The Watson-Crick postulate places complementary bases in similar spatial configurations so that the two-stranded molecule can adopt a regular structure. The bases of each purine-pyrimidine pair lie in a common plane, with the distance between C1' sugar atoms approximately constant (~10.5 Å) and the C1'∙∙∙C1' vector forming roughly equivalent (~55°) angles with the (purine C1'-N9 and pyrimidine C1'-N1) glycosidic bonds that join the bases to the sugar-phosphate backbone (Fig. 1). A pseudo-twofold symmetry axis, also referred to as the dyad axis, passes through the center of each base pair, which permits the exchange of complementary bases with no change in the relative positions of either the attached sugar residues or the selected base-recognition elements (see below). In particular, a 180° rotation about this axis converts an A∙T/U base pair into a T/U∙A pair and a G∙C base pair into a C∙G pair. As a result of this isomorphous geometry, any combination of base pairs can be fitted into the same regular structural framework. As a first approximation, the “ideal” double-helical structure of Watson and Crick is sequence-independent. As noted below, this regularity breaks down in high-resolution crystal structures of DNA, and it would be impossible without the R∙Y base-pairing rules, e.g., if larger purines or smaller pyrimidines were paired with each other.

Hydrogen-bond Recognition and Stability

In turn, the isomorphous structures of the Watson-Crick base pairs dictate specific hydrogen-bonding recognition patterns. The A∙T/U pairs associate via two hydrogen bonds that involve N1(A)∙∙∙H-N3(T/U) and N6(A)-H∙∙∙O4(T/U), and the G∙C base pairs are held in place by three such interactions: N1(G)-H∙∙∙N3(C), N2(G)-H∙∙∙O2(C), and O6(G)∙∙∙H-N4(C) (Fig. 1, top). The extra hydrogen bond of the G∙C pair apparently gives rise to the higher melting temperature of GC- versus AT-rich DNA (6). Direct in vacuo measurements of the binding energies support this idea. Isolated G∙C base pairs are more stable than free A∙T pairs (-21 vs. -13 kcal mol⁻¹) (7, 8). Furthermore, G∙C pairs are typically less deformed from ideal planar geometry than A∙T pairs (see Table 1 and discussion below).
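
The bookkeeping behind the statement that GC-rich DNA carries more inter-base hydrogen bonds can be sketched as follows. The function name is hypothetical, and the count covers only the conventional hydrogen bonds listed above. It ignores base stacking and solvent effects, so it should not be read as a predictor of duplex stability.

```python
# Conventional Watson-Crick hydrogen bonds contributed by the pair
# that each base forms: two for A.T and three for G.C.
H_BONDS_PER_PAIR = {"A": 2, "T": 2, "G": 3, "C": 3}


def watson_crick_hbonds(strand: str) -> int:
    """Total conventional inter-base hydrogen bonds in the duplex built on `strand`."""
    return sum(H_BONDS_PER_PAIR[base] for base in strand.upper())


for seq in ("ATATAT", "GCGCGC"):  # arbitrary illustrative sequences
    print(seq, watson_crick_hbonds(seq))
```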

Base pairing is substantially weaker in solution. For example, the G∙C hydrogen-bonding energy drops to -5.8 kcal mol⁻¹ in chloroform (9). The hydrogen-bonding energies in water, however, are uncertain because isolated planar bases prefer to associate in parallel stacked arrays rather than to pair with complementary bases (10). According to direct experimental solution measurements of the melting properties of oligonucleotides (which contain both base side groups and the sugar-phosphate backbone), hydrogen bonding adds 0.5-1.8 kcal mol⁻¹ of stabilization per base pair depending on DNA sequence (11). Thus, base-pair formation in aqueous solution is governed by base-base hydrogen bonds only slightly more favorable than base-water hydrogen bonds.

The electrostatic character of hydrogen bonding brings the protons of one base and the (N and O) acceptors of the complementary base closer together (1.9-2.0 Å) than their characteristic (2.7-2.8 Å) van der Waals separation distance. Accordingly, the base-pairing interaction is stronger and more specific than ordinary van der Waals interactions. The hydrogens that are “shared” in the hydrogen-bonding interactions have partial positive charges because of their attachment to nitrogen donor atoms, whereas the (carbonyl oxygen C=O or imidazole nitrogen N:) acceptors on the complementary bases bear partial negative charges. To maximize the electrostatic attraction, the hydrogen usually approaches the acceptor atom along the direction of, and in a plane with, the lone-pair orbitals of O or N. The partial charges of the donor, hydrogen, and acceptor atoms of the common bases (Fig. 2) determine the overall character of the electrostatic potentials that guide the mutual recognition of base pairs and their interactions with other molecules (see below).

Interestingly, the pairing of guanine and cytosine depicted initially by Watson and Crick (12) entailed only two hydrogen bonds, O6(G)∙∙∙H-N4(C) and N1(G)-H∙∙∙N3(C), congruent with the N6(A)-H∙∙∙O4(T) and N1(A)∙∙∙H-N3(T) hydrogen bonds holding adenine and thymine in place. Later, Pauling and Corey (13) showed that guanine and cytosine were joined by a third N2(G)-H∙∙∙O2(C) hydrogen bond in the minor groove, and Crick (14) used the three G∙C hydrogen bonds to account for the higher stability of the Watson-Crick pair compared with a “wobble” G∙U pair. Subsequent crystallographic investigations have revealed the existence of “weak” C-H∙∙∙O hydrogen bonds between nitrogenous bases, also somewhat shorter than the sum of the van der Waals distances (15); see Table 2. The geometry of the Watson-Crick A∙T base pair naturally forces a third such C2(A)-H∙∙∙O2(T) hydrogen bond (16), which emphasizes its similarity to the G∙C base pair (Fig. 1). Alternatively, the direct contact between adenine and thymine in the minor groove can be interpreted as an electrostatic attraction (17). In any case, this additional interaction in the A∙T pair is advantageous for selective recognition during replication and transcription (Fig. 2).

Figure 1. Comparison of hydrogen-bonding interactions, chemical structures (including double bonds), and relative displacement of bases composed of normal A∙T and G∙C Watson-Crick pairs (top), the rare G(enol)∙T pair (bottom left) made possible by the chemical modification of guanine, and the “wobble” G∙T pair (bottom right). Hydrogen bonds are designated by dashed lines and the “weak” C-H∙∙∙O bond (16), or electrostatic attraction at close distance (17), of the A∙T base pair by a thin wavy line. Conventional proton donor and acceptor atoms are colored red and blue, respectively. The donor and acceptor atoms involved in “weak” C-H∙∙∙O interactions are highlighted in pink and light blue. Structures generated with 3DNA (18) from the average parameters reported in Tables 1 and 2. Notice the unfavorable acceptor-acceptor interaction between O6(G) and O4(T) in the G(enol)∙T pair. The dashed line joining C1' atoms on associated bases illustrates the isomorphous geometry of the Watson-Crick and G(enol)∙T pairs (with roughly equivalent ~55° angles formed between the long virtual bond and each of the C1'-N9 and C1'-N1 glycosidic bonds) compared with the G∙T “wobble” pair (with corresponding angles of ~45° and ~70°).

Hybridization

In principle, the hydrogen atoms of the purine and pyrimidine bases can rearrange in different tautomeric or hybridized forms. The exocyclic nitrogen atoms attached to the adenine and cytosine rings usually are in the amino (NH2) form rather than the imino (NH) configuration. Likewise, the exocyclic oxygen atoms attached to the carbons of guanine and thymine rings normally adopt the keto (C=O) form rather than the enol (C-OH) configuration. Watson and Crick suggested that keto to enol or amino to imino base tautomerism could be the origin of the point mutations that underlie evolution (12). Such rearrangements would allow adenine to associate with cytosine or guanine to bind to thymine in geometries close to those of the canonical base pairs. For example, the normal d:d:a pattern of hydrogen-bond donors (d) and acceptors (a) in guanine is converted by keto-enol tautomerism to the d:a:d motif, which complements the preferred a:d:a motif of thymine (Fig. 1). Errors like these destroy the perfect complementarity between opposing chains that gives DNA its capacity for self-replication.
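
The donor/acceptor argument can be made concrete by encoding each pairing edge as a string and testing whether every donor faces an acceptor. The sketch below is illustrative only. The guanine and thymine patterns are those quoted above; the cytosine pattern (a:a:d) is an addition inferred from the three G∙C hydrogen bonds described earlier. The dictionary and function names are hypothetical, and all patterns are assumed to be read in the same direction across the pairing edge.

```python
# Hydrogen-bond donors (d) and acceptors (a) along the pairing edge of each base.
EDGE_PATTERN = {
    "G (keto)": "dda",
    "G (enol)": "dad",
    "C": "aad",  # inferred from the G.C hydrogen bonds; not quoted in the text
    "T": "ada",
}


def is_complementary(edge_1: str, edge_2: str) -> bool:
    """True if every donor on one edge faces an acceptor on the other."""
    return len(edge_1) == len(edge_2) and all(
        {x, y} == {"d", "a"} for x, y in zip(edge_1, edge_2)
    )


for purine in ("G (keto)", "G (enol)"):
    for pyrimidine in ("C", "T"):
        match = is_complementary(EDGE_PATTERN[purine], EDGE_PATTERN[pyrimidine])
        print(f"{purine} with {pyrimidine}: {match}")
```

In this encoding the keto form of guanine matches only cytosine, whereas the enol form matches only thymine, which is the mispairing route described above.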

In this regard, significant chemical modification is required to effect either keto-enol or amino-imino tautomerism of the nitrogenous bases, with the consequent formation of A∙C and G∙T mispairs that fit into the canonical double-helical structure. Notably, the N6-methoxy A∙C mispairs and the O6-methylated G∙T mispairs, which are observed in crystalline duplex structures, are isomorphous with standard A∙T and G∙C base pairs (Tables 1 and 2); NDB entries: bd0009, bdlb26, and bdlb58 (19-21).

Fortunately, the imino forms of A and C and the enol forms of G and T occur rarely. Most A∙C and G∙T mispairs observed to date in high-resolution crystal structures (e.g., References 22 and 23) associate through a “wobble” configuration (14), with the bases “sheared” past one another relative to the Watson-Crick configuration (Table 1 and Fig. 1). These structural perturbations (Table 1 and Table 2) alter the patterns of atomic charges and accessibility that are presumably required for protein recognition and enzymatic action (see discussion below).

Table 1. Parameters describing complementary base-pair geometry in high-resolution DNA crystal structures*

| Base pair | Buckle κ (°) | Propeller π (°) | Opening σ (°) | Shear Sx (Å) | Stretch Sy (Å) | Stagger Sz (Å) |
|---|---|---|---|---|---|---|
| Watson-Crick base pairs¹ | | | | | | |
| A∙T (A DNA) | +2.1(±3.9) | -11.4(±2.8) | 0.0(±4.0) | +0.01(±0.08) | -0.19(±0.08) | +0.15(±0.16) |
| G∙C (A DNA) | -3.2(±7.4) | -11.5(±5.4) | 0.0(±2.5) | -0.11(±0.19) | -0.18(±0.10) | 0.00(±0.35) |
| A∙T (B DNA) | +0.8(±5.5) | -13.2(±4.9) | +2.0(±3.5) | +0.05(±0.23) | -0.14(±0.14) | +0.06(±0.18) |
| G∙C (B DNA) | +5.9(±6.6) | -9.7(±4.8) | -0.4(±2.5) | -10.11(±0.17) | -0.16(±0.15) | +0.12(±0.20) |
| Keto-enol tautomers (modified purines)² | | | | | | |
| A*∙C (bd0009) | +4.3 | -14.9 | +5.3 | -0.47 | -0.16 | +0.04 |
| G*∙T (bdlb26) | +9.5 | -11.5 | +0.2 | -0.12 | -0.12 | -0.03 |
| G*∙T (bdlb58) | +0.1 | -16.3 | +3.4 | +0.04 | +0.02 | +0.22 |
| “Wobble” base pairs² | | | | | | |
| A+∙C (bdl011) | +10.7 | -10.6 | +3.5 | -1.94 | -0.36 | +0.25 |
| G∙T (bdl011) | +11.7 | -9.2 | -0.4 | -2.61 | -0.66 | -0.07 |

*See schematic illustrations in Fig. 3 for parameter definitions. Parameters are given for purine-pyrimidine (A∙T and G∙C) pairs. The parameters are identical for the corresponding pyrimidine-purine (T∙A and C∙G) pairs, except that κ and Sx are of the opposite sign.

¹Data based on the analysis, within 3DNA (18), of 328 base pairs in 32 A-DNA and 24 B-DNA crystal structures of 2.0 Å or better resolution without chemical modification, mismatches, drugs, or proteins from the Nucleic Acid Database (2). Numerical values reflect the relative orientation and displacement of coordinate frames on complementary bases, the positions and directions of which are determined by the superposition of ideal, planar bases on real bases in the selected structures (20 A∙T and 148 G∙C from A-type duplexes; 99 A∙T and 61 G∙C from B-type duplexes). Mean values and standard deviations (values in parentheses) exclude terminal base pairs and side groups attached to nicked backbones. Deformational trends are similar across the (0.7-2.0 Å) range of structural resolution considered here. See the following URL for complete sequences and literature citations: http://rutchem.rutgers.edu/~olson/Tsukuba.

²Inter-base parameters of the chemically modified rare base-pair tautomers (19-21), congruent with Watson-Crick base pairs, and “wobble” A∙C and G∙T base pairs (22, 23).
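
The single-structure entries in Table 1 suggest that shear is the parameter that most clearly separates “wobble” pairs from pairs congruent with Watson-Crick geometry. The following sketch applies a cutoff to the tabulated shear values. The 1.0-Å threshold and the function name are assumptions chosen for illustration, not criteria taken from the cited work.

```python
# Shear values (in angstroms) copied from the single-structure rows of Table 1.
SHEAR = {
    "A*.C (bd0009)": -0.47,
    "G*.T (bdlb26)": -0.12,
    "G*.T (bdlb58)": +0.04,
    "A+.C wobble (bdl011)": -1.94,
    "G.T wobble (bdl011)": -2.61,
}

SHEAR_CUTOFF = 1.0  # hypothetical threshold in angstroms


def looks_like_wobble(shear: float, cutoff: float = SHEAR_CUTOFF) -> bool:
    """Flag a pair whose bases are displaced sideways past one another."""
    return abs(shear) > cutoff


for pair, shear in SHEAR.items():
    label = "wobble-like" if looks_like_wobble(shear) else "Watson-Crick-like"
    print(f"{pair}: shear {shear:+.2f} -> {label}")
```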

Table 2. Average hydrogen-bonding geometry of Watson-Crick base pairs in high-resolution DNA structures*

| Base pair | dC1'∙∙∙C1' (Å) | λY** (deg) | λR** (deg) | Hydrogen-bond distances (Å) |
|---|---|---|---|---|
| Watson-Crick base pairs | | | | |
| A∙T (A DNA) | 10.4(±0.2) | 55.2(±2.2) | 55.7(±2.1) | C2-H~O2: 3.5(±0.2); N1-H∙∙∙N3: 2.7(±0.1); N6-H∙∙∙O4: 2.9(±0.1) |
| G∙C (A DNA) | 10.6(±0.2) | 55.4(±2.5) | 55.7(±1.9) | N2-H∙∙∙O2: 2.8(±0.1); N1-H∙∙∙N3: 2.9(±0.1); O6∙∙∙H-N4: 2.9(±0.1) |
| A∙T (B DNA) | 10.5(±0.1) | 56.0(±2.2) | 55.7(±1.9) | C2-H~O2: 3.5(±0.1); N1-H∙∙∙N3: 2.8(±0.1); N6-H∙∙∙O4: 3.0(±0.1) |
| G∙C (B DNA) | 10.7(±0.1) | 55.1(±2.6) | 55.8(±2.1) | N2-H∙∙∙O2: 3.5(±0.1); N1-H∙∙∙N3: 2.8(±0.1); O6∙∙∙H-N4: 3.0(±0.1) |
| Keto-enol tautomers | | | | |
| A*∙C | 10.4 | 58.5 | 55.5 | C2-H~O2: 3.4; N1-H∙∙∙N3: 2.8; N6-H∙∙∙N4: 3.0 |
| G*∙T | 10.6 | 57.9 | 54.6 | N2-H∙∙∙O2: 2.8; N1∙∙∙H-N3: 2.9; O6-H∙∙∙O4: 3.0 |
| G*∙T | 10.5 | 58.4 | 56.9 | N2-H∙∙∙O2: 2.7; N1∙∙∙H-N3: 3.1; O6-H∙∙∙O4: 3.2 |
| “Wobble” base pairs | | | | |
| A+∙C | 10.3 | 68.2 | 47.5 | N1+∙∙∙H-O2: 2.8; N6-H∙∙∙N3: 2.9 |
| G∙T | 10.4 | 71.0 | 43.6 | N1-H∙∙∙O2: 2.7; O6-H∙∙∙N3: 2.8 |

*See Table 1 for structures included in the survey.

**Angles λY = ∠C1'(Y)-N1(Y)∙∙∙C1'(R) and λR = ∠C1'(R)-N9(R)∙∙∙C1'(Y) describe the pivoting of complementary bases in the base-pair plane (Fig. 1).
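
The pivot angles can be computed from atomic coordinates with elementary vector algebra, as sketched below. Following the description in the section on isomorphous base pairs, each angle is taken here between a glycosidic bond and the C1'∙∙∙C1' line. The coordinates are synthetic, built from the ~10.5-Å separation and ~55° angles quoted in the text and an assumed glycosidic bond length of 1.47 Å, and are not taken from any deposited structure. The function names and the 10° tolerance are likewise hypothetical.

```python
import math


def angle_deg(vertex, p1, p2):
    """Angle p1-vertex-p2 in degrees."""
    v1 = [a - b for a, b in zip(p1, vertex)]
    v2 = [a - b for a, b in zip(p2, vertex)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(a * a for a in v2))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def is_isomorphous(lambda_y, lambda_r, tolerance=10.0):
    """Treat a pair as Watson-Crick-like if the two pivot angles nearly agree."""
    return abs(lambda_y - lambda_r) <= tolerance


# Synthetic planar base pair (assumed values, in angstroms and degrees).
D, BOND, LAM = 10.5, 1.47, math.radians(55.0)
c1_y, c1_r = (0.0, 0.0, 0.0), (D, 0.0, 0.0)
n1_y = (BOND * math.cos(LAM), BOND * math.sin(LAM), 0.0)
n9_r = (D - BOND * math.cos(LAM), BOND * math.sin(LAM), 0.0)

lambda_y = angle_deg(c1_y, n1_y, c1_r)
lambda_r = angle_deg(c1_r, n9_r, c1_y)
print(round(lambda_y, 1), round(lambda_r, 1))
print(is_isomorphous(56.0, 55.7), is_isomorphous(71.0, 43.6))  # Table 2 values
```

With the averages in Table 2, the Watson-Crick and chemically modified tautomeric pairs pass this test, whereas both “wobble” pairs fail it.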

Antiparallel Strand Alignment

Because of the lack of symmetry in their chemical structures, individual nucleic acid bases have unique faces (24), which specify the directions of the DNA strands in the Watson-Crick model. The upper faces (tops) of complementary bases point in opposite directions, with the two attached sugar-phosphate backbones aligned in an antiparallel sense. The top of a base corresponds with the configuration that orients the C1'→N glycosidic bond vector in a “northeast” heading with respect to its “north-south” Watson-Crick base-pairing edge (found between the associated bases in Figs. 1 and 2). If the base and sugar are attached by the normally preferred anti glycosidic linkage (with the six-membered purine ring or the pyrimidine O2 directed away from the sugar ring), the DNA backbone roughly runs perpendicular to the plane of the base, with the 3'-oxygen displaced above the top side and the 5'-oxygen below the bottom side of each base. The vector that connects the tops of consecutive bases coincides with the 5'→3' direction of the sugar-phosphate chain (Fig. 3).

Figure 2. Color-coded electrostatic surface of A∙T (left) and G∙C (right) base pairs produced with the GRASP software package (25, 26) from ideal planar atomic coordinates (27) and partial atomic charges of the CHARMM27 nucleic acid force field (28): (top) the major-groove edges showing the unique donor-acceptor patterns of the base pairs; (middle) the upper faces (24) of purines and the undersides of pyrimidines (see also Fig. 1); (bottom) minor-groove edges showing the common Watson-Crick donor atom “signature.” Conventional hydrogen-bond donor and acceptor atoms on base-pair edges are designated respectively by - and + symbols. The CH and CH3 groups in the major and minor grooves are noted by (+) to emphasize their moderate positive charges and their tendency to be in close contact with O and N acceptor atoms (17, 29). Isopotential contours, in units of kT (numerical scale at top of figure), reveal the approximate electrostatic equivalence of the minor-groove edges of the base pairs. Molecular surfaces generated using a spherical probe with 1.4-Å radius. Bases “neutralized” by adjustment of the partial charges on C1' atoms. Electrostatic potential omits counterions and incorporates the difference in dielectric constant between water and bases, i.e., 80 vs. 2. Essentially the same results are found with other well-known sets of partial atomic charges (30, 31), including those determined with state-of-the-art quantum mechanical methods (32).

Duplex Grooves and Recognition

The attachment of the sugars to the same side of each Watson-Crick base pair introduces an asymmetry in base-pair accessibility inside the grooves formed by the DNA backbone. The edge of the Watson-Crick base pair that contains the pyrimidine O2 and the purine N3 atoms is called the minor-groove edge, and the longer edge on the opposite side of the glycosidic bonds is termed the major-groove edge (Fig. 1). The hydrogen-bond donors and acceptors that line each groove (Figs. 1, 2) serve as recognition motifs for interactions of DNA with proteins, drugs, and solvent molecules. The pseudo-symmetric positioning of the N3 acceptor atoms of A and G and the O2 acceptor atoms of T and C provides a common minor-groove recognition element for all four Watson-Crick base pairs (33) (shown by the pattern of red and blue atoms in Fig. 2).

The positioning of the N3/O2 acceptor atoms provides a simple and reliable mechanism to distinguish the Watson-Crick pairs from the “wobble” pair and other mismatches. As suggested initially by Bruskov and Poltev (34), the fidelity of nucleic-acid biosynthesis would be increased substantially if the recognition elements of a polymerase had NH or OH groups that interacted with the invariant N3/O2 atoms in the minor groove of the growing double helix. This idea has been confirmed in the crystal structures of DNA complexed with different DNA polymerases (35, 36), where two highly conserved proton-donating amino acid residues (arginine and glutamine) associate with the N3/O2 acceptors at the 3'-end of the primer, i.e., the point from which the DNA chain grows.

The orientation of the amino proton attached to the N2 donor on G differentiates the G∙C and C∙G base pairs from each other as well as from the A∙T and T∙A pairs. The latter base pairs can be discriminated by small synthetic molecules that take advantage of both the asymmetric steric structure of the adenine C2-H and the capability of the thymine O2 (with two sets of lone pair electrons) to form an additional hydrogen bond not possible with the pseudo-symmetrically related adenine N3 (37, 38). It is not yet clear whether naturally occurring, DNA-binding proteins use similar principles to distinguish between A∙T and T∙A base pairs in the minor groove.

By contrast, all four bases present unique protein recognition patterns in the major groove (33) (Figs. 1 and 2). The N7 acceptor atoms of A and G set the purines apart from the pyrimidines, whereas the pseudo-symmetrically placed N6 donor and O6 acceptor, respectively, separate A from G. The corresponding isomorphic interchange of the O4 acceptor and N4 donor on T and C discriminates the two pyrimidines. On the other hand, the latter motifs provide a common identification mechanism for C and A or G and T.

In addition to the classic donor-acceptor recognition mechanism described above, hydrophobic and electrostatic interactions facilitate the discrimination of the base pairs one from another. As shown in Figs. 1 and 2, the pyrimidines are hydrophobic in the vicinity of C5 and either have a positive charge (cytosine) or are approximately neutral (thymine). These features distinguish the pyrimidines from the purines, both of which are hydrophilic and negatively charged at the pseudo-symmetrically related N7 position. Major-groove binding proteins take advantage of this difference. First, in many cases, hydrophobic amino-acid side chains interact directly with the thymine methyl group (29, 39). Second, negatively charged protein carbonyl oxygens frequently are found in the vicinity of the C5-H group of cytosine or the CH3 group of thymine but rarely in the vicinity of the purine N7 (29).
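
The uniqueness of the major-groove patterns can be checked with a small lookup that lists the groups encountered on crossing the groove from one base to its partner. The one-letter encoding (a, acceptor; d, donor; m, methyl; h, nonpolar C-H) and the names used are assumptions of this sketch. The atoms themselves are those named in the preceding paragraphs.

```python
# Groups met on crossing the major groove from the first base to the second.
MAJOR_GROOVE = {
    "AT": ("a", "d", "a", "m"),  # N7(A), N6-H(A), O4(T), C5-CH3(T)
    "GC": ("a", "a", "d", "h"),  # N7(G), O6(G), N4-H(C), C5-H(C)
}
# Exchanging the bases about the dyad axis reverses the order of the groups.
MAJOR_GROOVE["TA"] = MAJOR_GROOVE["AT"][::-1]
MAJOR_GROOVE["CG"] = MAJOR_GROOVE["GC"][::-1]


def all_pairs_distinguishable(patterns: dict) -> bool:
    """True if no two base pairs present the same sequence of groups."""
    return len(set(patterns.values())) == len(patterns)


for pair, groups in MAJOR_GROOVE.items():
    print(pair, ":".join(groups))
print(all_pairs_distinguishable(MAJOR_GROOVE))
```

A reader of the groove that detects only the second position cannot tell A from C (both donors) or G from T (both acceptors), in line with the common identification mechanism noted above.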

The positioning of Watson-Crick pairs along the global axis of the double-stranded B-DNA structure enhances the accessibility of major-groove versus minor-groove atoms, thereby favoring the binding of proteins that recognize specific sequences. The narrow B-DNA minor groove is less receptive to proteins but easily accommodates long, crescent-shaped drugs (e.g., References 37 and 38). The binding of proteins in the minor groove often necessitates a partial and sometimes a complete B→A conformational change in DNA, which displaces the base pairs away from the global axis and concomitantly exposes unpaired atoms on the minor-groove edges (40). On the other hand, small molecules bind specifically to the narrow major groove of double-stranded RNA (41), which adopts only A-type geometry.

Nonplanar Geometry

The Watson-Crick double-stranded model of DNA has been abundantly confirmed and refined by fiber and single-crystal X-ray diffraction studies. The high-resolution crystal structures accumulated to date, starting with the 0.8-Å resolution structure of the dinucleoside phosphate adenosyl-3',5'-uridine miniduplex (43) and now including structures of oligonucleotide duplexes of comparable resolution (44, 45), show that complementary Watson-Crick bases are not perfectly coplanar. The bases in most solved structures are twisted with respect to each other like the blades of a propeller, with the C1' atom on the sequence strand typically shifted below and that on the complementary strand displaced above the mean base-pair plane, i.e., negative propeller (see parameter definitions in Fig. 3, where positive propeller is illustrated). This deformation stabilizes the right-handed B-DNA structure by enhancing stacking overlaps with bases in adjacent residues (46). The degree of propeller twisting depends on both base-pair and conformational context. The A∙T pairs in B-DNA are more distorted on average than the G∙C pairs (-13° vs. -10°), and almost no propeller twisting of base pairs exists in the left-handed Z-DNA conformation, e.g., -3° in the d(CGCGCG)2 duplex structure (NDB entry: zdf001) (47). Buckle, although fixed on average near zero, shows more pronounced variability than propeller and, for G∙C base pairs, exhibits a notable dependence on helical conformation. The G∙C base pairs tend to buckle in a positive sense in B-DNA duplexes (as shown in Fig. 3) and in a negative direction in A-DNA structures. The constraints of hydrogen-bond stretching and bending (Table 2) presumably lead to the more limited variations in opening and stretch (Table 1) compared with the other complementary base-pair angles and distances.

Figure 3. Illustration of parameters used to describe the relative orientation and displacement of complementary Watson-Crick base pairs (42). The sequence strand (I) is on the left in black, and the complementary strand (II) is on the right in red. Darkened corners represent the glycosidic linkage to the sugar-phosphate backbone. The reference frame attached to a base pair (top) is constructed such that the x-axis points away from the (shaded) minor-groove edge along the pseudo-dyad axis of an "ideal" pair (in which all six parameters are zero), and the y-axis points in the direction of the sequence strand (27). Images created with 3DNA (18) illustrate positive values of the designated parameters. The gray arrows designate the positive signs of rotations for strand I, and the red arrows for strand II. Heavy dots in images of distorted pairs designate the origins of the base-pair frames.

On average, the A∙T pairs are characterized by relatively small magnitudes of buckle, especially in the B form (Table 1), but they normally show a larger variability in propeller twisting compared with the G∙C pairs. This observation is consistent with the different hydrogen bonding of A∙T versus G∙C. Both the buckle and the propeller angles enhance base stacking in B-DNA. In G∙C pairs, stabilized by three strong hydrogen bonds, excessive propeller twisting is expected to be unfavorable, as this would distort the two N-H∙∙∙O bonds in the minor and major grooves (Fig. 1). Thus, propeller twist is expected to be less pronounced in G∙C than in A∙T pairs. To improve base stacking, the G∙C pairs apparently “take advantage” of the other angular degree of freedom, buckle, which is less prohibitive for the hydrogen bonds. In A∙T pairs, with only two strong hydrogen bonds (Fig. 1), a large propeller of 15-20° is acceptable. As a result, there is no need for a large buckle, which remains ~1° on average (Table 1, B-DNA).

Protein-induced Base-pair Deformations

The associations of DNA with proteins and drugs introduce additional deformations of Watson-Crick geometry. Several examples, which are illustrated in Fig. 4, demonstrate the functional importance of the base-pair deformations discussed above.

Buckle

The partial intercalation of aromatic side groups of the yeast TATA-box binding protein between DNA base pairs (48) is accompanied by a pronounced (32°) buckling in one of the two adjacent A∙T pairs, which apparently facilitates “penetration” of the phenylalanine rings into the minor groove (NDB entry: pdt012; Fig. 4a). A second example involves the integration host factor-DNA complex (49) (NDB entry: pdt040; Fig. 4b), where the partial minor-groove insertion of arginine 63 and close association of arginine 60 result in marked buckling in the opposite direction of the surrounding T-37∙A37 and G-36∙C36 base pairs (with respective buckle angles of -47° and -35°).

Figure 4. Protein-induced distortions of complementary base-pair parameters in representative crystal structures: (a) view from the major groove of the large buckling of the (upper) A8∙T22 base pair in the DNA complexed with the yeast TATA-box binding protein (48) brought about by the partial minor-groove insertion at the A8A9∙T22T21 dimer step of phenylalanine 99 (wire-frame ring connected to the magenta (glutamic acid 93 and isoleucine 103) polypeptide ribbon; NDB entry: pdt012); (b) major-groove view of negatively buckled (upper) T-37∙A37 and (lower) G-36∙C36 pairs, the hydrogen-bonded (wire-frame) arginine 60 (lower right) and arginine 63 (upper left) side groups, and a fragment (glutamine 59 and lysine 66) of the minor-groove-bound, extended β-sheet recognition element of integration host factor (49) (NDB entry: pdt040); (c) view of the upper face of the opened A13∙T19 base pair and the minor-groove contacting (serine 183 and asparagine 190) C-terminal ribbon in the Hin recombinase-DNA complex (50) (NDB entry: pde009); (d) major-groove view of the extreme opening and base-pair displacement of the chemically modified 4'-thio-2'-deoxycytidine in the C5Me407G408C409∙G428C*427G426 trimer step (C5Me407∙G428 at top, C409∙G426 at bottom) and closely associated amino acids (glycine 78 and lysine 91) in the complex of DNA with HhaI methyltransferase (51) (NDB entry: pde141); (e) minor-groove view of the staggered (lower) A1541∙T1499 base pair, the N6(A1541)∙∙∙O6(G1498) inter-strand hydrogen bond (dashed line), and the major-groove recognition helix (serine 1151 and lysine 1165) of RXR-α in the complex with its idealized direct repeat DNA target (52) (NDB entry: pd0071). Images created with 3DNA (18) and Raster3D (53). Planes of bases colored as follows: A, red; T, blue; G, green; C, yellow.

Opening and stretching

A subtle (24°) base-pair opening of A13∙T19 is induced by contacts of Hin recombinase (50) with the minor-groove edge of T19 (NDB entry: pde009; Fig. 4c). By contrast, the major-groove capture of cytosine by HhaI DNA cytosine-5-methyltransferase (51) introduces nearly maximal opening (178°) and extreme lateral base-pair displacement, i.e., stretch (8.5 Å), of the broken base pair (NDB entry: pde141; Fig. 4d).

Stagger

The close fit of the recognition helix of the 9-cis retinoic acid receptor, RXR, against the DNA major groove is responsible for the noticeable stagger (1.2 Å) of the T1499∙A1541 pair and the accompanying inter-strand N6(A1541)∙∙∙O6(G1498) hydrogen bonding (52) (NDB entry: pd0071; Fig. 4e).
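
To put these protein-induced distortions in perspective, each can be expressed as a number of standard deviations from the corresponding average for A∙T pairs in protein-free B-DNA (Table 1). The sketch below is illustrative only. It assumes that the quoted magnitudes carry a positive sign and that a simple Gaussian-style comparison is meaningful for these parameters; the dictionary and function names are hypothetical.

```python
# Mean and standard deviation for A.T pairs in B-DNA, copied from Table 1.
B_DNA_AT = {
    "buckle (deg)": (0.8, 5.5),
    "opening (deg)": (2.0, 3.5),
    "stagger (A)": (0.06, 0.18),
}

# Values quoted in the text for A.T pairs in protein-DNA complexes.
OBSERVED = {
    "buckle (deg)": 32.0,   # TATA-box binding protein complex (pdt012)
    "opening (deg)": 24.0,  # Hin recombinase complex (pde009)
    "stagger (A)": 1.2,     # RXR complex (pd0071)
}


def z_score(value: float, mean: float, sd: float) -> float:
    """Deviation from the mean in units of the standard deviation."""
    return (value - mean) / sd


Z = {name: z_score(OBSERVED[name], *B_DNA_AT[name]) for name in OBSERVED}
for name, z in Z.items():
    print(f"{name}: {z:.1f} standard deviations from the B-DNA average")
```

Under these assumptions, all three distortions lie roughly six standard deviations from the averages found in the absence of protein.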

Summary

The Watson-Crick base-pairing scheme is characterized by several unique structural properties, including the complementarity and perfect isomorphism that are used in replication and in transcription of the genetic code. These general features underlie the ability of DNA to incorporate any arbitrary sequence in a nearly regular duplex. On the other hand, the subtle, sequence-dependent variability of A∙T versus G∙C base-pairing geometry is used by the DNA-binding proteins involved in regulation. The ingenious base-pairing principle postulated more than half a century ago and subsequently confirmed in high resolution crystal structures of DNA and RNA continues to surprise us by its beauty, simplicity, and complexity.

Acknowledgments

We are grateful to Dr. A.R. Srinivasan for generous help. Support of this work through U.S.P.H.S. Grant GM20861 is gratefully acknowledged. Computations were carried out at the Rutgers University Center for Computational Chemistry and through the facilities of the Nucleic Acid Database project.

References

1. Egholm M, Buchardt O, Nielsen PE, Berg RH. Peptide nucleic acids (PNA). Oligonucleotide analogues with an achiral peptide backbone. J. Am. Chem. Soc. 1992; 114:1895-1897.

2. Berman HM, Olson WK, Beveridge DL, Westbrook J, Gelbin A, Demeny T, Hsieh S-H, Srinivasan AR, Schneider B. The Nucleic Acid Database: A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J. 1992; 63:751-759.

3. Zamenhof S, Brawerman G, Chargaff E. On the deoxypentose nucleic acids from several microorganisms. Biochim. Biophys. Acta 1952; 9:402-405.

4. Watson JD, Crick FHC. A structure for deoxyribose nucleic acid. Nature 1953; 171:737-738.

5. Karlin S, Campbell AM, Mrazek J. Comparative DNA analysis across diverse genomes. Annu. Rev. Genet. 1998; 32:185-225.

6. Blake RD, Delcourt SG. Thermal stability of DNA. Nucleic Acids Res. 1998; 26:3323-3332.

7. Yanson IK, Teplitsky AB, Sukhodub LF. Experimental studies of molecular interactions between nitrogen bases of nucleic acids. Biopolymers 1979; 18:1149-1170.

8. Sukhodub LF. Interaction and hydration of nucleic acid bases in a vacuum. Experimental study. Chem. Rev. 1987; 87:589-606.

9. Williams LD, Chawla B, Shaw BR. The hydrogen bonding of cytosine with guanine: calorimetric and 1H-NMR analysis of the molecular interactions of nucleic acid bases. Biopolymers 1987; 26:591-603.

10. Ts’o POP. Bases, nucleosides, and nucleotides. In: Basic Principles in Nucleic Acid Chemistry, volume I. Ts’o POP, ed. 1974. Academic Press, New York. pp. 453-584.

11. Turner DH, Sugimoto N, Kierzek R, Dreiker SD. Free energy increments for hydrogen bonds in nucleic acid base pairs. J. Am. Chem. Soc. 1987; 109:3783-3785.

12. Watson JD, Crick FHC. Genetical implications of the structure of deoxyribonucleic acid. Nature 1953; 171:964-967.

13. Pauling L, Corey RB. Specific hydrogen-bond formation between pyrimidines and purines in deoxyribonucleic acids. Arch. Biochem. Biophys. 1956; 65:164-181.

14. Crick FHC. Codon-anticodon pairing: The wobble hypothesis. J. Mol. Biol. 1966; 19:548-555.

15. Wahl MC, Sundaralingam M. C-H∙∙∙O hydrogen bonding in biology. Trends Biochem. Sci. 1997; 22:97-102.

16. Leonard GA, McAuley-Hecht K, Brown T, Hunter WN. Do C-H∙∙∙O hydrogen bonds contribute to the stability of nucleic acid base pairs? Acta Crystall. 1995; D51:136-139.

17. Zhurkin VB, Raghunathan G, Ulyanov NB, Camerini-Otero RD, Jernigan RL. Recombination triple helix, R-form DNA. A stereochemical model for recognition and strand exchange. In: Structural Biology: The State of the Art, Vol. 2. Sarma RH, Sarma MH, eds. 1994. Adenine Press, Schenectady, NY. pp. 43-66.

18. Lu X-J, Olson WK. 3DNA: A software package for the analysis, rebuilding, and visualization of three-dimensional nucleic acid structures. Nucleic Acids Res. 2003; 31:5108-5121.

19. Leonard GA, Thomson J, Watson WP, Brown T. High-resolution structure of a mutagenic lesion in DNA. Proc. Natl. Acad. Sci. U.S.A. 1990; 87:9573-9576.

20. Vojtechovsky J, Eaton MD, Gaffney B, Jones R, Berman HM. Structure of a new crystal form of a DNA dodecamer containing T∙(O6Me)G base pairs. Biochemistry 1995; 34:16632-16640.

21. Chatake T, Ono A, Ueno Y, Matsuda A, Takenaka A. Crystallographic studies on damaged DNAs. I. An N6-methoxyadenine residue forms a Watson-Crick pair with a cytosine residue in a B-DNA duplex. J. Mol. Biol. 1999; 294:1215-1222.

22. Hunter WN, Brown T, Kennard O. Structural features and hydration of a dodecamer duplex containing two C-A mispairs. Nucleic Acids Res. 1987; 15:6589-6605.

23. Hunter WN, Brown T, Kneale G, Anand NN, Rabinovich D, Kennard O. The structure of guanosine-thymidine mismatches in B-DNA at 2.5 Å resolution. J. Biol. Chem. 1987; 262:9962-9970.

24. Rose IA, Hanson KR, Wilkinson KD, Wimmer MJ. A suggestion for naming faces of ring compounds. Proc. Natl. Acad. Sci. U.S.A. 1980; 77:2439-2441.

25. Nicholls A, Sharp K, Honig B. Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins 1991; 11:281-296.

26. Nicholls A. GRASP: Graphical Representation and Analysis of Surface Properties. 1992. Columbia University, New York.

27. Olson WK, Bansal M, Burley SK, Dickerson RE, Gerstein M, Harvey SC, Heinemann U, Lu X-J, Neidle S, Shakked Z, Sklenar H, Suzuki M, Tung C-S, Westhof E, Wolberger C, Berman HM. A standard reference frame for the description of nucleic acid base-pair geometry. J. Mol. Biol. 2001; 313:229-237.

28. Foloppe N, MacKerell AD Jr. All-atom empirical force field for nucleic acids: I. Parameter optimization based on small molecule and condensed phase macromolecular data. J. Comp. Chem. 2000; 21:86-104.

29. Mandel-Gutfreund Y, Margalit H, Jernigan RL, Zhurkin VB. A role for CH∙∙∙O interactions in protein-DNA recognition. J. Mol. Biol. 1998; 277:1129-1140.

30. Renugopalakrishnan V, Lakshminarayanan AV, Sasisekharan V. Stereochemistry of nucleic acids and polynucleotides. 3. Electronic charge distribution. Biopolymers 1971; 10:1159-1167.

31. Zhurkin VB, Poltev VI, Florent’ev VL. Atom-atom potential functions for conformational calculations of nucleic acids. Mol. Biol. (USSR) 1980; 14:1116-1130.

32. Sponer J, Leszczynski J, Hobza P. Electronic properties, hydrogen bonding, stacking, and cation binding of DNA and RNA bases. Biopolymers 2001-2002; 61:3-31.

33. Seeman NC, Rosenberg JM, Rich A. Sequence-specific recognition of double helical nucleic acids by proteins. Proc. Natl. Acad. Sci. U.S.A. 1976; 73:804-808.

34. Bruskov VI, Poltev VI. On molecular mechanisms of nucleic acid synthesis. Fidelity aspects: II. Contribution of protein-nucleotide recognition. J. Theor. Biol. 1979; 78:29-41.

35. Doublie S, Tabor S, Long AM, Richardson CC, Ellenberger T. Crystal structure of a bacteriophage T7 DNA replication complex at 2.2 Å resolution. Nature 1998; 391:251-258.

36. Kiefer JR, Mao C, Braman JC, Beese LS. Visualizing DNA replication in a catalytically active Bacillus DNA polymerase crystal. Nature 1998; 391:304-307.

37. White S, Szewczyk JW, Turner JM, Baird EE, Dervan PB. Recognition of the four Watson-Crick base pairs in the DNA minor groove by synthetic ligands. Nature 1998; 391:468-471.

38. Kielkopf CL, White S, Szewczyk JW, Turner JM, Baird EE, Dervan PB, Rees DC. A structural basis for recognition of A∙T and T∙A base pairs in the minor groove of B-DNA. Science 1998; 282:111-115.

39. Tucker-Kellogg L, Rould MA, Chambers KA, Ades SE, Sauer RT, Pabo CO. Engrailed (Gln50→Lys) homeodomain-DNA complex at 1.9 Å resolution: structural basis for enhanced affinity and altered specificity. Structure 1997; 5:1047-1054.

40. Lu X-J, Shakked Z, Olson WK. A-form conformational motifs in ligand-bound DNA structures. J. Mol. Biol. 2000; 300:819-840.

41. Dickerson RE, Bansal M, Calladine CR, Diekmann S, Hunter WN, Kennard O, von Kitzing E, Lavery R, Nelson HCM, Olson WK, Saenger W, Shakked Z, Sklenar H, Soumpasis DM, Tung C-S, Wang AH-J, Zhurkin VB. Definitions and nomenclature of nucleic acid structure parameters. J. Mol. Biol. 1989; 208:787-791.

42. Jin E, Katritch V, Olson WK, Kharatisvilli M, Abagyan R, Pilch DS. Aminoglycoside binding in the major groove of duplex RNA: The thermodynamic and electrostatic forces that govern recognition. J. Mol. Biol. 2000; 298:95-110.

43. Rosenberg JM, Seeman NC, Kim JJP, Suddath FL, Nicholas HB, Rich A. Double helix at atomic resolution. Nature 1973; 243:150-154.

44. Egli M, Tereshko V, Teplova M, Minasov G, Joachimiak A, Sanishvili R, Weeks CM, Miller R, Maier MA, An H, Dan Cook P, Manoharan M. X-ray crystallographic analysis of the hydration of A- and B-form DNA at atomic resolution. Biopolymers 1998; 48:234-252.

45. Hays FA, Teegarden AT, Jones ZJR, Harms M, Raup D, Watson J, Cavaliere E, Ho PS. How does sequence define structure? A crystallographic map of DNA structure and conformation. Proc. Natl. Acad. Sci. U.S.A. 2005; 102:7157-7162.

46. Levitt M. How many base-pairs per turn does DNA have in solution and in chromatin? Some theoretical calculations. Proc. Natl. Acad. Sci. U.S.A. 1978; 75:640-644.

47. Wang AH-J, Quigley GJ, Kolpak FJ, Crawford JL, van Boom JH, van der Marel GA, Rich A. Molecular structure of a left-handed double helical DNA fragment at atomic resolution. Nature 1979; 282:680-686.

48. Kim Y, Geiger JH, Hahn S, Sigler PB. Crystal structure of a yeast TBP/TATA-box complex. Nature 1993; 365:512-520.

49. Rice PA, Yang S-W, Mizuuchi K, Nash HA. Crystal structure of an IHF-DNA complex: a protein-induced DNA U-turn. Cell 1996; 87:1295-1306.

50. Feng J-A, Johnson RC, Dickerson RE. Hin recombinase bound to DNA: the origin of specificity in major and minor groove interactions. Science 1994; 263:348-355.

51. Kumar S, Horton JR, Jones GD, Walker RT, Roberts RJ, Cheng X. DNA containing 4'-thio-2'-deoxycytidine inhibits methylation by HhaI methyltransferase. Nucleic Acids Res. 1997; 25:2773-2783.

52. Zhao Q, Chasse SA, Devarakonda S, Sierk ML, Ahvazi B, Rastinejad F. Structural basis of RXR-DNA interactions. J. Mol. Biol. 2000; 296:509-520.

53. Merritt EA, Bacon DJ. Raster3D: photorealistic molecular graphics. Meth. Enzymol. 1997; 277:505-524.

See Also

DNA-Based Structures;

DNA Recognition by Enzymes;

Nucleic Acid Hydration;

Nucleic Acid Recognition;

Peptide Nucleic Acids;

Protein-Nucleic Acid Interactions;

Small Molecule-Nucleic Acid Interactions