DNA Organization, Replication, & Repair - Structure, Function, & Replication of Informational Macromolecules - Harper’s Illustrated Biochemistry, 29th Edition (2012)

SECTION IV. Structure, Function, & Replication of Informational Macromolecules

Chapter 35. DNA Organization, Replication, & Repair

P. Anthony Weil, PhD

OBJECTIVES

After studying this chapter, you should be able to:

• Appreciate that the roughly 3 × 10⁹ base pairs of DNA that compose the haploid genome of humans are divided uniquely between 23 linear DNA units, the chromosomes. Humans, being diploid, have 23 pairs of chromosomes: 22 pairs of autosomes and 1 pair of sex chromosomes.

• Understand that human genomic DNA, if extended end-to-end, would be meters in length, yet still fits within the nucleus of the cell, an organelle that is only micrometers (μm; 10⁻⁶ meters) in diameter. Such condensation in DNA length is induced following its association with the highly positively charged histone proteins, resulting in the formation of a unique DNA-histone complex termed the nucleosome. Nucleosomes have DNA wrapped around the surface of an octamer of histones.

• Explain that strings of nucleosomes form along the linear sequence of genomic DNA to form chromatin, which itself can be more tightly packaged and condensed, ultimately leading to the formation of the chromosomes.

• Appreciate that while the chromosomes are the macroscopic functional units for DNA recombination, gene assortment, and cellular division, it is DNA function at the level of individual nucleotides, which compose the regulatory sequences linked to specific genes, that is essential for life.

• Explain the steps, phase of the cell cycle, and the molecules responsible for the replication, repair, and recombination of DNA, and understand the negative effects of errors in any of these processes upon cellular and organismal integrity and health.

BIOMEDICAL IMPORTANCE*

The genetic information in the DNA of a chromosome can be transmitted by exact replication or it can be exchanged by a number of processes, including crossing over, recombination, transposition, and conversion. These provide a means of ensuring adaptability and diversity for the organism but, when these processes go awry, can also result in disease. A number of enzyme systems are involved in DNA replication, alteration, and repair. Mutations are due to a change in the base sequence of DNA and may result from the faulty replication, movement, or repair of DNA; they occur with a frequency of about one in every 10⁶ cell divisions. Abnormalities in gene products (either in RNA, protein function, or amount) can be the result of mutations that occur in coding or regulatory-region DNA. A mutation in a germ cell is transmitted to offspring (so-called vertical transmission of hereditary disease). A number of factors, including viruses, chemicals, ultraviolet light, and ionizing radiation, increase the rate of mutation. Mutations often affect somatic cells and so are passed on to successive generations of cells, but only within an organism (ie, horizontally). It is becoming apparent that a number of diseases, and perhaps most cancers, are due to the combined effects of vertical transmission of mutations as well as horizontal transmission of induced mutations.

CHROMATIN IS THE CHROMOSOMAL MATERIAL IN THE NUCLEI OF CELLS OF EUKARYOTIC ORGANISMS

Chromatin consists of very long double-stranded DNA (dsDNA) molecules and a nearly equal mass of rather small basic proteins termed histones as well as a smaller amount of nonhistone proteins (most of which are acidic and larger than histones) and a small quantity of RNA. The nonhistone proteins include enzymes involved in DNA replication and repair, and the proteins involved in RNA synthesis, processing, and transport to the cytoplasm. The dsDNA helix in each chromosome has a length that is thousands of times the diameter of the cell nucleus. One purpose of the molecules that comprise chromatin, particularly the histones, is to condense the DNA; however, it is important to note that the histones also integrally participate in gene regulation (Chapters 36, 38, and 42); indeed histones contribute importantly to all DNA-directed molecular transactions. Electron microscopic studies of chromatin have demonstrated dense spherical particles called nucleosomes, which are approximately 10 nm in diameter and connected by DNA filaments (Figure 35–1). Nucleosomes are composed of DNA wound around a collection of histone molecules.

FIGURE 35–1 Electron micrograph of nucleosomes (white, ball-shaped) attached to strands of DNA (thin, gray line); see also Figure 35–2. (Reproduced, with permission, from Shao Z: Probing nanometer structures with atomic force microscopy. News Physiol Sci, 1999;14:142–149. Courtesy of Professor Zhifeng Shao, University of Virginia.)

Histones Are the Most Abundant Chromatin Proteins

Histones are a small family of closely related basic proteins. H1 histones are the ones least tightly bound to chromatin (Figures 35–1, 35–2, and 35–3) and are, therefore, easily removed with a salt solution, after which chromatin becomes more soluble. The organizational unit of this soluble chromatin is the nucleosome. Nucleosomes contain four major types of histones: H2A, H2B, H3, and H4. The structures of all four histones—H2A, H2B, H3, and H4, the so-called core histones that form the nucleosome—have been highly conserved between species, although variants of the histones exist and are used for specialized purposes. This extreme conservation implies that the function of histones is identical in all eukaryotes and that the entire molecule is involved quite specifically in carrying out this function. The carboxyl terminal two-thirds of the histone molecules are hydrophobic, while their amino terminal thirds are particularly rich in basic amino acids. These four core histones are subject to at least six types of covalent modification or posttranslational modifications (PTMs): acetylation, methylation, phosphorylation, ADP-ribosylation, monoubiquitylation, and sumoylation. These histone modifications play an important role in chromatin structure and function, as illustrated in Table 35–1.

TABLE 35–1 Possible Roles of Modified Histones

The histones interact with each other in very specific ways. H3 and H4 form a tetramer containing two molecules of each, (H3–H4)₂, while H2A and H2B form dimers (H2A–H2B). Under physiologic conditions, these histone oligomers associate to form the histone octamer of the composition (H3–H4)₂–(H2A–H2B)₂.

The Nucleosome Contains Histone & DNA

When the histone octamer is mixed with purified dsDNA under appropriate ionic conditions, the same X-ray diffraction pattern is formed as that observed in freshly isolated chromatin. Electron microscopic studies confirm the existence of reconstituted nucleosomes. Furthermore, the reconstitution of nucleosomes from DNA and histones H2A, H2B, H3, and H4 is independent of the organismal or cellular origin of the various components. Neither the histone H1 nor the nonhistone proteins are necessary for the reconstitution of the nucleosome core.

In the nucleosome, the DNA is supercoiled in a left-handed helix over the surface of the disk-shaped histone octamer (Figure 35–2). The majority of core histone proteins interact with the DNA on the inside of the supercoil without protruding, although the amino terminal tails of all the histones are thought to extend outside of this structure and are available for regulatory PTMs (see Table 35–1).

FIGURE 35–2 Model for the structure of the nucleosome, in which DNA is wrapped around the surface of a flat protein cylinder consisting of two each of histones H2A, H2B, H3, and H4 that form the histone octamer. The ~145 bp of DNA, consisting of 1.75 superhelical turns, are in contact with the histone octamer. The position of histone H1, when it is present, is indicated by the dashed outline at the bottom of the figure. Histone H1 interacts with DNA as it enters and exits the nucleosome.

The (H3–H4)₂ tetramer itself can confer nucleosome-like properties on DNA and thus has a central role in the formation of the nucleosome. The addition of two H2A–H2B dimers stabilizes the primary particle and firmly binds two additional half-turns of DNA previously bound only loosely to the (H3–H4)₂ tetramer. Thus, 1.75 superhelical turns of DNA are wrapped around the surface of the histone octamer, protecting 145–150 bp of DNA and forming the nucleosome core particle (Figure 35–2). In chromatin, core particles are separated by a region of DNA about 30 bp long termed “linker.” Most of the DNA is in a repeating series of these structures, giving the so-called beads-on-a-string appearance when examined by electron microscopy (see Figure 35–1).
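The core-plus-linker arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 147-bp core is an assumed midpoint of the 145–150 bp range quoted in the text, the function names are hypothetical, and the 3 × 10⁹ bp haploid genome size is the figure given in the chapter objectives.

```python
# Figures taken from the text; CORE_BP is an assumed midpoint of 145-150 bp.
CORE_BP = 147      # DNA protected by the histone octamer
LINKER_BP = 30     # linker DNA between adjacent core particles (about 30 bp)


def nucleosome_repeat_bp(core_bp: int = CORE_BP, linker_bp: int = LINKER_BP) -> int:
    """Length of one bead-plus-string unit along the DNA, in base pairs."""
    return core_bp + linker_bp


def nucleosome_count(dna_bp: int, repeat_bp: int) -> int:
    """Number of whole nucleosome repeats that fit on a DNA of the given length."""
    return dna_bp // repeat_bp


if __name__ == "__main__":
    repeat = nucleosome_repeat_bp()
    haploid_genome_bp = 3_000_000_000  # about 3 x 10^9 bp, from the text
    print(f"repeat length: {repeat} bp")
    print(f"nucleosomes per haploid genome: {nucleosome_count(haploid_genome_bp, repeat):.2e}")
```

With these inputs the estimate comes out on the order of 10⁷ nucleosomes per haploid genome. The result is sensitive to the assumed linker length, which varies between tissues and species.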

The assembly of nucleosomes is mediated by one of several nuclear chromatin assembly factors facilitated by histone chaperones, a group of proteins that exhibit high-affinity histone binding. As the nucleosome is assembled, histones are released from the histone chaperones. Nucleosomes appear to exhibit preference for certain regions on specific DNA molecules, but the basis for this nonrandom distribution, termed phasing, is not yet completely understood. Phasing is likely related both to the relative physical flexibility of particular nucleotide sequences to accommodate the regions of kinking within the supercoil, as well as the presence of other DNA-bound factors that limit the sites of nucleosome deposition.

HIGHER ORDER STRUCTURES PROVIDE FOR THE COMPACTION OF CHROMATIN

Electron microscopy of chromatin reveals two higher orders of structure—the 10-nm fibril and the 30-nm chromatin fiber—beyond that of the nucleosome itself. The disk-like nucleosome structure has a 10-nm diameter and a height of 5 nm. The 10-nm fibril consists of nucleosomes arranged with their edges separated by a small distance (30 bp of DNA) with their flat faces parallel to the fibril axis (Figure 35–3). The 10-nm fibril is probably further supercoiled with six or seven nucleosomes per turn to form the 30-nm chromatin fiber (Figure 35–3). Each turn of the supercoil is relatively flat, and the faces of the nucleosomes of successive turns would be nearly parallel to each other. H1 histones appear to stabilize the 30-nm fiber, but their position and that of the variable length spacer DNA are not clear. It is probable that nucleosomes can form a variety of packed structures. In order to form a mitotic chromosome, the 30-nm fiber must be compacted in length another 100-fold (see below).

FIGURE 35–3 Shown is the extent of DNA packaging, from metaphase chromosomes (top) to naked duplex DNA (bottom). Chromosomal DNA is packaged and organized at several levels as shown (see Table 35–2). Each phase of condensation or compaction and organization (bottom to top) decreases overall DNA accessibility to an extent that the DNA sequences in metaphase chromosomes are almost totally transcriptionally inert. In toto, these five levels of DNA compaction result in nearly a 10⁴-fold linear decrease in end-to-end DNA length. Complete condensation and decondensation of the linear DNA in chromosomes occur in the space of hours during the normal replicative cell cycle (see Figure 35–20).

TABLE 35–2 The Packing or Compaction Ratios of Each of the Orders of DNA Structure

In interphase chromosomes, chromatin fibers appear to be organized into 30,000–100,000 bp loops or domains anchored in a scaffolding (or supporting matrix) within the nucleus, the so-called nuclear matrix. Within these domains, some DNA sequences may be located nonrandomly. It has been suggested that each looped domain of chromatin corresponds to one or more separate genetic functions, containing both coding and noncoding regions of the cognate gene or genes. This nuclear architecture is likely dynamic and has important effects upon gene regulation. Recent data suggest that certain genes or gene regions are mobile within the nucleus, moving obligatorily to discrete loci within the nucleus upon activation. Further work will determine both whether this is a general phenomenon and what molecular mechanisms are responsible.

SOME REGIONS OF CHROMATIN ARE “ACTIVE” & OTHERS ARE “INACTIVE”

Generally, every cell of an individual metazoan organism contains the same genetic information. Thus, the differences between cell types within an organism must be explained by differential expression of the common genetic information. Chromatin containing active genes (ie, transcriptionally or potentially transcriptionally active chromatin) has been shown to differ in several ways from that of inactive regions. The nucleosome structure of active chromatin appears to be altered, sometimes quite extensively, in highly active regions. DNA in active chromatin contains large regions (about 100,000 bases long) that are relatively more sensitive to digestion by a nuclease such as DNase I. DNase I makes single-strand cuts in nearly any segment of DNA (ie, it has low sequence specificity). It will digest DNA that is not protected by bound protein into its component deoxynucleotides. The sensitivity to DNase I of active chromatin regions reflects only a potential for transcription rather than transcription itself and in several systems can be correlated with a relative lack of 5-methyldeoxycytidine (meC) in the DNA and with particular histone variants and/or PTMs (phosphorylation, acetylation, etc; see Table 35–1).

Within the large regions of active chromatin there exist shorter stretches of 100–300 nucleotides that exhibit an even greater (another 10-fold) sensitivity to DNase I. These hypersensitive sites probably result from a structural conformation that favors access of the nuclease to the DNA. These regions are often located immediately upstream from the active gene and are the location of interrupted nucleosomal structure caused by the binding of nonhistone regulatory transcription factor proteins (see Chapters 36 and 38). In many cases, it seems that if a gene is capable of being transcribed, it very often has a DNase-hypersensitive site(s) in the chromatin immediately upstream. As noted above, nonhistone regulatory proteins involved in transcription control and those involved in maintaining access to the template strand lead to the formation of hypersensitive sites. Such sites often provide the first clue about the presence and location of a transcription control element.

By contrast, transcriptionally inactive chromatin is densely packed during interphase as observed by electron microscopic studies and is referred to as heterochromatin; transcriptionally active chromatin stains less densely and is referred to as euchromatin. Generally, euchromatin is replicated earlier than heterochromatin in the mammalian cell cycle (see below). The chromatin in these regions of inactivity is often high in meC content, and histones therein contain relatively lower levels of covalent modifications.

There are two types of heterochromatin: constitutive and facultative. Constitutive heterochromatin is always condensed and thus essentially inactive. It is found in the regions near the chromosomal centromere and at chromosomal ends (telomeres). Facultative heterochromatin is at times condensed, but at other times it is actively transcribed and, thus, uncondensed and appears as euchromatin. Of the two members of the X chromosome pair in mammalian females, one X chromosome is almost completely inactive transcriptionally and is heterochromatic. However, the heterochromatic X chromosome decondenses during gametogenesis and becomes transcriptionally active during early embryogenesis—thus, it is facultative heterochromatin.

Certain cells of insects, for example, Chironomus and Drosophila, contain giant chromosomes that have been replicated for multiple cycles without separation of daughter chromatids. These copies of DNA line up side by side in precise register and produce a banded chromosome containing regions of condensed chromatin and lighter bands of more extended chromatin. Transcriptionally active regions of these polytene chromosomes are especially decondensed into “puffs” that can be shown to contain the enzymes responsible for transcription and to be the sites of RNA synthesis (Figure 35–4). Using highly sensitive fluorescently labeled hybridization probes, specific gene sequences can be mapped, or “painted,” within the nuclei of human cells, even without polytene chromosome formation, using FISH (fluorescence in situ hybridization; Chapter 39) techniques.

FIGURE 35–4 Illustration of the tight correlation between the presence of RNA polymerase II (Table 36–2) and messenger RNA synthesis. A number of genes, labeled A, B (top), and 5C, but not genes at locus (band) BR3 (5C, BR3, bottom) are activated when Chironomus tentans larvae are subjected to heat shock (39°C for 30 min). (A) Distribution of RNA polymerase II in isolated chromosome IV from the salivary gland (at arrows). The enzyme was detected by immunofluorescence using an antibody directed against the polymerase. The 5C and BR3 are specific bands of chromosome IV, and the arrows indicate puffs. (B) Autoradiogram of a chromosome IV that was incubated in ³H-uridine to label the RNA. Note the correspondence of the immunofluorescence and presence of the radioactive RNA (black dots). (Reproduced, with permission, from Sass H: RNA polymerase B in polytene chromosomes. Cell 1982;28:274. Copyright © 1982. Reprinted with permission from Elsevier.)

DNA IS ORGANIZED INTO CHROMOSOMES

At metaphase, mammalian chromosomes possess a twofold symmetry, with the identical duplicated sister chromatids connected at a centromere, the relative position of which is characteristic for a given chromosome (Figure 35–5). The centromere is an adenine-thymine (A–T)-rich region containing repeated DNA sequences that range in size from 10² (brewers’ yeast) to 10⁶ (mammals) base pairs (bp). Metazoan centromeres are bound by nucleosomes containing the histone H3 variant protein CENP-A and other specific centromere-binding proteins. This complex, called the kinetochore, provides the anchor for the mitotic spindle. It thus is an essential structure for chromosomal segregation during mitosis.

FIGURE 35–5 The two sister chromatids of mitotic human chromosome 12. The location of the A+T-rich centromeric region connecting sister chromatids is indicated, as are two of the four telomeres residing at the very ends of the chromatids that are attached one to the other at the centromere. (Courtesy of Biophoto Associates/Photo Researchers, Inc.)

The ends of each chromosome contain structures called telomeres. Telomeres consist of short TG-rich repeats. Human telomeres have a variable number of repeats of the sequence 5′-TTAGGG-3′, which can extend for several kilobases. Telomerase, a multisubunit RNA template-containing complex related to viral RNA-dependent DNA polymerases (reverse transcriptases), is the enzyme responsible for telomere synthesis and thus for maintaining the length of the telomere. Since telomere shortening has been associated with both malignant transformation and aging, this enzyme has become an attractive target for cancer chemotherapy and drug development. Each sister chromatid contains one dsDNA molecule. During interphase, the packing of the DNA molecule is less dense than it is in the condensed chromosome during metaphase. Metaphase chromosomes are nearly completely transcriptionally inactive.
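As an illustration of the telomeric repeat described above, the following Python sketch counts uninterrupted copies of TTAGGG at the 3′ end of a sequence and derives the complementary-strand repeat. The function names are hypothetical, and the example treats a telomere as a perfect tandem array, which is a simplifying assumption.

```python
import re

TELOMERE_UNIT = "TTAGGG"  # human telomeric repeat, written 5'->3' as in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_repeat_count(seq: str, unit: str = TELOMERE_UNIT) -> int:
    """Number of uninterrupted copies of `unit` at the 3' end of `seq`."""
    match = re.search(rf"(?:{re.escape(unit)})+$", seq.upper())
    return len(match.group(0)) // len(unit) if match else 0


if __name__ == "__main__":
    chromosome_end = "ACGT" + TELOMERE_UNIT * 4    # toy sequence, not real data
    print(terminal_repeat_count(chromosome_end))   # copies of TTAGGG at the end
    print(reverse_complement(TELOMERE_UNIT))       # repeat unit on the other strand
```

The complementary strand carries the C-rich repeat CCCTAA, which is why the TTAGGG strand is described as TG-rich.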

The human haploid genome consists of about 3 × 10⁹ bp and about 1.7 × 10⁷ nucleosomes. Thus, each of the 23 chromatids in the human haploid genome would contain on the average 1.3 × 10⁸ nucleotides in one dsDNA molecule. Therefore, the length of each DNA molecule must be compressed about 8000-fold to generate the structure of a condensed metaphase chromosome. In metaphase chromosomes, the 30-nm chromatin fibers are also folded into a series of looped domains, the proximal portions of which are anchored to a nonhistone proteinaceous nuclear matrix scaffolding within the nucleus (Figure 35–3). The packing ratios of each of the orders of DNA structure are summarized in Table 35–2. The packaging of nucleoproteins within chromatids is not random, as evidenced by the characteristic patterns observed when chromosomes are stained with specific dyes such as quinacrine or Giemsa stain (Figure 35–6).
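The scale of this compression can be sketched numerically. The genome size, chromosome number, and 8000-fold factor below come from the text; the axial rise of 0.34 nm per base pair is an assumption (the standard value for B-form DNA) that is not stated in this section, and the function names are hypothetical.

```python
GENOME_BP = 3_000_000_000   # haploid genome, from the text
CHROMOSOMES = 23            # haploid chromosome number, from the text
COMPACTION = 8000           # fold compression quoted in the text
RISE_NM_PER_BP = 0.34       # assumption: axial rise per base pair of B-form DNA


def mean_bp_per_chromosome() -> float:
    """Average DNA content of one chromatid, in base pairs."""
    return GENOME_BP / CHROMOSOMES


def contour_length_m(bp: float) -> float:
    """End-to-end length in meters of fully extended duplex DNA of `bp` base pairs."""
    return bp * RISE_NM_PER_BP * 1e-9


if __name__ == "__main__":
    bp = mean_bp_per_chromosome()
    extended = contour_length_m(bp)
    print(f"{bp:.2e} bp per average chromosome")
    print(f"{extended * 100:.1f} cm when fully extended")
    print(f"{extended / COMPACTION * 1e6:.1f} micrometers after 8000-fold compaction")
    print(f"{contour_length_m(GENOME_BP):.2f} m for the whole haploid genome")
```

Under these assumptions an average chromosome's DNA is a few centimeters long when extended and a few micrometers long at metaphase, which is consistent with the statement that genomic DNA measured in meters fits inside a nucleus measured in micrometers.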

FIGURE 35–6 A human karyotype (of a man with a normal 46,XY constitution), in which the metaphase chromosomes have been stained by the Giemsa method and aligned according to the Paris Convention. (Courtesy of H Lawce and F Conte.)

From individual to individual within a single species, the pattern of staining (banding) of the entire chromosome complement is highly reproducible; nonetheless, it differs significantly between species, even those closely related. Thus, the packaging of the nucleoproteins in chromosomes of higher eukaryotes must in some way be dependent upon species-specific characteristics of the DNA molecules.

A combination of specialized staining techniques and high-resolution microscopy has allowed cytogeneticists to quite precisely map many genes to specific regions of mouse and human chromosomes. With the recent elucidation of the human and mouse genome sequences (among others), it has become clear that many of these visual mapping methods were remarkably accurate.

Coding Regions Are Often Interrupted by Intervening Sequences

The protein coding regions of DNA, the transcripts of which ultimately appear in the cytoplasm as single mRNA molecules, are usually interrupted in the eukaryotic genome by large intervening sequences of nonprotein-coding DNA. Accordingly, the primary transcripts of DNA, the mRNA precursors (originally termed hnRNA because this species of RNA was quite heterogeneous in size [length] and mostly restricted to the nucleus), contain noncoding intervening sequences of RNA that must be removed in a process that also joins together the appropriate coding segments to form the mature mRNA. Most coding sequences for a single mRNA are interrupted in the genome (and thus in the primary transcript) by at least one, and in some cases as many as 50, noncoding intervening sequences (introns). In most cases, the introns are much longer than the coding regions (exons). The processing of the primary transcript, which involves precise removal of introns and splicing of adjacent exons, is described in Chapter 36.
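The bookkeeping of exons and introns can be made concrete with a small Python sketch. It models only the outcome of processing (exons joined in order, introns discarded), not the splicing mechanism itself. The coordinate convention, function names, and toy sequence are all assumptions made for illustration.

```python
from typing import List, Tuple

# Half-open [start, end) coordinates on the primary transcript (an assumed convention).
Exon = Tuple[int, int]


def introns(exons: List[Exon]) -> List[Exon]:
    """Intervals lying between consecutive exons; n exons give n - 1 introns."""
    ordered = sorted(exons)
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])]


def splice(primary_transcript: str, exons: List[Exon]) -> str:
    """Join the exons in order, dropping everything between them."""
    ordered = sorted(exons)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError("exons overlap")
    return "".join(primary_transcript[start:end] for start, end in ordered)


if __name__ == "__main__":
    # Toy transcript: exons in upper case, one intron in lower case.
    pre_mrna = "AUGGCC" + "guaaguuuuag" + "GAAUAA"
    exon_coords = [(0, 6), (17, 23)]
    print(introns(exon_coords))           # coordinates of the single intron
    print(splice(pre_mrna, exon_coords))  # mature message with the intron removed
```

The same relationship holds at any scale: a gene with nine exons, for example, has eight introns to be removed.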

The function of the intervening sequences, or introns, is not totally clear. Introns may serve to separate functional domains (exons) of coding information in a form that permits genetic rearrangement by recombination to occur more rapidly than if all coding regions for a given genetic function were contiguous. Such an enhanced rate of genetic rearrangement of functional domains might allow more rapid evolution of biologic function. In some instances other protein-coding or noncoding RNAs are encoded within the intronic DNA of certain genes (Chapter 34). The relationships among chromosomal DNA, gene clusters on the chromosome, the exon–intron structure of genes, and the final mRNA product are illustrated in Figure 35–7.

FIGURE 35–7 The relationship between chromosomal DNA and mRNA. The human haploid DNA complement of 3 × 10⁹ bp is distributed between 23 chromosomes. Genes are often clustered on these chromosomes. An average gene is Image bp in length, including the regulatory region (red-hatched area), which is usually located at the 5′ end of the gene. The regulatory region is shown here as being adjacent to the transcription initiation site (arrow). Most eukaryotic genes have alternating exons and introns. In this example, there are nine exons (blue colored areas) and eight introns (green colored areas). The introns are removed from the primary transcript by the processing reactions, and the exons are ligated together in sequence to form the mature mRNA. (nt, nucleotides.)

MUCH OF THE MAMMALIAN GENOME APPEARS REDUNDANT & MUCH IS NOT HIGHLY TRANSCRIBED

The haploid genome of each human cell consists of 3 × 10⁹ bp of DNA subdivided into 23 chromosomes. The entire haploid genome contains sufficient DNA to code for nearly 1.5 million average-sized genes. However, studies of mutation rates and of the complexities of the genomes of higher organisms strongly suggest that humans have significantly fewer than 100,000 proteins encoded by the ~1% of the human genome that is composed of exonic DNA. Indeed, current estimates suggest there are 25,000 or fewer protein-coding genes in humans. This implies that most of the DNA is nonprotein-coding; that is, its information is never translated into an amino acid sequence of a protein molecule. Certainly, some of the excess DNA sequences serve to regulate the expression of genes during development, differentiation, and adaptation to the environment, either by serving as binding sites for regulatory proteins or by encoding regulatory RNAs (eg, miRNAs and other ncRNAs). Some excess clearly makes up the intervening sequences or introns (24% of the total human genome) that split the coding regions of genes, and another portion of the excess appears to be composed of many families of repeated sequences for which clear functions have not yet been defined, though some small RNAs transcribed from these repeats can modulate transcription, either directly by interacting with the transcription machinery or indirectly by affecting the activity of the chromatin template. A summary of the salient features of the human genome is presented in Chapter 39. Interestingly, the ENCODE Project Consortium (Chapter 39) has shown that, for the 1% of the genome studied, most of the genomic sequence was indeed transcribed at a low rate. Further research will elucidate the role(s) played by such transcripts.
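The proportions quoted in this paragraph can be turned into absolute numbers with a short Python sketch. All inputs are the figures given in the text; the derived exonic length per gene is only an implication of those round numbers, not a measured value, and the function name is hypothetical.

```python
GENOME_BP = 3_000_000_000       # haploid genome size quoted in the text
EXON_PERCENT = 1                # "~1%" of the genome is exonic DNA
INTRON_PERCENT = 24             # introns: 24% of the total human genome
PROTEIN_CODING_GENES = 25_000   # upper estimate quoted in the text


def genome_budget() -> dict:
    """Split the haploid genome into exonic, intronic, and remaining DNA (in bp)."""
    exonic = GENOME_BP * EXON_PERCENT // 100
    intronic = GENOME_BP * INTRON_PERCENT // 100
    return {
        "exonic_bp": exonic,
        "intronic_bp": intronic,
        "other_bp": GENOME_BP - exonic - intronic,
        # Implied average exonic DNA per gene if all 25,000 genes share it equally.
        "exonic_bp_per_gene": exonic // PROTEIN_CODING_GENES,
    }


if __name__ == "__main__":
    for name, value in genome_budget().items():
        print(f"{name:>20}: {value:,}")
```

On these figures, three quarters of the genome is neither exon nor intron, which is the quantitative basis for the statement that most of the DNA is nonprotein-coding.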

The DNA in a eukaryotic genome can be divided into different “sequence classes.” These are unique-sequence (or nonrepetitive) DNA and repetitive-sequence DNA. In the haploid genome, unique-sequence DNA generally includes the single copy genes that code for proteins. The repetitive DNA in the haploid genome includes sequences that vary in copy number from 2 to as many as 10⁷ copies per cell.

More Than Half the DNA in Eukaryotic Organisms Is in Unique or Nonrepetitive Sequences

This estimation (and the distribution of repetitive-sequence DNA) is based on a variety of DNA–RNA hybridization techniques and, more recently, on direct DNA sequencing. Similar techniques are used to estimate the number of active genes in a population of unique-sequence DNA. In brewers’ yeast (Saccharomyces cerevisiae, a lower eukaryote), about two-thirds of its 6200 genes are expressed, but only ~1/5 are required for viability under laboratory growth conditions. In typical tissues of a higher eukaryote (eg, mammalian liver and kidney), between 10,000 and 15,000 genes are actively expressed. Different combinations of genes are expressed in each tissue, of course, and how this is accomplished is one of the major unanswered questions in biology.

In Human DNA, at Least 30% of the Genome Consists of Repetitive Sequences

Repetitive-sequence DNA can be broadly classified as moderately repetitive or as highly repetitive. The highly repetitive sequences consist of 5–500 base pair lengths repeated many times in tandem. These sequences are often clustered in centromeres and telomeres of the chromosome and some are present in about 1–10 million copies per haploid genome. The majority of these sequences are transcriptionally inactive and some of these sequences play a structural role in the chromosome (Figure 35–5; see Chapter 39).

The moderately repetitive sequences, which are defined as being present in numbers of less than 10⁶ copies per haploid genome, are not clustered but are interspersed with unique sequences. In many cases, these long interspersed repeats are transcribed by RNA polymerase II and contain caps indistinguishable from those on mRNA.

Depending on their length, moderately repetitive sequences are classified as long interspersed repeat sequences (LINEs) or short interspersed repeat sequences (SINEs). Both types appear to be retroposons; that is, they arose from movement from one location to another (transposition) through an RNA intermediate by the action of reverse transcriptase, which transcribes an RNA template into DNA. Mammalian genomes contain 20,000–50,000 copies of the 6–7 kbp LINEs. These represent species-specific families of repeat elements. SINEs are shorter (70–300 bp), and there may be more than 100,000 copies per genome. Of the SINEs in the human genome, one family, the Alu family, is present in about 500,000 copies per haploid genome and accounts for ~10% of the human genome. Members of the human Alu family and their closely related analogs in other animals are transcribed as integral components of mRNA precursors or as discrete RNA molecules, including the well-studied 4.5S RNA and 7S RNA. These particular family members are highly conserved within a species as well as between mammalian species. Components of the short interspersed repeats, including the members of the Alu family, may be mobile elements, capable of jumping into and out of various sites within the genome (see below). These transposition events can have disastrous results, as exemplified by the insertion of Alu sequences into a gene, which, when so mutated, causes neurofibromatosis. Additionally, Alu, B1, and B2 SINE RNAs have been shown to regulate mRNA production at the levels of transcription and mRNA splicing.

Microsatellite Repeat Sequences

One category of repeat sequences exists as both dispersed and grouped tandem arrays. The sequences consist of 2–6 bp repeated up to 50 times. These microsatellite sequences most commonly are found as dinucleotide repeats of AC on one strand and TG on the opposite strand, but several other forms occur, including CG, AT, and CA. The AC repeat sequences occur at 50,000–100,000 locations in the genome. At any locus, the number of these repeats may differ between the two chromosomes, so that an individual can be heterozygous for the number of copies of a particular microsatellite. This is a heritable trait, and because of their number and the ease of detecting them using the polymerase chain reaction (PCR) (Chapter 39), such repeats are useful in constructing genetic linkage maps. Most genes are associated with one or more microsatellite markers, so the relative position of genes on chromosomes can be assessed, as can the association of a gene with a disease. Using PCR, a large number of family members can be rapidly screened for a certain microsatellite polymorphism. The association of a specific polymorphism with a gene in affected family members, and the lack of this association in unaffected members, may be the first clue about the genetic basis of a disease.
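The definition above (a 2–6 bp unit repeated in tandem) lends itself to a simple pattern search. The Python sketch below is an illustration only: the function name, the `min_copies` threshold of 5, and the exclusion of single-base runs are assumptions, and real genotyping relies on PCR fragment sizing or dedicated software rather than a regular expression.

```python
import re
from typing import Iterator, Tuple


def find_microsatellites(seq: str, min_copies: int = 5) -> Iterator[Tuple[int, str, int]]:
    """Yield (start, unit, copies) for tandem repeats of a 2-6 bp unit.

    `min_copies` should be at least 2. The shortest repeating unit is preferred,
    so ACACACAC is reported as AC x 4 rather than ACAC x 2.
    """
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_copies - 1))
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # skip runs of a single base such as AAAAAA
        yield match.start(), unit, len(match.group(0)) // len(unit)


if __name__ == "__main__":
    allele_1 = "GG" + "AC" * 8 + "TT"    # toy sequences, not real loci
    allele_2 = "GG" + "AC" * 11 + "TT"
    hits_1 = list(find_microsatellites(allele_1))
    hits_2 = list(find_microsatellites(allele_2))
    print(hits_1, hits_2)
    # Different copy numbers on the two chromosomes: heterozygous at this locus.
    print(hits_1[0][2] != hits_2[0][2])
```

Comparing the copy number found on each of the two chromosomes, as in the last line, is the computational analog of scoring heterozygosity at a microsatellite locus.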

Trinucleotide sequences that increase in number (microsatellite instability) can cause disease. The unstable p(CGG)n repeat sequence is associated with the fragile X syndrome. Other trinucleotide repeats that undergo dynamic mutation (usually an increase) are associated with Huntington’s chorea (CAG), myotonic dystrophy (CTG), and spinobulbar muscular atrophy, also known as Kennedy disease (CAG).
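When the repeat unit is already known, as with the CGG, CAG, and CTG repeats named above, the quantity of interest is the length of the longest uninterrupted run. The following Python sketch measures it. The function name and the toy alleles are hypothetical, and no disease thresholds are implied because the text does not give any.

```python
import re


def longest_repeat_run(seq: str, unit: str) -> int:
    """Largest number of back-to-back copies of `unit` found anywhere in `seq`."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    unit = unit.upper()
    # A lookahead examines every start position, so runs in any frame are seen.
    pattern = re.compile(rf"(?=((?:{re.escape(unit)})+))")
    runs = (m.group(1) for m in pattern.finditer(seq.upper()))
    return max((len(run) // len(unit) for run in runs), default=0)


if __name__ == "__main__":
    parent = "GC" + "CGG" * 30 + "TT"   # toy alleles, not patient data
    child = "GC" + "CGG" * 45 + "TT"
    print(longest_repeat_run(parent, "CGG"), longest_repeat_run(child, "CGG"))
    # A positive difference corresponds to an expansion between generations.
    print(longest_repeat_run(child, "CGG") - longest_repeat_run(parent, "CGG"))
```

An increase in the run length from one generation to the next is what the text refers to as dynamic mutation.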

ONE PERCENT OF CELLULAR DNA IS IN MITOCHONDRIA

The majority of the polypeptides in mitochondria (about 54 out of 67) are coded by nuclear genes, while the rest are coded by genes found in mitochondrial (mt) DNA. Human mitochondria contain 2–10 copies of a small circular dsDNA molecule that makes up approximately 1% of total cellular DNA. This mtDNA codes for mt-specific ribosomal and transfer RNAs and for 13 proteins that play key roles in the respiratory chain (Chapter 13). The linearized structural map of the human mitochondrial genes is shown in Figure 35–8. Some of the features of mtDNA are shown in Table 35–3.

FIGURE 35–8 Maps of human mitochondrial genes. The maps represent the so-called heavy (upper map) and light (lower map) strands of linearized mitochondrial (mt) DNA, showing the genes for the subunits of NADH-coenzyme Q oxidoreductase (ND1 through ND6), cytochrome c oxidase (CO1 through CO3), cytochrome b (CYT B), and ATP synthase (ATPase 8 and 6) and for the 12S and 16S mt ribosomal RNAs (rRNAs). The transfer RNAs are denoted by small blue boxes. The origin of heavy-strand (OH) and light-strand (OL) replication and the promoters for the initiation of heavy-strand (PH1 and PH2) and light-strand (PL) transcription are indicated by arrows. (Reproduced, with permission, from Moraes CT et al: Mitochondrial DNA deletions in progressive external ophthalmoplegia and Kearns-Sayre syndrome. N Engl J Med 1989;320:1293. Copyright ©1989. Massachusetts Medical Society. All rights reserved.)

TABLE 35–3 Major Features of Human Mitochondrial DNA

An important feature of human mtDNA is that, because all mitochondria are contributed by the ovum during zygote formation, it is transmitted by maternal nonmendelian inheritance. Thus, in diseases resulting from mutations of mtDNA, an affected mother would in theory pass the disease to all of her children but only her daughters would transmit the trait. However, in some cases, deletions in mtDNA occur during oogenesis and thus are not inherited from the mother. A number of diseases have now been shown to be due to mutations of mtDNA. These include a variety of myopathies, neurologic disorders, and some cases of diabetes mellitus.

GENETIC MATERIAL CAN BE ALTERED & REARRANGED

An alteration in the sequence of purine and pyrimidine bases in a gene due to a change—a removal or an insertion—of one or more bases may result in an altered gene product. Such alteration in the genetic material results in a mutation whose consequences are discussed in detail in Chapter 37.

Chromosomal Recombination Is One Way of Rearranging Genetic Material

Genetic information can be exchanged between similar or homologous chromosomes. The exchange, or recombination event, occurs primarily during meiosis in mammalian cells and requires alignment of homologous metaphase chromosomes, an alignment that almost always occurs with great exactness. A process of crossing over occurs as shown in Figure 35–9. This usually results in an equal and reciprocal exchange of genetic information between homologous chromosomes. If the homologous chromosomes possess different alleles of the same genes, the crossover may produce noticeable and heritable genetic linkage differences. In the rare case where the alignment of homologous chromosomes is not exact, the crossing over or recombination event may result in an unequal exchange of information. One chromosome may receive less genetic material and thus a deletion, while the other partner of the chromosome pair receives more genetic material and thus an insertion or duplication (Figure 35–9). Unequal crossing over does occur in humans, as evidenced by the existence of hemoglobins designated Lepore and anti-Lepore (Figure 35–10). The farther apart two sequences are on an individual chromosome, the greater the likelihood of a crossover recombination event. This is the basis for genetic mapping methods. Unequal crossover affects tandem arrays of repeated DNAs whether they are related globin genes, as in Figure 35–10, or more abundant repetitive DNA. Unequal crossover through slippage in the pairing can result in expansion or contraction in the copy number of the repeat family and may contribute to the expansion and fixation of variant members throughout the repeat array.
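
The difference between equal and unequal crossing over can be sketched in a few lines of Python. This is only an illustrative model, not a description of the molecular mechanism: each chromosome is represented as a list of gene labels, and the function name crossover, the gene labels, and the breakpoint indices are assumptions introduced here.

```python
def crossover(chrom_a, chrom_b, break_a, break_b):
    """Reciprocal exchange of the segments lying beyond each breakpoint."""
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2

# Hypothetical homologs carrying different alleles (lower vs upper case).
maternal = ["g1", "g2", "g3", "g4"]
paternal = ["G1", "G2", "G3", "G4"]

# Exact alignment: both breakpoints fall at the same position.
equal_1, equal_2 = crossover(maternal, paternal, 2, 2)

# Misalignment by one gene: one product loses gene 3, the other carries it twice.
unequal_1, unequal_2 = crossover(maternal, paternal, 2, 3)
```

With exact alignment each recombinant keeps four genes. With the misaligned breakpoints the products are g1-g2-G4 (a deletion) and G1-G2-G3-g3-g4 (a duplication), while the total amount of genetic material across the pair is unchanged.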

FIGURE 35–9 The process of crossing over between homologous metaphase chromosomes to generate recombinant chromosomes. See also Figure 35–12.

FIGURE 35–10 The process of unequal crossover in the region of the mammalian genome that harbors the structural genes encoding hemoglobins and the generation of the unequal recombinant products hemoglobin delta-beta Lepore and beta-delta anti-Lepore. The examples given show the locations of the crossover regions within amino acid coding regions of the indicated genes (ie, β and δ globin genes). (Redrawn and reproduced, with permission, from Clegg JB, Weatherall DJ: β0 Thalassemia: time for a reappraisal? Lancet 1974;2:133. Copyright © 1974. Reprinted with permission from Elsevier.)

Chromosomal Integration Occurs with Some Viruses

Some bacterial viruses (bacteriophages) are capable of recombining with the DNA of a bacterial host in such a way that the genetic information of the bacteriophage is incorporated in a linear fashion into the genetic information of the host. This integration, which is a form of recombination, occurs by the mechanism illustrated in Figure 35–11. The backbone of the circular bacteriophage genome is broken, as is that of the DNA molecule of the host; the appropriate ends are resealed with the proper polarity. The bacteriophage DNA is figuratively straightened out (“linearized”) as it is integrated into the bacterial DNA molecule—frequently a closed circle as well. The site at which the bacteriophage genome integrates or recombines with the bacterial genome is chosen by one of two mechanisms. If the bacteriophage contains a DNA sequence homologous to a sequence in the host DNA molecule, then a recombination event analogous to that occurring between homologous chromosomes can occur. However, some bacteriophages synthesize proteins that bind specific sites on bacterial chromosomes to a nonhomologous site characteristic of the bacteriophage DNA molecule. Integration occurs at the site and is said to be “site specific.”

FIGURE 35–11 The integration of a circular genome from a virus (with genes A, B, and C) into the DNA molecule of a host (with genes 1 and 2) and the consequent ordering of the genes.

Many animal viruses, particularly the oncogenic viruses, can be integrated into chromosomes of the mammalian cell, either directly or, in the case of RNA viruses such as HIV (the cause of AIDS), as DNA transcripts generated by the action of the viral RNA-dependent DNA polymerase, or reverse transcriptase. The integration of the animal virus DNA into the animal genome generally is not “site specific” but does display site preferences.

Transposition Can Produce Processed Genes

In eukaryotic cells, small DNA elements that clearly are not viruses are capable of transposing themselves in and out of the host genome in ways that affect the function of neighboring DNA sequences. These mobile elements, sometimes called “jumping DNA,” or jumping genes, can carry flanking regions of DNA and, therefore, profoundly affect evolution. As mentioned above, the Alu family of moderately repeated DNA sequences has structural characteristics similar to the termini of retroviruses, which would account for the ability of the latter to move into and out of the mammalian genome.

Direct evidence for the transposition of other small DNA elements into the human genome has been provided by the discovery of “processed genes” for immunoglobulin molecules, α-globin molecules, and several others. These processed genes consist of DNA sequences identical or nearly identical to those of the messenger RNA for the appropriate gene product. That is, the 5′-nontranslated region, the coding region without intron representation, and the 3′ poly(A) tail are all present contiguously. This particular DNA sequence arrangement must have resulted from the reverse transcription of an appropriately processed messenger RNA molecule from which the intron regions had been removed and the poly(A) tail added. The only recognized mechanism this reverse transcript could have used to integrate into the genome would have been a transposition event. In fact, these “processed genes” have short terminal repeats at each end, as do known transposed sequences in lower organisms. In the absence of their transcription and thus genetic selection for function, many of the processed genes have been randomly altered through evolution so that they now contain nonsense codons that preclude their ability to encode a functional, intact protein (see Chapter 37). Thus, they are referred to as “pseudogenes.”
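
The structural signature of a processed gene (exons joined contiguously, no introns, and a poly(A) tract at the 3′ end) can be made concrete with a small sketch. The sequence, the exon coordinates, and the function name processed_gene are hypothetical and chosen only for illustration; the short terminal repeats that flank real processed genes are not modeled.

```python
def processed_gene(genomic, exons, poly_a_len=20):
    """Join exon segments (introns dropped) and append a poly(A) tract."""
    spliced = "".join(genomic[start:end] for start, end in exons)
    return spliced + "A" * poly_a_len

# Hypothetical gene: exon 1 (6 bp), one intron (12 bp), exon 2 (6 bp).
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
exons = [(0, 6), (18, 24)]  # half-open coordinates of the two exons

retrocopy = processed_gene(genomic, exons, poly_a_len=5)
```

The result, ATGGCCGGATAA followed by five A residues, is shorter than the parent gene because the intron is absent, which is the feature that distinguishes such sequences from their genomic source genes.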

Gene Conversion Produces Rearrangements

Besides unequal crossover and transposition, a third mechanism can effect rapid changes in the genetic material. Similar sequences on homologous or nonhomologous chromosomes may occasionally pair up and eliminate any mismatched sequences between them. This may lead to the accidental fixation of one variant or another throughout a family of repeated sequences and thereby homogenize the sequences of the members of repetitive DNA families. This latter process is referred to as gene conversion.

Sister Chromatids Exchange

In diploid eukaryotic organisms such as humans, after cells progress through the S phase they contain a tetraploid content of DNA. This is in the form of sister chromatids of chromosome pairs (Figure 35–6). Each of these sister chromatids contains identical genetic information since each is a product of the semiconservative replication of the original parent DNA molecule of that chromosome. Crossing over can occur between these genetically identical sister chromatids. Of course, these sister chromatid exchanges (Figure 35–12) have no genetic consequence as long as the exchange is the result of an equal crossover.

FIGURE 35–12 Sister chromatid exchanges between human chromosomes. The exchanges are detectable by Giemsa staining of the chromosomes of cells replicated for two cycles in the presence of bromodeoxyuridine. The arrows indicate some regions of exchange. (Courtesy of S Wolff and J Bodycote.)

Immunoglobulin Genes Rearrange

In mammalian cells, some interesting gene rearrangements occur normally during development and differentiation. For example, in mice the VL and CL genes for a single immunoglobulin molecule (see Chapter 38) are widely separated in the germ line DNA. In the DNA of a differentiated immunoglobulin-producing (plasma) cell, the same VL and CL genes have been moved physically closer together in the genome and into the same transcription unit. However, even then, this rearrangement of DNA during differentiation does not bring the VL and CL genes into contiguity in the DNA. Instead, the DNA contains an interspersed or interruption sequence of about 1200 bp at or near the junction of the V and C regions. The interspersed sequence is transcribed into RNA along with the VL and CL genes, and the interspersed information is removed from the RNA during its nuclear processing (Chapters 36 and 38).

DNA SYNTHESIS & REPLICATION ARE RIGIDLY CONTROLLED

The primary function of DNA replication is understood to be the provision of progeny with the genetic information possessed by the parent. Thus, the replication of DNA must be complete and carried out in such a way as to maintain genetic stability within the organism and the species. The process of DNA replication is complex and involves many cellular functions and several verification procedures to ensure fidelity in replication. About 30 proteins are involved in the replication of the Escherichia coli chromosome, and this process is more complex in eukaryotic organisms. The first enzymologic observations on DNA replication were made by Arthur Kornberg, who described in E coli the existence of an enzyme now called DNA polymerase I. This enzyme has multiple catalytic activities, a complex structure, and a requirement for the triphosphates of the four deoxyribonucleosides of adenine, guanine, cytosine, and thymine. The polymerization reaction catalyzed by DNA polymerase I of E coli has served as a prototype for all DNA polymerases of both prokaryotes and eukaryotes, even though it is now recognized that the major role of this polymerase is proofreading and repair.

In all cells, replication can occur only from a single-stranded DNA (ssDNA) template. Therefore, mechanisms must exist to target the site of initiation of replication and to unwind the dsDNA in that region. The replication complex must then form. After replication is complete in an area, the parent and daughter strands must re-form dsDNA. In eukaryotic cells, an additional step must occur. The dsDNA must re-form the chromatin structure, including nucleosomes, that existed prior to the onset of replication. Although this entire process is not completely understood in eukaryotic cells, replication has been quite precisely described in prokaryotic cells, and the general principles are the same in both. The major steps are listed in Table 35–4, illustrated in Figure 35–13, and discussed, in sequence, below. A number of proteins, most with specific enzymatic action, are involved in this process (Table 35–5).

TABLE 35–4 Steps Involved in DNA Replication in Eukaryotes

FIGURE 35–13 Steps involved in DNA replication. This figure describes DNA replication in an E coli cell, but the general steps are similar in eukaryotes. A specific interaction of a protein (the dnaA protein) to the origin of replication (oriC) results in local unwinding of DNA at an adjacent A+T-rich region. The DNA in this area is maintained in the single-strand conformation (ssDNA) by single-strand-binding proteins (SSBs). This allows a variety of proteins, including helicase, primase, and DNA polymerase, to bind and to initiate DNA synthesis. The replication fork proceeds as DNA synthesis occurs continuously (long red arrow) on the leading strand and discontinuously (short black arrows) on the lagging strand. The nascent DNA is always synthesized in the 5′ to 3′ direction, as DNA polymerases can add a nucleotide only to the 3′ end of a DNA strand.

TABLE 35–5 Classes of Proteins Involved in Replication

The Origin of Replication

At the origin of replication (ori), there is an association of sequence-specific dsDNA-binding proteins with a series of direct repeat DNA sequences. In bacteriophage λ, the oriλ is bound by the λ-encoded O protein to four adjacent sites. In E coli, the oriC is bound by the protein dnaA. In both cases, a complex is formed consisting of 150–250 bp of DNA and multimers of the DNA-binding protein. This leads to the local denaturation and unwinding of an adjacent A+T-rich region of DNA. Functionally similar autonomously replicating sequences (ARS) or replicators have been identified in yeast cells. The ARS contains a somewhat degenerate 11-bp sequence called the origin replication element (ORE). The ORE binds a set of proteins analogous to the dnaA protein of E coli; this group of proteins is collectively called the origin recognition complex (ORC). ORC homologs have been found in all eukaryotes examined. The ORE is located adjacent to an approximately 80-bp A+T-rich sequence that is easy to unwind. This is called the DNA unwinding element (DUE). The DUE is the origin of replication in yeast and is bound by the MCM protein complex.

Consensus sequences similar to ori or ARS in structure have not been precisely defined in mammalian cells, though several of the proteins that participate in ori recognition and function have been identified and appear quite similar to their yeast counterparts in both amino acid sequence and function.

Unwinding of DNA

The interaction of proteins with ori defines the start site of replication and provides a short region of ssDNA essential for initiation of synthesis of the nascent DNA strand. This process requires the formation of a number of protein–protein and protein–DNA interactions. A critical step is provided by a DNA helicase that allows for processive unwinding of DNA. In uninfected E coli, this function is provided by a complex of dnaB helicase and the dnaC protein. Single-stranded DNA-binding proteins (SSBs) stabilize this complex. In λ phage-infected E coli, the phage protein P binds to dnaB and the P/dnaB complex binds to oriλ by interacting with the O protein. dnaB is not an active helicase when in the P/dnaB/O complex. Three E coli heat shock proteins (dnaK, dnaJ, and GrpE) cooperate to remove the P protein and activate the dnaB helicase. In cooperation with SSB, this leads to DNA unwinding and active replication. In this way, the replication of the λ phage is accomplished at the expense of replication of the host E coli cell.

Formation of the Replication Fork

A replication fork consists of four components that form in the following sequence: (1) the DNA helicase unwinds a short segment of the parental duplex DNA; (2) a primase initiates synthesis of an RNA molecule that is essential for priming DNA synthesis; (3) the DNA polymerase initiates nascent, daughter-strand synthesis; and (4) SSBs bind to ssDNA and prevent premature reannealing of ssDNA to dsDNA. These reactions are illustrated in Figure 35–13.

The DNA polymerase III enzyme (the dnaE gene product in E coli) binds to template DNA as part of a multiprotein complex that consists of several polymerase accessory factors (β, γ, δ, δ′, and τ). DNA polymerases only synthesize DNA in the 5′–3′ direction, and only one of the several different types of polymerases is involved at the replication fork. Because the DNA strands are antiparallel (Chapter 34), the polymerase functions asymmetrically. On the leading (forward) strand, the DNA is synthesized continuously. On the lagging (retrograde) strand, the DNA is synthesized in short (1–5 kb; see Figure 35–16) fragments, the so-called Okazaki fragments. Several Okazaki fragments (up to a thousand) must be sequentially synthesized for each replication fork. To ensure that this happens, the helicase acts on the lagging strand to unwind dsDNA in a 5′–3′ direction. The helicase associates with the primase to afford the latter proper access to the template. This allows the RNA primer to be made and, in turn, the polymerase to begin replicating the DNA. This is an important reaction sequence since DNA polymerases cannot initiate DNA synthesis de novo. The mobile complex between helicase and primase has been called a primosome. As the synthesis of an Okazaki fragment is completed and the polymerase is released, a new primer has been synthesized. The same polymerase molecule remains associated with the replication fork and proceeds to synthesize the next Okazaki fragment.

The DNA Polymerase Complex

A number of different DNA polymerase molecules engage in DNA replication. These share three important properties: (1) chain elongation, (2) processivity, and (3) proofreading. Chain elongation accounts for the rate (in nucleotides per second; nt/s) at which polymerization occurs. Processivity is an expression of the number of nucleotides added to the nascent chain before the polymerase disengages from the template. The proofreading function identifies copying errors and corrects them. In E coli, DNA polymerase III (pol III) functions at the replication fork. Of all polymerases, it catalyzes the highest rate of chain elongation and is the most processive. It is capable of polymerizing 0.5 Mb of DNA during one cycle on the leading strand. Pol III is a large (>1 MDa), multisubunit protein complex in E coli. DNA pol III associates with the two identical β subunits of the DNA sliding “clamp”; this association dramatically increases pol III-DNA complex stability, processivity (100 to >50,000 nucleotides) and rate of chain elongation (20–50 nt/s) generating the high degree of processivity the enzyme exhibits.

Polymerase I (pol I) and II (pol II) are mostly involved in proofreading and DNA repair. Eukaryotic cells have counterparts for each of these enzymes plus a large number of additional DNA polymerases primarily involved in DNA repair. A comparison is shown in Table 35–6.

TABLE 35–6 A Comparison of Prokaryotic and Eukaryotic DNA Polymerases

In mammalian cells, the polymerase is capable of polymerizing at a rate that is somewhat slower than the rate of polymerization of deoxynucleotides by the bacterial DNA polymerase complex. This reduced rate may result from interference by nucleosomes.

Initiation & Elongation of DNA Synthesis

The initiation of DNA synthesis (Figure 35–14) requires priming by a short length of RNA, about 10–200 nucleotides long. In E coli this is catalyzed by dnaG (primase); in eukaryotes DNA pol α synthesizes these RNA primers. The priming process involves nucleophilic attack by the 3′-hydroxyl group of the RNA primer on the phosphate of the first entering deoxynucleoside triphosphate (N in Figure 35–14) with the splitting off of pyrophosphate; this transition to DNA synthesis is catalyzed by the appropriate DNA polymerases (DNA pol III in E coli; DNA pol δ and ε in eukaryotes). The 3′-hydroxyl group of the recently attached deoxyribonucleoside monophosphate is then free to carry out a nucleophilic attack on the next entering deoxyribonucleoside triphosphate (N + 1 in Figure 35–14), again at its α phosphate moiety, with the splitting off of pyrophosphate. Of course, selection of the proper deoxyribonucleotide whose terminal 3′-hydroxyl group is to be attacked is dependent upon proper base pairing with the other strand of the DNA molecule according to the rules proposed originally by Watson and Crick (Figure 35–15). When an adenine deoxyribonucleoside monophosphoryl moiety is in the template position, a thymidine triphosphate will enter and its α phosphate will be attacked by the 3′-hydroxyl group of the deoxyribonucleoside monophosphoryl most recently added to the polymer. By this stepwise process, the template dictates which deoxyribonucleoside triphosphate is complementary and by hydrogen bonding holds it in place while the 3′-hydroxyl group of the growing strand attacks and incorporates the new nucleotide into the polymer. These segments of DNA attached to an RNA initiator component are the Okazaki fragments (Figure 35–16). In mammals, after many Okazaki fragments are generated, the replication complex begins to remove the RNA primers, to fill in the gaps left by their removal with the proper base-paired deoxynucleotide, and then to seal the fragments of newly synthesized DNA by enzymes referred to as DNA ligases.

FIGURE 35–14 The initiation of DNA synthesis upon a primer of RNA and the subsequent attachment of the second deoxyribonucleoside triphosphate.

FIGURE 35–15 The RNA-primed synthesis of DNA demonstrating the template function of the complementary strand of parental DNA.

FIGURE 35–16 The discontinuous polymerization of deoxyribonucleotides on the lagging strand; formation of Okazaki fragments during lagging strand DNA synthesis is illustrated. Okazaki fragments are 100–250 nucleotides long in eukaryotes, 1000–2000 nucleotides in prokaryotes.

Replication Exhibits Polarity

As has already been noted, DNA molecules are double stranded and the two strands are antiparallel. The replication of DNA in prokaryotes and eukaryotes occurs on both strands simultaneously. However, an enzyme capable of polymerizing DNA in the 3′ to 5′ direction does not exist in any organism, so that both of the newly replicated DNA strands cannot grow in the same direction simultaneously. Nevertheless, the same enzyme does replicate both strands at the same time. The single enzyme replicates one strand (“leading strand”) in a continuous manner in the 5′ to 3′ direction, with the same overall forward direction. It replicates the other strand (“lagging strand”) discontinuously while polymerizing the nucleotides in short spurts of 150–250 nucleotides, again in the 5′ to 3′ direction, but at the same time it faces toward the back end of the preceding RNA primer rather than toward the unreplicated portion. This process of semidiscontinuous DNA synthesis is shown diagrammatically in Figures 35–13 and 35–16.
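
Semidiscontinuous synthesis can be pictured with a short sketch in which every new strand is written 5′ to 3′. It assumes a fork moving from left to right along a parental top strand and a fixed fragment length; the function names and the eight-base example are hypothetical, and real Okazaki fragments are far longer and carry RNA primers that are omitted here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complementary strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def replicate_fork(top_strand, fragment_len):
    """Return (leading strand, Okazaki fragments in order of synthesis)."""
    bottom_strand = reverse_complement(top_strand)
    # Leading strand: one continuous copy of the bottom-strand template.
    leading = reverse_complement(bottom_strand)
    # Lagging strand: each newly exposed stretch of the top-strand template is
    # copied as a separate piece that grows back toward the origin.
    okazaki = [reverse_complement(top_strand[i:i + fragment_len])
               for i in range(0, len(top_strand), fragment_len)]
    return leading, okazaki

top = "ATGCCGTA"
leading, okazaki = replicate_fork(top, fragment_len=3)
# Ligation joins the fragments; the most recently made piece lies at the 5' end.
lagging = "".join(reversed(okazaki))
```

After joining, the lagging strand equals the reverse complement of the top strand, so both daughter duplexes are complete even though no strand was ever extended in the 3′ to 5′ direction.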

Formation of Replication Bubbles

Replication proceeds from a single ori in the circular bacterial chromosome, composed of roughly 5 × 10⁶ bp of DNA. This process is completed in about 30 min, a replication rate of 3 × 10⁵ bp/min. The entire mammalian genome replicates in approximately 9 h, the average period required for formation of a tetraploid genome from a diploid genome in a replicating cell. If a mammalian genome (3 × 10⁹ bp) replicated at the same rate as bacteria (ie, 3 × 10⁵ bp/min) from but a single ori, replication would take over 150 h! Metazoan organisms get around this problem using two strategies. First, replication is bidirectional. Second, replication proceeds from multiple origins in each chromosome (a total of as many as 100 in humans). Thus, replication occurs in both directions along all of the chromosomes, and both strands are replicated simultaneously. This replication process generates “replication bubbles” (Figure 35–17).
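
The arithmetic behind the 150-h estimate can be checked directly. The sketch below uses only the figures quoted in this paragraph (a genome of 3 × 10^9 bp, a bacterial rate of 3 × 10^5 bp/min, and an S phase of about 9 h). The assumption that every origin contributes the full bacterial rate is a simplification made here for illustration, so the origin count it yields is a lower bound for this toy model and not a measured value.

```python
import math

BACTERIAL_RATE_BP_PER_MIN = 3e5  # rate quoted in the text
MAMMALIAN_GENOME_BP = 3e9        # genome size quoted in the text
S_PHASE_HOURS = 9                # approximate replication time quoted in the text

def replication_hours(genome_bp, rate_bp_per_min, n_origins=1):
    """Hours needed if n_origins work in parallel, each at rate_bp_per_min (assumed)."""
    return genome_bp / (rate_bp_per_min * n_origins) / 60

single_ori_hours = replication_hours(MAMMALIAN_GENOME_BP, BACTERIAL_RATE_BP_PER_MIN)
# Smallest number of simultaneously active origins that fits within S phase.
min_origins = math.ceil(single_ori_hours / S_PHASE_HOURS)
```

A single origin would need roughly 167 h, consistent with "over 150 h" in the text, and about 19 origins firing together at the bacterial rate would be required to finish within 9 h. Because mammalian forks move more slowly than bacterial ones, the number needed in a real cell is higher.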

FIGURE 35–17 The generation of “replication bubbles” during the process of DNA synthesis. The bidirectional replication and the proposed positions of unwinding proteins at the replication forks are depicted.

The multiple ori sites that serve as origins for DNA replication in eukaryotes are poorly defined except in a few animal viruses and in yeast. However, it is clear that initiation is regulated both spatially and temporally, since clusters of adjacent sites initiate replication synchronously. Replication firing, or DNA replication initiation at a replicator/ori, is influenced by a number of distinct properties of chromatin structure that are just beginning to be understood. It is clear, however, that there are more replicators and excess ORC than needed to replicate the mammalian genome within the time of a typical S-phase. Therefore, mechanisms for controlling the excess ORC-bound replicators must exist. Understanding the control of the formation and firing of replication complexes is one of the major challenges in this field.

During the replication of DNA, there must be a separation of the two strands to allow each to serve as a template by hydrogen bonding its nucleotide bases to the incoming deoxynucleoside triphosphate. The separation of the DNA double helix is promoted by SSBs in E coli and by a protein termed replication protein A (RPA) in eukaryotes. These molecules stabilize the single-stranded structure as the replication fork progresses. The stabilizing proteins bind cooperatively and stoichiometrically to the single strands without interfering with the abilities of the nucleotides to serve as templates (Figure 35–13). In addition to separating the two strands of the double helix, there must be an unwinding of the molecule (once every 10 nucleotide pairs) to allow strand separation. The hexameric dnaB protein complex unwinds DNA in E coli, whereas the hexameric MCM complex unwinds eukaryotic DNA. This unwinding happens in segments adjacent to the replication bubble. To counteract this unwinding, there are multiple “swivels” interspersed in the DNA molecules of all organisms. The swivel function is provided by specific enzymes that introduce “nicks” in one strand of the unwinding double helix, thereby allowing the unwinding process to proceed. The nicks are quickly resealed without requiring energy input, because of the formation of a high-energy covalent bond between the nicked phosphodiester backbone and the nicking-sealing enzyme. The nicking-resealing enzymes are called DNA topoisomerases. This process is depicted diagrammatically in Figure 35–18 and there compared with the ATP-dependent resealing carried out by the DNA ligases. Topoisomerases are also capable of unwinding supercoiled DNA. Supercoiled DNA is a higher-ordered structure occurring in circular DNA molecules wrapped around a core, as depicted in Figures 35–2 and 35–19.

FIGURE 35–18 Comparison of two types of nick-sealing reactions on DNA. The series of reactions at left is catalyzed by DNA topoisomerase I, that at right by DNA ligase; P, phosphate; R, ribose; A, adenine. (Slightly modified and reproduced, with permission, from Lehninger AL: Biochemistry, 2nd ed. Worth, 1975. Copyright © 1975 by Worth Publishers. Used, with permission, from W. H. Freeman and Company.)

FIGURE 35–19 Supercoiling of DNA. A left-handed toroidal (solenoidal) supercoil, at left, will convert to a right-handed interwound supercoil, at right, when the cylindric core is removed. Such a transition is analogous to that which occurs when nucleosomes are disrupted by the high salt extraction of histones from chromatin.

There exists in one group of animal viruses (retroviruses) a class of enzymes capable of synthesizing first a single-stranded and then a double-stranded DNA molecule from a single-stranded RNA template. This polymerase, RNA-dependent DNA polymerase, or “reverse transcriptase,” first synthesizes a DNA–RNA hybrid molecule utilizing the RNA genome as a template. A specific virus-encoded nuclease, RNase H, degrades the hybridized template RNA strand, and the remaining DNA strand in turn serves as a template to form a dsDNA molecule containing the information originally present in the RNA genome of the animal virus.
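
The flow of information in reverse transcription (RNA genome, then DNA–RNA hybrid, then dsDNA) can be followed in a brief sketch. It tracks sequence only; the function name reverse_transcribe and the six-base example genome are hypothetical, and priming, strand transfers, and the enzymology of RNase H are left out.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_transcribe(rna_genome):
    """Return (first DNA strand, second DNA strand), each written 5'->3'."""
    # Step 1: the RNA serves as template, giving the DNA strand of the hybrid.
    first_strand = "".join(RNA_TO_DNA[base] for base in reversed(rna_genome))
    # Step 2: after the RNA is degraded, the first strand templates the second.
    second_strand = "".join(DNA_TO_DNA[base] for base in reversed(first_strand))
    return first_strand, second_strand

rna = "AUGGCU"
first, second = reverse_transcribe(rna)
```

The second strand reproduces the sequence of the RNA genome with T in place of U, which is why the dsDNA product carries the information originally present in the viral RNA.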

Reconstitution of Chromatin Structure

There is evidence that nuclear organization and chromatin structure are involved in determining the regulation and initiation of DNA synthesis. As noted above, the rate of polymerization in eukaryotic cells, which have chromatin and nucleosomes, is slower than that in prokaryotic cells, which lack canonical nucleosomes. It is also clear that chromatin structure must be re-formed after replication. Newly replicated DNA is rapidly assembled into nucleosomes, and the preexisting and newly assembled histone octamers are randomly distributed to each arm of the replication fork. These reactions are facilitated through the actions of histone chaperone proteins working in concert with chromatin remodeling complexes.

DNA Synthesis Occurs During the S Phase of the Cell Cycle

In animal cells, including human cells, the replication of the DNA genome occurs only at a specified time during the life span of the cell. This period is referred to as the synthetic or S phase. This is usually temporally separated from the mitotic, or M phase, by nonsynthetic periods referred to as gap 1 (G1) and gap 2 (G2) phases, occurring before and after the S phase, respectively (Figure 35–20). Among other things, the cell prepares for DNA synthesis in G1 and for mitosis in G2. The cell regulates the DNA synthesis process by allowing it to occur only once per cell cycle and only at specific times in cells preparing to divide by a mitotic process.

FIGURE 35–20 Progress through the mammalian cell cycle is continuously monitored via multiple cell-cycle checkpoints. DNA, chromosome, and chromosome segregation integrity is continuously monitored throughout the cell cycle. If DNA damage is detected in either the G1 or the G2 phase of the cell cycle, if the genome is incompletely replicated, or if normal chromosome segregation machinery is incomplete (ie, a defective spindle), cells will not progress through the phase of the cycle in which defects are detected. In some cases, if the damage cannot be repaired, such cells undergo programmed cell death (apoptosis).

All eukaryotic cells have gene products that govern the transition from one phase of the cell cycle to another. The cyclins are a family of proteins whose concentrations rise and fall (“cycle”) at specific times during the cell cycle, hence their name. The cyclins turn on, at the appropriate time, different cyclin-dependent protein kinases (CDKs) that phosphorylate substrates essential for progression through the cell cycle (Figure 35–21). For example, cyclin D levels rise in late G1 phase and allow progression beyond the start (yeast) or restriction point (mammals), the point beyond which cells irrevocably proceed into the S or DNA synthesis phase.

FIGURE 35–21 Schematic illustration of the points during the mammalian cell cycle during which the indicated cyclins and cyclin-dependent kinases are activated. The thickness of the various colored lines is indicative of the extent of activity.

The D cyclins activate CDK4 and CDK6. These two kinases are also synthesized during G1 in cells undergoing active division. The D cyclins and CDK4 and CDK6 are nuclear proteins that assemble as a complex in late G1 phase. The complex is an active serine-threonine protein kinase. One substrate for this kinase is the retinoblastoma (Rb) protein. Rb is a cell-cycle regulator because it binds to and inactivates a transcription factor (E2F) necessary for the transcription of certain genes (histone genes, DNA replication proteins, etc) needed for progression from G1 to S phase. Phosphorylation of Rb by CDK4 or CDK6 releases E2F from Rb-mediated transcription repression; gene activation then ensues and cell-cycle progression takes place.
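The chain of events in this paragraph reduces to a simple piece of Boolean logic, sketched here in Python. The function name and its parameters are hypothetical, and the all-or-none treatment of kinase activity and Rb phosphorylation is a deliberate simplification.

```python
def e2f_target_genes_on(cyclin_d_present: bool, cdk4_or_cdk6_present: bool) -> bool:
    """Return True if E2F is free to activate the genes needed for the G1-to-S transition."""
    # The active kinase is a complex of a D cyclin with CDK4 or CDK6.
    kinase_active = cyclin_d_present and cdk4_or_cdk6_present
    # The active kinase phosphorylates Rb.
    rb_phosphorylated = kinase_active
    # Unphosphorylated Rb binds and inactivates E2F; phosphorylation releases E2F.
    e2f_free = rb_phosphorylated
    return e2f_free
```

Only when both the cyclin and its kinase partner are present does the function return `True`, mirroring the requirement for complex assembly in late G1.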

Other cyclins and CDKs are involved in different aspects of cell-cycle progression (Table 35–7). Cyclin E and CDK2 form a complex in late G1. Cyclin E is rapidly degraded, and the released CDK2 then forms a complex with cyclin A. This sequence is necessary for the initiation of DNA synthesis in S phase. A complex between cyclin B and CDK1 is rate-limiting for the G2/M transition in eukaryotic cells.
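The cyclin-CDK pairings named in the text can be collected into a small lookup structure. The Python sketch below lists only the pairings stated in the preceding paragraphs; it is not a reconstruction of Table 35–7, and the names `CYCLIN_CDK_COMPLEXES` and `cyclin_partners` are hypothetical.

```python
# Pairings and roles as stated in the text; nothing here is taken from Table 35-7.
CYCLIN_CDK_COMPLEXES = [
    {"cyclin": "D", "cdks": ("CDK4", "CDK6"),
     "role": "progression past the restriction point in late G1"},
    {"cyclin": "E", "cdks": ("CDK2",),
     "role": "complex forms in late G1, before cyclin E is degraded"},
    {"cyclin": "A", "cdks": ("CDK2",),
     "role": "initiation of DNA synthesis in S phase"},
    {"cyclin": "B", "cdks": ("CDK1",),
     "role": "rate-limiting for the G2/M transition"},
]


def cyclin_partners(cdk: str) -> list:
    """Return the cyclins that the text pairs with the given CDK, in listed order."""
    return [entry["cyclin"] for entry in CYCLIN_CDK_COMPLEXES if cdk in entry["cdks"]]
```

Calling `cyclin_partners("CDK2")` returns both cyclin E and cyclin A, which reflects the handoff of CDK2 from one cyclin to the other described above.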

TABLE 35–7 Cyclins and Cyclin-Dependent Kinases Involved in Cell-Cycle Progression

Image

Many of the cancer-causing viruses (oncoviruses) and cancer-inducing genes (oncogenes) are capable of alleviating or disrupting the apparent restriction that normally controls the entry of mammalian cells from G1 into the S phase. From the foregoing, one might surmise that excessive production of a cyclin, loss of a specific CDK inhibitor, or production or activation of a cyclin/CDK at an inappropriate time could result in abnormal or unrestrained cell division. In this context, it is noteworthy that the bcl-1 oncogene associated with B-cell lymphoma appears to be the cyclin D1 gene. Similarly, the oncoproteins (or transforming proteins) produced by several DNA viruses target the Rb transcription repressor for inactivation, inducing cell division inappropriately. Inactivation of the Rb gene, itself a tumor suppressor gene, likewise leads to uncontrolled cell growth and tumor formation.

During the S phase, mammalian cells contain greater quantities of DNA polymerase than during the nonsynthetic phases of the cell cycle. Furthermore, the enzymes responsible for formation of the substrates for DNA synthesis (that is, the deoxyribonucleoside triphosphates) are also increased in activity, and their activity diminishes following the synthetic phase until the reappearance of the signal for renewed DNA synthesis. During the S phase, the nuclear DNA is completely replicated once and only once. It seems that once chromatin has been replicated, it is marked so as to prevent its further replication until it again passes through mitosis. This process is termed replication licensing. The molecular mechanisms for this phenomenon appear to involve dissociation and/or cyclin-CDK phosphorylation and subsequent degradation of several origin-binding proteins that play critical roles in replication complex formation. Consequently, origins fire only once per cell cycle.
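The once-and-only-once rule can be illustrated with a short Python sketch in which each origin carries a license that is consumed on firing and restored only on passage through mitosis. The class `Origin` and its methods are hypothetical names, and the single Boolean flag stands in for the origin-binding proteins whose dissociation and degradation are described above.

```python
class Origin:
    """A replication origin that can fire at most once per cell cycle."""

    def __init__(self, name: str):
        self.name = name
        self.licensed = True   # assumed to start the cycle competent to fire
        self.fired_count = 0

    def fire(self) -> bool:
        """Attempt to initiate replication; succeeds only if the origin is still licensed."""
        if not self.licensed:
            return False
        self.licensed = False  # replicated chromatin is marked against re-replication
        self.fired_count += 1
        return True

    def pass_through_mitosis(self) -> None:
        """Restore the license, as the text describes happening only after mitosis."""
        self.licensed = True
```

A second call to `fire()` within the same cycle returns `False`; only after `pass_through_mitosis()` can the origin fire again.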

In general, a given pair of chromosomes will replicate simultaneously and within a fixed portion of the S phase upon every replication. On a chromosome, clusters of replication units replicate coordinately. The nature of the signals that regulate DNA synthesis at these levels is unknown, but the regulation does appear to be an intrinsic property of each individual chromosome that is mediated by the several replication origins contained therein.

All Organisms Contain Elaborate Evolutionarily Conserved Mechanisms to Repair Damaged DNA

Repair of damaged DNA is critical for maintaining genomic integrity and thereby preventing the propagation of mutations, either horizontally, that is, as DNA sequence changes in somatic cells, or vertically, where nonrepaired lesions are present in sperm or oocyte DNA and hence can be transmitted to progeny. DNA is subjected to a huge array of chemical, physical, and biological assaults on a daily basis (Table 35–8); hence, recognition and repair of DNA lesions is essential. Consequently, eukaryotic cells contain five major DNA repair pathways, each of which contains multiple, sometimes shared, proteins; these DNA repair proteins typically have orthologues in prokaryotes. The pathways are Nucleotide Excision Repair (NER), Mismatch Repair (MMR), Base Excision Repair (BER), Homologous Recombination (HR), and Nonhomologous End-Joining (NHEJ) (Figure 35–22). Nature has already performed the experiment of testing the importance of many of these DNA repair proteins to human biology: mutations in a large number of these genes lead to human disease (Table 35–9). Moreover, systematic gene-directed experiments with laboratory mice have clearly ascribed critical gene integrity maintenance functions to these genes as well. In these mouse genetic studies, targeted mutations within these genes induced defects in DNA repair and often also dramatically increased susceptibility to cancer.

TABLE 35–8 Types of Damage to DNA

Image

FIGURE 35–22 Mammals use multiple DNA repair pathways of variable accuracy to repair the myriad forms of DNA damage genomic DNA is subjected to. Listed are the major types of DNA damaging agents, the DNA lesions so formed (schematized and listed), the DNA repair pathway responsible for repairing the different lesions, and the relative fidelity of these pathways. (Modified, with permission, from: “DNA-Damage Response in Tissue-Specific and Cancer Stem Cells” Cell Stem Cell 8:16–29 (2011) copyright © 2011 Elsevier Inc.)

TABLE 35–9 Human Diseases of DNA Damage Repair

Image

Among the most intensively studied mechanisms of DNA repair are those used to repair DNA double-strand breaks, or DSBs; these will be discussed in some detail here. Eukaryotic cells utilize two pathways, HR and NHEJ, to repair DSBs. The choice between the two depends upon the phase of the cell cycle (Figures 35–20 and 35–21) and the exact type of DSB to be repaired (Table 35–8). During the G0/G1 phases of the cell cycle, DSBs are corrected by the NHEJ pathway, whereas during the S and G2/M phases, HR is utilized. All steps of DNA damage repair are catalyzed by evolutionarily conserved molecules, which include DNA damage Sensors, Transducers, and damage repair Mediators. Collectively, these cascades of proteins participate in the cellular response to DNA damage. Importantly, the ultimate cellular outcomes of DNA damage and cellular attempts to repair DNA damage range from Cell-Cycle Delay to allow for DNA repair, to Cell-Cycle Arrest, to Apoptosis or Senescence (see Figure 35–23; and further detail below). The molecules involved in these complex and highly integrated processes range from damage-specific histone modifications (eg, histone H4 dimethylated on lysine 20; H4K20me2), histone isotype variants such as histone H2AX (cf. Table 35–1), poly ADP ribose polymerase (PARP), and the MRN protein complex (Mre11-Rad50-NBS1 subunits), to DNA damage-activated kinase recognition/signaling proteins [ATM (Ataxia Telangiectasia, Mutated) and the ATM-related kinase, ATR; the multisubunit DNA-dependent protein kinase (DNA-PK and Ku70/80); and Checkpoint kinases 1 and 2 (CHK1, CHK2)].
These multiple kinases phosphorylate, and consequently modulate the activities of, dozens of proteins, including numerous DNA repair, checkpoint control, and cell-cycle control proteins such as CDC25A, B, C, Wee1, p21, p16, and p19 [all Cyclin-CDK regulators (see Figure 9–8; and below)]; various exo- and endonucleases; DNA single-strand-specific DNA-binding proteins (RPA); PCNA; and specific DNA polymerases (DNA pol delta, δ; and eta, η). Several of these types of proteins/enzymes have been discussed above in the context of DNA replication. DNA repair and its relationship to cell-cycle control are very active areas of research given their central roles in cell biology and their potential for generating and preventing cancer.
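The dependence of DSB repair pathway choice on cell-cycle phase can be written as a simple lookup, sketched here in Python. The function name `dsb_repair_pathway` is hypothetical, and the sketch considers only the cell-cycle phase; it ignores the second factor named in the text, the exact type of DSB.

```python
def dsb_repair_pathway(phase: str) -> str:
    """Return the DSB repair pathway the text associates with a cell-cycle phase."""
    phase = phase.upper()
    if phase in {"G0", "G1"}:
        return "NHEJ"  # nonhomologous end-joining
    if phase in {"S", "G2", "M"}:
        return "HR"    # homologous recombination
    raise ValueError(f"unknown cell-cycle phase: {phase!r}")
```

For instance, `dsb_repair_pathway("G1")` returns `"NHEJ"`, whereas `dsb_repair_pathway("S")` returns `"HR"`.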

FIGURE 35–23 The multistep mechanism of DNA double-strand break repair. Shown top to bottom are the proteins (protein complexes) that identify DSBs in genomic DNA (Sensors) and that transduce and amplify the recognized DNA damage (Transducers and Mediators), as well as the molecules that dictate the ultimate outcomes of the DNA damage response (Effectors). Damaged DNA can be (a) repaired directly (DNA repair); or, via p53-mediated pathways and depending upon the severity of the DNA damage and the p53-activated genes induced, (b) cells can be arrested in the cell cycle by p21/WAF1, the potent CDK–cyclin complex inhibitor, to allow time for extensively damaged DNA to be repaired; or, if the extent of DNA damage is too great to repair, cells can either (c) undergo apoptosis or (d) senesce. Both of these processes prevent the cell containing such damaged DNA from ever dividing and hence inducing cancer or other deleterious biological outcomes. (Based on: “DNA-Damage Response in Tissue-Specific and Cancer Stem Cells” Cell Stem Cell 8:16–29 (2011) copyright © 2011 Elsevier Inc.)

DNA & Chromosome Integrity Is Monitored Throughout the Cell Cycle

Given the importance of normal DNA and chromosome function to survival, it is not surprising that eukaryotic cells have developed elaborate mechanisms to monitor the integrity of the genetic material. As detailed above, a number of complex multisubunit enzyme systems have evolved to repair damaged DNA at the nucleotide sequence level. Similarly, DNA mishaps at the chromosome level are also monitored and repaired. As shown in Figure 35–20, both DNA and chromosomal integrity are continuously monitored throughout the cell cycle. The four specific steps at which this monitoring occurs have been termed checkpoint controls. If problems are detected at any of these checkpoints, progression through the cycle is interrupted and transit through the cell cycle is halted until the damage is repaired. The molecular mechanisms underlying detection of DNA damage during the G1 and G2 phases of the cycle are understood better than those operative during S and M phases.

The tumor suppressor p53, a protein of apparent MW 53 kDa on SDS-PAGE, plays a key role in both G1 and G2 checkpoint control. Normally a very unstable protein, p53 is a DNA-binding transcription factor, one of a family of related proteins (ie, p53, p63, and p73), that is somehow stabilized in response to DNA damage, perhaps by direct p53-DNA interactions. Like the histones discussed above, p53 is subject to a panoply of regulatory PTMs, all of which likely modify its multiple biological activities. Increased levels of p53 activate transcription of an ensemble of genes that collectively serve to delay transit through the cycle. One of these induced proteins, p21CIP, is a potent CDK–cyclin inhibitor (CKI) that is capable of efficiently inhibiting the action of all CDKs. Clearly, inhibition of CDKs will halt progression through the cell cycle (see Figures 35–20 and 35–21). If DNA damage is too extensive to repair, the affected cells undergo apoptosis (programmed cell death) in a p53-dependent fashion. In this case, p53 induces the activation of a collection of genes that induce apoptosis. Cells lacking functional p53 fail to undergo apoptosis in response to high levels of radiation or DNA-active chemotherapeutic agents. It may come as no surprise, then, that p53 is one of the most frequently mutated genes in human cancers. Indeed, recent genomic sequencing studies of multiple tumor DNA samples suggest that over 80% of human cancers carry p53 loss-of-function mutations. Additional research into the mechanisms of checkpoint control will prove invaluable for the development of effective anticancer therapeutic options.
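The p53-dependent outcomes described in this paragraph can be summarized as a decision function, sketched here in Python. The function name, the three damage levels, and the outcome labels are all hypothetical. The sketch also assumes that cells lacking functional p53 mount neither the p21-mediated arrest nor apoptosis, whereas the text states explicitly only that they fail to undergo apoptosis.

```python
def dna_damage_response(damage: str, p53_functional: bool = True) -> str:
    """Return a coarse outcome label for a given damage level and p53 status."""
    if damage not in ("none", "repairable", "extensive"):
        raise ValueError(f"unknown damage level: {damage!r}")
    if damage == "none":
        return "continue"  # nothing to stabilize p53; the cycle proceeds
    if not p53_functional:
        # Assumption of this sketch: no p53 means no p21 induction and no apoptosis.
        return "no p53-dependent response"
    if damage == "repairable":
        return "arrest"    # p53 induces p21CIP, which inhibits the CDKs
    return "apoptosis"     # damage too extensive to repair
```

Comparing `dna_damage_response("extensive", p53_functional=True)` with the same call using `p53_functional=False` shows why loss of p53 allows heavily damaged cells to survive.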

SUMMARY

DNA in eukaryotic cells is associated with a variety of proteins, resulting in a structure called chromatin.

Much of the DNA is associated with histone proteins to form a structure called the nucleosome. Nucleosomes are composed of an octamer of histones around which about 150 bp of DNA is wrapped.

Histones are subject to an extensive array of dynamic covalent modifications that have important regulatory consequences.

Nucleosomes and higher-order structures formed from them serve to compact the DNA.

DNA in transcriptionally active regions is relatively more sensitive to nuclease attack in vitro; some regions, so-called hypersensitive sites, are exceptionally sensitive and are often found to contain transcription control sites.

Highly transcriptionally active DNA (genes) is often clustered in regions of each chromosome. Within these regions, genes may be separated by inactive DNA in nucleosomal structures. In eukaryotes, the transcription unit (that portion of a gene that is copied by RNA polymerase) often consists of coding regions of DNA (exons) interrupted by intervening sequences of noncoding DNA (introns).

After transcription, during RNA processing, introns are removed and the exons are ligated together to form the mature mRNA that appears in the cytoplasm; this process is termed RNA splicing.

DNA in each chromosome is exactly replicated according to the rules of base pairing during the S phase of the cell cycle.

Both strands of the double helix are replicated simultaneously but by somewhat different mechanisms. A complex of proteins, including DNA polymerase, replicates the leading strand continuously in the 5′ to 3′ direction. The lagging strand is replicated discontinuously, in short pieces of 150–250 nucleotides; each piece is synthesized 5′ to 3′, so that the strand as a whole is extended in the overall 3′ to 5′ direction.

DNA replication is initiated at special sites termed origins, or oris, and generates replication bubbles. Each chromosome contains multiple origins. The entire process takes about 9 h in a typical human cell and occurs only during the S phase of the cell cycle.

A variety of mechanisms that employ different enzyme systems repair damaged cellular DNA after exposure of cells to chemical and physical mutagens.

REFERENCES

Blanpain C, Mohrin M, Sotiropoulou PA, et al: DNA-damage response in tissue-specific and cancer stem cells. Cell Stem Cell 2011;8:16–29.

Bohgaki T, Bohgaki M, Hakem R: DNA double-strand break signaling and human disorders. Genome Integr 2010;1:15–29.

Campos EL, Fillingham J, Li G, et al: The program for processing newly synthesized histones H3.1 and H4. Nat Struct Mol Biol 2010;17:1343–1351.

Campos EL, Reinberg D: Histones: annotating chromatin. Annu Rev Genet 2009;43:559–599.

Dalal Y, Furuyama T, Vermaak D, et al: Structure, dynamics, and evolution of centromeric nucleosomes. Proc Natl Acad Sci USA 2007;104:41.

Deng W, Blobel GA: Do chromatin loops provide epigenetic gene expression states? Curr Opin Genet Develop 2010;20:548–554.

Encode Consortium: Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 2007;447:799.

Gilbert DM: In search of the holy replicator. Nat Rev Mol Cell Biol 2004;5:848.

Hakem R: DNA-damage repair; the good, the bad, and the ugly. EMBO J 2008;27:589–605.

Johnson A, O’Donnell M: Cellular DNA replicases: components and dynamics at the replication fork. Annu Rev Biochem 2005;74:283.

Krishnan KJ, Reeve AK, Samuels DC, et al: What causes mitochondrial DNA deletions in human cells? Nat Genet 2008;40:275.

Lander ES, Linton LM, Birren B, et al: Initial sequencing and analysis of the human genome. Nature 2001;409:860.

Luger K, Mäder AW, Richmond RK, et al: Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 1997;389:251.

Margueron R, Reinberg D: Chromatin structure and the inheritance of epigenetic information. Nat Rev Genet 2010;11:285–296.

Misteli T: Beyond the sequence: cellular organization of genome function. Cell 2007;128:787.

Misteli T, Soutoglou E: The emerging role of nuclear architecture in DNA repair and genome maintenance. Nat Rev Mol Cell Biol 2009;10:243–254.

Orr HT, Zoghbi HY: Trinucleotide repeat disorders. Annu Rev Neurosci 2007;30:375.

Ponicsan SL, Kugel JF, Goodrich JA: Genomic gems: SINE RNAs regulate mRNA production. Curr Opin Genet Develop 2010;20:149–155.

Sullivan BA, Blower MD, Karpen GH: Determining centromere identity: cyclical stories and forking paths. Nat Rev Genet 2001;2:584.

Takizawa T, Meaburn KJ, Misteli T: The meaning of gene positioning. Cell 2008;135:9–13.

Talbert PB, Henikoff S: Histone variants–ancient wrap artists of the epigenome. Nat Rev Mol Cell Biol 2010;11:264–275.

Venter JC, Adams MD, Myers EW, et al: The sequence of the human genome. Science 2001;291:1304.

Zaidi SK, Young DW, Montecino MA, et al: Mitotic bookmarking of genes: a novel dimension to epigenetic control. Nat Rev Genet 2010;11:583–589.

Zilberman D, Henikoff S: Genome wide analysis of DNA methylation patterns. Development 2007;134:3959.

* So far as is possible, the discussion in this chapter and in Chapters 36, 37, and 38 will pertain to mammalian organisms, which are, of course, among the higher eukaryotes. At times it will be necessary to refer to observations in prokaryotic organisms such as bacteria and viruses, or lower eukaryotic model systems such as Drosophila, C. elegans or yeast. However, in such cases the information will be of a kind that can be extrapolated to mammalian organisms.