The discovery that DNA is the prime genetic molecule, carrying all
the hereditary information within chromosomes, immediately
focused attention on its structure. It was hoped that knowledge
of the structure would reveal how DNA carries the genetic messages that
are replicated when chromosomes divide to produce two identical
copies of themselves. During the late 1940s and early 1950s, several
research groups in the United States and in Europe engaged in serious
efforts—both cooperative and rival—to understand how the atoms
of DNA are linked together by covalent bonds and how the resulting
molecules are arranged in three-dimensional space. Not surprisingly,
there initially were fears that DNA might have very complicated and
perhaps bizarre structures that differed radically from one gene to
another. Great relief, if not general elation, was thus expressed when the
fundamental DNA structure was found to be the double helix. It told us
that all genes have roughly the same three-dimensional form and that
the differences between two genes reside in the order and number of
their four nucleotide building blocks along the complementary strands.
Now, some 50 years after the discovery of the double helix, this simple
description of the genetic material remains true and has not had to be appreciably
altered to accommodate new findings. Nevertheless, we have
come to realize that the structure of DNA is not quite as uniform as was
first thought. For example, the chromosomes of some small viruses have
single-stranded, not double-stranded, molecules. Moreover, the precise
orientation of the base pairs varies slightly from base pair to base pair in a
manner that is influenced by the local DNA sequence. Some DNA sequences
even permit the double helix to twist in the left-handed sense, as
opposed to the right-handed sense originally formulated for DNA’s general
structure. And while some DNA molecules are linear, others are circular.
Still additional complexity comes from the supercoiling (further twisting)
of the double helix, often around cores of DNA-binding proteins.
Likewise, we now realize that RNA, which at first glance appears
to be very similar to DNA, has its own distinctive structural features.
It is principally found as a single-stranded molecule. Yet by means
of intra-strand base pairing, RNA exhibits extensive double-helical
character and is capable of folding into a wealth of diverse tertiary
structures. These structures are full of surprises, such as non-classical
base pairs, base-backbone interactions, and knot-like configurations.
Most remarkable of all, and of profound evolutionary significance,
some RNA molecules are enzymes that carry out reactions that are at
the core of information transfer from nucleic acid to protein.
Clearly, the structures of DNA and RNA are richer and more intricate
than was at first appreciated. Indeed, there is no one generic structure
for DNA and RNA. As we shall see in this chapter, there are in fact variations
on common themes of structure that arise from the unique physical,
chemical, and topological properties of the polynucleotide chain.
The most important feature of DNA is that it is usually composed of
two polynucleotide chains twisted around each other in the form of a
double helix (Figure 6-1). The upper part of the figure (a) presents the
structure of the double helix shown in a schematic form. Note that if
inverted 180° (for example, by turning this book upside-down), the
double helix looks superficially the same, due to the complementary
nature of the two DNA strands. The space-filling model of the double
helix, in the lower part of the figure (b), shows the components of the
DNA molecule and their relative positions in the helical structure.
The backbone of each strand of the helix is composed of alternating
sugar and phosphate residues; the bases project inward but are accessible
through the major and minor grooves.
Let us begin by considering the nature of the nucleotide, the fundamental
building block of DNA. The nucleotide consists of a phosphate
joined to a sugar, known as 2′-deoxyribose, to which a base is attached.
The phosphate and the sugar have the structures shown in Figure 6-2.
The sugar is called 2′-deoxyribose because there is no hydroxyl at
position 2′ (just two hydrogens). Note that the positions on the ribose
are designated with primes to distinguish them from positions on the
bases (see the discussion below).
We can think of how the base is joined to 2′-deoxyribose by imagining
the removal of a molecule of water between the hydroxyl on the
1′ carbon of the sugar and the base to form a glycosidic bond (Figure
6-2). The sugar and base alone are called a nucleoside. Likewise, we
can imagine linking the phosphate to 2′-deoxyribose by removing a
water molecule from between the phosphate and the hydroxyl on the
5′ carbon to make a 5′ phosphomonoester. Adding a phosphate (or
more than one phosphate) to a nucleoside creates a nucleotide. Thus,
by making a glycosidic bond between the base and the sugar, and by
making a phosphoester bond between the sugar and the phosphoric
acid, we have created a nucleotide (Table 6-1).
Nucleotides are, in turn, joined to each other in polynucleotide
chains through the 3′ hydroxyl of 2′-deoxyribose of one nucleotide and
the phosphate attached to the 5′ hydroxyl of another nucleotide (Figure
6-3). This is a phosphodiester linkage in which the phosphoryl group
between the two nucleotides has one sugar esterified to it through a
3′ hydroxyl and a second sugar esterified to it through a 5′ hydroxyl.
Phosphodiester linkages create the repeating, sugar-phosphate backbone
of the polynucleotide chain, which is a regular feature of DNA. In
contrast, the order of the bases along the polynucleotide chain is irregular.
This irregularity, together with the great length of the chain, is, as we shall see, the
basis for the enormous information content of DNA.
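To make the link between an irregular base order and information content concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each position along the chain is an independent and equally likely choice among the four bases; real sequences are not random, so the bit count is only an upper bound. The function names are hypothetical.

```python
import math

BASES = "ACGT"  # the four bases found in DNA


def sequence_count(length: int) -> int:
    """Number of distinct base sequences of the given length (4 ** length)."""
    return len(BASES) ** length


def information_bits(length: int) -> float:
    """Upper bound on information content in bits.

    Assumption: every position is an independent, equally likely choice
    among the four bases, so each position contributes log2(4) = 2 bits.
    """
    return length * math.log2(len(BASES))


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, sequence_count(n), information_bits(n))
```

Even a chain of only 10 nucleotides can be written in more than a million different ways, which is why a long chain with a free choice of base at each position can carry so much information.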
The phosphodiester linkages impart an inherent polarity to the DNA
chain. This polarity is defined by the asymmetry of the nucleotides
and the way they are joined. DNA chains have a free 5′ phosphate
or 5′ hydroxyl at one end and a free 3′ phosphate or 3′ hydroxyl at
the other end. The convention is to write DNA sequences from the
5′ end (on the left) to the 3′ end, generally with a 5′ phosphate and a
3′ hydroxyl.
The bases in DNA fall into two classes, purines and pyrimidines. The
purines are adenine and guanine, and the pyrimidines are cytosine and
thymine. The purines are derived from the double-ringed structure
shown in Figure 6-4. Adenine and guanine share this essential structure
but with different groups attached. Likewise, cytosine and thymine are
variations on the single-ringed structure shown in Figure 6-4. The figure
also shows the numbering of the positions in the purine and pyrimidine
rings. The bases are attached to the deoxyribose by glycosidic linkages
at N1 of the pyrimidines or at N9 of the purines.
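The classification just described can be summarized as a small lookup. The sketch below is only an illustration of the two classes and their attachment points; the one-letter codes A, G, C, and T and the function names are conventions assumed here, not part of the text above.

```python
# One-letter codes for the four DNA bases, grouped by class.
PURINES = {"A": "adenine", "G": "guanine"}        # double-ringed
PYRIMIDINES = {"C": "cytosine", "T": "thymine"}   # single-ringed


def base_class(base: str) -> str:
    """Return 'purine' or 'pyrimidine' for a one-letter DNA base code."""
    code = base.upper()
    if code in PURINES:
        return "purine"
    if code in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"not a DNA base: {base!r}")


def glycosidic_nitrogen(base: str) -> str:
    """Ring nitrogen through which the base is attached to the deoxyribose."""
    return "N9" if base_class(base) == "purine" else "N1"


if __name__ == "__main__":
    for code in "AGCT":
        print(code, base_class(code), glycosidic_nitrogen(code))
```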
Each of the bases exists in two alternative tautomeric states, which
are in equilibrium with each other. The equilibrium lies far to the side
of the conventional structures shown in Figure 6-4, which are the predominant
states and the ones important for base pairing. The nitrogen
atoms attached to the purine and pyrimidine rings are in the amino
form in the predominant state and only rarely assume the imino
configuration. Likewise, the oxygen atoms attached to the guanine
and thymine normally have the keto form and only rarely take on the
enol configuration. As examples, Figure 6-5 shows tautomerization
of cytosine into the imino form (a) and guanine into the enol form (b).
As we shall see, the capacity to form an alternative tautomer is a frequent
source of errors during DNA synthesis.
The double helix is composed of two polynucleotide chains that are
held together by weak, non-covalent bonds between pairs of bases, as
shown in Figure 6-3. Adenine on one chain is always paired with
thymine on the other chain and, likewise, guanine is always paired
with cytosine. The two strands have the same helical geometry but
base pairing holds them together with the opposite polarity. That is,
the base at the 5′ end of one strand is paired with the base at the
3′ end of the other strand. The strands are said to have an anti-parallel
orientation. This anti-parallel orientation is a stereochemical consequence
of the way that adenine and thymine, and guanine and cytosine,
pair with each other.
The pairing between adenine and thymine and between guanine and
cytosine results in a complementary relationship between the sequence
of bases on the two intertwined chains and gives DNA its self-encoding
character. For example, if we have the sequence 5′-ATGTC-3′ on one
chain, the opposite chain must have the complementary sequence
3′-TACAG-5′.
The strictness of the rules for this “Watson-Crick” pairing derives
from the complementarity both of shape and of hydrogen bonding properties
between adenine and thymine and between guanine and cytosine
(Figure 6-6). Adenine and thymine match up so that a hydrogen bond
can form between the exocyclic amino group at C6 on adenine and the
carbonyl at C4 in thymine; and likewise, a hydrogen bond can form between
N1 of adenine and N3 of thymine. A corresponding arrangement
can be drawn between a guanine and a cytosine, so that there is both
hydrogen bonding and shape complementarity in this base pair as well.
A G:C base pair has three hydrogen bonds because the exocyclic NH2 at
C2 on guanine lies opposite to, and can hydrogen bond with, a carbonyl
at C2 on cytosine. Likewise, a hydrogen bond can form between N1 of
guanine and N3 of cytosine and between the carbonyl at C6 of guanine
and the exocyclic NH2 at C4 of cytosine. Watson-Crick base pairing requires
that the bases are in their preferred tautomeric states.
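The pairing rules lend themselves to a short worked example. The following Python sketch derives the partner strand for a given sequence and counts the Watson-Crick hydrogen bonds in the resulting duplex, using two bonds per A:T pair and three per G:C pair as described above. The function names are hypothetical, and the sketch assumes the input contains only the letters A, T, G, and C and is written 5′ to 3′.

```python
# Watson-Crick partners and hydrogen bonds per base pair.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # A:T has two, G:C has three


def partner_3_to_5(seq: str) -> str:
    """Partner strand read 3'->5', aligned base by base with the input (5'->3')."""
    return "".join(PARTNER[base] for base in seq.upper())


def partner_5_to_3(seq: str) -> str:
    """Partner strand written in the conventional 5'->3' direction.

    Because the strands are anti-parallel, this is the aligned partner reversed.
    """
    return partner_3_to_5(seq)[::-1]


def hydrogen_bonds(seq: str) -> int:
    """Total Watson-Crick hydrogen bonds in the duplex formed by seq and its partner."""
    return sum(H_BONDS[base] for base in seq.upper())


if __name__ == "__main__":
    top = "ATGTC"
    print("5'-" + top + "-3'")
    print("3'-" + partner_3_to_5(top) + "-5'")
    print("partner written 5'->3':", partner_5_to_3(top))
    print("hydrogen bonds:", hydrogen_bonds(top))
```

For the sequence ATGTC used in the text, the aligned partner is TACAG when read 3′ to 5′, or GACAT when written in the conventional 5′ to 3′ direction, and the five base pairs contribute 12 hydrogen bonds in total.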
An important feature of the double helix is that the two base pairs
have exactly the same geometry; having an A:T base pair or a G:C base
pair between the two sugars does not perturb the arrangement of the
sugars. Neither does T:A or C:G. In other words, there is an approximately
twofold axis of symmetry that relates the two sugars and all
four base pairs can be accommodated within the same arrangement
without any distortion of the overall structure of the DNA.