Information in DNA?
Deoxyribonucleic acid (DNA) is a gigantic polymer containing anywhere from thousands to hundreds of millions of nucleotides (roughly 250 million base pairs in human chromosome 1, or about 16,500 in our mitochondrial genome), with the vast majority of the genome serving no known function. One of the common missteps made when trying to explain DNA to others (I am also guilty of it) is to use the analogy of binary information, only expanded to quaternary. Each “bit” appears as one of the four basic nucleotides: adenine, thymine, guanine, or cytosine. Each “bit” is part of the “information used to create a protein.” As we shall see, this is far from the case.
DNA Sequences vs Information
DNA is a molecule which does not “code” for anything. This statement may come as quite a shock to many people, but allow me to explain before I receive thousands of “HOW CAN YOU SAY THAT!?!??!!” e-mails and letters. For something to be a code, a message must first exist in raw form and then be encoded. Since DNA is never encoded in the first place, and simply serves as a template for additional molecules, it is not coded. We may perceive the sequence as code, and we may also explain it as such, but the sequence is not a code, since the sequence (when transcribed into RNA) can serve as a functional unit itself. This is the case for ribozymes such as RNase P and many others. If DNA is a code, how can its sequence be the functional unit without being “decoded”?
The answer to the previous question is simple: DNA sequences are not coded. The illusion of a “code” stems from individuals trying to figure out ways to predict which proteins would result from a given sequence. We now know that not all DNA results in proteins, which gave rise to another misconception: “Junk DNA.” Researchers originally thought “this junk does not code for anything, so it is useless!” To continue to consider this the case is to fundamentally miss the point of many more recent findings, namely that DNA does not have to be converted into protein to be functionally useful. Consider the dozens of types of regulatory roles these sequences also serve. The fact that we egocentric humans use sequences to transmit messages does not mean that DNA, as a repeating sequence, is used to convey any sort of information. Our preconceptions of “codes” make this method of explanation quite simple, and as any college professor can attest, the simplest myths are the hardest to dispel.
So what does this have to do with science education, and why is it important? In case you’re unfamiliar with the various tactics of creationists, intelligent design proponents, and the general “let’s try to misrepresent as much biology as we can” crowds, let me explain. Although first I must ask a question of you: what rock have you been under the past few decades? The idea that DNA “codes” for something is simply the only way we can quickly express the idea that sequence X is used as a template for an RNA which is then, in turn, either used as a template for a protein or used as a ribozyme. It is much faster to describe specific sequences as genes, and to treat these as units leading to a protein or ribozyme product, than to spend years describing the actual mechanisms of each.
So what about “codons?”
Codons are the three-base units in which RNA is read when it is used to synthesize a protein sequence. There are 64 possible combinations (4³). Most of them match up with a specific tRNA carrying an amino acid and an “anti-codon,” which is the complementary sequence to the three-base codon; in the standard genetic code, three of them (UAA, UAG, and UGA) act as stop signals instead. These codons, however, are not a naturally occurring unit. For example, if a single nucleotide is deleted from the sequence, the result is a frameshift mutation, in which every codon downstream of the deletion differs from the parent sequence.
Parent: AUG GCG GGU CAC UGC CAG UGA CXX XXX XXX
Mutant: AUG GCG GUC ACU GCC AGU GAC XXX XXX XX
The resulting protein sequence would be very different from the parent sequence.
Parent: Met Ala Gly His Cys Gln STOP
Mutant: Met Ala Val Thr Ala Ser Asp…
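For readers who want to check the frameshift by hand, here is a minimal Python sketch of it. It assumes the standard genetic code and reads the RNA three bases at a time from the first base. The names `CODON_TABLE`, `THREE_LETTER`, and `translate` are my own for illustration, and the unspecified “X” bases from the example are left off because they cannot be translated. Treat it as a bookkeeping aid for us, not a description of what the ribosome “does.”

```python
from itertools import product

# Standard genetic code, built in the conventional U, C, A, G order.
# '*' marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter names, to match the notation used in the text.
THREE_LETTER = {
    "F": "Phe", "L": "Leu", "S": "Ser", "Y": "Tyr", "*": "STOP", "C": "Cys",
    "W": "Trp", "P": "Pro", "H": "His", "Q": "Gln", "R": "Arg", "I": "Ile",
    "M": "Met", "T": "Thr", "N": "Asn", "K": "Lys", "V": "Val", "A": "Ala",
    "D": "Asp", "E": "Glu", "G": "Gly",
}


def translate(rna):
    """Read rna in triplets from position 0; stop after the first stop codon."""
    residues = []
    for i in range(0, len(rna) - 2, 3):
        name = THREE_LETTER[CODON_TABLE[rna[i:i + 3]]]
        residues.append(name)
        if name == "STOP":
            break
    return residues


# The specified bases of the parent sequence (the trailing X's are omitted).
parent = "AUGGCGGGUCACUGCCAGUGAC"
# Delete a single G from the third codon (index 6) to shift the reading frame.
mutant = parent[:6] + parent[7:]

print(len(CODON_TABLE))             # 64
print(" ".join(translate(parent)))  # Met Ala Gly His Cys Gln STOP
print(" ".join(translate(mutant)))  # Met Ala Val Thr Ala Ser Asp
```

Notice that nothing in either string marks where a codon begins or ends. The triplets exist only because `translate` starts counting at position 0, which is exactly why removing one base changes every grouping after it.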
Other types of mutations result in other alterations to the amino acid sequence, but the fundamental point which must be emphasized is that this sequence does not carry any “information” within it. The wording is only used because it is the easiest mechanism of explanation. We, as (arguably) Homo sapiens sapiens, find it difficult to distinguish patterns we make in order to communicate from patterns found in nature as a means of heredity. This is the fundamental reason the vocabulary of linguistics is used within genetics. It does not represent reality, but it is useful as a metaphor. Those interested in these subjects must remember that scientists often must speak in metaphor, since reality is not required to fit into our language.