I am new here and so forgive me if this has been answered.
I have heard people assert that DNA is a code, and I have heard many scientists say it is not a code and that “code” is just an analogy. So, my question is this: How is DNA not like a code? What are some of the shortcomings of using “code” as an analogy for DNA? Where does the analogy of DNA as code break down?
I can try to answer your question, but it would help to know whether you have any relevant background in math/statistics or computer science. The discussion can get quite technical, but I can try to keep it simple if that helps.
I have backgrounds in data science, statistics, computer programming, sociology and anthropology. In my day job I do mostly data management, some statistical analysis and loads of computer programming.
I can likely deal with some level of being technical and if I don’t understand something I will ask.
Where I usually start with this question is to note that DNA is chemistry. It might qualify as a code because it stores information, but this code is also the material for building what is encoded; DNA is translated to proteins.
By analogy, we might write down in code the instructions for building a car and send them off to a recipient. The analogy breaks when the recipient then takes the code (or copies it) and proceeds to build a car out of the ink and paper. This is unlike any other sort of practical code or language that humans would create.
(Edit: To clarify: we can write down the instructions for folding origami on a piece of paper and give it to someone else to fold the origami from that same paper. That sort of counter-example doesn’t generalize well to practical coding of arbitrary information.)
How we define what a “code” is also matters. Some will claim that DNA is a binary code, which is technically true but not very useful. We might describe anything using a binary code, including other codes, but that doesn’t make the original object a code. There should be IMO a more useful definition before we call something a code.
Here the discussion might take a turn towards Information Theory, or not, but I’ll pause for response.
So, if I understand you correctly, human codes usually convey information, directions, ideas, etc. that are not constituted by the communication medium. Blueprints are on ink and paper and someone can use them to build a car out of other materials. DNA, however, would have to be the “code” and the building material. So, this is a point where the analogy of DNA as a code breaks down. Is that correct?
We also need a better definition of code. Correct?
Shannon’s information theory paper is on my reading list. I have not gotten to it yet though so you may have to explain that part more simply.
Yes, that is generally correct. We can propose exceptions (see my edit above) but they don’t generalize to arbitrary information.
Yes. Codes usually imply a sender and receiver. Passing information from one generation of DNA to the next could qualify, but again this isn’t the way humans would use a code.
The most important thing to know here is that Information Theory has nothing to do with the contents or meaning of the message. When it comes to DNA, the meaning is a chemical message of some sort, enabling assembly of proteins. The function of those proteins is not part of the DNA code (but regulation may be). For discussion of chemical meaning you need to chat with the biochemists.
If you know anything about Maximum Likelihood theory from statistics, then you already know a lot about Shannon Information (SI) theory, because there is a strong overlap in concepts. Statistical variance plays a role closely related to the Shannon Information of a sample; for a normal distribution, for instance, the entropy depends only on the variance. SI is about describing the bandwidth needed to communicate messages from sender A to receiver B.
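To make the bandwidth idea a bit more concrete, here is a minimal Python sketch that estimates Shannon entropy (average bits per symbol) from the symbol frequencies in a string. The function name `shannon_entropy` and the two toy sequences are made up for illustration; the point is only that the number depends on the frequencies and says nothing about what the sequence means.

```python
import math
from collections import Counter

def shannon_entropy(message: str) -> float:
    """Average bits per symbol, estimated from symbol frequencies in the message."""
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Toy sequences (illustrative only, not real genomic data).
uniform = "ACGT" * 25        # all four bases equally frequent
skewed = "A" * 97 + "CGT"    # almost entirely one base

print(shannon_entropy(uniform))  # 2.0 bits per base
print(shannon_entropy(skewed))   # far below 2 bits per base
```

A sequence of four equally likely bases needs 2 bits per base, while a heavily skewed one needs much less on average, regardless of what either sequence does chemically.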
When you get to Kolmogorov theory (Algorithmic Information), the concepts are again the same, but instead of bandwidth it measures the compressibility of messages (e.g., lossless image compression or self-extracting ZIP files).
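Kolmogorov complexity itself cannot be computed, but a general-purpose compressor gives a rough upper bound on it. The sketch below uses Python's standard `zlib` module as that stand-in; the helper name `compression_ratio` and both test strings are assumptions chosen for illustration.

```python
import random
import zlib

def compression_ratio(message: str) -> float:
    """Compressed size divided by original size (smaller means more compressible)."""
    data = message.encode("ascii")
    return len(zlib.compress(data, 9)) / len(data)

random.seed(0)  # fixed seed so the example is repeatable

# Toy sequences (illustrative only, not real genomic data).
repetitive = "ACGT" * 2500
shuffled = "".join(random.choice("ACGT") for _ in range(10000))

print(compression_ratio(repetitive))  # very small: a short description suffices
print(compression_ratio(shuffled))    # much larger: little structure to exploit
```

Both strings use the same four letters and have the same length, yet the repetitive one has a much shorter description. As with Shannon Information, nothing here refers to the meaning of the message.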
You might also want to read about Shannon-Fano Coding, which describes how to go about creating a code in practice. This again has nothing to do with the meaning of codes, but you might appreciate it as a data scientist.
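For a feel of how such a code gets built, here is a small Python sketch of the Shannon-Fano idea: sort symbols by frequency, split the list into two groups of nearly equal total count, give one group a 0 and the other a 1, and repeat within each group. This is a simplified illustration written for this thread (the name `shannon_fano` and the example string are assumptions), not a reference implementation.

```python
from collections import Counter

def shannon_fano(message: str) -> dict:
    """Assign each symbol a prefix-free bit string; frequent symbols get shorter ones."""
    # Symbols sorted from most to least frequent (ties broken alphabetically).
    ranked = sorted(Counter(message).items(), key=lambda kv: (-kv[1], kv[0]))
    if len(ranked) == 1:
        return {ranked[0][0]: "0"}  # a lone symbol still needs one bit
    codes = {sym: "" for sym, _ in ranked}

    def split(group):
        if len(group) < 2:
            return
        total = sum(n for _, n in group)
        # Find the cut that makes the two groups' total counts as equal as possible.
        running, best_cut, best_diff = 0, 1, total
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)
            if diff < best_diff:
                best_cut, best_diff = i, diff
        left, right = group[:best_cut], group[best_cut:]
        for sym, _ in left:
            codes[sym] += "0"
        for sym, _ in right:
            codes[sym] += "1"
        split(left)
        split(right)

    split(ranked)
    return codes

codes = shannon_fano("AAAAAAAACCCCGGTT")
print(codes)  # {'A': '0', 'C': '10', 'G': '110', 'T': '111'}
```

With these frequencies the 16-symbol message takes 28 bits instead of the 32 bits a fixed 2-bit code would use. The assignment of bit strings to symbols is driven purely by frequency.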
I tend to think of DNA as a template rather than as a code. We humans do use templates, but we normally do not call those “codes”.
From my perspective, codes are more abstract. That is, they are further from the physical arrangements. Typically, we have a set of conventions that define the code. And then we might have different conventions that specify particular ways of implementing the code. So DNA seems too physical for me to think of it as a code. By contrast, templates usually are physical, so thinking of DNA as a template can fit quite well. Templates do not require conventions. How they are used depends on their physical structure rather than on conventions.
Some highlights in your response are that we don’t have meaning in DNA, for example, semantic meaning. Would that be correct? Since DNA and cellular processes are simply “normal” chemical reactions, that would not constitute meaning any more than any other chemical reaction. By “normal” I mean that it is a chemical reaction that occurs if the conditions are present and the same chemical reactions would occur in a beaker under the same conditions. Is that a fair summary?
I will add a paper on Shannon-Fano coding to my reading list.
That’s like one of those “is the flagellum a motor” kind of questions. They’re just labels and man-made categories, and we like to try to categorize things. Sometimes we find that objects don’t match all aspects of the definitions and categories we use. When that is the case, it seems to me we can either invent a new category for the new thing, or expand the existing category to include the new thing.
So I’m fine with saying yes, DNA (or at least certain sequences of DNA, or the genetic code) is a code, and the flagellum is a motor/machine, as I don’t think that changes anything about how DNA (or flagella) originates and evolves. Some codes are designed, some are evolved. Some machines are designed, some machines are evolved.
I don’t think the people who insist on accepting or denying that some entity belongs to some category accomplish anything in showing how it came to exist. Saying DNA is a code does not prove it didn’t evolve, and saying DNA isn’t a code does not prove it wasn’t designed.
One definition: a largely random (more accurately understood as “opaque” or arbitrary) association between two sets of symbols.
One clear example by this definition: ASCII, which associates binary numbers from 0 to 127 with alphanumeric and other characters, in a one-to-one relationship.
ASCII is analogous to sequences of three bases corresponding to different amino acids, the genetic code. Each triplet codon is associated tightly with a particular amino acid, in a many-to-one relationship (several different codons can specify the same amino acid).
But how and in what ways are ASCII and the genetic code “random” or “arbitrary”? Now that’s where the analogy starts to fall apart quite quickly. They are random and arbitrary to different degrees and in different ways. Understanding the nuances will teach anyone quite a bit about computer science and biology.
Where did the two codes come from? What is our epistemology of their origins? Here too the cases are very different. In one case we have direct historical evidence that ASCII derived from a human process. The genetic code does not derive from a human process (everyone agrees to that!) and it’s at best disputed how the code arose. Most certainly, we don’t have documentation analogous to ASCII’s history to sort out the answer.
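The two lookup tables can be put side by side in a few lines of Python. The ASCII side uses the built-in `chr`; the codon side is only a small excerpt of the standard genetic code (13 of the 64 RNA codons), and the names `CODON_EXCERPT` and `translate` are made up for this sketch.

```python
# ASCII: 128 code points, each paired with exactly one character.
ascii_table = {code: chr(code) for code in range(128)}
print(len(set(ascii_table.values())))  # 128 distinct characters

# Excerpt of the standard genetic code (RNA codons), NOT the full 64-entry table.
CODON_EXCERPT = {
    "AUG": "Met",
    "UGG": "Trp",
    "UUU": "Phe", "UUC": "Phe",
    "UUA": "Leu", "UUG": "Leu", "CUU": "Leu",
    "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

def translate(rna: str) -> list:
    """Read the sequence three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_EXCERPT[rna[i:i + 3]]  # KeyError if outside the excerpt
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein

# Two different RNA sequences that yield the same amino acid chain.
print(translate("AUGUUUCUGUGGUAA"))  # ['Met', 'Phe', 'Leu', 'Trp']
print(translate("AUGUUCUUAUGGUGA"))  # ['Met', 'Phe', 'Leu', 'Trp']
```

An ASCII byte can always be recovered from its character, but an amino acid chain does not tell you which of the synonymous codons produced it.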
I think the reference there is about an abstraction intrinsic to the DNA like an abstraction intrinsic to something humans made because humans created it to express or convey an abstraction. At least that is how I understood it.
Perhaps but that misses the point, or perhaps it is a category error at play.
DNA itself isn’t a code, any more than transistors are a code. ASCII, however, is implemented in machines that rely heavily on transistor-based logic. In the same way, the genetic code is implemented in a DNA-based system. There are other man-made systems that include DNA but do not actually include the genetic code, in the same way that computers can be built that have no use for an ASCII table.
BUT when people say “DNA” we often mean that as a broader reference to the role and function of DNA within living systems. That broader definition does include the genetic code as a subset of its domain.
So DNA is not a code, and it is a code, depending which of those two definitions you mean.