The Linguistics of DNA

I forget, how do we define ‘code’?


Great. A 20-year-old opinion piece making the analogy between human language and genetic functions. Be still my racing heart. :expressionless:


Need I say it? Language is an analogy for DNA, but DNA is not language. The analogy breaks down. Obviously.


We can define code however we want. Just be clear on what you mean by “code” when you use it. As a warning, reality is not forced to conform to our definitions.


I agree that DNA is not a language.

Very interesting find, @DaleCutler. It’d be interesting if NLP applies to DNA, along with code reverse-engineering methodologies. If so, it could be quite a fruitful avenue of research.


Yes, this is a well-explored area of computational biology. It was very fruitful, but it also hit some limits. You might want to see the work done on probabilistic context-free grammars (an NLP construct) and RNA structure. Very solid work, even though it is over a decade old.
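For readers who have not seen the grammar view of RNA structure, here is a minimal sketch of the idea. It is not the probabilistic method from the work mentioned above: it is the simpler Nussinov-style dynamic program, which counts the largest set of nested base pairs and fills its table in the same span-by-span way a CYK parser does for a context-free grammar. The function name `max_base_pairs`, the `min_loop` parameter, and the restriction to Watson-Crick pairs are assumptions made for illustration.

```python
def max_base_pairs(seq, min_loop=0):
    """Maximum number of nested base pairs in an RNA sequence.

    Illustrative Nussinov-style recursion; only A-U and G-C pairs are
    allowed here, and min_loop is the minimum number of unpaired bases
    that must sit between the two partners of a pair.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the answer for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j is left unpaired.
            score = best[i][j - 1]
            # Case 2: position j pairs with some k, splitting the
            # problem into the part left of k and the part enclosed.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAACCC", min_loop=3))
```

A probabilistic grammar replaces the simple pair count with rule probabilities, but the nesting assumption built into the recursion stays the same.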

Does not quite sound like an analogy breaking down…

Except it is an analogy that breaks down.

If you care to understand the science here, check this out:

It is closely related to much of the work I did in my PhD. Really important work too, but ultimately only of limited application because DNA does not follow a context-free grammar.
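One commonly cited example of structure that a context-free grammar cannot describe is the RNA pseudoknot, where base pairs cross instead of nesting. This is offered as an illustration only and may not be the specific limit the poster had in mind. The sketch below assumes a structure is given as a list of index pairs; the function name `has_crossing_pairs` is hypothetical.

```python
def has_crossing_pairs(pairs):
    """Return True if any two base pairs interleave (i < k < j < l).

    Nested or side-by-side pairs fit the context-free picture;
    interleaved pairs, as in a pseudoknot, do not.
    """
    # Normalise each pair to (start, end) and sort by start position.
    ordered = sorted((min(a, b), max(a, b)) for a, b in pairs)
    for idx, (i, j) in enumerate(ordered):
        for k, l in ordered[idx + 1:]:
            if i < k < j < l:
                return True
    return False


nested_hairpin = [(0, 8), (1, 7), (2, 6)]
pseudoknot_like = [(0, 5), (3, 8)]
print(has_crossing_pairs(nested_hairpin))   # False
print(has_crossing_pairs(pseudoknot_like))  # True
```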

Languages do have limits.

We are not talking about the limits of language. We are talking about the limits of the analogy. You don’t think that DNA is literally a language, do you?

@DaleCutler, given some of your posts and the infamous exchange about the flagellum and motors, I thought you might consider this painting:

This is Not a Pipe


The painting shows a pipe. Below it, Magritte painted “Ceci n’est pas une pipe,” French for “This is not a pipe.”

The famous pipe. How people reproached me for it! And yet, could you stuff my pipe? No, it’s just a representation, is it not? So if I had written on my picture ‘This is a pipe’, I’d have been lying!

— René Magritte[4]

Do you understand his point? It is precisely mine. The representation of a pipe is not a pipe itself. DNA is like a language, and it might even be an example of a “language” by some definition (probably misapplied), but it most definitely is not a language, in the same way that a picture of a pipe is not a pipe.

Do you understand this?

Huh, this is confusing to me. Human language is difficult for NLP precisely because it is not context-free, so you have to analyze a word in a context that could perhaps span the whole text, which rapidly makes human text analysis intractable. So, my naive reading of your statement would strengthen the notion that DNA is language-like, i.e. the analogy holds. Perhaps you can clarify why not being context-free makes DNA unlike human language?


The pipe analogy applies to you in the same way that a motor is not a motor.

How about this, and I mean this as a serious exercise. Can you list out the ways that DNA and, say, the English language are the same and different? I grant up front that there are some similarities. There are also many differences. Are you willing to engage in this exercise with me? It should clarify what I’m getting at here.

Is there a problem with Table 1 in the paper and its explanation in that regard?