Shannon information and COVID-19

Can anyone help me decode this paper? The Second Law of Information Dynamics

It purports to show, using COVID-19 sequences from throughout the pandemic, that mutations consistently decrease information entropy. However, I’m not sure what exactly this means. Does it mean that the Shannon information decreases? Or does it mean that the Shannon information increases? (Is Shannon information the same as informational entropy?)

Also, is Shannon information a valid proxy for genetic “functional information”? Have any studies been done to show whether Shannon information can, for example, be used to detect protein-coding genes within the genome?

If this paper is saying that the Shannon information increases, and it is a valid proxy for “functional information,” then it would provide a refutation of the ID/creationist claim that mutations decrease functional information. But if it’s saying that Shannon information decreases, then it would seem to confirm the ID/creationist claim that mutations (on the whole) tend to decrease functional information.


This all looks rather dubious to me.

In 1983, the IBM PC XT came out with a 10 MB disk. In modern computers, the physical disk is of a similar size, but with a capacity of several terabytes. We should question the idea that information is a fixed physical thing. From my perspective (as a mathematician and computer scientist), information is abstract. Roughly speaking, it is a useful fiction that we find convenient to use in our theorizing about the world. But it doesn’t have any physical existence. Of course, we represent information as physical signals. But it is the physical signals that have physical existence. The information itself is still abstract (and thus fictional).

I’ll comment on the biology, but I am not a biologist.

Some creationists have argued that mutations increase entropy. And this supposedly proves YEC, because life is running down due to this entropy increase. The paper that you ask about is instead saying that mutations decrease entropy.

The response from biologists, as I understand it, is that (a) yes mutations increase entropy, but (b) natural selection reduces entropy. The paper that you reference does not mention natural selection. But the data that they are using is actually the result of both mutation and selection. They seem to be making the mistake of ignoring selection.

My take on their data would be that selection is the main factor reducing entropy. Their idea of being able to predict mutations seems mistaken.


It looks off-the-wall to me. They keep talking about “mutation” and don’t mention natural selection at all. I am hardly the only person who is sure that these substitutions (not just mutations) involve natural selection as well. I also think Shannon Information is not the right thing to keep track of. Specified information / functional information, or some equivalent, is far more relevant.


No. Shannon information measures the improbability associated with strings of characters but is unable to distinguish merely improbable sequences of characters from those that are meaningful or functional. For example, the two sequences below have the same probability, hence the same Shannon information, but only one conveys a meaningful message.

  • in the beginning was the word
  • vtdke pushcgvt jifmzf gtlgd
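This distinction can be made concrete in a few lines of Python. Under a toy model (my own, for illustration) in which each character is drawn uniformly from a 27-symbol alphabet (26 letters plus the space), the self-information of a string depends only on its length, not on whether it means anything:

```python
import math

# Toy assumption: each character drawn uniformly from 26 letters + space.
ALPHABET_SIZE = 27

def self_information_bits(s):
    """Self-information -log2 P(s) of a string under the uniform model.

    P(s) = (1/27)^len(s), so the result depends only on the string's
    length, never on whether the string is meaningful.
    """
    return len(s) * math.log2(ALPHABET_SIZE)

meaningful = "in the beginning was the word"
gibberish = "x" * len(meaningful)  # any string of the same length

print(self_information_bits(meaningful))
print(self_information_bits(gibberish))  # same value as the line above
```

Any two strings of the same length get the same score, which is exactly why Shannon information by itself cannot distinguish meaningful text from gibberish.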

The paper is saying that the information (or Shannon) entropy of genetic systems tends to decrease, which, as far as I can understand, means that the randomness of genetic sequences is erased over time. I guess that a phenomenon such as mutational bias explains part of this outcome. In any case, the decrease of information entropy is an observation that supports the claim that mutations tend to decrease functional information.

It’s too late tonight for me to read that paper in any detail, but I am highly skeptical of any paper making such grand claims for its own importance.

No. The authors are doing something strange, measuring the Shannon Information Entropy (SIE) for the distribution of nucleotides (𝐴,𝐶,𝐺,𝑇) in the genome (Why?). First (section IV) they note that the number of mutations increases over time (no surprise there), then note that SIE decreases (very slightly) over time. All this means is that the distribution of (𝐴,𝐶,𝐺,𝑇) has changed a little; probably one or two nucleotides became slightly less common and the others more common. My educated guess is this is just a Regression to the Mean effect (very common, no big deal).
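For concreteness, here is a minimal sketch (my own, not the authors’ code) of what measuring the SIE of a genome’s nucleotide distribution amounts to:

```python
import math
from collections import Counter

def nucleotide_entropy(seq):
    """Shannon entropy, in bits per nucleotide, of the empirical
    distribution of A, C, G, T in a sequence."""
    counts = Counter(seq)
    n = len(seq)
    return 0.0 - sum((c / n) * math.log2(c / n) for c in counts.values())

print(nucleotide_entropy("ACGT" * 10))   # 2.0, the maximum: uniform distribution
print(nucleotide_entropy("AACGTACGTA"))  # a little under 2.0: A is overrepresented
print(nucleotide_entropy("A" * 10))      # 0.0: only one nucleotide type remains
```

A tiny shift in nucleotide frequencies, with one base becoming slightly less common, is enough to move this number down a little, which is all a “decrease in SIE” amounts to here.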


I think that statement is unwarranted, simply because SIE of nucleotides can’t represent any sort of biological function.


Ooooh, so that’s absolutely useless in determining the actual biological effects of these mutations. With that plus the fact that they didn’t even mention natural selection once, it’s a wonder this ever got past peer review. As an information dynamics phenomenon, it’s interesting I guess, but not useful at all in an evolutionary context.


I see that Neil Rickert and I came up with the same analysis, independently, and Dan Eastwood agrees. Giltil of course came to the opposite conclusion.


Of course, all the processes involved in natural selection, and life itself, act to increase net entropy in the universe. It may be that more efficiently increasing entropy is the driver of natural selection. Order in the genes is bought at the expense of greater disorder in the universe.


I would rather think that the decrease in SIE is mostly due to mutational biases, a phenomenon that may tend to erase the randomness of genomes, hence affect functionality.
This paper by Lynch is quite relevant with respect to mutational biases.

I’m not going to heckle @Giltil over this one, I think Gil was assuming the authors meant strings of symbols, as did I until I read closely.

A technical point for Gil here; as SIE goes to zero, this implies only a single type of nucleotide remains, and no way to code functional information other than the length of the string itself. That’s probably not what Gil meant, but it is correct in that sense. :wink:

I agree a mutational bias could cause the effect the authors find, and I could speculate that a recent change, like a DNA swap with a related virus or a jump to a new host, could make that bias apparent. I seriously doubt the Lynch paper is considering the same sort of mutational bias as Vopson and Lepadatu, so let’s get on the same page before we argue over nothing. :slight_smile:


But it happens that’s exactly what I mean. You put it well. In fact, the higher a genetic system’s SIE, the more information it can convey. Conversely, the lower its SIE, the less information it can convey. The problem I think is that most participants here confuse genetic entropy with Shannon information entropy (SIE). But they are not the same thing at all.

I think that Dan is the only one here who grasps what Shannon information entropy is.

I’m not sure that I came to the opposite conclusion. So far, you have contributed only one post to this conversation. Here it is:
It looks off-the-wall to me. They keep talking about “mutation” and don’t mention natural selection at all. I am hardly the only person who is sure that these substitutions (not just mutations) involve natural selection as well. I also think Shannon Information is not the right thing to keep track of. Specified information / functional information, or some equivalent, is far more relevant.

And it happens that I don’t really disagree with what you say here, and I totally agree with what you say in the last sentence.

I think that the changes in sequence reflect both mutational biases and natural selection (such as for greater infectivity and immune escape). The readiness with which the authors generalise their regression line showing a decrease in Shannon information to a general law of information change is startling. What happens if, in some other case, the line goes up instead of down? Note that it is proposed as a law parallel to the Second Law, and all confidently inferred from this one case.


That’s a very interesting hypothesis. The same basic idea was also proposed in this very interesting paper:

I don’t know how, or if, it would be possible to test this hypothesis that life is a manifestation of the 2LT, but it’s certainly an interesting thought.


I’m certain that is not correct :slight_smile:

You know I was trying to throw you a line, right? You happen to be correct about a corner case with no applicability to biology, and so not relevant. Then next time I send one your way with a positive spin, catch it! :wink:

I had those same thoughts. Also, from the paper …

By searching for complete genome sequences, containing the same number of nucleotides as the reference sequence, we carefully selected variants that displayed an incremental number of SNP mutations with time, and we computed the Shannon information entropy for each variant.

What do they mean by “carefully selected?” Why not a random sample? I’m very tempted to write a letter to that journal.


Wouldn’t you need to create a universe where the 2LoT doesn’t apply, to act as a control, and then see if life evolves in that universe?


Hi Andrew
Here is a paper that defines biological functional information for your reference. This has been discussed extensively on this blog in the past.


It should be enough to control the energy gradient (peak and background levels, tendency to disperse).
