Shannon information and COVID-19

Ted Schneider attempted to tie evolution and Shannon information together in a computer simulation.

As @nwrickert stated earlier, I think information in this context is an abstract concept that we humans have invented to understand the whole. Even in Schneider’s work, we could easily argue that there are situations where weaker binding between a transcription factor and a DNA binding region would actually be beneficial. We could apply the same concept to SARS-CoV-2 immune evasion, where reduced binding between specific antibodies and the S protein increases viral fitness. On the flip side, increased binding between viral and host cell-surface proteins could be beneficial.

What I don’t agree with is the idea that there is a meaningful connection between the types of information modeled in these papers and thermodynamic entropy. I do think physical information could be real (atomic spin, velocity, mass), but I don’t think this abstract type of information is concrete, nor is it part of the 2LoT.

Such a cool paper! Thanks.

Evolving is a process in time, and the 2LoT may be necessary for time, or at least for its direction. So one might ask: who’s got time for that?

Somewhere in my reading on IT I found mention of something which was roughly the equivalent of 2LoT for Information Theory. It applies (IIRC) to a change of information relative to some starting point, not to an absolute increase or decrease as for physical entropy. If I can find it again I may have to get it tattooed on my arm for easy reference.

NOT what I was looking for, but worth a read:

As I argued here, to the extent that life is some sum of macromolecular associations and interactions, it is entirely correct to assert that life exists, not in spite of the SLoT, but because of it. (Read the whole thread, including the part before my first comment.)

“in the beginning was the word” doesn’t convey a meaningful message, so “vtdke pushcgvt jifmzf gtlgd” must be the one that does. What does it mean?

As measured here, the greater the SIE (Shannon information entropy), the more information can be conveyed per base. With differences as small as those observed here, it would take only 12 additional nucleotides to bring the total information back to the same level, and that’s well within the variation in genome length seen in SARS-CoV-2.
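To make that arithmetic concrete, here is a minimal Python sketch, assuming total information is simply per-base SIE multiplied by genome length (my assumption; the paper may define it differently). The helper names and the entropy values in the usage line are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def per_base_entropy(seq: str) -> float:
    """Per-base Shannon entropy in bits: H = -sum(p * log2(p)) over base frequencies."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())

def extra_bases_needed(h_before: float, h_after: float, length: int) -> float:
    """Bases to add so that h_after * (length + extra) == h_before * length."""
    return length * (h_before - h_after) / h_after

# Hypothetical per-base entropies, NOT values from the paper.
print(extra_bases_needed(1.950, 1.949, 29_903))
```

Because the per-base differences are tiny relative to H itself, even a ~30 kb genome needs only a handful of extra bases to compensate.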

I wondered that too. So first I reproduced their results with the same 10 sequences.

I didn’t scale by k_B because multiplying by a constant obviously won’t change the trend, and when I did scale, my numbers came out on a very different scale from what the figure showed. However, the table reports the information entropy without the k_B scaling, and my numbers matched the table exactly.
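For anyone wanting to repeat this, a minimal sketch of the comparison, assuming the paper's quantity is Shannon entropy in nats over nucleotide frequencies, optionally multiplied by k_B (I haven't confirmed that is their exact formula). Because k_B is a positive constant, scaling shrinks every value to the order of 10^-23 J/K but cannot reorder them, which is why the trend is unaffected. The toy sequences are made up.

```python
import math
from collections import Counter

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)

def information_entropy(seq: str, scale_by_kb: bool = False) -> float:
    """Shannon entropy in nats of the base frequencies; optionally multiplied by k_B."""
    n = len(seq)
    h = -sum((c / n) * math.log(c / n) for c in Counter(seq).values())
    return K_B * h if scale_by_kb else h

def ranking(values):
    """Indices sorted by value, i.e. the ordering a trend depends on."""
    return sorted(range(len(values)), key=values.__getitem__)

toy = ["AAAAAAAAAC", "AAAAACCCCC", "AACCGGTTAC"]  # made-up sequences with distinct entropies
raw = [information_entropy(s) for s in toy]
scaled = [information_entropy(s, scale_by_kb=True) for s in toy]
print(ranking(raw) == ranking(scaled))  # ordering survives the k_B scaling
```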

Once I had the data, I could also see that while the mutations increased over time, it was not a fully nested set of mutations. I got the impression the “careful selection” was intended to achieve that kind of nesting, but apparently not.
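One way to check for nesting can be sketched like this, assuming the sequences are already aligned to the reference and have the same length (indels are not handled). Here a "mutation" is a (position, base) pair, and the function names are my own, hypothetical helpers.

```python
from __future__ import annotations

def mutations(seq: str, ref: str) -> set[tuple[int, str]]:
    """(0-based position, observed base) for every site differing from the reference."""
    if len(seq) != len(ref):
        raise ValueError("sequence and reference must have the same length")
    return {(i, b) for i, (a, b) in enumerate(zip(ref, seq)) if a != b}

def is_nested(seqs_in_time_order: list[str], ref: str) -> bool:
    """True if every sequence carries all mutations of the sequence before it."""
    sets = [mutations(s, ref) for s in seqs_in_time_order]
    return all(earlier <= later for earlier, later in zip(sets, sets[1:]))

ref = "AAAAAAAA"
print(is_nested(["AAAAAAAC", "AAAAAGAC"], ref))              # second only adds a mutation
print(is_nested(["AAAAAAAC", "AAAAAGAC", "AAAATAAC"], ref))  # third lacks the G at position 5
```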

For their mutational bias hypothesis (which I also wondered about), here is the distribution of changes with respect to the reference sequence.
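A tally like that can be sketched as follows, assuming equal-length sequences compared site by site against the reference (no alignment, so indels are not handled). Labels such as "C>T" are just my notation; NCBI genomes are written with T rather than U.

```python
from collections import Counter

def substitution_spectrum(seqs, ref: str) -> Counter:
    """Count reference->observed base changes, labelled like 'C>T', across all sequences."""
    spectrum = Counter()
    for seq in seqs:
        if len(seq) != len(ref):
            raise ValueError("all sequences must match the reference length")
        for r, s in zip(ref, seq):
            if r != s:
                spectrum[f"{r}>{s}"] += 1
    return spectrum

print(substitution_spectrum(["ACGT", "ATGT"], "ACGA").most_common())
```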

Then I went to NCBI for more sequences. I used their filters to restrict to complete genomes with no ambiguous characters. That still left far more data than I wanted to download, so I kept only sequences of exactly 29,903 nt and only human samples from the oronasopharynx (for no reason other than that it was the only option that reduced the numbers further while leaving a useful sample). Could those choices introduce biases? Absolutely. I wouldn’t try to publish these results; I just wanted to quickly satisfy my curiosity.
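I used NCBI's own filters, but a local re-check of a downloaded FASTA file might look something like this minimal sketch. It assumes plain FASTA input and treats any character other than A, C, G or T as ambiguous; `read_fasta` and `keep` are hypothetical helpers, not an NCBI tool.

```python
from __future__ import annotations
from typing import Iterable, Iterator

REF_LENGTH = 29_903          # exact length used as the filter
UNAMBIGUOUS = set("ACGT")    # anything else (N, R, Y, ...) counts as ambiguous

def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Minimal FASTA parser yielding (header, uppercase sequence) pairs."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)

def keep(seq: str, length: int = REF_LENGTH) -> bool:
    """Exact-length, unambiguous sequences only."""
    return len(seq) == length and set(seq) <= UNAMBIGUOUS

# Usage (hypothetical file name):
# with open("sequences.fasta") as fh:
#     kept = [(h, s) for h, s in read_fasta(fh) if keep(s)]
```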

Here are the same charts with this bigger sample:

The overall trends are the same. I’d guess that the number of sequences thins out over time because some of the adaptive mutations are deletions, so the exact length of 29,903 becomes less common. The number of mutations in each sequence is counted relative to the same reference sequence used in the paper. I could be underestimating, since simply counting string mismatches overlooks successive mutations at the same site that a more sophisticated analysis might detect.
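For reference, "number of string mismatches" is just a Hamming distance against the reference. A minimal sketch, assuming equal-length sequences; the toy example shows why repeated changes at one site are invisible to it, since only the current base is compared.

```python
def mutation_count(seq: str, ref: str) -> int:
    """Hamming distance: number of sites where seq differs from the reference."""
    if len(seq) != len(ref):
        raise ValueError("equal-length sequences required")
    return sum(a != b for a, b in zip(seq, ref))

ref = "ACGT"
print(mutation_count("ACGA", ref))  # one mismatch at the last site
# If that site actually went T->C->A, that is two mutations but only one is counted,
# and a T->C->T reversion would be counted as zero:
print(mutation_count("ACGT", ref))
```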

Since we’re not dealing with a strictly progressive increase in mutations, a more relevant trend might be the direct relationship between number of mutations and information entropy, rather than comparing them indirectly through their trends over time.
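One simple way to look at that relationship directly is a correlation across sequences. A minimal sketch using Pearson's r, written out by hand to avoid dependencies; the input lists are hypothetical placeholders, not my data or the paper's. A rank correlation (Spearman) might suit better if the relationship turns out not to be linear.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-sequence values: mutation counts vs. per-base entropy.
mutation_counts = [1, 2, 3, 4]
entropies = [4, 3, 2, 1]
print(pearson(mutation_counts, entropies))
```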

‘We support ID and don’t care how bad our arguments are if we can get them published’.

Wow @AndyWalsh, that is an impressive effort! (Here, have a badge :slight_smile: )

Cumulative mutations should increase linearly over time, but it’s good to confirm that.
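A quick way to confirm linearity is an ordinary least-squares fit of mutation count against sampling date, then a look at the residuals. A minimal sketch; `days` (days since the reference sample) and the toy counts are hypothetical inputs.

```python
def linear_fit(days, counts):
    """Least-squares slope and intercept for counts ~ slope * days + intercept."""
    n = len(days)
    if n != len(counts) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = sum(days) / n, sum(counts) / n
    sxx = sum((x - mx) ** 2 for x in days)
    sxy = sum((x - mx) * (y - my) for x, y in zip(days, counts))
    slope = sxy / sxx
    return slope, my - slope * mx

def residuals(days, counts):
    """Observed minus fitted values; a systematic curve here suggests non-linearity."""
    slope, intercept = linear_fit(days, counts)
    return [y - (slope * x + intercept) for x, y in zip(days, counts)]

print(linear_fit([0, 10, 20], [0, 5, 10]))
```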

Great thought. It had not occurred to me to consider nesting.

A non-linear trend? I’m not sure what that means yet; I’ll think on it.

It’s probably just a bad paper.

“Never attribute to malice that which is adequately explained by stupidity.” (Hanlon’s Razor)

Oh, no, I don’t think it’s an ID paper. Apparently it is pretty bad though. It’s too bad we don’t have the authors here to explain what their methodology was and why they drew the conclusions that they did.

I have a suggestion for all. Let’s put together a concise list of criticisms and questions, and contact the authors.

A good starting point would be to summarize what’s already here.

Fully agree. Without entropy life couldn’t exist. It is the movement of energy from low entropy to high entropy that drives everything life does. Without a direction for energy to move along we couldn’t perform even the simplest metabolic tasks. For that matter, heat may not even reach us from the Sun.

Yes, that sounds fair. It doesn’t seem right to keep bashing this paper without allowing the authors to explain first.

Thanks for your nice work. So it seems that your analyses confirm the authors’ conclusion, i.e., that the SIE of the SARS-CoV-2 RNA genome decreases with time, don’t they?

My wild guess is that they are probably engineers, and not biologists. They have perhaps been paying attention to news reports of mutations with COVID. But the news reports are really of those mutations which reached near fixation, so that could be what they were looking at.

I’ve been doing some digging into the authors, and found this interesting article:

Currently, we produce ∼10^21 digital bits of information annually on Earth. Assuming a 20% annual growth rate, we estimate that after ∼350 years from now, the number of bits produced will exceed the number of all atoms on Earth, ∼10^50. After ∼300 years, the power required to sustain this digital production will exceed 18.5 × 10^15 W, i.e., the total planetary power consumption today, and after ∼500 years from now, the digital content will account for more than half Earth’s mass…
…In conclusion, we established that the incredible growth of digital information production would reach a singularity point when there are more digital bits created than atoms on the planet.

Paging Malthus…
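Out of curiosity, the arithmetic in that quote can be checked in a few lines, assuming simple compound growth from 10^21 bits per year at 20% annually. The excerpt doesn't say whether annual or cumulative production is meant, so both are shown; the function names are mine.

```python
import math

def years_until_annual_exceeds(start: float, target: float, growth: float) -> float:
    """Solve start * (1 + growth) ** t == target for t."""
    return math.log(target / start) / math.log1p(growth)

def years_until_cumulative_exceeds(start: float, target: float, growth: float) -> int:
    """Whole years until the running total of annual production reaches target."""
    total, annual, years = 0.0, start, 0
    while total < target:
        total += annual
        annual *= 1 + growth
        years += 1
    return years

print(years_until_annual_exceeds(1e21, 1e50, 0.20))      # about 366 years
print(years_until_cumulative_exceeds(1e21, 1e50, 0.20))  # a little under 360 years
```

Either way, the ~350-year figure is in the right ballpark under those growth assumptions.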

I always save the contents of my bit bucket for recycling (CoSci joke).

My first CoSci instructor told many tales, including one about keeping an actual bucket in the computer cabinet so the senior operators could tell the new guys to “empty the bit bucket.” :slight_smile:

Paging @AllenWitmerMiller
