Shannon information and COVID-19

T_aquaticus · September 19, 2022, 2:58pm

Tom Schneider attempted to tie evolution and Shannon information together in a computer simulation.

How do genetic systems gain information by evolutionary processes? Answering this question precisely requires a robust, quantitative measure of information. Fortunately, 50 years ago Claude Shannon defined information as a decrease in the uncertainty of a receiver. For molecular systems, uncertainty is closely related to entropy and hence has clear connections to the Second Law of Thermodynamics. These aspects of information theory have allowed the development of a straightforward and practical method of measuring information in genetic control systems. Here this method is used to observe information gain in the binding sites for an artificial ‘protein’ in a computer simulation of evolution. The simulation begins with zero information and, as in naturally occurring genetic systems, the information measured in the fully evolved binding sites is close to that needed to locate the sites in the genome. The transition is rapid, demonstrating that information gain can occur by punctuated equilibrium.
Evolution of biological information - PMC

As @nwrickert states earlier, I think information in this context is an abstract concept that we humans have invented to understand the whole. Even in Schneider’s work we could easily argue that there are situations where a weakening in binding between a transcription factor and a DNA binding region would actually be beneficial. We could apply the same concept to SARS-CoV-2 immune evasion where a reduction in binding between specific antibodies and the S protein increases viral fitness. On the flip side, increases in binding between viral and host cell surface proteins could be beneficial.

What I don’t agree with is that there is a meaningful connection between the types of information modeled in these papers and the actual process of entropy. I do think physical information could be real (atomic spin, velocity, mass), but I don’t think this abstract type of information is concrete nor is it part of the 2LoT.

misterme987 · September 19, 2022, 7:32pm

Such a cool paper! Thanks.

RonSewell · September 19, 2022, 7:32pm

Evolving is a process in time, and the 2LoT may be necessary for time or at least direction. So it may be asked, who’s got time for that?

Dan_Eastwood · September 19, 2022, 7:50pm

Somewhere in my reading on IT I found mention of something which was roughly the equivalent of 2LoT for Information Theory. It applies (IIRC) to a change of information relative to some starting point, not to an absolute increase or decrease as for physical entropy. If I can find it again I may have to get it tattooed on my arm for easy reference.

Dan_Eastwood · September 19, 2022, 8:29pm

NOT what I was looking for, but worth a read:

Art · September 19, 2022, 9:29pm

As I argued here, to the extent that life is some sum of macromolecular associations and interactions, it is entirely correct to assert that life exists, not in spite of the SLoT, but because of it. (Read the whole thread, including the part before my first comment.)

Roy · September 20, 2022, 1:22pm

“in the beginning was the word” doesn’t convey a meaningful message, so “vtdke pushcgvt jifmzf gtlgd” must be the one that does. What does it mean?

AndyWalsh · September 20, 2022, 1:27pm

As being measured here, the greater the SIE, the more information that can be conveyed per base. With the small differences observed here, it would only take 12 additional nucleotides to get the total information back to the same level. And that’s well within the variation in genome length seen in SARS-CoV-2.
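A rough sketch of the arithmetic behind that kind of estimate, assuming total information is simply per-base entropy multiplied by genome length. The function name `extra_bases_needed` and the input values are hypothetical, chosen only for illustration; they are not the values from the paper.

```python
from math import ceil

def extra_bases_needed(length: int, h_before: float, h_after: float) -> int:
    """Bases to add so that (length + extra) * h_after >= length * h_before.

    Assumes total information = per-base entropy * sequence length,
    which is an illustrative simplification.
    """
    if h_after <= 0:
        raise ValueError("h_after must be positive")
    total_before = length * h_before
    return max(0, ceil(total_before / h_after - length))

# Hypothetical inputs: halving the per-base entropy doubles the length needed.
print(extra_bases_needed(1000, 2.0, 1.0))
```

With a genome of about 30,000 bases, a very small relative drop in per-base entropy translates into only a handful of extra bases.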

AndyWalsh · September 20, 2022, 1:31pm

I wondered that too. So first I reproduced their results with the same 10 sequences.

I didn’t scale by k_b because it obviously won’t change the trend, and when I did so I got numbers on a very different scale than what the figure showed. However, the table shows the information entropy without the k_b scaling, and my numbers were exactly the same as the table.
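For anyone who wants to try the same thing, here is a minimal sketch of an unscaled per-base Shannon entropy. It assumes the entropy is computed from single-nucleotide frequencies over the whole sequence, which may not be exactly how the paper does it; `shannon_entropy` is a name I made up. Multiplying the result by any constant such as k_b rescales every value equally, so it cannot change the trend.

```python
from collections import Counter
from math import log2

def shannon_entropy(seq: str) -> float:
    """Shannon entropy in bits per symbol, from single-symbol frequencies."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = -sum(p * log2(p)) over the observed symbols
    return -sum((n / total) * log2(n / total) for n in counts.values())

# Four equally frequent nucleotides give the maximum of 2 bits per base.
print(shannon_entropy("ACGT"))
```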

Once I had the data, I could also see that while the mutations increased over time, it was not a fully nested set of mutations. I got the impression the “careful selection” was intended to achieve that kind of nesting, but apparently not.

For their mutational bias hypothesis (which I also wondered about), here is the distribution of changes with respect to the reference sequence.

Then I went to NCBI for more sequences. I used their filtering to restrict to complete genomes with no ambiguous characters. That still left a lot more data than I wanted to download, so I kept the 29,903 length restriction and only human samples from the oronasopharynx (no reason other than it was the only option that reduced the numbers further but left a useful sample). Could those choices introduce biases? Absolutely. I wouldn’t try to publish these results; I just want to quickly satisfy my curiosity.

Here are the same charts with this bigger sample:

The overall trends are the same. I’d guess that the number of sequences thins out over time because some of the adaptive mutations are deletions, so that exact length of 29,903 becomes less common. The number of mutations in each sequence is with respect to the same reference sequence used in the paper. I could be underestimating by overlooking sequential mutations at the same location that might be discernible with a more sophisticated analysis than just computing the number of string mismatches.
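The string-mismatch count described above can be sketched as follows. It assumes both sequences have the same length (as with the 29,903 restriction), so no alignment is needed; `count_mismatches` is a hypothetical helper name. As noted, this undercounts when a site has mutated more than once.

```python
def count_mismatches(seq: str, ref: str) -> int:
    """Number of positions where seq differs from ref (Hamming distance)."""
    if len(seq) != len(ref):
        raise ValueError("sequences must have the same length; align them first")
    return sum(a != b for a, b in zip(seq, ref))

# Toy example: one substitution at the last position.
print(count_mismatches("ACGT", "ACGA"))
```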

Since we’re not dealing with a strictly progressive increase in mutations, a more relevant comparison might be number of mutations against information entropy directly, rather than comparing them indirectly through their separate trends over time.
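One simple way to make that direct comparison is a Pearson correlation between per-sequence mutation counts and entropies. This is only a sketch: `pearson_r` is a hypothetical helper, the inputs below are toy numbers rather than real data, and a linear correlation is just one choice among several.

```python
from math import sqrt

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(vx * vy)

# Toy inputs only: entropy falling in step with mutation count gives r close to -1.
print(pearson_r([1, 2, 3], [6, 4, 2]))
```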

Roy · September 20, 2022, 1:33pm

‘We support ID and don’t care how bad our arguments are if we can get them published’.

Dan_Eastwood · September 20, 2022, 1:47pm

Wow @AndyWalsh, that is an impressive effort! (Here, have a badge.)

Cumulative mutations over time should be linear, but it’s good to confirm that.

Great thought. It had not occurred to me to consider nesting.

A non-linear trend? I’m not sure what that means yet, will think on it.

Dan_Eastwood · September 20, 2022, 1:50pm

It’s probably just a bad paper.

“never attribute to malice that which is adequately explained by stupidity.”

Hanlon’s Razor

misterme987 · September 20, 2022, 1:58pm

Oh, no, I don’t think it’s an ID paper. Apparently it is pretty bad though. It’s too bad we don’t have the authors here to explain what their methodology was and why they drew the conclusions that they did.

Dan_Eastwood · September 20, 2022, 2:05pm

I have a suggestion for all. Let’s put together a concise list of criticisms and questions, and contact the authors.

A good starting point would be to summarize what’s already here.

T_aquaticus · September 20, 2022, 2:36pm

Fully agree. Without entropy life couldn’t exist. It is the movement of energy from low entropy to high entropy that drives everything life does. Without a direction for energy to move along we couldn’t perform even the simplest metabolic tasks. For that matter, heat may not even reach us from the Sun.

misterme987 · September 20, 2022, 2:36pm

Yes, that sounds fair. It doesn’t seem right to keep bashing this paper without allowing the authors to explain first.

Giltil · September 20, 2022, 3:30pm

Thanks for your nice work. So it seems that your analyses confirm the authors’ conclusion, i.e. that the SIE of the SARS2 RNA genome decreases with time, don’t they?

nwrickert · September 20, 2022, 5:59pm

My wild guess is that they are probably engineers, and not biologists. They have perhaps been paying attention to news reports of mutations with COVID. But the news reports are really of those mutations which reached near fixation, so that could be what they were looking at.

Roy · September 21, 2022, 3:24pm

I’ve been doing some digging into the authors, and found this interesting article:

Currently, we produce ∼10^21 digital bits of information annually on Earth. Assuming a 20% annual growth rate, we estimate that after ∼350 years from now, the number of bits produced will exceed the number of all atoms on Earth, ∼10^50. After ∼300 years, the power required to sustain this digital production will exceed 18.5 × 10^15 W, i.e., the total planetary power consumption today, and after ∼500 years from now, the digital content will account for more than half Earth’s mass…
…In conclusion, we established that the incredible growth of digital information production would reach a singularity point when there are more digital bits created than atoms on the planet.
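The ∼350-year figure is easy to sanity-check with compound growth. The sketch below uses only the numbers quoted above (10^21 bits per year, 20% annual growth, 10^50 atoms). Whether "bits produced" means the annual rate or the running total is my assumption either way, so both readings are computed; the function names are hypothetical.

```python
from math import ceil, log

BITS_PER_YEAR_NOW = 1e21   # quoted current annual production
GROWTH = 0.20              # quoted annual growth rate
ATOMS_ON_EARTH = 1e50      # quoted atom count

def years_until_annual_rate_exceeds(target: float) -> int:
    """Years until a single year's production passes target."""
    return ceil(log(target / BITS_PER_YEAR_NOW) / log(1 + GROWTH))

def years_until_cumulative_exceeds(target: float) -> int:
    """Years until the running total passes target (geometric series sum)."""
    # start * ((1 + g)**n - 1) / g >= target, solved for n
    return ceil(log(target * GROWTH / BITS_PER_YEAR_NOW + 1) / log(1 + GROWTH))

print(years_until_annual_rate_exceeds(ATOMS_ON_EARTH))
print(years_until_cumulative_exceeds(ATOMS_ON_EARTH))
```

Both readings land in the mid-300s of years, so the quoted estimate follows directly from the assumed constant 20% growth rate.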

Paging Malthus…

Dan_Eastwood · September 21, 2022, 3:41pm

I always save the contents of my bit bucket for recycling (CoSci joke).

My first CoSci instructor told many tales, including one about keeping an actual bucket in the computer cabinet so the senior operators could tell the new guys to “empty the bit bucket.”

Paging @AllenWitmerMiller
