Comments on Gpuccio: Functional Information Methodology

You are assuming that there is no FI in the starting sequence. He is saying change in FI.

Indeed. I am particularly struck by what appears to be an unwillingness to make a serious effort to accurately estimate the prevalence of even a single function in sequence space, which is an essential factor in every mathematical assertion @gpuccio and @Giltil are making.

Here’s another actual biological function for which FI is meaningless: the function of MHC (major histocompatibility complex) is as a necessary component for the immune system’s ability to distinguish self from nonself. IOW, its function is literally to be different. There’s no way that I can see to calculate FI for that.

So how do the FI calculations for ubiquitin factor in the FI of the proteins that preceded ubiquitin?

Of course not. Most of the FI is already there. You only add what is missing.

If you had read my statements about random walks, you would know that FI expresses the probability of finding the target by a random walk from an unrelated state.

But isn’t it then obvious that you don’t actually know whether any of the proteins you use as examples really contain those 500 bits? When you say they do, you are essentially claiming to know that there are no other proteins they evolved from, otherwise you would have to say most of the FI was already there because most of the sequence was present in the ancestral state.

I see the safe analogy has made another appearance. Unfortunately, the bit about the large safe’s handle turning more and more as the thief gets closer to finding the right combination has been omitted.

That’s not found in any of the calculations I have seen. Instead, the FI calculation focuses solely on the function that emerges by comparing the variation of that gene in different lineages. I have yet to see an FI calculation that factors in ancestral proteins that had a different function.

What is an “unrelated state” in this scenario? If we have a protein with no ubiquitin function that gains ubiquitin function through a single mutation, would that meet your standards? Would the change in FI in this scenario be 4.3 bits?
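For reference, the 4.3-bit figure is what you get for one position if exactly one of the 20 standard amino acids works there and all 20 are treated as equally likely. A minimal sketch under those assumptions (the function name `bits_for_site` is made up for illustration):

```python
import math

def bits_for_site(allowed_residues: int, alphabet_size: int = 20) -> float:
    """Bits contributed by one site, assuming uniform residue frequencies.

    allowed_residues: how many residues are compatible with function
    at this site (an assumed input, not something measured here).
    """
    if not 0 < allowed_residues <= alphabet_size:
        raise ValueError("allowed_residues must be in 1..alphabet_size")
    return math.log2(alphabet_size / allowed_residues)

# One specific residue required out of 20.
single_mutation_bits = bits_for_site(1)
print(round(single_mutation_bits, 2))  # 4.32
```

If more than one residue is tolerated at the site, the figure drops accordingly, e.g. `bits_for_site(4)` is log2(5).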

In fact, I have revised my estimation of FIb and now think it is lower, around 50 bits. But this still remains a pretty large amount of FI that the IS cannot produce by a random walk. In order to produce these 50 bits of FI, the IS resorts to the RV + NS mechanism.

The question now is why RV + NS is able to produce high FI in a few weeks during the process of somatic hypermutation (SHM). The thing to see here is that the path from an antibody with low affinity for the antigen to an antibody with high affinity for this antigen consists of a succession of discrete selection steps. Moreover, the FI associated with each of these selection steps is very low, around 10 bits. Given the probabilistic resources of the IS during the SHM process, it is child’s play for the IS to produce 10 bits of FI by random walk and, as a result, to go through the different selection steps leading cumulatively to high FI. So it is true that RV + NS can produce high FI, but only in one particular and very special situation, i.e., when the final target exhibiting high FI can be reached incrementally through a series of small selective steps. Such a very special situation is quite rare in biology and doesn’t apply to complex proteins. This last point is well argued by @gpuccio in his OP below.
https://uncommondescent.com/intelligent-design/what-are-the-limits-of-natural-selection-an-interesting-open-discussion-with-gordon-davisson/
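The stepwise argument above can be put in rough numbers. The sketch below assumes that each trial independently hits a target of `bits` bits with probability 2^-bits, so the expected number of trials is 2^bits. It also assumes five selectable steps of 10 bits each, a number picked only to add up to the roughly 50 bits quoted above, not taken from any measurement.

```python
def expected_trials(bits: int) -> int:
    """Expected number of independent random trials to hit a target of `bits` bits."""
    return 2 ** bits

def stepwise_expected_trials(step_bits: list[int]) -> int:
    """Expected trials when every intermediate step is selectable and retained."""
    return sum(expected_trials(b) for b in step_bits)

# Assumption for illustration: five 10-bit steps, 50 bits in total.
steps = [10] * 5
stepwise = stepwise_expected_trials(steps)
single_jump = expected_trials(sum(steps))

print(stepwise)     # 5120
print(single_jump)  # 1125899906842624
```

The whole disagreement in this thread is about whether such selectable intermediate steps exist for a given protein; the arithmetic only shows how much depends on that.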

Good guess 👍

With all due respect, you are completely wrong here, as explained by @gpuccio below.

I was indeed wrong, but a lot less wrong than @gpuccio.

If the scenario that you describe is real then the FI was already in the previous application as far as I can tell.

Whoever ends up being right is immaterial, as the conversation is going to improve our understanding of functional information. Thanks for the thoughtful posts.

@gpuccio here is a reference point for the discussion. This definition is from Hazen and Szostak.

I(Ex) = −log2[F(Ex)], where F(Ex) is the fraction of all possible configurations of the system that possess a degree of function ≥ Ex. Functional information, which we illustrate with letter sequences, artificial life, and biopolymers, thus represents the probability that an arbitrary configuration of a system will achieve a specific function to a specified degree. In each case we observe evidence for several distinct solutions with different maximum degrees of function, features that lead to steps in plots of information versus degree of function.
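As a quick illustration of that definition, here is a minimal sketch that computes I(Ex) from counts of configurations. The function name and the toy numbers are assumptions for illustration; in practice the hard part is estimating how many configurations meet the threshold, which is exactly what is disputed in this thread.

```python
import math

def functional_information(n_functional: int, n_total: int) -> float:
    """I(Ex) = -log2(F(Ex)), with F(Ex) = n_functional / n_total.

    n_functional: configurations with degree of function >= Ex.
    n_total: all possible configurations of the system.
    """
    if n_total <= 0 or not 0 < n_functional <= n_total:
        raise ValueError("need 0 < n_functional <= n_total")
    # Subtracting logs avoids forming a tiny float fraction for huge spaces.
    return math.log2(n_total) - math.log2(n_functional)

# Toy example: 1 functional configuration out of 1024 gives 10 bits.
toy_bits = functional_information(1, 1024)
print(toy_bits)  # 10.0
```

Note that the result depends only on the fraction of functional configurations, so a function that many sequences can perform has low FI no matter how long the sequences are.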

The safes example assumes independence (see 1 below). That is, the combinations and rewards (hence in $dollars) are part of a whole, not 101 independent parts. Each safe has $1. The first safe has a 1-bit combination, the second a 2-bit combo, etc., up to a 100-bit combo for the last safe.
The first small safe with a 1-bit combination is quickly opened by the thief, who gains $1 and 1 bit of the combination to the next safe. The second safe has a 2-bit combination, but using his 1-bit knowledge he only needs one more bit! The second safe is also soon opened, gaining another $1 and 1 more bit. The thief proceeds to open each safe in turn until all the safes are open, and walks out with $100.

Let’s make this harder: the thief does not know how the safes are ordered, so it is not clear in what order he should proceed.
He starts by entering “0” as the combination for all 100 safes; if none of them opens, he goes around again and enters “1” as the combination, and one must open. The thief has gained $1 and 1 bit. 99 safes remain.
The thief repeats his task, starting with the bit(s) he knows and adding 1 bit at a time until all the safes are open.
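The ordered version of the example can be simulated in a few lines. This is only a sketch of the scenario as described above: it assumes safe k's combination is the first k bits of one 100-bit secret, and that the thief tries “0” then “1” for each new bit. The function name and the fixed random seed are illustrative choices.

```python
import random

def open_nested_safes(secret_bits: list[int]) -> tuple[int, list[int]]:
    """Open safes 1..n in order, where safe k's combination is secret_bits[:k].

    Returns the number of attempts made and the bits the thief recovered.
    """
    known: list[int] = []
    attempts = 0
    for k in range(1, len(secret_bits) + 1):
        for guess in (0, 1):  # try "0" first, then "1"
            attempts += 1
            if known + [guess] == secret_bits[:k]:
                known.append(guess)  # safe k opens: $1 and one more bit
                break
    return attempts, known

rng = random.Random(0)  # fixed seed so the run is repeatable
secret = [rng.randint(0, 1) for _ in range(100)]
attempts, recovered = open_nested_safes(secret)

print(recovered == secret)  # True
print(100 <= attempts <= 200)  # True
```

At most 200 attempts open all 100 safes, compared with up to 2^100 attempts for a single 100-bit safe that gives no feedback until the whole combination is right.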

Additional notes:

  1. I am making the error of assuming only a single function, or a single set of safes, when the thief may have many to choose from. A combination that does not open a safe for a particular function might open a safe containing some other new and unexpected function.

  2. Another error! There is not just a single thief, but a population of thieves, each working to open the safes and sharing information.

  3. If the safe combinations allow extra bits beyond the correct combination, the thief can guess the next two or three bits, possibly opening several safes with each pass, greatly speeding his task.

  4. The thief ought to be flipping coins to choose bits instead of sequentially trying “0” and “1” bits. The thief will average 2 attempts per bit, but this does not substantially change the point I am making so I’m not going back to fix it! :wink:

Gpuccio is, I think, assuming there is nothing the thief can gain until some substantial number of correct bits are known. We could modify the example so the thief needs to guess more than 1 bit, at least at first. That would mitigate some of the easy gains for the thief I demonstrated, but opening the safes and gaining $100 is still far easier than claimed. I’m willing to give our busy thief a well-earned rest. :slight_smile:

Looking at the original CARD11 example: The actual function of the matching protein in Saccoglossus kowalevskii seems to be unknown at this time, as it is only a predicted protein from analysis of the genome sequence (if I understand correctly). However, if it does have an analogous function, isn’t the divergence in sequence between Saccoglossus kowalevskii and humans suggestive of low functional information (as defined), since two very different proteins can achieve the same function? And if the two proteins don’t have analogous functions and we are just looking for the raw material to mutate into something that can carry out the function in humans/vertebrates, don’t we need to look at the DNA level rather than the protein level?

The situation is a little more dynamic than this. The environment/ecosystem changes (change in weather, animal migration patterns, pathogens, etc.).

So it is more like the safes’ codes keep changing every once in a while.
This is why evolution is a process that depends so much on “contingencies”.
In short, biodiversity is a miracle that normally shouldn’t have happened.

How do you know?

No, that’s not what I’m saying. I’m saying why can’t the accumulation in the same protein continue? There’s an antibody, it has some sequence. It mutates in the hypervariable region in a stretch of 6-10 amino acids, and over the course of a few weeks 50 bits of FI (or whatever) is generated. Why could a larger portion of the antibody not continue mutating?

Why could the same thing not happen to 200 residues, out of a 600 amino acid protein over the course of 500 million years? In fact, isn’t this exactly what phylogenetics reveals to us has happened? We create a large tree of homologous proteins, and we see that along some lineages lots of mutations have accumulated in the protein during this time period.

Natural selection fixed the mutations in the antibody. Why can’t natural selection have fixed the mutations in the larger protein?
