A Ubiquitin Response to Gpuccio

AlanFox · September 13, 2018, 4:22pm

It’s only forty comments long. You haven’t explained anything in any comment you’ve posted above.

colewd · September 13, 2018, 4:31pm

Here it is again. He is comparing sequences and reducing the functional information calculation by the AAs that were not preserved. In almost all cases the calculation exceeds Dembski’s universal probability bound by orders of magnitude.

swamidass · September 13, 2018, 4:33pm

@colewd it’s really easy to show in a simulation of common descent that FI measured this way will always be astronomically high, orders of magnitude more than the actual FI.

colewd · September 13, 2018, 4:35pm

Can you explain the simulation method?

swamidass · September 13, 2018, 4:42pm

Start with a sequence representing a species.
Make a copy of the sequence to represent a new species that branches off.
Add mutations scaled by time and rate to both sequences.
Simulate more sequences as desired.

If you measure “FI” on sequences simulated this way, it will always be very high, unless unrealistically high mutation rates are used.
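A minimal sketch of the steps above in Python. Several things here are assumptions made only for illustration: substitutions are uniform over the 20 amino acids, `rate` is a per-site probability per branching round, and `conservation_bits` is a simplified conservation score (log2(20) minus the observed per-site entropy, summed over sites), not the exact procedure anyone in this thread used. No selection is simulated, so none of the conservation it reports reflects function.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(seq, rate, rng):
    """Return a copy of seq in which each site is redrawn with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa
        for aa in seq
    )

def simulate_descent(length=100, n_species=32, rate=0.01, seed=0):
    """Neutral common descent: copy a lineage, then mutate every lineage."""
    rng = random.Random(seed)
    species = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))]
    while len(species) < n_species:
        species.append(rng.choice(species))                # a new species branches off
        species = [mutate(s, rate, rng) for s in species]  # time passes on all lineages
    return species

def conservation_bits(species):
    """Sum over sites of log2(20) minus the observed entropy of that site."""
    total = 0.0
    for column in zip(*species):
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
        total += math.log2(len(AMINO_ACIDS)) - entropy
    return total

low = simulate_descent(rate=0.01, seed=1)        # modest divergence
saturated = simulate_descent(rate=1.0, seed=1)   # every site redrawn each round
print(f"rate=0.01: {conservation_bits(low):.1f} bits")
print(f"rate=1.0:  {conservation_bits(saturated):.1f} bits")
```

Comparing the two runs shows how much of the score comes from shared ancestry alone: the only difference between them is how much time the lineages have had to diverge.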

colewd · September 13, 2018, 4:56pm

Very good thanks.

If we look at synonymous mutations, I would expect an interesting result here, especially over 400 million years. What we are seeing is a very high synonymous mutation rate but a very constrained AA mutation rate, which points to strong purifying selection. This points to high FI, but exactly how high is what would be great to home in on.

Let’s think about this as we move to the point where engaging gpuccio would be fruitful.

T_aquaticus · September 13, 2018, 5:16pm

The problem is that this method does not tell us all of the possible combinations of amino acids that will produce a specific function which is what you need to calculate FI.

swamidass · September 13, 2018, 5:19pm

Turns out your intuition is false. Rather than stating your intuitions as facts, please ask questions, especially in regard to mathematical biology.

colewd · September 13, 2018, 5:40pm

I don’t think this is applicable as there is specific binding involved in these proteins.

colewd · September 13, 2018, 5:42pm

Fair enough. Can you demonstrate that my intuition is false?

T_aquaticus · September 13, 2018, 6:30pm

Mutual information can’t tell us all of the amino acid combinations that will produce that specific binding, nor can it tell us all of the possible binding partners that can produce the same function. Therefore, mutual information is not a measure of functional information.

colewd · September 13, 2018, 10:02pm

I can see this being true of the observed sequence for an enzyme with a single active site.

I cannot see it being true for a nuclear protein like PRPF8, which has to bind more than 5 large proteins, with each binding site having its own functional island.

T_aquaticus · September 13, 2018, 10:09pm

The same logic applies to every active site, no matter how many active sites there are.

We could look at something I am a bit more familiar with like Factor Xa. This enzyme cleaves prothrombin into thrombin at a specific site on the protein as part of the clotting cascade. There are way more possible combinations that would cut prothrombin at the same site than we see in biology right now, AND you would get the same function if you changed the amino acid sequence of the cleavage site in prothrombin and had a different enzyme cut at that site. It is this second part that you are ignoring.

colewd · September 13, 2018, 10:18pm

I strongly disagree. The empirical evidence I have looked at shows higher FI as you add binding sites to a single fold. Gpuccio’s examples are typically multi binding site nuclear proteins as in ubiquitin compatible proteins.

The other issue is that we are observing these proteins at positions where they are unable to substitute AAs despite many DNA mutations.

T_aquaticus · September 13, 2018, 10:28pm

Gpuccio is measuring mutual information, not functional information.

That is due to historical contingency. Are you familiar with the concept of fitness peaks?

Evolution can only work with what it has. It can’t start from scratch. Therefore, the number of mutations available is limited by the protein you start with. It is entirely possible that very different proteins can serve the same function, but there is no pathway that evolution can take to get to those proteins because it is locked into the first pathway it found.

colewd · September 13, 2018, 10:40pm

I respectfully disagree. Although it is an indirect measurement and certainly has error to it, it is measuring function. I would, however, agree that his measurements include mutual information.

Yes, I am familiar with this theoretical construct that has some empirical backing.

I agree this is true for a single-function protein, but that’s not what we are dealing with here. I also agree it can get stuck, but you have to ask yourself the question: how did all these proteins find this stuck and optimized position? Look at the Hayashi paper, “Experimental Rugged Fitness Landscape in Protein Sequence Space”.

T_aquaticus · September 14, 2018, 2:45pm

This is the formula for calculating functional information:

FI = -log2 [N/W]

N is the number of protein sequences that will produce a specific function and W is the number of possible protein sequences under consideration.
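A short worked sketch of this formula in Python. The values of N below are purely hypothetical placeholders, chosen only to show how strongly the result depends on N; W is taken as all sequences of 100 residues over 20 amino acids.

```python
import math

def functional_information(n_functional, n_total):
    """FI = -log2(N / W) in bits, computed as a difference of logs for large integers."""
    if not 0 < n_functional <= n_total:
        raise ValueError("need 0 < N <= W")
    return -(math.log2(n_functional) - math.log2(n_total))

W = 20 ** 100                      # all 100-residue sequences
for N in (10 ** 30, 10 ** 60):     # hypothetical counts of functional sequences
    print(f"N = {N:.0e}: FI = {functional_information(N, W):.1f} bits")
```

Every factor of two in N changes FI by exactly one bit, so the result is only as good as the estimate of N.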

Gpuccio and others are not using this formula. Instead, they are using the formula for measuring mutual information:

H(X_f(t)) = -∑ P(X_f(t)) log P(X_f(t))    (1)
Durston et al., 2007
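To make the inputs of equation (1) concrete, here is a minimal Python sketch that evaluates it for one column of an alignment. The base-2 logarithm and the toy columns are assumptions for illustration. The only input is the set of residues actually observed at the site.

```python
import math
from collections import Counter

def site_entropy(column):
    """H = -sum p * log2(p) over the residues observed in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())

# Toy columns (hypothetical): one invariant site, one with two residues, one with four.
for column in ("AAAA", "ACAC", "ACDE"):
    print(column, site_entropy(column))
```

Residues that would work at a site but never appear in the sampled sequences contribute nothing to this sum.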

Mutual information is not a measure of functional information. Without knowing all possible protein sequences that can produce a specific function, you have no way of calculating FI.

Why wouldn’t it apply to proteins with multiple active sites? How is it different than each of those active sites being on separate proteins?

Even if we don’t know the blow-by-blow mutational history of every protein in existence, this doesn’t change the fact that mutual information is not a measure of functional information.

AlanFox · September 14, 2018, 5:58pm

Shout it from the rooftops. How many needles in the haystack? We don’t know till we find them!

colewd · September 14, 2018, 6:33pm

That’s why we have math: to make estimates and to make sense of the data. Poll takers do not call all households every time they take a poll. Dr. Hunt’s article in 2004 was all about estimates of needles in the haystack. Science is tentative, so we will never be able to prove evolution is false. We can, however, show it is most likely not the right overall explanation for life’s diversity.

Conscious intelligence does not count on the needle-to-haystack ratio.

colewd · September 14, 2018, 6:37pm

Thanks for the thoughtful response. I want to spend some time with Durston’s paper. Thanks for citing it. Will respond in a couple of days.
