A Ubiquitin Response to Gpuccio

It’s only forty comments long. You haven’t explained anything in any comment you’ve posted above.

Here it is again. He is comparing sequences and reducing the functional information calculation by the AAs that were not preserved. In almost all cases the calculation separates from Dembski’s universal probability bound by orders of magnitude.

@colewd it’s really easy to show in a simulation of common descent that FI measured this way will always be astronomically high, orders of magnitude more than the actual FI.

Can you explain the simulation method?

  1. Start with a sequence representing a species.
  2. Make a copy of the sequence to represent a new species that branches off.
  3. Add mutations scaled by time and rate to both sequences.
  4. Simulate more sequences as desired.

If you measure “FI” on sequences simulated this way, it will always be very high, unless unrealistically high mutation rates are used.
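
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the poster's actual simulation: the function names, the default rate and time, the uniform replacement model, and the use of fully conserved alignment columns (times log2(20) bits) as a crude stand-in for conservation-based "FI" are all hypothetical choices. No selection or function is modeled anywhere in the code.

```python
import random
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(seq, rate, time, rng):
    """Step 3: resample each residue with probability rate * time (capped at 1).

    The replacement is drawn uniformly and may equal the old residue.
    No selection is applied.
    """
    p = min(1.0, rate * time)
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < p else aa
        for aa in seq
    )

def simulate_descent(length=200, splits=3, rate=1e-4, time=200, seed=0):
    """Steps 1, 2 and 4: start from one sequence, then copy and mutate repeatedly."""
    rng = random.Random(seed)
    lineages = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))]
    for _ in range(splits):
        # Every lineage splits in two; both copies then mutate independently.
        lineages = [mutate(s, rate, time, rng) for s in lineages for _ in range(2)]
    return lineages

def conserved_fraction(seqs):
    """Fraction of alignment columns where every sequence has the same residue."""
    same = sum(1 for col in zip(*seqs) if len(set(col)) == 1)
    return same / len(seqs[0])

def conserved_bits(seqs):
    """Crude conservation score (an assumption): log2(20) bits per conserved column."""
    return conserved_fraction(seqs) * len(seqs[0]) * log2(len(AMINO_ACIDS))

if __name__ == "__main__":
    related = simulate_descent()
    scrambled = simulate_descent(rate=1.0, time=1)  # unrealistically high rate
    print("related:  ", conserved_fraction(related), conserved_bits(related))
    print("scrambled:", conserved_fraction(scrambled), conserved_bits(scrambled))
```

With the low default rate, most columns stay identical across all eight simulated species simply because they share an ancestor, so the conservation score is large even though nothing in the model is functional. Only when the rate is high enough to resample every site on every branch does the score collapse.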

Very good thanks.

If we look at synonymous mutations, I would expect an interesting result here, especially over 400 million years. What we are seeing is actually a very high synonymous mutation rate but a very constrained AA mutation rate, which is what points to strong purifying selection. This points to high FI, but exactly how high is what would be great to home in on.

Let’s think about this as we move to the point that engaging gpuccio would be fruitful.

The problem is that this method does not tell us all of the possible combinations of amino acids that will produce a specific function which is what you need to calculate FI.

Turns out your intuition is false. Rather than stating your intuitions as facts, please ask questions, especially in regard to mathematical biology.

I don’t think this is applicable as there is specific binding involved in these proteins.

Fair enough. Can you demonstrate that my intuition is false?

Mutual information can’t tell us all of the amino acid combinations that will produce that specific binding, nor can it tell us all of the possible binding partners that can produce the same function. Therefore, mutual information is not a measure of functional information.

I can see this being true of the observed sequence for a single-active-site enzyme.

I cannot see it being true of a nuclear protein like PRPF8 that has to bind more than 5 large proteins, with each binding site having its own functional island.

The same logic applies to every active site, no matter how many active sites there are.

We could look at something I am a bit more familiar with like Factor Xa. This enzyme cleaves prothrombin into thrombin at a specific site on the protein as part of the clotting cascade. There are way more possible combinations that would cut prothrombin at the same site than we see in biology right now, AND you would get the same function if you changed the amino acid sequence of the cleavage site in prothrombin and had a different enzyme cut at that site. It is this second part that you are ignoring.

I strongly disagree. The empirical evidence I have looked at shows higher FI as you add binding sites to a single fold. Gpuccio’s examples are typically nuclear proteins with multiple binding sites, as in ubiquitin-compatible proteins.

The other issue is that we are observing positions in these proteins where AAs cannot be substituted despite many DNA mutations.

Gpuccio is measuring mutual information, not functional information.

That is due to historical contingency. Are you familiar with the concept of fitness peaks?

Evolution can only work with what it has. It can’t start from scratch. Therefore, the number of mutations available are limited by the protein you start with. It is entirely possible that very different proteins can serve the same function, but there is no pathway that evolution can take to get to those proteins because it is locked into the first pathway it found.

I respectfully disagree. Although it is an indirect measurement and certainly has error to it, it is measuring function. I would, however, agree his measurements include mutual information.

Yes, I am familiar with this theoretical construct that has some empirical backing.

I agree this is true for a single-function protein, but that’s not what we are dealing with here. I also agree it can get stuck, but you have to ask yourself the question: how did all these proteins find this stuck and optimized position? Look at the Hayashi paper, “Experimental Rugged Fitness Landscape in Protein Sequence Space”.

This is the formula for calculating functional information:

FI = -log2 [N/W]

N is the number of protein sequences that will produce a specific function and W is the number of possible protein sequences under consideration.
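
As a worked illustration of the formula above, here is a minimal Python sketch. The function name and the example values of N and W are made up for illustration; they are not estimates for any real protein.

```python
from math import log2

def functional_information(n_functional, n_total):
    """FI = -log2(N / W), computed as log2(W) - log2(N).

    Taking the two logs separately lets Python work with very large
    integers (such as 20**100) without first forming a tiny ratio.
    """
    if not 0 < n_functional <= n_total:
        raise ValueError("need 0 < N <= W")
    return log2(n_total) - log2(n_functional)

if __name__ == "__main__":
    # Hypothetical numbers: a 100-residue protein (W = 20**100 sequences)
    # and three different guesses for N, the number of functional sequences.
    w = 20 ** 100
    for n in (1, 10 ** 20, 10 ** 60):
        print(n, functional_information(n, w))
```

The point of the sketch is that the result depends entirely on N: the same W gives very different FI values for different guesses of N, which is why N has to be known or estimated before FI can be calculated.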

Gpuccio and others are not using this formula. Instead, they are using the formula for measuring mutual information:

H(X_f(t)) = -∑ P(X_f(t)) log P(X_f(t))    (1)
Durston et al., 2007
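
To make the contrast concrete, here is a rough Python sketch of how a sum of this form can be evaluated on aligned sequences. It is not Durston et al.'s code: the function names are hypothetical, the log is taken base 2, probabilities are estimated as raw column frequencies, and summing over columns assumes the sites are independent.

```python
from collections import Counter
from math import log2

def column_entropy(column):
    """-sum(p * log2(p)) over the residues observed in one alignment column.

    Written as p * log2(1/p), which is the same quantity.
    """
    counts = Counter(column)
    total = len(column)
    return sum((n / total) * log2(total / n) for n in counts.values())

def alignment_entropy(seqs):
    """Sum of per-column entropies (assumes columns are independent)."""
    return sum(column_entropy(col) for col in zip(*seqs))

if __name__ == "__main__":
    # Toy alignment, made up for illustration.
    toy = ["MKVA", "MKVC", "MRVD", "MRVE"]
    print([column_entropy(col) for col in zip(*toy)])
    print(alignment_entropy(toy))
```

Everything this calculation uses comes from the frequencies of residues in the sequences that happen to be in the alignment. Nothing in it counts N, the number of sequences that would perform the function.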

Mutual information is not a measure of functional information. Without knowing all possible protein sequences that can produce a specific function, you have no way of calculating FI.

Why wouldn’t it apply to proteins with multiple active sites? How is it different than each of those active sites being on separate proteins?

Even if we don’t know the blow-by-blow mutational history of every protein in existence, this doesn’t change the fact that mutual information is not a measure of functional information.

Shout it from the rooftops. How many needles in the haystack? We don’t know till we find them!

That’s why we have math: to make estimates and make sense of the data. Poll takers do not call all households every time they take a poll. Dr. Hunt’s article in 2004 was all about estimates of needles in the haystack. Science is tentative, so we will never be able to prove evolution is false. We can, however, show it is most likely not the right overall explanation for life’s diversity.

Conscious intelligence does not count on the needle to haystack ratio :wink:

Thanks for the thoughtful response. I want to spend some time with Durston’s paper. Thanks for citing it. Will respond in a couple of days.