Gil grabs some ammunition and shoots down Doug Axe's 2004 extrapolation by a factor of more than 10^44

Giltil · February 4, 2020, 10:37pm

This is wrong, plain wrong. For example, gpuccio’s methodology aims at calculating the FI associated with a function, and not at all the probability of getting a specific protein. Likewise, in the paper I referred to, the authors, Tian and Best, calculate the probability of discovering protein folds by random search and not at all the probability of discovering a specific protein sequence.

T_aquaticus · February 4, 2020, 10:40pm

You have that exactly backwards. @gpuccio’s method cannot calculate the odds of getting a specific function, or a selectable function in general. All he is measuring is how many residues can be changed without losing the specific function in the starting protein.

Exactly. You are focusing on getting folds instead of selectable function.

Rumraket · February 4, 2020, 10:42pm

No, wrong. Spectacularly wrong. Gpuccio’s method ONLY works on the basis of sequence alignments. It tells you nothing about whether other highly dissimilar sequences can perform the function of interest.

Buddy, I think you should stop telling other people here that they don’t understand this subject.

Giltil · February 4, 2020, 10:43pm

This is a strange statement, for neither the participants in the Wistar conference nor Berlinski or Gelernter were or are creationists!

Rumraket · February 4, 2020, 10:45pm

No, but you’ve got your perspective on what occurred at that conference, from creationists.

Faizal_Ali · February 4, 2020, 10:47pm

Mercer · February 4, 2020, 10:50pm

Catalytic antibodies.

colewd · February 5, 2020, 1:30am

What do these tell us about muscle proteins?

Timothy_Horton · February 5, 2020, 1:40am

Gelernter sure is. In fact, he made a right fool of himself in a rambling, error-filled diatribe against science that he published last year, Giving Up Darwin.

Rumraket · February 5, 2020, 1:50am

There goes Bill Gish-galloping. John was responding to what you wrote, not some other thing you bring up after the fact.

Catalytic antibodies tell us something about how frequent functions are as opposed to particular folds.

Faizal_Ali · February 5, 2020, 1:54am

The further irony, of course, is that Bill thinks catalytic antibodies tell us nothing about muscle proteins, but has no problem accepting that Axe’s silly little project with beta-lactamase can tell us everything about every protein that ever was, and ever will be.

Creationists, sheesh.

Rumraket · February 5, 2020, 2:02am

Guess who wrote this?

“You need to show all proteins can evolve not some small subset. As you need to show all proteins have local optimums and that local optimums exist such that functional space is almost equal to sequence space.”

Hint.

nwrickert · February 5, 2020, 2:38am

Hmm, maybe I should start a new thread on this.

Yes, I know about the Wistar conference. As best I recall, the people there had some of the same reservations that I once had about evolution (before I fully understood it).

I’ll tentatively plan to start a new thread on how I currently understand evolution. I’ll probably start it tomorrow (Wednesday Feb 05).

Mercer · February 5, 2020, 3:10am

Speaking of muscle proteins, why are there >100 variants of MYH7 in perfectly healthy people?

Giltil · February 5, 2020, 7:25am

Why are you saying I am wrong when you know perfectly well that gpuccio’s methodology does aim at calculating FI? You may disagree about whether his method works, but you can’t deny its aim.
Now, let’s go back to gpuccio’s method per se.
You contest it because it doesn’t take into account the possibility that other dissimilar solutions may exist in the sequence space. Although this is true, it doesn’t affect the soundness of his analysis. To see this, let’s take again the ATP synthase example. Gpuccio has calculated that it has a FI of 1297 bits. Now imagine that 1000 other dissimilar solutions exist in the sequence space that can perform the same function. In that case, the FI associated with the function of ATP synthase would be reduced by 10 bits.
If 1 billion other dissimilar solutions existed, the reduction would be of 30 bits.
If 10^100 other dissimilar solutions existed, the reduction would be of 332 bits.
If 10^250 other dissimilar solutions existed, the reduction would be of 830 bits, leading to a FI of 467 bits, which is just below the threshold that warrants a design inference.
So you see, in order to dismiss the design inference for ATP synthase, you have to imagine that about 10^250 dissimilar solutions exist in the sequence space!!! Given that there is no evidence whatsoever that even a single alternative dissimilar solution exists for implementing the function of ATP synthase, your case is weak, to say the least.
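The bit figures above follow from treating each alternative solution as adding roughly one more family's worth of functional sequences, so the functional fraction grows by a factor of about N and FI drops by log2(N) bits (strictly log2(N + 1), which is indistinguishable at these sizes). A minimal Python sketch of that arithmetic, under that assumption; the 1297-bit starting figure is the one quoted in the post, and the function name is hypothetical:

```python
import math

# FI quoted in the post for the ATP synthase beta subunit, in bits.
REPORTED_FI_BITS = 1297

def fi_reduction_bits(alternative_solutions: float) -> float:
    """Bits removed from FI if the functional set grows by a factor of N.

    Assumes each alternative solution contributes about as many functional
    sequences as the known family, so FI drops by log2(N).
    """
    return math.log2(alternative_solutions)

for n in (1e3, 1e9, 1e100, 1e250):
    cut = fi_reduction_bits(n)
    print(f"N = {n:.0e}: reduction = {cut:.1f} bits, "
          f"remaining FI = {REPORTED_FI_BITS - cut:.1f} bits")
```

Because the reduction grows only logarithmically in N, very large values of N are needed to move the total by a few hundred bits.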

Mercer · February 5, 2020, 8:10am

According to @gpuccio himself in post 153 of this thread, the aim of his exercise is clear:
“It [my contribution here] is, instead, aimed at intellectual confrontation about a very important paradigm difference: design against neo-darwinism to explain biological functions.”

Gpuccio: Functional Information Methodology Conversation

Gpuccio is a poster at Uncommon Descent who published a blog article at UD arguing that proteins have high functional information (FI), and must have been designed. His work is commonly referenced by ID proponents. This is his post: We asked for him to explain his methodology further. @colewd asked him (comment 343), and gpuccio gave a response (comment 356). He has agreed to respond to reasonable critique. OK, so here is a brief primer about my methodology to measure Functional Information in proteins: a) I use Blast to measure sequence homology between proteins, in bits. I take the bitscore from the Blast algorithm as it is, with some consideration of the number of identities and similarities, too. b) I am interested in homologies that are conserved throughout long evolutionary periods. I consider that kind of homology as a very good estimator of FI. The reason is very simple: a specific sequence can be conserved for those long time windows only if it is under very strong fu…

@gpuccio did. He made it clear that his aim is confrontation. It’s just a polemic weapon that he’s abandoned you to defend alone.

Mercer · February 5, 2020, 8:14am

Imagine that Nigel Tufnel had a knob that went to 13! Would it mean that his amp would be louder?

You haven’t shown that the design inference has any basis in reality. It’s just another knob that goes to 11.

Roy · February 5, 2020, 12:12pm

Um, 48 orders of magnitude.

Rumraket · February 5, 2020, 12:26pm

Pff, what’s 10 billion between friends?

Rumraket · February 5, 2020, 1:13pm

That may be what Gpuccio intends (as in, what he has in mind), but it is not how the method actually works. The method extrapolates functional variation from homologous sequences, so it is really based not on function but on sequence similarity. That’s why.

So if there were a very dissimilar sequence that could perform the function, it would not be detected as homologous: Gpuccio would not find it in a database using a similarity-based search, it would not be classified as belonging to the clade of sequences with a similar name (ubiquitin, alpha actin, ATP synthase subunit beta, or whatever), and so Gpuccio would miss it in his calculation.

Hence, his method is not based on functions, but on sequence similarity and annotation classifications. Which means he’s only ever going to be looking within particular sequence-based families judged to be homologous by annotators.
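To make that blind spot concrete, here is a toy Python sketch. A plain ungapped percent-identity filter stands in for a similarity search; real tools such as BLAST use gapped alignments and substitution matrices, so this is only an illustration. The sequences, the 30% cutoff, and the function names are all hypothetical, and we simply pretend both database entries perform the query's function:

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy comparison needs equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def homolog_hits(query: str, database: dict[str, str],
                 threshold: float = 30.0) -> list[str]:
    """Names of database entries whose identity to the query meets the threshold."""
    return [name for name, seq in database.items()
            if percent_identity(query, seq) >= threshold]

# Hypothetical toy sequences; assume both entries perform the query's function.
query = "MKTAYIAKQRQISFVKSHFS"
database = {
    "close_homolog": "MKTAYIAKQRQLSFVKSHFA",       # 2 substitutions, 90% identity
    "unrelated_solution": "GWDLPEECNTHRGYVMWPLE",  # same function, 5% identity
}
print(homolog_hits(query, database))
```

Only the close homolog passes the filter, so any calculation built on the hits never sees the unrelated sequence, however well it performs the function.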

You contest it because it doesn’t take into account the possibility that other dissimilar solutions may exist in the sequence space. Although this is true, it doesn’t affect the soundness of his analysis.

Of course it does. We can show that with a simple hypothetical example. Keefe & Szostak evolved multiple different ATP-binding proteins from a library of about 10^12 different random sequences, 80 amino acids in length. The total sequence space for proteins of that length is about 1.2×10^104 (20^80). They found that the ATP-binding function exists at a frequency of about 10^-11. (Btw, that experiment has been repeated by another lab, which found basically the same thing.)
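As a back-of-envelope check on those figures, this Python sketch uses only the numbers quoted in the post to compute the size of the 80-residue sequence space, the fraction of it a 10^12-member library samples, and how many ATP binders such a library would be expected to contain at a frequency of 10^-11. The constant names are illustrative:

```python
# Figures quoted above for the Keefe & Szostak experiment.
LENGTH = 80               # residues per random sequence
ALPHABET = 20             # standard amino acids
LIBRARY_SIZE = 10**12     # approximate number of distinct sequences screened
ATP_BINDING_FREQ = 1e-11  # reported frequency of the ATP-binding function

space = ALPHABET ** LENGTH                       # exact integer, about 1.2e104
coverage = LIBRARY_SIZE / space                  # fraction of the space sampled
expected_binders = LIBRARY_SIZE * ATP_BINDING_FREQ

print(f"sequence space  : {float(space):.2e}")
print(f"library coverage: {coverage:.1e} of the space")
print(f"expected binders: {expected_binders:.0f}")
```

The library covers less than 10^-90 of the space, yet at that frequency it would still be expected to contain on the order of ten binders, which is why such a small sample can measure the frequency at all.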

Now let’s suppose we use Gpuccio’s method to try to derive the FI for that function from one of the sequences the Szostak lab found, after it had diverged over 400 million years. We find that over that period it has diversified so much that every position in the sequence has at least 3 other amino acids in some variant. From that we calculate that there are 4^80 = ~1.5×10^48 possible sequences that can implement the ATP-binding function in the sequence space of proteins 80 amino acids long.

So now we try to derive the fraction of sequence space 80 amino acids long that can bind ATP:
(1.5×10^48)/(1.2×10^104) = 1.25×10^-56

But the real frequency of the function is in the 10^-11 to 10^-12 range, as revealed by multiple empirical experiments. Yet basing our calculation on homologous sequences, even if an enormous amount of variation has been generated such that every single position is known to have 3 possible alternative amino acids (for a total of 4), we still end up with a 45-order-of-magnitude underestimate.
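The same calculation can be sketched in a few lines of Python, also expressing each frequency as FI in bits (-log2 of the functional fraction). This is a simplification for illustration, not gpuccio's BLAST-bitscore procedure: it assumes, as the example above does, that any combination of the 4 observed residues per site is functional, so the extrapolated count is 4^80. Names are illustrative:

```python
import math

LENGTH = 80
ALPHABET = 20
ALLOWED_PER_SITE = 4               # original residue plus 3 observed alternatives
EMPIRICAL_FREQS = (1e-11, 1e-12)   # range reported by the library experiments

space = ALPHABET ** LENGTH
extrapolated_count = ALLOWED_PER_SITE ** LENGTH   # about 1.5e48 sequences
extrapolated_freq = extrapolated_count / space    # about 1.2e-56

def bits(freq: float) -> float:
    """Functional information: -log2 of the functional fraction."""
    return -math.log2(freq)

print(f"extrapolated frequency: {extrapolated_freq:.2e} "
      f"({bits(extrapolated_freq):.1f} bits)")
for f in EMPIRICAL_FREQS:
    gap = math.log10(f / extrapolated_freq)
    print(f"empirical {f:.0e} ({bits(f):.1f} bits): "
          f"underestimated by ~{gap:.0f} orders of magnitude")
```

With 10^-11 the gap comes out at about 45 orders of magnitude, matching the figure above; with 10^-12 it is about 44. In bits, the homology-based extrapolation gives roughly 186 bits against roughly 37 to 40 bits measured empirically.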

FI is useless for establishing the true fraction of sequence space able to perform some function of interest, because it isn’t physically possible for evolution, even over the entire duration of life’s existence on Earth, to generate all the variation that would be needed for us to be able to extrapolate the true number for FI. All of which I explained here.

To see this, let’s take again the ATP synthase example. Gpuccio has calculated that it has a FI of 1297 bits.

Let’s be clear that you’re talking about the beta subunit of ATP synthase. You can’t really calculate the FI for the whole machine, as it is made of multiple independent proteins, and it would be meaningless even to attempt to do so, since it evolved from protein subunits that had other functions on their own. And the beta subunit is homologous to the alpha subunit in the catalytic hexamer, which is basically just the same protein repeated six times in a hexagonal oligomer.

Now imagine that 1000 other dissimilar solutions exist in the sequence space that can perform the same function. In that case, the FI associated with the function of ATP synthase would be reduced by 10 bits.
If 1 billion other dissimilar solutions existed, the reduction would be of 30 bits.
If 10^100 other dissimilar solutions existed, the reduction would be of 332 bits.
If 10^250 other dissimilar solutions existed, the reduction would be of 830 bits, leading to a FI of 467 bits, which is just below the threshold that warrants a design inference.
So you see, in order to dismiss the design inference for ATP synthase, you have to imagine that about 10^250 dissimilar solutions exist in the sequence space!!! Given that there is no evidence whatsoever that even a single alternative dissimilar solution exists for implementing the function of ATP synthase, your case is weak, to say the least.

The problem with all this is twofold. First, as just stated, even supposing there were that many different sequences able to perform the function, they could not all physically exist or be generated by the evolutionary process, so any estimate of FI based on similarity would be unable to capture them, since it is merely an extrapolation from extant known variation. Thanks for making that point for me.

Second: ATP synthase subunit beta evolved from simpler precursors able to perform similar functions (which, ironically, are related to the ATP-binding function evolved by Keefe & Szostak 2001). We’ve been over this.

That would also make FI useless for giving any hints about whether some protein function is evolvable, because even if that function is now very rare in sequence space, it may be able to evolve incrementally from a simpler precursor that is much more frequent in that space. All the evidence shows that this is the case for the relationship between the extant and ancestral functions of ATP synthase subunit beta. These matters are further complicated by the fact that proteins can have multiple functions, so one function that is highly abundant in sequence space can give rise to another that is very rare but happens to overlap with it in some cluster of sequences.

For these reasons you simply can’t establish the relationship between FI and sequence space from homologous sequences generated over evolutionary history. And even if you could, you couldn’t derive from that relationship that protein X could not evolve, because you would still only be considering a sort of de novo evolution in which the function has to emerge as-is, instead of deriving from some simpler and more frequent function, or from an entirely different one.

Related topics

Topic	Category / tags	Replies	Views	Activity
Gpuccio: Functional Information Methodology	Conversation; Science, Design	183	13401	September 1, 2019
Looking for sources on the information argument	Conversation; Design	127	2819	September 10, 2021
Does ID have Hypotheses?	Conversation	97	2648	June 24, 2021
Miller: Axe Decisively Confirmed?	Conversation; Science, Design	31	4639	February 23, 2019
The Failures of Mathematical Anti-Evolutionism	Conversation; Science, Design	211	4524	June 28, 2022