Gil grabs some ammunition and shoots down Doug Axe's 2004 extrapolation by a factor of more than 10^44

Why are you saying I am wrong when you know perfectly well that gpuccio’s methodology aims at calculating FI? You may disagree that his method works, but you can’t deny its aim.
Now, let’s go back to gpuccio’s method per se.
You contest it because it doesn’t take into account the possibility that other dissimilar solutions may exist in the sequence space. Although this is true, it doesn’t affect the soundness of his analysis. To see this, let’s take again the ATP synthase example. Gpuccio has calculated that it has a FI of 1297 bits. Now imagine that 1000 other dissimilar solutions exist in the sequence space that can perform the same function. In that case, the FI associated with the function of ATP synthase would be reduced by 10 bits.
If 1 billion other dissimilar solutions existed, the reduction would be of 30 bits.
If 10^100 other dissimilar solutions existed, the reduction would be of 332 bits.
If 10^250 other dissimilar solutions existed, the reduction would be of 830 bits, leading to a FI of 467 bits, which is just below the threshold that warrants a design inference.
So you see, in order to dismiss the design inference for ATP synthase, you have to imagine that there exist about 10^250 dissimilar solutions in the sequence space!!! Given that no evidence whatsoever exists that a single alternative dissimilar solution exists for implementing the function of ATP synthase, your case is, say, weak, to say the least.
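
For anyone who wants to check the arithmetic above, here is a minimal Python sketch. It assumes, as the post does, that each additional dissimilar solution occupies a target of the same size as the known one, so N solutions lower the FI by roughly log2(N) bits. The function name and the list of exponents are only illustrative; the 1297-bit figure is the one quoted above.

```python
import math

FI_BITS = 1297  # gpuccio's figure for ATP synthase, as quoted in the post


def fi_reduction_bits(log10_solutions: float) -> float:
    """Bits of FI lost if 10**log10_solutions equally sized solutions exist.

    Assumption: reduction = log2(N), i.e. every solution contributes a
    target of the same size as the known one.
    """
    return log10_solutions * math.log2(10)


for exponent in (3, 9, 100, 250):
    reduction = fi_reduction_bits(exponent)
    print(f"10^{exponent} solutions: -{reduction:.0f} bits, "
          f"FI left = {FI_BITS - reduction:.0f} bits")
```

Under that assumption the sketch reproduces the reductions of about 10, 30, 332 and 830 bits. It says nothing about whether similarity-based FI estimates are valid in the first place.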

According to @gpuccio himself in post 153 of this thread, the aim of his exercise is clear:
“It [my contribution here] is, instead, aimed at intellectual confrontation about a very important paradigm difference: design against neo-darwinism to explain biological functions.”

@gpuccio did. He made it clear that his aim is confrontation. It’s just a polemic weapon that he’s abandoned you to defend alone.

Imagine that Nigel Tufnel had a knob that went to 13! Would it mean that his amp would be louder?

You haven’t shown that the design inference has any basis in reality. It’s just another knob that goes to 11.

Um, 48 orders of magnitude.

Pff, what’s 10 billion between friends?

That may be what Gpuccio intends (as in, what he aims at in his head), but it is not how the method actually works. It’s based on extrapolating functional variation from homologous sequences, hence it’s not really based on function but on sequence similarity. That’s why.

So if there was a very dissimilar sequence that could perform the function, it would not be detected as homologous, Gpuccio would not be able to find it in some database when using a similarity-based search, it would not be classified as belonging to the clade of sequences with a similar name (ubiquitin, alpha actin, ATP synthase subunit beta, or whatever), and so Gpuccio would miss it in his calculation.

Hence, his method is not based on functions, but on sequence similarity and annotation classifications. Which means he’s only ever going to be looking within particular sequence-based families judged to be homologous by annotators.

You contest it because it doesn’t take into account the possibility that other dissimilar solutions may exist in the sequence space. Although this is true, it doesn’t affect the soundness of his analysis.

Of course it does. We can show that with a simple hypothetical example. Keefe & Szostak evolved multiple different ATP binding proteins from a library of about 10^12 different random sequences 80 amino acids in length. The total sequence space for that length is 20^80, or about 1.2×10^104. They found that the ATP binding function exists at a frequency of about 10^-11. (Btw that experiment has been repeated by another lab and they found basically the same thing).

Now let’s suppose we use Gpuccio’s method to try to derive the FI for the function from one of those sequences the Szostak lab found, after it has diverged for 400 million years. We find that over that time period it has diversified quite a lot, so much so that every position in the sequence has at least 3 other amino acids in some variant. From that we calculate that there are 4^80 = ~1.5×10^48 possible sequences that can implement the ATP binding function in the sequence space for proteins 80 amino acids long.

So now we try to derive the fraction of sequence space 80 amino acids long that can bind ATP:
(1.5×10^48)/(1.2×10^104) = 1.25×10^-56

But the real frequency of the function is in the 10^-11 to 10^-12 range, as revealed in multiple empirical experiments. Yet basing our calculation on homologous sequences, even if an enormous amount of variation has been generated such that every single position is known to have 3 possible alternative amino acids (for a total of 4), we still end up underestimating the frequency by about 45 orders of magnitude.
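
The numbers in this hypothetical can be reproduced with a short Python sketch. It works in log10 to keep the figures readable. The constant names are illustrative only, the 4 tolerated residues per site are the hypothetical from the example above, and 10^-11 is taken as the observed frequency.

```python
import math

LENGTH = 80                # residues per protein in the random library
ALLOWED_PER_SITE = 4       # hypothetical: 4 tolerated amino acids per position
OBSERVED_LOG10_FREQ = -11  # empirical frequency of ATP binding, about 10^-11

# log10 of the whole sequence space, 20^80
log10_total = LENGTH * math.log10(20)
# log10 of the functional sequences inferred from homologs, 4^80
log10_functional = LENGTH * math.log10(ALLOWED_PER_SITE)
# log10 of the fraction of sequence space the homolog-based estimate implies
log10_estimated_freq = log10_functional - log10_total
# how many orders of magnitude the estimate falls short of the observation
orders_underestimated = OBSERVED_LOG10_FREQ - log10_estimated_freq

print(f"sequence space         ~ 10^{log10_total:.1f}")
print(f"inferred functional    ~ 10^{log10_functional:.1f}")
print(f"estimated frequency    ~ 10^{log10_estimated_freq:.1f}")
print(f"underestimated by      ~ {orders_underestimated:.0f} orders of magnitude")
```

Changing `ALLOWED_PER_SITE` shows how strongly the result depends on how much variation happens to have been sampled in the known homologs.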

FI is useless for establishing the true fraction of sequence space able to perform some function of interest, because it isn’t physically possible for evolution, even over the entire duration of life’s existence on Earth, to generate all the variation that would be needed for us to be able to extrapolate the true number for FI. All of which I explained here.

To see this, let’s take again the ATP synthase example. Gpuccio has calculated that it has a FI of 1297 bits.

Let’s be clear that you’re talking about the beta-subunit of ATP synthase. You can’t really calculate the FI for the whole machine, as it is made of multiple independent proteins, and it would be meaningless to even attempt to do so since it evolved from protein subunits that had other functions on their own. And the beta-subunit is homologous to the alpha subunit in the catalytic hexamer, which is basically just the same protein repeated six times into a hexagonal oligomer.

Now imagine that 1000 other dissimilar solutions exist in the sequence space that can perform the same function. In that case, the FI associated with the function of ATP synthase would be reduced by 10 bits.
If 1 billion other dissimilar solutions existed, the reduction would be of 30 bits.
If 10^100 other dissimilar solutions existed, the reduction would be of 332 bits.
If 10^250 other dissimilar solutions existed, the reduction would be of 830 bits, leading to a FI of 467 bits, which is just below the threshold that warrants a design inference.
So you see, in order to dismiss the design inference for ATP synthase, you have to imagine that there exist about 10^250 dissimilar solutions in the sequence space!!! Given that no evidence whatsoever exists that a single alternative dissimilar solution exists for implementing the function of ATP synthase, your case is, say, weak, to say the least.

The problem with all this is twofold. First of all, as just stated, even supposing there were that many different functional proteins for the molecule, they couldn’t all physically exist or be generated by the evolutionary process, so any estimation of FI based on similarity would be unable to correctly estimate it, since it’s merely an extrapolation based on extant known variation. Thanks for making that point for me.

Second: ATP synthase subunit beta evolved from simpler precursors able to perform similar functions (which, ironically, is related to the ATP binding function evolved by Keefe & Szostak 2001). We’ve been over this.

That would also make FI useless for giving any hints about whether some protein function is evolvable, because even if that function now is very rare in sequence space, it is possible it can evolve incrementally from a simpler precursor that is much more frequent in that space. All the evidence shows that this is the case for the relationship between the extant and ancestral function of ATP synthase subunit beta. These matters are even further complicated by the fact that proteins can have multiple functions, so one function that is highly abundant in sequence space can give rise to another that is very rare but happens to overlap in some cluster of sequences.

For these reasons you simply can’t establish the relationship between FI and sequence space based on homologous sequences generated over evolutionary history, and even if you could do that, you can’t derive from that relationship that protein X could not evolve, because you’re still only considering a sort of de novo evolution where the function has to emerge as-is, instead of deriving from a simpler and more frequent function, or an entirely different one.

You’re stating that he cannot calculate it, yet the evidence you claim is there for different functional sequences is based on evolutionary theory and not on their factually being there. I think this is what Koonin calls an adaptationist “just so” story.

He shows that you can have an exorbitant number of solutions and his claim still holds.

His calculation is meaningless because you believe the “just so” story is true. What if the “just so” story is false?

No, it’s based on the very same thing you are using to derive FI: sequence similarity.

You either accept that you can infer a homologous relationship based on nesting hierarchical structure in similar sequences, or you do not. If you make an estimation of FI based on similar sequences found in some database, you have implicitly accepted that the sequences are related. And thus that homology can be established from tree structure in similar sequences.

You can’t then suddenly and arbitrarily turn around and insist the method is “theory” and “just so story” when that VERY EXACT SAME METHOD is used to show ancestries deeper and more divergent than you are using it for. Then you are having your cake and eating it too.

I think this is what Koonin calls an adaptationist “just so” story.

LOL. Guess who co-authored this phylogenetic analysis of the entire P-loop NTPase superfamily of proteins?
Leipe DD, Koonin EV, Aravind L. Evolution and classification of P-loop kinases and related proteins. J Mol Biol. 2003 Oct 31;333(4):781-815. DOI: 10.1016/j.jmb.2003.08.040

Bill, you either accept phylogenies as evidence for homologous relationships, or you don’t. When you are doing FI calculations based on extrapolating tolerated variation from alignments of similar sequences, you have already accepted that. So using that very same evidence, we can show the evolutionary histories of the proteins in question go back, incrementally, to even simpler stages, sometimes with different functions and even structures.

Sometimes the sequences are not so similar. We are looking at sequences based on similar or the same function. Beta lactamase is an example of a protein that has similar function but divergent sequences.

So what?

Your claim:

Is false.
We are looking at similar function.

No, it’s not false. Many (actually the vast, vast majority) of the functional annotations found in databases are based on sequence similarity (sometimes including synteny of surrounding genes and non-coding DNA). Scientists have generally not redone the complete experimental biochemical assays that show the actual biochemical functions of these proteins for every new species they are discovered in.

And even where they have done assays that show the functions of different proteins, Gpuccio has not been deriving his numbers from different proteins with similar functions. He’s just done BLAST searches looking for proteins with similar sequences.

This is not the method we are using. The function of alpha actin is clear, as the function of beta lactamase is also clear. One protein lines up well, the other does not. The comparison is based on functional similarity, not sequence alignment. I understand the maturity of the databases is an issue for certain comparisons.

This is false. You need to rethink your argument.

Hey, it gets worse. Gpuccio implicitly accepts the inference of incremental increases in protein size and complexity over the history of some clade. That’s how he derives his so-called “information jumps”, for example from some protein in an inferred ancestral vertebrate to the human version of the sequence.

Hence Gpuccio implicitly accepts that this historical growth has occurred, and has an even deeper ancestry. Read the link.

Yes, that is the method you are using. You clearly have no idea what’s going on, again.

No, it’s not false. Please go and read what Gpuccio actually does, from his own posts. Click the link above.

After rereading this I see where you are coming from. He is looking at basically the same function and not similar function. I think this is where the disconnect is. Where I think you are still in error is claiming that

The sequences may be similar and may not be. It depends on the variation observed from the specific function in different animals.

No, the disconnect still is that you don’t understand that his method is based on similarity searches, because he literally, explicitly does similarity-based searches when he’s looking for sequences in other species that are homologous to the one he’s interested in.

There isn’t any way around this.

But I’m not, for reasons already explained.

Your subjective view on the degree to which the sequences are “similar” is not a relevant factor here, Bill.

What matters to the question of whether the method is based on similarity or not is how Gpuccio finds and includes a sequence to use so as to extrapolate the amount of tolerated variation for some sequence to implement a function. He does that by doing similarity-based searches. That is what happens when you use a BLAST search tool.

Even were he to use names of proteins from different species (in other words, let’s say he tries to avoid using BLAST), say he wants to find ATP synthase subunit beta in some unicellular eukaryote, he could just be searching for that (“ATP synthase subunit beta”) on UniProt for example, and then choose to sort results by taxonomy. He’d find lots of candidate gene sequences that haven’t actually been experimentally characterized to be ATP synthase subunit beta, but merely inferred to be that on the basis of some sort of similarity measure. That’s how these genes are often automatically annotated in these databases.

Only a very small subset of them have been experimentally characterized to function in some specific way expected from their similarity.

Now, these BLAST-based searches used to collect homologous sequences for the extrapolation of FI are in fact based on sequence similarity. This in turn means that to calculate FI you are implicitly accepting that the similar sequences you use in your calculation are in fact homologous. It would then be hypocritical to suddenly and arbitrarily reject similarity-based inferences of relatedness when those very same similarity-based searches can be used to show deeper ancestral relationships to proteins with different functions, or with simpler structures and shorter, more likely sequences.

You can’t have your cake and eat it too.

His claim has nothing to do with functional information.

How many different types of beta-lactamase are there, Bill? Are all of their sequences homologous?

How are you doing on coming up with a design explanation for all of those MYH7 alleles in healthy humans?

According to Bill, such experiments can be dismissed as “just so” stories. :smile: