Gil grabs some ammunition and shoots down Doug Axe's 2004 extrapolation by a factor of more than 10^44

Roy · February 5, 2020, 6:04pm

Having looked at the plot, the immediate suspicion is that there isn’t actually an information jump at all (because such a big difference in information would require the human version of these proteins to be five times as long as the echinoderm ones) but the plotter is merely showing the level of difference between the human vs animal sequences, and fish/mice have higher numbers because they diverged from the human lineage more recently than echinoderms and tunicates.

Having found the source (a post by gpuccio on Uncommon Descent), it transpired that was exactly what was done:
The evolutionary history of those six protein is summarized in the following graph, realized as usual by computing the best homology bit score with the human protein in different groups of organisms.

So there is no information jump to explain - the echinoderm and human proteins contain about the same amount of information. The information is merely different, and the plot merely reflects divergence time. In fact, gpuccio doesn’t even confirm that the differences are due to changes in the human/mouse/fish lineages as opposed to changes in the metazoan/echinoderm/tunicate lineages! As far as that paper is concerned it may be showing an information ‘jump’ from chordates to echinoderms, and I’m sure that if gpuccio had calculated his ‘information’ values by comparisons to echinoderms rather than humans that is exactly what it would show.

So this isn’t merely a failure to plot anything relevant, it’s also a failure to understand that phylogenies are bush-like not ladder-like, and that echinoderm lineages have evolved for exactly as long as the human lineage since they split.
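To make the point concrete, here is a minimal sketch of why a "similarity to one reference species" plot tracks divergence time. It is not gpuccio's method (he uses BLAST bit scores); it swaps in expected sequence identity under a 20-state Jukes-Cantor-style model. The clade ages, the substitution rate and every name in the code are hypothetical round numbers chosen for illustration, not values from the plot.

```python
import math

# Hypothetical round-number clade ages in millions of years (illustrative only).
CLADE_AGE_MYR = {
    "deuterostomes": 600.0,
    "chordates": 550.0,
    "vertebrates": 430.0,
    "mammals": 90.0,
    "echinoderms": 480.0,
}

# Each taxon's nested clades, listed from the root outward.
LINEAGE = {
    "human": ["deuterostomes", "chordates", "vertebrates", "mammals"],
    "mouse": ["deuterostomes", "chordates", "vertebrates", "mammals"],
    "fish": ["deuterostomes", "chordates", "vertebrates"],
    "tunicate": ["deuterostomes", "chordates"],
    "urchin": ["deuterostomes", "echinoderms"],
    "starfish": ["deuterostomes", "echinoderms"],
}

RATE = 0.001  # assumed substitutions per site per Myr, identical on every branch


def split_age(a, b):
    """Age of the most recent clade shared by taxa a and b."""
    age = None
    for clade_a, clade_b in zip(LINEAGE[a], LINEAGE[b]):
        if clade_a != clade_b:
            break
        age = CLADE_AGE_MYR[clade_a]
    return age


def expected_identity(age_myr):
    """Expected fraction of identical residues for two lineages that split age_myr ago."""
    d = 2.0 * RATE * age_myr  # both lineages accumulate changes since the split
    return 1.0 / 20.0 + (19.0 / 20.0) * math.exp(-(20.0 / 19.0) * d)


def profile(reference):
    """Expected identity of every other taxon to the chosen reference."""
    return {
        taxon: expected_identity(split_age(reference, taxon))
        for taxon in LINEAGE
        if taxon != reference
    }


for ref in ("human", "urchin"):
    print(ref, {taxon: round(v, 3) for taxon, v in profile(ref).items()})
```

With human as the reference, similarity steps up from echinoderms to tunicates to fish to mouse. With the urchin as the reference, every chordate scores the same and the only step up is at the starfish. Same tree, same rate, no information added anywhere: the "jump" moves with the choice of reference.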

There’s nothing here for anyone to explain, unless it’s gpuccio and Bill explaining why they consistently fail to understand one of the most basic concepts of evolution.

colewd · February 5, 2020, 6:14pm

He is measuring functional information not Shannon information.

Rumraket · February 5, 2020, 6:15pm

All of those concerns were raised to Gpuccio himself when he was here about 6 months ago.

Rumraket · February 5, 2020, 6:23pm

That doesn’t change the fact that what he’s doing is meaningless as an estimate of FI since all he measures is the degree of relatedness. He’s really just finding some other nebulous way to show that as you find increasingly distantly related organisms on the tree of life, you find increasingly dissimilar protein sequences. Calling this an “information jump” is nonsensical.

colewd · February 5, 2020, 6:45pm

Based on the measure (human conserved sequences) we are observing a functional information jump closer to the human sequence 400 million years ago. It’s an observation and a viable explanation as the change does not appear to be gradual.

Faizal_Ali · February 5, 2020, 6:48pm

Does it ever cross your mind, Bill, that when people who are experts in fields about which you know absolutely nothing keep saying you are wrong, that you might actually be wrong?

colewd · February 5, 2020, 6:54pm

Sure, I might be wrong. I learned from Rum that there is a weakness in the uniprot database. You may have noticed I liked his post. As for the information jump that gpuccio identified, there is no viable counter yet, only assertions.

Faizal_Ali · February 5, 2020, 6:56pm

He identified no “information jump.” It’s make believe. That’s what the people who understand this stuff as part of their jobs have been telling you. For months.

colewd · February 5, 2020, 7:04pm

What is the argument that supports this assertion?

Rumraket · February 5, 2020, 7:30pm

So you now realize that Gpuccio’s method is based on the implicit acceptance that nesting hierarchical structure in the sequences of similar genes implies relatedness?

If you see similarity-based annotation as a weakness of the databases, then by implication you must also reject Gpuccio’s method of estimating FI (from similar sequences found in those databases) as having zero basis in fact. Hence you have no basis for claiming what the FI of any biological polymer is.

You can’t have it both ways.

colewd · February 5, 2020, 8:07pm

I think you are letting your personal bias get the best of you. This is made clear by statements of exaggeration like “zero basis in fact”. While your point about the database maturity is an excellent one, saying there is no value to the measurement is a stretch.

We have a method and it needs the databases to mature for that method to improve.

Rumraket · February 5, 2020, 8:32pm

No, it’s you who has to decide what you think can be derived based on shared similar sequences. If you don’t think it can be used to infer relatedness, then by implication you can’t do what Gpuccio is doing because that’s the only tool he has.

You’d need direct biochemical evidence for function for all these sequences, and you’re just never going to get that. There are too many species and too many genes. At best you’re going to get expressed protein sequences, which means you still only have a sequence-similarity-based inference of function.

But that is what you are implying when you suddenly and arbitrarily turn around and reject sequence-similarity based methods for inferring relatedness.

So you do accept the inference of relatedness based on nesting hierarchical structure in the sequences of shared similar genes? Then you must deal with the evidence for the simpler ancestries of these genes or you are applying a hypocritical double standard.

Mercer · February 5, 2020, 9:18pm

He explained it very clearly.

Faizal_Ali · February 5, 2020, 9:43pm

Well not me, exactly. You, @Rumraket, @Roy, others in this discussion and the previous ones about @gpuccio’s idea. Is it any mystery why he ran away from the discussion shortly after it began?

colewd · February 6, 2020, 12:02am

Sorry Faizal but we all have bias sources here. The claims your side is making are iffy at best.

colewd · February 6, 2020, 12:06am

Direct biochemical evidence would be preferred but the appearance of de novo sequences in vertebrates is not zero evidence.

I am not rejecting relatedness either by common descent or common design as possible inferences.

Let me think about this. The nested structure is positive evidence for common descent but this does not explain the appearance of complex sequences that are mutation resistant.

Mercer · February 6, 2020, 12:37am

Bill,

This is why that graph is meaningless. The sequences were literally preselected to give that “jump.”
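A toy illustration of the selection effect, assuming nothing about the real data: draw a hypothetical "change in bit score" for many proteins from one smooth distribution with no special event in it, then keep only the six largest. The distribution, its parameters and the function names are all invented for this sketch.

```python
import random
import statistics


def simulate_changes(n_proteins=1000, seed=0):
    """Hypothetical per-protein bit-score changes drawn from a single smooth distribution."""
    rng = random.Random(seed)
    return [rng.gauss(200.0, 150.0) for _ in range(n_proteins)]


def top_k(values, k=6):
    """Keep only the k largest values, i.e. preselect the most extreme cases."""
    return sorted(values, reverse=True)[:k]


changes = simulate_changes()
selected = top_k(changes)

print("mean change, all proteins:", round(statistics.mean(changes), 1))
print("mean change, six selected:", round(statistics.mean(selected), 1))
```

The six selected values will always sit far above the average of the whole set, simply because they were chosen for being the largest. Plotting only those six says nothing about whether the full set shows a jump.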

colewd · February 6, 2020, 12:42am

The jump in itself is significant. I am impressed he found 6 proteins with over 1000 new bits vs the other species.

Rumraket · February 6, 2020, 12:56am

I’m sorry but that latter part doesn’t make sense as a response to what you quote me as saying. As I was saying, if you are not okay with inferring the same function merely from sequence similarity, then yes, it certainly makes sense for you to desire direct biochemical evidence for function.
But that has nothing to do with anything about “de novo sequences in vertebrates”. Who was talking about de novo sequences in vertebrates? Not me. You might have got something mixed up here and it’s not clear what that is. Is it that talk about information jumps?

The “information jump” in vertebrates that Gpuccio attempts to infer does not involve any de novo sequences. He’s deriving that idea from the observation of a large degree of change in sequences over some period of time, he’s not saying there’s some new protein sequence that suddenly pops up out of nowhere where before there was none.

Good, that’s a first step I guess. But it’s also a bit too vague to be meaningful with respect to our argument here. You’re not rejecting it as possible inferences? Okay, but then when do you actually reject it and why?

Do you reject that the sequences analyzed in this paper on the phylogenetic analysis of the P-loop NTPase superfamily, are actually related? And if you do, why?
Leipe DD, Koonin EV, Aravind L. Evolution and classification of P-loop kinases and related proteins. J Mol Biol. 2003 Oct 31;333(4):781-815. DOI: 10.1016/j.jmb.2003.08.040

Abstract

Sequences and structures of all P-loop-fold proteins were compared with the aim of reconstructing the principal events in the evolution of P-loop-containing kinases. It is shown that kinases and some related proteins comprise a monophyletic assemblage within the P-loop NTPase fold. An evolutionary classification of these proteins was developed using standard phylogenetic methods, analysis of shared sequence and structural signatures, and similarity-based clustering. This analysis resulted in the identification of approximately 40 distinct protein families within the P-loop kinase class. Most of these enzymes phosphorylate nucleosides and nucleotides, as well as sugars, coenzyme precursors, adenosine 5’-phosphosulfate and polynucleotides. In addition, the class includes sulfotransferases, amide bond ligases, pyrimidine and dihydrofolate reductases, and several other families of enzymes that have acquired new catalytic capabilities distinct from the ancestral kinase reaction. Our reconstruction of the early history of the P-loop NTPase fold includes the initial split into the common ancestor of the kinase and the GTPase classes, and the common ancestor of ATPases. This was followed by the divergence of the kinases, which primarily phosphorylated nucleoside monophosphates (NMP), but could have had broader specificity. We provide evidence for the presence of at least two to four distinct P-loop kinases, including distinct forms specific for dNMP and rNMP, and related enzymes in the last universal common ancestor of all extant life forms. Subsequent evolution of kinases seems to have been dominated by the emergence of new bacterial and, to a lesser extent, archaeal families. Some of these enzymes retained their kinase activity but evolved new substrate specificities, whereas others acquired new activities, such as sulfate transfer and reduction. 
Eukaryotes appear to have acquired most of their kinases via horizontal gene transfer from Bacteria, partly from the mitochondrial and chloroplast endosymbionts and partly at later stages of evolution. A distinct superfamily of kinases, which we designated DxTN after its sequence signature, appears to have evolved in selfish replicons, such as bacteriophages, and was subsequently widely recruited by eukaryotes for multiple functions related to nucleic acid processing and general metabolism. In the course of this analysis, several previously undetected groups of predicted kinases were identified, including widespread archaeo-eukaryotic and archaeal families. The results could serve as a framework for systematic experimental characterization of new biochemical and biological functions of kinases.

I agree it does not. It is not supposed to. The ultimate origin of a sequence that subsequently diverges into a large superfamily of proteins is not explained merely from the inference that it diversified into that superfamily.

It does however allow us to make inferences about what it was that first originated, as in what the ancestral sequence and function was, and how it then subsequently changed and evolved into the sequences we see today.

Roy · February 6, 2020, 12:46pm

I didn’t mention Shannon information.
