You’re channeling Nigel Tufnel again. You haven’t even made the smallest effort to show that (or to find out whether) gpuccio’s “FI” has any relationship at all with functional information.
Wrong, because the same allele can have a negative effect on fitness on one background and a positive effect on a different background.
Do you know what epistasis is? If so, why are you pretending it doesn’t exist?
Your hypothesis said nothing about which variants (not mutations) you’d consider. Do you not understand that you need to bake all of your assumptions into the hypothesis before you see the data, so you don’t cheat as you’re doing now?
If a given position of a given protein exhibits conservation through deep time, it means that a mutation at that position would very likely have a negative fitness effect on all the considered backgrounds.
All conservation can show you is how many changes can occur from one single starting point. Sequence conservation can’t tell you how many starting points there are.
Furthermore, ID supporters keep asking for examples of evolution producing high levels of FI in living populations. Obviously, this is a dishonest request, since they measure FI by using sequence conservation over hundreds of millions of years. In fact, if an ID supporter had been there hundreds of millions of years ago for the evolution of the ancestral sequence of what is now a highly conserved protein, they would claim that it had very low FI when it first evolved.
Most natural proteins adopt folded structures to perform their functions, sure. That doesn’t mean that a functional protein has to adopt a fold to function, nor that functional proteins are rare in sequence space.
You are again confusing functions with folds, and proteins in their extant forms with their ancestral states.
Are they? How rare is “rare”?
How do you know that? Have you tested some fold for all possible functions, under all possible physical conditions?
There are at least two problems with this. First of all, those premises merely beg the question. Second is your apparent, continued misconception that many of the functional proteins we see in extant life must have sort of just popped into existence in their present form de novo, rather than evolving incrementally from simpler precursors, or by rearrangement and fusion of other protein fragments. You are ignoring phylogenetic evidence.
Well no, it’s worse than that. When the protein first evolved they would ignore it, wait 400 million years, and THEN claim it has high FI and that it couldn’t evolve.
But the amount of divergence the protein would have to undergo over those 400 million years, to have a large impact on the FI calculation, is effectively physically impossible. We would have to find more variants of the protein, having become fixed or at least detected in some lineage, than there are numbers of individual cells that have ever lived.
It is not physically possible for sequence divergence over 400 million years to result in as many protein variants as would be required to bring the FI down from some value like 700, or 1000, to below the arbitrarily set threshold of 500 bits FI, as I have argued at length. That holds even if we don’t require that they become fixed, only that they existed for some fleeting moment.
Suppose we have a 300 amino acid protein and we only know of one variant of it. That’s 1297 bits of FI for that one variant. We then discover that every single cell that has ever existed in the history of life made its very own mutant version of that protein, so we have that much variation known. Let’s be extremely charitable and say as many as 10^50 cells have lived during the history of life (so one novel variant per cell, aka 10^50 different known variants of the protein). What does that bring the FI down to? 1130 bits.
With 10^50 different variants, we go from 1297 to 1130 bits. Let’s exaggerate to the extreme and propose that every single one of those 10^50 cells that ever lived produced one thousand different versions of it each, all of them different. So each of those 10^50 cells produced one thousand variants of the protein. What does that do to those 1130 bits? They become -log2(10^53/20^300) ≈ 1121 bits FI.
Even if every single cell that ever lived produced one thousand of its very own sequence variants (detectable as homologous based on similarity) of the protein in question, that would not result in enough variation to make much of a dent in the FI calculation at those scales.
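For anyone who wants to check the arithmetic above, here is a minimal Python sketch. It assumes the simple definition used in this post, FI = -log2(known functional variants / 20^length); the function names `fi_bits` and `log10_variants_needed` are my own and are not taken from gpuccio's procedure or from any library.

```python
import math

def fi_bits(length, n_functional, alphabet=20):
    """FI = -log2(n_functional / alphabet**length), computed in log space
    so that the huge number alphabet**length never has to be a float."""
    return length * math.log2(alphabet) - math.log2(n_functional)

def log10_variants_needed(length, threshold_bits, alphabet=20):
    """log10 of the number of distinct functional variants that would have
    to be known for FI to drop to threshold_bits (assumed helper)."""
    bits_to_remove = length * math.log2(alphabet) - threshold_bits
    return bits_to_remove / math.log2(10)

if __name__ == "__main__":
    # The three scenarios discussed above for a 300-residue protein.
    for label, n in [("1 known variant", 1),
                     ("10^50 variants", 10**50),
                     ("10^53 variants", 10**53)]:
        print(f"{label}: {fi_bits(300, n):.1f} bits")
    # Variants needed to get a 300-residue protein under 500 bits.
    print(f"needed for 500 bits: about 10^{log10_variants_needed(300, 500):.1f}")
```

Under that definition the three scenarios come out at roughly 1296.6, 1130.5 and 1120.5 bits, and getting a 300-residue protein under the 500-bit threshold would take on the order of 10^240 distinct known variants.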
The estimation of FI based on extrapolation from alignments of homologous sequences appears to have been carefully designed to represent an impossible hurdle to overcome. It is not physically possible for all of life to have generated the amount of variation IDcreationists are demanding to be shown. Which means the very method simply begs the question against evolution.
Even if there really were so many different possible functional variants of the protein, there’s no physical way life could diverge and produce them during its entire history, even under completely unrealistically generous assumptions. The whole thing is fatuous in the extreme.
Gil’s claim is incoherent, because the function of many, many proteins is literally to change structure.
This claim from Gil also is incoherent. “Folds” are particular structures. Virtually all proteins, even randomly synthesized ones, fold, because some residues are hydrophilic and some hydrophobic.
I know I keep saying this, but it’s even worse! Many natural functional proteins have intrinsic disorder, yet are functional.
Lieutaud P, Ferron F, Uversky AV, Kurgan L, Uversky VN, Longhi S. How disordered is my protein and what is its disorder for? A guide through the “dark side” of the protein universe. Intrinsically Disord Proteins. 2016 Dec 21;4(1):e1259708. DOI: 10.1080/21690707.2016.1259708
In protein science, the existence of intrinsic disorder in proteins has been known for a long time. This is in spite of the fact that it contradicts the classical protein sequence-structure-function paradigm where the “lock-and-key” model is used to explain how a protein can achieve its biological function via folding into a unique, highly structured state determined by its amino acid sequence [14]. IDPs and IDPRs constitute a part of the “dark proteome” that includes entire proteins or protein regions for which the molecular conformation is entirely unknown [15]. Traditional ordered proteins have a relatively stable 3-D structure possess Ramachandran angles that vary only slightly around their equilibrium positions with occasional cooperative conformational switches. On the other hand, IDPs/IDPRs, despite being biologically active, fail to form specific 3D structures and exist as highly dynamic structural ensembles, either at the secondary or at the tertiary level [5,6,16-21]. Furthermore, intrinsic disorder is characterized by high structural heterogeneity.
In fact, it is now recognized that IDPs/IDPRs may contain collapsed disorder (where the intrinsic disorder is present in a molten globular form) and extended disorder (where intrinsic disorder is present in a form of random coil or pre-molten globule) under physiological conditions in vitro [5,20,22]. It has also been shown that, in addition to completely ordered and disordered regions, proteins may contain regions of semi-disorder; i.e., fragments that have ∼50% predicted probability to be ordered or disordered [23]. Such semi-disordered regions have been shown to play key roles in protein aggregation, and to participate in protein-protein interactions involving induced folding [23]. The currently available structural data has been used to suggest that the heterogeneous spatiotemporal structure of IDPs/IDPRs can be described as a set of foldons, inducible foldons, semi-foldons, non-foldons, and unfoldons [21,24]. The discovery of IDPs and IDPRs, which would not have been possible without bioinformatics, has drastically expanded the understanding of protein functionality, and exposed new and unexpected roles of dynamics, plasticity, and flexibility in the context of protein functions.
(…)
While there are IDPs/IDPRs that are able to perform their function while remaining completely disordered (e.g. entropic chains), many such proteins and regions experience a disorder-to-order transition after binding to their physiological partner(s), known as “induced folding” [65]. The functional relevance of disorder is the result of increased plasticity which allows for binding numerous and structurally distinct targets. Consequently, intrinsic disorder is a common and distinctive feature of “hub” proteins, with disorder acting as a measure of protein promiscuity [66]. As such, the majority of IDPs are involved in functions that involve multiple partner interactions, such as molecular assembly, molecular recognition, signal transduction and transcription, and cell cycle regulation [67].
Hilariously, natural proteins are on average MORE disordered than random protein sequences:
Yu JF, Cao Z, Yang Y, Wang CL, Su ZD, Zhao YW, Wang JH, Zhou Y. Natural protein sequences are more intrinsically disordered than random sequences. Cell Mol Life Sci. 2016 Aug;73(15):2949-57. DOI: 10.1007/s00018-016-2138-9
Of course! They are not as frequent as the ones that change conformation, though.
That would fit with much of biology involving transitions between METAstable structures. The prion protein is much more ordered and stable in the prion conformation than it is in its functional conformation, and it is also quite deadly!
I have looked at 150 human substitutions in the database for the MYH7 protein, and all but one required only 1 nucleotide substitution for the missense mutation. Most of these were associated with a health problem.
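For readers who want to repeat that kind of count, here is a rough Python sketch of how the number of nucleotide changes behind an amino acid substitution can be worked out from the standard genetic code. The function names are hypothetical, and no MYH7 data is included; the real answer for any given variant depends on the actual reference codon at that position, which this sketch takes as an input.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def codons_for(aa):
    """All codons that encode the one-letter amino acid aa."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]

def hamming(c1, c2):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(c1, c2))

def changes_from_codon(ref_codon, alt_aa):
    """Fewest nucleotide changes from a specific reference codon to alt_aa."""
    return min(hamming(ref_codon, c) for c in codons_for(alt_aa))

def min_nt_changes(ref_aa, alt_aa):
    """Lower bound over every possible reference codon for ref_aa."""
    return min(changes_from_codon(c, alt_aa) for c in codons_for(ref_aa))

if __name__ == "__main__":
    print(min_nt_changes("R", "H"))        # best case over all Arg codons
    print(changes_from_codon("AGA", "H"))  # same substitution from codon AGA
```

The two calls at the bottom illustrate why the reference codon matters: Arg to His is reachable in one step from CGT, but not from AGA.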
Most of the people harboring most of these variants are perfectly healthy people because of epistasis, so they have to be counted as working sequences.
If you’re not desperately trying to cheat. You wouldn’t do that, would you, Bill?
You don’t have a clue about epistasis, do you, Bill?
It would be interesting to have a list of, say, 20 to 30 neutral missense SNPs for the human MYH7 protein, the prediction being, according to gpuccio’s reasoning, that the majority of these SNPs will land at positions that do not exhibit conservation through deep time. The problem is that I don’t have the resources to compile such a list.
Those are estimations for the rarity of particular folds, not all possible folds relevant to some particular function, much less the frequency of functions regardless of how it is achieved.
I ask again, where is your evidence that functional proteins are too rare in sequence space to evolve? You seem to have this as some sort of axiom, but the only thing you seem able to argue to that effect is to insist (incorrectly) that some particular protein fold must evolve de novo otherwise you can’t have a functional protein.
That was an obvious moving of the goalposts from “particular fold” to “folded structure,” Gil.
Besides, many proteins do not adopt a stable folded structure until they bind a ligand.
Again, a massive amount of biology is about proteins transitioning from one structure to a very different one. Troponin is a fine example of that, to keep us focused on Bill’s choice of the sarcomere.
What do those numbers do to Doug Axe’s extrapolation?
A design hypothesis would predict no correlation between the numbers of known proteins with each fold (folds do not correspond to functions) and the SC. In fact, the opposite is true, as the authors note that it is correlated with the size of a gene family. How do you explain that with a designer?
How many such proteins I happen to personally know of is completely irrelevant to the question of whether functional proteins are too rare to evolve, and what evidence you have for your apparent a priori belief that they are too rare to evolve.
Oh and btw, the article you linked also shows that SC* (aka the fraction of all sequences able to adopt the fold in question) is anti-correlated with protein age. In other words, proteins with more recent origins appear to have structures that are more likely to emerge de novo. Which implies that more complex and unlikely protein folds are older, because they evolved from simpler and more likely precursors. You bring a paper but seem to have read it with a sort of tunnel vision, focusing entirely on the Big Numbers in that one table. Let me bring your attention to figure 5:
This has of course not escaped the notice of the authors, who write in the conclusion:
Our SC estimates for the CATH database enable us to estimate the total SC of the known universe of protein structures, and to correlate the SC of a fold with its evolutionary age. We find that more recently evolved proteins have higher SC∗, which may be an advantage for initial discovery of a folded structure, but that more ancient proteins have a higher absolute SC, suggesting that evolution guides proteins toward more designable structures.
Here is the UniProt list of human MYH7 variants. There are about 200 of them, and they are listed after you scroll down the page. The associated disease can be found by clicking the publication tab. If you google genetic code you will find a letter to amino acid conversion table.
But are not the subset of folded structures which are able to perform biological functions subject to continuous change and adaptation to new functions? They are not necessarily fragile and exact entities which become useless when subject to the slightest change. For instance, modifications to photopsins may exhibit useful differences in photosensitivity to different light frequencies, and a case of a woman who exhibited functional tetrachromacy has been documented. A billion years or so of tweaking protein folding by the Earth’s biosphere can come up with a lot of useful proteins. In fact, this seems to be going on all the time in biology, especially with the never-ending dance of infectious agents and organism membrane defenses, even apart from the immune system.