He hasn’t actually done that. He has attempted to do that, but the problem is he’s taking sequence variations from the known diversity of life to be a proxy for the total “target space” above the minimum threshold.
There’s a big issue with that approach, namely that what we’re seeing is the product of history, a combination of drift and mostly negative selection, not the total possible diversity of functional sequences that could support life.
To pick just a simple example, you can go into some database right now and find some human protein-coding gene, as Gpuccio did. It will have some canonical sequence listed as the main “normal” isoform that isn’t associated with any disease state.
Okay, then you can do the same for chimpanzee, and gorilla, and so on and so forth. And you can go through the database and find lots and lots of different sequences and collect a diverse set of similar sequences for this protein from several different lineages in some larger clade (like vertebrates, or all animals).
Gpuccio does this, and then defines this as his target space of functional sequences above the “minimum threshold”. Any other imaginable sequence variant is taken to be outside of the target space, as in below the minimum threshold for function (in the parlance of Hazen & Szostak).
Why is this a problem? Because many other, “non-canonical” sequences can actually support life, they just have lower fitness. That’s why we don’t see them at high frequencies in any population of course, they’ve been selected against as they’re associated with some disease state or lower performance. That also means they’re not likely to turn up in the kinds of simplistic searches Gpuccio is doing as he’s combing through some protein homologue database when he decides to sample 10-20 different species.
Going back to the human protein, how many actual mutant variants of that protein exist out there in the human population, some of which cause disease, others of which are neutral variants at low frequencies? They’re not included in Gpuccio’s calculation, and some of them probably haven’t even been sequenced.
Now realize the same applies to any species out there with a homologue of this protein. There will be chimpanzees with mutant versions of this protein who haven’t had their genomes sequenced, or whose variants didn’t make it into some database as the canonical or main isoform that Gpuccio used in his collection.
Same for gorilla, and orangutan, and so on through the entire diversity of life that carries a homologue of this protein.
As should be pretty obvious, Gpuccio is nowhere NEAR doing the kind of work he would have to be doing to even begin to make a case for knowing the actual FI of ANY protein. He would have to have good reasons for thinking he has very substantially sampled the total diversity of homologues of the protein that would meet the minimum threshold for function (instead of just variants with lower fitness): the ability to support life for the organism in question. But he has no reason for thinking he is anywhere near doing this.
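To make the sampling problem concrete, here is a minimal toy sketch in Python. It is not Gpuccio’s actual procedure and it uses no real protein data: the 4-letter alphabet, the 6-residue length, the made-up fitness function and both thresholds are assumptions picked only so the whole sequence space can be enumerated. It computes functional information in the Hazen & Szostak sense, as minus log2 of the fraction of sequences at or above a threshold, once using every viable sequence and once using only the near-optimal sequences that would plausibly show up as canonical entries in a database.

```python
import itertools
import math

# All names and numbers below are hypothetical, chosen for illustration only.
ALPHABET = "ACDE"      # toy 4-letter alphabet instead of 20 amino acids
LENGTH = 6             # toy sequence length
OPTIMUM = "AAAAAA"     # the single fittest sequence in this toy landscape

MIN_THRESHOLD = 0.5    # assumed minimum fitness that still supports life
SEEN_THRESHOLD = 0.8   # assumed fitness needed to become a common, "canonical" variant


def fitness(seq):
    """Toy fitness: every mismatch from the optimum costs 0.15."""
    mismatches = sum(a != b for a, b in zip(seq, OPTIMUM))
    return max(0.0, 1.0 - 0.15 * mismatches)


def functional_information(n_functional, n_total):
    """FI = -log2(fraction of sequence space counted as functional)."""
    return -math.log2(n_functional / n_total)


all_seqs = ["".join(p) for p in itertools.product(ALPHABET, repeat=LENGTH)]
n_total = len(all_seqs)

# Everything that actually meets the minimum threshold, low fitness included.
viable = [s for s in all_seqs if fitness(s) >= MIN_THRESHOLD]
# Only the variants fit enough to be found as the main isoform somewhere.
observed = [s for s in all_seqs if fitness(s) >= SEEN_THRESHOLD]

fi_true = functional_information(len(viable), n_total)
fi_observed = functional_information(len(observed), n_total)

print(f"viable sequences:   {len(viable)} of {n_total}, FI = {fi_true:.2f} bits")
print(f"observed sequences: {len(observed)} of {n_total}, FI = {fi_observed:.2f} bits")
```

In this toy landscape the observed set is a small subset of the viable set, so treating it as the whole target space inflates the FI estimate. How large the gap is for any real protein is exactly what nobody knows without much more thorough sampling.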
But there’s another problem here:
Now the question is whether their evolution by the traditional RV+ NS mechanism is plausible. The answer is no. Why? Because although the RV+NS can in principle produce very high FI (>500 bits), such a performance by RV+NS is only possible if a smooth fitness landscape exists that connect the final target (the complex protein exhibiting high FI) to the starting point.
It turns out most of the sequence differences between species are actually not due to positive natural selection; they’re neutral or very nearly so. Many sequence variants have been discarded by negative selection, but the differences weren’t fixed one after another as a protein was pushed up some smooth hill on the fitness landscape. When we see the differences between homologous proteins in two very distantly related species, we don’t have to think that natural selection drove the fixation of all these differences over long timescales. We don’t have to think that these proteins kept getting better as they were pushed uphill in the landscape through hundreds of mutations.
Rather, these proteins have for the most part been evolving under historical contingency and epistasis. Neutral mutations open the way for other neutral mutations, but once that 2nd neutral mutation happens, the first one can’t change back because its ancestral state now has lower fitness in the context of the 2nd neutral mutation. In this way, proteins evolve under a sort of neutral ratchet where they slowly become more and more dissimilar from their ancestors, and their cousins. Not because selection is somehow fine-tuning them to perform new functions (relatively few of all the mutations that occur when proteins evolve are fixed because they have any novel beneficial effects).
This also means we don’t have to posit that the proteins we see somehow evolved to their present state along a huge smooth slope to the top of some hill in the fitness landscape, because the shape of the hill isn’t actually static: it depends on the already existing sequence elsewhere in the protein. Neutral mutations can open up pathways which, in a different sequence context in the same protein, would have been deleterious (i.e. down some valley).
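The ratchet can be shown with a deliberately tiny toy model in Python. Everything in it is an assumption for illustration: two sites, two alleles each (uppercase is ancestral, lowercase is derived), and a made-up fitness table. It is not data from any real protein. The only point is that the same substitution can be neutral or deleterious depending on what is sitting at the other site.

```python
# Hypothetical two-site fitness table. Uppercase = ancestral allele,
# lowercase = derived allele. The values are invented for illustration.
FITNESS = {
    "AB": 1.0,  # ancestor
    "aB": 1.0,  # first substitution is neutral on the ancestral background
    "ab": 1.0,  # second substitution is neutral once 'a' is present
    "Ab": 0.7,  # 'b' without 'a' is deleterious
}


def effect(before, after):
    """Fitness change caused by moving from one genotype to another."""
    return FITNESS[after] - FITNESS[before]


def classify(delta, tol=1e-9):
    """Label a fitness change as neutral, deleterious or beneficial."""
    if abs(delta) <= tol:
        return "neutral"
    return "deleterious" if delta < 0 else "beneficial"


# The historical path: two neutral steps in a row.
path = ["AB", "aB", "ab"]
forward = [classify(effect(x, y)) for x, y in zip(path, path[1:])]

# Contingency: the second substitution taken directly from the ancestor.
shortcut = classify(effect("AB", "Ab"))

# Entrenchment: reverting the first site after the second has changed.
reversion = classify(effect("ab", "Ab"))

print("forward steps:", forward)
print("b on ancestral background:", shortcut)
print("reverting a -> A on derived background:", reversion)
```

No step along the path was fixed by positive selection, yet the end state can no longer drift back to the ancestral state at the first site, and the second substitution would not have been tolerated without the first. That is the kind of contingency and entrenchment the Hsp90 paper cited below reports for real substitutions.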
Ironically, one of your IDcreationist allies who posts around here, Brian Miller, has supplied some nice references that demonstrate how this works. Work has also been done on ancestor reconstruction (*) where biologists have been able to show how the proteins we see in life have evolved through these historically contingent neutral ratchets. While a few mutations of a novel beneficial character may explain the initial fixation of a novel allele, with a small number of beneficial mutations fixed by initial optimization, what subsequently happens over hundreds of millions of years is that these proteins just sort of drift apart as they slowly accumulate mutations that are initially neutral, but whose reversal becomes deleterious due to negative epistasis.
(*) See for example: Starr TN, Flynn JM, Mishra P, Bolon DNA, Thornton JW. Pervasive contingency and entrenchment in a billion years of Hsp90 evolution. Proc Natl Acad Sci U S A. 2018 Apr 24;115(17):4453-4458. DOI: 10.1073/pnas.1718133115
Abstract
Interactions among mutations within a protein have the potential to make molecular evolution contingent and irreversible, but the extent to which epistasis actually shaped historical evolutionary trajectories is unclear. To address this question, we experimentally measured how the fitness effects of historical sequence substitutions changed during the billion-year evolutionary history of the heat shock protein 90 (Hsp90) ATPase domain beginning from a deep eukaryotic ancestor to modern Saccharomyces cerevisiae. We found a pervasive influence of epistasis. Of 98 derived amino acid states that evolved along this lineage, about half compromise fitness when introduced into the reconstructed ancestral Hsp90. And the vast majority of ancestral states reduce fitness when introduced into the extant S. cerevisiae Hsp90. Overall, more than 75% of historical substitutions were contingent on permissive substitutions that rendered the derived state nondeleterious, became entrenched by subsequent restrictive substitutions that made the ancestral state deleterious, or both. This epistasis was primarily caused by specific interactions among sites rather than a general effect on the protein’s tolerance to mutation. Our results show that epistasis continually opened and closed windows of mutational opportunity over evolutionary timescales, producing histories and biological states that reflect the transient internal constraints imposed by the protein’s fleeting sequence states.