Durston's FI

From my understanding, this can only tell us which residues we can change in an initial protein without losing the function it started with. It can’t tell us how many functional protein sequences were available at the very start. Evolution is heavily influenced by historical contingency (think of the butterfly effect), and I think ignoring this is biasing your analysis. You would also need to factor in contingencies caused by epistasis, where a neutral mutation can later become essential for stabilizing an otherwise deleterious mutation.

Fitness landscape modelling would need to be part of your analysis. Once you start from an existing function, you are throwing away a much larger landscape of function and focusing only on the mutations that move up the landscape from a single starting point. I don’t see a way for your method to detect all of the possible starting points in that fitness landscape.

It tells us less than that, even. Conservation can tell us which residues we can change without reducing the functionality of the protein. There might be many mutations possible that would preserve some function – enough to provide a path to the protein – that are no longer available for evolution to take because they have lower fitness than the well-established protein that we’re examining. Adaptive evolution involves a pretty effective one-way filter on fitness changes. This means that sequence conservation is a badly biased estimator of the density of functional states near an existing, well-adapted gene.
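
To make the bias concrete, here is a minimal sketch. It assumes the usual per-site reading of Durston-style functional information (log2 of the alphabet size minus the Shannon entropy of the alignment column) and a made-up single site at which four residues would preserve some function but selection has retained only one. The function names and the toy columns are hypothetical, not taken from any published pipeline.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues seen in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def functional_bits(alignment, alphabet_size=20):
    """Sum over columns of log2(alphabet_size) minus the column entropy.

    Assumption: this is the simple per-site estimate, with a uniform
    20-letter null state and no correction for sample size or phylogeny.
    """
    columns = zip(*alignment)
    return sum(math.log2(alphabet_size) - column_entropy(col) for col in columns)


# Hypothetical one-site example.
# What we observe today: selection has fixed the single fittest residue.
observed = ["L", "L", "L", "L"]
# What would actually retain some function at that site (toy assumption).
tolerated = ["A", "V", "L", "I"]

observed_bits = functional_bits(observed)
tolerated_bits = functional_bits(tolerated)

print(f"estimate from conserved alignment: {observed_bits:.2f} bits")
print(f"estimate if all tolerated residues were sampled: {tolerated_bits:.2f} bits")
```

In this toy case the conserved column reports the full log2(20) bits for the site, while the set of residues that would still function implies only log2(5) bits. The gap is the kind of overestimate the one-way filter produces; the real size of that gap for any given protein is an empirical question this sketch does not answer.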

Myosins are a perfect example of such a family:
https://www.researchgate.net/figure/Different-classes-of-myosin-superfamily-Phylogenetic-tree-of-114-myosin-sequences_fig7_221686367

There is a good test case for #2 in this family: many clinical papers document mutations in the beta-cardiac myosin heavy chain (MYH7) gene. Nearly all of these variants are likely to be functional, large numbers of them have been assayed for enzymatic activity, and they only rarely cause disease (epistasis is the likely reason).

I don’t think there is a good case for #2. The region of sequence space sampled by common descent is tiny compared to all of sequence space.
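
A rough back-of-the-envelope version of that comparison, as a sketch only: the sample size of 114 comes from the title of the myosin figure linked above, while the protein length of 1000 residues is a round number I am assuming for illustration, not a measured myosin length.

```python
import math

ALPHABET_SIZE = 20   # standard amino acids
n_sequences = 114    # sequences in the linked myosin tree (from the figure title)
length = 1000        # hypothetical protein length, chosen only for illustration

# Work in log10 so the numbers stay readable.
log10_space = length * math.log10(ALPHABET_SIZE)        # log10 of 20**length
log10_fraction = math.log10(n_sequences) - log10_space  # log10 of sampled fraction

print(f"log10(size of sequence space) = {log10_space:.1f}")
print(f"log10(fraction sampled)       = {log10_fraction:.1f}")
```

Making the sample a million times larger shifts the second number by only 6, so the conclusion does not depend on the exact sample size.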

I don’t either. The natural, yet functional, variation we observe for MYH7 would provide a data set that likely would show that.

You’re right. I’m saying that myosins are a good case for TESTING #2. I’ve modified my comment to clarify.