Optima in Evolution

Very rare. You can see this in table 1 of the paper I referred to at 102. Here it is:

The rarity in the sequence space of the 10 protein folds investigated by the authors is given in column 4 under the label SC*.

It would be interesting to have a list of, say, 20 to 30 neutral missense SNPs for the human MYH7 protein, the prediction being, according to gpuccio’s reasoning, that the majority of these SNPs will land at positions that do not exhibit conservation through deep time. The problem is that I don’t have the resources to compile such a list.
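
The prediction above amounts to comparing two fractions: the share of SNP positions that fall at conserved alignment columns versus the share of all columns that are conserved. A minimal sketch, assuming an alignment of MYH7 orthologs and a list of neutral SNP positions are already in hand. The toy sequences below are hypothetical, not real MYH7 data, and "fraction of sequences sharing the most common residue, cut off at 0.9" is an arbitrary, crude conservation score chosen only for illustration.

```python
from collections import Counter


def column_conservation(column: str) -> float:
    """Fraction of sequences sharing the most common residue in one alignment column (gaps ignored)."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    most_common_count = Counter(residues).most_common(1)[0][1]
    return most_common_count / len(residues)


def fraction_at_conserved(alignment: list[str], snp_positions: list[int], threshold: float = 0.9) -> tuple[float, float]:
    """Return (share of SNP positions that are conserved, share of all positions that are conserved).

    Positions are 0-based indices into the aligned sequences, which must all have the same length.
    """
    length = len(alignment[0])
    columns = ["".join(seq[i] for seq in alignment) for i in range(length)]
    conserved = {i for i, col in enumerate(columns) if column_conservation(col) >= threshold}
    snp_share = sum(1 for p in snp_positions if p in conserved) / len(snp_positions)
    baseline = len(conserved) / length
    return snp_share, baseline


# Hypothetical toy alignment and SNP positions, for illustration only.
toy_alignment = ["MKTAYIAK", "MKSAYLAK", "MKTGYIPK", "MKNAYVAK"]
print(fraction_at_conserved(toy_alignment, [2, 5, 6]))
```

If the SNP share comes out well below the baseline across a real list of variants, that would be consistent with the prediction; a real analysis would also need a significance test and a better conservation measure.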

Those are estimates of the rarity of particular folds, not of all possible folds relevant to some particular function, much less of the frequency of functions regardless of how they are achieved.
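
The difference between "one fold" and "any fold that does the job" can be made concrete. If several folds could each support a function, the relevant frequency is the sum of their fractions, and the chance that a library of n random sequences contains at least one hit is 1 − (1 − p)^n. A minimal sketch, assuming independent random sequences; the fold fractions and library size below are placeholders, not figures from the paper.

```python
import math


def prob_at_least_one_hit(p: float, n: float) -> float:
    """Chance that at least one of n independent random sequences has a property of frequency p (0 <= p < 1).

    Equals 1 - (1 - p)**n, computed with log1p/expm1 so that a tiny p is not rounded away.
    """
    return -math.expm1(n * math.log1p(-p))


# Placeholder frequencies for three hypothetical folds that could each support the same function.
fold_fractions = [1e-20, 3e-21, 5e-22]
p_function = sum(fold_fractions)  # a sequence adopts at most one fold, so the fractions add

print(prob_at_least_one_hit(fold_fractions[0], 1e12))  # library searched for one particular fold
print(prob_at_least_one_hit(p_function, 1e12))  # same library, any of the three folds counts
```

The log1p/expm1 form matters here: the naive `1 - (1 - p)**n` returns 0.0 for p = 1e-20, because 1 − 1e-20 rounds to exactly 1.0 in double precision.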

I ask again, where is your evidence that functional proteins are too rare in sequence space to evolve? You seem to have this as some sort of axiom, but the only thing you seem able to argue to that effect is to insist (incorrectly) that some particular protein fold must evolve de novo otherwise you can’t have a functional protein.

How many natural proteins do you know of that perform their functions without adopting folded structures?

That was an obvious moving of the goalposts from “particular fold” to “folded structure,” Gil.

Besides, many proteins do not adopt a stable folded structure until they bind a ligand.

Again, a massive amount of biology is about proteins transitioning from one structure to a very different one. Troponin is a fine example of that, to keep us focused on Bill’s choice of the sarcomere.

So what? That doesn’t help you; it’s a sample.

But since you’re taking this as gospel,

  1. What do those numbers do to Doug Axe’s extrapolation?

  2. A design hypothesis would predict no correlation between the numbers of known proteins with each fold (folds do not correspond to functions) and the SC. In fact, the opposite is true, as the authors note that it is correlated with the size of a gene family. How do you explain that with a designer?

How many such proteins I happen to personally know of is completely irrelevant to the question of whether functional proteins are too rare to evolve, and what evidence you have for your apparent a priori belief that they are too rare to evolve.

Oh, and btw, the article you linked also shows that SC* (aka the fraction of all sequences able to adopt the fold in question) is anti-correlated with protein age. In other words, proteins with more recent origins appear to have structures that are more likely to emerge de novo. This implies that more complex and unlikely protein folds are older, because they evolved from simpler and more likely precursors. You bring a paper but seem to have read it with a sort of tunnel vision, focusing entirely on the Big Numbers in that one table. Let me draw your attention to Figure 5:

This has of course not escaped the notice of the authors, who write in the conclusion:

Our SC estimates for the CATH database enable us to estimate the total SC of the known universe of protein structures, and to correlate the SC of a fold with its evolutionary age. We find that more recently evolved proteins have higher SC∗, which may be an advantage for initial discovery of a folded structure, but that more ancient proteins have a higher absolute SC, suggesting that evolution guides proteins toward more designable structures.
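
"Anti-correlated" here means a negative rank correlation: the older the fold, the lower its SC* tends to be. A minimal sketch of Spearman's rank correlation in plain Python. The paper may well use a different statistic, and the ages and SC* values in the usage lines are invented placeholders, not numbers from its Figure 5.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Invented placeholder values: fold age (arbitrary units) and SC* for four hypothetical folds.
ages = [4.0, 3.5, 2.0, 1.0]
sc_star = [1e-30, 1e-25, 1e-20, 1e-10]
print(spearman(ages, sc_star))
```

Ranks are used rather than raw values because SC* spans many orders of magnitude, so a linear correlation on the raw numbers would be dominated by the largest few.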

Here is the UniProt list of MYH7 human variants. There are about 200 of them, listed further down the page. The associated disease can be found by clicking the Publications tab. If you google “genetic code” you will find a codon-to-amino-acid conversion table.
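
For anyone working through the list, the standard genetic code and the one-letter amino acid codes fit in a few lines. A minimal sketch; the short variant notation "R403Q" (reference residue, position, alternate residue) is an assumed input format for illustration, and the UniProt page may display variants differently.

```python
BASES = "UCAG"
# Standard genetic code: codons ordered by first, second, then third base, each in U, C, A, G order; '*' marks stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

AA_NAMES = {
    "A": "Alanine", "R": "Arginine", "N": "Asparagine", "D": "Aspartate", "C": "Cysteine",
    "Q": "Glutamine", "E": "Glutamate", "G": "Glycine", "H": "Histidine", "I": "Isoleucine",
    "L": "Leucine", "K": "Lysine", "M": "Methionine", "F": "Phenylalanine", "P": "Proline",
    "S": "Serine", "T": "Threonine", "W": "Tryptophan", "Y": "Tyrosine", "V": "Valine",
}


def codons_for(amino_acid: str) -> list[str]:
    """All RNA codons encoding a one-letter amino acid code in the standard genetic code."""
    return sorted(codon for codon, aa in CODON_TABLE.items() if aa == amino_acid)


def describe_variant(variant: str) -> str:
    """Expand short notation such as 'R403Q' (assumed format) into amino acid names."""
    ref, pos, alt = variant[0], int(variant[1:-1]), variant[-1]
    return f"{AA_NAMES[ref]} {pos} -> {AA_NAMES[alt]}"


print(describe_variant("R403Q"), codons_for("R"), codons_for("Q"))
```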

But isn’t the subset of folded structures that are able to perform biological functions subject to continuous change and adaptation to new functions? They are not necessarily fragile, exact entities that become useless with the slightest change. For instance, modifications to photopsins can produce useful differences in sensitivity to different wavelengths of light, and a case of a woman with functional tetrachromacy has been documented. A billion years or so of tweaking of protein folding by Earth’s biosphere can come up with a lot of useful proteins. In fact, this seems to be going on all the time in biology, especially in the never-ending dance between infectious agents and the membrane defenses of organisms, even apart from the immune system.

Why are there so many?

Why is the penetrance for so many of them so low?

Most proteins have to adopt folded structures in order to perform a complex function. But not all proteins able to adopt folded structures will be able to perform a complex function. IOW, for most proteins, being able to adopt a folded structure is a necessary condition for performing a complex function, but it is not sufficient. Now, once a protein has emerged that is able to perform a complex function, it will be very, very difficult for it to perform another complex function. What you may observe, of course, is some tweaking of the existing function, aka microevolution. But that’s all.

No convincing required.

Proteins excited by different light frequencies are variations in the opsin protein family, providing different spectral responses in organisms from fish, hummingbirds, snakes, and humans. If fish to birds is an example of micro evolution, then I’m OK with it.


It’s not clear what you mean by a “complex function”, much less what “another” is supposed to mean exactly. If a protein binds one spot, or a DNA sequence, and then mutates to bind another, or act on another non-identical molecule, is it then enough to constitute “another” function in your view?

How different must one function A be from another function B before it counts, in your subjective opinion, as “another”, “different”, “macro” or whatever?

First of all, that’s an incorrect use of the term microevolution, which refers to evolution below the species level. It doesn’t really apply to protein evolution, since proteins can change functions whether the change happens within a single species or alongside species-level transitions and diversifications.

The second issue is that what you’re saying is flat out wrong. Changes in protein function have been demonstrated at all levels of functional classification: from structural/binding proteins changing into enzymes, between different enzymatic functions, and from enzymes that break other molecules apart into enzymes that assemble them.

For example you can take a look at this paper that gives a nice overview of the kinds of functional changes associated with particular structurally defined enzyme superfamilies:
Furnham N, Sillitoe I, Holliday GL, Cuff AL, Laskowski RA, Orengo CA, Thornton JM. Exploring the evolution of novel enzyme functions within structurally defined protein superfamilies. PLoS Comput Biol. 2012;8(3):e1002403. DOI: 10.1371/journal.pcbi.1002403


In order to understand the evolution of enzyme reactions and to gain an overview of biological catalysis we have combined sequence and structural data to generate phylogenetic trees in an analysis of 276 structurally defined enzyme superfamilies, and used these to study how enzyme functions have evolved. We describe in detail the analysis of two superfamilies to illustrate different paradigms of enzyme evolution. Gathering together data from all the superfamilies supports and develops the observation that they have all evolved to act on a diverse set of substrates, whilst the evolution of new chemistry is much less common. Despite that, by bringing together so much data, we can provide a comprehensive overview of the most common and rare types of changes in function. Our analysis demonstrates on a larger scale than previously studied, that modifications in overall chemistry still occur, with all possible changes at the primary level of the Enzyme Commission (E.C.) classification observed to a greater or lesser extent. The phylogenetic trees map out the evolutionary route taken within a superfamily, as well as all the possible changes within a superfamily. This has been used to generate a matrix of observed exchanges from one enzyme function to another, revealing the scale and nature of enzyme evolution and that some types of exchanges between and within E.C. classes are more prevalent than others. Surprisingly a large proportion (71%) of all known enzyme functions are performed by this relatively small set of 276 superfamilies. This reinforces the hypothesis that relatively few ancient enzymatic domain superfamilies were progenitors for most of the chemistry required for life.
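
The "matrix of observed exchanges" is, at its core, a tally of how often the primary E.C. class (the first digit) differs between a node and its descendant on a superfamily tree. The authors’ actual pipeline is far more involved (assigning functions to tree nodes, handling multi-domain architectures); this minimal sketch shows only the counting step, and the tree edges in the usage lines are hypothetical.

```python
from collections import Counter


def ec_class_exchanges(edges: list[tuple[str, str]]) -> Counter:
    """Count (parent class, child class) pairs by first E.C. digit along parent -> child tree edges.

    Edges where the primary class is unchanged land on the diagonal, e.g. ('1', '1').
    """
    counts = Counter()
    for parent_ec, child_ec in edges:
        counts[(parent_ec.split(".")[0], child_ec.split(".")[0])] += 1
    return counts


# Hypothetical parent -> child edges with E.C. labels, for illustration only.
edges = [("1.1.1.1", "1.1.1.2"), ("1.1.1.1", "4.2.1.1"), ("3.1.1.1", "3.1.1.2")]
counts = ec_class_exchanges(edges)
classes = sorted({c for pair in counts for c in pair})
for a in classes:
    # One matrix row per parent class; missing pairs read as 0 from the Counter.
    print(a, [counts[(a, b)] for b in classes])
```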

Now, if that doesn’t ram a freight train through the idea that large-scale functional shifts can’t occur in protein evolution, or that it’s all just “microevolution”, then I don’t know what does.

Of course, they’re exclusively looking at enzyme functions here and have not characterized functional shifts between enzymes and non-enzymes (proteins whose functions are something other than catalyzing chemical reactions).

E.C. numbers attributed to these 276 superfamilies (including relatives where the domain is in different MDA contexts) account for 71% of the 2,676 E.C. numbers assigned to known enzymes, with the E.C. numbers associated with single domain enzymes accounting for approximately 36%. The high coverage of enzyme functionality from just 276 superfamilies, given that this represents only 15% of known domains, is surprising. Moreover, just 45 superfamilies account for 50%, of all sequences assigned E.C. numbers with 31 superfamilies in which the single domain accounts for 25%.
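
The percentages in the passage above translate into rough counts as follows. This is simple arithmetic on the quoted figures; the last line assumes the 15% is a share of domain superfamilies, which the excerpt does not spell out.

```latex
\begin{aligned}
0.71 \times 2676 &\approx 1900 && \text{E.C. numbers covered by the 276 superfamilies} \\
0.36 \times 2676 &\approx 963  && \text{E.C. numbers associated with single-domain enzymes} \\
276 / 0.15       &\approx 1840 && \text{superfamilies implied in total (assumed reading of the 15\%)}
\end{aligned}
```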

From this we can postulate that a limited repertoire of structural frameworks has evolved to carry out a large proportion of reactions required for all of life. Moreover, it is clear that generating new chemistry does not necessarily require large leaps, such as the evolution of novel protein structures or large structural re-arrangements, but can be made by small local changes e.g. residue substitutions or small insertions or deletions. Functional changes can also arise from changes in MDA and less frequently insertion/deletion of unstructured regions. This is perhaps not surprising since residue changes in the active site can easily induce changes in chemistry. Superfamilies supporting a wide range of enzyme functions predominantly adopt one of a few relatively highly populated superfamilies, such as the TIM barrel or Rossmann-like fold, which both possess large surface clefts likely to tolerate residue mutations [21].

We also observe that the addition of another domain or set of domains can bring a function associated solely with those domains and not with the superfamily domain (see Figure S9 and S10) i.e. acquisition of function by domain addition. These domains can bring confusion as to where the function is originating and the role (if any) that the superfamily domain under scrutiny contributes to that function. The contribution of these additional domains to the functional repertoire of a superfamily has been taken into account.

The Thornton lab has shown, using ancestral sequence reconstruction, how an ancient enzyme radically altered its function, becoming a sort of structural protein that helps control the spatial orientation of cell division in the tissues of multicellular organisms:
Anderson DP, Whitney DS, Hanson-Smith V, Woznica A, Campodonico-Burnett W, Volkman BF, King N, Thornton JW, Prehoda KE. Evolution of an ancient protein function involved in organized multicellularity in animals. Elife. 2016 Jan 7;5:e10147. DOI: 10.7554/eLife.10147


To form and maintain organized tissues, multicellular organisms orient their mitotic spindles relative to neighboring cells. A molecular complex scaffolded by the GK protein-interaction domain (GKPID) mediates spindle orientation in diverse animal taxa by linking microtubule motor proteins to a marker protein on the cell cortex localized by external cues. Here we illuminate how this complex evolved and commandeered control of spindle orientation from a more ancient mechanism. The complex was assembled through a series of molecular exploitation events, one of which - the evolution of GKPID’s capacity to bind the cortical marker protein - can be recapitulated by reintroducing a single historical substitution into the reconstructed ancestral GKPID. This change revealed and repurposed an ancient molecular surface that previously had a radically different function. We show how the physical simplicity of this binding interface enabled the evolution of a new protein function now essential to the biological complexity of many animals.

There’s a pretty good summary here:

eLife digest

For billions of years, life on Earth was made up of single cells. In the lineage that led to animals – and independently in those that led to plants and to fungi – multicellular organisms evolved as cells began to specialize and arrange themselves into tissues and organs. Although the evolution of multicellularity is one of the most important events in the history of animal life, very little is known about the molecular mechanisms by which it took place.

To form and maintain organized tissues, cells must coordinate how they divide relative to the position of their neighbours. One important aspect of this process is orientation of the mitotic spindle, a structure inside the dividing cell that distributes the chromosomes —and the genetic material they carry — between the daughter cells. When the spindle is not oriented properly, malformed tissues and cancer can result. In a diverse range of animals, the orientation of the spindle is controlled by an ancient scaffolding protein that links the spindle to “marker” proteins on the edge of the cell.

Anderson et al. have now used a technique called ancestral protein reconstruction to investigate how this molecular complex evolved its ability to position the spindle. First, the amino acid sequences of the scaffolding protein’s ancient progenitors, which existed before the origin of the most primitive animals on Earth, were determined. Anderson et al. did this by computationally retracing the evolution of large numbers of present-day scaffolding protein sequences down the tree of life, into the deep past. Living cells were then made to produce the ancient proteins, allowing their properties to be experimentally examined.

By experimentally dissecting successive ancestral versions of the scaffolding protein, Anderson et al. deduced how the molecular complex that it anchors came to control spindle orientation. This new ability evolved by a number of “molecular exploitation” events, which repurposed parts of the protein for new roles. The progenitor of the scaffolding protein was actually an enzyme, but the evolution of its spindle-orienting ability can be recapitulated by introducing a single amino acid change that happened many hundreds of millions of years ago.

How could a single mutation have conferred such a dramatically new function? Anderson et al. found that the ancient scaffolding protein uses the same part of its surface to bind to the spindle-orienting molecular marker as the ancient enzyme used to bind to its target substrate molecule, and the two partner molecules happen to share certain key chemical properties. This fortuitous resemblance between two unrelated molecules thus set the stage for the simple evolution of a function that is now essential to the complexity of multicellular animals.

The genetic simplicity of the evolutionary change in GKPID function is underscored by the fact that we found not one but two historical amino acid replacements from the relevant phylogenetic interval, either of which is sufficient to confer the GKPID’s derived functions on the ancestral enzyme. This finding indicates that GK acquired its new protein-binding function through a relatively simple, high-probability genetic path, rather than a long trajectory that required many specific mutations before the new function could be established.
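
A back-of-the-envelope way to see why "either of two" makes for a high-probability path: assume, as a deliberate simplification that is not a model from the paper, that each specific substitution arises independently with the same small probability μ in a lineage over some interval. Then a function reachable by any one of k alternative substitutions, compared with one that needs m specific substitutions together, gives

```latex
\begin{aligned}
P(\text{at least one of } k \text{ sufficient substitutions}) &= 1 - (1-\mu)^{k} \approx k\mu \quad (\mu \ll 1), \\
P(\text{all } m \text{ required substitutions}) &= \mu^{m}.
\end{aligned}
```

With k = 2 the first probability is roughly 2μ, while the second shrinks by a factor of μ for every additional required mutation.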

GKPID’s dramatic evolutionary transition in function could take place through such a simple genetic mechanism because of its biophysical architecture. The gk enzyme’s simple binding site for GMP can also be occupied by a simple two-residue motif on the Pins peptide, which fortuitously has similar surface properties. In addition, a series of small hydrophobic patches, which happen to be adjacent, was available to bind the hydrophobic portion of the Pins peptide and increase affinity. All that was required to confer the protein’s new function was a single mutation that revealed this molecular surface, apparently by changing the protein’s conformational flexibility. In this way, the physical simplicity of an interaction between ancient molecules set the stage for the easy evolution of a novel molecular complex and, in turn, a cellular function that now plays an important role in the complex biology of multicellular animals.


Opsins, of course, belong to a larger family of proteins called G-protein-coupled receptors (GPCRs), from which all opsins ultimately derive, and which have seen numerous rather large-scale functional shifts during the history of life.

Isn’t it interesting to consider that several of the physical senses, including smell, taste, and sight, are evolutionarily related at the molecular level? They all employ GPCRs as part of their sensory mechanism.


So do I understand you correctly to be saying that even if we could somehow assemble a substantial library of random proteins that nevertheless fold into some structure, we’d still be very unlikely to find a biologically useful function?

In other words, that not only are folding proteins in general rare among all protein sequences, but even among the minority of protein sequences that do fold, biologically useful functions are rare too?
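
The question separates two factors. Assuming, for the sake of the question, that a sequence must fold in order to function (proteins that only fold on binding a ligand, mentioned earlier in the thread, are an exception), the overall fraction of functional sequences factors as

```latex
P(\text{function}) = P(\text{fold}) \times P(\text{function} \mid \text{fold})
```

so the claim being checked is that both factors on the right are small, not just the first.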

How did you determine that by looking at just the human sequences? You would need to look at all lineages that have the homologous protein and reconstruct ancestral sequences to determine how many mutations have accumulated in the lineage leading to humans.

From what I can see of MYH7, this protein is shared with fungi, so you would need to go back to the common ancestor of fungi and animals, and then start tracking mutations through the entire phylogeny leading to humans.
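
Counting mutations along a phylogeny is a standard calculation. A minimal sketch using Fitch parsimony, which gives the minimum number of changes a known rooted binary tree requires at each aligned site. This is a simpler stand-in for the probabilistic ancestral reconstruction usually done in practice; the tree shape and sequences below are hypothetical, and the topology must be supplied.

```python
def fitch(tree):
    """Fitch parsimony for one aligned site on a rooted binary tree.

    A leaf is a one-letter residue string; an internal node is a (left, right) tuple.
    Returns (candidate ancestral states at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_set, left_cost = fitch(tree[0])
    right_set, right_cost = fitch(tree[1])
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: one extra change is needed somewhere below this node.
    return left_set | right_set, left_cost + right_cost + 1


def column(tree, i):
    """Replace every leaf sequence in the tree by its residue at aligned position i."""
    if isinstance(tree, str):
        return tree[i]
    return (column(tree[0], i), column(tree[1], i))


def total_min_changes(tree) -> int:
    """Sum the Fitch minimum over all aligned positions (sequences must be equal length)."""
    first_leaf = tree
    while not isinstance(first_leaf, str):
        first_leaf = first_leaf[0]
    return sum(fitch(column(tree, i))[1] for i in range(len(first_leaf)))


# Hypothetical four-taxon tree of aligned toy sequences.
toy_tree = (("MKT", "MKT"), ("MRT", "MRS"))
print(total_min_changes(toy_tree))
```

The result is a lower bound: parsimony cannot see multiple hits at the same site, which is one reason likelihood-based reconstruction is preferred over deep timescales like fungi-to-human.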

I think he’s just trying to determine the minimum number of DNA base changes required to produce the observed amino acid substitutions. Say one protein in species 1 has the amino acid K, and another species has L in the same position (or it could be the differences between variants in the same species): how many DNA base substitutions would that require at minimum? That would require two nucleotide substitutions, because K is encoded by AAA and AAG, while L is encoded by UUA, UUG, and CUN, so there is no possible one-nucleotide substitution that could produce the amino acid substitution K<->L.
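
The K<->L reasoning above can be checked mechanically for any pair: take every codon for each amino acid and find the pair of codons with the fewest differing positions. A minimal sketch using the standard genetic code; the list of substitutions in the usage lines is hypothetical, and a minimum path length says nothing about which path actually occurred.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code: codons in U, C, A, G order at first, second, third position; '*' marks stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_for(aa: str) -> list[str]:
    """All codons encoding a one-letter amino acid code."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def min_substitutions(aa1: str, aa2: str) -> int:
    """Fewest single-nucleotide changes turning some codon for aa1 into some codon for aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )


def tally_by_steps(substitutions: list[tuple[str, str]]) -> Counter:
    """Group observed (from, to) amino acid substitutions by their minimum number of base changes."""
    return Counter(min_substitutions(a, b) for a, b in substitutions)


print(min_substitutions("K", "L"))  # no single-base route, as argued above
print(tally_by_steps([("K", "L"), ("K", "R"), ("R", "Q")]))  # hypothetical observed substitutions
```

Tallying real variants this way is one way to test the expectation that one-step substitutions predominate.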

I think it is reasonable to say that, on average, we expect a preponderance of amino acid substitutions that only require a single base change over those that require two.

But what you fail to acknowledge is that more complex functions involve more changes in structure, and consequently less constraint.

I think he’s trying to avoid offering a design hypothesis that explains why there are so many MYH7 variants harbored by perfectly healthy humans.

Not an ortholog, but multiple “unconventional” myosins that are homologs of MYH7.

Here’s a great paper on the family: