Functions are not so rare at all, and definitely not isolated, in sequence space of biopolymers

But how many of those do another thing? Answer: a lot.

That’s not what @T_aquaticus requested.

Yes, but there are still immensely more ways not to do anything at all.

It’s… the topic of this thread. Did you not read the subject of the thread before you started posting in it?

The provided article says the opposite. It shows that ‘function’ is easy, but optimized function is difficult.

If the estimate ranges from a number smaller than the effective population size of bacteria, then it is safe to say the low end does not represent a problem.
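To make that comparison concrete, here is a minimal sketch. It assumes functional sequences occur at a frequency of 1 in `one_in_x`, and that each individual in a population of size `pop_size` samples one independent random variant per generation. Both function names and the example numbers are hypothetical illustrations, not estimates from the article.

```python
def expected_hits_per_generation(one_in_x: float, pop_size: float) -> float:
    """Expected number of functional variants sampled in one generation.

    Assumes (hypothetically) one independent random variant per individual,
    each functional with probability 1 / one_in_x.
    """
    return pop_size / one_in_x


def low_end_is_a_problem(one_in_x: float, pop_size: float) -> bool:
    """True if fewer than one functional variant is expected per generation."""
    return expected_hits_per_generation(one_in_x, pop_size) < 1.0


# Hypothetical numbers, chosen only for illustration.
pop_size = 1e8
for one_in_x in (1e6, 1e9, 1e12):
    hits = expected_hits_per_generation(one_in_x, pop_size)
    print(f"1 in {one_in_x:.0e}: {hits:g} expected per generation, "
          f"problem={low_end_is_a_problem(one_in_x, pop_size)}")
```

Whenever `one_in_x` is smaller than `pop_size`, more than one functional variant is expected every generation, which is the sense in which the low end does not represent a problem.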

The authors unfortunately ignored an important aspect of the D2 domain’s functionality, namely its association with two other proteins. The coevolution of these proteins would reduce the difficulty of optimization at any point during its evolution. Regardless, it is a non-issue for any function, as that requires far fewer searches.

Your own article shows that function is common in sequence space when function is defined!


Define ‘immensely’. 5-10 orders of magnitude? Sure. The often suggested >100 orders of magnitude? Certainly not. Having a one in a billion chance of getting a functional protein from de novo translation is entirely sufficient.
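A minimal sketch of why a one-in-a-billion hit rate is workable. It assumes each de novo translation is an independent draw that is functional with probability `p`, so the chance of at least one functional product in `n` draws is 1 - (1 - p)^n. The function names are illustrative, and independence is a simplifying assumption.

```python
import math


def prob_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent draws is functional.

    Computes 1 - (1 - p)**n via log1p/expm1 so a tiny p does not round away.
    """
    return -math.expm1(n * math.log1p(-p))


def draws_needed(p: float, target: float) -> int:
    """Smallest n with P(at least one functional draw) >= target."""
    return math.ceil(math.log1p(-target) / math.log1p(-p))


p = 1e-9  # the "one in a billion" figure from the post
print(draws_needed(p, 0.5))
print(draws_needed(p, 0.95))
print(prob_at_least_one(p, 10**10))
```

With p = 10^-9, fewer than a billion draws give even odds and roughly three billion give a 95% chance. How many draws a real population actually makes is left as an assumption here.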



That has to be getting close to saying that function doesn’t exist at all. What do we get if we multiply the number of known functions by 100 orders of magnitude?

I don’t follow your question in the context of my previous comment. Sorry.

This is an important point with respect to how evolution finds new functional proteins. We have to be mindful that what determines success in evolution is the effect on fitness. Not HOW something affects fitness, just that it affects fitness. There are innumerable ways something can positively contribute to fitness.

A novel protein might be an enzyme that speeds up a useful chemical reaction (of which there are many millions), it might assist another protein in folding (of which all organisms encode thousands, and which can all potentially be assisted and stabilized in billions of possible ways), it might simply buffer against misfolding or enhance overall temperature stability, it might block or reduce expression of a useless gene, or enhance expression of another. It might insert into the membrane and interact with other molecules there, or it might act as a chelating agent against toxic metals or other charged ions. The possibilities are virtually endless.

That means that even if each particular protein function (chelating some toxic alloy of nickel, say) is relatively rare in sequence space, the sheer number of different functions that could contribute to fitness means it might not be all that rare to discover one that is useful, even if any one specific function is rare.

You could have individual functions be (say) as rare as 10^-40, but if there are 10^32 different useful functions that could positively affect fitness, then naively you might say the probability of discovering some useful function by one randomly chosen sample from sequence space is only 10^-8.
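The arithmetic can be checked with a short sketch. It assumes, as the naive framing does, that the 10^32 functions are independent targets of equal rarity, so the chance that one random sequence hits at least one of them is 1 - (1 - p)^K, which for tiny p is approximately K × p. Evaluating that formula directly in floating point would fail, because 1 - 10^-40 rounds to exactly 1.0, so the sketch uses `math.log1p` and `math.expm1`.

```python
import math

p_single = 1e-40     # assumed rarity of each individual function
k_functions = 1e32   # assumed number of distinct useful functions

# Union-bound approximation: expected number of functions hit by one sample.
naive = k_functions * p_single

# 1 - (1 - p)^K, assuming independence, evaluated stably in log space.
exact = -math.expm1(k_functions * math.log1p(-p_single))

print(naive, exact)
print((1 - p_single) == 1.0)  # why the direct float formula would return 0
```

Both numbers come out at about 10^-8, so the shortcut of multiplying the two figures is adequate at these magnitudes.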

Now add to this the constant exposure of organisms to novel environmental challenges, and the fact that there is ongoing, pervasive, low-level transcription of most of the genome in basically all organisms. With many millions of different transcripts being produced at low levels from both coding and non-coding regions, both in and out of frame, and from opposite strands of DNA, the opportunity for discovering something useful increases massively.


Funny, there’s an entire article dedicated to discussing that statement:

Some key statements:

It is now well accepted that most — and probably all — extant enzymes are, in fact, promiscuous [5, 6].

Recent large-scale studies, both computational and experimental, have opened our eyes to the enormous functional diversity among existing enzyme superfamilies, the vastness of ‘promiscuity space,’ and therefore the seemingly limitless potential for future evolutionary innovation. Baier et al. surveyed the functional diversity, as represented by Enzyme Commission (EC) numbers, in five common superfamilies [7•]. Each superfamily contained enzymes from all six of the EC classes (Figure 1a). Furnham et al. went further and used a phylogenetic approach [8] to reconstruct the evolutionary histories of 379 superfamilies from the Class, Architecture, Topology, Homology (CATH) database, and to ask how often a change in EC number was observed over the course of their evolution [9•]. While 81% of the functional changes were within an EC class, every possible change between EC classes was also observed (Figure 1b), with the exception of a change from a ligase (EC class 6) to an isomerase (EC class 5). These bioinformatics studies emphasize that there is little, if anything, that constrains particular catalytic chemistries to particular folds.

Four high-throughput experimental studies (reviewed in detail elsewhere [7•]) have reached a similar conclusion. Dozens of enzymes from within the cytosolic glutathione transferase [10], β-keto acid cleavage enzyme [11], metallo-β-lactamase [12], and haloalkanoate dehalogenase [13••] superfamilies were each tested for activity towards a range of different substrates. In each case, many enzymes were found to have multiple functions in vitro. In the most comprehensive study, 217 members of the haloalkanoate dehalogenase superfamily were expressed, purified, and screened for phosphatase or phosphonatase activity towards 167 substrates (most of which were naturally occurring metabolites). The authors discovered breathtakingly broad substrate specificities. A median of 15.5 substrates were recognized by each enzyme, 50 of the enzymes could utilize 40 or more substrates, and remarkably, one enzyme could utilize 143 [13••].


Promiscuous enzymes could probably be used as an argument against design.

In humans, alcohol dehydrogenase catalyses the oxidation of methanol to formaldehyde, which is then further oxidized to formic acid, and formic acid can cause blindness.

A better designed alcohol dehydrogenase would not have this downside.


No worries, I could have included Giltil’s claim too, and I didn’t explain myself well.

Now let me add to this …

Since each of the 20 amino acids is chemically distinct and each can, in principle, occur at any position in a protein chain, there are 20 × 20 × 20 × 20 = 160,000 different possible polypeptide chains four amino acids long, or 20^n different possible polypeptide chains n amino acids long.
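The counting rule 20^n is easy to verify. A minimal sketch, using Python's arbitrary-precision integers so that large n does not overflow; the helper names are illustrative.

```python
import math

AMINO_ACIDS = 20  # standard proteinogenic amino acids


def sequence_space_size(n: int) -> int:
    """Number of distinct polypeptide chains of length n, i.e. 20**n."""
    return AMINO_ACIDS ** n


def orders_of_magnitude(n: int) -> float:
    """log10 of the sequence-space size, i.e. n * log10(20)."""
    return n * math.log10(AMINO_ACIDS)


print(sequence_space_size(4))              # 160000
print(round(orders_of_magnitude(100), 1))  # 130.1
```

Each additional residue multiplies the space by 20, adding about 1.3 orders of magnitude.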

So if function is immensely rare, by more than 100 orders of magnitude, then we shouldn’t expect to find much of any function in chains less than 100 amino acids long. We do find function in shorter chains; therefore function cannot be so incredibly rare.
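The argument can be made quantitative with a sketch. Assuming functional sequences make up a fixed fraction 10^-k of all chains of length n (a simplifying assumption), the expected number of functional chains of that length is 20^n × 10^-k. The hypothetical helper below finds the shortest length at which that expectation reaches 1, using exact integer arithmetic.

```python
def min_length_for_one_expected_hit(k: int, alphabet: int = 20) -> int:
    """Smallest n with alphabet**n * 10**-k >= 1, i.e. alphabet**n >= 10**k.

    Assumes (hypothetically) the same functional fraction 10**-k at every length.
    """
    threshold = 10 ** k
    n = 0
    while alphabet ** n < threshold:
        n += 1
    return n


for k in (10, 30, 100):
    print(k, min_length_for_one_expected_hit(k))
```

With k = 100 the crossover falls at 77 residues, so the 100-residue figure is a round number; any rarer fraction (larger k) pushes the threshold higher.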

Said another way, if function really were so incredibly rare, then some of the functions we already know should not exist at all.


Ah, I think then you are asking a question of Giltil, and not me.

I think the notion of rare functionality is obviously false. As I said, it might be true that only 1 in a million (or even 1 in a billion) random strings has relevant functionality, but certainly not 1 in 10^30 (or 10^170 as Axe suggests). This is not a guess on my part; it has been tested experimentally.


References please.

Or just search ‘phage display library’ in your favorite database.


Keep in mind that for a protein the ability to bind something is quite a simple function, and as such, it is not surprising that proteins with this ability are not so rare. But things are very different for proteins carrying more complex functions such as, for example, DNA polymerases or ion channels.

Not all functions are rare in sequence space, only complex ones.

Urf13 evolved through recombination of mitochondrial genes, and it has ligand-gated pore-forming activity.


I agree…

How do you think those things work? Binding to the strand (easy), binding to the nucleotide (easy), conformational change (easier). That’s basically all of biochemistry: binding affinities and conformational changes.


T-urf13 is an ion channel and evolved during selective breeding of maize.

The novel antimicrobial peptide discovered and described in Knopp et al. 2019 is also an ion channel:

Generally speaking transmembrane proteins are surprisingly easy to evolve, so much so that they are among the most likely de novo proteins.

The very weak sequence constraints on transmembrane domains have also been shown phylogenetically:

Transmembrane domains are so easy to evolve because they’re essentially just repeat proteins consisting of a single, simple structural element, such as a beta-hairpin. These will naturally tend to oligomerize, their hydrophobic exterior will have an intrinsic affinity for the hydrophobic interior of the membrane bilayer, and they form so-called “barrels” (essentially just tube-shaped structures).

(A few beta-barrel structures)

There is a lot of literature on the evolution of transmembrane repeat proteins (such as beta-barrel structures).

Etc. etc.