From my Evolution is Easy files. I’ve been watching this preprint from Eugene Koonin’s group since last year, and it has recently been published in PNAS:
https://doi.org/10.1073/pnas.2509015122
Like the recent large-scale analyses of evolution across rough fitness landscapes that we’ve discussed here at PS, this work tackles one of the “hard questions” of evolutionary biology. In this case, the question is the origin of stable protein folds (not to be confused with “protein folding,” which is a different question). How hard is this question? The authors begin their Discussion section with: “The emergence of stable, globular protein folds from random sequences is arguably the principal problem of protein evolution and one of the major challenges in the study of the origin of life.”
The paper is pretty easy to read IMO, and it’s open access. First the “significance statement” and abstract, then some excerpts with my comments.
Significance
Origin of protein folds is an essential early step in the evolution of life that is not well understood. We address this problem by developing a computational framework approach for protein fold evolution simulation (PFES) that traces protein fold evolution in silico at the level of atomistic details. Using PFES, we show that stable, globular protein folds could evolve from random amino acid sequences with relative ease, resulting from selection acting on a realistic number of amino acid replacements. About half of the in silico evolved proteins resemble simple folds found in nature, whereas the rest are unique. These findings shed light on the enigma of the rapid evolution of diverse protein folds at the earliest stages of life evolution.
Abstract
The origin and evolution of protein folds are among the most challenging, long-standing problems in biology. We developed Protein Fold Evolution Simulator (PFES), a computational approach that simulates evolution of globular folds from random amino acid sequences with atomistic details. PFES introduces random mutations in a population of protein sequences, evaluates the effect of mutations on protein structure, and selects a new set of proteins for further evolution. Iteration of this process allows tracking the evolutionary trajectory of a changing protein fold that evolves under selective pressure for protein fold stability, interaction with other proteins, or other features shaping the fitness landscape. We employed PFES to show how stable, globular protein folds could evolve from random amino acid sequences as monomers or in complexes with other proteins. The simulations reproduce the evolution of many simple folds of natural proteins as well as emergence of distinct folds not known to exist in nature. We show that evolution of small globular protein folds from random sequences, on average, takes 1.15 to 3 amino acid replacements per site, depending on the population size, with some simulations yielding stable folds after as few as 0.2 replacements per site. These values are lower than the characteristic numbers of replacements in conserved proteins during the time since the Last Universal Common Ancestor, suggesting that simple protein folds can evolve from random sequences relatively easily and quickly. PFES tracks the complete evolutionary history from simulations and can be used to test hypotheses on protein fold evolution.
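To make the mutate, evaluate, select loop from the abstract concrete, here is a toy sketch of my own. It is not the authors' code and does not use PFES or ESMfold. The fitness function (fraction of hydrophobic residues) is a made-up stand-in for a structure-based score, the selection scheme (keep the top scorers) is an assumption, and all names and parameters are hypothetical. It also counts "replacements per site" simply as mutations accumulated along the winning lineage divided by sequence length, which may differ from how the paper counts them.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # arbitrary toy choice, not from the paper


def toy_fitness(seq):
    """Hypothetical stand-in for a structure score: hydrophobic fraction."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def mutate(seq, rng):
    """Replace one randomly chosen residue with a different amino acid."""
    pos = rng.randrange(len(seq))
    new_aa = rng.choice([a for a in AMINO_ACIDS if a != seq[pos]])
    return seq[:pos] + new_aa + seq[pos + 1:]


def evolve(length=50, pop_size=20, generations=200, seed=0):
    """Run a toy mutate/evaluate/select loop starting from random sequences.

    Returns (best sequence, its toy fitness, replacements per site along
    its lineage).
    """
    rng = random.Random(seed)
    # Each individual is (sequence, replacements accumulated in its lineage).
    population = [
        ("".join(rng.choice(AMINO_ACIDS) for _ in range(length)), 0)
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Mutate: every individual produces one single-replacement offspring.
        offspring = [(mutate(seq, rng), n + 1) for seq, n in population]
        # Evaluate and select: keep the top scorers among parents + offspring.
        pool = population + offspring
        pool.sort(key=lambda ind: toy_fitness(ind[0]), reverse=True)
        population = pool[:pop_size]
    best_seq, best_subs = max(population, key=lambda ind: toy_fitness(ind[0]))
    return best_seq, toy_fitness(best_seq), best_subs / length


if __name__ == "__main__":
    seq, fitness, per_site = evolve()
    print(seq, round(fitness, 2), round(per_site, 2))
```

The real thing replaces `toy_fitness` with a predicted 3D structure and a stability or interaction score, which is where all the interesting biology (and the computational cost) lives. For scale, the paper's average of 1.15 to 3 replacements per site would mean roughly 58 to 150 replacements for a 50-residue protein.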
When did protein folds first evolve? This is important for interpreting their results. They write in the Discussion:
It should be noted that the stage of evolution we model in this work corresponds to the transition from an RNA-peptide world, in which the translation system, albeit primarily RNA-based one, and the genetic code have already evolved, but globular proteins have not.
This is important because it implies particular kinds of selection pressure, which we can’t specify with much certainty. Here are their thoughts on selection from that same paragraph:
In the RNA-peptide world, peptides would serve as enhancers and stabilizers of ribozymes (17, 56, 57). The advent of bona fide proteins would greatly enhance these functions and would eventually lead to the replacement of most RNA catalysts with protein ones. For this transition to occur, stable, folded proteins and their interactions were essential, and furthermore, formation of globular folds could confer surface properties favoring interactions with other proteins, as demonstrated in some of our computational experiments, and with RNA. Thus, it appears most likely that fold stability and interactivity were key targets of selection at this stage.
Their results suggest easy evolution and also tight constraint; the repeated appearance of folds that we see today is (I think) what we call “molecular convergence”:
Taken together, our results suggest that evolution of globular protein folds from random sequences could be straightforward, requiring no unknown evolutionary processes, and in part, solve the enigma of rapid emergence of protein folds. Furthermore, the appearance, in many of the PFES runs, of simple folds closely similar to those found in natural proteins implies that evolutionary trajectories in the folding space are strongly constrained.
In my opinion, this coexistence of strong constraint (i.e., seemingly few paths and solutions in evolutionary “solution space”) with ease of exploration (i.e., a seemingly vast number of ways that evolution can find those paths and solutions) is a fundamental feature of life and of proteins.
Their simulations, and their results, were made possible by the very recent breakthroughs in large-scale protein structure prediction, namely AlphaFold and ESMfold.
The main limitation, in the authors’ words:
The major limitation of PFES is the limited accuracy of protein structure prediction methods. Although ESMfold provides a good tradeoff between speed and accuracy, it is questionable how biophysically realistic the predicted structures are, especially, structures with low confidence scores that often represent the intermediate states of protein fold evolution.
The rest of that paragraph explores why this limitation may not be as serious as it sounds.
Sorry this turned into a blog post! Someone please tell me to get Quintessence of Dust running again…