Maybe evolution of protein folds is.... easy

From my Evolution is Easy files. I’ve been watching this preprint from Eugene Koonin’s group since last year, and it has recently been published at PNAS:

https://doi.org/10.1073/pnas.2509015122

Like the recent large-scale analyses of evolution across rough fitness landscapes that we’ve discussed here at PS, this work tackles one of the “hard questions” of evolutionary biology. In this case, the question is the origin of stable protein folds (not to be confused with “protein folding,” which is a different question). How hard is this question? The authors begin their Discussion section with: “The emergence of stable, globular protein folds from random sequences is arguably the principal problem of protein evolution and one of the major challenges in the study of the origin of life.”

The paper is pretty easy to read IMO, and it’s open access. First the “significance statement” and abstract, then some excerpts with my comments.

Significance
Origin of protein folds is an essential early step in the evolution of life that is not well understood. We address this problem by developing a computational framework approach for protein fold evolution simulation (PFES) that traces protein fold evolution in silico at the level of atomistic details. Using PFES, we show that stable, globular protein folds could evolve from random amino acid sequences with relative ease, resulting from selection acting on a realistic number of amino acid replacements. About half of the in silico evolved proteins resemble simple folds found in nature, whereas the rest are unique. These findings shed light on the enigma of the rapid evolution of diverse protein folds at the earliest stages of life evolution.
Abstract
The origin and evolution of protein folds are among the most challenging, long-standing problems in biology. We developed Protein Fold Evolution Simulator (PFES), a computational approach that simulates evolution of globular folds from random amino acid sequences with atomistic details. PFES introduces random mutations in a population of protein sequences, evaluates the effect of mutations on protein structure, and selects a new set of proteins for further evolution. Iteration of this process allows tracking the evolutionary trajectory of a changing protein fold that evolves under selective pressure for protein fold stability, interaction with other proteins, or other features shaping the fitness landscape. We employed PFES to show how stable, globular protein folds could evolve from random amino acid sequences as monomers or in complexes with other proteins. The simulations reproduce the evolution of many simple folds of natural proteins as well as emergence of distinct folds not known to exist in nature. We show that evolution of small globular protein folds from random sequences, on average, takes 1.15 to 3 amino acid replacements per site, depending on the population size, with some simulations yielding stable folds after as few as 0.2 replacements per site. These values are lower than the characteristic numbers of replacements in conserved proteins during the time since the Last Universal Common Ancestor, suggesting that simple protein folds can evolve from random sequences relatively easily and quickly. PFES tracks the complete evolutionary history from simulations and can be used to test hypotheses on protein fold evolution.
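To make the mutate, evaluate, select loop described in the abstract concrete, here is a minimal toy sketch. It is not PFES and not the authors' code. In particular, `toy_fitness` is a made-up stand-in that rewards an alternating hydrophobic/polar pattern, whereas the paper scores structures predicted by ESMfold. The function names, the population scheme (parents compete with their mutated offspring), and all parameter values are my assumptions for illustration only.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # rough grouping, chosen only for this toy


def toy_fitness(seq):
    """Hypothetical stand-in for a stability score (NOT a structure predictor).

    Returns the fraction of positions that match an alternating
    hydrophobic/polar pattern (hydrophobic at even indices).
    """
    hits = sum((aa in HYDROPHOBIC) == (i % 2 == 0) for i, aa in enumerate(seq))
    return hits / len(seq)


def mutate(seq, rng):
    """Replace one randomly chosen residue with a different amino acid."""
    pos = rng.randrange(len(seq))
    new_aa = rng.choice([a for a in AMINO_ACIDS if a != seq[pos]])
    return seq[:pos] + new_aa + seq[pos + 1:]


def evolve(length=40, pop_size=20, generations=200, threshold=0.9, seed=0):
    """Run the toy loop; return (best sequence, fitness, replacements per site, generations run)."""
    rng = random.Random(seed)
    # Each individual is (sequence, replacements accumulated along its lineage).
    population = [
        ("".join(rng.choice(AMINO_ACIDS) for _ in range(length)), 0)
        for _ in range(pop_size)
    ]
    generations_run = 0
    for generations_run in range(1, generations + 1):
        # 1. Mutate: every parent produces one offspring with a single replacement.
        offspring = [(mutate(s, rng), n + 1) for s, n in population]
        # 2. Evaluate and 3. select: keep the top-scoring pop_size individuals.
        pool = sorted(population + offspring,
                      key=lambda ind: toy_fitness(ind[0]), reverse=True)
        population = pool[:pop_size]
        if toy_fitness(population[0][0]) >= threshold:
            break
    best_seq, best_subs = population[0]
    return best_seq, toy_fitness(best_seq), best_subs / length, generations_run


if __name__ == "__main__":
    seq, fit, subs_per_site, gens = evolve()
    print(f"best fitness {fit:.2f} after {gens} generations, "
          f"{subs_per_site:.2f} replacements per site along its lineage")
```

The last number it prints is the toy analogue of the "replacements per site" figure quoted in the abstract: replacements accumulated along the winning lineage, divided by sequence length. Nothing about the values it produces should be compared with the paper's results, since the fitness function here is trivially easy to satisfy.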

:cactus: When did protein folds first evolve? This is important for interpreting their results. They write in the Discussion:

It should be noted that the stage of evolution we model in this work corresponds to the transition from an RNA-peptide world, in which the translation system, albeit primarily RNA-based one, and the genetic code have already evolved, but globular proteins have not.

This is important because it implies particular kinds of selection pressure, which we can’t specify with much certainty. Here are their thoughts on selection from that same paragraph:

In the RNA-peptide world, peptides would serve as enhancers and stabilizers of ribozymes (17, 56, 57). The advent of bona fide proteins would greatly enhance these functions and would eventually lead to the replacement of most RNA catalysts with protein ones. For this transition to occur, stable, folded proteins and their interactions were essential, and furthermore, formation of globular folds could confer surface properties favoring interactions with other proteins, as demonstrated in some of our computational experiments, and with RNA. Thus, it appears most likely that fold stability and interactivity were key targets of selection at this stage.

:cactus: Their results suggest easy evolution and also tight constraint; the repeated appearance of folds that we see today is (I think) what we call “molecular convergence”:

Taken together, our results suggest that evolution of globular protein folds from random sequences could be straightforward, requiring no unknown evolutionary processes, and in part, solve the enigma of rapid emergence of protein folds. Furthermore, the appearance, in many of the PFES runs, of simple folds closely similar to those found in natural proteins implies that evolutionary trajectories in the folding space are strongly constrained.

:cactus: In my opinion, this coexistence of strong constraint (i.e., seemingly few paths and solutions in evolutionary “solution space”) with ease of exploration (i.e., seemingly vast ways that evolution can find those paths and solutions) is a fundamental feature of life and of proteins.

:cactus: Their simulations, and their results, were made possible by the very recent breakthroughs in large-scale protein structure knowledge, namely AlphaFold and ESMfold.

:cactus: The main limitation, in the authors’ words:

The major limitation of PFES is the limited accuracy of protein structure prediction methods. Although ESMfold provides a good tradeoff between speed and accuracy, it is questionable how biophysically realistic the predicted structures are, especially, structures with low confidence scores that often represent the intermediate states of protein fold evolution.

The rest of that paragraph explores why this limitation may not be as serious as it sounds.

Sorry this turned into a blog post! Someone please tell me to get Quintessence of Dust running again…


Your post challenges me to pull out and reread my very old copy of Evolution of Protein Folds for Dummies.


Still, to me a major question is that the model selects directly for fold stability (which undoubtedly contributes to protein fitness up to a point), as opposed to all the attributes of a protein sequence that make up its fitness contribution to the organism.

Since there must be residues in an amino acid sequence that participate in protein function without necessarily increasing stability, the sequence space of fitness-increasing mutations must be even larger (and therefore the landscape also somewhat smoother) than the space defined only by the effect these residues have on protein structural stability.

The exploding field of de novo protein evolution from noncoding DNA seems to confirm that it takes even fewer mutations to get functional proteins from nonfunctional ones than this paper, which focuses only on reaching some threshold of fold stability, implies.

Still an intriguing paper, with the potential to shed light on a lot of long-standing questions about the evolution of the genetic code and its relationship to protein evolution. I wonder to what extent the structure of the genetic code might contribute to, or hinder, the gain of protein structure and stability, how the extant genetic code compares to earlier stages in the history of the code’s evolution (simpler, shorter code alphabets), and how optimized it is in that respect.

IIRC some researchers have published results suggesting that the extant genetic code actually makes frameshift mutations more likely to preserve protein attributes (hydrophobicity patterns, for example) than alternative codes would. Does that imply the present code is also more likely than alternative codes to facilitate the initial emergence of protein folding ability during frameshift mutations? And do those same results hold for in-frame deletions and insertions, in addition to substitutions?

There are so many possible ways to build on the results of this paper.


Thank you for sharing this insightful paper! It provides yet another strong argument against the so-called “waiting time problem”, which we can reference in our forthcoming publication. :slight_smile:

It might be hard to apply the early evolution of globular proteins (in the transition from the RNA World) to the mythological “waiting time problem” which, as I understand it, is about adaptation. (I could be wrong about that, and I rarely visit creationist outhouses; my last exposure to the “waiting time problem” revealed an inane, banal, incoherent creationist lie-fest.)

However, among actual scientists, there is interest in a topic/question called Haldane’s Dilemma, which seems to be the “source” of the “waiting time problem.” Since you are writing about this, you should probably carefully read and understand this recent paper from a research group I know well. :smiling_face_with_three_hearts: (You have probably already done this, but it’s a cool paper and worthy of everyone’s attention.)

https://academic.oup.com/genetics/article/229/4/iyaf011/7979206


The time required to achieve stable and functional protein folds from random sequences is a type of waiting-time problem discussed by creationists like Axe and others. However, it is unrelated to the original waiting-time problem proposed by Haldane. And yes, we have already addressed the paper you mentioned. :slightly_smiling_face:

