A few recent studies (from the last 2-3 years) have begun to explore the protein universe with tools that didn’t exist (and, I think, could only be dreamed about) ten years ago: 1) AlphaFold and its descendants, which enable the detailed visualization and initial characterization of proteins that “exist” only as a sequence; and 2) AI/machine learning (ML) tools such as LLMs, which enable the exploration of sequence space at vast and increasing speeds and capacities. The authors of the paper discussed here make the same point.
This open access paper is IMO the most important one to read in this thread, not because it provides a definitive answer to a particular question but because the authors clearly delineate the questions and consider how to address them. The work itself is pretty technical, but the text, and especially the Discussion, should be approachable to non-specialists. I provide some excerpts below, but again, the paper is fun to read, and I encourage those who are genuinely curious about protein fold evolution to simply read it (and maybe have a look at the references they cite).
You can get the gist from the Significance paragraph:
Origin of protein folds is an essential early step in the evolution of life that is not well understood. We address this problem by developing a computational framework approach for protein fold evolution simulation (PFES) that traces protein fold evolution in silico at the level of atomistic details. Using PFES, we show that stable, globular protein folds could evolve from random amino acid sequences with relative ease, resulting from selection acting on a realistic number of amino acid replacements. About half of the in silico evolved proteins resemble simple folds found in nature, whereas the rest are unique. These findings shed light on the enigma of the rapid evolution of diverse protein folds at the earliest stages of life evolution.
The Abstract is blunt about the difficulty of the question:
The origin and evolution of protein folds are among the most challenging, long-standing problems in biology. We developed Protein Fold Evolution Simulator (PFES), a computational approach that simulates evolution of globular folds from random amino acid sequences with atomistic details. PFES introduces random mutations in a population of protein sequences, evaluates the effect of mutations on protein structure, and selects a new set of proteins for further evolution. Iteration of this process allows tracking the evolutionary trajectory of a changing protein fold that evolves under selective pressure for protein fold stability, interaction with other proteins, or other features shaping the fitness landscape. We employed PFES to show how stable, globular protein folds could evolve from random amino acid sequences as monomers or in complexes with other proteins. The simulations reproduce the evolution of many simple folds of natural proteins as well as emergence of distinct folds not known to exist in nature. We show that evolution of small globular protein folds from random sequences, on average, takes 1.15 to 3 amino acid replacements per site, depending on the population size, with some simulations yielding stable folds after as few as 0.2 replacements per site. These values are lower than the characteristic numbers of replacements in conserved proteins during the time since the Last Universal Common Ancestor, suggesting that simple protein folds can evolve from random sequences relatively easily and quickly. PFES tracks the complete evolutionary history from simulations and can be used to test hypotheses on protein fold evolution.
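For readers who think in code, the mutate / evaluate / select loop described in the Abstract can be sketched in a few lines of Python. To be clear, this is only a toy illustration and not the authors' PFES code: the scoring function `toy_fitness` is a made-up placeholder (it rewards an alternating hydrophobic/polar pattern) standing in for the structure-evaluation step, and every name and parameter here (`evolve`, `pop_size`, `generations`, and so on) is my own assumption.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AILMFVWC")  # rough grouping, chosen only for this toy


def random_sequence(length, rng):
    """Draw a random amino acid sequence of the given length."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def mutate(seq, rng):
    """Replace one randomly chosen residue with a different amino acid."""
    pos = rng.randrange(len(seq))
    new = rng.choice([a for a in AMINO_ACIDS if a != seq[pos]])
    return seq[:pos] + new + seq[pos + 1:]


def toy_fitness(seq):
    """Placeholder score in [0, 1]; NOT a structure predictor.

    Counts positions whose hydrophobic/polar class matches a hypothetical
    alternating pattern (even positions hydrophobic, odd positions polar).
    """
    matches = sum((seq[i] in HYDROPHOBIC) == (i % 2 == 0)
                  for i in range(len(seq)))
    return matches / len(seq)


def evolve(pop_size=20, length=30, generations=50, seed=0):
    """Run a simple mutate / evaluate / select loop on random sequences."""
    rng = random.Random(seed)
    population = [random_sequence(length, rng) for _ in range(pop_size)]
    history = []  # best score after each generation
    for _ in range(generations):
        # Mutate: one random substitution per parent sequence.
        offspring = [mutate(s, rng) for s in population]
        # Evaluate and select: keep the top scorers among parents + offspring.
        pool = sorted(population + offspring, key=toy_fitness, reverse=True)
        population = pool[:pop_size]
        history.append(toy_fitness(population[0]))
    return population, history


if __name__ == "__main__":
    final_population, best_scores = evolve()
    print(best_scores[0], best_scores[-1])
    print(final_population[0])
```

Because parents compete with their offspring in this sketch, the best score can never drop from one generation to the next. The real simulator, per the Abstract, evaluates the effect of each mutation on protein structure and can select for fold stability, interaction with other proteins, or other features, which is far more demanding than the one-line score used here.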
The first paragraph of the Discussion:
The emergence of stable, globular protein folds from random sequences is arguably the principal problem of protein evolution and one of the major challenges in the study of the origin of life. Once diverse globular folds evolved, the subsequent evolution of proteins throughout the 4 billion year history of life on Earth was a relatively straightforward and fairly well understood process. Successful attempts to computationally imitate protein evolution have been made previously, but these studies either focused on a particular part of the protein with amino acid substitutions that did not change the protein fold (49–51) or were limited to the exploration of simplified models in lattice space and reduced amino acid alphabets (7, 18). Here, in contrast, we simulated protein evolution with all-atom models, allowing us to observe how protein folds nucleated from random sequences and how the accumulation of amino acid substitutions led to large-scale conformational rearrangements, changing the general architecture of the fold. This critical difference that was made possible by the advent of powerful tools for protein structure prediction, such as AlphaFold and large language model based ESMfold, creates the opportunity to recapitulate protein fold evolution in detail not accessible previously.
In silico evolution of globular protein folds from random sequences
PNAS, open access, June 2025.
See also a PS thread from last year, just after the paper was published: Maybe evolution of protein folds is.... easy