So apparently, in part by using ancestral sequence reconstruction, biochemists have been able to show that the double-psi beta-barrel (DPBB), the core fold of RNA polymerase (and of a number of other ancient proteins), likely originated by duplication of a relatively simple peptide 46 residues in length, using only 7 different amino acids (all of them “early” in genetic code evolution).
While the two halves of the extant symmetrical protein fold today share less than 30% sequence identity (with some as low as 4.5%), apparently the same structure can be produced by the exact same sequence simply repeated twice.
The extant complex proteins must have evolved from ancient short and simple ancestors. Nevertheless, how such prototype proteins emerged on the primitive earth remains enigmatic. The double-psi beta-barrel (DPBB) is one of the oldest protein folds and conserved in various fundamental enzymes, such as the core domain of RNA polymerase. Here, by reverse engineering a modern DPBB domain, we reconstructed its evolutionary pathway started by “interlacing homo-dimerization” of a half-size peptide, followed by gene duplication and fusion. Furthermore, by simplifying the amino acid repertoire of the peptide, we successfully created the DPBB fold with only seven amino acid types (Ala, Asp, Glu, Gly, Lys, Arg, and Val), which can be coded by only GNN and ARR (R = A or G) codons in the modern translation system. Thus, the DPBB fold could have been materialized by the early translation system and genetic code.
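As a quick sanity check on the codon claim in that abstract, here is a minimal sketch that expands the GNN and ARR patterns against the standard genetic code and lists which amino acids they encode. The helper names (`expand`, `encoded_amino_acids`) are my own and purely illustrative; the codon table is the standard one written in the usual U, C, A, G order, and nothing here comes from the paper's own methods.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in U, C, A, G order
# (first base varies slowest, third base fastest). '*' marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC ambiguity codes needed for the two patterns (N = any base, R = purine).
IUPAC = {"N": "UCAG", "R": "AG", "G": "G", "A": "A"}


def expand(pattern):
    """Expand a degenerate codon pattern such as 'GNN' into concrete codons."""
    return ["".join(c) for c in product(*(IUPAC[ch] for ch in pattern))]


def encoded_amino_acids(patterns):
    """Return the set of one-letter amino acid codes reachable from the patterns."""
    return {CODON_TABLE[codon] for p in patterns for codon in expand(p)}


restricted = encoded_amino_acids(["GNN", "ARR"])
print(sorted(restricted))  # one-letter codes for the restricted alphabet
```

The 16 GNN codons cover Val, Ala, Asp, Glu and Gly, and the 4 ARR codons add Lys and Arg, which matches the seven amino acid types listed in the abstract.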
I just recalled that not too long ago an ID proponent came here to question the inference of homology for relatively dissimilar protein sequences. The question appears to have been: why should anyone believe that two non-identical proteins are related (homologous, i.e. deriving from a common ancestor), and is there any reason to believe that two such proteins can incrementally mutate away from a common ancestor while retaining their function as they become more and more dissimilar in sequence?
Take a look at table 2 in the paper. There’s a list of 250 different variants of DPBB-containing proteins, with internal sequence identity (between the two halves of the same protein) ranging all the way from 45% (the most similar known natural variants) down to 4.5% (which, remarkably, is below the roughly 5% identity expected from randomly picking two sequences of similar length), with essentially every intermediate percentage spanning that “gap” represented. Those are extant proteins whose internal sequence identities cover basically that entire range.
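To make the "internal sequence identity" and "5% expected by chance" numbers concrete, here is a rough sketch. It assumes the two halves are already aligned position by position with no gaps, which is a simplification (real comparisons would use a proper alignment, and I don't know exactly how the paper computed its values). The function names and the toy sequences are made up for illustration and are not taken from the paper.

```python
def percent_identity(a, b):
    """Percent of positions with identical residues in two equal-length,
    already-aligned sequences (ungapped; a simplifying assumption)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def internal_identity(seq):
    """Split a sequence into two equal halves and compare them position-wise."""
    if len(seq) % 2:
        raise ValueError("sequence length must be even for this toy comparison")
    half = len(seq) // 2
    return percent_identity(seq[:half], seq[half:])


def expected_random_identity(frequencies=None):
    """Expected percent identity between two random residues.
    Defaults to 20 equally frequent amino acids (an idealization)."""
    if frequencies is None:
        frequencies = [1 / 20] * 20
    return 100.0 * sum(p * p for p in frequencies)


# Hypothetical toy sequences, NOT real DPBB sequences.
perfect_repeat = "ADEGKRVA" * 2            # the same half repeated twice
diverged = "ADEGKRVA" + "ADKGRRVE"         # second half has drifted

print(internal_identity(perfect_repeat))
print(internal_identity(diverged))
print(round(expected_random_identity(), 2))
```

With uneven amino acid frequencies the chance expectation goes up somewhat, since the sum of squared frequencies is smallest when all 20 are equally common, so 5% is best read as a floor for the chance expectation rather than an exact figure.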
And now biochemists have shown that indeed the protein could have begun with a repeat of the exact same sequence, even using a plausible pre-biotic alphabet of just 7 amino acids.
They didn’t test that, but my guess is that it is very unlikely to function as a particularly specific or high-rate catalyst of RNA polymerization. While it is possible that it has alternative catalytic functions (e.g. as a highly promiscuous catalyst of small-molecule metabolic reactions), generally speaking such small and simple proteins are more likely to have functions such as binding and stabilizing other molecules. Given its natural affinity for RNA and DNA, it is likely that this was its original role: to simply bind to and stabilize some ribonucleotide polymer. There are double-psi beta-barrel proteins today that have only that function, one of which is found in the ribosome IIRC. The domain just sits there and binds and stabilizes ribosomal RNA.
There are, however, examples of very small and reduced protein domains of a similar size that are able to perform functions similar to those of their much larger and more elaborate descendant proteins. For example, the Tawfik lab was able to show that an extremely simplified version of the P-loop NTPase fold (essentially consisting of little more than the P-loop motif itself) could actually function as an RNA/DNA helicase: