Very interesting new paper from the Masel lab at the University of Arizona*. Below are some excerpts that I found important.
From the Intro:
Previous analyses focused on the age of orthologous gene families (20–22); ours infers which protein domains date back to LUCA. Protein domains are the basic units of proteins, that can fold, function, and evolve independently (23). Proteins often contain multiple protein domains, each of which might have a different age (Fig. 1). For the purpose of inferring ancient amino acid usage, what matters is the age of the protein domain, not that of the whole protein that it is part of.
Beginning of the Discussion:
The evolution of the current genetic code proceeded via stepwise incorporation of amino acids, driven in part by changes in early life’s environment and requirements. Contemporary proteins retain information about which amino acids were part of the code at the time of their birth, allowing us to infer the order of recruitment on the basis of enrichment or depletion in LUCA’s protein domains. Smaller amino acids were added to the code first, and when this is accounted for, there is no further information in Trifonov’s (4) widely used “consensus” order based on 40 metrics, some of dubious relevance.
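To make the "enrichment or depletion" idea concrete, here is a toy sketch of the kind of comparison the excerpt describes: tally amino acid frequencies in one set of sequences presumed ancient and in a younger comparison set, then rank the amino acids by log-ratio. This is not the paper's actual method, which is considerably more involved. The function names, the pseudocount, and the two-set comparison are my own assumptions, for illustration only.

```python
from collections import Counter
import math

# The 20 canonical amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_frequencies(sequences, pseudocount=1.0):
    """Relative amino acid frequencies across a set of sequences.

    The pseudocount (an assumption of this sketch) keeps amino acids
    that never occur from producing a zero frequency.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}

def log2_enrichment(ancient, younger):
    """log2(frequency in ancient set / frequency in younger set).

    Positive values mean enriched in the ancient set, negative mean depleted.
    """
    fa = aa_frequencies(ancient)
    fy = aa_frequencies(younger)
    return {a: math.log2(fa[a] / fy[a]) for a in AMINO_ACIDS}

def rank_by_enrichment(ancient, younger):
    """Amino acids ordered from most enriched to most depleted in the ancient set."""
    scores = log2_enrichment(ancient, younger)
    return sorted(AMINO_ACIDS, key=lambda a: scores[a], reverse=True)

# Hypothetical toy sequences, NOT real data from the paper.
toy_ancient = ["GGGGAAAA"]
toy_younger = ["WWWWYYYY"]
toy_ranking = rank_by_enrichment(toy_ancient, toy_younger)
```

On these made-up inputs, the small amino acids G and A land at the top of the ranking and W and Y at the bottom, simply because that is how the toy sequences were constructed. With real data, the interesting question is which amino acids are depleted in the oldest domains, on the reasoning that they were recruited into the code later.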
From the Discussion:
More broadly, coding for different amino acids might have emerged at similar times but in different biogeochemical environments. The temporal order of recruitment that we infer based on LUCA sequences is not the temporal order for coding as a whole, but for the ancestor of the modern translation machinery. Indeed, HGT of the tRNAs coupled with their cognate aminoacyl tRNA synthetases might have brought the diverse components of the modern translation machinery together (77). This further emphasizes that the time of origin of the translation machinery’s components need not match the time of their incorporation into the surviving ancestral lineage.
Final paragraph:
Perhaps the biggest mystery is how sequences such as the common ancestor of L/I/V-tRNA synthetase, which were translated via alternative or incomplete genetic codes, ended up being recoded for translation by the direct ancestor of the canonical genetic code. Harmonization of genetic codes facilitated innovation sharing via HGT, making it advantageous to use the most common code, driving code convergence (85, 86). Only once a common code was established did HGT drop to levels such that a species tree became apparent, i.e., the LUCA coalescence point corresponds to convergence on a code (85). Our identification of pre-LUCA sequences provides a rare source of data about early, alternative codes.
The paper (open access):
https://www.pnas.org/doi/10.1073/pnas.2410311121
*Disclosure: I know Joanna Masel, and one of my offspring did his PhD in her lab.