I’ve been deeply fascinated with the ability of scientists to reconstruct past evolutionary histories using phylogenetic methods, and the possibility of reconstructing ancestral protein sequences to understand molecular evolutionary transitions from the deep past is no exception.
The database can be found here: http://revenant.inf.pucp.edu.pe/
The paper is:
Matias Sebastian Carletti, Alexander Miguel Monzon, Emilio Garcia-Rios, Guillermo Benitez, Layla Hirsh, Maria Silvina Fornasari, Gustavo Parisi, Revenant: a database of resurrected proteins, Database, Volume 2020, 2020, baaa031, https://doi.org/10.1093/database/baaa031
Revenant is a database of resurrected proteins coming from extinct organisms. Currently, it contains a manually curated collection of 84 resurrected proteins derived from bibliographic data. Each protein is extensively annotated, including structural, biochemical and biophysical information. Revenant contains a browse capability designed as a timeline from where the different proteins can be accessed. The oldest Revenant entries are between 4200 and 3500 million years ago, while the younger entries are between 8.8 and 6.3 million years ago. These proteins have been resurrected using computational tools called ancestral sequence reconstruction techniques combined with wet-laboratory synthesis and expression. Resurrected proteins are commonly used, with a noticeable increase during the past years, to explore and test different evolutionary hypotheses such as protein stability, to explore the origin of new functions, to get biochemical insights into past metabolisms and to explore specificity and promiscuous behaviour of ancient proteins.
It is rather mind-blowing to me that it is possible to reconstruct, with high probability, the amino acid sequence of a protein that has not existed on Earth for three quarters of the total age of the planet.
A simplified overview of the ancestral reconstruction process:
Schematic representation of the different steps to obtain resurrected proteins. The first step involves sequence similarity searches of a given protein to obtain a set of homologous sequences, involving the ancestral nodes to be studied. For example, one could be interested in studying biochemical properties of the studied protein in the last common ancestor for all vertebrates. Using these sequences, it is possible to estimate a phylogenetic tree to define the ancestral node to be reconstructed. In the second step, ancestral sequence reconstruction techniques are applied to estimate most probable sequences in the studied node. The third step involves the ancestral sequence synthesis. This sequence is then inserted into a vector, cloned, expressed and purified (fourth step). The fifth and final step involves a series of biochemical and biophysical characterization.
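To make the second step a little more concrete, here is a minimal sketch of how a "most probable" ancestral sequence can be estimated at the root of a fixed tree. It is a toy illustration, not the method used for any Revenant entry: it assumes a Poisson (equal-rates) substitution model over the 20 amino acids, a uniform prior at the root, an ungapped alignment, and a hand-written tree. The taxon names, sequences, branch lengths, and function names are all hypothetical. Real reconstructions rely on empirical substitution models and dedicated phylogenetics software.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def transition_prob(same, t):
    """Poisson (equal-rates) model; t is in expected substitutions per site."""
    n = len(AMINO_ACIDS)
    decay = math.exp(-n / (n - 1) * t)
    return 1 / n + (n - 1) / n * decay if same else (1 - decay) / n


def conditional_likelihoods(node, column):
    """Felsenstein pruning: P(leaves below node | node is in state a), for each a."""
    if isinstance(node, str):  # a leaf is just a taxon name
        return {a: 1.0 if a == column[node] else 0.0 for a in AMINO_ACIDS}
    result = {a: 1.0 for a in AMINO_ACIDS}
    for child, branch_length in node:  # an internal node is a list of (child, length)
        child_l = conditional_likelihoods(child, column)
        for a in AMINO_ACIDS:
            result[a] *= sum(
                transition_prob(a == b, branch_length) * child_l[b]
                for b in AMINO_ACIDS
            )
    return result


def reconstruct_root(tree, alignment):
    """Return the most probable root sequence and its per-site posterior probabilities."""
    n_sites = len(next(iter(alignment.values())))
    sequence, posteriors = [], []
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        site_l = conditional_likelihoods(tree, column)
        total = sum(site_l.values())  # the uniform root prior cancels out
        best = max(AMINO_ACIDS, key=site_l.get)
        sequence.append(best)
        posteriors.append(site_l[best] / total)
    return "".join(sequence), posteriors


# Hypothetical toy data: three aligned sequences and a rooted tree.
alignment = {"taxon_A": "MKV", "taxon_B": "MKI", "taxon_C": "MRV"}
tree = [([("taxon_A", 0.1), ("taxon_B", 0.1)], 0.05), ("taxon_C", 0.2)]

ancestor, support = reconstruct_root(tree, alignment)
print(ancestor, [round(p, 3) for p in support])
```

The per-site posterior is what "with high probability" refers to: a site where all descendants agree is reconstructed with near certainty, while a site where they disagree gets lower support.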