First paper here gives evidence that non-coding DNA is generally speaking very close in sequence space to DNA encoding foldable protein secondary structural elements:
The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how noncoding sequences’ properties could promote the birth of novel genes and shape the evolution and the structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic ORFs (Open Reading Frames) of S. cerevisiae with the aim of (i) exploring whether the large structural diversity observed in proteomes is already present in noncoding sequences, and (ii) estimating the potential of the noncoding genome to produce novel protein bricks that can either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural diversity of canonical proteins with strikingly the majority predicted as foldable. Then, we investigated the early stages of de novo gene birth by identifying intergenic ORFs with a strong translation signal in ribosome profiling experiments and by reconstructing the ancestral sequences of 70 yeast de novo genes. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and the one of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.
Then there’s this:
Contrary to long-held views, recent evidence indicates that de novo birth of genes is not only possible, but is surprisingly prevalent: a substantial fraction of eukaryotic genomes are composed of orphan genes, which show no homology with any conserved genes. And a remarkably large proportion of orphan genes likely originated de novo from non-genic regions. Here, using a parsimonious mathematical model, we investigate the probability and timescale of de novo gene birth due to spontaneous mutations. We trace how an initially non-genic locus accumulates beneficial mutations to become a gene. We sample across a wide range of biologically feasible distributions of fitness effects (DFE) of mutations, and calculate the conditions conducive to gene birth. We find that in a time frame of millions of years, gene birth is highly likely for a wide range of DFEs. Moreover, when we allow DFEs to fluctuate, which is expected given the long time frame, gene birth in the model becomes practically inevitable. This supports the idea that gene birth is a ubiquitous process, and should occur in a wide variety of organisms. Our results also demonstrate that intergenic regions are not inactive and silent but are more like dynamic storehouses of potential genes.
This is sure to be a new favorite collection of papers to @colewd I’m sure.