Evidence favoring de novo gene evolution, and an actual population genetic model of de novo gene gain

The first paper here gives evidence that non-coding DNA is, generally speaking, very close in sequence space to DNA encoding foldable protein secondary structural elements:


The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how noncoding sequences’ properties could promote the birth of novel genes and shape the evolution and the structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic ORFs (Open Reading Frames) of S. cerevisiae with the aim of (i) exploring whether the large structural diversity observed in proteomes is already present in noncoding sequences, and (ii) estimating the potential of the noncoding genome to produce novel protein bricks that can either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural diversity of canonical proteins with strikingly the majority predicted as foldable. Then, we investigated the early stages of de novo gene birth by identifying intergenic ORFs with a strong translation signal in ribosome profiling experiments and by reconstructing the ancestral sequences of 70 yeast de novo genes. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and the one of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.
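For anyone unfamiliar with the term: in its simplest definition, an ORF is a stretch of DNA that starts with an ATG codon and runs in frame to a stop codon, and an intergenic ORF is just one found between annotated genes. Here is a minimal Python sketch of scanning a sequence for such ORFs. It is not the authors’ pipeline, only an illustration under my own assumptions: the standard genetic code, the forward strand only, and a hypothetical `min_codons` length cutoff.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def find_orfs(seq: str, min_codons: int = 10) -> List[Tuple[int, int]]:
    """Return half-open (start, end) coordinates of ORFs on the forward strand.

    An ORF here runs from the first in-frame ATG to the next in-frame stop
    codon (stop included); min_codons counts the stop codon too.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close it; look for the next ATG in this frame
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGCCCTAGATGTTTTGA", min_codons=3))
    # prints [(2, 11), (11, 20)]
```

A real scan of intergenic regions would also cover the reverse complement and apply whatever length threshold the study actually used.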

Then there’s this:


Contrary to long-held views, recent evidence indicates that de novo birth of genes is not only possible, but is surprisingly prevalent: a substantial fraction of eukaryotic genomes are composed of orphan genes, which show no homology with any conserved genes. And a remarkably large proportion of orphan genes likely originated de novo from non-genic regions. Here, using a parsimonious mathematical model, we investigate the probability and timescale of de novo gene birth due to spontaneous mutations. We trace how an initially non-genic locus accumulates beneficial mutations to become a gene. We sample across a wide range of biologically feasible distributions of fitness effects (DFE) of mutations, and calculate the conditions conducive to gene birth. We find that in a time frame of millions of years, gene birth is highly likely for a wide range of DFEs. Moreover, when we allow DFEs to fluctuate, which is expected given the long time frame, gene birth in the model becomes practically inevitable. This supports the idea that gene birth is a ubiquitous process, and should occur in a wide variety of organisms. Our results also demonstrate that intergenic regions are not inactive and silent but are more like dynamic storehouses of potential genes.
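To get a feel for the kind of process that abstract describes, here is a toy origin-fixation simulation in Python. It is my own hypothetical sketch, not the authors’ model: it assumes a single diploid population of constant size `n`, a per-generation mutation rate `u` at the locus, a DFE that is a simple mixture of exponentially distributed beneficial and deleterious effects, Kimura’s fixation probability, instant fixation relative to mutation arrival, and an arbitrary fitness `threshold` standing in for “this locus is now a gene”. All names and parameter values are illustrative.

```python
import math
import random
from typing import Optional


def fixation_probability(s: float, n: int) -> float:
    """Kimura's fixation probability for one new mutant (initial frequency
    1/(2n)) in a diploid population of size n, assumed equal to Ne."""
    if abs(s) < 1e-12:
        return 1.0 / (2 * n)  # neutral limit
    x = -4.0 * n * s
    if x > 700:
        return 0.0  # strongly deleterious: the probability underflows to ~0
    # (1 - e^{-2s}) / (1 - e^{-4ns}), written with expm1 for numerical accuracy
    return math.expm1(-2.0 * s) / math.expm1(x)


def draw_effect(rng: random.Random, p_beneficial: float,
                mean_benefit: float, mean_cost: float) -> float:
    """Hypothetical DFE: beneficial with probability p_beneficial (exponential
    effect), otherwise deleterious (negated exponential effect)."""
    if rng.random() < p_beneficial:
        return rng.expovariate(1.0 / mean_benefit)
    return -rng.expovariate(1.0 / mean_cost)


def time_to_gene_birth(rng: random.Random, n: int, u: float, threshold: float,
                       max_generations: float, p_beneficial: float,
                       mean_benefit: float, mean_cost: float) -> Optional[float]:
    """Mutations arrive at rate 2*n*u per generation; each fixes with Kimura's
    probability, and fixed effects add up at the locus. Returns the generation
    at which the summed effect first reaches threshold, or None if that does
    not happen within max_generations."""
    t = 0.0
    locus_fitness = 0.0  # cumulative fitness contribution of the locus
    arrival_rate = 2 * n * u
    while True:
        t += rng.expovariate(arrival_rate)  # waiting time to the next new mutation
        if t > max_generations:
            return None
        s = draw_effect(rng, p_beneficial, mean_benefit, mean_cost)
        if rng.random() < fixation_probability(s, n):
            locus_fitness += s
            if locus_fitness >= threshold:
                return t


def gene_birth_probability(runs: int, seed: int, **params) -> float:
    """Fraction of independent replicate runs in which 'gene birth' occurs."""
    rng = random.Random(seed)
    births = sum(time_to_gene_birth(rng, **params) is not None
                 for _ in range(runs))
    return births / runs


if __name__ == "__main__":
    params = dict(n=1000, u=1e-5, threshold=0.05, max_generations=1e6,
                  p_beneficial=0.05, mean_benefit=0.01, mean_cost=0.01)
    print(gene_birth_probability(runs=100, seed=1, **params))
```

Sweeping the DFE parameters (or letting them drift over time) and rerunning `gene_birth_probability` is loosely the kind of exploration the abstract describes; the paper’s own model and parameter ranges are what actually support its conclusions.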

This is sure to become a new favorite collection of papers for @colewd.


Do not taunt Happy Fun Bill. :wink:


I don’t know about @colewd, but I would love to hear @pnelson’s take on this.


That comes perilously close to Panglossian thinking, assigning an adaptive function to junk DNA.

There is, of course, plentiful empirical evidence of fairly recent genes arising from junk sequences, wherever close enough relatives of the species with the new gene exist to compare against.


Yeah, I was a bit annoyed by the phrasing there too, but I don’t think that’s what they’re actually saying. I think the contrast between “silent and inactive” and “dynamic” is the distinction they mean to highlight; they aren’t suggesting that junk DNA is adaptive.


It isn’t clear what that means. Junk DNA is of course silent, and all DNA is inactive: it just sits there, sometimes being bound to by this or that protein, and sometimes being transcribed, and sometimes being replicated. But it’s all passive. Now if they mean that there’s frequent change of role between junk DNA and functional DNA, with function being gained or lost at various loci all the time, then that’s fine. And of course deleterious mutations can also arise in junk DNA, as can strictly neutral mutations that reduce genetic compatibility between populations. Why, there may even be reinforcement, i.e. selection favoring increased isolation, though that usually ought to apply to premating or prezygotic isolation.

It’s not clear from that sentence alone, but I can’t find anything in the main text suggesting the authors are postulating some sort of adaptive function for junk DNA (they never say that junk DNA exists because it can potentially evolve into genes). My best guess is that they’re speaking on evolutionary timescales, as opposed to describing what junk DNA does at any particular moment in the lifetime of a cell.

Curiously, the term “junk” appears nowhere in the paper; they only ever contrast coding vs. non-coding, deleterious or neutral vs. adaptive, and functional vs. non-functional. Everything they have to say on that topic seems totally solid to me. For example, I think this part represents very well what I’ve come to understand about the relationship between pervasive transcription of non-coding DNA and de novo gene birth:

Patterns observed in studies on well-known de novo genes and putative de novo genes shed light on possible mechanisms of gene birth: new genes are born preferably in genomic regions with high GC content and near meiotic hotspots. In animals, new genes are more likely to be expressed in the brain and testis (Vakirlis et al., 2018). Interestingly, these cells and genomic regions are also especially prone to pervasive transcription (Jensen et al., 2013). These patterns suggest an intriguing hypothesis (Van Oss and Carvunis, 2019), whereby non-genic loci, made visible to natural selection by pervasive expression, can be driven to evolve in two ways: the sequences could evolve towards reducing deleterious traits, such as propensity to aggregate, in a process called pre-adaptation (Masel, 2006; Wilson et al., 2017); alternatively, or additionally, the sequences can gain new functions that increase the fitness of the organism (Carvunis et al., 2012). Pre-adaptation is considered non-adaptive evolution since it ensures that expressed sequences remain harmless, but in itself does not prescribe how the expression of these sequences increase organismal fitness. Whereas gain of a new function is considered adaptive and leads to gene birth.

Two simple observations, taken together, lend support to such a mechanism of gene birth: first, random sequences can gain functionality, provided they are consistently expressed (Hayashi et al., 2003). And second, it was demonstrated that new promoters could easily evolve in E. coli (Yona et al., 2018). These studies highlight the possibility that non-genic sequences can, in stages, gain the hallmarks of genes: regulated expression, and subsequently, functionality.

Of course, since non-coding DNA is not necessarily nonfunctional, I think it would be a mistake to treat de novo gene birth from non-coding DNA as something that happens only in junk, which would explain why they don’t focus on junk specifically.


So by “active” they just mean “actively transcribed”.
