Functional Long Non-coding RNAs Evolve from Junk Transcripts

This new paper in Cell looks interesting, and relevant to topics that often come up here.

It speaks to things like junk DNA, pervasive transcription, constructive neutral evolution, and organismal complexity.


Transcriptome studies reveal pervasive transcription of complex genomes, such as those of mammals. Despite popular arguments for functionality of most, if not all, of these transcripts, genome-wide analysis of selective constraints indicates that most of the produced RNA are junk. However, junk is not garbage. On the contrary, junk transcripts provide the raw material for the evolution of diverse long non-coding (lnc) RNAs by non-adaptive mechanisms, such as constructive neutral evolution. The generation of many novel functional entities, such as lncRNAs, that fuels organismal complexity does not seem to be driven by strong positive selection. Rather, the weak selection regime that dominates the evolution of most multicellular eukaryotes provides ample material for functional innovation with relatively little adaptation involved.


Just quoting some key statements:

A second important implication of the weak selection regime in eukaryotic genomes that counters the above trend against the emergence of lncRNAs, is the continuous emergence of biochemically active but non-functional entities. The barrage of mutations, which the genome experiences, constantly generates short motifs that have biochemical activity, including transcription factor binding and recruitment of RNA polymerase, resulting in cryptic transcriptional start sites. This is not surprising given that transcription factor binding sites are typically very short (Stewart et al., 2012), and many random pieces of DNA can activate transcription (Gerber et al., 2013; Gosselin et al., 2016; Reinke et al., 2008; White et al., 2013). Under the weak selection regime, these cryptic transcriptional start sites will only be eliminated by purifying selection if their associated transcript has a negative fitness effect above the drift barrier. The potential negative effects of such transcription start sites are drastically diminished by quality control mechanisms that degrade spurious RNAs or at least prevent their efficient translation into proteins (see below). Thus, one would expect that genomes of multicellular eukaryotes, which evolve under weak selection, would inevitably produce numerous, low abundance non-coding RNAs that exert small (positive and negative) fitness effects. Spurious “genes” producing non-specific transcriptional noise are expected to be incessantly created and destroyed by neutral evolution. Thus, the evolutionary dynamics in complex eukaryotes necessarily produces a genome teeming with ever changing transcriptional noise.

Another concept that is commonly glossed over is that the production of a low level of junk RNA is fully compatible with our current understanding of biochemistry. All enzymes as well as regulatory proteins possess a degree of promiscuity and can bind to, and act on, sub-optimal substrates (Copley, 2020; Tawfik, 2020). Thus, transcription factors, which typically recognize short degenerate DNA motifs, will bind not only to gene-regulatory regions, but also to many additional non-functional sites in the genome (Paris et al., 2013; Reilly and Noonan, 2016; Villar et al., 2014; Wong et al., 2015).
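To put a rough number on the point about short motifs, here is a back-of-the-envelope sketch. It assumes a uniformly random sequence (equal base frequencies, independent positions), which real genomes are not, and the function name, motif length, and genome size are illustrative choices of mine, not figures from the paper.

```python
def expected_motif_hits(genome_length: int, motif_length: int) -> float:
    """Expected number of exact matches to one fixed motif on one strand
    of a uniformly random DNA sequence (a simplifying assumption)."""
    # Number of start positions where a motif of this length could sit.
    positions = genome_length - motif_length + 1
    if positions <= 0:
        return 0.0
    # Each position matches with probability (1/4)^k for a k-base motif.
    return positions * 0.25 ** motif_length


if __name__ == "__main__":
    # Illustrative values only: a 3-billion-base genome and an 8-base motif.
    print(expected_motif_hits(3_000_000_000, 8))
```

Under those assumptions, a single exact 8-base motif is expected to turn up by chance tens of thousands of times in a mammal-sized genome, and degenerate motifs would match even more often. That is the sense in which binding at non-functional sites is the default expectation rather than something that needs a special explanation.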

There is so much good stuff in this paper it’s hard to pick something to highlight over something else.


We’ve talked about the limits of seeing DNA as computer code, and here is one key point. DNA tolerates quite a bit of noise, but computer programs don’t.

Moreover, the noise being described here is a thicket of complexity, like a messy room or a cluttered desk.


Agreed. The computer code analogies completely break down when it comes to describing the physics of interacting molecules. For example, there is nothing in computer code that is analogous to the relationship between GC or AT hydrogen bonding and binding affinity (and how this affects things like melting temperature) between complementary antiparallel strands of nucleic acids. They just aren’t the same things and don’t function by the same principles.
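As a small illustration of the melting-temperature point, here is a sketch of the Wallace rule, a common rule of thumb for short oligonucleotides (roughly 14 to 20 bases): Tm ≈ 2 °C per A or T plus 4 °C per G or C. It is only a crude approximation that ignores salt concentration and neighboring-base effects, and the function name is my own.

```python
def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C for a short DNA oligo,
    using the Wallace rule: 2 * (A + T) + 4 * (G + C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    # Reject anything that is not plain A/C/G/T.
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G, T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Same length, different composition, different estimated Tm.
    print(wallace_tm("ATATATATATATATAT"))
    print(wallace_tm("GCGCGCGCGCGCGCGC"))
```

Two sequences of the same length get different estimates purely because of base composition. The "symbols" have physical consequences that depend on what they are made of, which has no counterpart in how a program reads its source code.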

I think this also explains why many people have a hard time understanding how biological macromolecules can evolve and change over time, when they are time and again inappropriately analogized to “inert” macroscopic, mechanical objects, like vehicle engines with axles and pistons, and electronic devices.


This report is also another refutation of the claim that there is some sort of informational barrier to evolution. The origins and functioning of lncRNAs are decidedly low information (information in the ID use of the term, that is).


“So the way this code works is it produces a 1, unless the room is under 20C, or you’ve gone more than 3 days between a power cycle, or you have any of the programs listed below installed. Then it’s a 0. Unless…”


That…that’s not how code is supposed to work?


I’ve seen code like that. Mostly using conditional #ifdef lines. :sob: