Basic ID question re: "de novo" sequences

Yes. There are numerous cases in the literature where scientists have inferred that such a chain of events has occurred, producing a functional protein-coding gene from previously non-coding DNA.

In any stretch of more or less random-sequence DNA, you’re going to find a reading frame that contains a canonical start codon (ATG), and sooner or later, further downstream in the same reading frame, you’re going to run into a stop codon. The question is whether that putative ORF (open reading frame), if expressed as a protein, actually serves some useful function. If it does, natural selection can further improve and fine-tune that function.
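To make the "sooner or later you hit a stop codon" point concrete, here is a minimal Python sketch that scans random DNA for ORFs. It assumes uniform base composition and scans only the forward strand with ATG as the sole start codon (real ORF finders also scan the reverse complement, and some organisms use alternative start codons). With 3 of the 64 codons being stops, the number of sense codons after an ATG before the first in-frame stop is roughly geometric with mean 61/3, or about 20 codons, so long ORFs are individually rare but still turn up in large amounts of random sequence.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Find forward-strand ORFs: an ATG followed, in the same frame, by a stop codon.

    Returns sorted (start, end) pairs with `end` exclusive and including the stop.
    `min_codons` counts codons before the stop, the ATG included. After an ORF is
    found, scanning resumes past its stop, so nested in-frame ATGs are not reported
    separately. ORFs that run off the end of the sequence are ignored.
    """
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Walk codon by codon until an in-frame stop or the end of the sequence.
            j = i + 3
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop before the end: no later ATG in this frame has one either
            if (j - i) // 3 >= min_codons:
                orfs.append((i, j + 3))
            i = j + 3  # resume scanning after the stop codon
    return sorted(orfs)


if __name__ == "__main__":
    random.seed(1)
    genome = "".join(random.choice("ACGT") for _ in range(100_000))
    long_orfs = find_orfs(genome, min_codons=100)
    print(f"{len(long_orfs)} forward-strand ORFs of >= 100 codons in 100 kb of random DNA")
    # Uniform composition: P(stop) = 3/64 per codon, so the mean number of
    # sense codons after an ATG before the first stop is (61/64)/(3/64) = 61/3.
    print(f"Expected sense codons after ATG before a stop: ~{61 / 3:.1f}")
```

Changing the seed or the `min_codons` cutoff shows how the count of "accidental" ORFs scales with length.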

Most non-coding DNA evolves at close to the mutation rate, probably undergoing only weak negative selection (because it carries some slight metabolic cost) and purifying selection against acquiring deleterious activities. Apart from these selective pressures, it is essentially evolving neutrally. Eventually any piece of non-coding DNA will get transcribed through accidental transcription factor binding, a phenomenon termed pervasive transcription, and if that transcript contains an open reading frame, it will likely get translated into a protein.

It takes remarkably little for a beneficial open reading frame to acquire a functional transcriptional promoter. See, for example:
Yona AH, Alm EJ, Gore J. Random sequences rapidly evolve into de novo promoters. Nat Commun. 2018;9:1530. doi:10.1038/s41467-018-04026-w.


How new functions arise de novo is a fundamental question in evolution. We studied de novo evolution of promoters in Escherichia coli by replacing the lac promoter with various random sequences of the same size (~100 bp) and evolving the cells in the presence of lactose. We find that ~60% of random sequences can evolve expression comparable to the wild-type with only one mutation, and that ~10% of random sequences can serve as active promoters even without evolution. Such a short mutational distance between random sequences and active promoters may improve the evolvability, yet may also lead to accidental promoters inside genes that interfere with normal expression. Indeed, our bioinformatic analyses indicate that E. coli was under selection to reduce accidental promoters inside genes by avoiding promoter-like sequences. We suggest that a low threshold for functionality balanced by selection against undesired targets can increase the evolvability by making new beneficial features more accessible.
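One way to see why random sequence so often sits close to a working promoter: the E. coli sigma-70 consensus is short, a -35 box (TTGACA) and a -10 box (TATAAT) separated by a spacer of about 17 bp. The sketch below is a crude proxy, not the method Yona et al. used: it finds the closest consensus match in a random 100 bp sequence by counting mismatches. The tolerated spacer range (15 to 19 bp) and the 4-mismatch threshold are illustrative assumptions; real promoter activity also depends on flanking sequence and other features that this ignores.

```python
import random

MINUS35 = "TTGACA"       # sigma-70 consensus -35 box
MINUS10 = "TATAAT"       # sigma-70 consensus -10 box
SPACERS = range(15, 20)  # assumed tolerated spacer lengths in bp (illustrative)


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_promoter_match(seq):
    """Return (total_mismatches, pos_35, spacer) for the closest match to the
    consensus -35/-10 pair on the forward strand, or None if seq is too short."""
    best = None
    for i in range(len(seq) - len(MINUS35) + 1):
        m35 = mismatches(seq[i:i + 6], MINUS35)
        for spacer in SPACERS:
            j = i + 6 + spacer  # start of the candidate -10 box
            if j + 6 > len(seq):
                break
            total = m35 + mismatches(seq[j:j + 6], MINUS10)
            if best is None or total < best[0]:
                best = (total, i, spacer)
    return best


if __name__ == "__main__":
    random.seed(0)
    trials = 1000
    close = 0
    for _ in range(trials):
        s = "".join(random.choice("ACGT") for _ in range(100))
        if best_promoter_match(s)[0] <= 4:  # arbitrary illustrative threshold
            close += 1
    print(f"{close}/{trials} random 100-bp sequences within 4 mismatches of consensus")
```

Because a 100 bp window offers dozens of placements for a 12-base, two-box motif, the best match in a random sequence is typically only a few mismatches from consensus, which is consistent with the paper's point that a single mutation often suffices.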

There are a handful of examples with strong evidence that this process has produced genuinely novel, functional protein-coding genes. For example:
Cai J, Zhao R, Jiang H, Wang W. De novo origination of a new protein-coding gene in Saccharomyces cerevisiae. Genetics. 2008 May;179(1):487-96. doi:10.1534/genetics.107.084491.


Origination of new genes is an important mechanism generating genetic novelties during the evolution of an organism. Processes of creating new genes using preexisting genes as the raw materials are well characterized, such as exon shuffling, gene duplication, retroposition, gene fusion, and fission. However, the process of how a new gene is de novo created from noncoding sequence is largely unknown. On the basis of genome comparison among yeast species, we have identified a new de novo protein-coding gene, BSC4 in Saccharomyces cerevisiae. The BSC4 gene has an open reading frame (ORF) encoding a 132-amino-acid-long peptide, while there is no homologous ORF in all the sequenced genomes of other fungal species, including its closely related species such as S. paradoxus and S. mikatae. The functional protein-coding feature of the BSC4 gene in S. cerevisiae is supported by population genetics, expression, proteomics, and synthetic lethal data. The evidence suggests that BSC4 may be involved in the DNA repair pathway during the stationary phase of S. cerevisiae and contribute to the robustness of S. cerevisiae, when shifted to a nutrient-poor environment. Because the corresponding noncoding sequences in S. paradoxus, S. mikatae, and S. bayanus also transcribe, we propose that a new de novo protein-coding gene may have evolved from a previously expressed noncoding sequence.
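The comparative logic in that abstract (an intact ORF in one species, no homologous ORF at the corresponding position in its relatives) can be sketched in a few lines. This is a toy illustration under strong assumptions: the sequences are made up, already aligned to start at the reference start codon, and ungapped. It is not the pipeline Cai et al. used, which also drew on population genetics, expression, proteomics and synthetic lethal data.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(seq, start=0):
    """0-based codon index (counted from `start`) of the first in-frame stop, or None."""
    for k, i in enumerate(range(start, len(seq) - 2, 3)):
        if seq[i:i + 3] in STOP_CODONS:
            return k
    return None


def orf_intact(seq, expected_codons):
    """True if seq begins with ATG and its first in-frame stop falls at codon
    index `expected_codons`, i.e. the ORF has the same length as the reference."""
    return seq.startswith("ATG") and first_in_frame_stop(seq) == expected_codons


if __name__ == "__main__":
    # Hypothetical toy sequences, not real yeast data.
    reference = "ATGAAACCCGGGTAA"  # focal species: ATG + 3 sense codons + stop
    orthologs = {
        "focal": reference,
        "outgroup_1": "ATGAAATGAGGGTAA",  # premature in-frame stop
        "outgroup_2": "ATAAAACCCGGGTAA",  # start codon missing
    }
    n = first_in_frame_stop(reference)
    for name, s in orthologs.items():
        print(name, "intact ORF" if orf_intact(s, n) else "no homologous ORF")
```

In real data, indels that shift the reading frame are a common way an orthologous region "lacks" the ORF, which is why genuine analyses work from proper alignments rather than fixed coordinates.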