Basic ID question re: "de novo" sequences

I tried asking this of one of our resident defenders of ID, but couldn’t get an answer, so I thought I would ask it of the group as a whole:

A key claim often made by proponents of ID regards what are called “de novo” DNA sequences. The claim usually entails that de novo sequences either cannot be produced by standard evolutionary processes such as mutation, genetic drift and selection, or cannot be produced with sufficient frequency to account for the history of life on earth as described by evolutionary theory. On this view, intelligent design is therefore a more likely explanation for the existence of these sequences.

What I would like to know: What exactly is a de novo sequence? For instance, would one have been expected to have arisen in an experiment such as Lenski’s LTEE if evolution is true? If so, how would it be identified? If such a thing would not be expected, how would one determine that a de novo sequence had arisen at some point in the past? How is it determined that a certain number of these sequences must have arisen in order for evolution to be true?


What the IDers usually talk about is de novo genes, rather than just sequences. By that they generally mean genes that have no orthologous counterpart in closely related species (and, presumably, no paralogs in the genome). In most cases, however, there are orthologous non-genic sequences in those species. So de novo genes don’t come out of nowhere but out of previously existing sequences. Since the E. coli in Lenski’s experiment have very little non-genic sequence, such events should be vanishingly rare there.

Of course the whole idea of de novo genes depends on common descent. In YEC all genes are de novo, aren’t they?

Except for those species that survived on board the Ark and then hypermutated.

Here is an example of the context in which the term “de novo sequence” is used. It does not seem to be what you are referring to (the writer is referring to the evolution of aerobic citrate metabolism in Lenski’s LTEE):

What they mean, perhaps, is a lengthy sequence that pops into existence from nowhere. Such things do not exist. Insertions arise in a number of ways, but they all involve the copying of existing sequences: replication slippage, transposition, inversion, duplication, retroposition, and such. Of these, some only happen in eukaryotes, and duplication is the process most likely to produce new, long sequences in E. coli. I don’t know the rate at which this happens, but I would expect it to be extremely rare. And I suspect the writer wouldn’t consider that to be de novo anyway.

Here’s an interesting study (not Lenski, though) of observed gene duplication in several E. coli strains. Mind you, the duplications are inferred from comparing sequences, not events that happened in the lab.

That is how it appears to me, but I have never heard an IDC say this explicitly, so I’d like confirmation from one of them or their supporters.

Could you maybe unpack that a little? How does a non-genic sequence turn into a gene? My understanding is that a gene is a DNA sequence between an initiation and a termination codon, plus the regulatory sequences. What I’m not sure about is what changes in a non-genic sequence, and by what mechanisms, to form a functional gene.

As you imply, by gaining one or more of a transcription factor binding site, an initiation codon, and/or a stop codon by mutation.
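To make the "gaining an initiation codon by mutation" part concrete, here is a minimal sketch that lists every single-nucleotide substitution which would create a new ATG in a given stretch of DNA. The function name and the toy sequence are made up for illustration, and it only looks at the forward strand and at substitutions (no insertions or deletions), so treat it as a rough illustration rather than a model of real mutation.

```python
def substitutions_creating_start(seq):
    """List single-base substitutions that create a new ATG on the forward strand.

    Returns tuples (position, old_base, new_base, atg_start), all 0-based.
    Hypothetical helper for illustration; ignores indels and the reverse strand.
    """
    hits = []
    for pos, old in enumerate(seq):
        for new in "ACGT":
            if new == old:
                continue
            mutated = seq[:pos] + new + seq[pos + 1:]
            # Only the (up to three) codon-sized windows covering pos can change.
            first = max(0, pos - 2)
            last = min(pos, len(seq) - 3)
            for s in range(first, last + 1):
                if mutated[s:s + 3] == "ATG" and seq[s:s + 3] != "ATG":
                    hits.append((pos, old, new, s))
    return hits


if __name__ == "__main__":
    toy = "CCACGTTAAGC"  # arbitrary toy sequence, not real data
    for pos, old, new, start in substitutions_creating_start(toy):
        print(f"{old}{pos}{new} creates ATG at position {start}")
```

Any triplet that is one substitution away from ATG (ACG, AAG, CTG, ATA, and so on) is a candidate, which is why such sites are common in ordinary sequence.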

So then it’s just a matter of whether that gene codes for a functional protein of some sort? If so, I’m guessing natural selection then works to conserve (and optimize) the gene, and if not, I’m assuming it will eventually lose its codons, etc. since it doesn’t do anything beneficial. Is that somewhat right?

Yes, somewhat. Your terminology is odd, but I think you have the point.

Yes. There are numerous cases in the literature where scientists have inferred that such a chain of events has occurred to produce a functional protein-coding gene from previously non-coding DNA.

For any stretch of more or less random sequence DNA, you’re going to run into a reading frame that contains a canonical start codon, and further downstream from that, sooner or later you’re going to run into a stop codon in the same reading frame. The question is whether that putative ORF (open reading frame), if expressed as a protein, actually serves some useful function. If it does, natural selection can further improve and fine-tune this function.
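If you want to see this for yourself, here is a minimal sketch that generates a random DNA string and scans the three forward reading frames for ATG-to-stop ORFs. The function names, the uniform base composition, and the sequence length are all assumptions chosen for illustration; real genomes are not uniform random sequence, and a real ORF finder would also scan the reverse strand.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq, min_codons=1):
    """Find ATG...stop ORFs in the three forward reading frames.

    Returns (start, end, frame) tuples; end is exclusive and includes the
    stop codon. min_codons counts codons from ATG up to, but not including,
    the stop. Hypothetical helper for illustration only.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


def random_dna(length, seed=0):
    """Uniform random DNA; the seed is fixed so runs are repeatable."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    seq = random_dna(3000)
    orfs = find_orfs(seq, min_codons=20)
    print(f"{len(orfs)} forward-strand ORFs of at least 20 codons in 3000 bp")
    for start, end, frame in orfs:
        print(f"frame {frame}: {start}-{end} ({(end - start) // 3 - 1} codons)")
```

Finding an ORF this way says nothing about whether the encoded peptide does anything useful; that is the separate question selection has to answer.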

Most non-coding DNA is evolving at close to the rate at which mutation occurs, probably just undergoing weak negative selection (because it does have some slight metabolic cost), and purifying selection against acquiring deleterious activities. But besides these selective pressures, it’s generally just evolving at a neutral rate. Eventually any piece of non-coding DNA will get transcribed by accidental transcription factor binding activity in a phenomenon termed pervasive transcription, and if that transcript contains an open reading frame, it will likely get translated into a protein.

It takes remarkably little for a beneficial open reading frame to acquire a functional transcriptional promoter. See for example this:
Yona, A.H., Alm, E.J. & Gore, J. Random sequences rapidly evolve into de novo promoters. Nat Commun 9, 1530 (2018) doi:10.1038/s41467-018-04026-w


How new functions arise de novo is a fundamental question in evolution. We studied de novo evolution of promoters in Escherichia coli by replacing the lac promoter with various random sequences of the same size (~100 bp) and evolving the cells in the presence of lactose. We find that ~60% of random sequences can evolve expression comparable to the wild-type with only one mutation, and that ~10% of random sequences can serve as active promoters even without evolution. Such a short mutational distance between random sequences and active promoters may improve the evolvability, yet may also lead to accidental promoters inside genes that interfere with normal expression. Indeed, our bioinformatic analyses indicate that E. coli was under selection to reduce accidental promoters inside genes by avoiding promoter-like sequences. We suggest that a low threshold for functionality balanced by selection against undesired targets can increase the evolvability by making new beneficial features more accessible.

There are a handful of examples where there is strong evidence that this has resulted in genuinely novel, functional protein-coding genes:
Cai J, Zhao R, Jiang H, Wang W. De novo origination of a new protein-coding gene in Saccharomyces cerevisiae. Genetics. 2008 May;179(1):487-96. doi:10.1534/genetics.107.084491.


Origination of new genes is an important mechanism generating genetic novelties during the evolution of an organism. Processes of creating new genes using preexisting genes as the raw materials are well characterized, such as exon shuffling, gene duplication, retroposition, gene fusion, and fission. However, the process of how a new gene is de novo created from noncoding sequence is largely unknown. On the basis of genome comparison among yeast species, we have identified a new de novo protein-coding gene, BSC4 in Saccharomyces cerevisiae. The BSC4 gene has an open reading frame (ORF) encoding a 132-amino-acid-long peptide, while there is no homologous ORF in all the sequenced genomes of other fungal species, including its closely related species such as S. paradoxus and S. mikatae. The functional protein-coding feature of the BSC4 gene in S. cerevisiae is supported by population genetics, expression, proteomics, and synthetic lethal data. The evidence suggests that BSC4 may be involved in the DNA repair pathway during the stationary phase of S. cerevisiae and contribute to the robustness of S. cerevisiae, when shifted to a nutrient-poor environment. Because the corresponding noncoding sequences in S. paradoxus, S. mikatae, and S. bayanus also transcribe, we propose that a new de novo protein-coding gene may have evolved from a previously expressed noncoding sequence.


Off-topic, but when I ask how this evidence for Design could be falsified, the answer is either “demonstrate a step-wise evolutionary pathway” or “show the selected steps”, which is pretty much the same thing as far as I can tell (Behe-speak?). What doesn’t get mentioned is that a Designer capable of creating complex things must also be capable of simple things. The Designer could have made each change in a step-wise manner, selecting each step along the way - just as we expect from evolution - and this claim can never be falsified.

We now return you to the interesting discussion of just what could be meant by “de novo” sequences.

Given the discussion in other threads, I would suspect that ID supporters would describe a de novo sequence as something that poofs into existence from nowhere. If the sequence is produced from a combination of recombination and mutation, then they will claim that it isn’t new because it was derived from pre-existing sequence.

I think this creates a difference in expectations. When we cite examples of new protein function emerging through the mutation of ancestral DNA sequences, they claim that this isn’t what they are looking for. They expect that genes just pop into existence from nowhere, so they automatically reject examples of evolved genes because they don’t fit their definition of how genes should come to exist.


That also highlights the contradictory nature of requesting a “step-wise” evolutionary pathway to some complex adaptation while simultaneously insisting that evolution can’t create new information, and that any putative “step” in this pathway only works because the information “pre-exists”.

So there’s some already existing sequence that mutates into some other sequence incrementally, but creationists think this doesn’t count as “new information”, because at every mutational step the information “mostly pre-exists”. And so even though you have shown a stepwise pathway from the ancestral to the descendant state, from no function to function, to a complex adaptation, it somehow magically doesn’t count because it’s not “new information”, it was just pre-existing information being copied and modified by mutation, so it’s not truly new novel de novo out of non-existent-nothing information in the special sense of new novel and de novo the creationist meant all along.

Heads they win; Tails we lose. It’s a classic.

Of course, once one realizes that all genetic information arises from pre-existing information, one arrives at the conclusion that no new information is needed to explain the origins of species. Which pretty much renders all information-based ID theory irrelevant and meaningless.

In other words, as much an own goal as anything.


How would you test this hypothesis?

By observing how new genes arise from pre-existing sequences.

How would you test the hypothesis a disembodied mind used magic to POOF new genes into existence?

HINT: Observing humans compose and send messages to other humans doesn’t do it.