Basic ID question re: "de novo" sequences

That is how it appears to me, but I have never heard an IDC say this explicitly, so I’d like confirmation from one of them or their supporters.

Could you maybe unpack that a little? How does a non-genic sequence turn into a gene? My understanding is that a gene is a DNA sequence between an initiation and termination codon, and then there are the regulatory sequences. I’m not sure, though, what changes in a non-genic sequence, and by what mechanisms, to form a functional gene.

As you imply, by gaining one or more of a transcription factor binding site, an initiation codon, and/or a stop codon by mutation.

So then it’s just a matter of whether that gene codes for a functional protein of some sort? If so, I’m guessing natural selection then works to conserve (and optimize) the gene, and if not, I’m assuming it will eventually lose its codons, etc. since it doesn’t do anything beneficial. Is that somewhat right?

Yes, somewhat. Your terminology is odd, but I think you have the point.

Yes. There are numerous cases in the literature where scientists have inferred that such a chain of events has occurred to produce a functional protein-coding gene from previously non-coding DNA.

For any stretch of more or less random sequence DNA, you’re going to run into a reading frame that contains a canonical start codon, and further downstream from that, sooner or later you’re going to run into a stop codon in the same reading frame. The question is if that putative ORF (open reading frame), if expressed into a protein sequence, actually serves some useful function. If it does, natural selection can further improve and fine-tune this function.
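To make the point about random sequence concrete, here is a minimal sketch that scans the forward strand of a DNA string for ATG-to-stop open reading frames in all three frames. The function names (`find_orfs`, `random_dna`) and the uniform base composition are illustrative assumptions, not anything from the cited papers, and real ORF finders also handle the reverse strand and alternative start codons.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq, min_codons=1):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon. `min_codons` counts the
    codons before the stop (the ATG itself counts as one).
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


def random_dna(length, seed=0):
    """Hypothetical helper: uniform random DNA, seeded for repeatability."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    seq = random_dna(10_000)
    orfs = find_orfs(seq, min_codons=30)
    print(f"{len(orfs)} forward-strand ORFs of 30+ codons in 10 kb of random DNA")
```

With uniform base composition, 3 of the 64 codons are stops, so a reading frame runs about 21 codons on average before hitting one. Long ORFs are therefore rare in random sequence but not absent, which is the raw material the post is describing.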

Most non-coding DNA is evolving at close to the rate at which mutation occurs, probably just undergoing weak negative selection (because it does have some slight metabolic cost), and purifying selection against acquiring deleterious activities. But besides these selective pressures, it’s generally just evolving at a neutral rate. Eventually any piece of non-coding DNA will get transcribed by accidental transcription factor binding activity in a phenomenon termed pervasive transcription, and if that transcript contains an open reading frame, it will likely get translated into a protein.

It takes remarkably little for a beneficial open reading frame to acquire a functional transcriptional promoter. See for example:
Yona, A.H., Alm, E.J. & Gore, J. Random sequences rapidly evolve into de novo promoters. Nat Commun 9, 1530 (2018) doi:10.1038/s41467-018-04026-w


How new functions arise de novo is a fundamental question in evolution. We studied de novo evolution of promoters in Escherichia coli by replacing the lac promoter with various random sequences of the same size (~100 bp) and evolving the cells in the presence of lactose. We find that ~60% of random sequences can evolve expression comparable to the wild-type with only one mutation, and that ~10% of random sequences can serve as active promoters even without evolution. Such a short mutational distance between random sequences and active promoters may improve the evolvability, yet may also lead to accidental promoters inside genes that interfere with normal expression. Indeed, our bioinformatic analyses indicate that E. coli was under selection to reduce accidental promoters inside genes by avoiding promoter-like sequences. We suggest that a low threshold for functionality balanced by selection against undesired targets can increase the evolvability by making new beneficial features more accessible.

There are a handful of examples where there is strong evidence that this has resulted in genuinely novel, functional protein coding genes:
Cai J, Zhao R, Jiang H, Wang W. De novo origination of a new protein-coding gene in Saccharomyces cerevisiae. Genetics. 2008 May;179(1):487-96. doi:10.1534/genetics.107.084491.


Origination of new genes is an important mechanism generating genetic novelties during the evolution of an organism. Processes of creating new genes using preexisting genes as the raw materials are well characterized, such as exon shuffling, gene duplication, retroposition, gene fusion, and fission. However, the process of how a new gene is de novo created from noncoding sequence is largely unknown. On the basis of genome comparison among yeast species, we have identified a new de novo protein-coding gene, BSC4 in Saccharomyces cerevisiae. The BSC4 gene has an open reading frame (ORF) encoding a 132-amino-acid-long peptide, while there is no homologous ORF in all the sequenced genomes of other fungal species, including its closely related species such as S. paradoxus and S. mikatae. The functional protein-coding feature of the BSC4 gene in S. cerevisiae is supported by population genetics, expression, proteomics, and synthetic lethal data. The evidence suggests that BSC4 may be involved in the DNA repair pathway during the stationary phase of S. cerevisiae and contribute to the robustness of S. cerevisiae, when shifted to a nutrient-poor environment. Because the corresponding noncoding sequences in S. paradoxus, S. mikatae, and S. bayanus also transcribe, we propose that a new de novo protein-coding gene may have evolved from a previously expressed noncoding sequence.


Off-topic, but when I ask how this evidence for Design could be falsified, the answer is either “demonstrate a step-wise evolutionary pathway” or “show the selected steps”, which is pretty much the same thing as far as I can tell (Behe-speak?). What doesn’t get mentioned is that a Designer capable of creating complex things must also be capable of simple things. The Designer could have made each change in a step-wise manner, selecting each step along the way - just as we expect from evolution - and this claim can never be falsified.

We now return you to the interesting discussion of just what could be meant by “de novo” sequences. :)

Given the discussion in other threads, I would suspect that ID supporters would describe a de novo sequence as something that poofs into existence from nowhere. If the sequence is produced from a combination of recombination and mutation, then they will claim that it isn’t new because it was derived from pre-existing sequence.

I think this creates a difference in expectations. When we cite examples of new protein function emerging through the mutation of ancestral DNA sequences they claim that this isn’t what they are looking for. They expect that genes just pop into existence from nowhere, so they automatically reject examples of evolved genes because they don’t fit their definition of how genes should come to exist.


That also highlights the contradictory nature of the request to see a “step-wise” evolutionary pathway to some complex adaptation, and then simultaneously insisting evolution can’t create new information, and that any putative “step” in this pathway only works because the information “pre-exists”.

So there’s some already existing sequence that mutates incrementally into some other sequence, but creationists think this doesn’t count as “new information”, because at every mutational step the information “mostly pre-exists”. And so even though you have shown a stepwise pathway from the ancestral to the descendant sequence, from no function to function, to a complex adaptation, it somehow magically doesn’t count because it’s not “new information”. It was just pre-existing information being copied and modified by mutation, so it’s not truly new, novel, de novo, out-of-non-existent-nothing information in the special sense of new, novel and de novo the creationist meant all along.

Heads they win; Tails we lose. It’s a classic.

Of course, once one realizes that all genetic information arises from pre-existing information, one arrives at the conclusion that no new information is needed to explain the origins of species. Which pretty much renders all information-based ID theory as irrelevant and meaningless.

In other words, as much an own goal as anything.


How would you test this hypothesis?

By observing how new genes arise from pre-existing sequences.

How would you test the hypothesis that a disembodied mind used magic to POOF new genes into existence?

HINT: Observing humans compose and send messages to other humans doesn’t do it.


Random de novo sequences are easy to make. Making a sequence that codes for a structurally integrated protein is not so easy by random chance, and not so easy by selection if the whole integrated system won’t be selected for unless all the parts are in place simultaneously. In fact, I guess the chances of this are astronomically remote!

So de novo is easy to make if one sets the bar low for what constitutes de novo. I’d say for even modestly well fitting systems such as those below, it’s outrageous to think this could happen by chance.

Two good illustrations of the well-fitting parts (I count 21 or so proteins):

[Figure: ATP synthase]

Some have used “phylogenetic methods” to argue that this complex could easily arise by saying one of the proteins has some passing similarity to helicases. I find that phylogenetic “explanation” to be a total non sequitur.

Another example of well fitting parts is below. This is the PRC2 complex with several proteins.

Notice how nicely all the little shapes join with other shapes! The space filling diagram on the lower right shows the fitting of the PRC2 parts quite well. The SUZ12/VEFS in beige is especially astonishing as it has to properly connect to multiple proteins in addition to connecting to a lincRNA from a totally different chromosome (chromosome 12) than the one it is positioned on (often the Hox cluster in chromosome 2).

Look at all the necessary domains (the colored boxes) in the EZH2 protein (of the above PRC2 complex) that have to be properly positioned and in the right order. Holy smokes!

Each of the boxes is a domain composed of a subsequence 30 to 130 amino acids long, and together they form a protein of 950 amino acids, which would require not quite 3000 DNA nucleotides (at least) in its gene to code. These are like nice little modules!
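The nucleotide figure follows from three nucleotides per codon. A minimal sketch of that arithmetic, taking the 950-amino-acid length stated in the post at face value and counting only the coding sequence (no introns, UTRs or regulatory DNA); `coding_nucleotides` is an illustrative name:

```python
def coding_nucleotides(n_amino_acids, include_stop=True):
    """Minimum coding-sequence length in nucleotides for a protein.

    One codon (3 nucleotides) per amino acid, plus one stop codon if requested.
    """
    codons = n_amino_acids + (1 if include_stop else 0)
    return 3 * codons


print(coding_nucleotides(950, include_stop=False))  # 2850
print(coding_nucleotides(950))                      # 2853
```

Either way the total is a little under 3000, consistent with the “not quite 3000” estimate.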

Many of the domains are mix-and-match. That is, the domains violate a simple protein phylogeny, since it would require odd sorts of gene fusions where several parts from several genes are slapped together to make a new functional gene, and each of the parts needs modification too! This is a known problem in evolutionary biology called “Promiscuous Domains.”

The “ZNx3” under the CXC domain is probably a cysteine zinc finger array, but it hasn’t been fully resolved according to the literature I’ve read. If it is, I’ve argued before why it’s absurd to think zinc finger arrays can become functional by random mutation and selection.

The ID argument in a nutshell. Look at it, it’s so complex, holy smokes! The end.


I’m beginning to think “holy smokes” is ID code for “Holy Ghost”. That seems to be the preferred “mechanism”.


But in actual fact, you have no idea whatsoever. All you have is your guess, based on nothing at all.

Doesn’t matter what you think is outrageous. That’s not an argument.

No, there is nobody who says “it can easily arise because one of the proteins is actually homologous to hexameric helicases”.

Yeah, so do I. No wonder nobody actually made an argument that “evolving ATP synthase is easy” just because the F1 subunits alpha and beta are homologous to hexameric helicases (though they actually are, no matter how much you want to insist they merely have “some passing similarity”).

By the way, there’s an error in the figure. It incorrectly shows the membrane-spanning ring as a “proton channel”, with protons appearing to enter the center of the c-ring and exit at the center of the alpha-beta hexamer of the F1 subunit. That’s incorrect. The proton channel is actually split into two half-channels in the a-protein, at the interface between the protein labeled a (base of the stator) and the ring labeled c (rotor).
I have corrected your figure:
[Figure: ATP synthetase]
Protons enter a half-channel at the base of the stator (the a-protein) and become bound to an aspartate residue (forming aspartic acid) in the middle of a c-ring subunit, causing the c-ring to rotate and pick up more protons. Eventually the c-ring has turned all the way around and can release the proton into the other half-channel in the a-protein, which releases it into the outer membrane space of the mitochondria.

The c-ring, even by itself, can actually function as a protein translocase able to move proteins across the membrane.

You have not answered the questions I asked in the OP. IIRC, you no longer consider yourself a supporter of ID. But since you were an important part of the movement for several years, I presume you will have some helpful insights.
