I meant no one among the people writing on this thread.
Oh – sorry! You are right.
I’d be surprised to learn that there are researchers who insist that NO orphan genes exist at all. There might be disagreement about the proportions, but none?
That would be more than a “stretch.” It would be incompetence. Fortunately, no one has done that.
Well, to be fair, I’m certain people outside this thread have reached similar conclusions. The key point, @Ignostic, is that we are not making that conclusion. I have, however, seen this mistake made all the time in anti-evolution polemics!
It helps to know some context: there has been a considerable history of people claiming to find interesting novelties in genome comparisons (things like extensive horizontal gene transfer into mammals), only to have the finds disappear with better genome sequences. So most researchers are going to be skeptical about a result that depends on not finding something in a few genomes.
Well then, what is your explanation? Remember, we are talking about 568 (potential) orphans, or perhaps just ORFs, in the great ape lineage that are missing from more distantly related lineages. If 85–95% of the macaque/orangutan genome has already been sequenced, why did Guerzoni and McLysaght find orthologous DNA for only ~29% of all their candidate genes?
That is about 5% of the genes in the genome, which is roughly what a back-of-the-envelope computation predicts.
Now hold on for a second. Other studies confirm that de novo genes emerge at a pace 5.4–9.1 times slower than gene duplicates, and that the majority of (rodent-specific) “novel” protein-coding genes are actually old genes or recent duplicates. Many of them may share homology with genes from other vertebrates.

Well then, what is your explanation?
The explanation is uncertainty. It’s been discussed repeatedly in this thread. Try reading?

Other studies confirm that de novo genes emerge at a pace 5.4–9.1 times slower than gene duplicates, and that the majority of (rodent-specific) “novel” protein-coding genes are actually old genes or recent duplicates.
You omitted the three most important words from your sentence: “identified with phylostratigraphy.” The paper you cite is about phylostratigraphy, which seeks not to identify new gene birth at the resolution we have been discussing here, but to identify gene birthdates in the deep past. It seems you did not read the paper, or even the abstract.
[Edit: I was wrong here; see correction below.]
@sfmatheson, he is offering a counterpoint to @pnelson that seems correct. I don’t see an error in what he wrote. What did I miss?

I don’t see an error in what he wrote. What did I miss?
I didn’t see an error in what he wrote. I saw a comparison of birthdating of genes by phylostratigraphy with identification of de novo gene birth by McLysaght and others, and that’s a bogus comparison. But what I didn’t see was that he was responding to Nelson. So the error is mine! I’m sorry about that, @Ignostic.
@Ignostic, can you point to what you were responding to? I think it’s the last quote from this post by Nelson but can’t tell.
Hello Ignostic,
Above in this thread, I pointed out that this research field (orphan / de novo / taxonomically restricted genes) is still young, and, like most stormy adolescents, bounces around rather wildly, with some tearful interludes in the high school parking lot in late afternoon. In their recent review, Van Oss and Carvunis (2019) point out:
“Estimates regarding the frequency of de novo gene birth and the number of de novo genes in various lineages vary widely and are highly dependent on methodology. Studies may identify de novo genes by phylostratigraphy/BLAST-based methods alone, or may employ a combination of computational techniques (see above), and may or may not assess experimental evidence for expression and/or biological role. Furthermore, genome-scale analyses may consider all or most ORFs in the genome, or may instead limit their analysis to already annotated genes.”
(p. 10, emphasis mine; the paper, “De novo gene birth,” is open access)
In particular, with reference to the Casola (2018) paper you cite, they write:
“A reanalysis of three such studies in murines that identified between 69 and 773 candidate de novo genes argued that the various estimates included many genes that were not in fact de novo genes [74]. Many candidates were excluded on the basis of no longer being annotated in the major databases. A conservative approach was applied to the remaining genes, which excluded candidates with paralogs, distantly related homologs or conserved domains, or that lacked syntenic sequence information in non-rodents. This approach validated ~40% of candidate de novo genes, resulting in an upper estimate of only 11.6 de novo genes formed (and retained) per million years, a rate ~5–10 times slower than what was estimated for novel genes formed by duplication [74]. It is notable that even after application of this stringent pipeline, the 152 validated de novo genes that remained still represents a significant fraction of the mouse genome likely to have originated de novo. Generally speaking, however, it remains debated whether duplication and divergence or de novo gene birth represent the dominant mechanism for the emergence of new genes [63, 65, 73, 75–77], in part due to the fact that de novo genes are likely both to emerge and to be lost more frequently than other young genes.”
(pp. 10-11, emphasis mine)
Remember your adolescence? I sure remember mine. All kinds of wild excitement, all kinds of misery and uncertainty, in about equal measures. That’s this research area.
P.S. When the MS of the Casola 2018 paper appeared on bioRxiv, I sent it to a leading researcher in the area (whom I won’t name to protect the person’s privacy). This scientist responded that Casola’s methods left them unpersuaded, as “gene annotation” performed by algorithm struck them as an “artificial working hypothesis” and thus “not good criteria.” The same scientist, however, thought “de nono” was a good joke, all in all.
@sfmatheson I was responding to Nelson’s assertion that “orphans are not simply artifacts of poor sampling or incomplete genome annotation”.
The author of the study I cited discusses this very topic. From the article:
The assumption of “novelty” in orphan genes can be violated under two scenarios, which I discuss below with regard to PDNGs [ = putative de novo genes]. First, novel proteins are routinely added to existing sequence databases, thus expanding the sequence space available to search for possible homologous sequences of PDNGs.
Casola performed sequence similarity analyses, and this is what he concluded:
[…] I show that ∼60% of the remaining 381 putative de novo genes share homology with genes from other vertebrates, originated through gene duplication, and/or share no synteny information with nonrodent mammals.
@pnelson True, it remains debated.

@pnelson True, it remains debated.
Not really. Those most interested in the question are gathering data and withholding judgment, not debating.