A problem with orphan genes?

No. That number is a wild overestimate. They might only have a single TRG compared to their closest relatives.

Abstract

Taxonomically restricted genes (TRGs) are genes that are present only in one clade. Protein-coding TRGs may evolve de novo from previously noncoding sequences: functional ncRNA, introns, or alternative reading frames of older protein-coding genes, or intergenic sequences. A major challenge in studying de novo genes is the need to avoid both false-positives (nonfunctional open reading frames and/or functional genes that did not arise de novo) and false-negatives. Here, we search conservatively for high-confidence TRGs as the most promising candidates for experimental studies, ensuring functionality through conservation across at least two species, and ensuring de novo status through examination of homologous noncoding sequences. Our pipeline also avoids ascertainment biases associated with preconceptions of how de novo genes are born. We identify one TRG family that evolved de novo in the Drosophila melanogaster subgroup. This TRG family contains single-copy genes in Drosophila simulans and Drosophila sechellia. It originated in an intron of a well-established gene, sharing that intron with another well-established gene upstream. These TRGs contain an intron that predates their open reading frame. These genes have not been previously reported as de novo originated, and to our knowledge, they are the best Drosophila candidates identified so far for experimental studies aimed at elucidating the properties of de novo genes.

So we are going back to the claim that nearly all of these orphan genes are actually “false genes”?

We never left it.

Yes, the majority identified by mere detection of translatable open reading frames, and possibly by low-level transcripts, are likely to be false positives that are just transient ORFs coming and going as non-coding DNA accumulates mutations, occasionally getting spuriously transcribed.

Have you ever actually cited the paper you got that figure from, and do they actually say what they did? I see that you haven’t. Let’s start with that.

And is it your contention that D. simulans and D. sechellia are not related by descent? If so, there’s no way to say that they’re sister species and thus no reason to compare them.

That doesn’t appear to fit @scd’s definition of an orphan gene, since it appears in multiple species.

Ah, here it is:
https://www.researchgate.net/publication/26776433_More_than_just_orphans_Are_taxonomically-restricted_genes_important_in_evolution

I see that their criterion for “orphan” is that it’s a predicted ORF whose inferred protein sequence doesn’t appear in a protein database. There are many problems with that criterion. As has been mentioned, predicted ORFs grossly overestimate the number of real, functional genes. Further, they don’t test for homologous non-coding sequences in related species. And protein databases are highly incomplete.

But no, they didn’t do genome-to-genome comparisons, as you claimed.

Which is then likely to put the true number of orphans at zero in those two species. And probably in many others, depending on how closely related they are to their nearest extant relatives.

If so, we are still left with the problem of the lack of correlation between time and the number of “false genes”. Why, for instance, does D. grimshawi have a similar number of “false genes” to a newer species such as D. simulans, although D. grimshawi is almost ten times more ancient?

They have relatively similar amounts of transient ORFs because they have (relatively) similar genome sizes. At any given moment in time, some portion of the non-coding (and mostly junk) genome will contain a valid open reading frame just by chance. Because the junk-portion is evolving pretty much at a neutral rate, it will differ substantially in terms of sequence even between closely related species. As such, the spurious ORFs in one species are likely to be different from those in another species, hence they will show up as “orphan” open reading frames.
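A rough way to see the point about chance ORFs is to scan random sequence for them. The sketch below is only an illustration under stated assumptions: uniform random bases stand in for neutrally evolving junk DNA, only the forward strand is scanned, and the 30-codon minimum length and all function names are arbitrary choices, not taken from any paper cited in this thread.

```python
import random
import re

# ATG, then sense codons (non-greedy), then the first in-frame stop codon.
# The lookahead lets the scan find ORFs that overlap in different frames.
ORF_PATTERN = re.compile(r"(?=(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)))")


def random_dna(length, rng):
    """Uniform random bases: a crude stand-in for unconstrained DNA (assumption)."""
    return "".join(rng.choices("ACGT", k=length))


def count_chance_orfs(seq, min_codons=30):
    """Count forward-strand ORFs of at least min_codons codons (start and stop included).

    ORFs that share a stop codon are counted once, keeping the longest.
    """
    longest_by_stop = {}
    for match in ORF_PATTERN.finditer(seq):
        orf = match.group(1)
        stop_end = match.start(1) + len(orf)
        # The scan runs left to right, so the first hit per stop is the longest.
        longest_by_stop.setdefault(stop_end, len(orf))
    return sum(1 for n in longest_by_stop.values() if n >= 3 * min_codons)


if __name__ == "__main__":
    rng = random.Random(0)
    for length in (30_000, 300_000):
        print(length, count_chance_orfs(random_dna(length, rng)))
```

In this toy setting the count depends on how much sequence is scanned, not on how long the sequence has existed, which is the sense in which genomes of similar size would be expected to carry similar numbers of spurious ORFs.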

Sorry, but that’s gibberish. Why should false genes depend on time? And how are you measuring the age of a taxon, since you reject phylogeny?

Reminds me of the time a creationist was trying to argue against common ancestry by pointing out something to do with genes that were conserved between species.

Something of the sort actually occurs in the recent book from RTB, Thinking about Evolution, though in that case it’s the ages of taxa inferred from phylogeny.

For 3000 “false genes” we need no more than about 3 million bp (suppose 1000 bp per gene). The genome size of Drosophila is almost 200 million bp, so even if only 10% of its genome is non-functional, there is room for about 20,000 possible false genes. Thus, we should find a correlation between time and the number of false genes, and yet we don’t. Don’t forget that animals with much more “junk” have a similar number of false genes too.
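For anyone who wants to check the arithmetic in that post, here is a minimal sketch. Every number is the poster's own assumption (genome size, non-functional fraction, bp per gene), not a measured value, and the function name is made up for illustration.

```python
# Figures as stated in the post above; none of them are measurements.
GENOME_BP = 200_000_000          # "almost 200 million bp"
NONFUNCTIONAL_FRACTION = 0.10    # "even if 10% ... is non-functional"
BP_PER_FALSE_GENE = 1_000        # "suppose 1000 bp per gene"
CLAIMED_FALSE_GENES = 3_000


def false_gene_capacity(genome_bp, nonfunctional_fraction, bp_per_gene):
    """How many non-overlapping stretches of bp_per_gene fit in the non-functional part."""
    return round(genome_bp * nonfunctional_fraction) // bp_per_gene


space_needed_bp = CLAIMED_FALSE_GENES * BP_PER_FALSE_GENE
capacity = false_gene_capacity(GENOME_BP, NONFUNCTIONAL_FRACTION, BP_PER_FALSE_GENE)

print(f"space needed for {CLAIMED_FALSE_GENES} false genes: {space_needed_bp} bp")
print(f"capacity of the non-functional fraction: {capacity} false genes")
```

The calculation only shows how much room there is. It says nothing about whether the count should keep rising with time, which is the point disputed in the replies.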

See above. Why not, actually?

More gibberish. False genes both appear (when a stop codon in some reading frame turns into a sense codon) and disappear (when a sense codon becomes a stop codon). It should be obvious that there is an equilibrium point.

No. Remember, the region keeps mutating because it’s not being maintained by selection. So emerging open reading frames quickly (on evolutionary timescales) disappear again as they mutate further.

So a non-coding junk region mutates and becomes an open reading frame (read that to make sure you understand what that refers to), continues accumulating mutations, and eventually stops being a translatable open reading frame again, as new mutations undo what earlier ones created. This process continues indefinitely, with ORFs coming and going. Like I tried to depict with the changing colors in my figure up above, you can just substitute in “false genes” in place of genuine de novo genes.
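The equilibrium argument can be made concrete with a toy model. This is a sketch under assumptions that are not from the thread: a constant number of new chance ORFs appears per time step, and each existing one has a fixed per-step probability of being lost. The rate values are hypothetical and chosen only so that the plateau lands at 3000.

```python
def orf_count_over_time(gains_per_step, loss_prob_per_step, steps, start=0.0):
    """Expected number of chance ORFs under constant gain and proportional loss.

    Each step: N_next = N + gains_per_step - loss_prob_per_step * N.
    The count settles at gains_per_step / loss_prob_per_step.
    """
    counts = [start]
    for _ in range(steps):
        current = counts[-1]
        counts.append(current + gains_per_step - loss_prob_per_step * current)
    return counts


if __name__ == "__main__":
    # Hypothetical rates: 30 gains per step, 1% loss per step, plateau = 30 / 0.01.
    trajectory = orf_count_over_time(30.0, 0.01, steps=20_000)
    for step in (0, 100, 2_000, 20_000):
        print(step, round(trajectory[step], 1))
```

Once the count reaches gains divided by loss rate, more elapsed time changes which ORFs are present but not how many, so in this model an older lineage is not expected to carry more of them than a younger one.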

You keep saying that, yet you haven’t shown any data that includes the time values - or even any indication of what you mean by ‘time’.

So there’s no basis for your claim.

If this is indeed true, then we should find more “false genes” in a species with more junk in its genome. Right?

That makes sense. You surprise me.

If so, can you explain why humans, for instance, have so few false genes compared to Drosophila?

Have you checked how many pseudogenes humans (and Drosophila) have before making this claim?

Actually - before you google it - take a guess how many pseudogenes we are expected to have under a design paradigm, and under an evolutionary paradigm. Then google it.