A problem with orphan genes?

Rumraket · March 24, 2021, 2:55pm

No. That number is a wild overestimate. They might only have a single TRG compared to their closest relatives.

Abstract

Taxonomically restricted genes (TRGs) are genes that are present only in one clade. Protein-coding TRGs may evolve de novo from previously noncoding sequences: functional ncRNA, introns, or alternative reading frames of older protein-coding genes, or intergenic sequences. A major challenge in studying de novo genes is the need to avoid both false-positives (nonfunctional open reading frames and/or functional genes that did not arise de novo) and false-negatives. Here, we search conservatively for high-confidence TRGs as the most promising candidates for experimental studies, ensuring functionality through conservation across at least two species, and ensuring de novo status through examination of homologous noncoding sequences. Our pipeline also avoids ascertainment biases associated with preconceptions of how de novo genes are born. We identify one TRG family that evolved de novo in the Drosophila melanogaster subgroup. This TRG family contains single-copy genes in Drosophila simulans and Drosophila sechellia. It originated in an intron of a well-established gene, sharing that intron with another well-established gene upstream. These TRGs contain an intron that predates their open reading frame. These genes have not been previously reported as de novo originated, and to our knowledge, they are the best Drosophila candidates identified so far for experimental studies aimed at elucidating the properties of de novo genes.

scd · March 24, 2021, 3:00pm

So we are going back to the claim that nearly all of these orphan genes are actually “false genes”?

Rumraket · March 24, 2021, 3:06pm

We never left it.

Yes, the majority identified by mere detection of translatable open reading frames, and possibly by low-level transcripts, are likely to be false positives that are just transient ORFs coming and going as non-coding DNA accumulates mutations, occasionally getting spuriously transcribed.

John_Harshman · March 24, 2021, 4:10pm

Have you ever actually cited the paper you got that figure from, and do they actually say what they did? I see that you haven’t. Let’s start with that.

And is it your contention that D. simulans and D. sechellia are not related by descent? If so, there’s no way to say that they’re sister species and thus no reason to compare them.

John_Harshman · March 24, 2021, 4:13pm

That doesn’t appear to fit @scd’s definition of an orphan gene, since it appears in multiple species.

John_Harshman · March 24, 2021, 4:24pm

Ah, here it is:
https://www.researchgate.net/publication/26776433_More_than_just_orphans_Are_taxonomically-restricted_genes_important_in_evolution

I see that their criterion for “orphan” is that it’s a predicted ORF whose inferred protein sequence doesn’t appear in a protein database. There are many problems with that criterion. As has been mentioned, predicted ORFs grossly overestimate the number of real, functional genes. Further, they don’t test for homologous non-coding sequences in related species. And protein databases are highly incomplete.

But no, they didn’t do genome-to-genome comparisons, as you claimed.

Rumraket · March 24, 2021, 5:26pm

Which is then likely to put the true number of orphans at zero in those two species. And probably in many others, depending on how closely related they are to their nearest extant relatives.

scd · March 25, 2021, 5:10pm

If so, we are left with the problem of the lack of correlation between time and the number of “false genes”. Why, for instance, does the species grimshawi have a similar number of “false genes” to a newer species such as simulans (although grimshawi is almost ten times more ancient)?

Rumraket · March 25, 2021, 5:21pm

They have relatively similar amounts of transient ORFs because they have (relatively) similar genome sizes. At any given moment in time, some portion of the non-coding (and mostly junk) genome will contain a valid open reading frame just by chance. Because the junk portion is evolving pretty much at a neutral rate, it will differ substantially in terms of sequence even between closely related species. As such, the spurious ORFs in one species are likely to be different from those in another species, hence they will show up as “orphan” open reading frames.
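To put a rough number on "just by chance", here is a back-of-the-envelope sketch. It assumes uniformly random bases (real genomes are not), counts every ATG on both strands as a potential start, and therefore overcounts nested starts inside the same ORF. The function names and the example inputs are made up for illustration and are not taken from any paper in this thread.

```python
def p_open(k_codons):
    """Probability that k consecutive random codons contain no stop codon.

    Assumes uniform base composition: 61 of the 64 codons are sense codons.
    """
    return (61 / 64) ** k_codons


def expected_chance_orfs(noncoding_bp, min_codons):
    """Rough expected number of ATG starts followed by at least
    min_codons stop-free codons, counting both strands.

    This is an order-of-magnitude estimate, not a gene-prediction method.
    """
    p_start = 1 / 64                  # chance that a given position begins with ATG
    positions = 2 * noncoding_bp      # every offset on both strands
    return positions * p_start * p_open(min_codons)


# Hypothetical inputs: 20 Mbp of non-coding sequence, ORFs of at least 100 codons.
print(expected_chance_orfs(20_000_000, 100))
```

The estimate scales linearly with the amount of non-coding sequence and drops off geometrically with the minimum ORF length, which is why the length cutoff used by an ORF predictor matters so much for how many "orphans" it reports.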

John_Harshman · March 25, 2021, 5:32pm

Sorry, but that’s gibberish. Why should false genes depend on time? And how are you measuring the age of a taxon, since you reject phylogeny?

Faizal_Ali · March 25, 2021, 5:44pm

Reminds me of the time a creationist was trying to argue against common ancestry by pointing out something to do with genes that were conserved between species.

John_Harshman · March 25, 2021, 5:46pm

Something of the sort actually occurs in the recent book from RTB, Thinking about Evolution, though in that case it’s the ages of taxa inferred from phylogeny.

scd · March 25, 2021, 7:34pm

For 3,000 “false genes” we need no more than about 3 million bp (assuming 1,000 bp per gene). The genome size of Drosophila is almost 200 million bp, so even if only 10% of its genome is non-functional, there is room for about 20,000 possible false genes. Thus, we should find a correlation between time and the number of false genes, and yet we don’t. Don’t forget that animals with much more “junk” have a similar number of false genes too.

See above. Why not, actually?

John_Harshman · March 25, 2021, 8:49pm

More gibberish. False genes both appear (when a stop codon in some reading frame turns into a sense codon) and disappear (when a sense codon becomes a stop codon). It should be obvious that there is an equilibrium point.
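A minimal sketch of that equilibrium, as a deterministic birth-death recurrence. The rates are invented purely for illustration: `birth_per_step` stands for new chance ORFs appearing per unit time, and `loss_prob` for the fraction of existing ones destroyed per unit time.

```python
def orf_count_over_time(birth_per_step, loss_prob, steps, initial=0.0):
    """Track the expected number of chance ORFs under constant gain and
    proportional loss: n(t+1) = n(t) + birth_per_step - loss_prob * n(t).

    Both parameters are hypothetical; this is a toy model, not fitted to data.
    """
    n = initial
    history = [n]
    for _ in range(steps):
        n = n + birth_per_step - loss_prob * n
        history.append(n)
    return history


# Two lineages with very different starting counts and "ages" (step counts).
young = orf_count_over_time(birth_per_step=3.0, loss_prob=0.001, steps=20_000)
old = orf_count_over_time(birth_per_step=3.0, loss_prob=0.001, steps=200_000, initial=10_000.0)
print(round(young[-1]), round(old[-1]))
```

In this toy model the count settles at `birth_per_step / loss_prob` regardless of the starting value or how long the process has been running past the initial approach, so beyond that point the age of a lineage says nothing about the number of false genes it carries.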

Rumraket · March 25, 2021, 8:56pm

No. Remember, the region keeps mutating because it’s not being maintained by selection. So emerging open reading frames quickly (on evolutionary timescales) disappear again as they mutate further.

So a non-coding junk region mutates and becomes an open reading frame (read that to make sure you understand what that refers to), continues accumulating mutations, and eventually stops being a translatable open reading frame again, as new mutations undo what earlier ones created. This process continues indefinitely, with ORFs coming and going. Like I tried to depict with the changing colors in my figure up above, you can just substitute in “false genes” in place of genuine de novo genes.
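A toy simulation of that turnover, under loudly simplifying assumptions: a uniformly random starting sequence, point substitutions only (no indels), no selection, forward strand only, and an arbitrary 20-codon cutoff. Sequence length, mutation counts and the seed are all invented for illustration.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}


def orf_set(seq, min_codons=20):
    """Set of (start, end) for forward-strand ATG..stop ORFs with at
    least min_codons codons between the ATG and the stop (ATG included)."""
    found = set()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    found.add((start, i + 3))
                start = None
    return found


def mutate(seq, n_mutations, rng):
    """Apply n_mutations random point substitutions, each to a different base."""
    bases = list(seq)
    for _ in range(n_mutations):
        pos = rng.randrange(len(bases))
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)


rng = random.Random(0)
ancestor = "".join(rng.choice("ACGT") for _ in range(30_000))
original = orf_set(ancestor)
current = ancestor
for step in range(5):
    current = mutate(current, 1_500, rng)
    now = orf_set(current)
    # total ORFs now, and how many still have the ancestor's exact coordinates
    print(step + 1, len(now), len(now & original))
```

The first printed count is the total number of chance ORFs at each step and the second is how many of the ancestor's ORFs still sit at the same coordinates. The point of the sketch is to compare how those two columns behave as mutations accumulate.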

Roy · March 26, 2021, 11:41am

You keep saying that, yet you haven’t shown any data that includes the time values - or even any indication of what you mean by ‘time’.

So there’s no basis for your claim.

scd · March 28, 2021, 7:49pm

If this is indeed true, then we should find more “false genes” in a species with more junk in its genome. Right?

John_Harshman · March 28, 2021, 9:05pm

That makes sense. You surprise me.

scd · March 29, 2021, 6:39am

If so, can you explain why humans, for instance, have so few false genes compared to Drosophila?

Witchdoc · March 29, 2021, 9:03am

Have you checked how many pseudogenes humans (and Drosophila) have before making this claim?

Actually - before you google it - take a guess how many pseudogenes we are expected to have under a design paradigm, and under an evolutionary paradigm. Then google it.
