A problem with orphan genes?

If evolution is true, we should probably find a correlation between time and the number of new genes, since all species supposedly evolved from a common ancestor, as we can see in this figure:

Now, when we look at the data, we can see that there is no correlation between time and the evolution of unique genes, since basically every species has a similar percentage of them (between 10% and 20% on average):

It seems as if all those species were created separately and did not evolve from a common ancestor.

(image from https://www.researchgate.net/figure/Percentages-of-orphan-or-taxonomically-restricted-genes-TRGs-in-30-animal-genomes_fig1_26776433)

All extant species have been evolving for the same amount of time, so there is nothing to correlate to.

Once again you’re failing to understand the basics of evolution. It produces a branching tree, not a linear scale.


Yes, there is. Remember that we are talking here about orphan genes, which means that these genes are unique to each species. So just for the sake of argument, suppose that humans have about 10% unique genes (so they supposedly evolved these genes in the last 6-8 my). We should not find that percentage in a lungfish, for instance, since lungfish have existed for at least 400 my.

When they speak of the number of “unique genes” in the lungfish genome, to whose genome is it being compared?

@pnelson , can you help @scd develop this train of thought?

What about gene loss?

Compared to whom?

The figure you cited lists taxonomically restricted genes (TRGs). This means that older genes will have spread wider in the tree, so they will no longer be counted as taxonomically restricted genes. Therefore, only recent new genes will be counted as TRGs, so it isn’t surprising that there are similar numbers in all of the species they looked at: we are looking at short time frames towards the tips of the tree.
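A minimal sketch of this point, with invented numbers: if new genes arise at a roughly constant rate, the count restricted to one species depends on the time since that species split from its nearest sampled relative, not on how long its wider group has existed. The rate and the split times below are hypothetical placeholders, not values taken from the figure, and gene loss is ignored.

```python
# Hypothetical illustration: every number here is invented for the example.
GENES_GAINED_PER_MY = 100  # assumed constant origination rate (genes per million years)

# name -> (age of the wider group in my, time since split from nearest sampled relative in my)
species = {
    "human-like": (8, 7),
    "lungfish-like": (400, 7),
}

def species_restricted_genes(split_from_nearest_relative_my, rate=GENES_GAINED_PER_MY):
    """Only genes that arose after the last split can be unique to the tip species."""
    return rate * split_from_nearest_relative_my

counts = {
    name: species_restricted_genes(split)
    for name, (_group_age, split) in species.items()
}
print(counts)
```

Under these assumptions both tips get the same count even though one group is 50 times older, because genes older than the last split are shared with the sister species and stop counting as restricted.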

If they are unique for each species then they had to evolve recently, right?

If we are looking for genes found in just one species of lungfish, yes we should see a similar percentage since we would be looking at genes that evolved in just the last few million years.

There is no such thing as the lungfish.

Understand then criticise. Not the other way around.

How can you know that unless you compare each species to its sister species, which your little graph doesn’t do? Your comparison is useless without a tree behind it.

You should understand that there are several extant species of lungfish, not separated from each other by 400 my. What was that lungfish compared to?

  1. There is no requirement of evolution that the total number of protein coding genes in a genome of an organism should just continue to accumulate indefinitely. Organisms aren’t continuously accumulating new genes without losing old ones too. The total number of protein coding genes has stayed relatively constant throughout metazoan evolution.

  2. There is some core of central metabolic, developmental, and informational genes that stick around by purifying selection (the part shared among all animals in your figure), and then non-essential but fitness-contributing genes are replaced over time as old ones are no longer necessary, and new ones evolve to replace old ones.

  3. The vast majority of predicted orphan genes are either false positives (not actually functional protein coding genes, just recently emerged but ultimately transient open reading frames being spuriously transcribed), or very old genes that have been mistakenly identified as lacking a signal of relatedness because of homology detection failure (see: Many, but not all, lineage-specific genes can be explained by homology detection failure).

These factors explain why most species would have relatively similar numbers of TRGs: the fraction of all protein coding genes in the genome that consists of novel, non-essential genes stays relatively constant (though it is also significantly smaller than the fraction reported in your older reference), and because these genes are usually non-essential they are subject to a lot of gene turnover, being lost and replaced quickly.

There you go.


This bears repeating. As somebody who has assembled and annotated some complex genomes over the past couple years, I can’t emphasize enough that no newly published genome assembly/annotation is complete or free of errors. It takes years of effort and many iterations to “finalize” a set of predicted genes. Even genomes from well studied and economically important species are still being refined over time.


If it's indeed true, then how do you explain their high percentage? 15% of about 20,000 genes is about 3,000 genes that supposedly evolved in the last few million years. Compare that with about 6,000 genes that supposedly evolved in the last 500-600 my. It doesn't make sense.

I don't think they checked a lungfish genome. I just gave it as a theoretical example.

This is because of my English.


So I still don’t understand what argument you are trying to make. You claim there should be a “correlation between time and evolution of unique genes”, but what do you mean by “time”? “Time” from when to when?

Presumably he means time to the last common ancestor used in the comparison. In other words, if 2 species that diverged 10 million years ago should have “x” orphan genes relative to each other, then 2 species that diverged 50 million years ago should have “5x” orphan genes relative to each other. This all assuming we’ve accounted for mutation rate and population size, and have sufficient outgroups to determine which genes were present in the common ancestor of each pair.

It would depend on which species they are comparing as to how long the time period is. Also, taxonomically restricted genes may not even be new genes if there isn’t enough coverage in the phylogeny.

I don't think we need that, unless all of the species in the figure are equally close to each other, which I'm pretty sure is not the case.

True. So your scenario is that there is some limit on the number of genes evolving? If so, why do we have many more genes that are not unique?

It still doesn't explain the lack of correlation between time and gene count (even if they are “false genes”).

The same problem. If these are indeed genes that were lost, we should see a correlation between time and the number of genes lost.

How much thought did you put into this?

Isn’t % DNA similarity an extremely good proxy for exactly what you are asking?

It’s not so much a scenario as it is an inference from the data. There just appears to be no need to continuously expand the total number of protein coding genes over the course of (at least) metazoan evolution.

To the extent that there is variation in genome size (not to be confused with total gene number), it comes mostly from changes in the amount of non-coding DNA (facilitated mostly by changes in the activity of transposable elements), while a relatively limited and similar number of protein coding genes (roughly 20,000 to 30,000 in total) has remained throughout.

As above, it appears that this core set of shared genes for animals evolved relatively early, defines a set of basic animal functions that make for a somewhat successful organism across a broad range of environments, and only relatively little additional gene innovation is actually necessary to specialize this organism further to the many different environmental niches that exist on Earth (with any remaining specialization owing more to changes in regulation and non-coding DNA). For the most part, while these additional genes can have fitness-contributing functions, they are not essential, and so can often be lost and replaced with new ones at little to no cost.

If there is a relatively constant number of them over time, and they are quickly gained and lost again, hence continuously replaced, that really does explain why different species appear to have roughly equal numbers of them.
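One simple way to picture this turnover argument is a constant-gain, proportional-loss model, dN/dt = gain - loss * N. This is an illustrative model chosen here, not one taken from the cited figure or paper, and both parameter values are hypothetical.

```python
import math

def expected_lineage_specific_genes(time_my, gain_per_my, loss_rate_per_my):
    """Expected count of lineage-specific genes after `time_my` million years.

    Closed-form solution of dN/dt = gain - loss * N with N(0) = 0.
    """
    steady_state = gain_per_my / loss_rate_per_my
    return steady_state * (1.0 - math.exp(-loss_rate_per_my * time_my))

GAIN = 1000.0  # hypothetical: new non-essential genes gained per million years
LOSS = 0.5     # hypothetical: fraction of those genes lost per million years

for t in (3, 10, 50, 400):
    print(t, round(expected_lineage_specific_genes(t, GAIN, LOSS)))
```

With these made-up rates the count climbs quickly and then plateaus near gain / loss, so lineages of very different ages end up with nearly the same number. Slower loss would push the plateau later, so the conclusion depends on turnover actually being fast.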

I thought about explaining this in more detail but they say a picture is worth:

I believe I have now explained this so everyone should be able to understand.

That made no sense. You can’t say that a gene is present only in one species unless you compare it to all likely sister species. If you reject phylogeny, then I suppose you would have to have the complete genomes of every species in the world just to be sure.
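A toy sketch of why the comparison set matters, using invented species names and gene labels: the same genome yields a different "unique" count depending on whether a close sister species is included.

```python
# Hypothetical gene content per species; all names are placeholders.
genomes = {
    "species_A": {"g1", "g2", "g3", "g4", "g5"},
    "sister_of_A": {"g1", "g2", "g3", "g4"},
    "distant_B": {"g1", "g2", "g6"},
}

def restricted_to(target, compared_against, genomes):
    """Genes in `target` that have no match in any of the compared genomes."""
    others = set().union(*(genomes[name] for name in compared_against))
    return genomes[target] - others

without_sister = restricted_to("species_A", ["distant_B"], genomes)
with_sister = restricted_to("species_A", ["sister_of_A", "distant_B"], genomes)

print(sorted(without_sister))  # ['g3', 'g4', 'g5']
print(sorted(with_sister))     # ['g5']
```

Leaving out the sister species inflates the apparent number of species-specific genes from one to three in this toy case, which is why the sampling behind a TRG percentage has to be known before it can be interpreted.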

Time? Where do you even get a measure of time without a phylogeny?

OK. So let's take a look at the Drosophila phylogeny:

(image from https://www.researchgate.net/figure/Genus-Drosophila-phylogeny-Powell-1997_fig2_5857976)

The species simulans and sechellia supposedly split about 3 my ago, and both have about 20% unique genes, which gives us about 3,000 new genes in about 3 million years, or a single new gene per 1,000 years. Can you show me how we can get so many new genes (without any homologs) in such a short time?
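The arithmetic behind that rate can be reproduced as follows. The total gene count of 15,000 is not stated in the post; it is inferred here from "20%" giving "about 3000", so treat it as an assumption. The calculation also takes every annotated unique gene to be a real gene that arose after the split, which other replies in this thread dispute.

```python
total_genes = 15_000           # assumption: implied by 20% being about 3000 genes
unique_percent = 20            # figure quoted in the post
years_since_split = 3_000_000  # split time quoted in the post

unique_genes = total_genes * unique_percent // 100
years_per_new_gene = years_since_split / unique_genes

print(unique_genes, years_per_new_gene)  # 3000 1000.0
```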

See above: this is what they did, at least for Drosophila.