evograd
(Blogging Graduate Student)
April 5, 2019, 3:21pm
To clarify, this approximate percentage comes from studies comparing protein sequences (or sometimes transcripts). In other words, ~20% of protein-coding genes have no homologous protein-coding gene in other species. When nucleotide sequences are compared instead, this percentage drops substantially, because many non-coding homologues are found. See my previous analysis of this in these 4 posts:
Ok, so I’ve taken a look at the data from Ruiz-Orera et al. (2015):
They identified 634 candidate de novo genes in the human genome, based on the fact that they found 1,029 transcripts in humans that weren’t present in chimpanzees.
They link to a GTF file containing the information about those human-specific transcripts here: http://dx.doi.org/10.6084/m9.figshare.1604892
but note that this file contains both the human-specific transcripts and the hominoid-specific transcripts found in humans. This wasn’t apparent to me at first, and I think @roohif missed it too, as he included the entire file in his analysis. I separated out only the species-specific transcripts (1,029).
A second complication is that the GTF file contains separate entries for different exons and CDSs in each transcript, so while there are only 1,029 transcripts (1,029 transcript IDs), there are ~4,000 individual sequences specified in the file. As these 1,029 transcripts correspond to jus…
https://discourse.peacefulscience.org/t/comments-on-james-tour-on-orphan-genes/5009/102?u=evograd
https://discourse.peacefulscience.org/t/comments-on-james-tour-on-orphan-genes/5009/109?u=evograd
https://discourse.peacefulscience.org/t/comments-on-james-tour-on-orphan-genes/5009/123?u=evograd
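For anyone who wants to check the difference between transcripts and individual entries in a GTF file, here is a minimal sketch in Python. It assumes the file follows the standard GTF layout (nine tab-separated columns, with a `transcript_id "..."` attribute in the last column). The function name `summarize_gtf` and the toy records are made up for illustration and are not taken from the Ruiz-Orera et al. file.

```python
import re
from collections import Counter

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def summarize_gtf(lines):
    """Count feature rows, unique transcript IDs, and feature types.

    `lines` is any iterable of GTF text lines (e.g. an open file).
    Hypothetical helper, assuming standard GTF formatting.
    """
    n_rows = 0
    transcript_ids = set()
    feature_types = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # skip malformed rows
        n_rows += 1
        feature_types[fields[2]] += 1  # column 3 is the feature type (exon, CDS, ...)
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            transcript_ids.add(match.group(1))
    return n_rows, transcript_ids, feature_types

# Toy records: two transcripts spread over five feature rows.
toy_gtf = [
    "# toy example, not real data",
    'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\ttoy\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\ttoy\tCDS\t150\t350\t.\t+\t0\tgene_id "G1"; transcript_id "T1";',
    'chr2\ttoy\texon\t500\t600\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
    'chr2\ttoy\tCDS\t520\t580\t.\t-\t0\tgene_id "G2"; transcript_id "T2";',
]

n_rows, transcript_ids, feature_types = summarize_gtf(toy_gtf)
print(n_rows, len(transcript_ids), dict(feature_types))
```

To run it on a real file, pass an open file handle instead of the toy list, e.g. `summarize_gtf(open(path))`, where `path` points to wherever you saved the GTF. The number of rows will generally be larger than the number of transcript IDs, which is the distinction described in the quoted post.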