Orphan Genes Talk at the Texas Genetics Society

cwhenderson · April 5, 2019, 3:33am

I’m attending a meeting of the Texas Genetics Society this weekend, and I noticed something very interesting in the program. I was just thumbing through when a title caught my eye: “Search for de-novo orphan genes - the neo holy grail of genomes?” Komal K. B. Raja (Baylor College of Medicine - Department of Pathology and Immunology) will give the talk, but none other than our friend @pnelson is on the byline! Additionally, Richard Gunasekera (I confess, I’ve never heard of him) is listed as affiliated with both Biola University and the Department of Chemistry at Rice University.

I see nothing in the abstract related to Intelligent Design, but needless to say, I’ll be very interested to hear this talk.

Here is the full list of contributors, just in case anyone is familiar with them:
Komal Raja
Vinodh Gunasekera
Suresh Hewapathirana
Savidu Dias
Paul Nelson
Richard Gunasekera

Do any of you know these scientists?

P. S. I’ll type in the abstract if anyone is interested in reading it.

P. P. S. @pnelson, in the off-chance you are actually at the conference, let me know!

John_Harshman · April 5, 2019, 4:32am

Sure. It seems odd that none of the people involved seems to have anything to do with phylogenetics or comparative biology. I would be interested in their taxon sample, unless this is strictly a methods talk.

pnelson · April 5, 2019, 6:09am

Hi Curtis,

Wasn’t there – I was presenting a poster on taxonomically restricted essential genes at this conference (flying home to Chicago today):

https://evolutionevolving.org

evograd · April 5, 2019, 6:55am

Apparently he’s a senior fellow at the DI.

cwhenderson · April 5, 2019, 11:17am

The abstract leads me to believe it is a methods talk (in silico) but I will have an update later.

Orphan genes are de novo genes found unique to specific species or restricted only to a particular taxonomy group in the tree of life. With the profusion of genomes being sequenced today, it has been shown that approximately 20% or more of genes in a given species do not have homologous sequences in other species or in their taxonomical tree which classified them as orphans. Therefore the study of these lineage-specific orphan genes is significant for understanding biological origins and the functions of life. To identify orphan genes in genomic sequences, we have developed a web-based software search engine named ORFanID. The search engine can determine whether a DNA nucleotide or an amino acid sequence is an orphan in the NCBI databases at various taxonomies. ORFanID identifies genes unique to a genus, family, class etc., or strictly at the species level at differing taxonomies according to the original Linnaeus classification. The software allows for user control of the NCBI database search parameters. Based on the parameters specified, some orphans may or may not fall under the given classification for strict orphans, defined as those found in only a specific species. The results of the search are provided in a spreadsheet as well as a graphical display. All the tables in the software are sortable by column and results can be easily filtered with fuzzy search functionality. The graphical display is expandable and collapsible by taxonomy. ORFanID can help delineate the actual sequence and function of de novo genes discovered at the species level and at all levels of the taxonomy tree. The identification of the clandestine orphan genes and what they do in genomes may be the new Holy Grail that redefines genetics as we know it.
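The taxonomic classification the abstract describes can be illustrated with a small sketch. This is not ORFanID's code, which is not shown in this thread; the function name `orphan_level`, the helper `lineage`, and the simplified rank ladder are assumptions made for illustration. It takes the query organism's lineage plus the lineages of organisms with significant hits, and reports the narrowest rank that contains every hit.

```python
# Ranks ordered from narrowest to broadest (a simplified Linnaean ladder;
# the real NCBI taxonomy has many more levels).
RANKS = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]


def lineage(*names):
    """Hypothetical helper: pair taxon names with RANKS, narrowest first."""
    return dict(zip(RANKS, names))


def orphan_level(query_lineage, hit_lineages):
    """Return the narrowest rank whose taxon contains every hit, or None.

    query_lineage: dict mapping rank -> taxon name for the query organism.
    hit_lineages: list of such dicts, one per organism with a significant hit.
    A result of "species" corresponds to a strict orphan.
    """
    for rank in RANKS:
        taxon = query_lineage[rank]
        if all(hit.get(rank) == taxon for hit in hit_lineages):
            return rank
    return None  # hits span more than one kingdom: not an orphan here


fly = lineage("Drosophila melanogaster", "Drosophila", "Drosophilidae",
              "Diptera", "Insecta", "Arthropoda", "Metazoa")
sim = lineage("Drosophila simulans", "Drosophila", "Drosophilidae",
              "Diptera", "Insecta", "Arthropoda", "Metazoa")
housefly = lineage("Musca domestica", "Musca", "Muscidae",
                   "Diptera", "Insecta", "Arthropoda", "Metazoa")

print(orphan_level(fly, [fly]))                 # species
print(orphan_level(fly, [fly, sim]))            # genus
print(orphan_level(fly, [fly, sim, housefly]))  # order
```

Which hits count as "significant" depends entirely on the search parameters, which is presumably why the abstract notes that the classification can change with the settings the user chooses.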

Sorry for any typos, I am doing this on my phone and have no inclination to edit at this point.

cwhenderson · April 5, 2019, 1:28pm

Turns out the talk is only 15 minutes this afternoon.

John_Harshman · April 5, 2019, 1:54pm

Well, that was a lot of typing on a phone. Thanks. So they’re just using the NCBI database. That might be enough to find candidates for further investigation, but I don’t think they would be able to make real claims on that basis. Reading between the lines, this seems like a shell that lies on top of BLAST.

evograd · April 5, 2019, 3:21pm

To clarify, this approximate percentage comes from studies comparing protein sequences (or sometimes transcripts). In other words, ~20% of protein-coding genes do not have homologous protein-coding genes in other species. When comparing nucleotide sequences, this percentage goes way down, as many non-coding homologues are found. See my previous analysis on this in these 4 posts:

James Tour on Orphan Genes Conversation

Ok, so I’ve taken a look at the data from Ruiz-Orera et al. (2015): They identified 634 candidate de novo genes in the human genome, based on the fact that they found 1,029 transcripts in humans that weren’t present in chimpanzees. They link to a GTF file containing the information about those human-specific transcripts here: http://dx.doi.org/10.6084/m9.figshare.1604892 but note that this file contains both the human-specific transcripts found in humans, as well as the hominoid-specific transcripts found in humans. This wasn’t apparent to me at first, and I think @roohif missed it too, as he included the entire file in his analysis. I separated out the species-specific transcripts only (1,029). A second complication is that the GTF file contains separate entries for different exons and CDSs in each transcript, so while there are only 1,029 transcripts (1,029 transcript IDs), there are ~4,000 individual sequences specified in the file. As these 1,029 transcripts correspond to jus…

https://discourse.peacefulscience.org/t/comments-on-james-tour-on-orphan-genes/5009/102?u=evograd

https://discourse.peacefulscience.org/t/comments-on-james-tour-on-orphan-genes/5009/109?u=evograd

https://discourse.peacefulscience.org/t/comments-on-james-tour-on-orphan-genes/5009/123?u=evograd
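The GTF complication mentioned in the quoted post (many exon and CDS entries per transcript) can be checked with a few lines of Python. This is a minimal sketch that assumes the standard GTF layout of nine tab-separated columns with a `transcript_id "..."` attribute in the ninth; whether the figshare file follows that layout exactly is an assumption, and the records below are made up.

```python
import re
from collections import Counter

# Standard GTF attribute syntax: transcript_id "SOME_ID";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def count_transcripts(gtf_lines):
    """Return (number of feature lines, number of distinct transcript IDs)."""
    per_transcript = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            per_transcript[match.group(1)] += 1
    return sum(per_transcript.values()), len(per_transcript)


# Made-up records: three features for transcript T1, one for T2.
example = [
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tdemo\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tdemo\tCDS\t150\t200\t.\t+\t0\tgene_id "G1"; transcript_id "T1";',
    'chr2\tdemo\texon\t500\t900\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
]
print(count_transcripts(example))  # (4, 2)
```

Counting feature lines instead of distinct transcript IDs is how a file with 1,029 transcripts can look like it contains roughly 4,000 sequences.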

T_aquaticus · April 5, 2019, 6:04pm

Echoing what @evograd wrote, I would be interested to hear if they are defining genes as RNA transcripts or translated peptides. I would also be interested in how the ORFanID search engine works. For example, does it search for annotated sequences or is it directly comparing orthologous/homologous genomic DNA sequences?

Thanks for typing that up. My fat thumbs could not have done it.

cwhenderson · April 5, 2019, 10:28pm

I kinda regretted saying I would do it about halfway in, but I was already committed at that point. It was explained as using full BLAST algorithms, so I don’t think it used only annotated sequences. The presenter also mentioned that it could perform BLASTP and TBLASTN searches. @pnelson can tell us more, if he’s available.

The tool is apparently freely accessible at orfangenes.com if anyone wants to play with it.

Art · April 6, 2019, 12:34am

T-urf13 draws a blank. @pnelson, is that expected? I am not clear what databases the program calls upon.

BenKissling · April 6, 2019, 2:49am

I can feel the burn from here.

cwhenderson · April 6, 2019, 3:56am

I don’t understand, is there a point to your post?

evograd · April 6, 2019, 10:04am

Was it a blank, or did the search not seem to go through? When I click submit I’m immediately taken to this screen:

pnelson · April 6, 2019, 1:07pm

To my knowledge (haven’t spoken to my co-authors for several weeks), the version of ORFanID at the orfangenes.com link is still a beta tester, so if a trial sequence search turns up nothing, don’t be surprised – the whole thing isn’t working yet as it should be. I am puzzled that the speaker at the Texas meeting announced publicly that ORFanID was ready for anyone to use, but then I am just one fish in this particular ORFanID pond.

Another member of the larger ORFanBase project (not directly connected to ORFanID) is trying to find a better algorithm than BLAST, in its various incarnations. A geneticist friend once told me, somewhat sheepishly, “Paul, I use BLAST, but I don’t have the faintest idea how it works.”

cwhenderson · April 6, 2019, 1:28pm

I’m afraid so, there was zero indication that this was a beta version.

Art · April 6, 2019, 2:09pm

Thanks, @pnelson .

Put a shiny toy in a room full of children and they will want to play with it…

T_aquaticus · April 8, 2019, 4:39pm

First off, no database or algorithm is perfect. Criticisms shouldn’t overshadow the effort that has been put into the project.

From my limited knowledge, BLAST is going to lean towards false negatives, especially for ORFs that are less than 1 kb. A largish indel or complicated recombination event would be an obvious cause for false negatives. Therefore, conclusions based on a lack of a match should be manually checked. Trying to match up flanking orthologous regions may also help in elucidating the origin of some orphan genes.
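One way to see why short ORFs are at risk of false negatives is the textbook relation between a BLAST bit score S' and its E-value, E = m * n * 2^(-S'), where m is the query length and n the database length. The sketch below uses raw lengths rather than the effective lengths BLAST actually computes, and the database size and cutoff are made-up numbers, so treat it as a rough guide only.

```python
import math


def evalue(bit_score, query_len, db_len):
    """Expected number of chance hits scoring at least bit_score.

    Simplified: uses raw lengths, not BLAST's effective lengths.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def min_bit_score(query_len, db_len, max_evalue):
    """Smallest bit score that still meets the E-value cutoff."""
    return math.log2(query_len * db_len / max_evalue)


DB_LEN = 1_000_000_000  # hypothetical database size in residues
CUTOFF = 1e-3           # hypothetical E-value cutoff

for query_len in (30, 300):
    needed = min_bit_score(query_len, DB_LEN, CUTOFF)
    print(f"query of {query_len} residues needs about {needed:.1f} bits")
```

The required score barely changes with query length, but a short ORF has far fewer aligned positions over which to accumulate it, so a diverged homologue can fall below the cutoff and be reported as "no match".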

swamidass · April 8, 2019, 5:52pm

I know how BLAST works! That was a landmark paper in computational biology.

davecarlson · April 8, 2019, 5:54pm

It’s got around 70k citations at this point.
Also, as far as bioinformatics algorithms go, BLAST is one of the easiest to understand.
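For anyone who wants the core idea in code, here is a toy seed-and-extend search. It is a deliberately stripped-down sketch, not BLAST itself: it uses exact word matches only, ungapped extension with a simple drop-off rule, and no statistics. The function name, scoring values, and parameters are all assumptions chosen for illustration.

```python
def seed_and_extend(query, subject, word_size=4, match=1, mismatch=-2, x_drop=3):
    """Toy ungapped search; returns (score, q_start, s_start, length) or None."""
    # Index every word of the subject by its start position.
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)

    def extend(i, j, step):
        """Walk from (i, j) in one direction; return (best gain, its length)."""
        score = best = 0
        length = best_len = 0
        while 0 <= i < len(query) and 0 <= j < len(subject):
            score += match if query[i] == subject[j] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > x_drop:
                break  # score fell too far below the best seen: stop
            i += step
            j += step
        return best, best_len

    best_hit = None
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            right, right_len = extend(i + word_size, j + word_size, 1)
            left, left_len = extend(i - 1, j - 1, -1)
            score = word_size * match + left + right
            hit = (score, i - left_len, j - left_len,
                   word_size + left_len + right_len)
            if best_hit is None or hit[0] > best_hit[0]:
                best_hit = hit
    return best_hit


print(seed_and_extend("ACGTACGT", "TTACGTACGTTT"))  # (8, 0, 2, 8)
print(seed_and_extend("AAAA", "CCCC"))              # None
```

The second call shows the failure mode discussed earlier in the thread: if no exact word is shared, nothing is ever extended, and the search reports no hit at all.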
