Orphan Genes Talk at the Texas Genetics Society

I’m attending a meeting of the Texas Genetics Society this weekend, and I noticed something very interesting in the program. I was just thumbing through when a title caught my eye: “Search for de-novo orphan genes - the neo holy grail of genomes?” Komal K. B. Raja (Baylor College of Medicine - Department of Pathology and Immunology) will give the talk, but none other than our friend @pnelson is on the byline! Richard Gunasekera (I confess, I’ve never heard of him) is also listed, with affiliations at both Biola University and the Department of Chemistry at Rice University.

I see nothing in the abstract related to Intelligent Design, but needless to say, I’ll be very interested to hear this talk.

Here is the full list of contributors, just in case anyone is familiar with them:
Komal Raja
Vinodh Gunasekera
Suresh Hewapathirana
Savidu Dias
Paul Nelson
Richard Gunasekera

Do any of you know these scientists?

P. S. I’ll type in the abstract if anyone is interested in reading it.

P. P. S. @pnelson, on the off chance you are actually at the conference, let me know!


Sure. It seems odd that none of the people involved seems to have anything to do with phylogenetics or comparative biology. I would be interested in their taxon sample, unless this is strictly a methods talk.


Hi Curtis,

Wasn’t there – I was presenting a poster on taxonomically restricted essential genes at this conference (flying home to Chicago today):



Apparently he’s a senior fellow at the DI.

The abstract leads me to believe it is a methods talk (in silico) but I will have an update later.

Orphan genes are de novo genes found unique to specific species or restricted only to a particular taxonomy group in the tree of life. With the profusion of genomes being sequenced today, it has been shown that approximately 20% or more of genes in a given species do not have homologous sequences in other species or in their taxonomical tree which classified them as orphans. Therefore the study of these lineage-specific orphan genes is significant for understanding biological origins and the functions of life. To identify orphan genes in genomic sequences, we have developed a web-based software search engine named ORFanID. The search engine can determine whether a DNA nucleotide or an amino acid sequence is an orphan in the NCBI databases at various taxonomies. ORFanID identifies genes unique to a genus, family, class etc., or strictly at the species level at differing taxonomies according to the original Linnaeus classification. The software allows for user control of the NCBI database search parameters. Based on the parameters specified, some orphans may or may not fall under the given classification for strict orphans, defined as those found in only a specific species. The results of the search are provided in a spreadsheet as well as a graphical display. All the tables in the software are sortable by column and results can be easily filtered with fuzzy search functionality. The graphical display is expandable and collapsible by taxonomy. ORFanID can help delineate the actual sequence and function of de novo genes discovered at the species level and at all levels of the taxonomy tree. The identification of the clandestine orphan genes and what they do in genomes may be the new Holy Grail that redefines genetics as we know it.
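For anyone wondering what "orphan at a given taxonomic level" could mean operationally, here is a minimal Python sketch. It is not ORFanID's code, and I have not seen how the tool does this. The rank list, the `orphan_level` function, and the lineage dictionaries are all hypothetical, and the sketch assumes you already have the taxonomic lineage of every organism with a database hit.

```python
# Hypothetical sketch: find the narrowest taxonomic rank that contains
# the query organism and every organism with a homology hit.
# Ranks are ordered from narrowest to broadest.
RANKS = ["species", "genus", "family", "order", "class", "phylum", "superkingdom"]


def orphan_level(query_lineage, hit_lineages):
    """Return the narrowest rank at which all hits share the query's taxon.

    query_lineage: dict mapping rank name -> taxon name for the query organism.
    hit_lineages:  list of such dicts, one per organism with a hit.
    Returns a rank name, or None if the hits are not confined to any
    rank present in the query lineage.
    """
    for rank in RANKS:
        taxon = query_lineage.get(rank)
        if taxon is None:
            continue  # rank missing from the query lineage, try a broader one
        if all(hit.get(rank) == taxon for hit in hit_lineages):
            return rank
    return None


ecoli = {"species": "Escherichia coli", "genus": "Escherichia",
         "family": "Enterobacteriaceae", "superkingdom": "Bacteria"}
salmonella = {"species": "Salmonella enterica", "genus": "Salmonella",
              "family": "Enterobacteriaceae", "superkingdom": "Bacteria"}
human = {"species": "Homo sapiens", "genus": "Homo",
         "family": "Hominidae", "superkingdom": "Eukaryota"}

print(orphan_level(ecoli, []))                   # no hits outside the query
print(orphan_level(ecoli, [ecoli, salmonella]))  # hits confined to one family
print(orphan_level(ecoli, [salmonella, human]))  # hits across superkingdoms
```

Under this definition a gene with no hits outside its own species is a strict orphan, and everything depends on which hits are accepted as homologous in the first place, which is where the search parameters mentioned in the abstract come in.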

Sorry for any typos, I am doing this on my phone and have no inclination to edit at this point :stuck_out_tongue:


Turns out the talk is only 15 minutes this afternoon.

Well, that was a lot of typing on a phone. Thanks. So they’re just using the NCBI database. That might be enough to find candidates for further investigation, but I don’t think they would be able to make real claims on that basis. Reading between the lines, this seems like a shell that lies on top of BLAST.


To clarify, this approximate percentage comes from studies comparing protein sequences (or sometimes transcripts). In other words, ~20% of protein-coding genes do not have homologous protein-coding genes in other species. When comparing nucleotide sequences, this percentage goes way down, as many non-coding homologues are found. See my previous analysis on this in these 4 posts:





Echoing what @evograd wrote, I would be interested to hear if they are defining genes as RNA transcripts or translated peptides. I would also be interested in how the ORFanID search engine works. For example, does it search for annotated sequences or is it directly comparing orthologous/homologous genomic DNA sequences?

Thanks for typing that up. My fat thumbs could not have done it.

I kinda regretted saying I would do it about halfway in, but I was already committed at that point :slight_smile: It was explained as using full BLAST algorithms, so I don’t think it used only annotated sequences. The presenter also mentioned that it could perform BLASTP and TBLASTN searches. @pnelson can tell us more, if he’s available.

The tool is apparently freely accessible at orfangenes.com if anyone wants to play with it.

T-urf13 draws a blank. @pnelson, is that expected? I am not clear what databases the program calls upon.

I can feel the burn from here.

I don’t understand, is there a point to your post?

Was it a blank, or did the search not seem to go through? When I click submit I’m immediately taken to this screen:

To my knowledge (haven’t spoken to my co-authors for several weeks), the version of ORFanID at the orfangenes.com link is still a beta version, so if a trial sequence search turns up nothing, don’t be surprised – the whole thing isn’t working yet as it should. I am puzzled that the speaker at the Texas meeting announced publicly that ORFanID was ready for anyone to use, but then I am just one fish in this particular ORFanID pond.

Another member of the larger ORFanBase project (not directly connected to ORFanID) is trying to find a better algorithm than BLAST, in its various incarnations. A geneticist friend once told me, somewhat sheepishly, “Paul, I use BLAST, but I don’t have the faintest idea how it works.”

I’m afraid so; there was zero indication that this was a beta version.

Thanks, @pnelson.

Put a shiny toy in a room full of children and they will want to play with it…


First off, no database or algorithm is perfect. Criticisms shouldn’t overshadow the effort that has been put into the project.

From my limited knowledge, BLAST is going to lean towards false negatives, especially for ORFs that are less than 1 kb. A largish indel or complicated recombination event would be an obvious cause for false negatives. Therefore, conclusions based on a lack of a match should be manually checked. Trying to match up flanking orthologous regions may also help in elucidating the origin of some orphan genes.
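A rough way to see why short ORFs are prone to false negatives is the standard relation between bit score and E-value, E = m * n * 2^(-S'), where m is the query length, n is the database length, and S' is the bit score. The sketch below ignores BLAST's effective-length corrections and composition-based statistics. The database size, the E-value cutoff, and the figure of 0.7 bits per aligned position for a diverged homologue are illustrative assumptions of mine, not measured values.

```python
import math


def expected_hits(bit_score, query_len, db_len):
    """E-value for a given bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def min_bit_score(query_len, db_len, evalue):
    """Bit score needed to reach a given E-value cutoff."""
    return math.log2(query_len * db_len / evalue)


# Illustrative assumptions (not taken from ORFanID or any real database):
DB_LEN = 1e8             # total residues in a hypothetical protein database
CUTOFF = 1e-3            # E-value threshold for calling a hit
BITS_PER_POSITION = 0.7  # assumed average information per aligned residue

for length in (50, 100, 300):  # protein lengths in residues
    achievable = length * BITS_PER_POSITION
    needed = min_bit_score(length, DB_LEN, CUTOFF)
    e = expected_hits(achievable, length, DB_LEN)
    verdict = "detected" if e <= CUTOFF else "missed"
    print(f"{length:4d} aa: needs {needed:5.1f} bits, "
          f"achieves {achievable:5.1f} bits, E = {e:.2e} ({verdict})")
```

Under these assumptions a 50-residue peptide with a genuine but diverged homologue cannot accumulate enough score to pass the cutoff, while a 300-residue protein (roughly a 1 kb ORF) passes easily. That is one reason a "no hit" result for a short ORF deserves a manual check.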

I know how BLAST works! That was a landmark paper in computational biology.


It’s got around 70k citations at this point.
Also, as far as bioinformatics algorithms go, BLAST is one of the easiest to understand.
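To back that up, here is a toy version of the seed-and-extend idea at the heart of BLAST. This is only a sketch: it uses exact word matches and a simple match/mismatch score, whereas real BLAST scores neighborhood words with a substitution matrix, performs gapped extension, and attaches the statistics that give E-values. The function names and default parameters are my own.

```python
from collections import defaultdict


def index_words(subject, w):
    """Map every length-w word in the subject to its start positions."""
    index = defaultdict(list)
    for i in range(len(subject) - w + 1):
        index[subject[i:i + w]].append(i)
    return index


def _extend_one_way(query, subject, q, s, step, match, mismatch, xdrop):
    """Extend without gaps from (q, s) in direction step (+1 or -1).

    Returns (best score gain, number of positions kept). Extension stops
    once the running score falls more than xdrop below the best seen.
    """
    gain = best = 0
    kept = n = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        gain += match if query[q] == subject[s] else mismatch
        n += 1
        if gain > best:
            best, kept = gain, n
        elif best - gain > xdrop:
            break
        q += step
        s += step
    return best, kept


def seed_and_extend(query, subject, w=4, match=1, mismatch=-1, xdrop=3):
    """Return ungapped local hits as (score, query_start, subject_start, length)."""
    index = index_words(subject, w)
    hits = set()
    for qi in range(len(query) - w + 1):
        for si in index.get(query[qi:qi + w], ()):
            right, r_len = _extend_one_way(query, subject, qi + w, si + w,
                                           +1, match, mismatch, xdrop)
            left, l_len = _extend_one_way(query, subject, qi - 1, si - 1,
                                          -1, match, mismatch, xdrop)
            score = w * match + left + right
            hits.add((score, qi - l_len, si - l_len, l_len + w + r_len))
    return sorted(hits, reverse=True)


print(seed_and_extend("GATTACA", "TTTTGATTACATTTT"))
```

The point relevant to orphan detection is visible even in the toy: if no word of length w matches between query and subject, no extension is ever attempted, so a divergent homologue can be missed entirely.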