The term “Sal’s Flower” keeps showing up in different threads on PS, and it refers to the Venn diagram shown below:
The source is Howe et al. (2013), so it should properly be called Howe’s diagram or Howe’s flower.
What does Howe’s diagram show? It shows two things: orthologous genes and species-specific genes. Orthologous genes are genes shared by extant organisms due to common ancestry. Species-specific genes are genes found in only one, or only some, of the species and may have arisen independently in those lineages. Howe and colleagues analyzed and summarized the number of ancestral genes shared between humans, chickens, zebrafish and mice. This is what they found:
Bold text and content in square brackets are mine. Note that Howe’s diagram doesn’t just show the genes orthologous across all four organisms. It also shows genes shared by any two or three of them: for example, genes shared by humans and mice, by mice and chickens, or by humans, zebrafish and chickens.
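To make the bookkeeping behind such a diagram concrete, here is a minimal Python sketch of how its regions could be counted. It assumes each species has already been mapped to the set of orthologue groups it has a member in; the group IDs g1 to g7 are hypothetical toy data, not Howe et al.’s. Each group is counted once, in the region labelled by exactly the species that contain it, so the final numbers depend entirely on that upstream orthology assignment.

```python
from collections import Counter
from itertools import chain
from typing import Dict, FrozenSet, Set


def venn_regions(membership: Dict[str, Set[str]]) -> Counter:
    """Count orthogroups per exclusive Venn region.

    `membership` maps each species to the set of (hypothetical) orthogroup
    IDs it has a member in. Every orthogroup is counted exactly once, in the
    region labelled by the full set of species that contain it.
    """
    all_groups = set(chain.from_iterable(membership.values()))
    regions: Counter = Counter()
    for group in all_groups:
        present: FrozenSet[str] = frozenset(
            species for species, groups in membership.items() if group in groups
        )
        regions[present] += 1
    return regions


# Hypothetical toy data (not the counts from Howe et al.).
toy = {
    "human": {"g1", "g2", "g3", "g5"},
    "mouse": {"g1", "g2", "g3", "g6"},
    "chicken": {"g1", "g2", "g4"},
    "zebrafish": {"g1", "g4", "g7"},
}

regions = venn_regions(toy)
# Print the largest overlaps first, then alphabetically within each size.
for species_set, count in sorted(regions.items(), key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
    print(" + ".join(sorted(species_set)), count)
```

With this toy input, g1 lands in the four-way center, while g5, g6 and g7 each land in a single-species region. Deleting g1 from the chicken set, as a missing annotation would, moves it out of the center into the human, mouse and zebrafish region.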
This representation of orthologous and species-specific genes with Venn diagrams is common practice. Another such representation, for several Streptomyces strains, is shown below:
In the above figure we see that the different Streptomyces strains all inherited a set of genes (3086, the number in the center) from their common ancestor, but as time passed and they diverged further, each lineage acquired novel genes (the species-specific genes).
Hope this helps clarify the meaning of Howe’s diagram or any other Venn diagram of orthologous and species-specific genes. Corrections from experts are welcome.
Edit: I changed all instances of “non-orthologous” to “species-specific” to better reflect the distinction between genes that are and are not part of the ancestral gene set.
Not true. Paralogous genes have detectable signals of shared ancestry through gene duplication. I’m pretty sure the word “orthologous” was chosen as an explicit description.
My bad. By non-orthologous, I meant not coming from a common ancestor, but that excludes paralogous genes, which share a common ancestor too. I should have used species-specific instead of non-orthologous to describe genes independently acquired in different lineages, or not found in the ancestral core set of genes, right?
I’ve been wondering what method is used to establish genes as “orthologues” as opposed to merely homologues, and how this annotation is done… If you read the legend to the figure, referring specifically to the Venn diagram, it says:
Evolutionary aspects of the zebrafish genome
a, Orthologue genes shared between the zebrafish, human, mouse and chicken genomes, using orthology relationships from Ensembl Compara 63.
But I have been unable to find exactly how Ensembl Compara defines what is or isn’t an orthologue (what’s the cutoff?). It is entirely possible, for example, that many of the so-called “species-specific” genes are genes that have diverged a lot since the last common ancestor.
It would be interesting to find a handful of concrete examples of these so-called species-specific genes and look more closely at whether they have any similar sequences in the other species (and/or in more closely related species).
And anyway, it would also be interesting to know what differences (if any) there are between different Ensembl Compara versions. From looking at their website, they seem to be at version 104 now. I assume version 63 was the latest or most complete version available in 2013, when Howe et al. published their paper. I’m pretty sure annotation, sequencing efforts, and homology-detection algorithms have all developed further since then.
I do believe that the folks who made that diagram meant “non-orthologous” in the strict, technical sense. What you meant is better communicated by “non-homologous”, and that would be a quite different diagram.
I would hope that it would be based on synteny, flanking sequences, and such. But I have no real idea. It certainly shouldn’t be based on genetic distance or similarity.
Well, by orthologous they meant that a gene is shared by all or some of the organisms examined due to vertical descent. The converse, species-specific, should refer to genes with no detectable homologs in all or some of the other sampled organisms.
I think species-specific is more accurate. Thanks.
Not true. What does “a shared gene” mean here? It doesn’t just mean “a shared homologous sequence”. Some homologous sequences aren’t genes, and some are paralogous, not orthologous. The term “orthologous” has a specific meaning. Different members of the same gene family are homologous, but they’re paralogous, not orthologous. The most common origin of new genes is by gene duplication, and the resulting genes are paralogs. I am reasonably certain that most of those non-orthologous genes still have paralogs in at least some of the other species.
No, “species-specific” genes include both paralogs and non-homologs, and the former do indeed have detectable homologs. Further, some species-specific genes arise from non-coding sequences and might still have detectable non-coding homologs in related species. Even further, I suspect that very few of the genes in that sample are truly species-specific; family-, order-, or class-specific, perhaps. But with only 4 species in the sample there’s no way to be sure.
That doesn’t explain what “shared gene” means, and it certainly doesn’t hint at any idea that non-shared genes must have no homologs in other species. I’m pretty sure that “ortholog” was used with the specific meaning generally understood among evolutionary biologists.
It’s a better descriptor for what you’re talking about, but non-orthologous is a better descriptor of what the paper is talking about. That would include both paralogous and non-homologous genes, as well as genes with non-coding homologs in other species.
To be sure I am on the same page here: orthologs are genes descended from a single ancestral gene in the last common ancestor of the sampled organisms, right? This definition is a slight paraphrase from a Koonin paper I read some time ago.
Oh, I see your point even better now. The thing is, the authors didn’t use “non-orthologous”. They used “species-specific” to describe genes not found in the core set of genes common to all four organisms (and “orthologous” for genes common to all four). If I am reading you correctly, that is incorrect?
I have been wondering that myself. I am also curious how annotations affect these numbers. If there are fewer genes annotated in the chicken genome would that impact the numbers?
Yes, but so, potentially, are paralogous genes. Paralogous genes arise from gene duplications. If the duplication happens in lineage A, sister to lineage B, lineage A will have two copies and lineage B will have one. One of the copies in lineage A is orthologous to the single copy in lineage B, and one of them is paralogous to both the single copy in B and the other copy in A. Did that make sense?
Well, species-specific genes are just genes not found in the other species. That doesn’t mean that homologous sequences are not found, just that orthologous genes aren’t found.
I genuinely do not have enough of an insight into these methods to be able to tell how this procedure would affect their results.
Edit: corrected link.
Edit2: But it does seem to confirm that synteny is a substantial part of the orthology inference, which would explain why many genuine orthologues that have been transferred to other chromosomes, or simply moved elsewhere on the same chromosome, could be missed and instead scored as “species-specific” genes.
Probably not. It’s likely that one of the lineage A copies is the original, and thus orthologous to the lineage B gene, and that the other is a copy, and thus paralogous. Now, we might not necessarily be able to tell, particularly if it’s a tandem duplication. But still, one of them must be the original. It gets worse if we have ancestral paralogs, and one of them is lost in lineage A and the other is lost in lineage B. In such a case there are two non-orthologs, but it wouldn’t be easy to tell.
Not if the transferred blocks are larger than single genes. If Gene A is flanked by genes B and C, it should still be recognizable as gene A, whatever the more remote context.
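As a toy illustration of the flanking-gene argument, the Python sketch below scores how many of a gene’s nearest neighbors in one genome are also near it in another. It is a hypothetical helper, not Ensembl Compara’s method: it assumes genes are already labelled by shared family names, treats each genome as a single ordered list, and ignores orientation.

```python
from typing import List


def local_synteny_support(order_a: List[str], order_b: List[str], gene: str, k: int = 2) -> float:
    """Fraction of `gene`'s k nearest neighbours on each side in genome A
    that also lie within k positions of `gene` in genome B.

    Hypothetical helper: genes carry shared family labels, each genome is
    one ordered list, and gene orientation is ignored.
    """
    i, j = order_a.index(gene), order_b.index(gene)
    neighbours_a = order_a[max(0, i - k):i] + order_a[i + 1:i + 1 + k]
    window_b = set(order_b[max(0, j - k):j] + order_b[j + 1:j + 1 + k])
    if not neighbours_a:
        return 0.0
    shared = sum(1 for g in neighbours_a if g in window_b)
    return shared / len(neighbours_a)


# Toy gene orders: the block B-A-C sits in different surroundings in each genome.
genome_a = ["X", "Y", "B", "A", "C", "Z", "W"]
genome_b = ["P", "Q", "B", "A", "C", "R", "S"]

print(local_synteny_support(genome_a, genome_b, "A", k=1))  # 1.0
print(local_synteny_support(genome_a, genome_b, "A", k=2))  # 0.5
```

Here the block B, A, C has been moved into new surroundings, so the immediate flanks (k=1) still fully support the match, while a wider window (k=2) gives only partial support.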
Co-orthology doesn’t seem to have anything to do with which of the two paralogs is the original. The idea is that since the two copies arose in Lin A only after speciation, they are both co-orthologs of the single copy in Lin B. When we consider the paralogs in Lin A separately, only one can be orthologous to the sole copy in Lin B.
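The co-orthology view can be made concrete with the standard event-based definition: two genes are orthologs if their most recent common ancestor in the gene tree is a speciation node, and paralogs if it is a duplication node. The Python sketch below assumes the gene tree and its event labels are already known (inferring them is the hard part), and the gene names A1, A2, B1, A_x and so on are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A gene-tree node: leaves carry a gene name, internal nodes an event label."""
    name: str = ""
    event: Optional[str] = None  # "speciation" or "duplication" for internal nodes
    children: List["Node"] = field(default_factory=list)


def path_to(root: Node, gene: str) -> Optional[List[Node]]:
    """Return the nodes from the root down to the leaf named `gene`, or None."""
    if not root.children:
        return [root] if root.name == gene else None
    for child in root.children:
        sub = path_to(child, gene)
        if sub is not None:
            return [root] + sub
    return None


def relationship(root: Node, gene1: str, gene2: str) -> str:
    """Classify two distinct genes by the event at their lowest common ancestor."""
    if gene1 == gene2:
        raise ValueError("need two different genes")
    path1, path2 = path_to(root, gene1), path_to(root, gene2)
    if path1 is None or path2 is None:
        raise ValueError("both genes must be leaves of the tree")
    lca = root
    for a, b in zip(path1, path2):
        if a is not b:
            break
        lca = a
    return "orthologs" if lca.event == "speciation" else "paralogs"


# Duplication in lineage A after the A/B speciation.
recent_dup = Node(event="speciation", children=[
    Node(event="duplication", children=[Node("A1"), Node("A2")]),
    Node("B1"),
])

# Ancestral duplication before the speciation (copies x and y).
ancestral_dup = Node(event="duplication", children=[
    Node(event="speciation", children=[Node("A_x"), Node("B_x")]),
    Node(event="speciation", children=[Node("A_y"), Node("B_y")]),
])

print(relationship(recent_dup, "A1", "B1"))      # orthologs
print(relationship(recent_dup, "A2", "B1"))      # orthologs
print(relationship(recent_dup, "A1", "A2"))      # paralogs
print(relationship(ancestral_dup, "A_x", "B_y"))  # paralogs
```

Under this definition both A1 and A2 come out as orthologs of B1, because the duplication happened after the speciation. The second tree mirrors the ancestral-paralog case: even if A_y and B_x were lost, the surviving pair A_x and B_y is still classified as paralogs, although each lineage would then have just one copy.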
That’s a fallback position if we can’t tell which one is the original. In the case of tandem duplications, it seems likely that we can’t, since a duplication before or after the original would look the same.