I’m not a geneticist and I’m a novice with the online genome tools, so I hope you can help me out.
I’m trying some things out with the chromosome 2 fusion in humans and comparing it to the most recent available chimp genome (which I think is NHGRI_mPanTro3-v2.0_pri reference assembly GCF_028858775.2-RS_2024_02).
When I BLAST (pseudo-)genes at or around the fusion site in humans (like DDX11L2 or WASH2P) to the chimp genome, I get the closest hits at chimp chromosomes 21 and 10. Why isn’t this chimp chromosome 2?
I’m also wondering why the chimp chromosomes 2A and 2B aren’t visible in the BLAST results and at the Genome Data Viewer at all. Are they numbered differently? If so, which chromosomes in the Genome Data Viewer correspond to chimp 2A-2B and therefore human 2?
Because, classically, chromosomes are numbered in size order, the two chromosomes that fused to form human chromosome 2 are chimp chromosomes 12 and 13, though now we tend to use the new names 2A and 2B. Makes comparison easier. Are you doing this BLAST at NCBI?
I’ve read that they can indeed be 12 and 13, so I suppose those would be the chimp chromosome numbers in the BLAST results and Genome Data Viewer. So shouldn’t the closest match to human WASH2P and DDX11L2 be found on those chimp chromosomes, instead of 21 and 10?
I have no idea how to explain your result. When I looked at the UCSC genome browser, there’s a WASH2P homolog on chromosome 12. Given the frequency of duplication, transposition, etc., it’s not uncommon to find paralogs of a given sequence in several places, but why it should be absent in the expected spot is unclear. What exactly was your query?
I see from Wikipedia that there’s a region of 100 kilobases at the fusion site that appears to be a duplication of a region in human chromosome 9. It’s not entirely clear whether this region includes WASH2P, but that’s implied. And WASH2P is paralogous to several other sequences on several human chromosomes. Try BLASTing your query against the human genome and see what shows up.
Hi and welcome. Interesting question! This is not my immediate area but I will attempt to summon @Zachary_Ardern who, I think, has expertise far beyond mine.
I am really curious about the renaming of the other ape chromosomes and like you I didn’t see it in a quick look at the Genome Data Viewer. But here are some clues, perhaps:
This paper looks in detail at the fusion region…
…and cites this paper which seems to find paralogous regions on 9 and 22. I know that’s not a huge help but this paper (I can only see the abstract) names the chromosomes in other species. This might help?
Good luck!
EDIT: my post was approved after John’s answer above but the sources I link to above might help with questions about why paralogs and orthologs don’t always stay put.
So the two databases are contradictory. No idea what that means. Perhaps different assemblies of the data? You will need someone more adept at this than I am.
Sorry @sfmatheson I don’t have a lot of expertise on the human genome specifically! (though I dabbled recently in exploring claims about % identity)
However, I checked the BLAST of WASH2P with a local BLAST search and got the same result as you @bart_klink. I don’t know the different numbering systems that have been used for chimp chromosomes over the years (so I’m not quite sure about interpreting older literature), but my interpretation from what I’ve seen is that the version of that gene family which is expected at chromosome 12 is likely deleted in the chimp. Gene loss is common, I think, particularly in regions which have been subject to structural variation.
If you want to explore the evolutionary history of this genome region you may find it useful to plot the presence and absence, chromosome number, and similarity values for a few genes from human chromosome 2 across a few primate species (e.g. chimp, bonobo, gorilla, orangutan). If you want to do it locally on your computer (e.g. to get all pair-wise values in a less tedious way) ChatGPT can provide adequate code to run on the terminal, including for downloading the BLAST program (though best to install WSL if on Windows …).
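To give an idea of what such a local comparison could look like, here is a minimal Python sketch. It assumes you have already run BLAST yourself for each species and saved the hits in the standard 12-column tabular format (what `blastn -outfmt 6` produces). The function names `best_hits` and `summarize`, and the toy data with its made-up labels (`geneX`, `speciesA`, `chrA`, ...), are placeholders invented for illustration and are not real results.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text):
    """Return {query_id: hit} keeping only the highest-bitscore hit per query."""
    best = {}
    for fields in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank lines, comment lines, and anything that is not a full row.
        if len(fields) < len(COLUMNS) or fields[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, fields))
        hit["pident"] = float(hit["pident"])
        hit["bitscore"] = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


def summarize(results_by_species):
    """Build rows of (query, species, best-hit subject, percent identity)."""
    rows = []
    for species, text in results_by_species.items():
        for query, hit in sorted(best_hits(text).items()):
            rows.append((query, species, hit["sseqid"], hit["pident"]))
    return rows


# Toy input: invented placeholder hits, NOT real BLAST output.
TOY = {
    "speciesA": (
        "geneX\tchrA\t98.5\t1000\t15\t0\t1\t1000\t5000\t5999\t0.0\t1800\n"
        "geneX\tchrB\t95.0\t1000\t50\t0\t1\t1000\t100\t1099\t0.0\t1600\n"
    ),
    "speciesB": (
        "geneX\tchrC\t97.0\t900\t27\t0\t1\t900\t200\t1099\t0.0\t1500\n"
    ),
}

if __name__ == "__main__":
    for query, species, subject, pident in summarize(TOY):
        print(f"{query}\t{species}\t{subject}\t{pident:.1f}")
```

With real data you would replace `TOY` with the contents of your own result files, one per species. Note that NCBI assemblies usually report sequence accessions rather than chromosome numbers as the subject ID, so you may still need to look up which chromosome each accession corresponds to. A query that is missing from a species' table simply had no hit in that file.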
I also have another question. An argument creationists level against the fusion is the presence of the DDX11L2 gene (actually a pseudogene) spanning the fusion site. Even if this gene is not functional, what are the explanations for how it got there? Was it relocated after the fusion from a place nearby?
Thanks for your answer! The strange thing is that I get chr. 12 as the best match with Ensembl or the UCSC genome browser, but not with NCBI (see replies above).