Finding genes from the chr.2 fusion site in the chimp genome

Because I’m not a geneticist and am a novice with the online genome tools, I hope you can help me out.

I’m trying some things out with the chromosome 2 fusion in humans and comparing it to the most recent available chimp genome (which I think is NHGRI_mPanTro3-v2.0_pri reference assembly GCF_028858775.2-RS_2024_02).

When I BLAST (pseudo-)genes at or around the fusion site in humans (like DDX11L2 or WASH2P) against the chimp genome, I get the closest hits on chimp chromosomes 21 and 10. Why isn’t this chimp chromosome 2?

I’m also wondering why the chimp chromosomes 2A and 2B aren’t visible in the BLAST results and at the Genome Data Viewer at all. Are they numbered differently? If so, which chromosomes in the Genome Data Viewer correspond to chimp 2A-2B and therefore human 2?

For WASH2P I used the sequence found here: https://www.ncbi.nlm.nih.gov/nuccore/NR_024077.2/

For DDX11L2 I used the sequence found here: https://www.ncbi.nlm.nih.gov/nuccore/NR_024004.2/
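For anyone who would rather script this than copy-paste from those pages, here is a minimal Python sketch that pulls the same records as FASTA through NCBI’s E-utilities efetch service. The endpoint and parameters (db=nuccore, rettype=fasta, retmode=text) are the standard public ones as far as I know; the function names are just illustrative, and fetch_fasta needs network access.

```python
import urllib.parse
import urllib.request

# NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession):
    """Build an efetch URL that returns one nucleotide record as FASTA text."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH_URL + "?" + urllib.parse.urlencode(params)


def parse_fasta(text):
    """Parse FASTA text into {header: sequence}, header without the leading '>'."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def fetch_fasta(accession):
    """Download one record from NCBI (requires network access) and parse it."""
    with urllib.request.urlopen(build_efetch_url(accession), timeout=30) as resp:
        return parse_fasta(resp.read().decode("ascii"))
```

For example, fetch_fasta("NR_024004.2") should return a single header/sequence pair for DDX11L2 that can then be pasted into BLAST or BLAT.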

Welcome to Peaceful Science, @bart_klink. That’s a fascinating topic. This is a great place for it.

I will defer to those with vastly greater knowledge.

Because chromosomes are classically numbered in size order, the two chromosomes that fused to form human chromosome 2 are chimp chromosomes 12 and 13, though we now tend to use the new names 2A and 2B, which makes comparison easier. Are you doing this BLAST at NCBI?

I’ve read that they can indeed be numbered 12 and 13, so I suppose those would be the chimp chromosome numbers in the BLAST results and the Genome Data Viewer. So shouldn’t the closest matches to human WASH2P and DDX11L2 be found on those chimp chromosomes, instead of on 21 and 10?

For BLAST I use BLAST: Basic Local Alignment Search Tool

I have no idea how to explain your result. When I looked at the UCSC genome browser, there’s a WASH2P homolog on chromosome 12. Given the frequency of duplication, transposition, etc., it’s not uncommon to find paralogs of a given sequence in several places, but why it should be absent in the expected spot is unclear. What exactly was your query?

I see from Wikipedia that there’s a 100 kb region at the fusion site that appears to be a duplication of a region on human chromosome 9. It’s not entirely clear whether this region includes WASH2P, but that’s implied. And WASH2P is paralogous to several other sequences on several human chromosomes. Try BLASTing your query against the human genome and see what shows up.
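If you run that search with BLAST+ and ask for tabular output (-outfmt 6), a short script can list every chromosome carrying a strong hit, which makes paralogs on several chromosomes easy to spot. This sketch assumes the default 12-column layout of -outfmt 6; the 90% identity and 500 bp cut-offs are arbitrary illustrative values, not established thresholds.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_outfmt6(text):
    """Parse -outfmt 6 text into a list of dicts, one per alignment (HSP)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        for key in FIELDS[2:]:
            row[key] = float(row[key]) if key in FLOAT_FIELDS else int(row[key])
        hits.append(row)
    return hits


def best_hit_per_sequence(hits, min_pident=90.0, min_length=500):
    """Keep the highest-bitscore hit on each subject sequence (e.g. each
    chromosome) that passes the identity and length cut-offs; strongest first."""
    best = {}
    for h in hits:
        if h["pident"] < min_pident or h["length"] < min_length:
            continue
        current = best.get(h["sseqid"])
        if current is None or h["bitscore"] > current["bitscore"]:
            best[h["sseqid"]] = h
    return sorted(best.values(), key=lambda h: h["bitscore"], reverse=True)
```

More than one chromosome surviving the cut-offs is what you would expect for a sequence with paralogs elsewhere in the genome.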

Hi and welcome. Interesting question! This is not my immediate area but I will attempt to summon @Zachary_Ardern who, I think, has expertise far beyond mine.

I am really curious about the renaming of the other ape chromosomes and like you I didn’t see it in a quick look at the Genome Data Viewer. But here are some clues, perhaps:

This paper looks in detail at the fusion region…

…and cites this paper, which seems to find paralogous regions on chromosomes 9 and 22. I know that’s not a huge help, but this paper (I can only see the abstract) names the chromosomes in other species. This might help?

Good luck!

EDIT: my post was approved after John’s answer above, but the sources I link to above might help with questions about why paralogs and orthologs don’t always stay put.

Thanks! I do get the best match for DDX11L2 on chr. 12 via https://genome.ucsc.edu/cgi-bin/hgBlat.

When I give the same sequence to BLAST at NCBI, I get:

This is the DDX11L2 sequence I use (from https://www.ncbi.nlm.nih.gov/nuccore/NR_024004.2/):
1 ctttgcgagg gcggagttgc gttctcttta gcacacagcc ggagagcatc gcgagggcgg
61 agctgcgttc tcctctgcac agacttcggg gctattgcga aggcggagca gagttcttct
121 caggtgtctg acttccagca actgctggcc tgtgccaggg tgcaagctga gcactggagt
181 ggagttttcc tgtggagagg agccatgcct agagtgggat gggccattgt tcatcttctg
241 gcccctgttg tctgcatgta acttaatacc acaaccaggc ataggggaaa gattggagga
301 aagatgagtg agagcatcaa cttctctgac aacctaggcc agctcctgtc tccccccagg
361 tgtgtggtga tgccaggcat gcccttccct agcatcaggt ctccagagct gcagaagacg
421 acggccgact tggatcacac tcttgtgagt gtccccagtg ttgcagaggt gagaggagag
481 tagacagtga gtgggagtgg cgtcgcccct agggctctac tggaccagcg tctcctgtct
541 cctggagagg cttcgatgcc cctccacacc ctcttgatct tccctgtgat gtcatctgga
601 gccctgctgc ttgcggtggc cttataaagc ctcctggtct ggctccaagg cctggcagag
661 tctttcccag ggaaagctac aagcagcaaa cagtccgcat gggtcatccc cttcactccc
721 agctcagagc ccaggccagg ggcccccaag aaaggctctg gtggagaacc tctgcatgaa
781 ggctgtcaac cagtccatag gcaagcctgg ctgcctccag ctgggtggac agacaggggc
841 tggagaaggg gagaagagga aagggggttg cctgccctgt ctcctacctg aggctgagga
901 gggagaaggg gatgcactgt tggggaggca gctgtaactc aaagccttag cctctgttcc
961 cacgaaggca gggccatcag gcaccaaagg gattctgcca gcatagtgct cctggaccag
1021 tgatacaccc ggcaccctgt cctggacaag ctgttggcct ggatctgagc cctcgtggag
1081 gtcaaagcca cctttggttc tgccattgct gctgtgtgga agttcactcc tgccttttcc
1141 tttccctaga gcctccacca ccccgagatc acatttctca ctgccttttg tctgcccagt
1201 ttcactagaa gtaggcctca tcctgacagg cagctgcacc actgcctggc gctgtgccct
1261 tcctttgctc tgcccgctgg agacggtgtt tgtcatgggc ctggtctgca gggatcctgc
1321 tacaaaggtg aaacccagga gagtgtggag tccagagtgt tgccaggacc caggcacagg
1381 cattagtgcc cgttggagaa aacgggaatc ccaaagaaat ggtgggtcct ggccatccgt
1441 gagatcttcc cagggcagct cccctctgtg gaacccaatc tgtcttccat cctgtgtggc
1501 cgagggccag gcttctcact aggcctctgc aggaggctgc catttgtcct gcccaccttc
1561 ttagaagcga gacggagcag acccatctgc tactgccctt tctataataa ctaaagttag
1621 ctgccctgga ctattcaccc cctagtctca atttaaaaag atccccatgg ccacagggcc
1681 cctgcctggg ggcttgtcac ctcccccacc ttcttcctga gtcactcctg cagccttgct
1741 ccctaacctg ccccacagcc ttgcctggat gtctatctcc ctggcttggt gccagttcct
1801 ccaagtcgat ggcacctccc tccctctcaa ccacttgagc aaactccaag acatcttcta
1861 ccccaacacc agcaattgtg ccaagggcca ttaggctctc agcatgacta tttttagaga
1921 ccccgtgtct gtcactgaaa ccttttttgt gggagactat tcctcccatc tgcaacagct
1981 gcccctgcta actgcccttc tctcctccct ctcatcccag agaaacaggt cagctgggag
2041 cttctgcccc cactgcctag ggaccaacag gggcaggagg cagtcactga ccccgagacg
2101 tttgcatcct gcacagctag aggtccttta ttaaaagcac actgttggtt tctgctcaaa
2161 aaaaaaaa
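The numbered layout above is GenBank’s display format: a start position followed by blocks of ten bases. Most search forms also accept plain FASTA, and a small Python sketch like this one (function names hypothetical) strips the numbers and spaces. As a sanity check, the sequence length should equal the last line’s start position minus one plus the bases on that line, i.e. 2160 + 8 = 2168 here.

```python
import re
import textwrap


def genbank_to_sequence(block):
    """Strip position numbers and spacing from GenBank-style sequence lines
    such as '61 agctgcgttc tcctctgcac ...' and return uppercase bases."""
    bases = []
    for line in block.splitlines():
        line = re.sub(r"^\s*\d+", "", line)            # drop leading position
        bases.append(re.sub(r"[^A-Za-z]", "", line))   # keep letters only
    return "".join(bases).upper()


def expected_length(block):
    """Length implied by the last non-empty line: start - 1 + bases on it."""
    last = [l for l in block.splitlines() if l.strip()][-1]
    start, *groups = last.split()
    return int(start) - 1 + sum(len(g) for g in groups)


def to_fasta(name, seq, width=60):
    """Format a sequence as FASTA with fixed-width lines."""
    return ">" + name + "\n" + "\n".join(textwrap.wrap(seq, width)) + "\n"
```

Pasting the whole numbered block into genbank_to_sequence and comparing its length with expected_length is a quick way to catch a truncated copy-paste.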

So the two databases are contradictory. No idea what that means. Perhaps they use different assemblies? You will need someone more adept at this than I am.

Using the UCSC genome browser, I found a 97% identical sequence on chromosome 12 of the bonobo genome. You might try BLAT.

The assemblies are indeed different. I tried the same assembly I used at NCBI here: https://www.ensembl.org/Pan_troglodytes/Tools/Blast/Results?tl=DF3sSDocjANb3jYc-11206516 and got the best match on chr. 12, after two results whose IDs start with AACZ (no idea what those are).

Maybe the algorithms at NCBI on the one hand and Ensembl/UCSC on the other are different?

UCSC uses BLAT and NCBI uses BLAST. Maybe that causes the difference?
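One thing worth ruling out, offered as an assumption rather than a diagnosis: which hit counts as “best” depends on how the results are ranked. NCBI’s web BLAST lists hits by E value by default, which tracks bit score and so rewards long alignments; if you look at percent identity instead, a shorter but near-identical match can come out on top. The numbers below are invented purely to show the effect.

```python
# Hypothetical hits for one query; the numbers are invented for illustration.
hits = [
    {"chrom": "chrA", "pident": 97.0, "length": 800, "bitscore": 1350.0},
    {"chrom": "chrB", "pident": 91.0, "length": 2100, "bitscore": 3000.0},
]


def top_hit(hits, key):
    """Return the chromosome of the hit with the largest value for `key`."""
    return max(hits, key=lambda h: h[key])["chrom"]


print(top_hit(hits, "bitscore"))  # chrB: the long alignment wins on score
print(top_hit(hits, "pident"))    # chrA: the short, near-identical one wins on identity
```

Comparing both the alignment length and the identity of the top few hits from each tool should show whether this, rather than the assembly, explains the disagreement.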

Sorry @sfmatheson, I don’t have a lot of expertise on the human genome specifically! (though I dabbled recently in exploring claims about % identity)

However, I checked the BLAST of WASH2P with a local BLAST search and got the same result as you, @bart_klink. I don’t know the different numbering systems that have been used for chimp chromosomes over the years (so I’m not quite sure how to interpret the older literature), but my interpretation from what I’ve seen is that the copy of that gene family expected on chromosome 12 is likely deleted in the chimp. Gene loss is common, I think, particularly in regions that have been subject to structural variation.
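For a local run like this, the usual BLAST+ workflow is to build a database from the chimp genome FASTA with makeblastdb and then search it with blastn. Below is a minimal Python wrapper, assuming BLAST+ is installed and on your PATH; the file names and the E-value cut-off are placeholders.

```python
import shutil
import subprocess


def makeblastdb_command(genome_fasta, db_name):
    """Command that builds a nucleotide BLAST database from a genome FASTA."""
    return ["makeblastdb", "-in", genome_fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_command(query_fasta, db_name, out_tsv, evalue=1e-10):
    """Command that searches the database and writes tabular (-outfmt 6) hits."""
    return ["blastn", "-query", query_fasta, "-db", db_name,
            "-outfmt", "6", "-evalue", str(evalue), "-out", out_tsv]


def run(cmd):
    """Run one BLAST+ command, failing early if the program is not installed."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(cmd[0] + " not found on PATH; install NCBI BLAST+ first")
    subprocess.run(cmd, check=True)
```

With hypothetical file names, that would be run(makeblastdb_command("chimp_genome.fna", "chimp")) once, then run(blastn_command("wash2p.fa", "chimp", "wash2p_vs_chimp.tsv")) for each query.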

If you want to explore the evolutionary history of this genomic region, you may find it useful to plot presence/absence, chromosome number, and similarity values for a few genes from human chromosome 2 across a few primate species (e.g. chimp, bonobo, gorilla, orangutan). If you want to do it locally on your computer (e.g. to get all pairwise values in a less tedious way), ChatGPT can provide adequate code to run in the terminal, including for downloading the BLAST program (though it’s best to install WSL if you’re on Windows …).
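As a starting point for that kind of comparison, here is a sketch that lays out, for each gene and species, the chromosome and percent identity of the best hit, with “-” where nothing qualified. It assumes you have already reduced each species’ BLAST output to a dictionary keyed by (gene, species); the gene and species names are whatever you pass in. A missing entry could mean a real loss or just a gap in the assembly, so it is worth checking both.

```python
def presence_table(best, genes, species):
    """Text table: one row per gene, one column per species.

    `best` maps (gene, species) -> (chromosome, percent identity) of the best
    hit; a missing key is shown as '-' (no qualifying hit found).
    """
    lines = ["gene".ljust(10) + "".join(s.ljust(12) for s in species)]
    for gene in genes:
        cells = []
        for sp in species:
            hit = best.get((gene, sp))
            cell = "-" if hit is None else f"{hit[0]}:{hit[1]:.1f}"
            cells.append(cell.ljust(12))
        lines.append(gene.ljust(10) + "".join(cells))
    return "\n".join(lines)
```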

I also have another question. An argument creationists level against the fusion is the presence of the DDX11L2 gene (actually a pseudogene) spanning the fusion site. Even if this gene is not functional, what explanations are there for how it got there? Was it relocated after the fusion from somewhere nearby?

Thanks for your answer! The strange thing is that I get chr. 12 as the best match with Ensembl or the UCSC genome browser, but not with NCBI (see replies above).
