Finding genes from the chr.2 fusion site in the chimp genome

Because I’m not a geneticist and am new to the online genome tools, I hope you can help me out.

I’m trying some things out with the chromosome 2 fusion in humans and comparing it to the most recent available chimp genome (which I think is NHGRI_mPanTro3-v2.0_pri reference assembly GCF_028858775.2-RS_2024_02).

When I BLAST (pseudo-)genes at or around the fusion site in humans (like DDX11L2 or WASH2P) to the chimp genome, I get the closest hits at chimp chromosomes 21 and 10. Why isn’t this chimp chromosome 2?

I’m also wondering why the chimp chromosomes 2A and 2B aren’t visible in the BLAST results and at the Genome Data Viewer at all. Are they numbered differently? If so, which chromosomes in the Genome Data Viewer correspond to chimp 2A-2B and therefore human 2?

For WASH2P I used the sequence found here: https://www.ncbi.nlm.nih.gov/nuccore/NR_024077.2/

For DDX11L2 I used the sequence found here: https://www.ncbi.nlm.nih.gov/nuccore/NR_024004.2/

Welcome to Peaceful Science, @bart_klink. That’s a fascinating topic. This is a great place for it.

I will defer to those with vastly greater knowledge.

Because, classically, chromosomes are numbered in size order, the two chromosomes that fused to form human chromosome 2 are chimp chromosomes 12 and 13, though now we tend to use the new names 2A and 2B. Makes comparison easier. Are you doing this BLAST at NCBI?

I’ve read that they can indeed be 12 and 13, so I suppose those would be the chimp chromosome numbers in the BLAST results and Genome Data Viewer. So shouldn’t the closest match of human WASH2P and DDX11L2 be found on those chimp chromosomes, instead of 21 and 10?

For BLAST I use BLAST: Basic Local Alignment Search Tool

I have no idea how to explain your result. When I looked at the UCSC genome browser, I found a WASH2P homolog on chromosome 12. Given the frequency of duplication, transposition, etc., it’s not uncommon to find paralogs of a given sequence in several places, but why it should be absent from the expected spot is unclear. What exactly was your query?

I see from Wikipedia that there’s a region of about 100 kb at the fusion site that appears to be a duplication of a region in human chromosome 9. It’s not entirely clear whether this region includes WASH2P, but that’s implied. And WASH2P is paralogous to several other sequences on several human chromosomes. Try BLASTing your query against the human genome and see what shows up.
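
If it helps to see where the paralogs land, here is a rough Python sketch that tallies hits per subject sequence from a BLAST hit table. It assumes a tab-separated table with the 12 standard columns of BLAST+ tabular output (`-outfmt 6`). The rows, the subject names `chrA`/`chrB` and the function name `best_hit_per_subject` are all made up for illustration; they are not real results for WASH2P or DDX11L2.

```python
import csv
import io
from collections import defaultdict

# The 12 standard columns of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hit_per_subject(tabular_text):
    """Return {subject_id: (hit_count, best_bitscore)}, best bitscore first."""
    stats = defaultdict(lambda: [0, 0.0])
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        entry = stats[hit["sseqid"]]
        entry[0] += 1
        entry[1] = max(entry[1], float(hit["bitscore"]))
    ranked = sorted(stats.items(), key=lambda kv: kv[1][1], reverse=True)
    return {subject: (count, best) for subject, (count, best) in ranked}


# Hypothetical hit table: invented numbers, placeholder subject names.
rows = [
    ("query", "chrA", "98.5", "2100", "30", "2", "1", "2100", "5000", "7100", "0.0", "3600"),
    ("query", "chrB", "95.0", "1500", "70", "5", "1", "1500", "900", "2400", "0.0", "2300"),
    ("query", "chrA", "91.0", "400", "36", "0", "1700", "2100", "88000", "88400", "1e-150", "540"),
]
table = "\n".join("\t".join(r) for r in rows)

for subject, (count, best) in best_hit_per_subject(table).items():
    print(subject, count, best)
```

Several subjects with similarly high scores would suggest paralogous copies rather than a single clear ortholog.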

Hi and welcome. Interesting question! This is not my immediate area but I will attempt to summon @Zachary_Ardern who, I think, has expertise far beyond mine.

I am really curious about the renaming of the other ape chromosomes and like you I didn’t see it in a quick look at the Genome Data Viewer. But here are some clues, perhaps:

This paper looks in detail at the fusion region…

…and cites this paper which seems to find paralogous regions on 9 and 22. I know that’s not a huge help but this paper (I can only see the abstract) names the chromosomes in other species. This might help?

Good luck!

EDIT: my post was approved after John’s answer above but the sources I link to above might help with questions about why paralogs and orthologs don’t always stay put.

Thanks! I do get the best match for DDX11L2 on chr. 12 via https://genome.ucsc.edu/cgi-bin/hgBlat.

When I give the same sequence to the BLAST at NCBI I get:

This is the sequence for DDX11L2 I use (from: https://www.ncbi.nlm.nih.gov/nuccore/NR_024004.2/)
1 ctttgcgagg gcggagttgc gttctcttta gcacacagcc ggagagcatc gcgagggcgg
61 agctgcgttc tcctctgcac agacttcggg gctattgcga aggcggagca gagttcttct
121 caggtgtctg acttccagca actgctggcc tgtgccaggg tgcaagctga gcactggagt
181 ggagttttcc tgtggagagg agccatgcct agagtgggat gggccattgt tcatcttctg
241 gcccctgttg tctgcatgta acttaatacc acaaccaggc ataggggaaa gattggagga
301 aagatgagtg agagcatcaa cttctctgac aacctaggcc agctcctgtc tccccccagg
361 tgtgtggtga tgccaggcat gcccttccct agcatcaggt ctccagagct gcagaagacg
421 acggccgact tggatcacac tcttgtgagt gtccccagtg ttgcagaggt gagaggagag
481 tagacagtga gtgggagtgg cgtcgcccct agggctctac tggaccagcg tctcctgtct
541 cctggagagg cttcgatgcc cctccacacc ctcttgatct tccctgtgat gtcatctgga
601 gccctgctgc ttgcggtggc cttataaagc ctcctggtct ggctccaagg cctggcagag
661 tctttcccag ggaaagctac aagcagcaaa cagtccgcat gggtcatccc cttcactccc
721 agctcagagc ccaggccagg ggcccccaag aaaggctctg gtggagaacc tctgcatgaa
781 ggctgtcaac cagtccatag gcaagcctgg ctgcctccag ctgggtggac agacaggggc
841 tggagaaggg gagaagagga aagggggttg cctgccctgt ctcctacctg aggctgagga
901 gggagaaggg gatgcactgt tggggaggca gctgtaactc aaagccttag cctctgttcc
961 cacgaaggca gggccatcag gcaccaaagg gattctgcca gcatagtgct cctggaccag
1021 tgatacaccc ggcaccctgt cctggacaag ctgttggcct ggatctgagc cctcgtggag
1081 gtcaaagcca cctttggttc tgccattgct gctgtgtgga agttcactcc tgccttttcc
1141 tttccctaga gcctccacca ccccgagatc acatttctca ctgccttttg tctgcccagt
1201 ttcactagaa gtaggcctca tcctgacagg cagctgcacc actgcctggc gctgtgccct
1261 tcctttgctc tgcccgctgg agacggtgtt tgtcatgggc ctggtctgca gggatcctgc
1321 tacaaaggtg aaacccagga gagtgtggag tccagagtgt tgccaggacc caggcacagg
1381 cattagtgcc cgttggagaa aacgggaatc ccaaagaaat ggtgggtcct ggccatccgt
1441 gagatcttcc cagggcagct cccctctgtg gaacccaatc tgtcttccat cctgtgtggc
1501 cgagggccag gcttctcact aggcctctgc aggaggctgc catttgtcct gcccaccttc
1561 ttagaagcga gacggagcag acccatctgc tactgccctt tctataataa ctaaagttag
1621 ctgccctgga ctattcaccc cctagtctca atttaaaaag atccccatgg ccacagggcc
1681 cctgcctggg ggcttgtcac ctcccccacc ttcttcctga gtcactcctg cagccttgct
1741 ccctaacctg ccccacagcc ttgcctggat gtctatctcc ctggcttggt gccagttcct
1801 ccaagtcgat ggcacctccc tccctctcaa ccacttgagc aaactccaag acatcttcta
1861 ccccaacacc agcaattgtg ccaagggcca ttaggctctc agcatgacta tttttagaga
1921 ccccgtgtct gtcactgaaa ccttttttgt gggagactat tcctcccatc tgcaacagct
1981 gcccctgcta actgcccttc tctcctccct ctcatcccag agaaacaggt cagctgggag
2041 cttctgcccc cactgcctag ggaccaacag gggcaggagg cagtcactga ccccgagacg
2101 tttgcatcct gcacagctag aggtccttta ttaaaagcac actgttggtt tctgctcaaa
2161 aaaaaaaa
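
In case the position numbers and spaces cause trouble when pasting into a search form, here is a minimal Python sketch that turns GenBank-style numbered sequence lines like the ones above into a plain FASTA record. The function name `origin_to_fasta` and the header text are my own choices, and the sample is only the first two lines of the sequence above. It assumes the input contains nothing but position numbers and bases, so no `ORIGIN` keyword or other text.

```python
import re


def origin_to_fasta(origin_text, header, width=60):
    """Convert GenBank-style numbered sequence lines into one FASTA record."""
    # Keep only runs of letters; this drops the position numbers and spaces.
    bases = "".join(re.findall(r"[A-Za-z]+", origin_text))
    # Re-wrap the sequence at a fixed line width.
    lines = [bases[i:i + width] for i in range(0, len(bases), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# First two lines of the DDX11L2 sequence quoted above (120 bases).
sample = """
1 ctttgcgagg gcggagttgc gttctcttta gcacacagcc ggagagcatc gcgagggcgg
61 agctgcgttc tcctctgcac agacttcggg gctattgcga aggcggagca gagttcttct
"""

print(origin_to_fasta(sample, "NR_024004.2 DDX11L2 (first 120 bases only)"))
```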