As most of you probably know, there has been a series of posts on ENV in the last week that purport to show that the newest and best estimate of the sequence divergence between humans and chimps is now as high as 14.9%, based on the complete sequencing of several ape genomes.
The authors of the paper in question didn't revise the current estimate of ~1.2%, so Casey did a deep dive into the arcane supplemental data to obtain that higher number.
https://www.nature.com/articles/s41586-025-08816-3
As far as I can tell Casey took the total nucleotides for various insertions, deletions and inversions as nucleotide differences. It seems to me this approach is wrong for the question we’re considering and my very quick and sloppy calculation shows the divergence to be ~1.3% at the most. But I stress this is quick and sloppy so others need to check!!
Here's how I approach this question. Let's say I wanted to make some money by copying a current best-selling novel and selling it cheap. Let's say the book is 300 pages and on the first page I introduce an extra space. I try to claim that after the space the 2 works no longer line up, but of course if you just remove the gap they line up perfectly. So then what I do is add 700 pages with just the letter ‘a’ (or a period or a blank space). Now I can say my work is 1000 pages and only 300 pages line up, so the similarity is 30%. That may be low enough to avoid plagiarism charges. But of course the question is not how many letters and characters line up but how many changes it took to convert one to the other - adding 700 filler pages is still just one change.
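To put numbers on the book analogy, here is a minimal Python sketch. The edit list, the function names and the figure of 2000 characters per page are all made up for illustration; the only point is that the same set of edits gives wildly different answers depending on whether you count characters or events.

```python
# Hypothetical illustration of the book analogy: count edits per character
# versus per event. CHARS_PER_PAGE is an assumed round number.
CHARS_PER_PAGE = 2000

# Each edit is (kind, length in characters).
edits = [
    ("insert", 1),                     # the extra space on page one
    ("insert", 700 * CHARS_PER_PAGE),  # the 700 pages of filler
]


def per_character_count(edit_list):
    """Treat every inserted or deleted character as its own difference."""
    return sum(length for _kind, length in edit_list)


def per_event_count(edit_list):
    """Treat each insertion or deletion as a single change, whatever its size."""
    return len(edit_list)


def aligned_share(shared_pages, total_pages):
    """Fraction of the longer work that lines up with the original."""
    return shared_pages / total_pages


print("characters that differ:", per_character_count(edits))
print("changes needed:", per_event_count(edits))
print("share that lines up:", aligned_share(300, 300 + 700))
```

Counted per character, the copy looks mostly different from the original; counted per event, it took two edits to make.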
It seems to me that Casey has counted all the nucleotides in insertions and deletions as individual changes.
First I’ll say it was never the absolute sequence difference between humans and chimps that was considered evidence for common ancestry, it was the relative difference between different species. Calculating differences in this way would drastically increase the sequence divergence between individual humans as well.
What we’re interested in here is how many changes occurred in each lineage leading to humans and chimps from a common ancestor. A single nucleotide change counts as one change but so does the insertion of 1000 nucs.
So what I did is this. The h/c sequence difference due to indels etc. is about 13.7%. That's 411 million nucs. LINEs are about 6 kbp, SINEs are ~300 bp. If we take an average size for the indels of 2500 bp, counting each indel as one change increases the divergence by a small fraction of a percent (about 0.005%). If we take the average size as 200 bp, that only increases the divergence by about 0.068%. I haven’t considered the regions that show rapid change such as the MHC loci and others, and I could have easily missed something. When I started, my hunch was the new number would be around 1.7% - 2%.
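For anyone who wants to check the arithmetic, here is a rough Python sketch of the calculation above. It assumes a genome size of 3 billion nucleotides, which is what 411 million being 13.7% implies, and it treats every indel as the same average size. Both are simplifications, and the variable and function names are just for illustration.

```python
# Back-of-the-envelope check of the indel numbers in this post.
# GENOME_SIZE is an assumption implied by 411 million nucs = 13.7%.
GENOME_SIZE = 3_000_000_000
INDEL_NUCS = 411_000_000       # nucleotides in indels etc. (13.7%)
BASELINE_DIVERGENCE_PCT = 1.2  # the current estimate quoted above


def added_divergence_pct(mean_indel_bp):
    """Percent divergence added when each indel counts as one change."""
    events = INDEL_NUCS / mean_indel_bp
    return 100 * events / GENOME_SIZE


for mean_size in (2500, 200):
    added = added_divergence_pct(mean_size)
    total = BASELINE_DIVERGENCE_PCT + added
    print(f"mean indel {mean_size} bp: +{added:.4f}% -> total ~{total:.2f}%")
```

Even with the smaller average indel size, the total stays under 1.3%, which is where the "~1.3% at the most" figure comes from.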