William Amos: Did Sapiens interbreed with Neanderthals?


Neanderthal introgression: a case of smoke and mirrors?

Goal: It is widely accepted that modern humans carry a few percent Neanderthal DNA. However, much of the evidence on which this inference is made depends on an assumption that mutation rate is constant. As my work shows, it is not! I have now conducted a whole series of analyses to test predictions from the introgression model. In every single case the evidence favours a faster mutation rate in Africa over introgression. My best current estimate of non-modern human DNA in Europeans / East Asians is two orders of magnitude less than the published estimates of ~2%. My paper should appear soon(ish) on BioArchives.

He also has one BioRxiv paper:

What do you make of his work @glipsnort, @John_Harshman, and @Joe_Felsenstein?

Immediate impression: Exclamation points don’t make your claims more convincing, and a BioRxiv preprint is not a real publication.


It will be a few days before I can read the pre-print. When I do, I will be looking at these 2 issues closely:

  1. To what mechanisms does he ascribe the radically faster mutation rate in Africa? My prior is that mutation rates within a species do not vary across time or geography in the absence of explainable causes.

  2. How does he explain the similarity between European H. sapiens DNA and Neandertal DNA? Amos seems to focus on explaining the dissimilarity between African and European DNA, but the similarity with Neanderthal DNA must also be explained.

Merry Christmas to all!





He purports to show that the percentage of Neandertal DNA depends on distance from Ethiopia. Figure 4, however, doesn’t seem to show that; instead it shows a west-east gradient across Eurasia. His model should also produce a step function, since under it each bottleneck should produce a more homozygous and slower-evolving population. He should therefore be predicting that Native Americans have more archaic DNA than Eurasians, though only by a small amount, as that bottleneck was relatively recent. (A possible counterargument: homozygosity increasing on the frontier of a population expansion would smear out the steps.)

He glosses over the distribution of Denisovan DNA, which is rather more unevenly distributed than that of Neandertal DNA. Why are Neandertal alleles found in Africa, but not Denisovan ones? Why are archaic African alleles only found in Africa?

He assumes an out-of-Africa bottleneck. But he also offers an alternative explanation (African populations evolving more rapidly) for much of the evidence for the out-of-Africa model. But he still has the mtDNA and non-pseudoautosomal region of the Y chromosome to support out-of-Africa.

His model should predict that Eurasian AMH alleles are less distant from Neandertal ones than African ones are. By focusing on base pairs, rather than on longer stretches of DNA up to nearly full genomes, he has missed the opportunity to perform additional tests of his hypothesis.

How much faster does he propose African populations are evolving? Is this a plausible number?


He claims a correlation between mutation rate and heterozygosity.

This is what I have the biggest problem with. He’s performing his test, apparently, on individual sites across whole genomes. But it’s my understanding that the Neandertal and Denisovan bits are clumped into small regions, not evenly distributed. I don’t see how a genome-wide mutation rate can explain that.


The number I recall is that 20% of the Neandertal genome is segregating in Eurasian populations.

One study identified 20% of the Neanderthal genome in existing populations, but estimated the total fraction segregating to be 35% to 70%. Another paper identified roughly a third of the Neanderthal genome in modern humans.


I’m reluctant to wade into the Amos paper. Given the variety and sophistication of the analyses that have been applied to inferring the Neanderthal and Denisovan introgressions, and the number of smart people who have tried to poke holes in the conclusion, my prior on Amos being right is quite small. I’d rather wait until he’s able to get it published.


To @glipsnort @Chris_Falter and all who are in the relevant fields…

I asked Dr. Amos about the suggestion that looking at longer stretches of DNA would provide the evidence for Neanderthal introgression that was not found by looking base pair by base pair. He gave me a detailed response, some of which I may not have the background to understand correctly (and which I almost certainly do not understand fully). But since he did respond to the only evidentiary feedback I got on this, I wanted to share it…

great to hear from you. The ‘haplotype’ (=chunk) objection is common
but misguided. There are many important issues:

  1. Just as many bases are polymorphic in closely related lineages because they have not yet drifted to fixation (incomplete lineage sorting), so will many haplotypes be. It is unclear how you can tell an introgressed haplotype from one that is present due to incomplete lineage sorting. In theory, older haplotypes (incomplete sorting) should be shorter, but there is huge variance, and the expected distribution of lengths is unknown because we do not yet know the rules governing recombination and how recombination hotspots shift over time.
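To put rough numbers on the "older haplotypes should be shorter" point, here is a minimal sketch using the textbook idealization that a segment surviving t generations of recombination has an exponentially distributed length with mean about 1/(r·t). The recombination rate (1e-8 per bp per generation, i.e. ~1 cM/Mb, no hotspots) and the 29-year generation time are assumptions, not figures from this thread, and the sketch ignores details such as whether recombination acts on one lineage or both. The dates (~50 kya for introgression, >500 kya for incomplete lineage sorting) are the ones used elsewhere in the discussion.

```python
import math

RECOMB_PER_BP = 1e-8  # assumed crossover rate per bp per generation (~1 cM/Mb); hotspots ignored
GEN_YEARS = 29        # assumed generation time in years


def mean_tract_bp(t_years, r=RECOMB_PER_BP, gen=GEN_YEARS):
    """Mean length of a segment that has survived t_years of recombination, ~1 / (r * generations)."""
    generations = t_years / gen
    return 1.0 / (r * generations)


def prob_longer_than(length_bp, mean_bp):
    """P(tract > length_bp) if tract lengths are exponential with the given mean."""
    return math.exp(-length_bp / mean_bp)


intro_mean = mean_tract_bp(50_000)  # introgression ~50 kya
ils_mean = mean_tract_bp(500_000)   # shared ancestry >500 kya
print(f"introgressed mean ~{intro_mean:,.0f} bp; ILS mean ~{ils_mean:,.0f} bp")
print(f"P(ILS tract > 50 kb) ~ {prob_longer_than(50_000, ils_mean):.1e}")
print(f"P(introgressed tract < ILS mean) ~ {1 - prob_longer_than(ils_mean, intro_mean):.3f}")
```

Under these assumptions the means come out near 58 kb and 5.8 kb. Because an exponential distribution's standard deviation equals its mean, the "huge variance" Amos mentions is built in: roughly one introgressed tract in ten is shorter than the typical ILS tract, while an ILS tract longer than 50 kb is very rare. Realistic hotspot dynamics, which is his main objection, would change these numbers.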

I had often wondered about this myself. For example, Joshua once gave me a detailed explanation of why there is no problem with our understanding of the evolutionary relationships even though, across roughly a third of the genome, gorillas are closer to humans (or to chimps) than humans and chimps are to each other. I bought it, but in the back of my mind it made me wonder whether these finer distinctions would be impossible to make if ILS is so powerful.

  2. How are haplotypes inferred? This is extremely unclear. The original paper by Green et al. found an ABBA-BABA excess of 7,000 across about 1 Gb of aligned bases, so about 1 every 100,000 bases. The average length of introgressed fragments is meant to be around 50 kb. To define a haplotype you need at least two informative sites (one at each end!), so most introgressed haplotypes cannot be identified by this method. The situation is worse still because the 7,000 excess is against a background of ~100,000 ABBAs / BABAs due to incomplete lineage sorting, so there is the added issue that a very weak signal has to be detected in a system where 9/10 of the signal is ‘background’. I have tried quite hard to find out how introgressed haplotypes are ‘confidently inferred’, but I have yet to find a proper description beyond ‘random field’ / ‘hidden Markov’. I *think* that what is done is to take the 2% introgression value as irrefutable fact and then to use a probabilistic approach to identify the most likely genomic locations of the fragments. This will, of course, happily identify the best-fit haplotypes, but it is very different from taking a tract of DNA and inferring an introgressed haplotype a priori. Personally I do not think it is possible to identify introgressed haplotypes with any confidence a priori, but maybe your colleagues can explain how such a weak signal can be extracted.
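The arithmetic in that point can be checked directly. This is a minimal sketch using only the figures quoted above (7,000 excess sites, ~1 Gb aligned, ~100,000 background ABBA/BABA sites, ~50 kb fragments); the even split of the background between ABBA and BABA is a hypothetical choice that makes the numbers consistent, and excess sites are assumed to fall uniformly (Poisson) along the genome. Note that it only tests the argument as stated: published tract-calling methods are generally described as using other features too (for example tract length and alleles absent from Africans), which this sketch does not model.

```python
import math

ALIGNED_BP = 1e9      # ~1 Gb of aligned bases (quoted figure)
EXCESS_SITES = 7_000  # ABBA-BABA excess (quoted figure)
BACKGROUND = 100_000  # ~ABBA + BABA sites attributed to ILS (quoted, approximate)
FRAGMENT_BP = 50_000  # average introgressed fragment length (quoted figure)


def d_from_counts(abba, baba):
    """Patterson's D statistic from raw site-pattern counts."""
    return (abba - baba) / (abba + baba)


# Hypothetical split consistent with the quoted totals.
abba = (BACKGROUND + EXCESS_SITES) / 2
baba = (BACKGROUND - EXCESS_SITES) / 2
d = d_from_counts(abba, baba)

rate = EXCESS_SITES / ALIGNED_BP  # excess sites per bp
lam = rate * FRAGMENT_BP          # expected excess sites per fragment
p_at_least_two = 1 - math.exp(-lam) * (1 + lam)  # Poisson P(k >= 2)

print(f"D ~ {d:.3f}; one excess site per ~{1 / rate:,.0f} bp")
print(f"expected excess sites per fragment ~ {lam:.2f}; P(>= 2) ~ {p_at_least_two:.3f}")
```

On these figures D is about 0.07, the excess works out to one site per ~143 kb (the same order as the quoted "1 every 100,000"), and under the uniform assumption only about 5% of 50 kb fragments would contain two or more excess sites.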

So I may have detected a weakness in his argument here, but I’d like you folks to tell me whether that is so. I found the next point much more convincing.

  3. There is a fundamental problem with any probabilistic inference such as hidden Markov / random field, which is that it requires assumptions. Critically, these methods have to assume that mutations occur independently. In fact, direct mutation counting in human pedigrees shows that ~10% of mutations occur in clusters where 2-8 substitutions occur near each other in a single event, almost invariably on the same strand (see Besenbacher et al.). This is a massive blow to any method trying to use clusters of tightly linked bases to identify haplotypes, because these could reflect a single mutation event. Indeed, the genome is a big place, and we must expect to find lots of cases that reflect these multi-change events. As far as I can tell, no one has tried to infer haplotypes while at the same time making proper allowance for these events. There is also the issue that recombination events tend to be focused in hotspots that are themselves dynamic, most of them differing between humans and chimpanzees. Again, in order to infer haplotypes you need to know the recombination landscape, and to my knowledge all that is done is to make simplifying assumptions (either a uniform rate or, at best, a model that includes hotspots; but we do not yet know how rapidly these hotspots change location or why, so the model cannot be realistic).

  4. Did I send my latest offering? Here I explore what happens when you condition D, for example by only considering sites that are homozygous in one of the two humans being compared. This approach is massively informative. Under the introgression model, most introgressed fragments should be heterozygous. Consequently, if we calculate D using only sites that are homozygous in the non-African, the signal should be killed. It is not! However, if you condition the African to be homozygous, the signal is destroyed. Even more spectacular, you can take any two Africans and, when one is conditioned to be homozygous, D becomes large. These various combinations reveal a completely unambiguous pattern where the signal of introgression is driven more or less entirely by heterozygous sites in Africans. The paper is in review.
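For readers unfamiliar with "conditioning D", here is a minimal sketch of the mechanics, not Amos’s actual code (which isn’t shown). It uses the common frequency-weighted form of D, treats each diploid genotype as a derived-allele count (0, 1 or 2), assumes the chimp outgroup carries the ancestral allele, and conditions simply by discarding sites where the chosen human is heterozygous. The `Site` type, its field names and the toy genotypes are all hypothetical; the toy data are invented only to show how filtering changes D.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Site:
    """Derived-allele counts (0, 1 or 2) at one biallelic site; chimp assumed ancestral."""
    h1: int    # first human, e.g. an African (hypothetical field name)
    h2: int    # second human, e.g. a non-African
    arch: int  # archaic genome, e.g. a Neanderthal


def is_hom(genotype: int) -> bool:
    return genotype in (0, 2)


def d_statistic(sites: Iterable[Site], keep: Optional[Callable[[Site], bool]] = None) -> float:
    """Frequency-weighted D = (ABBA - BABA) / (ABBA + BABA) over sites passing `keep`.
    Positive D means h2 shares more derived alleles with the archaic genome than h1 does."""
    abba = baba = 0.0
    for s in sites:
        if keep is not None and not keep(s):
            continue
        p1, p2, p3 = s.h1 / 2, s.h2 / 2, s.arch / 2
        abba += (1 - p1) * p2 * p3  # h2 and archaic derived, h1 ancestral
        baba += p1 * (1 - p2) * p3  # h1 and archaic derived, h2 ancestral
    total = abba + baba
    return (abba - baba) / total if total > 0 else float("nan")


# Toy genotypes invented purely to show the mechanics; they are not real data.
toy = [Site(0, 1, 2), Site(0, 2, 2), Site(1, 0, 2), Site(0, 0, 2)]
print(d_statistic(toy))                               # unconditioned
print(d_statistic(toy, keep=lambda s: is_hom(s.h1)))  # h1 homozygous sites only
print(d_statistic(toy, keep=lambda s: is_hom(s.h2)))  # h2 homozygous sites only
```

The point of the design is that conditioning is just a site filter applied before summing, so the same function can test each of the combinations Amos describes by swapping which individual the `keep` predicate inspects.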

I don’t know who the relevant experts are in this group, but I request someone tag them.


Not sure where he’s getting the figure of 10% from. Besenbacher et al. (2016) say:

we find that 3.1% (558 of 17812) of the de novo SNVs are accompanied by another mutation less than 20kb away


It seems to me that we ought to be able to distinguish introgression, in which the sequences should have diverged <50kya, from ILS, in which the sequences should have diverged >500kya.


John, I think that recent divergences, and introgressions from such close relatives, are much harder to spot. There is just not enough genetic distance to distinguish them from noise. That is kind of the issue here: Dr. Amos is saying it isn’t introgression, just variance in divergence. Were we 10 million years apart, we’d be so different that there would be no doubt!


The study he cites says that 3.1% of the de novo mutations are accompanied by another mutation close by, AT LEAST one other mutation close by. When he gives his ~10% figure, he is presenting it as the share of TOTAL mutations that are part of clusters. If each of those 3.1% is accompanied by an average of two more mutations in its cluster, maybe there are on average 3 mutations per cluster: 3.1% x 3 = 9.3%?

But I should think this would be an issue for finding introgression at low levels even at 3.1%. It would just take more generations to produce the same false positive.

I don’t think that’s true. 450,000 years should be enough of a difference to notice over a few thousand bases.
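Whether a few thousand bases is enough can be sketched with a Poisson model of pairwise differences. This takes the 50 ky and 500 ky divergence times from the posts above at face value; the mutation rate of 0.5e-9 per bp per year and the 5,000 bp window are assumptions chosen for illustration, not figures from the thread.

```python
import math

MU_PER_BP_PER_YEAR = 0.5e-9  # assumed per-bp, per-year mutation rate (not from this thread)


def expected_differences(t_years, length_bp, mu=MU_PER_BP_PER_YEAR):
    """Expected differences between two sequences that diverged t_years ago.
    Both lineages accumulate mutations independently, hence the factor of 2."""
    return 2 * mu * t_years * length_bp


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


WINDOW = 5_000  # "a few thousand bases"
lam_intro = expected_differences(50_000, WINDOW)
lam_ils = expected_differences(500_000, WINDOW)
for k in range(5):
    p_intro, p_ils = poisson_pmf(k, lam_intro), poisson_pmf(k, lam_ils)
    print(f"k={k}: P(intro)={p_intro:.3f}  P(ILS)={p_ils:.3f}  ILS/intro ratio={p_ils / p_intro:.2f}")
```

Under these assumptions the expected counts are 0.25 and 2.5 differences per window. A window with no differences favours the recent divergence roughly tenfold, a single difference is almost uninformative (likelihood ratio near 1), and two or more favour the old divergence. So both posts have a point: a few thousand bases usually discriminates, but not every window will.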

No, that’s not right. The paper says that 3% of new variants are accompanied by a second nearby new variant, suggesting that they were part of the same mutation event. If all multi-site mutation events created just two variants, then the multi-site mutation events would constitute ~1.5% of mutation events. Since there are sometimes more sites involved (although ~80% are a single pair), the actual rate of multi-site mutation events will be somewhere between 1 and 1.5% of all events. (Note that for 17% of these clusters, the pair of variants are immediately adjacent; most recent analyses will probably treat these as a single mutation.)
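The distinction between "share of SNVs in clusters" and "share of mutation events" can be made concrete with the counts quoted from Besenbacher et al. (2016) in this thread (17,812 de novo SNVs, 558 of them clustered, forming 247 clusters). Treating each cluster as a single mutation event, as suggested above, is an assumption of this sketch.

```python
TOTAL_SNVS = 17_812   # de novo SNVs (Besenbacher et al. 2016, as quoted in this thread)
CLUSTERED_SNVS = 558  # SNVs with another de novo mutation < 20 kb away
CLUSTERS = 247        # clusters formed by those 558 SNVs (quoted later in the thread)


def cluster_summary(total, clustered, clusters):
    """Return (share of SNVs in clusters, share of events that are multi-site, mean cluster size).
    Assumes each cluster is one mutation event and every other SNV is its own event."""
    events = (total - clustered) + clusters
    return clustered / total, clusters / events, clustered / clusters


snv_share, event_share, mean_size = cluster_summary(TOTAL_SNVS, CLUSTERED_SNVS, CLUSTERS)
print(f"SNVs in clusters: {snv_share:.2%}")
print(f"multi-site mutation events: {event_share:.2%}")
print(f"mean SNVs per cluster: {mean_size:.2f}")
```

This gives about 3.13% of SNVs in clusters, about 1.41% of events being multi-site (inside the 1 to 1.5% range above), and a mean of about 2.26 SNVs per cluster. Multiplying 3.1% by the cluster size would double-count, because the 3.1% already includes every member of each cluster.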

I don’t have time to wade in very far at the moment, but I would suggest looking at this paper in some detail: Identification of African-Specific Admixture between Modern and Archaic Humans (PubMed).


But by definition, those other mutations (more accurately SNVs) are also part of the 3.1%.


And @glipsnort

That’s what is unclear to me, because it says 3.1% are accompanied by another mutation nearby. It is unclear whether the nearby one is also counted, or whether this close pairing occurs in 3.1% of cases. Either way, I don’t know if that aspect of the issue matters much to the bottom line, because it would only change the number of generations needed to cause a false introgression signal, if it can. And that’s the real problem: if it can. If a gene is prone to a specific series of mutations in a single event, then finding those mutations in Neanderthals and in some low-diversity H. sapiens doesn’t mean much for introgression. So I think we should focus on the important questions and look for a more fundamental flaw than a wrong percentage.


It’s not that unclear if you keep reading and examine the figures. See this section for example:

The 558 SNV mutations can be grouped into 247 MNM clusters, most of them with just two mutations, but with the largest cluster containing 8 mutations (see Fig 2C). The majority (315 of 558) of clustered mutations are less than 2kb from another mutation (median distance = 525 bp) and 17% (108) are immediately adjoining. Considering these 108 adjoining mutations as 54 tandem mutation events we estimate that the tandem mutation rate is 0.30% (95% c.i.: 0.23%–0.40%) of the single nucleotide mutation rate. This estimate is not far from the estimate of 0.4% that was recently calculated in a meta-analysis of 7 different studies that had estimated the tandem mutation rate [23].

In any case, I believe you’re right that this has little actual importance for the question at hand.