John Harshman: Bottlenecks and Trans-Species Variation

Has anyone so far brought up the existence of loci at which humans and chimps share more than four homologous alleles, i.e., alleles for which a human allele is more closely related to the corresponding chimp allele than to other human alleles? Any locus with more than four such alleles would preclude a bottleneck of two at any time since the separation of the human and chimp lineages. I think immediately of MHC loci.
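The "more than four" threshold is plain counting. The sketch below assumes an autosomal locus in a diploid species and treats each allele lineage that must persist through the bottleneck as needing its own chromosome copy in a bottleneck individual. The function names are hypothetical and illustrative only.

```python
def max_lineages_through_bottleneck(n_individuals: int, ploidy: int = 2) -> int:
    """Upper bound on allele lineages at one autosomal locus that can pass
    through a bottleneck of n_individuals, each carrying `ploidy` copies."""
    return n_individuals * ploidy


def bottleneck_precluded(lineages_crossing: int, n_individuals: int) -> bool:
    """True if more lineages must have crossed than the bottleneck could carry."""
    return lineages_crossing > max_lineages_through_bottleneck(n_individuals)


# A couple carries at most 4 alleles, so 5 surviving lineages would rule it out,
# while the same 5 lineages say nothing against a bottleneck of 10.
print(max_lineages_through_bottleneck(2))  # 4
print(bottleneck_precluded(5, 2))          # True
print(bottleneck_precluded(5, 10))         # False
```

Because the bound grows as 2N, any allele count that fails to exclude a couple also fails to exclude every larger bottleneck.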


Yes, we considered this in detail in "Heliocentric Certainty Against a Bottleneck of Two?" I ended up being wrong on this too: it turns out not to be nearly as strong evidence as I’d been led to believe.

It is possible future work could change the story here, but at the moment it isn’t strong evidence against a bottleneck.

Welcome to the forum, @John_Harshman! Could you tell us about yourself?

Molecular avian phylogeneticist, mostly retired. Try ResearchGate for pubs.

I think the evidence is stronger than your post shows. For one thing, if there’s a human allele that’s the sister of a pair of human and chimp alleles, that first human allele must predate the divergence of humans and chimps, and so must be counted when determining the number of human alleles that must have crossed any hypothesized bottleneck.
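The sister-group argument can be written as a small tree traversal. This is a hypothetical illustration, not a method from any paper discussed here. It assumes the rooted tree is correct and that every node with both human and chimp alleles below it predates the species split (so convergence, recombination, and introgression are ignored). Trees are encoded as nested tuples whose leaf labels start with "H" (human) or "C" (chimp); all names are made up for the sketch.

```python
from typing import Set, Tuple, Union

# A tree is either a leaf label ("H..." or "C...") or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def species_below(tree: Tree) -> Set[str]:
    """Species codes ('H' or 'C') of all leaves under this node."""
    if isinstance(tree, str):
        return {tree[0]}
    found: Set[str] = set()
    for child in tree:
        found |= species_below(child)
    return found


def min_human_lineages_at_split(tree: Tree) -> int:
    """Lower bound, from topology alone, on human allele lineages present at
    the human-chimp split. Mixed nodes are assumed to be older than the split."""
    if isinstance(tree, str) or species_below(tree) != {"H", "C"}:
        return 0  # nothing here is forced to predate the split
    count = 0
    for child in tree:
        below = species_below(child)
        if below == {"H", "C"}:
            count += min_human_lineages_at_split(child)  # also pre-split: recurse
        elif "H" in below:
            count += 1  # a human-only branch leaving a pre-split node crosses it
    return count


# A human allele (H2) sister to a human-chimp pair (H1, C1): both H1 and H2 cross.
print(min_human_lineages_at_split((("H1", "C1"), "H2")))  # 2
# Two human alleles sister to a chimp allele: only one lineage is required.
print(min_human_lineages_at_split((("H1", "H2"), "C1")))  # 1
```

The count is only a lower bound: a human-only clade hanging off a pre-split node could itself be older than the split and contribute more than one lineage.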


If I made an error, I certainly want to correct it. Do you have any papers or data to add to the mix that could help us see whether any loci have more than 4 trans-species alleles, after correcting for convergence?

You have the expertise to do the appropriate analysis too. If you can falsify a bottleneck, go for it. I’m all ears.

Have you looked at Ayala’s data? I haven’t. I expect one could add more alleles at that locus for the various great apes (including humans). I doubt that a better analysis would give a substantially different tree, since I doubt those data violate the clock assumption seriously enough to matter. Could be wrong, of course. The painful part of the analysis would be assembling the data set, and I’m not ambitious enough to do it.

Would you agree with the statement about sister groups increasing the number of alleles crossing a hypothetical bottleneck?

Sounds like you haven’t read the link I gave you earlier. Start with a careful read there before we go further. I dealt with Ayala’s paper, and others, in detail.

Maybe. The challenge here is that we only expect trans-species variation at loci under balancing selection, which also makes the molecular clock invalid. I think the situation is complex. If you can untangle it, please do. However, Ayala’s analysis seems to be in error.

Would you agree with the statement that, given the correct phylogeny, the sister group to a human-ape pair increases the number of alleles crossing the hypothetical bottleneck?

Given the correct phylogeny, yes. But I’m not sure how we can confidently infer the phylogeny that far back without showing trans-species variation with another species.

I read the link. It does not appear, based on that link, that you looked at his data. You looked at his analysis, but that isn’t the same thing. Also, some of the other analyses do show more than 4 lineages if you accept the sister group claim.


Show me where. How do you accurately calibrate a clock when there is balancing selection? How do you know the inferred phylogeny is the correct one?

I agree, and I say exactly that. Others, however, did look at his data and dismissed his analysis as incorrect. No one has reproduced his results, and subsequent studies have shown convergence as an alternative explanation.

I’m not sure we know how this affects bottleneck claims. It certainly isn’t settled science yet.

So your main worry is the root, then?

No, not really. My main worry is twofold: (1) the rate of the clock and (2) convergence. With balancing selection, the branches can appear longer than they really are, and with convergence the trans-species variation might be an illusion for some lineages.

It seems that the best way to solve this would be to:

  1. Look for trans-species variation in HLA introns (not exons), which should not be subject to convergence in the same way.
  2. Use a method that accounts for recombination.
  3. Expand our pool of non-human DNA, to pick up as much non-human variation as possible.
  4. Include all the great apes instead of just chimpanzees.
  5. Develop an objective criterion for calling introns trans-species or not.
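As one deliberately crude, hypothetical way to make item 5 concrete, the sketch below applies the definition from the opening post: a human allele is called trans-species if its closest relative by pairwise distance is a chimp allele. The function names, the "H"/"C" label prefixes, and the distances are all invented for illustration; a nearest-neighbour rule is not a phylogenetic method and ignores recombination, so a real analysis would use a proper tree-building approach.

```python
from typing import Dict, FrozenSet, List

# Pairwise distances keyed by the unordered pair of allele labels.
Distances = Dict[FrozenSet[str], float]


def nearest_relative(allele: str, alleles: List[str], dist: Distances) -> str:
    """Closest other allele by pairwise distance (ties go to the first listed)."""
    others = [a for a in alleles if a != allele]
    return min(others, key=lambda other: dist[frozenset((allele, other))])


def call_trans_species(alleles: List[str], dist: Distances) -> Dict[str, bool]:
    """For each human allele ('H' prefix), True if its nearest relative is a
    chimp allele ('C' prefix)."""
    return {
        a: nearest_relative(a, alleles, dist).startswith("C")
        for a in alleles
        if a.startswith("H")
    }


# Hypothetical intron distances (substitutions per site), not real data.
alleles = ["H1", "H2", "C1"]
dist = {
    frozenset(("H1", "H2")): 0.05,
    frozenset(("H1", "C1")): 0.01,
    frozenset(("H2", "C1")): 0.06,
}
print(call_trans_species(alleles, dist))  # {'H1': True, 'H2': False}
```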

This is your area, @John_Harshman, so you are well situated to run this analysis. There have been some advances in phylogenetics lately, which might facilitate this study. It would probably clarify the strength of the evidence. I can’t predict what will happen here. Maybe you would succeed in falsifying a single-couple bottleneck this way.

One risk with a sole-genetic-progenitor hypothesis is that it will be vulnerable to falsification by a study like this. Of course, some people might find that potential for falsification to be an advantage. I personally have no stake here, so please do falsify it if you can.

As far as what is currently published in the literature, however, this is not settled evidence against a bottleneck. This finding has been disputed since at least 1998, and it has not been replicated. Do you agree?

Yasukochi & Satta 2014, for one: "Our estimates of divergence time suggested that seven HLA-DRB1 Group A allelic lineages in humans have been maintained since before the speciation event between humans and chimpanzees". But you don’t have to estimate divergence time; just accept the tree. Any human lineage that branches off before the split between the single chimp allele and its human sister must be older than that split. This gives 5 B-group alleles and 1 A-group allele that must predate the split, based on topology alone. The additional A-group alleles mentioned in that sentence are just gravy.

Of course one can never be 100% certain that a tree is correct; one can never be 100% certain of anything in science. Still, the tree was a likelihood tree; if you’re a big fan of Bayesian analysis, wouldn’t you accept likelihood as close enough? And it has bootstrap percentages too.


I could run the analysis, but I have no way to gather the data. No lab. The introns do seem like a good idea. The objective criterion seems simple enough: tree topology.

Now, the problem is that there’s little scientific interest in testing the sole-progenitor model, because that interest relies on attempting to show that Genesis 2 documents real history in some way. Thus there is no particular rush to find more than 4 alleles (a number of interest only for Genesis 2 specifically) stretching past a bottleneck for which there is no evidence. One must wait until the evidence is collected for other purposes. If chimp and gorilla population genomics projects come about (I know of none, though it seems a useful endeavor), that would be your best bet.

I agree. That is why I did not do it either.

On the exons, we might expect a different tree. A better analysis would account for recombination, which might substantially reduce the branch lengths. The clock assumption is also seriously violated under balancing selection. That is the challenge in interpreting these data: this is the outlier case that doesn’t follow the standard rules.

Yes, that would be a good thing to do. However, later studies have shown convergence at this locus, which makes perfect biological sense. So that might produce the illusion of trans-species variation.

Do you mean this paper?

The issue is, once again, concern about the clock. A key point about that analysis: these loci are the extreme examples of balancing selection, so the assumptions of the clock are violated. There does not appear to be a good way to calibrate a clock in these cases.
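A minimal sketch of why calibration matters so much, assuming a strict clock: the age of the split between two alleles is estimated as T = d / (2μ), where d is their pairwise distance (substitutions per site) and μ is the substitution rate per site per year along each branch. The distance and rates below are hypothetical. If balancing selection pushes the real rate above the assumed one, the same distance is read as a proportionally older split.

```python
def divergence_time(distance: float, rate_per_year: float) -> float:
    """Strict-clock age (years) of the split between two alleles.

    distance: substitutions per site separating the two alleles.
    rate_per_year: substitutions per site per year on each branch.
    Both branches accumulate changes independently, hence the factor of 2.
    """
    return distance / (2.0 * rate_per_year)


d = 0.12  # hypothetical pairwise distance
for rate in (1e-8, 2e-8, 4e-8):  # hypothetical candidate calibrations
    years = divergence_time(d, rate)
    print(f"rate {rate:.0e}: {years / 1e6:.1f} Myr")
```

Doubling the rate halves the inferred age, so whether a lineage appears to predate the species split can hinge entirely on which rate is assumed.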

We should clarify that there are two ways this term “trans-species variation” is being used:

  1. Allele lineages in the phylogeny that are clocked to extend back past the divergence of chimps and humans.
  2. Allele lineages in the phylogeny that are shared between different species.

#2 is much stronger evidence than #1, because it doesn’t require calibrating the clock. There is no evidence in that 2014 paper that they established 5 lineages at any locus by #2, only by #1, which is not nearly as strong.

If I find a collaborator who can gather the data I will let you know. Perhaps it could be a good project for us to do.

I don’t think this is the right way to look at it. Instead…

  1. It is a scientifically important question to understand what the data is and isn’t showing us.

  2. There are mechanisms of speciation that require tight bottlenecks, and it is worth asking if they are relevant in origins.

  3. In science we should care about answering questions carefully and rigorously, wherever they come from; it is one way we serve the common good. The public asks about this, so we should care.

  4. Even if this is possible, it is so far back in history (about 500 kya) that Genesis would not be historical anyway. For a historical Adam, a genealogical approach seems more sensible, and it is entirely unrelated to this.

I agree that the non-human sequences are the hard part. I’ll keep an eye out for a genomics person with the right data.

Until then, thanks for checking my work. If I did make any errors, please help me fix them.

Yes, I refer to the paper you cited and put up the trees from. Again, the clock is irrelevant to my point. It’s only relevant to the A-group alleles, to which a single chimp allele is sister group. The conclusion of at least 6 alleles crossing the hypothetical bottleneck depends only on tree topology. And again, there need be no sharing of most alleles among species provided their divergence is deeper than (and on the same lineage as) a shared allele.

I think this is the data to which they are referring:

[Figure: allele phylogeny from the cited paper (1880-6805-33-14-2.jpg)]

Whether seven lineages stretch that far back depends entirely on the calibration of the clock. The amount of divergence alone gives no independent evidence of how old the lineages are. What am I missing?

While it’s true that one could test a tight bottleneck, that’s not the same as testing a bottleneck of two individuals, which is very unlikely simply because a population reduced to that level is quite unlikely to survive. Rapid expansion might save enough genetic diversity to prevent inbreeding depression, though I doubt it, but a more likely course is extinction within a generation or two. The only justification for testing a bottleneck of two is Genesis. And not every question is worth the expense and effort of resolving. The hypothesis that Uranus has a small moon made from Wensleydale cheese is not worth sending a probe, for example.
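One standard way to put a number on the "genetic diversity" retained here is the idealized Wright-Fisher expectation that heterozygosity shrinks by a factor of (1 - 1/(2N)) in each generation of N diploid individuals. The sketch assumes that idealized model (random mating, no selection, no new mutation, census size equal to effective size) and uses hypothetical population-size trajectories; it says nothing about the odds of survival, which is the main point above. The function name is made up for illustration.

```python
from typing import Sequence


def heterozygosity_retained(sizes: Sequence[int]) -> float:
    """Fraction of expected heterozygosity left after successive generations
    with the given diploid population sizes (ideal Wright-Fisher model)."""
    retained = 1.0
    for n in sizes:
        retained *= 1.0 - 1.0 / (2 * n)
    return retained


# Two generations held at a single couple vs. one couple followed by rapid growth.
print(heterozygosity_retained([2, 2]))        # 0.5625
print(heterozygosity_retained([2, 20, 200]))  # about 0.729
```

In this model most of the loss happens in the bottleneck generation itself; rapid growth afterward mainly stops further erosion rather than restoring what was lost.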


Yes, I do see your point. Keep in mind that this is not in service of my personal view. I’m just trying to be honest about what the data shows.

What is interesting about this is that if we can’t rule out a bottleneck of two (and for bottlenecks more ancient than about 500 kya, we can’t), we certainly can’t rule out a bottleneck of 10 or 20. That seems to be an important and overlooked point. Once again, tight bottlenecks might have been important in the speciation of our lineage. That has been a live possibility at times, but then fell out of favor because of the genetic evidence. Perhaps some of those hypotheses deserve another look. A misunderstanding of the genetic evidence might have prematurely foreclosed them.

I’m sure we can rule out that hypothesis without needing to send a probe. That may not be the case for a single-couple bottleneck. It is just an issue of honesty. We shouldn’t say there is evidence against it if we can’t produce said evidence. Of course, there is evidence against it more recent than 500 kya, but before that point it becomes equivocal, at least for now.