Heliocentric Certainty Against a Bottleneck of Two?

I wanted to further expand on this deficiency in Ayala’s study.

Convergent Evolution or Trans-Species Variation?

Convergent evolution, rather than shared history, is an alternative explanation for what looks like Trans-Species variation. If we see several alleles in both humans and chimps clustered together, there are two possible explanations:

  1. this could be because the allele lineages existed in the common ancestor of the two species,
  2. or it could be because of convergent evolution.

Ayala never considered or tested for this possibility. This is a critically important point, and it is why his use of similarity in phylogenetic analysis substantially undermines his conclusion. In order to trust the tree, we need to know how many discordant mutations there are in the tree. However, he used similarity (not nested clades) to build the tree. He had no way to know whether the data actually made sense as a tree recapitulating common descent.

A basic feature of scientific thinking is to test hypotheses. Ayala’s paper did not rule out the hypothesis of convergent evolution.

Testing for Convergent Evolution

The good news is that convergent evolution leaves a telltale sign. We should see a large number of mutations that cannot fit into a tree-like structure if convergent evolution is at play. It turns out that several groups have been studying convergent evolution on a genome-wide scale, and HLA types are regularly outliers in these analyses.

For example, take a look at this study: Parallel or convergent evolution in human population genomic data revealed by genotype networks | BMC Ecology and Evolution | Full Text. It builds “allelic graphs”, which I won’t explain here in detail, except to say that whenever we see a cycle, like this square, we know that it cannot fit into a tree structure:

[Image: a square in a genotype network, from the study linked above]

We expect a few of these in neutral evolution, but not many. If the data fit a tree, we would only see a single path from the top genotype to the bottom one. However, if we see both paths, we know that different alleles are taking different paths, which means that they do not actually share history here. It is a type of homoplasy, and it is a signature of convergent evolution. Notably, this signature arises from convergent evolution and is not likely to be caused by Trans-Species variation.

If we see a large number of these squares in the genetic diversity of a particular part of the genome, that is evidence that the similarity we see between sequences is not actually a signature of common descent. Rather, in these cases, another hypothesis is favored: convergent evolution.
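To make the idea of a "square" concrete, here is a minimal sketch of how one could count them in a toy dataset. It is my own simplification, not the method or code from the study: haplotypes are short strings, two haplotypes are connected when they differ at exactly one site, and the function name `count_squares` is hypothetical. A square shows up when two haplotypes differ at two sites and both single-step intermediates are also present in the sample.

```python
from itertools import combinations

def count_squares(haplotypes):
    """Count squares (4-cycles) in a toy genotype network.

    Nodes are the distinct haplotype strings; two nodes are linked when
    they differ at exactly one site. This is an illustrative assumption,
    not the pipeline used in the cited study.
    """
    nodes = set(haplotypes)
    diagonals = 0
    for u, v in combinations(sorted(nodes), 2):
        diff = [i for i, (x, y) in enumerate(zip(u, v)) if x != y]
        if len(diff) != 2:
            continue  # only pairs two steps apart can be opposite corners
        i, j = diff
        # The two possible intermediates: change one of the two sites only.
        mid1 = u[:i] + v[i] + u[i + 1:]
        mid2 = u[:j] + v[j] + u[j + 1:]
        if mid1 in nodes and mid2 in nodes:
            diagonals += 1  # both paths from u to v are observed
    # Every square has two diagonals, so each one was counted twice.
    return diagonals // 2

tree_like = ["000", "100", "110", "111"]    # a single path of mutations
with_square = ["00", "10", "01", "11"]      # both paths are present

print(count_squares(tree_like))    # 0
print(count_squares(with_square))  # 1
```

In the tree-like sample there is only one route between any two haplotypes, so no square appears. In the second sample the same two-site change is reached by two different routes, which is exactly the pattern that cannot be drawn as a tree without a repeated mutation (or recombination).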

So what do the authors find?

Well, HLA genes have a massive excess of squares, a clear sign of pervasive convergent evolution. Ayala’s gene HLA-DQB1 is not mentioned in the text, but we find it in the supplementary data as one of the genes with clear evidence of convergent evolution. In this case, it is not consistent with recombination or balancing selection alone.

Another gene, HLA-DRB1, is the most variable HLA gene. It is notable for having over 500 squares in the DNA of merely about 1,000 individuals, compared with an expected number of fewer than 10. That means if we had tried to put the DNA into a tree, we would see at least 500 mutations discordant with a phylogenetic tree. This is just a stunning result, because it means that HLA-DRB1 alleles are just not well described as a tree. The variation we see is evolving and re-evolving over and over again. Amazing.

It also validates my methodological concern about Ayala’s work:

It is just not an accurate view of the data to present HLA-DQB1 in a tree based on a similarity matrix. One needs to test first to see if it actually fits a tree. We cannot even correctly determine ancestral history among human alleles themselves, let alone between species. The data seem to look more like convergent evolution than standard common descent, i.e. Trans-Species variation.

This is not exactly a new result. Back in 2000, a test of Ayala’s hypothesis was done on HLA-DQB1, and it also found strong evidence of convergent evolution: Convergent evolution of major histocompatibility complex molecules in humans and New World monkeys | Semantic Scholar. However, the allele graph makes clear how much this affects the data. Perhaps more importantly, this Nature Genetics study from 1998 directly disputes Ayala’s paper, arguing that this is rapid convergent evolution: Recent origin of HLA-DRB1 alleles and implications for human evolution | Nature Genetics.

Remember, Ayala did not even consider convergent evolution. He did not test for it. This seems to be a valid alternative hypothesis, which also seems to better explain the data.

Moreover, it is not really accurate to present Trans-Species variation as a settled finding of genomic science. At best, it is one competing hypothesis among many. However, it might even be accurate to say that it is the disfavored hypothesis. There are many more papers disputing Ayala’s findings than supporting them. No one should present this as indisputable and settled evidence against a sharp bottleneck.

Perhaps the data will bear out Ayala’s initial hypothesis, but a lot of work needs to be done to demonstrate this to be the case.

What About Common Descent?

Everyone believes these alleles share common descent (at least back to 4 alleles). However, this is a good reminder that genetic data can pick up signatures that erase the nested clade signature we usually see in DNA. Homoplasy is a real feature of the data, and is expected even when there is common descent.

This is a great example of how there are rules in biology (e.g. DNA falls into nested clades), but there are exceptions (convergent evolution), that are very important to understanding this data.

Moreover, the next time someone points to mutations that do not fit the tree pattern in species, remember four things.

  1. We expect a few discordant mutations, even in neutral evolution. That is not evidence against common descent.
  2. Convergent evolution can also produce discordant mutations. Usually not as many as we see in HLA genes, but more than we expect from neutral evolution.
  3. We observe homoplasy and convergent evolution in cancer (called recurrent mutations).
  4. We observe homoplasy and convergent evolution in human variation (which everyone agrees shares common ancestry).

In case #2, we still expect to see a signature of common descent in most cases. However, homoplasy is such a pervasive pattern in HLA-DRB1 that the signature of common descent appears to be erased, even though we all agree these alleles share common ancestry. And #3 and #4 are direct empirical evidence that convergent evolution is expected at the DNA level (#3) and that homoplasy is observable in DNA that everyone agrees shares common ancestry (#4).

Once again, the rule is that most (but not all) DNA fits into nested clades (a tree), but some does not. Neutral evolution produces nearly nested clade data, but positive selection (and balancing selection) can also lead to convergent evolution. Homoplasy (a violation of the nested tree pattern) is expected in some DNA.

The Median TMR4A Estimate Unaffected

It’s important to understand how these findings interact with the TMR4A estimates.

Convergent evolution creates homoplasy that will artificially push TMRCA estimates upwards. Because a tree is a bad fit for the data, it will be impossible to find a parsimonious tree. This will inflate the TMRCA values substantially. This reinforces what I’ve said from the beginning. The molecular clock, in these regions, is not well calibrated.

Another indicator of this is that a much larger fraction of mutations in this region are non-synonymous (i.e. not neutral). This is an indicator that positive selection is driving most of the changes at a far more rapid rate than neutral evolution. The end result of this is artificially inflated TMRCA estimates. Remember that D = T * R holds only in regions where dynamics like this are not taking place.
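A small worked example may help show why extra selection-driven changes inflate the estimate. This is only a sketch: I am reading D as the number of observed differences, T as time, and R as the neutral mutation rate, and every number below is hypothetical rather than taken from any dataset. The function name `naive_time` is likewise made up for illustration.

```python
def naive_time(differences, rate):
    """Solve D = T * R for T, assuming every difference is a neutral mutation."""
    return differences / rate

# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
rate = 0.5                 # neutral mutations per thousand years in the region
true_time = 100.0          # thousand years since the common ancestor
neutral_differences = true_time * rate   # what D = T * R predicts: 50.0

# Suppose selection and recurrent mutation add 150 further differences.
observed_differences = neutral_differences + 150

print(naive_time(neutral_differences, rate))    # 100.0
print(naive_time(observed_differences, rate))   # 400.0
```

With the same true history, the naive calculation returns a time four times too old, simply because the differences did not accumulate at the neutral rate it assumes.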

This does not, however, create a problem for our estimate of a bottleneck limit. Remember that we used the median of TMR4A over the whole genome. So, this estimate is not really influenced much by a small portion of the genome in error. The estimate shifts only about 2 kya per 1% of the genome in error. That is the reason we used the median in the first place: it makes the estimate remarkably stable to errors like this.
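The stability of the median is easy to see on synthetic numbers. The sketch below uses made-up per-window values, not our actual TMR4A estimates, and the helper `inflate_first` is a hypothetical name; it inflates 1% of the windows tenfold and compares how far the median and the mean move.

```python
import statistics

def inflate_first(values, fraction, factor):
    """Return a copy in which the first `fraction` of entries is multiplied by `factor`."""
    n_bad = round(len(values) * fraction)
    return [v * factor for v in values[:n_bad]] + values[n_bad:]

# Synthetic per-window estimates (arbitrary units), purely for illustration.
clean = [400 + i % 100 for i in range(1000)]

# Pretend 1% of windows are badly inflated, e.g. by convergent evolution.
corrupted = inflate_first(clean, 0.01, 10)

median_shift = statistics.median(corrupted) - statistics.median(clean)
mean_shift = statistics.mean(corrupted) - statistics.mean(clean)

print(median_shift)  # small shift
print(mean_shift)    # much larger shift
```

The size of the shift in this toy case depends entirely on the numbers I invented, so it should not be read as a check on the 2 kya figure. The point is only the contrast: a few wildly wrong windows drag the mean a long way, while the median barely moves.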

Convergent evolution is really the exception to the rule in human variation. It is not accounted for by most phylogenomic methods, but that does not matter in our genome-wide analysis. Our final estimate is not strongly influenced by this problem.
