Heliocentric Certainty Against a Bottleneck of Two?

The Genome-Wide TMRCA Distribution

What we really need is the distribution of TMRCAs (and eventually TMR4As) across the genome. The good news is that we found exactly that in the supplementary data of the paper. From there, we can get to our first plausible genome-wide estimate of TMR4A.

This figure from the argweaver paper (S17) includes a random sample of 69 neutral regions (dashed line), compared with 69 regions undergoing balancing selection and containing no CpGs (red). The black line is the 56 regions undergoing balancing selection, but with shared CpGs. Though it does not cover the entire genome, the dashed line should be a good estimate of the neutral genome-wide distribution. For the statistically untrained, this is going to be a hard graph to read. It is a CDF, not a PDF (https://en.wikipedia.org/wiki/Cumulative_distribution_function): each point on a curve gives the fraction of regions with a TMRCA at or below that value, so the median is where a curve crosses 0.5.

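[Image: Figure S17 from the argweaver paper, showing the cumulative distribution of TMRCA for the neutral regions (dashed), the balancing-selection regions without CpGs (red), and the balancing-selection regions with shared CpGs (black)]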

Several factors can conspire to increase or reduce TMRCA. Molecular clocks only work when these factors are not interfering. That is why whole-genome distributions are so important: we can test the effect of different regions. For example, if we wanted, we could start to untangle how identifiable Neanderthal interbreeding biases results upwards, by looking at the results for those regions separately. We can also see how balancing selection (which violates the assumptions required for dating) affects dates. Some regions of the genome also have higher mutation rates, and will therefore overestimate TMRCA.

From this, we want the best estimate of TMRCA in neutral regions of the genome (the dashed line), in a way that reduces these sources of error. This is a fairly important point, as dates can only be reliably inferred in places that are not under balancing selection. These are the only places where a molecular clock is expected to hold. Even then, some regions will still get “lucky” and coalesce much more quickly or much more slowly. So, to a first approximation, we want the median of these values.
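For anyone who wants to see how a median is read off a CDF, here is a minimal sketch. The function name `quantile_from_cdf` and the data points are hypothetical: they are made up for illustration and are not digitized from the figure. The idea is just to find where the curve crosses 0.5 and interpolate linearly between the two nearest points.

```python
def quantile_from_cdf(xs, cdf, q=0.5):
    """Return the x at which a tabulated CDF reaches q (linear interpolation).

    xs  : increasing x values (here, TMRCA in generations)
    cdf : non-decreasing cumulative fractions in [0, 1], same length as xs
    """
    if len(xs) != len(cdf) or len(xs) < 2:
        raise ValueError("xs and cdf must have the same length (at least 2)")
    for i in range(1, len(xs)):
        lo, hi = cdf[i - 1], cdf[i]
        if lo <= q <= hi:
            if hi == lo:
                # Flat segment: any x in it works, so take the left edge.
                return xs[i - 1]
            frac = (q - lo) / (hi - lo)
            return xs[i - 1] + frac * (xs[i] - xs[i - 1])
    raise ValueError("q is outside the range covered by the tabulated CDF")


# HYPOTHETICAL points, not read from the figure.
tmrca_generations = [0, 20_000, 40_000, 60_000, 80_000, 120_000]
cumulative_fraction = [0.0, 0.10, 0.35, 0.65, 0.85, 1.0]

median_tmrca = quantile_from_cdf(tmrca_generations, cumulative_fraction, q=0.5)
print(f"median TMRCA: about {median_tmrca:,.0f} generations")
```

The median is used rather than the mean because it is much less sensitive to the handful of regions that coalesce unusually early or unusually late.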

We can now make our estimate. In the regions not under selection, we see a median TMRCA of about 50,000 generations. You can see it yourself by tracing the blue line in the graph. That gives us an estimated TMR4A of about 310 kya, well within the timeline where some scientists think Homo sapiens arose. According to this one view of the data (other data might contradict this), it is possible that our ancestors (including the Neanderthal lines) went through a single-couple bottleneck at about the time Homo sapiens arose. This is not at all our final estimate, just what this limited view of the data shows.
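Here is one way the arithmetic from 50,000 generations to roughly 310 kya could work out. Both conversion factors below are assumptions that are not stated in this post: a generation time of 25 years, and a TMR4A of about one quarter of TMRCA, which is the large-sample expectation under the standard neutral coalescent. The function name is hypothetical, and the actual conversion used may differ.

```python
def tmr4a_years_from_tmrca(tmrca_generations,
                           years_per_generation=25.0,  # ASSUMPTION, not from the post
                           tmr4a_fraction=0.25):       # ASSUMPTION: neutral coalescent expectation
    """Rough TMR4A in years from a TMRCA given in generations."""
    tmrca_years = tmrca_generations * years_per_generation
    return tmrca_years * tmr4a_fraction


median_tmrca_generations = 50_000  # median read off the neutral curve
tmr4a_years = tmr4a_years_from_tmrca(median_tmrca_generations)
print(f"TMR4A: about {tmr4a_years / 1000:.0f} thousand years ago")
```

Under those assumptions this gives 312,500 years, which rounds to the figure quoted above. A different generation time scales the result proportionally.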

To be clear, however, this is just an estimate, based on our weak estimate of TMR4A from TMRCA. What we really need to do is look at TMR4A directly, in genome-wide phylogenies themselves. We will do that next.
