A 100% resolved whole-genome phylogeny of placental mammals

(paywalled, but a preprint can be reached on bioRxiv here: https://www.biorxiv.org/content/10.1101/2022.08.10.503388v1.full)

Legend to figure 1 says 100% bootstrap support for all nodes:

Placental mammal phylogeny based on coalescent analysis of nearly-neutral sites.

(A) 50% Majority-rule consensus tree from a SVDquartets analysis of 411,110 genome-wide, nearly-neutral sites from the human referenced alignment of 241 species. Bootstrap support is 100% for all nodes. Superordinal clades are labelled and identified in four colors. Nodes corresponding to Boreoeutheria and Atlantogenata are indicated by black circles. (B) The frequency at which eight superordinal clades (numbered 1-8 in Fig. 1A) were recovered as monophyletic in 2,164 window-based maximum likelihood trees from representative autosomes (Chr1, Chr21 and Chr22) and ChrX. Dotted lines indicate relationships that differ from the concatenated Maximum Likelihood analysis.

So a perfectly consistent nested hierarchy of placental mammals based on whole genome sequences. Is that evidence for common descent?


Bootstrap isn’t really a very good measure of confidence for very large data sets, which tend to have 100% support for every node, so I would view that with caution. Still evidence for common descent, though.

I’m not sure why I never asked this before: What does “bootstrap” mean?


Bootstrap is a statistical resampling method for estimating the distribution of a statistic from the data itself, rather than by assuming a distribution. The name is a reference to “lifting yourself up by your bootstraps”.


Bootstrap is often used when the data may not meet all the assumptions for a particular statistical method. Non-normal data or not completely independent data are common examples where bootstrap might be used.
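To make that concrete, here is a minimal sketch of the general idea (the data are made up): estimating a 95% confidence interval for the mean of a skewed, non-normal sample by resampling it with replacement, rather than by assuming a distribution.

```python
import random

random.seed(1)
data = [random.expovariate(1.0) for _ in range(200)]  # skewed, non-normal sample

# Resample WITH replacement many times; each resample is the same size as
# the original data set. The spread of the resampled means estimates the
# sampling distribution of the mean.
boot_means = []
for _ in range(2000):
    resample = random.choices(data, k=len(data))
    boot_means.append(sum(resample) / len(resample))

boot_means.sort()
low, high = boot_means[50], boot_means[1949]  # ~2.5th and 97.5th percentiles
print(f"sample mean = {sum(data) / len(data):.3f}")
print(f"95% bootstrap CI ~ ({low:.3f}, {high:.3f})")
```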


Interesting. I can’t seem to work out why increasing the size of the data set would tend towards 100% bootstrap support. Can you explain?

As applied to phylogenetic inference it’s this:

As Dan explains, it’s a general term for creation of a distribution through resampling with replacement. In a phylogenetic bootstrap, individual characters in a data matrix are resampled with replacement to produce a new data set of the same size as the original one, many times. Each new data set, called a pseudoreplicate, is then analyzed to produce a phylogenetic tree. The bootstrap value for a branch of the original tree is the percentage of pseudoreplicates for which that branch appears. It’s more or less a measure of the consistency of the matrix’s phylogenetic signal.
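A toy sketch of that procedure (the alignment is invented, and `build_tree` is a hypothetical stand-in for real tree inference, which would use maximum likelihood or similar): columns of the alignment are resampled with replacement, a tree is built from each pseudoreplicate, and the support for a clade is the fraction of pseudoreplicates that recover it.

```python
import random
from collections import Counter

random.seed(0)

# Toy alignment: 4 taxa x 12 sites (rows are taxa, columns are characters).
alignment = {
    "human": "ACGTACGTACGT",
    "chimp": "ACGTACGAACGT",
    "mouse": "ACTTACGAACTT",
    "rat":   "ACTTACAAACTT",
}
n_sites = len(alignment["human"])

def build_tree(aln):
    """Hypothetical stand-in for tree inference: just group the two most
    similar taxa (smallest Hamming distance) and report that clade."""
    taxa = list(aln)
    best = min(
        ((a, b) for i, a in enumerate(taxa) for b in taxa[i + 1:]),
        key=lambda p: sum(x != y for x, y in zip(aln[p[0]], aln[p[1]])),
    )
    return frozenset(best)

clade_counts = Counter()
n_reps = 100
for _ in range(n_reps):
    # Resample columns with replacement to make a pseudoreplicate of the
    # same size as the original alignment.
    cols = [random.randrange(n_sites) for _ in range(n_sites)]
    pseudo = {t: "".join(seq[c] for c in cols) for t, seq in alignment.items()}
    clade_counts[build_tree(pseudo)] += 1

for clade, count in clade_counts.items():
    print(sorted(clade), f"{100 * count / n_reps:.0f}% bootstrap support")
```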

But it’s also sensitive to the size of the data set.

That makes sense. A larger dataset will be less dependent on one (or a few) observations, and less likely to find different branching.
A larger data set, other things being equal, should also have more complete data on the true branching.

I can offer a hypothesis, though the main support is just empirical: different large data sets, or the same data sets analyzed in different ways, can produce 100% bootstrap results for contradictory branches. I’m guessing that this is because very small biases in some small percentage of the data are more likely to be well sampled in a large data set. If most of the data are indecisive, that small bias can control the outcome.

Thanks for the explanations, everyone!

It’s related to the law of averages.

Assuming the data that imply conflicting branches aren’t sampled together in the bootstrap, you mean? Otherwise I don’t follow.

Why would it be more likely to be sampled in a large rather than small data set? That seems counterintuitive to me. If some X% of the data has some bias and the rest is indecisive, and we’re randomly sampling this data set when building a new alignment, why would you be more likely to sample this biased portion when you pull from an alignment of a larger fraction of the genome, than if you pull from a smaller fraction?

Thanks for the link to the full article. Figuring out the deep phylogeny of this group has been very challenging, and attempts to do so based on morphology have not worked well. But genetic analysis has given us a tree suggesting that the early branching of placentals happened while the northern land masses were breaking up, beginning ~130 million years ago. As a result we get large clades like Afrotheria, originating around Africa, and Laurasiatheria, a clade whose origins map to what is now Asia. There is this other paper that goes over this:
http://faculty.chas.uni.edu/~spradlin/evolution/Readings.blocked/mammaltrees.pdf, and Wikipedia also gives a comprehensible review of this matter under placental mammals.
A nice pattern that emerges is that besides divergent evolution within the different northern land areas, there is meanwhile some convergent evolution between isolated land areas. For example: moles within the Laurasiatheria being similar to the golden mole within the Afrotheria; shrews within the Laurasiatheria being much like the shrew tenrec within the Afrotheria; and pangolins within the Laurasiatheria being similar to the aardvark within the Afrotheria.

Let us suppose that there’s a slight bias in some percentage of the data and again that the rest of the data are indecisive. If that bias is slight, it might take a lot of it to provide a strong signal. A small data set might not even contain any of those data or, if it did, not enough that the average pseudoreplicate had enough of it to be decisive. But resampling a large data set would be almost certain to include that biased data, in its actual proportion, in every pseudoreplicate. That is, there would be a smaller variance among pseudoreplicates.
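A toy simulation of this hypothesis (all numbers made up): suppose 1% of sites carry a slight bias toward one branch (signal +1), while the rest are indecisive, individually leaning one way or the other so that they cancel out across the whole data set. As the number of sites grows, each pseudoreplicate contains the biased sites in close to their true proportion, the variance among pseudoreplicates shrinks relative to the biased signal, and support for the favored branch climbs toward 100%.

```python
import random

random.seed(2)

def bootstrap_support(n_sites, bias_fraction=0.01, n_reps=200):
    """Fraction of pseudoreplicates whose summed signal favors the biased branch."""
    n_biased = int(n_sites * bias_fraction)
    # Biased sites each contribute +1; indecisive sites alternate +1/-1,
    # so they cancel (almost exactly) in the full data set.
    sites = [1] * n_biased + [(-1) ** i for i in range(n_sites - n_biased)]
    wins = 0
    for _ in range(n_reps):
        pseudo = random.choices(sites, k=n_sites)  # resample with replacement
        if sum(pseudo) > 0:
            wins += 1
    return wins / n_reps

results = {n: bootstrap_support(n) for n in (100, 10_000, 100_000)}
for n, support in results.items():
    print(f"{n:>7} sites: {100 * support:.0f}% support for the biased branch")
```

With a small data set the slight bias is swamped by resampling noise and support hovers near 50%; with a large one the same 1% bias wins in essentially every pseudoreplicate.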

Again, I can’t say that this is the actual explanation, but it seems plausible to me.

Is the problem here one with the statistical method, or with the data though?

In this case, the only ‘signal’ would appear to be due to the “bias”. If that signal is statistically significant, shouldn’t a valid statistical method detect it – whether the signal is due to a genuine effect or bad data? Garbage in, garbage out – but it’s the statistical method’s job to treat the data as valid.

(I seem to remember methods specifically designed to be insensitive to outliers – ridge regression is the phrase that comes to mind – but it was so long ago that my memory is hazy. That’s a whole different kettle of fish, though.)

The purpose of phylogenetic bootstrapping is to determine the consistency of the data and the strength of the common signal. If the data are not in fact consistent and there is no common signal, there ought not to be a high bootstrap value. It’s conceivable that jackknifing would do better. It would be nice if @Joe_Felsenstein would weigh in on all this.

Ahh yeah, I get it now. Even if there are contradictory branches in the data, they’re not likely to get sampled all that differently between each pseudoreplicate. Makes sense.


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.