Mini-thought experiment about the nested hierarchy

A claim vague enough to entail nothing. Can you identify the higher scale at which these genuine patterns break down? Does that lower scale encompass humans and chimps?

Or you may notice that the replies explain the “anomalous” data by noting what is expected under an evolutionary model and that it’s consistent with what we see. ILS is expected and observed. Some sorts of horizontal transfer are expected and observed. Even taxonomically restricted genes arising by particular mechanisms are expected and observed. Note also that none of these “anomalies” would be noticed unless there existed a clear and dominant signal of a single nested hierarchy. Your claim that this is a bias in sorting of data based on a preconceived end result is just wrong.


It’s not anomalous. Lineages evolving their own lineage-specific solution to a problem is completely consistent with evolutionary mechanisms.

Part of the problem is that you are confusing analogous and homologous features. Also, phylogenetic signal is measured by objective means, such as the many computer algorithms used in computational phylogenetics. If you really think there is a lack of phylogenetic signal, then the tools are already out there for you to check everyone’s work.

We would also expect noise in the data, but that is true for almost all data in science. It isn’t enough to point to examples of noise. It is the ratio between signal and noise that matters.

I doubt many ID/creationists would be comfortable with all vertebrates sharing a common ancestor, from sea squirts to humans. I’m not sure that’s the port in the storm you are hoping for.


“The single most important component…of a phylogenetic analysis is the decision as to which method(s) or sequence(s) are appropriate to the phylogenetic question at hand. The method chosen must yield sufficient variation as to be phylogenetically informative, but not so much variation that convergences and parallelisms overwhelm informative changes.”

Peter R. Baverstock and Craig Moritz, “Project Design,” in David M. Hillis, Craig Moritz, and Barbara K. Mable, eds., Molecular Systematics, 2nd ed. (Sunderland, MA: Sinauer, 1996), pp. 17-27; p. 25.

One can only conclude that you don’t understand what you just quoted if you think it’s a counter to what I said. I expect better from a UC grad.


Well, let’s find out. UC grads learn by the dialectical method.

You tell me what you think they’re saying, and why “data selection” of this sort isn’t a problem, and I’ll give my interpretation – namely, that it is a problem.

The phrase “phylogenetically informative” provides a hint.

Using only “phylogenetically informative” sequences in phylogenetic analyses is not the same as sorting data in a way that biases the results towards a consistent nested hierarchy.

Phylogenetic “informativeness” is measured based on internal consistency, not consistency with other phylogenetic analyses. Nothing about choosing sequences that provide a statistically significant “signal” guarantees, or even makes it more likely, that the trees implied by those signals will be consistent between studies covering overlapping parts of the tree of life.

We’ve discussed this a bit before, if you recall. In comments #14-25 in this thread:



In light of what you just posted (above), I’d be curious to have your opinion of the methods endorsed in this paper:

Having just read the abstract, it seems reasonable in principle. Note that the aim of this method is the estimation of divergence times rather than the reconstruction of tree topologies; as it says, those are “presumed to have already been inferred”.


It’s simple enough. A sequence should provide sufficient variation to be informative but should not be saturated so all signal is lost. Notice that it doesn’t talk about using only characters that fit a predetermined tree. Now you go.


How does one know when:

– a sequence is “informative” versus not

– a sequence contains “signal” versus something else (noise, I guess)

– all “signal is lost” from a sequence


Nope. The deal was that I tell you what it means, then you tell me.

I will however point out that the answers to your new questions are not “if they match the correct tree”. There are phylogeny-free tests for all that stuff.
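To make “phylogeny-free” concrete, here is a minimal sketch of one crude saturation check that needs only a pair of aligned sequences and no tree. It compares the raw proportion of differing sites (the p-distance) with the Jukes-Cantor corrected distance; as the p-distance approaches 0.75, the correction blows up, which is one way of seeing that the signal has been overwritten by multiple hits. The function names and toy sequences are my own, and this is an illustration of the idea rather than the test any particular study uses.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of compared sites that differ between two aligned sequences.

    Sites where either sequence has a gap or ambiguity code are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (x, y)
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def jc69_distance(p):
    """Jukes-Cantor corrected distance for an observed p-distance.

    Returns infinity when p >= 0.75, where the correction is undefined:
    the two sequences look no more similar than random sequences would.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Toy example (hypothetical sequences): one difference in eight sites.
p = p_distance("ACGTACGT", "ACGTACGA")
d = jc69_distance(p)
print(f"p-distance = {p:.3f}, JC69 distance = {d:.3f}")
```

When the corrected distance is only slightly larger than the p-distance, few sites have changed more than once. When the two diverge sharply, the sequence is approaching saturation for that comparison, and nothing in the calculation refers to any tree.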

In context – the project design chapter (pp. 1-27 of the textbook) – common ancestry is presupposed, and some phylogenetic hypothesis is already on the table. For instance, this advice: “At least one outgroup taxon should be included in the analysis to root the tree” (p. 26). As you know, outgroup selection requires a pre-existing phylogeny; no one chooses an outgroup for rooting by reaching into a grab-bag randomly.

The entire chapter, understandably, guides the student towards molecular methods which will make evolutionary sense. What’s not on the table? Signal that may indicate primary or aboriginal discontinuity.

Yes, and (to my mind) that’s a problem. Divergence times need to make evolutionary sense. Data which don’t fit are therefore kept out.

This is what’s known as the Gish gallop. None of what you said there has anything to do with the meaning of the bit you quoted originally. You’re just bringing up other stuff. I presume that if I respond to what you said you will just bring up something else. And so goes the gallop. “Aboriginal discontinuity”? What does that even mean?


In some cases it’s pretty obvious: the quote you provided earlier mentioned one such case: when a sequence is identical in all examined species, it’s not going to be informative about the phylogenetic relationships between those groups. There needs to be a sufficient number of characters that vary between the sequences.
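As a rough illustration of what “sufficient variation” means in practice, the sketch below sorts the columns of a small alignment into constant sites, variable sites where only one sequence differs (singletons), and parsimony-informative sites, meaning columns with at least two different states that each occur in at least two sequences. The function name and the toy alignment are hypothetical, and real software handles gaps and ambiguity codes more carefully than this.

```python
from collections import Counter

def classify_sites(alignment):
    """Count constant, singleton-only, and parsimony-informative columns.

    alignment: list of aligned sequences (strings of equal length).
    Characters other than A, C, G, T are ignored within a column.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same length")

    counts = {"constant": 0, "singleton": 0, "informative": 0}
    for column in zip(*alignment):
        states = Counter(base for base in "".join(column).upper() if base in "ACGT")
        # Number of states that appear in two or more sequences.
        shared_states = sum(1 for n in states.values() if n >= 2)
        if len(states) <= 1:
            counts["constant"] += 1      # no variation at this site
        elif shared_states >= 2:
            counts["informative"] += 1   # the column can favor one grouping over another
        else:
            counts["singleton"] += 1     # varies, but does not group taxa
    return counts

# Toy alignment of four hypothetical taxa.
toy = ["AAGTC", "AAGTA", "ATCTG", "ATCAT"]
print(classify_sites(toy))
```

Here the first column is constant, the second and third are informative, and the last two vary without grouping any taxa together. The count depends only on the alignment itself, not on whether the columns agree with any particular tree.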

We discussed one such example in the thread, months ago. I quoted Fang et al. as saying:

By contrast, the phylogenetic tree inferred from the removed GC-heterogeneous sites of the gc_168 dataset under the heterogeneous model is poorly supported (supplementary Fig. S13), which indicated that these GC-heterogeneous sites contain noisy signals.

As above.

If we want to calculate divergence times, then yes, they should make sense given the rest of the model. Individual estimations of things like divergence time aren’t (and shouldn’t be) done in isolation, without considering the context of the well-supported framework of evolution.

If that’s your hypothesis, come up with a way to test it and search for that signal. What’s stopping you?


From the op.

Supposed Gish Gallop. Looks pretty consistent to me.

No one has suggested that this quote from Paul was a Gish Gallop, or inconsistent with what he said earlier. Who are you replying to?




This is what’s known as the Gish gallop. None of what you said there has anything to do with the meaning of the bit you quoted originally. You’re just bringing up other stuff.

This is John Harshman’s comment above.