Mini-thought experiment about the nested hierarchy

pnelson · May 10, 2019, 6:58pm

Pre-Darwinian hierarchies were far from universal. Cuvier, for instance, posited the existence of four distinct embranchements (Vertebrata, Articulata, Radiata, Mollusca), not related to each other by material descent.

I am unfamiliar with Lavoisier’s place in biological systematics and would welcome information on that score.

Roy · May 10, 2019, 7:31pm

Aaargh! I meant Linnaeus. D’Oh!

John_Harshman · May 10, 2019, 8:01pm

Partially true, though ages/stages are getting pretty arbitrary. Nevertheless, what we have is a division of a continuum of time, often on the basis of purely local phenomena, and in fact the divisions can be different for different authorities. Thus we have Carboniferous vs. Mississippian and Pennsylvanian, often different ages/stages in Europe and America, and the recent addition of stages at the lower end of the Cambrian. What we don’t see is a discoverable nested hierarchy.

John_Harshman · May 10, 2019, 8:12pm

A claim vague enough to entail nothing. Can you identify the higher scale at which these genuine patterns break down? Does that lower scale encompass humans and chimps?

Or you may notice that the replies explain the “anomalous” data by noting what is expected under an evolutionary model and that it’s consistent with what we see. ILS is expected and observed. Some sorts of horizontal transfer are expected and observed. Even taxonomically restricted genes arising by particular mechanisms are expected and observed. Note also that none of these “anomalies” would be noticed unless there existed a clear and dominant signal of a single nested hierarchy. Your claim that this is a bias in sorting of data based on a preconceived end result is just wrong.

T_aquaticus · May 10, 2019, 8:14pm

It’s not anomalous. Lineages evolving their own lineage specific solution to a problem is completely consistent with evolutionary mechanisms.

Part of the problem is that you are confusing analogous and homologous features. Also, phylogenetic signal is measured by objective means, such as the many computer algorithms used in computational phylogenetics. If you really think there is a lack of phylogenetic signal, then the tools are already out there for you to check everyone’s work.

We would also expect noise in the data, but that is true for almost all data in science. It isn’t enough to point to examples of noise. It is the ratio between signal and noise that matters.

I doubt many ID/creationists would be comfortable with all vertebrates sharing a common ancestor, from sea squirts to humans. I’m not sure that’s the port in the storm you are hoping for.

pnelson · May 10, 2019, 8:51pm

“The single most important component…of a phylogenetic analysis is the decision as to which method(s) or sequence(s) are appropriate to the phylogenetic question at hand. The method chosen must yield sufficient variation as to be phylogenetically informative, but not so much variation that convergences and parallelisms overwhelm informative changes.”

Peter R. Baverstock and Craig Moritz, “Project Design,” in David M. Hillis, Craig Moritz, and Barbara K. Mable, eds., Molecular Systematics, 2nd ed. (Sunderland, MA: Sinauer, 1996), pp. 17-27; p. 25.

John_Harshman · May 10, 2019, 9:32pm

One can only conclude that you don’t understand what you just quoted if you think it’s a counter to what I said. I expect better from a UC grad.

pnelson · May 11, 2019, 12:43pm

Well, let’s find out. UC grads learn by the dialectical method.

You tell me what you think they’re saying, and why “data selection” of this sort isn’t a problem, and I’ll give my interpretation – namely, that it is a problem.

The phrase “phylogenetically informative” provides a hint.

evograd · May 11, 2019, 12:44pm

Using only “phylogenetically informative” sequences in phylogenetic analyses is not the same as sorting data so as to bias the results toward a consistent nested hierarchy.

Phylogenetic “informativeness” is measured based on internal consistency, not consistency with other phylogenetic analyses. Nothing about choosing sequences that provide a statistically significant “signal” guarantees, or even biases the analysis toward, trees that are consistent between studies covering overlapping parts of the tree of life.

We’ve discussed this a bit before, if you recall. In comments #14-25 in this thread:

Beyond Reasonable Doubt? A Test for Common Ancestry Conversation

Hi all, this is my first time starting a thread on this site, so I decided to go with something simple and uncontroversial, so we can all get along in unanimous agreement. Universal common ancestry. I’ve been browsing a few different threads here recently and one topic that I’ve noticed pop up repeatedly is formal tests of universal common ancestry - whether they have been performed, or are even possible. One paper that is often brought up (and not surprisingly so given its title), is Douglas Theobald’s 2010 paper A formal test of the theory of universal common ancestry. It’s worth a read if you haven’t already, but without getting into the details, I think it’s generally acknowledged at this point that Theobald’s statistical methods were flawed, so let’s put his paper to one side for a moment. What other research exists to fill this void? Well, as it happens I wrote a blog post a while back outlining one such piece of research. I’ll let you read the blog post and/or paper to get the …

pnelson · May 11, 2019, 1:53pm

Evograd,

In light of what you just posted (above), I’d be curious to have your opinion of the methods endorsed in this paper:

ncbi.nlm.nih.gov

So many genes, so little time: A practical approach to divergence-time estimation in the genomic era.

SA Smith, JW Brown and JF Walker, PloS one, 2018

Phylogenomic datasets have been successfully used to address questions involving evolutionary relationships, patterns of genome structure, signatures of selection, and gene and genome duplications. However, despite the recent explosion in genomic and transcriptomic data, the utility of these data sources for efficient divergence-time inference remains unexamined. Phylogenomic datasets pose two distinct problems for divergence-time estimation: (i) the volume of data makes inference of the entire dataset intractable, and (ii) the extent of underlying topological and rate heterogeneity across genes makes model mis-specification a real concern. "Gene shopping", wherein a phylogenomic dataset is winnowed to a set of genes with desirable properties, represents an alternative approach that holds promise in alleviating these issues. We implemented an approach for phylogenomic datasets (available in SortaDate) that filters genes by three criteria: (i) clock-likeness, (ii) reasonable tree length (i.e., discernible information content), and (iii) least topological conflict with a focal species tree (presumed to have already been inferred). Such a winnowing procedure ensures that errors associated with model (both clock and topology) mis-specification are minimized, therefore reducing error in divergence-time estimation. We demonstrated the efficacy of this approach through simulation and applied it to published animal (Aves, Diplopoda, and Hymenoptera) and plant (carnivorous Caryophyllales, broad Caryophyllales, and Vitales) phylogenomic datasets. By quantifying rate heterogeneity across both genes and lineages we found that every empirical dataset examined included genes with clock-like, or nearly clock-like, behavior. Moreover, many datasets had genes that were clock-like, exhibited reasonable evolutionary rates, and were mostly compatible with the species tree. We identified overlap in age estimates when analyzing these filtered genes under strict clock and uncorrelated lognormal (UCLN) models. However, this overlap was often due to imprecise estimates from the UCLN model. We find that "gene shopping" can be an efficient approach to divergence-time inference for phylogenomic datasets that may otherwise be characterized by extensive gene tree heterogeneity.
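
For readers who want to see what a three-criterion filter of this kind might look like in practice, here is a minimal Python sketch. It is not the SortaDate code and does not use its interface. The class `GeneStats`, the function `shop_genes`, the use of root-to-tip variance as a stand-in for clock-likeness, the concordance fraction, the ranking order, and all numbers are assumptions made up for illustration. Note that the focal species tree enters only through the precomputed concordance value, i.e. the topology is an input to the filter, not something the filter infers.

```python
from dataclasses import dataclass


@dataclass
class GeneStats:
    """Hypothetical per-gene summary statistics (all fields are illustrative)."""
    name: str
    root_to_tip_variance: float  # assumed proxy for clock-likeness: lower is more clock-like
    tree_length: float           # total branch length of the gene tree
    concordance: float           # fraction of species-tree bipartitions found in the gene tree


def shop_genes(genes, min_tree_length, keep):
    """Drop genes with too little information, then rank the rest.

    Ranking order (an assumption for this sketch): highest concordance first,
    then lowest root-to-tip variance, then longest tree.
    """
    usable = [g for g in genes if g.tree_length >= min_tree_length]
    ranked = sorted(
        usable,
        key=lambda g: (-g.concordance, g.root_to_tip_variance, -g.tree_length),
    )
    return ranked[:keep]


# Made-up example values, not taken from any dataset.
genes = [
    GeneStats("gene1", root_to_tip_variance=0.002, tree_length=1.5, concordance=0.9),
    GeneStats("gene2", root_to_tip_variance=0.050, tree_length=2.0, concordance=0.9),
    GeneStats("gene3", root_to_tip_variance=0.001, tree_length=0.01, concordance=1.0),
    GeneStats("gene4", root_to_tip_variance=0.010, tree_length=1.2, concordance=0.5),
]

selected = [g.name for g in shop_genes(genes, min_tree_length=0.1, keep=2)]
print(selected)
```

In this toy run, `gene3` is dropped for having almost no branch length despite perfect concordance, and `gene1` outranks `gene2` because it is more clock-like at equal concordance.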

evograd · May 11, 2019, 2:18pm

Having just read the abstract, it seems reasonable in principle. Note that the aim of this method is the estimation of divergence times rather than the reconstruction of tree topologies; as it says, those are “presumed to have already been inferred”.

John_Harshman · May 11, 2019, 2:23pm

It’s simple enough. A sequence should provide sufficient variation to be informative but should not be saturated so all signal is lost. Notice that it doesn’t talk about using only characters that fit a predetermined tree. Now you go.

pnelson · May 11, 2019, 2:51pm

How does one know when:

– a sequence is “informative” versus not

– a sequence contains “signal” versus something else (noise, I guess)

– all “signal is lost” from a sequence

John_Harshman · May 11, 2019, 2:53pm

Nope. The deal was that I tell you what it means, then you tell me.

I will however point out that the answers to your new questions are not “if they match the correct tree”. There are phylogeny-free tests for all that stuff.

pnelson · May 11, 2019, 3:10pm

In context – the project design chapter (pp. 17-27 of the textbook) – common ancestry is presupposed, and some phylogenetic hypothesis is already on the table. For instance, this advice: “At least one outgroup taxon should be included in the analysis to root the tree” (p. 26). As you know, outgroup selection requires a pre-existing phylogeny; no one chooses an outgroup for rooting by reaching into a grab-bag randomly.

The entire chapter, understandably, guides the student towards molecular methods which will make evolutionary sense. What’s not on the table? Signal that may indicate primary or aboriginal discontinuity.

pnelson · May 11, 2019, 3:13pm

Yes, and (to my mind) that’s a problem. Divergence times need to make evolutionary sense. Data which don’t fit are therefore kept out.

John_Harshman · May 11, 2019, 3:29pm

This is what’s known as the Gish gallop. None of what you said there has anything to do with the meaning of the bit you quoted originally. You’re just bringing up other stuff. I presume that if I respond to what you said you will just bring up something else. And so goes the gallop. “Aboriginal discontinuity”? What does that even mean?

evograd · May 11, 2019, 3:42pm

In some cases it’s pretty obvious. The quote you provided earlier mentioned one such case: when a sequence is identical in all examined species, it’s not going to be informative about the phylogenetic relationships between those groups. There needs to be a sufficient number of characters that vary between the sequences.

We discussed one such example in the thread, months ago. I quoted Fang et al. as saying:

By contrast, the phylogenetic tree inferred from the removed GC-heterogeneous sites of the gc_168 dataset under the heterogeneous model is poorly supported (supplementary Fig. S13), which indicated that these GC-heterogeneous sites contain noisy signals.

As above.
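
To make the point about invariant sequences concrete, here is a small Python sketch of one standard, tree-free check: counting parsimony-informative sites, i.e. alignment columns in which at least two different states each appear in at least two taxa. This is only one simple criterion and not necessarily the one the textbook authors had in mind. The function names and the toy alignment are hypothetical.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two nucleotide states each occur in at least two taxa."""
    counts = Counter(c for c in column if c in "ACGT")  # ignore gaps and ambiguity codes
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative_sites(alignment):
    """Count informative columns in a dict of equal-length aligned sequences."""
    columns = zip(*alignment.values())
    return sum(is_parsimony_informative(col) for col in columns)


# Hypothetical toy alignment: columns 3, 5 and 6 are informative,
# columns 1 and 2 are invariant, column 4 has a single odd taxon.
alignment = {
    "taxon_A": "ACGTAC",
    "taxon_B": "ACGTAT",
    "taxon_C": "ACTTGT",
    "taxon_D": "ACTAGC",
}

# A sequence that is identical in every taxon carries no such sites.
identical = {name: "ACGTAC" for name in alignment}

n_informative = count_informative_sites(alignment)
n_identical = count_informative_sites(identical)
print(n_informative, n_identical)
```

The check looks only at the alignment itself. No tree, correct or otherwise, is consulted when deciding whether a column counts.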

evograd · May 11, 2019, 3:47pm

If we want to calculate divergence times, then yes, they should make sense given the rest of the model. Individual estimations of things like divergence time aren’t (and shouldn’t be) done in isolation, without considering the context of the well-supported framework of evolution.

evograd · May 11, 2019, 3:49pm

If that’s your hypothesis, come up with a way to test it and search for that signal. What’s stopping you?
