Holloway: Fallacy of the Phylogenetic Signal: Nucleotide Level

I wondered if this blog post by Eric Holloway (@EricMH) would be of interest. Dr Holloway is a member here, though he hasn’t commented since February 2019.

In his post, Dr Holloway claims to have falsified a claim about common descent made on the Talk Origins website, saying:

Talk Origin’s claim that the nested hierarchy of species is well attested by the data is highly questionable if not outright false, and should be retracted as evidence for evolution until such time as a much more rigorous analysis with DAG eliminating controls is established.

As TSZ is not a particularly busy site these days, I thought I would bring Dr Holloway’s potentially earth-shattering revelation to a wider audience. Also, I hope to hear from anyone with expertise in phylogenetics as to whether we need to start rewriting the textbooks.

Very odd. It’s not completely clear just what he did, what fake sequences he actually analyzed, or how he aligned or analyzed them. He doesn’t say. But my first point to make is that consistency index is not a measure of phylogenetic signal. I can’t say much more until I see how he actually produced his data matrices. His little figure is no help, though oddly enough his model appears to be a form of phylogeny that has some hierarchical structure to it.

I just posted this on TSZ:


Based on his brief description, he didn’t actually do a phylogenetic analysis. He just generated a tree (probably many trees, though he doesn’t say) and looked at its parsimony score. There is no suggestion that this tree is the most parsimonious, and no indication that he did any sort of test of support, which would require an actual analysis. CI, RI, etc. are not indices of hierarchical structure.
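To make the point about CI concrete: the ensemble consistency index is just the minimum conceivable number of state changes divided by the number of changes a particular tree requires. Below is a minimal sketch, assuming unordered characters and Fitch parsimony on a rooted binary tree written as nested tuples; the function names and the toy data are hypothetical, chosen for illustration only.

```python
def fitch(tree, states):
    """Fitch small-parsimony pass for one character.

    tree: nested 2-tuples with taxon names (str) at the leaves.
    states: {taxon: state} for this character.
    Returns (candidate root states, number of changes this tree requires).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], states), fitch(tree[1], states)
    common = ls & rs
    if common:
        return common, lc + rc
    # No shared state between the two subtrees: one extra change is needed.
    return ls | rs, lc + rc + 1


def consistency_index(tree, characters):
    """Ensemble CI = minimum possible changes / changes required on *this* tree."""
    # For an unordered character the minimum on any tree is (number of states - 1).
    minimum = sum(len(set(ch.values())) - 1 for ch in characters)
    observed = sum(fitch(tree, ch)[1] for ch in characters)
    if observed == 0:
        raise ValueError("CI is undefined when no character changes state")
    return minimum / observed


# Two different trees and two toy characters (hypothetical data).
t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
chars = [
    {"A": 0, "B": 0, "C": 1, "D": 1},
    {"A": 0, "B": 1, "C": 0, "D": 1},
]
print(consistency_index(t1, chars), consistency_index(t2, chars))
```

Both trees give the same CI (2/3) for these two characters. That is one small illustration of why a CI computed on a single tree says little about which hierarchy, if any, the data support; you still need a search for optimal trees and a test of support.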

I can’t figure out, based on the description, what his taxa were or how he aligned the data set. It may be that his scheme does produce data with hierarchical structure. But that could be the result of the inheritance and branching in the model, even though the inheritance is of an odd sort, and the branching involves some reticulation.

That figure is exceptionally confusing. It looks like he’s generating characters de novo, passing them on to descendants, and then combining them with characters from other lineages while also introducing some degree of shuffling (it sorta kinda looks like what you get from sexual recombination). It would be false to say there is no genealogy going on in this process.

As DNA_Jock points out, there’s plenty of common descent going on in the scenario depicted in the figure.

What, in that figure, are the terminal taxa? How would he align the sequences, which it would seem are composed of short fragments that are either identical among taxa or unshared? How would you even get a CI from alignments of these sequences, which would seem to consist entirely of a combination of identical sites and gaps? Puzzling.
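For a sense of why an alignment made only of identical fragments and gaps gives parsimony nothing to work with, here is a rough sketch that classifies alignment columns. It assumes gaps are treated as missing data (a common default, though programs differ and some can score gaps as a fifth state); the helper names and the toy sequences are hypothetical.

```python
from collections import Counter


def classify_column(column, gap="-"):
    """Classify one alignment column, treating gaps as missing data."""
    counts = Counter(c for c in column if c != gap)
    if not counts:
        return "all-gap"
    if len(counts) == 1:
        return "constant"
    # Parsimony-informative: at least two states, each present in at least two taxa.
    if sum(1 for n in counts.values() if n >= 2) >= 2:
        return "informative"
    return "singleton"


def classify_alignment(seqs):
    """Tally column types for a list of equal-length aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    return Counter(classify_column([s[i] for s in seqs]) for i in range(length))


# Shared fragments plus gaps only (hypothetical toy alignment).
print(classify_alignment(["ACGT--", "ACGT--", "AC--GG", "ACGTGG"]))
```

Every column in the toy alignment comes out constant once gaps are set aside, so there is nothing for a CI to measure.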

That’s reassuring. I thought it was my lack of reading comprehension that kept me from following Eric’s argument.

I really have no idea.

More details can be found in a discussion over at the BioLogos forum: Nested Clades, The Consistency Index, and Affirming the Consequent - Scientific Evidence - The BioLogos Forum


I seem to have reached an impasse, as the data set he provided on TSZ is not in valid NEXUS format. For one thing, it has different numbers of characters for the various taxa.
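In a NEXUS DATA block, every row of a non-interleaved MATRIX has to contain exactly the NCHAR characters declared in DIMENSIONS, so ragged rows are enough to make a file unreadable. A quick check along these lines would flag the problem. This is a minimal sketch that assumes a simple, non-interleaved matrix with unquoted taxon labels; it is not a full NEXUS parser, and the function names and example file are hypothetical.

```python
import re


def read_nexus_matrix(nexus_text):
    """Return (NCHAR, {taxon: sequence}) from a simple, non-interleaved DATA block."""
    nchar = re.search(r"NCHAR\s*=\s*(\d+)", nexus_text, re.IGNORECASE)
    matrix = re.search(r"\bMATRIX\b(.*?);", nexus_text, re.IGNORECASE | re.DOTALL)
    if nchar is None or matrix is None:
        raise ValueError("missing DIMENSIONS NCHAR or MATRIX")
    rows = {}
    for line in matrix.group(1).strip().splitlines():
        parts = line.split()
        if parts:
            # First token is the taxon label; the rest of the line is sequence data.
            rows[parts[0]] = "".join(parts[1:])
    return int(nchar.group(1)), rows


def length_mismatches(nexus_text):
    """Map each taxon whose row length differs from NCHAR to its actual length."""
    nchar, rows = read_nexus_matrix(nexus_text)
    return {taxon: len(seq) for taxon, seq in rows.items() if len(seq) != nchar}


example = """#NEXUS
BEGIN DATA;
  DIMENSIONS NTAX=3 NCHAR=4;
  FORMAT DATATYPE=DNA GAP=-;
  MATRIX
    t1 ACGT
    t2 ACG
    t3 ACGTAA
  ;
END;
"""
print(length_mismatches(example))
```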

Here is the phylogenetic tree I got after building an alignment from one of these DAGs (containing 30 “taxa”):

I just obtained a new DAG from the link he gives to his data generator tool.

I first sent the “taxa” (which come in FASTA format) from the DAG to a multiple sequence alignment tool (MAFFT) using default settings.
By the way, the alignment consists mostly of gaps, as the sequences from different taxa have wildly different lengths.
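To put a number on "mostly gaps", one could parse the aligned FASTA output and count the gap characters. A small sketch, assuming gaps are written as `-` (MAFFT's usual gap symbol) and that header lines start with `>`; the helper names and the toy alignment are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}, joining multi-line sequences."""
    seqs, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def gap_fraction(aligned):
    """Fraction of all alignment cells that are gaps ('-')."""
    total = sum(len(s) for s in aligned.values())
    gaps = sum(s.count("-") for s in aligned.values())
    return gaps / total if total else 0.0


aligned_text = """>taxon1
ACGT----
>taxon2
AC--GGTT
"""
print(gap_fraction(read_fasta(aligned_text)))
```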

For example, here is one of his taxa:


Here’s another:


I then sent the alignment to an ML (maximum likelihood) program, again using default settings but with 100 bootstrap replicates turned on.
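For readers unfamiliar with what the bootstrap option does: in the standard nonparametric bootstrap, alignment columns are resampled with replacement to make pseudo-replicate alignments, a tree is inferred from each, and a clade's support is the percentage of replicate trees containing it. The sketch below covers only the resampling and the counting; the tree inference itself is left to the ML program, and replicate trees are represented here, as a simplifying assumption, by sets of clades. All names are hypothetical, and this is not the specific program's internal implementation.

```python
import random


def bootstrap_replicate(aligned, rng):
    """One pseudo-replicate: resample alignment columns with replacement."""
    names = list(aligned)
    length = len(aligned[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(aligned[n][i] for i in cols) for n in names}


def bootstrap_replicates(aligned, n=100, seed=0):
    """Generate n pseudo-replicate alignments with a reproducible RNG."""
    rng = random.Random(seed)
    return [bootstrap_replicate(aligned, rng) for _ in range(n)]


def clade_support(replicate_clades, clade):
    """Percentage of replicate trees (each given as a set of clades) containing clade."""
    clade = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


aln = {"a": "ACGT", "b": "ACGA", "c": "TCGA"}
reps = bootstrap_replicates(aln, n=100, seed=1)
print(len(reps), reps[0])
```

Low bootstrap values simply mean that a clade shows up in few of these resampled analyses, which is what you would expect when most columns carry no informative variation.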

I had to force it into dendrogram mode to see the bootstrap values:

Most of them are pretty crap.

Regardless, none of this makes much sense, because I’m still not sure how Holloway generates his data. It appears that his terminal taxa consist of anything output by his DAG generator, whether ancestors or descendants. I’m not sure.

I suspect that, to the extent there is any tree-like structure in the data, it’s because his DAG actually happened to implement a tree-like process. So if and when Holloway claims to measure any noteworthy CI, it’s because his DAG has coincidentally implemented something like common descent.
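As a toy illustration of that point (and emphatically not a reconstruction of Holloway’s generator), here is a hypothetical branching process in which every lineage inherits all of its parent’s characters and gains new ones of its own. The characters it produces always form a nested hierarchy: any two characters’ taxon sets are either nested or disjoint. The process and function names are assumptions made for the example.

```python
import itertools


def simulate_tree_process(depth, chars_per_branch=1):
    """Hypothetical bifurcating process: each lineage inherits all parental
    characters and gains brand-new ones. Returns {leaf_name: frozenset(chars)}."""
    counter = itertools.count()

    def grow(name, inherited, d):
        if d == 0:
            return {name: inherited}
        leaves = {}
        for child in ("0", "1"):
            gained = {next(counter) for _ in range(chars_per_branch)}
            leaves.update(grow(name + child, inherited | gained, d - 1))
        return leaves

    return grow("", frozenset(), depth)


def is_nested_hierarchy(leaves):
    """True if every pair of characters has taxon sets that are nested or disjoint."""
    taxa_by_char = {}
    for taxon, chars in leaves.items():
        for c in chars:
            taxa_by_char.setdefault(c, set()).add(taxon)
    groups = list(taxa_by_char.values())
    return all(a <= b or b <= a or not (a & b)
               for a, b in itertools.combinations(groups, 2))


leaves = simulate_tree_process(3)
print(len(leaves), is_nested_hierarchy(leaves))
```

Mixing characters across lineages (the reticulation in his figure) is what would break this nesting, so whatever hierarchical signal survives in his output plausibly comes from the tree-like part of the process.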