John Harshman: The Phylogeny of Crocodiles

swamidass · August 9, 2019, 3:39am

Could you please explain CI (the consistency index) and how it relates to the number of homoplastic mutations?

John_Harshman · August 9, 2019, 3:50am

CI is a property of data mapped onto a tree. For a single character, it's the ratio of the minimum possible number of transformations on any tree to the number of transformations as mapped onto the current tree; for a whole data set, both counts are summed over characters before dividing. The minimum possible number of transformations is the number of observed states minus one. Most of the transformations beyond that minimum would be considered homoplasy. CI ranges from 0 to 1, and higher CI means less homoplasy: a CI of 0.95 means very little implied homoplasy; a CI of 0.53, quite a lot.
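To make the arithmetic concrete, here is a minimal Python sketch of the calculation. It assumes the usual ensemble definition, in which the per-character minimums and the per-character counts on the tree are each summed before dividing. The function names and the example columns are hypothetical. The on-tree counts are simply assumed here (for these four-taxon columns they are what a tree pairing taxa 1+3 against 2+4 would require); in practice they come from mapping the characters onto a tree, for example with PAUP.

```python
def min_changes(column):
    """Minimum possible changes for one character on any tree:
    the number of distinct observed states minus one."""
    return len(set(column)) - 1


def consistency_index(min_steps, tree_steps):
    """Ensemble CI: total minimum changes divided by the total changes
    mapped onto one particular tree, both summed over characters."""
    total_on_tree = sum(tree_steps)
    if total_on_tree == 0:
        raise ValueError("no changes mapped onto the tree; CI is undefined")
    return sum(min_steps) / total_on_tree


# Hypothetical alignment columns for taxa 1-4 (one string per site).
columns = ["CCTT", "CCTC", "ACGA"]
minimums = [min_changes(c) for c in columns]  # [1, 1, 2]

# Assumed counts on a tree that pairs taxa 1+3 and 2+4.
on_tree = [2, 1, 2]

print(consistency_index(minimums, on_tree))  # 0.8
```

Only the first column needs an extra change on that tree, so 4 minimum changes over 5 mapped changes gives 0.8.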

swamidass · August 9, 2019, 4:00am

Can you please work that out with a small example?

John_Harshman · August 9, 2019, 1:18pm

Sure. Suppose you have a site at which there are two states, say T and C, with some species having T and others C. Further suppose that you know the root state to be C. (That isn’t necessary and doesn’t affect the calculations, but it makes things easier to explain.) The minimum number of changes to explain the distribution is one, from C to T. Now suppose you have a tree that best fits all the data, and on this tree the species with T do not form a single group. We may have to suppose that the change from C to T happened twice independently in different parts of the tree, or we may have to suppose that C changed to T and, at some point, back to C. In either case, there are two changes necessary to explain the distribution of C and T over that tree. The consistency index for that site would be 1/2.
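The counting in this example can be reproduced with Fitch parsimony, a standard way to find the minimum number of changes a given tree requires for one character. Below is a minimal Python sketch for rooted binary trees written as nested pairs; the four species P, Q, R and S and both trees are hypothetical. As noted above, the root state isn't needed. The algorithm also scores two independent C-to-T changes and a change plus a reversal the same way: two steps.

```python
def fitch(tree, states):
    """Fitch parsimony for one character on a rooted binary tree.

    tree   -- a leaf name (str) or a pair (left, right) of subtrees
    states -- dict mapping each leaf name to its observed state
    Returns (candidate state set at this node, minimum changes in subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch(left, states)
    rset, rcost = fitch(right, states)
    shared = lset & rset
    if shared:
        # The two children can agree on a state: no extra change here.
        return shared, lcost + rcost
    # No state in common: one change is needed below this node.
    return lset | rset, lcost + rcost + 1


# Hypothetical species: P and R have T, Q and S have C.
site = {"P": "T", "Q": "C", "R": "T", "S": "C"}
minimum = len(set(site.values())) - 1  # one change (C -> T) on the best tree

t_split = ((("P", "Q"), "R"), "S")  # the T species do not form a group
t_group = ((("P", "R"), "Q"), "S")  # the T species form a group

for label, tree in [("split", t_split), ("grouped", t_group)]:
    _, steps = fitch(tree, site)
    print(label, steps, minimum / steps)
# split 2 0.5
# grouped 1 1.0
```

On the tree where the T species form a single group, the same site needs only one change and has a CI of 1, which is why a CI describes the data together with a particular tree.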

Rumraket · August 9, 2019, 1:54pm

That was excellent. What program does one use to obtain a CI for a data set?

John_Harshman · August 9, 2019, 2:18pm

I used PAUP. Note that the CI isn’t for the data set; it’s for the data set combined with some particular tree.

Rumraket · August 9, 2019, 2:42pm

Thanks, I understand.

davecarlson · August 9, 2019, 4:44pm

Okay, here is a follow-up.

I used the T-REX webserver to generate an arbitrary (but fully resolved) tree with 10 taxa. I then used the evolver program from PAML to simulate 10,000 bp of sequence along the branches of that tree. Finally, I used IQ-TREE to infer a maximum-likelihood phylogeny with 100 bootstrap replicates.

Here is the tree (with midpoint rooting for easy visualization):

As expected, the tree is completely resolved with 100% bootstrap scores.

Next, I repeated the same procedure but used a completely unresolved tree (i.e., a polytomy) to simulate the DNA sequences. Here is the tree resulting from that data set:

This time the internal branches are all very short, and the low bootstrap scores suggest that all the relationships are very uncertain. Again, as expected.

Edit: fixed some wording

Rumraket · August 9, 2019, 4:51pm

What would be the CI values you get from each of those two trees?

davecarlson · August 9, 2019, 5:00pm

I don’t currently have PAUP installed and don’t know which other programs estimate CI, but if I get a chance, I’ll install PAUP and try to check.

Mercer · August 9, 2019, 7:37pm

For laypeople, it might be clearer simply to point out that we would count two mutations (a mutation and a reversion to the original base or amino-acid residue) as zero mutations if we have not sampled a species carrying the mutation. This is noise, but it is washed out by the enormous amount of data we can collect.

Joe_Felsenstein · August 10, 2019, 1:46am

Glad to hear it seemed cool. It is fairly old now (the latest release is about a decade old), but I am working on a new version with Java interfaces for the programs. I do like to think that it has the best documentation in the (phylogeny) industry.
