What Line of Evidence is Strongest for Evolution?

swamidass · November 16, 2018, 9:29pm

This is a legitimate question. I just wish it could have arisen much earlier, perhaps here:

Mercer · November 16, 2018, 9:33pm

You could input random sequence data. You would almost certainly get several trees, all with nonsignificant (high) p-values. Taxa have absolutely nothing to do with this; that suggests that you really aren’t grasping that this is simply a mathematical process.

It is basically the probability of getting the tree to which it is attached over a tree derived from unrelated sequences, which is the null hypothesis.

T_aquaticus · November 16, 2018, 9:35pm

It’s a bit different. It is asking if chance could produce data that looks like common descent.

This is one of the basic concepts in statistics. For example, you may do an experiment where you try to determine if red heads are taller than people with blond hair. You survey 100 people, 50 with red hair and 50 with blond hair. When you crunch the numbers, you find that red heads are 1 inch taller on average than blonds. Have you proven that red heads are taller?

The question you have to ask is what are the chances of randomly selecting red heads that are slightly taller than the average red head, and/or randomly selecting blonds that are slightly shorter than the average blond? What you are asking is the probability of chance producing data where red heads are 1 inch taller on average. That is the basis of the famous Student’s t-test.
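
To make the "could chance produce this?" question concrete, here is a minimal Python sketch. Note that it uses a permutation test rather than Student's t-test itself: it shuffles the hair-color labels many times and counts how often chance alone gives a difference in mean height at least as large as the observed one. The heights are simulated, and the function names, sample values, and seeds are all hypothetical choices made for illustration.

```python
import random
import statistics


def mean_difference(a, b):
    """Difference between the mean of group a and the mean of group b."""
    return statistics.mean(a) - statistics.mean(b)


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Estimate how often shuffled group labels give a difference in means
    at least as large (in absolute value) as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean_difference(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean_difference(pooled[:len(a)], pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Simulated heights in inches (assumption: both groups are drawn from the
# SAME distribution, so any observed difference is due to chance alone).
data_rng = random.Random(42)
red = [data_rng.gauss(67.0, 3.0) for _ in range(50)]
blond = [data_rng.gauss(67.0, 3.0) for _ in range(50)]

print("observed difference:", round(mean_difference(red, blond), 2))
print("permutation p-value:", round(permutation_p_value(red, blond), 3))
```

A small p-value would mean chance rarely produces a difference that large; a large one means the observed difference is unremarkable under the null hypothesis.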

swamidass · November 16, 2018, 9:39pm

@T_aquaticus, I think the situation is far more complex than this.

There are several ways of computing a p-value, and each one might use a different null hypothesis.
There are several ways of constructing a tree, some of which do not look for nested clades.
There are several ways of assessing the tree-likeness of a dataset.
There are several ways the tree-likeness can be violated by known processes.

To exposit this fully, we probably need another conversation. Perhaps we should start that thread. The point, however, is that using the right software, we can test directly whether the data falls into nested clades. We can also test specific theories of special creation. The only theories of special creation that survive are those that are evidentially equivalent to common descent.

(NOTE: “mimic CD” is a biased description of these models)

Mung · November 16, 2018, 9:40pm

It was an example. The “taxon” would just be a label that is associated with a particular set of sequences. It’s the set of sequences that matters, and I understand that.

Mercer · November 16, 2018, 9:42pm

No, I’m pretty sure you’d only get lots of trees with unacceptably high p-values.

No, I’m pretty sure that’s not the case. If you want to see the reticulation at the nodes of a tree caused by hybridization and other phenomena, here’s at least one program for that–I haven’t used it.

Mercer · November 16, 2018, 9:44pm

In his/her defense, we’re trying to keep it simple as Mung is working from very basic misconceptions.

T_aquaticus · November 16, 2018, 9:45pm

I only have a simple understanding of the methods used for phylogenetics, so any correction is much appreciated.

Mercer · November 16, 2018, 9:45pm

Then please don’t bring it up.

Mung · November 16, 2018, 9:56pm

OK, I’m struggling with this. The null hypothesis is that the sequences are not related. Are randomly generated trees used for that? Randomly generated sequences? In order to compare the probabilities, some assumption has to be made about which sequences are related and which are not. Would that be correct?

I might need to download some code and have a look at it to see what is going on.

I work better with examples. I beg your indulgence.

swamidass · November 16, 2018, 10:06pm

I have found that ultimately works against understanding. By simplifying out the exceptions, we are vulnerable to @Pnelson claiming that homoplasy demonstrates common descent is false. It is better not to simplify, or these false objections will continue to confuse people no end.

One of the best pieces of advice about learning biology was from a computer science professor who began doing computational biology in the 1990s. Contrasting this with computer science, he explained there are rules in biology, but there are exceptions to every rule, and every exception is important. We should teach students the rules, but acknowledge up front that there are exceptions, and that the exceptions are important. We are not ignorant of them, but are only cutting them out to aid their learning.

Then let us start with teaching you how to run the programs yourself. Can you find one online that you feel you can work with?

Mung · November 16, 2018, 10:12pm

I doubt any are written in Ruby, which would be my preference, but I’ll spend some time looking. Barring that, either Python or R would be my next choice. I’m certain I can find something for either of those.

I wonder if there are any book/software packages that come together? Anyone?

ETA:

http://nbisweden.github.io/MrBayes/

But we may not want to go there right away.

Mung · November 16, 2018, 10:23pm

I’d prefer a simple program without a lot of bells and whistles and one in which the source code is available. If anyone has any suggestions.

swamidass · November 16, 2018, 10:29pm

MrBayes is great. Download it, go through the tutorials, and report back what you find out.

John_Harshman · November 16, 2018, 10:55pm

Next time, try actually saying that from the start. Saves time and effort.

John_Harshman · November 16, 2018, 11:03pm

Well, that’s not exactly what a bootstrap does. Instead it randomly resamples sites or characters, with replacement, to get a data set of the same size but of somewhat different composition from the original — i.e. missing some sites but having several copies of others — and analyzes that data set. It resamples many times, say 100. The bootstrap percentage of a node is the percentage of resampled data sets in which that node appears in the analysis. It’s a test, in other words, of the self-consistency of the data: do different samples of sites agree on the same tree, or do they display nested hierarchical structure? We would not expect random data to be consistent in that way.
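
The resampling described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how PHYLIP or any real package implements it: the four-taxon alignment is invented, and the "analysis" of each resampled data set is reduced to counting which of the three possible groupings has the most parsimony-informative sites. All names (`SPLITS`, `site_support`, `best_split`, `bootstrap_support`) are hypothetical.

```python
import random
from collections import Counter

# The three possible unrooted groupings of four taxa (indices 0..3 = A..D).
SPLITS = {
    "AB|CD": ((0, 1), (2, 3)),
    "AC|BD": ((0, 2), (1, 3)),
    "AD|BC": ((0, 3), (1, 2)),
}


def site_support(column):
    """Return the grouping that one site supports, or None if uninformative."""
    for name, ((i, j), (k, m)) in SPLITS.items():
        if (column[i] == column[j] and column[k] == column[m]
                and column[i] != column[k]):
            return name
    return None


def best_split(columns):
    """Grouping supported by the most sites; ties count as unresolved (None)."""
    counts = Counter(s for s in map(site_support, columns) if s is not None)
    if not counts:
        return None
    top = max(counts.values())
    winners = [name for name, c in counts.items() if c == top]
    return winners[0] if len(winners) == 1 else None


def bootstrap_support(alignment, n_replicates=100, seed=0):
    """Percentage of resampled data sets in which each grouping wins."""
    rng = random.Random(seed)
    columns = list(zip(*alignment))  # one tuple of states per site
    tally = Counter()
    for _ in range(n_replicates):
        # Resample sites with replacement, keeping the data set the same size.
        resampled = rng.choices(columns, k=len(columns))
        tally[best_split(resampled)] += 1
    return {name: 100.0 * tally[name] / n_replicates for name in SPLITS}


# Invented alignment for taxa A, B, C, D (assumption, for illustration only).
alignment = [
    "AAAAAAAAACCC",  # A
    "AAAAAAAAGCCC",  # B
    "GGGGGGGGACCC",  # C
    "GGGGGGGGGCCC",  # D
]

print(bootstrap_support(alignment))
```

Because most informative sites in this toy alignment agree with each other, the same grouping wins in nearly every replicate. Data with no consistent signal would spread the support across groupings instead.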

John_Harshman · November 16, 2018, 11:31pm

Joe Felsenstein’s PHYLIP package would fit your needs there. Free! Online!

John_Harshman · November 16, 2018, 11:39pm

Let me present a very simplified explanation of how most phylogeny programs work: First, create a tree, possibly with reference to the data, possibly not. Second, evaluate the fit of the data to the tree using some optimality criterion, i.e. some algorithm that produces a number that describes the fit. The two most common criteria are minimum inferred length (the minimum number of changes that have to take place over the tree in order to explain the data) and likelihood (the probability of observing the data given the tree and a particular model of evolution). Rinse and repeat for a whole bunch of trees, keeping the tree that fares best under the optimality criterion.

OK, that’s the “best” tree. But, especially under likelihood criteria, even random data are probably going to give you a single best tree. The question is whether that tree is really better than the others. That’s when statistical testing comes in. The bootstrap, which I’ve already described, is one such test. Likelihood ratio tests are another.

If, given some test, the tree we have turns out to be a significantly better fit to the data than other trees, that’s evidence that the tree is real, and so is common descent. We do not expect data to display this hierarchical structure without common descent. Did that help?
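
As a rough illustration of the "create a tree, score it, repeat, keep the best" loop under the minimum-length criterion, here is a small Python sketch. It counts changes with Fitch's algorithm, a standard way to get the minimum number of changes for one site on a given binary tree. The alignment is invented, trees are written as nested tuples, and the function names are hypothetical. With four taxa there are only three topologies, so the sketch tries all of them; real programs search much larger tree spaces heuristically.

```python
def fitch(tree, states):
    """Return (possible_states, changes) for one site on a binary tree.

    `tree` is either a taxon name (leaf) or a pair (left, right).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: at least one extra change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Minimum number of changes needed to explain every site on the tree."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        states = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(tree, states)[1]
    return total


# Invented alignment (assumption, for illustration only).
alignment = {
    "A": "AAAAAAAAACCC",
    "B": "AAAAAAAAGCCC",
    "C": "GGGGGGGGACCC",
    "D": "GGGGGGGGGCCC",
}

# All three topologies for four taxa.
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

lengths = {tree: tree_length(tree, alignment) for tree in candidates}
best = min(lengths, key=lengths.get)
for tree, length in lengths.items():
    print(tree, length)
print("best tree:", best)
```

This only finds the best-scoring tree. Whether that tree is significantly better than the alternatives is a separate question for a test such as the bootstrap.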

Incidentally, MrBayes doesn’t work that way, though the way it does work is analogous: it too picks a tree, evaluates it, and wanders around the space of all trees making these optimality (in this case likelihood) evaluations. But what it does with those evaluations is rather more complicated than I’ve described.

swamidass · November 16, 2018, 11:45pm

Don’t forget to explain discordant mutations.

John_Harshman · November 16, 2018, 11:47pm

Who are you talking to?
