What Line of Evidence is Strongest for Evolution?

swamidass · November 16, 2018, 11:48pm

You. Discordant mutations are a way of assessing how tree-like a set of sequences is.

John_Harshman · November 16, 2018, 11:55pm

Not sure what method you’re talking about there. In the parsimony criterion, inferred homoplasy increases the length of the tree; since the shortest tree is best, you end up with the tree that requires the least homoplasy from the data. In that sense, inferred homoplasy would be “discordant” mutations. In the likelihood criterion, some homoplasy actually makes the data, given the tree, more likely, as zero homoplasy would be extremely unlikely on a tree of any reasonable length. So it’s unclear what “discordant” would mean in that context. Too little homoplasy or too much?

Lineage sorting, horizontal transfer, and concerted homoplasy are whole separate questions. Most phylogenetic algorithms don’t take them into account. There are methods that do, though none handles all of them together.
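For readers who want to see the parsimony criterion above in concrete terms, here is a minimal sketch in Python. It assumes rooted binary trees written as nested tuples and uses Fitch's small-parsimony counting. The toy alignment, the taxon names, and the helper names (`fitch_site`, `tree_length`, `min_changes`) are made up for illustration and do not come from any particular phylogenetics package. Inferred homoplasy is taken here as tree length minus the minimum number of changes the data could require on any tree.

```python
# Toy illustration only: the alignment and tree shapes are invented.

def fitch_site(tree, states):
    """Fitch pass for one site: return (possible state set, change count)."""
    if isinstance(tree, str):              # leaf: its observed state, no changes
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:                             # children agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def tree_length(tree, alignment):
    """Parsimony length: total inferred changes summed over all sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        states = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total

def min_changes(alignment):
    """Fewest changes any tree could need: (distinct states - 1) per site."""
    n_sites = len(next(iter(alignment.values())))
    return sum(len({seq[i] for seq in alignment.values()}) - 1
               for i in range(n_sites))

alignment = {"A": "AAGT", "B": "AAGC", "C": "CGGT", "D": "CGTC"}
candidates = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}

lengths = {name: tree_length(t, alignment) for name, t in candidates.items()}
homoplasy = {name: n - min_changes(alignment) for name, n in lengths.items()}
best = min(lengths, key=lengths.get)       # shortest tree is preferred

for name in sorted(lengths, key=lengths.get):
    print(name, "length", lengths[name], "extra steps", homoplasy[name])
```

On this toy data the shortest tree still needs one extra step, which matches the point that the preferred tree is the one requiring the least homoplasy, not necessarily none.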

Patrick · November 17, 2018, 12:19am

And I do also. @mung immerse yourself in the technology. You will learn something grand.

Mung · November 17, 2018, 12:49am

This is excellent. Thanks John.

The various trees that the program creates, I have referred to them as “phylogenetic trees.” Was I mistaken in using that terminology to refer to the generated trees?

Is it only after a tree has been selected which “turns out to be a significantly better fit to the data than other trees” that we have a phylogenetic tree?

How many trees do we have and how many phylogenetic trees do we have and at what point does a tree become not just a tree but a phylogenetic tree?

This may help clear up some matters.

Mung · November 17, 2018, 12:52am

I’d just like to express appreciation for everyone who has stuck with it. We’ve had some bumps but we’re actually having what I think is a constructive conversation that can benefit people who may not be familiar with the science and are having it explained by scientists who are.

John_Harshman · November 17, 2018, 12:56am

No, that’s what we call them. But of course they’re actually only estimates of the real phylogenetic tree, the one that actually shows the relationships among species, corresponds to the real history, or however you want to say that. Don’t get hung up on terminology. If a tree is significantly better than others, that gives us more confidence that the estimate is correct, and incidentally that common descent is a real thing.

Mung · November 17, 2018, 1:24am

I’m trying not to, but others seem to be concerned that I have some mistaken notions. To your credit you are helping clarify matters.

So when I say that a phylogenetic tree is not prima facie evidence for common descent, you and I both understand what I am talking about and you agree with me on that point?

Mung · November 17, 2018, 1:33am

I was considering starting a topic at TSZ to talk to Joe. I’m interested in the logic that goes into writing such a program. In a sense that’s what I’ve been trying to get at in this thread. It’s not about whether CD is true or false, it’s about what goes into writing a program. Yeah, yeah, I know it doesn’t come off that way.

Mung · November 17, 2018, 1:50am

Books I am using as reference material:

Molecular Evolution: A Phylogenetic Approach

Molecular Evolution and Phylogenetics

Reconstructing the Past: Parsimony, Evolution, and Inference

John_Harshman · November 17, 2018, 3:35am

Yes, that’s right.

Rumraket · November 17, 2018, 10:38am

I took a short online course in computational molecular evolution a few years ago, and the instructional videos are available online here, where some common tree building algorithms are explained pretty well in my opinion:
Maximum Parsimony
Maximum Likelihood
There are more videos hosted by that YouTube channel on different algorithms and other aspects of molecular evolution, such as genetic drift, detection of selection, etc.

As a layman it’s helped me understand a lot of things much better and correct some misconceptions I’ve had.

BruceS · November 17, 2018, 5:24pm

When you say “writing such a program”, is it correct to say you do not mean the design, coding, or UI, but rather the requirements and in particular the biological models and the associated statistics? It’s not the program itself that is helpful, but rather the underlying science and (shudder) math. Is that characterization fair?

I’d be leery of TSZ. The resulting comments likely would just degenerate into something like the Nested Hierarchies thread from July that had a very low signal-to-noise ratio in its 1000+ comments.

Maybe Joe would post something here, where the moderators try to manage that by shunting such comments to side threads.

Mung · November 19, 2018, 1:11am

Yes. Being able to look at the source code, or write the code, would just be a tool for understanding.

Are the phylogenetic trees created first, and then a p-value is associated with the phylogenetic trees? Is every single phylogenetic tree that is generated considered evidence for common descent, or just some of them, that meet or exceed some other criteria?

I think John has answered this and that it’s the latter that is the case. So I don’t believe I have been at all “working from very basic misconceptions.”

swamidass · November 19, 2018, 1:13am

Great idea. Would you like to invite him? We can keep things on track here.

swamidass · November 19, 2018, 1:14am

Those questions are not answerable in a direct way because there are multiple methods. They each work in different ways, and might yield different answers for those questions.

Mung · November 19, 2018, 1:21am

I agree. Not every method uses p-values, for example. But surely they all use some means of ranking. We can move to a higher level of abstraction. It would be interesting to see whether a common abstraction can be found.

John_Harshman · November 19, 2018, 6:17am

I do not in fact know of a method that assigns a p value to a tree. Nor do I know of a phylogenetic method that uses a null model to assess any single tree.

Mercer · November 19, 2018, 5:40pm

Correct. The direct outputs don’t contain p values, they contain scores. But the scores can be translated to p values:

John_Harshman · November 19, 2018, 5:44pm

I don’t believe either of those papers is about phylogenetic analysis, which is the supposed subject.

swamidass · November 19, 2018, 5:47pm

As I understand it, it is most common to establish confidence in specific nodes, not the whole tree.
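One common way to attach confidence to individual nodes is bootstrap support: the alignment columns are resampled, a tree is estimated from each replicate, and each clade is scored by the fraction of replicate trees that contain it. The sketch below, in Python, covers only that last counting step. It assumes the replicate trees already exist as rooted nested tuples; the four replicates shown are invented for illustration, and `collect_clades` and `clade_support` are hypothetical helper names, not functions from any phylogenetics library.

```python
from collections import Counter

def collect_clades(tree, found):
    """Add the leaf set under every internal node of `tree` to `found`."""
    if isinstance(tree, str):              # leaf
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, found) for child in tree))
    found.add(leaves)
    return leaves

def clade_support(replicate_trees):
    """Fraction of replicate trees in which each clade appears."""
    counts = Counter()
    for tree in replicate_trees:
        found = set()
        collect_clades(tree, found)
        counts.update(found)               # each clade counted once per tree
    n = len(replicate_trees)
    return {clade: count / n for clade, count in counts.items()}

# Invented replicate trees (assumption): three agree, one disagrees.
replicates = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
]

support = clade_support(replicates)
for clade, value in sorted(support.items(), key=lambda kv: -kv[1]):
    print(sorted(clade), value)
```

Each value is a per-node score, so one tree can carry some well-supported nodes and some poorly supported ones. The clade containing every taxon always scores 1.0 and carries no information.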
