Genetic evidence for common ancestry (split-off from "Dating the Noachian Deluge")

misterme987 · September 10, 2022, 10:25pm

Even if you use an outgroup, doesn’t including that in the tree make the assumption that the outgroup shares common ancestry with the organisms within the group?

Rumraket · September 10, 2022, 10:27pm

No, because it doesn’t actually change what the similarity scores will be when no sequence from the other clade is used in rooting. It is only if a sequence from the other clade (or, I think, another one more similar to it) is used as an outgroup that the result will be forced. Then it can’t help but make internal nodes closer to the outgroup more similar to it, and therefore also to other sequences more similar to the outgroup.

But as long as you create each alignment and infer a tree independently, where you put the root won’t change pairwise similarity scores of the nodes. It will only tell you which one you should consider more ancestral.

John_Harshman · September 10, 2022, 10:39pm

All you would need to do, if you’re comparing group A to group B, is to pick an outgroup that’s closer to group A for group A and closer to group B for group B. Or you could try an outgroup that’s outside the combined clade that includes both A and B.

Sure does, but what’s the problem with that? You’re assuming common ancestry within the ingroup already. The important thing is not to use common ancestry between groups A and B to inform the states at the root nodes.

Rumraket · September 10, 2022, 10:43pm

Ahh, I see what you mean now. You could use a different outgroup for each.
Edit: No, I still think the problem remains. It seems to me that if you use outgroup sequences from outside of the two clades you want to compare, and you know those outgroup sequences, while more similar to A than to B, are still more similar to B than the sequences in A are, then including them in the A tree will unavoidably introduce some ancestral convergence towards B too, simply because those outgroup sequences have “a bit more of B” than the sequences in A do.

I can’t work out how this is not poisoning the result if ancestral convergence is what you’re testing for.
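A toy parsimony sketch of the worry above, assuming everything here is hypothetical: the tip names, the four-site sequences, and an outgroup that is simply assumed to carry clade B’s states. Where A’s two subclades disagree, the root of A is ambiguous on the ingroup alone; attaching the B-like outgroup breaks each tie in B’s favour. This uses the max-count parsimony rule (Fitch for two children, Hartigan’s generalization for more), not the likelihood reconstruction discussed later in the thread.

```python
from collections import Counter

def root_states(tree, site):
    """Most-parsimonious state set at the root of `tree` for one site.

    Trees are nested tuples of tip names. At each node we keep the states
    found in the largest number of child sets (Fitch's rule for two
    children, Hartigan's generalization for more).
    """
    if isinstance(tree, str):                 # leaf: its observed state
        return {site[tree]}
    counts = Counter()
    for child in tree:
        counts.update(root_states(child, site))
    best = max(counts.values())
    return {state for state, n in counts.items() if n == best}

def reconstruct(tree, seqs):
    """Per-site root state sets as sorted strings ('CG' means ambiguous C/G)."""
    length = len(next(iter(seqs.values())))
    return ["".join(sorted(root_states(tree, {k: s[i] for k, s in seqs.items()})))
            for i in range(length)]

# Hypothetical clade A: two subclades that disagree at sites 3 and 4.
clade_a = {"a1": "ACGT", "a2": "ACGT", "a3": "ACCA", "a4": "ACCA"}
# Hypothetical outgroup assumed to share clade B's states at those sites.
b_like_outgroup = {"out": "ACCA"}

ingroup_tree = (("a1", "a2"), ("a3", "a4"))
without_out = reconstruct(ingroup_tree, clade_a)
with_out = reconstruct(ingroup_tree + ("out",), {**clade_a, **b_like_outgroup})
print(without_out)  # ['A', 'C', 'CG', 'AT']: root of A is ambiguous at sites 3 and 4
print(with_out)     # ['A', 'C', 'C', 'A']: the outgroup resolves both toward B
```

The bias only appears at sites the ingroup leaves unresolved, which is why the choice of outgroup matters most for exactly the deep nodes being compared.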

Rumraket · September 10, 2022, 11:20pm

On a related note, you could make/use a “guide tree” inferred using an entirely different gene (or set of loci or whatever) to pick an approximate root position for each of your two subtrees.

Then if the two subtrees show ancestral convergence, with a root position (a direction of ancestrality) determined from a tree inferred from completely different gene sequences, that just makes it all the more in need of explanation why they should show ancestral convergence if common descent is false.

Edit: And in any case, simply picking an internal branch to root on is only a cosmetic change to the tree that makes it easier to see what the direction of time on the tree is (and therefore what is more ancestral, further in the past). It doesn’t affect the actual sequences you infer at each internal node.
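One simple way to see that rooting is cosmetic: the unweighted (Fitch) parsimony score of a tree is the same wherever the root is placed. For time-reversible likelihood models the analogous result is Felsenstein’s pulley principle. The sketch below is a parsimony illustration only, with hypothetical taxa a to e and made-up sequences; it roots one unrooted topology on four different branches and scores each.

```python
def fitch(tree, site):
    """Fitch (1971) pass on a binary subtree at one site: (state set, changes)."""
    if isinstance(tree, str):                 # leaf: observed state, no changes
        return {site[tree]}, 0
    (left, lc), (right, rc) = (fitch(child, site) for child in tree)
    common = left & right
    if common:                                # children agree: no extra change
        return common, lc + rc
    return left | right, lc + rc + 1          # children disagree: one change

def parsimony_score(tree, seqs):
    """Sum of Fitch changes over all sites of pre-aligned, equal-length sequences."""
    length = len(next(iter(seqs.values())))
    return sum(fitch(tree, {name: s[i] for name, s in seqs.items()})[1]
               for i in range(length))

seqs = {"a": "AAGT", "b": "AAGC", "c": "ACGT", "d": "CCTT", "e": "CCTC"}

# One unrooted topology (ab | c | de) rooted on four different branches.
rootings = {
    "ab|cde branch": (("a", "b"), ("c", ("d", "e"))),
    "abc|de branch": ((("a", "b"), "c"), ("d", "e")),
    "tip a": ("a", ("b", ("c", ("d", "e")))),
    "tip c": ("c", (("a", "b"), ("d", "e"))),
}
scores = {where: parsimony_score(t, seqs) for where, t in rootings.items()}
print(scores)  # every rooting gives the same score (5 changes)
```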

John_Harshman · September 10, 2022, 11:43pm

Who said anything about similarity? It’s relationships that count. There’s no bias unless you pick an outgroup for A that’s closer to B. At any rate, you can only infer sequences at an internal node of some kind, so you would need an outgroup to infer the sequence at the root node of any clade.

misterme987 · September 11, 2022, 12:33am

Just for reference, here is what White, Zhong, and Penny say about their ancestral sequence reconstruction technique:

For Step 1 we take two subgroups of taxa X and Y (see Figure 1) that on independent evidence have non-overlapping subtrees; that is, they are natural subgroups (or clades). For example, with chloroplast sequences, we select subgroups based on nuclear and/or mitochondrial data [7], [8], and only later check that the subgroups are also supported by the chloroplast sequences. For each subgroup we independently align the sequences (Step 2); infer a subtree (Step 3); and infer the ancestral sequences ax and ay for the deepest nodes of each subtree (Step 4). For this step we use PAML [9], which is a well-established method that is robust to small changes in the tree [10]. Our test is conservative in that ancestral sequences are estimated independently: information from subgroup X is not used to estimate the ancestral sequence for subgroup Y, nor vice versa. We used the cpREV model [11] for inferring chloroplast trees, the WAG model [12] for nuclear proteins, and the mtREV24 model [13] for animal mitochondria.

Frankly, I can’t make heads or tails of this since I’m not a phylogeneticist. It seems to me that they don’t create a tree combining both subgroups in order to determine the root position (contra Rumraket) since they say “ancestral sequences are estimated independently,” but I’m not totally sure. Maybe @John_Harshman can shed some light on it.

John_Harshman · September 11, 2022, 2:11am

This doesn’t describe how they determined what the deepest nodes are or (which is the same thing) how they rooted their subtrees. Perhaps they didn’t estimate sequences of the root nodes but only of the nodes closest to the root? That wouldn’t require an outgroup, and you could just assume the root based on that other data.

Rumraket · September 11, 2022, 7:14am

But that’s my point. If you use an outgroup for A that is guaranteed to give ancestral convergence towards B, the result you’re testing for, it doesn’t seem like much of a test then.

Since the test is supposed to be for ancestral convergence between clades, when there supposedly is agreement that the members of each clade individually share common ancestry, you could just midpoint root each clade, or use some other criterion to pick an internal branch to root on, in each clade. If they don’t share common ancestry between the clades there’s no reason to expect them to converge, and then you haven’t used an outgroup in your subtree that could put bias in the data.

They even say this in the paper:

Estimating the Root of the Two Subtrees

There are several ways of estimating the root of the two subtrees, but in practice it appears to make little difference which of several methods we use. In the chloroplast example, the root of each subtree can be inferred from nuclear or mitochondrial DNA sequences (not chloroplast), and so is independent of the chloroplast data we use. This gives the position of the root in each subtree from prior information; alternatively they can be independently estimated by ‘midpoint rooting’. This can be done either by selecting the midpoint of the longest path, or the internal branch with the longest average of paths passing through it [16]. In practice, we take the node closest to the mid-point because we are estimating nodal sequences. There does appear to be an acceleration of the rate of evolution in the grasses [17], but, again in practice, this appeared to have little effect. The sequence of the root of the two subtrees appears to be quite robust.

I suppose when they say that “This gives the position of the root in each subtree from prior information” they mean they do what I described in the figure above: it tells them which internal branch to root on when they infer the subtree without an outgroup. But rather than make a big tree of both clades A and B to determine the root position of each, they just make a tree of A with mitochondrial data. Then they make a new tree using only clade A sequences and root it at the position implied by the tree built from mitochondrial sequences. In that case I agree there is no bias.

I see that in practice they’re just midpoint rooting and picking the nearest node as ancestral.
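A minimal sketch of the “midpoint of the longest path” variant, followed by the “node closest to the mid-point” step the paper describes. The tree, its node names (A to D, u, v), and its branch lengths are hypothetical, and the brute-force search over tips is only meant for small trees; the paper’s other variant (longest average of paths through a branch) is not shown.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Undirected weighted tree: node -> {neighbour: branch length}."""
    adj = defaultdict(dict)
    for u, v, length in edges:
        adj[u][v] = length
        adj[v][u] = length
    return dict(adj)

def distances_from(adj, start):
    """Path length and predecessor of every node, walking out from `start`."""
    dist, prev = {start: 0.0}, {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in adj[node].items():
            if nbr not in dist:               # a tree has one path to each node
                dist[nbr] = dist[node] + length
                prev[nbr] = node
                stack.append(nbr)
    return dist, prev

def midpoint_node(adj):
    """Internal node closest to the midpoint of the longest tip-to-tip path."""
    tips = [n for n, nbrs in adj.items() if len(nbrs) == 1]
    longest, dist, prev, end = -1.0, None, None, None
    for tip in tips:
        d, p = distances_from(adj, tip)
        far = max(tips, key=d.get)
        if d[far] > longest:
            longest, dist, prev, end = d[far], d, p, far
    # Walk back along the longest path, keeping only internal nodes.
    path, node = [], end
    while node is not None:
        if len(adj[node]) > 1:
            path.append(node)
        node = prev[node]
    half = longest / 2
    return min(path, key=lambda n: abs(dist[n] - half)), longest

# Hypothetical branch lengths; u and v are internal nodes.
tree = build_adjacency([("A", "u", 1), ("B", "u", 2), ("u", "v", 3),
                        ("C", "v", 1), ("D", "v", 6)])
print(midpoint_node(tree))  # ('v', 11.0): longest path B-u-v-D, midpoint 5.5 from B
```

Taking the nearest internal node rather than the exact midpoint matches the quoted reasoning: sequences are estimated at nodes, not at points partway along a branch.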

evograd · September 11, 2022, 2:30pm

That seems to be what they did, given this quote from the caption of Figure 1:

We use two natural subgroups (X and Y), independently align the sequences for the species in each subgroup, independently determine the optimal tree for each subgroup, independently infer the ancestral sequences ax and ay on the optimal subtrees (in practice the sequence at the nearest node to the root of the subtree is estimated), and finally measure the pairwise alignment score between the ancestral sequences, s(ax,ay).

and the “Estimating the Root of the Two Subtrees” paragraph quoted above.
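The last step in the caption, measuring s(ax,ay), can be sketched as a comparison between the two reconstructed ancestors and the extant members of each subgroup. This is a simplification under stated assumptions: percent identity on already-aligned sequences stands in for the paper’s pairwise alignment score, and every sequence below is a made-up protein fragment rather than a real PAML reconstruction.

```python
from itertools import product
from statistics import mean

def identity(s1, s2):
    """Fraction of identical sites; assumes the sequences are already aligned."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)

def convergence_check(ax, ay, xs, ys):
    """Compare s(ax, ay) with the mean score over all extant X-Y pairs."""
    s_ancestral = identity(ax, ay)
    s_extant = mean(identity(x, y) for x, y in product(xs, ys))
    return s_ancestral, s_extant

# Hypothetical reconstructed ancestors and extant members of subgroups X and Y.
ax, ay = "MKTAYIAK", "MKTAYLAK"
xs = ["MKSAYIGK", "MRTAFIAK"]
ys = ["MKTVYLAR", "QKTAYLSK"]

s_anc, s_ext = convergence_check(ax, ay, xs, ys)
print(s_anc, s_ext)  # 0.875 0.40625: the ancestors are more alike than their descendants
```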

Giltil · September 11, 2022, 3:56pm

But if the estimated ancestral nodes are more similar to each other than the extant species are, does it logically follow that the two clades necessarily have a common ancestor? Can’t we imagine that the two groups come from two different ancestors, created separately with very similar genes, which then diverged over time?

Rumraket · September 11, 2022, 4:53pm

Yes. The mere fact of convergence between two clades does not prove common ancestry. Of course it doesn’t follow necessarily that there is a common ancestor. Everything can in principle be explained away. Nevertheless, it is required on common descent but not on separate ancestry, because on separate ancestry they could just as well have been created with similarity equal to the average similarity of the two groups. That means ancestral convergence is more probable a priori on common descent than on separate ancestry.
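A toy simulation of why convergence is expected on common descent, under loudly stated assumptions: a random root sequence, a simple substitution model in which each site changes to a different random base with probability p per branch, and arbitrary parameter values. It shows only that, when X and Y share a root, their ancestors ax and ay come out more similar than their extant descendants. It does not model any separate-ancestry scenario; how often that pattern would appear there depends entirely on what one assumes about how the separate ancestors were created.

```python
import random

BASES = "ACGT"

def evolve(seq, p, rng):
    """Toy model: each site changes to a different random base with probability p."""
    return "".join(rng.choice(BASES.replace(b, "")) if rng.random() < p else b
                   for b in seq)

def identity(s1, s2):
    """Fraction of identical sites between two equal-length sequences."""
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)

def common_descent_run(length=2000, p_stem=0.1, p_tip=0.2, n_tips=3, seed=1):
    """One simulated history: root -> ancestors ax, ay -> extant tips of X and Y."""
    rng = random.Random(seed)
    root = "".join(rng.choice(BASES) for _ in range(length))
    ax, ay = evolve(root, p_stem, rng), evolve(root, p_stem, rng)
    xs = [evolve(ax, p_tip, rng) for _ in range(n_tips)]
    ys = [evolve(ay, p_tip, rng) for _ in range(n_tips)]
    s_ancestral = identity(ax, ay)
    s_extant = sum(identity(x, y) for x in xs for y in ys) / n_tips ** 2
    return s_ancestral, s_extant

s_anc, s_ext = common_descent_run()
print(f"ancestors {s_anc:.3f} vs extant pairs {s_ext:.3f}")
```

With these settings the expected identities are about 0.81 for the ancestors and about 0.55 for extant X-Y pairs, so any single seeded run should land near those values.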

A separate ancestry proponent might say something like: the common ancestor of all felines was created more similar to the common ancestor of all canines, and that explains why you get ancestral convergence between these two clades.

However, you can also show ancestral convergence between more inclusive clades (that is, show that there is a nested hierarchy, and that it goes beyond any two groups a proponent of separate ancestry would argue were created more similar in the past). For example, that there is ancestral convergence between the clade including all mammals and the clade including all birds, say. And you can keep going in this way to show that the root nodes of increasingly inclusive clades become more and more similar the further back in time you go. Which is really just another way of showing that there is a nested hierarchy.

It becomes really strange to say that a clade that includes a node representing the common ancestor of felines and the common ancestor of canines, should also converge towards a clade containing the nodes representing the common ancestor of rodents and the common ancestor of primates.

And so on.

John_Harshman · September 11, 2022, 6:47pm

True, but you can repeat the process at a deeper level and find the same situation. Eventually you end up with that nested hierarchy, whereas your hypothesis predicts a star tree.

misterme987 · September 11, 2022, 9:06pm

Can you explain the difference between a star tree and the tree that we see? Also, why does the common design hypothesis predict a star tree as opposed to the tree that we see?

Rumraket · September 11, 2022, 9:41pm

[Image: star vs tree]

The “created with identical genes” model of independent creation is very strange. If all the different unrelated clades, each with its own common ancestor, were created with identical genes, then any tree inferred from these genes would basically consist of lots of trees all connecting directly to the same universally shared ancestor. Basically each branch on the star above would be its own clade, like this:

[Image: star of clades]

So basically a tree rooted in a giant polytomy.
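One way to make “star vs nested hierarchy” concrete is the four-point condition for four taxa. Here A to D are hypothetical stand-ins for the ancestral nodes of four clades (say felines, canines, rodents and primates), and the distances are made up. On a star, all three pairwise sums are equal however long each stem is, so no grouping is supported; on a nested tree such as ((A,B),(C,D)), one sum is smaller than the other two, and half the gap is the length of the internal branch joining the pairs.

```python
def four_point(d):
    """Four-point test for four taxa A, B, C, D.

    `d` maps frozenset({x, y}) to the distance between x and y. For
    tree-like (additive) distances, the pairing with the smallest sum is
    the split the tree supports, and its gap to the next sum is twice the
    internal branch length. A star tree gives three equal sums.
    """
    sums = {
        "AB|CD": d[frozenset("AB")] + d[frozenset("CD")],
        "AC|BD": d[frozenset("AC")] + d[frozenset("BD")],
        "AD|BC": d[frozenset("AD")] + d[frozenset("BC")],
    }
    (split, smallest), (_, runner_up) = sorted(sums.items(), key=lambda kv: kv[1])[:2]
    internal = (runner_up - smallest) / 2
    return (split if internal > 0 else "star (no split)"), internal

pairs = ["AB", "AC", "AD", "BC", "BD", "CD"]
# Star: every ancestral node hangs directly off one shared centre (unit stems).
star = {frozenset(p): 2.0 for p in pairs}
# Nested: ((A,B),(C,D)) with unit stems and an internal branch of length 2.
nested = {frozenset(p): (2.0 if p in ("AB", "CD") else 4.0) for p in pairs}

print(four_point(star))    # ('star (no split)', 0.0)
print(four_point(nested))  # ('AB|CD', 2.0)
```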

misterme987 · September 11, 2022, 9:44pm

Ah, thanks. I can understand how if each ‘kind’ began at a single starting point, then the ancestral nodes for each ‘kind’ or family should be identical, which would lead to that star pattern. Evidently, that’s not what we see, so the “identical starting point” hypothesis of common design fails, leaving only common descent to explain the data.

colewd · September 11, 2022, 9:53pm

Hi Rum
Who is advocating this model?

misterme987 · September 11, 2022, 9:59pm

Hi @colewd,

You are. As we’ve explained several times, the only ways to explain the ancestral convergence demonstrated by White, Zhong, and Penny (2013) without invoking common ancestry are either: (1) the hypothesis that each ‘kind’ was created with identical or nearly identical copies of the same genes; or (2) the untestable hypothesis that a duplicitous creator created each ‘kind’ to look like it shares common ancestry with other ‘kinds.’

Since you presumably don’t like the duplicitous creator hypothesis – I don’t blame you for that, that’s a terrible idea – the only way for you to explain the data without common ancestry is to assume that each ‘kind’ was created with identical or nearly identical genes. But that predicts the “star tree” pattern, as @Rumraket and @John_Harshman explained, which we don’t see. So the only plausible way to explain the data is common ancestry.

Rumraket · September 11, 2022, 10:17pm

It’s not really important who is advocating for it. The point is just to elaborate on what we should expect to see given different possible models. Consider it an exercise in trying to work out what would be the consequences, with respect to the evidence, given different models.

colewd · September 11, 2022, 10:19pm

Do you agree with what Andrew is saying here?
