Even if you use an outgroup, doesn’t including that in the tree make the assumption that the outgroup shares common ancestry with the organisms within the group?
No, because it doesn’t actually change what the similarity scores will be when no sequence from the other clade is used in rooting. It is only if a sequence from the other clade (or, I think, another one more similar to it) is used as an outgroup that the result will be forced. Then it can’t help but make internal nodes closer to the outgroup more similar to it, and therefore also to other sequences more similar to the outgroup.
But as long as you create each alignment and infer a tree independently, where you put the root won’t change pairwise similarity scores of the nodes. It will only tell you which one you should consider more ancestral.
All you would need to do, if you’re comparing group A to group B, is to pick an outgroup that’s closer to group A for group A and closer to group B for group B. Or you could try an outgroup that’s outside the combined clade that includes both A and B.
Sure does, but what’s the problem with that? You’re assuming common ancestry within the ingroup already. The important thing is not to use common ancestry between groups A and B to inform the states at the root nodes.
Ahh, I see what you mean now. You could use a different outgroup for both.
Edit: No, I still think the problem remains. It seems to me that if you use outgroup sequences from outside of the two clades you want to compare, and you know those outgroup sequences are more similar to A than to B, yet more similar to B than the sequences in A are, then including them in the A tree will unavoidably introduce some ancestral convergence towards B too, simply because those outgroup sequences have “a bit more of B” than the sequences in A do.
I can’t work out how this is not poisoning the result if ancestral convergence is what you’re testing for.

Oh, I guess I misunderstood what they were saying. So doesn’t that seem to presuppose common ancestry? Or am I missing something?
On a related note, you could make/use a “guide tree” inferred using an entirely different gene (or set of loci or whatever) to pick an approximate root position for each of your two subtrees.
Then if the two subtrees show ancestral convergence, with a root position (a direction of ancestrality) determined from a tree inferred from completely different gene sequences, that just makes it all the more in need of explanation why they should show ancestral convergence if common descent is false.
Edit: And in any case, simply picking an internal branch to root on is only a cosmetic change to the tree that makes it easier to see what the direction of time on the tree is (and therefore what is more ancestral, further in the past). It doesn’t affect the actual sequences you infer at each internal node.
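A minimal sketch of that point, assuming a toy four-taxon tree with made-up branch lengths (the names `build`, `distance`, and `root_on_edge` are hypothetical helpers, not from any phylogenetics package). It only shows that placing a root on a branch leaves every path length between existing nodes unchanged. That the inferred sequences at those nodes are also unchanged additionally assumes a time-reversible substitution model.

```python
from itertools import combinations

def build(edges):
    """Build an undirected tree as {node: {neighbor: branch_length}}."""
    tree = {}
    for u, v, w in edges:
        tree.setdefault(u, {})[v] = w
        tree.setdefault(v, {})[u] = w
    return tree

def distance(tree, start, goal):
    """Sum of branch lengths along the unique path between two nodes."""
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for nb, w in tree[node].items():
            if nb != parent:
                stack.append((nb, node, dist + w))
    raise ValueError("nodes are not connected")

def root_on_edge(tree, u, v, fraction=0.5):
    """Return a copy of the tree with a 'root' node inserted on edge (u, v)."""
    rooted = {n: dict(nbrs) for n, nbrs in tree.items()}
    w = rooted[u].pop(v)
    rooted[v].pop(u)
    rooted["root"] = {u: w * fraction, v: w * (1 - fraction)}
    rooted[u]["root"] = w * fraction
    rooted[v]["root"] = w * (1 - fraction)
    return rooted

def all_distances(tree, nodes):
    """Pairwise path lengths, rounded to absorb floating-point noise."""
    return {(a, b): round(distance(tree, a, b), 9)
            for a, b in combinations(nodes, 2)}

# Hypothetical tree: leaves A-D, internal nodes n1 and n2.
unrooted = build([("A", "n1", 0.1), ("B", "n1", 0.2), ("n1", "n2", 0.3),
                  ("C", "n2", 0.4), ("D", "n2", 0.5)])
nodes = ["A", "B", "C", "D", "n1", "n2"]

rooted_internal = root_on_edge(unrooted, "n1", "n2")
rooted_on_A = root_on_edge(unrooted, "A", "n1", fraction=0.25)

print(all_distances(unrooted, nodes) == all_distances(rooted_internal, nodes))
print(all_distances(unrooted, nodes) == all_distances(rooted_on_A, nodes))
```

Both rootings give the same distance table as the unrooted tree. The root only adds a direction of time.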

It seems to me that if you use outgroup sequences from outside of the two clades you want to compare, and you know those outgroup sequences are more similar to A than to B, yet more similar to B than the sequences in A are, then including them in the A tree will unavoidably introduce some ancestral convergence towards B too, simply because those outgroup sequences have “a bit more of B” than the sequences in A do.
Who said anything about similarity? It’s relationships that count. There’s no bias unless you pick an outgroup for A that’s closer to B. At any rate, you can only infer sequences at an internal node of some kind, so you would need an outgroup to infer the sequence at the root node of any clade.
Just for reference, here is what White, Zhong, and Penny say about their ancestral sequence reconstruction technique:
For Step 1 we take two subgroups of taxa X and Y (see Figure 1) that on independent evidence have non-overlapping subtrees; that is, they are natural subgroups (or clades). For example, with chloroplast sequences, we select subgroups based on nuclear and/or mitochondrial data [7], [8], and only later check that the subgroups are also supported by the chloroplast sequences. For each subgroup we independently align the sequences (Step 2); infer a subtree (Step 3); and infer the ancestral sequences ax and ay for the deepest nodes of each subtree (Step 4). For this step we use PAML [9], which is a well-established method that is robust to small changes in the tree [10]. Our test is conservative in that ancestral sequences are estimated independently: information from subgroup X is not used to estimate the ancestral sequence for subgroup Y, nor vice versa. We used the cpREV model [11] for inferring chloroplast trees, the WAG model [12] for nuclear proteins, and the mtREV24 model [13] for animal mitochondria.
Frankly, I can’t make heads or tails of this since I’m not a phylogeneticist. It seems to me that they don’t create a tree combining both subgroups in order to determine the root position (contra Rumraket) since they say “ancestral sequences are estimated independently,” but I’m not totally sure. Maybe @John_Harshman can shed some light on it.

Maybe @John_Harshman can shed some light on it.
This doesn’t describe how they determined what the deepest nodes are or (which is the same thing) how they rooted their subtrees. Perhaps they didn’t estimate sequences of the root nodes but only of the nodes closest to the root? That wouldn’t require an outgroup, and you could just assume the root based on that other data.

Who said anything about similarity? It’s relationships that count. There’s no bias unless you pick an outgroup for A that’s closer to B.
But that’s my point. If you use an outgroup for A that is guaranteed to give ancestral convergence towards B, the result you’re testing for, it doesn’t seem like much of a test then.

At any rate, you can only infer sequences at an internal node of some kind, so you would need an outgroup to infer the sequence at the root node of any clade.
Since the test is supposed to be for ancestral convergence between clades, when there supposedly is agreement that the members of each clade individually share common ancestry, you could just midpoint root each clade, or use some other criterion to pick an internal branch to root on, in each clade. If they don’t share common ancestry between the clades there’s no reason to expect them to converge, and then you haven’t used an outgroup in your subtree that could put bias in the data.
They even say this in the paper:
Estimating the Root of the Two Subtrees
There are several ways of estimating the root of the two subtrees, but in practice it appears to make little difference which of several methods we use. In the chloroplast example, the root of each subtree can be inferred from nuclear or mitochondrial DNA sequences (not chloroplast), and so is independent of the chloroplast data we use. This gives the position of the root in each subtree from prior information; alternatively they can be independently estimated by ‘midpoint rooting’. This can be done either by selecting the midpoint of the longest path, or the internal branch with the longest average of paths passing through it [16]. In practice, we take the node closest to the mid-point because we are estimating nodal sequences. There does appear to be an acceleration of the rate of evolution in the grasses [17], but, again in practice, this appeared to have little effect. The sequence of the root of the two subtrees appears to be quite robust.
I suppose when they say that “This gives the position of the root in each subtree from prior information” they mean they do what I described in the figure above: it tells them which internal branch to root on when they infer the subtree without an outgroup. But rather than make a big tree of both clades A and B to determine the root position of each, they just make a tree of A with mitochondrial data. Then they make a new tree using only clade A sequences and root it on the position implied by the tree that included mitochondrial sequences. In that case I agree there is no bias.
I see that in practice they’re just midpoint rooting and picking the nearest node as ancestral.
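For anyone unfamiliar with midpoint rooting, here is a rough sketch of the idea, assuming a toy tree with made-up branch lengths. The helper names (`build`, `find_path`, `midpoint_node`) are hypothetical, and this is not the authors' code. It finds the longest leaf-to-leaf path, locates its midpoint, and returns the node nearest to that midpoint, which is how I read “we take the node closest to the mid-point.”

```python
from itertools import combinations

def build(edges):
    """Build an undirected tree as {node: {neighbor: branch_length}}."""
    tree = {}
    for u, v, w in edges:
        tree.setdefault(u, {})[v] = w
        tree.setdefault(v, {})[u] = w
    return tree

def find_path(tree, start, goal):
    """Return the unique path from start to goal as a list of nodes."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nb in tree[node]:
            # In a tree, not stepping back to the previous node avoids loops.
            if len(path) < 2 or nb != path[-2]:
                stack.append((nb, path + [nb]))
    raise ValueError("nodes are not connected")

def path_length(tree, path):
    return sum(tree[a][b] for a, b in zip(path, path[1:]))

def midpoint_node(tree):
    """Node closest to the midpoint of the longest leaf-to-leaf path."""
    leaves = [n for n, nbrs in tree.items() if len(nbrs) == 1]
    longest = max((find_path(tree, a, b) for a, b in combinations(leaves, 2)),
                  key=lambda p: path_length(tree, p))
    half = path_length(tree, longest) / 2
    walked = 0.0
    for a, b in zip(longest, longest[1:]):
        step = tree[a][b]
        if walked + step >= half:
            # The midpoint lies on edge (a, b); pick the nearer endpoint.
            return a if (half - walked) <= (walked + step - half) else b
        walked += step

# Hypothetical subtree: C sits on a long branch, so the midpoint falls on it.
subtree = build([("A", "n1", 2), ("B", "n1", 1), ("n1", "n2", 1),
                 ("C", "n2", 4), ("D", "n2", 1)])
print(midpoint_node(subtree))
```

Here the longest path is A to C (length 7), its midpoint lies on the branch leading to C, and the nearest node is `n2`.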

Perhaps they didn’t estimate sequences of the root nodes but only of the nodes closest to the root? That wouldn’t require an outgroup, and you could just assume the root based on that other data.
That seems to be what they did, given this quote from the caption of Figure 1:
We use two natural subgroups (X and Y), independently align the sequences for the species in each subgroup, independently determine the optimal tree for each subgroup, independently infer the ancestral sequences ax and ay on the optimal subtrees (in practice the sequence at the nearest node to the root of the subtree is estimated), and finally measure the pairwise alignment score between the ancestral sequences, s(ax,ay).
and this paragraph:
There are several ways of estimating the root of the two subtrees, but in practice it appears to make little difference which of several methods we use. In the chloroplast example, the root of each subtree can be inferred from nuclear or mitochondrial DNA sequences (not chloroplast), and so is independent of the chloroplast data we use. This gives the position of the root in each subtree from prior information; alternatively they can be independently estimated by ‘midpoint rooting’. This can be done either by selecting the midpoint of the longest path, or the internal branch with the longest average of paths passing through it [16]. In practice, we take the node closest to the mid-point because we are estimating nodal sequences. There does appear to be an acceleration of the rate of evolution in the grasses [17], but, again in practice, this appeared to have little effect. The sequence of the root of the two subtrees appears to be quite robust.
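The steps in that caption can be sketched roughly as follows. To be clear about the assumptions: the paper uses PAML (maximum likelihood) and aligns each subgroup independently, whereas this sketch swaps in simple Fitch parsimony on toy sequences that are already aligned, and uses plain percent identity in place of their alignment score s(ax,ay). All sequences, tree shapes, and function names are hypothetical.

```python
from itertools import product

def fitch_root(tree, seqs):
    """One most-parsimonious root sequence for a rooted binary tree.

    tree is a nested tuple of leaf names; seqs maps leaf name -> sequence.
    Ties at a site are broken arbitrarily (alphabetically).
    """
    def state_sets(node):
        if isinstance(node, str):
            return [{c} for c in seqs[node]]
        left, right = (state_sets(child) for child in node)
        # Fitch rule: intersection if non-empty, otherwise union.
        return [(a & b) or (a | b) for a, b in zip(left, right)]
    return "".join(min(s) for s in state_sets(tree))

def identity(a, b):
    """Fraction of identical sites between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical subgroups X and Y, each inferred on its own subtree.
group_x = {"x1": "TCGTACGTAC", "x2": "AGGTACGTAC",
           "x3": "ACATACGTAC", "x4": "ACGCACGTAC"}
group_y = {"y1": "ACGTCCGTAG", "y2": "ACGTATGTAG",
           "y3": "ACGTACCTAG", "y4": "ACGTACGAAG"}
tree_x = (("x1", "x2"), ("x3", "x4"))
tree_y = (("y1", "y2"), ("y3", "y4"))

# No information from X is used for Y, or vice versa.
ax = fitch_root(tree_x, group_x)
ay = fitch_root(tree_y, group_y)

ancestral_score = identity(ax, ay)
extant_scores = [identity(group_x[a], group_y[b])
                 for a, b in product(group_x, group_y)]
mean_extant_score = sum(extant_scores) / len(extant_scores)

print(ax, ay, ancestral_score, round(mean_extant_score, 3))
```

In this toy data set the two inferred ancestors are more similar to each other than the extant sequences are across groups, which is the pattern the test looks for.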

You estimate the states at ancestral nodes using a tree (or two trees in this case). Then you use some measure of similarity to determine the similarity of the extant species and the similarity of the estimated ancestral nodes, and observe whether the ancestral nodes are more similar to each other than the extant species are. And so they commonly are.
But if the estimated ancestral nodes are more similar to each other than the extant species are, does it logically follow that the two clades have necessarily a common ancestor? Can’t we imagine that the 2 groups come from 2 different ancestors created separately with very similar genes which then diverged over time?

But if the estimated ancestral nodes are more similar to each other than the extant species are, does it logically follow that the two clades have necessarily a common ancestor? Can’t we imagine that the 2 groups come from 2 different ancestors created separately with very similar genes which then diverged over time?
Yes. The mere fact of convergence between two clades does not prove common ancestry. Of course it doesn’t follow necessarily that there is a common ancestor. Everything can in principle be explained away. Nevertheless, ancestral convergence is required on common descent but not on separate ancestry, because on separate ancestry the two ancestors could just as well have been created with similarity equal to the average similarity of the two groups. That means ancestral convergence is more probable a priori on common descent than on separate ancestry.
A separate ancestry proponent might say something like: the common ancestor of all felines was created more similar to the common ancestor of all canines, and that explains why you get ancestral convergence between these two clades.
However, you can also show ancestral convergence between more inclusive clades (that there is a nested hierarchy, and that it goes beyond any two groups a proponent of separate ancestry would argue were created more similar in the past). For example, that there is ancestral convergence between the clade including all mammals and the clade including all birds, say. And you can keep going in this way to show the root nodes of increasingly more inclusive clades become more and more similar over time. Which is really just another way of showing that there is a nested hierarchy.
It becomes really strange to say that a clade that includes a node representing the common ancestor of felines and the common ancestor of canines, should also converge towards a clade containing the nodes representing the common ancestor of rodents and the common ancestor of primates.
And so on.

But if the estimated ancestral nodes are more similar to each other than the extant species are, does it logically follow that the two clades have necessarily a common ancestor? Can’t we imagine that the 2 groups come from 2 different ancestors created separately with very similar genes which then diverged over time?
True, but you can repeat the process at a deeper level and find the same situation. Eventually you end up with that nested hierarchy, whereas your hypothesis predicts a star tree.

your hypothesis predicts a star tree.
Can you explain the difference between a star tree and the tree that we see? Also, why does the common design hypothesis predict a star tree as opposed to the tree that we see?

Can you explain the difference between a star tree and the tree that we see? Also, why does the common design hypothesis predict a star tree as opposed to the tree that we see?
The “created with identical genes” model of independent creation is very strange. If all the different unrelated clades, each with its own common ancestor, were created with identical genes, then for any tree inferred from these genes you’d basically have lots of trees all connecting directly to the same universally shared ancestor. Basically each branch on the star above would be its own clade, like this:
So basically a tree rooted in a giant polytomy.
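A crude way to see the difference in the data, as a sketch only: if every ‘kind’ started from identical genes, the inferred ancestral sequences of the kinds should all be equally distant from one another (here, identical), so no pair groups together before any other. With a nested hierarchy, some ancestors are clearly closer to each other than to the rest. The sequences below are made up, and “all pairwise distances equal” is just a simple proxy for a star tree, not a formal test.

```python
from itertools import combinations

def hamming(a, b):
    """Number of differing sites between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def pairwise_distances(ancestors):
    """Distance between every pair of inferred ancestral sequences."""
    return {(m, n): hamming(ancestors[m], ancestors[n])
            for m, n in combinations(sorted(ancestors), 2)}

def looks_like_star(ancestors):
    """True when every pair of ancestors is equally distant (no nesting signal)."""
    return len(set(pairwise_distances(ancestors).values())) == 1

# Hypothetical ancestral sequences for four 'kinds'.
identical_start = {"cats": "ACGTACGT", "dogs": "ACGTACGT",
                   "mice": "ACGTACGT", "apes": "ACGTACGT"}
nested = {"cats": "ACGTACGT", "dogs": "ACGTACGA",
          "mice": "TCCTACGT", "apes": "TCCTACGG"}

print(looks_like_star(identical_start))  # every distance is 0
print(looks_like_star(nested))           # distances differ between pairs
print(pairwise_distances(nested))
```

In the second data set cats pair with dogs and mice pair with apes before the two pairs join, which is the nested pattern rather than a polytomy.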
Ah, thanks. I can understand how, if each ‘kind’ began at a single starting point, the ancestral nodes for each ‘kind’ or family should be identical, which would lead to that star pattern. Evidently, that’s not what we see, so the “identical starting point” hypothesis of common design fails, leaving only common descent to explain the data.

The “created with identical genes” model of independent creation is very strange.
Hi Rum
Who is advocating this model?
Hi @colewd,
You are. As we’ve explained several times, the only ways to explain the ancestral convergence demonstrated by White, Zhong, and Penny (2013) without invoking common ancestry are either: (1) the hypothesis that each ‘kind’ was created with identical or nearly identical copies of the same genes; or (2) the untestable hypothesis that a duplicitous creator created each ‘kind’ to look like it shares common ancestry with other ‘kinds.’
Since you presumably don’t like the duplicitous creator hypothesis – I don’t blame you for that, that’s a terrible idea – the only way for you to explain the data without common ancestry is to assume that each ‘kind’ was created with identical or nearly identical genes. But that predicts the “star tree” pattern, as @Rumraket and @John_Harshman explained, which we don’t see. So the only plausible way to explain the data is common ancestry.

Hi Rum
Who is advocating this model?
It’s not really important who is advocating for it. The point is just to elaborate on what we should expect to see given different possible models. Consider it an exercise in trying to work out what would be the consequences, with respect to the evidence, given different models.
Do you agree with what Andrew is saying here?

(1) the hypothesis that each ‘kind’ was created with identical or nearly identical copies of the same genes; or (2) the untestable hypothesis that a duplicitous creator created each ‘kind’ to look like it shares common ancestry with other ‘kinds.’