Beyond Reasonable Doubt? A Test for Common Ancestry

I think I disagree. The test’s expectation starts from an understanding of sequence evolution, which is a transitional mechanism. It seems they are presuming largely neutral mutation with negative selection in setting the expectation for common descent.

Once again, @evograd, am I missing something? What do you think @Joel_Duff and @pnelson?

Yes, they’re beginning with assumptions about the stochastic nature of sequence evolution, but at the same time their results in Figure 4, whereby ancestral convergence increases with sequence length, act as a somewhat independent confirmation of this implicit assumption.


I am trying to understand exactly what you said… are you referring to how the clades are initially defined (i.e., based on similarity between functionally conserved sequences)?

In that case, would the test be more neutral if non-conserved sequences were selected?

If I am way off the mark (which is very likely), can you explain in a way a non-specialist could understand?

I’m not sure I quite understand what you’re describing. I’ll go with what I understood it to mean for now, but please correct me if I misunderstood.

So if we define a corner of sequence space as “functional”, then if this corner is big enough there will inevitably be a “centroid” sequence that is equidistant from each of the edges of this space. I agree with this part, but the second part seems suspect to me. If we sample sequences in this space uniformly, putting half of the sampled sequences into group 1 and half into group 2, then surely there would be no significant difference between the average distance between sequences in group 1 and 2, and the distance between the inferred ancestral sequences of group 1 and 2?

The only way to get the observed pattern would be to sample from 2 different ends of the sequence space, one side for group 1 and one side for group 2. This is the point of the test though: the ancestral sequences will be closer than the average extant sequences because each group has neutrally explored a different “side” of the sequence space, and the sequence space is inherently convex because of common ancestry. If there was no common ancestry, there would be no reason to expect this kind of “divergent angle” in the explored sequence space.

I think this is the crux of it: precisely that the ancestral sequence will not be the “average” sequence. If it was, then we’d never expect to see any significant results in their test. Remember, they’re comparing the “average” sequences of each group and comparing the ancestral sequences of each group. If the average sequence and ancestral sequence were basically the same, the test wouldn’t return any significant differences in their comparisons, leading to results that would fit the null model. What the test does is see whether the inferred ancestral sequences are different to the average sequences. If they’re not, the null model would fit the results. If they are, then there are 2 options: either the ancestral sequences converge or diverge, relative to the comparison of the average sequences from each group. I’m not sure what it would mean if the ancestral sequences were more diverged. It might imply that the groups have separate origins but are converging in the same direction over time. The implication of the ancestral sequences converging, however, is clear: the groups diverged from an ancestral sequence, AKA common ancestry.


On what basis are the sites “known to be rapidly-evolving” and “lacking any useful phylogenetic signal”? Every site in the sequences has a history. Right? So how do we know which sites represent noise?

I’ll comment further in this thread after I obtain the unaligned and aligned sequences from Zhong (from the PLoS 2013 paper linked by evograd).


Well, in this case, Fang et al. point out:

By contrast, the phylogenetic tree inferred from the removed GC-heterogeneous sites of the gc_168 dataset under the heterogeneous model is poorly supported (supplementary Fig. S13), which indicated that these GC-heterogeneous sites contain noisy signals.


Because we can directly measure the increased rate of mutation, and see that it also corresponds with increased divergence. That is independent, corroborating experimental evidence that they are less phylogenetically informative.


This is a valid point and I can see why scientists use it for making phylogenetic trees based on assumptions of common descent.
However, the paper states that it wants to test separate ancestry of individual species, as well as that of CAs in a group, as a null hypothesis. In such a case, shouldn’t they refrain from segregating sites in any way? Or is their claim that using “long sequences” would even things out fair?
Also, I am really interested in your conversation with @evograd on the paper. Is his understanding of your point correct? Do you accept his argument for the premise of the test?

I request you to respond as and when you have time.

@evograd, I appreciate you pointing out this paper. I think Theobald’s paper was a step in the right direction (even though his method was falsified). I am glad to see scientists continue to develop a formal test for common ancestry.


There is no assumption of common descent in my claim. It is just a valid point, without an assumption of common descent. I’m not engaging in polemics here. This is just logic. Do not dismiss what I am saying by assigning assumptions to it that are not there.

I take my scientific claims very seriously. If you think there might be an assumption, you can ask. Please do not assert falsely there is an assumption in order to dismiss it. I understand you may not know the assumptions behind my point, which is why you should ask the question instead.

Fair enough. Just to clarify, I was talking about assumptions made in phylogenetic trees, not assumptions you made… That’s why I asked further questions, which would clarify the issue as below.

Edit: @swamidass, please note that right now I am not trying to make arguments for or against this paper. I am just trying to understand it… Hence the questions…


I want to be clear here. I affirm evolutionary science including common descent. Also, my thoughts here are provisional. Nonetheless, as I’ve been thinking carefully about this, I do not think this is a valid test of common ancestry. It appears that an obvious special creation model would produce the exact same results.

Why is this important? We have really strong arguments for common descent. Incorrect arguments, however, do much to cloud truth. We have no need for them. Of course, maybe I missed something and I want to put this out there to be reviewed.

This does not appear to be true. It appears you misread the paper, or miswrote what you correctly read. Let me show you what I mean. Here are the key parts of their methodology…

For Step 1 we take two subgroups of taxa X and Y (see Figure 1) that on independent evidence have non-overlapping subtrees; that is, they are natural subgroups (or clades). …

The program MUSCLE [14] is used for calculating alignment scores, see details later. For Step 5, the pairwise alignment score s(ax,ay) is then calculated between the inferred ancestral sequences ax and ay (we call this the ‘ancestral score’), with higher values showing that ancestral sequences are more similar (Table 1). In Step 6 we then calculate the alignment score s( i , j ) for all pairs of sequences (with just one sequence from each of the two subgroups). From the resulting distribution of between-subgroup scores (see Figure 2) we calculate (Step 7) the probability p of observing scores at least as high as the ancestral score under the null model, which we now describe.
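
To make Steps 5 to 7 concrete, here is a minimal sketch of the comparison as quoted. It assumes the sequences are already aligned to equal length and uses the fraction of identical sites as a stand-in for the MUSCLE alignment score. The function names (`identity_score`, `empirical_p`) and the toy sequences are hypothetical and not taken from the paper.

```python
from itertools import product

def identity_score(a: str, b: str) -> float:
    """Fraction of identical sites; a stand-in for the MUSCLE score (assumption)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def empirical_p(ancestral_score: float, between_scores: list) -> float:
    """Step 7: fraction of between-subgroup scores at least as high as the ancestral score."""
    return sum(s >= ancestral_score for s in between_scores) / len(between_scores)

# Toy data (hypothetical): two subgroups and their inferred ancestral sequences.
group_x = ["ACDEFGHIKL", "ACDEFGHIKM", "ACDEYGHIKL"]
group_y = ["ACDQFGWIKL", "ACNQFGWIKL", "ACDQFGWIRL"]
anc_x, anc_y = "ACDEFGHIKL", "ACDQFGHIKL"

# Step 5: the 'ancestral score' s(ax, ay).
ancestral = identity_score(anc_x, anc_y)

# Step 6: scores for all pairs with one sequence from each subgroup.
between = [identity_score(x, y) for x, y in product(group_x, group_y)]

# Step 7: probability of a between-subgroup score at least as high as the ancestral score.
p = empirical_p(ancestral, between)
print(f"ancestral score = {ancestral:.2f}, p = {p:.3f}")
```

The point to notice is that `p` is read off the whole distribution of between-subgroup scores, not off a single averaged sequence.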

They are NOT comparing the ancestral sequence difference to the average sequence difference (as you say). Rather, they are comparing the ancestral sequence difference to the distribution of inter-group pairwise differences. That is why they show these cumulative distribution plots (see Figure 2):


Given that fact, it remains exactly true what I said in the first place.

What I mean by a “generative model” here, is a non-common descent generative model, based on picking things from a defined space of sequences with a defined protein function.

So good. We agree.

What I am saying is that the “average” is approximately the same (not exactly) as the “ancestral” in this conception. That remains true. You had misunderstood the algorithm. It is comparing the distance between the ancestral/average and the distribution of inter group distances.

Remember, also, that the groups are not uniformly distributed in space. One group is one half of the convex space, the other group is the other half. If the extant sequences are confined this way, with any spread at all, it might even be possible to mathematically prove, that the ancestral sequences must be more similar than the median of the comparison distribution. I don’t have that proof yet, but my instinct tells me that such a proof is actually possible.
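
Here is a rough simulation sketch of that intuition, under loudly stated assumptions: points in a Euclidean hypercube stand in for sequences in a functional space, each group is confined to one half of the cube, and the group centroid stands in for the inferred ancestral sequence. None of this comes from the paper, and the function names and parameters are hypothetical. One thing that is certain is that the centroid gap can never exceed the mean cross-group distance (triangle inequality); that is a statement about the mean, though, not the median.

```python
import math
import random

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def simulate(n_per_group=50, dims=10, seed=0):
    """Sample two groups from opposite halves of the cube [-1, 1]^dims."""
    rng = random.Random(seed)

    def sample(sign):
        p = [rng.uniform(-1, 1) for _ in range(dims)]
        p[0] = sign * abs(p[0])  # confine the point to one half along axis 0
        return p

    g1 = [sample(-1) for _ in range(n_per_group)]
    g2 = [sample(+1) for _ in range(n_per_group)]
    gap = dist(centroid(g1), centroid(g2))
    pairs = sorted(dist(a, b) for a in g1 for b in g2)
    # Fraction of cross-group pairs that are at least as close as the centroids.
    frac = sum(d <= gap for d in pairs) / len(pairs)
    return gap, pairs, frac

centroid_gap, pair_gaps, frac_closer = simulate()
median_gap = pair_gaps[len(pair_gaps) // 2]
print(f"centroid gap = {centroid_gap:.2f}, median pair gap = {median_gap:.2f}, "
      f"fraction of pairs closer than centroids = {frac_closer:.3f}")
```

This is only an illustration in a low-dimensional continuous space. Whether the same holds for real ancestral reconstruction in sequence space would need a proper simulation.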

Contrast with their claim:

Our test is based on the expectation that, under evolution, the ancestral sequence of one natural group of taxa will be more similar to the ancestral sequence of a second natural group of taxa, than to any sequence from the first group will be to any sequence from the second. In contrast, a variety of proposed non-evolutionary models either do not make this prediction, or require so many parameters that they cannot be said to make any testable predictions at all.

First of all, they are being sloppy here. By that definition of their test, several figures here would seem to invalidate the predictions of common descent (!). Setting aside the sloppy language, notice that they do not even appear to test any of these non-evolutionary models (why not?). I might have missed it, but it appears they have no negative controls, and merely make a high-level logical argument about the expectations of a non-evolutionary model. This, honestly, seems to be poor methodology.

It is possible I missed something in the paper. I want to be corrected if I am wrong. If I am wrong, there are less consequential things to work through. This, however, towers over the rest in potentially invalidating the entire study.

What do you think @evograd and @pnelson ?


Great. These are all small potatoes questions though. I’m concerned about a more fundamental problem, that cannot be easily rectified. If my concern is justified, the test is not valid, and these other points are not important to engage.

In this case there is no assumption being made. They are making a statement that:

  1. We can build two trees of two groups of related sequences. (de facto true, no assumption)
  2. If common descent is true, (that is the claim to be tested)
  3. then there will be a specific mathematical pattern observable in these two trees. (because it is between the two different trees, no common descent assumed)

No assumption of common descent. It is really critical to carefully understand these things instead of kicking up noise (not saying you are doing that @Ashwin_s). Bad arguments against bad arguments just add to the confusion. We want to get this right. I have no incentive to be dishonest or sloppy here. Quite the opposite.


You’re right, I remember now that I (over?)simplified this for my blog post. Rather than calculating the average sequence for each group and then comparing them, they obtain a distribution of inter-group pairwise differences. However, in my mind this is basically the same as calculating the difference between average sequences, except that it is a distribution around a particular mean difference rather than a single number. In both descriptions though, the p-value of the ancestral convergence is calculated against this distribution/single number, so the point is still that the ancestral convergence has to be greater than the similarity between extant sequences. I’m not sure how that really changes the results.

I understand that, which is why I was confused earlier when you referred to uniform sampling of the space to assign sequences to different species - if the sampling was uniform then each group would comprise sequences that were randomly distributed through the functional space, rather than having group 1 from the left side of the space and group 2 from the right side.

I just feel like demonstrating that the explored sequence space is convex and the groups represent different halves of it is the whole point of the test. I don’t see why these things would be the case under a separate origin model.

Can you elaborate on all of this? What’s sloppy about their language? Which figures falsify common descent? Why is their null model not appropriate?


First off, I want to emphasize that my model with a convex space was merely illustrative, to build some intuition here. It may break down in much higher dimensional space.

So here is the statement in conflict with the data…

Our test is based on the expectation that, under evolution, the ancestral sequence of one natural group of taxa will be more similar to the ancestral sequence of a second natural group of taxa, than to any sequence from the first group will be to any sequence from the second.

They talk about two things here:

A. The difference between the two inferred ancestral sequences of the two taxa.
B. The difference between two sequences, each from a different one of the two taxa.

They say that A will be smaller than B for any pair of sequences from the two taxa. That “any” is where the problem is. Look at these data figures sliced from Figure 3:


The circle is where the ancestral sequence pair is. The line shows the distribution of inter-taxa sequence pairs. Note that in ALL cases, there exist at least a few examples of sequence pairs that have higher similarity than the ancestral sequence pair. Moreover, for some cases (psbL), the ancestral sequence pair is less similar than the bulk of sequence pairs.

Therefore, it is not true that any inter-taxa sequence pair is lower similarity than the ancestral sequence pair. If we take their sentence at face value, this should be evidence against common descent. Really, it is just sloppy language, and perhaps another error is affecting them too (lack of controls).

The next sentence says:

In contrast, a variety of proposed non-evolutionary models either do not make this prediction, or require so many parameters that they cannot be said to make any testable predictions at all.

It is not clear what these non-evolutionary models are to which the authors refer. I’ve proposed one that I think will produce similar results. Maybe I am wrong, but one cannot demonstrate that all possible special creation models do not make this prediction. We can make claims, however, about known models, but we actually have to test what we claim are predictions, ideally with simulation. They just assert the models don’t make this prediction, but it is not clear they have demonstrated it to be so, or even identified what these models actually are.

This gets to the final problem. They did not demonstrate that common descent makes this prediction. It is important to remember that population genetics is not intuitive. It is critical to test claims with simulation. They did not do this. Instead, they just asserted what the common descent expectation is, without ever testing to see if this is actually correct. That is, perhaps, why there is sloppy language. They weren’t actually testing their claims about what each model predicts.

So, their claims about what common descent predicts end up actually contradicting the evidence they present. That is not a good situation to be in.

Questions are okay. I’m sorry if I was too firm. I just didn’t want to get distracted from the key thread. Did my posts make sense of things in the end?


Yes they did to some extent.
I have only one doubt.

This seems a very obvious error. Since this is a peer-reviewed paper, I find it a little hard to believe it was missed. Their main argument seems based on

  1. Probability:
    " Combining probabilities for all genes using Fisher’s method as before, we find that the probability of observing such high ancestral scores for the 51 chloroplast proteins under our non-evolutionary null model is 1.51×10^−57 (compared with ≈2×10^−19, see the top row of Table 2) - our test is thus very conservative."

  2. Parsimony : They make the following argument.
    " The length of this tree is then by definition the minimum possible number of separate decisions, or equivalently free parameters, that a hypothetical external agent requires in order to produce the complete set of sequences, given any one of the sequences as a starting point. A lower bound for the length of a Steiner tree is given by half the length of a minimum spanning tree, which is a tree that connects all given points without introducing additional points [24]; minimum spanning trees can be computed efficiently. For the eudicot/monocot example, a minimum spanning tree requires 36,473 mutations to connect all 68 sequences, implying that we would need at least 36,473/2 = 18,237 free choices, each a separate parameter. Any suggestion that a model with such a huge number of parameters ‘explains’ the data is of course a serious violation of the scientific principle of selecting the simplest model."

I have two questions:

  1. Does the error in the initial premise of the test affect the probability calculations? Can they stand on their own vis-à-vis the generative model you suggested as a counter… (or is it just impossible to calculate?)
  2. How would you treat the parsimony argument? I don’t see why a decision by an agency should be less parsimonious than a mutation. Perhaps future development of Ewert’s model would be relevant to this?
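
For reference, the minimum-spanning-tree bound in the parsimony quote above can be sketched as follows. This is a minimal illustration, assuming aligned equal-length sequences and using Hamming distance as the mutation count; the paper's exact distance measure is not stated in the quote, and the function names and toy sequences are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))

def mst_length(seqs) -> int:
    """Total edge weight of a minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(seqs)
    if n == 0:
        return 0
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge linking each node to the tree
    best[0] = 0
    total = 0
    for _ in range(n):
        # Pick the cheapest node not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = hamming(seqs[u], seqs[v])
                if d < best[v]:
                    best[v] = d
    return total

def steiner_lower_bound(seqs) -> int:
    """Half the MST length, rounded up, as a lower bound on the Steiner tree length."""
    return (mst_length(seqs) + 1) // 2

toy = ["AAAA", "AAAT", "AATT", "TTTT"]  # hypothetical aligned sequences
print(mst_length(toy), steiner_lower_bound(toy))
```

Rounding up reproduces the quoted arithmetic: 36,473 / 2 = 18,236.5, reported as 18,237.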

It is not an obvious error.

@evograd is a scientist, and has spent far more time with that paper than I have. @pnelson is a scientist too, and did not cue into this. You read the paper and missed it too, right? It takes a lot of training to be able to identify errors like that (if it is an error), and also a bit of luck. The paper, as written, is very difficult to untangle. I’m not sure if the authors were merely sloppy in their writing, or (worse) sloppy in their thinking.

It is not really worth engaging these arguments. Without getting into the details, they are stretching pretty far in their analysis and it even seems some of these arguments are intuitive, but incorrect. They do not actually do any simulation or modeling to demonstrate they are correct. It is not even clear if their basic premise is correct.

A lot would have been clarified if they had run positive controls (simulations of common descent) and negative controls (simulations of some basic models of design / special creation). If their positive controls had come out positive, and the negatives negative, then in many ways the details don’t matter so much. However, they didn’t run those controls, so any loophole in their argument or reasoning is of high consequence.

I’d predict, if they had run the right simulation controls, then:

  1. They would have clarified and demonstrated the precise predictions of common descent in this case.
  2. They would have found that some non-evolutionary models would make the same predictions (and would be indistinguishable from common descent)
  3. They would have found that some non-evolutionary models would make different predictions (and would be challenged/ruled-out by the data).
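
A minimal sketch of what such controls could look like, with heavy caveats: the per-column consensus is a crude stand-in for proper ancestral reconstruction, the mutation rates and sequence sizes are arbitrary, and the negative control shown is only the simplest separate-origins model (independent random sequences), not the convex-space model discussed above. All names and parameters are hypothetical.

```python
import random
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def mutate(seq, rate, rng):
    """Replace each site with a different random residue with probability `rate`."""
    return "".join(
        rng.choice([a for a in ALPHABET if a != ch]) if rng.random() < rate else ch
        for ch in seq
    )

def consensus(seqs):
    """Most common residue per column; a crude stand-in for ancestral inference."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def identity(a, b):
    """Fraction of identical sites between two aligned sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def convergence_p(group_x, group_y):
    """Fraction of between-group scores at least as high as the 'ancestral' score."""
    anc = identity(consensus(group_x), consensus(group_y))
    between = [identity(x, y) for x in group_x for y in group_y]
    return sum(s >= anc for s in between) / len(between)

def random_seq(rng, length):
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def common_descent(rng, length=200, n=10):
    """Positive control: one root, two founders, n descendants per founder."""
    root = random_seq(rng, length)
    founders = [mutate(root, 0.1, rng) for _ in range(2)]
    return [[mutate(f, 0.3, rng) for _ in range(n)] for f in founders]

def separate_origins(rng, length=200, n=10):
    """Naive negative control: every sequence drawn independently at random."""
    return [[random_seq(rng, length) for _ in range(n)] for _ in range(2)]

rng = random.Random(0)
p_positive = convergence_p(*common_descent(rng))
p_negative = convergence_p(*separate_origins(rng))
print(f"common descent p = {p_positive:.3f}, separate origins p = {p_negative:.3f}")
```

The more interesting negative controls are the ones that are not this naive, for example sampling each group from a different region of a constrained functional space.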

Honestly, it is not terribly hard to imagine models that would fit into class #2 or #3. So it seems their claims are overreaching pretty far, especially in light of how they are framed.

Yes, it does affect the probability calculations. They are based on an undemonstrated premise.
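
For context, the combined probability quoted earlier in the thread comes from Fisher's method, which only combines whatever per-gene p-values it is given. A minimal sketch of the standard method follows; the function name is hypothetical, and how the paper treats per-gene p-values of exactly zero is not covered here.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) is chi-square distributed with 2k degrees of freedom
    under the null. For even degrees of freedom the survival function has the
    closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    k = len(p_values)
    if k == 0 or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("need at least one p-value in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total

print(fisher_combined_p([0.05, 0.05]))
```

If the per-gene p-values rest on an undemonstrated premise, the combined value inherits that problem no matter how small it is.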

It is not a good argument as presented. There are good ways to apply parsimony. I’m regularly publishing work that relies on that principle.

However, in this case, it appears to be a strawman argument against a non-evolutionary process. The fact is that there is very strong evidence for common descent. There is no need for weak or fallacious arguments when we have strong arguments. Even then, we can imagine special creation scenarios (in theology, not science) that would produce the same data.

So, outside of science in theology, at best, we can only say that it looks like common descent from a scientific point of view. Perhaps God created in a non-evolutionary process that happens to look like evolution. Is that plausible? Well, science can’t really answer questions about what God would plausibly do or not do. So we are well outside science.

It is already relevant.

The conceptual error is very similar (in some ways) to the argument we discussed here: Winston Ewert: The Dependency Graph of Life. He was working off a semantic model of common descent (a tree), rather than the reality of what the theory actually predicts (somewhat accessible by simulation). This undermined his conclusions against common descent almost entirely. It is possible these two papers are well paired, as making the same sort of logical error in opposite ways.

If Ewert’s observation pans out with a better analysis (and he is far from demonstrating it so), he would have uncovered something interesting. He has not, however, actually yet engaged with the actual theory of common descent. It is not really an observation that would challenge common descent, at least not as part of the approach he described to us.

@evograd thanks for bringing this paper to our attention. Really interesting read. I’m glad we were able to look at it. Do you think I missed anything big in my analysis? You’ve definitely been thinking about this one longer than I, so it is not inconceivable you’ll succeed in getting me to retract an error. You should give it a shot.

Ok thanks for the clarification. That was helpful.
Appreciate it.


Ok, yeah, that’s a little bit sloppy taken on its own. Of course, the real expectation is that there will be a significant trend towards A being smaller than B. This is biology after all, and there are always going to be exceptions due to noise or convergent evolution. It would be like saying common descent predicts all gene trees to overlap perfectly: we can’t expect such perfect data in biology, and besides, we know of plenty of processes that would cause discordance there.
Indeed there is a significant trend in this direction, and most of the exceptions, where B is smaller than A, are driven by short sequence length:

Across all 51 genes, on average 22% of pairwise scores were at least as high as the ancestral score, but this is mostly caused by a small number of shorter genes with relatively low ancestral scores (see Figure 3). Results are shown for each of the proteins in Table 3 and Figure 3. Figure 4 shows a correlation between protein sequence length and convergence, certainly consistent with a stochastic mechanism.

Of course we can’t, but we go with the best model available at the time. As they say in the introduction:

This clearly does not ‘prove’ that yet unknown models are impossible, but the theory of evolution leads to extremely strong predictions, and so the onus is now on others to propose testable alternatives.

Well, in their analysis their non-evolutionary model is simply one of separate ancestry between the groups, allowing for the possibility that there is even separate ancestry within each group all the way up to individual species. This seems like a pretty good estimation of a generic separate ancestry model to me.
