Evolution Does Not Predict That Traits Fall into Trees or Nested Clades
First off, one reason this is so confusing is that it is often falsely stated that evolution produces and predicts perfect nested clades. This is false. Totally false. Obviously false. Homoplasies are observations that do not fit the tree, and evolution predicts that they will be numerous. Not knowing this, some people (e.g. @Cornelius_Hunter and @Nonlin.org) think that the mere existence of homoplasies is evidence against common descent. This is false. Common descent does not predict that there will be no homoplasies.
How long have we known this? For over 150 years. Wings are an example of a homoplasy. We see them on birds, bats, and insects. That is an example of a trait that does not fit in a tree. So, to be clear, common descent does not predict and never has predicted that traits fall into perfect nested clades.
That doesn’t stop bad arguments from evolution being made all the time. As I explained:
Empirical Data Demonstrates Homoplasies
So what about empirical data? How do we know this for sure?
We can look at DNA from several humans and ask whether it fits a tree. This is a great control experiment, because we both agree humans arise by common descent (from other humans). So if we see variation in humans that does not fit a tree (i.e. nested clades), that confirms what we know from the theory: common descent does not produce data that perfectly fits a tree.
We see exactly this. For a whole host of reasons, human variation does not fit in a tree. These include (1) recombination, (2) population structure and migrations, (3) convergent evolution, especially in HLA, (4) the birthday paradox, and (5) horizontal gene transfer (most commonly from lysogenic viruses). Without parsing it out precisely, we just see that a graph fits the data much better than a tree.
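Recombination, the first reason in that list, can be detected with the classic four-gamete test: if two biallelic sites show all four allele combinations in a sample, no single tree explains both sites without a repeated mutation or a recombination event. A minimal sketch follows; the function names and the toy haplotypes are made up for illustration and are not real human data.

```python
from itertools import combinations

def four_gamete_violation(haplotypes, i, j):
    """Return True if sites i and j together show all four allele combinations."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4

def incompatible_site_pairs(haplotypes):
    """List every pair of sites that cannot both fit one tree without homoplasy."""
    n_sites = len(haplotypes[0])
    return [(i, j) for i, j in combinations(range(n_sites), 2)
            if four_gamete_violation(haplotypes, i, j)]

# Hypothetical toy sample: one string per individual, one character per biallelic site.
toy = ["000", "010", "100", "110", "111"]
print(incompatible_site_pairs(toy))  # [(0, 1)]
```

Here sites 0 and 1 appear as 00, 01, 10, and 11, so they conflict with any single tree, while site 2 is compatible with both.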
Other Mechanisms at Speciation
Perhaps the most important additional mechanisms at speciation (for example, explaining homoplasies between humans and other primates) are:
Incomplete data, especially annotation of proteins (given the datasets that @Winston_Ewert used). This was correctly pointed out by @glipsnort.
Incomplete lineage sorting. Given that we know for a fact (from the human variation) that variation in a population does not fit a tree, it is no surprise that if we form a new species by partitioning a large population, we expect there to be homoplasies in the final population.
There is more, but those are good starting points, that should be fairly easy to understand. Keep in mind that we’ve already known that the data does not fit a tree. Common descent does not predict the data fits a tree.
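The incomplete lineage sorting point can be made concrete with the standard three-species coalescent result: when the two sister species share an ancestral population for t coalescent units before the deeper split, a locus yields a gene tree that disagrees with the species tree with probability (2/3)·exp(-t). The sketch below checks that formula against a small simulation. The function names, the number of loci, and the value t = 1.0 are illustrative assumptions, not figures from this thread.

```python
import math
import random

def discordance_probability(t):
    """P(gene tree differs from species tree) for three species; t in coalescent units."""
    return (2.0 / 3.0) * math.exp(-t)

def simulate_discordance(t, n_loci=20000, seed=0):
    """Estimate the same probability by simulating independent loci."""
    rng = random.Random(seed)
    discordant = 0
    for _ in range(n_loci):
        # The lineages from sister species A and B coalesce at rate 1
        # while they share an ancestral population of duration t.
        if rng.expovariate(1.0) < t:
            continue  # coalesced in time: gene tree matches the species tree
        # Otherwise three lineages reach the root population, and each of
        # the three pairs is equally likely to coalesce first.
        if rng.randrange(3) != 0:  # pair 0 stands for (A, B)
            discordant += 1
    return discordant / n_loci

print(round(discordance_probability(1.0), 3))  # 0.245
print(simulate_discordance(1.0))
```

Even with a fairly long shared branch, roughly a quarter of loci in this toy setting disagree with the species tree, with no mechanism beyond ordinary inheritance.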
How Much Did Dependency Graphs Really Help?
This is perhaps one of the more important points.
That means that the Dependency Graph only fits about 1.7% better than a tree. That is not very much. It is a very small improvement, one that might vanish entirely if any of the factors we've pointed out above are addressed more carefully. A 1.7% improvement on noisy data is not going to be enough to convince people anyway.
I've not really even explained all the other relevant mechanisms. There are actually more causes than this; several more come to mind. This, however, should be enough to show you that this work only unsettles a cartoon version of evolution, which we already know is false.
That is really the right way to think about this. I’m really impressed by his attempt. He has a long way to go, but we are all going to be watching it closely. I wish him the best, and will help him any way we can. Give him time, and tell the polemicists to back off. All they do is hold him back.
That thread is being carefully considered by @aguager from DI. I’ve been putting that out there for several years (see the UD dialogue with @vjtorley). No one yet has offered an alternative explanation.
That still stands, @JoeG. If you want a good scientific challenge to common descent, encourage ID, YEC, and OEC scientists to understand this data and produce an alternate model. As long as no one tries to explain it in a better way, we will only have Common Descent.
A very well written response. A follow on question, you state that evolution predicts homoplasies will be numerous. Could you expand on this? When you say ‘numerous’ is this a raw number or a proportion? 1,000,000 homoplasies is a large number, but not in comparison to 1,000,000,000,000 organisms.
You also divide a very large Bayes factor by an even larger factor, and state the resulting small ratio is how much more the dependency graph explains, which is a very small number. Is this how Bayes factors work? Is there a way to calculate how much a tree must become non-treeish before a dependency graph is a better explanation? I.e. how many cross clade branches must be added and existing branches removed? 1.7% may seem like a small number, but it still may entail a huge amount of branch addition and removal to transition to a dependency graph.
There is also a sort of question begging going on. If intelligent design is at work, then we would get a large number of horizontal gene transfers and homoplasies, as we do with human driven breeding and genetic experimentation. So, ID also predicts a very high number of transfers and homoplasies.
At what point is the number of transfers and homoplasies evidence of intelligent design instead of evolution? Otherwise, we could just be artificially relabeling evidence for intelligent design as evidence for evolution. How do we ensure this is not happening?
There is also a bit of an equivocation. Just because one organism is descended from another does not mean genetic meddling is not happening. So, hypothetically, we could still have common descent in the purely procreative sense, but the genetic record can still be highly indicative of intelligent design, and thus the dependency graph would result. What is interesting, then, is not procreative common descent, but genetic common descent, and the former does not entail the latter.
To really falsify Dr. Ewert’s result, it is not enough to say A gave birth to B. One must also say that all the genetic material in B is only from A + random variation. There are other sources of variation that would not falsify the dependency graph, such as environmental patterns matching a genetic database of responses that is preconfigured in the DNA.
So it seems that not only does Dr. Ewert need to do some more work, but proponents of common descent also have to do some more work to explain why the evidence they already have does not improve Dr. Ewert’s case. As it stands, the evidence you offer and Dr. Ewert’s work suggests that common descent + dependency injection or pure creationism are better explanations than common descent + random variation. The question of procreative common descent is orthogonal to the discussion.
Common Design explains the data you presented. Common Descent doesn’t even have a mechanism capable of producing eukaryotes.
What Common Descent requires is a mechanism capable of explaining the anatomical and physiological differences observed between two allegedly related species such as chimps and humans. Right now we don’t have that. So right now the best we can do is say “we don’t know”. That is the only honest scientific answer to the question.
The “this” he is talking about is the lack of ID models. @pnelson (an ID fellow) has commented on this too. He is concerned that when this is not acknowledged in ID, it stifles innovation. People think there is a design model when there is not, so they don’t work on one. I’ve agreed with them on this, and this is why @Winston_Ewert’s and @Agauger’s work is so significant.
Out of respect for them, I need you to be more respectful of their efforts. Rather than pretending that they are working on an already solved problem, give them space to do their work. Follow it closely and encourage them. Understand them. Do not undermine @Winston_Ewert and disrespect him by pretending all his hard work has been towards a solved problem. That is not true. If he is successful, it will be groundbreaking for ID.
You do not have to agree with Common Descent. Maybe science is wrong here. Whatever the case, no one here cares what we personally believe in the privacy of our own hearts.
So it is entirely accurate that the patterns I presented do not have an ID model that explains them, except common descent. Remember, as every ID leader will tell you Common Descent is a design principle too. That might have been part of the story, one of the design principles, of how God created life.
If I am wrong, show me the mathematical theory that explains that pattern. There is none. Everyone knows this. That is why science is currently solidly behind common descent. If you want that to change, at minimum you must produce a better mathematical theory. That is what @Winston_Ewert, to his credit, is attempting. Show some respect. Stop undermining his work.
That is also why the last part of your comment is a non sequitur and totally irrelevant.
I’ve heard you say that a million times. It is false. Rather than distracting from the progress being made by real scientists, follow the wise ID leaders here like @pnelson and @Winston_Ewert. I will be fair to them. I will give them a hearing. If they are right, I’ll make it known.
As for @T.j_Runyon? He is new as a moderator. You can try pulling his chain now and then. At some point he will learn to only pay attention when you are being kind. As for @Winston_Ewert, however, leave him alone. He is not here. He is our guest. I have too much respect for his work for you to disrespect it like this.
This is unfortunately too undefined as a question. It all comes down to the mathematical details of the precise traits we are considering.
In Ewert’s Case…
In Ewert’s paper, we are talking about the presence or absence of specific protein families, which is a fuzzy concept. Unfortunately, we don’t have very strong “rate” information about how quickly they can be deleted, or gained, like we do for point mutations. So it is hard to compute a precise number.
In this specific formulation, mistakes in annotating proteins and incomplete genomes will add “noise” to the data that the Dependency Graph will think is a signal, even though it’s just noise. He suggests exactly the right remedy.
He is right. That is the way to test this further, and build confidence it isn’t just a spurious result. As it is, we do not know for sure at this point. It seems he relied heavily on annotation databases, and that has some real weaknesses. They can be overcome.
That is how they work. The numbers are the “error” or the “unexplained information” in the data under each model. If he wrote his program correctly (and let us assume he did), then these are all proportional to “bits” of information that are unexplained in the data. In this data, the dependency graph explained less than 2% (though I’ll round up) of the information that the tree did not explain.
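To see how a very large Bayes factor and a small percentage can both be true at once, here is a minimal sketch. It assumes each model's score is its unexplained information in bits (that is, minus log2 of the model's evidence), so the difference between two scores is log2 of the Bayes factor. The numbers are hypothetical and are not taken from Ewert's paper, and the function names are made up for illustration.

```python
def log2_bayes_factor(tree_bits, graph_bits):
    """Difference in unexplained bits; equals log2 of the Bayes factor under the stated assumption."""
    return tree_bits - graph_bits

def unexplained_fraction_recovered(tree_bits, graph_bits):
    """Fraction of the tree's unexplained bits that the graph manages to explain."""
    if tree_bits <= 0:
        raise ValueError("tree_bits must be positive")
    return (tree_bits - graph_bits) / tree_bits

# Hypothetical scores, NOT taken from Ewert's paper.
tree_bits = 1_000_000
graph_bits = 983_000

print(log2_bayes_factor(tree_bits, graph_bits))               # 17000
print(unexplained_fraction_recovered(tree_bits, graph_bits))  # 0.017
```

In this toy case the Bayes factor is 2 to the power 17000, an astronomically large number, yet the graph accounts for only 1.7% of what the tree left unexplained. Both statements describe the same pair of scores.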
How much of that remaining information (the other 98%) is an unknown design principle? An unknown natural process principle? Stochastic randomness (still governed by God’s providence)? Divine action? We just do not know. It is impossible to know without figuring out a model that makes sense of it.
Remember, trees already explain quite a bit of the data. It is not clear at all how much of the 2% is really fitting a dependency graph, instead of just being impacted by the noise in his data. As a proportion of the data, it is just a small amount. However, there is a lot of data here. It is entirely possible that, in terms of individual proteins, there are actually many examples. Once again, it could just be noise. We do not know yet.
This has nothing to do with design vs. no design. Every ID advocate should know that Common Descent IS a design principle. None of this is about ID vs. evolution. It is about an alternative model for origins other than common descent. We already know that God can use common descent to design us.
ID does NOT predict a higher number of homoplasies. It makes zero predictions without specifying a design principle. @Cornelius_Hunter hashed this one out before. In the diagram below, the closer to the bottom of the graph, the more homoplasies (and the less tree-like). The closer to the top, the fewer (and the more tree-like).
A key variable you are missing is the “speed” at which a feature evolves (which is often directly measurable). Depending on this parameter, common descent will make different predictions. This isn’t cheating, because we can directly measure the speed in many cases (especially with DNA), and this gives us a way of analytically computing the expectation.
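The dependence on speed can be illustrated with a toy simulation: evolve a binary feature down a known tree with a per-branch flip probability, then measure how tree-like the result looks using the consistency index (minimum possible changes divided by the changes Fitch parsimony actually needs). This is only a sketch under simple assumptions. The balanced 16-tip tree, the flip probabilities, and all function names are hypothetical, and it is not the analysis behind the diagram.

```python
import random

def balanced_tree(depth):
    """Build a balanced binary tree as nested tuples; leaves are None."""
    if depth == 0:
        return None
    return (balanced_tree(depth - 1), balanced_tree(depth - 1))

def simulate_tips(tree, state, flip_prob, rng):
    """Evolve a binary feature down the tree; return tip states left to right."""
    if tree is None:
        return [state]
    tips = []
    for child in tree:
        child_state = state ^ 1 if rng.random() < flip_prob else state
        tips.extend(simulate_tips(child, child_state, flip_prob, rng))
    return tips

def fitch_steps(tree, tips):
    """Minimum number of state changes needed on the tree (Fitch parsimony)."""
    tip_iter = iter(tips)
    steps = 0

    def visit(node):
        nonlocal steps
        if node is None:
            return {next(tip_iter)}
        left = visit(node[0])
        right = visit(node[1])
        if left & right:
            return left & right
        steps += 1  # disjoint state sets force one change here
        return left | right

    visit(tree)
    return steps

def mean_consistency_index(flip_prob, depth=4, n_chars=2000, seed=0):
    """Average CI = 1 / steps over simulated binary features that vary at the tips."""
    rng = random.Random(seed)
    tree = balanced_tree(depth)
    values = []
    for _ in range(n_chars):
        steps = fitch_steps(tree, simulate_tips(tree, 0, flip_prob, rng))
        if steps > 0:  # invariant features carry no information about the tree
            values.append(1 / steps)
    return sum(values) / len(values)

print(mean_consistency_index(0.02))  # slow feature: close to 1, few homoplasies
print(mean_consistency_index(0.30))  # fast feature: much lower, many homoplasies
```

The same process of common descent gives a high consistency index for slowly evolving features and a low one for fast ones, which is why the measured speed has to be part of the prediction.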
Anyhow, in my opinion, ReMine’s model corresponds to the blue line. The features used in this analysis are a mixture of slow and fast evolving features, so in this case CD corresponds to the green line. And your model corresponds to the orange line, because it is not specified at all.
So the data ends up fitting CD the best. I should point out that I believe in design too; design by common descent. So in no way does this rule out that God created us. Rather, it tells us that common descent is the best design principle to understand biology. Signal and Noise - Scientific Evidence - The BioLogos Forum
How did you arrive at the conclusion that Design should have less consistency (at about 0.6) than common descent (at about 0.8)? Walter ReMine (a YEC who rejects evolution) argues exactly the opposite, that design should produce more consistency than common descent (he argues it should be close to 1). Which of you is right? And how do we know?
If possible (though I understand this is not likely), can you please show your math. How did you arrive at such precise numbers, with non-overlapping uncertainty?
You also point out that you are using an “idealized” common descent model (which presumably does not model any noise). So why should we trust it? Don’t we want a realistic common descent model that includes noise, and would therefore reduce the expected CI?
@Cornelius_Hunter had no response. What is exciting about @Winston_Ewert’s work is that he is giving it an attempt. That is respectable. I want to get as far as is possible with the data.
God Can Meddle
Of course. None of this is about God’s action. Maybe God is inspiring proteins, just like I already offered you:
Questions about God’s action are different than common descent. So there is an equivocation. Let’s stop thinking this is about ID. It is about Common Descent.
No one is looking to falsify his result. I’m hoping he will be as successful as possible. I wish the best to @Winston_Ewert, and hope he will reveal new things to us about how life arose. We will help him as much as possible. Our exchange merely applied the same standards to which @sygare, @glipsnort, and I hold ourselves, our colleagues, and our students. We treated @Winston_Ewert like an equal, and he handled himself well.
I think you just misunderstood what we are doing here. No one is arguing against anything. We are just trying to give an honest account of the science and be helpful.
Sure there is. If you want to believe God did it too, that’s fine. The evidence I laid out for common descent does not depend on purely natural causes.
It would be great to see someone develop that into a real mathematical model, and test it.
At Reasons to Believe, @AJRoberts and Fuz Rana have mentioned it to me before. I’m not sure how it can work, but will help them if I can. Part of the challenge for them is that most people in the OEC camp can’t agree where the divisions between creation events are. Is God creating distinct species? Families? Phyla? Until they sort that out, it will be hard for them to make progress.
As far as YECs are concerned, they have been trying with baraminology, but it hasn’t worked out terribly well. It just looks like humans are the same “kind” as apes, no matter what metric they use. They are stuck, but I think they might have a way forward. We’ll see if they are interested.
OK, what is this alleged mechanism and how can we test it? Until you say, you don’t have anything.
And why do we need a mathematical model? That doesn’t make any sense at all. Unguided evolution doesn’t have a mathematical model.
We have sequenced the chimp and human genomes and yet no one can link the genetic differences to the anatomical and physiological differences. Until you sort that out it will be hard for you to make progress.
For clarity, here is my definition of homoplasies.
A feature that shows up in multiple descendants, but not in the ancestor.
In a design scenario, similar features occur all the time across independent descendants without having occurred in their ancestors, e.g. many kinds of vehicles have internal combustion engines, whereas earlier there was a greater diversity of propulsion mechanisms. This is what dependency injection is, and it creates a dependency graph.
However, in a stochastic variation common descent (SCD) scenario, it seems extremely unlikely for the same feature to occur independently. We could test this computationally with genetic programming, and see how often the same function evolves from different ancestors.
Here is a mathematical formulation of the difference.
A function F is represented by some DNA sequence of length K, or set of proteins of cardinality K.
Independent occurrences means from different parents. I.e. if a set of children from the same parent all have F, this only counts as 1 occurrence.
M is the number of independent occurrences of F in ancestor generation G0.
N is the number of independent occurrences of F in generation G1, who are direct descendants of generation G0.
Then, ID predicts N-M > 1 and SCD predicts N-M <= 1 for sufficiently large K. The purpose of K is to account for the easy occurrence of small homoplasies by chance.
N-M > 1 generates a dependency graph, and N-M <= 1 will be a tree.
So, given a dependency graph is the best fit for data, then N-M > 1, and ID is a better explanation than SCD.
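As a literal transcription of the criterion above, the counting rule can be sketched as follows. This only encodes the definitions of M and N as stated (siblings sharing a parent count once) and leaves out the length threshold K. The data layout, the function names, and the toy generations are assumptions made for illustration, and the sketch does not test whether the criterion itself separates the two hypotheses.

```python
def independent_occurrences(generation):
    """Count distinct parents among the individuals that carry F.

    `generation` is a list of (parent_id, has_f) pairs, one per individual.
    Siblings that share a parent count as a single occurrence.
    """
    return len({parent for parent, has_f in generation if has_f})

def predicted_shape(g0, g1):
    """Apply the stated rule: N - M > 1 gives a dependency graph, otherwise a tree."""
    m = independent_occurrences(g0)
    n = independent_occurrences(g1)
    return "dependency graph" if n - m > 1 else "tree"

# Hypothetical toy generations.
g0 = [("root", True), ("root", False), ("root", False)]    # M = 1
g1 = [("a", True), ("a", True), ("b", True), ("c", True)]  # N = 3

print(predicted_shape(g0, g1))  # dependency graph
```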
My overall take away from the discussion on Dr. Ewert’s paper is that no one believes a tree is a good fit for the data. Thus, everyone agrees with Dr. Ewert’s main result, that a graph with some degree of dependency structure is a much better fit. Hence, per my above reasoning, ID is the best explanation for what we see.
I haven’t seen explanations for the result. I’ve seen statements that the data exhibits a medium amount of homoplasies, and assertions that SCD can account for this. Without an explanation of how, and not having expertise in the area, all I can do is offer my thinking on why the ID conclusion seems the most plausible in my mind. Not trying to disrespect anyone here.
But, still the main takeaway is everyone agrees with Dr. Ewert’s result that pure trees are bad models for the data, and some sort of dependency injection is better, correct?
The fact that the dependency model fits modular software design is interesting. Winston has put a testable model together but we are in the first inning of validation as Joshua mentioned. Everyone realizes that the Common descent hypothesis is messy but it has until now been the only hypothesis on the table. Dr. Swamidass is pushing Winston to improve his model and this is a good thing.
One question I have is how would we measure neutral mutations without an evolutionary assumption.
Great to see you, @colewd. Thanks for showing yourself here, and welcome. A bit busy the next few days, but will try to answer you guys as soon as I can. While I’m out, maybe others might engage too. I’ll be back!
Dr. Swamidass claims this is seen in human genetic history. However, this would be an especially good piece of evidence for dependency injection/intelligent design, since humans have been intentionally breeding themselves for as long as we know.
A better piece of counter evidence would be homoplasies in the absence of intention, such as in plants and in simple creatures.