Hello @Winston_Ewert, I’m only a part-time reader and rarely a contributor here, but I’d like to extend a welcome (for what it’s worth). I think you will be very pleasantly surprised by the unique nature of the dialogue here. “Peaceful Science” is a description, not an aspiration. Although the crowd isn’t huge, participants here are largely quite respectful of differing views. Moderators here subscribe to very different views on God’s creation with the intent of allowing ALL viewpoints safe voice. Gracious dialogue isn’t displayed 100% of the time, but pretty darn close to it. I hope we can all learn from your contributions at this site.
Edit - As a P. S., I think we’ve all been guilty of being “that guy” in the cartoon!
Thank you for coming @Winston_Ewert. I’ve long been asking @pnelson for models just like yours and @Agauger’s, ones that are recognizable to me as a computational biologist. I meant it not as a taunt, but as a genuine invitation. I’m looking forward to the conversation and expect to learn through it too.
Great. I endorse that fully. Let me lay down some ground rules for everyone.
We also will have a zero-tolerance policy towards insults directed at you.
Everyone watching this thread, be very careful when entering it. If you do not treat @Winston_Ewert with basic respect, your posts will be deleted. Repeated infractions will get you booted.
@Winston_Ewert if you see any posts that are a problem, please do not leave immediately. Instead, flag them as inappropriate, and let me (or the @moderators) lay down the law. You, however, do not need to engage with anyone rude to you at all. We will prevent this thread from being graffitied by trolls.
If this becomes an ongoing issue, I will move this thread to a protected place where others can watch, but only designated people can contribute.
And everyone watching, if you do not have sensible and nice things to say, just stay out of this thread. If you can’t be kind, you are not welcome to interact with us on this thread. If on the other hand you have legitimate questions or critiques, you should feel free to put a note here. This is an open forum, but it only works when we are kind to one another.
Speaking of which, I apologize ahead of time for the mistakes I make here. When I make them, I will apologize and make it right to the best of my ability. My goal @Winston_Ewert is to treat you fairly. Thank you for giving us a chance to think about this together with you.
I see in your comments that is absolutely correct. Consistent with this, and to your credit, you write:
@Winston_Ewert, you are earning some real trust here. I hear this as honest engagement. You are not misrepresenting your results.
I suppose others (Cornelius) seem to be ignoring this, but we’ll ignore that for this thread. We are talking to you, and I agree you are not ignoring it. You are building the first attempt in years to address the nested clades pattern.
If you are to succeed at replacing common descent as an explanatory model for biology, you are going to need to make several advances. It is entirely okay that you’ve limited yourself to a subset of the problem for now. I’ll stop saying “ignore.”
Sure, but I’m not talking subjectively. I’m talking from a mathematical point of view. We can envision models that can explain parts of the nested clade pattern (as you have) without common descent. Walter ReMine did just this, and I’ve privately wondered about this too. For other patterns, it is much more difficult to imagine a solution.
I’d say, in your defense, that a lot of really bad arguments for evolution have been advanced. The nested clades argument is one of them, because it is often advanced as if there were no homoplasies. There are homoplasies, and they are predicted by evolutionary science too. We do not expect, from an evolutionary point of view, for nested clades in nature to be perfect nested clades. In this, Walter ReMine was correct. I don’t doubt that you are correct that others think nested clades are strong evidence, but often the precise way that argument is advanced is actually in scientific error, even before an anti-common descent rebuttal arrives on the scene.
It is depressingly bad for dialogue when fallacious arguments like that are allowed to persist. There is a way that nested clades are evidence for common descent, but not in the way that argument is often explained. I’m sorry for that absurdity in the conversation. I wish I could fix it, but I can only really manage what happens here in this little corner of the internet.
I want to give you a fair hearing here, and even get other legitimate and honest scientists to engage with you. Let’s give you a shot. My statements earlier about “ignoring” should all be transposed to caveats (with which you agree) that this only handles part of the problem.
Does that sound good?
@Winston_Ewert, as a scientist in the Church and a Christian in science, I want to publicly promise some things to you.
I will treat you fairly.
I will give ground when you are right.
I will publicly make known that of which you have convinced me.
From here, I’m going to work slowly through the specific points you’ve raised. As you have time, please fill in the details. I want you to get credit for what you do well and right here and now on this thread. Peace.
Same here. This thread will be open indefinitely though. This is a place where you can figure out what type of experiments might convince skeptics. Even if it takes months to get that analysis done, we will still be interested. Coming to agreement on the experiments ahead of time will help us understand the results when they come. And this also gives you a forum to make negative results known too, which also will build trust in your work.
I can see a few:
1. Incomplete lineage sorting (which seems to be at play in the human data).
2. Deletion and large scale genome rearrangement.
3. The birthday paradox.
4. Introgression and/or hybridization after speciation.
Now, there are ways to test to what extent these things are affecting your results. The primate lineages are a good place to focus, because we have the most data there, and the question is most salient there because it deals with human origins.
Once again, there seem to be ways to test your theory against these mechanisms using the data. However, as written, your theory does not seem to be specified clearly enough to do this yet (at least to me as a third-party reviewer). That said, we can, for example, start to apportion specific cases into the different classes I just mentioned based on some tests of the data.
This, also, is where the human data becomes important. I think we both agree that humans are monophyletic. So if your non-tree model fits human data better than a tree model, that is an important failed control. Without getting into the details yet, we already know that tree models fail on human diversity data. There have been several papers demonstrating this. We should get into the weeds on this, I am sure, but that seems to indicate that common descent in the real world does not produce a tree, so your tests themselves do not demonstrate that your model is better than common descent. Where am I going wrong in that reasoning? And do you want to see some examples of what I am talking about?
Selection is not really the issue I’m referring to here. That has to be considered carefully too.
Instead, I’m referring to, for example, synonymous mutations in proteins. It seems (though I could be wrong) that you’ve restricted this to gene families. If that is the case, you are glossing over mutations that are more likely to be neutral. That seems to be a real problem, as this is actually where the nested clade evidence is stronger (at least from my assessment).
If I am right (and I may not be), it seems this is a direct falsification of the conceptual argument being put forward. It seems that “design modules” must be defined to be non-neutral, and you may have an explanation for why they almost fit in a nested tree. However, that does not explain why more neutral mutations (it is a relative term) might fit more tightly in a nested tree than design modules. That seems to be a looming problem for your proposal. I don’t think you can make your case without dealing with this head on. It seems to be a direct falsification of your proposal, unless I’m missing something here.
Thanks for the offer. I’m more interested in real dialogue with you. I want you to have the best model possible, and to get credit for retracting it if it’s wrong. If it is wrong, maybe the next idea you come up with will work. Dialogue like this, I’ve found, is a much faster way to help you out.
I’m not looking for a publication out of this, but to really help us all figure this out together. In a way, think about this forum like a “micro publication,” or a “public peer-review.” It is a public thread. If desired, I can even get a DOI for it so you can reference it.
In service of this goal @Winston_Ewert, can you make available to us the results you computed for this study? In particular, I would like to see the full dependency graph you computed, in all its gory detail. I’m sure you have them in text files somewhere. I’d like a copy of it, with a reasonable README file.
This data will not be used for “gotcha” argument silliness, but for legitimate scientists outside your camp to understand what you have done and what your data is telling us.
I also want to reiterate that even if there are some problems ultimately with your model, this still represents a major movement in a good direction for ID. We expect that some models will fail. We get credit for retracting them when they fail. Perhaps next time around, with lessons learned, a better model can be put forward. Maybe after some long hard work, you might even win.
Whatever happens with this model, @Winston_Ewert, this is a big win for you. I hope to see more work like this come from your camp, even though it is not my camp.
(S. Joshua Swamidass)
@Winston_Ewert, in my experience @sygarte has been an honest and fair scientist. He is convincible, and is not aggressively opposed to you. It is worth your time to engage with his questions. It also looks like @glipsnort is joining the conversation. He has also been honest, and really should be engaged. Right now there are three legitimate scientists from secular institutions engaging your work with an open mind. Congratulations. You have our attention.
@sygarte, in this specific case can you elaborate why you are asking this question? What will this tell you? What information will it give you? I’m not sure where you are going with this (though I have one guess).
That is not surprising to me at all. The key thing is to look at the proportion of the difference to the total. Let’s just take one line as an example:
You are saying that 111,823 is large, but that is only about 1.8% of the unexplained fit (111 / 6308). That means the dependency graph explains only about 1.8% more of the data’s patterns than a tree. Not very much. And, as @Winston_Ewert correctly notes, this is not even a real model of common descent.
So why are the numbers so large? Merely because he has a lot of data. Adding more data increases the absolute values of the log probabilities roughly in proportion to the amount of data, but the relative values should remain fairly stable.
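To make the scaling point concrete, here is a toy Python sketch. It is not Ewert’s model: it assumes, purely for illustration, binary observations scored under two hypothetical Bernoulli models, a better-fitting one standing in for the dependency graph and a worse-fitting one standing in for the tree. The absolute log-probability gap grows with the number of observations, while the gap relative to the worse model’s total stays put.

```python
import math


def bernoulli_log_lik(n_ones: int, n_total: int, q: float) -> float:
    """Log-likelihood of n_ones successes in n_total independent trials with success probability q."""
    return n_ones * math.log(q) + (n_total - n_ones) * math.log(1 - q)


def compare(n_total: int, true_rate: float = 0.3,
            q_good: float = 0.3, q_poor: float = 0.35) -> tuple:
    """Return (absolute gap, relative gap) between a well-fitting and a poorly fitting model.

    All parameter values are hypothetical placeholders chosen only to show the scaling.
    """
    n_ones = round(true_rate * n_total)
    better = bernoulli_log_lik(n_ones, n_total, q_good)
    worse = bernoulli_log_lik(n_ones, n_total, q_poor)
    absolute_gap = better - worse
    relative_gap = absolute_gap / abs(worse)
    return absolute_gap, relative_gap


for n in (1_000, 100_000, 10_000_000):
    gap, rel = compare(n)
    print(f"n={n:>12,}  log-prob gap={gap:14.1f}  relative gap={rel:.4%}")

# The ratio quoted in the thread, for comparison.
print(f"111 / 6308 = {111 / 6308:.4f}")
```

The last line only reproduces the 111 / 6308 ratio from the thread, which comes to about 0.0176.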
Thanks Josh. As a matter of fact, I find the module idea very interesting. And I was pleased to see that Dr. Ewert acknowledged what I see as the main objection to the conclusions. He writes:
“…the dependency graph model has an advantage over common descent in fitting the data because it can postulate modules to explain otherwise inexplicably distributed gene families…This is why we must also take into account the parsimony or complexity of the model.”
I find that to be a very honest statement of the problem in comparing the two models. But I am philosophically not in agreement with the solution using parsimony. My general attitude toward parsimony in biology is negative, since Occam’s Razor is violated at almost every turn in biochemistry and physiology.
What I think would be fascinating would be an incorporation of the gene module approach in an explanation of convergence within the evolutionary framework.
True, however, @Winston_Ewert penalizes more complex models. We can debate whether he did that correctly or not, but this objection is not really valid without a careful review of his penalization. I think a better question concerns the fit we would get from an undirected graph model versus a dependency graph (which is a directed graph), using the same penalization.
That is an important control to do. If the dependency graph model does not do better than the undirected graph, that would significantly undermine his argument. That, it seems, would be a speed bump for his proposal. As a matter of fact, we already know that human diversity data better fits an undirected graph than a tree. So the real question is whether the dependency graph does better than an undirected graph, not just better than a tree.
It might be resolvable even if it initially fails this control. Perhaps the penalization function would need to be improved. It is, nonetheless, an important control study to run.
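A minimal sketch of this control, assuming a BIC-style penalty as a stand-in. Ewert’s paper uses its own Bayesian penalization, so the scoring rule here is a swapped-in illustration, and the log-likelihoods, parameter counts, and sample size are made-up placeholders, not results. The point is only that every candidate model gets scored with the same rule.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def best_model(models: Dict[str, Tuple[float, int]], n_obs: int) -> str:
    """Score every model with the same penalty and return the name of the lowest-BIC model."""
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in models.items()}
    for name, score in sorted(scores.items(), key=lambda item: item[1]):
        print(f"{name:18s} BIC = {score:9.2f}")
    return min(scores, key=scores.get)


# Hypothetical placeholder numbers: (log-likelihood, number of free parameters).
MODELS = {
    "tree": (-1000.0, 10),
    "dependency graph": (-900.0, 20),
    "undirected graph": (-880.0, 22),
}
N_OBS = 500

print("best:", best_model(MODELS, N_OBS))
```

With these placeholder numbers the undirected graph comes out ahead; with real numbers it may not, which is exactly what the control is meant to find out.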
(side question to @Winston_Ewert, do the numbers in Table 4 include the penalization factor too?)
Firstly, I see that I need to clarify the nature of the argument I made in the paper.
If my hypothesis is correct, it predicts that a dependency graph ought to be a better fit to the biological data than a tree. This prediction is fulfilled, thus providing some level of evidence that my hypothesis is correct. My argument is not that the dependency graph model is correct because it beats the tree model. Such an argument would not be valid. Instead, I’m merely arguing that this fulfills a prediction.
The challenge this leaves to common descent is explaining why this prediction worked.
I expected 1 and 4 as obvious candidates (and mentioned them in my paper).
As for 2, I do have deletions in my model. But I’m curious about how you see large scale genome rearrangement playing into this. Since I’m just looking at the presence or absence of gene families, I’d think a rearrangement wouldn’t do anything interesting there. But presumably you know something about that which I don’t.
As for 3, it seems to me that this should be taken care of by the probabilistic analysis. I assume what you are thinking here is that some genes could end up in a similar set of species and thus look a lot like a module, but by pure coincidence. But the Bayesian analysis, and in particular the penalty for the dependency graph, should prevent that from happening.
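For readers unfamiliar with the birthday-paradox concern, here is a rough Python sketch. It assumes, unrealistically, that each gene family’s presence/absence pattern is drawn uniformly at random from all informative patterns (present in 2 to N-2 of N species). Real gene-family distributions are far from uniform, which would make coincidental matches more likely, not less. Whether the Bayesian penalty fully absorbs this is the open question; the sketch only shows why the concern arises.

```python
def informative_patterns(n_species: int) -> int:
    """Number of presence/absence patterns with the family present in 2..N-2 of N species."""
    return 2 ** n_species - 2 - 2 * n_species


def prob_shared_pattern(num_families: int, num_patterns: int) -> float:
    """Birthday-paradox probability that at least two of num_families gene families,
    each given a pattern uniformly at random from num_patterns possibilities,
    end up with the identical pattern."""
    if num_families > num_patterns:
        return 1.0  # pigeonhole: a repeat is guaranteed
    p_all_distinct = 1.0
    for k in range(num_families):
        p_all_distinct *= (num_patterns - k) / num_patterns
    return 1.0 - p_all_distinct


# Hypothetical scenario: 2,000 informative gene families spread over 20 species.
print(f"{prob_shared_pattern(2_000, informative_patterns(20)):.3f}")
```

Even with over a million possible patterns, a few thousand families make at least one exact coincidence likely, which is why the penalty term has to do real work.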
What I’m surprised by is you not bringing up horizontal gene transfer. Do you not think it is a good candidate?
My thinking is that none of these mechanisms seem like good candidates to explain my successful prediction. Obviously, my intuition on this point is worth diddly squat. It has to be backed up by cold hard evidence which I don’t have (yet).
So, yes, dealing with the exact sequence (instead of just gene family) and in particular the more neutral elements of that sequence is really key. If that can’t be done my proposal fails. It remains to be seen whether a model can be developed here.
It should be emphasized that the fact that human variability deviates from the expectations of a tree does not automatically mean that it will fit a dependency graph better. So it’s very much an open question as to what the results will look like.
But more critically, I’m not sure what prediction I would make about the results of the test. Whether or not common descent produces a tree depends on various assumptions you make about the evolutionary process. I suspect that neither a tree nor a dependency graph is the right model in this case.
Without going into the details of the models, here’s my understanding of the situation: given a set of N species and a set of gene families, any gene family that appears in more than one species and in fewer than (N-1) species can contribute to the comparison of the two models. If the gene family appears only in a subclade of the full set of species, or is missing only in a subclade, then it is consistent with both models. If its presence or absence is not consistent with a single subclade, then it is improbable under the simple tree model but still probable under the dependency model. (And the dependency model is penalized for its extra degrees of freedom.)
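The summary above can be expressed as a small classifier. This Python sketch assumes a rooted tree written as nested tuples with hypothetical species labels A to E. It treats a gene family as informative only if it is present in 2 to N-2 species, and as tree-consistent if either the set of species that have it or the set that lack it is exactly one clade. It ignores the dependency model’s scoring entirely.

```python
from typing import FrozenSet, Iterable, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the leaf set of every subtree, including single leaves and the root."""
    result: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        result.add(leaves)
        return leaves

    walk(tree)
    return result


def classify(tree: Tree, present: Iterable[str]) -> str:
    """Classify a gene family's presence pattern against a single rooted tree."""
    all_clades = clades(tree)
    all_species = max(all_clades, key=len)  # the root clade holds every species
    present_set = frozenset(present)
    absent_set = all_species - present_set
    if len(present_set) < 2 or len(absent_set) < 2:
        return "uninformative"  # consistent with both models
    if present_set in all_clades or absent_set in all_clades:
        return "tree-consistent"
    return "conflicts-with-tree"


# Hypothetical five-species tree: ((A, B), C) and (D, E).
TREE: Tree = ((("A", "B"), "C"), ("D", "E"))
for pattern in ({"A", "B"}, {"A", "C"}, {"A", "B", "C", "D"}, {"C", "D", "E"}):
    print(sorted(pattern), classify(TREE, pattern))
```

Only the "conflicts-with-tree" families carry signal for the comparison, which is why missing data in more than one species matters so much.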
Is my summary accurate? (If not, you can probably ignore the rest of my comments.)
If so, then it seems to me that this kind of comparison is critically dependent on the completeness and consistency of the dataset, since missing data in more than one species appears as a signal for one of the models. Comparative genome sequence data typically come from independent sequencing projects with different degrees of completeness and accuracy, so the issue is particularly severe for this test.
If it were my study, the first thing I would want to do is understand the completeness of the data. For the case of the closely related fish, for example, how many gene families are there in total? How many are missing from a single species? How consistent is this number from species to species? Is there a correlation between the number of singleton missing gene families and the number of shared missing gene families, when assessed across species? (These would be good numbers to post here, by the way.)
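The completeness counts asked for here are easy to tabulate once the presence/absence table is in hand. A minimal Python sketch, assuming a hypothetical dictionary that maps each gene family to the set of species in which it was found (the species and family names below are placeholders): for each species it counts families missing only there and families missing there plus somewhere else, then computes a Pearson correlation across species. A strong positive correlation would be one hint that uneven assembly or annotation quality, rather than biology, is driving shared absences.

```python
from math import sqrt
from typing import Dict, List, Sequence, Set, Tuple


def missing_counts(presence: Dict[str, Set[str]],
                   species: List[str]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Per species, count families missing only there (singleton) and missing there plus elsewhere (shared)."""
    singleton = {s: 0 for s in species}
    shared = {s: 0 for s in species}
    for found in presence.values():
        missing = [s for s in species if s not in found]
        if len(missing) == 1:
            singleton[missing[0]] += 1
        elif len(missing) > 1:
            for s in missing:
                shared[s] += 1
    return singleton, shared


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; raises if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / sqrt(vx * vy)


# Toy presence/absence table for four hypothetical fish genomes.
SPECIES = ["fish1", "fish2", "fish3", "fish4"]
PRESENCE = {
    "famA": {"fish1", "fish2", "fish3", "fish4"},
    "famB": {"fish1", "fish2", "fish3"},  # missing only in fish4
    "famC": {"fish1", "fish2"},           # missing in fish3 and fish4
    "famD": {"fish2", "fish3", "fish4"},  # missing only in fish1
    "famE": {"fish1", "fish3"},           # missing in fish2 and fish4
}

singleton, shared = missing_counts(PRESENCE, SPECIES)
r = pearson([singleton[s] for s in SPECIES], [shared[s] for s in SPECIES])
print(singleton, shared, f"r = {r:.3f}")
```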
The second thing I would absolutely, positively do, and would insist on an author doing if I were reviewing a study like this, is look at some of the genes that are supposedly missing (in a way not consistent with common descent) and confirm that they really are missing genes and not missing (or different) annotations. What’s in the genome where they should be, based on related species that have them?
I hope that isn’t your argument. We already know this to be true from other literature, for other reasons. The only reason we would think common descent would not produce violations of a tree is if we were using a strawman version of common descent (not saying you are doing that here).
The real question, it seems, is different. You have to show that this model works better than the current best models of common descent, which right now are undirected graphs. Even then, I can give an account of why you might get a signal (and it would be exciting if you did).
I can grant you that, but that is a meaningless claim. That is not how we adjudicate models in computational biology.
Why exactly? I can produce strong evidence that human diversity does not follow a tree, even though we all agree it arises from a process of common descent. That is de facto evidence that common descent does not produce DNA that fits a tree perfectly.
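One classic way to check whether SNP data are compatible with a single tree is the four-gamete test. It is offered here as an illustration of the idea, not as the method used in the studies mentioned in this thread. Under an infinite-sites assumption (no recurrent mutation), a single tree can produce at most three of the four possible two-site haplotypes (00, 01, 10, 11); seeing all four implies recombination or recurrent mutation somewhere in the history. The haplotype strings below are toy data.

```python
from itertools import combinations
from typing import List, Tuple


def four_gamete_violations(haplotypes: List[str]) -> List[Tuple[int, int]]:
    """Return pairs of site indices at which all four gametes (00, 01, 10, 11) appear.

    haplotypes: equal-length strings of '0'/'1', one per sampled chromosome.
    """
    n_sites = len(haplotypes[0])
    violations = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            violations.append((i, j))
    return violations


# Toy data: the first set is tree-compatible, the second is not.
print(four_gamete_violations(["000", "100", "110", "111"]))
print(four_gamete_violations(["00", "01", "10", "11"]))
```

Because recombination is pervasive in human genomes, real population samples routinely show such violations, which is consistent with the point that common descent need not yield a single tree.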
I do think horizontal gene transfer is important, but I’m not sure that is the dominant process involved here.
Large scale genomic rearrangements can create correlated deletions in multiple branches of the tree. One test for this is to see if modules are correlated with synteny. I expect they are. If so, that gives a fairly straightforward reason for why the data shakes out this way.
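A rough version of the synteny check could look like the permutation test below. It is a sketch under simplifying assumptions: a single hypothetical reference chromosome, gene positions given as gene order, and a module’s span (the distance between its outermost genes) as the clustering statistic. The gene and module names are placeholders. A real analysis would handle multiple chromosomes and conserved gene order across species.

```python
import random
from typing import Dict, List, Sequence


def span(positions: Sequence[int]) -> int:
    """Distance between the outermost positions in a set of genes."""
    return max(positions) - min(positions)


def clustering_p_value(gene_positions: Dict[str, int], module: List[str],
                       n_permutations: int = 10_000, seed: int = 0) -> float:
    """One-sided permutation p-value: how often does a random gene set of the
    same size span a region as small as the module's?"""
    rng = random.Random(seed)
    observed = span([gene_positions[g] for g in module])
    all_positions = list(gene_positions.values())
    hits = 0
    for _ in range(n_permutations):
        sample = rng.sample(all_positions, len(module))
        if span(sample) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical chromosome with 100 genes in order, and two hypothetical modules.
GENOME = {f"gene{i}": i for i in range(100)}
TIGHT_MODULE = [f"gene{i}" for i in range(10, 15)]
SPREAD_MODULE = ["gene0", "gene25", "gene50", "gene75", "gene99"]

print("tight: ", clustering_p_value(GENOME, TIGHT_MODULE))
print("spread:", clustering_p_value(GENOME, SPREAD_MODULE))
```

A small p-value for many real modules would support the rearrangement explanation; large p-values across the board would count against it.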
Not necessarily. It gets into the details of how you handle pseudogenes (or whatever you want to call what look like inactivated genes). In a proper analysis, you’d have to call each type of inactivation a different type of gene family, whether or not they actually are functional. That, it seems, will really break your analysis. Though you are welcome to prove me wrong.
As I’ve said, the human diversity data is de facto evidence that your intuition is wrong. I haven’t posted papers on this yet, but I will when you are ready to take a look.
Once again, that is honesty. You are earning trust every time you do that.
True. We do, however, know that an undirected graph does better than a tree. The finding that a middle ground model (a directed graph) fits better than a tree is no surprise. That is what everyone should have predicted. The real question is if your middle ground model does better than the state of the art, which is NOT a tree.
It seems to me that this conversation has served its purpose.
I’ve acknowledged the limitations of what I’ve done so far. You’ve pointed out the limitations and the sorts of issues that need to be dealt with. They largely correspond to what I’d already thought would be the concerns, with a few surprises. Now I need to go ponder these things for a while.
Just to help you out, here is one example of a study that shows human diversity (which obviously results from common descent) does not show a tree pattern (https://doi.org/10.1534/genetics.115.182626). I can produce more examples when you are ready to engage on that. Alan Templeton, a leading population geneticist, argues that not a single human diversity dataset he has examined survives a statistical test to see if it is a tree (private communication).
As I stated earlier, human variation data serves as an excellent negative control. If you solve the technical problems in this approach, the signal should disappear when you look at human diversity data. Gnomad (http://gnomad.broadinstitute.org/) should provide more than enough data to test this. When that analysis is done, whatever the results, please let us know. Even if it does not work out, you get credit for being upfront about the negative results.
Thank you too. It has been a pleasure having you here. Whenever you would like to re-open the conversation, let us know. I will reopen it for you. It seems like many of us are looking forward to seeing how this develops.