I hope that isn’t your argument. We already know this to be true from other literature, for other reasons. The only reason we would think common descent would not produce violations of a tree is if we were using a strawman version of common descent (not saying you are doing that here).
The real question, it seems, is different. You have to show that this model works better than the current best models of common descent, which right now are undirected graphs. Even then, I can give an account of why you might get a signal (and it would be exciting if you did).
I can grant you that, but that is a meaningless claim. That is not how we adjudicate models in computational biology.
Why exactly? I can produce strong evidence that human diversity does not follow a tree, even though we all agree it arises from a process of common descent. That is de facto evidence that common descent does not produce DNA that fits a tree perfectly.
I do think horizontal gene transfer is important, but I’m not sure that is the dominant process involved here.
Large scale genomic rearrangements can create correlated deletions in multiple branches of the tree. One test for this is to see if modules are correlated with synteny. I expect they are. If so, that gives a fairly straightforward reason for why the data shakes out this way.
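To make the synteny test concrete, here is a minimal sketch of one way it could be run: a permutation test asking whether genes assigned to the same module sit closer together on the chromosome than randomly labeled genes do. Everything here is an assumption for illustration. The function names, the toy positions, and the simplification to a single chromosome are hypothetical and are not taken from your analysis.

```python
import random
from itertools import combinations

def mean_within_module_distance(positions, modules):
    """Mean chromosomal distance between gene pairs sharing a module label."""
    dists = [abs(positions[a] - positions[b])
             for a, b in combinations(range(len(positions)), 2)
             if modules[a] == modules[b]]
    return sum(dists) / len(dists)

def synteny_permutation_p(positions, modules, n_perm=2000, seed=0):
    """One-sided permutation p-value: are same-module genes unusually close?"""
    rng = random.Random(seed)
    observed = mean_within_module_distance(positions, modules)
    labels = list(modules)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between module and position
        if mean_within_module_distance(positions, labels) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): two modules, each clustered in one region.
positions = [1, 2, 3, 4, 101, 102, 103, 104]
modules = ["A"] * 4 + ["B"] * 4
p = synteny_permutation_p(positions, modules)
```

A small p-value here would mean module membership tracks genomic position, which is what I would expect if correlated deletions from rearrangements are driving the signal. Real data would need multiple chromosomes and species-specific gene orders, which this sketch ignores.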
Not necessarily. It gets into the details of how you handle pseudogenes (or whatever you want to call what look like inactivated genes). In a proper analysis, you’d have to call each type of inactivation a different type of gene family, whether or not they actually are functional. That, it seems, will really break your analysis. Though you are welcome to prove me wrong.
As I’ve said, the human diversity data is de facto evidence that your intuition is wrong. I haven’t posted papers on this yet, but I will when you are ready to take a look.
Once again, that is honesty. You are earning trust every time you do that.
True. We do, however, know that an undirected graph does better than a tree. The finding that a middle ground model (a directed graph) fits better than a tree is no surprise. That is what everyone should have predicted. The real question is whether your middle ground model does better than the state of the art, which is NOT a tree.
It seems to me that this conversation has served its purpose.
I’ve acknowledged the limitations of what I’ve done so far. You’ve pointed out the limitations and the sorts of issues that need to be dealt with. They largely correspond to what I’d already thought would be the concerns with a few surprises. Now I need to go ponder these things for a while.
Just to help you out, here is one example of a study showing that human diversity (which obviously results from common descent) does not show a tree pattern (https://doi.org/10.1534/genetics.115.182626). I can produce more examples when you are ready to engage on that. Alan Templeton, a leading population geneticist, argues that not a single human diversity dataset he has examined survives a statistical test to see if it is a tree (private communication).
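For a sense of what "testing whether data fit a tree" can look like, here is a minimal sketch of the classic four-gamete test on binary variant data. To be clear, this is an assumption on my part for illustration. It is not the method of the linked study, nor the test Templeton uses. The idea: if each site mutated only once and there was no recombination, any pair of biallelic sites can show at most three of the four combinations 00, 01, 10, 11. A pair showing all four cannot fit a single tree under those assumptions. The function name and toy haplotypes are hypothetical.

```python
from itertools import combinations

def incompatible_site_pairs(haplotypes):
    """Return index pairs of sites that fail the four-gamete test.

    haplotypes: equal-length strings of '0'/'1', one per sampled sequence.
    """
    n_sites = len(haplotypes[0])
    failing = []
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site combinations seen in the sample.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            failing.append((i, j))
    return failing

# Toy data (hypothetical): nested mutations, consistent with one tree.
tree_like = ["000", "100", "110", "111"]
# Toy data (hypothetical): all four combinations present at the two sites.
not_tree_like = ["00", "01", "10", "11"]

print(incompatible_site_pairs(tree_like))      # []
print(incompatible_site_pairs(not_tree_like))  # [(0, 1)]
```

Any failing pair in recombining data such as human variation is exactly the kind of tree violation I am describing, even though the underlying process is common descent.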
As I stated earlier, human variation data serves as an excellent negative control. If you solve the technical problems in this approach, the signal should disappear when you look at human diversity data. gnomAD (http://gnomad.broadinstitute.org/) should provide more than enough data to test this. When that analysis is done, whatever the results, please let us know. Even if it does not work out, you get credit for being upfront about the negative results.
Thank you too. It has been a pleasure having you here. Whenever you would like to re-open the conversation, let us know. I will reopen it for you. It seems like many of us are looking forward to seeing how this develops.