swamidass
(S. Joshua Swamidass)
July 30, 2018, 1:07am
86
@colewd and @EricMH thanks for your patience. I’m sorry I was delayed.
EricMH:
We’re dealing with exponents here, right? This is important to nail down going forward when we are comparing probabilities. Is it 111,111, or 111,111 bits (i.e., 2^111,111)? I think it has to be bits because of the massive improbability of the null being true.
I think this is correct. Here’s an example.
The data is a string of 16 1s: 1111111111111111
The first model is 1100000000000000, which only fits 1/8 of the data.
The second model is 1111111100000000, which fits 1/2 of the data.
Our final model is 1111111111111100, which fits 7/8 of the data.
The -log Bayes factor for #1/#2 is 2, and #2 explains 2^2 = 4 times more of the data than #1.
The -log Bayes factor for #2/#3 is ~0.8, and #3 explains 2^0.8 ≈ 7/4 times more of the data than #2.
So if Dr. Ewert’s analysis shows the -log Bayes factor for tree/dependency is 10,000, that means the dependency graph explains the data 2^10,000 times better than the tree graph, i.e., P(Data|Tree)/P(Data|Dep.) = 2^-10,000. A 1.7% difference in explanation, on the other hand, means that P(Data|Tree)/P(Data|Dep.) ≈ 0.98. Big difference.
Unless I’m misunderstanding all this, it looks like Dr. Swamidass is incorrect in saying that Dr. Ewert’s results show the dependency graph only explains 1.7% more than the tree.
We can also think about this in terms of edit distance. In order to turn the best-fitting tree into the best-fitting dependency graph, 2^10,000 edits the size of the tree must be made to make the transition. More than a factor of 2 already seems implausible, let alone 2^10,000. So I would say this makes common descent an unworkable solution, regardless of a few homoplasies here and there.
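For reference, the arithmetic in the quoted example is easy to reproduce. Here is a minimal sketch in Python; the fractions 1/8, 1/2, and 7/8 and the 10,000-bit figure are taken from the example above, and the Bayes factor between two models is treated, as in the quote, as the ratio of the fractions of data they fit.

```python
import math

# Fractions of the 16-character string that each toy model fits (from the example above).
fit = {"model1": 1/8, "model2": 1/2, "model3": 7/8}

def bits_between(worse, better):
    """log2 of the ratio of fractions fit -- the convention used in the example."""
    return math.log2(fit[better] / fit[worse])

print(bits_between("model1", "model2"))  # 2.0 bits   -> model 2 fits 2^2 = 4x more
print(bits_between("model2", "model3"))  # ~0.807 bits -> model 3 fits 7/4x more

# Extrapolating as the quote does: a 10,000-bit log Bayes factor corresponds to a
# probability ratio of 2^-10,000, while a "1.7% better" reading corresponds to ~0.98.
print(2.0 ** -10_000)  # underflows to 0.0 in floating point; the ratio is unimaginably small
print(1 / 1.017)       # ~0.983
```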
It is correct that these are log units, but the interpretation is incorrect. We are supposed to compute percentages in log space, not as you are doing here. One reason is that even small errors in the dataset will artificially inflate the magnitude of the log Bayes factor, and they do so roughly linearly with the size of the dataset: just increasing the size of a noisy database will increase the apparent difference. There is a lot more here, but at the end I’ll show how @Winston_Ewert responded on the main thread.
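To make that concrete with purely hypothetical numbers (the per-site gap below is an assumption for illustration, not anything from Ewert’s data): if each of N sites contributes a tiny, roughly constant log-likelihood gap of ε bits, the total log Bayes factor is about N·ε, so it grows with the dataset even though the per-site preference is negligible.

```python
# Hypothetical illustration: a tiny per-site gap (in bits) becomes a huge
# total log Bayes factor once the dataset is large enough.
eps_bits = 0.02  # assumed per-site log-likelihood gap; made up for illustration

for n_sites in (1_000, 100_000, 6_000_000):
    total_bits = n_sites * eps_bits
    print(f"{n_sites:>9,} sites -> {total_bits:>9,.0f} bits total")
# 1,000 sites -> 20 bits; 100,000 sites -> 2,000 bits; 6,000,000 sites -> 120,000 bits
```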
colewd:
Even in the biological gene database least favorable to the dependency graph, HomoloGene, the log Bayes factor is in favor of the dependency graph by over 10,000 bits. Recall that 6.6 bits is commonly considered decisive.
I’m unfamiliar with that rule. @Winston_Ewert provides no reference, and a Google search for “6.6 bits” reveals nothing relevant. He might have misspoken there.
Anyhow, the general question came up in the main thread…
That is not surprising to me at all. The key thing is to look at the proportion of the difference to the total. Let’s just take one line as an example:
| Dataset | Dependency Graph | Tree | Difference |
| --- | --- | --- | --- |
| UniRef-50 | 6,193,801 | 6,308,988 | 111,823 |
You are saying the 111,823 is large, but that is only (approximately) 1.7% of the unexplained fit (111,823 / 6,308,988). That means the dependency graph only explains 1.7% more of the data’s patterns than a tree. Not very much. And, as @Winston_Ewert correctly notes, this is not even a real model of common descent.
So why are the numbers so large? Merely because he has a lot of data. Increasing the data will arbitrarily increase the absolute values of the log probability, but the relative values should remain somewhat stable.
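Here is a minimal sketch of that comparison in Python, using the UniRef-50 row above; the scaling loop at the end is purely hypothetical, just to show why the absolute gap tracks dataset size while the relative gap does not.

```python
# Numbers from the UniRef-50 row above, in bits.
tree_total = 6_308_988   # total for the tree model
difference = 111_823     # stated advantage of the dependency graph

print(f"{difference / tree_total:.2%}")  # ~1.77%, the "approximately 1.7%" figure above

# Hypothetically scaling the dataset k-fold scales both numbers k-fold:
# the absolute gap grows, but the relative gap stays put.
for k in (1, 2, 4):
    scaled_gap = k * difference
    scaled_total = k * tree_total
    print(f"x{k}: gap = {scaled_gap:,} bits, relative = {scaled_gap / scaled_total:.2%}")
```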
It appears that @Winston_Ewert agrees with me here.