That is not surprising to me at all. The key thing is to look at the proportion of the difference to the total. Let’s just take one line as an example:
Dataset | Dependency Graph | Tree | Difference |
---|---|---|---|
UniRef-50 | 6,193,801 | 6,308,988 | 111,823 |
You are saying the 111,823 is large, but that is only about 1.8% of the tree's unexplained fit (111,823 / 6,308,988 ≈ 0.0177). That means the dependency graph explains only about 1.8% more of the data’s patterns than a tree. Not very much. And, as @Winston_Ewert correctly notes, this is not even a real model of common descent.
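To make the arithmetic concrete, here is a minimal sketch that redoes the calculation with the UniRef-50 figures from the table. The variable and function names are just illustrative. Note that subtracting the two columns directly gives 115,187 rather than the listed 111,823; the reason for that gap is not stated here, and the proportion comes out under 2% either way.

```python
# Figures copied from the UniRef-50 row of the table.
dependency_graph = 6_193_801
tree = 6_308_988
reported_difference = 111_823  # "Difference" column as listed in the table


def relative_difference(difference: float, total: float) -> float:
    """Return the difference as a fraction of the total."""
    return difference / total


# Proportion using the Difference column as reported.
reported_share = relative_difference(reported_difference, tree)

# Proportion using a direct subtraction of the two columns (115,187).
subtracted_share = relative_difference(tree - dependency_graph, tree)

print(f"reported difference / tree:   {reported_share:.2%}")
print(f"subtracted difference / tree: {subtracted_share:.2%}")
```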
So why are the numbers so large? Simply because he has a lot of data. Adding more data makes the absolute values of the log probability as large as you like, but the relative values should remain roughly stable.
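A toy illustration of that scaling, not the paper's method: it assumes independent coin flips and two made-up models (the probabilities 0.6 and 0.5 are hypothetical, chosen only for the demo). The absolute gap in negative log-likelihood grows roughly in proportion to the number of observations, while the gap as a share of the total stays around the same few percent.

```python
import math
import random

# All three values are hypothetical, chosen only for illustration.
TRUE_P = 0.6     # process that generates the data
MODEL_A_P = 0.6  # better-fitting model
MODEL_B_P = 0.5  # worse-fitting model


def neg_log_likelihood(heads: int, n: int, p: float) -> float:
    """Negative log-likelihood of `heads` successes in `n` flips under bias `p`."""
    return -(heads * math.log(p) + (n - heads) * math.log(1 - p))


def compare(n: int, rng: random.Random) -> tuple[float, float]:
    """Return (absolute difference, difference as a share of model B's total)."""
    heads = sum(rng.random() < TRUE_P for _ in range(n))
    cost_a = neg_log_likelihood(heads, n, MODEL_A_P)
    cost_b = neg_log_likelihood(heads, n, MODEL_B_P)
    difference = cost_b - cost_a
    return difference, difference / cost_b


rng = random.Random(0)  # fixed seed so the run is repeatable
results = {n: compare(n, rng) for n in (10_000, 100_000, 1_000_000)}

for n, (difference, share) in results.items():
    print(f"n={n:>9,}  difference={difference:>10,.1f}  share={share:.2%}")
```

In a run of this sketch the difference column should climb by roughly a factor of ten per row while the share column barely moves, which is the same pattern as a six-figure difference sitting on top of a seven-figure total.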