Without going into the details of the models, here’s my understanding of the situation: given a set of N species and a set of gene families, any gene family that appears in more than one species and in fewer than (N-1) species can contribute to the comparison of the two models. If the gene family appears only in a subclade of the full set of species, or is missing only in a subclade, then it is consistent with both models. If its presence or absence is not consistent with a single subclade, then it is improbable under the simple tree model but still probable under the dependency model. (And the dependency model is penalized for its extra degrees of freedom.)
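To make that summary concrete, here is a rough sketch of the sorting I have in mind. It is only my reading of the situation, not either of the actual models: the tree, the species names, and the function names `clades` and `classify` are all made up for illustration, and the tree is assumed to be rooted and written as nested tuples.

```python
def clades(tree):
    """Return (leaf set of `tree`, list of leaf sets for every subclade in it)."""
    if isinstance(tree, str):  # a single species is the smallest clade
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)  # the node itself defines a clade
    return leaves, found


def classify(present, tree):
    """Sort one gene family's presence pattern into the three cases above."""
    all_species, clade_list = clades(tree)
    clade_set = set(clade_list)
    present = frozenset(present)
    absent = all_species - present
    n = len(all_species)
    # Only families in more than one and fewer than N-1 species are informative.
    if not (1 < len(present) < n - 1):
        return "uninformative"
    # Present only in a subclade, or missing only in a subclade.
    if present in clade_set or absent in clade_set:
        return "consistent with both models"
    return "not consistent with a single subclade"


# Hypothetical five-species tree, for illustration only.
toy_tree = ((("A", "B"), "C"), ("D", "E"))
for pattern in [{"A", "B"}, {"C", "D", "E"}, {"A", "D"}, {"A"}]:
    print(sorted(pattern), "->", classify(pattern, toy_tree))
```

Only families landing in the last category would carry any signal separating the two models, which is why the quality of the presence/absence calls matters so much.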
Is my summary accurate? (If not, you can probably ignore the rest of my comments.)
If so, then it seems to me that this kind of comparison is critically dependent on the completeness and consistency of the dataset, since missing data in more than one species appears as a signal for one of the models. Comparative genome sequence data typically come from independent sequencing projects with different degrees of completeness and accuracy, so the issue is particularly severe for this test.
If it were my study, the first thing I would want to do is understand the completeness of the data. For the case of the closely related fish, for example, how many gene families are there in total? How many are missing from a single species? How consistent is this number from species to species? Is there a correlation between the number of singleton missing gene families and the number of shared missing gene families, when assessed across species? (These would be good numbers to post here, by the way.)
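Those tallies are easy to compute from a presence/absence table. Here is a minimal sketch, assuming the data can be read as a mapping from gene family to the set of species that have it. The species names, the toy table, and the helper names `missing_counts` and `pearson` are hypothetical, and the numbers are not from the study.

```python
from math import sqrt


def missing_counts(presence, species):
    """Per species, count (singleton missing, shared missing) gene families.

    Singleton: the family is missing from this species only.
    Shared: the family is missing from this species and at least one other.
    """
    counts = {s: [0, 0] for s in species}
    for family, present in presence.items():
        absent = [s for s in species if s not in present]
        if not absent or len(absent) == len(species):
            continue  # present everywhere, or nowhere in this species set
        column = 0 if len(absent) == 1 else 1
        for s in absent:
            counts[s][column] += 1
    return {s: tuple(c) for s, c in counts.items()}


def pearson(xs, ys):
    """Plain Pearson correlation; returns NaN if either list has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


# Toy table, for illustration only.
species = ["fish1", "fish2", "fish3", "fish4"]
presence = {
    "fam1": {"fish1", "fish2", "fish3", "fish4"},
    "fam2": {"fish1", "fish2", "fish3"},
    "fam3": {"fish1", "fish2"},
    "fam4": {"fish1", "fish3"},
    "fam5": {"fish2", "fish3", "fish4"},
}
counts = missing_counts(presence, species)
singletons = [counts[s][0] for s in species]
shared = [counts[s][1] for s in species]
print(counts)
print("correlation across species:", pearson(singletons, shared))
```

A strong positive correlation in the real data would suggest that species with less complete assemblies or annotations are driving the shared absences too.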
The second thing I would absolutely, positively do – and would insist on an author doing if I were reviewing a study like this – is look at some of the genes that are supposedly missing (in a way not consistent with common descent) and confirm that they really are missing genes and not missing (or different) annotations. What’s in the genome where they should be, based on related species that have them?
Steve Schaffner