Typically, genomes are enormous data sets, even for modern computers to deal with. The mechanisms by which they change over evolutionary timescales also do very little to preserve their size, let alone keep them globally aligned. Consequently, two genomes cannot simply be compared codon by codon (assuming they have even been sequenced completely to begin with); they must instead be compared in chunks, cut and rearranged so that large sections line up. You can think of it as comparing two versions of, say, a text document on a computer. Text editors, too, do very little to prevent a user from deleting, duplicating, or inserting passages of essentially arbitrary length between two versions. So if a comparison is to be made at all, one cannot proceed character by character; one must instead first identify sections that show overall similarity, align those, and then look at the differences.
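To make that text-document analogy concrete, here is a tiny Python sketch using the standard difflib module on two made-up strings (far shorter than any real genome, and not a real aligner): it first finds the stretches the two sequences share, and only then would one inspect what lies in between.

```python
from difflib import SequenceMatcher

# Two toy "genomes" (invented strings for illustration) that differ by an
# insertion and a substitution somewhere in the middle.
a = "ATGGCGTACGTTAGCCTAGGCATTACGA"
b = "ATGGCGTACGAAAGTTAGCCTAGGCGTTACGA"

# Instead of walking both strings character by character, first find the
# long stretches they share, then examine the gaps between those stretches.
matcher = SequenceMatcher(None, a, b)
for block in matcher.get_matching_blocks():
    if block.size:
        print(f"shared chunk of length {block.size}: "
              f"a[{block.a}:{block.a + block.size}] == b[{block.b}:{block.b + block.size}]")

# The gaps between matching blocks are the insertions, deletions, and
# substitutions one would then look at individually.
```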
Identity and difference are therefore quantifiable at first only on a per-chunk basis, with statements about the pair of genomes as a whole possible only after some statistical analysis. Some compared segments will be shorter than others, some will be more similar and others less so, and such fluctuations can be quantified by what are mathematically known as the moments of a distribution.
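As a sketch of what those per-chunk numbers and their moments might look like, here is a toy Python snippet; the identity fractions are invented placeholders, not real data.

```python
from statistics import mean, pvariance

# Hypothetical per-chunk identity fractions from a set of aligned segments:
# each number is the share of positions that agree within one compared chunk.
chunk_identities = [0.97, 0.99, 0.92, 0.98, 0.95, 0.99, 0.90, 0.96]

# The first two moments of this distribution summarise the comparison as a
# whole: how similar the genomes are on average, and how much that similarity
# fluctuates from chunk to chunk.
print("mean identity (1st moment):", mean(chunk_identities))
print("variance (2nd central moment):", pvariance(chunk_identities))
```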
Of course, having two genomes is not, by itself and without any further data, enough to assert that they share a common ancestor, or to say roughly how many generations back in each lineage that ancestor would have lived. Two genomes are insufficient to construct family trees either. My claim is not that any such first analysis would reveal either result. However, by knowing how many different loci there are at which a mutation can occur, by knowing what it takes for a mutation to spread in a population if it does occur, and by having access to many, many genomes, we can begin mapping out which changes would have occurred in which lineages and in what order. How, you ask? Well, there is this thing called a likelihood function. It is a means by which confidence can be assigned to a set of competing hypotheses: it tells you how probable the observed data would be under each set of assumptions. Generally speaking, and with few exceptions, the exact same change occurring at the exact same location in several independent lineages is (orders of magnitude) less likely than the opposite change occurring once, in the one lineage that appears to be the exception. Therefore sequence quirks common to more species place those species in a larger category, whereas quirks common to smaller subsets unite them in correspondingly smaller ones. If all organisms carrying the rarer quirk also happen to share the defining quirk of the larger set, then, without artificially assuming that they must not belong despite meeting the defining criterion, their smaller set is simply a subset of the larger.
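To put a rough number on that "orders of magnitude" claim, here is a back-of-the-envelope sketch in Python. The per-lineage probability mu and the lineage count k are made-up figures chosen purely for illustration, not estimates from any real data set.

```python
import math

# Assumed per-site, per-lineage probability that one particular change arises
# and spreads (an invented figure for illustration only).
mu = 1e-6
k = 3  # number of lineages observed to carry the exact same change

# Hypothesis A: the change happened once, in a shared ancestor of all k lineages.
log_likelihood_once = math.log(mu)

# Hypothesis B: the same change arose independently in each of the k lineages.
log_likelihood_independent = k * math.log(mu)

# The difference in log-likelihood says how many orders of magnitude separate
# the two explanations for this one shared quirk.
orders_of_magnitude = (log_likelihood_once - log_likelihood_independent) / math.log(10)
print(f"a single origin is favoured by roughly 10^{orders_of_magnitude:.0f}")
```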
This analysis naturally produces a nested hierarchy of sets within sets within sets, with ever more specific criteria for the finest of subdivisions. My claim, then, was that this branching structure not only emerges for basically any gene one might try to compare across the whole collection of available genomes, but also turns out to be almost exactly the same structure for every other gene one subjects to the same analysis, with discrepancies occurring at a rate below the confidence the likelihood estimation would have warranted to begin with. That low rate of occurrence, a “signal” that does not rise above what one might rightly call “noise”, is what I call “a statistically insignificant fraction of cases”.
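If you want to see what the nesting itself amounts to as a check, here is a minimal sketch with invented species and invented quirk-defined groups (real analyses involve thousands of both): it tests whether every pair of groups is either nested or disjoint, which is exactly the property a consistent hierarchy must have.

```python
# Toy groups: each "quirk" (a shared sequence feature) defines the set of
# species that carry it. All names and groupings here are hypothetical.
quirk_to_species = {
    "quirk_A": {"human", "chimp", "mouse", "chicken"},  # broad group
    "quirk_B": {"human", "chimp", "mouse"},             # narrower group
    "quirk_C": {"human", "chimp"},                      # narrower still
}

# Two groups are consistent with a nested hierarchy if one contains the other
# or they do not overlap at all; any partial overlap breaks the nesting.
def nested_or_disjoint(x, y):
    return x <= y or y <= x or not (x & y)

groups = list(quirk_to_species.values())
consistent = all(
    nested_or_disjoint(groups[i], groups[j])
    for i in range(len(groups))
    for j in range(i + 1, len(groups))
)
print("groups form a nested hierarchy:", consistent)  # True for this toy data
```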