Uncommon or Common Descent?

For clarity, here is my definition of a homoplasy.

A feature that shows up in multiple descendants, but not in the ancestor.

In a design scenario, similar features show up all the time across independent descendants without having appeared in their ancestors. For example, many kinds of vehicles have internal combustion engines, whereas earlier there was a greater diversity of propulsion mechanisms. This is essentially what dependency injection is, and it creates a dependency graph.

However, in a stochastic variation common descent (SCD) scenario, it seems extremely unlikely for the same feature to occur independently. We could test this computationally with genetic programming, and see how often the same function evolves from different ancestors.
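As a rough sketch of such a test, the following uses a simplified stand-in for genetic programming: bitstring genomes with random point mutations and no selection. The names `contains_feature`, `evolve_lineage`, and `independent_gain_rate`, and all parameter values, are assumptions made for illustration. It counts how often a fixed feature of length K shows up in a descendant whose ancestor lacked it.

```python
import random

def contains_feature(genome, feature):
    """Return True if `feature` appears as a contiguous run inside `genome`."""
    k = len(feature)
    return any(genome[i:i + k] == feature for i in range(len(genome) - k + 1))

def evolve_lineage(ancestor, generations, mutation_rate, rng):
    """Apply independent random bit flips for a number of generations."""
    genome = list(ancestor)
    for _ in range(generations):
        genome = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in genome]
    return genome

def independent_gain_rate(k, genome_len=64, lineages=200, generations=50,
                          mutation_rate=0.01, seed=0):
    """Count lineages that gain the feature although their ancestor lacked it.

    Returns (gains, trials). Every lineage starts from a different random
    ancestor, so each gain is an independent occurrence of the feature.
    """
    rng = random.Random(seed)
    feature = [rng.randint(0, 1) for _ in range(k)]  # hypothetical target feature F
    gains = 0
    trials = 0
    for _ in range(lineages):
        ancestor = [rng.randint(0, 1) for _ in range(genome_len)]
        if contains_feature(ancestor, feature):
            continue  # skip ancestors that already have F
        trials += 1
        descendant = evolve_lineage(ancestor, generations, mutation_rate, rng)
        if contains_feature(descendant, feature):
            gains += 1
    return gains, trials

if __name__ == "__main__":
    for k in (4, 8, 16):
        print(k, independent_gain_rate(k))
```

A real experiment would need selection, a richer genome representation, and a notion of functional rather than exact-match equivalence; this toy only illustrates how the count could be set up.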

Here is a mathematical formulation of the difference.

  1. A function F is represented by some DNA sequence of length K, or by a set of proteins of cardinality K.
  2. Independent occurrences are occurrences that come from different parents, i.e. if a set of children from the same parent all have F, this counts as only 1 occurrence.
  3. M is the number of independent occurrences of F in ancestor generation G0.
  4. N is the number of independent occurrences of F in generation G1, whose members are direct descendants of generation G0.
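The counting rule in these definitions can be sketched as follows. The data layout (a list of `(parent_id, has_F)` pairs per generation) and the example values are assumptions chosen only to illustrate the rule; they are not real data.

```python
def independent_occurrences(generation):
    """Count independent occurrences of F in one generation.

    `generation` is an iterable of (parent_id, has_f) pairs. Siblings that
    share a parent and all carry F count as a single occurrence, so the
    result is the number of distinct parents with an F-carrying child.
    """
    return len({parent for parent, has_f in generation if has_f})

# Hypothetical example: G0 has two siblings with F (one occurrence).
g0 = [("p1", True), ("p1", True), ("p2", False)]
# In G1, F appears under three different parents.
g1 = [("a", True), ("a", True), ("b", True), ("c", True), ("d", False)]

m = independent_occurrences(g0)  # M
n = independent_occurrences(g1)  # N
print("M =", m, "N =", n, "N-M =", n - m)
```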

Then, intelligent design (ID) predicts N-M > 1 and SCD predicts N-M <= 1 for sufficiently large K. The purpose of K is to account for the easy occurrence of small homoplasies by chance.

A pattern with N-M > 1 generates a dependency graph, whereas N-M <= 1 gives a tree.
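One way to make the tree versus dependency graph distinction concrete is to look at how many parents each node has. The sketch below assumes the structure is given as a list of `(parent, child)` edges and is acyclic; the node names, including the shared `engine` module, are hypothetical.

```python
from collections import Counter

def is_tree_like(edges):
    """Return True if no node has more than one parent.

    `edges` is a list of (parent, child) pairs and is assumed to be acyclic.
    A node with two or more incoming edges (for example, a descendant that
    also draws on a shared module) makes the structure a dependency graph
    rather than a tree.
    """
    in_degree = Counter(child for _, child in edges)
    return all(count <= 1 for count in in_degree.values())

# Pure descent: every node has exactly one parent.
descent = [("root", "A"), ("root", "B"), ("A", "C")]

# Same descent plus a feature module shared by two separate lineages.
with_shared_feature = descent + [("engine", "B"), ("engine", "C")]

print(is_tree_like(descent))
print(is_tree_like(with_shared_feature))
```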

So, if a dependency graph is the best fit for the data, then N-M > 1, and ID is a better explanation than SCD.

My overall takeaway from the discussion of Dr. Ewert’s paper is that no one believes a tree is a good fit for the data. Thus, everyone agrees with Dr. Ewert’s main result, that a graph with some degree of dependency structure is a much better fit. Hence, per my above reasoning, ID is the best explanation for what we see.