This is your mistake: you think the phylogenetic trees inferred from the genetic data should correlate with phylogenetic trees inferred from morphological data. But there is no functional reason that should be so, other than as an artifact of the shared genealogical history of the different sets of data. That holds even for genes that are actually involved in morphological development, though we can simply exclude such genes from the analysis in case you don't believe this.
But before we even get to that, try to think for a moment about why a phylogenetic tree inferred from one gene should match the phylogenetic tree inferred from another gene. Or why should it be the case that a tree inferred from morphological data matches a tree inferred from genetic sequences?
You first have to consider how a phylogenetic algorithm actually works. A historically much used algorithm called Maximum Parsimony basically works like this: What hypothesis of common descent explains the data we see (the gene sequences from different species) using the smallest number of character state changes? A character state change for genetic data is a difference in sequence between similar genes in different species.
So if gene sequence (A) differs from gene sequence (B) by having a G nucleotide instead of a T nucleotide in some location, then that difference is explained by a character state change, as in a mutation. If you only have two genes, one from each species, then of course you don't know whether the "original" nucleotide was T which then mutated to G, or whether it was G which then mutated to T. But if you have three species, and two of them have G and only one has T, then the simplest explanation (the most parsimonious, invoking the least amount of evolution, as in the fewest character state changes) is that the original was G, and then just one of them mutated to T. As opposed to having the original be a T, and then having two independent T->G mutations.
So the hypothesis is that they are different because one of them mutated. So the most parsimonious tree is the one that accounts for the different sequences with the fewest total number of mutations. Hence, Maximum Parsimony.
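To make the counting concrete, here is a minimal sketch of parsimony scoring using Fitch's small-parsimony method, which is one standard way to count the minimum number of changes a given tree requires. The species names, the four-letter sequences and both candidate trees are made up purely for illustration; they are not real data. Real software also has to search over a huge number of possible trees, which this sketch does not attempt.

```python
def fitch_changes(tree, site):
    """Minimum number of character state changes for one alignment column.

    tree: rooted binary tree written as nested 2-tuples of species names.
    site: dict mapping species name -> nucleotide observed at this column.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):      # leaf: the state is simply what we observe
            return {site[node]}
        left, right = node
        a, b = visit(left), visit(right)
        if a & b:                      # the two children can agree: no change needed
            return a & b
        changes += 1                   # they cannot agree: one change is required
        return a | b

    visit(tree)
    return changes


def parsimony_score(tree, alignment):
    """Total number of changes over all columns of an alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_changes(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(length)
    )


# The three-species case from the text: two species have G, one has T.
three_species = fitch_changes(
    (("sp1", "sp2"), "sp3"), {"sp1": "G", "sp2": "G", "sp3": "T"}
)
print(three_species)  # 1 (a single mutation is enough)

# Hypothetical toy alignment and two candidate trees (illustration only).
TOY_ALIGNMENT = {"human": "GGAC", "chimp": "GGAT", "mouse": "TAAC", "rat": "TACC"}
TREE_1 = (("human", "chimp"), ("mouse", "rat"))
TREE_2 = (("human", "mouse"), ("chimp", "rat"))

print(parsimony_score(TREE_1, TOY_ALIGNMENT))  # 4
print(parsimony_score(TREE_2, TOY_ALIGNMENT))  # 6
```

With these toy sequences the first tree needs four mutations and the second needs six, so parsimony prefers the first.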
There are many other phylogenetic algorithms in use today. Another type of algorithm is called maximum likelihood. This algorithm uses something called a substitution model to evaluate how "likely" a particular tree is compared to another. A substitution model is basically a hypothesis that assumes that some mutations are more likely than others. So this maximum likelihood algorithm compares the likelihood of different trees and scores them according to which is the most likely combination of mutations that explains the data we have. If one tree implies that a lot of lower-probability mutations must have happened, and another tree explains the same data with higher-probability mutations, the most likely tree is chosen. Hence, maximum likelihood.
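As a rough sketch of the idea, the code below scores two candidate trees under a very simplified substitution model. Everything numeric here is an assumption made for illustration: each branch gets the same fixed probabilities (0.90 for no change, 0.06 for a transition such as A<->G, 0.02 for each transversion), and the sequences and trees are invented. Real programs estimate branch lengths and model parameters from the data instead of fixing them like this. The likelihood of each column is computed with Felsenstein's pruning method, summing over every possible ancestral nucleotide.

```python
import math

BASES = "ACGT"
PURINES = {"A", "G"}


def p_change(parent, child, p_same=0.90, p_transition=0.06, p_transversion=0.02):
    """Assumed per-branch probability of seeing `child` given `parent`."""
    if parent == child:
        return p_same
    if (parent in PURINES) == (child in PURINES):
        return p_transition            # A<->G or C<->T
    return p_transversion              # purine <-> pyrimidine


def conditional(node, site):
    """For each base, probability of the observed leaves below `node`
    given that `node` itself carries that base."""
    if isinstance(node, str):          # leaf: certain about the observed base
        return {b: 1.0 if b == site[node] else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child in node:
        below = conditional(child, site)
        for b in BASES:
            result[b] *= sum(p_change(b, c) * below[c] for c in BASES)
    return result


def log_likelihood(tree, alignment):
    """Log-likelihood of the alignment on `tree`, columns treated as independent."""
    length = len(next(iter(alignment.values())))
    total = 0.0
    for i in range(length):
        site = {name: seq[i] for name, seq in alignment.items()}
        at_root = conditional(tree, site)
        # Uniform prior (1/4) over the nucleotide at the root.
        total += math.log(sum(0.25 * at_root[b] for b in BASES))
    return total


# Hypothetical toy alignment and candidate trees (illustration only).
TOY_ALIGNMENT = {"human": "GGAC", "chimp": "GGAT", "mouse": "TAAC", "rat": "TACC"}
TREE_1 = (("human", "chimp"), ("mouse", "rat"))
TREE_2 = (("human", "mouse"), ("chimp", "rat"))

for name, tree in [("tree 1", TREE_1), ("tree 2", TREE_2)]:
    print(name, round(log_likelihood(tree, TOY_ALIGNMENT), 3))
```

The tree with the higher (less negative) log-likelihood is the one the method prefers. With these toy numbers that is the first tree, because the second tree forces extra low-probability changes to explain the same columns.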
With this understanding in hand, we can proceed to consider the fact that for any given gene sequence, there are an incredible number of different ways to change the gene and achieve the same result/function. Countless gene sequences from different species, despite the fact that they are different in sequence, are known to be functionally equivalent, and can often even be exchanged with little to no functional consequence. Just to pick one example of this, for a long time cow insulin was used as a substitute for human insulin for human diabetics. And it worked, and the people who used it didn't turn into cows. The cow insulin protein, despite being different from the human insulin protein, still worked in humans without turning them into cows. So the function was preserved, and it was independent of morphology.
Now the question is, why should this pattern repeat itself for different genes, and morphology too, if evolution didn't actually take place? If they don't actually share common descent? What functional reason would there be for having similar tree patterns repeat in other genes shared between the same species too, or in morphological data?
Even if you doubt the fact that there are many different ways to achieve the same functional result, try to consider that we can pick genetic sequences which are known to be completely independent of morphology. To pick an example, we can use the enzyme in saliva called salivary amylase.
Its function is to degrade starch in food you eat.
This enzyme is not involved in making your morphology how it is (it doesn't cause you to be a member of Homo sapiens), yet it is found in countless animals.
The gene sequence of this enzyme does not cause you to have a spine, nor to have four limbs, nor to have a bony skeleton, nor mammary glands, nor to have hair instead of feathers, nor to have five digits on each limb, nor the patterns of the arrangements of bones in your four limbs, nor to have bilateral symmetry, nor to embryologically develop ass-first (be a deuterostome) or to be multicellular, or for your cells to be eukaryotes.
It does not cause you to be a hominid, or a great ape, or a primate, or whatever other level of classification you can think of. All it does is degrade starch you eat (break up long chains of glucose units into smaller sugars). So if you eat a potato, or pasta, or an apple or what have you, the enzyme simply starts breaking the starch down into sugars so you can digest it.
You have this enzyme, chimps have this enzyme, gorillas have this enzyme, pretty much all mammals have this enzyme afaik. But the horse version, or the pig version, or the cow or dog or fish version, works just as well as the human version. They each are just as capable of breaking down starch.
So this gene sequence is completely independent from morphology. It is not involved in anything that we could use to classify an organism as belonging to a particular clade using their morphology. Yet when we use a phylogenetic algorithm to infer a tree from the amylase gene sequences of many different species, we get one that overwhelmingly agrees with the morphological tree.
Even more amazingly, we can pick other enzymes shared among countless species which are also independent of morphology, and independent of each other. Take core metabolic enzymes in a pathway responsible for something like RNA or DNA nucleotide biosynthesis. They are also completely independent from morphology, and not related to breaking down starch obviously. Or we can pick genes for enzymes responsible for replicating strands of DNA (DNA polymerases), or for another digestive enzyme responsible for breaking down proteins in food into amino acids. These are genes that are even shared among plants, animals, fungi, indeed all eukaryotes, or all known cellular life.
And they're independent of each other. They are not somehow mutually constraining each other's gene sequences. Why should it matter with respect to the function of the gene, what kind of phylogenetic tree researchers are able to infer using some particular algorithm of inference? Obviously such a constraint is not what is causing gene sequences to be the particular way they are.
And why should the gene sequences from different genes influence each other in a way that constrains the topology of trees that human researchers infer from them? That is clearly not a constraint that actually operates on genes. The gene sequence of your salivary amylase gene has no bearing on the gene sequence of your DNA polymerase gene, and insofar as some sort of epistatic interaction might actually exist between them, there is no reason to expect this interaction to be of such a nature that it just so happens to constrain them to yield similar trees if subjected to a phylogenetic inference. That would simply not make sense.
So for these gene sequences too, we can submit them to the same algorithm (basically whichever one you choose), and still get pretty much the same tree. Why would that be the case? Remember, they're all known to be completely independent of morphology, and independent of each other, certainly independent in the sense that their sequences are not constrained by what kind of tree a human systematist can infer from them.
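If you want to put a number on "pretty much the same tree", one common approach is to list the clades (groupings of species) each tree implies and count how many appear in one tree but not the other. This is the rooted form of what is usually called the Robinson-Foulds distance. Below is a minimal sketch. The three trees are hypothetical examples written by hand to show the mechanics; they are not trees actually inferred from amylase or polymerase sequences.

```python
def clades(tree):
    """Set of clades in a nested-tuple tree, each clade a frozenset of species."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)             # every internal node defines one clade
        return members

    leaves(tree)
    return found


def clade_distance(tree_a, tree_b):
    """Number of clades found in exactly one of the two trees (0 = same topology)."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical trees, for illustration only.
gene_tree_1 = ((("human", "chimp"), "gorilla"), ("cow", "pig"))
gene_tree_2 = (("gorilla", ("chimp", "human")), ("pig", "cow"))   # same groupings, written in another order
scrambled = ((("human", "cow"), "gorilla"), ("chimp", "pig"))

print(clade_distance(gene_tree_1, gene_tree_2))  # 0
print(clade_distance(gene_tree_1, scrambled))    # 6
```

A distance of zero means the two trees group the species identically, no matter how the branches happen to be written down. The scrambled tree shares only the trivial all-species grouping with the first one.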
The only sensible constraints that would operate on them are those that preserve function. This is known because we can splice the human (or cow, or pig, or goat, or mouse, or bacterial) variants into fish, and they work just fine and don't turn the fish into humans or anything else (they remain unaltered), or into fungi, and they remain fungi, or into bacteria, and they don't suddenly grow a spine and four limbs, or anything like that. Yet we still get a similar tree out when we do the inference from independent sets of data.
The most obvious explanation for the fact that the genes exhibit the same general patterns if subjected to a phylogenetic inference is that they really do share the same genealogical relationship. The pattern repeats in different sets of data because the different sets of data came to be the way they are through their common evolutionary history. What constrains them to yield similar patterns is the shared constraint of their common descent.
No other explanation makes logical sense. To say that a designer designed it that way, while that is not logically impossible, is equivalent to saying that the designer has created starlight coming to us from distant galaxies "with the appearance of age", or put fossils in the ground "to test our faith". You can believe that if you want, but it has rendered your designer deliberately deceptive. The designer would have had to create the pattern expected from common descent, but not expected for any functional reason, simply because... whatever reason you can invent in your head.