I wrote this several years ago, but I think it’s a useful introduction to molecular data, even though the method is not one anybody actually uses in phylogenetics.
Here is a set of DNA sequences. They come from two mitochondrial genes, ND4 and ND5. If you put them together, they total 694 nucleotides. But most of those nucleotides either are identical among all the species here, or they differ in only one species. Those are uninformative about relationships, so I have removed them, leaving 76 nucleotides that make some claim. I’ll let you look at them for a while.
[                   10         20         30         40         50]
[                    .          .          .          .          .]
             + 1 2++   3  11 +4 3   ++  52+1     2615+4 14+ 3 3+6+
gibbon      ACCGCCCCCA TCCCCTCCCT CAAGTCCTAT CCAATCTACT GTACTTTGCC
orangutan   ACCACTCCCA CCCTTCCTCC TAAGACTCAC ACAACTCGCC ACACCTCGTC
human       GTCATCATCC TTCTTTTTTT AGGAATTTCC TCTCTCCGTC ACGCTCTACT
chimpanzee  ATTACCATTC CTTTTTTCCC CGGATTCTCC CTTCTTCATT ATGTCTCATT
gorilla     GTTGTTATTA CCTCCCTTTC AAGAACCCCT TTCACCTATC GCGTCCCACT
[                   60         70       ]
[                    .          .       ]
              +++ +++1 + +?   2 + +++
gibbon      CCTACAGCCC AGCCAAACGA CACTAA
orangutan   CCTACCGCCT AGCCATTTCA CACTAA
human       CCCCTTATTT TCTTGTCCGG TGACCG
chimpanzee  TTCCTCATTT TCTTACTCAG TGACCG
gorilla     TTCCTTATTC TTTCGCCTAG TGATTA
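The filtering step described before the alignment (dropping sites that are identical everywhere or that differ in only one species) can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to prepare the data. It assumes the usual parsimony definition of an informative site, namely that at least two different bases each occur in at least two species. The function name and the toy columns are made up for the example.

```python
from collections import Counter

def is_informative(column):
    """Return True if at least two different bases each occur in two or more species."""
    counts = Counter(column)
    shared_bases = [base for base, n in counts.items() if n >= 2]
    return len(shared_bases) >= 2

# Hypothetical toy columns, one base per species in the order
# gibbon, orangutan, human, chimpanzee, gorilla.
toy_columns = [
    "AAAAA",  # identical in all species: uninformative
    "AAAAG",  # differs in only one species: uninformative
    "AAGGG",  # two species versus three: informative
    "CTACA",  # two pairs plus one odd species out: informative
]

kept = [col for col in toy_columns if is_informative(col)]
print(kept)
```

Only the last two toy columns survive, which is the sense in which the 76 remaining sites "make some claim" about relationships.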
I’ve marked with a plus sign all those sites at which gibbon and orangutan match each other, and the three African apes (including humans) have a different base but match each other. These sites all support a relationship among the African apes, exclusive of gibbon and orangutan. You will note there are quite a lot of them, 24 to be exact. The sites I have marked with numbers from 1 to 6 contradict this relationship. (Sites without marks don’t have anything to say about this particular question.) We expect a certain amount of this because sometimes the same mutation will happen twice in different lineages; we call that homoplasy. However, you will note that there are fewer of these sites, only 22 of them, and more importantly, they contradict each other. Each number stands for a different hypothesis of relationships; for example, number one is for sites that support a relationship between gibbons and gorillas, and number two is for sites that support a relationship between orangutans and gorillas (all exclusive of the rest). One and two can’t both be true at the same time. So we have to consider each competing hypothesis separately. If you do that, it comes out this way:
hypothesis                 sites supporting
African apes (+)           24
gibbon+gorilla (1)          6
orangutan+gorilla (2)       4
gibbon+human (3)            4
gibbon+chimp (4)            3
orangutan+human (5)         2
orangutan+chimp (6)         2
I think we can see that the African ape hypothesis is way out front, and the others can be attributed to random homoplasy. This result would be very difficult to explain by chance.
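The marking rule described above can be written down as a small classifier. What follows is a minimal sketch, not the tool used to produce the marks. It takes only the first ten columns of the alignment (copied from the sequences above), treats a site as marked only when exactly two species share one base and the other three share another, and uses the same symbols as the marker line. The names `MARKS`, `mark_site` and `mark_alignment` are invented for this example.

```python
# First ten columns of the alignment, copied from the sequences above.
ALIGNMENT = {
    "gibbon":     "ACCGCCCCCA",
    "orangutan":  "ACCACTCCCA",
    "human":      "GTCATCATCC",
    "chimpanzee": "ATTACCATTC",
    "gorilla":    "GTTGTTATTA",
}

# Symbol for each pair of species that shares the minority base.
MARKS = {
    frozenset({"gibbon", "orangutan"}): "+",  # the three African apes group together
    frozenset({"gibbon", "gorilla"}): "1",
    frozenset({"orangutan", "gorilla"}): "2",
    frozenset({"gibbon", "human"}): "3",
    frozenset({"gibbon", "chimpanzee"}): "4",
    frozenset({"orangutan", "human"}): "5",
    frozenset({"orangutan", "chimpanzee"}): "6",
}

def mark_site(bases):
    """Return the marker for one column, or a space if the site is silent here."""
    groups = {}
    for species, base in bases.items():
        groups.setdefault(base, set()).add(species)
    # Only clean two-versus-three splits are marked.
    if len(groups) != 2:
        return " "
    pair = min(groups.values(), key=len)
    if len(pair) != 2:
        return " "
    # Pairs within the African apes are not in MARKS and stay unmarked.
    return MARKS.get(frozenset(pair), " ")

def mark_alignment(alignment):
    """Build the marker line for a whole alignment, one character per column."""
    length = len(next(iter(alignment.values())))
    return "".join(
        mark_site({species: seq[i] for species, seq in alignment.items()})
        for i in range(length)
    )

marks = mark_alignment(ALIGNMENT)
print(repr(marks))
```

For these ten columns the sketch gives a plus at positions 2, 7 and 8, a 1 at position 4 and a 2 at position 6, which is what the first block of the marker line shows.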
Let’s try a statistical test just to be sure. Let’s suppose, as our null hypothesis, that the sequences are randomized with respect to phylogeny (perhaps because there is no phylogeny) and that apparent support for African apes is merely a chance fluctuation. And let’s try a chi-square test. Here it is:
These are all the hypotheses of relationship that bear on this question, and the observed number of sites supporting each. Under the null hypothesis the expected values would all be equal: the sum divided by 7, or 45/7 ≈ 6.43. There are 6 degrees of freedom, and the chi-square statistic is 57.8. P, or the probability of this amount of asymmetry in the distribution arising by chance, is very low. When I tried it in Excel, I got P=1.25*10^-10, or 0.000000000125. Might as well call that zero, I think.
hypothesis                 obs.    exp.
African apes (+)           24      6.43
gibbon+gorilla (1)          6      6.43
orangutan+gorilla (2)       4      6.43
gibbon+human (3)            4      6.43
gibbon+chimp (4)            3      6.43
orangutan+human (5)         2      6.43
orangutan+chimp (6)         2      6.43
sum                        45      45
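If you want to check the arithmetic without Excel, here is a rough sketch in plain Python. The observed counts are taken straight from the table. The P value uses the closed-form upper tail of the chi-square distribution, which holds only when the degrees of freedom are even (as they are here, with 6). The function name is my own, and a statistics package would do the same job.

```python
import math

# Observed counts from the table above.
observed = {
    "African apes (+)": 24,
    "gibbon+gorilla (1)": 6,
    "orangutan+gorilla (2)": 4,
    "gibbon+human (3)": 4,
    "gibbon+chimp (4)": 3,
    "orangutan+human (5)": 2,
    "orangutan+chimp (6)": 2,
}

total = sum(observed.values())       # 45 sites in all
expected = total / len(observed)     # equal expectation under the null: 45/7
df = len(observed) - 1               # 6 degrees of freedom

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())

def chi_square_upper_tail(x, df):
    """P(X >= x) for a chi-square variable; this closed form needs an even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only works for positive, even df")
    half = x / 2.0
    term = 1.0
    series = 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)^k / k!
        series += term
    return math.exp(-half) * series

p_value = chi_square_upper_tail(chi_square, df)
print(f"chi-square = {chi_square:.1f}, df = {df}, P = {p_value:.3g}")
```

Run as written, this gives a statistic of about 57.8 and a P value on the order of 10^-10, in line with the Excel figure quoted above.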
The difference is significant. Now the question is how you account for it. I account for it by supposing that the null hypothesis is just plain wrong, and that there is a phylogeny, and that the phylogeny involves the African apes, including Homo, being related by a common ancestor more recent than their common ancestor with orangutans or gibbons. How about you?
By itself, this is pretty good evidence for the African ape connection. But if I did this little exercise with any other gene I would get the same result too. (If you don’t believe me I would be glad to do that.) Why? I say it’s because all the genes evolved on the same tree, the true tree of evolutionary relationships. That’s the multiple nested hierarchy for you.
So what’s your alternative explanation for all this? You say…what? It’s because of a necessary similarity between similar organisms? But out of these 76 sites with informative differences, only 18 involve differences that change the amino acid composition of the protein; the rest can have no effect on phenotype. Further, many of those amino acid changes are to similar amino acids that have no real effect on protein function. In fact, ND4 and ND5 do exactly the same thing in all organisms. These nested similarities have nothing to do with function, so similar design is not a credible explanation.
God did it that way because he felt like it? Fine, but this explains any possible result. It’s not science. We have to ask why God just happened to feel like doing it in a way that matches the unique expectations of common descent.