Some molecular evidence for human evolution

John_Harshman · October 9, 2019, 5:20pm

I wrote this several years ago, but I think it’s a useful introduction to molecular data, even though the method is not one anybody actually uses in phylogenetics.

Here is a set of DNA sequences. They come from two mitochondrial genes, ND4 and ND5. If you put them together, they total 694 nucleotides. But most of those nucleotides either are identical among all the species here, or they differ in only one species. Those are uninformative about relationships, so I have removed them, leaving 76 nucleotides that make some claim. I’ll let you look at them for a while.

[                        10         20         30         40         50]
[                        .          .          .          .          .]
                 + 1 2++   3  11 +4 3   ++  52+1     2615+4 14+ 3 3+6+
gibbon          ACCGCCCCCA TCCCCTCCCT CAAGTCCTAT CCAATCTACT GTACTTTGCC
orangutan       ACCACTCCCA CCCTTCCTCC TAAGACTCAC ACAACTCGCC ACACCTCGTC
human           GTCATCATCC TTCTTTTTTT AGGAATTTCC TCTCTCCGTC ACGCTCTACT
chimpanzee      ATTACCATTC CTTTTTTCCC CGGATTCTCC CTTCTTCATT ATGTCTCATT
gorilla         GTTGTTATTA CCTCCCTTTC AAGAACCCCT TTCACCTATC GCGTCCCACT
[                        60         70     ]
[                        .          .      ]
                  +++ +++1 + +?   2 + +++
gibbon          CCTACAGCCC AGCCAAACGA CACTAA
orangutan       CCTACCGCCT AGCCATTTCA CACTAA
human           CCCCTTATTT TCTTGTCCGG TGACCG
chimpanzee      TTCCTCATTT TCTTACTCAG TGACCG
gorilla         TTCCTTATTC TTTCGCCTAG TGATTA

I’ve marked with a plus sign all those sites at which gibbon and orangutan match each other, and the three African apes (including humans) have a different base but match each other. These sites all support a relationship among the African apes, exclusive of gibbon and orangutan. You will note there are quite a lot of them, 24 to be exact. The sites I have marked with numbers from 1-6 contradict this relationship. (Sites without numbers don’t have anything to say about this particular question.) We expect a certain amount of this because sometimes the same mutation will happen twice in different lineages; we call that homoplasy. However, you will note that there are fewer of these sites, only 22 of them, and more importantly they contradict each other. Each number stands for a different hypothesis of relationships; for example, number one is for sites that support a relationship between gibbons and gorillas, and number two is for sites that support a relationship between orangutans and gorillas (all exclusive of the rest). One and two can’t be true at the same time. So we have to consider each competing hypothesis separately. If you do that it comes out this way:

hypothesis            sites supporting
African apes (+)      24
gibbon+gorilla (1)     6
orangutan+gorilla (2)  4
gibbon+human (3)       4
gibbon+chimp (4)       3
orangutan+human (5)    2
orangutan+chimp (6)    2
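
For anyone who wants to see the marking rule spelled out, here is a minimal Python sketch of it. The names `column` and `classify_site` are made up for illustration and are not from any phylogenetics package; the rule assumed is the one described above, that a site counts only if it has exactly two bases and splits the five species cleanly into one of the seven groupings.

```python
from typing import Dict, Optional

# Species order used for the five-letter column strings below.
SPECIES = ("gibbon", "orangutan", "human", "chimpanzee", "gorilla")
OUTGROUP = ("gibbon", "orangutan")
AFRICAN = ("human", "chimpanzee", "gorilla")


def column(bases: str) -> Dict[str, str]:
    """Turn a five-letter string (one base per species) into a lookup."""
    return dict(zip(SPECIES, bases))


def classify_site(site: Dict[str, str]) -> Optional[str]:
    """Return the grouping a site supports, or None if it says nothing here."""
    # Only two-state sites split the species into exactly two groups.
    if len(set(site.values())) != 2:
        return None
    # "+" sites: gibbon and orangutan match, the three African apes match.
    if site["gibbon"] == site["orangutan"] and len({site[s] for s in AFRICAN}) == 1:
        return "African apes"
    # Numbered sites: one of gibbon/orangutan pairs with exactly one African ape.
    for out in OUTGROUP:
        for ape in AFRICAN:
            rest = [s for s in SPECIES if s not in (out, ape)]
            if site[out] == site[ape] and len({site[s] for s in rest}) == 1:
                return f"{out}+{ape}"
    # Anything else (e.g. human+chimp only) does not bear on this question.
    return None


# Columns 2, 4, 6 and 10 of the alignment, read top to bottom.
print(classify_site(column("CCTTT")))  # African apes
print(classify_site(column("GAAAG")))  # gibbon+gorilla
print(classify_site(column("CTCCT")))  # orangutan+gorilla
print(classify_site(column("AACCA")))  # None
```

The last column groups human with chimpanzee only, which is a question about relationships inside the African apes, so it goes unmarked.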

I think we can see that the African ape hypothesis is way out front, and the others can be attributed to random homoplasy. This result would be very difficult to explain by chance.

Let’s try a statistical test just to be sure. Let’s suppose, as our null hypothesis, that the sequences are randomized with respect to phylogeny (perhaps because there is no phylogeny) and that apparent support for African apes is merely a chance fluctuation. And let’s try a chi-square test. Here it is:

These are all the hypotheses of relationship that bear on this question, and the observed number of sites supporting each. Under the null hypothesis the expected values would all be equal: the sum divided by 7, or 45/7 ≈ 6.43. There are 6 degrees of freedom, and the chi-square statistic (the sum of (obs − exp)²/exp over the seven rows) is 57.8. P, or the probability of a distribution at least this lopsided arising by chance, is very low. When I tried it in Excel, I got P=1.25*10^-10, or 0.000000000125. Might as well call that zero, I think.

hypothesis            obs.   exp.
African apes (+)      24     6.43
gibbon+gorilla (1)     6     6.43
orangutan+gorilla (2)  4     6.43
gibbon+human (3)       4     6.43
gibbon+chimp (4)       3     6.43
orangutan+human (5)    2     6.43
orangutan+chimp (6)    2     6.43
sum                    45    45
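
If you would rather not trust Excel, the same test can be sketched in a few lines of Python using only the standard library. The function names here are hypothetical. The P value uses the closed-form upper tail of the chi-square distribution, which holds only for an even number of degrees of freedom (6 in this case); a statistics package would be the usual choice for the general case.

```python
import math

# Observed counts, copied from the table above.
observed = {
    "African apes (+)": 24,
    "gibbon+gorilla (1)": 6,
    "orangutan+gorilla (2)": 4,
    "gibbon+human (3)": 4,
    "gibbon+chimp (4)": 3,
    "orangutan+human (5)": 2,
    "orangutan+chimp (6)": 2,
}


def chi_square_uniform(counts):
    """Chi-square statistic and df against equal expected counts."""
    expected = sum(counts) / len(counts)
    stat = sum((obs - expected) ** 2 / expected for obs in counts)
    return stat, len(counts) - 1


def chi_square_upper_tail_even_df(stat, df):
    """P(X >= stat) for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = stat / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


stat, df = chi_square_uniform(list(observed.values()))
p = chi_square_upper_tail_even_df(stat, df)
print(f"chi-square = {stat:.1f}, df = {df}, P = {p:.3g}")
```

Run on the table above, this gives a statistic of about 57.8 on 6 degrees of freedom and a P on the order of 10^-10, in line with the Excel figure.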

The difference is significant. Now the question is how you account for it. I account for it by supposing that the null hypothesis is just plain wrong, and that there is a phylogeny, and that the phylogeny involves the African apes, including Homo, being related by a common ancestor more recent than their common ancestor with orangutans or gibbons. How about you?

By itself, this is pretty good evidence for the African ape connection. But if I did this little exercise with any other gene I would get the same result too. (If you don’t believe me I would be glad to do that.) Why? I say it’s because all the genes evolved on the same tree, the true tree of evolutionary relationships. That’s the multiple nested hierarchy for you.

So what’s your alternative explanation for all this? You say…what? It’s because of a necessary similarity between similar organisms? But out of these 76 sites with informative differences, only 18 involve differences that change the amino acid composition of the protein; the rest can have no effect on phenotype. Further, many of those amino acid changes are to similar amino acids that have no real effect on protein function. In fact, ND4 and ND5 do exactly the same thing in all organisms. These nested similarities have nothing to do with function, so similar design is not a credible explanation.

God did it that way because he felt like it? Fine, but this explains any possible result. It’s not science. We have to ask why god just happened to feel like doing it in a way that matches the unique expectations of common descent.

T_aquaticus · October 9, 2019, 5:33pm

[                   10         20         30         40         50]
[                   .          .          .          .          .]
            + 1 2++   3  11 +4 3   ++  52+1     2615+4 14+ 3 3+6+
gibbon     ACCGCCCCCA TCCCCTCCCT CAAGTCCTAT CCAATCTACT GTACTTTGCC
orangutan  ACCACTCCCA CCCTTCCTCC TAAGACTCAC ACAACTCGCC ACACCTCGTC
human      GTCATCATCC TTCTTTTTTT AGGAATTTCC TCTCTCCGTC ACGCTCTACT
chimpanzee ATTACCATTC CTTTTTTCCC CGGATTCTCC CTTCTTCATT ATGTCTCATT
gorilla    GTTGTTATTA CCTCCCTTTC AAGAACCCCT TTCACCTATC GCGTCCCACT
[                   60         70     ]
[                   .          .      ]
             +++ +++1 + +?   2 + +++
gibbon     CCTACAGCCC AGCCAAACGA CACTAA
orangutan  CCTACCGCCT AGCCATTTCA CACTAA
human      CCCCTTATTT TCTTGTCCGG TGACCG
chimpanzee TTCCTCATTT TCTTACTCAG TGACCG
gorilla    TTCCTTATTC TTTCGCCTAG TGATTA

I aligned the sequences manually, but I’m not sure what to do with the headers and footers. You can use “code” in brackets for sections with spaces and tabs.

colewd · October 9, 2019, 5:38pm

Design is the right negative control and the starting point could be design or common descent. You have successfully isolated neutral or nearly neutral mutations are getting fixed in the population.

John_Harshman · October 9, 2019, 5:42pm

Bill, I do not know what you said there.

John_Harshman · October 9, 2019, 5:46pm

How did you get that into an equally spaced font? How can I reproduce that in the post?

colewd · October 9, 2019, 5:49pm

Let me think about it and re phrase.

colewd · October 9, 2019, 6:03pm

Let me try this. How would you argue against design plus neutral mutations over time?

Rumraket · October 9, 2019, 6:21pm

What, exactly, is designed, and what is due to neutral mutation? Be specific.

We are asking: Why does the data look like this?
One explanation proposed is that the data evolved through common ancestors. That’s how these sequences came to be this way, through splitting lines of descent and accumulating changes along the way.

You come and say, it’s due to “design plus neutral mutations”. Okay, but why does that yield those five particular sequences? Try to explain how those five particular sequences come to exist by “design plus neutral mutations”.

What does the designer start with designing, and then what happens to it by neutral mutations? Please explain the process for each sequence. The designer designs what sequence for each species, and then neutral mutations does what to it?

colewd · October 9, 2019, 6:24pm

It’s a negative control for common descent. The detail is not important.

Rumraket · October 9, 2019, 6:43pm

Bill, a negative control is, essentially, a measurement taken in the absence of sample material (how much signal you get just from background). So you do a negative control to see how much signal you should expect to measure without your sample.

If you want to measure how much zinc there is in oatmeal (for example), you measure a sample without any oatmeal in it but treat it in the same way you would a real sample with oatmeal. That way you get a measurement of the background level of noise, which you can then subtract from the measurements you get from real samples.

What you said then makes no coherent logical sense in this context. You seem to be just blurting out some phrase you think sounds clever.

Of course the detail is important. The details are the whole point here. What we are trying to explain is the detail. Why are the details the way they are? That’s the whole point of the exercise here. You can’t just declare them unimportant.

Common descent explains why the sequences are the way they are. They incrementally changed through multiple splitting events from what was originally a single sequence. That’s how they came to be different, and why they are different in the exact ways that they are.

So now you come and say “it’s due to design and neutral mutation”. I ask why that would produce those sequences (the very thing common descent explains), and now you say it’s not important. How is that not an admission that you don’t know how to explain it otherwise?

You seem to have just tossed out some phrase you have internalized, like “it’s a negative control”. What does that mean? In what sense?

colewd · October 9, 2019, 7:01pm

The detail is great if we have it. In science we don’t often have it. We have a mechanistic explanation. An alternative explanation is a negative control. John made the claim that Design alone does not explain the data and I agree with him. Now his job is to eliminate Design plus neutral mutations.

Rumraket · October 9, 2019, 7:08pm

In this case we do. The DNA sequences, they really do exist. So your excuse here doesn’t work.

That’s what we are trying to do by getting you to first DESCRIBE the “design plus neutral mutations” explanation (giving your hypothesis a title is not itself an explanation). So, it’s your job to explain the details that we really do have, using “design plus neutral mutations” (that would be your title).

What we can then do is compare it to the common descent explanation (which would actually be a tree, with changes, aka mutations, along the branches), and proceed to eliminate the less parsimonious or less likely explanation. That’s the whole point here, Bill.

So, in order for us to do the elimination stuff, we need your explanation. We need to see the actual contents of the “design plus neutral mutations” explanation.

T_aquaticus · October 9, 2019, 7:08pm

I have heard of null hypotheses being alternative explanations, but I have never heard a negative control called that. Negative controls are part of an experimental methodology, not an explanation.

You are assuming that Design is falsifiable. It isn’t. As we have seen throughout these discussions, Design can explain anything, and in doing so it explains nothing.

colewd · October 9, 2019, 7:13pm

Explain why this request is not arbitrary.

colewd · October 9, 2019, 7:19pm

They are testing against an alternative cause.

It is an explanation for the universe, yes, but science’s job is to find interim explanations. The interim explanation falsifies design as the direct explanation of what we are observing.

Rumraket · October 9, 2019, 7:19pm

I think we’re done here. Thanks for the concession Bill.

colewd · October 9, 2019, 7:21pm

Your arguments are ending up being supported by your assertions of what science is or is not. What substance is there to this type of reasoning?

John_Harshman · October 9, 2019, 7:38pm

The post isn’t about design, which is a sufficiently ambiguous concept that it can encompass any data. It’s about common descent vs. separate creation. The data support common descent and argue against separate creation. That’s all.

I made no such claim. In context, the claim is clearly about separate creation.

colewd · October 9, 2019, 7:43pm

What about separate creation plus neutral mutations?

T_aquaticus · October 9, 2019, 7:43pm

Please tell us what that test is and what the cause is.

As far as I can tell (@John_Harshman please correct me if I am wrong), the test is a statistical test expressed as the probability of a random distribution of mutations producing the observed tree structure in the data. From the original post:

This is standard science, going clear back to Fisher determining the probability of a woman correctly guessing if the milk was added before the tea.

What in the world is an “interim explanation”?
