Human and Chimp Similarity (Mind the Controls)

Responding to this article quoted to me by @J.E.S :

I once wrote this elsewhere: Human and Chimp Similarity - Faith & Science Conversation - The BioLogos Forum

What about Tomkins’s work?

Very familiar with his work. Tomkins seems like a nice guy, but his study is deeply flawed. The data is freely available and I reproduced his work for fun. I found he made some large errors by excluding some basic controls. Including the controls, we find the human/chimp read similarity is about 98%. For comparison, the similarity of mice/rats is about 80%.

Mice and rats are the same “kind.” Seems that humans and chimps are too.

I could show you his error if you are interested.

I would be interested in seeing the error…

How exactly do we figure out the percentage similarity between organisms’ DNA???

There are many ways to do it. Some are better than others. The exact number isn’t important. The key thing is to lay the number alongside controls. It is the relationship between the computed similarity and the controls that allows us to interpret it.

Tomkins’s proposal is reasonable. He suggests measuring similarity between chimp “reads” (short fragments of raw data) and the human assembled genome, to eliminate bias introduced by using the human genome to scaffold the chimp genome. That is a good idea, clever really. Let’s use it.

So Tomkins takes chimp reads, and computes the average similarity between these reads and the reference genome. He computes about 85% (chimp read -> human genome). I agree that is what he got, and I get the same number as him. But how much is because of error in the reads? There are no controls, so we do not really know.

85% = (chimp read -> human genome)

How do we solve this? We add controls. Let’s try a few. We could add some more data to our analysis. Let’s say we look at the chimp genome too. Here is approximately what we get…

87% = (chimp read -> chimp genome)

Hmm. That isn’t right. No chimp is that different from their genome. We expect something closer to 100%. We can try another control. How about adding human reads, and seeing what we see there.

89% = (human read -> human genome)
87% = (human read -> chimp genome)

Hmm, so we see the same problem here. All humans are less than 0.5% different, so something clearly wrong is happening. What is going on?

It turns out this identifies a big problem, one that would be obvious to those who sequence genomes. There is a lot of random error in reads (it is raw data after all). The error in the final genome is a lot lower, because the errors in individual reads cancel out. The error in the reads, however, artificially lowers the similarity computed against the genome.
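
To make the effect concrete, here is a toy simulation, not the actual analysis pipeline. It assumes a random made-up genome, reads whose true positions are already known (so no alignment step), and substitution errors only. The 11% error rate is a hypothetical value picked so the output lands near the ~89% figure above; it is not a measured error rate.

```python
import random

def simulate_read_identity(genome_len=20000, read_len=100, n_reads=500,
                           error_rate=0.11, seed=0):
    """Average percent identity of noisy reads against their own source genome.

    All parameters are illustrative assumptions, not values from real data.
    """
    rng = random.Random(seed)
    bases = "ACGT"
    genome = [rng.choice(bases) for _ in range(genome_len)]
    matches = 0
    total = 0
    for _ in range(n_reads):
        start = rng.randrange(genome_len - read_len + 1)
        for i in range(start, start + read_len):
            true_base = genome[i]
            if rng.random() < error_rate:
                # Sequencing error: substitute a different base.
                read_base = rng.choice([b for b in bases if b != true_base])
            else:
                read_base = true_base
            matches += (read_base == true_base)
            total += 1
    return 100.0 * matches / total

# Reads compared against the very genome they came from still score
# well below 100%, purely because of the injected read error.
print(round(simulate_read_identity(), 1))
# With no read error the same comparison gives exactly 100%.
print(simulate_read_identity(error_rate=0.0))
```

The point of the sketch is only that a self-comparison falling short of 100% measures the noise in the reads, not any real difference between the individual and its genome.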

Is there a way to fix this? Yes there is! We can subtract out the amount of error in the data, by bringing chimp and human reads close to 100% when measured against their own genomes.

There are two ways to compute the similarity between humans and chimps this way…

98% = 100 - (human read -> human genome) + (human read -> chimp genome)
98% = 100 - (chimp read -> chimp genome) + (chimp read -> human genome)
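
The two corrections above are plain arithmetic, sketched here in Python using the approximate, rounded percentages quoted in this post. The function and variable names are just illustrative.

```python
def corrected_similarity(self_identity, cross_identity):
    """100 - (read -> own genome) + (read -> other genome), all in percent."""
    return 100.0 - self_identity + cross_identity

# Approximate read -> genome identities from this post, keyed by
# (species of the reads, species of the genome).
measured = {
    ("human", "human"): 89.0,
    ("human", "chimp"): 87.0,
    ("chimp", "chimp"): 87.0,
    ("chimp", "human"): 85.0,
}

from_human_reads = corrected_similarity(measured[("human", "human")],
                                        measured[("human", "chimp")])
from_chimp_reads = corrected_similarity(measured[("chimp", "chimp")],
                                        measured[("chimp", "human")])

print(from_human_reads)  # 98.0
print(from_chimp_reads)  # 98.0
```

Both directions give the same answer because each one uses the self-comparison as the baseline for how much similarity the read error alone removes.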

We can do this over and over again, for every individual (human or chimp) for which we have data. We have a lot of data like this, and by Tomkins’s method the result comes out to about 2% different, or 98% the same, once you take the controls into account to subtract out the sequencing error.

But what does 2% or 98% mean anyway? How do we interpret that? Controls to the rescue. Let’s take mice and rats, animals most YECs think are of the same kind. “Microevolution” (to borrow their term) can account for the differences here. We can measure this the same way we measured the difference between humans and chimps. It is critical to measure the same way, so we can compare the numbers. We get approximately…

82% (mice - rats)

And that is compared with…

98% (human - chimp)

In evolutionary theory, there is a mathematical theory that explains this strange result. We can predict that there will be about 10x more differences between mice and rats than between humans and chimps (18% vs 2%), just as we see in the data. In the YEC world, this is clear evidence that human and chimp genomes look like they are the same kind. Maybe God made us separate, but disproving evolution was not one of his design goals.

Of course, if you do not like my correction to Tomkins’s figures, you could always just look at the mouse read to rat genome numbers. The uncorrected numbers are about…

70% = (mice read -> rat genome)
85% = (chimp read -> human genome)

The mouse-rat number is clearly below 85%, leaving us with the same interpretation. Humans and chimps are more similar than mice and rats. This is explained by the mathematical formulas of evolution, but is strange in YEC. At the very least, it tells us that God is not nearly as concerned about disproving evolution as we are.
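
As a quick check on the comparison, here is a small sketch using only the approximate, rounded figures quoted in this post. It computes the mouse-rat to human-chimp difference ratio from the corrected numbers and confirms that the ordering is the same with or without the correction. The dictionary names are illustrative.

```python
# Approximate, rounded percent similarities quoted in this post.
corrected = {"human-chimp": 98.0, "mouse-rat": 82.0}
uncorrected = {"human-chimp": 85.0, "mouse-rat": 70.0}

def percent_difference(similarity):
    """Percent difference is simply 100 minus percent similarity."""
    return 100.0 - similarity

# Corrected differences: 18% for mouse-rat vs 2% for human-chimp.
ratio = (percent_difference(corrected["mouse-rat"])
         / percent_difference(corrected["human-chimp"]))

# Does human-chimp come out more similar than mouse-rat either way?
same_ordering = (corrected["human-chimp"] > corrected["mouse-rat"]
                 and uncorrected["human-chimp"] > uncorrected["mouse-rat"])

print(ratio)          # 9.0, i.e. roughly the 10x mentioned above
print(same_ordering)  # True
```

The absolute numbers shift depending on whether read error is subtracted out, but the relative ranking of the two pairs does not.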

NOTE: The numbers here are approximate, rounded off for clarity of text. This post is supplemented well by this one: Common Descent: Humans and Chimps / Mice and Rats


Ended up posting this here so I could link to it in answer to @Wayne_Rossiter’s claim on this blog post: Private Site.

As such, Richard Bugg, professor of evolutionary genomics at the University of London, decided to do the comparison, and found that, “The percentage of nucleotides in the human genome that had one-to-one exact matches in the chimpanzee genome was 84.38%” (a far cry from the presumed 98% similarity between chimps and humans). What’s really scary is that this is almost exactly the degree of similarity predicted by Jeffrey Tomkins, a young earth creationist scientist! BioLogians may have to lay down for that one. Again, clearly not what we would predict under Darwinian evolution and UCA.

This is from @Wayne_Rossiter, a biology professor. I wrote back to him:

Rather than argue about flaws, try applying that analysis with a couple of controls. Look at the mouse-rat genome similarity (it will always be less than human-chimp, about 70% by Tomkins’s method). Look at human to human similarity (which will be about 89% by both Buggs’s and Tomkins’s methods). If human-human similarity is about 89% and human-chimp similarity is about 87%, that should tell you something. That is exactly what you see if you run those controls.

The exact numbers depend on precisely which one we are discussing. So WHY are their results so far off? Why do they fail the controls so badly? They have actually done it several times, getting it wrong in different ways each time. If you point me to the precise method, I’ll show you the error. It is pretty straightforward, usually, to pick out why.

The key thing is that they leave out positive and negative controls. Doing this, their results are just flat out wrong, and you can’t tell except by going into the labyrinth of their explanations. Just run the controls too, and their numbers fall apart, both Tomkins’s and Buggs’s.
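
Here is a minimal sketch of what “running the controls” amounts to, using the approximate figures quoted above. The function name, the 1% tolerance, and the way comparisons are labelled as same-species or not are my own illustrative assumptions, not part of anyone’s published method.

```python
def failed_controls(results, tolerance=1.0):
    """Names of same-species comparisons that fall short of ~100% similarity.

    `results` maps a label to (is_same_species, percent_similarity).
    `tolerance` is a hypothetical allowance for real within-species variation.
    """
    failures = []
    for name, (is_same_species, similarity) in results.items():
        if is_same_species and similarity < 100.0 - tolerance:
            failures.append(name)
    return failures

# Approximate uncorrected numbers quoted in this thread.
uncorrected = {
    "human-human": (True, 89.0),
    "human-chimp": (False, 87.0),
    "mouse-rat": (False, 70.0),
}

print(failed_controls(uncorrected))  # ['human-human']

# The gap between the same-species control and human-chimp is what matters.
gap = uncorrected["human-human"][1] - uncorrected["human-chimp"][1]
print(gap)  # 2.0
```

If the same-species control already scores 89%, then 87% for human-chimp cannot be read as “13% different”; it is about 2 points below the control.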

You, Dr. Rossiter, are a biologist, and I would expect you to know better. Glen, a computer programmer with no biology training, was so stunned by the claim that he went and measured it himself. His analysis is correct: Is 1% a myth? – roohif. If Glen can do it, I’m sure you could too. Once again, just put in the controls too (human-human and mouse-rat), and you’ll have a better analysis. This isn’t the sort of thing where you should rely on hearsay. The data and tools are freely available. Go check it yourself. Especially now that de novo ape assemblies are available, you can directly check how valid Tomkins’s objection was. If it didn’t affect the answer much, you know his hypothesis is wrong.

About de novo genes in humans? That’s a myth too. Once again, you can go look at it yourself. Why not? James Tour on Orphan Genes

As for not being consistent with “Darwinian evolution”: well, duh. Darwinian evolution doesn’t include neutral theory, and does not make predictions about the distance between humans and chimps. Neutral theory (non-Darwinian evolution) does make predictions, and these predictions are validated in spades: Common Descent: Humans and Chimps / Mice and Rats

You don’t have to agree with mainstream science, but these are just the basic facts we are talking about here. You can go verify them yourself. You don’t have to take Buggs’s, Tomkins’s, or my word for it. Why not go check it out yourself?

And then I pointed him here. Hopefully he returns and can explain himself.

Thank you for writing a thread on the importance of controls. I don’t know how many times I have repeated to undergrad interns that you can’t interpret results without controls. The quality of any experiment is determined by the quality of the controls.

Instead of using the mouse/rat as the model for within “kind” variation, it might also be interesting to look at the results for a chimp/gorilla or chimp/orangutan comparison using the same methods. We would expect the same methods to show more similarity, on average, between the chimp and human genome than between the chimp and gorilla/orangutan genome.

That is a great point. This is another project to make easy to understand and do. I’m still fairly surprised that biologists can be taken in by these poorly done and selective studies.

Controls are nice, but one thing they do is point out that some distance measures don’t tell you what you might imagine. You need to understand what the distance measure is really measuring and whether it’s what you were looking for.

A simple case: counting an indel of 10 bases as 10 differences rather than 1 tells you nothing about evolutionary distance, yet that’s what these folks are using it for. You know, “humans and chimps are too different for that difference to have evolved.”
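
To illustrate the two ways of counting, here is a small sketch on a made-up pairwise alignment. The sequences and the function are hypothetical, and real tools handle gaps in more sophisticated ways. It counts differences per base (every gapped column counts) and per event (a contiguous gap counts once).

```python
def count_differences(aligned_a, aligned_b):
    """Return (per_base, per_event) difference counts for a gapped alignment.

    Both inputs must be the same length, with '-' marking gap columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    per_base = 0
    per_event = 0
    gap_side = None  # which sequence the current run of gaps is in
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # column carries no information about this pair
        if a == "-" or b == "-":
            side = "a" if a == "-" else "b"
            per_base += 1
            if side != gap_side:
                per_event += 1  # a new indel starts here
            gap_side = side
        else:
            gap_side = None
            if a != b:
                per_base += 1
                per_event += 1  # a substitution is one base and one event
    return per_base, per_event

# Made-up example: one 10-base indel plus one substitution.
seq_a = "ACGTACGTAC" + "GGGGGGGGGG" + "TTGACCA"
seq_b = "ACGTACGTAC" + "----------" + "TTGATCA"

print(count_differences(seq_a, seq_b))  # (11, 2)
```

The same alignment reads as 11 differences or as 2 mutational events depending on the measure, which is why the choice of distance measure has to match the question being asked.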

We should also note that we now have an assembly of the chimp genome that was done without any influence from or reference to the human genome, so even if Tomkins had been right, it wouldn’t matter any more: