Endogenous Retroviruses and Common Descent

There are many types of genetic evidence that support common ancestry. One of those pieces of evidence is the remnants of past retroviral infections that are shared by different species.

What are retroviruses, what are these remnants, and why do they point to common ancestry? First, let’s get a handle on what retroviruses are.

Retroviruses are viruses with an RNA genome that is reverse transcribed into DNA (hence the retro portion of the name) and then inserted into the host cell's genome. Strong promoters in the inserted viral genome drive the host cell to transcribe the viral genome back into RNA and to translate some of that RNA into the proteins needed for new viral particles. The new virus works its way to the cell surface, buds off, binds to a new cell, delivers its genome into that cell, and the process repeats.

So where in the host genome do retroviruses insert? As it turns out, all over the place. A group of scientists infected human cells with three different retroviruses (HIV, MLV, and ASLV), and then they mapped where in the human genome these viruses inserted. This is their data:

Each bar is a chromosome, and each lollipop marker is a mapped viral integration. There were insertions down the length of every chromosome. Some viruses did show preference for general features, but these areas cover massive portions of the genome:

"For HIV the frequency of integration in transcription units ranged from 75% to 80%, while the frequency for MLV was 61% and for ASLV was 57%. For comparison, about 45% of the human genome is composed of transcription units (using the Acembly gene definition)."
reference above

So HIV favors transcription units, which make up about 45% of the genome, or roughly 1.35 billion of its 3 billion bases. Even then, it inserts outside of those favored regions 20% to 25% of the time.
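The arithmetic behind those figures can be checked with a quick sketch. The percentages come from the quote above, and the 3 billion base genome size is the round number used in this post. The variable and function names are just illustrative.

```python
# Figures quoted above; 3 billion bases is the round genome size used in the post.
GENOME_BASES = 3_000_000_000
TRANSCRIPTION_UNIT_FRACTION = 0.45  # share of the genome in transcription units

# Observed share of integrations that landed in transcription units.
observed = {"HIV (low)": 0.75, "HIV (high)": 0.80, "MLV": 0.61, "ASLV": 0.57}

# Size of the "favored" target in bases.
favored_bases = GENOME_BASES * TRANSCRIPTION_UNIT_FRACTION


def enrichment(observed_fraction, target_fraction=TRANSCRIPTION_UNIT_FRACTION):
    """Fold enrichment relative to integrations spread uniformly over the genome."""
    return observed_fraction / target_fraction


print(f"bases in transcription units: {favored_bases:,.0f}")
for virus, fraction in observed.items():
    print(
        f"{virus}: {enrichment(fraction):.2f}x the uniform expectation, "
        f"{1 - fraction:.0%} of insertions outside transcription units"
    )
```

Even the strongest preference here is less than a twofold enrichment spread over more than a billion bases, so it does very little to narrow down where any single insertion will land.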

To sum up, we can directly observe that retroviruses insert into the host genome, and they do so all over the place. In the next post, we will go over the total number of ERVs in the human genome, how we know they are the result of viral infections, and how many of those insertions are shared with the chimp genome.


If a retrovirus creates an insertion in the genome of an egg or sperm, then that insertion has a chance to be passed on to the next generation. Those inherited insertions are called endogenous retroviruses (ERVs). How many are found in the human genome? According to the 2001 human genome paper, there are over 200,000 endogenous retroviruses in the human genome (ERV-classI-III):


How do we know that they are endogenized versions of retroviruses? Because that is what they look like, and that is how they act. A viral genome will have flanking long terminal repeats (LTRs) that act as promoters, and then viral genes like reverse transcriptase and capsid proteins between the LTRs. Many ERVs are solo LTRs because recombination between the two nearly identical LTRs that bookend the viral genome deletes everything in between and leaves a single LTR behind. Even so, the sequence of ERV LTRs matches the sequence of viral LTRs, and we have a known mechanism that produces them.

Also, if you line up a bunch of closely related ERVs from the human genome and build a consensus sequence, which filters out the mutations each copy has picked up on its own, you get a functional retrovirus:
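To show why lining up the copies removes the mutations, here is a toy sketch that takes a simple majority vote at each position. The sequences are invented for illustration, and a plain majority rule over pre-aligned, equal-length strings is an assumption made here, not a description of how the actual reconstruction was done.

```python
from collections import Counter


def consensus(aligned_copies):
    """Return the most common base at each column of equal-length aligned sequences."""
    if len({len(seq) for seq in aligned_copies}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_copies)
    )


# Invented toy data: one "ancestral" sequence and four copies,
# each carrying a different single mutation.
ancestral = "ATGGCTAGCTTACGGA"
copies = [
    "ATAGCTAGCTTACGGA",  # G->A at position 3
    "ATGGCTATCTTACGGA",  # G->T at position 8
    "ATGGCTAGCTTCCGGA",  # A->C at position 12
    "ATGGCTAGCTTACGGG",  # A->G at position 16
]

rebuilt = consensus(copies)
print(rebuilt == ancestral)
```

Because each copy mutated independently, any given mutation is outvoted by the other copies at that position, and the vote recovers the sequence they all descended from.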

Here is where we get to the evidence for common ancestry. Johnson and Coffin put it best in their paper:

Since retroviruses insert all over the place, the chance that two independent insertions will land at the same place in two separate genomes is close to zero. If we are talking about hundreds of thousands of ERVs all matching by chance, that probability gets really, really close to zero.
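A back-of-the-envelope version of this argument can be sketched as follows. It assumes every insertion picks its site uniformly at random from a pool of target bases, which is a deliberate simplification. The 200,000 count and the 3 billion base genome are the round numbers used in this thread, and the restricted target of 1.35 billion bases (45% of the genome) is included as a pessimistic case in which every insertion lands inside transcription units.

```python
GENOME_BASES = 3_000_000_000
FAVORED_BASES = 1_350_000_000  # 45% of the genome, the transcription-unit share
ERVS_PER_GENOME = 200_000


def same_site_probability(target_bases):
    """Chance that one insertion lands on the same base as another, given uniform choice."""
    return 1 / target_bases


def expected_chance_matches(n_a, n_b, target_bases):
    """Expected number of coinciding insertion pairs between two genomes.

    Each of the n_a * n_b pairs coincides with probability 1 / target_bases,
    so the expectation is their product (linearity of expectation).
    """
    return n_a * n_b * same_site_probability(target_bases)


whole_genome = expected_chance_matches(ERVS_PER_GENOME, ERVS_PER_GENOME, GENOME_BASES)
favored_only = expected_chance_matches(ERVS_PER_GENOME, ERVS_PER_GENOME, FAVORED_BASES)

print(f"single pair, whole genome: {same_site_probability(GENOME_BASES):.1e}")
print(f"expected chance matches, whole genome: {whole_genome:.1f}")
print(f"expected chance matches, favored regions only: {favored_only:.1f}")
```

Under these assumptions, independent insertion predicts a few dozen coincidental matches at most out of 200,000 insertions per genome. That is the number to keep in mind when looking at how many ERVs are actually found at the same position in two species.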

So how many ERVs do we share with chimps? More than 99% of them. Of the 200,000+ human ERVs, fewer than 100 are not found at the same place in the chimp genome. Of the 200,000+ chimp ERVs, fewer than 300 are not found at the same place in the human genome. This info can be found in the chimp genome paper:
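The "more than 99%" figure follows directly from those counts. The quick check below uses 200,000 as the total and 100 and 300 as the unshared counts. Since the real totals are higher and the real unshared counts are lower, the result is a lower bound.

```python
def shared_fraction(total_ervs, not_shared):
    """Fraction of ERVs found at the same position in the other genome."""
    return (total_ervs - not_shared) / total_ervs


human = shared_fraction(200_000, 100)  # human ERVs also present in chimp
chimp = shared_fraction(200_000, 300)  # chimp ERVs also present in human

print(f"human ERVs shared with chimp: at least {human:.2%}")
print(f"chimp ERVs shared with human: at least {chimp:.2%}")
```

That works out to at least 199,900 of 200,000 in one direction and at least 199,700 of 200,000 in the other.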


This is smoking gun evidence for common ancestry. We have a process that creates genetic markers all over the place, and two genomes that share more than 99% of those markers. Independent insertion of retroviruses can’t explain this pattern. The only observed and evidenced mechanism we have for explaining this pattern is viral insertion into the genome of a shared ancestor.


Excellent posts @T_aquaticus. The science here is solid.