There are many types of genetic evidence that support common ancestry. One of those pieces of evidence is the remnants of past retroviral infections that are shared by different species.
What are retroviruses, what are these remnants, and why do they point to common ancestry? First, let’s get a handle on what retroviruses are.
Retroviruses are viruses with an RNA genome that is reverse transcribed into DNA (hence the "retro" in the name) and then inserted into the host cell's genome. The inserted viral DNA carries strong promoters that force the host cell to transcribe the viral genome back into RNA and to translate that viral RNA into the proteins needed for new viral capsids. New virus particles assemble at the cell surface, bud off, bind to a new cell, deliver their genome into it, and the process repeats.
So where in the host genome do retroviruses insert? As it turns out, all over the place. A group of scientists infected human cells with three different retroviruses (HIV, MLV, and ASLV) and then mapped where in the human genome each virus had inserted. This is their data:
Each bar is a chromosome, and each lollipop marker is a mapped viral integration. Insertions were found along the length of every chromosome. Some viruses did show a preference for broad genomic features such as transcription units, but these favored regions cover massive portions of the genome:
"For HIV the frequency of integration in transcription units ranged from 75% to 80%, while the frequency for MLV was 61% and for ASLV was 57%. For comparison, about 45% of the human genome is composed of transcription units (using the Acembly gene definition). "
(Source: the study referenced above.)
So HIV favors transcription units, which make up roughly 1.35 billion bases (45%) of the roughly 3-billion-base genome. Even so, it inserts outside those favored regions 20% to 25% of the time.
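To make that arithmetic concrete, here is a small Python sketch that compares each virus's observed share of integrations in transcription units with the 45% we would expect if insertion were uniformly random. The function name `summarize` and the "per-base ratio" measure are my own illustration, not something from the study; the only inputs are the percentages quoted above, and the 3-billion-base genome size is the rounded figure used in this post.

```python
# Inputs are the percentages quoted above; GENOME_SIZE is the rounded value used in this post.
GENOME_SIZE = 3_000_000_000
TU_FRACTION = 0.45  # share of the genome in transcription units (Acembly definition)

# Observed share of integrations that landed in transcription units.
observed = {
    "HIV (low)": 0.75,
    "HIV (high)": 0.80,
    "MLV": 0.61,
    "ASLV": 0.57,
}


def summarize(in_tu: float, tu_fraction: float = TU_FRACTION) -> dict:
    """Compare an observed integration share against uniform random insertion.

    Hypothetical helper for illustration only.
    """
    outside = 1.0 - in_tu
    # Fold enrichment: observed share divided by the share expected under uniform insertion.
    enrichment = in_tu / tu_fraction
    # Per-base ratio: how much more likely a single base inside a transcription unit
    # is to receive an insertion than a single base outside one (1.0 = no preference).
    per_base_ratio = (in_tu / tu_fraction) / (outside / (1.0 - tu_fraction))
    return {"outside": outside, "enrichment": enrichment, "per_base_ratio": per_base_ratio}


if __name__ == "__main__":
    print(f"Bases in transcription units: {GENOME_SIZE * TU_FRACTION:,.0f}")
    for virus, share in observed.items():
        s = summarize(share)
        print(
            f"{virus:10s} outside TUs: {s['outside']:.0%}  "
            f"enrichment: {s['enrichment']:.2f}x  per-base ratio: {s['per_base_ratio']:.1f}x"
        )
```

With these inputs, HIV's per-base ratio comes out at roughly 3.7x to 4.9x. In other words, a base inside a transcription unit is several times more likely to be hit than a base outside one, yet the non-preferred 55% of the genome still collects a fifth or more of all insertions. A preference shifts the odds; it does not confine the virus to a small target.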
To sum up, we can directly observe that retroviruses insert into the host genome, and that they do so all over the place. In the next post, we will go over the total number of endogenous retroviruses (ERVs) in the human genome, how we know they are the result of past viral infections, and how many of those insertions are shared with the chimp genome.