My thoughts on an Evolution News post on ERVs

Jimspace · July 29, 2020, 3:57pm

This 2015 article from Evolution News by Casey Luskin frames the argument from endogenous retroviruses (ERVs) as junk DNA vs. functional DNA. His argument is that evolutionary common descent depends on ERVs being junk DNA, but since they are functional, they support common design instead. I'm not sure it follows, though, that common descent depends on ERVs being only junk DNA.

He also mentions Abbie Smith and her ERV blog. I found where she actually addressed his dichotomy in 2007, where she writes that functionality “has been co-opted,” and that “ERVs and host co-opting and pirating ERV parts is a prediction made by [evolution].”

So it appears that the same lady he mentioned already addressed his dichotomy years earlier.

I see that much discussion over this dichotomy took place here: ERV's, Junk DNA, Activity, and Function

swamidass · July 29, 2020, 4:00pm

The argument does not depend on ERVs being junk DNA.

By what definition of function? By most sensible ones, only a small fraction are functional. So we can throw out all those that are shown to be functional, if we must, and the argument doesn’t change one bit.

Did they predict functionality? We’ve known from just about the beginning that ERVs can have function at times. Creationists didn’t predict it because we already knew it to be true.

Jimspace · July 29, 2020, 4:14pm

Right, Casey said that

An assumption in the question (and in the arguments from supporters of common ancestry) is that ERVs are a type of functionless “junk” DNA. Thus if apes and humans share ERVs in the same position in our genomes, that would seemingly count as evidence for common descent. But what if ERVs aren’t junk? What if they are a type of functional DNA? If that’s the case, then shared ERVs could easily be explained by common design rather than common descent, and they would certainly no longer be some kind of special argument for common ancestry.

But I don't think his dichotomy is as strong as he would like it to be.

The line “Creationists Predicted ERV Functionality.” is a title of a blog post by Abbie Smith where she refutes that. It was a citation where I quoted her. I should have used footnotes.

swamidass · July 29, 2020, 4:16pm

Yup, he is relying on all or none thinking. We all agree that some ERVs have important function, but it doesn’t follow that all ERVs have important function. Yet that is exactly what his argument requires.

T_aquaticus · July 29, 2020, 4:16pm

Even if every ERV was functional they would still be evidence for common ancestry because ERVs are found at the same location in the genomes of different species. Additionally, the divergence of ERV sequences recapitulates the canonical phylogenies.

Here is the boiled-down ERV argument. Retroviruses are observed to insert randomly into host genomes. The probability of two separate insertion events landing at the same spot is exceedingly low. Therefore, finding a large percentage of ERVs at the same location in the genomes of multiple species is best explained by a single insertion in a common ancestor. In the case of humans and chimps, of the 200,000+ ERVs in the human genome, all but ~100 are found at the same location in the chimp genome. That's over 99.9% of ERVs at the same location.
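As a rough check on those numbers, here is a minimal sketch of the arithmetic. The two counts are the approximate ones quoted above; the genome size and the perfectly uniform insertion model are simplifying assumptions added for illustration, and real integration is not perfectly uniform.

```python
# Approximate counts quoted in the post above.
total_ervs = 200_000   # ERVs in the human genome ("200,000+")
not_shared = 100       # ERVs not found at the same chimp location ("~100")

shared_fraction = (total_ervs - not_shared) / total_ervs
print(f"shared: {shared_fraction:.2%}")

# ASSUMPTION (illustrative only): an insertion lands uniformly at random
# among `genome_sites` possible positions.
genome_sites = 3_000_000_000  # assumed round figure of ~3e9 bp


def prob_same_site(sites: int) -> float:
    """Chance that one independent insertion hits one specific, pre-chosen site."""
    return 1 / sites


print(f"one independent hit on a given site: {prob_same_site(genome_sites):.1e}")
```

Even if the uniform model is off by several orders of magnitude, independent insertions would not come close to explaining a shared fraction this high.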

Notice that I never mention function in that argument. It is irrelevant to the argument. That being said, the ENCODE study has been widely debunked so using it to support an argument about function in the human genome isn’t going to get very far. What the ENCODE study did was conflate “does something” with “has function”. Those aren’t the same thing. The trash in your kitchen trash can releases odor molecules into the air, so it does something. However, that trash is still junk because it isn’t a functional part of your kitchen.

The best paper on the problems with the ENCODE study is this one, IMHO:

A recent slew of ENCyclopedia Of DNA Elements (ENCODE) Consortium publications, specifically the article signed by all Consortium members, put forward the idea that more than 80% of the human genome is functional. This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved through purifying selection is less than 10%. Thus, according to the ENCODE Consortium, a biological function can be maintained indefinitely without selection, which implies that at least 80 − 10 = 70% of the genome is perfectly invulnerable to deleterious mutations, either because no mutation can ever occur in these “functional” regions or because no mutation in these regions can ever be deleterious. This absurd conclusion was reached through various means, chiefly by employing the seldom used “causal role” definition of biological function and then applying it inconsistently to different biochemical properties, by committing a logical fallacy known as “affirming the consequent,” by failing to appreciate the crucial difference between “junk DNA” and “garbage DNA,” by using analytical methods that yield biased errors and inflate estimates of functionality, by favoring statistical sensitivity over specificity, and by emphasizing statistical significance rather than the magnitude of the effect. Here, we detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome. The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.
Dan Graur, Yichen Zheng, Nicholas Price, Ricardo B.R. Azevedo, Rebecca A. Zufall, Eran Elhaik, On the Immortality of Television Sets: “Function” in the Human Genome According to the Evolution-Free Gospel of ENCODE, Genome Biology and Evolution, Volume 5, Issue 3, March 2013, Pages 578–590

swamidass · July 29, 2020, 4:17pm

It is Stacey Smith it seems. The content of the post is:

There will be several more posts on ERV ‘junk’/‘function’/etc, however a main Common Creationist Claim about ERVs is that they predicted ERV functionality before Evilutionists. Evilutionists refused to study ERVs because they are ‘junk.’

Parts of ERVs are functional. For instance, the env gene of specific endogenous retroviruses has been co-opted by mammals to form placentas (I elaborate here). Also, endogenous viruses carry genes with them called LTRs. This is how an exogenous virus like HIV uses an LTR-- but when the virus becomes endogenous, the LTR can act as a promoter for endogenous genes, either up or downstream. This paper has a humorous, but accurate name for this: “domesticated long terminal repeats”.

Evilutionists did this research, not Creationists. Evilutionists predicted functionality in non-coding regions, not Creationists (see MarkHs post at Denialismblog). ERVs and host co-opting and pirating ERV parts is a prediction made by evilution, not Creationism

Rumraket · July 29, 2020, 4:28pm

Here’s a simple figure that is supposed to explain the principle behind how ERVs provide evidence for common descent, regardless of whether they are functional or not:
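[Figure: idealized diagram with a phylogenetic tree on the left and the distribution of shared ERV insertions across species on the right]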

That is of course a highly idealized diagram, and the real world is a bit more complicated than that, but the underlying principle is correct. The point is that ERVs show distributions across species that imply a branching process of descent with modification (the phylogenetic tree on the left side of the figure), which gives rise to the distribution of ERVs within species depicted on the right side.

To be sure, ERV insertions can happen in any of the species depicted after they split from a common ancestor, so that, for example, the Gibbon will have its own unique ERV insertions in addition to those depicted in the figure, which aren't shared by any of the other species.

There are many ways in which such ERV insertions can support the theory of common descent besides the mere fact of their insertion position. For example, as you go further back in time, shared ERV insertions should diverge more in sequence from each other. That is to say, the copies of the blue ERV insertion in Gibbon and Human would be expected to differ more from each other than the copies of the blue ERV insertion in Human and Chimp, because they have had more time to accumulate mutations independently in the Gibbon lineage and the lineage leading to Humans.
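To make the divergence expectation concrete, here is a minimal sketch that compares copies of the same ERV locus between species using a simple proportion-of-differences distance. The three sequences are made-up toy data, not real ERV sequences, and the function name is hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# TOY DATA (hypothetical): the same ERV locus aligned across three species.
erv_locus = {
    "human":  "ACATACGTACGTACGTACGT",
    "chimp":  "ACGTACTTACGTACGTACGT",
    "gibbon": "ACGTACGTAGGTATGTAAGT",
}

human_chimp = p_distance(erv_locus["human"], erv_locus["chimp"])
human_gibbon = p_distance(erv_locus["human"], erv_locus["gibbon"])

print(f"human-chimp:  {human_chimp:.2f}")
print(f"human-gibbon: {human_gibbon:.2f}")
```

In this toy alignment the human and chimp copies differ less than the human and gibbon copies, which is the pattern expected when the comparison is made at the same locus across species.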

Notice how none of this requires that the ERVs be nonfunctional.

A link that explains this same principle and more, in much more detail, while also giving real examples, can be found here: Evidence for the Evolutionary Model - ERVs

Jimspace · July 29, 2020, 4:35pm

Thanks guys, this is very absorbing reading!

Jimspace · July 29, 2020, 4:38pm

I noticed that too. Her current blog on ScienceBlogs is titled “sa smith”.

T_aquaticus · July 29, 2020, 4:38pm

That’s another line of evidence that can actually be useful when discussing these arguments. For example, PtERVs are found in both the gorilla and chimp genomes but not in the human or orangutan genomes. It is always interesting to ask ID/creationists what predictions they would make about PtERV insertions. Should they be found at the same location in the chimp and gorilla genome, or not? How did you derive these predictions using the ID/creationist model?

T_aquaticus · July 29, 2020, 4:49pm

For completeness, here is a great paper that lays out the 3 layers of ERV evidence:

Endogenous retrovirus loci provide no less than three sources of phylogenetic signal, which can be used in complementary fashion to obtain much more information than simple distance estimates of homologous sequences. First, the distribution of provirus-containing loci among taxa dates the insertion. Given the size of vertebrate genomes (>1 × 10^9 bp) and the random nature of retroviral integration (22, 23), multiple integrations (and subsequent fixation) of ERV loci at precisely the same location are highly unlikely (24). Therefore, an ERV locus shared by two or more species is descended from a single integration event and is proof that the species share a common ancestor into whose germ line the original integration took place (14). Furthermore, integrated proviruses are extremely stable: there is no mechanism for removing proviruses precisely from the genome, without leaving behind a solo LTR or deleting chromosomal DNA. The distribution of an ERV among related species also reflects the age of the provirus: older loci are found among widely divergent species, whereas younger proviruses are limited to more closely related species. In theory, the species distribution of a set of known integration sites can be used to construct phylogenetic trees in a manner similar to restriction fragment length polymorphism (RFLP) analysis.

Second, as with other sequence-based phylogenetic analyses, mutations in a provirus that have accumulated since the divergence of the species provide an estimate of the genetic distance between the species. Because, for any given provirus, it is highly unlikely that there will be selection for or against any specific sequence, it is safe to assume that the rate of accumulation of mutations approximates the rate of their occurrence, with appropriate corrections for reversion. Analysis of closely related proviruses integrated at different sites should also reveal regional differences in mutation rates.

Third, sequence divergence between the LTRs at the ends of a given provirus provides an important and unique source of phylogenetic information. The LTRs are created during reverse transcription to regenerate cis-acting elements required for integration and transcription. Because of the mechanism of reverse transcription, the two LTRs must be identical at the time of integration, even if they differed in the precursor provirus (Fig. 1A). Over time, they will diverge in sequence because of substitutions, insertions, and deletions acquired during cellular DNA replication. Although it has been noted that the divergence between the two LTRs of an ERV can serve as a molecular clock (8, 15, 18, 25), there are no reported prior attempts to utilize the LTRs of individual ERV loci as a source of phylogenetic signal.

Constructing primate phylogenies from ancient retrovirus sequences

Welkin E. Johnson, John M. Coffin

Proceedings of the National Academy of Sciences Aug 1999, 96 (18) 10254-10260; DOI: 10.1073/pnas.96.18.10254

https://www.pnas.org/content/96/18/10254
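The third point in that excerpt (the two LTRs are identical at integration and then diverge) can be turned into a rough age estimate. The sketch below assumes a constant substitution rate acting independently on both LTRs, so divergence is roughly 2 × rate × age. That formula, the rate value, and the function names are assumptions for illustration and are not taken from the quoted paper.

```python
def ltr_divergence(differences: int, aligned_length: int) -> float:
    """Proportion of aligned sites that differ between the 5' and 3' LTRs."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return differences / aligned_length


def estimated_age_years(divergence: float, rate: float) -> float:
    """Rough insertion age under a constant-rate assumption.

    Both LTRs start identical and each accumulates substitutions at `rate`
    (per site per year), so divergence is approximately 2 * rate * age.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return divergence / (2 * rate)


# HYPOTHETICAL inputs: 20 differences over 1000 aligned sites, and an
# assumed rate of 1e-9 substitutions per site per year.
HYPOTHETICAL_RATE = 1e-9
d = ltr_divergence(differences=20, aligned_length=1000)
print(f"divergence: {d:.3f}")
print(f"estimated age: {estimated_age_years(d, HYPOTHETICAL_RATE):.2e} years")
```

This ignores multiple substitutions at the same site, indels, and any exchange of sequence between the two LTRs, so it is only a first approximation.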

davecarlson · July 29, 2020, 6:08pm

This new paper looking at full-length long terminal repeat (LTR) retrotransposon diversity in several assembled maize genomes might be germane:

European maize genomes highlight intraspecies variation in repeat and gene content

Abstract:

The diversity of maize ( Zea mays ) is the backbone of modern heterotic patterns and hybrid breeding. Historically, US farmers exploited this variability to establish today’s highly productive Corn Belt inbred lines from blends of dent and flint germplasm pools. Here, we report de novo genome sequences of four European flint lines assembled to pseudomolecules with scaffold N50 ranging from 6.1 to 10.4 Mb. Comparative analyses with two US Corn Belt lines explains the pronounced differences between both germplasms. While overall syntenic order and consolidated gene annotations reveal only moderate pangenomic differences, whole-genome alignments delineating the core and dispensable genome, and the analysis of heterochromatic knobs and orthologous long terminal repeat retrotransposons unveil the dynamics of the maize genome. The high-quality genome sequences of the flint pool complement the maize pangenome and provide an important tool to study maize improvement at a genome scale and to enhance modern hybrid breeding.

If you compare the ~15,000 full-length LTR insertions (which are likely to be younger, on average, than truncated LTRs), only about 20-30% of the LTRs are shared on average between any two of the individuals:

a , Proportion of the pan and core sets for fl-LTRs and genes with increasing line numbers. b , Percentage of pairwise shared and still-intact fl-LTRs at syntenic positions for seven maize lines. Reading direction is column to row; for example, EP1 shares 27% of its fl-LTRs with F7 and F7 shares 28% with EP1. On the basis of the similarity matrix, the seven lines cluster into a relationship context that separates flint and dent. Within flint and dent, around 30% of the locations are shared; between flint and dent, most of the values are reduced to about 20%, except for PE0075 (25%). The most pronounced overlap of intact and shared fl-LTR locations is found between W22 and B73 (32%); the least pronounced is found between EP1 and Mo17 (18%). A corresponding evalution for genes gives pairwise shared numbers between 82 and 91%. c , Insertion age ( y axis) and chromosomal distribution of all fl-LTRs (top row), line unique (label 1) and cluster constellations of increased sharing ranging from two to all eight lines. The chromosomal location is collapsed for all ten chromosomes and given in percentage of the respective chromosome length. The line-specific or shared among fewer lines fl-LTRs contain a higher proportion of younger elements and are less frequently found in the central, low recombining regions. There is a continuous shift towards a more pericentromeric location and towards older elements with the increase of lines sharing corresponding elements.

Interestingly, if you build a tree from the presence-absence of LTR insertions, you get the same result as the tree built from the sequence data:

Differences in pairwise shared numbers match the phylogenies from gene-derived phylogenetic relationships and reveal a clear distinction between flint and dent lines (Fig. 2b)
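The pairwise sharing percentages described in that caption are asymmetric ("EP1 shares 27% of its fl-LTRs with F7 and F7 shares 28% with EP1"). Here is a minimal sketch of how such a matrix could be computed from presence/absence data. The line names and site labels are made up for illustration and are not data from the paper.

```python
from itertools import permutations

# TOY DATA (hypothetical): insertion sites observed in each line.
insertions = {
    "line_A": {"s1", "s2", "s3", "s4"},
    "line_B": {"s1", "s2", "s3", "s5", "s6"},
    "line_C": {"s1", "s7", "s8", "s9"},
}


def shared_percent(a: set, b: set) -> float:
    """Percent of the insertions in `a` that are also present in `b`.

    Asymmetric on purpose: the denominator is the size of `a` only.
    """
    if not a:
        raise ValueError("first set must not be empty")
    return 100 * len(a & b) / len(a)


for x, y in permutations(insertions, 2):
    pct = shared_percent(insertions[x], insertions[y])
    print(f"{x} shares {pct:.0f}% of its insertions with {y}")
```

Lines that share more insertions group together, which is the sense in which a presence/absence matrix carries the same kind of signal as a sequence-based tree.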

sfmatheson · July 29, 2020, 6:37pm

It’s Abbie Smith, whose ERV blog was very well known several years ago. Her first name is Stacey but her web persona is Abbie.

Barry_Desborough1 · July 29, 2020, 8:42pm

My FAQ on ERVs ERV FAQ.

Dan_Eastwood · July 29, 2020, 9:26pm

Thanks Barry!

@Barry_Desborough1 is a friend from FB with a keen interest in ERVs.

swamidass · July 30, 2020, 12:21am

Welcome to PS @Barry_Desborough1

scd · July 30, 2020, 12:53pm

Here is one problem with the ERV prediction. The numbers 1-4 represent the age of the insertions, so 4 is the oldest:

It seems that there is no correlation between insertion time and the number of mutations since the insertion. Evolution doesn't predict that.

Rumraket · July 30, 2020, 1:42pm

Uhm, dude! You appear to be comparing different loci to each other (which merely show that all parts of the genome don’t accumulate mutations at equal rates), instead of looking at the number of changes that separate the same insertion between different species.

Look at your trees. The prediction is pretty well confirmed for every single case. The number of changes that separate chimp-bonobo is generally less than for human-chimp, which in turn is generally less than what separates human-gorilla, and so on. Every tree reflects the canonical phylogeny in terms of branching order, and generally also in terms of the number of changes that separate the species. Thanks for proving the point.

John_Harshman · July 30, 2020, 1:42pm

I’m going to need some explanation here. What is the source for this figure? What do the red bars mean? Where on each tree do you claim all the insertions happen, and what data were used to build the trees?

Rumraket · July 30, 2020, 2:11pm

It’s from figure 3 in this paper: https://www.genetics.org/content/171/3/1183

Legend says:

Phylogenetic analysis of HERV-K elements that conform to predicted topology. Maximum-parsimony trees are shown for each element. Branch lengths are proportional to the number of changes occurring along a lineage. Bootstrap values taken from 100 replicates are shown. Nodes without indicated bootstrap values had very high support, ≥95%. Human 5′ and 3′ LTR sequences of closely related HERV-K elements were used as outgroups in each analysis and are indicated by element name. HERV-K3p25 is not full length in the chimpanzee and bonobo, but a solitary LTR, which is formed by homologous recombination between the 5′ and 3′ LTRs, was found at this locus and included in the analysis.
