The Many Faces of Genetic Entropy
Since John Sanford published Genetic Entropy & the Mystery of the Genome in 2005, genetic entropy (GE hereafter) has become a staple creationist argument. In his book, he defines GE thusly:
I am using the term entropy as it is most commonly used, i.e., the universal tendency for things to run down or degrade apart from intelligent intervention. Genetic entropy specifically means entropy as it applies to the genome. It reflects the inherent tendency for genomes to degenerate over time apart from intelligent intervention.
(Emphasis his.) Because Sanford likens genetic entropy to physical entropy and calls it “universal,” a plain reading suggests that he contends GE applies to all life, not just humans or large-bodied vertebrates. Indeed, the only empirical paper purporting to demonstrate the reality of GE is Carter & Sanford (2012), on mutation accumulation in the H1N1 virus, which is obviously not a large vertebrate. If viruses can degrade via GE, surely bacteria can as well.
Second, it has been understood that GE, as formulated by Sanford, relies on the existence of a “no selection zone”: a region of the distribution of fitness effects (DFE) in which selection does not operate at all. Sanford describes it like so:
Essentially every beneficial mutation must fall within Kimura’s “no selection zone”. All such mutations can never be selected for.
He emphasizes the word “never,” and throughout uses “no selection.” Again, a plain reading of this text implies that Sanford contends nearly neutral mutations cannot be selected for or against.
Since 2005, both of these claims appear to have been walked back. Carter (2012) has conceded that bacteria, at least, could survive GE, and Paul Price (@UncensoredPilgrims) now refers to “large multicellular eukaryotes” (LMEs) as those destined for extinction:
Given that most living things are not, in fact, LMEs, GE now falls far short of a universal principle akin to entropy.
It has also been recognized by Paul Price and Sanford himself that effectively neutral mutations are, in fact, selectable. Sanford wrote in 2020:
The nature of near-neutral mutations is such that they are not only un-selectable due to environmental noise, but they are also un-selectable because they are “noise” to each other
A few sentences later, he wrote:
If an individual carries just one near-neutral mutation, it might be very weakly selectable
Hence, un-selectable does not really mean “un-selectable.” Paul effectively said the same thing in my recent debate with him, starting around minute 24:30.
Evidently, either Sanford has changed his mind about what appear to be core ideas of GE, or he wrote Genetic Entropy in a way that obscured what he really meant.
Creationist supporters of GE appear just as confused as we are. In my recent debate with Paul, he directly equates GE with the stochastic load paradox (Kondrashov, 1995):
Because the stochastic mutation load paradox [genetic entropy] appears real, it requires a resolution.
Here, Paul is quoting Kondrashov but adds the bracketed “genetic entropy.” He is even clearer in the comments, where he writes:
He said he had not resolved Kondrashov’s paradox (which is, in fact, GE!), which means he now effectively disavows his own published work.
Emphasis mine. As with Sanford, a plain reading of this suggests that Kondrashov’s paradox is GE. However, Paul walked this back when asked to clarify in the forums:
Perhaps the most confusing part for all of us is the “and then some.” This is truly where the mystery lies. Indeed, we may never know.
Kondrashov’s Paradox Made Clear
Despite the confusion over what the core arguments of GE are and which organisms they apply to, both Sanford and Paul are clear that Kondrashov’s paradox is “at the heart of GE.” While this paradox is often invoked, I’ve never seen a creationist derive it or talk through the assumptions involved in classic genetic load theory. If Kondrashov’s paradox is central to GE, then each assumption involved should be acceptable to creationists and should comport sufficiently with biological reality. Thus, let’s derive the stochastic mutation load and chat through how we arrive at the paradox.
Firstly, some background on the genetic load. Unlike GE, the genetic load is an important and well-established concept in population genetics. Broadly, it refers to the reduction in fitness of a population due to recurrent mutation. No population is free of deleterious mutations; even when selection is highly efficient, new harmful mutations continue to appear such that some proportion of the population possesses them. Most eukaryotes possess a large genetic load; humans, for example, likely carry several hundred to even thousands of deleterious mutations per individual. The genetic load is very real.
What is the impact of all these harmful mutations? Let’s derive the classic mutation load from Haldane (1937) to get a taste of load theory. For a single locus, we have three possible genotypes, AA, Aa, and aa. There is recurrent mutation from A → a, which occurs at rate \mu; a is harmful with effect -s and recessive to A. Let p be the frequency of A and q the frequency of a. The fitness of each genotype is then:
W_{AA}=1,\quad W_{Aa}=1,\quad W_{aa}=1-s
Mean population fitness is thus \bar{W}=p^2+2pq+q^2(1-s)=1-sq^2. At equilibrium between selection and mutation, \mu p = sq^2. Since a is deleterious, its frequency is assumed to be very small: at equilibrium q \approx \sqrt{\mu/s}, so we can take p \approx 1. Thus, \mu \approx sq^2. Substituting \mu for sq^2, mean fitness becomes \bar{W}=1-\mu. The genetic load is the difference in mean fitness between a population with no mutations (W_{max}=1) and one with them (\bar{W}):
L=W_{max}-\bar{W}=1-\bar{W}\approx\mu
In the single-locus recessive case, L=\mu (it is 2\mu in the additive case).
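The single-locus equilibrium can be checked numerically. Below is a minimal deterministic sketch (infinite population, no drift), assuming selection acts before mutation each generation; the values \mu = 10^{-5} and s = 0.1 are illustrative and not from the text. With this ordering, the iteration settles at q = \sqrt{\mu/s} and the load 1-\bar{W} equals \mu.

```python
import math

def haldane_equilibrium(mu: float, s: float, generations: int = 100_000) -> tuple[float, float]:
    """Iterate selection, then mutation, at one locus with a fully recessive allele a.

    Genotype fitnesses: W_AA = W_Aa = 1, W_aa = 1 - s. Mutation A -> a at rate mu.
    Infinite population (no drift). Returns (q, mean fitness) at the end.
    """
    q = 0.0  # start with no copies of the deleterious allele
    for _ in range(generations):
        w_bar = 1.0 - s * q * q              # mean fitness: p^2 + 2pq + q^2 (1 - s)
        q_sel = q * (1.0 - s * q) / w_bar    # frequency of a among survivors of selection
        q = q_sel + mu * (1.0 - q_sel)       # recurrent mutation converts A into a
    return q, 1.0 - s * q * q

mu, s = 1e-5, 0.1  # illustrative values, not from the text
q_hat, w_bar = haldane_equilibrium(mu, s)
print(f"q = {q_hat:.6f} vs sqrt(mu/s) = {math.sqrt(mu / s):.6f}")
print(f"load = {1.0 - w_bar:.3e} vs mu = {mu:.3e}")
```

Starting from q = 0 makes the approach to equilibrium visible; starting at \sqrt{\mu/s} would simply stay there.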
For the multi-locus case, the classic load assumes that mutational fitness effects at each locus combine multiplicatively. Thus, the multi-locus load becomes:
L=1-\prod_{i=1}^{n}(1-\mu_i)\approx 1-e^{-\sum_i \mu_i}=1-e^{-U}
where n is the number of loci and U is the total number of deleterious mutations expected per individual each generation. Notice that W_{max} is no longer the mean fitness of a population with no mutations at a single locus, but that of a population with no mutations at all. Mean fitness at each locus is 1-\mu, and multiplying across all n loci gives the total load L=1-e^{-U}. This is known as the mutation load.
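To see how close the exact multiplicative product is to the exponential approximation, the sketch below compares the two. The genome of 20,000 loci, each with \mu = 1.1\times10^{-4} (so U = 2.2), is hypothetical and chosen only for illustration; math.log1p keeps the product of many terms close to 1 numerically accurate.

```python
import math

def multiplicative_load(mu_per_locus: list[float]) -> float:
    """Exact load when each locus multiplies mean fitness by (1 - mu_i)."""
    log_w_bar = sum(math.log1p(-mu) for mu in mu_per_locus)  # sum of ln(1 - mu_i)
    return 1.0 - math.exp(log_w_bar)

def exponential_approximation(U: float) -> float:
    """Classic approximation L = 1 - exp(-U), with U the summed per-locus rate."""
    return 1.0 - math.exp(-U)

# HYPOTHETICAL genome: 20,000 loci, each with mu = 1.1e-4, so U = 2.2.
rates = [1.1e-4] * 20_000
print(f"exact: {multiplicative_load(rates):.5f}")
print(f"1 - e^(-U): {exponential_approximation(sum(rates)):.5f}")
```

The two agree to about four decimal places here because each \mu_i is tiny, which is what justifies writing the load as 1-e^{-U}.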
Before we chat through the implications of this, I want to show how Kondrashov’s stochastic load is built using the same machinery as the classic mutation load. In the classic load, the equilibrium frequency of a (q \approx \sqrt{\mu/s} above) depends only on mutation and selection. This assumes an infinite population in which genetic drift plays no role. But if drift is also important, then N_e will play a part in equilibrium frequencies. Kondrashov showed that, in a single-locus case, the expected frequency is:
where v is the rate of back-mutation from a → A. Next, let f be the fraction of nearly neutral sites within the genome:
The average selective effect against each of these sites is:
As with the classic load, if we assume all sites are independent, each with its own fitness effect, then the expected number of harmful nearly neutral mutations (equivalent to U in the classic load) becomes:
where G is the size of the genome. Thus, defining the genetic load as we did above, we arrive at the stochastic load: L=1-e^{-Gf\bar{s}}. As with the mutation load, notice that this load is defined relative to a population with no nearly neutral mutations at all.
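Because the stochastic load reuses the classic machinery, a short sketch makes the correspondence explicit: G f \bar{s} simply takes the place of U. The inputs below are hypothetical, picked only so that G f \bar{s} = 2.2; they are not estimates for any real genome.

```python
import math

def stochastic_load(G: float, f: float, s_bar: float) -> float:
    """Stochastic load L = 1 - exp(-G * f * s_bar), as written in the text.

    G: genome size in sites; f: fraction of sites that are nearly neutral;
    s_bar: average selective effect at those sites.
    """
    return 1.0 - math.exp(-G * f * s_bar)

def classic_load(U: float) -> float:
    """Classic mutation load L = 1 - exp(-U)."""
    return 1.0 - math.exp(-U)

# HYPOTHETICAL inputs, chosen only so that G * f * s_bar = 2.2.
G, f, s_bar = 1e9, 0.022, 1e-7
print(f"stochastic load: {stochastic_load(G, f, s_bar):.4f}")
print(f"classic load with U = 2.2: {classic_load(2.2):.4f}")
```

Both loads share the same reference point, a population with none of the relevant mutations, so identical exponents give identical loads.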
What is the Paradox?
Nothing presented thus far is obviously paradoxical. The paradox emerges when we start to plug in numbers. In humans, U \approx 2.2 (Keightley, 2012); that is, each individual acquires around 2.2 new deleterious mutations per generation. The load is then L=1-e^{-2.2}=0.89. That is, the average fitness of the human population is only 11% of what it would be if there were no harmful mutations. Another way of stating this is that 89% of the population suffers genetic (selective) death each generation.
This sounds like a lot of carnage. Some have argued that such a load is intolerable (Kondrashov & Crow, 1993; Reed & Aquadro, 2006). If selection acts on offspring viability, then each human female would need to have 2\frac{1}{1-L}=2e^{U}\approx 18 children on average to maintain the population size. While this is technically within a human female’s reproductive ability, it’s far above the average. A similar issue emerges in the stochastic load, except that instead of depending solely on the deleterious mutation rate, the danger zone is where s falls within 1/G \leq s \leq 1/(4N_e).
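The arithmetic above can be reproduced in a few lines. The genome size used for the lower edge of the nearly neutral window (3.1\times10^9 sites) is an assumption not stated in the text; N_e = 20,000 is the value used later in this post. Computed without first rounding L to 0.89, the required births come to 2e^{2.2}, a little over 18 per female.

```python
import math

U = 2.2                                  # new deleterious mutations per individual (Keightley, 2012)
load = 1.0 - math.exp(-U)                # classic mutation load, L = 1 - e^{-U}
births_per_female = 2.0 / (1.0 - load)   # = 2 e^{U}, so on average two offspring survive to replace both parents

G = 3.1e9      # ASSUMPTION: approximate haploid human genome size, not given in the text
Ne = 20_000    # effective population size used elsewhere in this post
lower, upper = 1.0 / G, 1.0 / (4 * Ne)   # nearly neutral window: 1/G <= s <= 1/(4 Ne)

print(f"L = {load:.3f}; births per female = {births_per_female:.2f}")
print(f"nearly neutral window: {lower:.1e} <= s <= {upper:.1e}")
```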
In this view of the genetic load, there is an apparent paradox: the deleterious mutation rate is too high for populations to persist, and yet here we are. As Kondrashov asks, “Why have we not died 100 times over?” Creationists interpret this paradox to mean “the mutation rate is too high, thus we couldn’t have evolved.” Either way, the paradox requires a resolution.
Why We Haven’t Died 100 Times Over
A Mutation-free Individual Doesn’t Exist
As the derivations above make clear, the load is defined relative to an individual with no mutations at all. The genetic load imagines taking the average Joe, one of us with all our deleterious mutations, and plopping him into a population of unloaded individuals. In such a context, he might indeed do quite poorly. Agrawal & Whitlock (2012) write:
In principle, a load of any magnitude is compatible with species persistence because heavily loaded individuals can have high absolute fitness when competing against one another, even though each would have negligible fitness if forced to compete in a population of unloaded individuals.
Galeota-Sprung et al. (2020) argue that in large multicellular eukaryotes, such an individual is exceedingly unlikely to exist. Sewall Wright (1977) argued the same thing, writing:
…if many loci are involved, the genotype that combines the [optimal] genotypes at all loci is in general so rare theoretically that neither it nor anything approaching it exists in a finite population.
Dobzhansky (1957) wrote concerning the genetic load:
What would a Drosophila and a man be like if they did not carry such recessive mutations? Perhaps they would be a superfly and a superman, but the fact is that such prodigies have never existed on earth. The species Drosophila pseudoobscura and Homo sapiens have been molded in the process of evolution as Mendelian populations which carry mutational loads.
Now, in the side-comments, Paul stated that:
Hopefully, the derivations above show why this is false. Both the mutation and stochastic loads assume W_{max}=1; that is, the reference individual is entirely mutation-free.
Galeota-Sprung et al. (2020) calculate the fitness of the fittest individual that might actually exist in a finite population. They find that it is e^{5\sqrt{n\mu s}}, where n is the number of sites capable of suffering deleterious mutations. Setting \bar{W}=1 and assuming the entire genome is functional and capable of suffering deleterious mutations with \bar{s}=0.0001, the load is L=1.22-1=0.22. This is much lower than a load of nearly 90%. In the context of the mutation and stochastic load paradoxes, far fewer genetic deaths occur because the difference in fitness between the fittest and the average individual is much smaller. Individuals in real populations aren’t that different.
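The Galeota-Sprung et al. (2020) expression is easy to wrap in a small function. The derivation sketched in the docstring (a roughly Poisson number of mutations per individual with mean n\mu/s, multiplicative fitness, and a fittest individual about five standard deviations from the mean) is a reconstruction of where the formula comes from, not a quote from the paper, and the example inputs are hypothetical. Note that the result depends only on the product n\mu s.

```python
import math

def best_relative_fitness(n_mu: float, s: float, z: float = 5.0) -> float:
    """Fitness of the fittest individual likely to exist, relative to mean fitness 1.

    Reconstruction of the reasoning (not quoted from Galeota-Sprung et al.):
    with multiplicative effects, the number of deleterious mutations per
    individual is roughly Poisson with mean n*mu/s, so its standard deviation
    is sqrt(n*mu/s). The fittest individual in a large population carries
    about z standard deviations fewer mutations than average, which raises
    log fitness by s * z * sqrt(n*mu/s) = z * sqrt(n*mu*s).
    """
    return math.exp(z * math.sqrt(n_mu * s))

n_mu, s = 10.0, 1e-4  # HYPOTHETICAL: n*mu new deleterious mutations per genome, mean effect s
w_best = best_relative_fitness(n_mu, s)
print(f"W_best = {w_best:.3f}; load measured against it = {w_best - 1.0:.3f}")
```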
Relative vs. Absolute Fitness
In both the mutation and stochastic loads, fitness is measured in relative terms. However, population persistence and growth rates depend on absolute fitness: the expected number of offspring an individual with a given genotype will have. While the genetic load may reduce a loaded individual’s ability to compete against an unloaded one, how that translates into demography is not obvious.
Agrawal & Whitlock (2012) derive a few simple expressions to connect the genetic load to ecology. They define L as the fraction of individuals that suffer selective death, \beta as the share of resources an individual destined to die due to the genetic load consumes before dying, b as the birth rate, d as the death rate, and I as the rate at which resources are converted into reproduction. The carrying capacity of the population is then:
Notice that even if L is very large, the genetic load has little impact on population size as long as \beta is small. For example, if the first term is 10,000, the load is 0.89, and juveniles consume only 1% of resources before they die, then the equilibrium population size is 9,251, about 92.5% of what it would be with no load at all. In other words, the earlier in life (gametes, zygotes, young juveniles) selection acts, the smaller the ecological impact of the load. Importantly, the load is still there; it just has a small demographic impact. Highly loaded populations can persist indefinitely.
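The full Agrawal & Whitlock expression is not reproduced here, but scaling the no-load size by (1-L)/(1-L+\beta L) reproduces the 9,251 figure after rounding. The sketch below assumes that form (an inference, not a quote from the paper) to show how \beta controls the demographic cost of the same load.

```python
def loaded_carrying_capacity(k_unloaded: float, load: float, beta: float) -> float:
    """Equilibrium population size under a genetic load (sketch).

    ASSUMPTION: the factor (1 - L) / (1 - L + beta * L) is inferred because it
    reproduces the 9,251 example in the text; it is not quoted from
    Agrawal & Whitlock (2012). k_unloaded plays the role of "the first term".
    """
    survivors = 1.0 - load     # fraction of each cohort not lost to selective deaths
    wasted = beta * load       # resources consumed by individuals who die anyway
    return k_unloaded * (survivors / (survivors + wasted))

print(round(loaded_carrying_capacity(10_000, 0.89, 0.01)))  # early selection, beta = 1%: prints 9251
print(round(loaded_carrying_capacity(10_000, 0.89, 1.0)))   # doomed individuals use a full share: prints 1100
```

With the same load of 0.89, moving selective deaths from late in life (\beta = 1) to very early (\beta = 0.01) raises the equilibrium size more than eightfold.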
Stabilizing Selection
Another key assumption you might’ve noticed is that both the mutation and stochastic loads assume selection acts on each mutation independently. This presumes that each mutation has a fitness effect independent of the body that carries it and of the genomic context in which it finds itself.
In reality, most traits are polygenic (e.g., Boyle et al., 2017; Mathieson, 2021). When traits are determined by many genes, mutations don’t have independent fitness effects. Instead, mutations shift a trait value in some direction, and the fitness effect of that shift is determined by the suite of other alleles an individual has. I described this in the debate starting at 38:22.
Stabilizing selection acts to maintain a trait around an optimum, with mutation and drift pulling it away. Charlesworth (2013) showed that the load for a polygenic trait is then a function of the genetic variance: L \approx Sna^2\pi, where S is the intensity of stabilizing selection, n is the number of sites, a is the effect size of a mutation, and \pi is the nucleotide diversity at those sites (so that na^2\pi tracks the genetic variance). Using data from Ward & Kellis (2012) and Kong et al. (2012), he estimated for humans that Sa^2=5\times10^{-7} given N_e=20,000; with \pi = 0.001, the genetic load is a measly 0.05. That is, 5% of individuals suffer genetic death due to the load in humans. Again, this load is a far cry from 89%, and it easily resolves Kondrashov’s paradox.
Nick Barton (2022) argued the same thing with respect to growing evidence that complex traits have a highly polygenic (what is called omnigenic) architecture. He writes:
Lynch (21) has applied this concept to argue that molecular adaptations that are under weak selection cannot be established or maintained in (relatively) smaller populations, imposing a “drift barrier” to adaptation. Along the same lines, Kondrashov (55) has argued that deleterious mutations with Nes ∼ 1 will accumulate, steadily degrading the population. Both ideas seem problematic if we view adaptation as due to optimization of polygenic traits: Organisms can be well adapted even if drift dominates selection on individual alleles, and, under a model of stabilizing selection on very many traits, any change that degrades fitness can be compensated.
The above is the result of the infinitesimal model of quantitative genetics, which I describe here. My opening in the debate was built around this specific resolution of Kondrashov’s paradox, though I’m not sure Paul really understood it.
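Charlesworth’s approximation L \approx Sna^2\pi from the stabilizing selection discussion is simple enough to compute directly. The text does not state n; n = 10^8 is the value that reproduces the quoted load of 0.05 from Sa^2 = 5\times10^{-7} and \pi = 0.001, so it is used here as an assumption. Because the load is linear in each term, doubling n doubles L.

```python
def stabilizing_load(S_a2: float, n: float, pi: float) -> float:
    """Approximate load under stabilizing selection, L ~ S * n * a^2 * pi.

    S_a2: product of selection intensity S and squared mutational effect a^2.
    n:    number of sites affecting the trait (ASSUMED below; not given in the text).
    pi:   nucleotide diversity at those sites.
    """
    return S_a2 * n * pi

S_a2, pi = 5e-7, 0.001   # values quoted in the text
n_assumed = 1e8          # ASSUMPTION: the n that reproduces the quoted load of 0.05

print(f"L = {stabilizing_load(S_a2, n_assumed, pi):.3f}")
print(f"L with twice as many sites = {stabilizing_load(S_a2, 2 * n_assumed, pi):.3f}")
```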
Soft Selection (Density-dependence)
The final resolution I’ll discuss here (though by no means the last) is the specific mode of selection assumed under paradox models. All of the above discussion assumed that selection is hard, acting on absolute fitness independent of other genotypes. For example, a population of rabbits is hit by extreme cold during a polar vortex, and only those with thick enough fur survive.
However, a great deal of selection in nature is not like this at all. Bruce Wallace (1975) coined the term soft selection to describe situations in which absolute fitness depends on the availability of some resource, and thus on the density of superior competitors for that resource. In a species in which males compete for access to females, fitness variance, and hence selection, falls on male competitive ability. Importantly, all females will mate with someone even if the ideal male doesn’t exist.
Haldane (1956) argued that whenever a population is well-adapted to its environment and at or near carrying capacity, selection is density-dependent (“soft”), operating on competition for space, resources, mates, etc. Near range edges, when populations have colonized a new area, or during dramatic environmental shifts, selection becomes density-independent (“hard”), as individuals struggle to gain a foothold.
As with stabilizing selection, under soft selection the genetic load is determined by the variance in fitness between the best and worst competitors. Charlesworth (2013) argued this could be measured using the standard deviation of the natural logarithm of fitness. Assuming \pi = 0.001 and N_e=20,000, as above, with 5\times 10^8 sites under purifying selection, the load is only 0.0088 in humans. Using the load calculation from the Stabilizing Selection section above instead, the coefficient of variation (another proxy for fitness variance) is 0.074. While higher than 5%, it’s still an easily tolerable load.
Charlesworth (2013) writes:
In general, if we treat fitness as a function of the probability of success in obtaining access to a limiting resource, the mean fitness of the population relative to the fitness of a hypothetical optimal genotype that has a very low chance of being present in the population is essentially irrelevant.
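The 0.0088 figure for soft selection can be reproduced under one simple reading of the calculation: each of the 5\times10^8 sites contributes variance s^2\pi to log fitness, with s set at the nearly neutral boundary 1/(4N_e). That reading is an assumption (it matches the quoted number but is not spelled out in the text); the sketch below shows the arithmetic.

```python
import math

def sd_log_fitness(n_sites: float, pi: float, Ne: float) -> float:
    """Standard deviation of log fitness among individuals (sketch).

    ASSUMPTION: each of n_sites contributes variance s^2 * pi to log fitness,
    with s fixed at the nearly neutral boundary s = 1 / (4 * Ne). This matches
    the 0.0088 figure quoted in the text but is a reconstruction, not a quote.
    """
    s = 1.0 / (4.0 * Ne)
    return s * math.sqrt(n_sites * pi)

print(f"{sd_log_fitness(5e8, 0.001, 20_000):.4f}")  # prints 0.0088
```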
Summary
In conclusion, I’ve discussed the derivation and implications of the genetic load as it pertains to GE. If it’s true that the stochastic load is central to GE, as Paul claims, it does not appear, in theory, to pose any serious problem for evolution. Note that I have refrained from discussing any empirical implications of the stochastic load (e.g., natural populations maintain high fitness despite relatively small population sizes). To summarize, I’ve made the following claims:
- Both the mutation and stochastic loads measure the difference in fitness between an idealized, mutation-free population and a genetically loaded one.
- A mutation-free individual essentially never exists in eukaryotes, so the load individuals actually experience is much smaller, because they compete against similarly loaded individuals.
- Genetic load is a population genetic measure, not a demographic one. To link the load to demography requires knowing when selection acts. If it acts early, its impacts are much diminished.
- Most traits are polygenic, so selection acts not on individual alleles but on the sum of their effects. In such a case, the genetic load is a function of the genetic variance, which is very small.
- For populations at or near carrying-capacity, most fitness variance is competitive and thus determined by density-dependent processes. The load is thus a measure of fitness variance, which is tiny in natural populations.