Brian Miller: Co-option and Irreducible Complexity


(S. Joshua Swamidass) #1

Continuing the discussion from Bilbo Defends Behe's Irreducible Complexity:

A more common term we use is “exaptation”.

@bjmiller, are you concerned that the IC1 definition used by Behe does not even mention exaptation? It seems that failing to mention exaptation in his argument that IC1 is unevolvable is a fairly massive oversight.

(T J Runyon) #2

Do any of the biologists on here want to try to address Dr. Miller’s post? @Mercer @Art @John_Harshman @T_aquaticus

(T J Runyon) #3

My favorite exaptation paper:

(John Harshman) #4

It’s so hard to get through ENV articles. But the first bit seems to focus on a “Harvard study”, which is otherwise unnamed. Why no reference? And based solely on the description in that article, the study seems to suffer from the Texas Sharpshooter problem, assuming that the target that was hit was the only target rather than being any point on the side of a barn. I lost interest at that point and didn’t read far enough to discover whether he moved beyond bacterial flagella to make the general point implied by the title.

(John Mercer) #5

@bjmiller , you wrote:

All evidence points to the conclusion that the formation of the flagellum requires vast quantities of new genetic information…

When you write “all evidence points to” like that, it raises two questions:

  1. How many papers did you read before coming to your conclusion?
  2. If you are familiar with “all evidence,” why does your article repeatedly refer to only a single flagellum, as though no others exist in biology?

(T J Runyon) #6

Nick Matzke’s brief reply to Miller’s post

(Timothy Horton) #7

I tried to wade through it too but my hip boots weren’t tall enough. :slightly_smiling_face: Seems like just one more case of the “it’s too improbable!!” argument where the probability is supposedly calculated for evolution to hit one very specific genetic combination. But evolution was never required to target only that specific genetic variation. It only had to find ones that work. It also didn’t have to search some ginormous sequence space. It only had to search in the immediate vicinity of an already working earlier precursor.

ID-Creationists love their silly “argument by tiny teeny probabilities” but then still gripe when science just looks at them and laughs.

(Brian Miller) #8

The Harvard study is actually linked. The color of the link might not have been easy to see due to your browser settings:

Here is a link to a version of the article where the links are easily accessible:

(John Harshman) #9

Thanks. Just from the abstract it appears that ENV is guilty of cherry-picking and quote-mining.

(John Mercer) #10


Are you aware that referring to studies by institutions instead of authors is a pretty blatant tipoff for pseudoscience? This seems especially unwarranted since the corresponding author is not even affiliated with Harvard!

I’m still puzzled by your claim of “All evidence points to the conclusion…” coupled with your use of the singular “the flagellum.” How can one be familiar with all the evidence, but simultaneously not be aware that there exist multiple flagella?

I must say that I’ve never made any reference to “all evidence” in my scientific career, and would never do so unless there were fewer than 3 papers in the entire field and I had authored 2 of them. :grin:

(Arthur Hunt) #11

From the ENV article, about “the Harvard study”:

Some of the most relevant research related to evolutionary timescales was conducted through Harvard’s Program for Evolutionary Dynamics and IST Austria. They published a key article which lays out two crucial findings:

  1. The expected time required for a random search to find one member of a set of target sequences (e.g., nucleotide sequences corresponding to a functional gene) of length L increases exponentially with L.
  2. The expected time required to find a target from a starting sequence that is only a “few steps away from the target set” is the same as from a starting sequence that is randomly chosen.

The second conclusion can be understood from the fact that nearly all random changes to an initial trial sequence close to a target would move early trials away from the target. The search would then need to explore the entire sequence space just as with an initial random sequence. In the context of the cooption process, the time required for a copy of a preexisting protein with some sequence similarity to a flagellar protein to evolve into the latter is just as long as for the flagellar protein to evolve from a random sequence. Therefore, calculating the minimum expected waiting time for the arrival of a new protein through cooption equates with the time for an initial random sequence to find a protein target as modeled in the Harvard study.
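A toy model of the claim in the quoted paragraph can be written as a minimal Python sketch. Its assumptions are not taken from the paper: a four-letter alphabet, uniformly random point mutations, and no selection at all. Under those assumptions, a sequence that starts k mismatches from a target drifts toward the distance expected for a random sequence, about 3L/4, because at a matching site every substitution creates a mismatch, while at a mismatched site only one substitution in three restores the match. The function name and parameters are illustrative, and the model deliberately leaves out selection and the regeneration process described in the paper.

```python
import random


def drift_from_near_target(length: int, k: int, steps: int, seed: int = 0) -> int:
    """Toy neutral model: start k mismatches from a random target, apply `steps`
    unselected point mutations, and return the final Hamming distance."""
    rng = random.Random(seed)
    bases = "ACGT"
    target = [rng.choice(bases) for _ in range(length)]
    seq = list(target)
    # Place exactly k mismatches at distinct, randomly chosen sites.
    for i in rng.sample(range(length), k):
        seq[i] = rng.choice([b for b in bases if b != target[i]])
    # Each mutation picks a random site and swaps in one of the three other bases.
    for _ in range(steps):
        i = rng.randrange(length)
        seq[i] = rng.choice([b for b in bases if b != seq[i]])
    # Hamming distance: number of sites where the sequence differs from the target.
    return sum(s != t for s, t in zip(seq, target))


print(drift_from_near_target(1000, 5, 20000))
```

With L = 1000, a start five mismatches away typically ends within a few dozen of 750 after 20,000 unselected mutations.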

From the article itself:

We show that a variety of evolutionary processes take exponential time in sequence length. We propose a specific process, which we call ‘regeneration processes’, and show that it allows evolution to work on polynomial time scales. In this view, evolution can solve a problem efficiently if it has solved a similar problem already.
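A simpler calculation than the paper's shows why sequence length matters so much. The sketch below counts the fraction of all 4^L nucleotide sequences that lie within Hamming distance cL of one fixed target, which is the success probability of a single uniformly random draw. This independent-draw model is an assumption made here for illustration; Chatterjee et al. analyze evolutionary dynamics rather than independent draws, so its numbers are not expected to match their published figures. It only shows that the logarithm of the fraction falls roughly linearly with L, i.e. the fraction itself shrinks exponentially.

```python
import math


def log10_target_fraction(length: int, c: float) -> float:
    """log10 of the fraction of the 4**length nucleotide sequences that lie within
    Hamming distance floor(c * length) of one fixed target sequence."""
    # Small epsilon guards against floating-point error in c * length.
    radius = math.floor(c * length + 1e-9)
    # Exact integer count: choose i mismatched sites, each with 3 alternative bases.
    count = sum(math.comb(length, i) * 3**i for i in range(radius + 1))
    return math.log10(count) - length * math.log10(4)


for length in (250, 500, 1000):
    print(length, round(log10_target_fraction(length, 0.30), 1))
```

For L = 1000 and c = 0.30 this gives about -195; lowering c to 0.20 makes the fraction smaller still.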

In other words, “the Harvard study” purports to identify a conundrum, and then describes a mechanism that solves the problem.

Brian, you don’t mention the “regeneration process”. Nor do you discuss the obvious problems that your argument has vis-a-vis the extreme length dependence discussed in “the Harvard study”, a dependence that we have seen in this forum probably does not reflect biochemical reality. Maybe you could discuss these things here.

(S. Joshua Swamidass) #12

@bjmiller, this was my reading of the paper too. I’ve been puzzled by how this study is explained at ENV every time it comes up. Can you explain to us why you disagree with the conclusions of the paper?

(Mikkel R.) #13

I actually wrote to one of the authors of that paper back when I first read it, to ask what sorts of processes were assumed to operate. They only consider a model of substitution (with duplication, iirc) and don’t consider the effects of recombination or shuffling at all, which could radically increase “search” speeds. Even so, the paper still concludes there isn’t an actual problem. That paper is no comfort to a creationist.

(Brian Miller) #14

An upper bound can be estimated for the number of required distinct protein targets for a random search to find any of them. Research on model proteins found that proteins can be very tolerant to a few mutations. However, as mutations accumulate, the protein moves past a stability threshold where additional mutations have increasingly deleterious effects:

[My comments will focus on nonsynonymous mutations to remain consistent with the assumptions in my article.]
After around 10 mutations, over half of subsequent mutations completely disable a protein. And, after less than 20% of the sequence is altered, the protein is permanently disabled based on the exponential decline in fitness and on epistatic effects. The authors suggest that this pattern is common to many proteins which require high stability. As a consequence, the target in the Chatterjee et al. study for c=0.30 is considerably larger than for actual proteins (c < 0.20).

The Chatterjee study calculated that the number of trials required to find a target for c=0.30 and L=1000 is 10 to the power of 170 (1E170). Since only 1E38 trials are possible, a random search would be unlikely to find any target unless the number of targets were at least 1E170 / 1E38 = 1E132. In other words, for cooption to generate a flagellar protein, the number of distinct proteins that could perform the needed function would have to be one trillion, trillion, trillion, trillion, trillion, trillion, trillion, trillion, trillion, trillion, trillion. No one has ever claimed proteins were so abundant in sequence space.
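The arithmetic in this paragraph can be checked in log10 space. The sketch below only reproduces the post's own figures (1E170 required trials and 1E38 available trials); it does not test the assumptions behind them, such as treating each target as a separate, equally sized region of sequence space.

```python
# Figures taken from the post; everything is kept as base-10 exponents.
required_trials_log10 = 170   # trials to find one target (c = 0.30, L = 1000)
available_trials_log10 = 38   # total trials assumed to be available

# Dividing powers of ten subtracts exponents: 1E170 / 1E38 = 1E132.
targets_needed_log10 = required_trials_log10 - available_trials_log10

# The same result with exact integer arithmetic.
assert 10**required_trials_log10 // 10**available_trials_log10 == 10**targets_needed_log10

# One trillion is 1E12, so 1E132 is eleven factors of a trillion multiplied together.
trillions, remainder = divmod(targets_needed_log10, 12)
print(targets_needed_log10, trillions, remainder)  # 132 11 0
```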

The last issue is whether flagellar proteins demonstrate similar reductions in performance with accumulating mutations. Several mutation studies demonstrate that a large number of individual mutations either degrade or eliminate flagellar functions:

  • Brown et al., “Mutational analysis of the flagellar protein FliG: sites of interaction with FliM and implications for organization of the switch complex,” Journal of Bacteriology, vol. 189 (2), pp. 305–312.
  • Williams et al., “Mutations in fliK and flhB Affecting Flagellar Hook and Filament Assembly in Salmonella typhimurium,” Journal of Bacteriology, vol. 178 (10), pp. 2960–2970.

Therefore, the cited results likely apply sufficiently well to conclude that cooption faces extreme mathematical challenges.

Actually, the challenges are much greater if one considers that multiple proteins are needed to simply add one additional piece to a flagellum such as a filament.

(Brian Miller) #15

Very good question.

The Chatterjee et al. study described the regeneration process as follows:

What are then adaptive problems that can be solved by evolution in polynomial time? We propose a “regeneration process”. The basic idea is that evolution can solve a new problem efficiently, if it has solved a similar problem already. Suppose gene duplication or genome rearrangement can give rise to starting sequences that are at most k point mutations away from the target set, where k is a number that is independent of L. It is important that starting sequences can be regenerated again and again.

There are two key aspects to the “regeneration process”: (a) the starting sequence is only a small number of steps away from the target; and (b) the starting sequence can be generated repeatedly.

The challenge is that proposed cooption scenarios for irreducibly complex molecular machines rarely, if ever, meet these criteria. A straightforward calculation demonstrates that k must be under 10 for the regeneration process to succeed. In contrast, comparing flagellar proteins to their proposed homologous proteins reveals that k would have been considerably larger than this limit:

Nor do you discuss the obvious problems that your argument has vis-a-vis the extreme length dependence

Many of the flagellar proteins exceed L=1000, which the authors equate to a protein of average length. Remember that L is the number of nucleotides, not the number of amino acids. One can fit their results to an exponential curve and then work backwards to determine that the maximum length of a protein which could be found is less than the length of a large portion of the flagellar proteins. In addition, since multiple proteins are needed to add components (see my article), the total sequence length exceeds L=1000 considerably.
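The “fit and work backwards” step can be sketched as follows, assuming that log10 of the expected number of trials is roughly linear in L, which is what exponential growth in L means. The input points in the usage lines are hypothetical placeholders, not values from Chatterjee et al.; real inputs would have to be read from the paper's results. The function names are illustrative.

```python
def fit_log_linear(points):
    """Ordinary least-squares fit of log10(T) = a + b * L to (L, log10_T) pairs."""
    n = len(points)
    mean_l = sum(l for l, _ in points) / n
    mean_t = sum(t for _, t in points) / n
    sxx = sum((l - mean_l) ** 2 for l, _ in points)
    sxy = sum((l - mean_l) * (t - mean_t) for l, t in points)
    b = sxy / sxx            # slope: growth of log10(trials) per unit of L
    a = mean_t - b * mean_l  # intercept
    return a, b


def max_length(a, b, budget_log10):
    """Largest L whose fitted log10(trials) does not exceed the trial budget."""
    return (budget_log10 - a) / b


# Hypothetical (L, log10 trials) pairs, for illustration only.
points = [(100, 12.0), (200, 22.0), (300, 32.0)]
a, b = fit_log_linear(points)
print(max_length(a, b, budget_log10=38))  # largest L within a 1E38-trial budget
```

Because L counts nucleotides, a maximum length obtained this way corresponds to roughly one third as many amino acids.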

(Brian Miller) #16

If the process has to search a vast, sparsely populated space, no undirected search method would help without being given knowledge of the target. Imagine swapping words and sentences from two different texts and adding additional random changes in an attempt to generate a completely new readable paragraph.

Even within the target region, functional proteins are extremely rare and interspersed within nonfunctional sequences. As the studies I mentioned indicate, at the fringes of the fitness landscape, more than half of the mutations disable the protein, so less than half of the amino acids on average at each position correspond to a functional sequence. Barely functional sequences vastly outnumber optimized ones, so the chance of finding a functional protein is equivalent to finding a barely functional one.
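The combinatorial part of this argument, that barely functional sequences vastly outnumber optimized ones, can be illustrated with a toy model. The assumptions are hypothetical simplifications: a 20-letter amino acid alphabet, a single optimum, and the rule that every sequence within `radius` substitutions of it is functional while none beyond it is. Counting sequences shell by shell shows that the outermost shell holds nearly all of them, because each extra substitution multiplies the count by (n − d + 1) · 19 / d.

```python
import math


def shell_size(n: int, d: int, alphabet: int = 20) -> int:
    """Number of length-n sequences that differ from a fixed optimum at exactly d sites."""
    return math.comb(n, d) * (alphabet - 1) ** d


def outer_shell_share(n: int, radius: int, alphabet: int = 20) -> float:
    """Share of all sequences within `radius` of the optimum that sit exactly at `radius`."""
    total = sum(shell_size(n, d, alphabet) for d in range(radius + 1))
    return shell_size(n, radius, alphabet) / total


print(outer_shell_share(100, 20))
```

For a 100-residue protein with a radius of 20 substitutions, the outermost shell accounts for more than 98% of the sequences counted as functional in this model.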

(John Harshman) #17

Sorry, but you’re still apparently estimating a single, specified target with some region around that target being selectable. You’re talking about a particular flagellar protein, for example. What you should be considering is all possible proteins that might be part of a structure that aided motility. The fact that we got a flagellum of a particular sort is purely contingent. So you’re still indulging in the Texas sharpshooter fallacy.

(Mikkel R.) #18

Increasing the number of targets is still a Texas sharpshooter fallacy. You’d have to increase the number of targets to all possible functional proteins under all possible physical environments. Evolution isn’t searching for any particular protein, nor a whole bunch of arbitrarily picked ones. In nature, anything that works is the criterion. And of course there’s the fact that the fitness effects of all mutations are in the end context-dependent. The environment is what determines whether any particular mutation is deleterious or beneficial.

After around 10 mutations, over half of subsequent mutations completely disable a protein. And, after less than 20% of the sequence is altered, the protein is permanently disabled based on the exponential decline in fitness and on epistatic effects.

Of course it is, but where does natural selection fit into that equation? Nobody here claims one can take a protein and then just willy-nilly accumulate mutations in it indefinitely. The question is what happens once you involve natural selection to weed out deleterious mutations and include compensatory epistasis from the rest of the genome, or a more complex environment that involves multiple selection pressures. The mutagenesis and selection in the studies you cite were explicitly designed to test the effect of mutations on only one function of the protein in isolation, and they disallowed compensatory epistasis from chromosomal mutations. Nor were changes in gene dosage, whether from altered regulation or from duplication, included as potentially compensatory effects.

(Brian Miller) #19

I forgot to mention that the Chatterjee et al. study considered the possibility of multiple targets scattered throughout sequence space, and the search times are still exponential in L. Therefore, multiple targets (multiple possible proteins) do not solve the problem.

The article specifically addresses recombination:

It is known that recombination may accelerate evolution on certain fitness landscapes, and recombination may also slow down evolution on other fitness landscapes. Recombination, however, reduces the discovery time only by at most a linear factor in sequence length. A linear or even polynomial factor improvement over an exponential function does not convert the exponential function into a polynomial one. Hence, recombination can make a significant difference only if the underlying evolutionary process without recombination already operates in polynomial time.
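The quoted argument can be written out in two lines. Here T(L) is the expected discovery time without recombination, assumed to be exponential with constants a and b, and L^m stands for a polynomial speedup; the symbols a, b and m are introduced for this sketch and are not notation from the paper.

```latex
\begin{aligned}
T(L) &= a\,b^{L}, \qquad a > 0,\ b > 1, \\
\frac{T(L)}{L^{m}} &= a \exp\!\left(L\ln b - m\ln L\right)
  = a \exp\!\left(L\ln b\left[1 - \frac{m\ln L}{L\ln b}\right]\right).
\end{aligned}
```

Because m ln L / (L ln b) tends to 0 as L grows, the bracket tends to 1, so dividing by any fixed power of L leaves the time exponential in L. A linear speedup is the case m = 1.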

The authors conclude that no process can overcome the exponential growth in time with L except for the regeneration process.

The challenge is that the number of targets would need to vastly exceed the most optimistic estimates of the total number of all proteins before one could be found.

(John Harshman) #20

That actually seems like a reasonable situation. The number of possible functional proteins ought to vastly exceed the number that actually exist.