How Does Tokuriki 2009 Affect Conclusions from Axe 2004?

Continuing the discussion from Gauger and Mercer: Bifunctional Proteins and Protein Sequence Space:

Dr @Agauger -

I apologize that I could not raise this question directly in the “Bifunctional Proteins” thread due to my “noob” status. Hopefully I will gain enough site cred soon. Help me by liking my post, if you think it’s worth it! :grin:

You cited Tokuriki 2009 in the other thread to provide data on the question of a protein’s tolerance for mutations. Tokuriki’s key finding is that the loss of functionality increases exponentially with the number of mutations.

My recollection of Axe 2004 (please correct me if I am misremembering) is that Axe introduced several mutations that weakened the enzyme’s functionality before beginning the measurement of effects of further mutations on proteins.

Since the effects of mutations are exponentially worse for Axe’s crippled protein than for “normal” proteins, it would seem inappropriate to project Axe’s findings onto the state space for normal proteins. Does that make sense?

Of course, Tokuriki’s paper came 5 years after Axe’s, so I would not want to criticize Axe’s research on that basis. However, it would seem highly pertinent to scientific discussions from 2009 onward.

I would appreciate your thoughts on the matter whenever you have time.

Chris Falter


@Chris_Falter, you were just added to the @conversant group, and should be able to post on the original thread. Can you try and do so please?



You ask a good question, Chris. Take a look at the curve in the graph I posted above. It has a roughly sigmoidal shape. The first mutations have little effect on protein activity; this buffering effect is pronounced for wild-type proteins. (I am excluding active-site mutations.) Then there is an exponential decline in activity, as you noted. This is followed by a very low level of activity over an extended range of increasing mutation. Doug did not want to measure where buffering was taking place, nor did he want to measure during the exponential decline. He wanted to measure at the threshold of enzyme activity, because he was trying to quantify the number of changes needed to go from no function to function. This is the transition boundary that is of interest to evolutionary biologists. Therefore he used a weakened enzyme to represent that initial emerging enzyme, and he determined what proportion of mutations would push that enzyme below threshold function. That was the proportion he was seeking: what proportion of folds were capable of carrying out a minimal enzyme function.
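The sigmoidal shape described here is often explained with a stability-threshold picture: a protein with surplus stability absorbs the first destabilizing mutations with little loss of activity, and only once that margin is used up does activity fall steeply. Below is a toy sketch of that idea, assuming a two-state folding model, a fixed hypothetical destabilization of 1 kcal/mol per mutation, and activity proportional to the fraction folded. None of these parameters come from Tokuriki 2009 or Axe 2004; they are illustrative assumptions only.

```python
import math

# Hypothetical parameters for illustration only (not taken from either paper).
RT = 0.593               # kcal/mol, approximately RT at 25 degrees C
DDG_PER_MUTATION = 1.0   # assumed average destabilization per mutation, kcal/mol


def fraction_folded(dg_unfold: float) -> float:
    """Two-state fraction folded; dg_unfold > 0 means the folded state is favored."""
    return 1.0 / (1.0 + math.exp(-dg_unfold / RT))


def activity_curve(initial_stability: float, max_mutations: int) -> list[float]:
    """Relative activity (taken as fraction folded) after k = 0..max_mutations mutations."""
    return [
        fraction_folded(initial_stability - k * DDG_PER_MUTATION)
        for k in range(max_mutations + 1)
    ]


wild_type = activity_curve(initial_stability=8.0, max_mutations=12)  # large stability margin
weakened = activity_curve(initial_stability=2.0, max_mutations=12)   # little margin left

for k, (wt, wk) in enumerate(zip(wild_type, weakened)):
    print(f"{k:2d} mutations: wild type {wt:.3f}   weakened {wk:.3f}")
```

Under these assumptions the wild-type curve stays nearly flat for several mutations, while the weakened starting point already sits on the steep part of the curve. That difference is the crux of the disagreement over whether results measured on a weakened enzyme carry over to wild-type proteins.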


@swamidass - My apologies for not moving the thread earlier; my work has been quite busy. Now that we’re here, perhaps we should stay?

Dr. Gauger,

Thanks for your response. I am wondering why you are suggesting that Axe’s question, a legitimate one for sure, would be the only one of interest to evolutionary biologists.

Evolutionary biologists have identified a variety of mechanisms that introduce new information (in the Shannon sense) into genes. Off the top of my head I would list copy-and-modify, DNA shuffling, frame shifts, and indels; I’m sure a real biologist could list a lot more.

One thing that is common to all these mechanisms is that they explain transitions from existing functionality to new functionality.

Since these mechanisms are all part of evolutionary theory and of interest to evolutionary biologists, why would you characterize the transition from no function to weak function as the one true question biologists care about? Or have I misunderstood you?

Axe’s study would certainly be of interest to origin-of-life researchers, in my opinion, because they are dealing with the question of how to get from no function to function.

My understanding of evolutionary biologists, on the other hand, is that they bracket off the OOL question as something to be answered later, and start with the assumption of a population of primitive working cells billions of years ago. They pose questions about how transitions from earlier function to later function occur, and find answers through the mechanisms I mentioned above, inter alia. Thus Axe’s research would not really address the existing body of evolutionary theory, and apologists like Stephen Meyer who think otherwise are mistaken, in my opinion.

Chris Falter


The original thread is closed, so it makes sense to continue it here. Honestly, in that other thread you wouldn’t be able to get a word in edgewise. Here, maybe you can get your questions answered.

I’m not @Agauger, but I suspect she would reply that Axe’s work is relevant to the origins of new genes from sequences that specify no functional proteins (ORFans and the like), which includes genes arising long after life began.

1 Like

Yeah I would say the same thing. The question is a relevant one, and in that sense Axe’s work is not without some merits.
The real problem for me is that the conclusions drawn from Axe’s work go far beyond what the experiment can really tell us, and they are presented to laymen in ways the data simply can’t support. You can’t generalize from an experiment done on one protein, looking at only one of its possibly multiple functions, to all other proteins (of a similar size) with all other functions. There could very well be large differences in the frequency of function for different proteins.

The way Axe’s work has been sold to the ID crowd has been misleading. Take a look at this claim made on EN&V:

This paper is interesting because it relates to the work of Douglas Axe that resulted in a paper in the Journal of Molecular Biology in 2004. Axe answered questions about this paper earlier this year, and also mentioned it in his recent book Undeniable (p. 54). In the paper, Axe estimated the prevalence of sequences that could fold into a functional shape by random combinations. It was already known that the functional space was a small fraction of sequence space, but Axe put a number on it based on his experience with random changes to an enzyme. He estimated that one in 10^74 sequences of 150 amino acids could fold and thereby perform some function — any function.

That’s a generalization about all functions across all of protein sequence space for sequences 150 amino acids in length. But how can an experiment looking at only one function for a particular fold be generalized to all de novo protein evolution? It is rather obvious that it can’t. Axe and Gauger should be spending some time trying to push back against misconceptions about their work.
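One way to see how fragile a single headline number is: if an overall frequency like the quoted 1 in 10^74 for 150-residue sequences is thought of as a per-position tolerance compounded over 150 positions, the exponent becomes very sensitive to that per-position figure. The sketch below assumes independent, identical sites, a simplification used here only for illustration and not necessarily the procedure Axe followed.

```python
import math

# Figures quoted in the EN&V passage: 1 in 10^74 for sequences of 150 amino acids.
LENGTH = 150
LOG10_OVERALL = -74.0
ALPHABET = 20  # standard amino acids


def per_site_fraction(log10_overall: float, length: int) -> float:
    """Per-position tolerated fraction implied if all sites are independent and identical."""
    return 10 ** (log10_overall / length)


def log10_overall_fraction(per_site: float, length: int) -> float:
    """log10 of the overall functional fraction from an identical per-site fraction."""
    return length * math.log10(per_site)


f = per_site_fraction(LOG10_OVERALL, LENGTH)
print(f"implied per-site fraction: {f:.3f} (~{f * ALPHABET:.1f} of {ALPHABET} residues)")

# Sensitivity: modest changes in the per-site fraction shift the exponent by tens of orders.
for per_site in (0.25, 0.32, 0.40, 0.50):
    print(f"per-site {per_site:.2f} -> 1 in 10^{-log10_overall_fraction(per_site, LENGTH):.0f}")
```

Under this simplification, moving the per-site tolerance from about a quarter to about a half of the 20 amino acids changes the overall estimate from roughly 1 in 10^90 to roughly 1 in 10^45, which is why extrapolating one protein's per-site tolerance to all proteins is so consequential.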


Sorry to be late responding again, @Chris_Falter. I did not say the one true question. But I did say the transition boundary that is of interest to evolutionary biologists. I did not say only evolutionary biologists.

Yes, there are a variety of mechanisms proposed to introduce new information, some involving the gradual accrual of neutral mutations, like copy-and-modify as you call it. Shuffling, frameshifts, and most indels (which often cause frameshifts) would be radical changes. But even “radical changes” have to go from no function to weak function, since random mutation is highly unlikely to achieve even moderate levels of function right off the bat. So the transition is of interest. To ask about the difficulty of shuffling achieving a new function, we would have to design experiments to do that. Frances Arnold’s group has done a lot of work on developing that method, but I am not up on what kinds of methods and library sizes they have to screen, or what levels of function they can detect at the beginning of their search.

My opinion on frameshifts is that getting a functional protein by frameshift requires an unusual protein with repetitive sequence and a not very sequence-specific function, if it is going to happen at all. The evidence I have seen for frameshifts generating functional protein is based on sequence comparisons. To say it happened by an evolutionary process is an assumption that requires evidence.

Everything I know about genetics says that frameshifts are bad news, unless the sequence has been prepared beforehand. I know, ID ooga booga, I think it was. Probably another thread.


I agree.

No, we wouldn’t. It’s much easier to look at other people’s results from the past 32 years.

Lots of people have shuffled sequences at random then selected catalytic antibodies.

There are a lot of older papers that state methods and library sizes, as well as the levels of function they can detect at the beginning of their search.

If Doug Axe’s 1 in 10^77 is anywhere near accurate, this shouldn’t work at all, because I think we can agree that it is not feasible to screen that many phage, correct?

Since you think that function is rare, how many plaques would you predict they’ve had to screen to get weak enzymatic function–the transition that interests you?


If Axe is correct, then about 10^30 suns worth of M13 phage mass should about do the trick.

That’s a heck of a lot of Petri dishes…
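A rough check of the scale, assuming about 16 MDa per M13 virion and just one virion per screened random sequence (both assumptions; a real plaque holds far more phage than that). With Axe’s 1 in 10^77, these inputs give roughly 10^27 solar masses. The result scales linearly with the number of virions per clone, so more realistic counts only push it higher, toward or past the 10^30 figure above.

```python
# Order-of-magnitude sketch; every physical constant below is an assumed round figure.
DALTON_KG = 1.66054e-27   # kilograms per dalton
M13_VIRION_DA = 1.6e7     # assumed mass of one M13 virion, roughly 16 MDa
SUN_MASS_KG = 1.989e30    # approximate mass of the Sun in kilograms


def phage_mass_kg(clones: float, virions_per_clone: float = 1.0) -> float:
    """Total phage mass if each screened clone is represented by virions_per_clone virions."""
    return clones * virions_per_clone * M13_VIRION_DA * DALTON_KG


def suns_of_phage(frequency: float, virions_per_clone: float = 1.0) -> float:
    """Solar masses of phage needed to screen about 1/frequency random clones."""
    return phage_mass_kg(1.0 / frequency, virions_per_clone) / SUN_MASS_KG


print(f"Axe's 1 in 10^77: {suns_of_phage(1e-77):.1e} suns")
print(f"1 in 10^10: {phage_mass_kg(1e10) * 1e9:.2f} micrograms")
```

For contrast, 10^10 clones at one virion each comes to well under a microgram of phage, roughly the scale a bench-top phage display screen works with.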


Would you please give me the reference?


THE reference?

As I noted, there are many references, so your use of the singular in your question seems strange.

What is your prediction?

That’s a great illustration. I must confess that I could not think of a good illustration for the magnitude of Axe’s error, as it is enormous.

You should calculate the error in Dembskis. :slightly_smiling_face:

This is from the Panda’s Thumb archives.

From your Panda’s Thumb article, you appear to believe that Axe’s results are in the range of other experimental results:

10^-10 -> 10^-63 (or thereabout): this is the range of estimates of the density of functional sequences in sequence space that can be found in the scientific literature. The caveats given in Section 2 notwithstanding, Axe’s work does not extend or narrow the range. To give the reader a sense of the higher end (10^-10) of this range, it helps to keep in mind that 1000 liters of a typical pond will likely contain some 10^12 bacterial cells of various sorts. If each cell gives rise to just one new protein-coding region or variant (by any of a number of processes) in the course of several thousands of generations, then the probability of occurrence of a function that occurs once in every 10^10 random sequences is going to be pretty nearly 1. In other words, 1 in 10^-10 is a pretty large number when it comes to “probabilities” in the biosphere.
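The arithmetic in the quoted passage, the chance that at least one of N ≈ 10^12 new sequences hits a function of frequency p, is 1 - (1 - p)^N and can be checked directly. The sketch assumes each new sequence is an independent trial, as the passage implicitly does; the 10^-77 line is added here only for comparison with Axe’s figure.

```python
import math


def prob_at_least_one(frequency: float, trials: float) -> float:
    """P(at least one hit) = 1 - (1 - frequency)**trials, computed stably."""
    # log1p/expm1 avoid rounding (1 - 1e-77) to exactly 1.0 in floating point.
    return -math.expm1(trials * math.log1p(-frequency))


CELLS = 1e12  # new coding regions from ~1000 L of pond water, per the quoted passage

print(prob_at_least_one(1e-10, CELLS))  # high end of the literature range
print(prob_at_least_one(1e-63, CELLS))  # low end of the literature range
print(prob_at_least_one(1e-77, CELLS))  # Axe's estimate, for comparison
```

At p = 10^-10 the probability rounds to 1, while at 10^-77 it is about 10^-65, which is why the two estimates lead to opposite conclusions about what a pond can find.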

No, Bill, they are well outside that range. Please read more carefully.

And we’re not talking about Axe’s results, we’re talking about the overinterpretation involved in the estimate, the utter failure to qualify it, and to put them in the context of massive amounts of existing data.


What am I missing here?

I didn’t write that, Bill. You appear to be missing a fundamental understanding of mathematical notation.

OK, and my question was to Art, who did write it, so I look forward to his answer.

In a tiny, tabletop M13 phage experiment, they were able to find functional proteins from random sequences. If Axe’s calculations were correct, this should be impossible. It would have required an experiment quintillions of millions of times larger than the sun to succeed.

This is not the only experiment of its kind, because there are many more published that do exactly the same. This is direct falsification of his number. Period.