Beta-Lactamase, Antibody Enzymes, and Sequence Space

Yes. Axe’s was horribly sloppy and dealt with an N of 1. He didn’t measure enzyme activity. He started with a ts mutant.

Most telling, he didn’t follow up his outlier result.

How could they, when he was starting with a ts mutant of a single protein?

This is a distinction without significance for three reasons:

  1. A stable fold is not a requirement for an enzyme in the first place. Intrinsically disordered proteins (IDPs) are likely an important initial state for de novo enzymes, which are subsequently tightened up into a stable fold by positive selection. For an IDP to work as an enzyme, it needs one groove in the protein to be in the right configuration some of the time. That isn’t too hard to do, and might even be easier for an IDP than a stable protein.

  2. It is fairly easy to get a stable fold. Right now, protein prediction algorithms are good enough to study this. Put some random sequences in. See how often stable folds arise. It is quite easy. (I think @T_aquaticus has done this before).

  3. An antibody is essentially a stable fold with a variable groove or pocket. Imagine a protein has one function, and is maintained by negative selection to be a stable fold. An “unused” groove of the protein can be varied by neutral evolution till it finds the function. Antibody evolution mimics this process. Add in the Real Time-Evolution mechanism you just wrote about (duplication-then-divergence), and you have a new enzyme, with a totally new function.
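
To make point 2 concrete, here is a minimal sketch of the first step only: generating uniformly random amino-acid sequences that could be handed to whatever structure predictor you have access to. The helper names, the default length of 100, and the fixed seed are assumptions for illustration, and the prediction step is deliberately left out because it depends on the tool.

```python
import random

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_protein_sequences(n, length=100, seed=0):
    """Return n uniformly random amino-acid sequences (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length)) for _ in range(n)]


def to_fasta(sequences):
    """Format sequences as FASTA text, a common input format for predictors."""
    return "\n".join(f">random_{i}\n{seq}" for i, seq in enumerate(sequences, start=1))


if __name__ == "__main__":
    print(to_fasta(random_protein_sequences(3, length=60)))
```

Sampling every residue with equal probability is the simplest choice; weighting by natural amino-acid frequencies would be a reasonable variation, and it might change how often folds turn up.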

So for those three reasons, I do not think there is reason to doubt the relevance of these studies.

[EDIT: Removed discussion on CCC. It’s too late to get this straight.]



[My question to Mercer]
My question is, how large is the epitope universe? How many unique epitopes are there? More than 10^10?

Let me clarify. By epitope you mean the things the antibodies can recognize, and it’s by definition going to be a different set for every organism, right? I have 10^8, you have a different 10^8, etc. There are 10^8 antibodies and they each have an epitope.
So how many different protein shapes are there in the protein universe? More than 10^8. How many different proteins are there? A single amino acid change can change antibody recognition. Just think flu virus.
This observation gives me pause. That one protein should be represented so frequently in so large a universe argues to me that something has been overlooked.


@swamidass what is a CCC?


Apologies, my mistake. The number of different V regions, not epitopes, is 10^8.

Not even close. Those 10^8 V regions are sufficient to recognize far more than 10^8 epitopes.

It’s not. There is no 1-to-1 correspondence between V regions and epitopes.


This is Behe’s term! I really did read his work you know :wink:. It is a chloroquine complexity cluster (CCC), which Behe says is about a 1-in-10^20 event to find, and he estimates that every protein-protein interaction requires about one CCC.

[EDIT: Discussion on CCC needs development. It’s too late to get this straight.]

As @mercer just put it, Behe seems unaware that “proteins are sticky.” The more important challenge is that the antibody system demonstrates that protein-protein interactions are far easier to evolve.


@swamidass @Mercer
Of course proteins are sticky. We had to treat the glassware so they didn’t stick to the glass. But that kind of sticky is nonspecific. It doesn’t represent a substrate enzyme interaction or catalysis that’s not enzymatic. That’s sticky. That might be good enough for antibodies but not for enzymes. You can’t run a metabolism that way. Yeah.


The issue of proteins being sticky is important because that is how protein-protein interactions arise, not enzymes. That is a subtopic here, under the main topic. The specificity is baked into the antibody results too. It is not hard to evolve new protein interactions.

@swamidass You are not being reasonable. I know you are on a campaign to discredit Doug. I know you don’t approve of his having only done one trial. If it had been me, I would’ve done more trials also. But the fact remains that his is not the only study of enzyme function to show that it is very rare in sequence space. There are a number of studies that make that case, and none of them are on the order of one in 10^10. Doug’s study was about the rarity of enzyme function in sequence space. Yours is about how easy it is to get an antibody to mimic an enzyme. How does the Kcat/Km compare to a real enzyme? I am not defending Doug. I am rejecting what you were offering as an explanation. Have either of you studied enzymes? Have you looked at their structure? And watched how their conformation changes during a reaction? Have you looked at the particular amino acid interactions that are involved? And how one amino acid change can destabilize the whole thing? It’s not sticky outy stick on toys. I have told you why I think it’s wrong, why it doesn’t make sense. 10^-40 OK, 10^-10 no way. Not for a true enzyme. BTW the papers I have been waiting on are about the soluble abzyme story. I sent you the citations, Josh. And as for the subtext: the fact that abzymes are so common and so easy to get means to me that they don’t represent anything like what is necessary for a cell to work. And I still wonder why I was still alive if they’re so common.

Nope. Not at all. I have no reason or motivation to discredit Doug. I’ve just been inviting him into dialogue. The fact he does not show up, well that does seem to create some problems for his credibility. It need not be this way of course.

Regarding Doug’s study, however, it has several problems. As we have already discussed ad infinitum.

You are joking right? That is a large portion of my research. Of course I have done this.

You don’t have to agree with me, or go through it. I’ll step out of the conversation. Work it out with @mercer and @art if you like. The claims Doug is making just do not add up.


His is an extreme outlier. There’s simply no excuse for his, or your, failure to cite the studies that disagree with his extrapolation.

No, it wasn’t, because there was no systematic exploration of sequence space, nor were there any assays of enzyme function.

That would be the prevalence of function in random sequence space, much more direct than Doug’s sloppy study. There’s no mimicry there, btw.

You appear to be desperately avoiding the evidence to do so.

We’ve been through this already. I study myosins, remember? They are enzymes, remember?

Yes, very much. Have you?

More often, how it doesn’t. I’ve studied many naturally-occurring amino-acid changes that are only sometimes pathogenic.

As for stability, for the tropomyosins in the second paper, the changes made tropomyosin MORE stable, negatively affecting function. That alone contradicts most of what you’ve said about stability.

Gangadharan B, Sunitha MS, Mukherjee S, Chowdhury RR, Haque F, Sekar N, Sowdhamini R, Spudich JA, Mercer JA. Molecular mechanisms and structural features of cardiomyopathy-causing troponin T mutants in the tropomyosin overlap region. Proc Natl Acad Sci U S A. 2017 Oct 17;114(42):11115-11120. doi: 10.1073/pnas.1710354114. Epub 2017 Oct 2. PubMed PMID: 28973951; PubMed Central PMCID: PMC5651771.

Gupte TM, Haque F, Gangadharan B, Sunitha MS, Mukherjee S, Anandhan S, Rani DS, Mukundan N, Jambekar A, Thangaraj K, Sowdhamini R, Sommese RF, Nag S, Spudich JA, Mercer JA. Mechanistic heterogeneity in contractile properties of α-tropomyosin (TPM1) mutants associated with inherited cardiomyopathies. J Biol Chem. 2015 Mar 13;290(11):7003-15. doi: 10.1074/jbc.M114.596676. Epub 2014 Dec 29. PubMed PMID: 25548289; PubMed Central PMCID: PMC4358124.

Have you done anything as enzymatically and structurally rigorous as the work in these two papers?

Your simplistic “no way” denial does not tell us why. Your pretense that there is a 1-to-1 correspondence between antibodies and epitopes was absurd.

And that leads to an obvious hypothesis that you’ve shown zero interest in testing: if we remove the Ig constraint, will they get better or not? If you’re right, we can test a lot of them and NONE will.


There’s no bright white line separating specific from nonspecific.

Most of catalysis is binding. Again, there’s a testable hypothesis there.


5 posts were split to a new topic: How to Invite ID Into Conversation

We must also remember that an antibody is most often recognizing a very small piece of the larger protein. Peptides as short as 10 amino acids can serve as antigens as demonstrated by the specificity of monoclonal antibodies. In fact, I have had animals immunized with short peptides from a larger protein and successfully recovered specific antibody. Since such small features are antigenic there is also the potential for cross-reactivity with other proteins. This may be the driving factor in some autoimmune diseases, such as rheumatic fever, where antibodies produced against a pathogen end up binding to self molecules.

Indeed. We think Axe may be the one who has been overlooking the possibilities.


No. Unless these protein shapes are repeated over and over in the protein universe, there would be no way that 1 in 10^9 or fewer would bind B-lactamase. Did they look for cross-reactivity? Do these same antibodies bind other proteins?


It seems to me the whole basis for the ability of the adaptive immune system is to randomly mutate the hypervariable region of antibodies so as to constitute a mechanism for enhancing some initially low level of binding activity.
That implies the “low level” activity regions of many protein binding and catalytic functions are actually a lot more frequent than you suspect, and can be incrementally enhanced by mutation and selection from those low levels towards highly specific and effective catalysis and binding. Perhaps they’re not strong enough to provide immediate effective immunity, but frequent enough to provide the basis for further improvement through mutation and selection?

If these binding pockets needed to be so ultra-specific and rare as you apparently think, one wonders how the immune system could even work through a stochastic mutation and selection process. There has to be some low level of association between antibodies and their millions of potential novel antigens, to provide some basis for triggering that cascade of further adaptation.

What is the alternative here? If such selectable functions were really only found at a rate of 1 in 10^77, then even if the immune system actively carried around on the order of 10^20 substantially different antibodies, that would still be over fifty orders of magnitude less than required to have any realistic hope of attacking even one novel antigen.
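
The arithmetic in that paragraph can be checked in log10 space. This is only a sketch: the function name is made up, and the inputs are the figures quoted in this thread (a prevalence of 1 in 10^77 with a generous 10^20 antibodies, and for comparison 10^8 V regions with a prevalence of 1 in 10^10).

```python
def log10_expected_hits(log10_library_size, log10_prevalence):
    """log10 of the expected number of functional members: log10(N * p)."""
    return log10_library_size + log10_prevalence


# Figures quoted above: 10^20 antibodies, function at 1 in 10^77.
gap = log10_expected_hits(20, -77)
print(f"expected hits ~ 10^{gap}")  # 20 - 77 = -57, i.e. 57 orders of magnitude short

# For comparison (figures from elsewhere in the thread): 10^8 V regions, 1 in 10^10.
print(f"expected hits ~ 10^{log10_expected_hits(8, -10)}")  # 8 - 10 = -2
```

Working with exponents rather than the raw numbers keeps everything as small integers, so there is no floating-point underflow to worry about.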

The only sensible picture here is that many of these functions, at least at a low level from which they can be adaptively enhanced, are much more frequent than you think, and that many of them have some non-negligible degree of “overlap” in sequence space.


And yet they did. If your model doesn’t match up to reality, it is the model that is wrong, not reality.



I am willing to grant some of what you said. There has to be a fair amount of non-specificity to the naive antibodies so as to increase the number of things they can bind to and then adapt to. I’ll grant that. But—of the library they screened, 60% of the clones they obtained were from non-immunized animals. There had been no optimization…

So we have two absurd numbers. Absurdly high estimates for the presence of a beta-lactamase catalytic antibody out of a repertoire of 10^8, 1 in 10-20 are beta lactamase. Then you say, but the antibody repertoire cannot cover 10^77 epitopes. Yes. They don’t. They only cover some of them. The set is different with each person. But 10^77 is probably too great. So take a look at the other numbers. Doug discusses them in his paper, which I have excerpted here. Note: none are anywhere near 10^-77, but neither are they 10^-10. At the end of his paper he does a calculation i

From Doug’s paper.

His intro explains the methods used, their difficulties, and what the results have been. You will see a range of results intermediate between 10^-77 and 10^-10. At the end of his paper, which I didn’t copy, he looks for ways to make the numbers more or less comparable. Also note he gives a range. That also seems to get lost when people talk about the paper.

Every quantifiable function that can be performed by proteins has a definite mapping onto the conceptual space representing all protein sequences. What can be discovered about these functional maps? Although the immense size of sequence space greatly limits the utility of direct experimental exploration, the sparse sampling that is feasible ought to be of use in addressing the most basic question of the overall prevalence of function. Progress on this front will both enhance our understanding of how new functional proteins arise naturally and inform our approach to generating them artificially.

This is a difficult problem to approach experimentally, however, and no clear picture has yet emerged. A number of studies have suggested that functional sequences are not extraordinarily rare,[1–5] while others have suggested that they are.[6–9] One of two approaches is typically used in these studies. The first, which could be termed the forward approach, involves producing a large collection of sequences with no specified resemblance to known functional sequences and searching either for function or for properties generally associated with functional proteins. If the relevant sort of properties can be found among more or less random sequences, this provides a direct demonstration of their prevalence. The second approach works in reverse from an existing functional sequence. Here, the question is how much randomization a sequence known to have the relevant sort of function can withstand without losing that


Although both approaches have provided important insights, they may have drawbacks that contribute to the apparent discrepancies. The forward approach has not produced a sequence with properties that place it unequivocally among natural functional sequences. Whether the properties that have been found (e.g. proteolytic stability [10] or cooperative denaturation [1]) actually warrant such placement therefore remains an open question. On the other hand, because the reverse approach starts with a sequence that is not just functional but often nearly optimal, it may fail to take account of sequences having the relevant functional properties in a very rudimentary form. Also the difficulty of taking proper account of sequence context presents itself when natural proteins are studied by making one or a few substitutions at a time.[8] Substitutions found to be functionally tolerable in such experiments might be tolerable only because the vast majority of the protein remains untouched.[11]

In light of these difficulties, an important first step in the present study is to consider carefully what we mean by function in the first place. Different answers to this may well lead to different experimental approaches and different conclusions, each valid when properly understood. The focus here will be upon enzymatic function, by which we mean not mere catalytic activity but rather catalysis that is mechanistically enzyme-like, requiring an active site with definite geometry (at least during chemical conversion) by which particular side-chains make specific contributions to the overall catalytic process. The focus, then, will be on mode of catalysis rather than rate. The justification for this is that there is a clear connection between active-site formation and protein folding, in that active sites generally require the local positioning of multiple side-chains that are dispersed in the sequence. Something akin to tertiary structure, however crude, must therefore emerge in working form before natural selection can begin the process of refining a new fold. By assessing the difficulty of achieving the sort of structure needed to form a working active site, we therefore gain insight into a critical step in the emergence of new protein folds.

How might the other difficulties be avoided? A recent study of the requirements for chorismate mutase function in vivo demonstrates a promising approach.[9] Chorismate mutase gene libraries prepared in that work were constrained to preserve all active-site residues and the sequential arrangement of hydrophobic and hydrophilic side-chains present in a natural version of the enzyme. Within these constraints, though, specific residue assignments were essentially random, resulting in numerous disruptive changes throughout the encoded proteins. This is an example of the reverse approach, in that it uses a natural sequence as a starting point but, because the produced variants carry extensive disruption throughout the structure rather than just local disruption, they provide reliable information on the stringency of functional requirements. The prevalence of functional chorismate mutases among sequences carrying the specified hydropathic pattern was estimated to be just one in 10^24.[9]

In view of the rarity of sequences carrying that pattern (among all possible sequences) and the relative simplicity of the chorismate mutase fold (Figure 1a), this result suggests that sequences encoding working enzymes may generally be very rare. Further exploration of this possibility should address two points. First, it is important that enzyme folds of more typical complexity be examined. And second, since many different folds might be comparably suited to any given enzymatic function, it is important that we have some way to factor this in. In other words, if the prevalence of sequences performing a particular function enzymatically is our primary interest, then our analysis must not presume the necessity of any particular fold.

As discussed in Introduction, the method applied in the study of chorismate mutase by Taylor and coworkers [9] should provide a more accurate estimate than the earlier λ-repressor study. Their search for functional chorismate mutases was restricted to sequences matching the hydropathic pattern of a natural version of the enzyme. So, bearing in mind the difference between a single-sequence pattern and a multi-sequence signature, their estimated functional prevalence should be compared to the estimated prevalence among signature-compliant sequences in the present study. Scaling their figure gives 10^-40 for a 153 residue sequence (10^(-24 × 153/93) = 10^-40). This is significantly larger in logarithmic terms than the above estimate for the large domain (10^-64). However, in view of the difference in fold complexity (Figure 1) and the fact that pattern-based randomization is more restrictive than a signature-based randomization, there is no reason to think the two estimates are inconsistent. It seems, rather, that a number of studies using the reverse approach lead to a consistent picture in which sequences with function clearly akin to that of natural proteins are extremely rare, the

Look at other studies using similar methods.

  1. Davidson, A. R., Lumb, K. J. & Sauer, R. T. (1995). Cooperatively folded proteins in random sequence libraries. Nature Struct. Biol. 2, 856–863.

  2. Axe, D. D., Foster, N. W. & Fersht, A. R. (1996). Active barnase variants with completely random hydrophobic cores. Proc. Natl Acad. Sci. USA, 93, 5590–5594; see also pp. 7157–7166.

  3. Keefe, A. D. & Szostak, J. W. (2001). Functional proteins from a random-sequence library. Nature, 410,

  4. Yamauchi, A., Nakashima, T., Tokuriki, N., Hosokawa, M., Nogami, H., Arioka, S. et al. (2002). Evolvability of random polypeptides through functional selection within a small library. Protein Eng. 15,

  5. Hayashi, Y., Sakata, H., Makino, Y., Urabe, I. & Yomo, T. (2003). Can an arbitrary sequence evolve towards acquiring a biological function? J. Mol. Evol. 56,

  6. Yockey, H. P. (1977). On the information content of cytochrome c. J. Theoret. Biol. 67, 345–376.

  7. Reidhaar-Olson, J. F. & Sauer, R. T. (1990). Functionally acceptable substitutions in two α-helical regions of λ repressor. Proteins: Struct. Funct. Genet. 7, 306–316.

  8. Axe, D. D. (2000). Extreme functional sensitivity to conservative amino acid changes on enzyme exteriors. J. Mol. Biol. 301, 585–596.

  9. Taylor, S. V., Walter, K. U., Kast, P. & Hilvert, D. (2001). Searching sequence space for protein catalysts. Proc. Natl Acad. Sci. USA, 98, 10596–10601.

  10. Davidson, A. R. & Sauer, R. T. (1994). Folded proteins occur frequently in libraries of random amino acid sequences. Proc. Natl Acad. Sci. USA, 91, 2146–2150.

  11. Axe, D. D., Foster, N. W. & Fersht, A. R. (1998). A search for single substitutions that eliminate enzymatic function in a bacterial ribonuclease. Biochemistry, 37, 7157–7166.


At least at first glance, Doug’s paper would seem to predict that we shouldn’t find beta-lactamase in an antibody library of 10^8 to 10^10 B-cell clones. Therefore, if beta-lactamase is found in these libraries then it falsifies Doug’s model.
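
One way to put numbers on that prediction is the chance of seeing at least one functional clone in a library, 1 - (1 - p)^N. The sketch below is illustrative only: the function name is made up, it assumes every clone is an independent draw (a simplification of how real antibody repertoires arise), and the prevalences are just the figures that have come up in this thread.

```python
import math


def prob_at_least_one(prevalence, library_size):
    """P(at least one functional clone) = 1 - (1 - p)^N, computed stably for tiny p."""
    return -math.expm1(library_size * math.log1p(-prevalence))


# Prevalence figures from the thread; library sizes of 10^8 and 10^10 clones.
for p in (1e-77, 1e-40, 1e-10):
    for n in (1e8, 1e10):
        print(f"prevalence {p:.0e}, library {n:.0e}: P = {prob_at_least_one(p, n):.3g}")
```

The `log1p`/`expm1` pair is used because evaluating `1 - (1 - 1e-77) ** 1e10` directly would round to zero in floating point.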

Where do you get this 1 in 10-20 being beta lactamases number from?