Antibody Enzymes and Sequence Space


(John Mercer) #141

Apologies, my mistake. The number of different V regions, not epitopes, is 10^8.

Not even close. Those 10^8 V regions are sufficient to recognize far more than 10^8 epitopes.

It’s not. There is no 1-to-1 correspondence between V regions and epitopes.

(S. Joshua Swamidass) #142

This is Behe’s term! I really did read his work, you know :wink:. It is a chloroquine complexity cluster (CCC), which Behe puts at roughly 1 in 10^20 odds of being found, and he estimates that every protein-protein interaction requires about one CCC.

[EDIT: Discussion on CCC needs development. It’s too late to get this straight.]

As @mercer just put it, Behe seems unaware that “proteins are sticky.” The more important challenge is that the antibody system demonstrates that protein-protein interactions are far easier to evolve than he estimates.

(Ann Gauger) #143

@swamidass @Mercer
Of course proteins are sticky. We had to treat the glassware so they didn’t stick to the glass. But that kind of sticky is nonspecific. It doesn’t represent a substrate-enzyme interaction or catalysis that’s not enzymatic. That’s sticky. That might be good enough for antibodies but not for enzymes. You can’t run a metabolism that way.

(S. Joshua Swamidass) #144

The issue of proteins being sticky is important because that is how protein-protein interactions arise, not enzymes. That is a subtopic here, under the main topic. The specificity is baked into the antibody results too. It is not hard to evolve new protein interactions.

(Ann Gauger) #146

@swamidass You are not being reasonable. I know you are on a campaign to discredit Doug. I know you don’t approve of his having only done one trial. If it had been me, I would’ve done more trials also. But the fact remains that his is not the only study of enzyme function to show that it is very rare in sequence space. There are a number of studies that make that case, and none of them are on the order of one in 10^10.

Doug’s study was about enzyme function, the rarity of enzyme function in sequence space. Yours is about how easy it is to get an antibody to mimic an enzyme. How does the Kcat/Km compare to a real enzyme? I am not defending Doug. I am rejecting what you were offering as an explanation.

Have either of you studied enzymes? Have you looked at their structure? And watched how their conformation changes during a reaction? Have you looked at the particular amino acid interactions that are involved? And how one amino acid change can destabilize the whole thing? It’s not sticky-outy stick-on toys. I have told you why I think it’s wrong, why it doesn’t make sense. 10^-40, OK. 10^-10, no way. Not for a true enzyme.

BTW, the papers I have been waiting on are about the soluble abzyme story. I sent you the citations, Josh. And as for the subtext: the fact that abzymes are so common and so easy to get means to me that they don’t represent anything like what is necessary for a cell to work. And I still wonder why I am still alive if they’re so common.

(S. Joshua Swamidass) #147

Nope. Not at all. I have no reason or motivation to discredit Doug. I’ve just been inviting him into dialogue. The fact he does not show up, well that does seem to create some problems for his credibility. It need not be this way of course.

Regarding Doug’s study, however, it has several problems, as we have already discussed ad infinitum.

You are joking, right? That is a large portion of my research. Of course I have done this.

You don’t have to agree with me, or go through it. I’ll step out of the conversation. Work it out with @mercer and @art if you like. The claims Doug is making just do not add up.

(John Mercer) #148

His is an extreme outlier. There’s simply no excuse for his, or your, failure to cite the studies that disagree with his extrapolation.

No, it wasn’t, because there was no systematic exploration of sequence space, nor were there any assays of enzyme function.

That would be the prevalence of function in random sequence space, much more direct than Doug’s sloppy study. There’s no mimicry there, btw.

You appear to be desperately avoiding the evidence to do so.

We’ve been through this already. I study myosins, remember? They are enzymes, remember?

Yes, very much. Have you?

More often, how it doesn’t. I’ve studied many naturally-occurring amino-acid changes that are only sometimes pathogenic.

As for stability, for the tropomyosins in the second paper, the changes made tropomyosin MORE stable, negatively affecting function. That alone contradicts most of what you’ve said about stability.

Gangadharan B, Sunitha MS, Mukherjee S, Chowdhury RR, Haque F, Sekar N, Sowdhamini R, Spudich JA, Mercer JA. Molecular mechanisms and structural features of cardiomyopathy-causing troponin T mutants in the tropomyosin overlap region. Proc Natl Acad Sci U S A. 2017 Oct 17;114(42):11115-11120. doi: 10.1073/pnas.1710354114. Epub 2017 Oct 2. PubMed PMID: 28973951; PubMed Central PMCID: PMC5651771.

Gupte TM, Haque F, Gangadharan B, Sunitha MS, Mukherjee S, Anandhan S, Rani DS, Mukundan N, Jambekar A, Thangaraj K, Sowdhamini R, Sommese RF, Nag S, Spudich JA, Mercer JA. Mechanistic heterogeneity in contractile properties of α-tropomyosin (TPM1) mutants associated with inherited cardiomyopathies. J Biol Chem. 2015 Mar 13;290(11):7003-15. doi: 10.1074/jbc.M114.596676. Epub 2014 Dec 29. PubMed PMID: 25548289; PubMed Central PMCID: PMC4358124.

Have you done anything as enzymatically and structurally rigorous as the work in these two papers?

Your simplistic “no way” denial does not tell us why. Your pretense that there is a 1-to-1 correspondence between antibodies and epitopes was absurd.

And that leads to an obvious hypothesis that you’ve shown zero interest in testing: if we remove the Ig constraint, will they get better or not? If you’re right, we can test a lot of them and NONE will.

(John Mercer) #149

There’s no bright white line separating specific from nonspecific.

Most of catalysis is binding. Again, there’s a testable hypothesis there.

(S. Joshua Swamidass) split this topic #150

5 posts were split to a new topic: How to Invite ID Into Conversation


We must also remember that an antibody is most often recognizing a very small piece of the larger protein. Peptides as short as 10 amino acids can serve as antigens as demonstrated by the specificity of monoclonal antibodies. In fact, I have had animals immunized with short peptides from a larger protein and successfully recovered specific antibody. Since such small features are antigenic there is also the potential for cross-reactivity with other proteins. This may be the driving factor in some autoimmune diseases, such as rheumatic fever, where antibodies produced against a pathogen end up binding to self molecules.

Indeed. We think Axe may be the one who has been overlooking the possibilities.

(Ann Gauger) #152

No. Unless these protein shapes are repeated over and over in the protein universe, there would be no way that 1 in 10^9 or fewer would bind beta-lactamase. Did they look for cross-reactivity? Do these same antibodies bind other proteins?

(Mikkel R.) #153

It seems to me the whole basis for the ability of the adaptive immune system is to randomly mutate the hypervariable region of antibodies so as to constitute a mechanism for enhancing some initially low level of binding activity.
That implies the “low level” activity regions of many protein binding and catalytic functions are actually a lot more frequent than you suspect, and can be incrementally enhanced by mutation and selection from those low levels towards highly specific and effective catalysis and binding. Perhaps they’re not strong enough to provide immediate effective immunity, but frequent enough to provide the basis for further improvement through mutation and selection?

If these binding pockets needed to be so ultra-specific and rare as you apparently think, one wonders how the immune system could even work through a stochastic mutation and selection process. There has to be some low level of association between antibodies and their millions of potential novel antigens, to provide some basis for triggering that cascade of further adaptation.

What is the alternative here? If such selectable functions were really only found at a rate of 1 in 10^77, then even if the immune system actively carried around on the order of 10^20 substantially different antibodies, that would still be over fifty orders of magnitude less than required to have any realistic hope of attacking even one novel antigen.

The only sensible picture here is that many of these functions, at least at a low level from which they can be adaptively enhanced, are much more frequent than you think, and that many of them have some non-negligible degree of “overlap” in sequence space.
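
The order-of-magnitude gap in the argument above can be checked in a couple of lines. This is only a minimal sketch, assuming the 1 in 10^77 prevalence and the hypothetical repertoire of 10^20 antibodies mentioned in the post; the function name `shortfall_orders` is made up for illustration.

```python
def shortfall_orders(log10_repertoire: float, log10_prevalence: float) -> float:
    """Orders of magnitude by which a repertoire falls short of one expected hit.

    Both arguments are base-10 exponents, e.g. 20 for 10^20 antibodies
    and -77 for a prevalence of 1 in 10^77 (figures taken from the post).
    """
    # Expected hits = repertoire size * prevalence, so the exponents add.
    log10_expected_hits = log10_repertoire + log10_prevalence
    # A non-negative exponent means at least one hit is expected: no shortfall.
    return max(0.0, -log10_expected_hits)


# Hypothetical 10^20 repertoire against a 1 in 10^77 prevalence.
print(shortfall_orders(20, -77))
```

With those inputs the exponents give 77 - 20 = 57, which is the "over fifty orders of magnitude" in the post.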


And yet they did. If your model doesn’t match up to reality, it is the model that is wrong, not reality.

(Ann Gauger) #155


I am willing to grant some of what you said. There has to be a fair amount of non-specificity to the naive antibodies so as to increase the number of things they can bind to and then adapt to. I’ll grant that. But—of the library they screened, 60% of the clones they obtained were from non-immunized animals. There had been no optimization…

So we have two absurd numbers. Absurdly high estimates for the presence of a beta-lactamase catalytic antibody out of a repertoire of 10^8: 1 in 10-20 are beta-lactamase. Then you say, but the antibody repertoire cannot cover 10^77 epitopes. Yes. They don’t. They only cover some of them. The set is different with each person. But 10^77 is probably too great. So take a look at the other numbers. Doug discusses them in his paper, which I have excerpted here. Note: none are anywhere near 10^-77, but neither are they 10^-10. At the end of his paper he does a calculation i

From Doug’s paper.

His intro explains the method used, their difficulties, and what the results have been. You will see a range of results intermediate between 10^-77 and 10^-10. At the end of his paper, which I didn’t copy, he looks for ways to make the numbers more or less comparable. Also note he gives a range. That also seems to get lost when people talk about the paper.

Every quantifiable function that can be performed by proteins has a definite mapping onto the conceptual space representing all protein sequences. What can be discovered about these functional maps? Although the immense size of sequence space greatly limits the utility of direct experimental exploration, the sparse sampling that is feasible ought to be of use in addressing the most basic question of the overall prevalence of function. Progress on this front will both enhance our understanding of how new functional proteins arise naturally and inform our approach to generating them artificially.

This is a difficult problem to approach experimentally, however, and no clear picture has yet emerged. A number of studies have suggested that functional sequences are not extraordinarily rare,[1–5] while others have suggested that they are.[6–9] One of two approaches is typically used in these studies. The first, which could be termed the forward approach, involves producing a large collection of sequences with no specified resemblance to known functional sequences and searching either for function or for properties generally associated with functional proteins. If the relevant sort of properties can be found among more or less random sequences, this provides a direct demonstration of their prevalence. The second approach works in reverse from an existing functional sequence. Here, the question is how much randomization a sequence known to have the relevant sort of function can withstand without losing that function.

Although both approaches have provided important insights, they may have drawbacks that contribute to the apparent discrepancies. The forward approach has not produced a sequence with properties that place it unequivocally among natural functional sequences. Whether the properties that have been found (e.g. proteolytic stability[10] or cooperative denaturation[1]) actually warrant such placement therefore remains an open question. On the other hand, because the reverse approach starts with a sequence that is not just functional but often nearly optimal, it may fail to take account of sequences having the relevant functional properties in a very rudimentary form. Also the difficulty of taking proper account of sequence context presents itself when natural proteins are studied by making one or a few substitutions at a time.[8] Substitutions found to be functionally tolerable in such experiments might be tolerable only because the vast majority of the protein remains untouched.[11]

In light of these difficulties, an important first step in the present study is to consider carefully what we mean by function in the first place. Different answers to this may well lead to different experimental approaches and different conclusions, each valid when properly understood. The focus here will be upon enzymatic function, by which we mean not mere catalytic activity but rather catalysis that is mechanistically enzyme-like, requiring an active site with definite geometry (at least during chemical conversion) by which particular side-chains make specific contributions to the overall catalytic process. The focus, then, will be on mode of catalysis rather than rate. The justification for this is that there is a clear connection between active-site formation and protein folding, in that active sites generally require the local positioning of multiple side-chains that are dispersed in the sequence. Something akin to tertiary structure, however crude, must therefore emerge in working form before natural selection can begin the process of refining a new fold. By assessing the difficulty of achieving the sort of structure needed to form a working active site, we therefore gain insight into a critical step in the emergence of new protein folds.

How might the other difficulties be avoided? A recent study of the requirements for chorismate mutase function in vivo demonstrates a promising approach.[9] Chorismate mutase gene libraries prepared in that work were constrained to preserve all active-site residues and the sequential arrangement of hydrophobic and hydrophilic side-chains present in a natural version of the enzyme. Within these constraints, though, specific residue assignments were essentially random, resulting in numerous disruptive changes throughout the encoded proteins. This is an example of the reverse approach, in that it uses a natural sequence as a starting point but, because the produced variants carry extensive disruption throughout the structure rather than just local disruption, they provide reliable information on the stringency of functional requirements. The prevalence of functional chorismate mutases among sequences carrying the specified hydropathic pattern was estimated to be just one in 10^24.[9]

In view of the rarity of sequences carrying that pattern (among all possible sequences) and the relative simplicity of the chorismate mutase fold (Figure 1a), this result suggests that sequences encoding working enzymes may generally be very rare. Further exploration of this possibility should address two points. First, it is important that enzyme folds of more typical complexity be examined. And second, since many different folds might be comparably suited to any given enzymatic function, it is important that we have some way to factor this in. In other words, if the prevalence of sequences performing a particular function enzymatically is our primary interest, then our analysis must not presume the necessity of any particular fold.

As discussed in Introduction, the method applied in the study of chorismate mutase by Taylor and co-workers[9] should provide a more accurate estimate than the earlier λ-repressor study. Their search for functional chorismate mutases was restricted to sequences matching the hydropathic pattern of a natural version of the enzyme. So, bearing in mind the difference between a single-sequence pattern and a multi-sequence signature, their estimated functional prevalence should be compared to the estimated prevalence among signature-compliant sequences in the present study. Scaling their figure gives 10^-40 for a 153 residue sequence (10^(-24 × 153/93) = 10^-40). This is significantly larger in logarithmic terms than the above estimate for the large domain (10^-64). However, in view of the difference in fold complexity (Figure 1) and the fact that pattern-based randomization is more restrictive than a signature-based randomization, there is no reason to think the two estimates are inconsistent. It seems, rather, that a number of studies using the reverse approach lead to a consistent picture in which sequences with function clearly akin to that of natural proteins are extremely rare, the

Look at other studies using similar methods.

  1. Davidson, A. R., Lumb, K. J. & Sauer, R. T. (1995). Cooperatively folded proteins in random sequence libraries. Nature Struct. Biol. 2, 856–863.
  2. Axe, D. D., Foster, N. W. & Fersht, A. R. (1996). Active barnase variants with completely random hydrophobic cores. Proc. Natl Acad. Sci. USA, 93, 5590–5594. See also pp. 7157–7166.
  3. Keefe, A. D. & Szostak, J. W. (2001). Functional proteins from a random-sequence library. Nature, 410,
  4. Yamauchi, A., Nakashima, T., Tokuriki, N., Hosokawa, M., Nogami, H., Arioka, S. et al. (2002). Evolvability of random polypeptides through functional selection within a small library. Protein Eng. 15,
  5. Hayashi, Y., Sakata, H., Makino, Y., Urabe, I. & Yomo, T. (2003). Can an arbitrary sequence evolve towards acquiring a biological function? J. Mol. Evol. 56,
  6. Yockey, H. P. (1977). On the information content of cytochrome c. J. Theoret. Biol. 67, 345–376.
  7. Reidhaar-Olson, J. F. & Sauer, R. T. (1990). Functionally acceptable substitutions in two α-helical regions of λ repressor. Proteins: Struct. Funct. Genet. 7, 306–316.
  8. Axe, D. D. (2000). Extreme functional sensitivity to conservative amino acid changes on enzyme exteriors. J. Mol. Biol. 301, 585–596.
  9. Taylor, S. V., Walter, K. U., Kast, P. & Hilvert, D. (2001). Searching sequence space for protein catalysts. Proc. Natl Acad. Sci. USA, 98, 10596–10601.
  10. Davidson, A. R. & Sauer, R. T. (1994). Folded proteins occur frequently in libraries of random amino acid sequences. Proc. Natl Acad. Sci. USA, 91, 2146–2150.
  11. Axe, D. D., Foster, N. W. & Fersht, A. R. (1998). A search for single substitutions that eliminate enzymatic function in a bacterial ribonuclease. Biochemistry, 37, 7157–7166.
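
For anyone who wants to check the length scaling quoted in the excerpt (one in 10^24 scaled from 93 to 153 residues), here is a minimal sketch. It assumes, as the excerpt's exponent suggests, that the log of the prevalence is simply scaled in proportion to sequence length; the function name is hypothetical and this is not taken from the paper's own calculations.

```python
def scale_log10_prevalence(log10_prevalence: float,
                           from_length: int,
                           to_length: int) -> float:
    """Scale a log10 prevalence linearly with sequence length.

    Assumption: each residue contributes equally to the exponent, so the
    exponent grows in proportion to the number of residues.
    """
    return log10_prevalence * to_length / from_length


# Figures from the excerpt: 10^-24 at 93 residues, scaled to 153 residues.
scaled = scale_log10_prevalence(-24, 93, 153)
print(f"scaled exponent: {scaled:.2f}")
```

The scaled exponent comes out a little under -39, which the excerpt reports as 10^-40. Either way it sits between the 10^-10 and 10^-77 figures being argued over in this thread.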


At least at first glance, Doug’s paper would seem to predict that we shouldn’t find beta-lactamase in an antibody library of 10^8 to 10^10 B-cell clones. Therefore, if beta-lactamase is found in these libraries then it falsifies Doug’s model.
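
To put rough numbers on that prediction, here is a minimal sketch. It assumes every clone in the library is an independent draw with the same chance of being a hit, which is a simplification of a real antibody repertoire. The prevalences are the figures quoted in this thread, and the function name is made up for illustration.

```python
import math


def prob_at_least_one_hit(library_size: float, prevalence: float) -> float:
    """P(at least one hit) = 1 - (1 - p)^N for N independent clones."""
    # log1p and expm1 keep precision when p is far below float epsilon,
    # where evaluating (1 - p) ** N directly would just return 1.0.
    return -math.expm1(library_size * math.log1p(-prevalence))


# Prevalence exponents quoted in the thread, against libraries of 10^8 and 10^10.
for log10_p in (-77, -64, -40, -10):
    for log10_n in (8, 10):
        p_hit = prob_at_least_one_hit(10.0 ** log10_n, 10.0 ** log10_p)
        print(f"prevalence 1e{log10_p}, library 1e{log10_n}: P(hit) = {p_hit:.3g}")
```

Under these assumptions only a prevalence near 10^-10 gives an appreciable chance of a hit in a library of 10^8 to 10^10 clones. For the smaller prevalences the chance is effectively zero, so a repeatable hit in a library that size would be hard to square with them.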

(Mikkel R.) #157

Where do you get this 1 in 10-20 being beta lactamases number from?

(Arthur Hunt) #158

To remind everyone:

The goal of studies such as Axe’s may be posed as a problem in measuring the base of a hill that defines the functional landscape for a particular enzyme:

What Axe actually did was measure the base of a hill defined by a sequence with minimal putative activity:

We know that the “sides” of this hill are steep, as drawn, because Axe tells us as much. (The subject of his study was temperature sensitive, and thus had a much steeper profile than the original sequence.)

This helps to explain the discrepancy between Axe’s numbers and other results (catalytic antibodies, random combinatorial screens, T-urf13, etc.) when it comes to the supposed rarity of function in sequence space.

What has never been discussed or explored experimentally by anyone at the DI is the relationship between these two “hills”. Rather than think about, acknowledge, and explore this issue, Axe et al. have dug in their heels and denied what the collection of results discussed here and elsewhere say about this subject. (As @Mercer would say, there is an exciting hypothesis here, and the outlines of interesting experiments that would contribute to the general field of protein structure and function.)

(Ann Gauger) #159

@Art @Mercer

If you’re so excited about it, go do it or persuade one of these young ones to. But they don’t seem to see that if an experimental value makes no sense biologically, that means there is something more going on biologically than is recognized. @Mercer says there are over 5000 papers on the subject which he knows so well. Fine. I bring papers, not guessing games. If there is a paper that shows you can find nearly any protein at all represented in a naive library, I will wonder. But it still won’t solve the problem. How many proteins are currently known? Does anyone know? How many are estimated to exist? 10^9 or fewer?

(Ann Gauger) #160

To everyone–
I am signing off. I won’t respond to more posts. Try thinking about something besides the antibody 10^9 thing. There is something strange going on. Otherwise autoimmune disease would be much worse than it is.

(S. Joshua Swamidass) #161

@Agauger there is a process for screening out self-reactive antibodies. That makes this objection incorrect.