RumRaket,
I am willing to grant some of what you said. There has to be a fair amount of non-specificity to the naive antibodies so as to increase the number of things they can bind to and then adapt to. I’ll grant that. But—of the library they screened, 60% of the clones they obtained were from non-immunized animals. There had been no optimization…
So we have two absurd numbers. The first is an absurdly high estimate for the frequency of beta-lactamase catalytic antibodies: 1 in 10-20, out of a repertoire of 10^8. Then you say, but the antibody repertoire cannot cover 10^77 epitopes. Yes. It doesn’t. It only covers some of them, and the set is different with each person. But 10^77 is probably too great. So take a look at the other numbers. Doug discusses them in his paper, which I have excerpted here. Note: none are anywhere near 10^-77, but neither are they 10^-10. At the end of his paper he does a calculation i
From Doug’s paper.
His intro explains the methods used, their difficulties, and what the results have been. You will see a range of results intermediate between 10^-77 and 10^-10. At the end of his paper, which I didn’t copy, he looks for ways to make the numbers more or less comparable. Also note that he gives a range. That also seems to get lost when people talk about the paper.
Every quantifiable function that can be performed
by proteins has a definite mapping onto
the conceptual space representing all protein
sequences. What can be discovered about these
functional maps? Although the immense size of
sequence space greatly limits the utility of direct
experimental exploration, the sparse sampling that
is feasible ought to be of use in addressing the most
basic question of the overall prevalence of function.
Progress on this front will both enhance our
understanding of how new functional proteins
arise naturally and inform our approach to generating
them artificially.
This is a difficult problem to approach experimentally,
however, and no clear picture has yet
emerged. A number of studies have suggested that
functional sequences are not extraordinarily rare,1–5
while others have suggested that they are.6–9 One of
two approaches is typically used in these studies.
The first, which could be termed the forward
approach, involves producing a large collection of
sequences with no specified resemblance to known
functional sequences and searching either for
function or for properties generally associated
with functional proteins. If the relevant sort of
properties can be found among more or less
random sequences, this provides a direct demonstration
of their prevalence. The second approach
works in reverse from an existing functional
sequence. Here, the question is how much randomization
a sequence known to have the relevant
sort of function can withstand without losing that
function.
Although both approaches have provided
important insights, they may have drawbacks that
contribute to the apparent discrepancies. The
forward approach has not produced a sequence
with properties that place it unequivocally among
natural functional sequences. Whether the properties
that have been found (e.g. proteolytic stability10
or cooperative denaturation1 ) actually warrant such
placement therefore remains an open question. On
the other hand, because the reverse approach starts
with a sequence that is not just functional but often
nearly optimal, it may fail to take account of
sequences having the relevant functional properties
in a very rudimentary form. Also the difficulty of
taking proper account of sequence context presents
itself when natural proteins are studied by making
one or a few substitutions at a time.8 Substitutions
found to be functionally tolerable in such experiments
might be tolerable only because the vast
majority of the protein remains untouched.11
In light of these difficulties, an important first step
in the present study is to consider carefully what we
mean by function in the first place. Different
answers to this may well lead to different experimental
approaches and different conclusions, each
valid when properly understood. The focus here
will be upon enzymatic function, by which we
mean not mere catalytic activity but rather catalysis
that is mechanistically enzyme-like, requiring an
active site with definite geometry (at least during
chemical conversion) by which particular side-chains
make specific contributions to the overall
catalytic process. The focus, then, will be on mode
of catalysis rather than rate. The justification for this
is that there is a clear connection between active-site
formation and protein folding, in that active sites
generally require the local positioning of multiple
side-chains that are dispersed in the sequence.
Something akin to tertiary structure, however
crude, must therefore emerge in working form
before natural selection can begin the process of
refining a new fold. By assessing the difficulty of
achieving the sort of structure needed to form a
working active site, we therefore gain insight into a
critical step in the emergence of new protein folds.
How might the other difficulties be avoided? A
recent study of the requirements for chorismate
mutase function in vivo demonstrates a promising
approach.9 Chorismate mutase gene libraries prepared
in that work were constrained to preserve all
active-site residues and the sequential arrangement
of hydrophobic and hydrophilic side-chains present
in a natural version of the enzyme. Within these
constraints, though, specific residue assignments
were essentially random, resulting in numerous
disruptive changes throughout the encoded proteins.
This is an example of the reverse approach, in
that it uses a natural sequence as a starting point
but, because the produced variants carry extensive
disruption throughout the structure rather than just
local disruption, they provide reliable information
on the stringency of functional requirements. The
prevalence of functional chorismate mutases among
sequences carrying the specified hydropathic
pattern was estimated to be just one in 10^24 (ref. 9).
In view of the rarity of sequences carrying that
pattern (among all possible sequences) and the
relative simplicity of the chorismate mutase fold
(Figure 1a), this result suggests that sequences
encoding working enzymes may generally be very
rare. Further exploration of this possibility should
address two points. First, it is important that
enzyme folds of more typical complexity be
examined. And second, since many different folds
might be comparably suited to any given enzymatic
function, it is important that we have some way to
factor this in. In other words, if the prevalence of
sequences performing a particular function enzymatically
is our primary interest, then our analysis
must not presume the necessity of any particular
fold.
As discussed in Introduction, the method applied
in the study of chorismate mutase by Taylor and co-workers
(ref. 9) should provide a more accurate estimate
than the earlier λ-repressor study. Their search for
functional chorismate mutases was restricted to
sequences matching the hydropathic pattern of a
natural version of the enzyme. So, bearing in mind
the difference between a single-sequence pattern
and a multi-sequence signature, their estimated
functional prevalence should be compared to the
estimated prevalence among signature-compliant
sequences in the present study. Scaling their figure
gives 10^-40 for a 153 residue sequence
(10^(-24 × 153/93) ≈ 10^-40). This is significantly larger in logarithmic
terms than the above estimate for the large
domain (10^-64). However, in view of the difference
in fold complexity (Figure 1) and the fact that
pattern-based randomization is more restrictive
than a signature-based randomization, there is no
reason to think the two estimates are inconsistent. It
seems, rather, that a number of studies using the
reverse approach lead to a consistent picture in
which sequences with function clearly akin to that
of natural proteins are extremely rare, the
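
For anyone who wants to check the scaling step in that excerpt, here is a minimal Python sketch. It assumes, as the quoted formula implies, that the exponent scales linearly with sequence length and that 93 is the length of the chorismate mutase sequence being scaled from. The function and variable names are mine, not anything from the paper.

```python
def scale_log10_prevalence(log10_prevalence, from_length, to_length):
    """Scale a log10 prevalence linearly with sequence length.

    Assumption (taken from the quoted formula 10^(-24 x 153/93)):
    the exponent is proportional to the number of residues.
    """
    return log10_prevalence * to_length / from_length


# Taylor et al. figure quoted in the excerpt: about 1 in 10^24,
# assumed here to apply to a 93-residue sequence.
chorismate_log10 = -24.0

# Rescale to the 153-residue domain discussed in the excerpt.
scaled_log10 = scale_log10_prevalence(chorismate_log10, 93, 153)

# Estimate quoted in the excerpt for the large domain.
large_domain_log10 = -64.0

# Difference between the two estimates, in orders of magnitude.
gap_in_orders = scaled_log10 - large_domain_log10

print(f"scaled estimate: 10^{scaled_log10:.1f}")
print(f"gap to large-domain estimate: {gap_in_orders:.1f} orders of magnitude")
```

The scaled exponent comes out at roughly -39.5, which the paper quotes as 10^-40. The gap to the 10^-64 estimate is about 24.5 orders of magnitude, which is the difference the excerpt attributes to fold complexity and to the two randomization schemes.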
Look at other studies using similar methods:
- Davidson, A. R., Lumb, K. J. & Sauer, R. T. (1995).
Cooperatively folded proteins in random sequence
libraries. Nature Struct. Biol. 2, 856–863.
- Axe, D. D., Foster, N. W. & Fersht, A. R. (1996). Active
barnase variants with completely random hydrophobic
cores. Proc. Natl Acad. Sci. USA, 93, 5590–5594
see also pp. 7157–7166.
- Keefe, A. D. & Szostak, J. W. (2001). Functional
proteins from a random-sequence library. Nature, 410,
715–718.
- Yamauchi, A., Nakashima, T., Tokuriki, N.,
Hosokawa, M., Nogami, H., Arioka, S. et al. (2002).
Evolvability of random polypeptides through functional
selection within a small library. Protein Eng. 15,
619–626.
- Hayashi, Y., Sakata, H., Makino, Y., Urabe, I. & Yomo,
T. (2003). Can an arbitrary sequence evolve towards
acquiring a biological function? J. Mol. Evol. 56,
162–168.
- Yockey, H. P. (1977). On the information content of
cytochrome c. J. Theoret. Biol. 67, 345–376.
- Reidhaar-Olson, J. F. & Sauer, R. T. (1990). Functionally
acceptable substitutions in two α-helical regions
of λ repressor. Proteins: Struct. Funct. Genet. 7, 306–316.
- Axe, D. D. (2000). Extreme functional sensitivity to
conservative amino acid changes on enzyme
exteriors. J. Mol. Biol. 301, 585–596.
- Taylor, S. V., Walter, K. U., Kast, P. & Hilvert, D. (2001).
Searching sequence space for protein catalysts. Proc.
Natl Acad. Sci. USA, 98, 10596–10601.
- Davidson, A. R. & Sauer, R. T. (1994). Folded proteins
occur frequently in libraries of random amino acid
sequences. Proc. Natl Acad. Sci. USA, 91, 2146–2150.
- Axe, D. D., Foster, N. W. & Fersht, A. R. (1998). A
search for single substitutions that eliminate enzymatic
function in a bacterial ribonuclease. Biochemistry,
37, 7157–7166.