Gauger and Mercer: Bifunctional Proteins and Protein Sequence Space

Adding a few comments before sitting back and enjoying the exchange:

  1. I believe the sort of altered function John and his coworkers described may be relevant to the discussion about information, the Cambrian Explosion, and the like. Not in a specific sense, but it does bear on the evolution of proteins that might be expected to matter for morphological change.

  2. Interestingly enough, the research described by John and his coworkers contradicts an assertion Meyer made in our exchange at Biola back in 2010 - namely, that it is not possible to mutationally alter cytoskeleton components without destroying function.

  3. I guess I am not going to find out in this forum why Ann believes I do not understand Axe’s work. I will admit to being more than a bit disappointed. Oh well.

I apologize for the interjection. I am looking forward to learning more about John’s work in this discussion.

4 Likes

Addressing your points in turn:

  1. Exactly. That’s a major reason for bringing it up. My understanding of evolution has played an enormous role in my work, but I’ve never actually studied it.

  2. Meyer’s assertion is truly mind-bogglingly wrong, both from the work I’ve started to present here and from my more recent work on inherited cardiomyopathies. The cardiomyopathies have the advantage of being natural, but the literature is so granular that it’s harder to get the big picture. The rough generalization is that, for all but the most severe single-residue changes in cardiac cytoskeletal proteins, for every teenager who dies suddenly and tragically of cardiac arrest caused by hypertrophy, the half of his or her family members carrying the same disease allele will have no problems for decades, or will die of something else without ever showing symptoms. I suspect there is more tolerance for these mutations in cardiac than in skeletal muscle because of the plasticity of cardiac physiology. The bottom line is that cardiac cytoskeletal genes are pretty darn polymorphic; we knew that in 2010, and we know it even better now.

  3. I found your analysis to be compelling, so I thought that adding a more empirical approach might stimulate more discussion. We’ll see…

2 Likes

I can verify that @Agauger wants to participate but is occupied with some more pressing concerns. In my experience with @Agauger, she is honest about things like this. This is not a dodge. I expect she will be up for engaging in the coming weeks. In the meantime, I ask that we not impute bad intent to her.

I’m looking forward to the exchange too.

1 Like

@Agauger, great to see you back. @Mercer sent me a note, hoping to pick up the conversation here if you can.

1 Like

@Mung, @swamidass, @Art, @Mercer

I am about halfway through the boxes at this point. I hope to be able to put some time into this topic this weekend. But before I do that, I have to apologize to @Mercer for having misread his paper long ago. I read it quickly, saw that he and his colleagues had changed the active site to accommodate a modified substrate, and jumped to the conclusion that they were trying to create a protein that could be crystallized with the modified substrate in the active site. That is a standard thing that crystallographers do. The reason for doing it is that the modified substrate does not complete the reaction and remains in the active site, thus “gumming up the works”. But that was not the intent of your work. I did not read carefully enough. Sloppy on my part. But if you go back and read the context of the conversation, you might understand.

I will return to the issue of what constitutes a significant change in function as I am able this week. Let me just say that I have never disputed that some kinds of changes are possible with a single mutation, especially if it enlarges the binding pocket of the active site. Changing the activities of a promiscuous enzyme to favor one substrate over another is definitely possible. Getting a new chemistry that was not already present in any way is very much harder, and I believe beyond the reach of mutation and selection. Here’s where the debate begins.

My kitchen looks better than this now, after unpacking quite a few boxes today, but I still have no dishes or glasses. We did however find the can opener.

4 Likes

8 posts were split to a new topic: An Intelligently Designed Kitchen

Thanks, @Agauger. I really appreciate your humility here. We all make mistakes.

2 Likes

Accepted. That is gracious of you.

That’s not really my point, though. It has nothing to do with a single mutation. It has nothing to do with favoring one substrate over another, which we never tested.

Before we did the experiment, I hypothesized that enlarging the active site would likely kill the enzymatic activity. We had a whole set of substitutions lined up to try, but the first, most radical one worked. If you and Doug Axe are right, that never should have happened. The same substitution also worked on a very different myosin.

My point is that function is far more prevalent in sequence space than you argue that it is. The point of my story above is that it was far more prevalent than I ever thought it would be. My experience completely changed my thinking. I would suggest that your misreading of my paper was facilitated by your prejudices.

I should also note that I have never seen the term “promiscuous” applied to any myosin activity, so I am puzzled by your use of it. Besides, terms like “promiscuous” and “new chemistry” are too subjective to be very useful.

1 Like

In this context, “promiscuous” does have an objective meaning: it is the opposite of specificity.

2 Likes

So are you saying that the myosins are enzymatically promiscuous?

I don’t understand why you think this is contrary to our argument. Did you modify myosin’s function to a new one (a different chemical reaction)?

1 Like

Not that I know of, but you would know more than I in this case.

@Mercer @Art

How does your experiment demonstrate this? Many different sequences can encode proteins that fold into nearly identical folds and have the same catalytic activity, while differing substantially in sequence. Beta-lactamases, for example. Changing one amino acid in myosin is nothing compared to that. Yet relative to the whole sea of possible sequences, that particular beta-lactamase activity was very rare, even though there are probably tens of thousands (or more) of sequences that can carry out the function. What matters is the proportion of all possible sequences of a given length that perform a particular function. That is what Doug was measuring.

I’d like to recommend a paper by Doug Axe.
Axe DD (2010) The case against a Darwinian origin of protein folds. BIO-Complexity 2010(1):1-12. doi:10.5048/BIO-C.2010.1

I’ll quote here the section I think is relevant.

The proportion of protein sequences that perform specified functions.

One study focused on the AroQ-type chorismate mutase, which is formed by the symmetrical association of two identical 93-residue chains [24]. These relatively small chains form a very simple folded structure (Figure 5A). The other study examined a 153-residue section of a 263-residue beta-lactamase [25]. That section forms a compact structural component known as a domain within the folded structure of the whole beta-lactamase (Figure 5B). Compared to the chorismate mutase, this beta-lactamase domain has both larger size and a more complex fold structure.

In both studies, large sets of extensively mutated genes were produced and tested. By placing suitable restrictions on the allowed mutations and counting the proportion of working genes that result, it was possible to estimate the expected prevalence of working sequences for the hypothetical case where those restrictions are lifted. In that way, prevalence values far too low to be measured directly were estimated with reasonable confidence.

The results allow the average fraction of sampled amino acid substitutions that are functionally acceptable at a single amino acid position to be calculated. By raising this fraction to the power ℓ, it is possible to estimate the overall fraction of working sequences expected when ℓ positions are simultaneously substituted (see reference 25 for details). Applying this approach to the data from the chorismate mutase and the beta-lactamase experiments gives a range of values (bracketed by the two cases) for the prevalence of protein sequences that perform a specified function. The reported range [25] is one in 10^77 (based on data from the more complex beta-lactamase fold; ℓ = 153) to one in 10^53 (based on the data from the simpler chorismate mutase fold, adjusted to the same length: ℓ = 153). As remarkable as these figures are, particularly when interpreted as probabilities, they were not without precedent when reported [21, 22]. Rather, they strengthened an existing case for thinking that even very simple protein folds can place very severe constraints on sequence.

Rescaling the figures to reflect a more typical chain length of 300 residues gives a prevalence range of one in 10^151 to one in 10^104. On the one hand, this range confirms the very highly many-to-one mapping of sequences to functions. The corresponding range of m values is 10^239 (= 20^300/10^151) to 10^286 (= 20^300/10^104), meaning that vast numbers of viable sequence possibilities exist for each protein function. But on the other hand it appears that these functional sequences are nowhere near as common as they would have to be in order for the sampling problem to be dismissed. The shortfall is itself a staggering figure—some 80 to 127 orders of magnitude (comparing the above prevalence range to the cutoff value of 1 in 5×10^23). So it appears that even when m is taken into account, protein sequences that perform particular functions are far too rare to be found by random sampling.

Sorry, you’ll have to look at the original to get the exponents and references right.
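
To make the arithmetic in the quoted passage easy to check, here is a minimal sketch in Python of the rescaling as I read it. The per-residue fractions are back-calculated from the reported prevalences at ℓ = 153, which is an assumption about how to invert the published numbers rather than a reproduction of Axe’s own procedure.

```python
import math

# Reported prevalences from the quoted passage (Axe 2010), both at chain length l = 153.
L = 153
log10_prevalence = {"beta-lactamase domain": -77, "chorismate mutase (adjusted)": -53}

for name, lp in log10_prevalence.items():
    # Per-residue fraction of acceptable substitutions implied by prevalence = f**L
    f = 10 ** (lp / L)
    # Rescale to a "typical" 300-residue chain: prevalence_300 = f**300 (work in log10)
    lp300 = 300 * math.log10(f)
    # m = number of working 300-residue sequences = 20**300 * prevalence_300 (in log10)
    log10_m = 300 * math.log10(20) + lp300
    # Shortfall relative to the quoted cutoff of 1 in 5e23
    shortfall = -lp300 - math.log10(5e23)
    print(f"{name}: f ≈ {f:.2f}, prevalence(300) ≈ 1e{lp300:.0f}, "
          f"m ≈ 1e{log10_m:.0f}, shortfall ≈ {shortfall:.0f} orders of magnitude")
```

Running it recovers the quoted figures: prevalences of roughly one in 10^151 and one in 10^104, m values of about 10^239 and 10^286, and shortfalls of about 127 and 80 orders of magnitude.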

When I spoke of promiscuous enzymes I was not referring to myosin, but to the many cases in the literature where promiscuous enzymes have been pushed to favor one substrate over another. Sometimes this can be done by directed evolution, for example.

In your example, you made a single change to the binding pocket that allowed a modified substrate to bind. That’s got nothing to do with promiscuity. I guess I was trying to be general. I agree, my failure to read carefully was in part due to my prejudices, as you call them. Not a good thing, but I have already apologized, and I am certainly not the first scientist to do so.

1 Like

Ann, you know my response to Axe’s work. But, just for my own curiosity, how many of Axe’s beta-lactamase variants (the original crippled reference sequence and the various positive and negative variants) did he actually assay for activity? How active were these? What was the range of activities seen? What was the activity cut-off for the plating assay he used?

Besides the flaw in the strategy (an issue that awaits a longer response elsewhere in this forum), these questions weigh heavily on the conclusions you want to draw here. And they greatly affect the numbers you and Axe toss around.

1 Like

Link to Axe paper cited by @Agauger above:

http://bio-complexity.org/ojs/index.php/main/article/viewFile/BIO-C.2010.1/BIO-C.2010.1

3 Likes

Another thought experiment, submitted to provoke discussion:

In a random DNA sequence, the “distance” between an ATG triplet and one of the three stop codons will be 21 codons, on average. (It’s a thought experiment, so I will take the liberty of keeping this simple. I am fully aware that this value is better represented as a range and will be affected by many, many variables, but 21 is a convenient ballpark value for this exercise.) In other words, a typical orphan gene arising from a random sequence is going to encode a 20-amino-acid polypeptide.

Using Axe’s per-residue estimate for the “probability” of function, it is easy to calculate that, in round numbers, about 1 in 10^10 such peptides will possess enzymatic function. Moreover, if we grant that, say, 1 in 10^10 of all the cells on earth will, in a generation, “yield” a single new orphan (a pretty safe, in fact extremely conservative, estimate) owing to the spontaneous appearance of a transcription start site, then it is apparent that the probability of a new function arising in the biosphere is NOT 1 in 10^10, but essentially 1. (Again, I am keeping things simple. Some of the mathematicians reading this can revise my estimates, but the revisions won’t much affect the bottom line here.)

If we generalize Axe’s estimate to all possible enzyme activities (as ID proponents are wont to do), then we see that any conceivable new function likely appears on almost a daily basis in the biosphere (which may contain as many as 10^30 organisms at any one time).

3 Likes

@art it would be helpful to observers to work out that math step by step.

@Art
I think I understand the first part of your calculation. Because of the way that the genetic code is structured, on average a stop codon will occur once every 20 amino acids or so in a random sequence. That has in fact been the reason for excluding short peptide sequences as functional proteins in the past. It’s been assumed that anything 20 amino acids long is not going to be a true enzyme. Attitudes on this have changed recently, with many peptides having been shown to be functional.

Now for the second part. Axe’s per-residue probability was calculated in order to convert between protein sequences of different lengths and their measured probabilities. He wasn’t working with an estimate for short peptides. I don’t think your calculation can be extrapolated the way you have done.

1 Like

Sure. Remember, this is a thought experiment, not a comprehensive theoretical treatise…

Since 3 of the 64 codons are stop codons, a round number for the frequency of a stop codon in a random sequence will be once every 21 codons or so. (64 divided by 3; throw away the remainder, since I don’t want to clutter things up…)

For Axe’s per-residue “probability”, he assumes a uniform average probability and raises it to the power reflecting the length of the protein. Thus, for his number of 10^-77, this comes out to somewhere in the 0.33 or so range per residue. (Again, this is rough, off the top of my head. Precision here doesn’t change the final point.)

Thus, for a 20-mer (the expected size of a newly occurring random polypeptide), the “probability” of function is 0.33 raised to the 20th power. Again, using round numbers, about 1 in 10^10.

I figure on about 10^30 cells in the biosphere. I forget where I got this from, but it includes all the bacteria, the volumes of the oceans, etc., etc. I figure that at least 1 in 10^10 of these will see a mutation that creates a new transcription start site. I am too lazy to provide a citation for this, but I am pretty sure that, given the known rates of mutation and the relatively uncomplicated nature of promoters, such a mutation will probably occur at least once with every round of replication, even in a bacterial genome. Certainly, using a value of, in essence, 1 in 10^10 for such mutations is pretty conservative here. (As has been seen elsewhere in this discussion, a new gene arises basically via “creation” of a promoter.)

Thus, if a function arises once in every 10^10 new proteins, and we have 10^20 new proteins with each passing generation, the expected number of new functional proteins per generation is about 10^10, so the probability of getting at least one new function is essentially 1. (I won’t show my work here - hopefully, readers can at least grasp this.)

Again - THIS IS JUST A THOUGHT EXPERIMENT!!! I want to provoke some general discussion about the approach that ID proponents take when they talk about function, accessibility, sequence space, and the like. Do not take these rough estimates any more seriously than needed to see where I am trying to lead things (or be led, as the case may be). Please.
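
To make these round numbers easy to check, here is a minimal sketch of the same back-of-the-envelope arithmetic in Python. The per-residue value is back-calculated from Axe’s 10^-77 figure at ℓ = 153 (again, an assumption about how to invert that number), and everything else uses the ballpark values stated above; none of it is a measurement.

```python
import math

# Rough numbers for the thought experiment above; all of these are assumptions, not measurements.
stop_codons, total_codons = 3, 64
mean_orf_length = total_codons // stop_codons              # ~21 codons, i.e. a ~20-residue peptide

# Per-residue "probability" back-calculated from Axe's 1e-77 figure at length 153 (~0.33 per residue)
per_residue = 10 ** (-77 / 153)
p_function_20mer = per_residue ** 20                       # ~1 in 1e10

cells_in_biosphere = 1e30                                  # ballpark figure used above
new_orphans_per_generation = cells_in_biosphere * 1e-10    # 1 in 1e10 cells gains a new orphan peptide

expected_functional = new_orphans_per_generation * p_function_20mer   # ~1e10 per generation
p_at_least_one = 1 - math.exp(-expected_functional)        # Poisson: effectively 1

print(f"mean ORF length ≈ {mean_orf_length} codons")
print(f"P(function) for a 20-mer ≈ 1e{math.log10(p_function_20mer):.0f}")
print(f"expected new functional peptides per generation ≈ 1e{math.log10(expected_functional):.0f}")
print(f"P(at least one new function) ≈ {p_at_least_one}")
```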

2 Likes

@Art

I’d like to begin by saying that Doug’s work in the 2004 paper is independent of anything he and I did together. Our conclusions did not depend on the previous work; we were asking a different question.
However, I will try to explain what I know about how the work was done, based on my own use of the technique for another project.

The beta-lactamase variants Doug used in his experiment were as follows. First, a plasmid that had a deletion in the active site of beta-lactamase and so was not capable of carrying out the enzymatic reaction; call it delta. Second, a plasmid carrying a beta-lactamase gene that had been heavily mutated, almost to the point of no activity; call it basal. Third, a plasmid carrying a wild-type beta-lactamase gene; call it wt. The main negative control was the host bacterial strain lacking any plasmid at all, which established the level of antibiotic resistance of the strain itself. That was then compared to cells carrying delta, which had a MIC (minimal inhibitory concentration) of 5 µg per ml; to cells with basal, which had a MIC of 10 µg per ml; and to cells with wt, which had a MIC of 6000 µg per ml. These numbers had to be recalibrated any time a new batch of naive cells was used, because the baseline changed, but these are the numbers I observed in the lab using Doug’s plasmids, and they are in pretty good agreement with his. Other controls: only freshly poured plates with fresh ampicillin, since its potency changes quickly, and exactly the same pouring technique and drying protocol. Our numbers were reproducible, and the difference between wild-type and heavily mutated lactamase activity was about three orders of magnitude.
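
Just to put the reported MIC values side by side, here is a tiny Python sketch; the numbers are the ones quoted above, and the “about three orders of magnitude” comparison is between the wild-type and heavily mutated genes.

```python
import math

# MIC values (µg/ml) as reported above for cells carrying each plasmid
mic = {"delta (active-site deletion)": 5, "basal (heavily mutated)": 10, "wt (wild type)": 6000}

# Fold difference and orders of magnitude between wild type and the heavily mutated gene
fold = mic["wt (wild type)"] / mic["basal (heavily mutated)"]
print(f"wt vs basal: {fold:.0f}-fold, about {math.log10(fold):.1f} orders of magnitude")
```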

3 Likes