Lessons from the pandemic: A new look at a new virus: patterns of mutation accumulation in SARS-CoV-2 since 2019

Ah okay, makes sense.

That isn’t an inference. That is made up. There is absolutely no reason why ATP binding can be extrapolated to other molecules.

Not only that, but we can find multiple B-cell clones that can bind any given molecule in a total B-cell library of a few billion. This is exactly what we see in this experiment:

They found 5 antibodies that cleaved beta-lactamase among 3E9 antibody clones, or 1 in 6E8. That’s three orders of magnitude lower than the number you are giving us.


With a molecule of similar size and complexity to ATP, under the same physiological conditions, and so on, then as long as those constraints are held near enough to equal, I don’t see why you couldn’t do that to a rough approximation.
The Szostak lab found similar results for the frequency of GTP-binding proteins, and they even found that the frequency of RNA polymers able to bind ATP was similar to the frequency of ATP-binding proteins. That seems to indicate there’s some similar physical explanation at work, like total surface area, number of contact points, and so on, that would give similar levels of constraint.

I think instead the extrapolation quickly becomes dubious the more the ligands differ from each other, and the more you change the relevant physiological conditions. We know, for example, that lowering the temperature will enable many otherwise-too-weak contacts to stabilize, which is easily shown with a simple PCR primer melting-temperature experiment.
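That melting-temperature point can be sketched numerically with the classic Wallace rule for short primers (a crude approximation; the primer sequence below is arbitrary, chosen only for illustration):

```python
def wallace_tm(primer: str) -> float:
    """Rough melting temperature (in Celsius) of a short DNA primer via the
    Wallace rule: Tm = 2*(A+T) + 4*(G+C). Reasonable only for primers
    shorter than ~14 nt."""
    s = primer.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2.0 * at + 4.0 * gc

# Annealing below Tm lets weaker (even imperfect) duplexes persist;
# raising the temperature melts the weakest contacts first.
print(wallace_tm("ATGCATGCATGC"))  # 6 A/T + 6 G/C -> 36.0
```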

Yes, but that’s not the frequency of picking sequences randomly from all of sequence space; it’s the frequency of picking variants of an already evolved protein, the frequency within the repertoire of valid antibodies. That’s not the same thing at all. You can’t say that picking among a population of antibody variants is equivalent to picking among a totally random sample of all of sequence space.

Already evolved proteins really are an excellent substrate for further protein evolution and functional exploration, but you can’t extrapolate the frequency of a particular function found by sampling variants of a particular family of proteins to say that the same frequency of function must exist for all of sequence space more generally. There really is a huge difference between the two.

Sure you can. The only qualification is that sequences that are incompatible with immunoglobulin structure are excluded.

No, it doesn’t make sense.

Consider the following scenario: First you select blindly from a large unbiased library of sequences sampled randomly from all of sequence space, until you find a decent ATP binding protein.

Now you go on to another function, binding of ITP.

You select from two different libraries A and B this time. Library A consists of a huge number of variants of the ATP binding protein, and library B is similar to the first one that is just a random sampling from all of protein sequence space.

Which one has the highest frequency of ITP binding proteins?

For triphosphates it might hold. I don’t see why this would apply to a much wider array of molecules.

The variable region within an antibody is not already evolved. I will agree that an antibody and a completely random 80-residue peptide may not be directly comparable, but the variable region is still random within the context of V(D)J recombination.

Antibodies allow us to ask what would happen if we randomized the sequence within an exposed portion of a protein. Given the overall lack of function surrounding the antibody variable region, I think this is a pretty good test of how function can be found. This is a bit different from, say, a handful of mutations within the active site of an existing enzyme.

As long as they are of similar size and complexity and have similar chemical properties, there are some basic geometric principles at work. They have roughly similar surface areas, numbers of potential contact points, numbers and types of functional groups, and so on. That is, after all, the reason why we should be able to generalize from triphosphates at least. They are similar enough that binding them involves similar types and levels of constraint.

But as a corollary, the more ligand molecules differ from each other, the less we can extrapolate from one to the other.

For any antibody that goes on to be presented on some immune cell, it has to be to a certain extent. At the very least, mutations that cause the B cell to recognize self are screened for and selected against.
But even so, the issue is not only about the variable regions, of course. It is the fact that the function of the variable regions is context-specific, and that interactions within and between other parts of the antibody molecule can compensate for potentially deleterious effects that mutations in the variable region could happen to have on antibody function.

Of course, as a product of SHM and V(D)J recombination, the variable regions are themselves not completely scrambled to the point where they have no more sequence similarity to their ancestors than expected from a chance sampling of sequence space. They are still clearly variants of their ancestors.

That is not in dispute. I think the central issue here is that there are functional and structural contexts in already existing and functional proteins that raise the frequency of functionally productive mutations way above what you expect from a totally blind and random sampling of sequence space.

Sampling mutants in the vicinity of an antibody protein sequence, in the variable regions, is more likely to produce other productive molecules than blindly sampling random sequences of a size similar to an entire antibody, because the variable region is in some sense “buffered” and aided by the larger context of the antibody molecule’s structure.

Keefe and Szostak disagree with you, for they precisely extrapolate their observation with ATP to other molecules. To see this, here is how they conclude their paper:
« In conclusion, we suggest that functional proteins are sufficiently common in protein sequence space (roughly 1 in 10^11) that they may be discovered by entirely stochastic means, such as presumably operated when proteins were first used by living organisms. However, this frequency is still low enough to emphasize the magnitude of the problem faced by those attempting de novo protein design. »

So what?

So what you are saying is that the odds of finding an antibody exhibiting a given specificity in a naive repertoire are much higher than the odds of finding this specificity in a pool of random polypeptides. But this is exactly what I’ve argued here. Let me cite myself:
« Take any molecular entity exhibiting about the same size/complexity as ATP. What can be inferred from the paper by Keefe and Szostak is that roughly 1 in 10^11 of all random 80-aa-long peptides will have the ability to bind this particular molecular entity, not only ATP. This frequency (1 in 10^11) must be contrasted with the frequency of antibodies exhibiting a given specificity in a naive repertoire, which, at roughly 1 in 10^8, is about 1000 times more frequent. So it is clearly not the case that « there is nothing to indicate that any individual VDJ segment is closer in sequence space to matching a never-before-encountered antigen epitope than just any sequence pulled at random from sequence space ».
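For what it’s worth, the factor-of-1000 contrast in that passage is simple arithmetic over the two quoted frequencies (taken from the thread as stated, not re-derived here):

```python
# Frequencies as quoted in the discussion (not independently verified)
random_peptide_freq = 1e-11    # ATP binders among random 80-aa peptides (Keefe & Szostak)
naive_repertoire_freq = 1e-8   # antibodies with a given specificity in a naive repertoire

enrichment = naive_repertoire_freq / random_peptide_freq
print(f"naive repertoire enrichment over random sequence space: ~{enrichment:.0f}x")
```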

Yes, it does, because I was referring to primary selection, not the secondary selection of antibodies you describe.

No, they don’t. You removed all the precision.

No, we should see what they were careful to note precisely earlier, and which you omitted:

The four MBP fusion proteins have dissociation constants (Kd) for ATP that fall within the range 100 nM to 10 µM at 4 °C, as determined by equilibrium dialysis with [α-32P]ATP. Of these, protein 18–19 binds most strongly, with a Kd of 100 nM for ATP at 4 °C and a 1:1 stoichiometry.

Because binding is not a binary function as you are falsely portraying it to be, your comparisons are meaningless unless they include affinity.
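Since binding strength is continuous, the same “hit” can mean very different occupancies. A minimal sketch using the standard single-site binding equation (the 100 nM free-ATP concentration is chosen arbitrarily for illustration):

```python
def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Single-site equilibrium binding: fraction of protein occupied
    at free ligand concentration [L], given dissociation constant Kd."""
    return ligand_nM / (ligand_nM + kd_nM)

# The reported Kd range (100 nM to 10 uM) spans a roughly 50-fold
# difference in occupancy at, say, 100 nM free ATP:
for kd in (100.0, 1_000.0, 10_000.0):  # nM
    print(f"Kd = {kd:>8.0f} nM -> fraction bound = {fraction_bound(100.0, kd):.3f}")
```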

No, no one is saying anything about antibody specificity. You made that up. Please stop.

It’s not a language issue, as these are all cognates between French and English.

We brought up antibody function as an enzyme.

Again, there’s no specificity comparison available. Please stop using terms you don’t understand.

What we are saying is that function is easy to find.

Gil, if you can’t answer my questions in seconds:

You don’t understand this at all.

Note that they precisely qualified it with affinity earlier than the conclusion. Note that you didn’t. Why?

Why would that make a difference?

No, it doesn’t make sense, for in either case, be it primary or secondary selection of antibodies, Rum’s claim holds true.

I would say that if they had eluted with 0.5 mM ATP instead of 5 mM, they would have selected only the less potent binders. If they had eluted with 15 mM ATP instead of 5 mM, they would have selected the weak, the strong, and the intermediate binders. So, as a matter of fact, the higher the concentration of ATP used for elution, the higher the frequency of binders.
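A toy model of that elution argument, assuming each clone is competed off the column only once the free-ATP concentration reaches some clone-specific threshold (all clone names and threshold values below are invented for illustration):

```python
# Hypothetical clones: stronger binders need more competitor ATP to elute.
required_mM = {"weak-1": 0.1, "weak-2": 0.3, "mid-1": 2.0,
               "strong-1": 8.0, "strong-2": 12.0}

def eluted(conc_mM: float) -> list:
    """Clones recovered when eluting with conc_mM free ATP."""
    return sorted(name for name, need in required_mM.items() if conc_mM >= need)

for conc in (0.5, 5.0, 15.0):  # the concentrations discussed above
    print(f"{conc:>4} mM -> {eluted(conc)}")
```

The count of recovered clones rises monotonically with elution concentration, which is the point being made.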

No. The structure of the V region is loosely like a pair of lips. Therefore, for some molecules that structural constraint makes binding more likely, but for others it makes binding less likely.

Right, so the number of hits is a direct function of the binding and elution conditions used.

So that means that every time you falsely treated binding as binary above, you were actually aware that binding is a continuous function. Why did you do that?

Okay, strictly speaking, you are right: specificity and affinity are not the same thing. But note that high specificity is often accompanied by high affinity. Hence “high affinity” and “high specificity” are frequently used interchangeably.

I’ve never treated binding as binary, so your comment is irrelevant.

Thanks. They are VERY different.

How often? Provide some data here. I think you just made that up.

Not by anyone who expects to be taken seriously.

You did. See below:

That claim is meaningless because it treats binding as binary. If you were not assuming that, you would have stipulated binding affinities (and/or specificities).

So, there’s still your claim that you have been dancing around:


They are different concepts, yes, but in general, high affinity correlates pretty well with high specificity.

No, Gil. That’s true for small-molecule candidate screens, the context of the paper you cited (but apparently didn’t read). The subject here is the converse, and its evolution.

Please just stop making things up.


Uh, yes. What he’s saying is both correct, and obviously so. Obviously specificity and affinity would strongly correlate even though they wouldn’t always track exactly, both for small and large molecules, because the same basic principles are at work: Surface topology and charge complementarity. Increasing the size of the target molecule isn’t somehow magically going to make those no longer be the fundamental principles of affinity and specificity.

I don’t know what’s going on with you but I want you to stop saying false things in what appears to be disconcertingly knee-jerk dismissals of everything uttered by people who are ID-proponents. It is possible to show that ID is wrong and misguided without saying something false, or denying something true.


If it’s so obvious, you’d be providing data.

Less obviously, charge heterogeneities over those topologies are a big driver of differences between specificity and affinity. If one molecule has an overwhelmingly positive surface and the other a negative surface, you’re going to get awesome affinity with no real specificity.

The difference between affinity and specificity is incredibly important in diagnostics: because of it, one can basically never develop an ideal test. There are always tradeoffs between sensitivity (driven by affinity, minimizing false negatives) and specificity (minimizing false positives). The tests for serum antibodies against SARS-CoV-2 are an example, intelligently designed at the latter extreme, because false positives influence patient behavior in a much more dangerous way.
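That tradeoff can be sketched with a decision threshold over made-up assay scores; moving the threshold up trades sensitivity for (diagnostic) specificity:

```python
# Made-up assay scores for true negatives and true positives (illustration only)
negatives = [0.1, 0.2, 0.3, 0.4, 0.5]
positives = [0.4, 0.5, 0.6, 0.7, 0.8]

def sens_spec(threshold: float):
    """Sensitivity (true-positive rate) and specificity (true-negative rate)
    when scores >= threshold are called positive."""
    sens = sum(s >= threshold for s in positives) / len(positives)
    spec = sum(s < threshold for s in negatives) / len(negatives)
    return sens, spec

for t in (0.35, 0.45, 0.55):  # a stricter threshold favors specificity
    print(t, sens_spec(t))
```

A serology test “designed at the latter extreme” sits at the high-threshold end, accepting some false negatives to avoid false positives.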

And there is a big difference between large and small target molecules, as proteins tend to bind other molecules through multiple short regions. Different regions often have dramatically different effects on affinity vs. specificity. If we’re looking at small molecules binding to large ones, affinity is better correlated with specificity.

This difference is important in evolution and drug screening, because distinguishing between homologs is often important. It’s the primary reason why all drugs have side effects. In evolution, high affinity is generally avoided; most physiological interactions have low affinities, and protein aggregation (prions, Alzheimer’s disease) can be fatal.

This difference is important in the immune response, because distinguishing between self and nonself (specificity) is always critical. Failure produces autoimmune disease. Would you be surprised if Omicron and future strains produce more autoimmune disease in association with long COVID, because on some level they’re mimicking a host antigen to escape?

Last I checked, COVID was the subject here.

And even in the context of the Szostak paper, specificity is much harder to measure than affinity. That’s why they only looked at one of their hits. And you also might note that while Gil was claiming that they were measuring specificity by their number of hits, he was conveniently ignoring their careful specification of affinity so that he could compare noncomparable frequencies.

So, yes, affinity and specificity often go together, but less so when we (or evolution) push them to their limits. Simplistically equivocating between them can be fatal.


The nays still have it, Rum.

You forget that a lot of molecular recognition phenomena in vivo are flexible, and this substantially diminishes the supposed positive correlation between specificity and affinity in those cases.
