This next paper is a dense tour de force of molecular biology and biophysics. Here is a paragraph from the Introduction in which the authors explain the problem and the challenge (I removed references, numbering in the dozens, for readability):
Quantifying the effects of individual mutations or pairs of mutations provides little information about the genetic and energetic architecture of cores and surfaces because it only explores very local sequence space, revealing the outcome when one or two side chains are changed. Rather, what is needed are experiments at scale, in which the side chains of many positions are simultaneously changed in many different combinations — an approach referred to as combinatorial mutagenesis or core/surface randomization. However, to date, the number of alternative core and surface genotypes that have been experimentally characterized is extremely small for any protein. The lack of experimental data limits our understanding of genetic and energetic architecture and our ability to predict sequence evolution over large evolutionary distances.
Link at Science:
Genetics, energetics, and allostery in proteins with randomized cores and surfaces
The preprint is dated May 2024, the day after the manuscript was submitted to Science, so it is the version before peer review and revision:
https://www.biorxiv.org/content/10.1101/2024.05.11.593672v1.full
DM me for the PDF.
Structured abstract:
INTRODUCTION
Proteins typically contain hydrophobic amino acids buried in their cores and polar amino acids on their solvent-exposed surfaces. The rules governing which combinations of the 20 possible amino acids constitute stable and functional protein cores and surfaces are not well understood. This is partly because of the combinatorial explosion of possibilities when considering more than a few residues — experimental characterization of all combinations quickly becomes daunting.
RATIONALE
To better understand the genetic architecture of protein cores and surfaces, we designed experiments in which we quantified the stability of tens of thousands of proteins with randomized cores and surfaces, using reduced amino acid alphabets to bias toward stable combinations. For proteins with randomized cores, we also quantified their ability to bind to a ligand through a surface binding interface.
RESULTS
We found that very large numbers of proteins with randomized core or surface sequences are stable. However, we also observed that stable proteins with alternative core sequences quite frequently have impaired binding to a ligand; i.e., they are functionally impaired. We used our data to train energy models to accurately predict the stability and binding of proteins with randomized sequences. These models are simple and interpretable, with mutations having fixed additive energetic effects and a small contribution from energetic interactions between specific pairs of mutations. These energy models successfully identify the combinations of amino acids present in natural proteins that have evolved over more than a billion years, with only rare energetic interactions that we experimentally identify that prevent the transplantation of cores between highly diverged proteins.
CONCLUSION
Our results show that vast numbers of amino acid combinations can replace the core or surface of a small protein and that both the stability and binding of these proteins with randomized sequences can be predicted with simple energy models. These models also identify amino acid combinations present in natural proteins. However, changing the core of a small protein frequently disrupts its ability to bind a ligand, presumably through changes in surface conformation or altered dynamics. Indirect “allosteric” effects of mutations may thus be an important influence on the evolution of protein sequences.
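To make the "simple energy model" idea from the Results concrete, here is a minimal Python sketch of my own; it is not the authors' code. It follows the abstract only in spirit: each mutation contributes a fixed additive energy term, and a few specific pairs contribute an extra interaction term. Everything else is an assumption for illustration: the mutation labels, the energy values, the temperature, and the two-state Boltzmann conversion from folding energy to fraction folded, which the abstract does not specify.

```python
import math
from itertools import combinations

R_KCAL = 0.001987  # gas constant in kcal/(mol*K)
TEMP_K = 303.0     # assumed temperature; not taken from the paper


def folding_energy(mutations, wt_dg, additive, pairwise):
    """Folding free energy (kcal/mol) of a variant.

    Wild-type energy, plus one fixed term per mutation, plus an
    optional coupling term for each pair of mutations listed in `pairwise`.
    """
    dg = wt_dg + sum(additive[m] for m in mutations)
    # Pair keys are stored as alphabetically sorted tuples.
    for a, b in combinations(sorted(mutations), 2):
        dg += pairwise.get((a, b), 0.0)
    return dg


def fraction_folded(dg, temp_k=TEMP_K):
    """Assumed two-state model: negative dg means mostly folded."""
    return 1.0 / (1.0 + math.exp(dg / (R_KCAL * temp_k)))


# Hypothetical numbers, for illustration only.
WT_DG = -2.0
ADDITIVE = {"I30V": 0.5, "L18F": 1.0}
PAIRWISE = {("I30V", "L18F"): 0.3}  # one rare pairwise interaction

variant = ["L18F", "I30V"]
dg_variant = folding_energy(variant, WT_DG, ADDITIVE, PAIRWISE)
print(f"dG = {dg_variant:.2f} kcal/mol, "
      f"fraction folded = {fraction_folded(dg_variant):.3f}")
```

The point of the sketch is the parameter count. A model like this needs one number per mutation plus a handful of pair terms, rather than one number per sequence, which is why it can be fit from tens of thousands of measurements and still say something about the far larger space of unmeasured combinations.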
I’ll follow up tonight or tomorrow with an outline of the paper and some of my comments.