Impromptu Journal Club 2020-01-24

Dan_Eastwood · January 24, 2020, 11:39pm

I took a little time to do some reading about protein sequence space and function. I learned something relevant to many of the discussions here, and so I thought I would share the links.

(1) I knew some of this already, simply from exposure in multiple discussions, but I thought it was a good summary of what we can learn from directed evolution, and it's simple enough for a statistician to understand.

Dan_Eastwood · January 24, 2020, 11:41pm

(2) I’ve been tinkering with the mathematics of protein space on my own, but it seems that for practical reasons protein space isn’t as big as I thought it was.

Dryden, D. T., Thomson, A. R., & White, J. H. (2008). How much of protein sequence space has been explored by life on Earth? Journal of The Royal Society Interface, 5(25), 953-956.

https://royalsocietypublishing.org/doi/full/10.1098/rsif.2008.0085

Dan_Eastwood · January 24, 2020, 11:44pm

(3) Some things evolution doesn’t do.

Vermeij, G. J. (2015). Forbidden phenotypes and the limits of evolution. Interface Focus, 5(6), 20150028.

https://royalsocietypublishing.org/doi/full/10.1098/rsfs.2015.0028

Rumraket · January 25, 2020, 3:22pm

I have some issues with the 2nd paper, which I think makes numerous oversimplifications to argue its case. For example, the authors chose to reduce the amino acid alphabet to essentially two properties: hydrophilic vs hydrophobic. It just isn't true that this is the only constraint on what makes a functional protein. What they're arguing is closer to saying that evolution has explored much of a sort of "structure space", not really sequence space. But minor variations in structure really can have significant functional consequences.
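To put a rough number on how much collapsing the alphabet shrinks the space, here is a small Python sketch comparing a full 20-letter amino acid alphabet with a two-letter hydrophobic/hydrophilic one. The 100-residue length is an arbitrary illustrative choice, not a figure from Dryden et al., and this only takes the two-property reduction at face value as described above.

```python
import math

def log10_space_size(alphabet_size: int, length: int) -> float:
    """log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet_size)

# Hypothetical example length; 100 residues is an arbitrary choice.
LENGTH = 100
full = log10_space_size(20, LENGTH)    # full 20-letter amino acid alphabet
binary = log10_space_size(2, LENGTH)   # hydrophobic/hydrophilic pattern only

print(f"20-letter alphabet: ~10^{full:.1f} sequences")
print(f"2-letter alphabet:  ~10^{binary:.1f} patterns")
# 20^L / 2^L = 10^L, so on average each binary pattern stands in for 10^L sequences.
print(f"Sequences per binary pattern, on average: ~10^{full - binary:.1f}")
```

Under these assumptions every hydrophobic/hydrophilic pattern stands in for about 10^100 distinct sequences, which is roughly the gap between "structure space" and sequence space that the objection is about.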

The statement in the abstract that there is no role for contingency at the molecular level is also contradicted by numerous results from directed protein evolution (which show lots of epistasis) and from ancestral sequence reconstruction, where it has been shown that later-evolved functions depended critically on which mutations occurred ancestrally, because those mutations interact epistatically.

They are essentially saying that there are good reasons to think evolution could have found the structure and function of any known natural protein (and that if we re-ran the tape from the origin of life, most of the same structures would generally be expected to evolve again, i.e. lots of molecular structural convergence). I think that is true, and for the same reasons they state. But it is just not the same as saying that all of protein sequence space has been explored by life on Earth.

Rumraket · January 25, 2020, 3:35pm

Another good one I read recently is this:
https://www.pnas.org/content/106/Supplement_1/9995

Empirical Lessons from the Directed Evolution of Proteins

In this section, we offer what we consider to be some of the general lessons about protein adaptation that can be drawn from directed evolution experiments.

Many Desirable Protein Properties Can Be Improved Incrementally, Through Single Mutations.

Perhaps the most surprising result from directed evolution experiments is simply how effectively random mutation and selection are able to enhance target protein properties. In most cases where the researcher has been able to devise a high-throughput and sensitive screening assay, it has proved possible to find mutations that improve function (usually a catalytic activity or binding affinity). Directed evolution experiments naturally classify mutations as beneficial, neutral, or deleterious, depending on how they affect the target property. These studies tend to reach remarkably similar conclusions about the fractions of mutations that fall into each of these 3 classifications, despite applying different methodologies to different proteins to optimize different properties. Typically, ≈30–50% of random mutations are strongly deleterious (17–19), 50–70% are approximately neutral (17–19), and perhaps 0.5–0.01% are beneficial (20–26). These experiments therefore make clear that, in a laboratory context, it is almost always possible to find a substantial number of neutral mutations and usually at least a few that enhance stability or an existing function.

Most cases where directed evolution fails to immediately find beneficial mutations come when the bar is set too high, such as searching for activity on a new substrate on which the parent protein is completely inert. Such functional jumps may simply be too big for single mutations. However, these functions can usually still be generated by taking a more incremental path, as in the case described above where a cytochrome P450 became a propane hydroxylase by first becoming an octane hydroxylase (16, 27). A similar approach of identifying appropriate intermediate challenges was used to engineer a steroid receptor to respond to a novel ligand (28). In both cases, the target activity was absent in the initial parent protein, making it refractory toward improvement by any single mutation. Selection on the intermediate substrates gave rise to low levels of the target activities, which were rapidly improved by beneficial single mutations.

These findings indicate that directed protein evolution can usually avoid being stymied by local fitness peaks, where no further incremental improvements are possible. Concern about becoming trapped on local optima probably comes from viewing evolution as occurring on a landscape created by assigning a fitness to each possible genotype. Although fitness landscapes are conceptually valid constructs, the mind effectively visualizes only 3D spaces, which are often reduced to 2 dimensions for ease of representation on paper. However, a 300-residue protein can undergo 5,700 unique single amino acid mutations, each of which represents a different direction on the fitness landscape. For a protein to occupy a peak in such a multidimensional landscape, a step in each of these directions must lead to a decrease in fitness, meaning that all 5,700 possible mutations are deleterious. In contrast, every protein evolved in the laboratory has many possible neutral mutations, and often several beneficial ones, at least as measured by a specific biochemical assay. It may therefore be more helpful to think of protein evolution in terms of neutral networks (29, 30) rather than in terms of fitness peaks (see Fig. 3). The key difference is that fitness peaks imply a need for multiple simultaneous mutations to escape from a trap, whereas the neutral network view emphasizes the availability of many possible evolutionary pathways, which may include initially neutral and immediately beneficial mutations.
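The arithmetic and the definition of a peak in this excerpt can be made concrete. The Python sketch below enumerates all single amino acid substitutions (19 per position, hence 5,700 for a 300-residue protein), scales the quoted class fractions to that number, and checks whether a sequence is a local peak in the authors' sense. The 10-residue target and the match-counting fitness function are hypothetical stand-ins for a real assay, and applying the fractions to every possible substitution is my own simplifying assumption.

```python
from typing import Callable, Iterator

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def single_mutants(seq: str) -> Iterator[str]:
    """Yield every sequence one amino acid substitution away from seq."""
    for i, current in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != current:
                yield seq[:i] + aa + seq[i + 1:]

def is_local_peak(seq: str, fitness: Callable[[str], float]) -> bool:
    """True only if every single mutant is strictly less fit than seq.

    A single neutral neighbour (equal fitness) is enough to make this False.
    """
    f0 = fitness(seq)
    return all(fitness(m) < f0 for m in single_mutants(seq))

# Fraction ranges quoted in the excerpt. Applying them to every possible
# substitution is an assumption: random mutagenesis at the DNA level cannot
# reach all 19 substitutions at a position with a single nucleotide change.
FRACTIONS = {
    "deleterious": (0.30, 0.50),
    "neutral": (0.50, 0.70),
    "beneficial": (0.0001, 0.005),  # 0.01% to 0.5%
}

def expected_counts(n_mutants: int) -> dict:
    """Scale each (low, high) fraction range to a number of mutants."""
    return {cls: (lo * n_mutants, hi * n_mutants)
            for cls, (lo, hi) in FRACTIONS.items()}

# Hypothetical 10-residue "target"; toy fitness = number of matching positions.
TARGET = "MKTAYIAKQR"

def toy_fitness(seq: str) -> float:
    return sum(a == b for a, b in zip(seq, TARGET))

if __name__ == "__main__":
    n = sum(1 for _ in single_mutants("A" * 300))
    print(n)                                        # 5700 = 19 * 300
    for cls, (lo, hi) in expected_counts(n).items():
        print(f"{cls}: about {lo:.2f} to {hi:.2f}")
    print(is_local_peak(TARGET, toy_fitness))        # True
    print(is_local_peak("MKTAYIAKQA", toy_fitness))  # False: changing the last A to R helps
```

Scaled this way, a 300-residue protein would have somewhere between roughly 0.6 and 28 beneficial single substitutions, which is one way to see why a sensitive, high-throughput screen matters. The strict inequality in `is_local_peak` follows the excerpt's definition: one neutral neighbour is enough for a sequence not to count as a peak.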

Dan_Eastwood · January 27, 2020, 4:52pm

Those simplifications bother me too. I was mostly interested in the effort to even try to answer the question.

I am aware of the multidimensional aspect, and that fitness landscapes are not necessarily fixed.

Fitness peaks are easier to consider mathematically, but seem less essential because evolution does not require an optimum. Fitness “valleys” are more complex because there are many starting points and an even greater volume of multidimensional pathways. To make an analogy, we can’t single-step across the Grand Canyon in 3D, but there might be a short path across in 300D.

@Rumraket Thanks for your comments, I’ll do some reading on Neutral Networks.

glipsnort · January 27, 2020, 6:11pm

We just read that one in my little evolutionary reading group. Our conclusion was that it presented a number of interesting facts but that statements about missingness were hard to interpret since you can choose any traits you like and all traits are going to be missing in some lineages. More broadly, there wasn’t much in the way of a hypothesis to test.

stcordova · February 2, 2020, 6:53pm

HT: Paul Nelson

This relates to an ENCODE follow-on called 4D Nucleome.

https://advances.sciencemag.org/content/6/2/eaay4055

Here, we show that the reconciliation of these exotic properties necessitates modularizing three-dimensional genome into tree data structures on top of, and in striking contrast to, the linear topology of DNA double helix. These functional modules need to be connected and isolated by an open backbone that results in porous and heterogeneous packing in a quasi–self-similar manner, as revealed by our electron and optical imaging. Our multiscale theoretical and experimental results suggest the existence of higher-order universal folding principles for a disordered chromatin fiber to avoid entanglement and fulfill its biological functions.

John_Harshman · February 2, 2020, 8:46pm

“Nucleome”: yet another example of Bad -Omics.

stcordova · February 2, 2020, 11:27pm

“Nucleome”: yet another example of Bad -Omics.

Thank you for sharing your viewpoint.

But regarding the 4D Nucleome project at the NIH:

From the prestigious scientific journal Nature:
https://www.nature.com/articles/nature23884

John_Harshman · February 3, 2020, 1:39am

I love how you have to add “prestigious” to anything you agree with.

Dan_Eastwood · February 12, 2020, 11:00pm

John, can you expand that (just a little) for me? I don’t understand what you think is bad here. - Thanks.

John_Harshman · February 12, 2020, 11:49pm

This might be instructive.

Dan_Eastwood · February 13, 2020, 4:03pm

Very! thanks!!

T_aquaticus · February 13, 2020, 11:43pm

An interesting snippet from the paper in the opening post:

Hundreds of directed evolution experiments have demonstrated the ease with which proteins adapt to new challenges [11]. Notable recent examples include a recombinase evolved to remove proviral HIV from the host genome (providing a new strategy for treating retroviral infections) [12], a cytochrome P450 fatty acid hydroxylase that was converted into a highly efficient propane hydroxylase (thereby proving that a cytochrome P450 is fully capable of hydroxylating small alkanes, even though most propane-utilizing organisms utilize structurally and mechanistically-unrelated enzymes) [13], a more than 40 °C increase in the thermostability (T50) of lipase A (extending its application in biocatalysis to a whole new set of environments) [14], and a variant of GFP which tolerates having all its leucine residues replaced with a nonnatural amino acid, trifluoroleucine [15].

We often talk about the possibility of very different proteins having the same function, and this is the case with propane hydroxylases.

stcordova · February 17, 2020, 5:16am

Something my professor of Biomedical Science/Cellular Biology passed on to me:

Raising a glass to grapes' surprising genetic diversity | ScienceDaily

“Each of us inherits one copy of their gene from their mother and one from their father,” said Professor Gaut. “One would assume that the grapes inherit two copies of every gene, too, with one coming from each of their two parents. However, we found there was just one copy, not two, for 15 percent of the genes in Chardonnay, and it was also true of Cabernet Sauvignon grapes. Together, that means that grape varieties differ in the presence or absence of thousands of genes.”

The original source journal article was:

The population genetics of structural variants in grapevine domestication | Nature Plants

This could definitely be a consequence of domestication, but to my knowledge such losses in wild types have been neither confirmed nor refuted.

stcordova · February 17, 2020, 5:45am

Large-scale gene losses underlie the genome evolution of parasitic plant Cuscuta australis | Nature Communications

C. australis genome harbors 19,671 protein-coding genes, and importantly, 11.7% of the conserved orthologs in autotrophic plants are lost in C. australis. Many of these gene loss events likely result from its parasitic lifestyle and the massive changes of its body plan. Moreover, comparison of the gene expression patterns in Cuscuta prehaustoria/haustoria and various tissues of closely related autotrophic plants suggests that Cuscuta haustorium formation requires mostly genes normally involved in root development. The C. australis genome provides important resources for studying the evolution of parasitism, regressive evolution, and evo-devo in plant parasites.

stcordova · February 17, 2020, 7:21am

https://core.ac.uk/download/pdf/82185921.pdf

Gene Loss from a Plant Sex Chromosome System

Sex chromosomes have evolved independently in numerous animal and plant lineages. After recombination becomes suppressed between two homologous sex chromosomes, genes on the non-recombining Y chromosomes (and W chromosomes in ZW systems) undergo genetic degeneration, losing functions retained by their X- or Z-linked homologs, changing their expression, and becoming lost [1, 2].

In plants like this, are there unique genes on the sex chromosomes? It would seem that Muller's ratchet is in play for sex chromosomes.
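Muller's ratchet itself is easy to sketch. The toy Python simulation below follows a Haigh-style model of a non-recombining population, standing in for a non-recombining Y or W region. The population size, mutation rate `u` and selection coefficient `s` are arbitrary illustrative values, not estimates for any plant, and the function names are my own.

```python
import math
import random

def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson(lam) variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def mullers_ratchet(pop_size=200, u=0.5, s=0.02, generations=500, seed=1):
    """Toy Wright-Fisher simulation; returns the minimum load each generation.

    Each individual carries a count of deleterious mutations. There is no
    recombination and no back mutation, so once every copy of the
    least-loaded class is lost it can never be rebuilt.
    """
    rng = random.Random(seed)
    loads = [0] * pop_size
    min_load = []
    for _ in range(generations):
        # Multiplicative fitness: each mutation multiplies fitness by (1 - s).
        weights = [(1.0 - s) ** k for k in loads]
        # Parents are sampled in proportion to fitness (with replacement).
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Offspring inherit the parent's load plus Poisson(u) new mutations.
        loads = [k + poisson(rng, u) for k in parents]
        min_load.append(min(loads))
    return min_load

if __name__ == "__main__":
    history = mullers_ratchet()
    print("minimum load every 100 generations:", history[::100])
```

Because this model has neither recombination nor back mutation, the minimum load can only stay the same or rise, and each rise is one "click" of the ratchet. Recombination is what would let a population reassemble a less-loaded chromosome, which is why suppressed recombination between sex chromosomes matters here.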

John_Harshman · February 17, 2020, 4:40pm

Is there a point here? Is it relevant to anything?

stcordova · February 18, 2020, 3:13am

From:

https://www.genetics.org/content/181/1/3

… if there is one event in the whole evolutionary sequence at which my own mind lets my awe still overcome my instinct to analyse, and where I might concede that there may be a difficulty in seeing a Darwinian gradualism hold sway throughout almost all, it is this event—the initiation of meiosis.

W. D. Hamilton (1999, p. 419)

The paper puts forward speculative theories, not based on physics, chemistry or probability, as to how meiosis evolved naturally, but it was an interesting survey of what evolution has to overcome, and meiosis is certainly something that can't be shown to be mechanically feasible simply by appealing to phylogenetic methods.

The paper lays out the problem:

While meiosis almost certainly evolved from mitosis, it has not one but four novel steps: the pairing of homologous chromosomes, the occurrence of extensive recombination between non-sister chromatids during pairing, the suppression of sister-chromatid separation during the first meiotic division, and the absence of chromosome replication during the second meiotic division. This complexity presents a challenge to any Darwinian explanation of meiotic origins. While the simultaneous creation of these new features in one step seems impossible, their step-by-step acquisition via selection of separate mutations seems highly problematic, given that the entire sequence is required for reliable production of haploid chromosome sets. Both Maynard Smith (1978) and Hamilton (1999) regarded the origins of meiosis as one of the most difficult evolutionary problems.
