Impromptu Journal Club 2020-01-24

I took a little time to do some reading about protein sequence space and function. I learned something relevant to many of the discussions here, and so I thought I would share the links.

(1) I knew some of this already simply by exposure in multiple discussions, but I thought it was a good summary of what we can learn from directed evolution, and it’s simple enough for a statistician to understand. :slight_smile:


(2) I’ve been tinkering with the mathematics of protein space on my own, but it seems that for practical reasons protein space isn’t as big as I thought it was.

Dryden, D. T., Thomson, A. R., & White, J. H. (2008). How much of protein sequence space has been explored by life on Earth? Journal of the Royal Society Interface, 5(25), 953-956.


(3) Some things evolution doesn’t do.

Vermeij, G. J. (2015). Forbidden phenotypes and the limits of evolution. Interface Focus, 5(6), 20150028.


I have some issues with the 2nd paper, which I think makes numerous oversimplifications to argue its case. For example, it reduces the amino acid alphabet to essentially two properties: hydrophilic vs. hydrophobic. It just isn’t true that this is the only constraint operating on what makes a functional protein. What they’re arguing is more like saying that evolution has explored much of a sort of “structure space”, not really sequence space. But minor variations in structure really can have significant functional consequences.

The statement in the abstract that there is no role for contingency at the molecular level is also contradicted by numerous results of directed protein evolution (that show lots of epistasis), and ancestral sequence reconstruction, where it has been shown that later evolved functions were critically dependent on what mutations occurred ancestrally because they epistatically interact.

They are essentially saying that there are good reasons to think evolution could have found the structure and function of any known natural protein (and that, re-running the tape from the origin of life, most of the same structures would generally be expected to evolve again, i.e. lots of molecular structural convergence), which I think is true, and for the same reasons they state. But this is just not the same as saying that all of protein sequence space has been explored by life on Earth.
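To give a rough sense of how much the two-property simplification shrinks the space, here is a minimal sketch comparing the number of possible sequences under a 20-letter and a 2-letter alphabet. The chain length of 100 residues is an assumption chosen only for illustration, and the function name is hypothetical; this is just counting, not a reconstruction of the paper's own argument.

```python
import math


def log10_space_size(alphabet_size: int, length: int) -> float:
    """Return log10 of the number of sequences of `length` over an alphabet."""
    return length * math.log10(alphabet_size)


LENGTH = 100  # assumed chain length, for illustration only

full = log10_space_size(20, LENGTH)    # all 20 amino acids
reduced = log10_space_size(2, LENGTH)  # hydrophobic vs. hydrophilic only

print(f"20-letter alphabet: about 10^{full:.0f} sequences")
print(f" 2-letter alphabet: about 10^{reduced:.0f} sequences")
print(f"reduction factor:   about 10^{full - reduced:.0f}")
```

The point of the comparison is that a claim about having explored the reduced space says very little about the full one: each two-letter pattern stands in for an enormous number of distinct amino acid sequences.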


Another good one I read recently is this:

Empirical Lessons from the Directed Evolution of Proteins

In this section, we offer what we consider to be some of the general lessons about protein adaptation that can be drawn from directed evolution experiments.

Many Desirable Protein Properties Can Be Improved Incrementally, Through Single Mutations.

Perhaps the most surprising result from directed evolution experiments is simply how effectively random mutation and selection are able to enhance target protein properties. In most cases where the researcher has been able to devise a high-throughput and sensitive screening assay, it has proved possible to find mutations that improve function (usually a catalytic activity or binding affinity). Directed evolution experiments naturally classify mutations as beneficial, neutral, or deleterious, depending on how they affect the target property. These studies tend to reach remarkably similar conclusions about the fractions of mutations that fall into each of these 3 classifications, despite applying different methodologies to different proteins to optimize different properties. Typically, ≈30–50% of random mutations are strongly deleterious (17–19), 50–70% are approximately neutral (17–19), and perhaps 0.5–0.01% are beneficial (20–26). These experiments therefore make clear that, in a laboratory context, it is almost always possible to find a substantial number of neutral mutations and usually at least a few that enhance stability or an existing function.

Most cases where directed evolution fails to immediately find beneficial mutations come when the bar is set too high, such as searching for activity on a new substrate on which the parent protein is completely inert. Such functional jumps may simply be too big for single mutations. However, these functions can usually still be generated by taking a more incremental path, as in the case described above where a cytochrome P450 became a propane hydroxylase by first becoming an octane hydroxylase (16, 27). A similar approach of identifying appropriate intermediate challenges was used to engineer a steroid receptor to respond to a novel ligand (28). In both cases, the target activity was absent in the initial parent protein, making it refractory toward improvement by any single mutation. Selection on the intermediate substrates gave rise to low levels of the target activities, which were rapidly improved by beneficial single mutations.

These findings indicate that directed protein evolution can usually avoid being stymied by local fitness peaks, where no further incremental improvements are possible. Concern about becoming trapped on local optima probably comes from viewing evolution as occurring on a landscape created by assigning a fitness to each possible genotype. Although fitness landscapes are conceptually valid constructs, the mind effectively visualizes only 3D spaces, which are often reduced to 2 dimensions for ease of representation on paper. However, a 300-residue protein can undergo 5,700 unique single amino acid mutations, each of which represents a different direction on the fitness landscape. For a protein to occupy a peak in such a multidimensional landscape, a step in each of these directions must lead to a decrease in fitness, meaning that all 5,700 possible mutations are deleterious. In contrast, every protein evolved in the laboratory has many possible neutral mutations, and often several beneficial ones, at least as measured by a specific biochemical assay. It may therefore be more helpful to think of protein evolution in terms of neutral networks (29, 30) rather than in terms of fitness peaks (see Fig. 3). The key difference is that fitness peaks imply a need for multiple simultaneous mutations to escape from a trap, whereas the neutral network view emphasizes the availability of many possible evolutionary pathways, which may include initially neutral and immediately beneficial mutations.
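The 5,700 figure in the excerpt is just 300 positions times 19 alternative amino acids. As a rough sketch, the arithmetic can be combined with the fractions quoted earlier in the excerpt (50–70% neutral, 0.01–0.5% beneficial) to see what they would imply for one protein. Applying those fractions uniformly to all single amino acid substitutions is my own simplifying assumption, since the quoted percentages come from random mutagenesis experiments, and the function names are hypothetical.

```python
def single_mutant_count(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct single amino acid substitutions for a protein."""
    return length * (alphabet_size - 1)


def expected_range(total: int, fraction_range: tuple) -> tuple:
    """Scale a (low, high) fraction range by the total number of mutants."""
    low, high = fraction_range
    return total * low, total * high


# Fractions quoted in the excerpt; treating them as uniform is an assumption.
NEUTRAL = (0.50, 0.70)
BENEFICIAL = (0.0001, 0.005)  # 0.01% to 0.5%

n = single_mutant_count(300)
neutral_lo, neutral_hi = expected_range(n, NEUTRAL)
benef_lo, benef_hi = expected_range(n, BENEFICIAL)

print(f"single mutants of a 300-residue protein: {n}")
print(f"roughly neutral:    {neutral_lo:.0f} to {neutral_hi:.0f}")
print(f"roughly beneficial: {benef_lo:.1f} to {benef_hi:.1f}")
```

Under these assumptions a typical protein would have thousands of neutral neighbors and somewhere between about one and a few dozen beneficial ones, which is the intuition behind preferring the neutral network picture over isolated fitness peaks.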


Those simplifications bother me too. I was mostly interested in the effort to even try to answer the question.

I am aware of the multidimensional aspect, and that fitness landscapes are not necessarily fixed.

Fitness peaks are easier to consider mathematically, but seem less essential because evolution does not require an optimum. Fitness “valleys” are more complex because there are many starting points and an even greater volume of multidimensional pathways. To make an analogy, we can’t single-step across the Grand Canyon in 3D, but there might be a short path across in 300D.

@Rumraket Thanks for your comments, I’ll do some reading on Neutral Networks. :slight_smile:


We just read that one in my little evolutionary reading group. Our conclusion was that it presented a number of interesting facts but that statements about missingness were hard to interpret since you can choose any traits you like and all traits are going to be missing in some lineages. More broadly, there wasn’t much in the way of a hypothesis to test.


HT: Paul Nelson

This relates to an ENCODE follow-on called 4D Nucleome.

Physical and data structure of 3D genome | Science Advances

Here, we show that the reconciliation of these exotic properties necessitates modularizing three-dimensional genome into tree data structures on top of, and in striking contrast to, the linear topology of DNA double helix. These functional modules need to be connected and isolated by an open backbone that results in porous and heterogeneous packing in a quasi–self-similar manner, as revealed by our electron and optical imaging. Our multiscale theoretical and experimental results suggest the existence of higher-order universal folding principles for a disordered chromatin fiber to avoid entanglement and fulfill its biological functions.

“Nucleome”: yet another example of Bad -Omics.


“Nucleome”: yet another example of Bad -Omics.

Thank you for sharing your viewpoint.

But regarding the 4D Nucleome project at the NIH:

From the prestigious scientific journal Nature:

I love how you have to add “prestigious” to anything you agree with.


John, can you expand that (just a little) for me? I don’t understand what you think is bad here. - Thanks.

This might be instructive.


Very! thanks!! :grinning:

An interesting snippet from the paper in the opening post:

We often talk about the possibility of very different proteins having the same function, and this is the case with propane hydroxylases.


Something my professor of Biomedical Science/Cellular Biology passed onto me:

Raising a glass to grapes' surprising genetic diversity: Could contribute to wine's varying flavors, aromas, researchers say -- ScienceDaily

“Each of us inherits one copy of their gene from their mother and one from their father,” said Professor Gaut. “One would assume that the grapes inherit two copies of every gene, too, with one coming from each of their two parents. However, we found there was just one copy, not two, for 15 percent of the genes in Chardonnay, and it was also true of Cabernet Sauvignon grapes. Together, that means that grape varieties differ in the presence or absence of thousands of genes.”

The original source journal article was:

This could well be a consequence of domestication, but to my knowledge such losses in the wild types have been neither confirmed nor refuted.

C. australis genome harbors 19,671 protein-coding genes, and importantly, 11.7% of the conserved orthologs in autotrophic plants are lost in C. australis. Many of these gene loss events likely result from its parasitic lifestyle and the massive changes of its body plan. Moreover, comparison of the gene expression patterns in Cuscuta prehaustoria/haustoria and various tissues of closely related autotrophic plants suggests that Cuscuta haustorium formation requires mostly genes normally involved in root development. The C. australis genome provides important resources for studying the evolution of parasitism, regressive evolution, and evo-devo in plant parasites.

Gene Loss from a Plant Sex Chromosome System

Sex chromosomes have evolved independently in numerous animal and plant lineages. After recombination becomes suppressed between two homologous sex chromosomes, genes on the non-recombining Y chromosomes (and W chromosomes in ZW systems) undergo genetic degeneration, losing functions retained by their X- or Z-linked homologs, changing their expression, and becoming lost [1, 2].

In plants like this, are there unique genes on the sex chromosomes? It would seem that Muller’s ratchet is in play for sex chromosomes.

Is there a point here? Is it relevant to anything?


… if there is one event in the whole evolutionary sequence at which my own mind lets my awe still overcome my instinct to analyse, and where I might concede that there may be a difficulty in seeing a Darwinian gradualism hold sway throughout almost all, it is this event—the initiation of meiosis.

W. D. Hamilton (1999, p. 419)

The paper puts forward speculative theories, not based on physics, chemistry or probability, as to how meiosis evolved naturally. Still, it was an interesting survey of what evolution has to overcome, and that is certainly something that can’t be proven mechanically feasible simply by appealing to phylogenetic methods.

The paper lays out the problem:

While meiosis almost certainly evolved from mitosis, it has not one but four novel steps: the pairing of homologous chromosomes, the occurrence of extensive recombination between non-sister chromatids during pairing, the suppression of sister-chromatid separation during the first meiotic division, and the absence of chromosome replication during the second meiotic division. This complexity presents a challenge to any Darwinian explanation of meiotic origins. While the simultaneous creation of these new features in one step seems impossible, their step-by-step acquisition via selection of separate mutations seems highly problematic, given that the entire sequence is required for reliable production of haploid chromosome sets. Both Maynard Smith (1978) and Hamilton (1999) regarded the origins of meiosis as one of the most difficult evolutionary problems.