Came across this extremely interesting paper on bioRxiv:
Engineers routinely design systems to be modular and symmetric in order to increase robustness to perturbations and to facilitate alterations at a later date. Biological structures also frequently exhibit modularity and symmetry, but the origin of such trends is much less well understood. It can be tempting to assume – by analogy to engineering design – that symmetry and modularity arise from natural selection. But evolution, unlike engineers, cannot plan ahead, and so these traits must also afford some immediate selective advantage which is hard to reconcile with the breadth of systems where symmetry is observed. Here we introduce an alternative non-adaptive hypothesis based on an algorithmic picture of evolution. It suggests that symmetric structures preferentially arise not just due to natural selection, but also because they require less specific information to encode, and are therefore much more likely to appear as phenotypic variation through random mutations. Arguments from algorithmic information theory can formalise this intuition, leading to the prediction that many genotype-phenotype maps are exponentially biased towards phenotypes with low descriptional complexity. A preference for symmetry is a special case of this bias towards compressible descriptions. We test these predictions with extensive biological data, showing that protein complexes, RNA secondary structures, and a model gene-regulatory network all exhibit the expected exponential bias towards simpler (and more symmetric) phenotypes. Lower descriptional complexity also correlates with higher mutational robustness, which may aid the evolution of complex modular assemblies of multiple components.
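To get a feel for what "biased towards phenotypes with low descriptional complexity" means, here is a toy sketch of my own. It is not the paper's polyomino, RNA, or gene-network model. I assume a made-up instruction set where a genotype is a string of 8 instructions (`A` = append A, `B` = append B, `D` = duplicate everything built so far) and the phenotype is the resulting string, capped at 8 characters. All names and sizes here are arbitrary choices for illustration.

```python
from collections import Counter
from itertools import product

OUTPUT_LEN = 8   # phenotype length cap (arbitrary choice for this toy)
GENOME_LEN = 8   # instructions per genotype (arbitrary choice)
OPS = "ABD"      # hypothetical instruction set: append A, append B, duplicate


def develop(genotype: str) -> str:
    """Run the toy program and return the phenotype string it builds."""
    out = ""
    for op in genotype:
        if op == "D":
            out += out          # copy everything built so far
        else:
            out += op           # append a single literal symbol
        out = out[:OUTPUT_LEN]  # growth stops once the cap is reached
    return out


def phenotype_counts() -> Counter:
    """Count how many of the 3**GENOME_LEN genotypes map to each phenotype."""
    return Counter(develop("".join(g)) for g in product(OPS, repeat=GENOME_LEN))


counts = phenotype_counts()

# Compare two repetitive phenotypes with one irregular phenotype.
for pheno in ("AAAAAAAA", "ABABABAB", "ABBABAAB"):
    print(pheno, counts[pheno])
```

Because every genotype is enumerated, no selection is involved at all. The repetitive strings still come out far more often than the irregular one, simply because a short prefix plus a few duplications is enough to specify them and the remaining instructions are free to vary.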
Symmetry in protein quaternary structure and polyominoes
We first explore evidence for this algorithmic hypothesis by studying protein quaternary structure, which describes the multimeric complexes into which many proteins self-assemble in order to perform key cellular functions (Fig. 1A and Supporting Information (SI) Fig S1 and section S1). These complexes can form in the cell if proteins evolve attractive interfaces allowing them to bind to each other [14–16]. We analysed a curated set of 34,287 protein complexes extracted from the Protein Data Bank (PDB) that were categorised into 120 different bonding topologies. In Fig. 1B, we plot, for all complexes involving 6 subunits (6-mers), the frequency with which a protein complex of topology p appears against the descriptional complexity K̃(p), an approximate measure of its true Kolmogorov assembly complexity K(p), defined here as the minimal number of distinct interfaces required to assemble the given structure under general self-assembly rules (Methods). Here K̃(p) can also be thought of as a measure of the minimal number of evolutionary innovations needed to make a self-assembling complex. The highest probability structures all have relatively low K̃(p). Since structures with higher symmetry need less information to describe [6, 9–11], the most frequently observed complexes are also highly symmetric. Fig. 1C and Figs S2A & S3A further demonstrate that structures found in the PDB are significantly more symmetric than the set of all possible 6-mers (Methods). Similar biases towards high symmetry structures obtain for other sizes (Fig S2B).
So basically, the protein structures that evolve most easily are symmetric complexes formed by self-assembly of the same basic unit. They are likely to arise because they require very little information to describe.
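The "little information to describe" part can be illustrated with an off-the-shelf compressor. This is only a rough sketch under my own assumptions: it uses zlib's compressed size as a crude stand-in for descriptional complexity, and strings of letters as a stand-in for structures. It is not the interface-counting measure the paper uses, and the sequences are randomly generated, not real proteins.

```python
import random
import zlib

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def compressed_size(seq: str) -> int:
    """Bytes needed by zlib to store seq: a crude proxy for description length."""
    return len(zlib.compress(seq.encode("ascii"), 9))


rng = random.Random(0)  # fixed seed so the example is reproducible

# A made-up 20-residue unit repeated six times (the "symmetric" case) ...
unit = "".join(rng.choice(AMINO_ACIDS) for _ in range(20))
repeated = unit * 6

# ... versus a made-up sequence of the same total length with no repeats built in.
irregular = "".join(rng.choice(AMINO_ACIDS) for _ in range(len(repeated)))

print("repeated :", len(repeated), "letters ->", compressed_size(repeated), "bytes")
print("irregular:", len(irregular), "letters ->", compressed_size(irregular), "bytes")
```

The repeated sequence compresses to far fewer bytes because the compressor only has to store the unit once plus "copy this again". That is the same intuition as needing only one kind of interface to build a ring out of identical subunits.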
There is really a lot of data supporting the case made in this paper, which also seems to support the notion that de novo proteins (as well as the most ancient proteins, the first to evolve in the history of life) are likely to begin as repeats of small self-assembling peptides. It is probably no coincidence that novel genes like T-urf13 and VPU1 also exhibit these properties: a larger symmetric structure that arises from simple repetition of a smaller one (symmetrical channels formed by self-assembling copies of a simple helix). And then there are things like the icosahedral structure of bacteriophages and the capsids of innumerable other viruses.