A Response to David Gelernter’s Attack on Evolution

One must be very careful to distinguish what some might imagine to be true from what has been demonstrated empirically to be true. Dan Tawfik has stated that all theories about the origin of new protein folds are based primarily on speculation. Ignoring the implications of hard data by appealing to speculative theories is like a trial lawyer ignoring video footage of a crime in favor of the testimony of his client’s imaginary friends.

The hard evidence points to the following:

  • Enzymes can only evolve to the extent that the structure does not change, the active site remains basically the same, and the catalyzed chemistry is similar. Tawfik labeled these changes as micro-transitions.
  • Evolving a new protein fold requires an evolving gene to pass through regions of sequence space without any function.
  • A straightforward mathematical analysis of studies on the effect of random mutations on protein stability/function demonstrates that sequences corresponding to functional proteins are exceedingly rare.

The analysis of protein rarity is now much more accessible to the public. Doug Axe’s 2004 JMB article was extremely difficult for anyone who was not an expert in the field to understand. Consequently, critiques of his work could rely on erroneous arguments, and the public was powerless to identify the errors.

In contrast, Tawfik’s experiments are much easier to interpret. For instance, roughly half of all beta-lactamase mutants with three random amino acid changes are still functional. Three changes correspond to a 1% alteration of the initial sequence. Nearly all mutants with 10% of the sequence randomly altered are nonfunctional. In comparison, a short paragraph with 10% of its letters changed is still largely readable. Therefore, functional protein sequences are rarer than readable English paragraphs.
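The arithmetic behind these figures can be sketched in a few lines of Python. This is only a rough illustration, not Tawfik’s own analysis: it takes the two numbers quoted above (half of mutants functional after three changes, three changes being 1% of the sequence) and adds one simplifying assumption that is not in the source, namely that each random substitution independently preserves function with the same probability. Real proteins do not behave this simply, so the result should be read as an order-of-magnitude guide. The function names are hypothetical.

```python
# Illustrative sketch only. ASSUMPTION (not from the source): every random
# amino acid substitution independently preserves function with the same
# probability q. The function names below are hypothetical.

def per_mutation_survival(fraction_functional: float, n_mutations: int) -> float:
    """Solve q**n_mutations = fraction_functional for q."""
    return fraction_functional ** (1.0 / n_mutations)


def fraction_functional_after(q: float, n_mutations: int) -> float:
    """Fraction of mutants still functional after n_mutations substitutions."""
    return q ** n_mutations


# Figures quoted in the text: three changes are about 1% of the sequence,
# and about half of such mutants remain functional.
mutations_at_1_percent = 3
fraction_at_1_percent = 0.5

# Sequence length implied by "3 changes = 1%".
implied_length = round(mutations_at_1_percent / 0.01)

# Per-substitution survival probability under the independence assumption.
q = per_mutation_survival(fraction_at_1_percent, mutations_at_1_percent)

# A 10% alteration of the same sequence.
mutations_at_10_percent = round(0.10 * implied_length)
predicted = fraction_functional_after(q, mutations_at_10_percent)

print(f"implied sequence length: {implied_length} residues")
print(f"per-substitution survival probability: {q:.3f}")
print(f"substitutions at 10% alteration: {mutations_at_10_percent}")
print(f"predicted functional fraction at 10%: {predicted:.6f}")
```

Under this simple model, thirty substitutions amount to ten successive halvings, so the predicted functional fraction is about one in a thousand. That is in line with the statement that nearly all mutants with 10% of the sequence altered are nonfunctional.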

In addition, a large proportion of proteins consist of combinations of a limited number of domains just as a limited number of words are used in most sentences. This pattern was described by Scaiewicz and Levitt, and they identified numerous other similarities between protein sequences and human language including syntax, semantics, grammar, and the importance of context.

This observation bears on a common error: the claim that estimates of protein rarity exaggerate the difficulty of finding a functional target, since other proteins, or other distinct versions of the same protein, might exist that could perform the same function. A multitude of alternative targets could dramatically increase the odds of finding one of them. Yet this possibility seems remote given the extremely low probability of a random search entering a target region. It is also challenged by the fact that newly discovered multidomain proteins are very often “combinations of domains characterized by a limited number of sequence profiles.” If sequence space contained such vast numbers of targets, newly discovered proteins should not repeat the same sequence and structural patterns so often.

In addition, the bacterial population exceeds the population of most eukaryotic taxa by many orders of magnitude (e.g., 10^30 bacteria versus 10^13 trees). Yet the percentage of taxonomically restricted genes (TRGs) in ash trees is 25%, to name just one example, and this percentage is at least as large as the percentage in most taxa of bacteria. Some have argued that TRG estimates are greatly exaggerated due to limited sampling, but a recent paper from Carvunis’ lab challenges this argument. The fact that TRG numbers in bacteria do not vastly outnumber those in eukaryotic species also strongly suggests that sequence space is not supersaturated with targets.