Is the Genetic Code Well-Designed?

Hi Vincent,

I’m not disagreeing with you, merely curious about the numbers you are using here. I read some of your post at TSZ, but may have missed a detailed explanation of these numbers.



That other potential codes exist which are “better” is an interesting idea; better than what, exactly? Faster? Less resource-dependent? Just as a matter of the “big picture” when it comes to such ostensibly “better” codes, would we really want to live in a world with more efficient, say, predatory capabilities? Or, was it wise of an Intelligence not to overdo it? : )

Hi Curtis,

The figures I’m using come from a paper by Dr. Eugene Koonin titled Frozen Accident Pushing 50: Stereochemistry, Expansion, and Chance in the Evolution of the Genetic Code (Life 2017, 7(2), 22; doi:10.3390/life7020022). In it, Koonin presents what seems to be powerful prima facie evidence against the hypothesis that the standard genetic code (SGC) was designed. The problem is that despite the SGC’s impressive ability to minimize the damage done by mutational and translational errors, there are lots of other genetic codes that are even better:

Quantitative analyses of the code using cost functions derived from physico-chemical properties of amino acids or their evolutionary exchangeability have confirmed the exceptional robustness of the standard genetic code (SGC): the probability to reach the same level of error minimization as in the SGC by random permutation of codons is below 10^−6 [10,13,14,15]. However, the SGC is far from being optimal because, given the enormous overall number of possible codes (>10^84), billions of variants are even more robust to error [10].
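A figure like “>10^84” can be reproduced under one common assumption: count every way to assign the 64 codons to 21 meanings (20 amino acids plus stop) so that each meaning gets at least one codon. This is a sketch of that counting rule only; Koonin’s paper may define “possible code” differently.

```python
from math import comb


def surjections(n_items, n_labels):
    """Number of ways to map n_items onto n_labels so that every label is used
    at least once, computed by inclusion-exclusion."""
    return sum((-1) ** k * comb(n_labels, k) * (n_labels - k) ** n_items
               for k in range(n_labels + 1))


# Assumed definition: 64 codons, 21 meanings (20 amino acids + stop),
# each meaning assigned to at least one codon.
n_codes = surjections(64, 21)

if __name__ == "__main__":
    print(f"possible codes: {n_codes:.3e}")
```

Python’s arbitrary-precision integers keep the alternating sum exact, so no floating-point cancellation creeps in.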

Extensive quantitative analyses that employed cost functions differently derived from physico-chemical properties of amino acids have shown that the code is indeed highly resilient, with the probability to pick an equally robust random code being on the order of 10^−7 to 10^−8 [14,15,61,62,63,64,65,66,67,68]. Obviously, however, among the ~10^84 possible random codes, there is a huge number with a higher degree of error minimization than the SGC [standard genetic code – VJT]. Furthermore, the SGC is not a local peak on the code fitness landscape because certain local rearrangements can increase the level of error minimization; quantitatively, the SGC is positioned roughly halfway from an average random code to the summit of the corresponding local peak [15] (Figure 1).
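The permutation test behind probabilities like these can be sketched as follows. The assumptions are mine, not Koonin’s: Kyte-Doolittle hydropathy stands in for the published cost functions (which, per the passages above, were derived from physico-chemical properties or evolutionary exchangeability), every single-point substitution counts equally, and a “random code” keeps the SGC’s block structure and stop codons while shuffling which amino acid owns each block. The fraction this produces should not be expected to match the published 10^−6 to 10^−8 figures.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
SGC = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Kyte-Doolittle hydropathy, used here as an assumed stand-in property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
INDEX = {c: i for i, c in enumerate(CODONS)}
# Every single-point substitution, as (from_index, to_index) pairs.
MUTATIONS = [(i, INDEX[c[:p] + b + c[p + 1:]])
             for i, c in enumerate(CODONS) for p in range(3) for b in BASES if b != c[p]]


def cost(code):
    """Mean squared hydropathy change over substitutions between sense codons;
    substitutions to or from a stop codon are ignored."""
    diffs = [(HYDROPATHY[code[i]] - HYDROPATHY[code[j]]) ** 2
             for i, j in MUTATIONS if code[i] != "*" and code[j] != "*"]
    return sum(diffs) / len(diffs)


def random_code(rng, code=SGC):
    """Shuffle which amino acid owns each synonymous block; stop codons stay put."""
    aas = sorted(set(code) - {"*"})
    shuffled = aas[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(aas, shuffled))
    mapping["*"] = "*"
    return "".join(mapping[x] for x in code)


def fraction_at_least_as_good(trials=2000, seed=0):
    """Estimate how often a random code is at least as error-robust as the SGC."""
    rng = random.Random(seed)
    sgc_cost = cost(SGC)
    hits = sum(cost(random_code(rng)) <= sgc_cost for _ in range(trials))
    return sgc_cost, hits / trials


if __name__ == "__main__":
    sgc_cost, frac = fraction_at_least_as_good()
    print(f"SGC cost {sgc_cost:.3f}; fraction of random codes at least as good: {frac:.4f}")
```

A true probability near 10^−6 would need millions of trials to estimate directly; the sketch only shows the shape of the procedure.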

I submit that if the standard genetic code were situated on a local peak, it would at least be a choiceworthy code for a designer to produce. But when it’s halfway from an average random code to the summit of the local peak, I have to ask myself: why would a designer pick a code like that?


That the ostensible Designer chose to instantiate a 33⅓ rpm world instead of the possible 78 rpm one seems a weak criticism, given that both systems, literally, “revolve around” a well-engineered record player. Perhaps there was a good reason to choose the creative potential of the “lower settings”?


Thanks for the reference, Vincent. I’ve started taking a look at it, and plan to study it more over my break! This is a very interesting “op-ed” piece from Koonin.

This is a possibility, but the true “creative potential” of life seems to center around the DNA level (unless you are thinking of a scenario prior to cells as we now know them). I don’t see a reason why the “coding” events from DNA to functional protein would need to have creative potential when the capacity for DNA variability is already present. I haven’t yet given sufficient thought to alternative explanations, but I think Vincent’s question is quite insightful.

Figure 1 is the heart of the argument:

Figure 1. The code fitness landscape. The figure is a cartoon illustration of peaks of different heights separated by low-fitness valleys on the code fitness landscape. The summit of each peak corresponds to a local optimum (O). Evolution towards local peaks is shown by arrows and starts either from a random code (R) or from the SGC. Modified from [14], under Creative License.
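The arrows in Figure 1 describe local improvement toward a peak. A minimal greedy version is sketched below under assumptions of my own: Kyte-Doolittle hydropathy stands in for the cost function used in [15], a “local rearrangement” means swapping the codon blocks of two amino acids, and random codes shuffle amino acids among the SGC’s blocks. It climbs from the SGC until no swap helps, then reports where the SGC sits between the average random code (0) and that local summit (1). The number it prints is illustrative and is not expected to reproduce the “roughly halfway” result.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, ...
SGC = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Kyte-Doolittle hydropathy: an assumed stand-in for the published cost functions.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
INDEX = {c: i for i, c in enumerate(CODONS)}
MUTATIONS = [(i, INDEX[c[:p] + b + c[p + 1:]])
             for i, c in enumerate(CODONS) for p in range(3) for b in BASES if b != c[p]]


def cost(code):
    """Mean squared hydropathy change over single-point substitutions between sense codons."""
    d = [(KD[code[i]] - KD[code[j]]) ** 2 for i, j in MUTATIONS if "*" not in (code[i], code[j])]
    return sum(d) / len(d)


def swap(code, a, b):
    """Exchange the codon blocks assigned to amino acids a and b."""
    return "".join(b if x == a else a if x == b else x for x in code)


def hill_climb(code):
    """Greedy ascent on the fitness landscape (descent in cost): apply the best
    cost-lowering block swap until no swap lowers the cost."""
    aas = sorted(set(code) - {"*"})
    current = cost(code)
    while True:
        best, a, b = min((cost(swap(code, x, y)), x, y)
                         for i, x in enumerate(aas) for y in aas[i + 1:])
        if best >= current:
            return code, current  # a local optimum, like "O" in Figure 1
        code, current = swap(code, a, b), best


def position_toward_peak(samples=500, seed=0):
    """0 = mean cost of random block-shuffled codes, 1 = local peak reached from the SGC."""
    rng = random.Random(seed)
    aas = sorted(set(SGC) - {"*"})
    total = 0.0
    for _ in range(samples):
        perm = aas[:]
        rng.shuffle(perm)
        mapping = dict(zip(aas, perm))
        mapping["*"] = "*"
        total += cost("".join(mapping[x] for x in SGC))
    mean_random = total / samples
    _, peak_cost = hill_climb(SGC)
    return (mean_random - cost(SGC)) / (mean_random - peak_cost)


if __name__ == "__main__":
    print(f"SGC position between random (0) and local peak (1): {position_toward_peak():.2f}")
```

Steepest-ascent climbing is only one way to define “the corresponding local peak”; a different move set or cost function can put the summit somewhere else.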

I think the “perfection” of the genetic code is a myth, and so is the claim that its perfection is a convincing argument for design. Likewise, I find the “bad design” arguments unconvincing. They assume we know the design goals, and they also imply that if the system evolved, it was not designed. Maybe God designed with evolution, right?

My counterargument is threefold.

  1. Maybe the code is optimizing a different design goal. Perhaps error tolerance is important, but other things are too. We cannot know. Perhaps we do not want too much error tolerance, because we want some errors so that things can evolve. In this case, evolution itself might be the design goal, which might benefit from a trade-off between error and accuracy.

  2. Perhaps what we see is not so important to optimize regardless, and just records the history of what happened here, and perhaps the path required to arrive at a genetic code. It could be a “frozen accident”, though the more neutral terminology would be a roll of the dice. Or it could be #3.

  3. This might even be the only viable (or vastly most likely) code possible given the constraints of chemistry, if we require a stepwise path from a simple code to a complex one. In this case, it would not even properly be called a code (!). One possible constraint (and we cannot know the constraints for sure) is that it has to be able to encode useful proteins with a limited (and easy to find) set of amino acids and nucleic acids. Perhaps only a very small number of codes can do this, which constrains the starting point dramatically. Then amino acids could be added a few at a time, and so on.

I’m not sure which of these is correct, but who could know this anyway? There are just too many unknowns. We know that the genetic code works well enough, but neither “good design” nor “bad design” appears to be clear from the evidence. In context, @vjtorley is disputing the mythology of “perfection in the Genetic Code is evidence it was designed de novo, without an incremental process”; so I do ultimately agree with @vjtorley’s argument, but I would resist taking it too far.


Except… The “landscape” is n-dimensional where ‘n’ is much, much greater than 3.
…and changes over time.

Which is to say, ‘optimality’ is terrifically hard to assess, and life just has to be ‘good enough’ at any particular point in time.


Hi Joshua,

Thanks very much for your helpful suggestions. I think #2 and #3 get to the heart of the matter. Perhaps the genetic code we see now has a long history and evolved in a stepwise fashion. If that is the case, then its present level of fitness makes a lot more sense.

The real question we need to ask, then, is the one you touched on: is it a code?


The genetic code is a code only by weak analogy. As with everything in biology, the analogy to human designs is very weak. We can list similarities, differences, and exceptions.

My counterargument is threefold.
Maybe the code is optimizing a different design goal. Perhaps error tolerance is important, but other things are too. We cannot know. Perhaps we do not want too much error tolerance, because we want some errors so that things can evolve. In this case, evolution itself might be the design goal, which might benefit from a trade-off between error and accuracy.

Other authors have included the following goals:

We can never know if we have identified all of the goals that take part in the constrained optimization problem. Yet the code seems very optimized for the goals that we have identified. A useful research project would be to estimate the rate at which new codes could potentially form, and then compare the upper limit on the number of trials with the estimated optimality of the actual code.
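The comparison proposed here becomes simple arithmetic once a per-trial success probability is fixed. This sketch assumes each trial is an independent, uniformly random code (it ignores any stepwise path) and borrows the per-trial probabilities from the Koonin passages quoted earlier in the thread, 10^−6 to 10^−8. The helper name is hypothetical.

```python
import math


def trials_for_probability(p_hit, target=0.5):
    """Smallest number T of independent random draws such that
    1 - (1 - p_hit)**T >= target, i.e. at least one hit is this likely."""
    return math.ceil(math.log1p(-target) / math.log1p(-p_hit))


if __name__ == "__main__":
    # Per-draw probabilities taken from the quoted passages (assumed independent draws).
    for p in (1e-6, 1e-7, 1e-8):
        print(f"p = {p:.0e}: expected hits per 10^6 draws = {p * 1e6:g}, "
              f"draws for a 50% chance of one hit = {trials_for_probability(p):,}")
```

`math.log1p` keeps the logarithm accurate when `p_hit` is tiny. The open half of the comparison is the other number: how many trials were actually available.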

My point exactly.

The question, however, is different: how was it optimized? By God directly creating the first cell? Or by lower forms of life optimizing a code through natural selection and drift? I’m just not sure we can tell at this point, can we? There are just too many unknowns. That makes this a pretty weak argument either for or against design.

Of course, it is reasonable for those who believe in God to think God created the first cell. It is also reasonable for scientists to try to untangle the role of natural mechanisms.

I find it interesting that you raise this as a goal. As I understand it, @bjmiller, you are an OEC, right? You do not affirm evolution. So why exactly would God build the genetic code to allow for frameshift genes? If He is going to specially create organisms anyway, why would He choose a code that increases the evolutionary potential of life?

Of course, maybe I misread you, and you do affirm evolution. Sorry if that is the case. But it seems one could only adopt this as a valid design goal if one was arguing against abiogenesis and for evolution. Is that what you are doing?


@bjmiller the article you linked to is from an ID book, and it lays out the argument regarding the Genetic Code.

I think this clarifies the point…

Consequently, translation of a frame-shift error is halted more quickly on average in the real genetic code than in 99.3% of alternative codes, thus saving the cell significant expense.

I think that is what you meant. Even though we do see proteins arising from frameshifts, you are arguing that frameshifts will lead to stop codons, thereby making the code robust to that error. Right? This would be different than optimizing for evolvability.
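One ingredient of that comparison is how often a shifted reading frame runs into a stop codon. The sketch below counts, for the standard code, how many junctions between two adjacent sense codons read as a stop when the frame is shifted by one or two nucleotides. It assumes every pair of sense codons is equally likely (real codon usage is not uniform), and it does not reproduce the 99.3% figure, which needs a defined set of alternative codes to compare against.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, ...
SGC = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODE = dict(zip(CODONS, SGC))
SENSE = [c for c in CODONS if CODE[c] != "*"]


def offframe_stop_fraction(shift):
    """Fraction of sense-codon pairs whose junction, read `shift` (1 or 2)
    nucleotides out of frame, is a stop codon."""
    hits = sum(CODE[(first + second)[shift:shift + 3]] == "*"
               for first, second in product(SENSE, repeat=2))
    return hits / len(SENSE) ** 2


if __name__ == "__main__":
    for shift in (1, 2):
        print(f"+{shift} frame: {offframe_stop_fraction(shift):.4f}")
```

A shift of 2 is the same as the −1 frame. Weighting pairs by real codon usage would be the natural next refinement.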


My point was that the code allows for messages within messages, but this capacity does not mean they could come about by chance. One could set up the blank spaces of a crossword puzzle and then use a random letter generator plus selection to try to generate words or sentences. One might obtain a few words over an extended period of time, but the likelihood is remote that one would find multiple horizontal words forming a vertical word, even if significant spelling errors were allowed. In the same way, the constraints on forming a gene hidden in a frameshift are much greater than those on the original gene. The code may be optimized for such data compression, but the existence of genes within genes points to intentionality.

Also, where do we see proteins arising from frameshifts?

My 2c. I agree that the objective/fitness function is an insurmountable problem for analysing the genetic code in optimization terms.

Some other observations:

  1. We assume that there is a single objective. If there are multiple objectives, how does the designer weigh them in the final summation? This is a further difficulty with the fitness/objective function specification.
  2. What if the multiple objectives are incommensurable? This is very common. Then you need a multi-objective optimization. But in that case it boils down to a Pareto frontier, and the designer has to pick the tradeoff based on yet another factor.
  3. What if the design goal is unoptimizable? Many interesting qualities are unquantifiable, e.g. personality, beauty, etc. If we assume that the designer had such qualities in mind, there would be an abundance of equally-good options. The fact that the goals span multiple generations makes it even more inscrutable.
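Points 1 and 2 can be made concrete with a toy example. The candidate designs and their scores are hypothetical: each is scored on two objectives, both to be minimized. The Pareto front keeps every candidate that no other candidate beats on both objectives, and a weighted sum then picks a different member of that front depending on the weight, which is exactly the extra factor the designer has to supply.

```python
def dominates(p, q):
    """True if p is no worse than q on every objective and better on at least one
    (all objectives are minimized)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def weighted_choice(points, w):
    """Collapse two objectives into one score, with weight w on the first."""
    return min(points, key=lambda p: w * p[0] + (1 - w) * p[1])


if __name__ == "__main__":
    # Hypothetical (objective 1, objective 2) scores; lower is better for both.
    designs = [(1, 3), (2, 2), (3, 1), (2, 3), (3, 3)]
    front = pareto_front(designs)
    print("Pareto front:", front)
    for w in (0.9, 0.1):
        print(f"weight {w} on objective 1 picks", weighted_choice(front, w))
```

Every member of the front is “optimal” in the Pareto sense; only the weight, chosen from outside the optimization, breaks the tie.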

Proteins by Frameshift

There are several examples. It is hard to detect these in most systems, but viruses make it easy to study how new proteins evolve.

From a forward approach to design you are correct. It is impossible to estimate the probability of evolving a new protein based on the fitness function for important cases, because we just do not know the fitness function to high enough certainty.

It is not just viruses, though. At least two proteins that gave us new abilities appear to have arisen from frameshifts. No new enzymes evolved here, but the point is that new enzymes were not needed. Here is one of them.

If there is interest, when I get a chance I’ll dig up the other one.


However, we do know Axe is wrong, or we would not observe de novo proteins in viruses and neofunctionalization in cancer. On his view, that should be impossible.

Axe defined enzymes in a very narrow way. For example, he insists that enzymes have a stable structure, which is a strawman. There is no reason to think enzymes must have a stable structure to have the initial residual function required for selection to take hold. The issue is that the whole argument really is a strawman, because no one believes that enzymes by his definition commonly evolve in one fell swoop. So arguing that this pathway is unlikely does not really tell us about the pathways that we do think took place.

Moreover, according to Axe’s definition of enzymes, it appears that none of the functional differences between humans and chimps requires a new de novo enzyme (by his definition). It turns out that they involve only small tweaks to proteins. That is the whole point. Small tweaks can give a great deal of new function at the organism level.

The fact that we see neofunctionalization over and over again is why the arguments he is making only apply to the origin of the first enzymes, not the evolution of species today. Even then, however, we have to engage the actual mechanisms put forward by others here, not just strawmen.

That is correct. However, messages within messages should be impossible if protein function is exceedingly rare. The math there should be easy enough to work out, right? The fact that overprinting is so common and so easy to evolve is very strong evidence against the rarity of function in sequence space.
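A naive version of that calculation, under loudly simplifying assumptions: suppose a fraction f of random sequences is functional, each of G existing genes offers one alternative reading frame, and each such frame is an independent random draw (this ignores selection, mutation, and partial function). Then:

```latex
\Pr(\text{at least one functional overprint}) = 1 - (1 - f)^{G} \approx G f
\quad \text{when } G f \ll 1,
\qquad
\mathbb{E}[\text{functional overprints}] = G f .
```

So if f were far below 1/G, this model predicts essentially no overprinted genes; repeatedly observing them would, under the same model, suggest that f is not that small.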


If the stop codons were well-designed, the start codon was stupidly designed. They are controls for each other.