I mean, to me this is just another instance of a ‘Watchmaker analogy’ and I find it rather uninteresting. Even putting aside the fact that the ‘genetic code’ is more like a cypher than an actual code, the question of whether or not the analogy holds does not really help our understanding; sometimes it can even distract us.
It reminds me of when physicists were first trying to understand the structure of the atom, trying to figure out how the negatively charged electrons did not crash into the positively charged nucleus. Some compared it to how planets avoid colliding with the sun by remaining in a stable ‘orbit’. That is why we still talk about ‘electron orbitals’. Of course, electrons are not planets, and one can debate whether the analogy holds or not. Regardless of whether you want to keep the analogy (and we often do, as a conceptual stepping-stone in high school), that doesn’t matter to the question of how electrons actually behave. Similarly, we can argue endlessly about whether the genetic code is really a cypher or not, but that tells us little about the system in question. I would like to move away from such analogies (or at least not put so much stock in them), for the same reasons outlined by Daniel Nicholson in his papers, where he criticizes the many metaphors that compare organisms to machines. Also see this discussion I had on the Forum.
Case in point, one can argue over (1) whether tRNAs are really ‘interpreting’ anything. Doesn’t the anti-codon just bind to its complementary codon due to the physics of hydrogen bonding? (2) Why do we assign this ‘interpreter’ role to the tRNA? Can’t we just as easily say that the codon ‘interprets’ the tRNA? (3) How is ‘interpretation’ even defined? Does the action have to be cognitive, e.g. when I interpret a red-coded faucet as one that provides hot water? If so, then aren’t we treating tRNAs as if they are tiny homunculi inside the cell? If cognitive agency is not required, then can one say that oxygen and hydrogen are ‘interpreting’ each other to produce H2O? If not, why not?
When the genetic code was first described, it was indeed said, even by prominent biologists like Francis Crick and Jacques Monod, that there was no stereochemical relationship between the (anti-)codons and the amino acids. This led to the hypothesis that the genetic code is a frozen accident; i.e. the code we ended up with was just one of many equally likely possibilities, and the biosphere simply got stuck with it. However, this is very wrong in my opinion. The genetic code(s) have a lot of non-arbitrary structure, and in more ways than one.
First, we are all familiar with the fact that it is ‘degenerate’ (redundant), since there are (4x4x4=) 64 different codons that code for 20 (rarely 22) amino acids. If we randomly assigned each codon to one amino acid, the random genetic code we would end up with would (with overwhelming probability) have its redundancy spread evenly across the three nucleotide positions of the codon (P1, P2 and P3). However, this is definitely NOT true for the genetic code(s) that we observe. Most of the redundancy is concentrated in the 3rd nucleotide, P3. For example, if P2 is C, or if P1 is G or C and P2 is G or U, then P3 is completely redundant; changing the nucleotide at P3 won’t change the amino acid. When P3 does matter, the only distinction that matters is purine (AG) vs pyrimidine (UC), but there are two interesting exceptions: the start codon AUG (Methionine) vs AU[U/C/A] (Isoleucine) and UGG (Tryptophan) vs UGA (‘Opal’ stop codon). As for non-standard genetic codes: organelles and some organisms also use UGA to specify Tryptophan, while some other organisms use UGA for Glycine or for selenocysteine (one of the 2 non-standard amino acids). In contrast, although P1 is never redundant in the strict sense, changing the nucleotide at P1 does occasionally result in the same specified amino acid: CG[N] vs AG[A/G] for Arginine and CU[N] vs UU[A/G] for Leucine. However, changing the nucleotide at P2 will ALWAYS result in a different amino acid being specified.
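The claims about where the redundancy sits can be checked directly against the standard codon table. Here is a minimal sketch in Python; the helper names (`synonymous_changes`, `purine_split_boxes`, `pyrimidine_split_boxes`) are my own and purely illustrative, and the table is the standard code only, not any of the variant codes.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_changes(pos):
    """Count single-base changes at codon position pos (0=P1, 1=P2, 2=P3)
    that leave the encoded amino acid unchanged (stop codons are skipped)."""
    count = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for base in BASES:
            mutant = codon[:pos] + base + codon[pos + 1:]
            if base != codon[pos] and CODE[mutant] == aa:
                count += 1
    return count

def purine_split_boxes():
    """P1+P2 prefixes where the A-ending and G-ending codons differ in meaning."""
    return [p1 + p2 for p1, p2 in product(BASES, repeat=2)
            if CODE[p1 + p2 + "A"] != CODE[p1 + p2 + "G"]]

def pyrimidine_split_boxes():
    """P1+P2 prefixes where the U-ending and C-ending codons differ in meaning."""
    return [p1 + p2 for p1, p2 in product(BASES, repeat=2)
            if CODE[p1 + p2 + "U"] != CODE[p1 + p2 + "C"]]

counts = [synonymous_changes(pos) for pos in range(3)]
print("synonymous single-base changes at P1, P2, P3:", counts)
print("boxes where A vs G at P3 matters:", purine_split_boxes())
print("boxes where U vs C at P3 matters:", pyrimidine_split_boxes())
```

The P2 count comes out as zero, the P1 count is small (the Leucine and Arginine cases), and the only boxes where A vs G at P3 matters are UG (stop vs Tryptophan) and AU (Isoleucine vs Methionine).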
So the P1, P2, P3 positions are clearly not arbitrary with regard to redundancy... but why? First, the geometry of the tRNA loop allows some free movement of the first nucleotide of the anti-codon (the one opposite P3 of the codon). This free movement allows the nucleotide to pair up in a manner that does not follow Watson-Crick rules, forming what are called wobble base pairs. For example, if the P1 of the anti-codon is G, then it can pair with C (Watson-Crick) or U (wobble base pair) in the mRNA. Note that in this example, C and U are both pyrimidines. Indeed, when anti-codon P1 is a purine it still prefers to bind with pyrimidines, and vice versa. That is why, when codon P3 matters, the distinction that most often matters is purine (AG) vs pyrimidine (UC). Thus, one anti-codon is able to bind to multiple codons, so organisms don’t need to make 61 tRNAs, one for each unique non-stop codon (most make fewer than 45).
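To get a feel for how much wobble pairing reduces the number of tRNAs needed, here is a rough Python sketch. The pairing table `READS` is a toy assumption of mine (G reads C/U, U reads A/G, C reads G, A reads U, plus inosine reading U/C/A, which is not discussed above); real cells use many more base modifications, and the initiator tRNA is not counted, so treat the result as an illustration and not as a biological fact.

```python
from itertools import combinations, product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# ASSUMPTION: toy wobble rules, anticodon first base -> codon P3 bases it can read.
# "I" is inosine, a modified base; it is included only for illustration.
READS = {"C": "G", "A": "U", "G": "CU", "U": "AG", "I": "UCA"}

def min_anticodons(prefix):
    """Smallest number of anticodons needed to read every sense codon starting
    with prefix (P1+P2) without any anticodon reading two different meanings."""
    sense = {p3 for p3 in BASES if CODE[prefix + p3] != "*"}
    # An anticodon is usable only if all codons it reads specify one amino acid.
    usable = [a for a, reads in READS.items()
              if len({CODE[prefix + p3] for p3 in reads}) == 1
              and CODE[prefix + reads[0]] != "*"]
    for k in range(1, len(usable) + 1):
        for combo in combinations(usable, k):
            if sense <= set("".join(READS[a] for a in combo)):
                return k
    raise ValueError(f"no anticodon set covers box {prefix}")

total = sum(min_anticodons(p1 + p2) for p1, p2 in product(BASES, repeat=2))
print("minimum anticodons under these toy rules:", total)
```

Under these simplified rules the total comes out at 31, well below 61, which fits with the observation that organisms get by with far fewer tRNAs than codons.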
But why doesn’t P3 matter when P2 is C, or when P1+P2 are G/C+G/U? Well, the geometry of the tRNA makes P2 the most reliable position for forming canonical Watson-Crick base pairs. P1 is a little less reliable but still far more reliable than the wobble-prone P3. Furthermore, not every base pair is equal; G-C pairs have 3 hydrogen bonds while A-U pairs have only 2. Thus, G-C base pairs are stronger than A-U. Secondly, due to the geometric differences between the codon and anti-codon, the direction of the bond also matters. Pyrimidine(codon)-purine(anti-codon) pairs are stronger than purine(codon)-pyrimidine(anti-codon) pairs, e.g. U-A pairs are stronger when the U is in the codon and G-C pairs are stronger when the C is in the codon. So, when only looking at the codon, the order of bases from strongest to weakest pairing is C>G>U>A. This is why P3 is always redundant when P2 is C, i.e. when the most reliable position [P2] has the strongest binding possible [C in the codon]; and why P3 is still redundant when P1+P2 are G/C+G/U, i.e. when P1 has the strongest or 2nd strongest base [C or G] and P2 has the 2nd or 3rd strongest base [G or U]. Base ‘A’ is the weakest, which explains why P3 is never redundant when P2 is ‘A’. We also see interesting things with regard to the start codon and stop codons. The wobble position for the start codon is at P1 (AUG, and the lesser used CUG > GUG > UUG), because the start-tRNA interacts with the P-site (for initiation) of the ribosome while all other tRNAs interact with the A-site (for elongation). The geometry is almost reversed between the A and P sites. The stop codons (UAA>UGA>UAG in order of usage) are rather weak, with very few hydrogen bonds. Perhaps they are too weak to be used for codon/anti-codon interaction, serving better as recognition sites for the proteins that terminate translation (release factors). And when these and similar codons are used for specifying amino acids, they are used for rather special and rarely used amino acids, e.g. Tryptophan and especially Selenocysteine.
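The rule of thumb above (C>G>U>A on the codon side, with P2 counting most) can at least be checked for consistency against the standard table. The sketch below encodes the rule exactly as stated in the paragraph; the numeric `STRENGTH` ranks are just ordinal labels I made up, not measured binding energies, so a match only shows that the rule agrees with the table, not that the physical explanation is correct.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Ordinal ranks for the codon-side pairing strength C > G > U > A (labels only).
STRENGTH = {"C": 3, "G": 2, "U": 1, "A": 0}

def predicted_fourfold(p1, p2):
    """Rule from the text: P2 is C, or P1 is C/G while P2 is G/U."""
    return STRENGTH[p2] == 3 or (STRENGTH[p1] >= 2 and STRENGTH[p2] in (1, 2))

def observed_fourfold(p1, p2):
    """True if all four P3 choices give the same meaning in the standard code."""
    return len({CODE[p1 + p2 + p3] for p3 in BASES}) == 1

mismatches = [p1 + p2 for p1, p2 in product(BASES, repeat=2)
              if predicted_fourfold(p1, p2) != observed_fourfold(p1, p2)]
print("boxes where the rule and the table disagree:", mismatches)

# Watson-Crick hydrogen bonds contributed by each codon base (G/C: 3, A/U: 2).
HBONDS = {"G": 3, "C": 3, "A": 2, "U": 2}
stop_bonds = {codon: sum(HBONDS[b] for b in codon)
              for codon, aa in CODE.items() if aa == "*"}
print("hydrogen bonds per stop codon (possible range is 6 to 9):", stop_bonds)
```

The rule reproduces all eight fully redundant boxes with no mismatches, and the three stop codons sit at 6 or 7 hydrogen bonds, i.e. at the weak end of the possible range.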
Aside from the redundancy, there are also non-arbitrary relationships between the codons and the properties of the amino acids themselves. This becomes clearer when we rearrange the wheel graph of the genetic code such that we can use P2 to group the amino acids into four quadrants. Here we see that the amino acids within each group share the same properties regarding their ‘polarity’ and thus their affinity for water. Codons with U at P2 specify amino acids that are hydrophobic; those with A specify ones that are hydrophilic; and those with C/G specify ones that are semi-polar. The only exception is Arginine (Arg), a hydrophilic amino acid in the otherwise semi-polar ‘G’ group. Polarity is among the most important properties, since protein folding involves a ‘hydrophobic collapse’. You also have other important amino acid properties: strongly acidic vs strongly basic and aliphatic vs aromatic. Amino acids that share such properties are specified by codons that often differ by only one base.
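The P2 grouping can be made concrete by putting a hydropathy number next to each amino acid. The sketch below uses the Kyte-Doolittle hydropathy scale, which is my choice for illustration (other polarity scales exist and would shift the details); positive values mean more hydrophobic.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# ASSUMPTION: Kyte-Doolittle hydropathy index, used here as a stand-in for 'polarity'.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

# Group the amino acids by the middle base (P2) of their codons, ignoring stops.
groups = {p2: sorted({CODE[p1 + p2 + p3] for p1, p3 in product(BASES, repeat=2)} - {"*"})
          for p2 in BASES}

for p2, aas in groups.items():
    print(f"P2 = {p2}:", ", ".join(f"{aa} ({KD[aa]:+.1f})" for aa in aas))
```

On this scale every amino acid in the U group is positive, every one in the A group is negative, the C and G groups are mixed and mostly near zero, and Arginine (R) stands out as the most hydrophilic member of the G group.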
Why is this the case? Again, this is likely due to P2 being the most reliable position, for the reasons explained previously. Assigning amino acids with different properties to different “codon groups” that are easily distinguished by the translation system means that most translational errors occur within a codon group. So when an amino acid is misplaced, it tends to have properties similar to the one it replaced, and such amino acid substitutions are often tolerated. Case in point, almost no sarcomere (unit of muscle) in your body will be error-free regarding the amino acid sequences of its proteins, yet you are still able to walk.
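One rough way to see this error-buffering effect is to ask how much the polarity changes, on average, when a single base is misread at each codon position. The sketch below does that with the Kyte-Doolittle hydropathy scale (my choice of scale, and an assumption), treats every single-base misreading as equally likely (also an assumption, since real error rates differ by position and base), and ignores misreadings that involve a stop codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# ASSUMPTION: Kyte-Doolittle hydropathy index as a stand-in for 'polarity'.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def mean_hydropathy_change(pos):
    """Average |hydropathy difference| over all single-base changes at
    position pos (0=P1, 1=P2, 2=P3) between two sense codons."""
    diffs = []
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for base in BASES:
            if base == codon[pos]:
                continue
            new_aa = CODE[codon[:pos] + base + codon[pos + 1:]]
            if new_aa != "*":
                diffs.append(abs(KD[aa] - KD[new_aa]))
    return sum(diffs) / len(diffs)

means = [mean_hydropathy_change(pos) for pos in range(3)]
for name, value in zip(("P1", "P2", "P3"), means):
    print(f"mean hydropathy change for a misread at {name}: {value:.2f}")
```

The average change is smallest at P3, larger at P1 and largest at P2, so the positions that are misread most easily are also the ones where a misreading does the least damage.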
But we can go even further. The nucleotides of the codons don’t only correspond to the properties of the amino acids. The codons also have a non-arbitrary correspondence to the biochemical synthesis of the amino acids. Also see section 6.1 of this excellent paper, particularly figure 6, which is TOO big to put here. A less comprehensive but simpler one is shown below. As you can see, the P1 of the codon correlates with the alpha-keto acid precursor of the amino acid: C (ketoglutarate), A (oxaloacetate) and U (pyruvate). Furthermore, the other nucleotides correlate with the subsequent chemical reactions. For example, when P1 is G, the synthesis starts from the α-keto acids and P2 correlates with the subsequent chemical alteration(s).
Exactly why this metabolic correspondence exists in the genetic code is up for debate. In this paper the authors propose that dinucleotides specifically (the precursors to the first two bases of codons) were responsible for catalyzing the synthesis of amino acids from α-keto acids, and that the sequence of the dinucleotide chemically determined the specific amino acid synthesized. I am not sure about this myself.
When I see this metabolic correspondence, it reminds me more of the fact that amino acids, once charged onto a tRNA, are still able to undergo chemistry that can change them into a different amino acid. It’s one type of error that can occur during translation. Many amino acids are ‘connected’ within the same biochemical pathway. One example: glutamate-tRNA is able to change into glutamine-tRNA. In fact, this happens in all Archaea and most bacteria, where one aaRS enzyme (GluRS) charges glutamate onto two types of tRNA, but when one of the two types is charged, the glutamate (while attached to the tRNA) is changed into glutamine by an amidotransferase. Some bacteria and all eukaryotes do have a separate aaRS (GlnRS) for directly charging glutamine, but it evolved from the aforementioned aaRS (GluRS) via gene duplication and neofunctionalisation. This is one likely mechanism for how new amino acids were incorporated into the genetic code.