The emergence of catalysis in a noncatalytic protein scaffold is a rare, unexplored event. Chalcone isomerase (CHI), a key enzyme in plant flavonoid biosynthesis, is presumed to have evolved from a nonenzymatic ancestor related to the widely distributed fatty-acid binding proteins (FAPs) and a plant protein family with no isomerase activity (CHILs). Ancestral inference supported the evolution of CHI from a protein lacking isomerase activity. Further, we identified four alternative founder mutations, i.e., mutations that individually instated activity, including a mutation that is not phylogenetically traceable. Despite strong epistasis in other cases of protein evolution, CHI’s laboratory reconstructed mutational trajectory shows weak epistasis. Thus, enantioselective CHI activity could readily emerge despite a catalytically inactive starting point. Accordingly, X-ray crystallography, NMR, and molecular dynamics simulations reveal reshaping of the active site toward a productive substrate-binding mode and repositioning of the catalytic arginine that was inherited from the ancestral fatty-acid binding proteins.
A great article from @Rumraket.
I wanted to read this paper because it is the first case I’ve come across where a structure-based alignment was used to infer ancestral states. The proteins are so distantly related in sequence that sequence-based alignments were unreliable, and several sites were still inferred ambiguously.
Ancestral sequence inference. We inferred three ancestral nodes by maximum likelihood (ref. 10): the most probable ancestor of all chalcone isomerases (ancCHI), of all CHI-like proteins (ancCHIL), and the CHI/CHIL common ancestor (ancCC). Given the wide divergence between CHIs and FAPs, an earlier ancestor was not inferred. Details of the procedure and prediction statistics are provided in the Supplementary Information and in Methods. Briefly, because protein sequence divergence between CHI, CHIL, and FAPs is high, and includes insertions and deletions (InDels), we generated a structure-based alignment (Supplementary Table 1, Supplementary Fig. 1, and Supplementary Dataset 1). No systematic InDels were found between the CHI and CHIL lineages. Hence, the structural alignment was trimmed in loop regions and at the N and C termini and a phylogenetic tree was generated (Fig. 1b; see Supplementary Fig. 2 for the complete tree). Remaining gaps and ambiguously aligned positions were handled manually in the reconstructed ancestors (Supplementary Dataset 2 and Supplementary Fig. 3).
Check out this legend to supplementary figure 3:
Supplementary Figure 3. An advanced guide to ancestral sequence inference with no headaches (or fewer, at least): representative example of the decision making process to determine the amino acid sequence in ambiguously inferred positions. Due to the high level of divergence within and between the three protein families, sequence-based alignments give ambiguous and inconsistent results, i.e. different programs place gaps differently. We therefore performed a structure-based alignment (Armougom, F. et al., Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-Coffee. Nucleic Acids Res 34, W604-608 (2006)), which preferentially places gaps in loop regions and improved the quality of the alignment. The next challenge was that by default, ancestral inference places an amino acid at each position in the alignment, even if it consists mostly of gaps. In other words, the underlying model assumes that the ancestor was of maximum length and that every gap is a deletion, while in reality the opposite is often true. Therefore, we inspected all ambiguously inferred positions manually and corrected them to the best of our knowledge. Our approach was that even if our decision-making was flawed, this should have little or no effect on ancestral protein structure and function because indels typically occur in loop regions of high divergence. In total, eleven loop regions were manually revised as illustrated in this figure. First, the structure-based alignment (trimmed only at the termini) was used to determine the consensus number of amino acids - three in the example at hand. Second, the most probable ancestral sequences were added to the trimmed alignment (that was used for generation of the phylogenetic tree and ancestral inference) and reduced to the consensus number of amino acids. 
In cases where a particular amino acid could not be decided on (in the example, both S and G in the third position seem plausible), additional information such as phylogeny, structural information, and chemical intuition were used to make the decision. In the above example, the structural context in a helix kink led us to choose G due to its frequent occurrence in turns. Additionally, N- and C-terminal adaptor sequences were added to all ancestors as shown in Supplementary Figure 1.
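To make the loop-revision step in that legend concrete, here is a minimal Python sketch of the two checks it describes: flagging alignment columns that are mostly gaps, and finding the consensus number of residues in a loop region. This is not the authors' code. The toy alignment, the function names, and the 0.5 gap threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def mostly_gap_columns(alignment, threshold=0.5):
    """Return indices of columns whose gap fraction exceeds `threshold`.

    `alignment` is a list of equal-length strings with '-' marking gaps.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    flagged = []
    for col in range(n_cols):
        gaps = sum(seq[col] == "-" for seq in alignment)
        if gaps / n_seqs > threshold:
            flagged.append(col)
    return flagged

def consensus_loop_length(alignment, start, end):
    """Most common count of non-gap residues in columns [start, end)."""
    lengths = [sum(c != "-" for c in seq[start:end]) for seq in alignment]
    return Counter(lengths).most_common(1)[0][0]

# Hypothetical toy alignment; columns 3..7 play the role of a loop region.
toy_alignment = [
    "MKTS-G-SAL",
    "MKTSPGQSAL",
    "MKTS-G-GAL",
    "MKT--GASAL",
]

flagged = mostly_gap_columns(toy_alignment)
loop_len = consensus_loop_length(toy_alignment, 3, 8)
print("mostly-gap columns:", flagged)
print("consensus loop length:", loop_len)
```

In this toy case an inference program would still assign a residue to the mostly-gap column, giving a five-residue loop, whereas three of the four sequences have only three residues there. That mismatch is the kind of thing the authors say they corrected by hand.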
Another interesting fact (to me) about this article is that they find not just multiple pathways by which functional descendant enzymes could have evolved from the inferred ancestor, but also that multiple possible ancestral starting points could have given rise to enzyme activity.
Overall, the above results reinforce the conclusion of facile emergence of CHI despite its origin from a catalytically inactive ancestor: multiple founder mutations and subsequent trajectories are available with unexpectedly weak functional epistasis. In other words, the evolutionary landscape underlying CHI’s emergence is smooth rather than rugged.
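For anyone unfamiliar with the term, "weak epistasis" means that the effects of mutations combine roughly independently. A common way to quantify this for a pair of mutations is the deviation from additivity on a log scale. The sketch below shows that standard definition. It is not necessarily the measure used in the paper, and the activity values are hypothetical, invented purely for illustration.

```python
import math

def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Log-scale epistasis between mutations A and B.

    Returns log10(f_ab) - log10(f_a) - log10(f_b) + log10(f_wt).
    A value near zero means the two mutations act independently
    (multiplicatively); large absolute values indicate strong epistasis.
    All inputs must be positive activities.
    """
    return (math.log10(f_ab) - math.log10(f_a)
            - math.log10(f_b) + math.log10(f_wt))

# Hypothetical activities (arbitrary units), not data from the paper.
no_epistasis = pairwise_epistasis(f_wt=1.0, f_a=10.0, f_b=5.0, f_ab=50.0)
positive_epistasis = pairwise_epistasis(f_wt=1.0, f_a=10.0, f_b=5.0, f_ab=500.0)

print(round(no_epistasis, 6))
print(round(positive_epistasis, 6))
```

On a smooth landscape most such values sit close to zero, so the order in which mutations are acquired matters little. On a rugged landscape they are large and often change sign depending on the genetic background.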
It is a really nice counterpoint to the Axe/Gauger studies, showing how these sorts of puzzles can be untangled.
Oh and another thing, apparently the function of this specific class of enzymes has evolved independently in bacteria in a completely different protein, which was already an enzyme. So apparently this function can be reached from wildly different positions in protein sequence space. From the OP paper:
Our results indicate that CHI evolved from a catalytically inactive ancestor, thus demonstrating that emergence of catalysis and stereospecificity in noncatalytic scaffolds is a feasible evolutionary scenario. That said, a bacterial CHI (ref. 29) has evolved independently in an enzymatic protein fold, exemplifying that plant CHI is an exception rather than the rule.
The bacterial CHI described in reference 29 apparently has a ferredoxin-like fold closely related to that of an enzyme (a chlorite dismutase) and a stress-related protein (SP1). In that paper (29) we can read:
The two-domain structure of the bacterial CHI is closely related to the ferredoxin-like fold of a chlorite dismutase and the stress-related protein SP1, despite the lack of any functional relationship. The tertiary structure of bacterial CHI, with ferredoxin-like folds, is completely different to that of the plant CHI, suggesting that the enzymes evolved convergently from different ancestor proteins. The bacterial CHI is only related to the plant CHI with respect to the products of the catalysed oxa-Michael addition.
Apparently the bacterial enzyme is also about 20 times more efficient than the plant enzyme.
That’s not natural evolution in the wild.
It is, however, “the emergence of catalysis in a noncatalytic protein scaffold.” Literally thousands of times.