A recurring theme in creationism, and especially ID, is that proteins are brittle: the coding sequence must be very exact to function, and the odds of randomly generating the required order are one in a million gazillion raised to a bazillion. This argument, which may be genuinely persuasive to the lay person, has never honestly conveyed the full biochemical understanding. For one, the same folding, binding, releasing, or catalyzing function could be served by variant, or even essentially alternative, proteins. These recent papers, which apply AI techniques developed for natural language processing to generate proteins, are a case in point, as they indicate nature does not implement all possible solutions.
Here we demonstrate that language models generalize beyond natural proteins to generate de novo proteins, different in sequence and structure from natural proteins. We experimentally validate a large number of designs spanning diverse topologies and sequences. We find that although language models are trained only on the sequences of proteins, they are capable of designing protein structure, including structures of artificially engineered de novo proteins that are distinct from those of natural proteins. Given the backbone of a de novo protein structure as a target, the language model generates sequences that are predicted to fold to the specified structure. When the sequence and structure are both free, language models produce designs that span a wide range of fold topologies and secondary structure compositions, creating proteins which overlap the natural sequence distribution as well as extend beyond it. Designs succeed experimentally across the space of sampled proteins, including many designs that are distant in sequence from natural proteins.
Scientists have created an AI system capable of generating artificial enzymes from scratch. In laboratory tests, some of these enzymes worked as well as those found in nature, even when their artificially generated amino acid sequences diverged significantly from any known natural protein.