Language Models Generate Functional Proteins With Completely Novel Sequences

A recurring theme in creationism, and especially ID, is that proteins are brittle: the coding sequence must be very exact to function, and the odds of randomly generating the required order are one in a million gazillion raised to a bazillion. This argument, which may be genuinely persuasive to the lay person, has never honestly conveyed the full biochemical understanding. For one, the same folding, binding, releasing, or catalyzing function can be served by sequence variants, or even by essentially different proteins. These recent papers, which apply AI techniques developed for natural language processing to protein generation, are a case in point: they indicate that nature has not implemented all possible solutions.

Language models generalize beyond natural proteins

Here we demonstrate that language models generalize beyond natural proteins to generate de novo proteins, different in sequence and structure from natural proteins. We experimentally validate a large number of designs spanning diverse topologies and sequences. We find that although language models are trained only on the sequences of proteins, they are capable of designing protein structure, including structures of artificially engineered de novo proteins that are distinct from those of natural proteins. Given the backbone of a de novo protein structure as a target, the language model generates sequences that are predicted to fold to the specified structure. When the sequence and structure are both free, language models produce designs that span a wide range of fold topologies and secondary structure compositions, creating proteins which overlap the natural sequence distribution as well as extend beyond it. Designs succeed experimentally across the space of sampled proteins, including many designs that are distant in sequence from natural proteins.

AI technology generates original proteins from scratch

Scientists have created an AI system capable of generating artificial enzymes from scratch. In laboratory tests, some of these enzymes worked as well as those found in nature, even when their artificially generated amino acid sequences diverged significantly from any known natural protein.


Interestingly, AI protein-language models are now starting to be employed to find homologous relationships between proteins that previously couldn't be determined to be related using sequence-based alignments alone. Before, structural information was necessary for such cases, which usually required crystal structures of proteins already known to be homologous; that could be a problem, because structures for some proteins simply couldn't be obtained:

Recently, pLMs were also leveraged for establishing homologous relationships between sequences. While this is achievable with standard alignment tools [14], whenever the comparison falls into the so-called twilight zone [16], the pairwise signal gets blurry. This is where pLMs shine by capturing relationships way beyond simple sequence comparisons, uncovering otherwise undetected evolutionary relationships that can guide, for example, protein annotation or structure prediction efforts.
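To make the idea in that quote concrete: protein language models turn each sequence into a numeric embedding, and homology candidates are found by comparing embeddings rather than aligning letters. Below is a minimal sketch of that comparison step. Real pipelines would use embeddings from an actual pLM such as ESM-2 or ProtT5; here a simple k-mer count vector is a hypothetical stand-in so the example is self-contained, and the sequences are made up for illustration.

```python
# Sketch of embedding-based sequence comparison (the idea behind pLM
# homology search). The embed() function is a toy stand-in: a real
# pipeline would replace it with per-protein embeddings from a protein
# language model, while the cosine-similarity comparison stays the same.
from collections import Counter
import math

def embed(seq: str, k: int = 3) -> Counter:
    """Hypothetical stand-in embedding: counts of overlapping k-mers."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Two similar (made-up) sequences, plus an unrelated one:
s1 = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
s2 = "MKTAYIAKQRQISFVKSHFSRQAPEERLGLIEVQ"
s3 = "GGGGSSSSGGGGSSSSGGGGSSSSGGGGSSSS"

print(cosine(embed(s1), embed(s2)))  # high: related sequences
print(cosine(embed(s1), embed(s3)))  # near zero: unrelated
```

The point of pLM embeddings is that, unlike this k-mer toy, they capture contextual and structural signal learned from millions of sequences, which is why they can detect relatedness deep in the "twilight zone" where letter-by-letter alignment scores are no better than noise.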

