Our expanding knowledge of the expanding protein universe

For those who share my enthusiasm for protein evolution and diversity, and for the protein universe, here are two recent papers to read, both open access.

Evolving concepts of the protein universe is a Perspective piece that provides a very useful historical account (which I didn’t know) of how views of protein structure/function evolved (heh).

These observations led Anfinsen to postulate the “thermodynamic hypothesis”, which states that the protein’s native conformation is comprised of the totality of interatomic interactions which is determined by the amino acid sequence in a given environment.
However, classic work by Kauzmann (1959) explicitly refers to an alternative view of the protein folding problem: “According to all that we know now about protein structure, we have good reason to believe that disorder might be introduced into a protein in small increments.”

Then:

The idea that excited states in the funnel landscape can have functional relevance became a major research topic, and in further major developments started in the 1990s, it became clear that proteins need not always be highly structured to be functional as presciently suspected by Kauzmann back in 1959. Indeed, it is now well recognized that a large fraction of the proteomes of organisms across all three domains of life is comprised of intrinsically disordered proteins/peptides (IDPs) and many more proteins with ordered domains contain intrinsically disordered regions (IDRs) that, by definition, lack rigid 3D structure yet are functional. Furthermore, it is now clear that some ordered proteins can switch folds and gain new function and that regions in certain folded proteins (and even entire proteins) can “unfold”, in a transition from order to disorder, in response to physical or chemical stimuli.

These concepts are known to those who read the literature, but of course many people don’t, and I suspect that obsolete views of protein structure/function are common, and not just at the decrepit creationist/ID potlucks.

The whole paper is a fun read and approachable for non-specialists.

This second one (EvoWeaver: large-scale prediction of gene functional associations from coevolutionary signals) is a lot more technical but worth a look. One thing to be enthused about is the fact that they used (co)evolutionary knowledge to identify protein relationships that are not already known, a great example of how different domains of evolutionary biology feed back and forward onto each other. The other (to me) interesting facet is the ability to find functions (and modules) in poorly explored (and often newly discovered) regions of the protein universe. Here’s how they put it in their nice abstract:

The known universe of uncharacterized proteins is expanding far faster than our ability to annotate their functions through laboratory study. Computational annotation approaches rely on similarity to previously studied proteins, thereby ignoring unstudied proteins. Coevolutionary approaches hold promise for injecting new information into our knowledge of the protein universe by linking proteins through ‘guilt-by-association’. However, existing coevolutionary algorithms have insufficient accuracy and scalability to connect the entire universe of proteins. We present EvoWeaver, a method that weaves together 12 signals of coevolution to quantify the degree of shared evolution between genes. EvoWeaver accurately identifies proteins involved in protein complexes or separate steps of a biochemical pathway. We show the merits of EvoWeaver by partly reconstructing known biochemical pathways without any prior knowledge other than that available from genomic sequences. Applying EvoWeaver to 1545 gene groups from 8564 genomes reveals missing connections in popular databases and potentially undiscovered links between proteins.

The multidisciplinary approach seems effective in many areas of research.

And yet, given that many of the stories at Evolution News deal with the issue of Intrinsically Disordered Proteins, your suspicion that ID proponents’ views on protein structure/function are obsolete isn’t warranted.

In what way do they “deal with the issue”?

Many? There are only 5 articles they specifically tagged with “intrinsically disordered proteins” on that site since 2018.

All are spectacularly bad and grossly misleading.

How many? Links?

I don’t think it’s mere suspicion but fact; moreover, spectacularly decrepit views have been expressed right here in this forum.

Would you like an example or two?

I do not, as a general rule, visit that site. But I will read any link you post. I will be looking for evidence that anyone at the DI seems to understand the current (and complex) concepts of protein structure and function, given that older “classic” rubbish from the DI is built on old concepts (that were, admittedly, misrepresented laughably even back then) of folds, protein structure, protein function, and evolutionary trajectories.

My suspicion is beyond warranted. When I called it a “suspicion” I was being cautious and generous. If you think that an ID proponent has recently written something on the topic, something that is even within 5 years of being up to date, and has published it somewhere other than a culture war website, please provide references. Thanks.

Okay, “several” may have been a better choice of word than “many”. Are you happy with this? The important point here is that these guys are plainly aware that not all functional proteins exhibit a neatly ordered structure.

My “expanding knowledge of the expanding protein universe” includes baked sesame tofu and lentil burgers.

The question was:

So? Especially in the context of their previous focus on “a coherent, stably folded unit of protein structure,” as written by Ann Gauger.

For an example, search this essay by Axe for “stable,” then see if you can find any variation of “disordered”:

Surely the best example of expanding protein is :popcorn:.

Am I understanding this paper correctly: that from nothing more than genomic sequences, this program allows investigators to construct phylogenetic trees and, from that, reconstruct biochemical pathways? If so, that is very cool.

Which only goes to show that, when confronted with evidence that clearly refutes one of their key claims, ID’s propagandists will simply deny that is the case and write some nonsense on one of their websites. They can do so secure in the knowledge that their supporters are incapable of reading with comprehension and will simply take it on faith that the ID claim still stands unscathed.

My understanding is that phylogenetic trees are the input. Some text from the Results section:

Our approach, named EvoWeaver, takes as input a set of phylogenetic gene trees and optional metadata (Fig. 1a). EvoWeaver then performs four types of coevolutionary analyses, comprised of 12 algorithms optimized for scalable performance. The output of EvoWeaver is 12 scores ranging from −1 to 1 that quantify the strength of coevolution between a pair of gene groups. These scores can be combined using a machine learning classifier to generate inferences or hypotheses about gene function.

The goal, in the authors’ words: “EvoWeaver’s primary purpose is to serve as a generator for hypotheses about functional associations.” That sentence is followed by an interesting case study in which EvoWeaver “mispredicted” a functional association between two proteins; in other words, it identified an association that wasn’t known. They then explain why the association is actually well supported by other data. Lots of examples of this in the paper.

But otherwise your summary is right: using only phylogenetic info, and with no prior knowledge of function or association, the tool can infer/suggest functional relationships even including biochemical pathways. Very cool indeed.
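
For anyone who wants a feel for what a coevolution score between −1 and 1 can look like, here is a minimal Python sketch of about the simplest “guilt-by-association” signal I can think of: correlating the presence/absence of two genes across a set of genomes. To be clear, this is not EvoWeaver’s code and I am not claiming it matches any of its 12 algorithms; the function name `profile_correlation` and the toy profiles are made up purely for illustration.

```python
from math import sqrt


def profile_correlation(a, b):
    """Pearson correlation between two presence/absence profiles.

    Each profile is a list of 0/1 values, one per genome (1 = gene present).
    The result lies between -1 and 1.
    """
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and the same length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A gene present in every genome (or in none) carries no signal here.
        return 0.0
    return cov / sqrt(var_a * var_b)


# Toy profiles across eight hypothetical genomes (invented, not real data).
gene_x = [1, 1, 0, 0, 1, 0, 1, 0]
gene_y = [1, 1, 0, 0, 1, 0, 1, 0]  # always present/absent together with gene_x
gene_z = [0, 0, 1, 1, 0, 1, 0, 1]  # never found alongside gene_x

print(profile_correlation(gene_x, gene_y))
print(profile_correlation(gene_x, gene_z))
```

Genes that are gained and lost together score near 1, and genes that avoid each other score near −1. As the quoted text says, the real tool computes many different signals and then hands them to a classifier, which is a good deal more than this toy does.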

IOW, evolutionary theory can be used to predict previously unknown functions of proteins.

How awkward for @Giltil and other ID’ers. Yet another thing they have to try their best to explain away.

It’s going to get a lot worse:

:grimacing: :exploding_head: :open_mouth: :smiling_face_with_sunglasses:

Uh oh.

The first thing that strikes me about that paper is the enormous amount of lab work it must have required to perform those experiments. All the cloning and transformation work, all the competition and fitness assays, all the sequencing work.

Speaking of their results, I’m fascinated that many of these novel adaptations were found and optimized in 1, 2, sometimes up to 8 or more mutations.

One of their genes even acquired an apparently necessary frameshift mutation to evolve a new function.

Copper tolerance was achieved through the discovery of a frameshift mutation that led to a de novo peptide appendage contributing to function

Two compelling outcomes emerged from copper tolerance EVO experiments starting from the E. coli ORF library and the yeast ORF library. The first outcome was the fixation of lineages descended from the E. coli essQ gene. EssQ is a predicted class II holin of the cryptic DLP12 prophage [67]. While overexpression of EssQ increased copper tolerance (Fig. S10A), the mutant variants produced through EVO did not improve upon the WT EssQ function.

The second outcome was the fixation of lineages descended from S. cerevisiae RTS3. RTS3 encodes a largely unstructured protein previously linked to caffeine tolerance [68]. Strikingly, the evolved variant of RTS3 that largely fixed after EVO contained a nucleotide deletion that resulted in a truncated protein with 120 amino acids (instead of RTS3’s original 263 amino acid size). Notably, the single nucleotide deletion caused a frameshift that introduced 18 amino acids prior to the stop codon, such that the last 18 amino acids of the 120 amino acid evolutionary outcome can be considered a de novo peptide. Furthermore, the frameshifted RTS3 conferred copper resistance while the wild-type RTS3 conferred no detectable resistance (Fig. S10B-C), suggesting that ORACLE drove the full origination and evolution of a new gene function in this case.

Several interesting observations about the frameshifted RTS3 are of note. First, simply truncating WT RTS3 to create a 120 amino acid fragment was not sufficient to observe any copper resistance, demonstrating that the 18 amino acid de novo peptide in the frameshifted RTS3 was critical for function (Fig. 3E) (Variant 1 contains only the frameshift mutation). (…)
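
If the mechanism in that quote is hard to picture, here is a small Python sketch of how deleting a single nucleotide can both truncate a protein and give it a brand-new tail. The DNA sequence is a toy I invented for the purpose (it is not RTS3), and the helper names `translate` and `delete_base` are just for this example; the only real biology in it is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def delete_base(dna, index):
    """Return the sequence with the base at the given 0-based index removed."""
    return dna[:index] + dna[index + 1:]


# Invented toy ORF: ATG GCA CTG AAT GAC CGT GTA ACC GGC TAA
WILD_TYPE = "ATGGCACTGAATGACCGTGTAACCGGCTAA"
mutant = delete_base(WILD_TYPE, 7)  # remove the T in the third codon (CTG)

wt_protein = translate(WILD_TYPE)
mut_protein = translate(mutant)

# Residues shared by both proteins before the frameshift takes effect.
shared = 0
while (shared < min(len(wt_protein), len(mut_protein))
       and wt_protein[shared] == mut_protein[shared]):
    shared += 1
novel_tail = mut_protein[shared:]

print(wt_protein)   # MALNDRVTG
print(mut_protein)  # MARMTV
print(novel_tail)   # RMTV
```

Everything downstream of the deletion is read in a new frame, so the mutant keeps the original start, picks up a stretch of residues the wild type never encoded, and then hits a stop codon that was previously out of frame. That is the same kind of event the authors describe, just at toy scale.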
