Create a Protein with Your Mind

Preservation is evidence of restriction. If the protein had lots of function in protein space the expectation would be significant divergence over deep evolutionary time. We don’t have evidence of enough substitutability of prp8 to make a case that it evolved even comparing the human version to slime mold.

Yes. But mere restriction doesn’t tell you there are no other functions out there, only that the space immediately surrounding your sequence is mostly deleterious. But what is it like further away and how do you know?

You already conceded you don’t know.

If the protein had lots of function in protein space the expectation would be significant divergence over deep evolutionary time.

Preservation doesn’t tell you anything about how many hills are out there in the fitness landscape. It just tells you the sequence you have is being maintained by purifying selection. That only tells you that if you move away from the current sequence you’re likely to be moving downhill.

But how do you know there are no other hills if you move even further?

We don’t have evidence of enough substitutability of prp8 to make a case that it evolved even comparing the human version to slime mold.

That sentence doesn’t make sense. How conserved the sequence is between human and slime mold doesn’t tell you what the fitness landscape is like further away from the sequence. Again, all it shows is that the area immediately surrounding it is downhill, as in it has lower fitness.
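To make the "other hills" point concrete, here is a minimal toy sketch. Everything in it is an assumption made up for illustration (a one-dimensional landscape, two Gaussian peaks at positions 10 and 50, the function names); it is not a model of Prp8 or of any real protein. It only shows that a sequence can have every immediate neighbour downhill, which is what conservation looks like, while a second peak exists further away.

```python
import math

# Hypothetical 1-D landscape: two equally high Gaussian peaks (illustrative only).
PEAKS = ((10, 1.0), (50, 1.0))  # (position, height)
WIDTH = 3.0


def fitness(x):
    """Fitness at position x: the height of the tallest peak contribution."""
    return max(h * math.exp(-((x - c) ** 2) / (2 * WIDTH ** 2)) for c, h in PEAKS)


def neighbours_all_downhill(x, step=1):
    """True if both immediate neighbours of x are less fit than x."""
    return all(fitness(x + d) < fitness(x) for d in (-step, step))


def local_maxima(lo, hi):
    """All integer positions strictly between lo and hi that beat both neighbours."""
    return [x for x in range(lo + 1, hi)
            if fitness(x) > fitness(x - 1) and fitness(x) > fitness(x + 1)]


print(neighbours_all_downhill(10))  # local view from the first peak
print(local_maxima(0, 60))          # global view of the whole toy landscape
```

Sampling only around position 10 reports "downhill in every direction", and nothing in that local observation distinguishes a landscape with one peak from a landscape with many.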

The main point, Rum, is that there can be thousands of small peaks out there and still not enough for evolution to find them. When I say 50% substitutability that represents many trillions of functions in sequence space yet it also represents 2335 bits of functional information.

The evidence is that these proteins are in a position where mutation is rare despite DNA mutating.

Do you have evidence that the fitness landscape is robust enough so it could have evolved? Again 50% substitutability is an extremely robust landscape yet it is also 2335 bits of FI.
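For readers following the arithmetic, here is a minimal sketch of how a figure like 2335 bits can fall out of "50% substitutability". It uses the Hazen/Szostak definition of functional information, FI = -log2(fraction of sequences that meet the function threshold). The rest is assumption, not something stated in the thread: a length of 2335 residues (inferred from the 2335-bit figure), and a model in which every site independently tolerates 10 of the 20 amino acids. The function names are hypothetical.

```python
import math


def functional_information_bits(length, allowed_per_site, alphabet=20):
    """FI = -log2((allowed/alphabet) ** length), assuming independent sites."""
    return -length * math.log2(allowed_per_site / alphabet)


def log10_functional_sequences(length, allowed_per_site):
    """log10 of the number of functional sequences under the same toy model."""
    return length * math.log10(allowed_per_site)


LENGTH = 2335  # assumed residue count, inferred from the 2335-bit figure
ALLOWED = 10   # assumed: half of the 20 amino acids tolerated at every site

print(functional_information_bits(LENGTH, ALLOWED))  # 2335.0 bits
print(log10_functional_sequences(LENGTH, ALLOWED))   # 2335.0, i.e. 10**2335 sequences
```

Under these assumptions the number of functional sequences is 10^2335, far more than "many trillions", while the functional fraction is still only 2^-2335. The result depends entirely on the number plugged in for tolerated residues per site, which is the quantity under dispute here.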

This protein is part of a much larger system so the idea of a fitness landscape is not clear.

There can be all sorts of things, the question is how you know which of all the possibilities is actually the case.

You are claiming to know that when you plug in a particular number of sequences that meet the minimal threshold for function.

When I say 50% substitutability that represents many trillions of functions in sequence space

I don’t understand what “50% substitutability represents many trillions of functions” means.

The evidence is that these proteins are in a position where mutation is rare despite DNA mutating.

But you reject that evidence. The evidence you are talking about here is a phylogeny of eukaryotes based on Prp8, which quite explicitly implies common descent.

Better stated: trillions of combinations where prp8 can still function. We have no evidence this is the case, however.

I don’t reject the evidence. I just think there is an alternative to common descent as a conclusion.

For those who may be unfamiliar with zinc finger proteins (ZFP), they were the heavy metal contaminated raw materials which Charlton Heston discovered on a conveyor belt inside a MacDonald’s regional production plant and distribution center in the cult classic 1973 film, Soylent Green.

The dark-of-night scene where entire fleets of refrigerated delivery trucks are driving away from the factory’s loading docks and fanning out across the city—with a smiling Ronald MacDonald painted on their sides—is particularly eerie.

I have no idea what it means to say “the fitness landscape is robust enough so it could have evolved”.
The evidence that eukaryotic Prp8 evolved is that it is a fusion protein containing multiple distinct domains (homologous to other proteins) with somewhat related functions; that’s why it’s so large.

Dlakić M, Mushegian A. Prp8, the pivotal protein of the spliceosomal catalytic center, evolved from a retroelement-encoded reverse transcriptase. RNA. 2011 May;17(5):799-808. DOI:10.1261/rna.2396011

Abstract

Prp8 is the largest and most highly conserved protein of the spliceosome, encoded by all sequenced eukaryotic genomes but missing from prokaryotes and viruses. Despite all evidence that Prp8 is an integral part of the spliceosomal catalytic center, much remains to be learned about its molecular functions and evolutionary origin. By analyzing sequence and structure similarities between Prp8 and other protein domains, we show that its N-terminal region contains a putative bromodomain. The central conserved domain of Prp8 is related to the catalytic domain of reverse transcriptases (RTs) and is most similar to homologous enzymes encoded by prokaryotic retroelements. However, putative catalytic residues in this RT domain are only partially conserved and may not be sufficient for the nucleotidyltransferase activity. The RT domain is followed by an uncharacterized sequence region with relatives found in fungal RT-like proteins. This part of Prp8 is predicted to adopt an α-helical structure and may be functionally equivalent to diverse maturase/X domains of retroelements and to the thumb domain of retroviral RTs. Together with a previously identified C-terminal domain that has an RNaseH-like fold, our results suggest evolutionary connections between Prp8 and ancient mobile elements. Prp8 may have evolved by acquiring nucleic acid-binding domains from inactivated retroelements, and their present-day role may be in maintaining proper conformation of the bound RNA cofactors and substrates of the splicing reaction. This is only the second example (the other one being telomerase) of the RT recruitment from a genomic parasite to serve an essential cellular function.

It’s rather obvious when you think about it. Fusion proteins are most likely to be generated by transposition of protein coding genes into the coding regions of other protein coding genes. Any one of its domains would have been present in its own original transposable element, which inserted into other genes also capable of transposition. It’s made of a hodge-podge of internal genetic parasites all involved in the mechanisms required for effective translocation throughout the host genome.

It is a discussion of how many functional sequences there are in sequence space.

How did all these domains end up on the same continuous strip of DNA? What about the sequences without “homology”?

The literature on catalytic antibodies gives us a good lower estimate.

But you’re afraid to look at the evidence, aren’t you?

Curiously enough, Axe’s work refutes @colewd. Axe, if you may recall, crafted a beta-lactamase that was quite unlike any that is seen in databases. His work shows that “preservation” is not a good way to estimate the numbers of functional sequences, and thus FI.

Darn! Now I have a hankering for MacDonalds…

Watch out for those french fries. You don’t know where (what?) they’ve been.

And I’d bet that a lot of his mutants with insufficient activity to confer survival still have beta-lactamase activity. If only he’d done some activity assays…

That does raise the question of how natural selection could facilitate the evolution of a function with an activity so low its fitness effect is invisible. You’d then need multiple copies to be expressed simultaneously, or very strong expression of a smaller number, but very strong expression in turn carries a metabolic penalty which the activity of the enzyme would have to offset. For extremely low activities it is possible they are so weak that no amount of expression would amount to a fitness benefit, because the cost of churning out so many proteins is greater than their combined effect.

While the issue you bring up is one among several I see that cast doubt on Axe’s number about the frequency of functional sequences in sequence space, I do think the type of method he used to screen for enzymes with desired function has some level of relevance to how you would expect selection to enhance a novel enzymatic function in the wild. Selection can only work with functions that are actually visible to selection.
While there might strictly be more enzymes out there with very weak catalytic efficiency, if that efficiency is so low that they are practically nonfunctional from the standpoint of their effect on fitness, it is difficult to argue their existence is all that relevant.

I can, however, envisage a very particular situation that would offset the metabolic cost of extremely high expression of an extremely weak enzyme: one where nutrients are abundant and the only growth-limiting factor is the very activity performed by the novel but weak enzyme. For example, a microorganism might find itself close to the concentration limit at which it can survive in the presence of a novel antibiotic, while all other resources necessary for growth and division are extremely abundant and continuously replenished.

Beta lactamase is not highly preserved.

That particular one, maybe. I suspect that you are afraid to look for unrelated ones.

We have real data from forward searches in random sequence space. You’ll ignore them.

That’s not relevant to the point he is making. He’s saying that Axe’s experiments actually show that there are functional sequences with variants not seen in extant life.

You’d have to assess that by experimentally assessing each functional domain. Which is work you and Gpuccio haven’t done. So you have no justification for the number of sequences you’re plugging into the equation as the number of sequences that meet the minimal threshold for function.

By transposition, that’s what transposons do.

What about the sequences without “homology”?

Yeah, what about them? Do you know the number of sequences that meet the minimum threshold for function for those? Nope, not in this case either.

It’s critical to the point he is making. He is saying preservation is not an indicator of rarity in sequence space. To show this he needs to use a highly preserved protein as an example.

Certainly more experimentation will make the number more precise.

Can you give an example of transposons creating a significantly complex new protein? Art has a single example of recombination doing this.