Have to agree with @Giltil here. @Mercer if you’re not saying conservation of protein sequences over deep time is due to purifying selection against deleterious amino acid substitutions, then what is it?
There’s no reason for proteins to have maximum activity. Evidence tells us that increasing activity can be just as bad for the organism as decreasing it.
I understand and agree, I just don’t see how that is relevant to the observation that some protein sequences are very well conserved over long time periods. Presumably that is because mutations are selected against?
I did not take Gilbert to imply that protein activity is always selected to be maximized. But the protein’s effect on fitness is. There does not have to be a direct relationship between the two for purifying selection to act against deleterious mutations. Selection is towards the optimal level of function (whether that is high or low activity), not the maximal level of function. So mutations that push the activity outside the optimal range, whether by increasing or decreasing it, are selected against.
I would agree that conservation is due to negative selection, including selection against deleterious mutations that increase activity in certain conditions.
However, that can’t tell us anything about the probability of getting that first protein in the lineage that was passed down to all of those descendants. All it tells us is that the protein affects fitness.
Let’s use cars as a crude analogy. If you went out to your car and started removing parts from under the hood you would end up with a non-functional engine at some point. If that engine is an internal combustion engine, does that indicate that only those types of engines will work in a car? No. We could start with an electric engine. We could start with a steam engine. The conservation needed in car parts does not tell us about which parts we could start with. That’s the error in the FI calculations.
Okay, you are assuming that several groups of unrelated proteins exist in the sequence space that can perform a given function. My take is that this may be true for some functions but not for others. For example, I am very skeptical that an unrelated protein exists in the sequence space that could perform the function of the beta subunit of ATP synthase. But let’s say that I am wrong and that such unrelated groups of proteins exist that can perform the function of ATP synthase. In that case, the question is how many such unrelated groups of proteins exist? The reasonable answer would be to say that they are few. And for this reason, the FI associated with the function of the beta subunit of the ATP synthase would remain high enough to warrant a design inference.
For FI to make any sense, you would have to know all of the protein sequences for a given function. Being skeptical about their existence isn’t enough. You need to KNOW.
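To make the dependence concrete, here is a rough sketch. It assumes the usual Hazen/Szostak-style definition, FI = -log2(functional sequences / all possible sequences), which nobody in this thread has spelled out, and every number in it (the 500-residue length, the size of the known family, the counts of extra families) is hypothetical and chosen only for illustration.

```python
import math

def max_bits(length: int, alphabet: int = 20) -> float:
    """log2 of the size of the whole sequence space for a given length."""
    return length * math.log2(alphabet)

def fi_bits(log2_functional: float, length: int, alphabet: int = 20) -> float:
    """FI = -log2(functional / total), computed in log space to avoid overflow.

    log2_functional is log2 of the number of sequences that perform the
    function. That number is an INPUT here; it has to be known or assumed.
    """
    total = max_bits(length, alphabet)
    if not 0 <= log2_functional <= total:
        raise ValueError("functional count must lie between 1 and the whole space")
    return total - log2_functional

# Hypothetical inputs, not measurements.
length = 500
log2_known_family = 100.0  # assumed size of the one family we know about

# Assumed counts of additional unrelated families of the same size.
for extra_families in (0, 1_000, 10**30):
    log2_functional = log2_known_family + math.log2(1 + extra_families)
    print(extra_families, round(fi_bits(log2_functional, length), 1))
```

The only point of the sketch is that the result moves with the number of functional sequences, which is exactly the quantity that has to be known rather than guessed.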
Why is that reasonable?
If the sequence is similar in two organisms whose lineages split x million years ago, it tells you what the protein looked like at the split. Given that it is under high functional constraint, we can be confident that FI is higher than 0. @Art agrees with this, as he has estimated 70 bits for most proteins. I am interested in his argument, as gpuccio’s method measures certain proteins at 4000 bits. Durston’s method also gets numbers closer to gpuccio’s.
Previously, @gpuccio stated that sequences with no constraint and no known function can have FI, so that doesn’t compute. I described a situation where a nonfunctional sequence acquired one mutation and gained function, and I was told that the nonfunctional sequence had FI.
If all you are going by is constraint, then that can only tell you how long the protein has kept its function in different lineages. It says nothing about the probability of getting that protein to begin with.
That is one possibility, yes. We know of concrete examples where this is the case.
My take is that this may be true for some functions but not for others.
Perhaps that is true, but for you to claim you know the FI for some protein, you’d need to actually know that, and how many of them there are.
For example, I am very skeptical that an unrelated protein exists in the sequence space that could perform the function of the beta subunit of ATP synthase.
What is that skepticism really based on? It’s not based on any empirical knowledge about what is “out there” in the unexplored sequence space (it is unexplored, after all). So what merits, or motivates it, in the end? Considering how colossal that space is, why should there not be other very different sequences that are able to perform the same function?
But let’s say that I am wrong and that such unrelated groups of proteins exist that can perform the function of ATP synthase. In that case, the question is how many such unrelated groups of proteins exist?
Yes, and how could we know that, without somehow testing for it?
The reasonable answer would be to say that they are few.
Why? When scientists actually test for specific protein functions by generating large ensembles of random sequences, they usually discover multiple completely unrelated sequences able to perform the same function. Now, I wouldn’t go so far as to say we can generalize that to all possible functions. But it does seem to suggest that it is the other way around from your intuition here. It is usually the case that multiple different ones are found.
To pick just one example, when the Szostak lab attempted to test for the presence of ATP binding proteins in a library of approximately 10^11 different random protein sequences 80 residues long, they found four completely different ones, with a level of sequence similarity below that expected by chance. They were almost completely dissimilar. So in that random sample of only 10^11 sequences, there were four.
This is actually somewhat related to the function of ATP synthase. In order to catalyze the interconversion of ATP and ADP+PO4, the protein needs to bind its substrate. The random proteins the Szostak lab found were able to do that. The next step would be the catalysis of the ATP to ADP+PO4 (or reverse) reaction. Surprisingly, the protein found by the Szostak lab is actually a catalyst of that reaction too.
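As a back-of-the-envelope check on what "four in 10^11" means in bits, here is a small sketch. It takes the figures exactly as stated above and assumes FI is defined as -log2 of the functional fraction. Treating four hits in the sample as an estimate of the fraction across the whole space is my simplification for illustration, not something the experiment itself claims.

```python
import math

def bits_from_fraction(hits: float, sampled: float) -> float:
    """-log2 of the observed functional fraction in a random sample."""
    if hits <= 0 or hits > sampled:
        raise ValueError("need 0 < hits <= sampled")
    return -math.log2(hits / sampled)

# Figures as quoted in the post: four distinct binders in ~10^11 sequences.
atp_binding_bits = bits_from_fraction(4, 1e11)
print(round(atp_binding_bits, 1))  # about 34.5 bits
```

Under those assumptions the estimate comes out at roughly 35 bits for ATP binding in an 80-residue random library, which is the kind of number you only get by sampling the space rather than by looking at conservation.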
You have to be careful with semantic miscommunication. To make this clearer, I separated the concept of functional bits from functional information, which gpuccio did not object to. A sequence can have functional bits but no functional information relative to a specific function.
If you have a sequence that has been preserved over deep time it does tell you something about the probability of getting that protein function with a random search. How much it tells you is where the debate lies.
If I have a protein that is 500 amino acids long, and one mutation produces new function in that protein, what would the FI of that protein be given there are no other homologous proteins with that function? What does that calculation look like?
No, it doesn’t, as I have already explained. It can only tell you about the local fitness landscape, not the total fitness landscape.
I think you need more information to make a calculation.
The local fitness landscape is more information than nothing. How much information it is, again, is up for debate. How big are the fitness landscapes, or are there fitness landscapes at all in this specific case?
It’s almost certainly a waste of time to explain to you for the 67,836th time that evolution doesn’t work just by a random search.
But it’s not enough information for the probabilities you claim to be calculating.
We don’t know, which is why those probability calculations can’t be made.
It can get you off the zero number for FI. Again how far off is open to debate. Some correlation work can help sort this out in my opinion.
Before a protein forms what is the cell selecting for?
You and others are making claims about FI well past zero, and don’t have the data to back it up.
You don’t know if it is legit or not as your objections are based on conjecture and not evidence. Do other hills exist for Prp8? If there are only a few or they do not constrain the protein then GP’s calculations could be very conservative.
I am simply objecting to your false claim that the comparison gives us NO information.
Fitness. Always. Both before and after a protein evolves. It is the protein’s effect on fitness that matters, however the protein’s function accomplishes that.
So it is mutations that increase fitness that have a higher chance of going to fixation, and mutations that reduce fitness are selected against. If a mutation results in a new protein that has a positive effect on fitness, it has a higher chance of going to fixation than if it were neutral or deleterious.
All else being equal, is an AA sequence likely to affect fitness before it is functional?
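For anyone who wants numbers behind "higher chance of going to fixation", here is a minimal sketch using Kimura’s diffusion approximation for a single new mutation. The assumptions are mine and not from the post: a diploid population whose effective size equals its census size N, additive selection with coefficient s, and a starting frequency of 1/(2N). The values of N and s are hypothetical.

```python
import math

def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation for one new copy in a diploid population of size n.

    P = (1 - exp(-2s)) / (1 - exp(-4*n*s)), with the neutral limit 1/(2n).
    Assumes effective size equals n and additive selection.
    """
    if s == 0:
        return 1.0 / (2 * n)
    # expm1(x) = exp(x) - 1, which keeps precision when s is small.
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)

n = 1000  # hypothetical population size
beneficial = fixation_probability(0.01, n)
neutral = fixation_probability(0.0, n)
deleterious = fixation_probability(-0.01, n)
print(beneficial, neutral, deleterious)
```

With these made-up values the beneficial mutation fixes far more often than the neutral one, and the deleterious one almost never does, which is all the paragraph above is claiming.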