Is Functional Information Functional?

Rumraket · September 24, 2019, 8:12pm

Have to agree with @Giltil here. @Mercer if you’re not saying conservation of protein sequences over deep time is due to purifying selection against deleterious amino acid substitutions, then what is it?

Mercer · September 24, 2019, 8:14pm

There’s no reason for proteins to have maximum activity. Evidence tells us that increasing activity can be just as bad for the organism as decreasing it.

Rumraket · September 24, 2019, 8:16pm

I understand and agree, I just don’t see how that is relevant to the observation that some protein sequences are very well conserved over long time periods. Presumably that is because mutations are selected against?

I did not take Giltil to imply that protein activity is always selected to be maximized. But the protein’s effect on fitness is. There does not have to be a direct relationship between the two for purifying selection to act against deleterious mutations. Selection is towards the optimal level of function (whether that is high or low activity), not the maximal level of function. So mutations that push activity outside the optimal range, whether by increasing or decreasing it, are selected against.
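The point about selection toward an optimum rather than a maximum can be sketched with a toy model. The Gaussian fitness function, the optimum of 1.0, and the width of 0.25 are illustrative assumptions, not values from this thread or from any measured protein.

```python
import math

def fitness(activity, optimum=1.0, width=0.25):
    """Toy stabilizing selection: fitness peaks at the optimum activity.

    The Gaussian shape and both parameters are assumptions for illustration.
    """
    return math.exp(-((activity - optimum) ** 2) / (2 * width ** 2))

# Hypothetical activity levels, expressed relative to the optimum.
for label, activity in [("reduced", 0.5), ("optimal", 1.0), ("elevated", 1.5)]:
    print(f"{label:>8}: activity={activity:.1f}  fitness={fitness(activity):.3f}")
```

Under this model a mutation that raises activity by some amount is penalized exactly as much as one that lowers it by the same amount, which is all purifying selection needs in order to conserve the sequence.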

T_aquaticus · September 24, 2019, 8:31pm

I would agree that conservation is due to negative selection, including selection against deleterious mutations that increase activity in certain conditions.

However, that can’t tell us anything about the probability of getting that first protein in the lineage that was passed down to all of those descendants. All it tells us is that the protein affects fitness.

Let’s use cars as a crude analogy. If you went out to your car and started removing parts from under the hood you would end up with a non-functional engine at some point. If that engine is an internal combustion engine, does that indicate that only those types of engines will work in a car? No. We could start with an electric engine. We could start with a steam engine. The conservation needed in car parts does not tell us about which parts we could start with. That’s the error in the FI calculations.

Giltil · September 24, 2019, 8:33pm

Rumraket:

Suppose you have two groups of apparently unrelated proteins of equal length, both of which can do the same job. Group 1 proteins can do the job, and they’re all homologous to each other, but they’re not homologous to group 2. But group 2 can also do the job, but they’re only homologous to each other, not to group 1. They are able to perform the same function, despite having no detectable homology. What’s the FI? Well technically both those proteins are sequences in the same sequence space, and they can both do the same function. So they’re both groups of proteins that meet the minimal threshold for function, yet with no detectable homology (you wouldn’t find group 1 proteins when doing BLAST searches for group 2, or the other way around).

Both of those groups of sequences would occupy some area of sequence space but isolated from each other, but even when the total number of all the sequences were added together, for even a relatively short protein sequence they would only occupy an infinitesimal fraction of that sequence space. Restricting ourselves to only looking at homologous sequences actually risks us underestimating how many sequences are out there that meet the minimal threshold.

So why is homology important for calculating FI again? It obviously can’t be.

Okay, you are assuming that several groups of unrelated proteins exist in the sequence space that can perform a given function. My take is that this may be true for some functions but not for others. For example, I am very skeptical that an unrelated protein exists in the sequence space that could perform the function of the beta subunit of ATP synthase. But let’s say that I am wrong and that such unrelated groups of proteins exist that can perform the function of ATP synthase. In that case, the question is how many such unrelated groups of proteins exist? The reasonable answer would be to say that they are few. And for this reason, the FI associated with the function of the beta subunit of the ATP synthase would remain high enough to warrant a design inference.
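To make the arithmetic behind this exchange concrete, here is a minimal sketch. It assumes the commonly used definition FI = -log2(functional sequences / all sequences of that length). The protein length and the group sizes are invented placeholders, since nobody in the thread knows the real counts.

```python
import math

def functional_information(n_functional, length, alphabet=20):
    """FI in bits: -log2(n_functional / alphabet**length), computed in log space."""
    return length * math.log2(alphabet) - math.log2(n_functional)

LENGTH = 100          # hypothetical protein length
GROUP_1 = 10 ** 20    # hypothetical number of functional sequences in group 1
GROUP_2 = 10 ** 20    # hypothetical number in the non-homologous group 2

fi_group_1_only = functional_information(GROUP_1, LENGTH)
fi_both_groups = functional_information(GROUP_1 + GROUP_2, LENGTH)

print(f"FI counting group 1 only: {fi_group_1_only:.1f} bits")
print(f"FI counting both groups:  {fi_both_groups:.1f} bits")
print(f"Overestimate from ignoring group 2: {fi_group_1_only - fi_both_groups:.1f} bits")
```

Under these assumptions each doubling of the number of functional sequences lowers FI by one bit, so the size of the correction depends entirely on how many unrelated groups exist and how large they are, which is the quantity under dispute.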

T_aquaticus · September 24, 2019, 8:43pm

For FI to make any sense, you would have to know all of the protein sequences for a given function. Being skeptical about their existence isn’t enough. You need to KNOW.

Why is that reasonable?

colewd · September 24, 2019, 8:45pm

If the protein is similar in two organisms that split x million years ago, that tells you what it looked like at the split. Given that it is under high functional constraint, we can be confident that FI is higher than 0. @Art agrees with this, as he has estimated 70 bits for most proteins. I am interested in his argument, as Gpuccio’s method measures certain proteins at 4000 bits. Durston’s method also gets numbers closer to Gpuccio’s.
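For a sense of scale, the two figures mentioned here can be converted into fractions of sequence space. This sketch assumes FI = -log2(fraction of sequences that are functional), so the fraction is 2^-FI. It works in log10 because 2^-4000 underflows an ordinary float.

```python
import math

def log10_fraction(fi_bits):
    """log10 of the functional fraction implied by an FI value, assuming fraction = 2**-FI."""
    return -fi_bits * math.log10(2)

for bits in (70, 4000):
    exponent = -log10_fraction(bits)
    print(f"{bits} bits -> roughly 1 functional sequence in 10^{exponent:.0f}")
```

The gap between the two estimates is therefore not a factor of about 57 but well over a thousand orders of magnitude.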

T_aquaticus · September 24, 2019, 8:48pm

Previously, @gpuccio stated that sequences with no constraint and no known function can have FI, so that doesn’t compute. I described a situation where a nonfunctional sequence acquired one mutation and gained function, and I was told that the nonfunctional sequence had FI.

If all you are going by is constraint, then that can only tell you how long the protein has kept its function in different lineages. It says nothing about the probability of getting that protein to begin with.
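As a rough illustration of what a constraint-only estimate looks like, the sketch below sums, over the columns of an alignment, the drop in Shannon entropy relative to 20 equiprobable amino acids. This is a simplified stand-in and is not claimed to be Durston’s or gpuccio’s actual procedure. The toy alignment is invented.

```python
import math
from collections import Counter

def site_entropy(column):
    """Shannon entropy (bits) of the residues observed in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conservation_bits(alignment, alphabet=20):
    """Sum over columns of log2(alphabet) minus the observed column entropy."""
    max_entropy = math.log2(alphabet)
    return sum(max_entropy - site_entropy(col) for col in zip(*alignment))

# Hypothetical aligned homologs; real alignments need far more sequences.
toy_alignment = ["MKVLA", "MKVLG", "MRVLA", "MKILA"]
print(f"Conservation-based estimate: {conservation_bits(toy_alignment):.1f} bits")
```

Note that the only input is variation within one aligned family. Unrelated sequences with the same function never enter the calculation, so the number describes local constraint rather than the whole sequence space.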

Rumraket · September 24, 2019, 9:51pm

That is one possibility, yes. We know of concrete examples where this is the case.

My take is that this may be true for some functions but not for others.

Perhaps that is true, but for you to claim you know the FI for some protein, you’d need to actually know that, and how many of them there are.

For example, I am very skeptical that an unrelated protein exists in the sequence space that could perform the function of the beta subunit of ATP synthase.

What is that skepticism really based on? It’s not based on any empirical knowledge about what is “out there” in the unexplored sequence space (it is unexplored, after all). So what merits, or motivates it, in the end? Considering how colossal that space is, why should there not be other very different sequences that are able to perform the same function?

But let’s say that I am wrong and that such unrelated groups of proteins exist that can perform the function of ATP synthase. In that case, the question is how many such unrelated groups of proteins exist?

Yes, and how could we know that, without somehow testing for it?

The reasonable answer would be to say that they are few.

Why? When scientists actually test for specific protein functions by generating large ensembles of random sequences, they usually discover multiple completely unrelated sequences able to perform the same function. Now, I wouldn’t go so far as to say we can generalize that to all possible functions. But it does seem to suggest the opposite of your intuition here. It is usually the case that multiple different ones are found.

To pick just one example, when the Szostak lab attempted to test for the presence of ATP binding proteins in a library of approximately 10^11 different random protein sequences 80 residues long, they found four completely different ones, with a level of sequence similarity below that expected by chance. They were almost completely dissimilar. So in that random sample of only 10^11 sequences, there were four.

This is actually somewhat related to the function of ATP synthase. In order to catalyze the interconversion of ATP and ADP+PO4, the protein needs to bind its substrate. The random proteins the Szostak lab found were able to do that. The next step would be the catalysis of the ATP to ADP+PO4 (or reverse) reactions. Surprisingly, the protein found by the Szostak lab is actually a catalyst of that reaction too.
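Taking the figures quoted above at face value (four hits in roughly 10^11 random sequences), the implied FI for ATP binding can be worked out directly. This is a back-of-the-envelope sketch that assumes FI = -log2(fraction functional) and treats the sample as representative of 80-residue sequence space.

```python
import math

LIBRARY_SIZE = 10 ** 11   # approximate number of random sequences, as quoted in the post
HITS = 4                  # unrelated ATP-binding proteins found in that sample

fraction_functional = HITS / LIBRARY_SIZE
fi_bits = -math.log2(fraction_functional)

print(f"Estimated functional fraction: {fraction_functional:.1e}")
print(f"Implied FI for ATP binding:    {fi_bits:.1f} bits")
```

On these numbers the estimate lands in the mid-30s of bits. It applies only to binding at the level that selection experiment required, not to the full function of ATP synthase.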

colewd · September 24, 2019, 10:04pm

You have to be careful with semantic miscommunication. To make this clearer, I separated the concept of functional bits from functional information, which gpuccio did not object to. A sequence can have functional bits but no functional information relative to a specific function.

If you have a sequence that has been preserved over deep time it does tell you something about the probability of getting that protein function with a random search. How much it tells you is where the debate lies.

T_aquaticus · September 24, 2019, 10:11pm

If I have a protein that is 500 amino acids long, and one mutation produces new function in that protein, what would the FI of that protein be given there are no other homologous proteins with that function? What does that calculation look like?

No, it doesn’t, as I have already explained. It can only tell you about the local fitness landscape, not the total fitness landscape.

colewd · September 24, 2019, 10:16pm

I think you need more information to make a calculation.

The local fitness landscape is more information than nothing. How much information it is, again, is up for debate. How big are the fitness landscapes, or are there fitness landscapes in this specific case?

Timothy_Horton · September 24, 2019, 10:18pm

It’s almost certainly a waste of time to explain to you for the 67,836th time that evolution doesn’t work just by a random search.

T_aquaticus · September 24, 2019, 10:21pm

But it’s not enough information for the probabilities you claim to be calculating.

We don’t know, which is why those probability calculations can’t be made.

colewd · September 24, 2019, 10:31pm

It can get you off the zero number for FI. Again how far off is open to debate. Some correlation work can help sort this out in my opinion.

colewd · September 24, 2019, 10:33pm

Before a protein forms what is the cell selecting for?

T_aquaticus · September 24, 2019, 10:35pm

You and others are making claims about FI well past zero, and don’t have the data to back it up.

colewd · September 24, 2019, 10:41pm

You don’t know if it is legit or not as your objections are based on conjecture and not evidence. Do other hills exist for Prp8? If there are only a few or they do not constrain the protein then GP’s calculations could be very conservative.

I am simply objecting to your false claim that the comparison gives us NO information.

Rumraket · September 24, 2019, 10:44pm

Fitness. Always. Both before and after a protein evolves. It is the protein’s effect on fitness that matters, however the protein’s function accomplishes that.

So it is mutations that increase fitness that have a higher chance of going to fixation, and mutations that reduce fitness are selected against. If a mutation results in a new protein that has a positive effect on fitness, it has a higher chance of going to fixation than if it were neutral or deleterious.
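The claim about fixation chances can be illustrated with Kimura’s diffusion approximation for a new mutation in a diploid population. The sketch assumes the effective population size equals the census size N, and the values of N and s are arbitrary examples.

```python
import math

def fixation_probability(s, n):
    """Kimura's approximation for a new mutation (initial frequency 1/(2n)).

    Assumes a diploid population whose effective size equals n.
    """
    if s == 0:
        return 1 / (2 * n)  # neutral limit
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * n * s))

N = 1000  # hypothetical population size
beneficial = fixation_probability(0.01, N)
neutral = fixation_probability(0.0, N)
deleterious = fixation_probability(-0.01, N)

print(f"beneficial  (s=+0.01): {beneficial:.3e}")
print(f"neutral     (s= 0.00): {neutral:.3e}")
print(f"deleterious (s=-0.01): {deleterious:.3e}")
```

With these example values the beneficial mutation is far more likely to fix than the neutral one, and the deleterious one is vanishingly unlikely to fix, though its probability is not strictly zero.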

colewd · September 24, 2019, 10:46pm

All else being equal, is an AA sequence likely to affect fitness before it is functional?
