Is Functional Information Functional?

Rumraket · September 23, 2019, 9:20am

I’m not aware of any protein where this imaginary scenario is true. Short of very specific active site residues exactly where a protein binds a substrate (a cysteine residue coordinating a metal cofactor, for example), most residues exhibit intramolecular epistasis. That is, for a constrained residue whose change would negatively affect the functional performance of the protein, there is some mutation elsewhere in the protein that can open the way for the constrained residue to change neutrally. So they’re very rarely “absolutely required” for function. Another factor to consider is that other residues in place of those “absolutely required” ones are often just significantly deleterious. They still work; the protein still functions, strictly speaking. But its functional performance is reduced such that carriers of this protein have significantly lower fitness. Such mutations are unlikely to reach fixation if they occur, hence we generally don’t see them at high frequencies in any population. But if we were to correctly estimate the FI for some particular protein’s function, we’d need to include those unknown sequences, and we’d need to know how many of them there are in order to plug in the correct number of sequences that meet the minimal threshold for function.

Now, I’m just pointing this out to correct what appears to be a misapprehension of yours about how mutations constrain protein function. This is actually not all that important to Gpuccio’s claim that mutations at synonymous sites reach saturation at 400 million years (though it does have some effect on what we should take “constrained” to mean).

How would this protein have looked, say, 10 million years later in the different vertebrate species of that time? Obviously, the 70% required for its function would have remained unchanged. As for the 30% neutral part, only a small fraction of the positions would have changed through the action of neutral drift. IOW, 10 million years later, the homology signal of the neutral part of the protein would have remained very strong.

Now, what about this same protein 100 million years later in the different vertebrate species of the time? Well, whereas the constrained part (70%) would have remained unchanged, a large fraction of the neutral positions would have changed. However, a sufficient number of neutral positions would still have remained unaffected by drift, so that a homology signal for the neutral part of the protein would still be detectable.

But 400 million years later, whereas the constrained part would still be there unchanged, the homology signal of the neutral part would no longer be detectable.

The bottom line here is that the homology signal corresponding to the neutral part of a protein decreases with time until a point P where it becomes undetectable, whereas the homology signal corresponding to the constrained part of the protein remains constant with time. This is why it is only when P is reached that one can safely conclude that all the homology signal observed for a given protein necessarily belongs to the constrained, functional part of it. IOW, far from being useless, P is a crucial element of the reasoning for determining FI.
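
To make the model in the quoted argument concrete, here is a minimal sketch. It assumes (this is an assumption, not something stated in the thread) that substitutions at neutral sites follow a simple Poisson process with a constant per-site rate, and it ignores back-mutations and chance matches. The rate `HYPOTHETICAL_RATE` and the function names are placeholders chosen for illustration only.

```python
import math

# Hypothetical substitution rate per neutral site per year (placeholder value).
HYPOTHETICAL_RATE = 1e-9


def unchanged_neutral_fraction(rate_per_site_per_year, years):
    """Poisson probability that a neutral site has had no substitution."""
    return math.exp(-rate_per_site_per_year * years)


def expected_identity(constrained_fraction, rate_per_site_per_year, years):
    """Expected identity to the ancestor: constrained sites never change,
    neutral sites survive unchanged with the Poisson probability above."""
    neutral_fraction = 1.0 - constrained_fraction
    survived = unchanged_neutral_fraction(rate_per_site_per_year, years)
    return constrained_fraction + neutral_fraction * survived


# The 70% constrained / 30% neutral split is the one used in the quoted text.
for years in (10e6, 100e6, 400e6):
    neutral = unchanged_neutral_fraction(HYPOTHETICAL_RATE, years)
    total = expected_identity(0.70, HYPOTHETICAL_RATE, years)
    print(f"{years:.0e} years: neutral unchanged {neutral:.3f}, overall {total:.3f}")
```

Under this toy model, the time at which the neutral signal stops being detectable depends entirely on the assumed rate and on the detection threshold, neither of which is given above.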

I’m sorry but I really don’t see how this is relevant to estimating FI. You are speaking about detecting homology, not calculating FI. FI is calculated for FUNCTIONS. It doesn’t matter whether the sequences are homologous or not. If two completely unrelated proteins can do the same function, they both need to be counted among the minimum number of sequences that meet the minimal threshold for function.

Also, you are making one long argument for why evolution, starting from a functional sequence, is limited in the degree to which it can explore sequence space for the function. In other words, you’re really just stating reasons for why we can’t make any claims about whether there are other functional, but evolutionarily unrelated, sequences out there in that space. Which is what I have been saying all along: starting from some functional protein sequence, evolution will explore the space immediately surrounding it with point mutations, perhaps with a few jumps if we include larger deletions or insertions, but this exploration is always constrained to the immediate surroundings of some already functional sequence. So we simply can’t claim no other functional sequences are out there, and so we can’t say we really know the FI for any protein’s function.

Suppose you have two groups of apparently unrelated proteins of equal length, both of which can do the same job. Group 1 proteins can do the job, and they’re all homologous to each other, but they’re not homologous to group 2. Group 2 proteins can also do the job, but they’re only homologous to each other, not to group 1. The two groups perform the same function despite having no detectable homology. What’s the FI? Well, technically both groups of proteins are sequences in the same sequence space, and they can both do the same function. So they’re both groups of proteins that meet the minimal threshold for function, yet with no detectable homology (you wouldn’t find group 1 proteins when doing BLAST searches for group 2, or the other way around).

Each of those groups of sequences would occupy its own area of sequence space, isolated from the other. Even with all the sequences from both groups added together, and even for a relatively short protein, they would only occupy an infinitesimal fraction of that sequence space. Restricting ourselves to only looking at homologous sequences actually risks underestimating how many sequences are out there that meet the minimal threshold.
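
The two-group scenario can be put into numbers. This is a minimal sketch, assuming FI is defined as -log2 of the fraction of all sequences that meet the functional threshold (the Hazen-style definition; the thread never writes the formula out). The protein length and the sizes of both groups are made-up placeholders, and `functional_information_bits` is a hypothetical helper name.

```python
import math


def functional_information_bits(n_functional, length, alphabet_size=20):
    """FI = -log2(n_functional / alphabet_size**length), computed in log
    space so that very large sequence spaces do not underflow."""
    if n_functional <= 0:
        raise ValueError("n_functional must be positive")
    return length * math.log2(alphabet_size) - math.log2(n_functional)


# Placeholder values, not measurements.
LENGTH = 100          # residues
GROUP_1 = 10**20      # functional sequences homologous to the known protein
GROUP_2 = 10**20      # functional sequences with no detectable homology to group 1

fi_group1_only = functional_information_bits(GROUP_1, LENGTH)
fi_both_groups = functional_information_bits(GROUP_1 + GROUP_2, LENGTH)

print(f"FI counting group 1 only:  {fi_group1_only:.2f} bits")
print(f"FI counting both groups:   {fi_both_groups:.2f} bits")
```

Every functional sequence left out of the count pushes the estimate up: with these placeholder numbers, ignoring group 2 halves the functional count and so inflates the FI by one bit. With many unknown groups the inflation would be correspondingly larger.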

So why is homology important for calculating FI again? It obviously can’t be.
Edit:spelling

Roy · September 23, 2019, 9:37am

Or possibly higher fitness, since sometimes having less functional proteins is beneficial to survival in a new niche.

The 400 million year wait assumes that the selection pressure on the protein has remained unchanged for all that time. Since 400 million years ago there were no land vertebrates, that’s an assumption that needs justification.

Rumraket · September 23, 2019, 11:02am

Well, I think if less protein is beneficial, then regulatory mutations that reduce expression are what would be selected for, not changes in the amino acid sequence of the protein itself.

The 400 million year wait assumes that the selection pressure on the protein has remained unchanged for all that time. Since 400 million years ago there were no land vertebrates, that’s an assumption that needs justification.

That does raise another important point about the relationship between fitness and environment. It makes the case even worse for the idea that we can estimate FI by looking at the diversity of sequences in some clade, since the functionality, and hence the fitness effect, of any given residue is always dependent on the environmental context in which it appears. Many proteins in my cells would become misfolded, nonfunctional junk if expressed in a hyperthermophilic prokaryote that lives at temperatures above 100 degrees C, and possibly the other way around.

We aren’t even able to confidently speak about the FI for some function in a single environment, but once we consider all other possible environments it becomes completely hopeless.

T_aquaticus · September 23, 2019, 2:47pm

With modern technology and modern knowledge? No. Such a measurement is theoretically possible, but not technically possible with what we have right now. What we would need to know is every possible amino acid sequence that would result in a given function, and that just isn’t knowable in today’s world. In fact, we can’t even do the reverse problem of determining what function a given protein has with consistent accuracy.

T_aquaticus · September 23, 2019, 2:53pm

@glipsnort already explained why this doesn’t work.

For any given protein, there might be a sequence that is way different but with the same function. However, evolution will much more likely favor the protein it already has with that function which has also probably had its function improved over time. Conservation of sequence can only tell us about the local fitness peak that the sequence is found in, as @glipsnort describes.

T_aquaticus · September 23, 2019, 2:57pm

How would you determine this for any protein? How would you be able to rule out proteins of different lengths that had the same function but don’t have any recognizable homology to the protein of interest?

What you are measuring is the number of changes you can make without losing function in the protein sequence you started with. It can’t tell you how many different protein sequences you can start with that have that function.

colewd · September 23, 2019, 4:30pm

You are creating your argument and that’s fine.

colewd · September 23, 2019, 4:31pm

If you need a precise number you are right but an estimate is a place to start.

colewd · September 23, 2019, 4:33pm

Rumraket:

I’m not aware of any protein where this imaginary scenario is true. Short of very specific active site residues exactly where a protein binds a substrate (a cysteine residue coordinating a metal cofactor, for example), most residues exhibit intramolecular epistasis. That is, for a constrained residue whose change would negatively affect the functional performance of the protein, there is some mutation elsewhere in the protein that can open the way for the constrained residue to change neutrally. So they’re very rarely “absolutely required” for function. Another factor to consider is that other residues in place of those “absolutely required” ones are often just significantly deleterious. They still work; the protein still functions, strictly speaking. But its functional performance is reduced such that carriers of this protein have significantly lower fitness. Such mutations are unlikely to reach fixation if they occur, hence we generally don’t see them at high frequencies in any population. But if we were to correctly estimate the FI for some particular protein’s function, we’d need to include those unknown sequences, and we’d need to know how many of them there are in order to plug in the correct number of sequences that meet the minimal threshold for function.

What percentage of eukaryotic proteins have you studied?

T_aquaticus · September 23, 2019, 4:35pm

You don’t even have an estimate.

colewd · September 23, 2019, 4:37pm

This is your assertion. I assume you would choose not to participate further if this is your belief.

T_aquaticus · September 23, 2019, 4:39pm

Huh? I will continue to critique claims made by the ID/creationist community. I wasn’t aware that someone had to believe something was true in order to critique it. Echo chambers are notorious for creating poor outcomes.

colewd · September 23, 2019, 4:45pm

You are not critiquing the claims. You’re setting arbitrary requirements for making an estimate. Estimates are made with partial information. I have made a suggestion on how this could move forward, and that would be testing gpuccio’s method against Kirk’s. You do not seem interested, so at this point why continue?

T_aquaticus · September 23, 2019, 4:48pm

I want to estimate how many different types of rocks there are on the Earth. I go outside and look at 1 square centimeter of dirt. I use this partial information to make estimates of how many different types of rock there are. Do I have a valid estimate?

colewd · September 23, 2019, 4:50pm

Your estimate is better than 0. Now how do you refine it?

T_aquaticus · September 23, 2019, 4:56pm

It’s a worthless estimate, as is the estimate of FI used by @gpuccio.

colewd · September 23, 2019, 4:57pm

Now you are returning to arbitrary denial. I don’t think we need to continue on this subject.

T_aquaticus · September 23, 2019, 5:00pm

A hundred irony meters just exploded in unison.

Rumraket · September 23, 2019, 5:02pm

Why don’t you ask that question of Giltil, or Gpuccio? The reason I didn’t ask a question like that, of them, is that I don’t think any of us need to have personally characterized the epistasis of some fraction of proteins in order to be able to read and comprehend literature on the subject.

I have read review articles (the kind that summarize the state of a particular field), written by scientists with expertise in the relevant fields. I don’t need to have personally studied any particular fraction of proteins to be able to read and understand what these articles say.

Rumraket · September 23, 2019, 5:07pm

This is a bad rationalization. You can’t just make up an estimate. It has to at least make sense.

You can’t sensibly “estimate” the prevalence of function in sequence space when you have only sampled 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000561% of it.

That’s many orders of magnitude worse than saying you can estimate how many Saturn-like planets there are in the universe by looking only at our own solar system, and perhaps also the nearest star Proxima Centauri. Your sample size is too small to make a sensible estimation.
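
For a sense of scale, here is a rough sketch of how a sampled fraction like that can be computed. The thread does not say which protein length or sample size produced the figure above, so `HYPOTHETICAL_LENGTH` and `HYPOTHETICAL_SAMPLED` are placeholders, and the calculation is done in log space because the fraction itself is too small for ordinary floating point at larger lengths.

```python
import math


def log10_sampled_fraction(n_sampled, length, alphabet_size=20):
    """log10 of (n_sampled / alphabet_size**length), i.e. the order of
    magnitude of the fraction of sequence space that has been examined."""
    return math.log10(n_sampled) - length * math.log10(alphabet_size)


# Placeholder values, not taken from the thread.
HYPOTHETICAL_LENGTH = 150        # residues
HYPOTHETICAL_SAMPLED = 10**12    # distinct sequences ever examined

exponent = log10_sampled_fraction(HYPOTHETICAL_SAMPLED, HYPOTHETICAL_LENGTH)
print(f"Sampled fraction of sequence space is about 10^{exponent:.1f}")
```

Even a very generous sample size barely moves the exponent, because the size of the space grows by a factor of 20 with every added residue.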
