Durston: Functional Information

Hello all. I appreciate the thoughtful and collegial comments, for the most part. Since my approach to estimating functional information, outlined in my TBMM paper, has been mentioned, I thought I should join the discussion. Before I explain/defend how I estimate the functional information required to code for a protein family, I want to ensure we are all on the same page regarding the basic principles of functional information. From the comments, I think we might be, but I’ll outline seven basics just to make sure we can all sign off on them before moving on.

  1. In general, Shannon information is the difference in Shannon entropy between two states. Shannon entropy is defined in Claude Shannon’s 1948 paper, “A Mathematical Theory of Communication.”

  2. The equation for functional information presented by Hazen et al., I(Ex) = −log2[M(Ex)/N], where N is the total number of sequences and M(Ex) is the number achieving at least degree of function Ex, is merely a special case of Shannon information when the co-variable of function is included. It represents the difference in Shannon entropy between a non-functional ground state and a functional state.

  3. You will note that Hazen’s equation looks quite a bit simpler than the general equation for Shannon information (no summation signs and no variable probabilities). This is because Hazen’s equation assumes that all sequences are equally probable; under that assumption, the general equation for Shannon information reduces to the form Hazen presents.

  4. In reality, not all sequences are equally probable. For example, when we estimate the functional information required to code for protein families, the genetic code ensures that not all amino acids are equally probable, and hence not all sequences are equally probable. Even setting that aside, sequence probabilities vary across taxa due to environmental and phenotypic constraints on functionality. Nevertheless, granting Hazen’s assumption simplifies things, and I’m happy to work with it as far as we can.

  5. There is a difference between the functional information required to perform a function and the process of creating that information.

  6. Duplicating an existing sequence that carries functional information does not increase the amount of functional information created unless the two identical sequences in combination can perform a new function that a single sequence could not. Even then, the amount of information required to produce the duplicate will not be twice the amount needed to produce a single sequence, since the new ground state for producing the duplicate already includes the original sequence. If the ground state includes the original sequence as well as a mechanism for duplicating sequences, then the functional information required to produce the duplicate may be trivial or even zero.

  7. The problem with using Hazen’s equation to estimate the functional information required to code for a particular protein family is that it is a single equation with two unknowns: we haven’t the faintest idea what M(Ex) is, so we cannot solve for I(Ex).
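
To make points 1–3 concrete, here is a small numerical sketch (the counts N and M are toy values chosen for illustration, not estimates for any real protein). It computes functional information as a difference in Shannon entropies and checks that, when all sequences are equally probable, the result matches Hazen’s closed form −log2(M/N):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H = -sum(p * log2 p) over outcomes with p > 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Toy example: N equally probable sequences, M of which are functional.
N = 1024  # total sequences in the ground state
M = 32    # functional sequences (assumed known here for illustration)

# Ground state: uniform over all N sequences, so H = log2(N).
H_ground = shannon_entropy([1 / N] * N)

# Functional state: uniform over the M functional sequences, so H = log2(M).
H_functional = shannon_entropy([1 / M] * M)

# Functional information as a difference in Shannon entropy (point 2):
I_diff = H_ground - H_functional

# Hazen's closed form under the equiprobability assumption (point 3):
I_hazen = -math.log2(M / N)

print(I_diff, I_hazen)  # 5.0 5.0 -- the two agree
```

With powers of two for N and M, both routes give exactly 5 bits, which makes the equivalence easy to verify by hand: log2(1024) − log2(32) = 10 − 5 = 5.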
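
As an illustration of point 4, the standard genetic code already skews amino acid frequencies: under random codon usage (a simplifying assumption), an amino acid’s probability is proportional to its codon multiplicity, so amino acids, and therefore sequences, cannot all be equally probable:

```python
# Codon multiplicities for the 20 amino acids in the standard genetic code
# (61 sense codons in total; one-letter amino acid abbreviations).
codon_counts = {
    'L': 6, 'R': 6, 'S': 6, 'A': 4, 'G': 4, 'P': 4, 'T': 4, 'V': 4,
    'I': 3, 'C': 2, 'D': 2, 'E': 2, 'F': 2, 'H': 2, 'K': 2, 'N': 2,
    'Q': 2, 'Y': 2, 'M': 1, 'W': 1,
}
assert sum(codon_counts.values()) == 61

# Under random codon usage, P(amino acid) is proportional to its codon count:
aa_prob = {aa: n / 61 for aa, n in codon_counts.items()}

# Leucine (6 codons) is six times as probable as methionine (1 codon):
print(round(aa_prob['L'] / aa_prob['M']))  # 6
```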
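
Point 6 can also be put in numbers (toy values again): once the ground state contains the original sequence plus a duplication mechanism, the conditional probability of producing the duplicate is ~1, so the duplicate costs ~0 bits rather than a second full payment:

```python
import math

# Toy numbers: 32 functional sequences out of 1024 equally probable ones.
N, M = 1024, 32

# Cost of finding one functional sequence from scratch:
I_single = -math.log2(M / N)  # 5.0 bits

# Naive doubling would charge 2 * I_single = 10 bits for the duplicated pair.
# But if the ground state already includes the original sequence and a
# duplication mechanism, the duplicate arises with probability ~1,
# and -log2(1) = 0, so the duplicate adds essentially nothing:
p_duplicate_given_ground_state = 1.0
I_duplicate = -math.log2(p_duplicate_given_ground_state)  # 0 bits

print(I_single + I_duplicate)  # 5.0, not 10.0
```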
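
Finally, to see why the unknown M(Ex) in point 7 matters, here is a sketch with entirely hypothetical M values for a protein of length 100. N is fixed by sequence length, but I(Ex) swings over hundreds of bits depending on what M(Ex) is assumed to be:

```python
import math

# Hazen's equiprobability assumption for a protein of length 100:
L = 100
N = 20 ** L  # roughly 1.3e130 possible sequences

def functional_information(M, N):
    """I(Ex) = -log2(M(Ex)/N), computed as log2(N) - log2(M) so that
    exact big integers are handled without forming the tiny ratio M/N."""
    return math.log2(N) - math.log2(M)

# Hypothetical values of M(Ex) -- we have no idea which, if any, is right:
for M in (10 ** 20, 10 ** 60, 10 ** 100):
    print(f"M(Ex) = 1e{len(str(M)) - 1}: "
          f"I(Ex) is about {functional_information(M, N):.1f} bits")
```

The point is not any particular output line but the spread: without an independent estimate of M(Ex), Hazen’s equation alone cannot pin down I(Ex).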
