Durston: Functional Information

Hello all. I appreciate the thoughtful and collegial comments, for the most part. Since my approach to estimating functional information, outlined in my TBMM paper, has been mentioned, I thought I should join the discussion. Before I explain/defend how I estimate the functional information required to code for a protein family, I want to ensure we are all on the same page regarding the basic principles of functional information. From the comments, I think we might be, but I’ll outline seven basics just to make sure we can all sign off on them before moving on.

  1. In general, Shannon information is the difference in Shannon entropy between two states. Shannon entropy is defined in Claude Shannon’s 1948 paper, “A Mathematical Theory of Communication.”

  2. The equation for functional information presented by Hazen et al., I(Ex) = −log2[M(Ex)/N], where N is the total number of sequences and M(Ex) is the number achieving at least degree of function Ex, is merely a special case of Shannon information when the co-variable of function is included. It represents the difference in Shannon entropy between a non-functional ground state and a functional state.

  3. You will note that Hazen’s equation looks quite a bit simpler than the general equation for Shannon information (no summation signs and no variable probabilities). This is because Hazen’s equation assumes that all sequences are equally probable; under that assumption, the general equation for Shannon information reduces to the form Hazen presents.

  4. In reality, not all sequences are equally probable. For example, when we estimate the functional information required to code for protein families, the genetic code ensures that not all amino acids are equally probable, and hence not all sequences are equally probable. Even setting that aside, sequence probabilities vary across taxa due to environmental and phenotypic constraints on functionality. Nevertheless, granting Hazen’s assumption simplifies things, and I’m happy to work with it as far as we can.

  5. There is a difference between the functional information required to perform a function and the process of creating that information.

  6. Duplicating an existing sequence that carries functional information does not increase the amount of functional information created unless the two identical sequences in combination can perform a new function that a single sequence could not. Even then, the amount of information required to produce the duplicate will not be twice the amount needed to produce a single sequence, since the new ground state for producing the duplicate already includes the original sequence. If the ground state includes the original sequence as well as a mechanism for duplicating sequences, then the functional information required to produce the duplicate may be trivial or even zero.

  7. The problem with using Hazen’s equation to estimate the functional information required to code for a particular protein family is that it is a single equation with two unknowns: we haven’t the faintest idea what M(Ex) is, so we cannot solve for I(Ex).
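
To make points 1–3 concrete, here is a small numerical sketch (the counts N and M are toy values chosen for illustration, not estimates for any real protein). It computes functional information as a difference in Shannon entropies and checks that, when all sequences are equally probable, the result matches Hazen’s closed form −log2(M/N):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H = -sum(p * log2 p) over outcomes with p > 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Toy example: N equally probable sequences, M of which are functional.
N = 1024  # total sequences in the ground state
M = 32    # functional sequences (assumed known here for illustration)

# Ground state: uniform over all N sequences, so H = log2(N).
H_ground = shannon_entropy([1 / N] * N)

# Functional state: uniform over the M functional sequences, so H = log2(M).
H_functional = shannon_entropy([1 / M] * M)

# Functional information as a difference in Shannon entropy (point 2):
I_diff = H_ground - H_functional

# Hazen's closed form under the equiprobability assumption (point 3):
I_hazen = -math.log2(M / N)

print(I_diff, I_hazen)  # 5.0 5.0 -- the two agree
```

With powers of two for N and M, both routes give exactly 5 bits, which makes the equivalence easy to verify by hand: log2(1024) − log2(32) = 10 − 5 = 5.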
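
As an illustration of point 4, the standard genetic code already skews amino acid frequencies: under random codon usage (a simplifying assumption), an amino acid’s probability is proportional to its codon multiplicity, so amino acids, and therefore sequences, cannot all be equally probable:

```python
# Codon multiplicities for the 20 amino acids in the standard genetic code
# (61 sense codons in total; one-letter amino acid abbreviations).
codon_counts = {
    'L': 6, 'R': 6, 'S': 6, 'A': 4, 'G': 4, 'P': 4, 'T': 4, 'V': 4,
    'I': 3, 'C': 2, 'D': 2, 'E': 2, 'F': 2, 'H': 2, 'K': 2, 'N': 2,
    'Q': 2, 'Y': 2, 'M': 1, 'W': 1,
}
assert sum(codon_counts.values()) == 61

# Under random codon usage, P(amino acid) is proportional to its codon count:
aa_prob = {aa: n / 61 for aa, n in codon_counts.items()}

# Leucine (6 codons) is six times as probable as methionine (1 codon):
print(round(aa_prob['L'] / aa_prob['M']))  # 6
```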
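
Point 6 can also be put in numbers (toy values again): once the ground state contains the original sequence plus a duplication mechanism, the conditional probability of producing the duplicate is ~1, so the duplicate costs ~0 bits rather than a second full payment:

```python
import math

# Toy numbers: 32 functional sequences out of 1024 equally probable ones.
N, M = 1024, 32

# Cost of finding one functional sequence from scratch:
I_single = -math.log2(M / N)  # 5.0 bits

# Naive doubling would charge 2 * I_single = 10 bits for the duplicated pair.
# But if the ground state already includes the original sequence and a
# duplication mechanism, the duplicate arises with probability ~1,
# and -log2(1) = 0, so the duplicate adds essentially nothing:
p_duplicate_given_ground_state = 1.0
I_duplicate = -math.log2(p_duplicate_given_ground_state)  # 0 bits

print(I_single + I_duplicate)  # 5.0, not 10.0
```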
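
Finally, to see why the unknown M(Ex) in point 7 matters, here is a sketch with entirely hypothetical M values for a protein of length 100. N is fixed by sequence length, but I(Ex) swings over hundreds of bits depending on what M(Ex) is assumed to be:

```python
import math

# Hazen's equiprobability assumption for a protein of length 100:
L = 100
N = 20 ** L  # roughly 1.3e130 possible sequences

def functional_information(M, N):
    """I(Ex) = -log2(M(Ex)/N), computed as log2(N) - log2(M) so that
    exact big integers are handled without forming the tiny ratio M/N."""
    return math.log2(N) - math.log2(M)

# Hypothetical values of M(Ex) -- we have no idea which, if any, is right:
for M in (10 ** 20, 10 ** 60, 10 ** 100):
    print(f"M(Ex) = 1e{len(str(M)) - 1}: "
          f"I(Ex) is about {functional_information(M, N):.1f} bits")
```

The point is not any particular output line but the spread: without an independent estimate of M(Ex), Hazen’s equation alone cannot pin down I(Ex).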
