Durston: Functional Information

Ok, let’s take a look at p53. Just to make it a little more interesting and realistic, I downloaded an 872-sequence MSA from Pfam for the p53 domain and ran it through a program I wrote. You can see a portion of the results in the linked file, but I’ll highlight the main items of interest here.

Initial Info:

No. of unique sequences: 487

No. of columns in MSA before stripping out insertions: 669

No. of columns in MSA after reducing the sequences to their essential core: 187

Average no. of different amino acids tolerated per site: 14

Estimated minimum no. of mutational events per site: 23
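As a rough illustration of how two of these summary numbers could be tallied, here is a minimal sketch in Python. It is not the program I used: the function name `msa_summary`, the gap characters, and the toy alignment are all assumptions made for the example, and it does not attempt the core-reduction step or the mutational-event estimate.

```python
def msa_summary(sequences, gap_chars="-."):
    """Summarize an aligned set of equal-length sequences (hypothetical helper)."""
    unique = sorted(set(sequences))          # collapse exact duplicates
    n_cols = len(unique[0])
    distinct_per_col = []
    for i in range(n_cols):
        # residues observed in this column, ignoring gap characters
        residues = {s[i] for s in unique if s[i] not in gap_chars}
        distinct_per_col.append(len(residues))
    return {
        "n_unique": len(unique),
        "n_columns": n_cols,
        "avg_distinct": sum(distinct_per_col) / n_cols,
    }

# Toy alignment, invented for illustration only (not Pfam data).
toy = ["ARND", "ARNE", "ARND", "GRQE"]
summary = msa_summary(toy)
print(summary)
```

On the toy alignment this reports 3 unique sequences, 4 columns, and an average of 1.75 distinct residues per column.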

Disclaimer:

The Pfam MSA does not have enough data for the p53 domain to give an acceptable estimate of functional information. I prefer to have a minimum of 30 estimated mutational events per site and at least a few thousand unique sequences. The data here provide an estimated minimum of only 23 mutational events per site and 487 unique sequences. Nevertheless, let’s just work with what we have as a “toy” problem.

Functional information required to code for a p53 domain:

  • starting from a physical system in the null state = 409 bits (avg. density = 2.17 bits/site)

  • starting from a physical system skewed by the genetic code = 374 bits (avg. density = 1.99 bits/site)

  • extreme lower limit = 123 bits (avg. density = 0.65 bits/site)

Functional information for non-functional p53 domain:

Here, it is critical to define clearly which function determines whether a sequence is functional or not. There are three perspectives we will take: a) relative to an intelligent observer, b) relative to the normal, physical p53 function, and c) relative to a cancerous cell. Looking at the attached data, we observe that site 180 permits only one specific amino acid, arginine. Let us suppose that it is mutated to something else, thereby inactivating p53 so that it can no longer perform its function. The mutated sequence falls into the set of sequences that are non-functional relative to the normal function. Let us use the extreme lower limit of 123 bits for a functional p53 domain.

a) relative to an intelligent observer: Even though the sequence is now non-functional, we can see from the data I linked to that only 4.32 bits of functional information have been lost. The function, however, has shifted from fulfilling its cellular duties to being “meaningful” to an intelligent observer, relative to the cellular function the observer “knows” it should have. The intelligent observer “knows” what the desired function is and “sees” that this sequence is quite “close”. The physical system does not work that way; it does not know or see anything, and registers only whether the sequence is functional or not.
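The 4.32-bit figure matches log2(20), the value a site contributes when it permits exactly one of the 20 amino acids, measured from a uniform null state. A minimal sketch follows, assuming that per-site functional information is the null-state entropy minus the Shannon entropy of the observed column. That formula, the uniform null state, and the names `site_entropy` and `site_fi` are assumptions made for illustration; the columns are invented, not taken from the Pfam data.

```python
import math
from collections import Counter

# Assumed null state: all 20 amino acids equally probable at a site.
NULL_STATE_BITS = math.log2(20)

def site_entropy(column):
    """Shannon entropy (bits) of the residues observed in one MSA column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def site_fi(column):
    """Per-site functional information: null-state entropy minus observed entropy."""
    return NULL_STATE_BITS - site_entropy(column)

# A fully conserved column (arginine only), like site 180 in the example.
conserved = "R" * 10
# A column in which every amino acid appears equally often.
unconstrained = "ACDEFGHIKLMNPQRSTVWY"

print(round(site_fi(conserved), 2))      # 4.32
print(round(site_fi(unconstrained), 2))
```

Under these assumptions a fully constrained site carries the maximum of about 4.32 bits, and a site that tolerates every amino acid equally carries essentially none.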

b) relative to the normal p53 function: The mutated sequence is now a member of the set of non-functional sequences. The cell cannot “see” how close the sequence is; the sequence either satisfies the function or it does not. In this case it does not. A non-functional sequence requires approximately 0 bits to encode. Relative to the physical system, 123 bits of functional information have been lost. The gene is now inactive. This, however, is the simplest case. There are more complex cases that we shall leave aside for now.
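The contrast between (a) and (b) comes down to simple arithmetic, sketched below. The function names are hypothetical, and the all-or-nothing rule for the physical system is a simplification of the simplest case described above.

```python
import math

FI_FUNCTIONAL_BITS = 123.0        # extreme lower limit used in the text
SITE_LOSS_BITS = math.log2(20)    # about 4.32 bits for the one mutated site

def fi_observer_view(fi_functional, bits_lost):
    """Observer's view: the sequence is 'close', so only the site's bits are lost."""
    return fi_functional - bits_lost

def fi_physical_view(fi_functional, is_functional):
    """Physical system's view: the sequence either performs the function or it does not."""
    return fi_functional if is_functional else 0.0

observer = fi_observer_view(FI_FUNCTIONAL_BITS, SITE_LOSS_BITS)
physical = fi_physical_view(FI_FUNCTIONAL_BITS, is_functional=False)

print(f"{observer:.2f}")  # 118.68
print(physical)           # 0.0
```

The same mutation leaves about 118.68 bits from the observer's perspective and 0 bits from the cell's, which is why the function has to be stated before any FI figure is quoted.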

Let’s pause here to underscore a critical point, and a frequent source of confusion. “Meaningful” is a special case of the more general “functional”. For intelligent observers or engineers, a sequence can be meaningful to the mind while at the same time being non-functional to the physical system. For example, an intelligent observer attaches a certain degree of “meaning” to a p53 sequence that is “close” to being functional. The cell may not. In discussions of functional information, therefore, the function under discussion must always be clearly defined, as (a) and (b) are distinguished in the example above. A common mistake is to conflate functions, picturing them as overlapping Venn diagrams, when the two functions are actually completely independent and do not even exist in the same reference frame.

c) relative to a cancerous cell: There are three possibilities relative to the physical system of a cancerous cell, setting subjective “meaning” aside.

  1. If p53 is still functional relative to its normal physical function, then the FI is 123 bits.

  2. If p53 is non-functional relative to its normal physical function, then the FI is 0 bits.

  3. If p53 is non-functional relative to its normal function but has a new function to do with cancer, then we need some way of determining which sequences satisfy that new function before we can estimate FI. If non-functional p53 is simply a “bystander” in a cancerous cell (i.e., it doesn’t do anything), then it has no function, and a system without a function has, by definition, no FI. FI is contingent on a specified function. If we wish to arbitrarily say that being a “bystander” is a function, then we must realize we are no longer talking about any function required by the physical system; the system does not require “bystanders”. We are now talking about an arbitrary, subjective function that we have invented in our minds. We can still estimate FI in that context, but it no longer has anything to do with biology.