Gpuccio: Functional Information Methodology

Swamidass et al. :

I see that many comments here are about the relationship of my analysis with a possible phylogenetic analysis. I am trying to understand better what you mean and why you suggest that. Maybe some further feedback from you would help.

I will try to clarify a few points about my analysis that, apparently, are not well understood.

  1. My analysis focuses on the vertebrate transition only because it is very easy to study. A number of circumstances are particularly favorable, as I have tried to explain: in particular, the time pattern of the pertinent splits, the presence of sufficient protein sequences in the NCBI database to make the comparisons, and, of course, the very good data about the human proteome.

But in no way am I trying to claim that there is something special in the vertebrate transition. There is a lot of functional information added at that time, and we can easily check the sequence conservation of that information up to humans. However, the same thing probably happens at many other transitions.

So, why do I find a lot of FI at the vertebrate transition? It’s because I am looking for FI specific to the vertebrate branch. Indeed, I am using human proteins as a probe, and humans are of course vertebrates. My analysis shows that a big part of the specific FI found in vertebrates was added at the initial transition. It is not comparing that to what happens in other branches of natural history.

Just to be clear, I could analyze the transition to hymenoptera in a similar way. In that case, I would take as probes the protein sequences in some bee, for example, and blast them against pre-hymenoptera and some common ancestor of the main branches in that tree. I have not done that, but it can be done, and it would have the same meaning as my vertebrate analysis: to quantify how much specific FI was added at the beginning of that evolutionary branch.

I am not saying that vertebrates are in any way special. I am not saying that humans are in any way special (well, they are, but for different reasons).

  2. It should be clear that my methodology is not measuring the absolute FI present in a protein. It is only measuring the FI conserved up to humans, and specific to the vertebrate branch.

So, let’s say that protein A has 800 bits of human conserved sequence similarity (conserved for 400+ million years). My methodology affirms that those 800 bits are a good estimator of specific FI. But let’s say that the same protein A, in bees, has only 400 bits of sequence similarity with the human form. Does it mean that the bee protein has less FI?

Absolutely not. It probably just means that the bee protein has less vertebrate specific FI. But it can well have a lot of Hymenoptera specific FI. That can be verified by measuring the sequence similarity conserved in that branch for a few hundred million years, in that protein.

OK, time has expired. More in next post.


Whether some protein grew in size during the evolution of some clade or along some branch, or how conserved that protein is in that clade, seems to me to have little to nothing to do with whether it was designed. What am I missing?


Hi Art, thank you for your contribution, that allows me to clarify some important point.

You are right of course, the bitscore of a BLAST search and the value of FI as -log2 of the ratio between target space and search space are not the same thing. But the point is: the first is a very good estimator of the second, provided that some conditions are satisfied.

The idea of using conserved sequence similarity to estimate FI is not mine. I owe it completely to Durston, and probably others have pointed to that concept before. It is, indeed, a direct consequence of some basic ideas of evolutionary theory. I have just developed a simple method to apply that concept to get a quantitative foundation to the design inference in appropriate contexts.

The condition that essentially has to be satisfied is: sequence conservation for long evolutionary periods. I have always tried to emphasize that it is not simply sequence conservation, but that long evolutionary periods are absolutely needed. But sometimes that aspect is not understood well, so I am happy that I can emphasize it here.

I will be more clear. A strong sequence similarity between, say, a human protein and the chimp homologue is of course not a good estimator of FI. The reason for that should be clear enough: the split between chimps and humans is very recent. Any sequence configuration that was present in the common ancestor, be it functionally constrained or not, will probably still be there in humans and chimps, well detectable by BLAST, just because there has not been enough evolutionary time after the split for the sequences to diverge because of neutral variation. IOWs, we cannot distinguish between similarity due to functional constraint and passive similarity, if the time after the split is too short.

But what if the time after split is 400+ million years, like in the case of the transition to vertebrates, or maybe a couple billion years, like in the case of ATP synthase beta chain in E. coli and humans?

According to what we know about the divergence of synonymous sites, I would say that time windows longer than 200 million years begin to be interesting, and 400+ million years are probably more than enough to guarantee that most, if not all, of the sequence similarity can be attributed to strong functional constraint. For 2 billion years, I would say that there can be no possible doubt.

So, in this particular case of long conservation, the degree of similarity becomes a good estimator of functional constraint, and therefore of FI. The unit is the same (bits). The meaning is the same, in this special context.

Technically, the bitscore measures the improbability of finding that similarity by chance in the specific protein database we are using. FI measures the improbability of finding that specific sequence by a random walk from some unrelated starting point. If the sequence similarity can be attributed only to functional constraint, because of the long evolutionary separation, then the two measures are strongly connected.
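
For readers who want the arithmetic behind the bitscore: in the standard Karlin-Altschul statistics that BLAST reports, a raw alignment score S is normalized to a bit score S' = (λS − ln K) / ln 2, and the expected number of chance hits in a search space of size m × n is E = m · n · 2^(−S'). The Python sketch below only illustrates those two formulas. The λ and K values, the raw score, and the query and database sizes are assumed illustrative numbers, not values taken from any search discussed in this thread.

```python
import math

def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw alignment score to bits (Karlin-Altschul statistics)."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def expect_value(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least `bits`."""
    return query_len * db_len * 2.0 ** (-bits)

# Assumed, illustrative parameters (not taken from the thread).
LAMBDA = 0.267
K = 0.041

bits = bit_score(raw_score=1000, lam=LAMBDA, k=K)
e_value = expect_value(bits, query_len=500, db_len=10**9)
print(f"bit score = {bits:.1f}, E-value = {e_value:.3g}")
```

The point of the normalization is that bit scores from different scoring systems become comparable, which is why the bitscore rather than the raw score is the quantity discussed here.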

Of course, there are differences and technical problems. We can discuss them, if you want. The general idea is that the BLAST bitscore is a biased estimator, because it always underestimates the true FI.

But that is not the important point, because we are not trying to measure FI with great precision. We just need some reliable approximation and order of magnitude. Why?

Because in the biological world, a lot of objects (in this case, proteins) exhibit FI well beyond the threshold of 500 bits, which can conventionally be assumed as safe for inferring design in any physical system. So, when I get a result of 1250 bits of new FI added to CARD11 at the start of vertebrate evolution, I don’t really need absolute precision. The true FI is certainly much more than that, but who cares? 1250 bits are more than enough to infer design.

To all those who have expressed doubts about the value of long conserved sequence similarity to estimate FI, I would simply ask the following simple question.

Let’s take again the beta chain of ATP synthase. Let’s BLAST again the E. coli sequence against the human sequence. And, for a moment, let’s forget the bitscore, and just look at identities. P06576 vs WP_106631526.

We get 335 identities. 72%. Conserved for, say, a couple billion years.

My simple question is: if we are not measuring FI, what are we measuring here?

IOWs, how can you explain that amazing conserved sequence similarity, if not as an estimate of functional specificity?

Just to know.


You are missing the central core of ID theory, which connects complex FI to the design inference. I will get to that soon enough.


Fair enough, will be looking forward to that.



I will start from this statement to deal with what I call the central core of ID theory.

You must understand that when I was requested to write a summary of my methodology to measure FI in proteins, I did exactly that. I did not include a complete description of ID theory.

Of course, being a very convinced supporter of ID theory, it was very natural for me to conflate the measurement of very high values of FI with an inference to design, because that’s exactly what ID theory warrants. But now, having discussed in some detail the rationale of my measurement of FI in proteins, the focus can shift to ID theory itself. In brief, what is the connection between complex FI and design? And how does that connection apply to biological objects?

An important premise is that my personal approach to ID theory is completely empirical. It requires no special philosophy or worldview, except some good epistemology and philosophy of science. It is, I believe, completely scientific. And it has no connections with any theology. It has always been my personal choice to avoid any reference to theological arguments in all my scientific discussions about ID theory. And I will stick to that choice here too.

More in next post.


You can feel free to refine your case, as long as you are clear about any shifts you are making.

Three questions:

  1. What empirical evidence do you have that demonstrates FI increases are unique to design?

  2. How have you accounted for the similarity between proteins due to common descent?

  3. How have you accounted for differences between proteins caused by neutral drift and neutral coevolution?



Please, let me go on with some linear explanation of ID theory and my approach to it. Then I will answer your three questions.

You may have noticed that I have proposed two different questions about what ID theory is about:

  1. What is the connection between complex FI and the design inference?

  2. How does that apply to biological objects?

Now, if we want to understand each other, we have to focus first on the first question. To do that, we must for the moment forget biological objects. After all, they are the very objects under discussion: are they designed, or do they arise by other mechanisms? So, we will for the moment consider the origin of biological objects undecided, and try to understand ID theory without any reference to biology.

To do that, we need an explicit definition of design and of functional information. I have offered a link to my two OPs about those two definitions. So, I will just recall here that:

  1. Design is any process where some conscious, intelligent and purposeful agent imprints some specific configuration on a material object, deriving it from subjective representations in his consciousness. The key point here is that the subjective representation must precede its output to the material object.

  2. FI is the number of bits required to implement some explicitly defined function. Any function can be used. FI is always defined in relation to the defined function, whatever it is. An object exhibits the level of FI linked to the function if it can be used to implement the explicitly defined function at the explicitly defined level.

  3. In general, an explicitly defined function generates a binary partition in a well defined system and set of possible objects: those that can implement it, and those that cannot. FI, in general, is computed as -log2 of the ratio of the target space (the number of objects that can implement the function) to the search space (the number of possible objects) in the defined system.
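
The definition in point 3 can be written out as a short calculation. This is a minimal Python sketch of the ratio-based definition only, not of the BLAST-based estimate discussed earlier. The function name and the toy numbers (a 10-residue peptide with 1,000 sequences assumed to be functional) are hypothetical and chosen only to show the arithmetic.

```python
import math

def functional_information(target_space: int, search_space: int) -> float:
    """FI in bits: -log2(target_space / search_space).

    target_space: number of objects that can implement the function.
    search_space: number of possible objects in the defined system.
    """
    if not 0 < target_space <= search_space:
        raise ValueError("need 0 < target_space <= search_space")
    # Take the logs separately so very large integer spaces do not
    # overflow when converted to floating point.
    return math.log2(search_space) - math.log2(target_space)

# Hypothetical toy system: all peptides of length 10 over 20 amino acids,
# of which 1,000 are assumed (for illustration) to perform the function.
search = 20 ** 10
target = 1_000
print(f"FI = {functional_information(target, search):.2f} bits")
```

Note that the result depends entirely on how the function and the system are defined, since those choices fix both the target space and the search space.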

More in next post.

Finally, a definition of “design”.

It looks like an entirely reasonable definition.

With this definition, it should be trivially obvious that biological organisms are not the result of design.



I am happy that you appreciate the definition. I love definitions.

For the conclusions, we will see…


Speaking only for myself, I don’t think there is any reason to discuss design, or “ID theory,” in this context until the very basic questions asked by @swamidass are addressed. And I would reiterate that asking this question outside of even a basic phylogenetic analysis/approach is futile.

It is not possible to talk meaningfully about “functional information” without these basic foundational tasks being done.


I agree. Rather than a primer on ID generalities, let’s focus on what specifically you @gpuccio are doing.

For example, with your definition of design, there must be a pre-existing design. You are empirically based. So what evidence can you produce for a pre-existing specification? As @nwrickert notes, it seems obvious that this does not exist, at least not in a human-accessible form.


To all:

So, the central core of ID theory is the following:

Leaving aside biological objects (for the moment), there is not one single example in the whole known universe where FI higher than 500 bits arises without any intervention of design.

On the contrary, FI higher than 500 bits (often much higher than that) abounds in designed objects. I mean human artifacts here.

Therefore, if we observe FI higher than 500 bits in any object (leaving aside for the moment biological objects) we can safely infer a design origin for that object.

That procedure will generate no false positives. Of course, it will generate a lot of false negatives. The threshold of 500 bits has been chosen exactly to get that type of result.

If those points are not clear, we are not really discussing ID theory, but something else.

This strong connection between high FI levels and a design origin has, of course, a rationale. But its foundation is completely empirical. We can observe that connection practically everywhere.

The rationale could be expressed as follows: there is no known necessity law that generates those levels of FI without any design intervention. Therefore, FI in non design systems can arise only by chance. But a threshold of 500 bits is so much higher than the probabilistic resources of the known universe, that we can be sure that such an event is empirically impossible. The probabilistic barriers of getting 500 bits of FI are simply too high to be overcome.
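
The arithmetic behind the threshold can be checked directly. The sketch below converts a bit count into the corresponding chance probability and multiplies it by a budget of random trials. The budget of 10^150 trials is an assumption used purely for illustration, not a number stated in this post, and the sketch models independent random draws only.

```python
from fractions import Fraction

def chance_probability(bits: int) -> Fraction:
    """Probability of hitting a target of `bits` bits in one random draw."""
    return Fraction(1, 2 ** bits)

def expected_hits(bits: int, trials: int) -> Fraction:
    """Expected number of successes in `trials` independent random draws."""
    return trials * chance_probability(bits)

# Assumed trial budget, for illustration only.
ASSUMED_TRIALS = 10 ** 150

for bits in (100, 500, 1250):
    hits = expected_hits(bits, ASSUMED_TRIALS)
    print(f"{bits:>5} bits: expected hits = {float(hits):.3g}")
```

Exact fractions are used so that the very small probabilities are not rounded to zero before the comparison is made.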

Well, that’s ID theory in a nutshell. I will come to the application to biology later. But I am confident that this simple summary will be enough for the moment to generate some answers. :)


You present a bare assertion as a premise of your analysis.

Why should we agree with this? It seems obviously false. What evidence can you present to support this assumption?

There are examples of non-designed processes that we can directly observe producing FI. We can observe high amounts of FI in cancer evolution too, which you agree is not designed. We also see high amounts of FI in viruses, which you also agree are not designed. All these, and more, are counterexamples to your assumption.

As a technical point, without clarifying precisely how FI is defined, this is not at all clearly the case.


I was answering the remarks made here asking how I got to the design inference from the simple observation of complex FI. If you are not interested in the theory you seem to discuss so often here, please clarify that.

But that seems not to be the case. I see that, as expected, my “primer on ID generalities” has already generated some fierce response. So, I think I will go on, and answer them.


This is what you should be working to demonstrate. You have now asserted this, so you should proceed to demonstrate the truth of this assertion.


I’m sorry, I did not mean to communicate any lack of interest in design; for me, the topic is very interesting and I was not criticizing your inclusion of the topic in general. I added “in this context” in a failed attempt to point out that without some foundational and very basic additional information, the analysis that attempts to address “functional information” is meaningless at best and misleading at worst.


I have a 2048 bit gpg key. Okay, maybe the RSA cryptosystem is inefficient, and there are less than 500 bits of FI there. But I could generate a longer key in much the same way. So the 500 bits doesn’t seem much of a limit.

The RSA key itself was mostly produced by a random number generator, with some filtering. Yes, you could say that the random number generator was designed. And you could say that the RSA cryptosystem was designed. Still, the key itself is mostly generated randomly, so does not seem designed.
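
To make "a random number generator, with some filtering" concrete, here is a toy Python sketch of how an RSA modulus is typically assembled: draw random odd candidates and keep the first two that pass a primality test. It is an illustration only. It is not how gpg is implemented and it is not suitable for real keys; the function names and the number of Miller-Rabin rounds are illustrative choices.

```python
import secrets

def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin probabilistic primality test (the 'filter')."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # base a is a witness that n is composite
    return True

def random_prime(bits: int) -> int:
    """Draw random odd candidates of `bits` bits until one passes the filter."""
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate

def toy_rsa_modulus(bits: int = 2048) -> int:
    """Product of two random primes of bits // 2 bits each."""
    half = bits // 2
    return random_prime(half) * random_prime(half)

n = toy_rsa_modulus(512)  # small size so the demo runs quickly
print(n.bit_length())
```

Almost every bit of the result comes from the random draws; the designed parts are the generator and the primality filter, which is the distinction being made here.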

What it amounts to, is that if you want to consider the role of design, then you have to push the design further back than the FI (functional information).

There are plenty of people at this site who have no problem with saying that there’s design involved in biological organisms. But they see a need to push the design further back than the organisms themselves. For example, they may see the system of evolution as being designed, but they don’t see individual organisms as designed.

As an agnostic, I cannot rule out the possibility that the system of evolution is designed. Possibly there are some atheists, maybe even Richard Dawkins, who might admit that they cannot rule out design at that level. But you cannot have a science of design for design at that level. You can maybe have a philosophy or theology, but not a science.


A few comments from the peanut gallery.

I find it helpful to have this context given because I’m not already familiar with @gpuccio’s work. If it has wandered out-of-bounds for the relevant question, that should soon become apparent.

@gpuccio: Is this Behe’s definition? Your own? I am just curious as to the source due to another recent discussion here.

Again, a source would be interesting. I won’t argue these definitions here (others have already), but I can work them into other ongoing discussions.

I think you could find naturally occurring non-biological examples too.

@colewd I suggest your question is better for a side thread. I’ll start one … no I won’t, it already exists! :)