Gpuccio: Functional Information Methodology

gpuccio · August 27, 2019, 4:14pm

I start with you, because at least I do not have to show meteorological abilities that I do not possess! Art’s tornadoes will be more of a challenge.

I am not sure if the problem here is a big misunderstanding of what FI is. Maybe, let’s see.

According to my definition, FI can be measured for any possible function. Any observer is free to define a function as he likes, but the definition must be explicit and include a level to assess the function as present. Then, FI can be measured for the function, and objects can be categorized as expressing that function or not.

An important point is that FI can be generated in non design systems, but only at very low levels. The 500 bit threshold is indeed very high, and it is appropriate to really exclude any possible false positive in the design inference.
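A minimal sketch of how such a measurement is usually written down, assuming the common formulation in which FI is the negative base-2 logarithm of the ratio between the target space (configurations that express the function at the defined level) and the search space (all possible configurations). The function name, the argument names and the example sizes are illustrative assumptions, not taken from the post; only the 500-bit threshold is.

```python
import math

DESIGN_THRESHOLD_BITS = 500  # threshold quoted in the post


def functional_information(target_size: int, search_size: int) -> float:
    """Return FI in bits, assuming FI = -log2(target_size / search_size)."""
    if not 0 < target_size <= search_size:
        raise ValueError("need 0 < target_size <= search_size")
    # Use separate logs so very large integers do not overflow a float.
    return math.log2(search_size) - math.log2(target_size)


# Hypothetical example: 2**20 functional configurations out of 2**120.
fi_bits = functional_information(2**20, 2**120)
exceeds_threshold = fi_bits >= DESIGN_THRESHOLD_BITS
print(fi_bits, exceeds_threshold)
```

In this made-up case the function would count as present in the object, but the FI would fall well short of the threshold used for a design inference.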

I think that I must also mention a couple of criteria that could be important in the following discussion. I understand that I have not clarified them before, but believe me, it’s only because the discussion has been too rushed. Those ideas are an integral part of all ID thinking, and you can find long discussions made by me at UD in the past trying to explain them to other interlocutors.

The first idea you should be familiar with, if you have considered Dembski’s explanatory filter. The idea is that, before making a design inference, we should always ascertain that the configurations we observe are not the simple result of known necessity laws. For the moment, I will not go deeper on this point.

The second point is about specification, not only functional specification, but any kind of specification. IOWs, any type of rule that generates a binary partition in the search space, defining the target space.

The rule is simple enough. If we are dealing with pre-specifications, everything can work. IOWs, let’s take the simple example of a deck of cards. If I declare in advance a specific sequence of them, and then I shuffle the cards and I get the sequence, something strange is happening. A design inference (some trick) is certainly allowed.
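To put a number on the deck example, here is a rough sketch. It assumes a standard 52-card deck and a fair shuffle in which every ordering is equally likely; neither detail is spelled out in the post.

```python
import math

# Number of distinct orderings of a standard 52-card deck (assumed deck size).
orderings = math.factorial(52)

# Chance that one fair shuffle reproduces a single sequence declared in advance.
# The float is only for display; the bit count below uses the exact integer.
probability = 1 / orderings

# Bits needed to single out one ordering, assuming all orderings equally likely.
bits = math.log2(orderings)

print(f"{bits:.2f} bits")
```

Under these assumptions a pre-specified full sequence corresponds to roughly 225.6 bits.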

But if we are dealing with post-specifications, IOWs we give the rule after the object has come into existence and after we have observed it, then the rule must be independent from the specific configuration of bits observed in the object. Another way to say that is that I cannot use the knowledge of the individual bits observed in the object to build the rule. In that case, I am only using already existing generic information to build a function.

So, going back to our deck of cards, observing a sequence that shows the cards in perfect order is always a strange result, but I cannot say: well, my function is that the cards must have the following order, and then just read the order of a sequence that has already been obtained and observed.

This seems very trivial, but I want to make it clear because a lot of people are confused about these things.

So, I can take a random sequence of 100 bits and then set it as electronic key to a safe. Of course, there is nothing surprising in that: the random series was a random series, maybe obtained by tossing a fair coin, and it had no special FI. But, when I set it as a key, the functional information in that sequence becomes 100 bits. Of course, it will be almost impossible to get that sequence by a new series of coin tossing.
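The safe example can be sketched in a few lines. This assumes FI is computed as the negative base-2 logarithm of target space over search space, and that exactly one sequence opens the safe; the helper name `fi_bits` is made up for illustration.

```python
from fractions import Fraction
import math

KEY_LENGTH_BITS = 100
search_space = 2**KEY_LENGTH_BITS  # all possible 100-bit sequences


def fi_bits(target_size: int) -> float:
    """FI in bits, assuming FI = -log2(target / search) over 100-bit strings."""
    return math.log2(search_space) - math.log2(target_size)


# Function "be some 100-bit sequence": every sequence qualifies.
fi_as_random_string = fi_bits(search_space)

# Function "open the safe": only the one sequence set as the key qualifies.
fi_as_key = fi_bits(1)

# Exact chance that a fresh series of 100 fair coin tosses reproduces the key.
p_match = Fraction(1, search_space)

print(fi_as_random_string, fi_as_key, float(p_match))
```

The sequence itself is the same in both cases; only the function it is measured against changes, which is why the value moves from 0 to 100 bits.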

Another way to say these things is that FI is about configurations of configurable switches, each of which can in principle exist in at least two different states, so that the specific configuration is the one that can implement a function. This concept is due to Abel.

OK, let’s go back to your examples. Let’s take the first one, the other will probably be solved automatically.

The configuration of stars in the sky.

OK, it is a complex configuration. As is the configuration of grains of sand on a beach.

So, what is the function?

You have to define a function, and a level of it that can define it as present or absent in the object we are observing.

What is the object? The starry sky? You mean our galaxy, or at least the part we can observe from our planet?

What is the function?

You have to specify all these things.

Frankly, I cannot see any relevant FI in the configuration of stars. Maybe we can define some function for which a few bits could be computed, but no more than that.

So, as it is your example, please clarify better.

For me, it is rather obvious that none of your examples shows any big value of FI for any possible function. And that includes Art’s tornado, which of course I will discuss separately with him.

Looking forward to your input about that.

gpuccio · August 27, 2019, 4:18pm

Thank you!

I will come to your tornado as soon as possible. In the meantime, the discussion with Swamidass can maybe help clarify some points. I will come back to the discussion later.

gpuccio · August 27, 2019, 4:20pm

You are anticipating too much. Have patience. I am only saying that the correct model for the RV part of the neo-darwinian model is a random walk. For the moment, I have not considered NS or other aspects.

By the way, the random walk model is also valid for neutral drift because, as said, it is part of the RV aspect.

As said, my estimate is a good lower threshold. For design inference, that is fine.

I have discussed that. Why do you doubt that it is a reasonable approximation? It is not confused by neutral evolution, why should it? The measurement itself is based on the existence of neutral evolution. Why should that generate any confusion?

I have said that my procedure cannot evaluate functional divergence as separate from neutral divergence. Therefore, what I get is a lower threshold. And so? What is the problem? As a lower threshold I declare it, and as a lower threshold I use it in my reasonings. Where is the problem?

Of course, as said, I am not considering NS. Yet. I will. But I have already pointed to two big OPs of mine, one for RV and one for NS. You can find a lot of material there, if you have the time.

However, I will come to that. And to the role of NS in generating FI. Just give me time.

But RV is a random system of events. It must be treated and analyzed as such.

swamidass · August 27, 2019, 4:45pm

I said the information is the positions of visible stars in the sky. The function of this information, for many thousands of years, was navigation (latitude and direction), time-telling (seasons), and storytelling (constellations). Any change that would impact navigation, time-telling, or storytelling, or create a visual difference would impact one or all these things.

There are about 9,000 visible stars in the sky (low estimate). Keeping things like visual acuity in mind (Naked eye - Wikipedia), we can compute the information. However, even if there are just two possible locations in the sky for every star (absurd) and only half the stars are important (absurd), we are still at 4,500 bits of information in the position of stars in the sky. That does not even tell us the region of sky we are looking at (determined by season and latitude), but we can neglect this for now.
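The arithmetic behind the 4,500-bit figure can be reproduced as follows. This sketch assumes each star's position is counted independently and contributes log2 of its number of possible locations; the variable names are illustrative, the numbers are the ones given above.

```python
import math

visible_stars = 9000       # low estimate given above
important_fraction = 0.5   # "only half the stars are important"
positions_per_star = 2     # "just two possible locations" per star

important_stars = int(visible_stars * important_fraction)

# Each independently placed star contributes log2(positions_per_star) bits.
bits = important_stars * math.log2(positions_per_star)

print(bits)
```

Any finer positional resolution or larger share of relevant stars would only push this count higher under the same assumptions.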

gpuccio · August 27, 2019, 4:50pm

swamidass:

What is the object? The starry sky? You mean our galaxy, or at least the part we can observe from our planet?

What is the function?

I said the information is the positions of visible stars in the sky. The function of this information, for many thousands of years, was navigation (latitude and direction), time-telling (seasons), and storytelling (constellations). Any change that would impact navigation, time-telling, or storytelling, or create a visual difference would impact one or all these things.

There are about 9,000 visible stars in the sky (low estimate). Keeping things like visual acuity in mind (Naked eye - Wikipedia), we can compute the information. However, even if there are just two possible locations in the sky for every star (absurd) and only half the stars are important (absurd), we are still at 4,500 bits of information in the position of stars in the sky. That does not even tell us the region of sky we are looking at (determined by season and latitude), but we can neglect this for now.

I will briefly answer this, and then for the moment I must go.

What makes the current configuration of the stars specific to help navigation, time telling or story telling? If the configuration were a different random configuration, wouldn’t it be equally precious for navigation, time telling and storytelling?

There is no specific functional information in the configuration we observe. Most other configurations generated by cosmic events would satisfy the same functions you have defined.

swamidass · August 27, 2019, 4:52pm

Yes, you did say this. We dispute this claim. @sfmatheson, @glipsnort, and I have all explained our objections.

This is the crux (or at least one crux) of the issue. We are convinced that neutral evolution will be mistaken as FI gains. You have not put forward any negative controls to quell our objections. See what has already been said:

glipsnort:

My method underestimates FI, and this is one of the reasons for that.

As I pointed out, under some circumstances your method will grossly overestimate FI as you have defined it. You would at least have an argument if you redefined FI to be the information needed to keep a particular gene (say) functional.

Even if you adopt a more restricted (and intuitive) definition of FI, however, your procedure still can’t tell you about increases in FI in a lineage. All you’re doing is looking at sequence conservation of various species with respect to human, right? You observe that conservation is almost always lower as you look across longer branches. But conservation is only a lower bound on the FI of the sequence; a changing lower bound tells you precisely nothing about changes to the quantity itself. It’s not mathematically possible to draw that kind of conclusion.

In fact, we would expect to see conservation decreasing with increasing branch length even if the function and the FI of a gene haven’t changed at all. Functional sequence can change without changing function, but it does so more slowly (sometimes much more slowly) than nonfunctional sequence changes, since the mutational avenues that preserve function are much more restricted (and may only become accessible with a different genetic background or in a different environment).

That last paragraph is key. Your estimate of FI seems to be, actually, FI + NE (neutral evolution), where NE is expected to be a very large number. So the real FI is some number much lower than what you calculated.

gpuccio · August 27, 2019, 4:55pm

I really don’t understand.

Can you please explain why neutral evolution would be part of the FI I measure? This is complete mystery to me.

Neutral evolution explains the conservation of sequences? Why? I really don’t understand.

swamidass · August 27, 2019, 4:56pm

A new configuration would not be equally precious for telling the stories we have now. We would have different constellations, and therefore different myths about these constellations. My function is to tell these specific (specified!) stories, not any old stories you might want to come up with in place of them. So no, a new configuration would break the storytelling function.

Remember also, that some configurations (e.g. a regular grid or a repeating pattern) are useless for navigation or time-telling. Very quickly, we would get over 500 bits with a careful treatment, well into the thousands if not millions of bits.

gpuccio · August 27, 2019, 5:00pm

But you are doing exactly what I cautioned about. You are defining the function as a consequence of an already observed configuration.

If the configuration were different, we would be telling different stories.

Are you really so confused about the meaning of FI?

The function must be defined independently. You can define the function as “telling stories about the stars”. You cannot define the function as “telling stories about the stars in this specific configuration”.

How can you not understand that this is conceptually wrong?

swamidass · August 27, 2019, 5:05pm

No, I’m just using a particular definition of function, which parallels yours in biology. If you don’t want me to use my definition, I am not sure you can use yours.

It seems you are defining function by the already observed configuration of proteins in extant biology. This does not take into account the configurations that would produce the same high level functions, but we just don’t see because it is not what happened.

If these are the rules, you are breaking them. Right?

It is subjective how we define function. I chose a definition of function that paralleled yours in biology, so I am not sure how you can object to me “breaking the rule” while breaking it yourself with your own definition!

Yes, this highlights the problems with using FI as a way of determining if something is designed or not.

glipsnort · August 27, 2019, 5:30pm

When using conservation it’s possible, even likely, to underestimate the total FI present while overestimating the change in FI. When the mouse genome was sequenced, for example, one of the immediate outcomes was a lower bound on the fraction of the genome that is functional (not quite the same thing as FI, but in the same conceptual neighborhood), a bound of 6% based on the fraction of the genome that is conserved. That was a valid conclusion (with various caveats). If we were to repeat the same analysis across primates to humans, we would get a larger fraction, say 8%. That would also be a valid conclusion. What is not a valid conclusion is that the functional fraction of the genome increased by 2% in primates. Some functional sequence is likely to have changed on the branches between rodents and primates without losing function, while other functional sequence has been lost and gained in each branch.
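A small numerical sketch of why two valid lower bounds say nothing about the change in the underlying quantity. The 6% and 8% bounds are the ones from the example above; the "true" fractions are invented purely for illustration, since in practice they are unknown.

```python
# Lower bounds on the functional fraction, as in the example above.
bound_from_mouse_comparison = 0.06
bound_from_primate_comparison = 0.08

# Naive reading: treat the difference of the bounds as a real increase.
naive_change = bound_from_primate_comparison - bound_from_mouse_comparison

# Hypothetical "true" functional fractions (unknown in practice; made up here).
true_fraction_earlier = 0.10
true_fraction_later = 0.10

# Both bounds hold...
assert true_fraction_earlier >= bound_from_mouse_comparison
assert true_fraction_later >= bound_from_primate_comparison

# ...yet the true change is zero, so the bounds alone do not determine it.
true_change = true_fraction_later - true_fraction_earlier
print(round(naive_change, 2), true_change)
```

The same bounds would be equally consistent with a true increase or a true decrease, which is the point being made about comparing lower bounds.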

swamidass · August 27, 2019, 5:59pm

Which is, once again, why it is important to do this analysis with a phylogeny. As @John_Harshman, an expert in this area, comments:

Note that the paper he links to does a great job at the analysis you are attempting to do. It would be valuable to look it over to determine where you disagree, agree, or could learn from it @gpuccio.

Furthermore, I reiterate my question from early on:

And:

Kirk · August 27, 2019, 8:44pm

Gentlemen, I see my name has been mentioned here. I’m a bit overwhelmed by work, so cannot participate in this discussion, but I do want to make a few clarifying points regarding my own thinking on this problem.

First, as I have stated previously, the quality of any estimation depends upon sufficient sampling, and I greatly doubt that we have sufficient data to estimate the FI required for any protein in a specific species. The same probably goes for genus, and maybe even for family and order. My interest has always been, and continues to be, the FI required for the origin of novel protein families, rather than for a protein in a single species or genus.

For this reason, I have focussed, and continue to focus, on protein families that have had the benefit of sufficient sampling produced by thousands of independently evolving populations across a wide range of taxa (preferably across many phyla). In discussions with various people in the field, there seem to be two research questions to answer (which appear to have come up here, though I’ve not read anything other than gpuccio’s one reply included in the email sent to me):

How can we test for sufficient sampling, given common descent and,
How can we test for sufficient sampling, given the possibility of a global maximum fitness in sequence space, clustering our data in only a subsection of sequence space?

For the past 8 months or so I have been working on a method to provide answers to both questions. The method itself was not difficult, but the testing of that method has been time consuming. I cannot discuss anything related to this here, as I am submitting my findings to a journal for peer review and publication. I will say this, however … I wouldn’t even think about estimating the FI for a protein from the data for an individual species (e.g., human), for obvious reasons when one scans the sampling available at present. But my initial assumptions several years ago regarding sampling broadly across phyla or kingdoms are being verified to produce reasonably accurate estimates of FI required for the origin of many protein families.

I can’t say any more until the paper passes review and is published. The input and critiques I’ve had thus far from a few non-ID scientists sceptical of ID have been especially valuable, but I cannot widen the circle of discussion any further until after the paper is out. As my former supervisor urged me … “stop leaking your research and focus on submitting more papers for publication.”

As I indicated at the outset, I cannot participate in this discussion, although it does look to be interesting.

swamidass · August 27, 2019, 8:45pm

Thanks for chiming in @Kirk. Great to see you, even if it is for a moment.

gpuccio · August 27, 2019, 10:08pm

Again, my purpose is not to measure the full content of FI in a protein. I agree that it is possible to underestimate the full content of FI, but we can have a good idea of the component revealed by sequence conservation from that point on. That’s what I do, that’s what I get as result, and that’s what I use for my inferences. I have never made reference to the full content of FI. My purpose is to demonstrate the presence, at a certain point in evolutionary history, of a definite quantity of FI that has a sequence configuration that will be conserved up to humans. This should be clear by now. Either you agree that my methodology does that, or you don’t. Feel free to decide and, if you want, to explain why. But there is no sense in requiring from my methodology what it has never tried to measure.

As for overestimating the change in FI, again I have never tried to estimate the absolute change in the full content of FI. That should be clear from what I have written. I quote myself:

gpuccio:

It should be clear that my methodology is not measuring the absolute FI present in a protein. It is only measuring the FI conserved up to humans, and specific to the vertebrate branch.

So, let’s say that protein A has 800 bits of human conserved sequence similarity (conserved for 400+ million years). My methodology affirms that those 800 bits are a good estimator of specific FI. But let’s say that the same protein A, in bees, has only 400 bits of sequence similarity with the human form. Does it mean that the bee protein has less FI?

Absolutely not. It probably just means that the bee protein has less vertebrate specific FI. But it can well have a lot of Hymenoptera specific FI. That can be verified by measuring the sequence similarity conserved in that branch for a few hundred million years, in that protein.

That should be clear enough, but still you insist about the danger of overestimating the change in FI, when I have never tried to do that.

Maybe you are confused by the fact that I speak of information jumps. But, you see, my term has always been “information jumps in human conserved sequence similarity”. It’s not a jump in the full content of FI, as I clearly explain in the above quote.

IOWs, when I say that CARD11 shows an information jump of 1250 bits at the transition to vertebrates, I simply mean that 1250 bits of new FI that is similar to the form observed today in humans appear at that transition. It is a jump, because new specific sequence information arises, that was not there before. But I have never said that the total FI was lower before. I simply don’t measure it, because my methodology cannot do that.

And this is it. You think as you like, but at least try to understand what I say and what I am doing. Or just don’t try, if you prefer so.
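A minimal sketch of what an "information jump in human conserved sequence similarity" amounts to arithmetically, assuming it is simply the difference between the human-conserved bits measured in two groups of organisms. The function name and both input values are hypothetical; they are chosen only so that the difference matches the 1250-bit figure quoted for CARD11, and they are not measurements.

```python
# Hypothetical human-conserved similarity (in bits) for one protein,
# measured against the human sequence in two groups of organisms.
bits_in_pre_vertebrates = 300      # made-up value
bits_in_early_vertebrates = 1550   # made-up value


def information_jump(bits_before: float, bits_after: float) -> float:
    """Jump in human-conserved similarity, NOT a change in total FI."""
    return bits_after - bits_before


jump = information_jump(bits_in_pre_vertebrates, bits_in_early_vertebrates)
print(jump)
```

The quantity tracks only similarity to the human form; it is silent about any FI in the earlier group that is not shared with humans.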

swamidass · August 27, 2019, 10:22pm

@gpuccio did you misread @glipsnort?

Overestimate is the opposite of underestimate. We are saying you are wildly overestimating FI, not underestimating it.

gpuccio · August 27, 2019, 10:26pm

OK, I will try to be simple and clear.

I am here to discuss a methodology that can, I believe, give important indications about a certain type of FI in proteins (the one that can be revealed by long sequence conservation), its appearance at certain evolutionary times, its different behaviour in different proteins. And that can give a good idea, by establishing a reliable lower threshold of new FI appearing at certain steps, of how big the functional content of many proteins is. These data are very interesting, in my opinion, to support a design inference in many cases. This is my purpose, and nothing else.

Now, it may be that a phylogenetic analysis could do that better. Or maybe not. I don’t know, and I certainly cannot perform a phylogenetic analysis now. I am not aware of phylogenetic analyses that are centered on the concept of FI as formulated in ID, least of all on design detection. So, I have my doubts.

However, I am not here to perform a phylogenetic analysis, I am here only to explain and defend my ideas, and I try to do exactly that.

So, again, I am convinced that my methodology is a good estimator of that part of FI which is connected to sequence conservation, for example from cartilaginous fish to humans, and that appears in vertebrates. The same procedure can also be applied to other contexts, of course.

I have received, from you and others, a few recurring criticisms that are simply not true or not pertinent. Here are a couple of examples:

Wrong. My estimate of FI is, rather, Total FI - functional divergence (FI not conserved up to humans). Therefore, as stated also by glipsnort, my estimate is underestimating FI, not overestimating it. Moreover, NE has nothing to do with this.

Why? And what do you mean by NE? Do you mean NV and ND? Why should that “be mistaken” as FI gain? By my procedure? There is absolutely no reason for that. Neutral variation is the cause of divergence in non functional sequences. Why should it be mistaken as FI gain by a procedure based on sequence conservation? I really don’t understand what you mean.

And so on. How can we discuss with such basic misunderstandings repeated so many times, and without any real explanation of what is meant?

gpuccio · August 27, 2019, 10:59pm

Did you misread @glipsnort?

He is saying that it is possible, even likely, to underestimate the total FI present (true) while overestimating the change in FI.

And have you read my answer to him? I quote myself:

gpuccio:

As for overestimating the change in FI, again I have never tried to estimate the absolute change in the full content of FI. That should be clear from what I have written. I quote myself:

gpuccio:

It should be clear that my methodology is not measuring the absolute FI present in a protein. It is only measuring the FI conserved up to humans, and specific to the vertebrate branch.

So, let’s say that protein A has 800 bits of human conserved sequence similarity (conserved for 400+ million years). My methodology affirms that those 800 bits are a good estimator of specific FI. But let’s say that the same protein A, in bees, has only 400 bits of sequence similarity with the human form. Does it mean that the bee protein has less FI?

Absolutely not. It probably just means that the bee protein has less vertebrate specific FI. But it can well have a lot of Hymenoptera specific FI. That can be verified by measuring the sequence similarity conserved in that branch for a few hundred million years, in that protein.

That should be clear enough, but still you insist about the danger of overestimating the change in FI, when I have never tried to do that.

Maybe you are confused by the fact that I speak of information jumps. But, you see, my term has always been “information jumps in human conserved sequence similarity”. It’s not a jump in the full content of FI, as I clearly explain in the above quote.

IOWs, when I say that CARD11 shows an information jump of 1250 bits at the transition to vertebrates, I simply mean that 1250 bits of new FI that is similar to the form observed today in humans appear at that transition. It is a jump, because new specific sequence information arises, that was not there before. But I have never said that the total FI was lower before. I simply don’t measure it, because my methodology cannot do that.

My procedure cannot overestimate FI, only underestimate it. My estimate of the change is only an estimate of the change (jump) in human conserved similarity. It is not, and never has been said to be, a measure of the change in total FI.

OK, I was answering to your very disappointing post about the starry sky, but for some strange reason I have lost all that I had written. Maybe it is better. Now I am tired. Tomorrow I will see if I really want to say those things.

gpuccio · August 27, 2019, 11:10pm

This is bad. Frankly, I would probably not even answer this kind of arguments, if they did not come from you.

I don’t know if you really believe that the stars in the sky exhibit high values of FI, or if you are only provoking (without any good reason, IMO).

If you really believe that, there is probably no purpose in continuing any discussion about FI.

If you are provoking, it’s not a good sign just the same.

However, here is a brief answer.

The simple rule I have described (and which is rather obvious in any possible serious discussion about FI) is that we cannot use the observed bits to define the function. The function must be defined independently from the knowledge of the observed bits.

So: “a configuration of stars that favors storytelling” is valid. But probably almost all possible configurations would do that.

While “a configuration of stars where the first has these celestial coordinates, the second these other ones”, and so on for all 9000 visible stars, is not valid.

So, a binary number of 100 digits is a good definition. And, of course, has no relevant FI.

A binary number that is 00110100… is not a valid definition. It can be used only as a pre-specification.

This is the rule, and I have never broken it. I have never defined a protein function that says: “a protein with the following sequence: …” I have always used for proteins the function described in Uniprot for the observed protein, or something like that. IOWs, a protein which can do this and that. Never: “a protein with this sequence”.

But you say: no, I want the stars that must have exactly the position that we know. That is breaking the rules. You are using the bits. I have never done that.

You say:

"It seems you are defining function by the already observed configuration of proteins in extant biology. "

Not at all. That statement is unfair, wrong and confounding.

I am always defining function as what a protein can do. I am using observed configurations, in a precisely described way and according to well explained assumptions, only to estimate FI in proteins, not to define function. You are equivocating, and rather badly.

You raise the problem of other sequences that could implement the function. But my methodology is aimed exactly at that: having an estimate of the target space. If you do the math, you will see that the estimates of the target space in my results are very big.

Of course, there is always the problem of possible alternative solutions, similarly complex, but completely different. Those cannot easily be anticipated. They certainly exist, in some measure.

That is a completely different problem. It has nothing to do with the definition of the function, but rather with the estimate.

I have discussed that problem in detail in the past. You will find a long discussion about that in this OP and in the following thread:

Defending Intelligent Design Theory: Why Targets Are Real Targets, Probabilities Real Probabilities, And The Texas Sharp Shooter Fallacy Does Not Apply At All.

Look at the part about clocks.

swamidass · August 27, 2019, 11:20pm

Okay, can you clarify how you implemented this rule in your analysis?
