Gpuccio: Functional Information Methodology

gpuccio · August 27, 2019, 11:25pm

In my discussion about the relationship between FI and the design inference (that has nothing to do with my methodology to estimate FI) I have given a clear example of FI in language and how to measure it. Please, refer to that. The Shakespeare sonnet. You will find many possible functional definitions for the sonnet, each of them implying different levels of FI. The bits in the sonnet (the sequence of letters) are of course never used in any definition.

In my procedure to estimate FI in proteins, the function is not defined (it is supposed to be the one described in Uniprot, if and when known). The estimate is based on conservation, which is an indicator of functional constraint, but does not tell us what the function is.

Those are two different things.

gpuccio · August 27, 2019, 11:28pm

I am going to sleep. Tomorrow we can go on.

gpuccio · August 28, 2019, 8:33am

Swamidass:

OK, now after some rest, let me go back to the starry sky example. I will show how it should be treated in terms of FI.

We have a system where 9000 stars can have an independent position in the sky. Also, they can have different brightness.

We define a function: that the stars can help orientation and navigation.

Let’s assume that the position of the stars is a random configuration. We have a system with a very big search space. A lot of possible configurations, considering both position and brightness.

The right question, from the point of view of FI, is: how many of the possible configurations would satisfy the independently defined function? How big is the target space? Is it an infinitesimal fraction of the search space (high FI), or rather a big part of it?

The answer here, while difficult to compute in detail, is easy enough in principle: the target space is almost as big as the search space. Therefore, FI is really trivial, almost zero.

Why?

Because, of course, almost all the possible random configurations of position and brightness can help orientation and navigation.

Not all of them, however.

Very ordered configurations, those where all the stars are more or less equally distributed in the sky, and brightness is more or less the same for all of them, would not help orientation and navigation. We would just see a sky that is the same everywhere, and does not allow us to get information about earth rotation and our position.

But of course, those highly ordered configurations are really, really rare in the search space.

This is an interesting example, and I thank you for providing it, because it is a case where order does not satisfy the function, while randomness does. Unfortunately (for your argument), the FI linked to such a system is almost zero.
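
The starry-sky reasoning above can be put in numbers with a small sketch. It assumes the ratio definition of FI used in this thread, FI = -log2(target space / search space). The function name and both example fractions are illustrative assumptions, not measured values for any real sky.

```python
import math

def functional_information(target_fraction: float) -> float:
    """FI in bits, given the fraction of the search space that satisfies the function."""
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    return -math.log2(target_fraction)

# Hypothetical fractions, chosen only to illustrate the two regimes.
almost_everything_works = 0.999      # nearly any random sky helps navigation
almost_nothing_works = 2.0 ** -500   # the target is a tiny part of the space

print(functional_information(almost_everything_works))  # a small fraction of one bit
print(functional_information(almost_nothing_works))     # 500 bits
```

When the target space is almost the whole search space, the result is close to zero bits, which is the "FI is really trivial" case described in the post.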

I think this is a good answer for your other examples too, but if you believe that they have different merits, please explain why.

swamidass · August 28, 2019, 1:17pm

Yes, there is a distinction between macrostates and microstates of a system.

I actually agree with you in much of this analysis, but you are missing the point. I’m applying the rules that you laid out, and it leads to this problem.

You are making a parallel mistake in your analysis of proteins. I made the parallel mistake to make this point.

In an honest attempt to enable understanding, meaning no disrespect, can you see where I’m drawing the parallel? You don’t have to ultimately agree with me that the parallel is valid. It would be helpful though if you could articulate the parallels in your analysis. Perhaps if we got there, we could have a sensible conversation about the ultimate validity of this object lesson.

glipsnort · August 28, 2019, 3:54pm

Your comments here make sense to me for some definitions of FI, but I don’t see how they work in the context of your stated definition. If FI is purely a measure of the ratio of target space to search space, what does “specific sequence information” mean? Or 1250 bits of new FI? The function of the protein is unchanged, as are the target and search spaces. How do you decompose the ratio into old and new FI?

swamidass · August 28, 2019, 3:55pm

I’m pretty sure he is using an in clade control to do this. The in clade minus the out clade gives him the change.

Of course there are issues with this, but I think that is his strategy.

gpuccio · August 28, 2019, 4:00pm

Art:

Now I come to your tornado. It is an interesting example, because, even if I am no expert of meteorology, it is probably one of those cases where some form of order comes out of a system including events that can be easily explained in form of necessity laws, random events and chaotic components. I think there are many examples like that. None of them is designed, of course.

But do they exhibit high FI? The answer is no.

I will try to explain how the system should be considered in terms of FI, even if I am not a meteorologist. Of course, you are free to stick to your analysis in terms of water molecules, but I cannot agree.

Well, here the system is our planet, and its meteorologic phenomena. I think we can agree on that.

What type of system is it? IOWs how can we describe and analyze meteorologic phenomena?

I think we can agree that many of those events follow, more or less precisely, some well known laws, derived of course from the laws of physics applied to this particular system. That’s why many events can be more or less anticipated. Weather forecasts are everywhere, and I would say that at present they are often rather good.

So, that part is a necessity system, more or less precise. No FI in that.

But of course, not all can be anticipated. Least of all, I think, tornadoes. Probably, necessity laws act on random components and chaotic components to generate a tornado. I don’t know, I am not a meteorologist.

So, being not a meteorologist, I paste here some explanation taken somewhere on the web, hoping it is not so bad:

What causes tornadoes?
Tornadoes form in unusually violent thunderstorms when there is sufficient (1) instability and (2) wind shear present in the lower atmosphere.

Instability refers to unusually warm and humid conditions in the lower atmosphere, and possibly cooler than usual conditions in the upper atmosphere. Wind shear in this case refers to the wind direction changing, and the wind speed increasing, with height. An example would be a southerly wind of 15 mph at the surface, changing to a southwesterly or westerly wind of 50 mph at 5,000 feet altitude.

This kind of wind shear and instability usually exists only ahead of a cold front and low pressure system. The intense spinning of a tornado is partly the result of the updrafts and downdrafts in the thunderstorm (caused by the unstable air) interacting with the wind shear, resulting in a tilting of the wind shear to form an upright tornado vortex. Helping the process along, cyclonically flowing air around the cyclone, already slowly spinning in a counter-clockwise direction (in the Northern Hemisphere), converges inward toward the thunderstorm, causing it to spin faster. This is the same process that causes an ice skater to spin faster when she pulls her arms in toward her body.

Other processes can enhance the chances for tornado formation. For instance, dry air in the middle atmosphere can be rapidly cooled by rain in the thunderstorm, strengthening the downdrafts that assist in tornado formation. Notice that in virtually every picture you see of a tornado the tornado has formed on the boundary between dark clouds (the storm updraft region) and bright clouds (the storm downdraft region), evidence of the importance of updrafts and downdrafts to tornado formation.

Also, an isolated strong thunderstorm just ahead of a squall line that then merges with the squall line often becomes tornadic; isolated storms are more likely to form tornadoes than squall lines, since an isolated storm can form a more symmetric flow pattern around it, and the isolated storm also has less competition for the unstable air which fuels the storm than if it were part of a solid line (squall line) of storms.

Because both instability and wind shear are necessary for tornado formation, sometimes weak tornadoes can occur when the wind shear conditions are strong, but the atmosphere is not very unstable. For instance, this sometimes happens in California in the winter when a strong low pressure system comes ashore. Similarly, weak tornadoes can occur when the airmass is very unstable, but has little wind shear. For instance, Florida – which reports more tornadoes than any other state in the U.S. – has many weaker tornadoes of this variety. Of course, the most violent tornadoes occur when both strong instability and strong wind shear are present, which in the U.S. occurs in the middle part of the country during the spring, and to a lesser extent during fall.

Contrary to popular opinion, tornadoes have not increased in recent years.

OK, so I would say that there is some good understanding of the conditions that generate a tornado. In general, we can say that some specific configurations of the basic components of the weather (distribution of winds, temperatures, pressures, and so on) allow tornadoes to be generated. So, there is nothing mysterious in the process. It is well understood, even if its mathematical and empirical treatment is certainly difficult. Some configurations of weather conditions lead to tornadoes.

It’s those configurations that we must consider, not the configurations of individual molecules of water. The basic components of weather follow, more or less, precise necessity laws, and of course molecules of water follow those laws too. The random-stochastic component, the only one that generates specific configurations that can act as “configurable switches”, is caused by the complex interaction of those necessity laws.

So, the correct question in terms of FI is: how many weather configurations (target space) lead to a tornado, in the search space of all possible weather configurations?

I am certainly the last person that can solve such a problem quantitatively, but I believe that it can be solved in principle. There is nothing mysterious here.

Now, tornadoes are not too common (luckily), but they are certainly not exceedingly rare. I suppose therefore that, analyzing the space of configurations above mentioned, it will not be difficult to show that the configurations that lead to a tornado are not exceedingly rare. I cannot make that kind of analysis, unfortunately. Can you?

So, I am rather confident that tornadoes, like many manifestations of necessity acting on random and chaotic components, are certainly fascinating, but can be perfectly explained in terms of the physics of this system. And the configurations that lead to them are a non-trivial part of the space of configurations.

IOWs, FI is low, and there is absolutely no reason to infer design.

gpuccio · August 28, 2019, 4:04pm

It seems that I don’t understand the point. Please, don’t be too confident in my intelligence. Could you please explain better what you think?

sfmatheson · August 28, 2019, 4:09pm

My understanding of your writings here is that you have calculated/estimated a probability (referring to “probabilistic barriers” and “probabilistic resources of the known universe”) that a particular outcome could come about. That is the probability I was referring to. Until we know more about that outcome, most importantly whether it could have gone in other ways, the probability of that particular outcome is meaningless. This is a tired old topic in discussions of design and I know you are aware of it. But to reiterate, the information we need is:

How many potential outcomes were there? In the case of a protein, this would include all the different ways to build that protein, all the different ways to make any protein that would carry out that function, and all the different ways to achieve the goal of that function.

[Side note: you may notice that I am using design-ish language here (“to make”, “to achieve”) because I don’t think there is anything wrong or unscientific about talking design-ishly.]

By what process are we pursuing these outcomes? Any probability calculation that assumes a single-step flying-together of amino acids is, at this point, flat-out dishonest. So, while calculating probability, are we considering the fact that an evolutionary “walk” is almost as far from a random flying-together as one might get? Whenever I hear “probabilistic resources of the universe” I find those kinds of bogus “calculations.” If your metric gives us that kind of information, then it’s beyond worthless. My sense is that the metric tells us about conservation of protein structure, and nothing more than that, and that it is therefore too incomplete to do anything interesting.

Without the information above, your metric is a fancy version of the sharpshooter fallacy. More precisely, the metric is just a measure of an outcome, but it’s not a measure of how that outcome could have come about, and it is not even the beginning of a measure of how likely that was.

This has been explained repeatedly to you. The context is a negative control, which is a basic scientific concept. I see that in the 2 days since you posted this, others have attempted to get you to see that you have not shown that you are not dramatically overestimating “FI”.

Here is my final response to you. What you are getting on this forum and in these threads is an extraordinary opportunity that is, for good reasons, rare. You are getting peer review of your ideas, by some of the world’s top experts, without even submitting a manuscript. Moreover, you are doing this as an anonymous commenter, while your reviewers are identified by name and affiliation, in public. To your credit, you have provided (for the most part) clear explanation of your methods and your raw data. But so far* you seem to have failed to take any of the scientific criticism to heart. You have been given a chance to have your ideas vetted, but you are not participating in the review process. To call this a missed opportunity is to seriously understate the nature of this conversation.

*I have read only some of the conversation over the last day or so. Perhaps you have responded positively to critiques and are now working on revisions. This, of course, is the essence of the peer review process.

No need to respond unless you have new analysis or substantive responses to the specific critiques that we have provided you. It should be obvious that the metric itself is not making a favorable impression on the people who would know if it were valuable.

swamidass · August 28, 2019, 4:30pm

I agree with @sfmatheson on this.

The way you are defining and measuring FI in proteins is closely analogous to my definition of the storytelling function of stars.

Your objection, which I believe is correct, is that my definition of function is too tightly linked to the microstate.

Is this a valid objection? Maybe and maybe not. It depends on whether it really is valid to set the microstate as a goal, and whether the details of the appearance of the Big Dipper or the North Star (for example) are important for me. For some questions, these are certainly valid features to expect. For others, maybe not. However, your point is that a function defined too closely on the granular microstate (specific stories and constellations) will give a wholly different answer than a function linked to the macrostate (any stories and any constellations).

Similarly, your analysis, whether you intend it or not, is too linked to the microstate to make a case for design. We have all been explaining this to you in different ways. There is no reason we need to explain the precise (or approximate) microstate of protein sequences, because we have good reason to think there is a very, very large number of potential microstate configurations that would produce the same high-level functions. You cannot use the analysis done on the microstate to justify claims about FI in the macrostate.
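
A toy sketch may help show how strongly the FI value depends on whether the function is tied to a microstate or to a macrostate. Everything here is an illustrative assumption: the "sequences" are 20-bit strings, the microstate function accepts one exact string, and the macrostate function accepts any string with at least 10 ones. It says nothing about real proteins.

```python
import math

N_BITS = 20  # toy sequence length; the search space is all 2**20 bit strings

def fi_bits(target_count: int, search_count: int) -> float:
    """FI = -log2(target space / search space)."""
    return -math.log2(target_count / search_count)

search_space = 2 ** N_BITS

# Function tied to the microstate: only one exact string counts.
micro_target = 1

# Function tied to a macrostate: any string with at least 10 ones counts.
macro_target = sum(math.comb(N_BITS, k) for k in range(10, N_BITS + 1))

fi_micro = fi_bits(micro_target, search_space)
fi_macro = fi_bits(macro_target, search_space)
print(f"microstate definition: {fi_micro:.2f} bits")
print(f"macrostate definition: {fi_macro:.2f} bits")
```

The same search space gives 20 bits under the narrow definition and less than one bit under the broad one, which is the gap between the two kinds of claim discussed above.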

Now, it is possible that there are claims you could justify from your analysis if it were more precisely bounded. For example, most of us would agree (barring God’s Providence and complete determinism) that the specific proteins we see would be totally different if we were to “rewind the tape” and replay life’s history. Maybe vertebrates would not even evolve! If that is your claim, we already agree with you.

However you are arguing for design from this. It does not follow.

Just to list out other issues:

You still have not given a valid methodology to separate out the contribution of neutral evolution (NE) to your FI measure.
You still need to get a time denominator, an estimate of the time during which the vertebrate transition unfolds, to compute the rate (and correct for neutral evolution).
You still need to demonstrate that the vertebrate transition could not have taken place by accruing FI in a gradual process over time.
We have left hanging the explication of several negative controls. For example, cancer could be really helpful if we can get on the same page here. We see FI gains greater than 500 bits in cancer evolution. I’ve already linked to the analysis.

I wonder if it would help for me to write out your methodology for you to review, just to be sure I did not miss anything salient. Would that help you?

gpuccio · August 28, 2019, 4:39pm

Please, look also at my comment #121 to Swamidass.

The point is: one thing is the definition of functional information, another thing is my strategy (or anyone else’s) to get an indirect estimate of it.

The definition is what it is. A direct method to measure the target space (the thing we really don’t know) would be to synthesize all possible sequences in the search space, and test each one in the lab for the defined function. Of course that is not possible, and never will be.
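
To give a sense of why exhaustive testing is out of reach, here is a minimal sketch of the size of the search space. It assumes the standard 20-letter amino acid alphabet and counts every sequence of a fixed length; the lengths in the loop are arbitrary examples and the function name is hypothetical.

```python
import math

AMINO_ACIDS = 20  # standard amino acid alphabet

def search_space_bits(length: int) -> float:
    """log2 of the number of possible sequences of the given length."""
    return length * math.log2(AMINO_ACIDS)

# Arbitrary example lengths, in amino acids.
for length in (50, 150, 500):
    print(length, round(search_space_bits(length), 1))
```

Each added residue contributes about 4.3 bits, so even short proteins correspond to search spaces of hundreds of bits.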

Another semi-direct way would be to have such a good understanding of the sequence - function relationship in proteins as to be able to compute the target space. That is promising, but I believe that we are still very far away from that.

So, we have to use indirect estimates of FI. My procedure is exactly that.

Being based on long conservation of sequence, the interesting thing is that we estimate FI without any explicit reference to the function.

IOWs, we have a protein that certainly has some function in its context. We trace the appearance of some new sequence specificity at some definite evolutionary step, and we classify that new sequence specificity as highly functionally constrained, because we observe that, after its first appearance, it is conserved for 400+ million years.

So, we have an indirect estimate of the FI in that specific sequence in that specific context, even if we have not defined the function explicitly. Of course, we can have a good idea of what the function is just by looking up the protein in Uniprot. At least for many proteins.

Now, to answer your questions. Let’s say we have a protein which has low sequence similarity with the human form in pre-vertebrates.

There are different possibilities. For example, in the case of CARD11, as you can see in the graph I have posted, the protein probably did not exist in pre-vertebrates. The bitscore is extremely low, probably just background noise, or just some limited domain similarity in some part of the long molecule. That is perfectly consistent with the protein being involved in the immune system, which as we know appears in jawed fishes. So, this is probably a new vertebrate protein.

As such, the explanation is rather straightforward. The protein appears in cartilaginous fishes, and right from the beginning of its existence it has already more than half of its potential FI (about 1.3 baa).

The remaining history of the protein in vertebrates, as can be seen in the graph, is not very interesting. There seems to be another minor adjustment in reptiles. Maybe, but it is difficult to be sure. The rest is compatible with passive conservation, increasing as the evolutionary distance decreases.

So, the history of this protein is clear enough: it exhibits a major engineering (be patient, let’s say it could be by design or by neo-darwinist mechanisms, for the moment) at the beginning of its existence, and then not much happens. Except maybe at the transition to reptiles.

Now, let’s say that we have, instead, a protein that already exists in pre-vertebrates. Let’s call it protein A. Let’s say that it has a value of human conserved sequence similarity, in pre-vertebrates, of 0.7 baa. That is already something. Let’s say the protein is 500 AA long. We have already a major similarity here, and reasonably a bitscore of a few hundred bits.

Now, the same protein jumps in cartilaginous fishes to 1.5 baa, presenting a jump of 0.8 baa. So, a few hundred bits of new human conserved sequence similarity. Of new FI.
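
The arithmetic of the protein A example can be sketched as follows. It assumes that baa means bits per amino acid, that is, the alignment bitscore divided by the protein length, which is how the numbers in the post appear to be used. The helper name is hypothetical.

```python
def bits_from_baa(baa_value: float, length: int) -> float:
    """Convert bits per amino acid (baa) into total bits for a protein of this length."""
    return baa_value * length

LENGTH = 500                   # protein A, from the example
PRE_VERTEBRATE_BAA = 0.7       # human conserved similarity in pre-vertebrates
CARTILAGINOUS_FISH_BAA = 1.5   # human conserved similarity in cartilaginous fishes

before = bits_from_baa(PRE_VERTEBRATE_BAA, LENGTH)
after = bits_from_baa(CARTILAGINOUS_FISH_BAA, LENGTH)
jump_bits = after - before

print(f"before: {before:.0f} bits, after: {after:.0f} bits, jump: {jump_bits:.0f} bits")
```

With these inputs the jump of 0.8 baa corresponds to about 400 bits, which matches the "few hundred bits" in the text.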

What does it mean?

We already know that we are not measuring total FI, nor total FI change. The protein in pre-vertebrates could well have higher total FI, or less, or the same. We don’t know, because my methodology cannot detect FI which is not linked to long sequence conservation. IOWs, what I have called functional divergence.

So, we just stick to the new FI that appears at the transition and is then conserved: those 0.8 baa. What is their meaning?

The reasonable answer is that the protein function undergoes a major adaptation at the transition to vertebrates. Maybe the basic function, the basic structure, remain the same. Maybe the total FI remains the same. As said, that we don’t know.

But the appearance of such a big new component of FI in vertebrates means that now the protein does the same things in a different context, or that it does some new things. That is perfectly compatible with what we know about the big changes that happen in new classes of organisms, especially at regulation level, and in protein networks that control transcription or other major pathways. TFs, for example, as already discussed, often retain their DBDs, but change the rest of their sequences. And acquire new functions, or differently tailored functions.

I hope this answers your questions.

swamidass · August 28, 2019, 4:42pm

It is likely most of us agree with this, though I’m not sure you’ve demonstrated this rigorously. We even linked to an article doing a similar analysis for the Cambrian explosion.

How exactly do you get to “design” from here?

The magnitude of the jump in FI is not a problem.

gpuccio · August 28, 2019, 5:12pm

OK.

I certainly am.

There are two different aspects.

The probabilistic resources of our biological world can definitely be computed, at least as a generous upper threshold. That is what I have done in my OP about that issue:

What Are The Limits Of Random Variation? A Simple Evaluation Of The Probabilistic Resources Of Our Biological World

You will see in the first Table there that I give a very generous estimate of 140 bits as the limit (for bacteria). That means that, at a very generous most, only 2^140 different states could have been reached and tested in 5 billion years on our planet.

Then there is the problem of evaluating the target space. That is more difficult, and there are objective problems. Look also at my comment #133.

I am very confident that my procedure is very good to estimate a specific form of FI, linked to long evolutionary conservation of sequence. Of course, we must address some potential difficulties, and you list some of them: alternative solutions, and so on.

Indeed, I have discussed those things many times. Of course, I cannot address everything here and all at the same time. I have given links, but probably nobody here has the time to check them. IOWs, I am human, and you are too.

But I cannot understand why, every time there is some aspect that I have not yet discussed, you all draw final conclusions on what I think or am doing. That is wrong. I am here to answer your questions, when they are good questions.

I give you a link here of an OP where I have discussed many of the things you mention:

Defending Intelligent Design Theory: Why Targets Are Real Targets, Probabilities Real Probabilities, And The Texas Sharp Shooter Fallacy Does Not Apply At All.

This is one of the things I discuss in that OP. Possible alternative solutions. In brief, if there were many other complex alternative solutions (and probably there are), the computation of FI, at the levels we are considering, would change very little. See the section about clocks in the mentioned OP.

If there were much simpler solutions, we would definitely observe those ones, and not the complex solution we observe.

That’s perfectly fine.

No, here you don’t understand my point. I have never said that there is a single step transition. Sometimes I really think many of you believe I am a complete fool. Maybe, maybe not. Not in this case.

Of course the transition happens in many steps. That’s why I have clearly said that the best model is a random walk from some unrelated sequence state.

And the concept of probabilistic resources has exactly that meaning: how many attempts can the system make? How many steps are allowed in the random walk?

The probability of finding an unrelated state by a random walk, let’s say by 2^100 steps, is practically the same as the probability of finding that same target by a random search in the same number of attempts. The two systems differ in the initial steps, of course, but with a big number of steps there is no great difference. It is just related to the ratio between target space and search space, and to the number of attempts/steps allowed to the system.

And of course, one thing should be clear. All probabilistic evaluations refer only to the RV part of the neo-darwinist mechanism, which includes neutral drift. The NS part must be evaluated separately and differently. And I have not even begun to do that here.
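
The comparison between FI and probabilistic resources described above can be sketched numerically. This is only the random-search side of the argument: it assumes independent uniform draws, with no selection and no walk structure, so it does not address the objections raised elsewhere in the thread. The 140-bit figure comes from the post; the 500-bit and 100-bit FI values and the function name are illustrative assumptions.

```python
import math

def hit_probability(fi_bits: float, resource_bits: float) -> float:
    """Chance of at least one hit in 2**resource_bits independent uniform draws
    when the target is a fraction 2**-fi_bits of the search space."""
    p_single = 2.0 ** -fi_bits
    attempts = 2.0 ** resource_bits
    # 1 - (1 - p)**N, computed stably for tiny p and huge N.
    return -math.expm1(attempts * math.log1p(-p_single))

RESOURCE_BITS = 140  # upper threshold quoted in the post (for bacteria)

for fi in (100, 500):
    p = hit_probability(fi, RESOURCE_BITS)
    print(f"FI = {fi} bits: P(at least one hit) = {p:.3g}")
```

When FI is well above the resource bits, the probability is roughly 2^(resources - FI); when it is well below, the probability saturates at 1.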

Finally, my contribution here is not aimed to publish a paper. So, I do not consider it as some form of peer review.

It is, instead, aimed at intellectual confrontation about a very important paradigm difference: design against neo-darwinism to explain biological functions.

In that sense, it is much more precious than a peer review to me. But for very different reasons.

swamidass · August 28, 2019, 5:22pm

@gpuccio do you see how your argument is grounded in microstates, not macrostates? Your objection to me about the star constellations applies here.

Art · August 28, 2019, 5:58pm

But what I do with water molecules is what you do with amino acids when you estimate FI in proteins, and moreover exactly what you do when you estimate the probabilistic resources in the biosphere (as in here). There is no conceptual difference.

@gpuccio, you cannot have it both ways. Either you are wrong with your approaches to assessing FI in biological systems, or my argument about tornadoes is correct.

@gpuccio, it looks to me as if you are asserting that, by definition, natural events cannot generate FI. This is what we call massively circular reasoning, and it dooms all of your claims about FI in living things.

No, I have shown precisely how the configuration space for tornadoes is immensely, microscopically small. The fact that you choose to deny the utility of the very same approaches you use to think about FI in living things doesn’t change this.

@gpuccio, I am not going to belabor your evasiveness here. Hopefully, even though you cannot bring yourself to admit as much on this board (and most definitely not at UD), you should have questions in your own mind about the true utility of your approach to FI. As we explore other of your claims and assertions, we will see more inconsistencies that will help us better understand the flaws in your arguments. Hopefully, at the end of all this, you will have learned something and maybe, just maybe, you will be able to modify and improve your methodology.

Art · August 28, 2019, 6:00pm

But, @gpuccio, you have just denied that your very same approach and calculation can be used to estimate FI in other examples. Why should we accept your claims vis-a-vis the linked essay, when you arbitrarily decide when the method does and does not apply?

sfmatheson · August 28, 2019, 7:32pm

This is by far the most common error made by design apologists, and they frequently don’t see it. I don’t have time to examine all of your writings, so perhaps your approach somehow avoids this pitfall. My purpose in writing about it was not to assert that you claim a “single-step transition” but to warn you that ID people who cook up gigantic probabilities are making one of a small number of silly mistakes. One is to assume that the “target” is THE TARGET, and thus to commit the sharpshooter fallacy. And another is to ignore how the walk of evolution actually happens, with selection and constraint amid a constantly changing landscape of fitness and function. This error leads people to make laughably irrelevant mathematical claims that all amount to a probability that is conceptually identical to one-time flying-together of amino acids. If you did not make this error, then you should be able to explain how selection and constraint affect protein evolution, and you should be able to comment on how your FI ideas contribute to what we already know.

Does this mean that you take “random walk” to have no component other than randomness? I remain unconvinced that your methods have any meaning for evolution.

You are mistakenly equating peer review with publication.

swamidass · August 28, 2019, 7:34pm

Let’s be clear on this @art. I’m not sure @gpuccio is really being evasive. I don’t think he has had it laid out to him before. He has not had constructive resistance, for the most part, to his work at UD. I think he is genuinely working this out with us now, not being evasive.

This is a critical point.

Our objection is that this method will detect massive amounts of FI everywhere and has poor false-positive control if we are seeking design. We are demonstrating this with negative controls, which fail on this approach, and these negative controls are being presented both inside and outside biology.

The objections to these controls are making our point. It looks (to me at least) like the objections apply equally well to the original argument by @gpuccio. Until we have a clear, principled approach to these objections, carefully working out why they do not apply to the FI gains at the vertebrate transition, we have a failure to launch.

This is an entirely separate set of objections than those arising about the validity of the FI calculations being proposed. Both sets of objections need to be resolved.

Art · August 28, 2019, 7:40pm

Agreed. I just expect that @gpuccio will not acknowledge that my post posed some problems, and that the inconsistencies that are apparent are genuine issues. I want to state up front that I won’t be re-hashing the same points over and over.

If there are constructive comments or questions, of course this discussion can continue. Sorry for implying otherwise.

Art · August 28, 2019, 8:01pm

Pardon the self-reply, but to say things a bit more diplomatically, I am OK if, going forward, @gpuccio and I agree to disagree about my example. My point has been made.
