Gpuccio: Functional Information Methodology

glipsnort · August 27, 2019, 1:29pm

Hey, I thought it was what I wrote before – but I already knew what I meant, of course.

glipsnort · August 27, 2019, 1:52pm

As I pointed out, under some circumstances your method will grossly overestimate FI as you have defined it. You would at least have an argument if you redefined FI to be the information needed to keep a particular gene (say) functional.

Even if you adopt a more restricted (and intuitive) definition of FI, however, your procedure still can’t tell you about increases in FI in a lineage. All you’re doing is looking at sequence conservation of various species with respect to human, right? You observe that conservation is almost always lower as you look across longer branches. But conservation is only a lower bound on the FI of the sequence; a changing lower bound tells you precisely nothing about changes to the quantity itself. It’s not mathematically possible to draw that kind of conclusion.

In fact, we would expect to see conservation decreasing with increasing branch length even if the function and the FI of a gene haven’t changed at all. Functional sequence can change without changing function, but it does so more slowly (sometimes much more slowly) than nonfunctional sequence changes, since the mutational avenues that preserve function are much more restricted (and may only become accessible with a different genetic background or in a different environment).

I’ll offer the same example I gave in the other thread: the human immune system. Your body contains DNA with more than 500 bits of FI, coding for hundreds of specific antibodies that are highly functional, each precisely tuned to a protein on a particular pathogen or pathogen strain. You were not born with DNA that had that information in it; it was generated by a process of random mutation and selection.

gpuccio · August 27, 2019, 2:40pm

Yes, you have. Definitely.

What additional information? I can get no sense from your discourses. You are certainly in good faith, and you also admit that you are not an expert, if I understand well. Maybe that’s why your objections are not clear at all. I say that with no bad intentions, but because I really don’t understand what you mean.

So please, explain what is the missing information without which probability would be meaningless here. But please, clarify of what probability you are speaking.

The bitscore is linked to a probability in the Blast algorithm itself. It is given as an E value, and for values small enough it is practically the same thing as a p value. It expresses the expected number of similar homologies in a similar search in the database if the two sequences were unrelated. For many Blast results showing high similarity, that value is given as 0.
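As a rough illustration of that link (a minimal sketch, not the exact computation Blast performs): under the standard Karlin-Altschul statistics, the expected number of chance hits is approximately E = m · n · 2^(-S), where S is the bitscore, m the query length and n the database length. The function names and the lengths used below are hypothetical, and the edge-effect corrections applied by real Blast searches are ignored.

```python
import math

def log10_e_value(bit_score, query_len, db_len):
    """log10 of the Karlin-Altschul estimate E = m * n * 2**(-S).

    Working in log space avoids floating-point underflow for large bitscores.
    """
    return math.log10(query_len) + math.log10(db_len) - bit_score * math.log10(2)

def p_value(e_value):
    """P(at least one chance hit) = 1 - exp(-E); nearly equal to E when E is small."""
    return -math.expm1(-e_value)

# Hypothetical search: a 1000-residue query against 10**9 database residues.
for score in (50, 500, 1250):
    print(score, round(log10_e_value(score, 1000, 10**9), 1))
```

For bitscores in the hundreds the exponent is far below anything a search report displays, which is consistent with such hits being listed with an E value of 0.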

The probability which I mention in ID theory is a different thing. It is the probability linked to the FI value, and expresses the probability to find the target in a random search or walk, in one attempt.

I think these concepts are rather precise. What is the additional information you need?

That’s simply a meaningless statement. What do you mean?

The point about neutral evolution is that it happens. What do you mean?

I have not made a phylogenetic analysis, nor can I see any reason why I should do that. I have used common phylogenetic knowledge, at the best of my understanding, to make very simple assumptions: that vertebrates derive from some common chordate ancestor, that cartilaginous fishes split from bony fishes rather early in the natural history of vertebrates, that humans derive from bony fishes and not from cartilaginous fishes. Am I wrong? The times I have given in my graphs are only a gross approximation. They are not important in themselves.

You are free to think as you like. But I have not analyzed a few hand picked trajectories.

Now that I have access to my database, I can give you more precise data (but you could find them in my linked OPs).

For example, I have evaluated the information jump at the vertebrate transition (IOWs the human conserved similarity in cartilaginous fishes minus the human conserved similarity in pre-vertebrates) for all human proteins, and using all protein sequences of non vertebrate deuterostomes and chordates and of cartilaginous fishes in the NCBI database. Here are the results, expressed both as absolute bitscore difference, and as bits per aminoacid (baa):

Absolute difference in bitscore:

Mean = 189.58 bits
SD = 355.6 bits
Median = 99 bits

Difference in baa:

Mean = 0.28796 baa
SD = 0.3153 baa
Median = 0.264275 baa

As you can see from the values of the medians, half of human proteins have an information difference at the vertebrate transition that is lower than 99 bits and 0.26 baa.
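For readers who want the definition spelled out, here is a minimal sketch of the jump computation as described above. The three proteins and their bitscores are made up for illustration, and dividing by the human protein length to obtain baa is an assumption about how that figure is derived.

```python
from statistics import median

def information_jump(bits_cartilaginous, bits_prevertebrate, protein_length):
    """Return (absolute jump in bits, jump in bits per amino acid).

    bits_cartilaginous: best bitscore of the human protein against cartilaginous fishes
    bits_prevertebrate: best bitscore against non-vertebrate deuterostomes/chordates
    protein_length:     length of the human protein (assumed denominator for baa)
    """
    jump = bits_cartilaginous - bits_prevertebrate
    return jump, jump / protein_length

# Hypothetical values only; these are not taken from the database discussed here.
proteins = {
    "protA": (900.0, 250.0, 1000),
    "protB": (310.0, 290.0, 400),
    "protC": (150.0, 60.0, 300),
}

jumps = {name: information_jump(*values) for name, values in proteins.items()}
median_jump = median(bits for bits, _ in jumps.values())
print(jumps, median_jump)
```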

Is that a good negative control, if compared to the 1250 bits of CARD11? Remember, these are logarithmic values!

75th centile is 246 bits and 0.47 baa. That means that 25% of human proteins have values higher than that. And I could show you that values are very significantly higher in proteins involved in the immune system and in brain maturation.

I don’t know if that means anything for you. However, for me these are very interesting data about FI in evolutionary history.

gpuccio · August 27, 2019, 2:43pm

Are you saying that the human immune system is a non biological object? Interesting.

You have some serious misconceptions about the immune system, but just now I have other things to do. If you have one single counter-example of a non biological object that exhibits more than 500 bits of FI and is not a designed human artifact, please provide it. That was the request.

swamidass · August 27, 2019, 2:49pm

@gpuccio I don’t want to interrupt your flow before you fully respond, but this is an important point to clarify. There are several false equivalences in this statement.

First, there is a difference between FI estimates by your procedure and the true FI.

Second, “one attempt” and “random walk” are not the same thing and will produce very different probabilities of finding a feature with a given FI.

Third, neither “one attempt” nor “random walk” is a reasonable model for evolution, which demonstrably can find features with higher FI than a purely random attempt or walk.

For your argument to work, as merely a starting point, you have to demonstrate the FI you compute is a reasonable approximation of the true FI, not confused by neutral evolution and correctly calling negative controls. You also have to use a better model of evolution than random trials or walks.

At the moment however your case relies on a series of false equivalences, or so it seems.

No, he is presenting another negative control from biology.

Art · August 27, 2019, 2:50pm

gpuccio:

Why not?

I have considered the probabilistic resources of our planet as a higher threshold of the number of possible states visited by a super population of bacteria inhabiting our planet for 5 billion years and reproducing at a very high rate.

This is of course an exaggeration, and a big one, but the idea is correct, I believe. The probabilistic resources of a system are the number of states that can be randomly reached. It is similar to the number of times that I can toss a coin. They can be expressed as bits, just taking the positive log2 of the total number of states.

So, if I have a sequence that has a FI of 500 bits, it means that there is a probability of 1:2^500 to get it in one random attempt. If my system has probabilistic resources of 120 bits (IOWs, 2^120 states can be reached), the probability of reaching the target using the whole probabilistic resources is still 1:2^380.
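The arithmetic in this paragraph can be checked exactly with rational numbers. This is a minimal sketch that assumes the "independent uniform attempts" model the paragraph describes, and nothing more; the function name is hypothetical. With N attempts that each succeed with probability p, the chance of at least one success is at most N · p.

```python
from fractions import Fraction

def bound_on_success(fi_bits, resource_bits):
    """Upper bound N * p on hitting the target at least once.

    p = 2**-fi_bits is the single-attempt probability implied by the FI value,
    N = 2**resource_bits is the number of states the system can visit.
    The bound is capped at 1, since it is a probability.
    """
    p_single = Fraction(1, 2 ** fi_bits)
    attempts = 2 ** resource_bits
    return min(Fraction(1), attempts * p_single)

print(bound_on_success(500, 120) == Fraction(1, 2 ** 380))
```

In bits this is simply 500 - 120 = 380. Whether independent uniform attempts are the right model is a separate question.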

What’s wrong with that?

Of course, as I have said, the Blast bitscore is not the FI. But, provided that the conditions I have listed are satisfied, it is a good estimator of it. Look also at my answer to glipsnort, that I have just published.

Please, let me know what you think. Thank you.

Thanks for this explanation, @gpuccio. I still do not agree with your very cavalier approach towards informational metrics (your units do not seem to match between the different approaches, which just doesn’t sit well with me), but I can see what you are trying to do. Others have raised more serious concerns, and I am happy to let this matter slide for the time being.

swamidass · August 27, 2019, 2:54pm

One example is the configuration of stars in the sky. Far more than 500 bits.

Another example is the location of islands in the sea.

Another example is the weather patterns across the globe for the last century.

And yes, all these objects can be defined by a functional specification. This is all functional information.

Art · August 27, 2019, 2:54pm

For what it’s worth, here is how I visualize this contrast. The “one attempt” approach yields a value that is a sort of state function (like entropy in thermodynamics). The “random walk” approach yields values that are pathway dependent, and thus not state functions (but rather akin to work in thermodynamics). The two simply cannot be equated.

Art · August 27, 2019, 2:55pm

Tornadoes.

Michael_Callen · August 27, 2019, 3:19pm

A post was merged into an existing topic: Comments on Gpuccio: Functional Information Methodology

glipsnort · August 27, 2019, 2:58pm

Sorry, I missed that you were only talking about non-biological systems. That’s not a claim I find interesting. Should you ever wish to deal with biological systems – which are of course the systems of interest – let me know.

Color me skeptical.

Art · August 27, 2019, 3:02pm

To follow up, something from another board in a universe far far away. To let @gpuccio see the why of this reference:

Consider the tornadic thunderstorm. It consists of a number of integrated and essential components, all of which are needed to produce and maintain the tornado. The ground and upper-air windstreams (which must be oriented in precise manners), the precisely-functioning updraft, the supercell itself (which consists of more parts than I can possibly list), and the funnel cloud. By most accounts, an IC system.

Can we speak about the information content of a tornadic thunderstorm? I believe so. Recall that the informational content of genomes is usually estimated by “calculating” that fraction of all possible sequences (nominally, amino acid sequences) that can satisfy a particular specification. We can use a similar strategy to guesstimate the “information” carried by water vapor molecules in a storm. The hard part is deciding how few of all of the possible states that are available to a particular water molecule are actually “used” in a storm. Now, one can count up all possible positions in the storm, interactions with all possible partners, etc., etc., and realize that the number is probably rather small. But, for the sake of argument, let’s pick an arbitrarily large number – let’s propose that only 1 in 10^30 states of any given water molecule is excluded in a storm.

Starting there, we need only count the number of water vapor molecules in a storm and estimate the “probability” that the arrangement found in a storm would occur. If we arbitrarily think in simple terms - a storm that is 5x5x7 miles in size, a temperature of 30 degrees C, a partial pressure for water vapor of about 32 mm Hg, an overall atmospheric pressure of 1 atm - then the number becomes (roughly) (1 - 1x10^-30) raised to the number of water vapor molecules in the storm (which is about 10^36). Which in turn is about 10^-10^6 (that’s 1 divided by 10 to the power of 1 million!). Or, using the conversion @gpuccio does, about 3 million bits!
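A back-of-the-envelope check of these numbers, as a sketch only. It assumes the ideal gas law for the molecule count and reads the premise as each molecule keeping a fraction 1 - 10^-30 of its states, independently of the others; the function names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23      # molecules per mole
GAS_CONSTANT = 8.314462618    # J / (mol K)
METERS_PER_MILE = 1609.344
PASCALS_PER_MMHG = 133.322

def water_vapor_molecules(dims_miles=(5, 5, 7), temp_c=30.0, vapor_mmhg=32.0):
    """Ideal-gas estimate of the number of water vapor molecules in the storm volume."""
    volume_m3 = math.prod(d * METERS_PER_MILE for d in dims_miles)
    moles_per_m3 = vapor_mmhg * PASCALS_PER_MMHG / (GAS_CONSTANT * (temp_c + 273.15))
    return volume_m3 * moles_per_m3 * AVOGADRO

def storm_bits(n_molecules, excluded_fraction=1e-30):
    """-log2 of (1 - excluded_fraction) ** n_molecules, computed in log space."""
    ln_p = n_molecules * math.log1p(-excluded_fraction)
    return -ln_p / math.log(2)

print(f"{water_vapor_molecules():.2e} molecules")
print(f"{storm_bits(1e36):.3e} bits")
```

The molecule count comes out a little under 10^36, in line with the figure quoted. On this reading the product is e^(-10^6), which is about 1.4 million bits; taking the result as 10^-(10^6) instead gives about 3.3 million bits. Either way the estimate is in the millions of bits.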

Jordan · August 27, 2019, 3:21pm

Just a reminder, we are creating a somewhat restricted space in this thread. If you are not one of the above folks, please use Comments on Gpuccio: Functional Information Methodology for your comments.

gpuccio · August 27, 2019, 3:30pm

We’ll come to that. Just now, it seems that I have to deal with stars and tornadoes. That will take some time.

We’ll see.

swamidass · August 27, 2019, 3:33pm

Before stars and tornadoes, there are other important questions I would recommend. The question of negative controls is central here.

For example, I’m still not clear cancer will work as a negative control for you. If I showed that cancer has an increase in FI, would that demonstrate FI is not a good way to determine design? Or would you just conclude this is evidence cancer is designed? If the former, we have something to talk about. If the latter, we may have reached epistemic closure.

swamidass · August 27, 2019, 3:35pm

@gpuccio there are several dimensions to this conversation. Would you like us to start a few threads dedicated to each dimension? That might enable a more orderly conversation, so key points are not dropped. Each thread could proceed at its own pace too.

What do you think?

gpuccio · August 27, 2019, 3:39pm

I am not equating them. In most cases, the correct model is a random walk. However, probabilities for a random search and a random walk from some unrelated state, given a high number of attempts, are similar. Many people are more familiar with the concept of a random search. I suppose that, for frameshift mutations, it’s more a random search. However, I had no intention to equate the two things.

With new FI, the model is a random walk from an unrelated state. FI expresses well the probability of finding the target. It is true that a minimum number of steps is necessary to make the target possible, but here we are discussing billions of steps, and extremely improbable targets. The initial difference is not really relevant.

gpuccio · August 27, 2019, 3:43pm

Yes. Definitely.

We have something to talk about. Definitely.

That would be too complicated for me. Thank you for the proposal, but I think we should go on this way. If you have patience, I will try to deal with all the relevant issues.

Art · August 27, 2019, 3:47pm

Thanks again, @gpuccio. I would like to commend you for broaching this subject, as it stands in contrast to the approaches used by the ID vanguard. I have long been of the opinion that the relevant metric that ID proponents should be measuring is something akin to informational work, which may be like what you describe here. I suspect that there are serious issues with the approaches one may take to estimate this property, but the concept seems to make sense to me.

swamidass · August 27, 2019, 3:51pm

But evolution is not a random walk!! It is guided by many things, including natural selection. If you neglect selection, you are not even modeling the most fundamental basics.

There are other deviations from the random walk model too. Evolution is also not all or none, but demonstrably can accumulate FI gradually in a steady process. I could go on, but you need a better model of evolution.
