Gpuccio: Functional Information Methodology

A few final observations. @gpuccio states that his purpose here is not scientific inquiry, but to confront neo-Darwinism.

This seems to clear up a lot. In conversation with us, he has not modified his analysis or worked to include any controls.

He has been working toward his goal of “confronting” neo-Darwinism. He has been successful, to a degree, because confrontation has nothing to do with being right or wrong. However, he failed in one major way. None of us are advocating neo-Darwinism!!! Neo-Darwinism as defined by the ID movement (and by @gpuccio) is not modern evolutionary theory. We moved on from Darwinism, meaning positive-selection-dominated change, in 1968. Behe was in high school, and ID had nothing to do with it.

So what we have here is a performance of confrontation, not scientific inquiry; it is theatre. I am glad he is upfront about this. I respect his honesty.

It seems that there is agreement here:

So let’s end on some common ground. @gpuccio affirms common descent, and that might be a more useful conversation to continue in the near future:

However, at this point, I think we can let this thread end.

@gpuccio is arguing against a version of evolution that is different from the contemporary understanding of biology and evolutionary science. It may well be correct that his version of evolution is impossible (and likely so!), but this has nothing to do with modern evolutionary science.

This topic will now auto-close tomorrow at 6pm. There are several interesting subtopics we can continue discussing in other threads. Let us keep these new threads narrower in scope, focused on the subtopics that arose here.

@gpuccio thanks for engaging here, and I look forward to continuing the conversation!

2 Likes

Final thoughts @gpuccio, @sfmatheson, @art, @glipsnort?

My final thought: ping me when you find an ID advocate who is actually interested in design and who understands evolutionary biology, at least at the level of an undergrad.

4 Likes

OK, let’s clarify this.

10 objects with 50 bits of FI each are not, in any way, 500 bits of FI.

10 objects with 50 bits of FI each are 500 bits of FI only if those 10 exact objects are needed to give some new defined function.

Let’s see the difference.

Let’s say that there is a number of possible functions in a genome that have, each of them, 50 bits of FI.

Let’s call the acquisition of the necessary information to get one of those functions “a success”.

These functions are the small safes in my example.

The probability of getting a success in one attempt is, of course, 1:2^50.

How many attempts are necessary to get at least one success? This can be computed using the binomial distribution.

The result is that with 2^49 attempts we have a more than decent probability (0.3934693) of getting at least one success.

How many attempts are necessary to have a decent probability of getting at least 10 successes, each of them with that probability of success, each of them with 50 bits of FI?

Again, we use the binomial distribution.

The result is that with 2^53 attempts (about 16 times, or 4 bits more than, the number of attempts used before) we get more or less the same probability: 0.2833757.

That means that the probability of getting 10 successes is about 4 bits lower than the probability of getting one success. The FI of the combined events is therefore about 54 bits.
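
For reference, a minimal Python sketch reproduces both figures above; since p is tiny and the number of attempts is huge, it uses the Poisson approximation to the binomial, which is essentially exact here:

```python
# A minimal check of the two figures above, using the Poisson approximation
# to the binomial (lambda = attempts * p), which is essentially exact when
# p is tiny and the number of attempts is huge.
from math import exp, factorial

def prob_at_least(k, attempts, p):
    """P(at least k successes in `attempts` trials, each with success probability p)."""
    lam = attempts * p
    return 1 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))

print(prob_at_least(1, 2**49, 2**-50))   # ~0.3934693
print(prob_at_least(10, 2**53, 2**-50))  # ~0.2833757
```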

Why is that? Why do the probabilities not multiply, as you would expect?

It’s because the 10 events, while having 50 bits of FI each, are not generating a more complex function. They are individual successes, and there is no relationship between them.

That’s why the statement:

10 objects with 50 bits of FI each are not, in any way, 500 bits of FI.

is perfectly correct. Those ten objects have 500 bits of FI only if, together, they, and only they, can generate a new function.

In terms of the safes, finding the 100 keys to the small safes generates 100 objects, each with 1 bit of FI. But finding those 100 objects does not generate in any way 100 bits of FI, because the 100 functional values found by the thief have no relationship at all with the 100-bit sequence that is the solution for the big safe.

I hope that is clear. We can rather easily find a number of functions with lower FI, but their FI cannot be summed, unless those functions are the only components that can generate a new function, a function that needs all of them exactly as they are.

Please, give me feedback on this point, before I start examining the example of affinity maturation in the immune system.

This is not only to Swamidass, but to all those who have commented on this point.

By the way, I was forgetting: using the binomial distribution, we can easily compute that the number of attempts needed to get at least one success, when the probability of success is 1:2^500 (500 bits of FI), is 2^499, with a global probability of 0.3934693.
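
(The hedged sketch above reproduces this figure as well: `prob_at_least(1, 2**499, 2**-500)` returns about 0.3934693.)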

@gpuccio, we will continue this for a bit, will likely split it into a new thread.

False.

Exactly, such as the combination of their 10 independent functions. Therefore, by your own definitions, 10 objects with 50 bits each are exactly 500 bits of FI (defining the function as the sum total of their functions).

You go on to explain something entirely unrelated.

Your math is all wrong here. In your example above, that of the 100 safes, the success probability for each trial is 50%, and if you have, say, 20 trials, you expect to get about ten successes.

This also is all wrong. You need to use a modified version of an extreme value distribution.

Because your math is all wrong. Very easy. These are basic homework problems in an intermediate probability course. In your education to become a physician, you probably never had a chance to learn this. Just turns out you are doing the math wrong.

The conceptual disconnect is so large here that I am tempted to call it confabulation.

Of course they can, in all cases, if we define the new function as the sum total of all the functions.

This will be handled, also, in a new thread.

Nope. As I just explained here: Information is Additive but Evolutionary Wait Time is Not. It appears you had some basic misunderstandings here of FI. I encourage you to catch up. This is an important area of science, and it is good to learn about.

@gpuccio, I overstated this one thing. I was expecting you were going to use the 100 safe example. This one part of your explanation is correct IF AND ONLY IF the specific 50-bit objects are 1) required with no substitutes allowed (which we know is not true of evolution) and 2) not decomposable. You are still incorrect, because you did not state these material caveats.

2^FI does not equal wait time, except in one very artificial situation.

Moreover you need to be using an extreme value distribution, not a binomial distribution. Your math is all wrong.

Are you kidding?

I have clearly said that I was computing for a probability of 50 bits of FI. The functions are analogous to the small safes, because the big safe is the object with 500 bits of FI.

Can you just explain that? I have made all the computations using the binomial distribution. They are correct.

If you define the function as getting 10 objects with 50 bits of FI, which is what had been declared, that is not a 500-bit function.

If you define the function as getting exactly the following 10 functions:

etc,

each of them with 50 bits of FI, then that result has 500 bits of FI.

There was nothing in the original definition that stated that 10 specific functions had to be found. The simple statement was that 10 objects of 50 bits of FI each, in general, are 500 bits of FI. That is wrong.

You cannot just say that my math is wrong. Do it yourself, explain how you have made the computation, and let’s see.

Your trick of partially quoting my reasoning is unacceptable and unfair:

I have said:

"Let’s say that there is a number of possible functions in a genome that have, each of them, 50 bits of FI.

Let’s call the acquisition of the necessary information to get one of those functions “a success”.

These functions are the small safes in my example.

The probability of getting a success in one attempt is, of course, 1:2^50."

Have you forgotten that the reasoning started with:

“Let’s say that there is a number of possible functions in a genome that have, each of them, 50 bits of FI.”

But you equivocate because I say:

“These functions are the small safes in my example.”

meaning of course that they have the role of the small safes, while the 500-bit function has the role of the big safe.

And then you say that my math is wrong because I should have computed for a probability of 0.5!

My math is perfectly right.

What is wrong here?

Why?

I see that you have realized that your previous discussion about the safes was wrong.

I have also clearly stated that all these reasonings, for the moment, are only about a random system. In a random system, wait time depends on probabilities and probabilistic resources, as in my model. There is no point in continuously stating that NS can change some of these things. I know, and I have always analyzed those aspects too. Not yet here.

The role of NS, be it negative or positive, and of possible decompositions, is another discussion entirely. If we do not analyze the probabilistic context, how can we model the effect of selection?

I would like to discuss the immune system now, but I don’t know if there will be the time. Let me know how to proceed.

By definition it is. If each of those 50-bit functions is independent, then the function that by definition includes all of them is exactly 500 bits.

You are right, I overstated it, and explained in the next post:

On that last bit of computation (really all of that post), you need to use an extreme value distribution, not binomial, even if we grant you the unrealistic assumptions here. Remember, probability of success DOES NOT equal 2^-FI. Wait time does not equal 2^FI.

This is a basic misunderstanding you have that may take a while to sink in. That’s okay.

I never mentioned NS. I talked about other things. The system we are discussing is a decomposable system, and it is also entirely random.

Do you know what an extreme value distribution is? If you did, it would probably make more sense. When you start discussing the probability of at least X success out of many simultaneous trials, you need an EVD. There are ways to approximate it of course.

I think that if you explained it better, it could be useful. Remember, I am not considering the effects of selection here. I have computed, by the binomial distribution, the number of attempts that is expected to give some probability of the defined result. That can be converted into waiting time by knowing the mean time necessary for one attempt to take place.

What’s wrong with this?

We have worked it out in more detail here: Information is Additive but Evolutionary Wait Time is Not.

Note, the example there does not include positive selection. It is a clearly explained math problem. You just get the answer wrong. It is worth understanding what you got wrong.

It will be hard for you to see this without being willing to see that wait time does not equal 2^FI. That is just false. The example you gave of the 100 safes is a beautiful example of why this is not the case, and it is your example. I’d focus on that new thread if I were you.

1 Like

Excuse me, I think that you should express more clearly why an Extreme Value Distribution should be used here. The binomial distribution is a discrete distribution, and I believe that we have discrete values here. I am ready to consider your statement, but you need to motivate it.

We have explained here: Information is Additive but Evolutionary Wait Time is Not. A local mathematician (@nwrickert) is correctly noting that your error arises because you are not taking parallelism into account.

2 Likes

Briefly, imagine 10 identical runners on a team.

A binomial distribution (drawing an analogy) tells you how long it will take them to finish a ten-leg relay race. The central limit theorem helps you here, and you can get increasingly accurate estimates as the number of legs increases.

The corresponding EVD, however, would tell you how long it takes for all ten of them to complete different races that all start at the same time. The maximum completion time is given by the EVD, and that is what you want.

Paradoxically, in some cases, the estimates get less certain as the number of runners increases. This is because as you increase the number of runners, outliers become more likely, and these outliers come from the part of the distribution we cannot usually model from data. This is the “black swan” effect.
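
As a minimal illustration of the relay-versus-parallel distinction (the leg and race times are hypothetical, exponentially distributed with mean 1; the only point is that a relay's total is a sum while the finish of parallel races is a maximum, i.e. an extreme value):

```python
# Compare the relay (sum of leg times) with ten simultaneous races
# (maximum of race times), using hypothetical exponential times with mean 1.
import random

random.seed(0)
n_runners = 10
n_sims = 100_000

relay_sum = 0.0     # one team running ten legs back to back
parallel_max = 0.0  # ten simultaneous races; wait for the slowest

for _ in range(n_sims):
    times = [random.expovariate(1.0) for _ in range(n_runners)]
    relay_sum += sum(times)
    parallel_max += max(times)

print("mean relay time (sum of legs):    ", relay_sum / n_sims)      # ~10
print("mean time for all races to finish:", parallel_max / n_sims)   # ~2.93
```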

Regardless, evolution doesn’t have to open all 100 safes anyway. It doesn’t need all the runners to get across the finish line. It just needs many runners to get across, not even most, and we would observe FI increases.

1 Like

Don’t be disappointed @gpuccio. You confronted us, and that was your goal. You achieved that goal. Sorry none of us are neo-Darwinists. Let us know if you find one out there. I thought they were extinct, but maybe you will find a living fossil out there somewhere!

1 Like

OK, I will try to simplify this point. FI, if correctly understood and applied, is related to the wait time, more or less as 2^FI.

The point is, FI is the number of bits necessary to implement one well-defined function. Without those bits, the function simply does not exist. That means that the function is treated as non-decomposable. Therefore, the wait time is approximately 2^FI. Therefore, FI, used correctly, expresses the probability of finding the function in a purely random system, if no necessity mechanism, like NS, intervenes.

That is the purpose of FI. That is the reason it is useful.
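
A minimal sketch of that relationship, assuming serial random attempts on a single non-decomposable target, each attempt succeeding independently with probability p = 2^-FI (the attempts to the first success are then geometric, with mean 1/p = 2^FI; nothing here models selection, decomposition, or parallel trials):

```python
# Under the stated assumptions only: each attempt succeeds independently with
# probability p = 2^-FI, so the attempts to the first success are geometric
# with mean 1/p = 2^FI. The FI value is illustrative, not a biological estimate.
from math import log, log1p
import random

random.seed(1)
FI = 50
p = 2.0 ** -FI  # probability that one random attempt hits the target

def attempts_to_first_success(p):
    """Inverse-CDF draw from the geometric distribution (attempts including the success)."""
    u = 1.0 - random.random()  # uniform in (0, 1]
    return int(log(u) / log1p(-p)) + 1

samples = [attempts_to_first_success(p) for _ in range(100_000)]
print(f"mean attempts ~ {sum(samples) / len(samples):.3e}")  # close to 2^50 ~ 1.13e15
```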

Now, if the function can be demonstrated to be decomposable, FI must be analyzed taking into account the decomposition. Which, in a biological context, means the effects of NS.

It is not true that decomposition of a function has nothing to do with selection. In the case of the small safes, the wait time is very short because the simpler functions are recognized as such (the safe opens, and the thief gets the money). In a biological system, that means that the simpler function must work so that it can be recognized and in some way selected. Otherwise, those simpler functions would not change at all the probability of the final result, or the wait time. If the thief had to try all possible combinations of 0 and 1 for the 100 safes, and became aware that something had happened only when all 100 safes were open, then the problem would be exactly the same as with the big safe.

So, intermediate function is always a form of selection, and it should be treated as such. Any intermediate function that has any influence on the wait time also has the effect of lowering the FI, if correctly taken into consideration.

Moreover, a function must be a function: some definite task that we can accomplish with the object. The simple existence of 10, or 100, simpler functions is not a new function. Not from the point of view of FI as it must be correctly conceived and applied.

The correct application of FI is the computation of the bits necessary to implement a function, a function that does not exist without all those bits, and which is not the simple co-existence of simpler functions. IOWs, there must be no evidence that the function can be decomposed into simpler functions.

That said, 10 objects having 50 bits of FI each do not mean 500 bits of FI.

And the wait time for a complex function, if FI is correctly applied, is more or less 2^FI.

If you want to conceive and apply FI differently, and apply it to co-existing and unrelated simpler functions, or to functions that can be proved to be decomposable, you are free to do as you like. But your application of the concept, of course, will not work, and it will be impossible to use it for a design inference.

Which is, probably, your purpose. But not mine.

So, if you insist that FI is everywhere in tons, in the starry sky, in the clouds, maybe even in the grains of sands of a beach, you are free to think that way. Of course, that FI is useless. But it is your FI, not mine.

And if you insist that the 100 safes and the big safe have the same FI, and that therefore FI is not a measure of the probability and of the wait time, you are free to think that way. Of course, that type of FI will be completely useless. But again, it is your FI, not mine.

I believe that FI, correctly understood and used, is a precious tool. That’s why I try to use it well.

Regarding the EVD, I am not convinced. However, if you think that such an analysis is better than the one performed by the binomial distribution, which seems to me the natural model for binary outcomes of success and failure, why don’t you try to make some analysis of that type, and let’s see the results? I am ready to consider them.

The objection about parallelism I understand, to some extent. But you must remember that I have computed the available attempts of the biological system as the total number of different genomes that can be reached in the whole life of our planet. And it is about 140 bits, after a very generous, gross estimate of the upper threshold.

So, the simple fact here is: we are dealing (always for a purely random system) with at most, at the very most, 140 bits of possible attempts everywhere, in problems that have, in most cases, values of FI much higher than 500 bits, for proteins for which no decomposition has ever been shown.

Why should parallelism be a problem? Considering all the possible parallel attempts in all existing organisms of all time, we are still at about 140 bits.
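
As a back-of-envelope sketch of what that would mean, taking the 140-bit resource figure at face value and assuming a purely random search for a single non-decomposable 500-bit target:

```python
# With n = 2^140 total attempts and p = 2^-500 per attempt, n * p << 1, so
# P(at least one success) ~ n * p = 2^(140 - 500). Figures are the ones
# quoted above, under the purely random, non-decomposable assumption.
resources_bits = 140   # log2 of the total attempts assumed above
fi_bits = 500          # FI of the target function

log2_p_at_least_one = resources_bits - fi_bits
print(f"P(at least one success) ~ 2^{log2_p_at_least_one}")  # 2^-360, effectively zero
```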

OK, I am tired now. Again, excuse me, I will probably have to slow down my interventions. I will do what I can. I would like to deal, if possible, with the immune system model, because it is very interesting. Indeed, I dedicated a whole OP to that some time ago.

Antibody Affinity Maturation As An Engineering Process (And Other Things)

And I think that this too is pertinent:

Natural Selection Vs Artificial Selection

And, of course, tornadoes, tornadoes… :slight_smile:

Ah, and excuse me if I have called you, and your friends, neo-Darwinists. I tend to use the expression in a very broad sense. I apologize if you don’t recognize yourself in those words.

From now on, at least here, I will use the clearer term: “believer in a non-designed origin of all biological objects”. Which, while a little bit long, should designate more unequivocally the persons I have come here to confront. Including, I suppose, you.

False. Nice try.

Exactly like your 100 safe example! But wait…

Totally unlike your 100 safe example. It is very convenient that this time you have produced your own counterexample. Thank you.

You have personalized this. This is YOUR purpose for FI. However, FI doesn’t work for this purpose. Your wonderful counterexample aptly demonstrates this!

I could go on, but it should be clear where your argument stands.

2 Likes

I am not a neo-Darwinist in any sense. None of us are. No reason to apologize. Just tell us when you find one. I thought they were extinct!

3 Likes

You’ve already defined FI: -log_2(target space / search space). Your definition is simple and easy to apply. Under it, 10 independent objects with 50 bits each have a total FI of 500 bits. You can see it from your own safe analogy: what are the search and target spaces for the 100-safe and one-safe cases?
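
A minimal sketch applying that definition to the safe analogy, with bit strings standing in for combinations (illustrative numbers only):

```python
# FI = -log2(target space / search space), per the definition quoted above,
# applied to the earlier safe analogy.
from math import log2

def fi(target_size, search_size):
    """Functional information in bits for a target set within a search space."""
    return -log2(target_size / search_size)

# One big safe: a single 100-bit combination out of 2^100.
print(fi(1, 2**100))        # 100.0 bits

# One hundred 1-bit safes: jointly, still one target out of 2^100;
# singly, 1 bit each, and the independent bits add up.
print(fi(1, 2**100))        # 100.0 bits
print(100 * fi(1, 2**1))    # 100.0 bits
```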

3 Likes