Intelligence and Halting Oracles

@EricMH while you are thinking about it, may I ask for a clarification? Please forgive my clumsy notation below. My background in statistical theory lets me follow the IT literature well enough, but I’m not used to expressing myself in these terms.

For the law of information non-growth in the setting of statistical theory we have

I(X; Y) ≥ I(X; T(Y))
which is familiar to me as an application of Jensen's Inequality. In the Algorithmic Information setting we have the same inequality, and this shows that the average message length is minimised when coding for the true probability distribution rather than for any function T() of it.
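To make sure I am reading the inequality correctly, here is a quick numerical sanity check. This is my own minimal sketch with made-up numbers; merging two of Y's states plays the role of T():

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in bits for a joint probability table p(x, y)."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

# A made-up joint distribution over X in {0, 1} and Y in {0, 1, 2}.
joint_xy = np.array([[0.30, 0.10, 0.10],
                     [0.05, 0.15, 0.30]])

# T() merges Y's states 1 and 2: a deterministic processing step, so the
# joint table for (X, T(Y)) just sums the merged columns.
joint_xty = np.column_stack([joint_xy[:, 0], joint_xy[:, 1] + joint_xy[:, 2]])

print(mutual_information(joint_xy))   # I(X;Y)    ~ 0.23 bits
print(mutual_information(joint_xty))  # I(X;T(Y)) ~ 0.21 bits, never larger
```

I agree with you this far, but here begins my question: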

If I(X; Y) represents information for some biological function coded by the true probability distribution p, then are you asserting that I(X; Y) already represents the maximum state of biological fitness that can exist?

In this scenario any T() other than the identity will indeed have lesser biological function, because no other state is possible. But this seems intuitively backwards from what is supposed of biological evolution: for some probability distribution q not equal to the true distribution p, there exists some function U() such that

I(X; Y, q) ≤ I(X; U(Y), p)

thus allowing an increase in fitness. That is, U() reduces the Kullback-Leibler distance between q and p. I would weaken this statement to say that U() might exist, because it is not clear we could find such a function universally.
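To make the kind of U() I have in mind concrete, here is a deliberately trivial sketch (my own toy numbers): if the truth p puts its mass where q does not, a simple relabelling U() collapses the Kullback-Leibler distance to zero.

```python
import numpy as np

def kld(a, b):
    """Kullback-Leibler distance D(a || b) in bits."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.sum(a * np.log2(a / b)))

p = np.array([0.9, 0.1])  # the true distribution over {0, 1}
q = np.array([0.1, 0.9])  # a mismatched distribution

# U() = bit flip, so the distribution of U(Y) is q with its entries swapped.
q_after_U = q[::-1]

print(kld(q, p))          # ~ 2.54 bits: q is far from p
print(kld(q_after_U, p))  # 0.0 bits: this U() moved q exactly onto p
```

Of course, nothing in this toy case says such a convenient U() can be found in general; that is exactly my question.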

1 Like

He couldn't even produce a single working simulation to validate his empiricism claims. So we never got there. Give it time and we would have started demonstrating different U functions that broke his implementation. But alas, you can't break something already broken.

I’m saying the same thing as you (Swamidass) did regarding different types of mutual information, I think. Increase in MI depends on where you start and where you go.

I have some thoughts about MI and replication; I want to think it through a bit more first.

1 Like

The environment is highly informative. Typo, right?

3 Likes

Eric, it's not clear that what I asked is directly answerable. I won't throw any stones if you can't answer formally. :slight_smile:

For water molecules the answer would seem to be entropy. The chemical energy released when two hydrogen atoms and an oxygen atom combine to form a water molecule dissipates to the environment, raising total entropy and increasing the information associated with those atoms being a water molecule. THAT was damned hand-wavy, but it is the essence of the answer, I think.

I’m going to give the more general case of replication another rethink. I will attempt to be more formal about it too.

1 Like

What actually happened is that I produced a couple, and you rejected them in favor of your own implementation, which I subsequently showed to be flawed.

No, you cannot algorithmically reverse engineer the elegant program for the environment, unless the environment is purely random.

I wasn’t trying to produce a working example because I believe it is impossible to do so. This is particularly true if you are using empirically incompressible strings, as you do here. If we restrict the domain, it is possible to produce a working example. That was never on the docket though.

No, if X is the ideal distribution of fit genomes and Y is some existing genome, then as long as I(X;Y) < H(X), Y can still achieve greater biological fitness and learn more about X.

Yes, certainly a U exists, at least in some abstract sense, since it is the missing information that gets I(X;Y) up to H(X).

I think the missing piece here is this: since U exists, and U applied to Y increases I(X;Y), why do I say information growth is impossible when U made it grow?

The crucial point is how we get U. We need some other sort of function that gives us U if we don't have it already. And the problem becomes that this function needs the same information content as U in order to give us U. We can never get information from nothing.
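In algorithmic terms, this is just the invariance theorem: if V is any program that outputs U, then

K(U) ≤ K(V) + O(1), and hence K(V) ≥ K(U) - O(1),

so any function that reliably hands us U must already carry essentially all of U's bits.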

Another trip-up point is that it seems we could accidentally generate U. If U is a bitstring, then there is always a chance of generating U by flipping a coin enough times.

This objection is correct, but what we are looking for is some function that can generate U more reliably than by flipping a coin. Additionally, we want to recognize and hold onto U if we do generate it by flipping a coin or some other randomized process.
In both cases, we face the same kind of information problem, in that to do either thing we end up needing the information that is in U to reliably produce and/or hold onto U.

This was the insight of Levin’s proof, that if you take the expectation over all possible random functions that could perform U, you never end up with a net information gain.

Incidentally, this turns out to be the negative Kullback-Leibler distance. Since, per Jensen's inequality (as you mention), the KLD is always non-negative, -KLD is always non-positive.

So, while there is always some probability of generating U by chance, we can say with certainty that the expected information gain from applying randomly generated functions (-KLD) is always non-positive. And this is the probabilistic side of the law of information non-growth.
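A numerical version of that last statement, as a minimal sketch with made-up distributions: individual samples can land on positive "gains," but the expectation under the sampling distribution q is exactly -KLD(q, p).

```python
import numpy as np

p = np.array([0.70, 0.20, 0.10])  # target distribution (made-up numbers)
q = np.array([0.25, 0.25, 0.50])  # what a random search actually samples from

# Per-outcome "gain" in bits: positive when a sample happens to land where
# p puts more mass than q -- the lucky coin-flip case above.
gain = np.log2(p / q)
print(gain)                     # [ 1.49, -0.32, -2.32]: luck is possible

# But the expected gain under q is -KLD(q, p), which is never positive.
print(float(np.sum(q * gain)))  # ~ -0.87 bits
```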

1 Like

What is on the docket? Your argument is unclear. Just because algorithmic mutual information is not calculable does not mean it has no empirical validity.

In the last comment I posted in the wrap-up thread, I referenced Li and Vitanyi's use of a compression-based approximation to algorithmic conditional information to perform empirical clustering to great effect. And algorithmic conditional information implies algorithmic mutual information.
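For anyone following along, that compression-based stand-in can be sketched in a few lines. This is my own minimal sketch of the normalized compression distance from Cilibrasi and Vitanyi's clustering-by-compression work, with zlib standing in for the uncomputable Kolmogorov complexity:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: an empirical approximation
    to (uncomputable) algorithmic conditional information."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog" * 20
b = b"the quick brown fox leaps over the lazy cat" * 20
c = bytes(range(256)) * 4  # unrelated filler

print(ncd(a, b))  # small: a and b share most of their structure
print(ncd(a, c))  # much larger: little shared structure
```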

@EricMH we've had a large range of discussions. I've appreciated this. I'm not sure what you are hoping for now. I thought that last thread was your wrap-up. What do you think needs to be discussed now?

I believe the resolution to this conundrum is pretty straightforward.

A is the thing to be copied, and there is a function F that produces B, the copy. F needs the information in A to create B, so there is no information creation going on:

I(F; A) ≥ I(B; A).
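A worked version of the copy case, as my own minimal sketch: if B is an exact copy of A, the joint distribution of (A, B) is diagonal, so I(A;B) hits its ceiling H(A). Replication maxes out the mutual information between original and copy, yet the inequality above holds with equality, because F had to read all of A to do it.

```python
import numpy as np

# A is two uniformly random bits (4 states); B = copy(A), so the joint
# table p(a, b) is diagonal: 1/4 when b == a, zero otherwise.
n_states = 4
joint = np.eye(n_states) / n_states

pa = joint.sum(axis=1, keepdims=True)
pb = joint.sum(axis=0, keepdims=True)
nz = joint > 0
mi = float(np.sum(joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])))

print(mi)  # 2.0 bits == H(A): copying creates maximal I(A;B),
           # but no information beyond what F already held.
```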

My understanding is that I went through this whole empirical-experiment exercise in order for you to present your disproof of ID's empirical validity. I've done the former and now I await the latter. On the other hand, if you don't have such a disproof, then say so.

OK, we are on the same page at least. I feared you might have it the other way around! :slight_smile:

In statistics we know U exists from the Rao-Blackwell-Kolmogorov theorem. …
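For concreteness, here is a minimal sketch of the Rao-Blackwell step in the simplest case I know (iid Bernoulli sampling; the numbers are mine, purely for illustration): conditioning a crude unbiased estimator on the sufficient statistic never increases its variance.

```python
import numpy as np

rng = np.random.default_rng(0)
p_true, n, trials = 0.3, 10, 100_000

x = rng.binomial(1, p_true, size=(trials, n))

crude = x[:, 0]        # unbiased for p, but it uses only the first draw
t = x.sum(axis=1)      # sufficient statistic for p
improved = t / n       # E[X1 | T] = T/n: the Rao-Blackwellized estimator

print(crude.var())     # ~ p(1-p)     = 0.21
print(improved.var())  # ~ p(1-p) / n = 0.021, never larger
```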

And now I find I need some time to follow through the theory on the Algorithmic Information side. I will come back to this.

I think we are mostly in agreement, except that the expected information gain is not the goal. The distribution of information change just needs to capture some part of the fitness gradient. I’ll try to make that a more formal statement when I get back to this.

AND thank you for following up, I do appreciate it.

Edit: I had a thought about H(X) being achieved by a set of points of roughly equivalent function, rather than a single maximal point. I think that might imply a non-convex loss function, in which case Jensen's inequality may not apply. This would also change the parameters of the current discussion, so I'm not sure it's fair to introduce it now, but I thought it was worth noting.

2 Likes

@Timothy_Horton Be polite. There is some deep math here, and I won’t just wave it away without understanding it.

If you want to insult people, go back to Facebook.

5 Likes

He did say he would follow up on my questions, and I’m glad that he has.

1 Like

When someone claims to have provided empirical evidence for the Intelligent Design of biological life, an event which, if true, would be one of the most profound scientific discoveries of all time, it seems only prudent to be skeptical. I'm merely pointing out that asserting "no one has DISPROVED my claims" isn't scientific evidence for ID.


Tim, there is a real discussion happening here, and I would like to see it through. If you can’t contribute in a constructive way, then please do not interrupt.

2 Likes

In this setting we might be able to determine when I(X; U(Y), p) will and will not converge to H(X). That depends on a number of unstated assumptions, including a fitness gradient, which is beyond the scope of this discussion. We can find coin-flip scenarios where I(X; U(Y), p) will generate MI relative to fitness (convergence).

That doesn’t mean you are wrong about increasing the expectation over all possible functions, but it does leave some wiggle-room for randomness to increase information within the accepted definition of Information Non-Growth, for some subset of functions.
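Here is a toy version of such a coin-flip scenario, as my own sketch; note that it concedes your point, since the fitness gradient itself encodes the target. Random one-bit proposals plus "hold onto any improvement" reliably drive a genome's MI with the target up to its maximum.

```python
import random

random.seed(1)
target = [1, 0, 1, 1, 0, 0, 1, 0]  # stands in for the ideal X

genome = [random.randint(0, 1) for _ in target]

def fitness(g):
    # The gradient itself carries the target's information -- which is
    # exactly where the information "created" in the genome comes from.
    return sum(a == b for a, b in zip(g, target))

while fitness(genome) < len(target):
    i = random.randrange(len(genome))
    trial = genome.copy()
    trial[i] ^= 1                        # a random coin-flip proposal
    if fitness(trial) >= fitness(genome):
        genome = trial                   # hold onto any non-losing change

print(genome == target)  # True: selection, not chance alone, did the work
```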

@dga471 My original question has to do with the definition of MI. “MI is not a thing” but a measure of commonality between two things, A and B, and here replication creates MI by definition.

Eric is saying that duplication of A to create a new object B creates no new information, which is also correct, but in a different meaning, as you noted.

I need to think about this, and backtrack to what I wrote about gaining information from the environment, then sit down and try to turn this into a clearly stated argument.

4 Likes