The TTP as defined by Eric seems to be a more narrow, specific claim: namely that there are processes which can never be performed by a Turing machine. Processes, not results. So it seems that Eric needs to show that humans can create procedures which are inherently impossible for a TM to execute.
(Still, this is a bit confusing to me because I thought that almost by definition, processes (programs?) can always be executed by a Turing machine. Am I mistaken?)
I imagine this, if successful, would also be some sort of proof against the possibility of artificial general intelligence (AGI) using Turing machines. So perhaps looking into the arguments for or against the possibility of AGI would be one way to judge how likely TTP is true or not.
So what happens if we unplug the computer? Isn’t that a halting state??
There seem to be several unstated assumptions at work here.
First, MI is not a thing that exists in a physical way, it is a measurement between two systems (say A and *B). If A and *B are identical or nearly identical then they will share a high degree of MI. We can easily imagine a physical process that produces identical copies of the same system; for instance, all water molecules share a relatively high degree of MI. By this argument all water molecules (and all matter!) is designed.
It is more interesting to consider some system A than is capable of making copies of itself, creating a child system B that can also make copies of itself. Here the MI should contain the information relative to the_function_ of reproduction. Again, we can name physical processes that reproduce themselves, such as fire. Fire possesses some of the qualities of life; it metabolizes fuel from the environment, and can spawn other fires nearby via sparks. On state of “on fire” is very similar to any other state of fire, so they should have MI between them. Is fire designed? I think not.
So far we have at least two examples of MI that do not require intelligence, unless we want to declare everything to be designed, which is not useful in a scientific sense.
We can continue this exercise by adding functions in addition to replication. For instance, can the process capture energy from the environment? If we count hot embers as stored energy, then fire again appears to be designed. Capturing energy is also implied in the function of reproduction.
But what about capturing information from the environment? Now we have something different. Fire clearly doesn’t have any “memory” about what it burned before. Fire never gets any better or worse at burning, it just continues to burn until it exhausts itself. I claim that capturing information from the environment is a key function for sometime to be “life”. Capturing information separate life from other chemical processes that we would not claim to be alive. There could be additional functions needed depending on how we define life, but reproduction and capturing information are essential.
[edit:: clarifying: Where and when halting occurs is another hidden assumption, or at least undefined]
Considered as an algorithm, a system A that reproduces itself is a forking process. The forks produce child systems B and C (with slight variation) which continue. If C is better at reproducing itself than B we will get more copies of C in the environment than B. If B and the children of B are effectively unable to reproduce, then the B process halts. The remaining process C has gained*** information relative to successful reproduction (fitness) from the environment.
*** clarification: C has not lost information relative to fitness, and may have gained relative to A.
Once such a system as A and C exist in the environment there is no barrier to adding additional information other than the environment itself. @EricMH may still have a case for the initial MI, but the idea needs further refinement to avoid the “everything is designed” implication.
He is talking about a halting Oracle, not a halting state.
However, the real issue is that no logical process can solve the halting problem, not no physical process. Moreover, human intelligence is a wonderful counter example. What ever ID means by design, apparentkt humans can do it, even though we are demonstrably not halting oracles.
Agreed. I went through this exercise to illustrate that Information Non-Growth doesn’t have any useful meaning here without careful definition of the conditions, and a halting oracle doesn’t seem to be necessary at all.
(S. Joshua Swamidass)
split this topic
This is a good, substantial comment on the topic of info theory, so I wanted to make a quick response before I depart. I’ve been meaning to for awhile, but kept forgetting, my apologies.
Your main argument is that duplicating A creates mutual information between A1 and A2. This is an interesting point, and I’ll get back to you with a response. On the face of it, it does seem to contradict the law of information non-growth, so there is possibly some error here. But, I cannot handwave things away, so I’ll come up with a formal resolution to this conundrum.
Regarding your other point about organisms learning the environmental information, I also suspect that learning about the information in environment E is limited by the information in A, since the environment itself is pretty uninformative, K(K(E)|E) = K(E). And at any rate, there is still a mass quantity of environmental information to explain, which results in the information regress I pointed out to you before, same as Dembski’s proof regarding the vertical no free lunch theorem. But, this point is good and deserves more than just a handwave, so I’ll work on a formal resolution here too.
@EricMH while you are thinking about it, may I ask for a clarification? Please forgive my clumsy notation below. My background in statistical theory lets me follow the IT literature well enough, but I’m not used to expressing myself in these terms.
For the law of Information non-growth in the setting of statistical theory we have
I(X; Y ) ≥ I(X; T(Y ))
which is familiar to me as an application of Jensen’s Inequality. In the Algorithmic Information setting we have the same, and this shows that the average message length is minimised when coded for the true probability distribution rather than any function T() of that probability distribution. I agree with you this far, but here begins my question:
If I(X; Y ) represents information for some biological function coded by the true probability distribution p, then are you asserting that I(X; Y ) already represents the maximum state of greater biological fitness can exist?
In this scenario any T() other than identity will indeed have lesser biological function, because no other state is possible. But this seems to be intuitively backwards for what is supposed of biological evolution: For some probability distribution qnot equal to the true distributionp there exists some function U() such that
I(X; Y,q ) \leq I(X; U(Y ),p)
thus allowing an increase in fitness. That is, U() reduces the Kullback-Leibler distance between q and p. I would weaken this statement to U() might exist, because it is not clear we could find such a function universally.
He couldn’t even produce a single working simulation to validate his empericism claims. So we didn’t get here. Give it time and we would have started demonstrating different U functions that broke his implementation. But alas you can’t break something already broken.
Eric, It’s not clear what I asked is directly answerable. I won’t throw any stones if you can’t answer formally.
For water molecules the answer would seem to be entropy. The chemical energy released when two hydrogen and oxygen atoms combine to form a water molecule dissipates to the environment, raising total entropy and increasing the information relative to those atoms being a water molecule. THAT was dammed hand-wavy, but it is the essence of the answer, I think.
I’m going to give the more general case of replication another rethink. I will attempt to be more formal about it too.
I wasn’t trying to produce a working example because I believe it is impossible to do so. This is particularly true if you are using empirically incompressible strings, as you do here. If we restrict the domain, it is possible to produce a working example. That was never on the docket though.
No, if X is the ideal distribution of fit genomes and Y is some existing genome, then as long as I(X;Y) < H(X), then Y can still achieve greater biological fitness and learn more about X.
Yes, certainly a U exists, at least in some abstract sense, since it is the missing information that gets I(X;Y) up to H(X).
I think the missing piece here is that since U exists, and U applied to Y increases I(X;Y), then why do I say information growth impossible, since U made it grow?
The crucial point is how do we get U. We need some other sort of function that gives us U if we don’t have it already. And the problem becomes that that function needs the same information content as U to give us U. We can never get information from nothing.
Another trip up point is that it seems we could accidentally generate U. If U is a bitstring, then there is always a chance of generating U by flipping a coin enough times.
This objection is correct, but what we are looking for is some function that can generate U more reliably than by flipping a coin. Additionally, we want to recognize and hold onto U if we do generate it by flipping a coin or some other randomized process.
In both cases, we face the same kind of information problem, in that to do either thing we end up needing the information that is in U to reliably produce and/or hold onto U.
This was the insight of Levin’s proof, that if you take the expectation over all possible random functions that could perform U, you never end up with a net information gain.
Incidentally, this turns out to be the negative Kullback-Liebler distance. Since, per Jensen’s inequality (as you mention), KLD is always non negative, then -KLD is always non positive.
So, while there is always a probability of generating U by chance, we can say with certainty the expected information gain by applying randomly generated functions (-KLD) is always non positive. And this is the probabilistic side of the law of information non growth.
What is on the docket? Your argument is unclear. Just because algorithmic mutual information is not calculable does not mean it has no empirical validity.
The last comment I posted in the wrap up thread I reference Li and Vitanyi’s use of compressible approximation to algorithmic conditional information to perform empirical clustering to great effect. And algorithmic conditional information implies algorithmic mutual information.
@EricMH we’ve had a large range of discussions. I’ve appreciated this. I’m not sure what you are hoping for now. I thought that last thread was you “wrap up”. What do you think needs to be discussed now?
My understanding is I went through this whole empirical experiment exercise in order for you to present your disproof of ID’s empirical validity. I’ve done the former and now I await the latter. On the other hand, if you don’t have such a disproof, then say so.