Eric Holloway: Wrap up Experiment on Mutual Information

You haven’t. That to me is time wasted on learning a hard subject like information theory and not doing anything useful with it. Drop the ID bull**** and use your PhD in information theory for something that improves people’s lives like better communication systems, better information systems.

Please demonstrate this, then. I have a decent amount of effort to demonstrate my claims. You claim to have very impressive credentials in the field. As such, you should be able to easily demolish my claims. I would be very grateful if you did so.

You have not shown anything of value in information theory here in the past several weeks. All smoke and mirrors.
Nothing of value, nothing warranting discussion, and certainly nothing worth publication in IEEE Transactions on Information Theory.

https://www.itsoc.org/

@EricMH,

The only way this line of argument could succeed is if you think miraculous activity really can be detected.

So how do you use human code to prove miraculous code?

A Bug Detected and Corrected (Thanks!)

You are right! I had a bug in the code. Thank you for picking that out. Just fixed it. Runs with the same result as before, all around. For the GZ compression (mine):

Prediction I_GZ(Y:Y)>0 (CONTROL PASSED)
Experiment 5: iid Y
Average success: 1.0

For the LZ compression (yours):

Prediction: I(X:Y) >= I_LZ(X:Y)
Experiment 3: X = Y = 1s
Average success: 0.0

So thanks for catching the error. The point still stands. Your compression algorithm fails. Mine does not. Why didn’t you fix the bug and find out my claim was still correct?

A Misguided Rebuttal from Eric

Here is one error.

To be clear, it must show that MI(Y|Y) approximately equals I(Y). Let us say, just to be precise, it has to be within 5% of the correct value Regardless, I increased the length to 10000, and got this result:

Prediction I_LZ(Y:Y)>0
Experiment 5: iid Y
Average success: 0.0

Increasing the string size does not fix this problem. Your instinct is false. Why didn’t you do the simulaiton yourself instead of putting out an obviously false claim like this? Why would you make these claims without testing them? This is really bad form.

Another Misguided Rebuttal

This is an example of a valid thing to do in an experiment like this. Random strings are not usually compressible, however by treating them consistently (rather than just passing through their length as you did), we can still arrive at a valid MI estimate. That is the key point here, a valid estimate of MI.

We can use a better implementation if you like. I just convert the string of ‘01’ to a string of bytes. Same results. Why don’t you try testing your claims before throwing them out their like this? Your intuitions are just not valid in this case.

However, it is not a valid implementation if MI is usually returning zero on MI(Y:Y) (as in your modified LZ compression). That is just not consistent. It is not mathematically valid. It turns out that a compression algorithm like that I wrote is valid, even though it does not always reduce the size of the bit string.

The Main Point

The main point is that you can’t even establish empirically any of the claims you’ve established theoretically about mutual information growth. This is just starting with the examples you’ve given. Once you get over this hump (if you could), I’d still be able to offer new types of X, Y, and E to break your claim. This is just trivially easy because the claims you are making are not empirically valid.

The claims are theoretically valid in a narrow context that does not apply to the real world. The mistake you keep on hitting up against is thinking that the theory will cary over to practice. It will not. Because compression is uncomputable, we are guaranteed that empirically these are not valid claims.

The Other Main Point

@Patrick has a point.

The Code

I just edited the original file. It took me just about 5 minutes to test your claims, and implement a better compression. Why not try testing your claims before making them next time? You might learn a lot. https://repl.it/@swamidass/Holloway-Info-Nongrowth

Well, as we discovered during the thread Intelligence and Halting Oracles, your claims ultimately do depend on an affirmation that intelligence is a halting oracle, and that humans possess this intelligence, so humans are halting oracles.

And Josh has argued that halting oracles are a logical impossibility. In any case, it seems far from obvious that humans are halting oracles, and there’s a longstanding debate in the field of AI about the computational nature and power of the human mind. Even you seem to admit that there is no clear proof or disproof of the TTP.

So at least there’s one gray area there. You really can’t go on and pretend that this is all resolved by arguing over a straightforward mathematical proof.

2 Likes

Just for the record, I myself didn’t argue this. That is what the field of Information Theory teaches. It is the foundational finding that every computer scientist should learn at the undergraduate level. This is from Turing himself, the father of the field.

Which is to concede that humans are not halting oracles. Therefore humans are not able to produce mutual information in the unobservable theoretical sense in which you refer to it. There is no exception in the information theory proofs that an incomplete halting oracle can produce mutual information.

Once again, we are not allowed private definitions of halting oracle and other precise technical terms, when appealing to proofs made in a field that use a cannonical definition. That is not “kosher.”

3 Likes

LZ doesn’t do so well on non ergodic sequences. On the series of 1s in my 3rd experiment it works.

MI(Y:Y) = K(Y) + K(Y) - K(Y,Y)

Since I_LZ(Y:Y) is going to use upper bounds on these Ks, there is no guarantee it’ll return a value greater than zero, especially if the requirements for good LZ performance are violated as in the case of random bitstrings. Not sure why you say it is not mathematically valid.

I did. My experiments worked. You merely substituted your own compression algorithm, which should not be returning MI(X:Y) > 0 on randomly generated X and Y. So, it looks like your algorithm is incorrect. I’ll get back to you on this next weekend.

This is very interesting. Can you prove/demonstrate there is no way to have any empirical validity? As I stated in another thread, this does not follow from uncomputability. You need to fill in the gap in #2. This is the key claim you make and it’d be great if you can demonstrate it somehow. I will continue thinking of how to do so, too, because I’d also like to disprove (my version of) ID.

  1. Compression is uncomputable
  2. ?
  3. Information non growth is not a valid empirical claim

I’ll return to this thread next weekend.

Can you cite where Turing proved halting oracles cannot exist? As far as I know, he only proved Turing machines cannot be halting oracles. I’d be extremely grateful and most likely reject ID if you can prove halting oracles are impossible.

Yes there is. If we have a halting oracle O we can calculate the Kolmogorov complexity for any bitstring, and that allows us to easily violate the law of information non-growth. A limited halting oracle also has this ability.

I’ll return to this thread next weekend.

Here it is:

This is precisely defined as a Turing machine. Thus, your proof shows that a Turing machine cannot be a halting oracle. So, if you are using this proof to show halting oracles do not exist, then you are begging the question with the assumption that everything is a Turing machine, the exact premise I do not accept.

As it is, whether the mind is a halting oracle is an open (but small) area of research called hypercomputation, being investigated by people like Selmer Bringsjord and Jack Copeland.

Looking further into your code over the last week, the problem is your compression function decompresses bitstrings, and then you factor this into the mutual information. The reason for the decompression is possibly file information overhead, and also that incompressible bitstrings must be distinguished from compressible bitstrings to allow for correct regeneration.

What this results in is that a random bitstring has mutual information with every other bitstring. Additionally, you could have bitstrings that have algorithmic mutual information but will register as having zero mutual information due to this extra value, since they individually LZ compress enough to eliminate the overhead, but together do not LZ compress and still have overhead.

However, according to information theory, one defining characteristic of a random bitstring is that it is independent of any other bitstring, so has zero mutual information with every other bitstring. So, the fact that a random bitstring has mutual information with itself is an artifact of this overhead. In standard Kolmogorov complexity theory, this overhead is ignored as a constant term, so that is why I subtract it from my functions.

Consequently, I am able to guarantee at least one true positive with my algorithm:

With high probability, my MI function will not measure positive on two independently randomly generated bitstrings.

iid X, Y I(X:Y) = 0 => I_LZ(X:Y) = 0

So, this is one empirical guarantee preserving an algorithmic mutual information identity, contra your claim that algorithmic mutual information had no empirical validity.

@EricMH, this conversation is nearly over. At the moment, we are waiting for this to conclude:

You make claims like this all the time that end up being false. There is just no point engaging this. Demonstrate claims like this. I’ve demonstrated you wrong several times now. It is your turn.

One again an empirical claim with no empirical validation. You have yet to produce any code that demonstrates your claims. That is my take away from this exchange, and it seems that you would agree.

First, a preliminary note, you are the one claiming that you can prove ID has no empirical validity. So, this whole exercise of having me come up with algorithms, and then trying to point out flaws in them does nothing to support your claim. Whether I can or cannot come up with the right algorithm has nothing to do with whether you can prove ID is not empirically tractable.

This is such a crazy debate tactic, but I play along with it because I am interested in whether I can do it. However, it is frustrating when you do nothing to make the positive case that you can prove ID is not empirically tractable. You’ve promised this ever since I first came on the forum, but have continually avoided making any positive contribution of your own.

You have not. This last round you substituted your own compression algorithm and claimed victory.

You repeatedly claimed to be able to disprove the empirical validity of ID, and each time your proofs were mathematically flawed. You do not acknowledge this, and instead continue making the same unproven claims.

Once I get to a more reliable internet connection I’ll give empirical demonstration of the decompression problem your approach has.

This is one of the experiments I ran…

My first experiment suffered from the decompression problem that your compression algorithm has. I fixed it, and my experiments pass.

The sentence you quote says exactly the opposite…

It seems we are back to our starting point.

It is my observation that you make empirical claims and cannot produce simulations that satisfies these claims. From the beginning I’ve warned about what you dismiss as Pesky Implementation Details in Information Theory. There is a wide gap between theory and practice. Just because you’ve proven something theoretically does not mean it is true empirically.

If you are done trying to convince me, perhaps our time has come to a close. At least for now.

Ok, so you admit you cannot prove ID is empirically intractable. Fine, why have I wasted my time here?

As I see it,

  1. You are a leading light in ID Information Theory, trained by the best of the movement.

  2. You made several theoretical claims that you believed transferred over to practical measurement.

  3. Two times now, you produced simulations meant to demonstrate a claim.

  4. In both cases, your simulations did not demonstrate your claims true. Perhaps they were uninterpretable, but they certainly did not demonstrate your claims true.

  5. Several other claims have been made that are all demonstrably false. In fairness, I will list one below, but there are many more like it.

  6. There is no reason any one should trust the claims we have not tested, because not one of the claims you have tested has been demonstrated true.

As an example of a false claim:

There are so many claims like this through out your work. It takes too much effort for me to demonstrate them all false, but in this case I did.

This gets to the heart and soul of information theory, as I explained to you long ago:

I have demonstrated that your theoretical constructs have not even one time transferred over to an empirical simulation. Maybe someone else can succeed where you failed, and I’d welcome this. For example, maybe someone can solve this Open Challenge to ID Advocates.

At this time I am closing the thread. @EricMH can request to put a final post up when he is ready.

Thanks for giving me the closing comment again. This is to your credit.


As an opening note, while this discussion has focussed a whole lot on my own capabilities (and lack thereof) of translating algorithmic information theory into empirically testable settings, it should be said that much more qualified people than myself have done just this. Li and Vitanyi have written an entire book on practical applications of Algorithmic Information Theory, and famously derived a compression based clustering metric from algorithmic information theory.

Now for my responses.

We call each other out on three items, which I’ll address in turn.

1. MI_LZ(Y:Y) > 0 on long enough sequences

My algorithm will give MI(Y;Y) > 0 on long enough sequences, but not on every sequence due to LZ taking awhile to get warmed up.

swamidass:

I increased the length to 10000, and got this result:

Prediction I_LZ(Y:Y)>0
Experiment 5: iid Y
Average success: 0.0

Increasing the string size does not fix this problem. Your instinct is false. Why didn’t you do the simulaiton yourself instead of putting out an obviously false claim like this? Why would you make these claims without testing them? This is really bad form.
[/quote]

LZ only makes asymptotically optimal guarantees for ergodic sequences. For highly random bitstrings, this means it takes a very long time for LZ to compress the bitstring.

So, looks like the sequences must become rrrreeeeeaaaaaalllllyyyyy long for MI_LZ(Y:Y) > 0 if iid Y. I have a loglog plot of LZ(Y+Y)/len(Y+Y) for iid Y, and it looks like it won’t cross the X axis until Y’s length reaches 10^{12}.

2. Demonstration of tractability

You have pointed out flaws in the experiments, but I have also demonstrated successes. For instance, the experiment that started off this thread can guarantee with high probability that if X and Y are iid, then MI_LZ(X:Y) = 0, which matches the algorithmic mutual information property in this regard. I’ve repeated the code in a new repl.it below, and included the control showing that also MI_LZ(X:Y) > 0 for some X and some Y, so I didn’t just cheat by setting MI_LZ(X:Y) = 0 for everything. It is admittedly a very modest empirical guarantee, but it is a guarantee nonetheless.

You have not yet refuted this particular experiment. What you have done is proposed your own compression metric, which I’ll address next.

3. Issues with the GZip mutual information

You are not a fan of my LZ implementation, so you use the standard LZ algorithm in GZip. This is fine, but you make the same sort of mistake I did with my first experiment, where I failed to account for decompression.

In the following repl.it, I copied your GZip version, and have two experiments. The first shows that GZip consistently decompresses random bitstrings. The problem is that since random bitstrings are decompressed, then they artificially register mutual information where there is none.

The second experiment shows the consequence of the decompression. I generate a random X from a Bernoulli 0.5 distribution. I show that it registers mutual information with any Y drawn from Bernoulli distributions taking probabilities 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0. Information theory says a Bernoulli(0.5) X should have 0 mutual information with any other bitstring (barring a few highly improbable exceptions), so GZips decompression issue causes the metric to violate this information theory property.

That being said, this is not necessarily a ding against the GZip version. It can either be easily corrected by ensuring it doesn’t return decompressed values, as I did, or this issue can be acknowledged as a tradeoff for some other useful property the GZip version preserves.

The overall takeaway is that @swamidass is correct in that not all properties of algorithmic mutual information can be preserved when moving to the calculable, empirical realm.

On the other hand, these experiments seem to show we can pick and choose properties, and guarantee empirical preservation of at least some of them.

In particular, preserving the property that MI is not positive when there is no algorithmic mutual information could allow us to detect when there is algorithmic mutual information, even if we cannot measure the precise amount.

And I think that’s all I’ve got. Take care everyone! I seem to have gotten something out of my interactions here with you all, since I just stayed up until 2am when I should be studying for a major exam instead :stuck_out_tongue:

1 Like

Thanks for your final comment. This has been in my inbox for 9 days. I am sorry for the delay in putting it public. This was just an administrative error on my part.

2 Likes