Eric Holloway: Wrap up Experiment on Mutual Information

Looks like the mutual information discussion is winding down. I’d mentioned I cleaned up the experiment from the first time I talked with @swamidass. The main fix I made is that the calculated mutual information is now always non-negative: the compression algorithm never expands a bitstring, except by adding a single bit to flag that the bitstring is incompressible. I also subtracted an extra bit in the mutual information calculation to account for the possibility of two incompressible bitstrings. These fixes resulted in the first two experiments achieving a 100% success rate.

I created a second set of experiments to see whether calculated mutual information underestimates algorithmic mutual information. In the first case I use two bitstrings consisting only of 1s, which results in a 0% success rate. In the second case, I use randomly generated bitstrings, and get a 100% success rate.
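For concreteness, the compression-based estimate discussed in this thread has the general form I(X:Y) ≈ C(X) + C(Y) − C(XY), where C is a compressor’s output length. A minimal sketch in Python, using `zlib` as a stand-in compressor (the actual experiments use a modified LZ scheme, so exact numbers will differ):

```python
import zlib

def c(bits: str) -> int:
    # Compressed length in bits, with zlib standing in for the modified-LZ compressor.
    return 8 * len(zlib.compress(bits.encode()))

def mi_estimate(x: str, y: str) -> int:
    # Compression-based mutual information estimate: C(X) + C(Y) - C(XY).
    return c(x) + c(y) - c(x + y)

ones = "1" * 1000
print(mi_estimate(ones, ones) > 0)  # True: highly compressible strings share structure
```

Note that encoding bits as ASCII characters adds some artificial compressibility, a caveat raised later in the thread about the random-bitstring experiments.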

The experiments can be seen here:

What conclusions can we draw from this? First, I submit the premise that compressible bitstrings are themselves instances of algorithmic mutual information. With that premise, the main conclusion is that in the absence of algorithmic mutual information, i.e. all random bitstrings, calculated mutual information behaves the same, i.e. measures zero. In other words, you cannot get positive calculated mutual information from zero algorithmic mutual information.

Since algorithmic mutual information cannot be produced by randomness + determinism, and calculated mutual information requires algorithmic mutual information, then the existence of calculated mutual information implies one of two possibilities:

  1. algorithmic mutual information exists for no reason
  2. something other than randomness + determinism exists

Conclusion 2 is preferable to conclusion 1 if there is a viable alternative to randomness + determinism. A halting oracle is an alternative, and it seems to be viable. I do not know of any proof that a halting oracle is a logical impossibility.

As such, these experiments seem to best support conclusion 2, and the best candidate that I know of for 2 is a halting oracle.

A counter argument someone might give is these experiments are illusions since they use pseudorandom number generators instead of true randomness. To which I’d respond: the experiments would not turn out differently if we had used a quantum random bit generator instead. However, we cannot know if anything is truly random in the quantum sense. But, we can know that algorithmic mutual information exists if computable mutual information exists. So, we can know true positives, but not true negatives, when it comes to algorithmic mutual information.

So, that’s about all I’ve got. I’ll check back during the weekend and address any questions people have.


@EricMH thanks for giving this a shot. It is a valiant effort. Unfortunately, you made an error in your MI calculation. It turns out this positive result is just an artifact of that error. I don’t have time to explain fully right this moment, but I will soon. I want to commend you for actually attempting to test your ideas with a simulation.


A clarification on my appeal to a halting oracle.

An objection I’ve seen a number of times is that at least human intelligence is not a halting oracle because it is easy to pose halting problems that humans cannot solve.

But, this does not prove that a human is not a complete halting oracle. There is a space between what a Turing machine can decide regarding the halting problem and what a complete halting oracle can decide. Just because a human is not a complete halting oracle does not mean humans are not limited halting oracles within this space between Turing machines and complete halting oracles.


This sort of response is exactly the reason I was fed up with your forum in the first place.

I’ll be back. Don’t worry. I just need some time…

Alright @EricMH,

The problem is that your MI function always returns zero, because you have not used a good compression function. This leads to contradiction.

Experiment #3

Even your own experiment #3 fails, though I’m certain you are miscalculating I(X:Y) here. Running your code, this is what I get:

Prediction: I(X:Y) >= I_LZ(X:Y)
Experiment 3: X = Y = 1s
Average success: 0.0

You are computing it as:

I(X:Y) = log(new_str_len,2)*2-log(new_str_len,2)-1

However, that is not the right formula. I’d implement a fix, but in this specific case (given the inputs), we know that I(X:Y) should approximately equal zero. So certainly it is TRUE that I(X:Y) >= I_LZ(X:Y).

Experiment #5

I added a control for you. The mutual information of an IID bit-string with itself should be approximately the length of the string, which is much greater than zero. Here are the results:

Prediction I_LZ(Y:Y)>0
Experiment 5: iid Y
Average success: 0.0

It fails? Why? Because I_LZ returns zero every time. It cannot measure MI; it is not a valid measure of MI.

What is the Problem?

The compression algorithm used always returns the same value, no matter the input. In the case of these experiments, with a length of 1000, it always returns a compression size of 1001, which then makes the MI always equal zero. This means it is not actually a compression algorithm. This is a consequence of the hack you added here:

Essentially, every single bit string in the whole experiment is incompressible to this algorithm, so the compression step always just adds a bit, inflating every example. The situation is worse in how the MI is computed: the algorithm cannot recognize that identical objects have mutual information equal to their information content. For this reason it is an invalid compression function over this data.

What is the Fix?

I substituted GZ compression for your “modified-LZ” compression and reran the experiments. We now pass the basic test: the algorithm can actually return MI > 0 when given identical objects.

Prediction I_GZ(Y:Y)>0 (CONTROL PASSED)
Experiment 5: iid Y
Average success: 1.0
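A control along these lines can be reproduced in a few lines of Python with `zlib` (the same DEFLATE family as gzip); this is a sketch, not the exact replit code:

```python
import zlib
import random

def c(s: bytes) -> int:
    # Compressed length in bits under zlib (DEFLATE, as in gzip).
    return 8 * len(zlib.compress(s))

def mi(x: bytes, y: bytes) -> int:
    # I_GZ(X:Y) = C(X) + C(Y) - C(XY)
    return c(x) + c(y) - c(x + y)

random.seed(0)
y = bytes(random.getrandbits(8) for _ in range(1000))  # iid random bytes

# A valid compressor spots the repeated copy inside y+y,
# so MI(Y:Y) comes out large and positive.
print(mi(y, y) > 0)  # True
```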

However, now all your claims fail, because they were never empirically valid in the first place.

Prediction: I_LZ(X:Y) >= I_LZ(E(X):Y)
Average success: 0.6

Prediction: I_LZ(X:Y) >= I_GZ(E(X):Y)
Experiment 2: iid E, X = Y = 1s (FAIL, NOT VALID EMPIRICAL CLAIM)
Average success: 0.0

Prediction: I(X:Y) >= I_GZ(X:Y)
Experiment 3: X = Y = 1s (FAIL, NOT VALID EMPIRICAL CLAIM)
Average success: 0.0

Prediction MI(X:Y)==0 (FAIL, NOT VALID CLAIM)
Experiment 4: iid X, Y
Average success: 0.0

You can see my code here: Holloway: Info Nongrowth - Replit


With some time, I’ll circle back and visually show where Eric is going wrong.

Computer coding is definitely out of my field of knowledge, so I had some simple background questions.

If I replaced every string of 3 or more identical bases with a number and the base, would this be a crude compression algorithm? For example:



Also, you mentioned a compression algorithm in another thread where you only list the differences between your sequence and a reference sequence. This sounds a lot like Mpeg video compression, at least according to my limited understanding. The information for each frame in an Mpeg compressed file contains the changes from the last frame. If there is no information for a pixel then the pixel is carried over from the previous frame. Was Mpeg video compression an inspiration for the DNA compression algorithm you proposed?
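The run-replacement scheme described in the question can be sketched as a simple run-length encoder (a hypothetical illustration, not any algorithm from the other thread):

```python
import re

def rle(seq: str) -> str:
    # Replace each run of 3 or more identical bases with "<count><base>".
    return re.sub(r"(.)\1{2,}", lambda m: f"{len(m.group(0))}{m.group(1)}", seq)

print(rle("AAAAGGGCTTTT"))  # -> 4A3GC4T
```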


Yup, that is LZ compression. The problem is that data looks more like this:


So, the compression is always the same as the input string, which is to say there is no compression. There are several ways to fix this. The best way is to get a better compression algorithm, which is exactly what GZ is. It uses a way of detecting repeats (an LZ77 sliding-window match finder combined with Huffman coding), so:


Can be (essentially) compressed to:


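To illustrate the repeat detection concretely, here is a sketch using Python’s `zlib` (a DEFLATE implementation, the same family as gzip): a repetitive sequence shrinks dramatically, while a random one stays near full size.

```python
import zlib
import random

random.seed(1)
repetitive = "TAGGCTTAGGCAA" * 100                          # one motif repeated
rand = "".join(random.choice("ACGT") for _ in range(1300))  # no long-range structure

small = len(zlib.compress(repetitive.encode()))
large = len(zlib.compress(rand.encode()))
print(small < large)  # True: the repeats are detected and factored out
```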
Another one of the mistakes that @EricMH makes is computing the mutual information as:

I_LZ(X) + I_LZ(Y) - I_LZ(X+Y)

This is an error, because with LZ compression this quantity will almost always be close to zero, except in some boundary cases. LZ can’t pick up on long-range repetitions, so it will simply fail to compute MI. A revised version that might work in some limited cases is:

I_LZ(X) - I_LZ(X ^ Y)

Here the “^” is the XOR on the bit strings, returning a bit string that is 1 every position the two strings are different. This function would actually be able to compute the mutual information, assuming the two are perfectly aligned. So, for example,

If X and Y are both equal to: TAGGCTTAGGCAA
then X^Y would be a very easy to compress string.


This is good: it makes MI(X,Y) equal to I(X) and I(Y). It is very easy to break this, however. If X is the same string as before,

and Y is a rotation: ATAGGCTTAGGCA
then X^Y is the much more difficult to compress:


This means the MI will be computed as I(X) + I(Y) when it should actually be very close to just I(X). To avoid this error, the right way to cover a large range (though not all) of cases is to use GZ compression. The GZ compression detects offsets and a whole range of other changes.

[NOTE: The (DNA xor DNA) to BINARY is not valid. It is just an illustration. A valid approach would convert the ATCG to numbers and use (a-b) mod 4 as the XOR between two DNA strings A and B.]
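A sketch of that (a-b) mod 4 difference, with a hypothetical base-to-number mapping and `zlib` used to gauge compressibility of the difference string:

```python
import zlib
import random

BASE = {"A": 0, "C": 1, "G": 2, "T": 3}  # hypothetical numeric mapping

def dna_diff(a: str, b: str) -> bytes:
    # Positionwise (a - b) mod 4: the DNA analogue of XOR on bit strings.
    return bytes((BASE[p] - BASE[q]) % 4 for p, q in zip(a, b))

random.seed(2)
x = "".join(random.choice("ACGT") for _ in range(1000))

same = dna_diff(x, x)                 # identical, aligned strings: all zeros
shifted = dna_diff(x, x[1:] + x[:1])  # rotated by one base: looks random

# The aligned difference compresses to almost nothing; the shifted one does not.
print(len(zlib.compress(same)) < len(zlib.compress(shifted)))  # True
```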



You are getting free consulting from a computational science professor… and you are upset?


Well, he was just expressing the “fed-upness” of exasperated impatience, after all. Not the same thing as being upset.
Josh’s usually generous and patient responses can be exasperatingly slow, at times. We’ve all felt that on occasion. But, I am truly grateful for the consistent approach he brings to bear in the process. Even when I disagree with him! : )


It does appear ungrateful of me. However, my time here does not feel like time well spent, to me. The reason I am engaging on this forum is @swamidass insists ID theory is fundamentally flawed. This is a very interesting claim to me because I have spent 1/3 of my life researching the issue in my free time and during a PhD degree. After multiple weeks of exchange, it is still not clear what the fundamental flaw is. I’ve shown in each case his reasoning is invalid, at least to my own satisfaction, if not to anyone else’s. I would like ID to be shown false, if it can, but I don’t want to accept a bad argument or proof.


At its core, the complaint against ID is not about the existence of God.

It’s about epistemology.
And sometimes epistemology walks into metaphysics. And metaphysics is (sometimes) whatever you say it is.

Saying all this… what is the “really real” issue?:

Can science detect God’s existence or God’s operation?

Since science would have to have evidence IN FAVOR of science having such powers… you are at the distinct disadvantage of not having such evidence.

So… how do you expect to prevail on what could be a metaphysical contention?

This is incorrect. It does (almost) always return zero if the strings are (pseudo) randomly generated, but that is not a flaw. However, if the strings are highly compressible, then the MI value is positive, such as in my 3rd experiment.

I am not a fan of your implementation because you use ASCII encoding for “0” and “1”, which will artificially add compressibility due to all the extra zeros in the ASCII codes, and thus does not faithfully evaluate the experiments dealing with random bitstrings.

Also, your control experiment is in error, since it is meant to measure MI(Y;Y) but instead measures MI(X;Y). My algorithm will give MI(Y;Y) > 0 on long enough sequences, though not on every sequence, because LZ takes a while to warm up.

ID can’t be shown false because it is not science. ID is creationism (religion) disguised as science. ID can’t be falsified, just as religion can’t.

I’m not dealing with metaphysics. It’s all well established mathematical theory in the fields of information theory and algorithmic information theory. None of my arguments deal with the ID theory of Dembski & co. My only claim in that regard is that their work is homogeneous with well-known theorems in information theory, but I’m not arguing that here, just claiming it.

Ignore the term ID for now. I just want my specific claims to be shown false.

Your mathematical claims are irrelevant as you failed to link them, in a meaningful way, to biological systems.
And you have not added much to Information Theory as most of the Information Theory math was well established before you started looking at it.


Where have I claimed to add to information theory? I am arguing based on the implications of well established information theory theorems.

My core claim is that:
A) algorithmic mutual information cannot be generated by randomness + determinism (proven)
B) this maps in a measurable way to the empirical world

@swamidass claims B is false and can be proven or at least demonstrated to a very persuasive degree. I have not yet seen this.

At any rate, I have spent multiple weeks on this and not gotten anywhere significant in regard to B, though @swamidass has convinced me of the need to look at more empirical experiments to demonstrate my argument better to others. Furthermore, the lack of contrary evidence for B has deepened my conviction that B is correct or at the very least worth looking into further.

You haven’t. To me, that is time wasted: learning a hard subject like information theory and not doing anything useful with it. Drop the ID bull**** and use your PhD in information theory for something that improves people’s lives, like better communication systems and better information systems.