# The EricMH Information Argument and Simulation

@patrick, can we deal with this later? I’m putting out a fire, have a stack of work, and want to conclude this thread with @EricMH for good.

If you want to help here, help me figure out if he really has a PhD or not. He presented himself as a PhD who has done work on information theory and evolution published in his dissertation. I cannot find a single paper in Google Scholar under his name. He will not say if he is using a pseudonym or not. Figure that out for me, and maybe even @gbrooks9 can help.

I’m just gonna post the final part of this soon.

I tried checking him out yesterday. Not much out there on him. Certainly nothing crossing into evolutionary biology in any way.

So, do variation, selection, and descent with modification in biological systems differ, information-wise, depending on whether humans are or aren’t in the process? If I gave you two organisms, could you tell which one had traits or functions that were the product of natural evolution and which one had been through multiple rounds of human-managed selection?

From a quick Google search, Eric is not lying. He got his PhD from Baylor in 2017 under Robert Marks (who unsurprisingly is one of the leaders of ID and information theory). His dissertation is titled “What cannot create information, what can, and why it matters”. Unfortunately it is not available online, it seems.

Just as a warning, everyone: I’m taking a moment to write a final post here. After that, I will leave the thread open for a short time for @EricMH to respond. No one else should. I want to give him a chance to explain.

Great. Thank you for finding this @dga471. Thank you.

@EricMH, why didn’t you just provide this? I was honestly questioning your credentials. I had no idea you had worked with Marks. This makes much more sense. I’m sorry about the confusion there. I was just confused by the links you offered, and my inability to find any references to you. Apparently I need to learn to use Google right.

I’m very sorry for that mix up. Next time when asked, just make the connections. Why so evasive? You were at the heart of the ID information effort, and might even be a friend of @Winston_Ewert, who trained in the same place.

Thank you @dga471 for finding that out.

Okay, I believe you. I was honestly confused, so once again, I’m sorry about that detour. You really do have a legitimate PhD, studying under one of the leading lights in the ID movement. What you are putting forward here deserves serious attention. I’m glad we have clarified this.

## The EricMH Information Argument

I pressed you to define your terms and produce a simulation because this statement is not remotely specified enough to understand and analyze.

Nonetheless, however this is formulated, your entire argument depends on this inequality: $I(X:Y) \geq I(E(X):Y)$. If this is true, we still have work to do to see whether it is relevant to evolution. However, if it is false, we are done. There is no further reason to engage in this argument; it has been falsified. Whether or not it’s true depends on exactly what you mean by these poorly defined terms.
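
For reference, here is one textbook way to state the quantities involved, with $K$ the prefix Kolmogorov complexity and $X^{*}$ a shortest program for $X$. These are standard definitions, not necessarily the exact ones intended in the argument above, and the size of the additive slack terms depends on which variant is used:

$$
I(X:Y) \;=\; K(Y) - K(Y \mid X^{*}) \;=\; K(X) + K(Y) - K(X,Y) + O(1)
$$

$$
I(E(X):Y) \;\le\; I(X:Y) + K(E) + O(1) \quad \text{for a computable, deterministic } E
$$

Read this way, the inequality holds only up to the $K(E)$ term and an additive constant, which is one reason the precise definition of $E$ matters.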

Perhaps you were abbreviating a well-considered argument and would stand and deliver. Or perhaps not. I cannot tell from that paragraph. I interpolated some of your meaning, guessing what you meant, and anticipated…

The reason I needed a simulation is to just get a basic idea of what you were even talking about. It is not clear, from how you write, what it is you mean. That is why I asked you a battery of questions.

## The Simulation

You did not really answer my questions. What you did was far more surprising. Your simulation actually falsified your own equation. This at least makes clear what you were thinking. It also shows your equation is wrong in that you have not articulated or understood its limits. If you had, you would never have written the simulation this way.

So I took your code and modified it to run 1000 times and report how often the main experiment and the control did or did not violate your equation. This is a test of (1) the equation itself and (2) your understanding of the equation. https://repl.it/repls/DarkorangeSomberCopycat

Here are the results from one run:

main experiment equation holds 512 out of 1000 trials
control experiment equation holds 700 out of 1000 trials
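
The repl.it code itself is not reproduced in this thread, so the following is only a hypothetical sketch of the kind of harness described above. It assumes zlib compressed length as the computable stand-in for Kolmogorov complexity, random DNA-like strings for $X$, a heavily mutated copy of $X$ for $Y$, and a few random point substitutions as a stand-in for $E$. The names `mi_c`, `mutate`, and `run` are illustrative, and the counts it prints will not match the numbers above.

```python
import random
import zlib

ALPHABET = b"ACGT"  # hypothetical: DNA-like symbols


def c(data: bytes) -> int:
    """Assumption: zlib compressed length (bytes) stands in for K(data)."""
    return len(zlib.compress(data, 9))


def mi_c(x: bytes, y: bytes) -> int:
    """Compression-based estimate of I(x:y) = K(x) + K(y) - K(x,y)."""
    return c(x) + c(y) - c(x + y)


def random_seq(rng: random.Random, n: int) -> bytes:
    return bytes(rng.choice(ALPHABET) for _ in range(n))


def mutate(rng: random.Random, seq: bytes, k: int) -> bytes:
    """Hypothetical E: k random point substitutions, drawn without looking at Y."""
    out = bytearray(seq)
    for _ in range(k):
        out[rng.randrange(len(out))] = rng.choice(ALPHABET)
    return bytes(out)


def run(trials=1000, n=400, k=2, seed=0):
    """Count trials where I(X:Y) >= I(E(X):Y) holds, for the main case and the Y = X control."""
    rng = random.Random(seed)
    main_holds = control_holds = 0
    for _ in range(trials):
        x = random_seq(rng, n)
        y = mutate(rng, x, n // 4)  # Y shares much of its content with X
        ex = mutate(rng, x, k)      # E(X)
        if mi_c(x, y) >= mi_c(ex, y):
            main_holds += 1
        if mi_c(x, x) >= mi_c(ex, x):  # control: Y = X
            control_holds += 1
    return main_holds, control_holds


if __name__ == "__main__":
    trials = 1000
    main, control = run(trials)
    print(f"main experiment equation holds {main} out of {trials} trials")
    print(f"control experiment equation holds {control} out of {trials} trials")
```

The point of a harness like this is that every claim becomes a count that can be compared against a prediction stated in advance.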


First off, you have the main experiment, which is supposed to follow your equation. In this, you pick an $X$ and $Y$, using a particular $E$. Then you compute $I(E(X):Y)$ and test whether it is greater or less than $I(X:Y)$. Here is the thing: ~50% of the time it is greater, and ~50% of the time it is less. That is exactly what we expect for your choice of $E$, $X$, and $Y$. That is the definition of falsifying your primary equation.
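
As a rough check on the ~50% figure, one can ask how surprising 512 holds out of 1000 trials would be if each trial were a fair coin flip. This is a sketch using an exact two-sided binomial test (the helper name `binomial_two_sided_p` is illustrative); it says nothing about *why* the trials split evenly, only that the split is indistinguishable from chance. A claim that the inequality always holds would instead predict 1000 out of 1000.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided p-value: summed probability of all outcomes no likelier than k.

    Uses plain floats, which is adequate for n around 1000 with p = 0.5.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


if __name__ == "__main__":
    print("main, 512/1000:", binomial_two_sided_p(512, 1000))
    print("control, 700/1000:", binomial_two_sided_p(700, 1000))
```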

Whatever you think it means, you are wrong. What is remarkable, however, is that instead of collecting this data and understanding it, you write it off as “stochasticity.” I’m not going to do the work for you, but you’ll find that the deviations from equality are of equal magnitude in each direction. The key issue is that you do not know when this equation applies and when it does not.

I can tell you exactly what you did wrong here. In fact, looking at your code, I knew exactly what would happen before I ran the experiment. I know what I am doing here. That is all beside the point though. Your own simulation demonstrates there is an error in your understanding. That is all we need to know for this conversation.

## The Control Experiment

Second, you have a positive control, where you set $Y = X$. At no point do you describe what you expect, so we have to infer. This is a degenerate case, and in this case the two sides of the equation should be equal.

However, what we find is that about 70% of the time your equation is violated. Why?

The reason is that you are only computing an approximation of the mutual information, which has a tiny positive bias (in this case). It turns out that it is impossible to compute algorithmic mutual information exactly. Once again, this is exactly what I would have expected in this scenario.
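
A minimal illustration of how a compression-based estimate can carry a small positive offset, assuming (as a stand-in, since the original estimator is not shown here) that mutual information is approximated with zlib compressed lengths. For two independent random strings the ideal algorithmic value is close to zero, but the estimate comes out slightly positive: the fixed zlib header and checksum are paid twice in $C(x) + C(y)$ and only once in $C(xy)$. The degenerate case $Y = X$ likewise returns a number that need not equal $C(X)$.

```python
import random
import zlib


def c(data: bytes) -> int:
    """Assumption: zlib compressed length (bytes) stands in for K(data)."""
    return len(zlib.compress(data, 9))


def mi_c(x: bytes, y: bytes) -> int:
    """Compression-based estimate of I(x:y) = K(x) + K(y) - K(x,y)."""
    return c(x) + c(y) - c(x + y)


def random_bytes(seed: int, n: int) -> bytes:
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


a = random_bytes(1, 1000)
b = random_bytes(2, 1000)
print("independent strings, estimate:", mi_c(a, b))  # ideal value is about 0
print("degenerate Y = X, estimate:", mi_c(a, a), "vs C(X):", c(a))
```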

## Why Uncomputability Matters

Remember our exchange on this?

Now you are seeing why it matters. If you can’t keep this straight, you will end up totally misunderstanding what your proofs mean. This is one of the most important results in information theory, and it must be thoroughly grasped to apply it correctly. This is why your control failed worse than your main experiment.

## Other Controls?

This is just one control experiment. Where are the rest? Typically you want to have a large battery of controls to test every edge case, and ensure you can precisely demarcate the domain of applicability of a proof like this. Any surprise indicates lack of understanding. Every surprise sends us back to the drawing board to understand what we missed from first principles. Mathematical proofs in information theory can be very difficult to get right.

There are very subtle ways they can go wrong. The same is true in population genetics, and in many areas of computational biology. That is why simulation is such a fundamental part of the field. We do not trust our intuitions. We do not trust our proofs. We verify them. We understand them. We attempt to falsify them in as many ways as possible before we advance them with any confidence.

## What all This Means

It means you are back to the drawing board. I was not bluffing when I said your argument was not clearly specified. I couldn’t tell what you were saying. Based on some definitions you might be right on some claim or another. On others, not.

I’m concerned, however, that even with the simulation in front of you, you did not realize you had made an error.

At this point, it seems we are done, @EricMH. You have a lot of work to do if you want to continue here. Next time, please take my questions seriously. Make sure you’ve done your homework too. You are obviously a good computer programmer. Test your claims before you make them publicly. This post is a really important read (including the linked philosophy entry): The Role of Simulation in Science. Always verify your proofs with simulation, so you can carefully understand what you are really learning.

Good luck, and send my kind regards to Marks and @Winston_Ewert.

Dr Swamidass - That was quite impressive!

3 posts were split to a new topic: Intuition in Biology and Physics

5 posts were split to a new topic: An Information Test For Artificial vs. Natural Selection?

The above statement is incorrect: the equation is violated 30% of the time in the control, which follows my prediction that it would be right more often than wrong.

I must confess that I am still not entirely clear on what your objection is, so below is my best effort to understand.

Is your objection that since we have to approximate algorithmic information, there will be deviations from the theorem when we use calculable approximations? In other words, with my notation, you are saying that at least some of the time $I_{LZ}(X:Y) < I_{LZ}(E(X):Y)$ when $I_{LZ}(E:Y) = 0$. This objection makes sense to me. You may well be right, and it is something that I would have to analyze further to see what exactly the limits of calculable approximations are.

By itself, the above does not address my argument. So, you also are importing another premise: that it is computable information metrics that are relevant for the whole debate, not the algorithmic information I’ve been using. This may also be true, and something I will have to think upon further.

The whole time I thought you were debating the veracity of the information non-growth theorem, which made no sense to me, as it is mathematically proven. Now, it seems you do not actually disagree with that point. I guess we really were talking past each other, as one of the commentators pointed out.

At any rate, this is my takeaway from the discussion:

1. Evolution cannot increase algorithmic mutual information. To be clear, this is my argument, and I cannot see how the experimental detour applies. As far as I can tell, no one disagrees with this point.
2. Evolution might increase approximated algorithmic mutual information (AAMI).
3. AAMI might be the only metric that matters, since it is calculable.

I appreciate your many responses, and will keep points #2 and #3 in consideration.

A tractable experiment is to see whether human guidance in genetic programming is identifiable.

Yes, I should have provided you references instead of being evasive. However, in my mind, everything I say becomes colored by the association with ID. I’d prefer my arguments to stand or fall on their own.

I see. Well I understand that. Thanks for not holding it against me. I was not trying to dismiss you for being ID. It just seemed like you had misrepresented yourself. You did not, but that is what it seemed like. Thanks for taking it in stride.

Rather than go back to check the code, I’ll just grant that to you. It is really beside the point. That is only relevant to the control experiment you wrote. Besides, 70% accuracy is horrible for this domain.

The prediction is not that it would be “right more often than wrong.” You did not make a prediction. I applied your formula, which would say that the formula would be right 100% of the time. I asked you for a prediction but you gave none. It should be 100% for your reasoning to be valid. This demonstrates you missed something critically important.

The bigger problem is that your main simulation was right only 50% of the time. That means that 50% of the time evolution will improve mutual information (as you have defined it). Those are very good odds. It means your application of this equation does not take into account key caveats in the original proof, or the original proof is wrong. I’m not going to untangle which one it is right here. If the original proof was valid, then you’ve missed key assumptions. Whatever the case, your argument ends up not being valid.

To make your case, you need many more experiments to demonstrate that every single step in your argument is correct. There are several controls that have to be run. There are several validations. You are nowhere near making the case at this point.

If that was a correct proof, you are certainly misapplying it. I’m not importing another premise. I’m just applying the basic findings of information theory. You cannot treat theoretical compressibility as if it is empirical compressibility. These things do not behave the same way.
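
One quick way to see the gap between theoretical and empirical compressibility, as a sketch assuming Python’s `random` module as the generator and zlib as the practical compressor: bytes produced by a seeded pseudorandom generator are fully determined by a short program plus the seed and length, so their Kolmogorov complexity is small, yet an off-the-shelf compressor finds essentially no structure in them.

```python
import random
import zlib


def pseudorandom_bytes(seed: int, n: int) -> bytes:
    """Output fixed by this short function plus (seed, n): K is a constant plus about log n bits."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


data = pseudorandom_bytes(42, 100_000)
compressed = len(zlib.compress(data, 9))
# zlib finds essentially no redundancy here, although the theoretical description is tiny.
print(f"{len(data)} bytes -> {compressed} bytes after zlib")
```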

You know this, because you are not getting the results you should. That means you did not know what that formula really means. That is your real problem.

I was not debating you. I was trying to make sense of what you were saying. You were not being clear. I gave you a list of things I needed in the simulation. You did not provide it. I gave you a list of questions; you did not answer them all. I was not messing with you. The problem was that you were not being clear. I was just trying to make sense of what you were saying.

Depending on what you mean, it is possible I agree. Once again, this is not clear. However, in the sense in which it is true, it is irrelevant to evolution. There is no way to map this proof to evolution, because algorithmic mutual information is impossible to measure and is not related to function.

Another thing that it appears you miss is that “intelligence” is not exempt from the rules of information theory. It is not a magical exemption. There is nothing in information theory that says intelligence allows one to transcend the limits. That ends up being a key control you have to put into your experiments. That is also where it will be easiest to invalidate most of the arguments you put forward after the next step.

That is easy to answer. It is demonstrable that evolution inevitably increases AAMI to arbitrarily high levels. That, also, is the only metric that matters in this context because it is calculable.
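
A toy sketch of the claim that mutation plus selection drives the compression-based estimate up. It assumes zlib compressed length as the approximation, a fixed random “environment” string $Y$, and a hypothetical fitness rule that rejects any mutation breaking a site that already matches $Y$. The selection step reads $Y$ directly, so in this sketch the process is not independent of $Y$; the parameter values are arbitrary.

```python
import random
import zlib

ALPHABET = b"ACGT"


def c(data: bytes) -> int:
    """Assumption: zlib compressed length (bytes) stands in for K(data)."""
    return len(zlib.compress(data, 9))


def mi_c(x: bytes, y: bytes) -> int:
    """Approximated algorithmic mutual information (AAMI) via compression."""
    return c(x) + c(y) - c(x + y)


def evolve(n=500, generations=20_000, seed=0):
    """Random point mutations filtered by selection toward a fixed environment Y."""
    rng = random.Random(seed)
    env = bytes(rng.choice(ALPHABET) for _ in range(n))         # Y: fixed environment
    genome = bytearray(rng.choice(ALPHABET) for _ in range(n))  # X: starts independent of Y
    start = mi_c(bytes(genome), env)
    for _ in range(generations):
        i = rng.randrange(n)
        new = rng.choice(ALPHABET)
        if genome[i] == env[i] and new != env[i]:
            continue  # selection: discard mutations that lose a match
        genome[i] = new  # neutral or beneficial mutations are kept
    return start, mi_c(bytes(genome), env)


if __name__ == "__main__":
    before, after = evolve()
    print(f"AAMI before: {before} bytes, after: {after} bytes")
```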

Thanks. I appreciate this exchange. @EricMH, now that I know who you are, you can try and engage again in the future. Peace.

My claim is that intelligence is a halting oracle. Since a halting oracle can calculate the Kolmogorov complexity of any bitstring, the algorithmic information limits do not apply. For example, with a halting oracle we can have a program $p$: “print the lexicographically first bitstring that has Kolmogorov complexity $N$,” where $N > \ell(p)$. Thus, the halting oracle violates information conservation.

Consequently, with a halting oracle in the loop, the process can generate greater algorithmic mutual information than was in the initial state. Conversely, if a process creates more algorithmic mutual information than it initially had, then this means a halting oracle is in the loop.

How this translates into something we can measure is left to be worked out. But, this is a clear alternative to evolution that can explain the origin of algorithmic mutual information, while evolution cannot do so. And, it is a straightforward way to make sense of the whole ID argument with well established information theory concepts.

This is why I still do not understand why people like yourself find ID arguments to be incoherent or controversial, and ultimately why I didn’t find our discussion especially enlightening. It ultimately seemed to come down to disagreement over semantics and technicalities. I have yet, in the past decade and a half, to see a substantial counter to ID theory. To me, ID seems to be a paradigm shift, and the status quo does not like the boat being rocked.

Well, this is the world I live in as a computational biologist solving pragmatic problems. I can’t “leave” that till later. That is the crux of the issue.

It is precisely the issue above.

Once again, there is a gap between theory and practical measurement. What you have not done yet is show that Kolmogorov complexity is even important for evolution. You’ve also missed that random noise is incompressible. I could go on, but there are several places where the “mapping” between information theory and practice is just not correct.
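
To make the point about noise concrete, here is a sketch assuming zlib compressed length as the complexity proxy and a perfectly periodic toy sequence: sprinkling random point substitutions into it makes it much less compressible, so by this kind of metric noise alone registers as added complexity, independent of any function.

```python
import random
import zlib

ALPHABET = b"ACGT"


def c(data: bytes) -> int:
    """Assumption: zlib compressed length (bytes) stands in for K(data)."""
    return len(zlib.compress(data, 9))


def add_noise(seq: bytes, k: int, seed: int = 0) -> bytes:
    """Apply k random point substitutions (some may leave a site unchanged)."""
    rng = random.Random(seed)
    out = bytearray(seq)
    for _ in range(k):
        out[rng.randrange(len(out))] = rng.choice(ALPHABET)
    return bytes(out)


ordered = b"ACGT" * 250  # 1000 bytes of pure repetition
for k in (0, 10, 100, 1000):
    print(f"{k:5d} substitutions -> {c(add_noise(ordered, k))} compressed bytes")
```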

If you like, I can show you more examples some day from Marks and Dembski. I think they mean well, and most readers probably struggle to keep up with them. There are real errors in their work. I’ve shown some of their work cold to PhD students of mine, and they can quickly home in on the mistakes. Usually there are precise failure cases that are straightforward to construct (control experiments). It is for this reason that I am not convinced by the arguments. No one in computational biology is engaging their work because it is just not useful for understanding biological data.

Perhaps get outside the ID bubble.

Encourage Marks or Dembski to engage with me if you like. I’m happy to deal with them directly on their papers. I have no animosity towards them.

I’ve yet to see any substantial counter from you. I make a clear theoretical argument. You dismiss it because of practical implementation issues. That is a non sequitur.

It appears my argument is invalid in this forum because I do not have a research job where I can devote loads of time and students to fulfilling someone’s need to empirically validate math. The little time I can devote to research, between a full-time non-academic job and family, is better spent thinking through the theory and only reducing it to practice where it is actually relevant, so I probably will not be back.

That’s fine.

Stick around and talk about other things then. That might be more fun. Do you have an answer for A Science Fiction Riddle?