Eric Holloway: Algorithmic Specified Complexity

swamidass · September 1, 2018, 8:38pm

The Point: ASC Not Practically Useful

There is no ambiguity in my mind here, though you seem to be getting my point. The claims of Marks are entirely about what can be computed from data. He makes the claim that ASC is a good way of empirically measuring CSI. That is the whole purpose of the study. In this context we do not know the true P, but can only guess it. If the result is dependent on the choice of P, then his entire claim is false. Essentially, as you put it:

Which is to say that ASC is not useful in a practical setting, contra his claims. That is the precise challenge I am making. If ASC is not practically useful, we still have no way of measuring CSI (as Marks admits in the paper). This leaves CSI as a very poorly defined concept without any way of engaging real data until this problem is solved. I know there are other attempts to solve this problem, but they all have analogous defeaters.

Valid Choice of P

Your criterion for a valid estimated P is interesting.

We can prove this criterion is satisfied if and only if \forall X, P_{est.}(X) = P_{true}(X). Remember that P is normalized, so that the sum across all X equals 1. So too much density for one outcome has to be compensated for with too little density at some other outcome, which would violate your inequality. Therefore, by your criterion, the estimated P is valid if and only if it exactly equals the true P. I certainly agree.
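
The normalization argument can be written out in a few lines. This is only a sketch, and it assumes the criterion under discussion is a pointwise inequality of the form P_{est.}(X) \ge P_{true}(X) for every X in a countable outcome space. The direction of the inequality is an assumption; the same steps go through with it reversed.

```latex
\begin{align*}
\sum_X P_{est.}(X) = \sum_X P_{true}(X) = 1
  \quad&\Rightarrow\quad \sum_X \bigl(P_{est.}(X) - P_{true}(X)\bigr) = 0 \\
P_{est.}(X) - P_{true}(X) \ge 0 \ \ \forall X
  \quad&\Rightarrow\quad P_{est.}(X) - P_{true}(X) = 0 \ \ \forall X
\end{align*}
```

A sum of non-negative terms can only equal zero if every term is zero, which is the step that forces the two distributions to coincide.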

The criterion you’ve put forward makes my point. The inequality ASC < CSI only applies if you know exactly what the true P is, but we do not know the true P for any biological sequence. ASC is, therefore, practically useless. Remember, the only valid P is the true P, so we have no way of constructing a valid ASC.

That was not my claim at first, though it appears to be true. If finding a valid estimated P is equivalent to finding the true P, then it is impossible. Finding the true P is equivalent to finding the ideal compression, which we know is uncomputable. There is a simple algorithm for transforming the true P into the ideal compression. So if we can determine the true P, then compressibility is computable. However, we already know that leads to a contradiction: compressibility is not computable. Ergo, determining the true P of, for example, DNA is provably impossible.

What is at Stake

It seems we are tracking very closely with the logic of the paper. You’d have to show where you deviated from the logic of the paper. Or perhaps, where they went wrong. This is, after all, your idea that they have developed.

That does disprove the claim that ASC is a valid way of measuring CSI. This is the whole point of the Marks paper on ASC: that he has found a practical way of measuring ASC, and therefore a practical way of detecting intelligence. If this turns out to be false, the entire point of the paper is overturned.

The Interesting Claim

That was not my earlier claim.

Rather, my earlier claim is that I can construct a sensible P which will almost always yield an ASC of zero (if not always). I can both empirically demonstrate this and prove it too. If that is true, one of two things is also true:

Either ASC is not useful in a practical setting for measuring CSI, because it is more determined by our choice of P than by any signal in the data.
Or CSI is precisely zero for all objects encodable as strings (i.e. everything), and we therefore find no evidence for intelligent design using CSI.

We will go further, but it will be interesting to see which path you will take on this. Either ASC is useless practically for measuring CSI, leaving CSI as a signature of design but unmeasurable; or ASC is correct and useful practically, but it demonstrates that ASC does not empirically find CSI in anything. I’m honestly not sure which one is a better conclusion for ID. Both are going to be difficult to work through.

swamidass · September 1, 2018, 9:30pm

There are additional proofs I can make.

1. If we know the true P, we can use this to construct an ideal compression algorithm. I can produce the construction algorithm to transform any P into a compression algorithm.
2. We know that determining the ideal compression algorithm for arbitrary data is impossible (compressibility is uncomputable), therefore it is impossible to know the true P of arbitrary data. Of note, it is impossible to know the true P of biological sequences.
3. The ASC requires a P and a compression algorithm to implement. If we choose the true P and the ideal compression algorithm constructed from the true P, then ASC measured on all sequences would be exactly zero in all cases. So the true ASC appears to be zero for all sequences.
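
To make the first and third points concrete, here is a minimal Python sketch. It assumes ASC takes the usual form of self-information minus description length, -log2 P(x) - K(x|C), and it swaps the uncomputable K(x|C) for the length of a Shannon code built from P. The toy distribution `p_true` and the helper names are hypothetical, chosen only for illustration.

```python
import math

def shannon_code_lengths(p):
    """Code length in bits for each outcome under a Shannon code built from p."""
    return {x: math.ceil(-math.log2(px)) for x, px in p.items() if px > 0}

def asc(x, p, code_lengths):
    """ASC sketch: self-information under p minus the description length of x."""
    return -math.log2(p[x]) - code_lengths[x]

# Hypothetical toy distribution standing in for the "true P".
p_true = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = shannon_code_lengths(p_true)

# Kraft sum <= 1 means a prefix-free code with these lengths exists,
# so P really can be turned into a compression scheme.
kraft_sum = sum(2.0 ** -n for n in lengths.values())

# With the code derived from P itself, the two ASC terms cancel.
asc_values = {x: asc(x, p_true, lengths) for x in p_true}
```

In this toy case every probability is a power of two, so the two terms cancel exactly. For a general P the integer code lengths round up, which leaves this version of ASC between -1 and 0 bits rather than exactly 0.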

I can make these proofs in a fairly straightforward way. They are not difficult proofs (in fact #2 is trivially true if #1 is true). Moreover, they are consistent with the proof that ASC is less than or equal to CSI, because zero would be less than or equal to CSI (if it is a real entity).

Note, moreover, that IF we use the true P and the ideal compression algorithm to implement ASC, it seems I can prove that ASC = CSI. So this might demonstrate that CSI is merely an artifact of thought, and measured as zero in all objects, even if the empirical/observed ASC is greater than zero. All the CSI being measured by observed ASC is just an artifact of using the wrong compression algorithm and the wrong P. I think this may be difficult to establish because CSI is such a poorly defined concept. I suspect the most likely response would be to abandon ASC entirely so as to protect CSI.

This seems to be a complementary proof to the point that ASC is not practically useful. This point, it seems, demonstrates that the idealized ASC of any sequence is zero, and therefore not a meaningful quantity. If the proof that ASC corresponds with CSI is valid, it might even demonstrate that CSI is always zero.

swamidass · September 2, 2018, 3:11am

The Puzzle Rules

If you can’t solve this puzzle with an objective metric, it means you can’t objectively determine the CSI of objects. I do not think this is a solvable puzzle, but because I generated the sequences I know the right answer. For each of the following sequences, I know the correct CSI. I can give a few hints, and a promise:

Each sequence has a different CSI, either 5, 10, 100, 150 or 200 bits of CSI to be precise.
Each sequence was constructed with about five lines of python code.
I’m saving the code used to generate this, and the answer key, and can produce it when you are done trying.

Remember, it is not enough to just permute these 5 numbers and try all combinations. You have to produce an implementation of ASC that produces results (1) consistent with these numbers and (2) able to identify which ones have high CSI. Of course, you could submit a function that always emits zero. It would be less than CSI in all cases, but it would be unable to determine which sequences were high or low.

If you cannot solve this puzzle (and I think it is impossible to solve), then it seems we have demonstrated that ASC cannot practically identify high CSI.

The Sequences

98d50ad729eb9af79bedc1eb64d555083513057ddfa6e86892fc6360d843e168

ce98c8f3aaddcac18d724c93ccde9b5af87f1967fe4481700d0832c5b003f4a2

451834bb93b6de3e8f87255f14789b728b544304027343000e57eabba0573318

1319730bc751757d7ae1b1f21f81978b9a2ee6d96685f03afc93d257c7d21d66

19385699c280f99b6825aa165c38191b9d7b6f150cf504f25b9934e96b90261d
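
For anyone who wants to attempt the puzzle, a naive starting point might look like the Python sketch below. It is not the method from the Marks paper: it assumes a uniform P over byte strings of the same length and uses `zlib` as a crude, computable stand-in for the compression term. Both choices are assumptions made only to have something runnable. The always-zero function mentioned above is included for comparison.

```python
import zlib

SEQUENCES = [
    "98d50ad729eb9af79bedc1eb64d555083513057ddfa6e86892fc6360d843e168",
    "ce98c8f3aaddcac18d724c93ccde9b5af87f1967fe4481700d0832c5b003f4a2",
    "451834bb93b6de3e8f87255f14789b728b544304027343000e57eabba0573318",
    "1319730bc751757d7ae1b1f21f81978b9a2ee6d96685f03afc93d257c7d21d66",
    "19385699c280f99b6825aa165c38191b9d7b6f150cf504f25b9934e96b90261d",
]

def observed_asc_bits(hex_string):
    """Naive observed ASC: uniform-P self-information minus zlib length, in bits."""
    raw = bytes.fromhex(hex_string)
    # Assumption: P is uniform over all byte strings of this length.
    self_information = 8 * len(raw)
    # Assumption: zlib output length stands in for the description length.
    description_length = 8 * len(zlib.compress(raw, 9))
    return self_information - description_length

def always_zero(hex_string):
    """The trivial bound: never above CSI, but unable to rank anything."""
    return 0

scores = [observed_asc_bits(s) for s in SEQUENCES]
```

For short, random-looking inputs like these, zlib produces output longer than the input, so this score comes out negative for every sequence and gives no more ranking information than `always_zero`.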

EricMH · September 2, 2018, 5:15am

If we are just looking at a subset of the possible X, then the estimated P does not need to equal the true P to be valid. In any practical scenario we are looking at a subset of physical phenomena, anyways.

swamidass:

The more interesting claim you make is that it is provably impossible to craft a usable valid(P_{est.})

This is great. Thanks for listing your proofs.

#1 is correct.

#2 While it is true compressibility is uncomputable, it does not follow that we cannot find a decent lossy compression for some arbitrary data. If we are only looking at a subset, then we can reproduce the signal in the subset, which is equivalent to an overestimate of the probability.

For example, if we are just looking at DNA, we don’t care about the specific atomic composition of the DNA strand, we just care about the very high level base pair coding. From this coding, we could theoretically create a new DNA strand with this coding, but the new DNA strand would not be atomically identical to the original, so there is a tremendous amount of information loss. But, we still think characterizing the DNA strand by the base pair coding is very useful, so that is the signal.

#3 Since P is an abstraction, it does not need to be a particularly good compression. So, it does not follow that ASC is always zero, if we are just comparing the P compression to the ideal compression of the abstraction.

I generally agree with #4 that ASC is a more concrete form of CSI, but still CSI is a more general definition that is mathematically valid. For example, CSI could potentially be greater than ASC if the semiotic agents are halting oracles.

Regarding your puzzle, I’ll go ahead and concede defeat.

This does not follow:

With that, I don’t see there is much more to add. I appreciate you sharing why you do not think ASC is useful. Unfortunately, I do not find the reasons very convincing. I still hold out hope for a refutation of ASC/CSI. ID has consumed over a decade of my life, and it would be great to finally disprove it and move onto something else.

Patrick · September 2, 2018, 7:42am

Why do you hold out hope for ASC? ASC hasn’t been shown to be a useful mathematical formulation that accurately describes anything in nature. Also, ASC doesn’t seem to add anything to information theory either. And it is certainly not related to evolutionary science. Further, how does ASC have anything to do with ID?

EricMH · September 2, 2018, 11:55am

Those are good questions. Yes, ASC is technically a form of mutual information, so in a sense is not innovative. For that matter I’ve not discovered anything in ID that is innovative. But, that is also the whole point. The information theory aspect of ID is actually not controversial, and as far as I can tell makes no claim that hasn’t been made and proven elsewhere by more eminently qualified people. So, ID theory is settled mathematically. I really do not understand @swamidass’s contention with ID theory. The only reasonable point I get from our exchanges is there could be implementation issues in measuring mutual information, but so far I’ve not seen any insurmountable problems.

Furthermore, if @swamidass is correct, then ID is the least of his concerns. The whole field he is an expert in, information theory, is predicated on the notion that mathematical concepts like entropy and mutual information are not completely useless in practical settings. For example, the channel capacity is a form of mutual information, and Shannon proved we can create codes that get arbitrarily close to the channel capacity with arbitrarily small error. Our entire modern communication infrastructure is possible because of his insights regarding mutual information. But, if the practicality of mutual information is indeed the issue, then @swamidass needs to aim a lot higher than people like myself and the rest of the ID proponents.

As far as ID, ASC and CSI are concerned, the whole point is that mutual information cannot be generated by natural processes, for a suitably qualified definition of “natural process.” So, if mutual information exists in the genome, or any other aspect of physical reality, including the “natural processes” themselves, then it is put there by something that itself cannot be characterized as a natural process. That seems pretty significant, at least to me. Furthermore, practicality aside, it is also obvious mutual information exists. Any instance of order, such as this post on the message board, is an instance of mutual information.

So, in a nutshell, information theory tells us that because I wrote this post, we can know God exists and humans have immortal, immaterial souls.

swamidass · September 3, 2018, 3:04pm

It depends what you mean. Information theory is not controversial on its own, and ID has added nothing to it. I agree with you on this.

However, the way ID has applied information theory appears to be in obvious error to those of us using information to do useful things in biology. It seems that the abstract theory is understood, but knowledge of how to correctly make use of it in biology is absent. Reviewing the publications of ID information theory leaders, it does not appear any of them has applied information theory to solve a real problem in biology.

Other than arguing for ID, it does not seem there is any practical experience in ID in using information theory to solve biological problems. This might explain why every scientist in mathematical biology that I know of thinks ID information theory is misguided. @EricMH, have you, Marks or anyone else in ID published any information theory grounded papers that solve real problems in biology? For comparison, almost every paper on my CV engages information theory as an engine for discovery. So I know this is certainly possible.

Of course. That is why it has been so important in biology.

The problem is that ID appears to misunderstand how information theory can be used in biology when we do not have full knowledge of the generating process and the process is extremely complex.

In a nutshell, that is the crux of the problem. This, to me, is equivalent to claiming that because 1+1=3, we can know God exists and that we have souls. I believe the latter, but dispute the former.

I’ll respond with a final post in a day or so, which should summarize where we stand and bring this to a conclusion. @EricMH, thank you for participating in this exchange. It has been enlightening.

EricMH · September 6, 2018, 2:37pm

No, I do not believe so, and I agree this is a major problem. One could say it is because of persecution of pro-ID researchers in academia, which is certainly part of the issue. But, I think there has been too much focus on polemics and not enough focus on the science part.

My interest is the latter, I truly think ID says something important and unique, and which translates to useful science and technology. So I disagree with you here:

The issue is you seem to repeatedly equate “lack of application” = “theory in error”. I used to think this was true, but now that I realize all the ID theory is well grounded, I think the answer is more complicated.

1. Part of it is that ID theory has actually already been applied to a great degree, but not recognized as such, and independently from the ID movement. So, the real problem is that ID is too correct, such that it is already applied, and the practical implications are already well understood and considered to be obvious and uninteresting.
2. The other part is ID can provide new insights, but those are harder to come by, and due to a combination of 1 and a commitment to materialism, progress in these new areas comes much more slowly. However, this is the most interesting area to me, and why Jonathan Bartlett and I published “Naturalism and Its Alternatives in Scientific Methodologies” and why I participate in Dr. Marks’ Mind Matters.

A final point that drives my interest is the ethical implications of ID and the investigation of intelligence. If we reduce to matter, as computer scientists assume, then this is an ethical solvent, humans have no value, and that’s why the scientific revolution has gone hand in hand with horrible atrocities. But, it is not merely a soft hearted matter of trying to save people. The psychopathic ubermensch would consider such motivations as irrelevant.

There is a pragmatic point, which is if humans are reducible to matter, then they can be replaced by machines, and the ubermensch will continue to have all their needs met. So, the ubermensch can wipe out humanity without a problem. However, if ID is correct, then humans can do something machines cannot: create information. In which case, regardless of the ubermensch nihilism and atheism, they’ll need to do their best to preserve humanity because the machines are insufficient for their desires.

So, ID is a trans-ethical argument, where it doesn’t even matter if the polemics convince someone or not, it is like the law of gravity which even a complete nihilist needs to observe if he wants to fly a plane or cross a ravine.

Which results in the somewhat ironic position that societies that reject ID are less fit than those that accept ID, and will become naturally deselected.

But isn’t it clear from what I said how the conclusion follows from the premises? It’s a pretty simple argument:

1. We can mathematically prove mutual information cannot be generated by determinism + randomness.
2. All natural processes consist of determinism + randomness.
3. Therefore, natural processes cannot produce mutual information.
4. Mutual information exists.
5. Therefore, something beyond determinism + randomness must exist.

It is a short step from 5 to get to the immaterial soul and the existence of God.

swamidass · September 6, 2018, 9:07pm

First off, @EricMH, thank you for participating in this exchange.

It was illuminating. I’m going to try to articulate some of our main points of common ground here, and summarize where this stands. I’ll then close the thread. If you would like to make any final comments, I will reopen it for you. This has been a productive exchange though, and I look forward to the next one.

@EricMH this is an important point of agreement between us. It is refreshing to see this honesty, which I’ve also observed from @Winston_Ewert and @pnelson. Thank you.

I think you are misunderstanding this. I am not criticizing it because it is not useful. I’m finding there to be theoretical errors, which I can best explain as the result of proofs being offered by people with little practical experience in this area. The real problem is the theoretical errors, which I can demonstrate with simulation more effectively than by arguing with symbols. Perhaps I am wrong on this, but the issue is not lack of application, but errors in the theory that I can demonstrate with simulation.

As one example, it appears Marks misunderstood that ASC must always be less than CSI. This turns out to be true if and only if we have a valid P function. As you say here, and then I summarize:

I should emphasize that this standard is not achievable in practice. Not a single ID computation of information in biology uses a valid P by this standard. Remember:

We do not know all natural causes, and we expect to find new causes.
Even for those we do know, we do not know how they interact to produce a P.
For those we do know, most need more information than is within a DNA sequence to accurately compute P.

Notice that all these factors collaborate to increase ASC:

An unaccounted natural cause that orders the data will make P invalid.
Inability to accurately model how different causes interact will make P invalid.
Ignoring critical information for computing P will make P invalid too.

For those three reasons, we can be certain that observed ASC may be wildly higher than CSI. Essentially, our ignorance or inability to implement P is the most likely reason for high ASC, not CSI. None of this appears to dampen Marks’s enthusiasm that ASC is an empirical way of detecting CSI.
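
A small simulation illustrates the effect. This is a hedged Python sketch rather than anything from the ID literature: the "unaccounted natural cause" is a hypothetical strong bias toward 0 in an i.i.d. bit source, the wrong P assumes fair coin flips, and `zlib` is again only a stand-in for the compression term.

```python
import math
import random
import zlib

def self_information_bits(bits, p_one):
    """-log2 probability of the bit string under an i.i.d. model with P(1) = p_one."""
    return -sum(math.log2(p_one if b else 1.0 - p_one) for b in bits)

def description_length_bits(bits):
    """zlib-compressed length in bits, a crude computable stand-in for K(x|C)."""
    text = "".join("1" if b else "0" for b in bits).encode("ascii")
    return 8 * len(zlib.compress(text, 9))

rng = random.Random(0)
TRUE_P_ONE = 0.01  # hypothetical unmodeled cause: a strong bias toward 0
bits = [rng.random() < TRUE_P_ONE for _ in range(4000)]

k = description_length_bits(bits)
# Wrong P: pretends the bits are fair coin flips, ignoring the bias.
asc_with_wrong_p = self_information_bits(bits, 0.5) - k
# True P: uses the bias that actually generated the data.
asc_with_true_p = self_information_bits(bits, TRUE_P_ONE) - k
```

No design is involved in generating `bits`, yet the score computed with the wrong P is large and positive simply because the bias makes the data compressible. The only thing that changes between the two scores is the choice of P.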

I disagree with this entirely. No one has a problem with information theory. The problem is how you apply it. There are major errors in how you are applying the theory to practice.

That might be true, but there is not usually a positive case being made. The ASC argument reduces to: “if we detect a pattern in data that we cannot adequately model, then assume it is intelligence.” That is a strange argument. As I said too, I can demonstrate we can construct a valid ASC that is always zero.

I’m glad to see some positive attempts being made recently by @Winston_Ewert. That is good news. We will treat him fairly. Understanding what is going on in the unexplained part of the data is difficult. It does take time, and it sounds like ID is just now starting to think about things this way. It is too bad it has been undervalued for so long. I find it puzzling also that none of the big names in the movement are engaged in this. It seems that this is not their priority. Why?

Nothing in evolution implies this. I just don’t understand what you are getting at here. The same goes for the following paragraphs. This is not sensible. Nothing in science demonstrates that humans are reducible to matter. What exactly are you arguing against?

Recall, we demonstrated that #1 (the claim that mutual information cannot be generated by determinism + randomness) was false in our last exchange, for exactly the same reason. There is a gap between theory and practice.

What I want to do next is show some examples of evolutionary algorithms that can design things without intelligence. They will not smuggle a target sequence into the simulation. They will not cheat. They will, however, show that a mindless process can produce mutual information. This does not violate the abstract-level theory that determinism + randomness cannot increase mutual information. Rather, it shows how this abstract proof, based on unknowable quantities, has no practical importance in the conversation about biology.

That conversation, however, is for another day. I appreciate a great deal your willingness to participate in this conversation. I’m looking forward to the next one.

swamidass · September 8, 2018, 4:39am

@EricMH, go ahead and put your final comment.

EricMH · September 8, 2018, 3:45pm

First, I echo your appreciation. I have found the discussion to be helpful for my own thinking on the matter, and it has been constructive. I also apologize for any strong wording on my part, or unfairness in portraying your argument. If I have been unfair, point it out and I will correct.

Unfortunately, I find your closing arguments to not be so great. On the other hand, letting me post the closing comment is very fair minded of you, and this increases my level of trust, which is great.

First, I previously proved that we do not need an exact P to avoid overestimating ASC,

and then you seem to completely ignore this point:

swamidass:

ASC (i.e. OASC) is guaranteed to be less than CSI, provided the implementation uses the correct P. If the wrong P is used, then ASC might be higher than CSI.

The way I propose, of only considering a subset, is also not the only way. As an addition, after just a little thought, there are at least two other criteria I know of that can avoid overestimation:

If we sample according to any P_{est.}, then the expected value of the self information (first) term is \sum_{x\in X} P_{est.}(x)\log_2 \frac{P_{est.}(x)}{P_{true}(x)}, which is the Kullback-Leibler divergence, and is always non-negative, mathematically proving we are always expected to underestimate P_{true} with any choice of P_{est.}.
If overestimation is unavoidable, we can correct with an extra term subtracting the possible overestimation.
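
The non-negativity in the first criterion can be checked numerically. The Python sketch below uses two hypothetical three-outcome distributions, `p_est` and `p_true`, invented purely for illustration. It computes the sum given above, and also the per-outcome log ratio inside that sum.

```python
import math

def kl_divergence_bits(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; assumes q[x] > 0 wherever p[x] > 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

# Hypothetical distributions, chosen only for illustration.
p_est = {"a": 0.7, "b": 0.2, "c": 0.1}
p_true = {"a": 0.4, "b": 0.4, "c": 0.2}

# The sum from the text: an average taken over x drawn from P_est.
expected_gap = kl_divergence_bits(p_est, p_true)

# The per-outcome term log2(P_est(x) / P_true(x)) before averaging.
pointwise_gap = {x: math.log2(p_est[x] / p_true[x]) for x in p_est}
```

The averaged quantity is non-negative for any pair of distributions, and it is zero when the two are equal. The per-outcome terms are not constrained in the same way and can take either sign, so the guarantee is a statement about the average under sampling from P_{est.}.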

And even if the probability somehow remains that we overestimate, unless the probability will most likely be greater than 50% and the amount of overestimation is very large, then a very large ASC value is still going to give us true positives.

So, the bottom line is your primary proof, that we must exactly know P_{true} in order to avoid overestimating ASC, does not work. There are at least three reasons it does not work as you initially state, so we need further development of that proof for it to strongly refute the practicality of ASC.

Second, you claim there is a fundamental theoretical issue with ASC,

but then cannot provide a proof in “symbols” and claim that you can only demonstrate it empirically. If there is a theoretical problem with ASC, you need to give some kind of concrete idea what this problem is, even if you cannot symbolically prove it. Hand waving and reference to non-existent simulation is not a counter argument.

Third, you claim the mutual information issue was settled in our last exchange,

which I most certainly do not agree with. Perhaps you misunderstood my apology in that thread as a concession. I should have been clearer. Your argument above is the same kind of fallacy you make regarding ASC, that somehow the mathematics is disproven by empirical experiment. I have no idea why you say this sort of thing. It is a category error to claim a mathematical proof can be disproven by empirical experiment. One could, perhaps, disprove a mathematical conjecture with empirical evidence, but the core property of ASC and of mutual information non-growth is proven, not a conjecture.

I stand by my entire argument in the mutual information exchange. I’ll only grant the experiment I performed can be specified to greater mathematical rigor, but that is not a crucial point that somehow disproves my whole thesis, so it is disingenuous to act as if it were.

Finally, it isn’t “evolution” that reduces humans to matter,

it is materialism that does so, which claims that everything in existence operates according to the laws of physics. I agree with you that nothing in science demonstrates that humans are reducible to the laws of physics, but it is also true that materialism is the status quo of the scientific establishment, and the status quo of many leaders in our day and age who make life and death decisions. However, it is a highly unfounded philosophical position with no scientific evidence, as you rightly point out, and ID is a great scientific counter to that position.
