Eric Holloway: Algorithmic Specified Complexity

What you have here would be known as observed ASC, since conditional Kolmogorov complexity cannot be calculated. But it seems to be a valid implementation of observed ASC, so I agree with it.

Quick background on ASC for those new to the subject:

Algorithmic Specified Complexity (ASC) is defined as ASC(X, P, C), where X is the event being analyzed, P is the probability distribution that we think generated X, and C is a context for describing X that is independent of P. It is calculated as:

ASC(X, P, C) = -\log_2(P(X)) - K(X|C)

where P(X) is the probability of P generating X, and K(X|C) is the conditional Kolmogorov complexity based on a prefix Turing machine, so K(X|C) obeys the Kraft inequality:

\sum_X 2^{-K(X|C)} \leq 1.

Because K(X|C) follows the Kraft inequality, the probability of generating B bits of ASC from P is less than 2^{-B}.

Kolmogorov complexity is uncomputable, so in an empirical setting we have to estimate an upper bound using compression. We denote the compression of X given C as L(X|C). When we use a compression algorithm to estimate K(X|C), the resulting metric is known as Observed ASC (OASC):

OASC(X, P, C) = -\log_2 (P(X)) - L(X|C).

And since L(X|C) \geq K(X|C), the lengths L(X|C) also obey the Kraft inequality, so the probability of generating B bits of OASC from P is likewise less than 2^{-B}.
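To make the bound concrete, here is a small self-contained check (my own illustration, not code from the papers): take P uniform over 8-bit strings, and take L to be Shannon code lengths for a mismatched model Q, so the lengths satisfy the Kraft inequality. Exhaustive enumeration confirms \Pr\{OASC \geq B\} \leq 2^{-B}:

```python
from itertools import product
from math import ceil, log2

n = 8  # work over all 8-bit strings

# P is uniform, so -log2 P(X) = n for every X.
# L gives Shannon code lengths for a mismatched model Q that favors zeros;
# such lengths satisfy the Kraft inequality.
def L(bits):
    q = 1.0
    for b in bits:
        q *= 0.9 if b == 0 else 0.1
    return ceil(-log2(q))

# OASC(X) = -log2 P(X) - L(X) = n - L(X); check the tail bound exactly.
for B in range(1, n + 1):
    tail = sum(2 ** -n for bits in product([0, 1], repeat=n)
               if n - L(bits) >= B)
    assert tail <= 2 ** -B
```

The mismatched Q inflates L on most strings, so high OASC is rare exactly as the inequality requires.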


An explanation of @swamidass’s implementation of OASC(X, P, C):

X is a passage of text written in English of length \ell(X).

P is derived from the frequency of the characters in the text. If ‘A’ occurs twice in a text 10 characters long, then its frequency is F_C(\text{'A'}) = 2 and its probability is P_C(\text{'A'})=\frac{2}{10}=\frac{1}{5}, where P_C(c) is the probability of character c.

C is the alphabet used to compose the text, and we’ll denote the entropy per letter in the text as:

H_C(X) = -\sum_{c \in C} P_C(c)\log_2(P_C(c)).

The first term -\log_2(P(X)) in OASC is calculated by:

-\log_2(P(X)) = \ell(X) H_C(X) = -\ell(X) \sum_{c \in C} P_C(c)\log_2(P_C(c)) = -\sum_{c\in C} F_C(c)\log_2(P_C(c)).
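As a quick sanity check (my own sketch, not the thread’s code), with H_C taken as the usual non-negative entropy, the frequency form and the \ell(X) H_C(X) form agree numerically:

```python
from collections import Counter
from math import isclose, log2

def neg_log_p(text):
    """Complexity term: -log2 P(X) = -sum_c F_C(c) * log2 P_C(c)."""
    n = len(text)
    return -sum(f * log2(f / n) for f in Counter(text).values())

text = "AABBBCCCCC"
n = len(text)
# entropy per letter: H_C(X) = -sum_c P_C(c) * log2 P_C(c)
h = -sum((f / n) * log2(f / n) for f in Counter(text).values())
assert isclose(neg_log_p(text), n * h)  # -log2 P(X) = l(X) * H_C(X)
```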

The second term L(X|C) is computed by compressing X and C separately and taking the difference L(X)-L(C), which seems to be based on the assumption that L(X|C) = L(X,C)-L(C) = L(X)-L(C).

The assumption that L(X,C)-L(C) = L(X)-L(C) may not be quite correct, since the naive interpretation of L(X,C) is that X and C are concatenated, so the compression L(X,C) > L(X), and thus L(X,C)-L(C) > L(X)-L(C). Consequently, L(X)-L(C) could potentially underestimate K(X|C), leading to readings higher than ASC(X, P, C), but probably not by much, so I don’t think it matters. But, if we wanted to be completely rigorous, the algorithm should compute L(X,C)-L(C), since perhaps one could use a really large alphabet and a text with few repetitions to make the probability of B bits of OASC greater than 2^{-B}.

There also may be ways to play with how the compression algorithm works so L(X|C) = L(X,C)-L(C) is not quite correct, since it assumes L(X,C) = L(X) + L(C), and it certainly seems possible that L(X,C) < L(X) + L(C). Normally, C would be derived independently of whatever X we are looking at, instead of derived from X. However, in this case C is extracted straight from X. A more neutral option is to discard L(C) altogether, and just use L(X), since L(X) \geq K(X) \geq K(X|C) (unless L happens to include X as a special case).

One final thing to keep in mind, if we use this algorithm to look at a new X’, we have to keep constant the P derived from the original X. So, -\log_2(P(X')) = \ell(X')H_C(X).

But minor caveats aside, per this algorithm’s implementation of OASC:

OASC(X, P, C) = -\sum_{c\in C} F_C(c)\log_2(P_C(c)) - L(X) + L(C).
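Putting the pieces together, here is a minimal sketch of this OASC implementation. I am assuming zlib as the compressor and counting 8 bits per compressed byte; the actual code in the thread may differ:

```python
import zlib
from collections import Counter
from math import log2

def compressed_bits(s):
    """L(.): compressed length in bits; an upper bound on Kolmogorov complexity."""
    return 8 * len(zlib.compress(s.encode()))

def oasc(text):
    n = len(text)
    # complexity term: -sum_c F_C(c) * log2 P_C(c)
    neg_log_p = -sum(f * log2(f / n) for f in Counter(text).values())
    # specification term per this implementation: L(X) - L(C)
    alphabet = "".join(sorted(set(text)))
    return neg_log_p - (compressed_bits(text) - compressed_bits(alphabet))
```

Repetitive text compresses well, keeping L(X) small and the score high, while the complexity term depends only on letter frequencies.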

Thus, @swamidass’s algorithm nearly fits the definition of OASC I gave above.


@EricMH, the paper by Marks does not include any reference to “observed ASC” versus “ASC”. Can you elaborate on the difference you are seeing? Where does Marks make that distinction? As I understand it, the claim is that ASC is greater than or equal to the function I implemented. With a better compression algorithm, the ASC estimate might increase.

Perhaps I should clarify that my ASC function implementation is a bound on the true value. It is a lower limit on the true ASC, which will be greater than or equal to the number computed here. Is that the distinction you are getting at here?

Would you like to change the algorithm so it exactly fits the definition?


Yes, keeping in mind my caveats, it is a lower bound on ASC.

OASC is defined in @Winston_Ewert’s IEEE paper “Algorithmic Specified Complexity in the Game of Life”.

If we want to be sure beyond a shadow of a doubt that your OASC algorithm is technically correct, then for P_X calculated from an initial text X, we calculate the OASC for any new X’ as follows:

OASC(X', P_X, C) = \ell(X')H_C(X) - L(X').
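A sketch of this corrected calculation (again assuming zlib as the compressor; the function names are mine):

```python
import zlib
from collections import Counter
from math import log2

def entropy_per_char(x):
    """H_C(X): entropy per letter, computed from the original text X."""
    n = len(x)
    return -sum((f / n) * log2(f / n) for f in Counter(x).values())

def oasc_new(x_prime, x_original):
    # the complexity term keeps P fixed from the original text X
    neg_log_p = len(x_prime) * entropy_per_char(x_original)
    # specification term: L(X') alone, with no L(C) correction
    return neg_log_p - 8 * len(zlib.compress(x_prime.encode()))
```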


We will move forward with my implementation, but would you please make an implementation that is beyond a shadow of a doubt correct? You can fork my implementation and edit it as required.

I want to now go over the key claims about ASC as laid out by Marks. Tell me if you think I missed something important.

Marks is correct to point out that intuitive concepts of information do not map to information theory:

While humans operate using an intuitive concept of information, attempts to develop a theory of information have thus far fallen short of the intuitive concept.

This is the problem that he is trying to solve here, by providing a quantitative way of measuring the “information” content that is indicative of design. Marks points out the problem with Dembski’s Complex Specified Information (CSI) criterion for identifying design: there is no objective way to measure CSI. This is the problem that ASC seeks to solve…

One approach to information is specified complexity as expressed by Dembski (Dembski, 1998). Dembski’s concern is that of detecting design, the separation of that which can be explained by chance or necessity from that which is the product of intelligence. In order to infer design, an object must be both complex and specified.

The concept of specification has been criticized for being imprecisely defined and unquantifiable. … The goal of this paper is to present and defend a simple measure of specification that clearly alleviates these concerns. Towards this end, the authors propose to use conditional Kolmogorov complexity to quantify the degree of specification in an object. Conditional Kolmogorov complexity can then be combined with complexity as a measurement of specified complexity. This approach to measuring specified complexity is called algorithmic specified complexity.

So the function I implemented is claimed to be a reliable way of detecting design. For Marks to be correct, a process that produces high ASC should therefore only be possible if intelligence is being injected into the process.

Do you agree @EricMH?


Yes, it is a variant of mutual information. If P(X) = 2^{-K(X)}, and we use the maximally compressed form of C (denoted by C^*), then ASC(X,P,C) = K(X) - K(X|C^*) = I(X:C), where I(X:C) is algorithmic mutual information. Per our last discussion, algorithmic mutual information cannot be generated by random and computable processes any better than by flipping a coin. Hence my claim that naturalism cannot create information that fits our intuitive notion of information.

If you properly formulate P based on all possible chance and necessity causes, then yes, X with high ASC would give high probability something other than chance and necessity created X.

In my implementation, is P properly formulated?

Not if you are trying to account for all possible chance and necessity causes. In your case, high ASC just means the text was not generated by sampling characters according to frequency distribution. This leaves open many other probabilistic and algorithmic means of generating text, such as with Markov chains.


So this gets a key claim that Marks makes, and I want to know if you agree with him:

It is not possible to calculate the Kolmogorov complexity of an object. However, it is possible to upper-bound the Kolmogorov complexity and thus lower-bound the algorithmic specified complexity. This means that something can be determined to be at least this specified, although the possibility that it is even more specified cannot be ruled out.

According to Marks, the consequences of using the wrong P is that ASC will be artificially low, and never artificially high. My ASC implementation is supposed to be a hard lower bound on CSI. Do you agree? If not, where did Marks go wrong?

After answering this question, I need you to choose either:

  1. If ASC (and my implementation) can never overestimate CSI, then we will keep my implementation and move to the next part.

  2. If you think we must carefully choose our implementation of ASC (the P) or we will overestimate CSI, that leaves you with two tasks: (1) explain where Marks went wrong and (2) give us the correct implementation by which we can compute ASC on biological sequences (DNA and/or protein sequences).

It seems a key point Marks is making is that even if we have the wrong compression algorithm, we are guaranteed to be underestimating CSI with the ASC measure. I think you agree, as this appears to be your idea. If not, that would be really interesting. Curious which choice you will take…

I’m not seeing where you are reading this. In the passage you quote he is discussing the specification term K(X|C), not the complexity term -\log_2 P(X).

Yes, this is correct. If C is kept constant, then L(X|C) is always going to overestimate the conditional Kolmogorov complexity if it follows the Kraft inequality, and thus underestimate the ASC.


Great…you chose option one, right?

This means that the implementation I’ve given here for ASC is a lower bound on CSI. So tell me if I have the logical syllogism correct then. I’m using ASC in reference to OASC (my implementation), to be consistent with my code and Marks’s paper.

  1. Postulate: nothing else but intelligent design can produce objects with high CSI.
  2. Ergo, high amounts of CSI are a unique signature of intelligence.
  3. For a biological sequence X, you have demonstrated ASC(X) \leq CSI(X).
  4. Some biological sequences in nature have high ASC; ergo, they have high CSI.
  5. Ergo, these sequences were produced by an intelligent designer.

Is this the argument that Marks, Dembski and you are all making? How would you refine or tighten this syllogism? Or is it correct as written?


This is an important point I suppose. I mean to say that ASC will always be a lower bound of CSI, and it seems you agree. It will never be more than the true CSI. This is what you have proven about my implementation of ASC.

It is a false dilemma since it is premised on the false claim that “the wrong choice of P can only underestimate ASC.” I’m not sure why you think the wrong P can only underestimate ASC, as that is not stated in the passage you quote.

It is easy to see your claim is incorrect. For example, I could pick a P that gives P(X)=0 for all events that occur, and P(X) > 0 for events that never occur. In which case, all events that occur will have ASC=\infty.

The assumption that “wrong choice of P can only underestimate ASC” also does not apply to your implementation of ASC. I can easily construct an example where letter frequency gives the wrong probability for a sequence, e.g. “ABABABABABABABABABAB” will have -\log_2 P(X)=20 in your scheme by sampling uniformly from \{\text{"A"},\text{"B"}\}. On the other hand, the sequence could have been generated by the program print("AB" * randint(10, 137)), which gives -\log_2 P(X)=-\log_2 \frac{1}{128} = 7. In which case your choice of P highly overestimates the ASC.
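The arithmetic of this example is easy to verify:

```python
from math import log2

x = "AB" * 10  # "ABABABABABABABABABAB", 20 characters
# frequency model: 20 uniform draws from {"A", "B"}
assert -log2((1 / 2) ** len(x)) == 20.0
# generator model: print("AB" * randint(10, 137)), 128 equally likely outputs
assert 137 - 10 + 1 == 128
assert -log2(1 / 128) == 7.0
```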


So this is interesting. Just to get some definitions straight here, we need to stick to the nomenclature definitions in the Marks paper.

I presented a choice between two options:

  1. ASC (or OASC) is guaranteed to be less than CSI.
  2. ASC implemented with the wrong P can be greater than CSI.

You respond that:

This means, it appears, that:

  1. I presented a valid implementation of ASC (which is later called OASC).
  2. Marks argues that the implementation is guaranteed to produce a number no greater than CSI, and that this is an objective metric.
  3. Now you are saying that if I did not correctly choose P, ASC could be higher than CSI.

At face value, #2 and #3 are in contradiction. How can both be true? If a poor choice of P would increase the ASC above CSI, then how do we know if we have the right P or not? It seems that, instead, the claim you and Marks are making is:

ASC (i.e. OASC) is guaranteed to be less than CSI, provided the implementation uses the correct P. If the wrong P is used, then ASC might be higher than CSI.

Do you agree?

If You Agree…

If you do agree, then for ASC to be a confident lower bound on CSI there must be an unambiguous way of determining the correct P. Without one, there is always the possibility that a better choice of P will reduce ASC, demonstrating that ASC was greater than CSI.

This becomes particularly challenging in cases where we do not know how a sequence was generated. So, then, this brings us to a fundamental and central set of questions:

  1. What is the correct P for biological sequences such that we can be certain ASC < CSI?
  2. If you cannot produce a guaranteed-to-be-correct implementation of P for biological sequences, how do you know that ASC < CSI?
  3. If you can produce an implementation of P, how do you know if it is correct?
  4. If you can’t produce an implementation of P or even determine if a given P is correct, how is ASC (OASC in all these references) guaranteed to be less than CSI?

If you have a general method of determining what the correct P is, I’d like to put that to the test. I can produce a few sets of sequences, and you determine which set has the highest CSI. If you have an objective strategy, without knowing how I generated these sequences (till the end), you should be able to determine which one has the highest CSI. I do not think this is possible. In fact, I think I can prove it is impossible, but we can see if you can prove me wrong.

If you can’t determine which sequence has the highest CSI, that means Marks is incorrect, and ASC is not an objective way of measuring/bounding CSI.

If You Disagree…

How are you certain that ASC will always be less than CSI? When looking at biological sequences, it is always possible we just have the wrong P, and the true CSI is much lower.

Because of this unresolvable question, it seems then that Marks is wrong: ASC is not an objective way of measuring/bounding CSI.

I want to signal that I agree with this too:

Your example appears to demonstrate this point:

The choice of P is critically important for computing ASC if we are to interpret it as a lower bound on CSI. Your language here is confusing though. You mean the ASC computed using this choice of P will be much higher than the ASC using the correct P.


Finally, on some careful thought, I determined a general way of implementing P that will produce an ASC (on average) that is equal to or less than zero, no matter what the sequence. In fact, it might always produce an ASC of exactly zero.

If I can implement such a P, and demonstrate that it produces an ASC of zero on arbitrary sequences (including any you want to present), does this demonstrate that there is no CSI in biological sequences?


There is a bit of ambiguity in this discussion. We need to distinguish between the fact of the matter and our understanding of the matter when we talk about P.

There is always a real P_{true} that accurately estimates the probability that chance and necessity produced X. This P will always result in an ASC that has the property \Pr\{ASC(X, P_{true}, C) \geq b\} \leq 2^{-b}.

Then there is the P_{est.} we are talking about, which if we get it wrong could overestimate ASC. The key is we need to pick a P_{est.} such that it never underestimates the probability for all X. Mathematically the criterion for a valid P_{est.} is:

valid(P_{est.}) := \forall X, P_{est.}(X) \geq P_{true}(X)

As long as this criterion is met, then OASC is guaranteed to be a lower bound. An example of a universal P_{est.} is P_{est.}(X) = 2^{-K(X)}, based on the assumption that all physical processes can be completely characterized by some combination of Turing computation and bits generated by a Bernoulli(\frac{1}{2}) distribution.

So, in regard to your challenge, it is certainly interesting and I would like to participate, but my failure will not indicate that Marks et al. are wrong about ASC. The key property of ASC is mathematically proven, and so empirical counterevidence can only mean we have erred in our construction of OASC, possibly in our selection of P_{est.}.

The more interesting claim you make is that it is provably impossible to craft a usable valid(P_{est.}). If true, this will not disprove ASC, but it will show it is not useful in a practical setting. I would like to see your proof.


The Point: ASC Not Practically Useful

There is no ambiguity in my mind here, though you seem to be getting my point. Marks’s claims are entirely about what can be computed from data. He claims that ASC is a good way of empirically measuring CSI. That is the whole purpose of the study. In this context we do not know the true P, but can only guess it. If the result is dependent on the choice of P, then his entire claim is false. Essentially, as you put it:

Which is to say that ASC is not useful in a practical setting, contra his claims. That is the precise challenge I am making. If ASC is not practically useful, we still have no way of measuring CSI (as Marks admits in the paper). This leaves CSI as a very poorly defined concept without any way of engaging real data till this problem is solved. I know there are other attempts to solve this problem, but they all have analogous defeaters.

Valid Choice of P

Your criterion for a valid estimated P is interesting.

We can prove the criterion is satisfied if and only if \forall X, P_{est.}(X) = P_{true}(X). Remember that P is normalized, so the sum across all X equals 1. Too much density at one outcome has to be compensated for with too little density at some other outcome, which would violate your inequality. Therefore, by your criterion, the estimated P is valid if and only if it exactly equals the true P. I certainly agree.
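A small numeric illustration of why domination forces equality (my own sketch): any surplus P_{est.} places on one outcome must be balanced by a deficit on another, and the criterion fails there.

```python
def dominates(p_est, p_true):
    """valid(P_est): P_est(X) >= P_true(X) for every outcome X."""
    return all(p_est[x] >= p_true[x] for x in p_true)

p_true = {"A": 0.5, "B": 0.3, "C": 0.2}
# p_est sums to 1, but its surplus at "A" forces a deficit at "C"
p_est = {"A": 0.6, "B": 0.3, "C": 0.1}
assert not dominates(p_est, p_true)   # fails the criterion at "C"
assert dominates(p_true, p_true)      # only the true P itself satisfies it
```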

The criterion you’ve put forward makes my point. The inequality ASC < CSI only applies if you know exactly what the true P is, but we do not know the true P for any biological sequence. ASC is, therefore, practically useless. Remember, the only valid P is the true P, so we have no way of constructing a valid ASC.

That was not my claim at first, though it appears to be true: finding a valid estimated P is equivalent to finding the true P, and that is impossible. Finding the true P is equivalent to finding the ideal compression, which we know is uncomputable. There is a simple algorithm for transforming the true P into the ideal compression, so if we could determine the true P, then compressibility would be computable. However, we already know that leads to contradiction: compressibility is not computable. Ergo, determining the true P of, for example, DNA is provably impossible.

What is at Stake

It seems we are tracking very closely with the logic of the paper. You’d have to show where you deviated from the logic of the paper. Or perhaps, where they went wrong. This is, after all, your idea that they have developed.

That does disprove the claim that ASC is a valid way of measuring CSI. This is the whole point of the Marks paper on ASC: that he has found a practical way of measuring CSI, and therefore a practical way of detecting intelligence. If this turns out to be false, the entire point of the paper is overturned.

The Interesting Claim

That was not my earlier claim.

Rather, my earlier claim is that I can construct a sensible P which will almost always yield an ASC of zero (if not always). I can both empirically demonstrate this and prove it too. If that is true, one of two things is also true:

  1. Either ASC is not useful in a practical setting for measuring CSI, because it is more determined by our choice of P than by any signal in the data.

  2. Or CSI is precisely zero for all objects encodable as strings (i.e. everything), and we therefore find no evidence for intelligent design using CSI.

We will go further, but it will be interesting to see which path you take on this. Either ASC is useless practically for measuring CSI, leaving CSI as a signature of design but unmeasurable; or ASC is correct and useful practically, but it demonstrates that ASC does not empirically find CSI in anything. I’m honestly not sure which one is a better conclusion for ID. Both are going to be difficult to work through.


There are additional proofs I can make.

  1. If we know the true P, we can use this to construct an ideal compression algorithm. I can produce the construction algorithm to transform any P into a compression algorithm.

  2. We know that determining the ideal compression algorithm for arbitrary data is impossible (compressibility is uncomputable), therefore it is impossible to know the true P of arbitrary data. Of note, it is impossible to know the true P of biological sequences.

  3. The ASC requires a P and a compression algorithm to implement. If we choose the true P and the ideal compression algorithm constructed from the true P, then ASC measured on all sequences would be exactly zero in all cases. So the true ASC appears to be zero for all sequences.
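Point 1 can be sketched with the standard Shannon code construction (my illustration, not code from the thread): given P, assign length \lceil -\log_2 P(x) \rceil to each outcome. These lengths satisfy the Kraft inequality, so a prefix-free code exists, and for dyadic P the code length exactly matches -\log_2 P(x), making the ASC identically zero under that P, as point 3 describes.

```python
from math import ceil, log2

def shannon_code_lengths(p):
    """Transform a distribution P into prefix-free code lengths."""
    lengths = {x: ceil(-log2(px)) for x, px in p.items() if px > 0}
    # the lengths obey the Kraft inequality, so a prefix-free code exists
    assert sum(2 ** -l for l in lengths.values()) <= 1
    return lengths

p = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
lengths = shannon_code_lengths(p)
assert lengths == {"A": 1, "B": 2, "C": 3, "D": 3}
# with these ideal lengths, -log2 P(x) - l(x) = 0 for every outcome
assert all(-log2(p[x]) == lengths[x] for x in p)
```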

I can make these proofs in a fairly straightforward way. They are not difficult proofs (in fact #2 is trivially true if #1 is true). Moreover, they are consistent with the proof that ASC is less than or equal to CSI, because zero would be less than or equal to CSI (if it is a real entity).

  1. Note, moreover, that IF we use the true P and the ideal compression algorithm to implement ASC, it seems I can prove that ASC = CSI. So this might demonstrate that CSI is merely an artifact of thought, and measures as zero in all objects, even if the empirical/observed ASC is greater than zero. All the CSI being measured by observed ASC is just an artifact of using the wrong compression algorithm and the wrong P. I think this may be difficult to establish because CSI is such a poorly defined concept. I suspect the most likely response would be to abandon ASC entirely so as to protect CSI.

This seems to be a complementary proof to the point that ASC is not practically useful. This point, it seems, demonstrates that the idealized ASC of any sequence is zero, and therefore not a meaningful quantity. If the proof that ASC corresponds with CSI is valid, it might even demonstrate that CSI is always zero.


The Puzzle Rules

If you can’t solve this puzzle with an objective metric, it means you can’t objectively determine the CSI of objects. I do not think this is a solvable puzzle, but because I generated the sequences I know the right answer. For each of the following sequences, I know the correct CSI. I can give a few hints, and a promise:

  1. Each sequence has a different CSI, either 5, 10, 100, 150 or 200 bits of CSI to be precise.

  2. Each sequence was constructed with about five lines of Python code.

  3. I’m saving the code used to generate this, and the answer key, and can produce it when you are done trying.

Remember, it is not enough to just permute these 5 numbers and try all combinations. You have to produce an implementation of ASC that produces results (1) consistent with these numbers and (2) able to identify which ones have high CSI. Of course, you could submit a function that always emits zero. It would be less than CSI in all cases, but it would be unable to determine which sequences were high or low.

If you cannot solve this puzzle (and I think it is impossible to solve), then it seems we have demonstrated that ASC cannot practically identify high CSI.

The Sequences