Eric Holloway: Algorithmic Specified Complexity

@EricMH is a recent PhD graduate from the group of Robert Marks, one of the leading lights of the Intelligent Design movement. @EricMH agreed to have a deeper conversation with me about one of his mentor’s key contributions, Algorithmic Specified Complexity (ASC). This is an information-theory-based metric proposed by Marks ( http://robertmarks.org/REPRINTS/2014_AlgorithmicSpecifiedComplexity.pdf ).

I coded this metric up in a few lines of Python. You can look at the code, and even run it yourself here: https://repl.it/@swamidass/AlgorithmicSpecifiedComplexity. The code:

import gzip
import itertools
import math
import random
import StringIO  # note: this code is written for Python 2 (StringIO module)

def IID_info(seq):
  # -log2 probability of seq under an IID model built from seq's own
  # character frequencies (the "complexity" term, in bits).
  counts = [len(list(group)) for token, group in itertools.groupby(sorted(seq))]
  total = float(sum(counts))
  info = [-math.log(c / total, 2) * c for c in counts]
  return sum(info)

def GZ_size(seq):
  # Size of seq after gzip compression, in bits.
  S = StringIO.StringIO()
  F = gzip.GzipFile(mode='wb', fileobj=S)
  F.write(seq); F.close(); S.flush()
  return len(S.getvalue()) * 8

def tokens(seq, shuffle=False):
  # The distinct characters (the "alphabet") of seq, optionally shuffled.
  L = list(s for s, g in itertools.groupby(sorted(seq)))
  if shuffle: random.shuffle(L)
  return "".join(L)

def GZ_info(seq):
  # Compression-based estimate of the conditional complexity term:
  # gzip size of the text minus gzip size of its alphabet.
  toks = tokens(seq)
  return GZ_size(seq) - GZ_size(toks)

def ASC(seq, comp_info=GZ_info):
  # Algorithmic specified complexity: the IID information minus the
  # compression-based estimate, floored at zero.
  return max(0, IID_info(seq) - comp_info(seq))

AlgorithmicSpecifiedComplexity = ASC

This code, for example, computes the ASC content of a test block of text at 817 bits.
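
To make the interface concrete, here is a minimal usage sketch using only the functions above (the sample string is arbitrary, not the test block that gave 817 bits):

sample = "the quick brown fox jumps over the lazy dog " * 10
print(IID_info(sample))   # the -log2 P(X) term under the letter-frequency model
print(GZ_info(sample))    # the gzip-based compression term
print(ASC(sample))        # observed ASC, in bits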

@EricMH, do you agree this is a correct implementation? If not, please fix it. Once we have agreement here, I’ll move on to the next piece of the story.

4 Likes

3 posts were split to a new topic: Side Comments on Algorithmic Specified Complexity

The final paragraphs of the paper are helpful for understanding their main point. They argue that ASC can objectively identify “specified information” in sequences.

Incalculability

It is not possible to calculate the Kolmogorov complexity of an object. However, it is possible to upper-bound the Kolmogorov complexity and thus lower-bound the algorithmic specified complexity. This means that something can be determined to be at least this specified, although the possibility that it is even more specified cannot be ruled out. Therefore, even though detecting a specification cannot be achieved mechanically, it can be objectively identified when found.

Acknowledgements

The approach of using compressibility as a measurement of specification was suggested to the authors by Eric Holloway. The authors have attempted to extend the approach to apply to many more types of specifications. The authors are grateful for his initial suggestion and answer to our initial objections to the idea.

Notice that @EricMH is credited there as a contributor.

1 Like

What you have here would be known as observed ASC, since conditional Kolmogorov complexity cannot be calculated. But it seems to be a valid implementation of observed ASC, so I agree with it.

Quick background on ASC for those new to the subject:

Algorithmic Specified Complexity (ASC) is defined as ASC(X, P, C), where X is the event being analyzed, P is the probability distribution that we think generated X, and C is a context for describing X that is independent of P. It is calculated as:

ASC(X, P, C) = -\log_2(P(X)) - K(X|C)

where P(X) is the probability of P generating X, and K(X|C) is the conditional Kolmogorov complexity based on a prefix Turing machine, so K(X|C) obeys the Kraft inequality:

\sum_X 2^{-K(X|C)} \leq 1.

Because K(X|C) follows the Kraft inequality, the probability of generating B bits of ASC from P is less than 2^{-B}.
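
To sketch why this bound holds: ASC(X, P, C) \geq b exactly when P(X) \leq 2^{-K(X|C) - b}, so

\Pr\{ASC(X, P, C) \geq b\} = \sum_{X:\, P(X) \leq 2^{-K(X|C)-b}} P(X) \leq \sum_X 2^{-K(X|C)-b} = 2^{-b}\sum_X 2^{-K(X|C)} \leq 2^{-b},

with the last step using the Kraft inequality above.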

Kolmogorov complexity is uncomputable, so in an empirical setting we have to estimate an upper bound using compression. We denote the compression of X given C as L(X|C). When we use a compression algorithm to estimate K(X|C), the resulting metric is known as Observed ASC (OASC):

OASC(X, P, C) = -\log_2 (P(X)) - L(X|C).

And since L(X|C) \geq K(X|C), L(X|C) also obeys the Kraft inequality, so the probability of generating B bits of OASC from P is also less than 2^{-B}.

1 Like

An explanation of @swamidass’s implementation of OASC(X, P, C):

X is a passage of text written in English of length \ell(X).

P is derived from the frequency of the characters in the text. If ‘A’ occurs twice in a text 10 characters long, then its frequency F_C(\text{'A'}) = 2 and probability P_C(\text{'A'})=\frac{1}{5}, where P_C is the probability of that character.

C is the alphabet used to compose the text, and we’ll denote the entropy per letter in the text as:

H_C(X) = -\sum_{c \in C} P_C(c)\log_2(P_C(c)).

The first term -\log_2(P(X)) in OASC is calculated by:

-\log_2(P(X)) = \ell(X) H_C(X) = -\ell(X) \sum_{c \in C} P_C(c)\log_2(P_C(c)) = -\sum_{c\in C} F_C(c)\log_2(P_C(c)).
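
This first term is exactly what the IID_info function in the implementation computes. Restated in the F_C / P_C notation (the helper name neg_log2_P below is just for illustration):

from collections import Counter
import math

def neg_log2_P(seq):
  # -log2 P(X) = -sum_c F_C(c) * log2(P_C(c)); identical to IID_info above.
  F = Counter(seq)            # F_C(c): character frequencies
  total = float(len(seq))     # ell(X)
  return sum(-f * math.log(f / total, 2) for f in F.values())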

The second term, L(X|C), is computed by compressing X and C separately and taking the difference L(X)-L(C), which seems to be based on the assumption that L(X|C) = L(X,C)-L(C) = L(X)-L(C).

The assumption that L(X,C)-L(C) = L(X)-L(C) may not be quite correct, since the naive interpretation of L(X,C) is that X and C are concatenated, so the compression L(X,C) > L(X), and thus L(X,C)-L(C) > L(X)-L(C). Consequently, L(X)-L(C) could potentially underestimate K(X|C), leading to readings higher than ASC(X, P, C), but probably not by much, so I don’t think it matters. But, if we wanted to be completely rigorous, the algorithm should compute L(X,C)-L(C), since perhaps one could use a really large alphabet and a text with few repetitions to make the probability of B bits of OASC greater than 2^{-B}.

There may also be ways to play with how the compression algorithm works so that L(X|C) = L(X,C)-L(C) is not quite correct, since it assumes L(X,C) = L(X) + L(C), and it certainly seems possible that L(X,C) < L(X) + L(C). Normally, C would be derived independently of whatever X we are looking at, instead of derived from X. However, in this case C is extracted straight from X. A more neutral option is to discard L(C) altogether and just use L(X), since L(X) \geq K(X) \geq K(X|C) (unless L happens to include X as a special case).
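
For concreteness, the two alternatives discussed above could be written as variants of the GZ_info function (a sketch only, reusing the GZ_size and tokens helpers from the implementation; the names GZ_info_joint and GZ_info_plain are just for illustration):

def GZ_info_joint(seq):
  # Estimate L(X|C) as L(X,C) - L(C): compress X concatenated with its
  # alphabet C, rather than X alone.
  toks = tokens(seq)
  return GZ_size(seq + toks) - GZ_size(toks)

def GZ_info_plain(seq):
  # The more neutral option: drop L(C) and use L(X) alone, since
  # L(X) >= K(X) >= K(X|C).
  return GZ_size(seq)

Either variant can be passed to the ASC function through its comp_info argument, e.g. ASC(text, comp_info=GZ_info_joint).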

One final thing to keep in mind: if we use this algorithm to look at a new X’, we have to keep constant the P derived from the original X. So, -\log_2(P(X')) = \ell(X')H_C(X).

But minor caveats aside, per this algorithm’s implementation of OASC:

OASC(X, P, C) = -\sum_{c\in C} F_C(c)\log_2(P_C(c)) - L(X) + L(C).

Thus, @swamidass’s algorithm nearly fits the definition of OASC I gave above.

1 Like

@EricMH, the paper by Marks does not include any reference to “observed ASC” versus “ASC”. Can you elaborate on the difference you are seeing? Where does Marks make that distinction? As I understand it, the claim is that ASC is greater than or equal to the function I implemented. With a better compression algorithm, the ASC estimate might increase.

Perhaps I should clarify that my ASC function implementation is a bound on the true value: a lower bound on the true ASC, which will be greater than or equal to the number computed here. Is that the distinction you are getting at?

Would you like to change the algorithm so it exactly fits the definition?

1 Like

Yes, keeping in mind my caveats, it is a lower bound on ASC.

OASC is defined in @Winston_Ewert’s IEEE paper “Algorithmic Specified Complexity in the Game of Life”.

If we want to be sure beyond a shadow of a doubt that your OASC algorithm is technically correct, then for P_X calculated from an initial text X, we calculate the OASC for any new X’ as follows:

OASC(X', P_X, C) = \ell(X')H_C(X) - L(X').
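
For concreteness, a minimal Python sketch of this formula (reusing GZ_size from the implementation above as the estimate of L(X'); the names H_C and OASC_fixed_P are just for illustration):

from collections import Counter
import math

def H_C(ref):
  # Bits per character, H_C(X), from the reference text's letter frequencies.
  counts = Counter(ref)
  total = float(len(ref))
  return sum(-(n / total) * math.log(n / total, 2) for n in counts.values())

def OASC_fixed_P(new_seq, ref_seq):
  # OASC(X', P_X, C) = ell(X') * H_C(X) - L(X'), with P fixed by ref_seq.
  return len(new_seq) * H_C(ref_seq) - GZ_size(new_seq)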

1 Like

We will move forward with my implementation, but would you please make an implementation that is beyond a shadow of a doubt correct? You can fork my implementation and edit it as required.


I want to now go over the key claims about ASC as laid out by Marks. Tell me if you think I missed something important.

Marks is correct to point out that intuitive concepts of information do not map to information theory:

While humans operate using an intuitive concept of information, attempts to develop a theory of information have thus far fallen short of the intuitive concept.

This is the problem that he is trying to solve here, by providing a quantitative way of measuring the “information” content that is indicative of design. Marks points out the problem with Dembski’s Complex Specified Information (CSI) criterion for identifying design: there is no objective way to measure CSI. This is the problem that ASC seeks to solve…

One approach to information is specified complexity as expressed by Dembski (Dembski, 1998). Dembski’s concern is that of detecting design, the separation of that which can be explained by chance or necessity from that which is the product of intelligence. In order to infer design, an object must be both complex and specified.

The concept of specification has been criticized for being imprecisely defined and unquantifiable. … The goal of this paper is to present and defend a simple measure of specification that clearly alleviates these concerns. Towards this end, the authors propose to use conditional Kolmogorov complexity to quantify the degree of specification in an object. Conditional Kolmogorov complexity can then be combined with complexity as a measurement of specified complexity. This approach to measuring specified complexity is called algorithmic specified complexity.

So the function I implemented is claimed to be a reliable way of detecting design. For Marks to be correct, then, a process that produces high ASC should be possible only if intelligence is being injected into the process.

Do you agree @EricMH?

2 Likes

Yes, it is a variant of mutual information. If P(X) = 2^{-K(X)}, and we use the maximally compressed form of C (denoted by C^*), then ASC(X,P,C) = K(X) - K(X|C^*) = I(X:C), where I(X:C) is algorithmic mutual information. Per our last discussion, algorithmic mutual information cannot be generated by random and computable processes any better than by flipping a coin. Hence my claim that naturalism cannot create information that fits our intuitive notion of information.

If you properly formulate P based on all possible chance and necessity causes, then yes, an X with high ASC would give a high probability that something other than chance and necessity created X.

In my implementation, is P properly formulated?

Not if you are trying to account for all possible chance and necessity causes. In your case, high ASC just means the text was not generated by sampling characters according to frequency distribution. This leaves open many other probabilistic and algorithmic means of generating text, such as with Markov chains.
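
For instance, a minimal sketch of the kind of alternative generator I mean (purely illustrative):

import random

def markov_sample(transitions, start, n):
  # Generate n characters by following a {char: [possible next chars]} table.
  out = [start]
  while len(out) < n:
    out.append(random.choice(transitions[out[-1]]))
  return "".join(out)

# e.g. markov_sample({"t": ["h"], "h": ["e"], "e": ["t"]}, "t", 9) -> "thethethe"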

1 Like

So this gets at a key claim that Marks makes, and I want to know if you agree with him:

It is not possible to calculate the Kolmogorov complexity of an object. However, it is possible to upper-bound the Kolmogorov complexity and thus lower-bound the algorithmic specified complexity. This means that something can be determined to be at least this specified, although the possibility that it is even more specified cannot be ruled out.

According to Marks, the consequence of using the wrong P is that ASC will be artificially low, never artificially high. My ASC implementation is supposed to be a hard lower bound on CSI. Do you agree? If not, where did Marks go wrong?

After answering this question, I need you to choose either:

  1. If ASC (and my implementation) can never overestimate CSI, then we will keep my implementation and move to the next part.

  2. If you think we must carefully choose our implementation of ASC (the P) or we will overestimate CSI, that leaves you with two tasks: (1) explain where Marks went wrong, and (2) give us the correct implementation by which we can compute ASC on biological sequences (DNA and/or protein sequences).

It seems a key point Marks is making is that even if we have the wrong compression algorithm, we are guaranteed to be under-estimating CSI with the ASC measure. I think you agree, as this appears to be your idea. If not, that would be really interesting. Curious which choice you will take…

I’m not seeing where you are reading this. In the passage you quote he is discussing the specification term K(X|C), not the complexity term -\log_2 P(X).

Yes, this is correct. If C is kept constant, then L(X|C) is always going to overestimate the conditional Kolmogorov complexity if it follows the Kraft inequality, and thus underestimate the ASC.

1 Like

Great…you chose option one, right?

This means that the implementation I’ve given here for ASC is a lower bound on CSI. So tell me if I have the logical syllogism correct then. I’m using ASC in reference to OASC (my implementation), to be consistent with my code and Marks’s paper.

  1. Postulate: nothing else but intelligent design can produce objects with high CSI.
  2. Ergo, high amounts of CSI are a unique signature of intelligence.
  3. For a biological sequence X, you have demonstrated ASC(X) \leq CSI(X).
  4. Some biological sequences in nature have high ASC; ergo, they have high CSI.
  5. Ergo, these sequences were produced by an intelligent designer.

Is this the argument that Marks, Dembski and you are all making? How would you refine or tighten this syllogism? Or is it correct as written?

1 Like

This is an important point, I suppose. I mean to say that ASC will always be a lower bound on CSI, and it seems you agree. It will never be more than the true CSI. This is what you have proven about my implementation of ASC.

It is a false dilemma since it is premised on the false claim that “the wrong choice of P can only underestimate ASC.” I’m not sure why you think the wrong P can only underestimate ASC, as that is not stated in the passage you quote.

It is easy to see your claim is incorrect. For example, I could pick a P that gives P(X)=0 for all events that occur, and P(X) > 0 for events that never occur. In which case, all events that occur will have ASC=\infty.

The assumption that “wrong choice of P can only underestimate ASC” also does not apply to your implementation of ASC. I can easily construct an example where letter frequency gives the wrong probability for a sequence, e.g. “ABABABABABABABABABAB” will have -\log_2 P(X)=20 in your scheme by sampling uniformly from \{\text{"A"},\text{"B"}\}. On the other hand, the sequence could have been generated by the program print “AB” * randint(10,137), which gives -\log_2 P(X)=-\log_2 \frac{1}{128} = 7. In which case your choice of P highly overestimates the ASC.
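
A quick check with the IID_info function from the first post confirms the 20-bit figure:

print(IID_info("AB" * 10))   # -> 20.0 bits under the letter-frequency P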

1 Like

So this is interesting. Just to get some definitions straight here, we need to stick to the nomenclature in the Marks paper.

I presented a choice between two options:

  1. ASC (or OASC) is guaranteed to be less than CSI.
  2. ASC implemented with the wrong P can be greater than CSI.

You responded that a wrong choice of P can in fact make ASC overestimate. This means, it appears, that:

  1. I presented a valid implementation of ASC (which is later called OASC).
  2. Marks argues that the implementation is guaranteed to produce a number no greater than CSI, and that this is an objective metric.
  3. Now you are saying that if I did not correctly choose P, ASC could be higher than CSI.

At face value, #2 and #3 are in contradiction. How can both be true? If a poor choice of P would increase the ASC above CSI, then how do we know whether we have the right P or not? It seems that, instead, the claim you and Marks are making is:

ASC (i.e. OASC) is guaranteed to be less than CSI, provided the implementation uses the correct P. If the wrong P is used, then ASC might be higher than CSI.

Do you agree?

If You Agree…

If you do agree, then for ASC to be a confident lower bound on CSI there must be an unambiguous way of determining the correct P. Without an unambiguous way of determining P, there is always the possibility that a better choice of P will reduce ASC, demonstrating that ASC was greater than CSI.

This becomes particularly challenging in cases where we do not know how a sequence was generated. So, then, this brings us to a fundamental and central set of questions:

  1. What is the correct P for biological sequences such that we can be certain ASC < CSI?
  2. If you cannot produce a guaranteed-to-be-correct implementation of P for biological sequences, how do you know that ASC < CSI?
  3. If you can produce an implementation of P, how do you know if it is correct?
  4. If you can’t produce an implementation of P, or even determine whether a given P is correct, how is ASC (OASC in all these references) guaranteed to be less than CSI?

If you have a general method of determining what the correct P is, I’d like to put that to the test. I can produce a few sets of sequences, and you determine which set has the highest CSI. If you have an objective strategy, you should be able to determine which one has the highest CSI without knowing how I generated these sequences (until the end). I do not think this is possible. In fact, I think I can prove it is impossible, but we can see if you can prove me wrong.

If you can’t determine which sequence has the highest CSI, that means Marks is incorrect, and ASC is not an objective way of measuring/bounding CSI.

If You Disagree…

How are you certain that ASC will always be less than CSI? When looking at biological sequences, it is always possible we just have the wrong P, and the true CSI is much lower.

Because of this unresolvable question, it seems then that Marks is wrong: ASC is not an objective way of measuring/bounding CSI.

I want to signal that I agree with this too:

Your example appears to demonstrate this point:

The choice of P is critically important for computing ASC if we are to interpret it as a lower bound on CSI. Your language here is confusing, though. You mean that the ASC computed using this choice of P will be much higher than the ASC computed using the correct P.

3 Likes

Finally, after some careful thought, I have determined a general way of implementing P that will produce an ASC (on average) that is equal to or less than zero, no matter what the sequence. In fact, it might always produce an ASC of zero.

If I can implement such a P, and demonstrate that it produces an ASC of zero on arbitrary sequences (including any you want to present), does this demonstrate that there is no CSI in biological sequences?

1 Like

There is a bit of ambiguity in this discussion. We need to distinguish between the fact of the matter and our understanding of the matter when we talk about P.

There is always a real P_{true} that accurately gives the probability that chance and necessity produced X. This P will always result in an ASC with the property \Pr\{ASC(X, P_{true}, C) \geq b\} \leq 2^{-b}.

Then there is the P_{est.} we are talking about, which, if we get it wrong, could overestimate ASC. The key is that we need to pick a P_{est.} such that it never underestimates the probability for any X. Mathematically, the criterion for a valid P_{est.} is:

valid(P_{est.}) := \forall X, P_{est.}(X) \geq P_{true}(X)
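
To spell out why this criterion suffices: if P_{est.}(X) \geq P_{true}(X) for all X, then -\log_2 P_{est.}(X) \leq -\log_2 P_{true}(X), so ASC(X, P_{est.}, C) \leq ASC(X, P_{true}, C) for every X, and the 2^{-b} bound above is inherited.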

As long as this criterion is met, OASC is guaranteed to be a lower bound. An example of a universal P_{est.} is P_{est.}(X) = 2^{-K(X)}, based on the assumption that all physical processes can be completely characterized by some combination of Turing computation and bits generated by a Bernoulli(\frac{1}{2}) distribution.

So, in regard to your challenge, it is certainly interesting and I would like to participate, but my failure will not indicate that Marks et al. are wrong about ASC. The key property of ASC is mathematically proven, and so empirical counterevidence can only mean we have erred in our construction of OASC, possibly in our selection of P_{est.}.

The more interesting claim you make is that it is provably impossible to craft a usable valid(P_{est.}). If true, this will not disprove ASC, but it will show it is not useful in a practical setting. I would like to see your proof.

1 Like