Dembski: Building a Better Definition of Intelligent Design

I think constructive questions make for a good exception. :slight_smile:

An event in this usage is a single potential observation from a discrete probability distribution - a Multinomial distribution in statistical terms, as I described at #95. Dembski sometimes uses an event to represent an entire evolutionary history. I've never seen him carefully define what that means; he just tosses out vague statements like "the probability this protein evolved" as if it were a number that can be calculated.

can it be (the existence of) a physical object?
The requirement is that you can define a probability distribution for all the events in the set. For "the existence of objects", I don't even know what that means.

SI is defined for any probability distribution including continuous distributions (like the Normal), AND (I can’t say this enough) it only applies to the probability distribution, not to any single observation from that distribution.

KI can only be applied to discrete probability distributions, AND it can also be applied to a single observed sequence.

I think we should discard the "coin flips" example from above, because sequences are not coin flips. We could define a distribution of sequences S such that

S = "HTHTHTHTHTHTHTHTHTHT" with p=0.5, compress to "HT 10 times" (9 characters not counting spaces)
S = “HHHHHHHHHHHHHHHHHHHH” with p=0.25, compress to “H 20 times” (8 characters)
S = “TTTTTTTTTTTTTTTTTTTT” with p=0.25, compress to “T 20 times” (8 characters)

which has an average compressed length of
KI(S) = 0.5*9 + 0.25*8 + 0.25*8 = 8.5 characters.

Note: it’s difficult to have a workable example of data compression with short sequences, and I have fudged this example a bit. For any practical purposes S is not compressible to less than 10 characters.
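
If it helps to see the two quantities side by side, here is a small Python sketch (my own illustration, not Dembski's): the Shannon information is computed from the distribution as a whole, while KI(S) is the average length of the per-sequence descriptions. The probabilities and character counts are the ones from the example above.

```python
from math import log2

# Probabilities and compressed lengths from the example above.
# Lengths are in characters (not bits), just as in the example.
distribution = {
    "HT" * 10: (0.50, 9),   # "HT 10 times" -> 9 characters
    "H" * 20:  (0.25, 8),   # "H 20 times"  -> 8 characters
    "T" * 20:  (0.25, 8),   # "T 20 times"  -> 8 characters
}

# Shannon information is a property of the whole distribution ...
shannon_bits = -sum(p * log2(p) for p, _ in distribution.values())

# ... while KI(S) here is the average length of the compressed descriptions.
avg_compressed = sum(p * length for p, length in distribution.values())

print(f"Shannon information of the distribution: {shannon_bits} bits")  # 1.5
print(f"average compressed length KI(S): {avg_compressed} characters")  # 8.5
```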

This gets into the meaning of messages. The sender and receiver have to previously agree on how signals will be interpreted. This might be a presumed agreement like "messages will be sent in English", or that "the value of pi is 3.1415926535897932384626 …".

In Gil's example the decimal expansion of pi should be used for both E and D, or the abbreviation "pi" should have a previously agreed meaning for both.

I mean to fix Gil’s example. Maybe tomorrow …

1 Like

Dembski, however, is not writing for you or for Wolpert. He is writing for the likes of @Giltil. And, as we can see, not without success.

5 Likes

Very good! I didn’t see that.

True and true. I’ve been ignoring that for simplicity, but that should be included in the definition of the probability distribution.

The sequence isn’t necessarily a decimal number. …

Are you trying to make my brain explode! :wink:

I think @Giltil made a good effort in his example, but this is just a hard subject. I’m working on a better example.

This definition would appear to exclude most real-world “design detection” uses of SC, as they tend to both (i) be observations that have already occurred, and (ii) not have a discernible probability distribution attached to them.

Yes, this sort of potential disconnect was exactly why I asked my questions.

Then it would seem reasonable to state that, unless you can demonstrate that a (purported) event follows a specific probability distribution, its Shannon Information value is not well-defined, and so, under Dembski’s (in)equality that @Giltil quoted above, nor is its Specified Complexity.

This would appear to debunk all claims about SC as they apply to evolution.

Yes, but in this context there is no “sender and receiver”, just somebody performing a calculation. Would it be unreasonable to assert that D, and thus |D|, are not well-defined in this context?

Yes, but then “the likes of @Giltil” goes and parrots Dembski’s claims to the likes of us, and wonders why we laugh in his face when he absurdly claims “the guy really masters the concepts”.

1 Like

Yes, there are a whole heap of (potentially dodgy) assumptions that need to be made when applying this framework to the real world, beyond the most simplistic examples, e.g.:

  1. Do we have a complete set of potential outcomes? (Example from the coin toss: is there a non-zero probability that the coin will land on its edge?)

  2. Do we have accurate probabilities for these outcomes? (Do we know if it is a fair coin being tossed?)

  3. Are the outcomes independent? (Do we know that, if the person tossing a coin gets three tails in a row, they don't toss a two-headed coin for the next toss?)

Correct, unless someone can work out the probability distribution for evolution. Seems like a safe bet.

No sender or receiver, but at a minimum the same rules should be applied to both.

There is a deeper problem here, because D and E are not necessarily defined on the same distribution. More on that later.

Dembski completely ignores the "large sample" properties of statistical inference, which might help with some of these problems. His education in statistical theory might be weak in this respect, though I would expect he could pick it up on his own. It is notable that Dembski never considers sample sizes larger than N=1, ignoring large-sample (asymptotic) distributions entirely.

For your point 3, independence isn't a problem for the reason you describe, because each event is a complete sequence, not a sample of randomly generated coin flips. I think Dembski uses coin-flip sequences as an example, but that would only apply to that example.

A bit of groundwork to set out.

First, if anyone is wondering, Kolmogorov Information (KI) and Algorithmic Information (AI) are the same thing. Sometimes you see "Solomonoff Information" too, as he published a few months earlier (IIRC). "Chaitin Information" is never used. The recent trend is to use AI and avoid any controversy over who should get the credit.

Next is something I was hoping to avoid - how information is actually coded in practical use. There is more than one way to do it, and I’m going to give a very general example which won’t be optimal. For more details see Shannon-Fano Coding.

If you want to compress data, and specifically sequences of (English) text, there are two big things to consider:

  1. the frequency of words (I’m using a list from Kaggle.com)
  2. the length of the sequence to be coded.

I chose the standard English alphabet for this example. We could further translate into hexadecimal or bits, but that just makes the example more difficult. So here is a message:
“the quick brown fox jumped over the lazy dog”

"THE" is the most frequently appearing word in my list, so it's a candidate word that we might assign a code like this:

"#1" → "THE"

The second most frequent word in the message is "OVER", and I code it as "#2".

Here "#" is a special character indicating that the next character is encoded information, so each code is two characters long. I can code up to 10 words this way, and any other text in our message will not be coded. Here's a table for my coding scheme:

| WORD | FREQUENCY | RANK (English) | CODE |
| --- | --- | --- | --- |
| the | 292910 | 1 | #1 |
| over | 214653 | 110 | #2 |
| quick | 238985 | 934 | #3 |
| dog | 79609 | 1047 | #4 |
| brown | 38407 | 1268 | #5 |
| fox | 106413 | 3222 | #6 |
| lazy | 164960 | 9125 | #7 |
| jumped | 151954 | 12017 | #8 |

(Now you can copy/paste from Excel to Discourse? Cool!)

Now I can encode my message as:
#1 #3 #5 #6 #8 #2 #1 #7 #4
which is 26 characters (counting spaces), down from 44 characters. The compressed message is 60% the length of the original.

Note there is a lot of "overhead" here, with 9 "#" characters. Also, if my message had any words with fewer than 3 characters, then those words could not be compressed (a 2-character code saves nothing on a 1- or 2-character word).

Not quite done - to calculate the AI I also need to include the algorithm to decode my message, which means the first and last columns of my table and instructions for how to apply them. AI can be broken down into two parts: the coded data AND the decoding instructions.

Counting the decoding instructions, my coded message will be longer than the original, but this won't be the case in general for longer messages. I think for the purposes of this discussion what really matters is that compression is possible.
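
For anyone who wants to play with it, here is a rough Python sketch of the "#" substitution scheme. The code table and message are the ones above; the function names and the decoding convention are my own choices, and the decoding-instruction overhead is not counted in the lengths printed.

```python
# Code table and message from the post above; everything else is my own choice.
codes = {
    "the": "#1", "over": "#2", "quick": "#3", "dog": "#4",
    "brown": "#5", "fox": "#6", "lazy": "#7", "jumped": "#8",
}
decoder = {code: word for word, code in codes.items()}

def encode(message: str) -> str:
    # Replace each listed word with its 2-character code; leave other words alone.
    return " ".join(codes.get(word, word) for word in message.split())

def decode(compressed: str) -> str:
    return " ".join(decoder.get(token, token) for token in compressed.split())

message = "the quick brown fox jumped over the lazy dog"
compressed = encode(message)

print(compressed)                     # "#1 #3 #5 #6 #8 #2 #1 #7 #4"
print(len(compressed), len(message))  # 26 vs 44 characters (counting spaces)
assert decode(compressed) == message  # the decoder table is the "overhead"
```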

This should make it easier to rework Gil’s example, but it is no help for Dembski. Some of you will probably figure it out before I can get to it. :slight_smile:

1 Like

Another detail.

In general, for a coding scheme of a random distribution of sequences of length N, out of 2^N possible sequences:

2^(N-1) are non compressible (50% of sequences cannot be coded any shorter than the original sequence itself),

2^(N-2) are compressible by 1 bit (25% of all),

2^(N-3) are compressible by 2 bits (12.5%),

… on down to …

1 sequence is compressible by N-1 bits (to 1 bit).

This doesn’t count the “overhead” for the code indicator.

I avoided this in my example above, where all codes require 2 characters, but in a more complicated example I could use shorter codes for more frequent words, gaining efficiency. In practice "THE" might require fewer bits than "JUMPED".

There is a strong limit on compressing sequences, but you can gain efficiency using algorithms to compress many sequences.
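
To make the breakdown above concrete, here is a quick tally for N = 16 (my own sanity check, ignoring the code-indicator overhead). The counts sum to 2^N - 1, one short of the 2^N possible sequences, which is why this is best treated as an idealized upper limit.

```python
# Idealized breakdown from above, tallied for N = 16 (ignores coding overhead).
N = 16
total = 2 ** N

# k = number of bits saved; 2^(N-1-k) sequences compressible by exactly k bits.
breakdown = {k: 2 ** (N - 1 - k) for k in range(N)}

for k, count in breakdown.items():
    print(f"compressible by {k:>2} bits: {count:>6} sequences ({count / total:.4%})")

# The counts sum to 2^N - 1, one short of the 2^N possible sequences.
print(sum(breakdown.values()), "of", total)
```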

2 Likes

Here is @Giltil's example (@93) reworked. I'm going to put this up without further comment until people can look at it and help me fix any mistakes. :slight_smile:

According to Dembski:
SC (E ) = I (E) – K (E ) ≥ I (E) – |D |
With
E, the event in question
SC: specified complexity
I: Shannon information
K: Kolmogorov Information
D: description of E
|D|: the number of bits making up D

Here the event (E) in question is a random sequence, generated as a random number with 150 digits of precision from a discrete uniform probability distribution. The sequence of digits below is represented as a decimal S:

S = 0.3702698541 3702698541 3702698541 3702698541 3702698541 3702698541 3702698541 3702698541 3702698541 3702698541 3702698541 3702698541 3702698541 3702698541 3702698541 (spaces added for readability)

and the event E and its probability are defined as:
P[S=0.3702698541 … ] = 10^-150
and 10^-150 = 2^-498.2892142, or I(E) = 499 bits (rounded up) in Dembski’s notation.

Note: For any discrete uniform distribution I(E) simplifies to log2(1/p), where p is the probability of any given event. For any other distribution I would need to calculate the average of -log2(p_i) over all events.

The number S comes from repeating a randomly generated sequence of digits ("3702698541") 15 times. It is not irrational or easily described as a fraction, but there is a simple way to compress it. The sequence can be described with the following pseudocode algorithm D:

K = 0
FOR i = 0 TO 140 BY 10
  K = K + 0.3702698541 * 10^-i
NEXT i
(See comment @115 for a detailed step-thru)
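
For anyone who wants to run it, here is a quick Python translation of the loop (my own sketch; it is not part of the 44-character count for |D| below, it just checks that the algorithm reproduces S). Python's decimal module is used so that all 150 digits are kept exactly.

```python
from decimal import Decimal, getcontext
from math import ceil, log2

getcontext().prec = 160                 # enough precision to hold all 150 digits

# Same loop as the pseudocode: i = 0, 10, ..., 140
K = Decimal(0)
for i in range(0, 150, 10):
    K += Decimal("0.3702698541") * Decimal(10) ** -i

# K should be the 150-digit sequence S (the pattern repeated 15 times)
assert K == Decimal("0." + "3702698541" * 15)

# Shannon information of the event under the discrete uniform distribution
print(ceil(150 * log2(10)))             # 499 bits, matching I(E) above
```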

Accordingly, |D| is 44 characters (ignoring spaces). Assuming ASCII characters require 8 bits each, 44 × 8 = 352 bits.

So SC(E) ≥ 499 - 352 = 147 bits

And Dembski’s conclusion would be …

This is a high level of SC. Therefore, chance can confidently be eliminated to explain E.

(Anyone see bugs here?)

4 Likes

But the “agreement” and the “rules” would depend on the extent of common knowledge shared between the sender and receiver (e.g. knowledge of the existence of pi or, lacking that, of basic geometry), and so would be dependent on the specific sender and receiver. Lacking a specific sender and receiver, there is no specifiable basis for agreement, so it would seem that the amount of knowledge that the description can assume is undefined.

1 Like

The obvious issue is that SC(E) is defined by an inequality and therefore cannot have a value other than “True” or “False”.

Aside from that, the inequality is written in a rather odd and apparently obfuscatory way, and I(E) can be cancelled, removing it from both sides.

If I am correct in my rewriting it comes out as:

|D | ≥ K (E )

Assuming that an algorithm to produce E is considered an adequate description (and why not?), |D | can never be greater than K (E ) (it is not stated that |D | must be the minimum length, but it would be absurd if arbitrarily long descriptions were allowed).

SC (E ) would in that case mean that there is no description of E shorter than the minimum length algorithm to produce E.

I note that this seems to be at odds with Eric Holloway's assertion that shorter length means more ASC - perhaps the inequality is inverted (although it is also at odds with Holloway's assertion that ASC is a quantity, and Holloway's idea of a description makes no sense to me at all).

I don't understand your pseudocode here. Maybe you can elaborate?

As for me, I would describe event E as follows:
D: Repeat X 15 times, with X= 3702698541

Now, I'm not sure how to compute |D|.

You can encode it as "#1#3#5#6#8#2#1#7#4" with the rule that there will always be a space between two #ed words.

I don't follow, shouldn't this be
|D | ≥ K (E ) = SC* (E ) ?

You point out two things I need to elaborate. The first is how to calculate the SI, which is trivial for the discrete uniform distribution, but I should show the step. I should be more explicit in how I calculate KI too.

Presumably D must be a minimum length description …

Under Kolmogorov Information Theory, we can never know if we truly have the minimum length description, because we cannot rule out the existence of some other more efficient coding scheme (skipping the bit about Universal Turing Machines). This is the reason for the inequality. The best we can do is the shortest algorithm we can find, knowing there could still be a shorter one.

Sure. I’ll trace through few steps.
First statement
K = 0
then we enter the FOR loop
i=0
K = K + 0.3702698541 * 10^-0 = K + 0.3702698541 * 1
result: K = 0.3702698541
Next step
i=10
K = K + 0.3702698541 * 10^-10 = K + 0.00000000003702698541
result: K = 0.37026985413702698541
Next step
i=20
K = K + 0.3702698541 * 10^-20 = K + 0.000000000000000000003702698541
result: K = 0.370269854137026985413702698541
i=30

and we keep going until we get to 150 digits.

I would describe event E as follows:
D: Repeat X 15 times, with X= 3702698541

Which is a few characters shorter than mine, shaving off ~24 bits! :slight_smile:
We could argue about who has the better compression scheme, but that isn’t necessary. My goal is to illustrate how it is done. We have a short example where terms are well enough defined so that compression is possible, which is what I need for discussion.

Don’t worry, no one knows how to calculate KI. The best we can do is an approximate upper bound. See my response to Paul just above.

An example of approximating KI is to take the size of a self-extracting ZIP file and compare it to the original file size. (It has to be a self-extracting ZIP file, because it contains both the coded data AND the algorithm to recreate the original.)

If the original was (for instance) 2 MB and the seZIP is 1 MB, then we would estimate
KI(original) ≤ 1 MB

You might note that for small files the self-extracting file can be larger than the original. This is because of the "overhead" information needed to encode/decode the data. In larger files the compression algorithm can take advantage of more repeated sequences, and we get significant compression.
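
A rough way to see this for yourself (my own illustration, using Python's zlib rather than a true self-extracting archive, so the decoder itself is not counted here):

```python
import os
import zlib

repetitive = b"3702698541" * 10_000   # 100,000 bytes of a repeating pattern
random_bytes = os.urandom(100_000)    # 100,000 bytes of random data
short = b"HT"                         # a very short input

for label, data in [("repetitive", repetitive),
                    ("random", random_bytes),
                    ("short", short)]:
    compressed = zlib.compress(data, 9)
    print(f"{label:>10}: {len(data):>7} -> {len(compressed):>7} bytes")
# Repetitive data compresses dramatically, random data barely compresses
# (it can even grow slightly), and very short inputs grow because of the
# fixed overhead -- the same pattern as with self-extracting archives.
```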

Another good example is a PNG format image (or any "lossless" image compression); compare its size to the same image in bitmap format (no compression).
This doesn't work for the JPG image format ("lossy" compression) because it discards information, simplifying the image to gain compression.

You are correct, and this goes very deep. Dembski seems to assume there is a meaningful connection between the event and how we (humans) choose to describe that event. I think in his mind "it just is", and he never defines the WHY or HOW of the background knowledge that makes it meaningful.

I might take a stab at this after the next bit, but I can’t promise it will make any more sense.

No. I believe what Gil meant was:

SC (E ) = I (E) – K (E )

&

SC (E ) ≥ I (E) – |D |

not

SC (E ) = [ I (E) – K (E ) ≥ I (E) – |D | ]

2 Likes

This is correct. Sorry if I’m slow! :sweat_smile:

Ah, that explains it. Although the original CSI was binary.

But that still requires explanation. How is I(E) calculated? It can't simply assume a uniform random distribution without justification. Why does the difference justify a design inference?

And the description must still completely describe the “event” - so if you are describing a flagellum as an “outboard motor” (sic) you need to include everything that could possibly match that description at least as well as the flagellum you’re talking about.

In the past Dembski (2005) has equivocated, calling it a binary quality and a quantity in the same paper. The current definition seems to be a quantity.

Often it is discrete uniform, other times he might state some other distribution, depending on the conclusion he wishes to reach. Sometimes he only states a probability.

This remains a problem, and we have yet to hear reports of an Evinrude being spotted inside a cell.