Well, I didn’t. But a more relevant point: How would one use the ID concept of specified complexity, or any other ID concept, to demonstrate that just from the number itself and with no other background info? Perhaps @Giltil could demonstrate.
Yet, the relation seems quite clear.
But it doesn’t explain why this relationship exists - nor the assumptions that underlie the claim.
Depending on the distribution and the content, the longer message might have a lower information content even by Shannon’s measure.
This is true for Kolmogorov complexity, but not for Shannon complexity.
Indeed, but you don’t address the point that I was explicitly talking about algorithmic complexity - and that algorithmic complexity is closer to the ordinary idea of complexity than Shannon information is.
Also I will point out again that you have to make assumptions to get the conclusion you want. If heads and tails are not equiprobable, for instance, the Shannon information of the two sequences will differ.
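A minimal Python sketch of the point about equiprobability. It assumes independent tosses with a fixed heads probability p, and the function name `surprisal_bits` is purely illustrative. It computes the self-information, -log2 P, of one specific sequence. With p = 0.5 the alternating sequence and the all-heads sequence come out identical. With p = 0.9 they do not.

```python
import math

def surprisal_bits(seq, p_heads):
    """Self-information -log2 P(seq) of one specific H/T sequence, assuming
    independent tosses with P(H) = p_heads, where 0 < p_heads < 1."""
    heads = seq.count("H")
    tails = seq.count("T")
    return -(heads * math.log2(p_heads) + tails * math.log2(1 - p_heads))

alternating = "HT" * 50   # 100 tosses: 50 heads, 50 tails
all_heads = "H" * 100     # 100 tosses: all heads

for p in (0.5, 0.9):
    print(p, round(surprisal_bits(alternating, p), 1), round(surprisal_bits(all_heads, p), 1))
```

With p = 0.9, the all-heads run carries far fewer bits than the alternating one. Which sequence counts as the "less probable" one therefore depends entirely on the assumed distribution.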
He [Shannon] converted probabilities to bits, establishing a relationship between probability and information content. For example, an event with probability 1 in 2^100 corresponds to 100 bits of information.
Please provide a citation to Shannon’s work for this claim, as it is not clear that Perplexity AI didn’t simply hallucinate it.
How would one use the ID concept of specified complexity, or any other ID concept, to demonstrate that just from the number itself and with no other background info?
To demonstrate what precisely?
Indeed, but you don’t address the point that I was explicitly talking about algorithmic complexity - and that algorithmic complexity is closer to the ordinary idea of complexity than Shannon information is.
Whether or not AC is closer to the ordinary idea of complexity than SI is irrelevant to this discussion. The point is that under the Shannon framework, the two notions of complexity and improbability are tied together.
If heads and tails are not equiprobable, for instance, the Shannon information of the two sequences will differ.
Sure, but that doesn’t change the fact that under the Shannon framework, the more complex sequence will be the less probable one.
Whether or not AC is closer to the ordinary idea of complexity than SI is irrelevant to this discussion. The point is that under the Shannon framework, the two notions of complexity and improbability are tied together.
Since the notion of complexity is being discussed it certainly is relevant to this discussion. And since Dembski seems to be moving towards algorithmic complexity with ASC - and since CSI hasn’t exactly worked out as an argument against evolution, largely because it’s impractical to calculate the relevant probabilities - I’m not sure that relationship - if it’s not something the “AI” wrongly put together - is really of any great relevance.
I’d be far more interested in an explanation of how a short English description can possibly indicate high ASC. It shouldn’t indicate high algorithmic complexity.
Sure, but that doesn’t change the fact that under the Shannon framework, the more complex sequence will be the less probable one.
This is actually not a fact but merely a claim backed only by a quote from an unreliable AI model, not by a citation to anything that Shannon himself wrote.
“The AI said it, I believe it, that settles it” is a particularly ludicrous and fallacious Argument from Authority. I’m sure that, if I phrased my question just right, I could get an AI to agree that God doesn’t exist – would you take the AI’s word for that, Gilbert?
To demonstrate what precisely?
That it is not just a random string of numbers, but contains specified information.
He could have called it « low probability specified information » instead of « complex specified information », for it is clear that for him, the two expressions are synonymous. Don’t forget that the subtitle of the two editions of « The Design Inference » is « eliminating chance through small probabilities »
“One of the saddest lessons of history is this: If we’ve been bamboozled long enough, we tend to reject any evidence of the bamboozle. We’re no longer interested in finding out the truth. The bamboozle has captured us. It’s simply too painful to acknowledge, even to ourselves, that we’ve been taken. Once you give a charlatan power over you, you almost never get it back.”
― Carl Sagan, The Demon-Haunted World: Science as a Candle in the Dark
Ruminating on my skim of Shannon’s 1948 paper leads me to the following conclusions.
-
You can take a message, and a sh…ed-load of assumptions that will depend on the specific characteristics of that message, and calculate a probability for it.
-
As different messages, even of the same bit-length, will have different characteristics, therefore different assumptions, and different probabilities, you cannot reverse the process and assign a bit-length to a probability.
-
A probability of 1 in 2^(bit-length) would appear to equate to the specific case where a message is simply static/white noise.
It is therefore clear that attributing any claim of ‘low-probability therefore complexity’, let alone ‘low-probability therefore design’ to Shannon is just a steaming pile of … ‘humbug’.
That it is not just a random string of numbers, but contains specified information.
According to Dembski:
SC(E) = I(E) – K(E) ≥ I(E) – |D|
With
E, the event in question
SC: specified complexity
I: Shannon information
K: Kolmogorov Information
D: description of E
|D|: the number of bits making up D
Here the event in question is the sequence of numbers below:
14159265358979323846264338327950288419716939937510582
This sequence can be described as follows:
D: « First 53 numbers decimal expansion pi ». Accordingly, |D|=147,64 bits.
As for I(E), it is equal to 176 bits
So SC(E) ≥ 176 - 147,6 = 28,4 bits
This is a high level of SC. Therefore, chance can confidently be eliminated to explain E.
According to Dembski:
SC(E) = I(E) – K(E) ≥ I(E) – |D|
With
E, the event in question
SC: specified complexity
I: Shannon information
K: Kolmogorov Information
D: description of E
|D|: the number of bits making up D
Really?
Isn’t that just |D| ≥ K(E)?
And doesn’t that lead to the question of how to compute the Shannon information? Presumably D must be a minimum length description - and must be taken as a complete description of the event for the purposes of calculating both the Shannon and Kolmogorov complexity.
Also, I note that you don’t even bother to calculate the Kolmogorov information for your example, or to show how you calculated either the Shannon information or the length of the description in bits. I can’t have any confidence in that.
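The inequality becomes clearer with the algebra written out, using only the symbols defined in the quoted post. I(E) appears on both sides, so it cancels. What remains says only that the description length bounds the Kolmogorov information from above:

```latex
\begin{aligned}
SC(E) &= I(E) - K(E), \\
I(E) - K(E) \ge I(E) - |D| &\iff K(E) \le |D|.
\end{aligned}
```

The resulting lower bound on SC(E) is therefore only as good as whatever values are plugged in for I(E) and |D|, and both of those still have to be calculated.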
Gil is referring to specified complexity and Tim is talking about complexity. Tim cited a definition of complexity which may be different from specified complexity. This discussion has the chance to bear some fruit if definitions can be mutually understood by all.
A reasonable goal, and you give an example that makes a point I need to be sure everyone understands.
Bill’s scenario:
The probability of the result of 500 coin tosses is 1. [I think Bill means “Sum of probabilities”] The probability of a predicted pattern like heads and tails with never [be] 2 heads or tails, i.e. HTHTHTHTHT…500 times, is orders of magnitude less than one. If we see this pattern we can eliminate a fair coin or flipping a fair coin as the cause.
Considered as Bernoulli trials, the probability of a sequence “HTHTHTHTHT…” is the same as 500 heads or 500 tails. To get different probabilities we need to redefine this as “a binary sequence of length 500”, which is a different distribution. Let’s define the variable S and the distribution SEQ:
S ~~ SEQ(N,P*)
{read “~~” to say “distributed as”}
where SEQ(N,P*) is the probability distribution of sequences of N characters (H or T, or 1 and 0 in a binary representation), and P* is a vector of 2^N probabilities, one for each of the 2^N possible sequences. The sum of all P* is 1.0, of course.
If the probabilities in P* are uniform (all equal to 2^-N) then we have the situation where the probabilities are equal under Bill’s scenario.
A sequence of Bernoulli trials is unlikely to generate this alternating sequence. More formally, we have rejected the null hypothesis that the HTHTHT sequence was generated using Bernoulli trials. But (under Bill’s scenario) we have only a single observation of S with which to estimate P[HTHTHT sequence], so we can’t say much about the distribution of S or the probabilities P* without a lot more data - more sequences of length 500.
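Here is a small Python sketch of how unlikely the alternating pattern is. It assumes N fair, independent Bernoulli trials, and the helper names are hypothetical. For small N it counts, by brute force, how many of the 2^N sequences never repeat a symbol. Only two qualify (HTHT… and THTH…), so the pattern has probability 2^(1-N). For N = 500 that is 2^-499.

```python
from itertools import product

def is_alternating(seq):
    """True if no two adjacent tosses are equal (HTHT... or THTH...)."""
    return all(a != b for a, b in zip(seq, seq[1:]))

def pattern_probability(n, predicate):
    """Probability that n fair, independent tosses satisfy `predicate`,
    found by brute force over all 2**n equally likely sequences."""
    hits = sum(1 for seq in product("HT", repeat=n) if predicate(seq))
    return hits / 2 ** n

# Brute force is only practical for small n; it agrees with 2**(1 - n).
for n in (4, 8, 12):
    print(n, pattern_probability(n, is_alternating), 2 ** (1 - n))
```

Any other predicate can be plugged in the same way, such as all heads, a palindrome, or some other "recognisable" pattern. Every such predicate defines its own small-probability target.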
OK, definition in place, I can make a few statements:
-
Dembski often swaps the Bernoulli trial and “SEQ” distributions to suit the conclusions he wishes to reach. Specifically, he poses the observed results of evolution as Bernoulli trials, and we know that isn’t right.
-
Bill’s scenario could still be committing the Texas Sharpshooter fallacy. There are MANY patterns of S that would lead us to reject this null.
-
The selection of N=500 is arbitrary. If we choose N=2, then the possible sequences are HH, HT, TH and TT, and all have probability 0.25. If we define probabilities less than 0.1 as “unlikely” or “complex”, then none of these are complex. If you choose N=4, then all sequences have probability 0.0625, and all are complex. Given any arbitrary definition of “complex” as a small probability, it is always possible to choose N large enough to reach the conclusion that an observed S is complex.
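The arbitrariness point can be sketched in Python. This assumes fair tosses, so that every specific sequence of length N has probability 2^-N, and the function `min_n_for_complex` is hypothetical. For any threshold, it finds the smallest N at which every sequence falls below that threshold and so counts as "complex".

```python
def min_n_for_complex(threshold):
    """Smallest N such that each specific length-N sequence of fair tosses
    (probability 2**-N) has probability strictly below `threshold`."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    n = 1
    while 2.0 ** -n >= threshold:
        n += 1
    return n

for threshold in (0.25, 0.1, 1e-150):
    print(threshold, min_n_for_complex(threshold))
```

With the 0.1 threshold from the example this gives N = 4. Even a threshold as small as 10^-150 only requires N = 499.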
I wrote about most of this 9 years ago.
More replies later - work to do!
He could have called it « low probability specified information » instead of « complex specified information », for it is clear that for him, the two expressions are synonymous.
It should be clear to ALL OF US that the two expressions are synonymous.
I’d be far more interested in an explanation of how a short English description can possibly indicate high ASC. It shouldn’t indicate high algorithmic complexity.
A very good point. We are discussing the probability/complexity (problexity?) of the description, not the complexity of the thing being described.
This is actually not a fact but merely a claim backed only by a quote from an unreliable AI model, not by a citation to anything that Shannon himself wrote.
This LLM thing is a Red Herring, so don’t get too bogged down with it. Gil may be conflating the general situation where communications might be “complex” with the complexity of a given message. Not the same thing!
I’d be far more interested in an explanation of how a short English description can possibly indicate high ASC. It shouldn’t indicate high algorithmic complexity.
IIRC, ASC uses “lack of randomness”, but I don’t have time to look it up just now. Really tho, ASC has no biological application, so it’s not worth the effort.
It is therefore clear that attributing any claim of ‘low-probability therefore complexity’, let alone ‘low-probability therefore design’ to Shannon is just a steaming pile of … ‘humbug’.
Some troubles in points 2 and 3 dealing with how messages are actually encoded, but a correct conclusion.
According to Dembski:
SC(E) = I(E) – K(E) ≥ I(E) – |D|
With
E, the event in question
SC: specified complexity
I: Shannon information
K: Kolmogorov Information
D: description of E
|D|: the number of bits making up D
Here the event in question is the sequence of numbers below:
14159265358979323846264338327950288419716939937510582
This sequence can be described as follows:
D: « First 53 numbers decimal expansion pi ». Accordingly, |D|=147,64 bits.
As for I(E), it is equal to 176 bits
So SC(E) ≥ 176 - 147,6 = 28,4 bits
This is a high level of SC. Therefore, chance can confidently be eliminated to explain E.
@Giltil There is so much here that is at best ambiguous that I hesitate to begin criticizing. I invite you to revise, but some of your errors appear to be uncorrectable.
D: « First 53 numbers decimal expansion pi ». Accordingly, |D|=147,64 bits.
You left out some non-obvious steps. How do you arrive at this? Also a possible typo since below you give |D|=147,6
As for I(E), it is equal to 176 bits
Why does it have a probability of 2^-176? What is the distribution of E?
A fundamental error here: SI cannot be measured for any given message, only for the entire distribution of possible messages. Again, what distribution are you using?
NOTE: A probability distribution can have fractional Shannon Information expressed in bits. In practice you have to round up because you can’t have fractional wires.
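A minimal Python sketch of the last two points, using toy distributions chosen only for illustration. The entropy is computed over a whole distribution rather than a single message, and it need not come out as a whole number of bits.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, -sum(p * log2(p)), of a whole distribution in bits.
    It is a property of the distribution, not of any single message."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform_8 = [1 / 8] * 8         # 8 equally likely messages
skewed_3 = [0.5, 0.25, 0.25]    # 3 messages with unequal probabilities
three_equal = [1 / 3] * 3       # 3 equally likely messages

for name, dist in (("uniform_8", uniform_8), ("skewed_3", skewed_3), ("three_equal", three_equal)):
    h = entropy_bits(dist)
    # Physical channels carry whole bits, so round up.
    print(f"{name}: {h:.3f} bits, rounded up to {math.ceil(h)}")
```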
BUT WAIT - THERE’S MORE
D: « First 53 numbers decimal expansion pi ». Accordingly, |D|=147,64 bits.
Your description contains “pi”, and it is also necessary to define “pi”, making your description circular. In practice
“First 53 numbers decimal expansion pi”
will be longer than the algorithm to generate pi because it adds “First 53 numbers”.
Which, ignoring your other mistakes, gives you SC = 0.
(If the compressed version of the description is longer than the original, then you skip compression and use the length of the original.)
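To give a sense of what "the algorithm to generate pi" might look like, here is a Python sketch using Gibbons' unbounded spigot algorithm. That is one of several possible choices; nothing in the thread specifies which algorithm should be used, and `first_decimals` is a hypothetical helper. A fully self-contained description of E would need something like this plus the count 53.

```python
from itertools import islice

def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) with Gibbons'
    unbounded spigot algorithm, using only exact integer arithmetic."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n  # the next digit is now certain
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

def first_decimals(count):
    """The first `count` digits after the decimal point, as a string."""
    digits = islice(pi_digits(), count + 1)  # +1 for the leading 3
    return "".join(map(str, digits))[1:]

print(first_decimals(53))
```

The output can be compared directly with the 53-digit sequence in the quoted post.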
This sequence can be described as follows:
D: « First 53 numbers decimal expansion pi ». Accordingly, |D|=147,64 bits.
Show your work.
One possible calculation is that each character in the description is at least 6 bits, as there are 26 uppercase letters, 26 lowercase letters, 10 digits, space and EOL available, for 64 = 2^6 options for each character.
That would mean |D| is 37×6=222 bits.
As for I(E), it is equal to 176 bits
So SC(E) ≥ 176 - 147,6 = 28,4 bits
Or, using the other value of |D|, SC(E) has a lower bound of only -46 bits, which completely scuppers your conclusion.
Added: Merging upper- and lowercase letters still gives more than 32 characters available, for more than 5 bits per character and a lower bound of -9 bits or less.
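The arithmetic in the last few paragraphs can be written as a short Python sketch. The assumptions are a fixed-width character code, the 37-character description as written, and Gil's stated I(E) = 176 bits. `min_description_bits` is a hypothetical helper that takes floor(log2(alphabet size)) as the "at least" bits per character: 6 for 64 symbols, and 5 for the 38 symbols left after merging cases.

```python
import math

D = "First 53 numbers decimal expansion pi"   # the description as written
I_E = 176                                      # Gil's stated I(E), in bits

def min_description_bits(description, alphabet_size):
    """Lower estimate of |D|: number of characters times the whole bits per
    character, taken as floor(log2(alphabet_size))."""
    return len(description) * math.floor(math.log2(alphabet_size))

# 64 symbols: 26 upper + 26 lower + 10 digits + space + EOL
# 38 symbols: 26 letters (case merged) + 10 digits + space + EOL
for alphabet_size in (64, 38):
    d_bits = min_description_bits(D, alphabet_size)
    print(alphabet_size, d_bits, I_E - d_bits)
```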
In hopes that we might avoid perpetuating errors, I suggest we take a pause and let people catch up on recent comments before continuing.
At a minimum, Gil deserves some time to consider all of this before replying, so let’s not pile on.
As, due to timezone differences, I haven’t had an opportunity to respond to @Giltil’s “According to Dembski” post, I’m going to claim an exemption from Dan’s “pause”, at least to the extent of asking some very basic questions about his (in)equality.
-
What is an “event”? And specifically what limitations, if any, apply to what can constitute an “event”? Can an “event” be something beyond a sequence of characters? For example, can it be (the existence of) a physical object?
-
Are Shannon Information and Kolmogorov Information well-defined for any “event” beyond a sequence of characters? Further, are they calculable for anything beyond that limit?
-
What level of prior knowledge can be assumed in the formulation of the “description of E”? A question has already been raised as to whether knowledge of the irrational number pi can be assumed. What about knowledge of geometry, and the existence of circles having diameters and circumferences (and thus the constant ratio between the two)?
Lacking clarity on such points, I have no option except to agree with Wolpert’s opinion of Dembski’s (later) work:
There simply is not enough that is firm in his text, not sufficient precision of formulation, to allow one to declare unambiguously ‘right’ or ‘wrong’ when reading through the argument. All one can do is squint, furrow one’s brows, and then shrug.
As for I(E), it is equal to 176 bits
Why does it have a probability of 2^-176? What is the distribution of E?
If the distribution of E is 53-digit numbers, then there are 9*10^52 possibilities[1], which is ~2^176, so a 2^-176 probability of getting this result if picking one at random.
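Here is a Python check of the ~2^176 figure. It assumes that every 53-digit number, with a first digit from 1 to 9, is equally likely.

```python
import math

# 53-digit numbers: the first digit is 1-9, the other 52 digits are 0-9
count = 9 * 10 ** 52
bits = math.log2(count)   # I(E) if all such numbers are equally likely
print(round(bits, 2))
```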
But of course there’s no guarantee that the sequence has to be that length. Longer ones may be possible, in which case the probability may be much lower.
There’s no guarantee that each digit is equally likely - there are a lot more 9s than 0s in this sequence - which would also affect the probability.
The sequence isn’t necessarily a decimal number. It could be a base-12 number that doesn’t contain either of the ‘digits’ A or B. Or a hexadecimal value that doesn’t contain A, B, C, D, E or F. It could be a series of measurements of something, perhaps with limitations or lower probabilities on the differences between successive values (there are no changes from 0 to 9 or vice versa, and only one change from 1 to 9). It could be derived from a string of letters (“opizecehigic…”), or be pairs or triplets of digits, or not represent numerical values at all.
But, since it is the start of the decimal expansion of pi, the distribution of E is likely to be pi to x digits, where, to give a reasonable length[2], x is between, e.g., 20 and 80. So the Shannon information isn’t based on the actual digits, which are fixed, but only on the length chosen.
So really, I(E) is 6 bits, |D| is at least 6 bits, SC(E) is close to zero and not only can chance not be eliminated to explain E, chance is by far the most likely explanation.
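The corresponding figure for the length-only distribution can be sketched in Python. This assumes x is equally likely to be any whole number from 20 to 80 inclusive.

```python
import math

# Assumed: E is "pi to x digits", with x uniform over 20..80 inclusive,
# so only the choice of x carries information.
lengths = range(20, 81)
print(len(lengths), round(math.log2(len(lengths)), 2))
```

That is 61 possible lengths, or just under 6 bits.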