I'm not sure I understand your reasoning here. But in my example, I was assuming that the coin was fair. And in that case, the two outcomes, the one displaying only heads and the one that appears random, have the same probability of occurrence, hence the same Shannon information, don't they?
WD describes the simpler description, the one lacking randomness, as containing more Kolmogorov information, not more Shannon information. And in that sense, he is correct, isn't he?
Edit:
WD describes the simpler description, the one lacking randomness, as containing less Kolmogorov information, not more Shannon information. And in that sense, he is correct, isn't he?
Microscopes existed during the age of sail. The bacterial flagellum was discovered in 1676, more than a century before the first working steamship was built. The ASC of a bacterial flagellum apparently does not include anything other than a simple description.
What was the ASC of a bacterial flagellum that could not be seen during the age of sail?
It’s the decimal expansion of pi without the “3.”
I think most people would be able to figure that out.
My mistake. Let me correct:
WD describes the simpler description, the one lacking randomness, as containing less Kolmogorov information, not more Shannon information.
No. Neither has Shannon Information; SI is a property of the population, not the sample.
EDIT: Given your assumption of a single fair coin, they have equal probability, but that's not SI. If there are two coins, then we can estimate the SI for each.
Clarification: Heads and Tails are the observed signal/data. The coin is the random “population” generating the signal.
Again no, but your confusion is understandable, especially if you have been reading Dembski. He tends to prevaricate.
Correct for AI/KI. “Simpler description” does not apply to SI.
Looking at Dembski’s ENV article …
In this usage “measure” is a mathematical term. In Dembski (2005) he did not have an “information measure”, even though he called it that. The reason is that he did not have a probability distribution to define the measure. My understanding is that this error carries over to his other publications up to about 2005, and to the ASC paper with Marks and Ewert.
Dembski is conflating Information with Surprisal. The surprisal of an event E is -log(p(E)), so if the probability of E is small, its surprisal is large. SI is the probability-weighted average of surprisal over the entire probability distribution: H = -Σ p*log(p). No single event can be used to estimate SI.
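To make the distinction concrete, here is a minimal Python sketch. The biased coin and its probabilities are my own illustration, not anything from Dembski:

```python
import math

# Illustrative "population": a biased coin (probabilities chosen only for this example).
p = {"H": 0.9, "T": 0.1}

# Surprisal of a single event E: -log2 p(E). Rare events have large surprisal.
surprisal = {event: -math.log2(prob) for event, prob in p.items()}
print(surprisal)  # {'H': ~0.15, 'T': ~3.32} -- Tails is rarer, so more "surprising"

# Shannon Information (entropy) is a property of the whole distribution:
# the probability-weighted average of the surprisals.
entropy = sum(prob * -math.log2(prob) for prob in p.values())
print(entropy)    # ~0.47 bits per flip -- one number for the population, not per event
```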
Dembski declares that smaller probability events are more complex, but there is no justification. There is no reason why rare events cannot be simple. I think we could also name some common events that are complex, but it seems off topic.
From the next paragraph, last sentence:
Dembski likes to measure probability in bits. That's OK, but probability is not Shannon Information. For this example the source, ten fair coin flips, has 10 bits of SI, and ANY sequence of ten flips is one of 2^10 equally likely signals. Shannon's application was communications, and SI can be thought of as a measure of the bandwidth needed to convey signals from sender to receiver. In this context more information means a greater variety of signals that may be sent. All 10-bit signals are equally “complex”.
Note that Dembski never defines his population. Is it one coin flipped ten times, ten coins flipped one time each, or the sum of ten Bernoulli trials (a Binomial)? I'll cut him a little slack here for space limitations, but he has been known to define his probability distribution differently depending on the outcome he wishes to show. IIRC details can be found in Elsberry and Shallit (2011).
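To see why defining the population matters, here is a quick toy calculation of my own: the SI of the full 10-flip sequence is not the same as the SI of the Binomial count of heads.

```python
import math

# Population A: the full sequence of 10 fair flips (2^10 equally likely outcomes).
# Each fair flip contributes 1 bit, so the entropy is 10 bits.
entropy_sequence = 10 * 1.0

# Population B: only the number of heads in 10 fair flips, a Binomial(10, 0.5).
probs = [math.comb(10, k) / 2**10 for k in range(11)]
entropy_count = -sum(p * math.log2(p) for p in probs)

print(entropy_sequence)  # 10.0 bits
print(entropy_count)     # ~2.71 bits -- a different population gives a different SI
```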
Dembski goes on to write about AI/KI, and is basically correct here. It remains that he has often confused algorithmic randomness with lack-of-randomness in previous writing.
ETA: Thanks to @sfmatheson for corrections and the link!
A further thought about “complexity”. Dembski writes that low probability events are more complex. Perhaps what he intends is that the “meaning” of the message conveyed is more complex?
In communications the meaning of messages is determined by prior agreement of the sender and receiver. They might agree to communicate in English, and both understand English, which is previously shared information needed to decode the communication. When it comes to actually coding words in a message, it makes sense to code commonly used words with a short and simple code, and rarely used words with a longer, more complex code (e.g. “the” vs. “antidisestablishmentarianism”). This is the only sense I can see in which low probability signals should be more complex.
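A toy sketch of that coding intuition (the word frequencies below are made up purely for illustration): an ideal code assigns a word roughly -log2(p) bits, so common words get short codes and rare words get long ones.

```python
import math

# Made-up word frequencies, chosen only to illustrate the point.
freq = {"the": 0.6, "of": 0.3999, "antidisestablishmentarianism": 0.0001}

# An ideal (Shannon) code length for a word is about -log2(p) bits,
# so frequent words get short codes and rare words get long ones.
for word, p in freq.items():
    print(word, round(-math.log2(p), 2), "bits")
# the   0.74 bits
# of    1.32 bits
# antidisestablishmentarianism   13.29 bits
```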
But this gains us nothing. Neither AI nor SI deals with the meaning of messages, and Dembski (rightly) doesn't discuss the meaning of messages either. He implies that low probability events are somehow special, but there is nothing particularly special about a probability being small.
Yes, for Dembski, complexity refers to probabilistic complexity, or small probability.
No, the notion of a message has no bearing on the notion of complexity. Let's take the example of coin tosses again.
If a coin is tossed 100 times, the outcome will be a complex, hence improbable, event with a probability of 1 in 2^100.
If it is tossed only 2 times, the outcome will be a simple, hence probable, event with a probability of 1 in 4.
As you can see, the notion of a message is absent here.
No, Dembski doesn't imply that low probability events are somehow special, not at all. For Dembski, to be special, an event requires both small probability AND specification. If a low probability event is unspecified, its low probability is irrelevant for eliminating chance.
Gilbert Thill, you clearly have no understanding of the meaning of the word “complex”:
complex, adj.: Consisting of or comprehending various parts united or connected together; formed by combination of different elements; composite, compound. Said of things, ideas, etc. (Opposed to simple, both here and in sense 2.) [OED]
A probability is a real number between zero and one (inclusive) – it is a scalar, and so simple, not complex.
No it won’t be. It will be a simple series of heads and tails.
This appears to contradict your statements above which conflate low probabilities with complexity, regardless of the presence or absence of a “specification”. (And beyond that, I highly doubt that Dembski has ever given a rigorous definition of “specification”.)
Yes, we know, I was reaching for other possible meanings, looking for something that might make more sense. But Dembski doesn’t call it “Low-probability Specified Information”, he calls it “Complex”. Why?? (Rhetorical)
Something that is generally overlooked, is that Dembski doesn’t define a probability distribution except in very simple examples. In any other case the distribution is either undefined or assumed to be Uniform, even if Uniform is clearly not a reasonable assumption. This allows him to claim he is discussing probability, when it’s really just arithmetic. It also leads him into mistakes, like using a “probability” greater than 1.0.
Gil is referring to specified complexity and Tim is talking about complexity. Tim cited a definition of complexity which may be different from specified complexity. This discussion has a chance to bear some fruit if definitions can be mutually understood by all.
The probability that 500 coin tosses produce some result is 1. The probability of a predicted pattern, such as strictly alternating heads and tails (HTHTHTHTHT… for 500 tosses), is many orders of magnitude less than one (1 in 2^500). If we see this pattern we can eliminate a fair coin, or the flipping of a fair coin, as the cause.
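For concreteness, here is the arithmetic behind that claim as a quick sketch of my own, assuming a fair coin and the specific predicted pattern HTHT… for all 500 tosses:

```python
import math

# Probability of one specific predicted 500-toss pattern under a fair coin:
# every toss must match the prediction, so (1/2)**500.
p_pattern = 0.5 ** 500
print(p_pattern)              # ~3.05e-151, astronomically small
print(-math.log2(p_pattern))  # 500.0 "bits" of surprisal for that single outcome
```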
I know you understand all this; my hope is to find some productive common ground for this discussion.
I asked Perplexity AI the following question: Has Shannon used the concept of complexity in his theory of information? Below is the answer I got:
Shannon did not explicitly use the concept of complexity in his original theory of information. However, there is a deep connection between probability and complexity that is evident in Shannon’s information theory. Shannon’s theory primarily focused on the transmission and communication of information, using concepts like entropy and channel capacity. He converted probabilities to bits, establishing a relationship between probability and information content. For example, an event with probability 1 in 2^100 corresponds to 100 bits of information. The complexity aspect in Shannon’s work is related to the length of bit sequences being transmitted. Longer sequences have greater information content and complexity, which impacts their transmission across noisy channels. It’s important to note that the concept of complexity in information theory was more fully developed later by Andrei Kolmogorov. Kolmogorov complexity, also called algorithmic information theory, measures the complexity of strings rather than distributions, but shares many concepts with Shannon’s entropy. While Shannon laid the groundwork, the explicit incorporation of complexity into information theory came after his seminal work, with later developments building upon and extending his ideas.
Bottom line: Dembski is correct when he connects complexity with probability.
Is he? Aside from the fact that LLMs are not entirely reliable, the Perplexity “AI” doesn't even clarify how complexity relates to length. Which is not strictly correct, either: if the message were in English, encoded as ASCII text, then it would not be the case that all bit strings of the same length are possible messages.
Further, it is quite clear from your own examples that algorithmic complexity is not closely related to length. A bit string with a simple pattern is less complex than a random bit string of the same length.
It seems to me that algorithmic complexity is certainly closer to ordinary ideas of complexity than message probability, or even length. Should an extremely simple message be considered “complex” just because it has a low probability? Or just because it is long? Surely not in the ordinary sense. I wouldn't call the regular signals of a pulsar complex, no matter how long the pulsar continued to signal.
But (i) Dembski isn’t here and (ii) Gilbert is parroting his claims. Therefore I see no reason not to address the glaring flaws to Gilbert.
And so you go off the rails right off the bat – an LLM will not give a rigorous explanation.
This claim would appear to be highly problematical.
It would appear only to establish an analogy “between probability and information content”, not a “relationship”.
The analogy immediately breaks down when you move beyond the simple introductory-textbook example of a known discrete distribution with two equiprobable outcomes – i.e. it breaks down for the entirety of real world outcomes.
A blatant example would be that, for the normal distribution, the probability of being within one standard deviation of the mean is 0.6827. How many bits of information is this?
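If you mechanically apply -log2 to that probability you do get a number, but it is the surprisal of a single event, not the Shannon Information of the normal distribution. A quick sketch:

```python
import math

# Naively converting a single probability to "bits":
p = 0.6827  # P(within one standard deviation of the mean) for a normal distribution
print(-math.log2(p))  # ~0.55 "bits" -- the surprisal of one event,
                      # not the SI of the (continuous) normal distribution
```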
The disanalogies between probabilities and information bits are:
Bits are discrete; probabilities can take any value between zero and one.
Alternative values of bits have equal weight; real-world probabilities need to be estimated, which generally involves making assumptions.
Then, after conflating probability with information, the LLM goes on to conflate Shannon Information with Kolmogorov Information – which are, AFAIK, two incompatible formulations.
This muddled explanation would seem to indicate that, far from being “correct”, Dembski is “not even wrong”.
I do not however find this the least bit surprising – as Dembski’s entire career would appear to indicate that he is far more successful in creating drama than in creating anything rigorous, let alone useful to the wider mathematical/information theory community.
Addendum: I would note that Gilbert’s post, to which I originally replied, made no mention of “specified complexity”, listed no specification for his “a coin is tossed 100 times, the outcome will be a complex” example and only mentioned “specification” at all in his final paragraph. It is therefore not reasonable to assume that he was “referring” to “specified complexity” rather than complexity simpliciter. Further, to the extent that “specified complexity” is equated to a probability it is a misnomer – more muddle from the muddle-merchants at the DI.
Further addendum: I did a quick skim of Shannon’s famous 1948 paper – and while he deals extensively with (very complex) probabilities, he does not seem to simplistically equate ‘bits of information’ with log2 of probability. I would suggest that the LLM is either mistaking what others have (erroneously) said about Shannon with what Shannon actually wrote, or is outright ‘hallucinating’.
The complexity aspect in Shannon’s work is related to the length of bit sequences being transmitted. Longer sequences have greater information content and complexity
This is true for Kolmogorov complexity, but not for Shannon complexity.
Let's take the example of the two coin-toss sequences again.
Here is the first
HHHHHHHHHHHHHHHHHHHH
Here is the second
HTTTHHTHTTTTHHTHHTTH
Because they both have the same length, they have the same complexity in the sense of Shannon. However, because the pattern of the first is simple, it has less Kolmogorov complexity than the second.
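Kolmogorov complexity is not computable exactly, but a general-purpose compressor gives a crude illustration of the difference. This is my own sketch, using zlib as a stand-in; the overhead on such short strings makes it only a rough proxy:

```python
import zlib

s1 = b"HHHHHHHHHHHHHHHHHHHH"   # simple, repetitive pattern
s2 = b"HTTTHHTHTTTTHHTHHTTH"   # irregular pattern of the same length

# Same length, so the same "bandwidth" in Shannon's sense (given the same source),
# but the repetitive string compresses to fewer bytes: a rough proxy for lower
# Kolmogorov (algorithmic) complexity. True K-complexity is uncomputable, and
# zlib's header overhead dominates for strings this short, so this is illustrative only.
print(len(zlib.compress(s1)), len(zlib.compress(s2)))
```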
He could have called it “low probability specified information” instead of “complex specified information”, for it is clear that for him the two expressions are synonymous. Don't forget that the subtitle of both editions of “The Design Inference” is “Eliminating Chance through Small Probabilities”.