Examples of Shannon information "codes"?

Rumraket · January 3, 2019, 4:03am

I don’t know what the definition of a Shannon information code is, but have you ever considered that Fraunhofer lines are effectively like barcodes in light? These lines can be used to infer the elemental composition of shining objects from across the width of the observable universe. I’ve been told by physicists that the existence of Fraunhofer lines in the electromagnetic spectrum can be predicted by quantum mechanics, and their locations in the spectrum calculated from first principles.

This would then seem to be a sort of “code” that originates in the properties of atoms, or perhaps more correctly in the quantum mechanical laws that describe them.
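For hydrogen, the simplest case, the claim that line positions follow from first principles can be checked with a few lines of arithmetic. This is a minimal sketch assuming the Rydberg formula, 1/λ = R_H (1/n1² − 1/n2²), which the quantum-mechanical treatment of the hydrogen atom reproduces when fine structure is ignored; the rounded value of R_H and the function name are illustrative choices. The n1 = 2 (Balmer) lines land within a fraction of a nanometre of the Fraunhofer C and F lines (Hα and Hβ), whose tabulated values are air rather than vacuum wavelengths.

```python
# Rydberg constant for hydrogen (reduced-mass corrected), rounded; units: 1/m.
R_H = 1.09678e7


def hydrogen_wavelength_nm(n_lower: int, n_upper: int) -> float:
    """Vacuum wavelength (nm) of the hydrogen line between levels n_lower < n_upper."""
    if not 0 < n_lower < n_upper:
        raise ValueError("need 0 < n_lower < n_upper")
    inverse_wavelength = R_H * (1 / n_lower**2 - 1 / n_upper**2)  # in 1/m
    return 1e9 / inverse_wavelength  # convert metres to nanometres


# Balmer series: lines whose lower level is n = 2 fall in the visible range.
for n in range(3, 7):
    print(f"n = {n} <-> 2: {hydrogen_wavelength_nm(2, n):.1f} nm")
```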

BruceS · January 3, 2019, 10:27am

Shannon information is just a property of random variables: you need a sample space (often a set of strings) and a probability distribution over it. Modeling something with a causal relation does not need probabilities, so I think the example of inferring composition from Fraunhofer lines is not, on its own, a candidate for Shannon information analysis.

However, we can often introduce probabilities in situations where there is random noise between two variables which we take to have an underlying causal relation. For example, in neuroscience, we can measure how effectively neural spike trains encode external stimuli by computing the mutual (Shannon) information between stimulus and response, with the probability distributions estimated by repeatedly sampling the responses to the same stimulus. This approach separates the correlation of response with stimulus (correlation in the MI sense) from the inherent noise in stimulus and response taken separately.
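A minimal sketch of the mutual-information idea, assuming stimuli and responses have already been reduced to discrete labels (for example, spike counts in a fixed time bin). It is the simple plug-in estimator built from the empirical joint frequencies of repeated trials; the function name and the toy trials are hypothetical, and real spike-train analyses add corrections for the upward bias this estimator has with small samples.

```python
import math
from collections import Counter


def mutual_information(trials):
    """Plug-in estimate of I(S; R) in bits from a list of (stimulus, response) pairs."""
    n = len(trials)
    joint = Counter(trials)
    stim = Counter(s for s, _ in trials)
    resp = Counter(r for _, r in trials)
    mi = 0.0
    for (s, r), c in joint.items():
        p_sr = c / n
        # p(s, r) * log2[ p(s, r) / (p(s) * p(r)) ]
        mi += p_sr * math.log2(p_sr / ((stim[s] / n) * (resp[r] / n)))
    return mi


# Hypothetical trials: each stimulus shown 50 times; response = spike count in a bin.
reliable = [("light", 5)] * 50 + [("dark", 1)] * 50
noisy = [("light", 5)] * 40 + [("light", 1)] * 10 + [("dark", 1)] * 40 + [("dark", 5)] * 10
unrelated = [("light", 5), ("light", 1), ("dark", 5), ("dark", 1)] * 25

for name, trials in [("reliable", reliable), ("noisy", noisy), ("unrelated", unrelated)]:
    print(name, round(mutual_information(trials), 3))
```

The reliable case gives 1 bit (the binary stimulus is fully recoverable from the response), the unrelated case gives 0, and the noisy case falls in between.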

BruceS · January 3, 2019, 10:46am

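The OP uses the term “Shannon information codes”. There is a specific technical meaning for the term “Shannon code”: it is an encoding of the input messages that takes the probability of each message into account, giving message i a codeword of -log2 p(i) bits rounded up, where p(i) is the probability of message i. Likely messages get short codewords, and the average codeword length comes within one bit of the best possible compression.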

Morse code is an attempt at that, but it only looks at messages of length one (i.e., single English letters). Huffman coding is the modern formalization of that idea of optimal symbol-by-symbol compression.
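A minimal binary Huffman coder, as a sketch of symbol-by-symbol compression: it repeatedly merges the two least frequent subtrees, so common symbols end up with short codewords (much as Morse gives E a single dot). The function name and the sample sentence are illustrative only; the weights are simply letter counts from that sentence.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(weights):
    """Binary Huffman code for a {symbol: weight} mapping; returns {symbol: bitstring}."""
    if len(weights) == 1:  # degenerate case: a lone symbol still needs one bit
        return {sym: "0" for sym in weights}
    tiebreak = count()  # unique counter so the heap never has to compare dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)   # least frequent subtree
        w2, _, high = heapq.heappop(heap)  # second least frequent subtree
        merged = {sym: "0" + bits for sym, bits in low.items()}
        merged.update({sym: "1" + bits for sym, bits in high.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


text = "this is an example of a huffman tree"
codebook = huffman_code(Counter(text))
encoded = "".join(codebook[ch] for ch in text)
for sym, bits in sorted(codebook.items(), key=lambda kv: (len(kv[1]), kv[0])):
    print(repr(sym), bits)
print(len(encoded), "bits vs", 8 * len(text), "bits as 8-bit characters")
```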

Fully optimal codes require one to look at the entire message. In practice, this is done by looking at probabilities of pairs of letters, then triplets, then whole words, etc. Shannon considers such Markov modeling in his paper.
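One step of that Markov modeling can be sketched by comparing the single-letter entropy of a text with the entropy of each letter given the previous one, both estimated from counts. This is a plug-in estimate on toy strings with hypothetical function names, not Shannon's own procedure; it only illustrates why looking at pairs can leave less uncertainty per letter, which is the room a better code can exploit.

```python
import math
from collections import Counter


def letter_entropy(text):
    """H(X) in bits per letter, estimated from single-letter frequencies."""
    n = len(text)
    return sum(c / n * math.log2(n / c) for c in Counter(text).values())


def conditional_entropy(text):
    """H(next letter | previous letter) in bits, estimated from overlapping letter pairs."""
    pairs = Counter(zip(text, text[1:]))
    prev_totals = Counter()
    for (prev, _), c in pairs.items():
        prev_totals[prev] += c
    n = sum(pairs.values())
    # p(prev, next) * log2[1 / p(next | prev)], summed over observed pairs
    return sum(c / n * math.log2(prev_totals[prev] / c) for (prev, _), c in pairs.items())


for sample in ["ab" * 50, "the quick brown fox jumps over the lazy dog"]:
    print(f"{letter_entropy(sample):.3f} bits/letter alone, "
          f"{conditional_entropy(sample):.3f} given the previous letter")
```

For the alternating string each letter is fully predictable from the one before it, so the figure drops from 1 bit per letter to 0.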

Modern compression algorithms like LZW build a table (dictionary) of the substrings that recur in a given input string and replace each occurrence with a single short code; this effectively gives frequently repeated substrings short encodings, which is the basic idea behind Shannon coding. LZ-type coding is one of the compression techniques used in Zip compression (Zip's standard DEFLATE method combines LZ77 with Huffman coding).
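A minimal LZW sketch, assuming the input consists of 8-bit characters (code points below 256) and that the output codes are left as a Python list of integers instead of being packed into variable-width bit fields as real implementations do. LZW never estimates probabilities explicitly: it grows its dictionary with each new substring it meets, so substrings that recur get replaced by a single code.

```python
def lzw_compress(data: str) -> list[int]:
    """LZW: emit one integer code per longest already-known substring."""
    table = {chr(i): i for i in range(256)}  # start with all single 8-bit characters
    current = ""
    out = []
    for ch in data:
        extended = current + ch
        if extended in table:
            current = extended  # keep growing the match
        else:
            out.append(table[current])
            table[extended] = len(table)  # remember the new substring
            current = ch
    if current:
        out.append(table[current])
    return out


def lzw_decompress(codes: list[int]) -> str:
    """Rebuild the same table on the fly to invert lzw_compress."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    previous = table[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = previous + previous[0]  # code defined by the very step being decoded
        else:
            raise ValueError(f"invalid LZW code {code}")
        pieces.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(pieces)


message = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(message)
print(len(message), "characters ->", len(codes), "codes")
assert lzw_decompress(codes) == message
```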

(As an aside, as I read the posts by the OP, they still assume the everyday usage of information as referring to human meaning and incremental knowledge. The OP does not address Shannon’s technical usage of ‘information’, which is a property of the probability distribution of random variables. Specifically, Shannon’s technical definitions relate to -log2 p(i) and the expected value of this quantity, where p(i) is the probability of message i. This assumes countably many messages so that a discrete distribution suffices.)
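The quantities in that aside are easy to compute. Below is a sketch with hypothetical function names and a made-up four-message distribution: surprisal -log2 p(i), entropy as its expected value, and Shannon's original code construction (sort messages by decreasing probability, give message i a codeword of ceil(-log2 p(i)) bits taken from the binary expansion of the cumulative probability of the messages before it). The resulting prefix code averages between H and H + 1 bits. Nothing in it refers to what the messages mean; only their probabilities matter.

```python
import math


def surprisal(p):
    """Self-information of a message with probability p, in bits."""
    return -math.log2(p)


def entropy(probs):
    """Expected surprisal H = sum_i p(i) * -log2 p(i), in bits."""
    return sum(p * surprisal(p) for p in probs if p > 0)


def shannon_code(probs):
    """Shannon's construction: lengths ceil(-log2 p), bits from the cumulative probability."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])  # most probable first
    codes = [""] * len(probs)
    cumulative = 0.0
    for i in order:
        length = math.ceil(surprisal(probs[i]))
        x, bits = cumulative, ""
        for _ in range(length):  # first `length` bits of the binary expansion of x
            x *= 2
            bit = int(x)
            bits += str(bit)
            x -= bit
        codes[i] = bits
        cumulative += probs[i]
    return codes


probs = [0.4, 0.3, 0.2, 0.1]  # hypothetical message probabilities
codes = shannon_code(probs)
avg_len = sum(p * len(c) for p, c in zip(probs, codes))
print(codes, round(entropy(probs), 3), round(avg_len, 3))
```

For this distribution the codewords are 00, 01, 101 and 1110, averaging 2.4 bits against an entropy of about 1.85 bits. A Huffman code for the same probabilities averages 1.9 bits, which is why Huffman's method, not this construction, is the optimal symbol-by-symbol code.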

BruceS · January 3, 2019, 2:02pm

Just for fun, on DNA and Shannon/Huffman codes (you can find lots of articles with similar themes):

ncbi.nlm.nih.gov

Improving the efficiency of the genetic code by varying the codon length--the perfect genetic code.

AJ Doig, Journal of Theoretical Biology, Oct 7, 1997

The function of DNA is to specify protein sequences. The four-base "alphabet" used in nucleic acids is translated to the 20 base alphabet of proteins (plus a stop signal) via the genetic code. The code is neither overlapping nor punctuated, but has mRNA sequences read in successive triplet codons until reaching a stop codon. The true genetic code uses three bases for every amino acid. The efficiency of the genetic code can be significantly increased if the requirement for a fixed codon length is dropped so that the more common amino acids have shorter codon lengths and rare amino acids have longer codon lengths. More efficient codes can be derived using the Shannon-Fano and Huffman coding algorithms. The compression achieved using a Huffman code cannot be improved upon. I have used these algorithms to derive efficient codes for representing protein sequences using both two and four bases. The length of DNA required to specify the complete set of protein sequences could be significantly shorter if transcription used a variable codon length. The restriction to a fixed codon length of three bases means that it takes 42% more DNA than the minimum necessary, and the genetic code is 70% efficient. One can think of many reasons why this maximally efficient code has not evolved: there is very little redundancy so almost any mutation causes an amino acid change. Many mutations will be potentially lethal frame-shift mutations, if the mutation leads to a change in codon length. It would be more difficult for the machinery of transcription to cope with a variable codon length. Nevertheless, in the strict and narrow sense of coding for protein sequences using the minimum length of DNA possible, the Huffman code derived here is perfect.
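The abstract's idea of a variable-length code over the four bases can be sketched with a k-ary Huffman code (k = 4), which pads with zero-weight dummy symbols so that every merge combines exactly k subtrees. The function name is hypothetical and it returns codeword lengths only. Doig's amino-acid frequencies are not reproduced here; the example gives 21 symbols (20 amino acids plus stop) equal weights, which isolates the gain from dropping the fixed codon length alone.

```python
import heapq
from itertools import count


def kary_huffman_lengths(weights, k=4):
    """Codeword lengths (in k-ary digits) of a k-ary Huffman code for {symbol: weight}."""
    tiebreak = count()
    heap = [(w, next(tiebreak), [sym]) for sym, w in weights.items()]
    # Pad with zero-weight dummies until (leaves - 1) is divisible by (k - 1),
    # so the tree is full and every merge takes exactly k nodes.
    while (len(heap) - 1) % (k - 1) != 0:
        heap.append((0.0, next(tiebreak), []))
    heapq.heapify(heap)
    depth = {sym: 0 for sym in weights}
    while len(heap) > 1:
        total, members = 0.0, []
        for _ in range(k):  # merge the k lightest subtrees
            w, _tie, syms = heapq.heappop(heap)
            total += w
            members.extend(syms)
        for sym in members:  # every symbol under the new node moves one level deeper
            depth[sym] += 1
        heapq.heappush(heap, (total, next(tiebreak), members))
    return depth


symbols = [f"aa{i}" for i in range(20)] + ["stop"]  # hypothetical labels, equal weights
lengths = kary_huffman_lengths({s: 1 for s in symbols}, k=4)
avg = sum(lengths.values()) / len(symbols)
print(f"average {avg:.3f} bases per symbol vs 3 for fixed codons")
```

With equal weights the average is 49/21 ≈ 2.33 bases per symbol: 16 two-base words cannot cover 21 symbols, so 14 get two bases and 7 get three. Skewed real frequencies let a Huffman code shorten the common symbols further. The abstract's two figures describe one ratio from both sides: 1/1.42 ≈ 0.70.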
