The OP uses the term “Shannon information codes”. The term “Shannon code” has a specific technical meaning: an encoding of the input messages that approaches the best possible compression by taking the probability of each message into account, assigning shorter codewords to more probable messages.
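As a minimal sketch of that idea: Shannon's construction gives message i a codeword of length ⌈-log2 p(i)⌉ bits, so likelier messages get shorter codewords. The probabilities below are made up purely for illustration:

```python
import math

def shannon_code_lengths(probs):
    # Shannon's construction: message i gets a codeword of
    # ceil(-log2 p(i)) bits, so higher probability -> shorter code.
    return [math.ceil(-math.log2(p)) for p in probs]

# Four messages with probabilities 1/2, 1/4, 1/8, 1/8
print(shannon_code_lengths([0.5, 0.25, 0.125, 0.125]))  # [1, 2, 3, 3]
```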

Morse code is an early attempt at this, but it only looks at messages of length one (i.e., individual English letters). Huffman coding is the modern formalization of that idea of optimal symbol-by-symbol compression.
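A minimal Huffman sketch in Python, using the standard greedy merge of the two least frequent subtrees (the sample string is just for illustration):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    # Priority queue of (frequency, tiebreak id, {char: code}) entries.
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two least frequent subtrees.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Prefix codes in the left subtree with 0, right subtree with 1.
        merged = {ch: "0" + c for ch, c in left.items()}
        merged.update({ch: "1" + c for ch, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

codes = huffman_codes("abracadabra")
# The most frequent letter ('a') ends up with the shortest codeword.
```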

Fully optimal codes require looking at the entire message. In practice this is approximated by modeling probabilities of pairs of letters, then triplets, then whole words, and so on. Shannon considers exactly this kind of Markov modeling in his paper.
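A toy version of the pairs-of-letters step can be sketched like this: estimate the conditional probability of the next letter given the previous one from a sample text (this is only an illustration of the modeling idea, not Shannon's full analysis):

```python
from collections import Counter, defaultdict

def bigram_model(text):
    # Count how often each character follows each other character,
    # then normalize to conditional probabilities P(next | prev).
    pair_counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        pair_counts[prev][nxt] += 1
    return {prev: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for prev, c in pair_counts.items()}

model = bigram_model("the theory of the thing")
# In this tiny sample, 't' is always followed by 'h'.
```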

Modern compression algorithms like LZW build a table of the most frequent subsequences in a given input string and encode them as shorter sequences; this effectively gives the most frequent substrings the shortest encodings, which is the basic idea behind Shannon coding. LZ-type coding is one of the techniques used in Zip compression (Zip's DEFLATE format combines LZ77 matching with Huffman coding).
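The table-building idea behind LZW can be sketched as follows (a bare-bones compressor only, with no bit packing or decompressor):

```python
def lzw_compress(data):
    # Start with one table entry per single character; grow the table
    # with each new substring seen, emitting the code for the longest
    # substring already in the table.
    table = {chr(i): i for i in range(256)}
    current, out = "", []
    for ch in data:
        if current + ch in table:
            current += ch  # extend the match
        else:
            out.append(table[current])       # emit code for longest match
            table[current + ch] = len(table)  # add new substring to table
            current = ch
    if current:
        out.append(table[current])
    return out

codes = lzw_compress("abababab")  # repeated substrings collapse to single codes
```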

(As an aside: as I read the OP's posts, they still assume the everyday usage of ‘information’ as referring to human meaning and incremental knowledge. The OP does not address Shannon’s technical usage of ‘information’, which is a property of the probability distribution of a random variable. Specifically, Shannon’s technical definitions are built on -log2 p(i) and on the expected value of this quantity, where p(i) is the probability of message i. This assumes countably many messages, so that a discrete distribution suffices.)
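To make that technical usage concrete, the expected value of -log2 p(i) is the Shannon entropy of the distribution, computable in a couple of lines:

```python
import math

def entropy_bits(probs):
    # Shannon entropy: the expected value of -log2 p(i) over the
    # distribution; terms with p = 0 contribute nothing.
    return sum(-p * math.log2(p) for p in probs if p > 0)

# A fair coin carries 1 bit per toss; a biased coin carries less,
# because its outcomes are more predictable.
print(entropy_bits([0.5, 0.5]))  # 1.0
print(entropy_bits([0.9, 0.1]))  # about 0.47
```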