Just to be clear here, is there a direct equating of information with entropy in information theory and/or computer science? Shannon’s entropy is rightly called entropy, because it is just the same quantity as Boltzmann’s, but I would hesitate to say either is the same as, or even monotonically increasing with, “information content” (however exactly one would quantify that). Here is why:
Entropy, in both formulations, is a measure of a macrostate’s phase space volume. In statistical physics, we can think of the macrostate as a tuple of state variables, like the temperature and pressure distribution over the volume of a thermodynamic system. There are many (many) ways in which all of that system’s particles can be arranged in terms of their actual positions and momenta (the microstate) to arrive at the same values of those macroscopic variables. All of these microstates “look the same” after a very conservative amount of coarse graining. Even if we insist that there is a particle at every position-momentum locus that we have captured in a snapshot, the particles are interchangeable, so there are still N! permutations that yield exactly that state, and particle numbers N are on the order of 10²⁰ to 10³⁰ in realistic lab settings. Still, some macrostates correspond to more microstates than others. Given random fluctuations, it is then vastly more likely that a system progresses into a macrostate that occupies more of the phase space than into one that occupies less. This, in the understanding of statistical physics, is the second law of thermodynamics, and at no point in this explanation was there any need to introduce “information”, nor would it have helped if I had.
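To make the claimed kinship between the two entropies concrete, here is the standard correspondence (the notation below is mine, purely for illustration):

```latex
% Boltzmann entropy of a macrostate compatible with \Omega microstates:
S = k_B \ln \Omega
% Shannon entropy of a probability distribution p_i over microstates:
H = -\sum_i p_i \log_2 p_i
% If all \Omega microstates of the macrostate are taken as equally likely, p_i = 1/\Omega, so
H = \log_2 \Omega = \frac{S}{k_B \ln 2}
% i.e. the two differ only by a constant factor, a choice of units and logarithm base.
```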
In computer science, if we are going to define an amount of information by the length of a string of bits, we have done nothing to begin talking about entropy. The number of bits is the number of particles, and if we impose that it cannot even change, of course the amount of information won’t either. The microstate is whatever specific values all the bits have, and if we decrease our resolution of this picture far enough that we can no longer tell which of two neighbouring bits is red and which is blue when looking at a purple locus, we are beginning to get towards talking about statistical mechanics again. If in every time step a uniformly chosen pair of neighbouring bits is swapped, the fluctuations of colour between locations will even out in the long run, and the shade of the system will settle into a mix of red and blue corresponding to the initial ratio of red vs blue bits. There are many more ways of producing noise that looks like that under a coarse enough graining than there are ways of producing noise that has a lot of highly contrasting regions. This number of ways to get the same macroscopic outcome is the entropy. Again, there was no need to even mention information.
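Here is a toy sketch of that picture, under my own assumptions (a bit string that starts fully segregated, random neighbour swaps as the dynamics, fixed-size chunks as the coarse graining; all names and sizes are just illustrative):

```python
import random

def chunk_shades(bits, chunk):
    # Coarse graining: the "shade" of a chunk is its fraction of 1-bits (red).
    return [sum(bits[i:i + chunk]) / chunk for i in range(0, len(bits), chunk)]

def simulate(n_bits=100, chunk=10, n_steps=2_000_000, seed=0):
    rng = random.Random(seed)
    # Start far from equilibrium: all red (1) bits on the left, all blue (0) on the right.
    bits = [1] * (n_bits // 2) + [0] * (n_bits // 2)
    print("initial shades:", chunk_shades(bits, chunk))
    for _ in range(n_steps):
        # Exchange dynamics: swap a random neighbouring pair; the red/blue counts are conserved.
        i = rng.randrange(n_bits - 1)
        bits[i], bits[i + 1] = bits[i + 1], bits[i]
    print("final shades:  ", chunk_shades(bits, chunk))

if __name__ == "__main__":
    simulate()
```

In the long run every chunk’s shade hovers around the global ratio of red to blue bits (here 0.5), even though the exact microstate keeps changing from step to step.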
But not only is equating entropy with information unhelpful, it can, under quite reasonable interpretations, also end up misleading:
Consider the blurring process of bits described above. Sure, the amount of information held in the exact microstate is constant, but if we are talking about 10²⁰ bits, it becomes impractical to store them. We could instead store a string of region lengths, where a region is a contiguous run of bits that all have the same value (essentially run-length encoding). In that sense, as entropy increases, the number of those regions rises. It becomes more expensive to keep the information about the microstate than it was while the regions were still long, before the diffusion. Of course, we are still storing the entire microstate, just in a different format, which eventually becomes as impractical as storing all the bits in the beginning. So how about we do the coarse graining we normally would in physics: sample large chunks of our bit string and store some statistical summary of each chunk. A hash sum, of sorts. Well, if our bit strings are very long, then in equilibrium the hash of one chunk will have the same value as the hash of another. Their distinctiveness only exists far from equilibrium, and it decreases as we approach equilibrium. Knowing the equilibrium state, we know almost nothing about the initial state anymore. Entropy has increased between the initial and the steady state, but in the process the state has become unspecific, and information about its past has been altogether erased. And it is not that our storage scheme has merely discarded information that is still hidden away somewhere in the system. Recovering the correct initial state from an equilibrated one is outright impossible: the Gaussian kernel of the diffusion smooths that information out until, in the long-term limit, there is none left. That information is genuinely erased from the system, and any storage scheme that wanted to retain it would have had to keep catching it as it was “evaporating” out of the actual string of bits.
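Continuing the same toy model (again my own sketch, with illustrative sizes and names): the run-length representation of the string grows as the swaps mix it, which is the sense in which storing the microstate gets more expensive, while the chunk “hashes” converge towards the same value and stop distinguishing the chunks:

```python
import random
from itertools import groupby

def run_lengths(bits):
    # The region-length storage scheme: lengths of maximal constant runs.
    return [len(list(group)) for _, group in groupby(bits)]

def chunk_shades(bits, chunk):
    # The coarse "hash" of a chunk: here simply its fraction of 1-bits.
    return [sum(bits[i:i + chunk]) / chunk for i in range(0, len(bits), chunk)]

def demo(n_bits=100, chunk=10, n_steps=2_000_000, report_every=500_000, seed=1):
    rng = random.Random(seed)
    bits = [1] * (n_bits // 2) + [0] * (n_bits // 2)
    print(f"step 0: {len(run_lengths(bits))} runs, shades {chunk_shades(bits, chunk)}")
    for step in range(1, n_steps + 1):
        i = rng.randrange(n_bits - 1)
        bits[i], bits[i + 1] = bits[i + 1], bits[i]
        if step % report_every == 0:
            print(f"step {step}: {len(run_lengths(bits))} runs, "
                  f"shades {chunk_shades(bits, chunk)}")

if __name__ == "__main__":
    demo()
```

The run count climbs from 2 towards roughly n_bits/2 at equilibrium, while the shades all drift to about 0.5; nothing in those equilibrated summaries points back to the particular segregated string we started from.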
In my opinion, both understandings of information are reasonable, and neither is an obvious favourite over the other. Where the former is essentially identical with entropy, the latter is essentially its inverse. This, I find, makes introducing the term without an explicit definition muddying, on top of unnecessary. I understand that the topic of this thread is about information flow, but I find it nevertheless hasty to just pull in entropy as a stand-in, when entropy does have a consistent, unambiguous and intuition-independent definition and interpretation across different fields. If the same can be said of information, I’ll be glad to hear it. As far as I know, there is a lot more debate and ambiguity over that term than there ever was about entropy.