@kirk is it possible that you are mistaking this:
KL(p || q) = \sum_{x \in X} -p(x) \log {q(x)} \, - \, \sum_{x \in X} -p(x) \log {p(x)}
for this?
\Delta H =\sum_{x \in X} -q(x) \log {q(x)} \, - \, \sum_{x \in X} -p(x) \log {p(x)}
Notice the switch from p to q in the first term. KL looks a lot like a delta H, but it most definitely is not. The first summation in KL is not H(p) or H(q), because it includes both q and p (it is the cross-entropy of p relative to q). The second summation, however, is H(p).
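To make the difference concrete, here is a minimal numeric sketch. The two distributions p and q are arbitrary values picked for illustration (they are not from anyone's data), and the helper names `entropy` and `cross_entropy` are my own:

```python
import math

def entropy(p):
    # H(p) = sum over x of -p(x) log p(x), natural log
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # sum over x of -p(x) log q(x): the first term of KL(p || q)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative distributions; note that q is NOT uniform here.
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]

kl = cross_entropy(p, q) - entropy(p)   # KL(p || q)
delta_h = entropy(q) - entropy(p)       # Delta H

print("KL(p || q) =", kl)
print("Delta H    =", delta_h)
```

With a non-uniform q like this one, the two printed numbers differ, because the cross-entropy term is not equal to H(q).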
Just to catch everyone up, if q is MaxEnt (the base state that @kirk is using), then in this case KL = delta H. If q is not MaxEnt, this is not the case. If q is MaxEnt, then for all p:
H(q) =\sum_{x \in X} -q(x) \log {q(x)} = \sum_{x \in X} -p(x) \log {q(x)} = \log {N}
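Here, N is the number of possible states. When q is MaxEnt, q(x) = 1/N for every x, so \log {q(x)} is the constant -\log {N} and comes out of the sum; whether the remaining weights are p or q, they sum to 1. That is why in this case it doesn’t matter that p is not q. Usually, however, it matters a great deal. If q is NOT MaxEnt, this is no longer true.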
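A quick way to check the MaxEnt special case numerically is sketched below. I am assuming N = 4 states and an arbitrary made-up p; the helper names are hypothetical and only there for illustration:

```python
import math

def entropy(p):
    # H(p) = sum over x of -p(x) log p(x), natural log
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # sum over x of -p(x) log q(x)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

N = 4                            # assumed number of states
q_uniform = [1.0 / N] * N        # the MaxEnt distribution over N states
p_any = [0.4, 0.3, 0.2, 0.1]     # arbitrary illustrative p

# With uniform q, the cross term equals H(q) = log N for any p.
cross = cross_entropy(p_any, q_uniform)
print(math.isclose(cross, math.log(N)))
print(math.isclose(cross, entropy(q_uniform)))

# Hence KL(p || q) equals Delta H in this special case only.
kl_uniform = cross - entropy(p_any)
delta_h_uniform = entropy(q_uniform) - entropy(p_any)
print(math.isclose(kl_uniform, delta_h_uniform))
```

Swapping `q_uniform` for any non-uniform distribution should break all three checks, which is the whole point.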