Thanks for the concern. I’m not confusing the two. I’m just asking you to be consistent.
This is not my approach. This is your approach. I’m just asking for us to use FSC to compute the FI of cancer using the same approach you published on pfam. You seem unwilling to do this, but that is all I’m asking for at this time.
This is not true. It is entertaining that you’d think so. The calculation would show something quite different. It seems like you don’t know how KL would be computed. Very interesting.
Regardless, your inferences are incorrect because your guess at how the KL-based FI would be computed is wrong. The rest of your logic is predicated on that erroneous calculation.
You can expect this to change going forward.
It appears I’m the first computational biologist with training in information theory to have looked over your math. It is clearly in error, but in a way that would be hard for most biologists to recognize. With this thread, and the thread on Cancer information (Computing the Functional Information in Cancer), and observers like @evograd, @rjdownard, @Art, and @mercer, the methodological errors will be more widely known.
The difference is that I am a computational biologist who has been applying information theory to biology for decades now. Very few people have the ability to competently review your work. I am one of them.
I’m sorry if I’m misrepresenting you. In your paper, FSC is computed using a maxent ground state. I think we agree that this is what you did. Right?
I’m saying we should see how that approach computes the information of cancer. I think it is in error on cancer, and it is also in error on protein families. You can’t use it for protein families if it can’t pass the basic control of cancer genomes. For you to compute the correct FI of a protein family, among other things, you have to use the equivalent of “normal,” which is the “ancestral” state in protein families. You didn’t do that for pfam, so the calculation is similarly invalid for pfam.
This is a helpful image. Your current approach starts from that ground state. However, that is not how evolution says things arose, so it is starting from a strawman of a ground state. You have to start from the ancestral state, not a maxent state.
Nope. I am sorry, but that is not the right calculation. Applying the formula from your paper, the FSC paper, produces about 6 billion bits of FSC. Would you like to step through the calculation? That should be fun.
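To make that offer concrete, here is a minimal sketch of what I mean, with toy sequences and round numbers of my own invention standing in for real cancer genomes; the per-site quantity is the maxent-ground-state delta H, as I read it from the FSC paper:

```python
# Minimal sketch (toy data, not real cancer genomes): per-site delta H
# against a maximum-entropy ground state over the four nucleotides,
# summed over sites.
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def fsc_maxent_ground(alignment, alphabet_size=4):
    """Sum over sites of (log2|alphabet| - H_site), i.e. a maxent ground state."""
    h_ground = math.log2(alphabet_size)  # 2 bits/site for DNA
    n_sites = len(alignment[0])
    return sum(h_ground - column_entropy([seq[i] for seq in alignment])
               for i in range(n_sites))

# Toy "alignment" of cancer genomes: most sites effectively fixed.
genomes = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGAAC"]
print(fsc_maxent_ground(genomes))  # close to 2 bits x 10 sites = 20 bits

# Back-of-envelope: ~3 billion mostly-conserved sites x ~2 bits/site
# comes to about 6 billion bits of FSC against a maxent ground state.
```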
Also, “cancer” is not “non-functional,” as @Mercer has already noted. It is a precisely defined functional state that requires precise control of gene expression and new protein functions.
Setting aside the idiosyncratic use of the term “Shannon uncertainty” (which I do not agree with), I do agree that delta H can be negative. However, it does not mean what you think it does. Also, it is not clear that cancer has reduced H. You have to demonstrate this, and I am not sure it is true.
Except you haven’t even produced a single correct calculation. Clearly I am talking about information in a different way than you imagine I am.
The information to erase a hard drive is not much FI at all, perhaps as low as one bit. It also increases the function of the hard drive by increasing its capacity (i.e., its function). Stepping through this example might be instructive. It seems you didn’t realize that the FI using KL would be trivially low in this case. Also, if we want to use FSC or delta H, we don’t know if it goes up, down, or stays the same. There are not enough details to tell. For KL, however, we can know quite easily.
Likewise, the information of cancer using a base state of maxent is not 0; it is, rather, about 6 billion bits of FSC. That also is an example worth stepping through.
That turns out to be false. KL divergence is the amount of information required to change between states; delta H is not. To make a simple analogy: let us say I have two documents, D1 and D2, that say totally different things and accomplish different functions. Let us say that the complexity is the same, so H(D1) = H(D2), and the delta H is zero. Now let’s say I have D1 but want to change it into D2. How much information do I need to do this?
Turns out that it is definitively not 0, even though delta H = 0. Instead, it will take KL(D2 || D1) bits of information to create D2 starting from D1. What you need is KL. This is a foundational concept in information theory. There really is no way around this.
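If it helps, here is a minimal numerical sketch of the D1/D2 point, with two made-up toy distributions standing in for the documents:

```python
# Two toy distributions with identical Shannon entropy (delta H = 0)
# but a clearly nonzero KL divergence between them.
import math

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p.values() if x > 0)

def kl(p, q):
    """KL(P || Q): bits needed to move from a code for Q to a code for P."""
    return sum(px * math.log2(px / q[s]) for s, px in p.items() if px > 0)

D1 = {"a": 0.7, "b": 0.2, "c": 0.1}  # same set of probabilities,
D2 = {"a": 0.1, "b": 0.7, "c": 0.2}  # assigned to different symbols

print(entropy(D1), entropy(D2))  # identical, so delta H = 0
print(kl(D2, D1))                # about 1.18 bits: changing D1 into D2 is not free
```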
It seems that what is going on here is that you haven’t yet appreciated the error you made in thinking KL = delta H. Yes, that is true if and only if the base state is uniformly random. However, as soon as you deviate from this, it is no longer true that FI = -log(W), and the formulas you are using are no longer correct. You just realized this recently, earlier in this thread.
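A quick numerical check of that claim, with toy distributions (the “ancestral” one is purely hypothetical):

```python
# With a uniform (maxent) base state, KL(P || U) collapses to log2(N) - H(P),
# i.e. delta H. With any non-uniform base (e.g. an ancestral distribution),
# that shortcut fails. Toy numbers only.
import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl(p, q):
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

P = [0.6, 0.3, 0.05, 0.05]  # observed (functional) distribution
U = [0.25] * 4              # uniform maxent ground state
A = [0.5, 0.3, 0.15, 0.05]  # hypothetical non-uniform "ancestral" state

print(kl(P, U), math.log2(4) - entropy(P))  # equal: KL == delta H here
print(kl(P, A), math.log2(4) - entropy(P))  # unequal: the shortcut breaks
```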
Of course, you have to argue that delta H is correct for there to be any coherence to your argument, but it is not. The example I just gave you demonstrates that.
At this point, you have incorrectly inferred how to compute KL on cancer and on the erased hard drive. Based on that incorrect inference, you made incorrect computations of the FSC of cancer. That seems to be where we should start: taking a large number of cancer genomes and computing their FSC by your published method.
It will produce an FSC value of about 6 billion bits. From there we can discuss the next steps forward. What do you think?