Jonathan Bartlett: Measuring Active Information in Biological Systems

You haven’t actually said specifically what the problem is, except when misstating what I’m doing. And, Matheson, who seemed to understand what I was doing, wound up agreeing that the procedure did what it said it should do. So, you shouldn’t use “us” as if this represented all present. It just isn’t the case.

That’s literally true of everything, including throwing a baseball to a target. Nobody would say that the baseball player threw randomly with respect to the target.

Except that people do, and they mean something by it. If you don’t, great! It actually seems you are agreeing with me, but disagreeing about what other people mean. Let’s say that no one ever talked about whether mutations are random with respect to function. That doesn’t mean one can’t recognize a bias towards function, and recognize that a significant, regular bias towards function must have a reason behind it.

But, in truth, I bet that you couldn’t get Larry Moran to say that mutations are biased in a way that leads towards function. It sounds to me like he means every word he says about it, as do many others. When he and I discussed it, the only thing he would admit to is bias, but specifically not bias towards function.

More so now than before. Every argument against my thesis seems to assume that I’m presuming that mutations should be randomly distributed, or that biologists are saying that mutations should be randomly distributed. That is 100% not the case. The fact that the common link between the people who disagree with me is that they misunderstand my arguments seems to indicate that there is nothing wrong with my thesis, though perhaps it does say that I am poor at communicating it.

Perhaps for a different way of communicating it, let me walk through it in a backwards fashion, and see if there is more clarity.

Let’s presume, counter-factually, that mutations did have a uniform random distribution. In such a case, would it be correct to say that they are “random” with respect to fitness? I think so. In fact, in such a case, it would be correct to say that they are random with respect to just about anything, fitness included. If this wouldn’t count as “random with respect to fitness”, I don’t know what would.

Therefore, we know what the fitness effects of “random with respect to fitness” look like. We can then measure whether particular biases are biased towards function, biased away from function, or biased in a way that does not deviate from the random baseline in either direction. These are all performable, measurable actions that have definite meanings based on the terms being used. We can even measure whether the specific biases that actually occur in organisms are equivalent to being random with respect to fitness or whether they are biased one way or the other.

This line of reasoning relies on only two questions: (1) If mutations were uniformly randomly distributed, would they be random with respect to fitness? I believe the answer would be yes. (2) Can we measure the fitness of actual mutations? I believe the answer here would be yes. Since we can answer (1) and (2), we can also compare the measurements. If the mutations measured in (2) have greater fitness than the baseline in (1), then in what sense is “random with respect to fitness” true?
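
The comparison between (1) and (2) can be sketched numerically. This is a minimal sketch, assuming “active information” is taken in the usual Dembski-Marks sense, log2(q/p), where p is the fraction of beneficial mutations under the uniform baseline and q is the fraction actually observed. The function names and the counts are hypothetical and are not taken from the paper.

```python
import math


def beneficial_fraction(n_beneficial: int, n_total: int) -> float:
    """Fraction of mutations scored as beneficial in a sample."""
    if n_total <= 0 or not 0 <= n_beneficial <= n_total:
        raise ValueError("need 0 <= n_beneficial <= n_total and n_total > 0")
    return n_beneficial / n_total


def active_information(p_baseline: float, q_observed: float) -> float:
    """Bits by which the observed rate departs from the baseline rate.

    Positive: biased towards fitness. Negative: biased away from it.
    Zero: indistinguishable from the baseline on this measure.
    """
    if not (0 < p_baseline <= 1 and 0 < q_observed <= 1):
        raise ValueError("both probabilities must be in (0, 1]")
    return math.log2(q_observed / p_baseline)


# Hypothetical counts, for illustration only.
p = beneficial_fraction(1, 1000)   # (1) uniform random baseline
q = beneficial_fraction(8, 1000)   # (2) mutations actually observed
bits = active_information(p, q)    # roughly 3 bits in this made-up case
print(f"active information: {bits:.2f} bits")
```

A single number like this says nothing about sampling noise or about how the baseline was chosen, so it only carries weight together with a justified baseline and a test of significance.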

Yes, (2) is greater than (1). I’ve already agreed that “random with respect to fitness” is a horrible way to explain this and that it is not true in a mathematically precise way. It is true that this is an incorrect claim, but for a different reason than the one you’ve laid out.

If your point is “mutations are not random with respect to fitness,” well that is as flawed as saying “mutations are random with respect to fitness.” They are random (not totally predictable) in both cases.

Actually, mathematical modelers would model this as a random variable that is not independent of the target. It would be a “random event even though it is dependent on the target.” The whole “random with respect to” phrasing is not a sensible way of describing this.

If your point is that “mutations are not independent of fitness,” yes that is correct. I think that is what you mean. Once again, we know this from other observations than the argument you’ve made here.

Oh dear, that’s potentially misleading though I don’t think you intended it that way. What I wrote is:

I then identified the best-case scenario for what that would mean. I do NOT think that your math is likely to generate insights, because “some expectation” refers to what is rather clearly a strawman.

This is what your paper reports: Some pretty basic math that generates an untested metric that is claimed to represent a difference between a real-world dataset and a benchmark. We don’t know the utility of the metric, nor do we know whether its apparent quantitative nature is related to anything in the real world. What we do know is that the benchmark is a strawman and that the data fed into the process render the process unacceptably prone to GIGO. Given my assessment above and summarized here, I think it’s potentially confusing to readers to repeatedly cite me as having somehow affirmed that the paper “does what it says it will do.”

Well, I’m a humanist, so I think you are worth a lot more than the time of day. I’ve never met you so I have no basis for liking you or not. As for whether you care about my opinion of the paper, that seems inconsistent with your whole post here. Don’t you think?

Yes, after re-reading I see I did have a misunderstanding, but it really doesn’t change that much. First though, a few comments that I offer as a reviewer to help you improve the method, but do not change the overall discussion:

  1. The binomial distribution is OK as you are using it here, but Poisson and negative-binomial models could also be used, I think.
  2. Equation 23 is a probability, and equation 25 is a conditional probability. Given that you have parallel variables for everything else, why not make #23 a conditional probability too?
  3. Equations 21 and 22 are rates of mutant organisms in the population, and not per-organism mutation rates. This will lead to underestimating the numbers of mutations in the population. What you want is simply M_s * G. This is fixable, but will change your derivation of equation #36. ETA: I might have this backwards, and you are estimating based on assumed M_s?
  4. Equation #34 need not be zero. There could be some mysterious external force inserting non-random mutations. :wink:

OK, but this still has the form of a likelihood ratio test comparing two mutation rates.
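
To make the likelihood-ratio framing concrete, here is a minimal sketch of a binomial likelihood ratio test for two rates. It assumes each sample is a count of “successes” out of a fixed number of trials and uses the standard chi-square approximation with one degree of freedom. The function names and counts are hypothetical, and this is not the paper’s derivation.

```python
import math


def binom_loglik(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood without the binomial coefficient
    (the coefficient cancels in the likelihood ratio)."""
    ll = 0.0
    if k > 0:
        if p <= 0.0:
            return float("-inf")
        ll += k * math.log(p)
    if n - k > 0:
        if p >= 1.0:
            return float("-inf")
        ll += (n - k) * math.log(1.0 - p)
    return ll


def lrt_two_rates(k1: int, n1: int, k2: int, n2: int) -> tuple[float, float]:
    """Test H0: both samples share one rate, against H1: separate rates.

    Returns (statistic, p_value) using the chi-square(1) approximation.
    """
    p1, p2 = k1 / n1, k2 / n2
    p0 = (k1 + k2) / (n1 + n2)          # pooled rate under H0
    ll_alt = binom_loglik(k1, n1, p1) + binom_loglik(k2, n2, p2)
    ll_null = binom_loglik(k1, n1, p0) + binom_loglik(k2, n2, p0)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: 1 of 1000 under the baseline, 8 of 1000 observed.
stat, p_value = lrt_two_rates(1, 1000, 8, 1000)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.4f}")
```

With counts this small the chi-square approximation is rough, so an exact test or the Poisson and negative-binomial alternatives mentioned in point 1 may be preferable.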

Now the troublesome part - it is not stated if the function should be specified before the test or after observing the data/function. The latter suffers from the sharpshooter fallacy, and seems entirely dependent on post-hoc choice of function. The GENERAL METHOD section is written in a way that could be performed as a planned experiment, IF some function can be decided on ahead of time. I should leave that question to others, but as @sfmatheson has noted, AI doesn’t seem to test anything interesting.

If I choose to measure AI as “mutations which preserve the current function (drift)”, then I could have positive information for identical function. In the post-hoc sense, and for some mutation to a new function, I could define AI to be positive or negative at my whim (maybe I don’t want E. coli that eats citrate?).

You have pieces of a useful method here. I know this because I recognize the bits of statistical theory you have rediscovered. You have a good start towards design of experiments too, and that is the key to answering useful questions in a testable way. You also seem to understand the need to separate multiple sources of variation as in “internal and external” sources.

I don’t want to be overly critical - you really have come a long way - but there are still a few pieces missing. The connection to what you want to test is not complete yet, or at least you aren’t expressing it in a way that I can understand.

Dan - thanks for the comments. I will think more about how these things can be expressed and more directly connected. You are probably right that what seems clear in my mind is not necessarily fully communicated (and therefore not fully defended). Just to be sure I’ve answered your points:

#2 - I believe you are correct. I’ll look at it more to be sure, and if so I’ll see if I can get a correction issued.
#3 - It does mean what you suggest. Perhaps my terminology was muddy. The O_ variables refer to the rates of mutant organisms. By “per organism” I meant “the rate of incidence per organism”.
#4 - cute :slight_smile:

As to the positive/negative issue, I am going based on biological measures of fitness, not just “outcomes I want to see”. So, it is not “can you do X”, it is more of “can you survive”. Survival/fitness is a biological function that is not chosen by the experimenter. The experimenter chooses the survival task, but only measures the organism’s success.

SFMatheson - my reasons for posting here are several. First of all, Josh and I have some history together, and I wanted to be sure to share important parts of what I was doing here as well as in other places (I also shared at UD and TSZ). Second, if there were significant problems, I certainly want to know what they are. Asking seems like a good enough way to find out. I don’t need anyone to like me better, or think my paper is significant, or interesting, or anything like that. I need to know if there is something significant that I’ve missed. What I have learned is that none of the disagreements with the basic logic of the paper have been ones I found convincing. There has been some disagreement with my treatment of SHM, and I acknowledge that I should dig into more recent studies on it. However, it doesn’t impact my main thesis (i.e., it is a question of whether or not I got the inputs to the process right, not whether or not the process itself works).

“I then identified the best-case scenario for what that would mean”

And what is that best-case scenario?

And what did I say I was trying to do?

I don’t really see a difference between what you said and what I said, except to maybe add “at best” to the beginning of it. Nonetheless, if you are retracting that, no worries, I’ll accept that.

Josh -

That’s good, but you should know that many people (including many, many biologists) do believe that this description represents a precise way to describe mutations.

The point is that my method provides a quantifiable way to describe this, and to test this in specific cases. Certainly, there are other ways to know this qualitatively, when there are very strong examples. The nice thing about the quantitative approach is that we can tell, in places where we are not familiar with the mechanism or where the effect isn’t as pronounced, that there is something worth investigating, as I mentioned with the example from Hofwegen et al.

So, I think I will conclude my participation in this thread, as I don’t think there is any remaining progress to be made. I believe that looking at truly random mutations provides a valid expected value to compare against, and others disagree. I think finding the point of departure of the different viewpoints is sometimes the most useful result of interactions such as this.

Thanks all for your engagement!

Immunologists who study VDJ recombination, such as Dr David Schatz, at Yale, are interested in finding biological explanations for why RAG recombinases target particular sites during SHM. Here is one paper describing such biological explanations for apparent non-random mutation.

Biology is complex. We could imagine various biological explanations. One is how different sites of chromatin are spatially related to one another in the nucleus. A good friend of mine tested that hypothesis, and found it to be incorrect:

However, there could be other biological explanations, such as more open areas of chromatin, suggested in this paper where they observed mutation of newly integrated DNA

or location of Immunoglobulin enhancer elements:

or location of “recombination signal sequences”

In addition, it is important to note that B cells undergoing SHM are under high selective pressure in the germinal center reactions, forcing survival of cells in which productive mutations have occurred that enhance antigen binding affinity. So only the surviving B cells are quantitated, and those are the B cells that had productive, as opposed to detrimental, mutations.
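
The effect of counting only survivors can be illustrated with Bayes’ rule. This is a minimal sketch with made-up numbers: the prior fraction of productive mutations and the two survival probabilities are hypothetical and are not measurements from any germinal center study.

```python
def beneficial_fraction_among_survivors(
    prior_beneficial: float,
    survival_if_beneficial: float,
    survival_if_other: float,
) -> float:
    """P(mutation was productive | cell survived selection), by Bayes' rule."""
    for value in (prior_beneficial, survival_if_beneficial, survival_if_other):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be probabilities in [0, 1]")
    kept_beneficial = prior_beneficial * survival_if_beneficial
    kept_other = (1.0 - prior_beneficial) * survival_if_other
    total = kept_beneficial + kept_other
    if total == 0.0:
        raise ValueError("no cells survive under these inputs")
    return kept_beneficial / total


# Hypothetical inputs: 1% of mutations productive before selection,
# 90% survival for productive mutants, 5% survival for the rest.
before = 0.01
after = beneficial_fraction_among_survivors(before, 0.90, 0.05)
print(f"productive fraction before selection: {before:.3f}")
print(f"productive fraction among survivors:  {after:.3f}")
```

Under these assumed inputs the productive fraction among survivors is many times the fraction before selection, even though the mutation process itself is unchanged. A dataset built only from surviving B cells therefore mixes mutational bias with selection.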