Jonathan Bartlett: Measuring Active Information in Biological Systems

Irrespective of what you are comfortable calling it, that is precisely what I was aiming for! Success!

1 Like

The uniform distribution is sometimes justifiable as a maximum entropy distribution. In the absence of better information about the actual distribution, this gives the largest possible variance for an estimate. Given that we have information that not all mutations are equally likely, we can and should replace the uniform assumption.
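The maximum-entropy point can be checked numerically. This is a minimal sketch, assuming a toy four-outcome mutation spectrum; the `biased` probabilities are hypothetical and not taken from any data set discussed here.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
biased = [0.70, 0.10, 0.10, 0.10]  # hypothetical non-uniform mutation bias

print("uniform:", shannon_entropy(uniform))
print("biased: ", shannon_entropy(biased))
```

Any departure from equal probabilities lowers the entropy, which is the sense in which the uniform distribution encodes the least prior knowledge.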

I have some further thoughts, echoing my comments at TSZ, but it’s late. Briefly, …

I think a multinomial distribution is needed in place of the binomial. This allows for non-uniform probabilities of mutation, and with simplifying assumptions we get back to the binomial.
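One way to see how the multinomial collapses to the binomial: lump every category except one into "everything else", and the count in the remaining category is binomial. The sketch below assumes three hypothetical mutation categories with made-up probabilities (`site_probs`); it is an illustration, not the model from the paper.

```python
import math

def multinomial_pmf(counts, probs):
    """Probability of observing exactly `counts` across categories with `probs`."""
    n = sum(counts)
    coeff = math.factorial(n)
    for c in counts:
        coeff //= math.factorial(c)  # stays an integer at every step
    prob = 1.0
    for c, p in zip(counts, probs):
        prob *= p ** c
    return coeff * prob

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def marginal_of_first_category(k, n, probs):
    """Sum a three-category multinomial over every split of the other n - k draws."""
    total = 0.0
    for second in range(n - k + 1):
        third = n - k - second
        total += multinomial_pmf([k, second, third], probs)
    return total

site_probs = [0.5, 0.3, 0.2]  # hypothetical, non-uniform mutation probabilities
n_mutations = 4
for k in range(n_mutations + 1):
    print(k,
          marginal_of_first_category(k, n_mutations, site_probs),
          binomial_pmf(k, n_mutations, site_probs[0]))
```

The two printed columns agree up to floating-point rounding, so the binomial is the special case in which only "target versus non-target" matters.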

Even starting from a uniform distribution of mutations, observed data will be conditional on selection, and this will give the appearance of active information.
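A small calculation illustrates the concern. It assumes active information is measured as log2(q/p), the usual Dembski-Marks form, where p is the success probability under uniform mutation and q is the observed success frequency. The survival probabilities are hypothetical.

```python
import math

def apparent_active_information(p_target, survival_target, survival_other):
    """Bits of apparent active information when uniformly distributed mutations
    are only observed after a round of selection."""
    surviving_target = p_target * survival_target
    surviving_other = (1 - p_target) * survival_other
    q_observed = surviving_target / (surviving_target + surviving_other)
    return math.log2(q_observed / p_target)

# Mutation is uniform: 1% of possible mutations hit the target.
# Selection (hypothetical numbers) keeps 90% of target hits and 10% of the rest.
print(apparent_active_information(0.01, 0.9, 0.1))  # roughly 3.06 bits

# With no differential survival the apparent active information vanishes.
print(apparent_active_information(0.01, 0.5, 0.5))
```

The mutation process is unbiased in both calls; only the post-selection sampling differs.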

Starting from a population of clones, the population will become increasingly diverse with each succeeding generation. Expected Information alone is not adequate to describe changing populations; Active Information will have a distribution, requiring expectation and variance. I think it is reasonable to assume asymptotic normality for this discussion.

The preceding comment doesn’t apply if AI is not meant to be applied over multiple generations.
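To make "expectation and variance" concrete, here is a minimal sketch of the sampling distribution of an active-information estimate, again assuming the log2(q/p) form. It covers only binomial sampling noise in a single generation, not the growing diversity across generations, and all parameter values are hypothetical. The variance approximation is the standard delta-method result for the log of a binomial proportion.

```python
import math
import random

def active_information_estimate(successes, trials, p_baseline):
    """Plug-in estimate of log2(q / p) from observed successes."""
    q_hat = successes / trials
    return math.log2(q_hat / p_baseline)

def delta_method_variance(q, trials):
    """Large-sample variance (bits squared) of log2(q_hat)."""
    return (1 - q) / (trials * q * math.log(2) ** 2)

def simulate(q, trials, p_baseline, replicates, seed=0):
    """Mean and sample variance of the estimate over repeated experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        successes = sum(rng.random() < q for _ in range(trials))
        if successes > 0:  # the log is undefined when nothing succeeds
            estimates.append(active_information_estimate(successes, trials, p_baseline))
    mean = sum(estimates) / len(estimates)
    var = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
    return mean, var

mean, var = simulate(q=0.2, trials=500, p_baseline=0.05, replicates=400)
print("simulated mean:", mean, "true value:", math.log2(0.2 / 0.05))
print("simulated variance:", var, "delta method:", delta_method_variance(0.2, 500))
```

For large sample sizes the estimates cluster roughly normally around the true value, which is the asymptotic normality assumed above.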

6 Likes

Nothing - the phrasing of the question just sounded like I should be able to simply know the result a priori, so I wanted to emphasize that this is an empirical test.

My citations of the SHM literature support that claim. I would be happy replacing “only” with “almost exclusively”, with the caveat that it makes the math harder without any significant changes to the results (so, needlessly harder). Papavasiliou and Schatz say, “As a result, the frequency of mutation is highest in the upstream portion of the V exon (encoding complementarity determining region 1) and declines thereafter, with mutations rarely if ever found in the downstream constant region exons”. The part that needs mutability is the V region, where the mutations are happening. It would generally be damaging to the effect if the mutations were happening in the constant regions (they control signaling). So the mutations occur where needed, and not in other places. This is controlled by a promoter, so that moving the promoter affects where the mutations occur (in other words, there is specific information in the gene about where to mutate).

The interesting thing about many of the cancers and other issues with antibody production is that most of them occur because of the degradation of existing information. That is, many of them occur because previously correct base pairs have mutated to become RSS recognition sites. I have not read your paper on lymphomas, but I would not be surprised if it were due to either a mutation that decreased the ability of the mutation system to find the targets of mutation, or a mutation creating improper targets elsewhere. Both would point to the idea that it is pre-existing information that led to the beneficial results in the first place, and the degradation of that information is what causes problems.

Pseudogenes: Are They “Junk” or Functional DNA? says (emphasis mine):

Then what would you be measuring? My method measures expected vs. actual. Yours is simply measuring actual vs. actual, which will always be zero. My goal is to create a test for the specific claim of whether or not a mutation is random with respect to function. We can get a value for “random with respect to function” by actually sampling them and finding out what that is! Then we can see if biological mutations are in fact random with respect to function or not. You are trying to pre-eliminate the very question being asked.

There is error in everything, but if the direction of selection when growing your culture is not influenced by the selection you are testing against, I don’t see how it would be problematic. Additionally, as mentioned at TSZ, you could actually use a replica plating method to check for pre-existing mutations that occurred when doing pre-experimental replication.

It is indeed closest to the model when applied to a single generation, and that is the assumption of the experimental method. The “relative” one was applied over multiple generations, simply because that was the data available. However, in the relative case, I_Omega was also subject to selection.

So, look, I don’t want to get off track re your paper, which I think I’ve already explained is not sufficiently useful or novel for a lot of deep discussion. But your description of SHM is not accurate. Specifically, in my opinion you are presenting a vision of SHM as a highly targeted mutation system (that’s true) that only fails when it is broken. That last part is really not accurate.

A better description would take into account that the system regularly hits other targets, which then have to be repaired. (See Nature paper below.) That’s under normal conditions. The extent of non-Ig hits was documented a couple of years ago (see J Exp Med paper below). The emerging but (IMO) clear picture is that SHM is a dangerous tool, employed with a strong bias that we can call “aim” or “targeting” but also with a significant risk. The tool hits other targets by virtue of its characteristics, and not because of any “degradation” of any other information. In other words, the system causes mutations. If you prefer teleological verbiage, you’d have to say that the system was designed to cause mutations, and the design is either imperfect or is calculated to balance risk and reward. What I don’t think you can say, credibly, is that it only causes cancer when there’s “degradation of information.”

That statement is either too vague to mean anything, or it’s outright wrong. I would encourage you to avoid that kind of vague assertion in a conversation like this one, which involves scientific ideas and well-informed scientists. It’s just a friendly suggestion.

https://www.nature.com/articles/nature06547

4 Likes

I entirely agree. The evolution of cancer itself requires mutations to generate functional information, quite a bit of it: Computing the Functional Information in Cancer

4 Likes

I can’t understand how anyone could describe the most fearsome oncogene we know, V12Ras, as resulting from a “loss of information.” Or for that matter, how it works to describe gains of function (horribly common in cancer) as “loss of information.” The claims are not credible.

2 Likes

More to the point though @sfmatheson,

The focus of this thread should be @johnnyb’s paper. We can start a new thread to discuss his other ideas, and it would serve us best if he refrained (in this thread) from arguing for ideas not in his paper.

3 Likes

That’s very interesting. Thanks. I do hope that you aren’t trying to use that one gene in one organism as evidence that all pseudogenes are functional, though.

4 Likes

Thank you for the response, and I want to give the paper another read before I say much more. But first, we have a miscommunication to clear up.

You have essentially set up a likelihood ratio test of heterogeneity versus the null of uniform probability of mutations. The null expectation simply cannot occur in nature, and no one is going to be surprised when you discover that mutations are not uniform (we already know that). I’m suggesting using a null that more accurately reflects what is already known. There are a multitude of alternate hypotheses for how the “AI” can differ from this more realistic null.
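The likelihood ratio comparison described here can be sketched as a G-test of observed mutation counts against a chosen null. The counts and the "realistic" null probabilities below are hypothetical, picked only to show how the verdict depends on the null.

```python
import math

def g_statistic(observed_counts, null_probs):
    """Likelihood ratio (G) statistic of observed counts against null probabilities.
    Under the null it is approximately chi-square with len(counts) - 1 degrees of freedom."""
    total = sum(observed_counts)
    g = 0.0
    for obs, p in zip(observed_counts, null_probs):
        if obs > 0:  # empty cells contribute nothing
            g += obs * math.log(obs / (total * p))
    return 2.0 * g

observed = [40, 30, 20, 10]          # hypothetical mutation counts at four sites
uniform_null = [0.25] * 4            # every site equally likely
realistic_null = [0.4, 0.3, 0.2, 0.1]  # hypothetical known mutational bias

print("G vs uniform null:  ", g_statistic(observed, uniform_null))
print("G vs realistic null:", g_statistic(observed, realistic_null))
```

With 3 degrees of freedom the 5% critical value is about 7.81, so these counts reject the uniform null while fitting the biased null essentially perfectly. Rejecting a null nobody believes tells you little, which is the point being made.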

ETA: The statistical approach would be to test against a null of the expectation given function (current function). I’ll have to think about how that could be defined - random drift perhaps?

It’s still unclear what you mean by “random with respect to function”. What function is that? I’ll re-read and follow-up.

But you want to test if mutations are random. We already know selection is not random. Selection is going to be a non-random bias in any data you can get, or it may confound AI measurement entirely. I agree the single generation measurement will be the cleanest in this respect.

2 Likes

6 posts were split to a new topic: Comments on Bartlett

I would suggest reading the current empirical literature on the subject.

Your citations of the literature are far too old and only consider the products of SHM after selection.

They wrote that 18 or 19 years ago, man. They weren’t looking at the actual mutations before selection. They couldn’t. Now we can, and you are ignoring those data.

That’s why the ages, in addition to the tiny numbers, of the papers you cite give you away.

That’s simply not true.

The mutations occur in selective places, including hundreds of genes where they are not needed. We know that from sequencing before selection, which wasn’t done in 2002. People have since microdissected B-cells from germinal centers, before selection, and done single-cell genome sequencing.

Have you forgotten that you asked for quantitative papers and that I supplied one, free full text available, from 2018?

It appears that you either didn’t read it or didn’t understand it.

Let’s look at Figure 1, panel A:

Just to be as generous and basic as possible, I’ll point out that the numbers and bars around the circular graph represent chromosomes and the red bars represent mutations.

So, Jonathan, what is the relative frequency of Ig variable region mutations relative to other sites in the genome before selection, according to the data graphed above?

I should add that this was predicted given the characteristics of mutations in B-cell lymphomas, but we only knew that because lymphomas are also the products of positive selection. Therefore, the interpretation that followed your grudging admission that lymphomas also occur was not merely sneaking in teleology, but completely wrong.

You simply can’t look at sequences after selection and credibly claim that you are looking at sequences before selection.

4 Likes

Your primary example misrepresents the products of mutation plus selection as the products of mutation alone.

1 Like

We can do that? :astonished:

@johnnyb It seems that it IS possible to observe data that is not conditional on selection. I will need to revise my comments on this. @Mercer Thanks for the info!

3 Likes

Yes. One sections a mouse spleen or human lymph node. The germinal centers are discernible. Then with a dissecting microscope (I suspect that young grad students can do it with the naked eye), one samples cells from the centers.

This paper has some figures that may explain it better:

Before and after microdissection:
https://www.nature.com/articles/2402073/figures/2

This slide show is more computer science-centric and may ring more bells for you and @johnnyb:

All the links above are ONLY for illustration of the technique used in this paper linked below, which has the data:

As shown in the cartoon below, @johnnyb is citing papers from long ago, when we could only look at cell populations (shown in pink on the right) after selection. Today, people can follow and have followed individual clones in germinal centers–in vivo.
https://www.semanticscholar.org/paper/Dynamics-of-B-cells-in-germinal-centres-Silva-Klein/dd9bd7d5159f3009b7e0f3cf10ad8786e499b960/figure/2

Another version is here, which has the cells we could see in 2002 properly outside the follicle at the top, and multiple rounds of selection that Jonathan is missing:
https://onlinelibrary.wiley.com/action/downloadFigures?id=imr12396-fig-0003&doi=10.1111%2Fimr.12396

4 Likes

Is your point that most mutations fall outside the intended target?

A post was merged into an existing topic: Side comments on Bartlett: Measuring Active Information

@Mercer -

Your point is well-taken. It is true that I haven’t been heavy in the research of this lately. To give a bit of history, this paper that I’m presenting is actually itself almost ten years old. I wrote the paper shortly after doing a poster presentation on the topic. The road to publication has been long. When I first wrote it, I tried to publish it at an ordinary journal, and had mixed reviews (one very positive, one very negative). I wasn’t very familiar with the publication process at the time, so I wasn’t really sure where to go or what to do or who to send it to. Anyway, I first sent it to BIO-Complexity in 2012, where it was rejected because at the time they were only doing experimental papers. A few years later, they had opened it up to more types of papers, and I submitted it again, and it was rejected because I had some mathematics errors. A few years later, I went back through and did a detailed cleanup of the math and got a mathematician friend, Asatur Khurshudyan (who had previously coauthored a mathematics paper with me on changes to the second derivative) to help me out (he’s mentioned in the acknowledgements). I submitted this, and the mathematician had some initial pushback, but finally got it through.

All that to say, there are indeed older references, but that is due to the publication history. Would it be great if I had the time to keep up with everything? Sure, but I also have a day job, a teaching job, and two secondary writing jobs, so it is true that I don’t always have time to keep up with everything. However, a cursory review of the paper you cited seemed to indicate that it was an AID site prediction study, not an experimental study (actually it looked like a combined study, but it was hard to tell from a brief review how much was predicted vs. demonstrated). The problem with prediction techniques is that they assume that the cell doesn’t have compensation machinery as well (in fact, they also assume that SHM is the only valid usage of AID). But nonetheless, I will certainly grant that you have a broader knowledge of the SHM literature than I do.

Anyway, if you are correct, then that would not argue against my main idea (that you can measure active information and this is a good way to measure it), but only the application (that I have correctly applied the idea that I stated). If there winds up being less active information in SMH than I think there is because I measured it too casually, so be it. It’s not the main idea, just a way to understand how simplifications of the concept can work.

Another thing to notice, though, is that “selection” is being used in an equivocal way here. If the organism targets a cell for destruction because it doesn’t meet the standards, that is not “selection” in the Darwinian sense, but more similar to targeted mutation. If the cell simply falls apart because it is broken, then that is indeed selection in the Darwinian sense. But, if the organism is terminating the cell because it detects that the cell is operating outside its boundaries, that’s not Darwinian selection. Is it active information? Probably not, because the organism did indeed “try” at that point. Anyway, it is an interesting discussion, and I’m happy to say that you know more about the details than I do.

Again, my goal, as stated in the paper and in this thread, is to supply a mechanism for testing the question of whether or not organisms contain information about likely targets of evolution. I am not invested in any particular outcome for any particular process, especially as the present paper is concerned. I’m only concerned with whether the testing mechanism, if wielded by the appropriate person, would perform as indicated. Matheson, despite thinking the paper isn’t worthwhile, did think that the testing mechanism would perform as indicated. I don’t really care if Matheson likes me or the paper, or thinks that either is worth the time of day. I am interested in the fact that he thinks that the paper does what it says it will do.

I will say, however, that, in my (admittedly limited) reading, I have several times found that lack of targeting has occurred due to identifiable mutations which make off-target sites more likely to occur. This is more well-documented with the RAG enzymes during V(D)J recombination, but I have found papers on it. I imagine we will find more, but, as Matheson points out, my suppositions aren’t science.

@Dan_Eastwood -

I think you are still thinking of the paper in the same way that Swamidass started out thinking. You say,

This is a complete misunderstanding of what I am doing. I am not testing whether or not mutations follow a uniform probability. I am testing whether or not mutations, as they occur in nature, are more successful or less successful (or equally successful) than they would be if they did follow a uniform probability. This is the important point. Let’s say there is a strong bias of mutations. But, let’s say that this bias generally doesn’t favor the organism’s results. This would be negative active information, because it would be pointing away from success. The reason why a uniform distribution of mutations is the right thing to test against is because we have no reason for thinking that mutations should or should not be biased towards function. Using the amount of function that a uniform distribution gives tells us where the “zero point” is - the expected value of an arbitrary mutation. Mutational biases may be more helpful or more harmful than arbitrary mutations. That’s what active information seeks to find out.
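The "zero point" idea can be written down in a few lines. This sketch assumes the usual Dembski-Marks form of active information, log2(q/p), with p the success rate of uniformly distributed mutations and q the success rate of the mutations actually observed. The function name and the numbers are hypothetical illustrations, not values from the paper.

```python
import math

def active_information(p_uniform, q_actual):
    """Active information in bits: positive when actual mutations succeed more
    often than uniform ones, negative when they succeed less often."""
    return math.log2(q_actual / p_uniform)

p_uniform = 0.01  # hypothetical success rate of arbitrary (uniform) mutations

print(active_information(p_uniform, 0.04))    # about +2 bits: bias favors function
print(active_information(p_uniform, 0.01))    # zero: no better than arbitrary
print(active_information(p_uniform, 0.0025))  # about -2 bits: bias points away from success
```

A strong mutational bias therefore does not guarantee a positive value; the sign depends on whether the bias lines up with success.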

When someone says that “mutation is random with respect to function”, they are saying (or at least most people are understanding) that the outcome of the mutation is no more or less functional than any other arbitrary transformation on the genome. Therefore, this measures this question and assigns a value to it. Note that the measurement is useful (as mentioned previously) even if this was not the original question intended by the statement “mutation is random with respect to function”. But, in addition to its uses in detecting possible, previously-unknown mutational mechanisms, it also serves the function of forcing people to be more quantitative about statements like this :slight_smile:

Anyway, this has been fun, and I appreciate everyone’s thoughtful contributions. Unless anyone has specific questions for me, I’m happy to leave y’all with the last word.

1 Like

To be clear, I have not stopped thinking this is what you are doing.

Yes, that is what I thought from the beginning. Our point is that this doesn’t make much sense.

I’ve already granted you that mutations are not independent of function. They are skewed to functional mutations that are beneficial in important ways. That is true, but we don’t know that from your work and analysis.

They are still random, in that we cannot fully predict the mutations we will see. That’s why the whole “random with respect to” is a very poor way to put this.

Thanks for joining the conversation. I do have some specific questions.

Do you understand why the focused issues I just raised here are a deal breaker for us? Do you still think your argument is valid?

2 Likes