Bartlett: Measuring Active Information in Biological Systems

John_Harshman · April 9, 2020, 1:32pm

Not quite well defined. Is that the probability of success in some measured period? How is “success” defined?

And neither of which appears to have any sort of clear meaning or ability to be applied operationally. What are the two search algorithms here? How does one measure their probabilities of success?

Roy · April 9, 2020, 2:07pm

Here’s another false claim in the paper:

If cells only contain information that assist their evolution in specific ways, active information measurements will be able to determine which ways the cell contains information for.

You can estimate “active information” by running a search algorithm many times and seeing how often it succeeds. But knowing how much better a search algorithm is than random chance isn’t enough to tell you what the search algorithm involves, since many different search algorithms may offer the same improvement.

Using Bartlett’s example of 100 numbered cards, exactly one of which has “12” on it, finding the “12” by picking 7 random cards has a probability of 0.07. If the cards are known to be sorted into ascending order, a seven-step binary split search is guaranteed to find the “12”. Thus the “active information” is log2(1/0.07), or ~3.8365 bits.

But if the cards are known to be arranged so that the first 7 cards have even numbers and the next 93 have odd ones, picking the first 7 cards is also guaranteed to find the “12”. This gives this algorithm the exact same amount of “active information”.

So if the cards are known to be sorted in ascending order, or to have even numbers on only the first seven cards, or both, “active information measurements” cannot tell you which of the two algorithms is being used. Likewise, “active information measurements” on cells that “assist their evolution in specific ways” can’t tell you “which ways the cell contains information for”, since many different “ways” may contain the same amount of “active information”. Measuring the “active information” in a search algorithm to be 3.8365 bits tells you absolutely nothing about what that algorithm is doing. Likewise, measuring the “active information” in a cell to be 3.8365 bits tells you absolutely nothing about what that cell is doing. In fact it’s worse for cells, because you don’t even know which card you’re looking for.
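The card argument can be checked numerically. The sketch below is only an illustration: the two deck layouts and the function names are assumptions made for this example, and "active information" is taken to be log2 of the ratio of success probabilities, which reproduces the 3.8365 figure above.

```python
import math

def active_information(p_baseline, p_search):
    """Active information in bits, taken here as log2(p_search / p_baseline)."""
    return math.log2(p_search / p_baseline)

def binary_search_probes(deck, target):
    """Count the cards inspected by a binary search on an ascending deck."""
    lo, hi, probes = 0, len(deck) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if deck[mid] == target:
            return probes
        if deck[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None  # target is not in the deck

# Deck A (assumed layout): cards 1..100 in ascending order.
deck_a = list(range(1, 101))
worst_case_probes = max(binary_search_probes(deck_a, t) for t in deck_a)

# Deck B (assumed layout): seven even cards first, then 93 odd cards.
deck_b = [2, 4, 6, 8, 10, 12, 14] + list(range(1, 187, 2))

p_random = 7 / 100  # seven blind picks out of 100 cards, one target
p_binary = 1.0 if binary_search_probes(deck_a, 12) <= 7 else 0.0
p_first_seven = 1.0 if 12 in deck_b[:7] else 0.0

ai_binary = active_information(p_random, p_binary)
ai_first_seven = active_information(p_random, p_first_seven)

print(f"binary search:    {ai_binary:.4f} bits")
print(f"first seven cards: {ai_first_seven:.4f} bits")
```

Both strategies report the same number of bits, so the measurement alone cannot distinguish between them.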

Roy · April 9, 2020, 2:31pm

“Active information is measured by comparing the probability of success of a single query of a given search to the probability of success of a single query of a pure random search…As an example, let’s say that a random search of my card deck yielded success with an average success of 1/100 probability, and a particular search algorithm yielded success with 1/20 probability.”
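For reference, the quoted numbers work out as follows, assuming the usual definition from the active-information literature, where active information is the log of the ratio of the two success probabilities. The symbols $I_+$, $p$ and $q$ are labels chosen for this worked example and are not quoted from the paper.

$$
I_+ = \log_2\frac{q}{p} = \log_2\frac{1/20}{1/100} = \log_2 5 \approx 2.32\ \text{bits}
$$

Here $p = 1/100$ is the success probability of the random search and $q = 1/20$ is that of the search under analysis. The calculation only works once "success" has been fixed, which is the point at issue.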

There’s no explicit definition of success, only an implicit one of finding what the search algorithm is looking for.

How the [redacted] am I supposed to know?

Rumraket · April 9, 2020, 3:03pm

Yes and in that situation you could simply define success as a beneficial mutation occurring in some specific locus, for example. In that case the probability of success of the “search” is then the probability of a beneficial mutation occurring in that locus. Then comes the problem of determining what should count as a “purely random search” for beneficial mutations. Bartlett has posted about this over on theskepticalzone, where I asked him to clarify that for the situation I imagined above:

@Rumracket –

“It’s not exactly clear what you mean by a “pure random search” (I_Ω) in the context of biology”

Remember, the “pure random search” is in the context of mathematics, not biology. The goal is not to say what biology will do, but to say what the expected value should be.

“If there is a biochemical bias in the process of substitution (which of course we know there is), is that then going to count as part of the “search under analysis” (I_S), or is that the “pure random search” (I_Ω)?”

That’s definitely I_S.

“Do I have it correct that in this admittedly contrived example, the absence of a bias to the process of mutation is what would constitute the “pure random search”, aka I_Ω?”

Yes, at least that’s the theoretical goal. It may only be approximable though. However, I agree with Axe’s “coverage principle” – that a haphazardly chosen selection will generally have the same basic probabilistic features as a “pure random search”. Several fields (including machine learning) rely on this fact.

However, just to throw in a little complexity, as is mentioned in the section on “relative AI”, you can actually calculate AI relative to whatever background you like. My goal was to measure it in comparison to a pure random search. But, in theory, there is no reason why you couldn’t measure AI in relation to SNP biases. I’m not sure what use it would give you, unless you were specifically searching for non-SNP-oriented mechanisms or something. My goal was to be as agnostic as I can be about the nature of the information, and simply measure whether the cell is geared towards generating beneficial mutations or not. So, because of that, SNP biases are part of I_S.
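To make the "relative AI" remark concrete, here is a minimal sketch of how the same observed success rate yields different values depending on the chosen background. All three probabilities are hypothetical numbers picked for illustration, not measurements, and the log-ratio definition of active information is assumed.

```python
import math

def active_information(p_baseline, p_search):
    """Active information in bits, taken here as log2(p_search / p_baseline)."""
    return math.log2(p_search / p_baseline)

# Hypothetical per-generation probabilities of a beneficial mutation at one locus.
p_uniform = 1.0e-6      # assumed unbiased "pure random search" background
p_snp_biased = 4.0e-6   # assumed background that already includes SNP biases
p_observed = 1.6e-5     # assumed observed rate for the search under analysis

ai_vs_uniform = active_information(p_uniform, p_observed)
ai_vs_snp = active_information(p_snp_biased, p_observed)
ai_snp_vs_uniform = active_information(p_uniform, p_snp_biased)

print(f"relative to uniform background:    {ai_vs_uniform:.2f} bits")
print(f"relative to SNP-biased background: {ai_vs_snp:.2f} bits")
print(f"SNP bias relative to uniform:      {ai_snp_vs_uniform:.2f} bits")
```

The values add up across backgrounds, so counting SNP biases as part of I_S rather than as part of the baseline moves bits from one side of the ledger to the other without changing anything about the cell.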

T_aquaticus · April 9, 2020, 3:13pm

That’s been my conclusion as well. I read a paper several years ago where there was an increase in mutations within actively transcribed genes:

What I got from that paper is that single-stranded DNA is more prone to mutation than double-stranded DNA. Related to the topic at hand, if a possible beneficial mutation is within a gene that is not being actively transcribed then this mechanism will lower the chances of this mutation occurring. Referring back to Lenski’s experiment, the citrate transporter gene (citT) was not being expressed and would have had a lower mutation rate compared to actively transcribed genes.

There is also a chance of biased mutations resulting in biased conclusions. Since more mutations are occurring in active genes, it is possible for someone to conclude that most beneficial mutations must therefore reside in active genes. It would be a lot of work, but it could be possible to create a library of genes/DNA under the control of a permissive promoter and then screen against a given environmental challenge.

John_Harshman · April 9, 2020, 3:21pm

What, in biology, is a “single query”? What, for that matter, is “a purely random search”?

Rhetorical questions.

sfmatheson · April 9, 2020, 3:28pm

Then there’s the whole fascinating and important topic of mutation showers and (related) kataegis.

Mercer · April 9, 2020, 3:47pm

And if you do that, Bartlett and Eddie will just move the goalposts to something deliberately vague, like a “new body plan.”

Roy · April 9, 2020, 3:55pm

I’m glad I don’t have to answer them, then, because I haven’t got a clue what answer to give.

Roy · April 9, 2020, 3:56pm

Off to TSZ I hop…

Dan_Eastwood · April 9, 2020, 5:24pm

I disagree - it’s muddled even before it gets to biology. See what I wrote at comment #56 (or close to that).

It depends on how long the random search runs. If a deck of 100 cards is sampled with replacement 100 times looking for a single target card, the probability of finding it at least once is about 63%. That probability approaches 1 − 1/e ≈ 0.632 as the number of cards and trials increase together.
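A quick check of that figure, as a sketch: with n cards and n draws with replacement, the chance of hitting a single target at least once is 1 − (1 − 1/n)^n. The function name is made up for this example.

```python
import math

def p_hit_with_replacement(n_cards, n_draws):
    """Probability of drawing one specific card at least once,
    sampling uniformly with replacement."""
    return 1.0 - (1.0 - 1.0 / n_cards) ** n_draws

limit = 1.0 - 1.0 / math.e  # value approached as n grows

for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}: {p_hit_with_replacement(n, n):.4f}")
print(f"limit 1 - 1/e: {limit:.4f}")
```

For 100 cards and 100 draws this gives roughly 0.634, and the value drifts down toward 1 − 1/e as n increases.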

Given a well-behaved convex function as a search space, a deterministic search like Newton-Raphson will always converge. Genetic search will converge too, but will take longer.
I think I could work out probabilities for a “Random” fitness landscape of the sort ID proponents assume, and would expect GA to do as well as any search in that setting (it would find relatively high fitness, but never the overall maximum). But that setting is biologically irrelevant, so it doesn’t seem worth the effort.
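As a rough illustration of the convex case, the sketch below compares Newton-Raphson with a very simple stochastic search on one smooth convex function. Everything here is an assumption made for the example: the test function f(x) = x² + eˣ, the starting point, the tolerance, and the use of a (1+1) evolution strategy as a stand-in for a genetic algorithm (it has mutation and selection but no population or crossover).

```python
import math
import random

def f(x):
    return x * x + math.exp(x)   # smooth and strictly convex

def grad(x):
    return 2.0 * x + math.exp(x)

def hess(x):
    return 2.0 + math.exp(x)     # always positive

def newton_minimize(x0, tol=1e-3, max_iter=100):
    """Newton-Raphson on the gradient; returns (x, iterations used)."""
    x = x0
    for i in range(1, max_iter + 1):
        x = x - grad(x) / hess(x)
        if abs(grad(x)) < tol:
            return x, i
    return x, max_iter

def one_plus_one_es(x0, sigma=0.1, tol=1e-3, max_iter=100_000, seed=0):
    """(1+1) evolution strategy: mutate, keep the child only if it is fitter.
    The gradient is used only as a stopping test, to match Newton's criterion."""
    rng = random.Random(seed)
    x = x0
    for i in range(1, max_iter + 1):
        child = x + rng.gauss(0.0, sigma)
        if f(child) < f(x):
            x = child
        if abs(grad(x)) < tol:
            return x, i
    return x, max_iter

x_newton, newton_iters = newton_minimize(2.0)
x_es, es_iters = one_plus_one_es(2.0)

print(f"Newton-Raphson: x = {x_newton:.4f} after {newton_iters} iterations")
print(f"(1+1)-ES:       x = {x_es:.4f} after {es_iters} iterations")
```

Both reach the same minimum near x ≈ −0.35, with the stochastic search needing many more iterations, which is the behaviour described above for this easy setting.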

Joe_Felsenstein · April 9, 2020, 5:56pm

I think this might have been misspelled, or is maybe a word in Old English. Kats don’t live to a great aeg. We had a kat that reached 22, which is kind of aeould.

sfmatheson · April 9, 2020, 6:02pm

If you had given your kat an aegis they’d have been shielded from death. Ah hindsight.

Dan_Eastwood · April 10, 2020, 5:09pm

Looks like there is not much interest, even at UD.

Tom_English · April 12, 2020, 5:37am

I appreciate your thinking of me (this is my first comment at PS), but I’m not up for deconstructing Bartlett’s article. I’ll comment that referring to sampling as search is just a trick for rendering “obvious” the false notion that the process involves information.

You have often heard of biased sampling, and have never heard of informed sampling. There’s a good reason for that: a sampling process is, in and of itself, absolutely uninformed of the properties of as-yet unsampled elements of the sample space. I’ve posted more at The Skeptical Zone:

Tom_English · April 12, 2020, 5:40am

Oops. You might be interested in the rigorous treatment that I just linked to. But here is the comment at TSZ:

http://theskepticalzone.com/wp/measuring-active-information-in-biological-organisms/comment-page-1/#comment-273459

Dan_Eastwood · April 13, 2020, 3:31am

@Tom Thanks for dropping in. I’ll read your links and follow up at TSZ

swamidass · April 13, 2020, 7:56pm

@Tom_English, that’s a helpful comment, reproduced here:

Yes, I see the Texas Sharpshooter Fallacy in Bartlett’s analysis (left column of page 10 — why no section numbers?) of a particular mutation that occurred in the Long-Term Evolution Experiment.

Consider replacing an n-base gene g with (1) a sequence of n bases drawn uniformly at random or (2) a sequence of n bases obtained by randomly shuffling the exons of gene g. The latter is much more likely to improve fitness than the former. But it doesn’t follow that a process sampling a space of DNA sequences by shuffling the exons of an existing gene is itself informed of the fitnesses associated with novel DNA sequences.

As I have said many times — I’m not interested in going through all of it yet again — sampling processes are biased, not informed. When an event is more likely to occur in one sampling process than in another, that merely indicates a difference in bias of the processes.

The game that ID proponents are playing, when they refer to sampling as search, is to make “obvious” the false notion that the outcome of the process depends on information intrinsic to the process.

johnnyb : As the paper mentions, it does not rely on evolution ontologically being a search to be valid. Only that there is a “problem” (i.e., selection pressure), a “goal” (relieving of selection pressure), and a multitude of potential paths.

A sampling process is never, in and of itself, a search. You are categorically wrong in suggesting otherwise. A sampling process has no information whatsoever of properties of as-yet unsampled elements of the sample space. That is the fundamental reason that there is “no free lunch” for someone who selects a sampling algorithm for use in a search for a solution to a problem. The algorithm-selecting agent possibly exploits information, but in no case is the sampling process described by the selected algorithm informed. I have addressed this matter with great rigor in “Sampling Bias Is Not Information.”

Consider spinning a U.S. penny on its edge, say, on a kitchen countertop. Tails is about 4 times as likely to come up as heads. I can use prior information of the bias in favor of tails to improve the expected value of a bet on the outcome. But it would be ridiculous to say that the process of spinning a penny itself has information relative to the nominally unbiased process of flipping a penny.
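The penny example can be put in numbers. This is a sketch under two assumptions: "about 4 times as likely" is read as P(tails) = 0.8 for a spun penny, and "active information" is computed as the log-ratio of success probabilities, with "tails" as the target and flipping as the unbiased baseline.

```python
import math

p_tails_flip = 0.5   # nominally unbiased flip
p_tails_spin = 0.8   # assumed: tails 4 times as likely as heads when spun

# What the log-ratio formula would report for "spinning relative to flipping".
bits = math.log2(p_tails_spin / p_tails_flip)

# Expected net gain per unit staked on an even-money bet on tails.
ev_flip = p_tails_flip * 1 - (1 - p_tails_flip) * 1
ev_spin = p_tails_spin * 1 - (1 - p_tails_spin) * 1

print(f"formula output for spinning vs flipping: {bits:.3f} bits")
print(f"expected gain betting tails: flip {ev_flip:+.2f}, spin {ev_spin:+.2f}")
```

The formula assigns about 0.68 bits to the spin, yet the only party using information is the bettor who knows about the bias. The spinning coin is merely biased.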
