Thanks for chiming in @Kirk. Great to see you, even if it is for a moment.
Again, my purpose is not to measure the full content of FI in a protein. I agree that it is possible to underestimate the full content of FI, but we can have a good idea of the component revealed by sequence conservation from that point on. That’s what I do, that’s what I get as a result, and that’s what I use for my inferences. I have never made reference to the full content of FI. My purpose is to demonstrate the presence, at a certain point in evolutionary history, of a definite quantity of FI that has a sequence configuration that will be conserved up to humans. This should be clear by now. Either you agree that my methodology does that, or you don’t. Feel free to decide, and, if you want, to explain why. But there is no sense in requiring from my methodology what it has never tried to measure.
As for overestimating the change in FI, again I have never tried to estimate the absolute change in the full content of FI. That should be clear from what I have written. I quote myself:
That should be clear enough, but still you insist about the danger of overestimating the change in FI, when I have never tried to do that.
Maybe you are confused by the fact that I speak of information jumps. But, you see, my term has always been “information jumps in human conserved sequence similarity”. It’s not a jump in the full content of FI, as I clearly explain in the above quote.
IOWs, when I say that CARD11 shows an information jump of 1250 bits at the transition to vertebrates, I simply mean that 1250 bits of new FI that is similar to the form observed today in humans appear at that transition. It is a jump, because new specific sequence information arises that was not there before. But I have never said that the total FI was lower before. I simply don’t measure it, because my methodology cannot do that.
And this is it. You think as you like, but at least try to understand what I say and what I am doing. Or just don’t try, if you prefer so.
Overestimate is the opposite of underestimate. We are saying you are wildly overestimating FI, not underestimating it.
OK, I will try to be simple and clear.
I am here to discuss a methodology that can, I believe, give important indications about a certain type of FI in proteins (the one that can be revealed by long sequence conservation), its appearance at certain evolutionary times, its different behaviour in different proteins. And that can give a good idea, by establishing a reliable lower threshold of new FI appearing at certain steps, of how big the functional content of many proteins is. These data are very interesting, in my opinion, to support a design inference in many cases. This is my purpose, and nothing else.
Now, it may be that a phylogenetic analysis could do that better. Or maybe not. I don’t know, and I certainly cannot perform a phylogenetic analysis now. I am not aware of phylogenetic analyses that are centered on the concept of FI as formulated in ID, least of all on design detection. So, I have my doubts.
However, I am not here to perform a phylogenetic analysis, I am here only to explain and defend my ideas, and I try to do exactly that.
So, again, I am convinced that my methodology is a good estimator of that part of FI which is connected to sequence conservation, for example from cartilaginous fish to humans, and that appears in vertebrates. The same procedure can also be applied to other contexts, of course.
I have received, from you and others, a few recurring criticisms that are simply not true or not pertinent. Here are a couple of examples:
Wrong. My estimate of FI is, rather, Total FI - functional divergence (FI not conserved up to humans). Therefore, as stated also by glipsnort, my estimate is underestimating FI, not overestimating it. Moreover, NE has nothing to do with this.
Why? And what do you mean by NE? Do you mean NV and ND? Why should that “be mistaken” as FI gain? By my procedure? There is absolutely no reason for that. Neutral variation is the cause of divergence in non functional sequences. Why should it be mistaken as FI gain by a procedure based on sequence conservation? I really don’t understand what you mean.
And so on. How can we discuss with such basic misunderstandings repeated so many times, and without any real explanation of what is meant?
Did you misread @glipsnort?
He is saying that it is possible, even likely, to underestimate the total FI present (true) while overestimating the change in FI.
And have you read my answer to him? I quote myself:
My procedure cannot overestimate FI, only underestimate it. My estimate of the change is only an estimate of the change (jump) in human conserved similarity. It is not, and never has been said to be, a measure of the change in total FI.
OK, I was answering to your very disappointing post about the starry sky, but for some strange reason I have lost all that I had written. Maybe it is better. Now I am tired. Tomorrow I will see if I really want to say those things.
This is bad. Frankly, I would probably not even answer this kind of argument, if it did not come from you.
I don’t know if you really believe that the stars in the sky exhibit high values of FI, or if you are only provoking (without any good reason, IMO).
If you really believe that, there is probably no purpose in continuing any discussion about FI.
If you are provoking, it’s not a good sign just the same.
However, here is a brief answer.
The simple rule I have described (and which is rather obvious in any possible serious discussion about FI) is that we cannot use the observed bits to define the function. The function must be defined independently from the knowledge of the observed bits.
So: “a configuration of stars that favors storytelling” is valid. But probably almost all possible configurations would do that.
While “a configuration of stars where the first has these celestial coordinates, the second these other ones”, and so on for all 9000 visible stars, is not valid.
So, a binary number of 100 digits is a good definition. And, of course, has no relevant FI.
A binary number that is 00110100… is not a valid definition. It can be used only as a pre-specification.
This is the rule, and I have never broken it. I have never defined a protein function that says: “a protein with the following sequence: …” I have always used for proteins the function described in Uniprot for the observed protein, or something like that. IOWs, a protein which can do this and that. Never: “a protein with this sequence”.
But you say: no, I want the stars that must have exactly the position that we know. That is breaking the rules. You are using the bits. I have never done that.
"It seems you are defining function by the already observed configuration of proteins in extant biology. "
Not at all. That statement is unfair, wrong and confounding.
I am always defining function as what a protein can do. I am using observed configurations, in a precisely described way and according to well explained assumptions, only to estimate FI in proteins, not to define function. You are equivocating, and rather badly.
You raise the problem of other sequences that could implement the function. But my methodology is aimed exactly at that: having an estimate of the target space. If you do the math, you will see that the estimates of the target space in my results are very big.
Of course, there is always the problem of possible alternative solutions, similarly complex, but completely different. Those cannot easily be anticipated. They certainly exist, in some measure.
That is a completely different problem. It has nothing to do with the definition of the function, but rather with the estimate.
I have discussed that problem in detail in the past. You will find a long discussion about that in this OP and in the following thread:
Look at the part about clocks.
Okay, can you clarify how you implemented this rule in your analysis?
In my discussion about the relationship between FI and the design inference (that has nothing to do with my methodology to estimate FI) I have given a clear example of FI in language and how to measure it. Please, refer to that. The Shakespeare sonnet. You will find many possible functional definitions for the sonnet, each of them implying different levels of FI. The bits in the sonnet (the sequence of letters) are of course never used in any definition.
In my procedure to estimate FI in proteins, the function is not defined (it is supposed to be the one described in Uniprot, if and when known). The estimate is based on conservation, which is an indicator of functional constraint, but does not tell us what the function is.
Those are two different things.
I am going to sleep. Tomorrow we can go on.
OK, now after some rest, let me go back to the starry sky example. I will show how it should be treated in terms of FI.
We have a system where 9000 stars can have an independent position in the sky. Also, they can have different brightness.
We define a function: that the stars can help orientation and navigation.
Let’s assume that the position of the stars is a random configuration. We have a system with a very big search space. A lot of possible configurations, considering both position and brightness.
The right question, from the point of view of FI, is: how many of the possible configurations would satisfy the independently defined function? How big is the target space? Is it an infinitesimal fraction of the search space (high FI), or rather a big part of it?
The answer here, while difficult to compute in detail, is easy enough in principle: the target space is almost as big as the search space. Therefore, FI is really trivial, almost zero.
Because, of course, almost all the possible random configurations of position and brightness can help orientation and navigation.
Not all of them, however.
Very ordered configurations, those where all the stars are more or less equally distributed in the sky, and brightness is more or less the same for all of them, would not help orientation and navigation. We would just see a sky that is the same everywhere, and does not allow us to get information about earth rotation and our position.
But of course, those highly ordered configurations are really, really rare in the search space.
This is an interesting example, and I thank you for providing it, because it is a case where order does not satisfy the function, while randomness does. Unfortunately (for your argument), the FI linked to such a system is almost zero.
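The target-to-search ratio invoked in this example can be put into numbers with a minimal sketch. It assumes FI is taken as the negative base-2 logarithm of that ratio, which is how the bit values in this thread are usually read. The function name and both sets of numbers are hypothetical, chosen only to contrast a target that fills almost the whole search space with one that is an infinitesimal part of it. Nothing here is computed from real star positions.

```python
import math


def functional_information_bits(target_size, search_size):
    """Return FI in bits, taken here as -log2(target_size / search_size)."""
    if target_size <= 0 or target_size > search_size:
        raise ValueError("need 0 < target_size <= search_size")
    # Subtracting logs avoids dividing two astronomically large integers.
    return math.log2(search_size) - math.log2(target_size)


# Hypothetical case 1: 99.9% of all configurations satisfy the function.
fi_common = functional_information_bits(target_size=999, search_size=1000)

# Hypothetical case 2: one configuration in 2**100 satisfies the function.
fi_rare = functional_information_bits(target_size=1, search_size=2**100)

print(f"common target: {fi_common:.4f} bits")
print(f"rare target:   {fi_rare:.1f} bits")
```

When the target covers nearly the whole search space the result is a small fraction of one bit, which is the sense in which the FI of the navigation function is "almost zero".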
I think this is a good answer for your other examples too, but if you believe that they have different merits, please explain why.
Yes, there is distinction between macrostates and microstates of a system.
I actually agree with you in much of this analysis, but you are missing the point. I’m applying the rules that you laid out, and it leads to this problem.
You are making a parallel mistake in your analysis of proteins. I made the parallel mistake to make this point.
In an honest attempt to enable understanding, meaning no disrespect, can you see where I’m drawing the parallel? You don’t have to ultimately agree with me that the parallel is valid. It would be helpful though if you could articulate the parallels in your analysis. Perhaps if we got there, we could have a sensible conversation about the ultimate validity of this object lesson.
Your comments here make sense to me for some definitions of FI, but I don’t see how they work in the context of your stated definition. If FI is purely a measure of the ratio of target space to search space, what does “specific sequence information” mean? Or 1250 bits of new FI? The function of the protein is unchanged, as are the target and search spaces. How do you decompose the ratio into old and new FI?
I’m pretty sure he is using an in clade control to do this. The in clade minus the out clade gives him the change.
Of course there are issues with this, but I think that is his strategy.
Now I come to your tornado. It is an interesting example, because, even if I am no expert in meteorology, it is probably one of those cases where some form of order comes out of a system including events that can be easily explained in terms of necessity laws, random events and chaotic components. I think there are many examples like that. None of them is designed, of course.
But do they exhibit high FI? The answer is no.
I will try to explain how the system should be considered in terms of FI, even if I am not a meteorologist. Of course, you are free to stick to your analysis in terms of water molecules, but I cannot agree.
Well, here the system is our planet, and its meteorologic phenomena. I think we can agree on that.
What type of system is it? IOWs how can we describe and analyze meteorologic phenomena?
I think we can agree that many of those events follow, more or less precisely, some well known laws, derived of course from the laws of physics applied to this particular system. That’s why many events can be more or less anticipated. Weather forecasts are everywhere, and I would say that at present they are often rather good.
So, that part is a necessity system, more or less precise. No FI in that.
But of course, not all can be anticipated. Least of all, I think, tornadoes. Probably, necessity laws act on random components and chaotic components to generate a tornado. I don’t know, I am not a meteorologist.
So, being not a meteorologist, I paste here some explanation taken somewhere on the web, hoping it is not so bad:
What causes tornadoes?
Tornadoes form in unusually violent thunderstorms when there is sufficient (1) instability and (2) wind shear present in the lower atmosphere.
Instability refers to unusually warm and humid conditions in the lower atmosphere, and possibly cooler than usual conditions in the upper atmosphere. Wind shear in this case refers to the wind direction changing, and the wind speed increasing, with height. An example would be a southerly wind of 15 mph at the surface, changing to a southwesterly or westerly wind of 50 mph at 5,000 feet altitude.
This kind of wind shear and instability usually exists only ahead of a cold front and low pressure system. The intense spinning of a tornado is partly the result of the updrafts and downdrafts in the thunderstorm (caused by the unstable air) interacting with the wind shear, resulting in a tilting of the wind shear to form an upright tornado vortex. Helping the process along, cyclonically flowing air around the cyclone, already slowly spinning in a counter-clockwise direction (in the Northern Hemisphere), converges inward toward the thunderstorm, causing it to spin faster. This is the same process that causes an ice skater to spin faster when she pulls her arms in toward her body.
Other processes can enhance the chances for tornado formation. For instance, dry air in the middle atmosphere can be rapidly cooled by rain in the thunderstorm, strengthening the downdrafts that assist in tornado formation. Notice that in virtually every picture you see of a tornado the tornado has formed on the boundary between dark clouds (the storm updraft region) and bright clouds (the storm downdraft region), evidence of the importance of updrafts and downdrafts to tornado formation.
Also, an isolated strong thunderstorm just ahead of a squall line that then merges with the squall line often becomes tornadic; isolated storms are more likely to form tornadoes than squall lines, since an isolated storm can form a more symmetric flow pattern around it, and the isolated storm also has less competition for the unstable air which fuels the storm than if it were part of a solid line (squall line) of storms.
Because both instability and wind shear are necessary for tornado formation, sometimes weak tornadoes can occur when the wind shear conditions are strong, but the atmosphere is not very unstable. For instance, this sometimes happens in California in the winter when a strong low pressure system comes ashore. Similarly, weak tornadoes can occur when the airmass is very unstable, but has little wind shear. For instance, Florida – which reports more tornadoes than any other state in the U.S. – has many weaker tornadoes of this variety. Of course, the most violent tornadoes occur when both strong instability and strong wind shear are present, which in the U.S. occurs in the middle part of the country during the spring, and to a lesser extent during fall.
Contrary to popular opinion, tornadoes have not increased in recent years.
OK, so I would say that there is some good understanding of the conditions that generate a tornado. In general, we can say that some specific configurations of the basic components of the weather (distribution of winds, temperatures, pressures, and so on) allow tornadoes to be generated. So, there is nothing mysterious in the process. It is well understood, even if its mathematical and empirical treatment is certainly difficult. Some configurations of weather conditions lead to tornadoes.
It’s those configurations that we must consider, not the configurations of individual molecules of water. The basic components of weather follow, more or less, precise necessity laws, and of course molecules of water follow those laws too. The random-stochastic component, the only one that generates specific configurations that can act as “configurable switches”, is caused by the complex interaction of those necessity laws.
So, the correct question in terms of FI is: how many weather configurations (target space) lead to a tornado, in the search space of all possible weather configurations?
I am certainly the last person that can solve such a problem quantitatively, but I believe that it can be solved in principle. There is nothing mysterious here.
Now, tornadoes are not too common (luckily), but they are certainly not exceedingly rare. I suppose therefore that, analyzing the space of configurations above mentioned, it will not be difficult to show that the configurations that lead to a tornado are not exceedingly rare. I cannot make that kind of analysis, unfortunately. Can you?
So, I am rather confident that tornadoes, like many manifestations of necessity acting on random and chaotic components, are certainly fascinating, but can be perfectly explained in terms of the physics of this system. And the configurations that lead to them are a non trivial part of the space of configurations.
IOWs, FI is low, and there is absolutely no reason to infer design.
It seems that I don’t understand the point. Please, don’t be too confident in my intelligence. Could you please explain better what you think?
My understanding of your writings here is that you have calculated/estimated a probability (referring to “probabilistic barriers” and “probabilistic resources of the known universe”) that a particular outcome could come about. That is the probability I was referring to. Until we know more about that outcome, most importantly whether it could have gone in other ways, the probability of that particular outcome is meaningless. This is a tired old topic in discussions of design and I know you are aware of it. But to reiterate, the information we need is:
- How many potential outcomes were there? In the case of a protein, this would include all the different ways to build that protein, all the different ways to make any protein that would carry out that function, and all the different ways to achieve the goal of that function.
[Side note: you may notice that I am using design-ish language here (“to make”, “to achieve”) because I don’t think there is anything wrong or unscientific about talking design-ishly.]
- By what process are we pursuing these outcomes? Any probability calculation that assumes a single-step flying-together of amino acids is, at this point, flat-out dishonest. So, while calculating probability, are we considering the fact that an evolutionary “walk” is almost as far from a random flying-together as one might get? Whenever I hear “probabilistic resources of the universe” I find those kinds of bogus “calculations.” If your metric gives us that kind of information, then it’s beyond worthless. My sense is that the metric tells us about conservation of protein structure, and nothing more than that, and that it is therefore too incomplete to do anything interesting.
Without the information above, your metric is a fancy version of the sharpshooter fallacy. More precisely, the metric is just a measure of an outcome, but it’s not a measure of how that outcome could have come about, and it is not even the beginning of a measure of how likely that was.
This has been explained repeatedly to you. The context is a negative control, which is a basic scientific concept. I see that in the 2 days since you posted this, others have attempted to get you to see that you have not shown that you are not dramatically overestimating “FI”.
Here is my final response to you. What you are getting on this forum and in these threads is an extraordinary opportunity that is, for good reasons, rare. You are getting peer review of your ideas, by some of the world’s top experts, without even submitting a manuscript. Moreover, you are doing this as an anonymous commenter, while your reviewers are identified by name and affiliation, in public. To your credit, you have provided (for the most part) clear explanation of your methods and your raw data. But so far* you seem to have failed to take any of the scientific criticism to heart. You have been given a chance to have your ideas vetted, but you are not participating in the review process. To call this a missed opportunity is to seriously understate the nature of this conversation.
*I have read only some of the conversation over the last day or so. Perhaps you have responded positively to critiques and are now working on revisions. This, of course, is the essence of the peer review process.
No need to respond unless you have new analysis or substantive responses to the specific critiques that we have provided you. It should be obvious that the metric itself is not making a favorable impression on the people who would know if it were valuable.
I agree with @sfmatheson on this.
The way you are defining and measuring FI in proteins is closely analogous to my definition of the storytelling function of stars.
Your objection, which I believe is correct, is that my definition of function is too tightly linked to the microstate.
Is this a valid objection? Maybe and maybe not. It depends on whether it really is valid to set the microstate as a goal, and whether the details of the appearance of the Big Dipper or the North Star (for example) are important for me. For some questions, these are certainly valid features to expect. For others, maybe not. However, your point is that defining function as too linked to the granular microstate (specific stories and constellations) will give a wholly different answer than a function linked to the macrostate (any stories and any constellations).
Similarly, your analysis, whether you intend it or not, is too linked to the microstate to make a case for design. We have all been explaining this to you in different ways. There is no reason we need to explain the precise (or approximate) microstate of protein sequences, because we have good reason to think there is a very very large number of potential microstate configurations that would produce that same high level function. You cannot use the analysis done on the microstate to justify claims about FI in the macrostate.
Now, it is possible that there are claims you could justify from your analysis if it were more precisely bounded. For example, most of us would agree (barring God’s Providence and complete determinism) that the specific proteins we see would be totally different if we were to “rewind the tape” and replay life’s history. Maybe vertebrates would not even evolve! If that is your claim, we already agree with you.
However you are arguing for design from this. It does not follow.
Just to list out other issues:
You still have not given a valid methodology to separate out the contribution of neutral evolution (NE) to your FI measure.
You still need to get a time denominator, an estimate of the time during which the vertebrate transition unfolds, to compute the rate (and correct for neutral evolution).
You still need to demonstrate that the vertebrate transition could not have taken place by accruing FI in a gradual process over time.
We have left hanging explication of several negative controls. For example cancer could be really helpful if we can get on the same page here. We see FI gains greater than 500 bits in cancer evolution. I’ve already linked to the analysis.
I wonder if it would help for me to write out your methodology for you to review, just to be sure I did not miss anything salient. Would that help you?
Please, look also at my comment #121 to Swamidass.
The point is: one thing is the definition of functional information, another thing is my strategy (or anyone else’s) to get an indirect estimate of it.
The definition is what it is. A direct method to measure the target space (the thing we really don’t know) would be to synthesize all possible sequences in the search space, and test each one in the lab for the defined function. Of course that is not possible, and never will be.
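A toy version of that direct method may make the scale problem concrete. The sketch below is only an illustration under loud assumptions: the 4-letter alphabet, the length of 6, and the `is_functional` motif test are all hypothetical stand-ins, not anything taken from real proteins or from a real assay. It enumerates every sequence, counts the ones that pass the test, and then shows how large the real search space becomes for 20 amino acids at an illustrative length of 500 residues.

```python
import itertools
import math

ALPHABET = "ACDE"   # hypothetical 4-letter alphabet, not the 20 real amino acids
LENGTH = 6          # hypothetical toy sequence length


def is_functional(seq):
    """Hypothetical stand-in for a lab assay: the toy motif must be present."""
    return "ACE" in seq


def measure_target_space(alphabet, length, test):
    """Enumerate the whole search space and count sequences passing the test."""
    search = 0
    target = 0
    for letters in itertools.product(alphabet, repeat=length):
        search += 1
        if test("".join(letters)):
            target += 1
    return target, search


target, search = measure_target_space(ALPHABET, LENGTH, is_functional)
fi_bits = math.log2(search) - math.log2(target)
print(f"search space: {search}, target space: {target}, FI: {fi_bits:.2f} bits")

# The same enumeration for 20 amino acids and 500 positions would need
# 20**500 tests; the number of decimal digits in that count is printed here.
print("decimal digits in 20**500:", len(str(20**500)))
```

The toy case finishes instantly because the search space has only a few thousand members. The digit count in the last line is the reason exhaustive testing is out of reach for real proteins.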
Another semi-direct way would be to have such a good understanding of the sequence - function relationship in proteins as to be able to compute the target space. That is promising, but I believe that we are still very far away from that.
So, we have to use indirect estimates of FI. My procedure is exactly that.
Being based on long conservation of sequence, the interesting thing is that we estimate FI without any explicit reference to the function.
IOWs, we have a protein that certainly has some function in its context. We trace the appearance of some new sequence specificity at some definite evolutionary step, and we classify that new sequence specificity as highly functionally constrained, because we observe that, after its first appearance, it is conserved for 400+ million years.
So, we have an indirect estimate of the FI in that specific sequence in that specific context, even if we have not defined the function explicitly. Of course, we can have a good idea of what the function is just by looking up the protein at Uniprot. At least for many proteins.
Now, to answer your questions. Let’s say we have a protein which, in pre-vertebrates, has low sequence similarity with the human form.
There are different possibilities. For example, in the case of CARD11, as you can see in the graph I have posted, the protein probably did not exist in pre-vertebrates. The bitscore is extremely low, probably just background noise, or just some limited domain similarity in some part of the long molecule. That is perfectly consistent with the protein being involved in the immune system, which as we know appears in jawed fishes. So, this is probably a new vertebrate protein.
As such, the explanation is rather straightforward. The protein appears in cartilaginous fishes, and right from the beginning of its existence it has already more than half of its potential FI (about 1.3 baa).
The remaining history of the protein in vertebrates, as can be seen in the graph, is not very interesting. There seems to be another minor adjustment in reptiles. Maybe, but it is difficult to be sure. The rest is compatible with passive conservation, increasing as the evolutionary distance decreases.
So, the history of this protein is clear enough: it exhibits a major engineering (be patient, let’s say it could be by design or by neo-darwinist mechanisms, for the moment) at the beginning of its existence, and then not much happens. Except maybe at the transition to reptiles.
Now, let’s say that we have, instead, a protein that already exists in pre-vertebrates. Let’s call it protein A. Let’s say that it has a value of human conserved sequence similarity, in pre-vertebrates, of 0.7 baa. That is already something. Let’s say the protein is 500 AA long. We have already a major similarity here, and reasonably a bitscore of a few hundred bits.
Now, the same protein jumps in cartilaginous fishes to 1.5 baa, presenting a jump of 0.8 baa. So, a few hundred bits of new human conserved sequence similarity. Of new FI.
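The arithmetic for protein A can be written out as a short sketch. It assumes that "baa" means bits per amino acid, so that multiplying by the protein length gives the bitscore in bits. The 0.7 baa, 1.5 baa and 500 AA figures are the ones given for the hypothetical protein A; the function name is made up for illustration.

```python
def bits_from_baa(baa, length_aa):
    """Convert bits per amino acid (baa) into total bits for a given length."""
    return baa * length_aa


LENGTH_AA = 500          # length of the hypothetical protein A
BAA_PRE_VERTEBRATES = 0.7
BAA_CARTILAGINOUS_FISH = 1.5

bits_before = bits_from_baa(BAA_PRE_VERTEBRATES, LENGTH_AA)
bits_after = bits_from_baa(BAA_CARTILAGINOUS_FISH, LENGTH_AA)
jump_bits = bits_after - bits_before

print(f"before the transition: {bits_before:.0f} bits")
print(f"after the transition:  {bits_after:.0f} bits")
print(f"jump:                  {jump_bits:.0f} bits")
```

Under that reading, 0.7 baa over 500 AA is about 350 bits and the 0.8 baa jump is about 400 bits, which matches the "few hundred bits" in both statements. The jump is a difference in human conserved similarity only, not a measure of the change in total FI.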
What does it mean?
We already know that we are not measuring total FI, nor total FI change. The protein in pre-vertebrates could well have higher total FI, or less, or the same. We don’t know, because my methodology cannot detect FI which is not linked to long sequence conservation. IOWs, what I have called functional divergence.
So, we just stick to the new FI that appears at the transition and is then conserved: those 0.8 baa. What is their meaning?
The reasonable answer is that the protein function undergoes a major adaptation at the transition to vertebrates. Maybe the basic function, the basic structure, remain the same. Maybe the total FI remains the same. As said, that we don’t know.
But the appearance of such a big new component of FI in vertebrates means that now the protein does the same things in a different context, or that it does some new things. That is perfectly compatible with what we know about the big changes that happen in new classes of organisms, especially at regulation level, and in protein networks that control transcription or other major pathways. TFs, for example, as already discussed, often retain their DBDs, but change the rest of their sequences. And acquire new functions, or differently tailored functions.
I hope this answers your questions.
It is likely most of us agree with this, though I’m not sure you’ve demonstrated this rigorously. We even linked to an article doing a similar analysis for the Cambrian explosion.
How exactly do you get to “design” from here?
The magnitude of the jump in FI is not a problem.
I certainly am.
There are two different aspects.
The probabilistic resources of our biological world can definitely be computed, at least as a generous upper threshold. That is what I have done in my OP about that issue:
You will see in the first Table there that I give a very generous estimate of 140 bits as the limit (for bacteria). That means that, at a very generous maximum, only 2^140 different states could have been reached and tested in 5 billion years on our planet.
Then there is the problem of evaluating the target space. That is more difficult, and there are objective problems. Look also at my comment #133.
I am very confident that my procedure is very good to estimate a specific form of FI, linked to long evolutionary conservation of sequence. Of course, we must address some potential difficulties, and you list some of them: alternative solutions, and so on.
Indeed, I have discussed those things many times. Of course, I cannot address everything here and all at the same time. I have given links, but probably nobody here has the time to check them. IOWs, I am human, and you are too.
But I cannot understand why, every time there is some aspect that I have not yet discussed, you all draw final conclusions on what I think or am doing. That is wrong. I am here to answer your questions, when they are good questions.
I give you a link here of an OP where I have discussed many of the things you mention:
This is one of the things I discuss in that OP. Possible alternative solutions. In brief, if there were many other complex alternative solutions (and probably there are), the computation of FI, at the levels we are considering, would change very little. See the section about clocks in the mentioned OP.
If there were much simpler solutions, we would definitely observe those ones, and not the complex solution we observe.
That’s perfectly fine.
No, here you don’t understand my point. I have never said that there is a single step transition. Sometimes I really think many of you believe I am a complete fool. Maybe, maybe not. Not in this case.
Of course the transition happens in many steps. That’s why I have clearly said that the best model is a random walk from some unrelated sequence state.
And the concept of probabilistic resources has exactly that meaning: how many attempts can the system make? How many steps are allowed in the random walk?
The probability of finding an unrelated state by a random walk, let’s say by 2^100 steps, is practically the same as the probability of finding that same target by a random search in the same number of attempts. The two systems differ in the initial steps, of course, but with a big number of steps there is no great difference. It is just related to the ratio between target space and search space, and to the number of attempts/steps allowed to the system.
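The dependence on the ratio and on the number of attempts can be sketched for the random search case only. This is a minimal illustration, assuming independent uniform draws: it does not model a random walk (the claim that the two behave alike for many steps is the poster's, not something the code shows), and it leaves out natural selection entirely. The function name is hypothetical. The 140-bit figure is the resource limit quoted in this thread; the 100-bit and 500-bit FI values are illustrative.

```python
import math


def prob_at_least_one_hit(fi_bits, log2_attempts):
    """P(at least one success) in 2**log2_attempts independent uniform draws,
    when a single draw hits the target with probability 2**-fi_bits."""
    p_single = 2.0 ** (-fi_bits)
    attempts = 2.0 ** log2_attempts
    # 1 - (1 - p)**N, computed with log1p/expm1 so a tiny p does not round to 0.
    return -math.expm1(attempts * math.log1p(-p_single))


# Illustrative FI values against the 2**140 attempts quoted in the thread.
p_low_fi = prob_at_least_one_hit(fi_bits=100, log2_attempts=140)
p_high_fi = prob_at_least_one_hit(fi_bits=500, log2_attempts=140)

print(f"FI = 100 bits: P = {p_low_fi:.6f}")
print(f"FI = 500 bits: P = 2^{math.log2(p_high_fi):.1f}")
```

When the FI is well below the number of bits of attempts, a hit is practically certain. When it is well above, the probability is close to the number of attempts times the single-draw probability, that is, about 2 to the power of (attempt bits minus FI bits).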
And of course, one thing should be clear. All probabilistic evaluations refer only to the RV part of the neo-darwinian mechanism, which includes neutral drift. The NS part must be evaluated separately and differently. And I have not even begun to do that here.
Finally, my contribution here is not aimed to publish a paper. So, I do not consider it as some form of peer review.
It is, instead, aimed at intellectual confrontation about a very important paradigm difference: design against neo-darwinism to explain biological functions.
In that sense, it is much more precious than a peer review to me. But for very different reasons.