I am trusting my own mathematical background also.

I am not talking about FSC; I am talking about FI. If all sequences do the job, there is 0 FI by definition.

You are shifting the goal posts.

We are talking precisely about Durston’s method.

You may be right.

So the question at hand is: does FSC = FI? Do you think they are different? The math looks the same to me.

The math looks very different to me.

FI = -log2 [N/W]

Math for FSC:

$$H(X_f(t)) = -\sum P(X_f(t)) \log P(X_f(t)) \tag{1}$$
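To make the entropy formula concrete, here is a minimal sketch of equation (1) applied to a single aligned site. The function name `site_entropy`, the toy columns, and the choice of log base 2 (bits) are assumptions for illustration; the equation as quoted does not specify a base. Note that the probabilities come only from the sequences in the alignment.

```python
import math
from collections import Counter

def site_entropy(column):
    """Shannon entropy (in bits) of the residues observed at one aligned site.

    `column` is a string or list holding one residue per aligned sequence.
    Log base 2 is an assumption; the quoted equation leaves the base open.
    """
    total = len(column)
    if total == 0:
        raise ValueError("column must contain at least one residue")
    counts = Counter(column)
    # -sum(p * log2(p)) written as sum(p * log2(1/p)) to avoid returning -0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Hypothetical toy columns, not real alignment data
conserved = site_entropy("AAAA")   # every sequence has the same residue
variable = site_entropy("AACC")    # two residues, equally frequent
```

A fully conserved site gives zero entropy, and a site split evenly between two residues gives one bit. The result depends entirely on which sequences are in the sample.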

1 Like

Does Xf = N/W?

No. Durston defines Xf here:

For FI = -log2 [N/W]

N is the number of protein sequences that will produce a specific function and W is the number of possible protein sequences under consideration. The number of known sequences that produce a function is obviously not the total number of possible protein sequences that will produce a specific function. The two are not equivalent.
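As a rough sketch of why the distinction matters, the snippet below evaluates FI = -log2 [N/W] for a made-up sequence space. All the numbers (`W`, `n_true`, `n_known`) and the function name are hypothetical, chosen only to show the direction of the effect, not taken from any real protein family.

```python
import math

def functional_information(n_functional, n_total):
    """FI = -log2(N / W) in bits, computed as log2(W) - log2(N).

    Using the difference of logs keeps the calculation stable when
    W is an astronomically large integer.
    """
    if not 0 < n_functional <= n_total:
        raise ValueError("need 0 < N <= W")
    return math.log2(n_total) - math.log2(n_functional)

# Hypothetical numbers for illustration only
W = 20 ** 10          # all sequences of length 10 over 20 amino acids
n_true = 20 ** 4      # assumed total number of functional sequences
n_known = 20 ** 2     # assumed number of functional sequences actually observed

fi_true = functional_information(n_true, W)     # about 25.9 bits
fi_known = functional_information(n_known, W)   # about 34.6 bits
fi_all = functional_information(W, W)           # 0 bits: every sequence works
```

Substituting the known count for N can only raise the result, since the known sequences are a subset of the functional ones. When every sequence is functional (N = W), FI is zero.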

1 Like

The entire set of aligned sequences that satisfies that function, therefore, constitutes the outcomes of Xf.

So it looks like Xf = N.

Do you agree?

That is false. Xf cannot be all possible sequences that produce a specific function. Xf is only a subset of all protein sequences that produce a specific function.

2 Likes

@colewd. This equality is a conjecture by Durston that he never tests, nor can he prove it. The left side of the equation is meant to refer to FI, but the right side is FSC. He is conjecturing here that FSC = FI, but he does not demonstrate that this is true.

In fact, we find the opposite: FSC does not measure FI, as I explained above. Just stating a conjecture in mathematical form does not prove it true. It is still a conjecture until it is validated. Durston never even tries to validate it or to prove it. One of my graduate students did, and showed it is false. He is making an error very similar to @EricMH.

1 Like

What do you think N is?

All of the protein sequences that will produce a specific function, both known and unknown.

2 Likes

Is that how he defined it? Do you think this might be an estimate of N?

The entire set of aligned sequences that satisfies that function, therefore, constitutes the outcomes of Xf. Here

Durston defined Xf by the number of known protein sequences found in existing species. Durston did not do any experiments to determine what the overall sequence space was for a specific function. Therefore, his definition necessarily misses all of the protein sequences with that function which are not found in existing species.

1 Like

This is from his paper.

I disagree. I think this is what his estimation of AA substitutability was all about.

He does not have the entire set of sequences that have that function. That is our point. He only has the entire set of **known** sequences.

He did not do any experiments. I know because I’ve asked him directly about this.

1 Like

Experiments of the Axe flavor, I agree.

He is making an estimate of the substitutability of AA based on sequence data. He is trying to estimate the total number of functional sequences, which is the point I am trying to make with T. We will never be able to search all possible sequences, as I know you realize.

Even Axe’s experiment falls well short. He barely modified an existing protein. For a full test you would need to look at a random library of proteins that have a minimum of 400 amino acids each.

1 Like

He did not even do simulation experiments that were valid.

I did. My graduate student did. We falsified his conjecture.

1 Like