Design and Nested Hierarchies

Fixed it for you, Bill. :slightly_smiling_face:

This is what we are down to. I see it as an important claim for science to acknowledge. I think physics may hit this same wall soon, as physicists are now modeling atoms computationally. The empirical evidence eventually leading to mind as a mechanistic explanation for the universe is fascinating.

It is really no more vague than claiming matter curves spacetime.

Oh dear. Bill Cole says scientists have been doing science wrong for 200 years.

Whatever will science do now? :wink:

Data can’t mutate. DNA does.

However, yes, there is functional constraint, but there is no correlation between that and functional information. I’m glad we seem to agree that @gpuccio’s assumption is objectively false.

There is correlation by definition. If less than 100% of the possible sequences perform their specific function, then there is functional information. This is what functional constraint is telling us.
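For anyone following the definitional point: one commonly cited definition (Hazen and colleagues) takes functional information to be the negative base-2 logarithm of the fraction of possible sequences that perform the function. Here is a minimal sketch of that arithmetic. The function name and the counts are hypothetical and chosen only for illustration; they are not measurements of any real protein.

```python
import math

def functional_information(n_functional, n_total):
    """Functional information in bits: -log2(n_functional / n_total).

    Computed as a difference of logs so that very large integer counts
    (e.g. 20**150 possible sequences) do not overflow a float.
    """
    if not 0 < n_functional <= n_total:
        raise ValueError("need 0 < n_functional <= n_total")
    return math.log2(n_total) - math.log2(n_functional)

# Hypothetical counts for a 3-residue peptide (20**3 = 8000 possible sequences).
all_functional = functional_information(8000, 8000)   # every sequence works
one_in_eight = functional_information(1000, 8000)     # 1 in 8 sequences works

print(all_functional)
print(one_in_eight)
```

If every sequence is functional the result is zero bits, and the value grows as the functional fraction shrinks. This only restates the definition; it does not by itself say how that fraction should be estimated from an alignment.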

All this being said I think your suggestion for more refined correlation work is a good one.

Hi Bill,

So what you have described and what everyone else, including me, has described are completely different things. Rather than get upset, I’d like to provide a detailed enough explanation that you will be able to understand why they are completely different.

Before I spend time on it, though, I just want to make sure you feel like you’re in the right mindset. Sometimes when I get into a “robust” discussion, I get stubborn and stop listening to what others are saying. I’m not asserting that you’re in that mindset right now. If you feel like you need 24 hours to cool down, just say the word and I will wait.

Thanks,
Chris

I am interested in your ideas here. I just cannot guarantee I will agree but good discussion enhances understanding and so far our brief exchange has been useful to me.

No, there is no correlation between functional constraint and functional information. Muscle proteins illustrate that beautifully. Again, I note your total lack of interest in the relevant evidence.

There is no correlation in the amount. @gpuccio’s assumption is simply, objectively wrong.

My suggestion is not for “more refined correlation work.” My point is that the assumption is simply false, therefore the entire house of cards collapses.

Usually, careful people try to test their assumptions before claiming to have made some great discovery. This assumption is trivially easy to test. Why do neither of you have any interest in testing it?

I have seen your assertions, and you have provided no support that I can see. I think you are dead wrong here: if there were no correlation, the mutation rate of proteins would be equal to that of neutral mutations, and it is not.

I have made this point and you have not addressed it. If you just keep repeating assertions, where are you going with this?

Your accusation that I have provided no support is patently false.

As I mentioned multiple times, the support is in the sequence conservation vs. functional information of muscle proteins. There is no correlation.

You’re afraid to look, I think.

Not at all. What is your specific claim?

That there is no correlation between sequence conservation and functional information for muscle proteins (and other proteins), for the umpteenth time.

You’re clearly afraid to look. You showed that you could do an alignment for one of them (actin). What is preventing you from doing the others?

What in the data is showing no correlation?

Of course.

Indeed. Let’s sit down for a cup or glass of your favorite beverage and chat.

Ewert treats sequence data as a methodology for identifying genes. Once the identification is completed, he discards the sequence data and never considers it again. Instead, his whole gene component model predicts, as an output, the distribution of whole genes across the biosphere.

Over the past 50 years or so, biologists have taken a different approach when studying species relationships. They observe that among populations with known relationships (e.g., cousins) there is a probability distribution function that describes sequence divergence and convergence. The early studies focused on amino acid sequences, and of course nucleotide sequences have come to the fore in the past 3 decades.

The key point is this: since relationships (e.g., child vs. cousin vs. 5th cousin) can predict sequence divergence and similarity patterns, it is possible to use Bayesian logic in the reverse direction to infer relationships from sequence divergence and similarity patterns.

The methodology is this, as far as I can tell:

  • Computationally generate many thousands/millions of hypothetical histories.
  • Compute the likelihood of the observed sequence divergence/convergence pattern for each hypothesis.
  • Select the most likely hypothesis.
  • Verify the significance of the most likely hypothesis by statistical methods against the null hypothesis (i.e., no relationship history).
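The four steps above can be sketched in a deliberately reduced form. This is a toy, not what phylogenetics software actually does: instead of whole trees, each "hypothesis" is just a candidate divergence between two sequences, the Jukes-Cantor substitution model is assumed, and the counts `k` and `n` are hypothetical. The null hypothesis of "no relationship" is modeled as fully saturated sequences, where any site differs with probability 0.75.

```python
import math

def jc_mismatch_prob(t):
    """Jukes-Cantor probability that a site differs between two sequences
    separated by a total of t expected substitutions per site."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))

def log_likelihood(k, n, p):
    """Binomial log-likelihood of k differing sites out of n
    (the constant binomial coefficient is dropped)."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def best_hypothesis(k, n, candidate_ts):
    """Score every candidate and return (log-likelihood, t) of the best one."""
    scored = [(log_likelihood(k, n, jc_mismatch_prob(t)), t) for t in candidate_ts]
    return max(scored)

# Hypothetical observation: 30 differing sites in a 300-site alignment.
k, n = 30, 300

# Step 1: generate candidate hypotheses (a simple grid here, not a tree search).
candidates = [0.01 * i for i in range(1, 301)]

# Steps 2 and 3: compute the likelihood of the data under each, keep the best.
best_ll, best_t = best_hypothesis(k, n, candidates)

# Step 4: compare against the null (unrelated, saturated sequences).
null_ll = log_likelihood(k, n, 0.75)
lr_stat = 2.0 * (best_ll - null_ll)

print(best_t, lr_stat)
```

Real analyses replace the grid with a heuristic search over tree topologies and branch lengths, and use richer substitution models, but the generate, score, select, and test loop has the same shape.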

There’s a lot more to this, as the problem is NP-hard and an exhaustive search becomes infeasible for large data sets. In particular, heuristic functions and search algorithms are used to find the region(s) where the most likely hypotheses probably lie, and the search then focuses on those region(s).

This is a Bayesian approach, as we have reversed the direction of inference. We take the known causality (biological relationship ==>> sequence data) and infer the inverse (sequence data ==>> biological relationship).

As a side comment, one of the chief reasons Ewert’s approach has not yet succeeded is that it requires comprehensive and complete gene libraries across very large numbers of diverse species. The comprehensive and complete gene libraries do not yet exist because they are very expensive to build, both in terms of lab work and in terms of computation.(*)

Biologists - please correct any material omissions or errors in my post!

Hope this is helpful,
Chris

(*) I am also inclined to make a conjecture that I think might be helpful. Because it’s a conjecture, I am writing it in a footnote so as not to detract from my main logic.

The causal relationship between biological relationship and distribution patterns of whole genes has not been studied as extensively as sequence data distribution patterns, as far as I know. This would be due to the absence of critical data (mentioned above) as well as the time needed to gather the data from observation–i.e., it generally takes a long time for a new gene to appear or disappear from a population!

In the absence of good data about this causal relationship from biological relationship to whole gene distribution patterns, it is difficult if not impossible to infer relationships in the opposite direction (distribution of whole genes ==>> biological relationship) using Bayesian logic. And if we can’t do that, the model is not currently useful in the field of biology.

Thanks for your thoughts here. I mostly agree with everything you posted. I want to ponder this for a while.

I completely agree here. A conversation over at TSZ helped me see this. I played a little with the homolog database and it was very awkward. You needed very specific commands to perform your search, and then the results were iffy. I wrote Winston and he acknowledged this. He agrees the sequences need to be scrubbed.

More to follow :slight_smile:

The sequence conservation and functional information, of course.

Looks fine but for one thing. I wouldn’t call it a Bayesian approach, as it doesn’t involve Bayes’ Theorem. Just P(data | tree) when we want P(tree | data). But we have no good way to get from one to the other, just a tacit agreement to use one as a substitute for the other. Not that I’m complaining.
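For reference, the two quantities are linked by Bayes' Theorem. Written out in the notation used above (the prior P(tree) and the sum over alternative trees are the parts a pure likelihood method leaves out):

```latex
P(\text{tree} \mid \text{data})
  = \frac{P(\text{data} \mid \text{tree})\, P(\text{tree})}
         {\sum_{\text{tree}'} P(\text{data} \mid \text{tree}')\, P(\text{tree}')}
```

Picking the tree that maximizes P(data | tree) gives the same answer as maximizing P(tree | data) only if the prior P(tree) is the same for every tree, which is the tacit substitution being described.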

Now it seems to me that Ewert’s approach doesn’t succeed because it fits the diagram to the data. Every data pattern gets a node, so any data will be a perfect fit. There’s absolutely no way to reject that network.

I am pretty much getting the opposite result. Should we compare methods? :-) Mammal actin looks like almost 1500 bits of functional information.

You think that actin and tropomyosin have more functional information than myosin and the troponins?