Sal's Flower?

That’s a silly test. If you remove the labels — just another word for nouns — you can’t talk about anything.

I have no idea what you’re trying to say there, so I can’t answer.

Since you have no idea what sets of genes are possible and which sets could or could not create a functional animal, this objection is worthless. The point, which you ignore, is that there is no reason to suspect a pattern that would fit the particular tree we propose on the basis of other data.

What, then, is the test?

Note that those are different data from Sal’s flower. I will confess that I do not understand Ewert’s Figure 8. What, on that graph, is a module? Why are there multiple modules for genes that, for example, are found only in the chicken?

I don’t think you mean a Bayesian model; you mean a Bayesian criterion of fit. Yes, I have a problem. I don’t understand how he can test for overparameterization by first eliminating all the parameters. More importantly, none of this has anything to do with the subject here, which is Sal’s flower.

2 Likes

That is a sufficient but not necessary condition for not talking about anything. Some employ nouns here and still manage to not talk about anything, often at great length.

9 Likes

You’re not looking at the data. Annotations are not data.

Winston’s graph is not based on data. It is based on incomplete annotations. Why not look at some of the actual data for yourself, independent of the annotations?

6 Likes

Remove the specific label not the labels.

Sal’s flower is a Venn diagram of gene annotations and not a tree. Does the tree add additional value to the analysis? Does a dependency graph add additional value?

From the paper:

Life resembles a nested hierarchy because a nested hierarchical structure is similar enough to a dependency graph structure to approximate it.

We know if certain genes are missing we cannot build a functioning animal. We know if certain genes are missing chickens cannot perform short flight. We know if certain genes are missing fish cannot live under water.

From the paper:

Pfam [51] primarily classifies domains, portions of genes, rather than the entire gene. Genes are classified into architectures, which correspond to a particular combination of domains on a gene. Thus genes are considered to be in the same family if they have the same combination of domains.

Page 7 of the paper describes the classification methods of the other databases.

Here are the predictions of the paper.
To summarize, we have the following predictions:

  • Biological data should fit the dependency graph better than a tree.
  • Data produced by a process dominated by common descent or branching should fit a tree better than a dependency graph.
  • Inferred graphs for biological data should contain many more non-taxonomic modules with many more genes than dependency graphs inferred from such data known to have been produced by common descent.
  • Software should fit a dependency graph better than a tree, but a tree better than a null model.

The paper attempts to test these predictions.

Hi Bill,

You keep quoting the Ewert paper as if it had demonstrated something. It does have the value of posing an interesting and theoretically testable hypothesis.

What it does not provide is any hint of support for that hypothesis; it is fatally flawed due to the problem of missing and incorrect gene annotations.

Secondly, Ewert concedes at the outset that, even if his analysis were 100% correct, the dependency graph approach is completely incapable of accounting for sequential genomic data – which is the vast majority of data that biologists use to infer ancestry.

So even if his hypothesis were to gain support, it would not be able to address the vast array of data from which biologists infer common ancestry. The dependency graph approach would have some value, insofar as it might point to some as yet poorly understood or underappreciated mechanisms of speciation and genetic flow. But it would not undermine the vast evidentiary support for the theory of evolution, as Ewert himself concedes.

You have been told these things many times, yet you completely ignore them as you repeatedly cite Ewert. Have you considered how your credibility suffers when you write in this fashion?

I will give credit to Ewert: when he realized that his approach offered zero evidence in support of an alternative to evolution, he stopped arguing and went back to the lab to do more work. He’s not here on the forum advocating for something that he knows has zero evidential support.

Ewert has my respect for his understanding of science and for his integrity. Maybe you should consider imitating him; he’s a good role model.

Best,
Chris

9 Likes

One-liner word salad just doesn’t cut it here. Please do better. The “label” in this case accurately describes the phenomenon I’ve attached it to. The data really do form a nested hierarchy, and that’s a term I’ve explained to you many times already.

Yes to the first, no to the second. It looks as if Ewert’s dependency graph is not based on the same data as the flower. A dependency graph drawn using the data on the flower would explain very little additional data, only the tiny proportion of genes that don’t fit the tree. And I use “explain” very loosely, as the tree has a real hypothesis of relationships behind it, while the dependency graph is just an ad hoc grouping designed specifically to fit the data. There’s no way to equate the explanatory power of the two.

We don’t, in fact. At least not to the extent that would test a dependency graph. You have no idea what genes are necessary and what genes are not.

That doesn’t help. All it tells you is how a protein family is defined, not how modules are defined. Modules are groups of families, grouped exactly by their distribution among taxa. There should be one and only one module for each such group.

By definition. How he accounts for extra parameters seems flawed to me.

I’m also suspicious of his evolutionary simulation. What parameters did he put in? Are those credible values?

You will note that Sal’s flower falsifies that prediction. And of course the same prediction will hold true if gene losses (or, for Ewert, gene family losses) are frequent. It all depends on the loss rate parameter.

Not relevant to biology, though.

1 Like

I have also suggested that this is a label. What exactly does this mean in English if you remove “nested hierarchy” from the description and describe in detail the pattern you are observing? We all agree the pattern approximates a tree, but this does not tell us the cause of the pattern.

This is true, as Winston is using multiple data sets. “Tiny proportion” needs to be quantified. Gene gain is very difficult to explain. The “tiny amount” is significant if the event in question happens, say, less than once in 1 million reproductions, as the chance of fixation becomes very small.

Of course we have an idea of some of the genes that are necessary. Additional genes we can discover through experiments.

Can you be more specific?

Not sure.

How so? How many “non taxonomic modules” are you counting in Sal’s flower?

Current think in biology, I agree.

But you cannot tell which genes are missing from mere annotations.

1 Like

As I’ve said, it means that almost all the gene distributions can be explained by a single change on the standard tree. Non-hierarchical data would show much less structure, and many more genes would require two or more changes on the tree.
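To make the “single change” criterion concrete, here is a minimal sketch that counts, for every possible presence/absence pattern across the four species, the minimum number of changes needed on the tree under plain (Fitch) parsimony. The nested-tuple tree encoding, the one-letter labels (H, M, C, Z for human, mouse, chicken, zebrafish) and the function name `fitch_changes` are illustrative assumptions, not anything taken from the figure or from Ewert’s paper.

```python
from itertools import combinations

# Assumed topology: human and mouse are sisters, then chicken, then zebrafish.
TREE = ((("H", "M"), "C"), "Z")
SPECIES = ("H", "M", "C", "Z")


def fitch_changes(tree, present):
    """Return (possible ancestral states, minimum changes) for one gene."""
    if isinstance(tree, str):
        # Leaf: the state is simply whether this species carries the gene.
        return {tree in present}, 0
    left_states, left_cost = fitch_changes(tree[0], present)
    right_states, right_cost = fitch_changes(tree[1], present)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # The two subtrees disagree, so one change is needed at this node.
    return left_states | right_states, left_cost + right_cost + 1


# Group every non-empty presence pattern by the number of changes it needs.
by_cost = {}
for size in range(1, len(SPECIES) + 1):
    for combo in combinations(SPECIES, size):
        cost = fitch_changes(TREE, set(combo))[1]
        by_cost.setdefault(cost, []).append("".join(combo))

for cost in sorted(by_cost):
    print(cost, by_cost[cost])
```

Of the 15 possible patterns, one (present in all four) needs no change, ten need a single change, and four (HC, HZ, MC, MZ) need two. Only genes falling in those last four regions of the Venn diagram conflict with this tree.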

Yes it does. It tells us that the pattern is caused by changes taking place on a branching tree of descent, unless you can come up with a better physical explanation for that tree.

You can see the numbers right there in the figure. What other quantification do you need?

More word salad. Gene gain is explained in the extensive literature on the subject, much of which has been cited to you here. If a gene is favored by selection, the chance of fixation becomes much greater. And one in one million is actually quite a high mutation rate.

So you agree that not all of them are necessary.

As far as I can tell, he just accounts for them by presenting a model that disposes of them. What do you think of his test?

Exactly. I don’t think he even says.

Not the point. The point is that those “modules” only account for a tiny proportion of the data, and thus make very small improvements in fit for a large expense in extra parameters. As you recall, his prediction was that if the pattern were due to a tree, the taxonomic modules would be much larger than the non-taxonomic modules. That prediction is borne out by the flower data.

Real think in biology, you should mean. Fantasy think is another matter.

3 Likes

The better explanation is 4 different starting points leading to zebrafish, chickens, mice and humans.

I think thousands of genes is a large portion. So I wonder if we are talking about the same thing.

Are you claiming that all the identified new genes are beneficial enough to have forced rapid fixation?

Sure.

I think the test is trivial. I think your idea of understanding function is much more valuable.

I think you missed his point. If we look at Sal’s flower of species of “known common descent” we would see the genes following the branching pattern and you would not have the gene gain you are seeing with the 4 separate animals.

It would be. Are thousands of genes missing, or did you just make that up?

1 Like

That’s not the better explanation as it doesn’t account for the pattern of the data. Why is there a tree? That’s one starting point.

Obviously not. The thousands of genes all support the tree. In Ewert’s terminology, those are taxonomic modules, though of course his method doesn’t allow for loss. But the point here is that the losses fit the tree. The non-taxonomic modules each have fewer than a hundred genes, not thousands. Again, the data fit the tree if they can be accounted for by a single gain or loss on the tree. The genes that don’t fit the tree, those requiring two changes, are very few.

No. But it seems likely for most of them. After all, you’re the one claiming that they’re essential.

The genes do follow the branching pattern. I showed that to you. Don’t you remember? It’s even at the top of this page. [Well, near: #14] Now perhaps you don’t understand what it means to follow a branching pattern; I’ve tried to help you, but perhaps it’s beyond your abilities.

3 Likes

When you translated the Venn diagram to tree patterns you made 8 trees marked up with red and green markers explained with gene gain and loss. The trees I am proposing have only 4 single vertical branches and do not require the gene gain/loss explanation.

If all the information you had was the Venn diagram, on what basis would you draw the tree starting with zebrafish?

So many palms, and not enough faces. Sad.

1 Like

Then you’re only looking at that part of the data that would fit any tree at all or in fact four separate trees. Genes that are present in all species don’t show hierarchical structure and don’t support any tree. Even the dependency diagram for that data would just be a single “module”.

Using just plain parsimony we would only be able to draw an unrooted tree that would show human and mouse meeting at one node, chicken and zebrafish meeting at another, and a central branch connecting the two nodes. But if we use Dollo parsimony, under which a gene can arise only once but be deleted several times, that roots the tree on the zebrafish. But we don’t have to root the tree to see how many changes are needed, so your question turns out to be pointless.
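The effect of rooting under Dollo parsimony can be sketched as follows. This is a minimal illustration, assuming a nested-tuple tree encoding and hypothetical helper names (`leaves`, `dollo_cost`); it scores one presence pattern at a time and does not use the actual counts from the Venn diagram.

```python
def leaves(tree):
    """Set of species names under a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def dollo_cost(tree, present):
    """One gain at the smallest clade holding every carrier, plus one loss
    for each maximal subclade inside it that contains no carrier."""
    def origin(node):
        # Descend while a single child still contains all carriers.
        if not isinstance(node, str):
            for child in node:
                if present <= leaves(child):
                    return origin(child)
        return node

    def losses(node):
        if not (leaves(node) & present):
            return 1  # whole subclade lacks the gene: a single loss
        if isinstance(node, str):
            return 0
        return sum(losses(child) for child in node)

    return 1 + losses(origin(tree))


# Two hypothetical rootings of the same unrooted tree.
ZEBRAFISH_OUTGROUP = ((("H", "M"), "C"), "Z")
MIDPOINT_ROOT = (("H", "M"), ("C", "Z"))

for pattern in ({"C", "Z"}, {"H", "M", "C"}):
    print(sorted(pattern),
          dollo_cost(ZEBRAFISH_OUTGROUP, pattern),
          dollo_cost(MIDPOINT_ROOT, pattern))
```

A gene found only in chicken and zebrafish costs two events with zebrafish as the outgroup but one with the midpoint root, while a gene found in human, mouse and chicken shows the reverse. Under plain parsimony both patterns cost one change wherever the root is placed, which is why only the Dollo scores can favor one rooting over another. Which rooting actually wins depends on how many genes fall in each region of the diagram.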

3 Likes

If that was literally the information we had we probably couldn’t draw the tree as shown with the zebrafish as the outgroup (HMCZ), as there are several equally maximally parsimonious rooted tree topologies, such as ZCHM and HMZC. As far as I can see though, the HMCZ tree is not less parsimonious than any other tree, so think of it as a joint best topology. Other topologies, such as MCZH are less parsimonious, and could be tentatively ruled out.

However, we obviously don’t just have the Venn diagram, so why wouldn’t we put it in the context of all the other data we have?

It does require a massive gene gain explanation though, at the root of each one of those 4 vertical branches. Even going with this, the best case scenario (no secondary gene loss, just differential gene creation at the root of each of these branches), it still requires the independent gain of 67,739 genes in the 4 species, compared with just 27,828 gene gain/loss events in the HMCZ tree. This should be obvious from the fact that all 4 species share 10,660 genes, which are gained once (in the common ancestor) in the common ancestry model, and gained 4 times independently in the separate ancestry model.
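The arithmetic behind that comparison can be checked in a few lines. This sketch uses only the three figures quoted above; the variable names are made up for illustration.

```python
# Figures quoted in the post above.
shared_by_all = 10_660    # genes present in all four species
separate_gains = 67_739   # independent gains under the separate-ancestry model
tree_events = 27_828      # gain/loss events on the HMCZ tree

# Each universally shared gene is gained 4 times instead of once.
extra_from_shared = (4 - 1) * shared_by_all

# Total excess of the separate-ancestry model over the tree.
total_extra = separate_gains - tree_events

# Net excess contributed by genes shared by only some of the species.
extra_from_other_patterns = total_extra - extra_from_shared

print(extra_from_shared, total_extra, extra_from_other_patterns)
```

The universally shared genes alone account for 31,980 of the 39,911 extra events; the remaining 7,931 is the net contribution of genes shared by two or three species.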

6 Likes

The model I am talking about starts with the existence of the 4 gene sets. Their pre-existence becomes the working hypothesis, in the same way that the pre-existence of the atoms and molecules that make them up is the current working hypothesis.

Exit the business of having to explain gene gain until a viable mechanism that generates these functional gene patterns is discovered.

@Winston_Ewert’s model, as immature as it is, pushes the discovery of how genes work together to produce biological function.

“Genes exist” is not a hypothesis; that’s an assumption. The hypothesis is supposed to be that a certain pattern in the sharing of genes is due to the organisms depending on these genes, and certain genes on each other.

That’s a physical necessity, not a hypothesis. Genes are, by definition, made of atoms and molecules.

If all else fails, just make something up, eh Bill? So please show what discovery you’ve made.

2 Likes

Not with Dollo parsimony.

But that explains nothing at all. Whatever the data, you just say it was created that way. Pointless, and worse, it doesn’t explain the pattern in the data.

You are still unable to see the difference between the pattern and the process that causes the elements of the pattern, even after it’s been pointed out to you hundreds of times.

2 Likes

Are you sure? I used Dollo parsimony to conclude that they were equally parsimonious. Can you show how HMCZ is more parsimonious than, say, ZCHM?

1 Like