Introducing Babacar

Yes, you are wrong. There are multiple ways of calibrating mutation rate, one of which is to observe the number of mutations between parent and child, and this can be done with humans. It matches pretty well with calibrations based on fossils. However, it’s also clear that there is no constant “molecular clock” over all species; the rate of evolution can vary quite a bit over evolutionary time. Phylogenetic analyses do not generally make use of this clock assumption.

Yes, only with common descent would you a priori expect all life to share common fundamental biochemistry, such as the polymer of inheritance. Though it could in principle be the case that if life originates through some specific process, there might be a kind of chemical determinism in the process that makes it so life always ends up using DNA of the same type. While that is conceivable, there is currently no good evidence that this is the case.

You wouldn’t automatically expect this same thing on design, since a designer could in principle have many good reasons for creating different types of organisms with fundamentally different types of biochemistry. For example you could design a system where certain viruses had a type of polymer that made it impossible for them to integrate it into a particular type of host, if the host didn’t use the same kind of DNA or RNA. You could do this to deliberately prevent cross-species virus evolution, for example.

Just to be clear, you should not confuse the phylogenetic tree of life with the fundamental biochemistry of life. While they are in some ways related (the concept of different species having the same genetic code, not to be confused with the same genome sequences, relies on them also sharing the fundamental biopolymers DNA and RNA, and the same amino acids), they are two independent lines of evidence.

Yes it’s not proof. It’s evidence, as in it is a kind of thing we would expect given common descent, but not necessarily expect without common descent. It’s possible it could be due to something other than common descent, but it would a priori be less likely or more unexpected on another hypothesis than on common descent.

Yes, but those would not be objective nesting hierarchies, they would be imposed or subjective hierarchies. That’s a very crucial distinction.

To pick an example, the folder structure on my hard drive is a nested hierarchy. There are folders within folders within folders, and each folder contains files, other folders, or both. But this is an imposed nested hierarchy. The creators of the folder system in Windows made arbitrary decisions to put particular files and folders into particular other folders. These could in principle be placed anywhere. There is no logical necessity that the folder system32 has to sit inside the folder called windows; this is an arbitrary choice from the coder’s perspective. And the folder could be named anything; the software could easily be made to work with differently named folders, and with the files placed in innumerable other locations.

This is not the case with an objective nested hierarchy. Here the overall hierarchy can be used to predict where a novel item should be placed, given its attributes. If you give me a large phylogenetic tree of mammals, and then you give me a novel mammal I haven’t encountered before, I can calculate where in the hierarchy this novel mammal should go, given its objective attributes (genetic or morphological, or both).

But in the case of a computer’s hard-drive folder structure, I can’t do any sort of calculation that would show where a particular file belongs merely from its attributes (how big the file is, what its extension is, what it is named, some portion of its contents), as it could in principle be placed anywhere.

But the point is your hierarchy would be imposed, created according to whatever subjective criteria you decide counts.

You’re going to have to look this up yourself, I don’t know any articles off the top of my head that specifically address this question.

Well the linked article shows that, and the point is they match with an incredibly high degree of significance. Generally speaking, phylogenetic trees derived from different gene-sequences do that.

It doesn’t have anything to do with “statistics assuming common descent”, but with actual statistics as it is used everywhere for anything and everything else. The article does a good job of explaining why you can have a significant match without having an exact match, but I’ll give it my own shot here.

I like to explain it using this analogy: Imagine you have a large number of ultra-sensitive thermometers that can measure the temperature in a room to an accuracy of six significant figures. As in it measures not just 21 degrees Celsius, but 21.0031 degrees Celsius. Now suppose you have thirty different thermometers of that type, and you put them all in the same room on a line, and you measure literally thirty different values between 21.0031 and 21.0078 degrees Celsius.

So now you have this problem that no two thermometers match exactly; they all disagree on the exact temperature in the room. But even so, the fact is they’re all extremely close to each other, which in a sense is to be expected since we can easily imagine that the temperature in the room isn’t actually so exactly uniform in all positions. So we can still say that the thermometers match each other to an extremely high degree, and for that reason we can be highly confident that the true temperature in the room lies somewhere in the measured range. It is highly likely that the true temperature in the room is somewhere in the range between 21.0031 and 21.0078 degrees Celsius.
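To put rough numbers on the analogy, here is a minimal Python sketch. The readings are simulated, and the "true" temperature and the size of the errors are made-up assumptions for illustration, not measurements.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical setup: thirty thermometers, each off by a small independent error.
true_temp = 21.0055  # assumed "true" value, invented for this sketch
readings = [true_temp + random.uniform(-0.0024, 0.0023) for _ in range(30)]

spread = max(readings) - min(readings)           # how far apart the extremes are
mean = statistics.mean(readings)                 # best single estimate
sem = statistics.stdev(readings) / len(readings) ** 0.5  # standard error of the mean

print(f"range: {min(readings):.4f} to {max(readings):.4f} (spread {spread:.4f})")
print(f"mean:  {mean:.4f} +/- {sem:.4f}")
```

No two simulated readings agree exactly, yet the spread is tiny compared with the value being measured, and the mean is pinned down even more tightly than any single reading.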

In the field called statistics, there are principles that describe how significant a particular match is. This is, unsurprisingly, called statistical significance: https://en.wikipedia.org/wiki/Statistical_significance

This same principle applies with phylogenetic trees derived from different sets of data. Each phylogenetic tree is thus a kind of “measurement” of the relationships between the species (it gives us some information about what the true relationship is like, just as the thermometer gives us some information about what the true temperature is like), given the “thermometers” used (which could be particular genes, or morphology). Even if we don’t get completely exact matches every time (though we often do), the fact that we get highly similar trees is still a very significant result. It is the result we expect given common descent, and we do not have any other good reason for expecting the phylogenetic trees from different and independent data sets to give such similar (and often exactly matching) results.

" is not the best source for scientific information?"

Perhaps not, that is why I was asking. In the author’s defense I just plucked this quote out of his book as an example for Swamidass. I figure the book talks about mutations as well.

I’m a lay person and I really don’t know what I’m talking about. I tend to use wikipedia whenever I want to know or understand something. I can only understand so much. For this reason, I would need some grace as language goes; the specific jargon for things eludes me. I tend to understand more than I can communicate, but even then I’m not educated. Also, I should have used a direct quote rather than a paraphrase, but, for the author’s sake, perhaps I can find one online. The quote reflects my understanding of what the author said perhaps more than what the author meant.

“any mutation can happen, and any series of mutations can happen.”

Yes, that makes sense to me,

“There are in principle a number of alleles equal to all possible sequences…”

Yes, I understood that from the word salad.

By “different ways of making an amino acid” I was trying to refer to “the sequence” or peptide you referred to as varying. By comparing the sequence of amino acids for cytochrome c you can map distance between organisms, and perhaps a chronology? (A part of the amino acid sequence in Cytochrome-C protein from 6 different species is given below. Human - DVEKGKKIFIM; Silkworm moth - NAENGKKIFVQ; Chicken - DIEKGKKIFVQ; Rhesus monkey - DVEKGKKIFIM; Bullfrog - DVEKGKKIFVQ; Tuna - DVAKGKKTFVQ. Rank the organ | Study.com) This gives an alternate way to confirm evolutionary theory. For example wiki said: "Cytochrome c has a primary structure consisting of a chain of about 100 amino acids. Many higher-order organisms possess a chain of 104 amino acids.[9] The sequences of cytochrome c in humans is identical to that of chimpanzees (our closest relatives), but differs more from that of horses.[10]"
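A minimal Python sketch of the comparison being described, using only the six 11-residue fragments quoted above. Counting mismatched positions is a crude similarity measure on a very short fragment, so the ranking is illustrative only; the function and variable names are my own.

```python
# Cytochrome c fragments exactly as quoted in the post above.
fragments = {
    "Human": "DVEKGKKIFIM",
    "Silkworm moth": "NAENGKKIFVQ",
    "Chicken": "DIEKGKKIFVQ",
    "Rhesus monkey": "DVEKGKKIFIM",
    "Bullfrog": "DVEKGKKIFVQ",
    "Tuna": "DVAKGKKTFVQ",
}

def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

# Number of differing residues between each fragment and the human one.
diffs_from_human = {
    name: hamming(fragments["Human"], seq) for name, seq in fragments.items()
}

for name, d in sorted(diffs_from_human.items(), key=lambda kv: kv[1]):
    print(f"{name:14s} {d}")
```

On this fragment, rhesus monkey matches human at every position, bullfrog differs at 2, chicken at 3, tuna at 4, and silkworm moth at 5.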

{Although the consensus I’ve read is that chimps and humans have the same (“identical”) sequence I ran into two places that indicated differences between the chimp and human sequences when I tried to look at the sequence. (MATERIALS Amino Acid Sequences By Species Unknown ... | Chegg.com)

and

"Homo sapiens1 and Pan troglodytes have 99/105 amino acids similar, meaning that there are 6 amino acids that are different.
Homo sapiens1  Pan troglodytes
P  A (proline in Homo Sapiens, alanine in Pan Troglodytes)
A  V (alanine to valine)
I  T (isoleucine to threonine)
Y  H (tyrosine to histidine)
G  E (glycine to glutamic acid/glutamate)
Y  F (tyrosine to phenylalanine) "

(http://www.coexploration.org/C-DEBI/17_1/Metabolism_Bioinformatics_Physiology_answers.pdf)

Why do these not show chimp and human sequences are identical? Are they just erroneous? I’ll just take your word for it.}

“that’s true only if there’s a uniform rate of evolution across the tree of life, which is not the case.”

So for instance rats’ mutations happen faster and dogs’ much slower such that there is no uniform rate across the evolutionary tree? Would the smaller differences between species not still indicate closer relation?

“most methods of phylogenetic analysis do not assume this uniform rate.,”

Please explain

How do you make a phylogenetic tree? How do you calculate rates for extinct species? (Molecular Clocks ( Read ) | Biology | CK-12 Foundation)

I don’t know. I just looked at the GenBank protein database, and it shows human, chimp, gorilla, and orangutan sequences as identical.

Smaller differences indicate closer relation, all things being equal. But it’s not difficult to find exceptions in cases of differing rates. The sizes of differences will often correctly diagnose relationships, but will mislead you frequently enough that evolutionary biologists don’t use them.
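Here is a small Python sketch of how unequal rates can mislead raw distances. The tree and branch lengths are entirely hypothetical: A and B are true sister species, but B’s lineage has accumulated changes much faster than A’s.

```python
# Hypothetical rooted tree ((A,B),C): child -> (parent, branch length in changes).
parent = {
    "A": ("AB", 1.0),
    "B": ("AB", 10.0),   # B's lineage evolves ten times faster than A's
    "AB": ("root", 1.0),
    "C": ("root", 2.0),
}

def path_to_root(node):
    """Return {ancestor: distance from node} for node and all of its ancestors."""
    dist, out = 0.0, {node: 0.0}
    while node in parent:
        node, length = parent[node]
        dist += length
        out[node] = dist
    return out

def distance(x, y):
    """Path length between two tips through their most recent common ancestor."""
    px, py = path_to_root(x), path_to_root(y)
    return min(px[n] + py[n] for n in px if n in py)

pairs = [("A", "B"), ("A", "C"), ("B", "C")]
for x, y in pairs:
    print(x, y, distance(x, y))

# The pair with the smallest distance is NOT the pair of true sister species.
closest = min(pairs, key=lambda p: distance(*p))
print("smallest distance:", closest)
```

In this toy case A is only 4 changes from C but 11 from its actual sister B, so "most similar" picks the wrong relative.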

There are many different algorithms, but they all more or less calculate some measure of fit of data to trees, picking the tree with the best fit. The simplest to explain is perhaps parsimony: choose the tree that requires the fewest evolutionary changes.
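As a rough illustration of parsimony, the following Python sketch scores every possible tree for four taxa with the Fitch algorithm and keeps the one needing the fewest changes. The taxon names and the five-site alignment are invented for the example; real analyses use far more data and search far more trees.

```python
def fitch_score(tree, states):
    """Minimum number of changes for one character on a binary tree (Fitch's method).

    tree: nested 2-tuples with tip names (strings) as leaves.
    states: dict mapping tip name -> observed state at this site.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):          # a tip: its state set is what we observe
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                         # children can agree: no change needed
            return common
        changes += 1                       # children disagree: one change required
        return left | right

    visit(tree)
    return changes

# Hypothetical aligned sequences for four made-up taxa.
alignment = {"W": "AACGT", "X": "AACGA", "Y": "GGCTA", "Z": "GGTTA"}

# The three possible groupings of four taxa.
trees = {
    "((W,X),(Y,Z))": (("W", "X"), ("Y", "Z")),
    "((W,Y),(X,Z))": (("W", "Y"), ("X", "Z")),
    "((W,Z),(X,Y))": (("W", "Z"), ("X", "Y")),
}

def tree_length(tree):
    """Sum the Fitch score over every site in the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
        for i in range(n_sites)
    )

scores = {label: tree_length(tree) for label, tree in trees.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

With this toy data the grouping ((W,X),(Y,Z)) needs 5 changes while the other two need 8 each, so parsimony prefers it.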

This simple explanation does not appear to be hopeless. If you want something more rigorous, I recommend the book Tree Thinking, which you might be able to find at a library.

You can’t calculate rates for extinct species unless you manage to get sequences for them, which can be done for a few not-too-old fossils.

I tried an alignment on UniProt, and it shows two species of gibbon (the only ones I saw) as also identical to human sequences. Unless your source can show a source for its sequences, I don’t know what else to do.

Oh, and I’d like to point out that you plucking the quote does nothing to make the author look better. If two random quotes are that bad, it’s probably a reasonable sample of the book.

At the risk of beating the point into the ground, I think I’ve found the problem with the data set. If I Blast the chimp sequence in UniProt, I get not cytochrome c but “cytochrome c domain-containing protein”, and I have been unable to find what they mean by that.

Thoughts: it must be a duplicate, possibly a tissue-specific variant, or perhaps a pseudogene, though it lacks a stop codon or frameshift.

This was very helpful, thank you.

"You wouldn’t automatically expect this same thing on design,"

I see your point, yet on the other hand, individuals tend to have a particular “handwriting”. For someone who believes in design, this would come just as automatically. In any case what one automatically understands may be irrelevant. I find real things complex and hard to understand, I don’t know why I should expect anything different in “design”.

“you could design a system where certain viruses had a type of polymer that made it impossible for them to integrate it into a particular type of host, if the host didn’t use the same kind of DNA or RNA. You could do this to deliberately prevent cross-species virus evolution, for example.”

So if man was impervious to viruses that would be an evidence to you that creation was designed?

I’ve heard it said: “Eternity is the keyhole to the whole story” So then again, You might find a purpose for such a virus. Can you not think of a few naturalistic purposes of viruses? As to making a virus proof world and whether God should have done this or that, one would require some understanding of what He wants. On a basic level, one might just simply assume this is what He wants, and then proceed to ask why. Precisely the kind of question philosophers and religions talk about. Perhaps suffering has its purpose and limits in the natural. There are tons of literature regarding this.

"It doesn’t have anything to do with “statistics assuming common descent”

Yes, I stand corrected. Statistics as you refer to them don’t assume common descent, you’re right.
Hmm, I think that I got mixed up. I’ll try to retrace my train of thought… so I understand the cladograms on which the statistics are based do assume common descent. This sounds to me like the trees are dictated by “what researchers decide” (i.e., plesiomorphies, synapomorphies, and also homoplasies). Example:

“Plesiomorphies and synapomorphies
Researchers must decide which character states are “ancestral” (plesiomorphies) and which are derived (synapomorphies), because only synapomorphic character states provide evidence of grouping.[10] This determination is usually done by comparison to the character states of one or more outgroups. States shared between the outgroup and some members of the in-group are symplesiomorphies; states that are present only in a subset of the in-group are synapomorphies. Note that character states unique to a single terminal (autapomorphies) do not provide evidence of grouping. The choice of an outgroup is a crucial step in cladistic analysis because different outgroups can produce trees with profoundly different topologies.” - wiki on cladograms

So since you can cater your input to get a matching tree, any significant statistical incongruence is an indicator you need to adjust your inputs… If you’re adjusting inputs, then maybe the tree doesn’t evidence anything except the repercussions of your own thought process. Therefore, statistics show what you want them to. That’s the train of thought in my head.
https://www.statisticsdonewrong.com/p-value.html#if-at-first-you-don-t-succeed-try-try-again

“the fact that we get highly similar trees even if not always exact matches, is still a very significant result.”

Unless the similarity is dictated by the selected criteria used in the algorithm or there is some correlation between the trees. I bounced this off a friend of mine and he described it this way:

“When you have multiple possible ways you could generate a tree, it’s unavoidable that some of the trees will look more like each other than others. If you then allow yourself to pick the ways that end up with trees that are the most similar, it totally reduces the significance of their similarities, and if you don’t adjust for that (using something like the Holm–Bonferroni method) then claims of statistical significance are invalid. The more ways of deriving a tree that are considered, the more strongly you must adjust how tight the correlation has to be before declaring statistical significance. (So for example, if various scientists have ruled in or ruled out certain characteristics as input to their cladogram algorithm, or considered characteristics in different combinations, or tried the inputs with different cladogram algorithms or different hyper-parameters when tuning the algorithms, you have to count up all of those different combinations that were considered (even the unpublished ones) in the list of hypotheses… and then use that to adjust your significance levels using the Holm–Bonferroni method or something like it.) But even before that, you have to make sure that the significance levels of all the various hypotheses are correct in isolation… which they’re not if you’re comparing, say, a DNA-based cladogram with a cytochrome-c-based cladogram and you haven’t adjusted for the fact that the cytochrome-c information is almost entirely accounted for based on information in the DNA.” See also: https://ajph.aphapublications.org/doi/10.2105/AJPH.2018.304337 - here’s a rough overview / https://www.statisticsdonewrong.com/
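For reference, the Holm–Bonferroni step-down procedure mentioned in the quote can be sketched in a few lines of Python. The p-values below are invented for illustration and have nothing to do with any real phylogenetic data set.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    The p-values are tested from smallest to largest against increasingly
    lenient thresholds alpha/m, alpha/(m-1), ..., alpha/1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: once one test fails, all larger p-values fail too
    return reject

# Hypothetical p-values from four tests run on the same question.
p_values = [0.01, 0.04, 0.03, 0.005]

naive = [p <= 0.05 for p in p_values]   # no correction for multiple testing
adjusted = holm_bonferroni(p_values)

print("uncorrected:", naive)
print("Holm-Bonferroni:", adjusted)
```

Without correction all four made-up tests look significant at 0.05; after the adjustment only the two smallest p-values survive.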

I don’t have a problem with organizing information into trees per se. It is useful, but I don’t feel this is what you would call an evidence for common descent.

Imagine just for a moment that common descent is indeed false. My understanding is that an objective application of the cladogram algorithms would still organize species or DNA information or whatever into the same trees as they do now. It would still provide a means of visualizing similarities. How does grouping similar things together on a tree provide evidence of common descent? You can group similar things together on a tree, whether they share ancestral relationships or not. You could even enter fake animals into the evolutionary matrix all day and it would still look statistically similar.

Please look at this example:
(https://www.google.com/amp/s/www.inverse.com/article/18434-pokemon-go-pocket-monster-evolution-genetics-research/amp)

“Yes, but those would not be objective nesting hierarchies, they would be imposed or subjective hierarchies. That’s a very crucial distinction.”

Hmm, as it relates to the computer filing system example, “objective” hierarchies could be made for files in the computer as well. Date, size, type, in use, not in use… in RAM, in cache, in storage, on USB, on net… Are those not objective?
This still doesn’t seem significant to me. Case in point, the pokemon tree:

“Well, we didn’t want to do it by morphology — the way they look — there’d be too many biases in that and it’s not like it would be particularly useful because they’re all designed to look very differently. So we wanted to do something resembling molecular biology, but they don’t have DNA obviously, but something similar, something that they all have that would be slightly different. I figured there are the preprogrammed attack sets and the types and the body types could work. Anything that could be categorized. Something that has been given an explicit category and isn’t subjective. So not like how many legs they have but can they do this attack or not.” - https://www.google.com/amp/s/www.inverse.com/article/18434-pokemon-go-pocket-monster-evolution-genetics-research/amp

The “objectiveness” of the “preprogrammed attack sets” (compared to the pokemon morphology) is not an indicator that the pokemon tree represents a physical/historical reality, only that you think your thought process is logical internally. This is useful but not what I would consider evidence for pokemon evolution.

Also the significance of the “objectivity” idea falls apart in my mind. This is because the “objective” information is selected beforehand! Information could potentially be cherry-picked, whether on purpose or not. As exemplified above, the preceding morphological tree might have some effect on the criteria chosen to produce the phylogenetic tree. This is to be avoided. Choosing criteria to produce a projection most like the hypothesis is a natural thing to do. This might be evidenced by regular revisions to trees in order to get a uniform tree. It looks to me like cladistics is easily manipulable. You may come up with whatever proof you want for your theories just by being selective in how you pick and weigh your observations.

Well, at least that’s not a creationist site, I think. But no, that isn’t how it works. It’s not necessary to decide synapomorphies and symplesiomorphies in advance, because the tree resulting from your analysis will tell you that. Nor do I know what that means by choice of outgroup producing “profoundly different topologies”. You could “cater your input to get a matching tree”, but it would be considered bad form and a good reason to reject a submission. I don’t think your friend is well-versed in phylogenetics.

Your understanding is wrong. There is no reason for congruence among trees from different data, and no reason for statistical testing of data to yield a significant result, other than phylogeny. It’s true that random data will give you a tree given most algorithms, but the tree won’t be statistically supported. That’s the crucial point: it’s not the tree, per se, that’s evidence for common descent. It’s the structure in the data that causes one tree to be a much better explanation for the data than other trees. No, fake data would not look statistically similar.

You could, but that isn’t how an honest scientist would work. Your imputation that scientists are cooking their data is a serious accusation that should not be made casually, certainly not without actually looking at the data. I for one find it insulting. And it also implies a universal conspiracy among those working in phylogenetics to suppress the truth. We’re not only faking our results, we’re uniformly lying about faking them. Do you really want to say that?

"Your imputation that scientists are cooking their data is a serious accusation that should not be made casually, certainly not without actually looking at the data. I for one find it insulting.

I’m sorry for insulting you. I wasn’t considering how my questions might affect you all. I realize you built your life around these things. Also, the entire scientific community stands by the phylogenetic tree, there must be something more to it. If anybody should just take your word for it, it’s me. I really do just want to understand.

And it also implies a universal conspiracy among those working in phylogenetics to suppress the truth.

I guess it could, but I wasn’t accusing or implying ill intent, at least not on purpose. Conspiracy would be quite a jump from what I wrote. The point is, I don’t understand (currently) why the phylogenetic tree is a proof for evolution. I joined the forum because I was excited that rather than just thinking about things on my own, I could just ask people who know what they are talking about. There is no reason to take my musings seriously. I really don’t know where to get science from other than scientists. I don’t intend this except for my own (and also my wife’s) curiosity.

“We’re not only faking our results, we’re uniformly lying about faking them. Do you really want to say that?”

I don’t think that, so I didn’t say that. I didn’t come here to make war. I’m just looking for dialogue.

Well, at least that’s not a creationist site, I think. But no, that isn’t how it works.

You can edit wikipedia I think. The link you sent for Khan Academy said the same thing. I just posted the wiki one.

“When we are building phylogenetic trees, traits that arise during the evolution of a group and differ from the traits of the ancestor of the group are called derived traits. In our example, a fuzzy tail, big ears, and whiskers are derived traits, while a skinny tail, small ears, and lack of whiskers are ancestral traits. An important point is that a derived trait may appear through either loss or gain of a feature. For instance, if there were another change on the E lineage that resulted in loss of a tail, taillessness would be considered a derived trait.
Derived traits shared among the species or other groups in a dataset are key to helping us build trees. As shown above, shared derived traits tend to form nested patterns that provide information about when branching events occurred in the evolution of the species.
When we are building a phylogenetic tree from a dataset, our goal is to use shared derived traits in present-day species to infer the branching pattern of their evolutionary history. The trick, however, is that we can’t watch our species of interest evolving and see when new traits arose in each lineage.
Instead, we have to work backwards. That is, we have to look at our species of interest – such as A, B, C, D, and E – and figure out which traits are ancestral and which are derived. Then, we can use the shared derived traits to organize the species into nested groups like the ones shown above. A tree made in this way is a hypothesis about the evolutionary history of the species – typically, one with the simplest possible branching pattern that can explain their traits.
Example: Building a phylogenetic tree
If we were biologists building a phylogenetic tree as part of our research, we would have to pick which set of organisms to arrange into a tree. We’d also have to choose which characteristics of those organisms to base our tree on (out of their many different physical, behavioral, and biochemical features).”

at least that’s not a creationist

I did quote a creationist source for Swamidass, but I don’t use any creationist sources typically and never in exclusion of secular ones. My posting it is an example of this.

It’s not necessary to decide synapomorphies and symplesiomorphies in advance, because the tree resulting from your analysis will tell you that.

It was my impression that these were necessary to make the tree (that shows ancestry). Please explain. What is “your analysis” in reference to?

Nor do I know what that means by choice of outgroup producing “profoundly different topologies”. You could “cater your input to get a matching tree”, but it would be considered bad form and a good reason to reject a submission. I don’t think your friend is well-versed in phylogenetics.

I think “You could “cater your input to get a matching tree”, but it would be considered bad form and a good reason to reject a submission.” is what my friend was saying. So how do you know when/if your tree was unintentionally skewed?

Your understanding is wrong.

And/Or incomplete.

So looking around I found this video: https://youtu.be/09eD4A_HxVQ In this example comparing genetic distancing is unbiased, and the “tree” depicts similarity. I would have visualized clusters but this video also makes good sense. You could potentially do this with anything, for example a bunch of junk on an old hard drive. How does one get from a depiction of similarities to parentage? What am I missing?

(Also, what part of the DNA is picked to compare? Or do they use the entire strand of DNA to compare relationship or random selections from it, or both? I wonder how the results play out… Do you ever get genetic distances that look erroneous like bacterium being more closely related to horses than yeast? What do you do if you do?)

There is no reason for congruence among trees from different data,

and no reason for statistical testing of data to yield a significant result, other than phylogeny.
It’s true that random data will give you a tree given most algorithms, but the tree won’t be statistically supported.

Please explain.

That’s the crucial point: it’s not the tree, per se, that’s evidence for common descent. It’s the structure in the data that causes one tree to be a much better explanation for the data than other trees.

This sounds to me like maybe you’re referring to parsimony. So by “better” maybe you mean “simplest”, is that right?

No, fake data would not look statistically similar.

Hmm… “fake data” is a bit unclear. I mean fake creatures like in the link I sent for the pokemon tree. However, I think you’re right and I am wrong. The fake creatures would affect the tree.

I get the impression you consider this an attack. I sincerely didn’t mean it as such. I’m sorry. I’m also not out to fight evolution. I want to understand it on a more than superficial level. Thank you for the time you’ve put into this.

No, it doesn’t. In the wiki, ancestral vs. derived is an a priori decision. In the Khan academy quote, it’s a conclusion based on the tree. Big difference. This is particularly obvious when using DNA sequences to build trees. You just put the whole sequence into the data set, without regard for whether it’s an A, C, G, or T that’s derived or ancestral at any spot.

That’s a complicated question that can’t be answered briefly. But other people will go over your analysis looking for problems.

I generally resist looking at videos. Yes, you can make up distances for anything, and you can get trees from anything. But if there’s no actual descent behind the data, no one tree will be better (by various statistical measures) than others. If the data are themselves not structured hierarchically, there are ways to tell beyond just dumping them into a tree-building algorithm. Note, by the way, that the common ways to do the analysis do not rely on genetic distances or simple similarity but on comparing whole sequences, site by site.

Added later: OK, I looked. The video explains using UPGMA, a method no systematist has used for at least 30 years, and it was considered a bad idea even then, as it assumes an absolutely constant rate of evolution over the tree. I assume the video used it because it’s a comparatively simple introduction to clustering algorithms. Real phylogenetic analyses do not use clustering algorithms but tree searches based on optimality criteria. I could explain the difference if you like.

Many different parts, depending. Again, not easy to answer briefly. You would have to look at the individual publications.

I think I’ve mentioned this before, but simple distances are not used in real phylogenetic analyses. That would assume an absolutely constant rate of evolution across all life.

I would have to go into major complexities for a good explanation. But consider a very simple and informal test: try a number of different genes independently. If they all give you the same tree, could you conclude anything from that? It’s congruence of signal from many different sources that gives us confidence in a tree. This can be done both within and between data sets.
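As a back-of-envelope illustration of why identical trees from independent genes would be striking, this Python sketch counts the possible rooted, fully bifurcating trees for a given number of species. Treating every tree as equally likely when there is no shared signal is a simplifying assumption of mine, not a description of how real congruence tests are done.

```python
def num_rooted_trees(n_tips):
    """Number of distinct rooted, fully bifurcating trees on n labelled tips.

    This is the double factorial (2n - 3)!! = 3 * 5 * ... * (2n - 3).
    """
    count = 1
    for k in range(3, 2 * n_tips - 2, 2):
        count *= k
    return count

for n in (3, 4, 10):
    print(n, "species:", num_rooted_trees(n), "possible trees")

# Under the simplifying assumption above, the chance that a second,
# unrelated data set lands on the same 10-species tree purely by luck:
chance_match = 1 / num_rooted_trees(10)
print(f"chance of an exact match by luck: about {chance_match:.1e}")
```

Three species allow 3 rooted trees, four allow 15, and ten already allow 34,459,425, so an exact match between independent data sets is very hard to get by accident.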

I’m referring to any criterion used to choose trees, of which parsimony is one. The criterion is a number that summarizes the fit of data to tree. By “better” I mean that one tree fits the data to a statistically greater degree than others.

It would help if we were discussing a real scientific paper with a real phylogenetic analysis. I could modestly suggest one of mine if you like. Or maybe a primate paper would be more to your taste. It turns out that primates are easy.

Incidentally, you may not have intended to imply the various insulting claims in your posts, but they were really there. Reread what you wrote and perhaps you will find them. I will accept that they were unintended, but you really should exercise more care in what you say.

@babs it seems that one conceptual gap may be regarding distances-similarities and trees.

You can create a distance-based tree out of anything; however, phylogenetic trees are not distance-based trees, but are based on nested classes.

It might help to consider a tree and ask a few questions about it. Can you imagine a distribution of mutations that fits the tree? What about one that does not fit the tree?

That second question is really important. Because mutations can be inconsistent with the tree, that shows you why it is so surprising that they are consistent. In fact it is pretty easy to create sequences that do not create a quality tree. We’ve done this before on the forum, and maybe we will do it again.
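One simple way to see the difference, sketched in Python: treat each mutation as a binary character (present or absent in each of six species) and check whether pairs of characters can fit on a common tree without any repeated or reversed change. Two binary characters conflict exactly when all four combinations (00, 01, 10, 11) show up among the species. Both data sets here are made up, and this is a toy compatibility check rather than the statistical tests used in real analyses.

```python
from itertools import combinations

def compatible(char_a, char_b):
    """True unless all four state combinations occur, which no single tree
    can explain with just one change per character."""
    return len(set(zip(char_a, char_b))) < 4

def count_conflicts(characters):
    """Number of character pairs that cannot both fit one tree cleanly."""
    return sum(not compatible(a, b) for a, b in combinations(characters, 2))

# Each string is one hypothetical mutation; each position is one species (a..f).
# Nested groups, as descent with modification would produce:
tree_like = ["110000", "111000", "000011", "000111", "111100"]

# Overlapping groups that cut across each other:
conflicting = ["110000", "101000", "011000"]

print("tree-like data, conflicting pairs:", count_conflicts(tree_like))
print("non-tree data, conflicting pairs: ", count_conflicts(conflicting))
```

The nested set has no conflicting pairs out of ten, while every pair in the second set conflicts, so mutations really can fail to fit a tree.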

I will point out that some phylogenetic trees are distance-based. Neighbor-joining and least squares trees do use distances as inputs. But, crucially, they do not assume that most similar equals most closely related, i.e. they don’t assume a constant rate of evolution throughout the tree. Neighbor joining turns out to be a quick approximation of the least squares tree.
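To make the point about distances concrete, here is a minimal sketch of the first step of neighbor joining on four taxa. The distances are made up: they are the path lengths on a hypothetical tree ((A,B),(C,D)) in which B and D sit on long branches (fast evolution) and A and C on short ones. The names `closest_pair` and `nj_first_join` are illustrative only, not taken from any package.

```python
from itertools import combinations

# Hypothetical additive distances from the tree ((A,B),(C,D)) with
# branch lengths A=1, B=4, C=1, D=4 and an internal branch of 1.
dist = {
    ("A", "B"): 5, ("A", "C"): 3, ("A", "D"): 6,
    ("B", "C"): 6, ("B", "D"): 9, ("C", "D"): 5,
}
taxa = ["A", "B", "C", "D"]

def d(i, j):
    """Symmetric lookup into the distance table."""
    return 0 if i == j else dist.get((i, j), dist.get((j, i)))

def closest_pair():
    """Naive rule: the most similar pair is assumed to be most closely related."""
    return min(combinations(taxa, 2), key=lambda p: d(*p))

def nj_first_join():
    """Neighbor-joining rule: minimise Q(i,j) = (n-2)*d(i,j) - r_i - r_j."""
    n = len(taxa)
    r = {i: sum(d(i, j) for j in taxa) for i in taxa}  # row sums
    q = {(i, j): (n - 2) * d(i, j) - r[i] - r[j]
         for i, j in combinations(taxa, 2)}
    return min(q, key=q.get)

print(closest_pair())   # the smallest raw distance is between A and C
print(nj_first_join())  # NJ joins a true sister pair instead
```

In this toy case the smallest raw distance is between A and C, which are not sisters on the generating tree, while the Q criterion picks A and B (C and D score equally well). The correction by row sums is what lets the method cope with unequal rates.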

True enough, but also a bit vague. In this particular case though we are dealing with a question of what you would expect, given some particular hypothetical explanation X. And in the case of common descent, you would expect independently derived phylogenetic trees to show a high level of similarity. But you wouldn’t expect that on design. That is not to say a design explanation can’t be complicated or hard to understand, you just don’t have any good reason to expect this particular pattern in the data.

You can of course try to come up with some complicated story for why a putative designer would want to do it that way, but it’d be completely ad-hoc of course. And there’s no conceivable pattern of data you could not explain away with a sufficiently complicated and inscrutable number of motivations and capacities on behalf of the designer.

Why does this rock formation contain these particular bands of iron oxides? Well a designer could in principle have placed them there given some particular set of motivations and capabilities. There’s no end to this. Why do I obtain this particular pattern of temperature measurements? Well someone could have shone a strong heat-lamp on the thermometers from these different distances at these different intervals. Etc. etc.

Not necessarily, but to begin with that would be evidence against common descent if different species were based on fundamentally different and incompatible types of biochemistry. Whether that would be evidence for design is a separate question from whether it is evidence against common descent.

I was making the point about design simply to highlight the fact that you don’t automatically expect identical fundamental biochemistry in the same way that you do with common descent.

Of course, but if one makes such an assumption that everything we see is what the designer wanted, one has made it impossible for oneself to discover that one might be wrong about that. That is putting the conclusion before the evidence is even analyzed. Suppose hypothetically one was wrong in making that assumption, then one would waste an eternity trying to understand the purpose for which something was designed that in the end actually wasn’t.
At some point I think we should just let go of assumptions in that way, and just try to let the evidence speak for itself. To do that we need hypotheses that make genuine predictions that we can compare to observations. A hypothesis needs to predict data, as in say something should be a particular way if the hypothesis is true. Common descent does that for consilience of independent phylogenies.

The algorithms used to derive the phylogenetic trees from the data do assume the data is all part of a tree, and will try to derive the best tree given certain philosophical assumptions (such as, in the case of molecular data, “fewest number of mutations needed to explain the differences between polymer sequences”, understandably named the maximum parsimony algorithm).
But it’s important to understand that they are not assuming a particular tree; the algorithms are not forcing the data to produce trees that conform to each other. The mere fact that a phylogenetic algorithm will put the data on a tree does not explain why different trees derived from different data sets nevertheless agree (the algorithm doesn’t know that some other tree was generated from some other piece of data, so it can’t try to make them similar). That result can only obtain if there really is similar tree-like structure in the data.
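As a rough illustration of what “fewest mutations” means in practice, here is a minimal sketch of Fitch’s counting procedure, which scores one given tree against an alignment. The taxon names, sequences and tree shapes below are invented for the example, and a real program would also search over many candidate trees rather than score just two.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree:   a leaf name (str) or a 2-tuple of subtrees (rooted, binary).
    states: dict mapping leaf name -> observed character at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:  # children can agree: no extra change needed here
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over every column of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )

# Hypothetical toy data, not real sequences.
alignment = {"human": "AAGT", "chimp": "AAGT", "mouse": "ACGA", "rat": "ACGA"}
tree_1 = (("human", "chimp"), ("mouse", "rat"))
tree_2 = (("human", "mouse"), ("chimp", "rat"))

print(parsimony_score(tree_1, alignment))  # 2 changes
print(parsimony_score(tree_2, alignment))  # 4 changes
```

Note that the scoring function knows nothing about any other data set or any previously derived tree. It only counts changes for the tree and the alignment it is handed.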

As John Harshman explained, there are statistical tests that can be done on the data that show whether tree-like structure actually exists in the data set (as in, is there even any tree-like structure in the data, and if so how much?).
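One simple way to get a feel for “how tree-like is this?” is the four-point condition on distances. This is a sketch of a quartet score of the kind used in delta plots; it is offered only as an illustration and is not necessarily one of the tests referred to above. The function name and the example distances are assumptions made for the example.

```python
def delta_score(d_ab, d_cd, d_ac, d_bd, d_ad, d_bc):
    """Quartet tree-likeness: 0.0 is perfectly tree-like, 1.0 is maximally conflicting.

    On a true tree, the two largest of the three pairwise sums are equal.
    """
    smallest, middle, largest = sorted([d_ab + d_cd, d_ac + d_bd, d_ad + d_bc])
    if largest == smallest:
        return 0.0  # star-like quartet: no conflict, but no resolution either
    return (largest - middle) / (largest - smallest)

# Distances read off a real tree shape satisfy the condition exactly.
print(delta_score(5, 5, 3, 9, 6, 6))  # 0.0
# Made-up distances that no single tree can produce.
print(delta_score(5, 5, 5, 6, 6, 6))  # 0.5
```

Averaging such scores over many quartets gives a crude summary of how much of the signal in a distance matrix is compatible with some tree.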

But the stronger test for common descent is really whether independent data sets yield significantly similar trees. That is to say, for example, whether different gene sequences that do not constrain each other through some sort of function or physical interaction produce similar gene trees. It is exactly because no such constraint is operating to “force” the different gene sequences to yield similar gene trees when subjected to a phylogenetic algorithm (there is no reason they would) that it is such a remarkable result when they nevertheless do.
That is a result that cries out for an explanation, and it is the result expected from the gene sequences being generated from common ancestors through a branching genealogical process. Hence common descent is the best known explanation for this result.
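“Significantly similar” can be given a number. A common choice is the Robinson-Foulds distance, which counts the groupings found in one tree but not the other. Below is a minimal sketch for rooted trees written as nested tuples; the three gene trees are invented for illustration and the function names are not from any library.

```python
def clades(tree):
    """Return (all leaves under tree, set of clades as frozensets of leaf names)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)  # the group defined by this internal node
    return leaves, found

def rf_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two rooted trees."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same taxa")
    return len(clades_a ^ clades_b)

# Hypothetical gene trees (illustrative shapes only).
gene_1 = ((("human", "chimp"), "gorilla"), "orangutan")
gene_2 = (("gorilla", ("chimp", "human")), "orangutan")  # same tree, drawn differently
gene_3 = ((("human", "gorilla"), "chimp"), "orangutan")

print(rf_distance(gene_1, gene_2))  # 0
print(rf_distance(gene_1, gene_3))  # 2
```

A distance of 0 means the two trees imply exactly the same nested groups, however they happen to be drawn.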

But you can’t, really, unless you’re literally doing fraud and making up data. As in you sequence the genes of some organism, and then you very dishonestly go and edit the gene-sequence from what you really measured it to be.

That would certainly be a problem if scientists were doing this, but there’s just no good evidence of them doing this with real biological data. It would have to be some sort of vast international conspiracy where hundreds of thousands of biochemists, biologists, doctors, geneticists, statisticians and what have you all over the world are conspiring to make up gene-sequences and put them into public data bases to be used in genetic analyses. Needless to say that is just not credible.

Sure, but it isn’t. I think you should try to read up on how phylogenetic algorithms actually work, and possibly even try to derive some phylogenetic trees for yourself. If you can stand the thick Danish accent I once took a course where I watched this video where the maximum parsimony algorithm is explained: https://www.youtube.com/watch?v=gXb_WuLCD8g

Sure, but how similar, and how different could they be? Think back to the thermometers example. With 6 significant figures the thermometers could in principle disagree in almost 10 million different ways, as in there could in principle be a ten million degree difference in measurements, so the fact that they are within 0.004 degrees Celsius of each other is remarkable. And all the more remarkable the more measurements you take independently of each other.

For large phylogenetic trees the number of ways they could disagree becomes truly astronomical. It’s not just the fact that some are inevitably more similar to each other than others, it’s that they are all technically within an extremely narrow range of “values”: how similar they are. It is the degree of similarity that is remarkable over such vast data sets, and that this is a consistent and reproducible result across many, many different genetic loci (and morphological and other physiological data sets), and that it is corroborated by many different ways of doing phylogenetic analysis on genetic and physiological data.
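To put a number on “astronomical”: the count of distinct rooted binary trees on n labelled tips is (2n-3)!!, and the count of unrooted ones is (2n-5)!!. The short sketch below just evaluates those standard formulas; the function names are illustrative.

```python
def double_factorial(k):
    """k!! = k * (k-2) * (k-4) * ... down to 1 or 2."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def rooted_trees(n_taxa):
    """Number of distinct rooted binary trees on n labelled tips: (2n-3)!!"""
    if n_taxa < 2:
        raise ValueError("need at least two taxa")
    return double_factorial(2 * n_taxa - 3)

def unrooted_trees(n_taxa):
    """Number of distinct unrooted binary trees on n labelled tips: (2n-5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    return double_factorial(2 * n_taxa - 5)

for n in (4, 10, 20, 50):
    print(n, rooted_trees(n))
```

With only 10 taxa there are already 34,459,425 possible rooted trees, so two independent data sets landing on the same or nearly the same tree is not something that happens easily by chance.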

The idea that this is all due to some sort of accidental input bias, or worse, deliberate fraud, is simply outside of rational discourse.

Sure, but you can’t merely handwave in the direction of the possibility that researchers COULD be doing something systematically wrong, either indirectly or deliberately. You have to provide evidence that they are doing this (after all, otherwise you can just excuse away the findings of any scientific field as the product of researcher bias or fraud). And if you have this evidence, you should immediately write the scientific journals in question to show that researchers are engaging in some sort of fraud.

Then I’m sorry to have to inform you but your feelings are powerfully mistaken.

As explained already, it is not merely the fact that you can put things on branches and say “see, here’s a tree”, it’s that there are both tests for the degree to which this data really does contain tree-like structure, and that different independent data sets yield highly similar trees without being (accidentally or deliberately) systematically forced to do so.

And if you have any actual evidence of scientists doing this, just making up fake data and putting it in their phylogenetic algorithms, you should report them to the institutions where they work, and the scientific journals where they publish.

I’m looking at it and I see no evidence at all that there is significant tree-like structure in that data (the authors don’t mention having done any such tests on the data set, merely having derived a tree), nor that they have tried to derive trees from different types of data and compared their degree of match.

No. You’re going to have to decide what is the criterion to use for files that go in the root folder, and then decide what folders it should contain in turn, and how many, and what criteria to use for what files you put in each of those. That would make it a subjective nested hierarchy.

Yeah, that indicates there would not be any significant match between a tree derived from morphological attributes of Pokemon characters, and a tree derived from their “attack capabilities”. In a way these authors appear to be guilty of doing exactly what you suggest would poison the significance of such a result, by deliberately including and excluding characters they suspect would affect the results.

I don’t think it’s evidence for Pokemon evolution either. I suspect the authors also don’t.

Yes, exactly. Do you have any evidence biologists are doing this with real genetic and morphological data to force different trees to agree?

I think you’ve done a fine job coming up with all sorts of excuses that you can imagine for why consilience of independent phylogenies could fail to be real evidence for common descent, now you just need to show scientists really are guilty of committing these kinds of deliberate or accidental fraud.

Do you have any real evidence to base such a belief on, or is the mere theoretical conceivability of this enough for you to believe it? After all, all scientists could in principle commit fraud about basically anything, why believe anything anyone says? Are we now really forced to go and sequence the genes of different organisms ourselves?

Horse and yeast are equally related to bacteria
BABS, I suggest you watch the videos by Aron Ra (Aaron Nelson) on YouTube concerning phylogeny. He’s VERY good once you get past his hairy appearance. Here is the link to the first one https://www.youtube.com/watch?v=AXQP_R-yiuw

If you watch all or most of them and think about the content, I think you’ll see the absurdity of the creationist position.

In regard to phylogenetic trees, once again I found your posts very helpful. I think things are clicking, and I’m learning little by little. I’m definitely suffering from gaps in information. You all are right. I should look at an actual example of a phylogenetic tree and ask questions. Simple would be good. Also, I think my confusion has to do with the process of making phylogenetic trees too. The discussion here is really helping me to understand it better. I see several places where I’ve misunderstood. I’ve taken some time in between work and while doing menial jobs to listen to some videos and have my smart phone read to me while I’m working, and so I think I’ve filled in some gaps in my understanding. I wrote out some notes on what I understand along with some questions. Here is the progression of my understanding and some misconceptions and questions for you to critique, answer, clarify, etc.

From sequences to a tree

1 Choosing markers

Let’s see, first you choose what kind of data you’ll be comparing, be it DNA or protein sequence, global or local. This is carefully done (when comparing against DNA or protein sequences) to avoid causal relations and to select relevant sequence with information… (https://cals.arizona.edu/mycoherb/arnoldlab/teaching/advmycol/baldauf.pdf) I think choosing a sequence has many utilities, the first of which may be that not all the code is demonstrating a mutation (has the same code), and also that perhaps the code has nothing in common. So picking code is important.

From what I understand you must slice the chosen sequence out of a bit of code just right. Meaning it must contain information, where an evolutionary process is at work (where the sequence has mutated). If you were to slice the information in the middle, the info wouldn’t be there or would be there only in part. Some paper I read suggested using the whole DNA, and I think this may be why. John, I think, said the same thing. (It seems to me that when using the NJ method a splicing error will be easy enough to spot because it might look like a variant tree, or misinformation. This problem might happen in alignment as well.)

Mis-splicing may mess up the whole process. If what a splice contains is a limited code like in DNA (made of only a few characters as opposed to one with more characters), the chances of spurious correlations in a mis-splice are higher. The paper above stated it this way:

“The basic premise of a multiple sequence alignment is that, for each column in the alignment, every residue from every sequence is homologous; that is, has evolved from the same position in a common ancestral sequence without insertion or deletion. When this premise is met, a multiple sequence alignment can hold a wealth of information about protein structure and function, mode of evolution and, of course, phylogeny. However, a molecular phylogeny is only as good as the alignment it’s based on. At best, misaligned sequence has no useful phylogenetic information; at worst, it might have convincing misinformation.”

Do you have to splice it yourself?

How do you actually choose the sequences and why? (If what I said above is wrong)

If you use entire DNA/protein sequences, must you split them up?

How do you check if you chose (marked) the sequences correctly?

2 Align the sequences

For me I think this step is particularly confusing. You can’t just simply compare the sequences (I assume because of insertions). So you have to match them up. A wrong alignment could affect the tree. The alignment with the fewest gaps is considered best (provided it is not a fringe result from the mean).
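For what it is worth, alignment programs usually do not simply count gaps. They score a candidate alignment by rewarding matches and penalising both mismatches and gaps, then look for the best total. Here is a minimal sketch of the classic Needleman-Wunsch scoring for two sequences. The scoring values (+1, -1, -2) and the example sequences are arbitrary choices for illustration, and real multiple-alignment tools are considerably more elaborate.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b (Needleman-Wunsch, score only)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = score[i - 1][j] + gap  # a[i-1] sits opposite a gap
            gap_in_a = score[i][j - 1] + gap  # b[j-1] sits opposite a gap
            score[i][j] = max(diagonal, gap_in_b, gap_in_a)
    return score[-1][-1]

print(global_alignment_score("GATTACA", "GATTACA"))  # 7: seven matches
print(global_alignment_score("GATTACA", "GATACA"))   # 4: six matches, one gap
```

The second result corresponds to lining up GATTACA against GAT-ACA. Changing the gap penalty changes which alignment wins, which is one reason alignments are worth second-guessing.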

Is there ever a reason to second guess an alignment?

What is a guide tree?

How do you align the sequences correctly? (Avoid compact alignments and such.)

How do you know the sequences are aligned correctly?

3 “Choose a model”

I think this means calculate which model of evolution fits your data best….I don’t understand this at all. Is this in reference to the sequence alignment or the tree building or to the algorithm used to make the tree? I just saw it here: Branching Out - 5 Steps to Creating a Phylogenetic Tree - Bitesize Bio

If this is a step in building a tree I don’t get it. I think I saw some tools for this in MEGA.
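A small worked case may help show what a “model” does. The simplest substitution model, Jukes-Cantor, assumes all changes between bases are equally likely and uses that to correct the raw fraction of differing sites p for repeated changes at the same position: d = -(3/4) ln(1 - 4p/3). The sketch below applies that formula; the sequences are made up, and real analyses usually compare several richer models.

```python
import math

def p_distance(seq_a, seq_b):
    """Fraction of aligned sites at which two sequences differ."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned, non-empty and equal in length")
    differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return differences / len(seq_a)

def jukes_cantor(p):
    """Estimated substitutions per site under the Jukes-Cantor model."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)

p = p_distance("ACGTACGTAC", "ACGTTCGAAC")  # hypothetical aligned sequences
print(p)                # observed differences per site
print(jukes_cantor(p))  # corrected estimate, always at least as large as p
```

The corrected distance grows faster than the observed one because some sites will have changed more than once. The model can feed into the distances used by NJ or into the likelihood used by ML, so it belongs to the tree-building step rather than to the alignment.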

4 Tree building

Then, you enter sequences into a program to compare them.

You use the fitting algorithm of your choice, Neighbor Joining, UPGMA…ML…. My general understanding is that these plot the distance between each individual sequence, and then between the clusters as well. This makes a tree(trees). (I think these algorithms are getting many possible trees and narrow them down to one. I think this is the case for MP and ML.) In any case, you can use:

UPGMA- which assumes a molecular clock- this produces a rooted tree. It’s probably not useful in every circumstance because it assumes a constant rate of mutation. It’s an old method.

NJ- which doesn’t assume anything that I know of, just measures distances. (Did I get that right?) It produces only 1 particular tree (or at least narrows things down to one tree. Is this an advantage or disadvantage?) This method I would tend to trust more. I read it’s better for more distant relations.

I would guess these two methods have more similar results.

MP- Calculates trees with the least mutations to account for the data. (An Occam’s razor approach I think.) It’s character based, but I don’t know what that means. (Maybe it means comparing individual character positions in the sequences to make trees to compare as opposed to NJ’s calculating the number of differences in sequences.) This is for things that are closely related.

ML- Also character based (whatever that means) This one was harder for me to keep in my head. Basically I think it maps all the different possible trees, then takes what are the most likely ones based on…a certain model of evolution (maybe based off of step 3?). I think this one assumes something. In any case Most Likely seems useful for any set of sequences.

If you use NJ, for example, you get an objective tree. You can deduce what is an ancestral trait and a derived trait by the nodes on it without referring to a molecular clock or morphology. (This I had not understood).

The NJ tree cannot really be wrong, can it? (By this I mean it won’t display incorrect information as far as similarities go unless you entered it wrong.)

Also, at this stage you can compare trees from different algorithms and/or run the data through other algorithms to see what trees they plot. So when you run a set of sequences through these algorithms you get different outputs.

What do you do with them? Consolidate them?

Are you looking for a common topology?

I tried to use the MEGA program to do a few trees for fun just to get the idea. (Seeing how the program is setup was helpful, but I of course don’t understand it well and I can’t sequence. All I could do was look at examples on it.) It was helpful to get an idea how it works.

What I understand at this stage of making a tree is that you have one, but it’s not finished (another point of confusion) and that they only show similarities. Also, I think the algorithms used to produce the tree thus far are narrowing down a whole bunch of trees that could have worked somehow or another for the data set but are not ideal(?). So what’s next, and what do you do with trees made from the different algorithms?

Do you compare the trees produced by these methods and take an average?

Where in this process would you “root the tree” and what assumptions, if any, does that come with?

5 Checking your results internally: Bootstrapping/Jackknifing

This means you check your tree produced by the algorithm (NJ) by randomly selecting parts of the code and seeing what trees they make in order to see how robust the tree that you’re testing is. Jackknifing I don’t understand.
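The resampling idea can be sketched in a few lines. In a bootstrap, alignment columns are drawn at random with replacement until the new alignment is as long as the original, a tree is built from each pseudo-alignment, and you count how often each grouping reappears. A jackknife is the same idea, except that a fraction of the columns is dropped instead of resampling with replacement. The example below is deliberately simplified and rests on several assumptions: exactly four taxa (A to D, in that order), a stand-in “tree builder” that picks the split supported by the most parsimony-informative columns, and invented sequences.

```python
import random
from collections import Counter

def best_split(columns):
    """Four-taxon split backed by the most informative columns (None on a tie)."""
    support = Counter({"AB|CD": 0, "AC|BD": 0, "AD|BC": 0})
    for a, b, c, d in columns:
        if a == b and c == d and a != c:
            support["AB|CD"] += 1
        elif a == c and b == d and a != b:
            support["AC|BD"] += 1
        elif a == d and b == c and a != b:
            support["AD|BC"] += 1
    ranked = support.most_common()
    if ranked[0][1] == ranked[1][1]:
        return None  # no single split is preferred
    return ranked[0][0]

def bootstrap_support(alignment, replicates=1000, seed=0):
    """Fraction of bootstrap replicates recovering each split."""
    rng = random.Random(seed)
    columns = list(zip(*alignment))  # one tuple of four characters per site
    counts = Counter()
    for _ in range(replicates):
        resampled = rng.choices(columns, k=len(columns))  # with replacement
        counts[best_split(resampled)] += 1
    return {split: n / replicates for split, n in counts.items()}

# Hypothetical aligned sequences for taxa A, B, C, D (mixed signal).
alignment = ["AACCGTAT", "AACCGTGT", "GGTTGAAC", "GGTTGAGC"]
print(bootstrap_support(alignment))
```

A split that turns up in nearly every replicate is supported by signal spread across many sites. One that appears only about half the time depends on a few columns and should not be trusted much.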

6 Interpreting/Resolving the tree/Explaining the tree

I’m curious as to what this entails. I understand at least that you try to determine if the tree accurately represents the data input. Also, this is where you compare the trees made from all the methods used for congruence. In the case that things are incongruent you try to figure out why.

Do you tweak things to get them congruent afterwards?

Do you ever edit the trees? On what basis?

7 Making a Presentable Tree (aesthetics)

A good summation in my mind would be that this method produces a working model on which to base an understanding for the theory of the tree of life. This method can be used to get a model tree to prune as more accurate/new information dictates. In other words, you can draw in the branches of the tree in many ways and this method produces the most likely way the tree branches can be drawn. Is this a fair understanding?

How well do I understand thus far? If I get a general understanding of how this works, maybe I’ll understand something from a simple example.

This site helped: https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1365-313X.2005.02611.x

Here are some questions I have…

At what point in this process do trees from different sets of sequences look alike?

I wonder how often the algorithms give trees with different topologies?

If a distance algorithm is calculating relation (or distance), I’m not sure if the algorithm is getting multiple ways things could be similar and narrowing the output down to the simplest model, or is just getting one model. Also for MP methods, they present the simplest model, but could there not have been a more complicated tree that is the real tree? Or another tree that, in one aspect, is more correct? (The reason I ask is that I think one lecturer mentioned there was another program called BEAST that allows you to examine a wider range of the tree outputs and input mutation rates.)

Thank you for your time. I’m really learning a lot.

"it seems that one conceptual gap may be regarding distances-similarities and trees."

Yes, I think so.

It might help to consider a tree and ask a few questions about it. Can you imagine a distribution of mutations that fits the tree? What about one that does not fit the tree?

So it seems easier to imagine ways it would not work as opposed to ways it would. I.e., there are more ways to fail than to succeed.

(This is redundant), so when you compare different pairs of sequences you get identical trees? Can you give me some examples to look at?

If the data are themselves not structured hierarchically, there are ways to tell beyond just dumping them into a tree-building algorithm.

How do you do that?

try a number of different genes independently

I ran a sequence of the primates with an example file in MEGA, but I don’t have another sequence to check against it. Also, I tried to run 4 genes in another file (FASTA file) that came with MEGA, but I got different trees for the different algorithms. What might I have done wrong?

If they all give you the same tree, could you conclude anything from that?

I would guess we probably have the right tree. Yes, that makes good sense.

What do you mean by: This can be done both within and between data sets.?

Just to be sure, you mean from different sites in the DNA, and/or between different protein sequences/DNA, right?

It would help if we were discussing a real scientific paper with a real phylogenetic analysis. I could modestly suggest one of mine if you like. Or maybe a primate paper would be more to your taste. It turns out that primates are easy.

That would be great! Simple is better.

I’ve listened to quite a few of Aron Ra’s videos now. He is hairy. I’ve heard a lot of this before from EONS, but it’s good to hear it again.

There are a number of tests. Bootstrapping is perhaps the simplest: high bootstrap percentages show that the data agree among themselves.

First, you used MEGA, which isn’t really a phylogenetic analysis program. It’s an alignment program with a few simple phylogeny algorithms in rudimentary form. I wouldn’t accept those results for anything more complicated than neighbor-joining, and I wouldn’t believe the trees as adequate reflections of signal in the data. Can’t say much more without knowing just what you did with what data.

Yes. The question is whether different genes agree on the tree, and also whether different sites within genes agree too. More generally, whether different bits of data of any sort agree. The technical term is “congruence”.

If you want a real primate phylogeny paper, here is the first one that I found in a Google search. Check it out and see what questions you have.

Here also is one of mine, not primates but birds, even more fun. I think it’s probably more readable, though I could be wrong.
