"I'm treating the mutation rate as a substitution rate" - Dr. Nathaniel Jeanson


Hi Ron
Why would loss of purifying selection be unique to humans? Mice have much higher reproductive rates and are an older species. How do you correlate the loss of particular smells with rodent reproductive rates?

It looks like different starting points for the sequences should be considered

A lack of selective pressure in what animal(s)?

The one in which those olfactory receptor genes decayed into pseudogenes. The animals on the Homo lineage. You.

My thoughts exactly. I suspect he doesn’t understand what he wrote.

@Rumraket I read this but I’m not sure what I should be taking away from it. What do you see or what would you point out in terms of strong evidence for nested hierarchy? A Molecular Phylogeny of Living Primates

Congruence of this study with previous studies. Congruence among separate partitions of the data within the study. High bootstrap values.


What @John_Harshman said. But I figure it actually takes a bit of explaining. The bootstrap is basically a test of tree consistency.

You have to understand something about how the phylogeny is made in the first place, and then how it’s tested for consistency. As the paper says, it’s inferred on the basis of 54 nuclear genes from each of those 191 species.

It’s important to note that those 54 nuclear genes come from different chromosomes and have wildly different functions; some come from coding regions and some from non-coding regions. Introns, exons, genes that regulate other genes and control development, genes that function as enzymes, genes that take part in DNA replication, etc. Some from X and some from Y chromosomes.

So whatever functional constraint you might imagine operates on some gene, it’s rather difficult to see how the same constraint can be operating on another, particularly in a sense where that constraint should force each of those 54 loci to give similar gene trees. That just doesn’t make sense.

Anyway, what that means is that in every species they collect the sequences for those 54 genes and put them literally end to end in one long DNA sequence (~35,000 DNA base pairs for each species), and then they construct one giant alignment from those 191 × ~35,000 bp sequences. This alignment is then used to infer a tree with a phylogenetic algorithm.

Here’s where the bootstrap comes in.

The bootstrap constructs a new alignment from the original by randomly sampling columns from it, with replacement, until the new alignment is the same size as the original. Then a new tree is inferred from this resampled alignment, and the replicate trees are compared to see which nodes they share. This is done 100 times: 100 times a new alignment the size of the original is made by taking random columns from it, and a new tree is inferred from that random sample of the data. Since the original data is sampled randomly, you will not get the same alignment every time; in some replicates you get more data from some genes and less from others, with some columns appearing several times and others not at all. And since it’s random and there’s so much data, it’s very unlikely you get the same random sample twice, so each replicate tree should have different biases.
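A minimal sketch of that resampling step, using a made-up four-species toy alignment (a real analysis would then feed each resampled alignment to a phylogenetics program, which this sketch omits):

```python
import random

# Toy alignment: 4 species x 12 columns (the real one is 191 x ~35,000).
alignment = {
    "human":   "ACGTACGTACGT",
    "chimp":   "ACGTACGAACGT",
    "gorilla": "ACTTACGAACGT",
    "macaque": "ATTTACGAATGT",
}

def bootstrap_resample(aln, rng):
    """Build a new alignment by drawing columns with replacement."""
    n_cols = len(next(iter(aln.values())))
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {sp: "".join(seq[i] for i in picks) for sp, seq in aln.items()}

rng = random.Random(0)
replicate = bootstrap_resample(alignment, rng)

# Same species and dimensions as the original, but some columns now
# appear several times and others not at all; a tree would then be
# inferred from `replicate` instead of `alignment`.
print(replicate["human"])
```

Every column of the resampled alignment is an exact copy of some column of the original; only the sampling frequencies differ between replicates.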

There’s this nice table in Baum & Smith’s Tree Thinking that shows what a resampled bootstrap data set looks like. Notice how, for example, column 35 from the original alignment was sampled 4 times for the bootstrap data, which means whatever tree is inferred from that new bootstrap alignment will be more strongly biased by the data from column 35 than the tree from the original alignment.

In the primate phylogeny paper the number of columns is then of course ~35,000, and there are 191 rows.

A hundred times over, such a new resampled alignment is created, a new tree is inferred, and the resulting trees are compared.

It is then noted, for each node in the tree, how many times out of those 100 that particular node is recovered. So the bootstrap value for a node is literally how often that node is found by randomly resampling the original data set and making a new tree from the sample. If the data from those 54 genes contained a lot of inconsistency, it would likely show up as low bootstrap values at many nodes. I think a rule of thumb is that bootstrap values above 90 are considered good, which means that node is recovered more than 90 times out of 100.
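Under the hood the support values are just counts. As a toy illustration (the clade encoding below is invented for the example; real phylogenetics software reports these values directly), with five replicates instead of 100:

```python
from collections import Counter

# Five toy replicate trees, each encoded simply as the set of clades
# it contains; frozensets make clades hashable and countable.
replicates = [
    {frozenset(["human", "chimp"]), frozenset(["human", "chimp", "gorilla"])},
    {frozenset(["human", "chimp"]), frozenset(["human", "chimp", "gorilla"])},
    {frozenset(["human", "chimp"]), frozenset(["human", "chimp", "gorilla"])},
    {frozenset(["human", "gorilla"]), frozenset(["human", "chimp", "gorilla"])},
    {frozenset(["chimp", "gorilla"]), frozenset(["human", "chimp", "gorilla"])},
]

counts = Counter(clade for tree in replicates for clade in tree)

# Bootstrap support for a clade = percentage of replicates recovering it.
support = {clade: 100 * n // len(replicates) for clade, n in counts.items()}

print(support[frozenset(["human", "chimp"])])             # 60
print(support[frozenset(["human", "chimp", "gorilla"])])  # 100
```

Here the (human, chimp) grouping is recovered in 3 of 5 replicates (60% support), while the (human, chimp, gorilla) clade is recovered in all of them.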

The vast majority of nodes in the tree from the molecular phylogeny of primates have bootstrap values above 90.

Again, it’s important to understand just how different the trees could be. For 191 species there are about 6.2137 × 10^407 possible rooted trees. Trees can disagree in an incomprehensible number of ways. Finding almost identical ones over and over again, basically no matter what genetic locus they are inferred from, is an incredible degree of consistency. What explanation is there for that, other than common descent? That is, that they have been sharing the same genealogical history, and are therefore constrained by the same speciation events, having been part of the genomes of members of the same species and having evolved for similar amounts of time. Common functional constraint, or common design as creationists often invoke it, simply does not explain that. What IS this supposed functional constraint operating on such wildly different DNA sequences that forces them to give similar trees?
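The tree count quoted above follows from the standard formula for the number of rooted bifurcating trees on n labeled tips, (2n − 3)!!, the product of the odd numbers up to 2n − 3. A quick sanity check (the function name is just for illustration):

```python
import math

def n_rooted_trees(n_taxa: int) -> int:
    """Number of rooted bifurcating trees on n labeled tips: (2n - 3)!!,
    i.e. the product of the odd numbers 1 * 3 * 5 * ... * (2n - 3)."""
    return math.prod(range(1, 2 * n_taxa - 2, 2))

print(n_rooted_trees(3))    # 3
print(n_rooted_trees(4))    # 15
trees = n_rooted_trees(191)
print(len(str(trees)))      # 408 -- a number with 408 digits, ~6.2 x 10^407
```

Even at one tree evaluated per nanosecond, exhaustively enumerating a search space of that size is physically impossible; the point is simply that near-identical trees keep coming out of it anyway.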

So that’s it. Did that make sense?

Edit: By the way, the number of bootstrap replicates doesn’t have to be 100; it can be many more. But creating phylogenetic trees from this much data is extremely computationally demanding, so as far as I know it’s usually kept to something like 1,000 replicates or fewer, and the bootstrap value for a node can then be something like 852 or 978 out of 1,000.


Well, I hope @thoughtful reads that. I would add a few slight complications that might cause confusion:

  1. As the amount of data increases, the bootstrap becomes increasingly vulnerable to the tiniest biases in the data, such that for genome-level data it’s nearly useless and will pretty much always give you 100% bootstrap for every single node. This, on the other hand, is a mid-range data set by current standards, so anything with >90% support should be good.

  2. Since incomplete lineage sorting is a thing, some partitions of the data will in some cases support different resolutions of a node or two, mostly when an intervening branch is very short. This will, for example, be true of the relationships among chimp, human, and gorilla. Some data partitions will support each of the three possible resolutions. But chimp + human will have many more partitions supporting it than will the other two. Not sure if there are other good examples on that tree.


Constraint is the most important concept here. Evolution is constrained in ways that design and special creation are not.


If common descent is inferred (due to methodological naturalism) then yes, I agree with you. If separate starting points are considered, then there is a lot more testing to do to reach a conclusion.


If you assume something silly, reality causes problems. You’re so close to getting it.


Bill, that response doesn’t make sense.

Nothing about methodological naturalism is at work here. These protein-coding genes are broken. It is hypothetically possible that some of them have stretches where the DNA has retained or evolved other functions, but they clearly don’t, and couldn’t possibly, function as GPCRs and thus contribute to the sense of smell by binding odorants found in the environment.

Olfactory receptor genes code for proteins: the typical G-protein coupled receptor (GPCR). Being olfactory receptors is how they function in the sense of smell; they are receptors for molecules found in the environment, and a smell is detected when such a molecule sticks to the receptor and triggers an intracellular response.

Since many more of the OR genes in humans are broken in ways that really do break protein-coding genes (premature stop codons, missing or partial exons, frameshift mutations, missing promoters), they clearly no longer function in the sense of smell. The molecules that used to bind them no longer do, because the protein is broken and/or not even expressed. So they don’t work. Whatever possible function those pseudogenes might hypothetically have now, it isn’t as a receptor for an odor molecule.

And we all survive just fine without being able to detect these smells, or detect them as well as our primate cousins and more distantly related species. And we really do have a much worse sense of smell than our primate cousins and even more distant relatives. This isn’t something that can be reasonably disputed.

I can only speak for myself here but I don’t find my food, or willing mates, or recognize friends and family members by sniffing them out. Call me weird.


Hi Rum
If the standard in science were that there is a mixture of common descent and multiple separate origin events, how would you determine that these segments were truly broken genes and not simply expressed regulatory elements?

I can safely speak for all of the scientists here in stating that we do not infer common descent because of methodological naturalism. We do so because it is consistent with the extant data, and more importantly, has made hundreds of thousands of correct predictions of data.

What are you proposing and/or doing to test, then?


You are driving down the highway when you encounter vehicular heaps of twisted metal, shards of glass, parts and tires strewn in the ditch. Is your first impulse to say, “check out this work of abstract art?”

We take them to be broken because we know what they are supposed to look like when they are functioning. We might even be able to identify specific pseudogenes which are still functioning in other lineages (unless actually being able to smell is the aberration).


Those two categories are not mutually exclusive. A pseudogene is a broken protein-coding gene. That does not actually preclude some part of it from having retained, or acquired subsequent to its breaking, some other function.
Common vs separate ancestry simply does not enter the picture in considering whether it is, in fact, a gene that used-to-be-but-no-longer-functions-as-a-protein-coding-gene. I don’t have to assume common ancestry (or naturalism, whether methodological or metaphysical) in order to see that a premature stop codon, missing exon, or frameshift mutation is very likely to render the resulting protein sequence unable to perform its function as an olfactory receptor.

And nothing about this inference requires me to assume that some piece of this pseudogene can’t serve as a binding site for some regulatory protein. But I think you have to also understand that binding sites for regulatory proteins (the vast majority fall in the range of 5–30 base pairs) are very small compared to the average length of an entire protein-coding gene, including both exons and introns (which is in the multiple tens of thousands of base pairs).

So if you think the entire locus of an OR pseudogene, which if it is even halfway intact in length is likely to be at least ten thousand base pairs long, should be considered functional and therefore not a pseudogene because you’ve identified a 30-base-pair binding site for a regulatory protein somewhere in this stretch of DNA, then you’re being silly.


Hi Rum
Why can we not consider the possibility that it is a designed regulatory element, and that what we are observing is its original form, or close to it?

Then why are these designed regulatory elements organized in a nested hierarchy along with the working genes they are (so we conclude) related to?

John, while you may think you are engaging a person in an argument bound by logic or the standard rules of debate, what you’re actually dealing with is a ‘negation generator’. Internal consistency, or even retained memory of previous threads, is not part of the programming; that is a feature of the app, not a bug. This particular first-generation AI is configured to process all inputs with two settings:

  1. Anything presented by the scientific establishment, no matter how well established, must be negated.

  2. Anything proposed by any creation groups and related individuals will be presented as “feasible”, and never ‘off the table’. In cases where the creationist supports common descent, or even evolutionary processes for much of the biosphere, describe that as ‘reasonable’, even if the same argument was previously negated because its source fell under rule one.

It’s really just Monty Python’s ‘Argument room’ sketch.


Bill we can consider any and all possibilities, it’s just that that one is a worse explanation because it can’t make sense of all the facts.

So it is exactly because we can in fact consider, weigh, contemplate, comprehend, and explore that possibility both in our minds and in relation to the known facts and data, that we conclude it is a worse option in terms of everything we would want from a scientific explanation.

It predicts no particular pattern (why should some OR gene-that-functions-as-a-regulatory-element be mutated in this way, and another in that way? Why the high degree of synteny with other primates? Why do we have a worse sense of smell, and broken genes that make sense of it?), it doesn’t explain the pattern that is in the data (nested hierarchy, the biochemical causes of mutation that produce transition:transversion biases), and it explains almost none of the relevant facts (why they look like broken OR genes in the first place).


If science is limited to methodological naturalism then I agree with you. I do think this limitation is not in the best interest of science as it can lead to misleading conclusions.

Design (different starting points) could be a complete explanation of the pattern if it turns out the pseudogenes are really functional regulatory elements, or perhaps have another function we have not yet discovered.

Common descent is not a complete explanation for the pattern, as you have to appeal to a dramatic shift in purifying selection: mice should see more losses than primates due to their higher reproductive rates.

This is the same problem as with the Howe pattern: in this case you have to explain how large quantities of gene gain and gene loss get fixed in vertebrate populations…