"I'm treating the mutation rate as a substitution rate" - Dr. Nathaniel Jeanson

Michael_Okoko · July 12, 2022, 4:34am

Why?

Who says there must have been a selective pressure in the first place?

Argon · July 12, 2022, 7:31pm

Exactly. I’d like him to post a reference to a scientific paper or book and coherently support his claim of why this wouldn’t be expected under neutral theory.

And this doesn’t count…

colewd · July 13, 2022, 12:08pm

colewd

Hi Ron
Why would loss of purifying selection be unique to humans? Mice have much higher reproductive rates and are an older species. How do you correlate unique smells to rodent reproductive rates?

It looks like different starting points for the sequences should be considered

A lack of selective pressure in what animal(s)?

Rumraket · July 13, 2022, 12:12pm

The one in which those olfactory receptor genes decayed into pseudogenes. The animals on the Homo lineage. You.

Michael_Okoko · July 13, 2022, 12:12pm

My thoughts exactly. I suspect he doesn’t understand what he wrote.

thoughtful · July 13, 2022, 5:19pm

@Rumraket I read this but I’m not sure what I should be taking away from it. What do you see or what would you point out in terms of strong evidence for nested hierarchy? A Molecular Phylogeny of Living Primates

John_Harshman · July 14, 2022, 4:05am

Congruence of this study with previous studies. Congruence among separate partitions of the data within the study. High bootstrap values.

Rumraket · July 14, 2022, 2:40pm

What @John_Harshman said. But I figure it actually takes a bit of explaining. The bootstrap is basically a test of tree consistency.

You have to understand something about how the phylogeny is made in the first place, and then how it’s tested for consistency. As it says in the paper it’s inferred on the basis of 54 nuclear genes from each of those 191 species. It’s important to note that those 54 nuclear genes come from different chromosomes, have wildly different functions, some come from coding regions and some from non-coding regions. Introns, exons, genes that regulate other genes and control development, genes that function as enzymes, genes that take part in DNA replication, etc. Some from X and some from Y chromosomes. So whatever functional constraint you might imagine operates on some gene, it’s rather difficult to see how the same constraint can be operating on another, particularly in a sense where that constraint should force trees inferred from each of those 54 loci to give similar gene trees. That just doesn’t make sense.

Anyway, what that means is that in every species they collect the sequences for those 54 genes and put them literally end-to-end in one long DNA sequence (~35000 DNA basepairs for each species), and then they construct one giant alignment from those 191 x 35000 bp sequences. This alignment is then used to infer a tree with a phylogenetic algorithm.

Here’s where the bootstrap comes in.

The bootstrap basically constructs a new alignment from the original alignment by randomly pulling columns from the original one, with replacement, and building a new one. Then a new tree is inferred from this new alignment. Then the trees are compared and it is noted which nodes differ between the trees. This is done one hundred times. That is, 100 times a new alignment is made of the same size as the original, by taking random columns of the original alignment, and a new tree is inferred from this random sample of the data. Since the original data is randomly sampled, you will not get the same alignment every time, and so in some alignments you get more data from some genes and less from others. And since it’s random and there’s so much data, it’s very unlikely you get the same random sample twice. So each tree should have different biases.
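A minimal sketch of that resampling step, assuming a toy alignment stored as a list of equal-length strings (one per species). The function name and the data are made up for illustration and are not taken from the paper:

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original width."""
    n_cols = len(alignment[0])
    # Draw n_cols column indices; the same column may be drawn several times,
    # and some columns may not be drawn at all.
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    resampled = ["".join(row[i] for i in picks) for row in alignment]
    return resampled, picks

# Hypothetical toy alignment: 3 species x 8 sites.
toy = ["ACGTACGT", "ACGTTCGT", "ACGAACGA"]
replicate, picks = bootstrap_alignment(toy, random.Random(0))
```

Each call gives one pseudo-replicate alignment; a tree would then be inferred from it with whatever phylogenetic method was used on the original.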

There’s this nice table in Baum & Smith’s Tree Thinking that shows what a resampled bootstrap data set is. Notice how for example column 35 from the original alignment was sampled 4 times for the bootstrap data, which means whatever tree is inferred from that new bootstrap alignment will be more strongly biased by the data from column 35 than the tree from the original alignment:

In the primate phylogeny paper the number of columns is of course then ~35000 and there’s 191 rows.

100 times such a new resampled alignment is created, a new tree inferred, and the new trees are compared.

It is then noted for each node in the tree how many times out of those 100 that particular node is recovered. So the bootstrap value for each node is then literally how often that node is found by randomly sampling the original data set and making a new tree from that sample. If the data from those 54 genes has a lot of inconsistency, it should show up as low bootstrap values for lots of nodes. I think a rule of thumb is that bootstrap values above 90 are considered good, which means that node is recovered in more than 90 out of 100 replicates.
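The counting step can be sketched like this, assuming each tree is represented simply as a set of clades and each clade as a frozenset of taxon names. That representation, the taxa A to D and the replicate counts are all hypothetical, chosen only to show the bookkeeping:

```python
from collections import Counter

def bootstrap_support(reference_clades, replicate_trees):
    """Percent of replicate trees that contain each clade of the reference tree."""
    counts = Counter()
    for tree in replicate_trees:
        for clade in reference_clades:
            if clade in tree:
                counts[clade] += 1
    n = len(replicate_trees)
    return {clade: 100.0 * counts[clade] / n for clade in reference_clades}

# Hypothetical clades over four taxa A, B, C, D.
ab = frozenset({"A", "B"})
abc = frozenset({"A", "B", "C"})
bc = frozenset({"B", "C"})

reference = {ab, abc}
# Made-up outcome: 9 replicates agree with the reference, 1 groups B with C instead.
replicates = [{ab, abc}] * 9 + [{bc, abc}]
support = bootstrap_support(reference, replicates)
```

Here the clade A+B would get a support of 90 and A+B+C a support of 100.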

The vast majority of nodes in the tree from the molecular phylogeny of primates have bootstrap values above 90.

Again, it’s important to understand just how different the trees could be. For 191 species there are 6.2137 × 10⁴⁰⁷ possible rooted trees. Trees can disagree in an incomprehensible number of ways. Finding almost identical ones over and over again, basically no matter what genetic locus it is inferred from, is an incredible degree of consistency. What explanation is there for that, other than common descent? That is, that they have been sharing the same genealogical history, and are therefore constrained by the same speciation events, having been part of the genomes of members of the same species and evolved for similar amounts of time? Common functional constraint, or common design as creationists often invoke, simply does not explain that. What IS this supposed functional constraint operating on such wildly different DNA sequences that forces them to give similar trees?
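For anyone who wants to check the size of that number: the standard count of rooted, fully bifurcating trees on n labelled tips is (2n − 3)!!, the product of the odd numbers from 3 up to 2n − 3. A small sketch, assuming that is the formula behind the figure quoted above (the function name is just illustrative):

```python
def rooted_binary_trees(n_taxa):
    """Number of rooted, bifurcating, labelled trees: (2n - 3)!! for n >= 2."""
    count = 1
    for k in range(3, 2 * n_taxa - 2, 2):  # odd numbers 3, 5, ..., 2n - 3
        count *= k
    return count

n_trees = rooted_binary_trees(191)
n_digits = len(str(n_trees))  # order of magnitude of the count
```

For small cases this gives 3 trees for 3 taxa, 15 for 4 and 105 for 5, and for 191 taxa the result has 408 digits, in line with the ~10⁴⁰⁷ figure.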

So that’s it. Did that make sense?

Edit: Btw the number of bootstrap replicates doesn’t have to be 100, it can be many more. But it’s extremely computationally demanding creating phylogenetic trees from so much data, so afaik it’s kept to something like 1000 replicates or below, and the bootstrap value for a node can be something like 852 or 978 out of 1000, or whatever.

John_Harshman · July 15, 2022, 5:15am

Well, I hope @thoughtful reads that. I would add a few slight complications that might cause confusion:

As the amount of data increases, the bootstrap becomes increasingly vulnerable to the tiniest biases in the data, such that for genome-level data it’s nearly useless and will pretty much always give you 100% bootstrap for every single node. This, on the other hand, is a mid-range data set by current standards, so anything with >90% support should be good.
Since incomplete lineage sorting is a thing, some partitions of the data will in some cases support different resolutions of a node or two, mostly when an intervening branch is very short. This will, for example, be true of the relationships among chimp, human, and gorilla. Some data partitions will support each of the three possible resolutions. But chimp + human will have many more partitions supporting it than will the other two. Not sure if there are other good examples on that tree.

Mercer · July 15, 2022, 5:15am

Constraint is the most important concept here. Evolution is constrained in ways that design and special creation are not.

colewd · July 15, 2022, 5:19am

If common descent is inferred (due to methodological naturalism) then yes, I agree with you. If separate starting points are considered then there is a lot more testing to do to reach a conclusion.

https://www.nature.com/articles/s41576-019-0196-1#article-info

CrisprCAS9 · July 15, 2022, 3:02pm

If you assume something silly, reality causes problems. You’re so close to getting it.

Rumraket · July 15, 2022, 3:04pm

Bill, that response doesn’t make sense.

Nothing about methodological naturalism is at work here. These protein coding genes are broken. It is entirely hypothetically possible that some of them have stretches where the DNA has retained or evolved other functions, but they clearly don’t and couldn’t possibly function as GPCRs and thus contribute to the sense of smell by binding odorants found in the environment.

Olfactory receptor genes code for proteins. You know, the typical G-protein coupled receptor (GPCR). Them being olfactory receptors is how they function in the sense of smell, they are receptors for molecules found in the environment, and how a smell is detected is by that molecule in the environment sticking to the receptor and triggering some other intracellular response.

Since many more of the OR genes in humans are broken in ways that really do break protein-coding genes (premature stop codons, missing or partial exons, frameshift mutations, missing promoters), they clearly no longer function in the sense of smell. The molecules that used to bind them no longer bind them, because the protein is broken and/or not even expressed. So they don’t work. Whatever function those pseudogenes might hypothetically now have instead, it isn’t as a receptor for an odor molecule.
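To make one of those failure modes concrete, here is a rough sketch of how a premature stop codon could be spotted in a coding sequence. It assumes the standard genetic code and a sequence that is already in frame, starting at the start codon; the two short sequences are invented for illustration and are not real OR genes:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def first_premature_stop(cds):
    """Return the 0-based codon index of the first in-frame stop
    before the final codon, or None if there is none."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    for index, codon in enumerate(codons[:-1]):
        if codon.upper() in STOP_CODONS:
            return index
    return None

# Hypothetical sequences differing by a single base (TGG -> TGA).
intact = "ATGGCTTGGAAATAA"  # ATG GCT TGG AAA TAA
broken = "ATGGCTTGAAAATAA"  # ATG GCT TGA AAA TAA

intact_hit = first_premature_stop(intact)
broken_hit = first_premature_stop(broken)
```

The intact sequence has no early stop, while the broken one stops at the third codon, so everything downstream would never be translated.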

And we all survive just fine without being able to detect these smells, or detect them as well as our primate cousins and more distantly related species. And we really do have a much worse sense of smell than our primate cousins and even more distant relatives. This isn’t something that can be reasonably disputed.

I can only speak for myself here but I don’t find my food, or willing mates, or recognize friends and family members by sniffing them out. Call me weird.

colewd · July 15, 2022, 9:54pm

Hi Rum
If the standard in science was that there is a mixture of common descent and multiple separate origin events, how would you determine that these segments were truly broken genes and not simply expressed regulatory elements?

Mercer · July 15, 2022, 9:54pm

I can safely speak for all of the scientists here in stating that we do not infer common descent because of methodological naturalism. We do so because it is consistent with the extant data, and more importantly, has made hundreds of thousands of correct predictions of data.

What are you proposing and/or doing to test, then?

RonSewell · July 16, 2022, 8:17pm

You are driving down the highway when you encounter vehicular heaps of twisted metal, shards of glass, parts and tires strewn in the ditch. Is your first impulse to say, “check out this work of abstract art?”

We take them to be broken because we know what they are supposed to look like when they are functioning. We might even be able to identify specific pseudogenes which are still functioning in other lineages (unless actually being able to smell is the aberration).

Rumraket · July 18, 2022, 12:21am

Those two categories are not mutually exclusive. A pseudogene is a broken protein-coding gene. That does not actually preclude some part of it from having retained, or subsequent to its breaking acquired, some other function.
Common vs separate ancestry simply does not enter the picture in considering whether it is, in fact, a gene that used-to-be-but-no-longer-functions-as-a-protein-coding-gene. I don’t have to assume common ancestry (or naturalism, whether methodological or metaphysical) in order to see that a premature stop codon, missing exon, or frameshift mutation is very likely to render the resulting protein sequence unable to perform its function as an olfactory receptor.

And nothing about this inference requires that I assume that some piece of this pseudogene can’t serve as a binding spot for some regulatory protein. But I think you also have to understand that binding spots for regulatory proteins (the vast majority fall in the range of 5-30 basepairs) are very small compared to the average length of an entire protein-coding gene including both exons and introns (which is in the tens of thousands of basepairs).

So if you think the entire locus of a pseudogene of an OR, which if it is even half-way intact in length is likely to be at least ten thousand basepairs long, should be considered functional and therefore not a pseudogene because you’ve identified a 30-basepair binding spot for a regulatory protein somewhere in this stretch of DNA, then you’re being silly.

colewd · July 18, 2022, 2:31pm

Hi Rum
Why can we not consider the possibility that it is a designed regulatory element and what we are observing is the same original form or close to its original form?

John_Harshman · July 19, 2022, 4:58am

Then why are these designed regulatory elements organized in a nested hierarchy along with the working genes they are (so we conclude) related to?

Argon · July 19, 2022, 11:37pm

John, while you may think you are engaging a person in an argument bound by logic or standard rules of debate, what you’re actually dealing with is a ‘negation generator’. Internal consistency or even retained memory of previous threads is not part of the programming. That is a feature of the app, not a bug. This particular first-generation AI is configured to process all inputs with two settings:

Anything presented by the scientific establishment, no matter how well established must be negated.
Anything proposed by any creation groups and related individuals will be presented as “feasible”, and never ‘off the table’. In cases where the creationist supports common descent or even evolutionary processes for much of the biosphere, describe that as ‘reasonable’ even if the same argument was negated previously because that source fell under rule one.

It’s really just Monty Python’s ‘Argument room’ sketch.
