Coalescent processes are describing gene trees in populations. To generate a gene tree you sample from a population. Any sample will give you a result that on average reflects the same population history, especially in relation to population size, but coalescent models can account for many other aspects of population history (selection, bottlenecks, gene flow, multispecies coalescent models with some combination of isolation, etc.). The coalescent is describing a stochastic process so different samples will always produce different trees but on average conform to the expectations of coalescent theory (for example, shorter branches for smaller effective population sizes and longer branches for larger effective population sizes). **The point is these trees look nothing like Jeanson would expect a tree to look like for ANY given population (growing, stable, bottlenecked, etc.).** If he can not apply his method to a tree and correctly conclude anything about the population (for example he will always assume given his method a population is growing for ANY tree regardless of what the population is actually doing) then what good is it.

I didn’t say you were. I asked if you wanted Jeanson to be right and you said yes. I merely asked why would you want him to be right about this.

Can you share the full math/software you used for all these calculations/graphs/trees? I’d be interested to play around with this a bit myself. I’ve tried a bit myself but my popgen is rusty and the equations in your post aren’t super clearly written (e.g. 1-(1-(1-1/N*y*^(k(k-1)/2))^*t*)) - maybe you could use LaTeX math (LaTeX/Mathematics - Wikibooks, open books for an open world) instead e.g.:

“$$

1-\left(1-\left(1-\frac{1}{Ny^{\left(\frac{k(k-1)}{2}\right)}}\right)^t\right)

$$”

(remove the quotes)

gives:

If this is the equation you meant to describe? It doesn’t seem right to me.

My problem is the assumptions I see behind that - randomness and static populations. Has anyone tested the largest population size the majority of humans tend to find mates within? It has to be pretty small.

Anyway, I think I should work out some more trees and test them.

Sorry, I thought the general idea behind my answer to that would be obvious. My more specific reply would be that if he’s right, disproving the current paradigm would be helpful because I think it’s a hindrance for some people to listen to the gospel. I also think science advancement would happen more quickly if we understood evolution correctly. The cultural stuff you usually mention I could care less about except that I think it doesn’t lead to human flourishing. In my eschatology, there is no such thing as a “Christian” kingdom on earth.

Sorry, I didn’t use any programs. I only generated the graphs on excel using the formulas. The trees are manually made (For each branch, I asked how many generations ago is the probability for coalescence at or above 50%).This admittedly would not suffice for a college course on coalescence. I only did this to illustrate the point.

Yes, I also find it difficult to convey the formulas using simple texts. All those brackets are really confusing. And you are right, I made a typo in that formula. Although not when making the graphs, so they still check out (at least I think they do). The formula should’ve been the following:

**1-(1-(1-(1-1/Ny)^(k(k-1)/2)))^t**

Although, I now also see that this is redundant, since it can be simplified to this:

**1-((1-1/Ny)^(k(k-1)/2))^t**

Or, using a more eligible format, I will go over the formulas step by step:

The probability of coalescence for one allele pair (or two Y-chr. lineages), in the previous generation, with a Y-chr. population size of *Ny*.

The probability of **NOT** coalescence for one allele pair (or two Y-chr. lineages), in the previous generation, with a population of *Ny*.

The probability of **NOT** coalescence for *k* number of allele pairs (or # of Y-chr. lineages), in the previous generation, with a Y-chr. population of size *Ny*.

The probability of **NOT** coalescence for *k* number of allele pairs (or # of Y-chr. lineages), at *t* generations ago, with a Y-chr. population of size *Ny*.

The probability of **at least ONE** coalescence for *k* number of allele pairs (or # of Y-chr. lineages), at *t* generations ago, with a Y-chr. population of size *Ny*. **This is how the formula is supposed to be. The one you asked about.**

I think this correct, right?

I know you have said other stuff later, but I was late to come back to the forum and a lot happened, so I would still like to answer your question to me.

It depends on what “tree” you are referring to. If you look at the Y-chromosomal tree like this and only look at the branching order (ignore the time scale on the left of it for a moment).

(Figure from a recent paper)

The branching order of this tree is not determined from coalescence. That’s just from the comparison of different y-chromosome sequences and determining the root of the tree. Although Jeanson would also dispute the root, but that is a different discussion. Regarding coalescence and random mating, that doesn’t matter. This tree is ‘biologically accurate’.

So when you said **regarding coalescence and the resulting trees** it sounds like you are saying that coalescence is what produces these trees (the branching order), which is not the case. Although, you probably didn’t meant to say this.

Concerning coalescence, that becomes relevant regarding the age estimates for when these branches have join (i.e. coalesce) at the common ancestor (the nodes). The authors of that paper and figure used both mutation rates and coalescence to estimate the ages of divergence. If you assume random mating in coalescence, while random-mating doesn’t apply, your age estimations will be very skewed. Most of the time, these ages would be under estimated, because humans are more likely to mate with others that are nearby, especially during most of human history. So, with the probability of mating being highly dependent on distance, more time is needed for lineages (especially those separated by great distances) to coalesce then you would calculate assuming random-mating. However, population structure is taken into account when using coalescence to estimate the times of coalesnce.

I didn’t use a problem (see my reply to @evograd). But I can make the graphs for those.

The main difference is that in the left (population = 18 million), most lineages are expected to coalesce over 80 generations ago. On the right (population = 4 billion) most are expected to coalesce over 130 generations ago. Again, my simple calculations assumes random mating, but you can clearly notice the effect of population size.

Great, that helps! The formula is correct now, what about the next one with r in it? You mentioned **Ny0(1+r)^-t** but how does that relate to k to make the graphs? (e.g. the one with Ny0 = 2000, r = 0.01).

When you say the trees were manually made, how exactly? Running calculations by hand and literaly drawing a tree, or using a package like ape in R? I’m using R and going to try and automate all this so you can enter the key parameters and get the curves and trees.

And there’s the problem. You are religiously motivated to agree with Jeanson and that is a problematic thing. The possibility of using Jeanson’s conclusions to bring people to the gospel of Jesus Christ is very compelling and that will affect your ability to be critical of his claims. Again you WANT Jeanson to be right and you want him to be right because him being right benefits your religious agenda. We all should examine our biases when evaluating science and I see very little of that done among creationists.

All those assumptions can be tested. Coalescent approaches are NOT limited to static populations as I and others have repeatedly explained. Coalescent models may accommodate lots of different parameters including population dynamics and structure, selection, etc. There is a deep and extensive literature on coalescent approaches in population genetics going back almost 40 years and all of these things have been modeled. There are stacks of books on this subject. It should cause you some pause that Jeanson is very obviously oblivious to all of that. You at this point Valerie know far more about coalescent theory than does Harvard-educated Jeanson.

@Nesslig20 Figured out the answer to my own question, you replace Ny in the previous formula with (Ny0(1+r)^-t). Are you sure it’s -t at the end there though, not simply t?

Ny0(1+r)^-t leads to exponentially shrinking Ny with increasing t, Ny0(1+r)^t leads to exponential growth.

Right, now that you ask about it, I realize that I made a mistake on that one.

Starting from this following formula.

I simply replaced **Ny** with the formula **Ny0(1+r)^-t** (which calculates the population at *t* generations back for a population growing exponentially at a rate of *r*). From that, I used the following formula for those Ny0 = 2000, r = 0.01 graphs.

First I thought I could simply replace the variable for a static population size (**Ny**) with a formula for a exponentially growing population size (**Ny0(1+r)^-t**), but I now see that this doesn’t work. For example, if I plug in t=10 to calculate to probability of coalesce 10 generations ago, the formula would calculate the probability as if the population size from generation 1 to 10 is the same as that of generation 10 (thus not exponentially growing).

@thoughtful please ignore those graphs for exponentially growing populations.

I made a mistake on those.

The trees were drawn by hand, yes. They were just crude illustrations to convey the point that branches coalesce with higher rates in smaller populations and lower rates in larger populations. They are not generated by a computer using coalescence simulations. It would be a lot better to have R do the job for you, yest. Can you do it? I also use R, but I have never applied it with coalescence specifically.

Yes, because the calculation goes backward. From generation 0 (the current generation) to *t* generations ago (in the past). Although, the formula is not applicable (I think) for the reasons I pointed out in the previous reply.

Ok yeah I see what you mean about the calculation going backwards.

It’s not immediately clear to me why the formula is inapplicable, but from reading more about simulating coalescence in growing populations it’s all a bit over my head. I can sort of intuitively understand what’s going on, like all this stuff about scaling fictional to real time (e.g. p469 on https://evolution.gs.washington.edu/pgbook/pgbook.pdf), but I don’t understand the equations well enough to perform the simulations myself.

Maybe @Joe_Felsenstein or other popgen aficionados can help?

Ok, I’ll ignore them for now, but if you feel you have accurate ones, please share. Again thank you for an excellent reply - really helpful to understand correctly a d figure out my misunderstanding. You would be a great teacher, or maybe you already do teach.

Sure, I agree. And yes, I’m religiously motivated to agree with him; I don’t deny that. But if I didn’t want to examine my biases at least a little bit I wouldn’t be on this forum. If I didn’t care enough about everyone else here to challenge their biases a bit, I wouldn’t be on this forum. If I thought I could prove anything to anyone, I’ve dropped that delusion long ago.

… also in my (free) online book (PDF linked by you) on page 470.

I am incredibly overworked until the 16th (running an online workshop in my field). So NO. But there is more detailed coverage in my book *Inferring Phylogenies*, chapter 26. Particularly in the sections on “Kingman’s coalescent” and “Effect of varying population size” (pages 456 and 460-461). These give a simple recipe for simulating coalescent genealogies in either a population of constant size, or an exponentially growing population. Maybe someone can dig up a copy from their shelves or the shelves of a library and fill people in.

Ok I think I’ve made some progress, at least in terms of understanding how the probability of 2 alleles in a population coalescing in growing/shrinking populations.

I’ve been coding everything up in R, anyone can copy and paste this in to a new script and play around too. I’ve tried to document everything but it’s still fairly messy at the moment.

Here’s an example where Ny0=500, simulating different exponential population growth rates and also showing the population size history:

```
###load packages
library(SciViews) #needed to use ln function
library(ggplot2) #needed to make nice graphs
library(gridExtra) #needed to show plots side-by-side
###
###define parameters
Ny0 <- 500 #population at time 0 (present)
t <- seq(0,2000,50) #range of time points in the past to be simulated
#k <- 16 number of sampled alleles (not used yet)
g <- 0.01 #population growth rate since time t to present
###
Nyt<-Ny0*exp(-g*t) #exponential growth between time t and 0 - gives population size at time t
plot(t, Nyt) #plot population size going back in time
###coalescence in constant population
P = 1-((1-1/Ny0)^t) #calculate coalescence probability of 2 alleles in constant population size
constant <- data.frame(t, P)
cgraph <-
ggplot(constant, aes(x=t, y=P)) +
geom_line() +
theme_classic() +
geom_point() +
ylab("P(coalescence)") +
ggtitle("constant population size")
gvalue0 <- rep("0 (constant)", length(t)) #make vector with gvalues (0)
constantdf <- data.frame(t, gvalue0, P) #format constant data into data for combination later
colnames(constantdf)[1] <- "Rt"
colnames(constantdf)[2] <- "gvalue"
###
###coalescence in growing population
#need to convert the times in constant population to "real time" to account for growing population
Rt <- (1/g)*ln(1+g*t) #treat t as fictional time then convert to real time (Rt)
growing <- data.frame(Rt, P)
ggraph <-
ggplot(growing, aes(x=Rt, y=P)) +
geom_line() +
theme_classic() +
geom_point() +
xlim(0, max(Rt)) +
ylab("P(coalescence)") +
ggtitle("growing at rate g=0.01")
###
grid.arrange(cgraph, ggraph+xlim(0,max(t)), ncol = 1) #plot both graphs at the same time
###coalescence in shrinking population
s=-0.01 #population shrinkage rate (inverse of g)
Nyt<-Ny0*exp(-s*t) #exponential shrinkage between time t and 0 - gives population size at time t
#need to convert these times to real t
Rts <- (1/s)*ln(1+s*t) #treat t as Ft then convert to Rt
shrinking <- data.frame(Rts, P)
sgraph <-
ggplot(shrinking, aes(x=Rts, y=P)) +
geom_line() +
theme_classic() +
geom_point() +
xlim(0, max(Rt)) +
ylab("P(coalescence)") +
ggtitle("shrinking at rate g=-0.01")
###
grid.arrange(ggraph+xlim(0,max(t)), cgraph, sgraph+xlim(0,max(t)), ncol = 1) #plot growing, constant, and shrinking population graphs
###plotting a range of population growth rates
grange <- c(0.001, 0.01, 0.1) #set list of g values to simulate
gtable<-data.frame(x1 = 1:length(t)) #create placeholder dataframe to store simulated real times
for(g in grange) {
new <- (1/g)*ln(1+g*t)
gtable[ , ncol(gtable) + 1] <- new
colnames(gtable)[ncol(gtable)] <- paste0(g)
} #iterate through different g values
gtable = gtable[ ,-1] #remove first placeholder column
head(gtable) #check top of table
gtable<-data.frame(x=unlist(gtable)) #flatten into single column
gtable$gvalue <- rep(grange, each=length(t)) #add g value column as labels
rownames(gtable) <- NULL #remove misleading rownames
multipleg <- data.frame(gtable, rep(P, length(grange))) #add in P values to data frame
colnames(multipleg)[1] <- "Rt"
colnames(multipleg)[3] <- "P"
head(multipleg) #check top of table
multiplegfinal <-rbind(constantdf, multipleg) #combine with constant population results
multipleggraph <-
ggplot(multiplegfinal, aes(x=Rt, y=P, color=factor(gvalue))) +
geom_line() +
theme_classic() +
geom_point(size = 1) +
xlim(0, max(multiplegfinal$Rt[is.finite(multiplegfinal$Rt)])) +
ylab("P(coalescence)") +
xlab("generations before present") +
ggtitle("P(coalescence) with growth rates g") +
theme(legend.position = c(0.7, 0.4))
multipleggraph
#note leftward shift of coalescence probability in growing population - more likely to coalescence earlier as a result of the smaller population sizes in the past
ggsave(plot=multipleggraph,filename = "P(coalescence).png", units = "px", width = 1200, height = 1300, device='png', dpi=300)
###plot population size over time
gpoptable<-data.frame(x1 = 1:length(t)) #create placeholder dataframe to store simulated real times
for(g in grange) {
new <- Ny0*exp(-g*t)
gpoptable[ , ncol(gpoptable) + 1] <- new
colnames(gpoptable)[ncol(gpoptable)] <- paste0(g)
} #iterate through different g values
gpoptable = gpoptable[,-1] #remove first placeholder column
head(gpoptable) #check top of table
constantpopN <- rep(Ny0, length(t)) #make vector with constant population size
constantpopdf <- data.frame(constantpopN, gvalue0, t) #format constant data into data for combination later
colnames(constantpopdf)[1] <- "N"
colnames(constantpopdf)[2] <- "gvalue"
colnames(constantpopdf)[3] <- "t"
gpoptable<-data.frame(x=unlist(gpoptable)) #flatten into single column
gpoptable$gvalue <- rep(grange, each=length(t)) #add g value column as labels
rownames(gpoptable) <- NULL #remove misleading rownames
colnames(gpoptable)[1] <- "N" #rename first column
head(gpoptable) #check top of table
gpoptable <- data.frame(gpoptable, t) #add t column to table
multiplegpopfinal <-rbind(constantpopdf, gpoptable) #combine with constant population N table
multiplepopgraph <-
ggplot(multiplegpopfinal, aes(x=t, y=N, color=factor(gvalue))) +
geom_line() +
theme_classic() +
geom_point(size = 1) +
ylab("Ny") +
xlab("generations before present") +
ggtitle("Population size history by growth rate g") +
theme(legend.position = "none")
#+
#theme(legend.position = c(0.8, 0.63))
###
figure <- grid.arrange(multiplepopgraph, multipleggraph, nrow = 1)
ggsave(plot=figure, filename = "coal and pop.png", units = "px", width = 3000, height = 1500, device='png', dpi=400)
```

Thanks, those parts of your books were actually my first port of call, and were helpful, although as this subject isn’t fresh in my mind I’m still working on understanding a lot of it.

I think I’m happy with the simplest coalescence calculation in growing populations, getting the probability of coalescence of 2 alleles going back in time (please check my graphs/calculations when you have free time in a couple of weeks!), but I haven’t got to the point of understanding how to get into the probability density functions to see how to coalesce k alleles and simulate trees from this. It’s been a long time since I’ve practiced any of this kind of maths.

Oops make that mean *4N/((k-J+1)(k-J))*. (I wish I knew why, when I post a small correction to a bigger comment, the correction is approved first.)

Your graphs look OK except for the mistake in the color key in the growing-population graph. If you have the function that goes from fictional time to real time, you just simulate a coalescent with constant population size with *k* tips on the fictional time scale, then map the coalescence times back to the real times. The times will get squished together back when the population size is small. You don’t want to try to get the density functions of the coalescence times (too messy). Just use the fact that the successive coalescence intervals (going back) are independent exponential variables, the *J*th of which has mean length *4N/(J(J+1))*.generations.