Window Size For Effective Population Size Estimates?

No, I’m not aware of any alternative to discretization. I could imagine a method in which population size is an algebraic function of time with a finite number of coefficients, such as log-size being a polynomial function of time. Of course, it would have to be set up so that the necessary computations remained tractable. But it is hard to imagine any method more flexible than that being usable.
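To make the contrast concrete, here is a minimal sketch of a discretized (piecewise-constant) history next to a log-polynomial one. The function names and the per-generation pairwise coalescence rate of 1/(2N) for a diploid size N are illustrative assumptions, not taken from any published method. With a polynomial of degree two or more, even the basic quantity, the integral of 1/(2N(s)) over time, generally has no closed form and has to be integrated numerically, which is one face of the tractability concern.

```python
import math


def piecewise_constant_size(t, breakpoints, sizes):
    """Discretized history: sizes[i] holds on [breakpoints[i], breakpoints[i+1]).
    breakpoints are generations before present, start at 0 and increase;
    the last size extends indefinitely into the past."""
    if t < 0:
        raise ValueError("t must be non-negative")
    for start, size in zip(reversed(breakpoints), reversed(sizes)):
        if t >= start:
            return size
    raise ValueError("breakpoints must start at 0")


def log_polynomial_size(t, coeffs):
    """Smooth history with finitely many coefficients: log N(t) = c0 + c1*t + c2*t**2 + ..."""
    return math.exp(sum(c * t**k for k, c in enumerate(coeffs)))


def pairwise_coalescence_intensity(size_fn, t, steps=10_000):
    """Trapezoid-rule approximation of the integral of 1/(2 N(s)) ds from 0 to t
    (diploid N). P(a pair of lineages is still uncoalesced at t) = exp(-intensity)."""
    h = t / steps
    total = 0.5 * (1 / (2 * size_fn(0)) + 1 / (2 * size_fn(t)))
    total += sum(1 / (2 * size_fn(i * h)) for i in range(1, steps))
    return total * h


if __name__ == "__main__":
    # Hypothetical parameters: log N(t) = log(10,000) - 1e-5 * t + 1e-10 * t**2
    smooth = lambda s: log_polynomial_size(s, [math.log(10_000), -1e-5, 1e-10])
    print(pairwise_coalescence_intensity(smooth, 50_000))
```

The piecewise-constant version is what discretized methods effectively use; the smooth version trades flexibility for a handful of coefficients.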

The divergence of forward and backward (i.e. coalescent) simulations should be small if the sample size is a small fraction of the population size. I think that in the Lauterbur paper sample sizes were, say, half the population size. That would correspond to a sample of vast numbers of people. In a PSMC/SMC-type method such as you use, sample size is in effect 2 haploid genomes. However, you do investigate imagined bottlenecks, and that may lead to some discrepancy between forward and backward simulations; it is hard to say how large those effects would be.
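A small numerical check of the sample-fraction point, under stated assumptions: a diploid Wright-Fisher population with 2N gene copies, looking at a single generation. The exact probability that n sampled lineages all pick distinct parents is the product of (1 - i/2N) for i = 1..n-1, while the Kingman coalescent gives exp(-C(n,2)/2N). The sketch compares the two on a log scale. Function names and the population size are hypothetical, and this covers only one aspect of the forward/backward difference.

```python
from math import comb, log1p


def log_wf_no_coalescence(n, two_n):
    """Exact Wright-Fisher: log P(n lineages pick n distinct parents among two_n gene copies)."""
    if not 2 <= n <= two_n:
        raise ValueError("need 2 <= n <= two_n")
    return sum(log1p(-i / two_n) for i in range(1, n))


def log_coalescent_no_coalescence(n, two_n):
    """Kingman coalescent: total rate C(n, 2) / two_n per generation,
    so log P(no event within one generation) = -C(n, 2) / two_n."""
    return -comb(n, 2) / two_n


def relative_discrepancy(n, two_n):
    """How much larger (in log terms) the exact WF non-coalescence penalty is."""
    return log_wf_no_coalescence(n, two_n) / log_coalescent_no_coalescence(n, two_n) - 1


if __name__ == "__main__":
    two_n = 20_000  # hypothetical: 10,000 diploid individuals
    for n in (2, 20, 200, 2_000, 10_000):
        print(f"n={n:>6}  n/2N={n / two_n:.4f}  discrepancy={relative_discrepancy(n, two_n):.4f}")
```

For small n/2N the gap is tiny; with a sample of gene copies equal to half of 2N it becomes substantial, in line with the point above.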


Having trouble understanding your graphs and do not have time to spend on this right now. The issue I was trying to address is whether there are so few coalescent lineages back at the time of the imagined bottleneck that it’s going to be hard to infer its existence. Note that with recombination the genealogy is an ancestral recombination graph, so enough pieces of the genome have to be still uncoalesced back then for us to be able to see the bottleneck.
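For the two-genome case this can be quantified directly. Assuming, for illustration, a piecewise-constant diploid size history, a pairwise coalescence rate of 1/(2N) per generation, and genome segments that behave like independent draws, the fraction of the genome whose two lineages are still separate at generation t is exp(-integral of 1/(2N)). The function name and the example numbers are hypothetical.

```python
import math


def prob_pair_uncoalesced(t, breakpoints, sizes):
    """P(two lineages have not coalesced by generation t) under a piecewise-constant
    diploid size history: sizes[i] holds from breakpoints[i] (generations ago)
    until the next breakpoint; the last size extends indefinitely."""
    intensity = 0.0
    for i, start in enumerate(breakpoints):
        if t <= start:
            break
        end = breakpoints[i + 1] if i + 1 < len(breakpoints) else math.inf
        # Coalescence rate 1/(2N) per generation while the size is sizes[i]
        intensity += (min(t, end) - start) / (2 * sizes[i])
    return math.exp(-intensity)


if __name__ == "__main__":
    # Hypothetical: constant N = 10,000, bottleneck placed 100,000 generations ago
    print(prob_pair_uncoalesced(100_000, [0], [10_000]))
```

Here only exp(-5), about 0.0067 of the genome, would still carry two distinct lineages at the bottleneck, which is the kind of scarcity that makes an old event hard to see.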

(Minor brag: you mentioned the known tendency of coalescent inferences that allow for population size change to artifactually infer population growth. Yes, I know about that; it was first noticed and explained in a paper by Mary Kuhner, Jon Yamato, and me in Genetics in 1998, and we discussed it further in a symposium-volume paper in 1999 here.)


If you’re doing some variant of a coalescent model, I would validate the bottleneck handling with an explicit forward model, using a small enough population size that the forward model could handle it.
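A minimal sketch of that kind of validation, under simplifying assumptions: a single non-recombining locus, diploid sizes given per generation (oldest first), and each of the 2N gene copies drawing a parent copy uniformly from the previous generation. Parents are recorded forward in time and two sampled copies are then traced back; the mean pairwise coalescence time can be compared with the coalescent prediction (2N generations at constant size). Function name and parameters are hypothetical, and with a bottleneck schedule you would compare against whatever your coalescent model predicts for the same schedule.

```python
import random


def forward_wf_pairwise_coalescence(sizes, rng):
    """Forward Wright-Fisher run over len(sizes) generations (diploid N per generation,
    oldest first), recording each gene copy's parent. Two distinct gene copies from the
    final generation are then traced back through the recorded parents.
    Returns generations back to their common ancestor, or None if not reached."""
    parent_tables = []
    for g in range(1, len(sizes)):
        n_parent_copies = 2 * sizes[g - 1]
        # Each of the 2N gene copies in generation g picks a parent copy uniformly.
        parent_tables.append([rng.randrange(n_parent_copies) for _ in range(2 * sizes[g])])
    a, b = rng.sample(range(2 * sizes[-1]), 2)
    for gens_back, parents in enumerate(reversed(parent_tables), start=1):
        a, b = parents[a], parents[b]
        if a == b:
            return gens_back
    return None


if __name__ == "__main__":
    rng = random.Random(1)
    n = 20  # hypothetical small diploid size so the forward run stays cheap
    results = [forward_wf_pairwise_coalescence([n] * 250, rng) for _ in range(150)]
    done = [r for r in results if r is not None]
    print(f"forward mean: {sum(done) / len(done):.1f}, coalescent expectation: {2 * n}")
```

Being a single-locus check, it ignores recombination; it tests only that the genealogical machinery agrees with the forward process.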

This is a forward simulation (WF) through the bottleneck. The bottleneck is instantaneous, consistent with a founding event, followed by exponential growth at reasonable rates.
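For concreteness, one hypothetical way to write down that kind of schedule as a per-generation list of diploid sizes, forward in time: a constant ancestral size, an instantaneous drop to the founder size, then N_t = N_founders * exp(r * t). Parameter names and values are illustrative only, not the settings used in the simulation described here.

```python
import math


def bottleneck_growth_sizes(n_ancestral, n_founders, growth_rate, gens_before, gens_after):
    """Forward-time list of diploid sizes: gens_before generations at n_ancestral,
    then an instantaneous founding bottleneck at n_founders followed by
    exponential growth N_t = n_founders * exp(growth_rate * t), rounded to integers."""
    sizes = [n_ancestral] * gens_before
    sizes += [max(1, round(n_founders * math.exp(growth_rate * t))) for t in range(gens_after)]
    return sizes


if __name__ == "__main__":
    schedule = bottleneck_growth_sizes(10_000, 50, 0.01, gens_before=100, gens_after=500)
    print(schedule[98:104], schedule[-1])
```

With r = 0.01 the population doubles roughly every ln 2 / r, about 69 generations.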

Our approach uses well validated simulation software used by hundreds of other scientists, not custom code unique to our group.

Correction to my earlier statement. We are using MSMC in this one, and I think a sample size of about 10 individuals.

Fair enough. That graph is hard to interpret. We are working on it. And I agree with you about the question of where the coalescences are. The key finding we can demonstrate nicely is that the TMRCA times can be used to delimit the “streetlight”, that is, how far back the inferences can be trusted. It’s a data-driven approach, so we don’t need the theory fully worked out for it to be useful.
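One hypothetical way to turn per-segment TMRCA estimates into such a cutoff (a sketch, not the procedure described here): sort segments by TMRCA, weight them by length, and report the time beyond which only a chosen small fraction of the genome remains uncoalesced. The (length, TMRCA) input format, the function name, and the default 1% threshold are all assumptions.

```python
def inference_horizon(segments, frac_older=0.01):
    """segments: iterable of (length_bp, tmrca) pairs. Walking from the oldest TMRCA
    downward, return the TMRCA at which the cumulative length reaches frac_older of
    the total, i.e. roughly the time beyond which little of the genome is informative."""
    segs = sorted(segments, key=lambda s: s[1], reverse=True)
    total = sum(length for length, _ in segs)
    covered = 0
    for length, tmrca in segs:
        covered += length
        if covered >= frac_older * total:
            return tmrca
    raise ValueError("no segments given")


if __name__ == "__main__":
    # Hypothetical toy input: 1,000 equal-length segments with TMRCAs 1..1000
    toy = [(1, t) for t in range(1, 1001)]
    print(inference_horizon(toy, frac_older=0.25))
```

Weighting by length matters because segments from sequentially Markovian decoding differ in size; the threshold itself is a judgment call.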

Really cool. When this comes out, expect it to be referenced 🙂

From this conversation I gather that there is not any clear theoretical work published yet that can guide us. That’s what I thought, but I wanted to be sure.

Thanks for the time you have spent on this.

Sounds good.


FYI, here is a much improved update to the figure:

