What is Effective Population Size

I think it would be helpful to explain this in a separate peaceful science post.
It was only recently that I realised the significance of Ne (mainly because of recent reading on the subject).
A detailed explanation from your end would definitely help me and others grasp this better.

Brian Charlesworth has a characteristically excellent review of effective population size and drift that you can read here.

To be (very) brief, effective population size (Ne) is a theoretical construct that allows scientists to assess the effect of genetic drift on real populations using various population genetics models. Ne is typically defined as the size of a hypothetical population that experiences the same amount of genetic drift (or other genetic property of interest to the researcher) as the actual population, given certain modeling assumptions, all of which can be relaxed to one extent or another to account for more realistic scenarios.

Ne is critical for understanding the expected amount of genetic diversity present, the time to the most recent common ancestor (time to coalescence) of various alleles in a population, the relative strength of drift vs. natural selection, etc.


The key issue here is that it is not a point estimate of census population size, nor is it the minimum population size over a window. It is rather the harmonic average over a large sliding window.
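To make the "harmonic average" point concrete, here is a minimal sketch. It assumes every generation in the window gets equal weight, which is a simplification (the thread does not specify the weighting), and the population histories and the function name `harmonic_mean_size` are made up for illustration.

```python
def harmonic_mean_size(census_sizes):
    """Harmonic mean of per-generation population sizes (equal weights assumed)."""
    if not census_sizes:
        raise ValueError("need at least one generation")
    return len(census_sizes) / sum(1.0 / n for n in census_sizes)


# Hypothetical history A: 1,000 generations at a constant size of 10,000.
constant = [10_000] * 1_000

# Hypothetical history B: same length, but 100 of those generations at size 100.
dipped = [10_000] * 900 + [100] * 100

ne_constant = harmonic_mean_size(constant)
ne_dipped = harmonic_mean_size(dipped)
arithmetic_dipped = sum(dipped) / len(dipped)

print(f"constant history, harmonic mean:  {ne_constant:.0f}")
print(f"dipped history, harmonic mean:    {ne_dipped:.0f}")
print(f"dipped history, arithmetic mean:  {arithmetic_dipped:.0f}")
```

In this toy history the arithmetic mean barely moves (9,010) while the harmonic mean drops to roughly 917, and neither number equals the minimum size (100) or the size in any single generation.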

@Ashwin_s, in the future just start a new thread. Don’t call for it. You’ve been around long enough to do this yourself.


What is understood by a “sliding window” of population size? Is that the census population size within some set number of generations, or other defined period of time? And is Ne then the harmonic mean of the changing census population size in that window?


I’m just starting to go through the article @davecarlson linked to, but this caught my eye right away:


Exploding head


Can you translate that?
Why is it significant?

It means that different parts of the genome will give you different answers for effective population size. Therefore, you have to take the whole genome into account. This may be part of the “sliding window” that @swamidass is talking about. The other part may have to do with changes in population dynamics through time.


Oh my …

I get this.

Because the diversity levels are different… but then this should mainly be only in parts of the genome that have undergone more selective pressure… correct?

Can Ne be considered as the smallest hypothetical population size in which a set of genes must have gotten “fixed” in order to give the amount of variation currently observed?

Ne is essentially defined as the inverse of the coalescent rate (CR). Whatever increases CR decreases Ne, and vice versa.

Selection increases CR, and therefore decreases Ne in positively selected regions of genome.

Recombination decreases CR, and therefore increases Ne.

Isolated populations increase CR, and therefore decrease Ne.

Differential survival between males/females creates different CR for sex chromosomes, and therefore different Ne.

Immigration decreases CR, and therefore increases Ne.

Inbreeding increases CR and therefore decreases Ne.

CR is just a rate measured over time, a single number, but it is affected by all these things and more.
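A rough sketch of the "Ne is the inverse of CR" idea, under assumptions that are not from the thread: an idealized diploid Wright-Fisher population of constant size, in which two lineages coalesce in any given generation with probability 1/(2N). With that convention the inverse carries a factor of 2, so Ne = 1/(2 × CR). The function names are hypothetical.

```python
import math
import random


def sample_pairwise_coalescence_time(n_diploid, rng):
    """Generations back until two lineages share an ancestor.

    Assumes an ideal Wright-Fisher population with 2 * n_diploid gene copies,
    so the waiting time is geometric with success probability 1 / (2N).
    """
    p = 1.0 / (2 * n_diploid)
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(1.0 - p)) + 1


def estimate_ne(times):
    """Turn observed pairwise coalescence times into an Ne estimate."""
    mean_time = sum(times) / len(times)
    coalescent_rate = 1.0 / mean_time      # events per generation
    return 1.0 / (2.0 * coalescent_rate)   # diploid convention: CR = 1 / (2 Ne)


true_n = 1_000
rng = random.Random(42)
times = [sample_pairwise_coalescence_time(true_n, rng) for _ in range(20_000)]
ne_hat = estimate_ne(times)
print(f"true N = {true_n}, Ne estimated from coalescent rate = {ne_hat:.0f}")
```

In this ideal case the estimate lands close to the census size. Each factor in the list above can be thought of as shifting the distribution of those waiting times, and with it the Ne that comes out the other end.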

Well, for a moment, let’s ignore the “size” of the window, and some normalization details. I’ll add this back in a moment.

Think of a timeline stretching into the past. There is a tick mark wherever there is a coalescence. CR (which is just the inverse of Ne) is the average number of tick marks in a window along that line. The tick marks are coalescences in the phylogenetic tree. The time of each tick mark is estimated from the number of mutations along each leg, and there is a lot of noise here.

So add back the complications.

How big is the window? The window size increases exponentially as you go back in time, and it is not well defined beyond this. I’ve worked out some of the statistics for a yet-to-be-published paper on this, but the key point is that the window size increases as you go back in time, and no one really tracks how large this is right now in population demographic inference.

What about the normalization details? Each coalescence event is weighted by how many active lineages there are. The more lineages, the less weight. The exact formula is given by the Kingman Coalescent. I can explain it if you like, but the key point is that each of the events “counts” a different amount. In the recent past, we have way more of them that count just a little, but in the distant past we have only a few that count a lot.
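One way to see the weighting is through the standard Kingman result that, with k active lineages in a diploid population of constant size N, the expected time to the next coalescence is 4N / (k(k-1)) generations. The sketch below just tabulates that formula. The sample size, the population size, and the function name are illustrative assumptions, and this is not the normalization used by any particular inference program.

```python
def expected_interval_lengths(n_samples, n_diploid):
    """Expected generations spent with k lineages, for k = n_samples down to 2.

    Kingman coalescent, constant diploid size N: with k lineages there are
    k * (k - 1) / 2 pairs, each coalescing at rate 1 / (2N) per generation.
    """
    return {
        k: 4.0 * n_diploid / (k * (k - 1))
        for k in range(n_samples, 1, -1)
    }


n_samples = 20
n_diploid = 10_000
intervals = expected_interval_lengths(n_samples, n_diploid)
total_height = sum(intervals.values())

for k in (20, 10, 5, 2):
    share = intervals[k] / total_height
    print(f"k = {k:2d} lineages: {intervals[k]:8.1f} generations "
          f"({share:.1%} of expected tree height)")
```

With many lineages the events come thick and fast and each one covers a short stretch of time. The final coalescence (k = 2) alone accounts for more than half of the expected tree height, which matches the point about a few old events that count a lot.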

Now that we have settled that, it should be clear that Ne does not tell us much at all about brief, tight bottlenecks. In fact, it tells us just about diddly squat about them. It can only pick up bottlenecks that last for a large number of generations.


That definition is a big help (also mentioned by @davecarlson).

Note for observers: this is true because recombination controls the range over which selection affects the chromosome.

Sex chromosomes have a different Ne anyway because there are fewer of them in the population than autosomes (3/4 as many for the X, 1/4 for the Y.)
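The 3/4 and 1/4 figures are simple copy counting. Here is a small sketch, assuming an XY system (XX females, XY males) and an equal sex ratio. The population numbers and the function name are made up for illustration.

```python
def chromosome_copy_counts(n_females, n_males):
    """Number of chromosome copies in the population, assuming XX females and XY males."""
    return {
        "autosome": 2 * (n_females + n_males),  # two copies in everyone
        "X": 2 * n_females + n_males,           # two per female, one per male
        "Y": n_males,                           # one per male only
    }


counts = chromosome_copy_counts(n_females=500, n_males=500)
x_ratio = counts["X"] / counts["autosome"]
y_ratio = counts["Y"] / counts["autosome"]
print(f"X copies relative to autosomes: {x_ratio}")
print(f"Y copies relative to autosomes: {y_ratio}")
```

With an unequal sex ratio the same function gives different fractions, so the 3/4 and 1/4 values are specific to the equal-ratio case.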

It’s not really relevant here, but note that this is one way of defining Ne. What Ne means in general is the population size in whatever model you’re thinking of that would match an empirical measurement, i.e. it’s how big an ideal population would have to be to behave like the real population. Depending on the population’s history, you can get wildly different values for Ne if you measure different things. Immediately after a bottleneck, Ne based on diversity might be ~10,000 while Ne based on variance in allele frequencies might be ~10.


I suppose the context I mean is population demographic inference. Do you know of any program or approach that does not define Ne as CR? This is the way it is done for MSMC, SMC, and just about every algorithm I’ve looked at. Am I missing something?

Very true. And I suppose AFS is one way to do population inferences, although not the most powerful way once you get more ancient than, say, 10,000 years. Was that your point?

@jordan and @AJRoberts this is a good thread for you.

In the context of long-term demographic inference, sure, that’s what you do. In the context of short-term inference (e.g. when we measured the variance effective size of the malaria population of Senegal a few years ago, following anti-malaria intervention), you should be using the variance Ne or something similar. As I said, in this context it doesn’t matter.