The paper linked to below is not peer-reviewed but it has interesting content. For one, it uprooted my belief that genetic data neatly lumps humans into distinct continental categories, and by looking at the human genetic variation graph in it you can easily see why. It also brings to the fore the issue of conflating racial categorization with continental ancestry.
@swamidass, I would love to hear your thoughts on the paper, and on their statement that there is no generally agreed definition of genetic ancestry. What definition of genetic ancestry did you use in your book (which I have not read)?

From the paper:

First, there are deep-seated ambiguities about what genetic ancestry actually is. There is no widely agreed upon definition. There are many statistical methodologies across sub-fields of genetics and genomics whose outputs are framed as “genetic ancestry” (11), see Figure 1. Some of these attempt to approximate the subset of paths through a family tree via which DNA has been inherited. But many do not even attempt to do this, and are better thought of as estimates of genetic similarity between individuals in a dataset (11), rather than as genetic ancestry. This applies to the commonly used technique of assessing relative position in principal component space, i.e. a space that captures the major axes of variation in a dataset. Framing this instead as genetic similarity helps bring to the fore the question “similar compared to whom?”, and hence the crucial role of which individuals are selected in the analysis.

I meant genetic ancestors as in those who grant us DNA by direct descent. That is different from genetic similarity. Sometimes genetic similarity (which we can directly observe) is used to infer genetic ancestry (which we can’t), but it seems like a category error to equate the two.

There have long been good reasons to avoid that conflation, even on the subcontinental level:

This paper seems to be less about genetic ancestry than about the invalidity of notions of race and continental groupings. I was surprised at the widespread use of PCA to represent ancestry, which seems silly.

Without having had the time or energy to read the paper, I note that in small pedigrees one can compare genotypes (not just genetic “similarity” values) to try to infer whether another individual is your sibling, offspring, or parent. It is often hard to tell these apart. When comparing your genotype to those of individuals drawn from another population, it seems a waste of time even to ask whether anyone there is your ancestor, because it is very unlikely that your (say) grandfather is still alive there and got sampled. But you can ask whether your genotype (at multiple SNPs, say) is close to a mixture of X percent French people and Y percent people from Pakistan. The present-day populations there, or their samples anyway, stand in for the ancestors who might have come from there. There is the issue of whether a mixed population of the two might have been ancestral to you, as with a person from Hawai’i who might be descended from people who were themselves a mixture of Chinese and Portuguese people. I think the consumer testing companies don’t really ask about that (I could ask a colleague who wrote such programs for one of these companies). And there are limits to how finely they subdivide these putative ancestral populations (do they distinguish between North Swedes and North Norwegians?). There is also the question of what names they choose when two populations are genetically so close that you don’t consider both of them. That’s the problem of Scots being upset about being called Irish and vice versa.
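
To make the "X percent / Y percent" question concrete, here is a minimal sketch of one way it could be posed. It is not what any testing company actually runs. It assumes two reference samples summarized only by their allele frequencies, treats SNPs as independent, and fits the single mixing fraction by least squares. The function names and the simulated frequencies are made up for illustration.

```python
import numpy as np

def mixture_fraction(genotype, freq_a, freq_b):
    """Least-squares estimate of the fraction of ancestry from reference A.

    genotype: allele counts (0, 1 or 2) at each SNP for one individual.
    freq_a, freq_b: allele frequencies at the same SNPs in two reference
    samples (hypothetical stand-ins for, say, "French" and "Pakistani").
    """
    g = np.asarray(genotype, dtype=float) / 2.0   # per-SNP allele fraction
    pa = np.asarray(freq_a, dtype=float)
    pb = np.asarray(freq_b, dtype=float)
    diff = pa - pb
    denom = np.dot(diff, diff)
    if denom == 0.0:
        # Identical references carry no information about the mixture.
        raise ValueError("reference samples have identical frequencies")
    # Solve g - pb ~ x * (pa - pb) for x, then keep it in [0, 1].
    x = np.dot(diff, g - pb) / denom
    return float(np.clip(x, 0.0, 1.0))

def simulate_demo(n_snps=5000, true_fraction=0.3, seed=0):
    """Simulate one admixed individual and return the estimated fraction."""
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.05, 0.95, n_snps)      # assumed, not real data
    freq_b = rng.uniform(0.05, 0.95, n_snps)
    mixed = true_fraction * freq_a + (1 - true_fraction) * freq_b
    genotype = rng.binomial(2, mixed)             # two allele draws per SNP
    return mixture_fraction(genotype, freq_a, freq_b)
```

Note what happens as the two reference samples get closer: `denom` shrinks toward zero and the estimate becomes dominated by noise, which is the numerical side of the Scots/Irish naming problem. The sketch also says nothing about whether the ancestors were themselves already mixed.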

PCA loses information and all you get is a fuzzy impression of overall similarity. You don’t even know what the axes represent and whether they mean anything. If different bits of the genome have different ancestry, everything is mushed together into a meaningless average.
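
A small simulation can illustrate the "mushed together" point. This is only a sketch under assumptions: two made-up reference groups with independent random allele frequencies, PCA done by plain SVD on the centered genotype matrix, and two hypothetical individuals represented by their expected allele dosages rather than sampled genotypes. One is a mosaic (first half of the genome from group A, second half from group B), the other is a 50/50 blend at every SNP.

```python
import numpy as np

def fit_pc1(reference_genotypes):
    """Return (column means, first principal axis) of a samples-by-SNPs matrix."""
    X = np.asarray(reference_genotypes, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[0]

def project(genotype, mean, axis):
    """Coordinate of one genotype vector along a fitted principal axis."""
    return float(np.dot(np.asarray(genotype, dtype=float) - mean, axis))

def demo(n_snps=2000, n_per_group=50, seed=1):
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.05, 0.95, n_snps)      # assumed, not real data
    freq_b = rng.uniform(0.05, 0.95, n_snps)
    group_a = rng.binomial(2, freq_a, size=(n_per_group, n_snps))
    group_b = rng.binomial(2, freq_b, size=(n_per_group, n_snps))
    mean, axis = fit_pc1(np.vstack([group_a, group_b]))

    half = n_snps // 2
    # Expected dosages keep the comparison free of sampling noise.
    mosaic = np.concatenate([2 * freq_a[:half], 2 * freq_b[half:]])
    blend = freq_a + freq_b
    return {
        "group_a": project(group_a.mean(axis=0), mean, axis),
        "group_b": project(group_b.mean(axis=0), mean, axis),
        "mosaic": project(mosaic, mean, axis),
        "blend": project(blend, mean, axis),
    }
```

Under these assumptions the two group centroids land far apart on the first axis, while the mosaic and the blend land almost on top of each other near the middle, even though their genomes are very different. The sign of the axis is arbitrary, so only the distances are meaningful.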

