Inferring the ancestry of everyone

swamidass · August 10, 2020, 1:22am

https://www.nature.com/articles/s41588-019-0483-y

A central problem in evolutionary biology is to infer the full genealogical history of a set of DNA sequences. This history contains rich information about the forces that have influenced a sexually reproducing species. However, existing methods are limited: the most accurate is unable to cope with more than a few dozen samples. With modern genetic data sets rapidly approaching millions of genomes, there is an urgent need for efficient inference methods to exploit such rich resources. We introduce an algorithm to infer whole-genome history which has comparable accuracy to the state-of-the-art but can process around four orders of magnitude more sequences. Additionally, our method results in an “evolutionary encoding” of the original sequence data, enabling efficient access to genealogies and calculation of genetic statistics over the data. We apply this technique to human data from the 1000 Genomes Project, Simons Genome Diversity Project and UK Biobank, showing that the genealogies we estimate are both rich in biological signal and efficient to process.

I’m looking forward to dipping into the data on this one. It seems to be essentially the same kind of data as ARGweaver produces, but for the 1000 Genomes data. Any bets on what the TMR4A might be?

glipsnort · August 10, 2020, 1:46pm

And the companion article: https://www.nature.com/articles/s41588-019-0484-x

Joe_Felsenstein · August 10, 2020, 1:46pm

Gil McVean does very good work. Still, this makes an estimate of the genome-wide ARG. Two issues arise: (1) Plus or minus what? How much noise will still be there in the estimate of the gene-level pedigree? And (2) given that we could get an infallible estimate of the true ARG, does that lead to precision in estimates of gene-phenotype relationships? The answer to the latter we already know: for example, a very good simulation study of this by Lucian Smith and Mary Kuhner of my own lab in 2009 in Genetic Epidemiology showed that locating the site of a phenotypic effect has noise even if you know the true ARG. These issues remain despite the impressive algorithmic achievements.

(For those who are unfamiliar: ARG = Ancestral Recombination Graph, a detailed pedigree of who each gene came from. Note that Josh’s GAE need not be on that graph.)
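
For readers who want something concrete, here is a minimal sketch of one common way to picture an ARG: as a series of local trees, each covering a stretch of the genome, so that the same pair of samples can have a different common ancestor at different positions. Everything in it is hypothetical toy data (the intervals, node IDs, and times are made up for illustration), and it is not the data structure or the algorithm used in the paper.

```python
# Toy illustration (all data hypothetical): an ARG viewed as local trees.
# Each local tree covers a genomic interval [left, right) and is stored as
# a child -> parent map plus a node -> time map (time in generations ago).
local_trees = [
    {
        "interval": (0, 500),
        "parent": {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6},
        "time": {0: 0, 1: 0, 2: 0, 3: 0, 4: 100, 5: 250, 6: 900},
    },
    {
        # A recombination at position 500 changes the genealogy:
        # sample 2 now joins samples 0 and 1 before sample 3 does.
        "interval": (500, 1000),
        "parent": {0: 4, 1: 4, 2: 7, 4: 7, 3: 8, 7: 8},
        "time": {0: 0, 1: 0, 2: 0, 3: 0, 4: 100, 7: 400, 8: 1200},
    },
]


def ancestors(parent, node):
    """Return the path from node up to the root, inclusive."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def tmrca(tree, a, b):
    """Time of the most recent common ancestor of samples a and b."""
    seen = set(ancestors(tree["parent"], a))
    # Walking upward from b, the first node also above a is the MRCA.
    for node in ancestors(tree["parent"], b):
        if node in seen:
            return tree["time"][node]
    raise ValueError("samples share no ancestor in this tree")


def tmrca_along_genome(trees, a, b):
    """List (interval, TMRCA) for samples a and b across all local trees."""
    return [(t["interval"], tmrca(t, a, b)) for t in trees]


print(tmrca_along_genome(local_trees, 0, 2))
```

In this toy example, samples 0 and 2 share an ancestor 900 generations back in the first interval but only 400 generations back in the second, which is the sense in which the pedigree is "gene-level" rather than one tree for the whole genome.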
