A genealogy-based framework to estimate population structure and demographic history
by Charleston Chiang | University of Southern California
Abstract ID: 62
Event: The 3rd AsiaEvo Conference
Topic: Genomic diversity in nonequilibrium populations
Presenter Name: Charleston Chiang

Learning the demographic histories of nonequilibrium populations helps us understand the causes of population structure, the patterns of genetic variation, and the evolution of traits. Many existing methods to infer population structure or demographic history from genetic data use relatively low-dimensional summaries, such as the allele frequency spectrum, which often ignore the linkage information between markers. In principle, much more information is available from the sequence of gene-genealogical trees, known as the ancestral recombination graph (ARG), that describes the history of the sampled alleles. As a step toward capturing all the available genomic information, we introduce two methods that leverage the ARG to infer population structure and demographic histories. First, we describe a framework to infer the expected relatedness between pairs of individuals given an ARG of the sample, which we call the eGRM. We show that the eGRM captures the structure of a population better than the canonical Genetic Relationship Matrix (GRM), even when using only the limited genetic information available on a genotyping array. Moreover, the eGRM can reveal the time-varying nature of population structure in a sample. Second, we present a method called gLike that derives the full likelihood of a genealogical tree under any hypothesized demographic history. Employing a graph-based structure, gLike summarizes the relationships among all lineages in a tree with all possible trajectories of population membership through time and efficiently computes the exact marginal probability under a parameterized demographic model. Through extensive simulations of multiple admixtures, we show that gLike accurately estimates dozens of demographic parameters, including ancestral population sizes, admixture timing, and admixture proportions, and outperforms conventional demographic inference methods that leverage only the allele frequency spectrum.
We apply both methods to real-world human genomic data from Finnish, Latino American, and Native Hawaiian cohorts to gain further insights into the patterns of population structure and to estimate parameters of the admixture histories. Taken together, our studies demonstrate the power of leveraging genealogical trees for downstream population genetic inference.
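
For readers unfamiliar with the baseline the eGRM is compared against, the following is a minimal sketch of a canonical GRM computed from genotype counts. It assumes diploid individuals coded as 0, 1, or 2 copies of an allele per marker and uses the common standardization by allele frequency; the function name `canonical_grm` and the toy data are illustrative assumptions, not part of the eGRM or gLike software, and the eGRM itself (which averages over the branches of an ARG) is not shown here.

```python
import math


def canonical_grm(genotypes):
    """Return the n x n genetic relationship matrix for n diploid individuals.

    genotypes: list of n rows, each a list of allele counts (0, 1, or 2),
    one entry per marker. Monomorphic markers are skipped.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    standardized = [[] for _ in range(n)]
    used = 0  # number of polymorphic markers actually used

    for j in range(m):
        # Sample allele frequency at marker j.
        p = sum(row[j] for row in genotypes) / (2 * n)
        if p == 0.0 or p == 1.0:
            continue  # no variation, so the marker carries no information
        sd = math.sqrt(2 * p * (1 - p))
        for i in range(n):
            standardized[i].append((genotypes[i][j] - 2 * p) / sd)
        used += 1

    if used == 0:
        raise ValueError("no polymorphic markers in input")

    # Average the product of standardized genotypes over markers.
    return [
        [
            sum(a * b for a, b in zip(standardized[i], standardized[k])) / used
            for k in range(n)
        ]
        for i in range(n)
    ]


# Toy example: three individuals typed at three markers.
toy = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
grm = canonical_grm(toy)
```

In this toy example the first two individuals carry opposite genotypes at the informative markers, so their off-diagonal entry is negative, while the third individual sits at the sample mean and has zero relatedness to everyone under this estimator.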