Quantifying genetic variation across multiple populations
Supervisor: David Balding
Available for: MSc only
Location: Melbourne Integrative Genomics, University of Melbourne
Project title: Quantifying genetic variation across multiple populations
Description: The genetic distance between two populations has traditionally been measured by a parameter called FST (S for subpopulation, T for total population). There has long been disagreement over its precise definition and the best way to estimate it. These disagreements became important with the advent of genome-wide genotype data: estimates of inter-continental human genetic distances differed by almost a factor of two, because competing definitions differ in their sensitivity to rare variants. The problem has now largely been resolved for pairs of populations, but the problem of defining and estimating FST for multiple populations remains. A recent academic visitor to Melbourne, Dr Tristan Mary-Huard from AgroParisTech, France, made considerable progress in clarifying the definition and in developing fast and efficient method-of-moments estimators of FST for multiple populations. He also developed a fast and simple procedure for building population trees in which the branch lengths correspond to FST. In this project we will develop this work further, in collaboration with Dr Mary-Huard, to construct more general graphical structures representing the genetic variation across a set of populations that may have experienced historical episodes of admixture, in particular the human populations of the Americas.
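
As a rough illustration of the pairwise case only, the sketch below computes a method-of-moments FST estimate for two populations using the formula commonly known as the Hudson estimator. This is an assumed stand-in chosen for illustration, not Dr Mary-Huard's multi-population estimator. The function names and the toy allele frequencies are hypothetical. The two ways of combining SNPs (ratio of averages versus average of per-SNP ratios) show one way in which rare variants can pull estimates apart.

```python
def hudson_terms(p1, p2, n1, n2):
    """Per-SNP numerator and denominator of the Hudson FST estimator.

    p1, p2: sample allele frequencies in populations 1 and 2.
    n1, n2: number of allele copies sampled in each population (must exceed 1).
    """
    # Squared frequency difference, corrected for finite-sample noise.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Probability that two alleles, one drawn from each population, differ.
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def fst_ratio_of_averages(freqs1, freqs2, n1, n2):
    """Combine SNPs by summing numerators and denominators before dividing.

    Assumes at least one SNP has a non-zero denominator.
    """
    terms = [hudson_terms(p1, p2, n1, n2) for p1, p2 in zip(freqs1, freqs2)]
    total_num = sum(num for num, _ in terms)
    total_den = sum(den for _, den in terms)
    return total_num / total_den


def fst_average_of_ratios(freqs1, freqs2, n1, n2):
    """Combine SNPs by averaging per-SNP ratios, so every SNP gets equal weight.

    SNPs with a zero denominator (same allele fixed in both samples) are skipped.
    """
    terms = [hudson_terms(p1, p2, n1, n2) for p1, p2 in zip(freqs1, freqs2)]
    ratios = [num / den for num, den in terms if den > 0]
    return sum(ratios) / len(ratios)


# Toy data (made up): one common, well-differentiated SNP and one rare variant.
freqs_a = [0.5, 0.01]
freqs_b = [0.1, 0.0]
print(fst_ratio_of_averages(freqs_a, freqs_b, 200, 200))
print(fst_average_of_ratios(freqs_a, freqs_b, 200, 200))
```

In the average of ratios, the rare variant counts as much as the common SNP and pulls the estimate down. In the ratio of averages, its small numerator and denominator contribute little to the totals.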