Treelets-based approaches to estimating sparse fine-scale population structure from genetic data
Available for: MSc/PhD and undergraduate research projects.
Location: Melbourne Integrative Genomics, University of Melbourne
Project title: Treelets-based approaches to estimating sparse fine-scale population structure from genetic data
Background: Methods for analyzing population structure from genetic data are widely used to understand human history and to correct for population structure in genome-wide association studies. The most common approaches are admixture-based models (Pritchard et al., 2000) and principal components analysis (PCA) (Novembre et al., 2008). This project will develop new approaches to estimating sparse fine-scale population structure from genetic data. Our methods build on treelets (Lee et al., 2008), a multi-scale method that extends wavelets (Shim and Stephens, 2015) to the analysis of unordered data. Treelets simultaneously construct a data-driven hierarchical tree over individuals and a multi-scale orthonormal basis on that tree, both of which capture sparse structure in the genetic data.
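To make the treelet idea concrete, here is a minimal Python sketch in the spirit of Lee et al. (2008). It is an illustration under stated assumptions, not the project's method: the function name `treelet_basis`, the toy data, and the choice to treat individuals as the variables being merged (columns) are all hypothetical. At each level the two most correlated active variables are rotated by a local PCA (a Jacobi rotation) into a "sum" and a "difference" variable; the sum stays active and the difference is set aside, which yields both a merge tree and an orthonormal basis.

```python
import numpy as np

def treelet_basis(X, n_levels=None):
    """Sketch of a treelet decomposition in the spirit of Lee et al. (2008).

    X: array of shape (n_observations, p). The p columns are the variables
       to be merged (assumed here to be individuals, with SNPs as rows).
    Returns the orthonormal basis B (p x p), the list of merges, and the
    covariance matrix expressed in the final basis.
    """
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    n_levels = p - 1 if n_levels is None else min(n_levels, p - 1)
    C = np.cov(X, rowvar=False)   # covariance between columns
    B = np.eye(p)                 # columns are the current basis vectors
    active = list(range(p))       # indices still eligible for merging
    merges = []
    for _ in range(n_levels):
        # Absolute correlation among the active variables only.
        sub = C[np.ix_(active, active)]
        sd = np.sqrt(np.clip(np.diag(sub), 1e-12, None))
        corr = np.abs(sub / np.outer(sd, sd))
        np.fill_diagonal(corr, -1.0)
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        a, b = active[i], active[j]
        # Jacobi rotation angle that zeroes the covariance between a and b.
        # With arctan2, coordinate a receives the higher-variance component.
        theta = 0.5 * np.arctan2(2.0 * C[a, b], C[a, a] - C[b, b])
        c, s = np.cos(theta), np.sin(theta)
        J = np.eye(p)
        J[a, a], J[a, b], J[b, a], J[b, b] = c, -s, s, c
        C = J.T @ C @ J           # covariance in the rotated coordinates
        B = B @ J                 # accumulate the orthonormal basis
        active.remove(b)          # b now holds the "difference" variable
        merges.append((a, b))
    return B, merges, C

# Toy data (hypothetical): 6 "individuals" in two groups of 3, 500 "SNPs".
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 500))
noise = 0.1 * rng.normal(size=(500, 6))
X = np.column_stack([f1, f1, f1, f2, f2, f2]) + noise

B, merges, C_final = treelet_basis(X)
print(merges)  # within-group merges are expected before the cross-group merge
```

In this toy setting the first four merges should join individuals within the same group, and only the last merge should join the two groups. The returned `B` is orthonormal, so projecting the data as `X @ B` loses no information.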
Proposed projects: The specific project will depend on the student’s interest and background. Options are 1) software development for the new methods, 2) contributing to the development of the new methods, or 3) benchmarking admixture-based models and PCA against our treelets-based approaches using real human genetic data and computer simulations.
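As a rough illustration of what a simulation-based benchmark under option 3 might start from, the sketch below simulates genotypes for two populations and applies PCA. Everything in it is an assumption made for illustration: the Balding-Nichols-style allele frequency model, the parameter values (`fst`, sample sizes), and the variable names are not taken from the project description.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_pop, n_snps, fst = 50, 1000, 0.1  # hypothetical settings

# Ancestral allele frequencies, then population-specific frequencies drawn
# from a Beta distribution centred on the ancestral value (Balding-Nichols).
anc = rng.uniform(0.1, 0.9, size=n_snps)
alpha = anc * (1 - fst) / fst
beta = (1 - anc) * (1 - fst) / fst
pop_freqs = rng.beta(alpha, beta, size=(2, n_snps))

# Genotypes (0, 1 or 2 copies of an allele) for each individual and SNP.
labels = np.repeat([0, 1], n_per_pop)
G = rng.binomial(2, pop_freqs[labels])  # shape (100, 1000)

# PCA via SVD of the SNP-centred genotype matrix.
Gc = G - G.mean(axis=0)
U, S, Vt = np.linalg.svd(Gc, full_matrices=False)
pc1 = U[:, 0] * S[0]                    # individual scores on the first PC

print(pc1[labels == 0].mean(), pc1[labels == 1].mean())
```

With this level of differentiation the two populations should fall on opposite sides of the first principal component. A fuller benchmark would vary `fst`, add finer substructure, and score each method against the known labels.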
Learning outcomes: software development, statistics / machine learning, programming in C/C++ (or Python) and R, statistical analysis of complex and large-scale genomic data, data visualization.
Pritchard, Jonathan K., Matthew Stephens, and Peter Donnelly. Inference of population structure using multilocus genotype data. Genetics 155.2 (2000): 945-959.
Novembre, John, et al. Genes mirror geography within Europe. Nature 456.7218 (2008): 98-101.
Lee, Ann B., Boaz Nadler, and Larry Wasserman. Treelets: an adaptive multi-scale basis for sparse unordered data. The Annals of Applied Statistics (2008): 435-471.
Shim, Heejung, and Matthew Stephens. Wavelet-based genetic association analysis of functional phenotypes arising from high-throughput sequencing assays. The Annals of Applied Statistics (2015): 665-686.