Speed, Holmes & Balding - Nature Genetics 2020
Evaluating and improving heritability models using summary statistics
Speed, D., Holmes, J. & Balding, D.J. Evaluating and improving heritability models using summary statistics. Nat Genet (2020). https://doi.org/10.1038/s41588-020-0600-y
MIG Director David Balding, in collaboration with Doug Speed in Denmark and former MIG postdoc John Holmes, has made big advances in understanding how heritability is distributed across the human genome, and how selection has shaped these effects in different genomic regions and for different human traits. Their study "Evaluating and improving heritability models using summary statistics" has just been published in the journal Nature Genetics.
The paper largely resolves a controversy of recent years about the best way to model genome-wide causal effects (measured as heritability). In early genetic association studies, genetic variants were studied one at a time. About a decade ago, the move to simultaneous analysis of all genetic variants was a big advance, but it brought new problems: because the number of genetic variants in the genome is so large, answers could depend on statistical modelling assumptions. The earliest software for genome-wide heritability analysis (GCTA for individual genotype information, then LDSC for summary statistics) made simplistic modelling assumptions that turned out to be far from optimal. Those programs are still widely used (e.g. on the website LDhub), but results from them can be seriously inaccurate.
The new work develops a principled way of choosing among heritability models. The authors used this new tool to find the best current heritability model, when tested on large datasets for 31 complex human traits. They then made further improvements by combining the best elements of the top current models. The traits studied included many diseases, as well as educational attainment, sleep patterns and physical appearance.
The resulting new heritability model has many uses in genome-wide analyses to understand complex traits. The illustrative analyses in the paper showed that heritability for all complex traits studied is widely dispersed in the genome, and not restricted to coding regions (genes) or other known functional elements. Genes do harbour more heritability than would be expected given the small fraction of the genome that they occupy, but on average only about 8% of heritability is located in genes. This represents about a 5-fold "enrichment" relative to the genome-wide average.
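The enrichment quoted above is a ratio: the share of heritability found in a category divided by the share of the genome that the category occupies. The short Python sketch below illustrates that arithmetic only. The function name is hypothetical, the genome share is back-calculated from the two figures in the text rather than taken from the paper, and whether "share of the genome" is counted in base pairs or in variants is left as an assumption.

```python
def fold_enrichment(heritability_share, genome_share):
    """Share of heritability in a category divided by the category's share of the genome.

    Both arguments are proportions between 0 and 1. A value above 1 means the
    category carries more heritability than its size alone would predict.
    """
    if not 0 < genome_share <= 1:
        raise ValueError("genome_share must be in (0, 1]")
    if not 0 <= heritability_share <= 1:
        raise ValueError("heritability_share must be in [0, 1]")
    return heritability_share / genome_share


# Figures quoted in the article: about 8% of heritability, about 5-fold enrichment.
heritability_share_genes = 0.08
reported_fold = 5.0

# Implied share of the genome occupied by genes (an inference, not a reported value).
implied_genome_share = heritability_share_genes / reported_fold

# Sanity check: plugging the implied share back in recovers the reported fold.
recovered_fold = fold_enrichment(heritability_share_genes, implied_genome_share)
```

Read this way, the two figures together imply that genes account for somewhere under 2% of the genome, which is why 8% of heritability counts as enriched even though most heritability lies elsewhere.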
They also studied how the distribution of heritability reflects the effects of selection across different traits (see figure) and across genome regions. Perhaps surprisingly, height and educational attainment were among the traits showing the strongest signs of negative selection, while (as expected) coding sequences and their close flanking regions showed the strongest selection effects in the genome.
Doug has integrated the new tool into his software package LDAK (www.ldak.org), so that it is freely available for other researchers to apply to their own data. A web-server for online genome-wide heritability analyses using summary statistics will be developed shortly.
The parameter alpha measures the extent to which causal effects are concentrated in rare variants, which is interpreted as reflecting the effects of negative selection. The figure reports estimates of alpha for 14 complex traits. The names in black indicate traits for which alpha is significantly negative. The strongest effects of selection are for College Education (which could reflect more educated people having fewer offspring), Height, Pulse and Blood Pressure.
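To make the role of alpha concrete, here is a minimal Python sketch under one commonly used parameterisation, which is an assumption here and is not stated in this article: the variance of a variant's per-allele effect is taken to be proportional to [f(1-f)]^alpha, where f is the variant's minor allele frequency. The function names, the example frequencies and the alpha values are all hypothetical and chosen only for illustration.

```python
def relative_effect_variance(maf, alpha):
    """Per-allele effect-size variance up to a constant, assumed to be [f(1-f)]**alpha."""
    if not 0 < maf <= 0.5:
        raise ValueError("maf must be in (0, 0.5]")
    return (maf * (1.0 - maf)) ** alpha


def rare_to_common_ratio(alpha, rare_maf=0.01, common_maf=0.5):
    """How much larger the expected effect variance is at a rare variant than at a common one."""
    return relative_effect_variance(rare_maf, alpha) / relative_effect_variance(common_maf, alpha)


# Hypothetical alpha values, not estimates from the paper.
for alpha in (-0.5, -0.25, 0.0):
    ratio = rare_to_common_ratio(alpha)
    print(f"alpha = {alpha:+.2f}: rare/common effect variance ratio = {ratio:.2f}")
```

Under this assumption, alpha = 0 gives the same expected effect size at every frequency, and the more negative alpha is, the more strongly large effects are concentrated in rare variants. This is the pattern that the caption interprets as a signature of negative selection.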