Population Substructure and Control Selection in Genome-wide Association
Studies
Kai Yu
Biometry and Mathematical Statistics Branch
National Institutes of Health
Friday, April 18, 2008, 12:30–1:30 pm
GEMS classroom, 3rd Floor,
Shriner's Building
Coffee, tea, and cookies will be provided
Abstract
The importance of meeting the classic but burdensome epidemiologic
criteria for control selection and of aggressive handling of population
stratification (PS) are two intertwined questions in design and analysis
of genome-wide association studies (GWAS). Empirical data from two GWAS
in European Americans of the Cancer Genetic Markers of Susceptibility
(CGEMS) project were used to evaluate the impact of PS in studies with
different control selection strategies. In each of the two original
case-control studies nested in their corresponding prospective cohorts,
a minor confounding effect due to PS (inflation factors of 1.025 and
1.005) was observed. In contrast, when the control groups were exchanged
to mimic a cost-effective but theoretically less desirable control
selection strategy, the confounding effects were larger (inflation
factors of 1.090 and 1.062). A panel of 12,898 autosomal SNPs common to
both the Illumina and Affymetrix commercial platforms and with low local
background linkage disequilibrium (pairwise r^2 < 0.004) was selected to
infer population substructure with principal component analysis. A novel
permutation procedure developed for the correction of PS identified a
smaller set of principal components and achieved better control of
type I error (reducing the inflation factors to 1.032 and 1.006,
respectively) than currently used methods. The overlap between sets of
SNPs in the bottom 5% of p-values based on the new test and the test
without PS correction was about 80%, with the majority of discordant
SNPs having both ranks close to the threshold. Thus, for GWAS in
European Americans, PS does not appear to be a major problem in
well-designed studies. Moreover, with effective correction of PS, the
use of suboptimal controls appears to yield acceptable type I error
performance.
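
The abstract reports inflation factors but does not say how they were
computed. As a minimal sketch, assuming the common genomic-control
definition (the median of the observed 1-df chi-square statistics divided
by the median of the chi-square distribution with 1 degree of freedom,
about 0.4549), the quantity could be obtained from per-SNP p-values as
follows. The function name `inflation_factor` and the input format are
illustrative assumptions, not the speaker's implementation.

```python
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 degree of freedom.
# If Z is standard normal, Z**2 is chi-square(1), so the median is
# the square of the 75th percentile of Z (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(p_values):
    """Genomic-control style inflation factor (assumed definition).

    p_values: iterable of p-values from 1-df association tests,
    each strictly greater than 0 and at most 1.
    """
    std_normal = NormalDist()
    # Convert each p-value back to the 1-df chi-square statistic
    # that would produce it.
    chi2_stats = [std_normal.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Hypothetical input: evenly spaced p-values, as expected under the null.
    n = 1001
    null_p = [(i + 0.5) / n for i in range(n)]
    print(round(inflation_factor(null_p), 3))
```

A value near 1 indicates little inflation of the test statistics. Values
such as the 1.090 and 1.062 quoted above would, under this definition,
mean the median statistic is 9.0% and 6.2% larger than expected under the
null.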