Recommendations for using standardised phenotypes in genetic association studies

Genetic association studies of complex traits often rely on standardised quantitative phenotypes, such as percentage of predicted forced expiratory volume and body mass index, to measure an underlying trait of interest (eg lung function, obesity). These phenotypes are appealing because they provide an easy mechanism for comparing subjects, although such standardisations may not be the best way to control for confounders and other covariates. We recommend adjusting raw or standardised phenotypes within the study population via regression. We illustrate through simulation that optimal power in both population- and family-based association tests is attained by using the residuals from within-study adjustment as the complex trait phenotype. An application of family-based association analysis of forced expiratory volume in one second and obesity in the Childhood Asthma Management Program data illustrates that power is maintained or increased when adjusted phenotype residuals are used instead of typical standardised quantitative phenotypes.


Introduction
Failure to adjust for confounders and other covariates can greatly diminish the efficiency of genetic association studies. Traditional regression methods that control for confounders often apply directly to genetic association studies, and these techniques have been extended and adapted in settings where this is not the case. [1][2][3] Despite this, researchers conducting genetic association studies of quantitative traits do not always take full advantage of their ability to adjust for important covariates.
Covariate adjustment is so crucial for traits like body mass index (BMI) and percentage of predicted forced expiratory volume in one second (PPFEV) that they are standardised by definition. BMI (instead of weight) is used as a measure of obesity because height contributes noise to the relationship between obesity and weight. Similarly, PPFEV, the amount of air a person can blow out in one second divided by the expected amount, given the person's sex, height and, sometimes, other covariates, is used as a measure of lung function instead of unadjusted forced expiratory volume (FEV) because sex, height and other covariates add noise to the relationship between lung function and FEV. To determine expected FEV, various equations have been proposed, each a regression equation fit to a specific study population. [4][5][6] Both BMI and PPFEV were developed to assess phenotypes in individuals when there are no population data available, for example, when determining the severity of asthma or obesity during physical examination.
Standardised phenotypes were not, however, intended to serve as a substitute for within-study adjustment in association studies. Since genetic association studies have sample sizes large enough to adjust using the study population itself, it is no longer necessary to rely on standardisations based on external populations. It is especially important to adjust within the study population when the study population is clearly dissimilar to the general population. Consider a study population living at high altitude: using PPFEV with predicted FEV estimated based on a population dwelling at sea level could make even asthmatics seem healthy.
To determine whether researchers conducting genetic association studies of FEV actually adjust for confounders using the study population, we performed a literature search using PubMed. Of 26 genetic association studies with FEV as a main outcome published in the past three years, only 14 used within-study adjusted FEV. [7-19] The 12 studies that did not mention within-study adjustment included one that stated, 'No covariate adjustment was used since the percent-predicted lung function is already covariate adjusted'. [20-31] The studies also varied as to what potential confounders were available (for instance, many studies did not record height). Our literature search showed that there is no consensus on whether standardised phenotypes should be further adjusted.
We hypothesised that within-study adjustment of standardised quantitative phenotypes increases power in genetic association studies. To test this, we compared power obtained using PPFEV with and without within-study adjustment under both population- and family-based designs via simulation. We also examined the effects of applying within-study adjustment to raw FEV as opposed to PPFEV and of having an ascertainment condition. Finally, we applied the different methods to the Childhood Asthma Management Program (CAMP).

Simulated genotypes
To simulate a family-based design, either 100, 400 or 800 independent trios (two parents and an offspring) were generated. We assumed Hardy-Weinberg equilibrium and drew parental alleles from a binomial distribution with the probability of carrying the risk allele equal to allele frequency (0.05, 0.10 or 0.20). Offspring genotypes were then based on Mendelian transmission. We assumed there were no genotyping errors. For the simulated population-based design, only offspring genotypes were used.
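The genotype simulation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `simulate_trios`, the seed and the use of NumPy are assumptions, and the only modelling choices taken from the text are Hardy-Weinberg equilibrium for parents, Mendelian transmission to the offspring and the absence of genotyping error.

```python
import numpy as np

def simulate_trios(n_trios, allele_freq, rng):
    """Return (father, mother, child) genotypes coded as 0, 1 or 2 risk alleles."""
    # Under Hardy-Weinberg equilibrium each parental genotype is Binomial(2, p).
    father = rng.binomial(2, allele_freq, size=n_trios)
    mother = rng.binomial(2, allele_freq, size=n_trios)
    # Mendelian transmission: each parent passes on one of its two alleles at
    # random, so the transmitted allele is a risk allele with prob. genotype / 2.
    from_father = rng.binomial(1, father / 2.0)
    from_mother = rng.binomial(1, mother / 2.0)
    child = from_father + from_mother
    return father, mother, child

# Example settings taken from the ranges quoted in the text (400 trios, p = 0.10).
rng = np.random.default_rng(seed=1)
father, mother, child = simulate_trios(n_trios=400, allele_freq=0.10, rng=rng)
```

For a population-based design, only the `child` array would be carried forward.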
Simulated raw phenotypes
Height, weight and age were generated using a multivariate normal distribution with means and covariances equal to those observed in a real dataset (CAMP). We restricted our samples to Caucasian males, to create a more homogeneous population to which a simple set of prediction equations would apply.
The primary unadjusted phenotype of interest, FEV, was generated in two ways: first, with the relationship between FEV and its confounders based on that observed in real data and, secondly, with the aim of determining what happens in the worst-case scenario for within-study adjustment, that is, when the FEV prediction equations really do describe the mean of the distribution of FEV. In the first case, FEV was simulated from a normal distribution with mean $aX_i + 1.522 + 0.0271\,\text{height} + 0.000197\,\text{height}^2 + 0.0219\,\text{age} + 0.00345\,\text{weight}$ (where $a$ is the additive genetic effect and $X_i$ is the number of copies of the allele of interest), and variance set equal to the variance of the residuals calculated when this model was fit to CAMP data. This model describes the relationship between height, age, weight and FEV in the CAMP study but with an additive genetic effect (heritability of FEV = 0.01, 0.025 or 0.05). [32] The second way we generated FEV was similar, except that the model used to specify the mean and the variance of the residuals was the FEV prediction equation of Knudson et al. [4] By simulating the data two ways, we can see how much within-study adjustment affects power under a realistic setting and also whether it diminishes power when it is unnecessary (ie when the FEV prediction equation explains the exact relationship between FEV and its confounders). To examine the effect of phenotype truncation, the simulations were repeated with an ascertainment condition that excluded subjects on the basis of an 80 per cent PPFEV cut-off. (Final sample sizes were still 100, 400 and 800 trios.) This mimics the CAMP dataset, which includes only children with mild to moderate asthma.
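As a rough sketch of the first (CAMP-based) phenotype model, the mean function quoted above can be coded directly. Everything else here is a placeholder: the covariate means and covariance matrix, the effect size `a` and the residual standard deviation `resid_sd` are hypothetical values chosen only to make the example run, since the CAMP estimates are not given in the text. Height is treated as mean-centred, which is also an assumption.

```python
import numpy as np

def fev_mean(x, height, age, weight, a):
    """Mean FEV under the CAMP-based model quoted in the text."""
    return (a * x + 1.522 + 0.0271 * height + 0.000197 * height**2
            + 0.0219 * age + 0.00345 * weight)

rng = np.random.default_rng(seed=2)
n = 400

# Hypothetical stand-ins for the CAMP means and covariances
# (order: mean-centred height, age, weight).
cov_mean = np.array([0.0, 9.0, 32.0])
cov_matrix = np.array([[180.0, 20.0, 100.0],
                       [20.0, 4.5, 12.0],
                       [100.0, 12.0, 100.0]])
height, age, weight = rng.multivariate_normal(cov_mean, cov_matrix, size=n).T

genotype = rng.binomial(2, 0.10, size=n)  # copies of the allele of interest
a = 0.05          # hypothetical additive genetic effect
resid_sd = 0.25   # hypothetical residual standard deviation

# FEV is drawn from a normal distribution centred on the model mean.
fev = rng.normal(fev_mean(genotype, height, age, weight, a), resid_sd)
```

In the study itself, `a` would be chosen to give the stated heritabilities and the residual variance would come from the model fit to CAMP.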

Methods of adjustment and analysis
Four confounder-adjustment methods were applied to obtain four corresponding phenotypes used in genetic association testing: centred FEV, centred PPFEV, residuals from regressing FEV on relevant covariates and residuals from regressing PPFEV on relevant covariates. PPFEV was calculated using the prediction equations derived by Knudson et al. [4] Population-based association testing was done by regressing phenotype on genotype, assuming an additive model and performing a Wald test to check whether the regression coefficient for genotype is equal to zero. Family-based association testing was done using the family-based association test (FBAT). [33] In the population-based simulations, FEV and PPFEV were regressed on the number of copies of the risk allele. Within-study adjustment was accomplished by including relevant covariates in the regression model.
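The population-based test described above (regress phenotype on additively coded genotype, then apply a Wald test to the genotype coefficient) can be sketched as follows. This is an illustrative implementation under stated assumptions: the function name `wald_test_genotype` is hypothetical, ordinary least squares is fitted with NumPy, and the p-value uses a normal approximation to the Wald statistic rather than a t distribution. The toy data are invented for the example.

```python
import math
import numpy as np

def wald_test_genotype(phenotype, genotype, covariates=None):
    """Return (beta, z, p) for the genotype term of a linear regression.

    covariates, if given, is an (n, k) array; including it in the model is
    what the text calls within-study adjustment.
    """
    n = len(phenotype)
    columns = [np.ones(n), np.asarray(genotype, dtype=float)]
    if covariates is not None:
        columns.extend(np.asarray(covariates, dtype=float).T)
    design = np.column_stack(columns)
    beta, _, _, _ = np.linalg.lstsq(design, phenotype, rcond=None)
    resid = phenotype - design @ beta
    sigma2 = resid @ resid / (n - design.shape[1])       # residual variance
    cov_beta = sigma2 * np.linalg.inv(design.T @ design)  # OLS covariance
    z = beta[1] / math.sqrt(cov_beta[1, 1])               # Wald statistic
    p = math.erfc(abs(z) / math.sqrt(2.0))                # two-sided, normal approx.
    return beta[1], z, p

# Toy data (hypothetical): one strong covariate plus a modest genotype effect.
rng = np.random.default_rng(seed=3)
n = 2000
genotype = rng.binomial(2, 0.20, size=n)
covariate = rng.normal(0.0, 2.0, size=(n, 1))
phenotype = 0.2 * genotype + covariate[:, 0] + rng.normal(0.0, 0.5, size=n)

beta_raw, z_raw, p_raw = wald_test_genotype(phenotype, genotype)
beta_adj, z_adj, p_adj = wald_test_genotype(phenotype, genotype, covariate)
```

Because the covariate accounts for most of the phenotypic variance in this toy setting, the adjusted model has a much smaller standard error for the genotype coefficient, which is the mechanism behind the power gains reported in the simulations.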

Simulations
Both population- and family-based association studies were simulated under varying allele frequencies, heritabilities and sample sizes. In each case, the power was estimated for four phenotypes: centred FEV, PPFEV, residuals from regressing FEV on relevant covariates and residuals from regressing PPFEV on relevant covariates. Simulated data were based on a real dataset and on the Knudson FEV prediction equation, with both unascertained and ascertained samples (Figure 1 and Supplementary Tables S1-4). In all cases, using FEV without adjustment was least powerful and within-study adjusted FEV most powerful. Results were similar for family-based and population-based simulations.
When the data were generated to resemble the CAMP study population as closely as possible, the most powerful approach used the residuals from within-study adjustment of FEV, followed by within-study adjusted PPFEV, PPFEV and finally FEV. Within-study adjustment led to gains in power of up to 20.5 per cent in population-based analyses and up to 17.6 per cent in family-based analyses. To determine how much of the variance in FEV and PPFEV was determined by covariates, we calculated the $R^2$ from regressing FEV and PPFEV on height, weight and age under each of the simulation settings. Because the variance of the distribution of FEV was fixed in the simulations, covariates consistently explained 82-86 per cent of variance in FEV. Covariates still explained 22-26 per cent of the variation in PPFEV, which explains why within-study adjustment increased power more than simply calculating PPFEV did.
When the data were simulated based on the FEV prediction equations, PPFEV, within-study adjusted FEV and within-study adjusted PPFEV all yielded approximately equal power, since the model assumed in the FEV prediction equation was truly the expected value of FEV. Covariates only explained 3-8 per cent of the variation in PPFEV. Even in this case, where the confounding relationship between FEV and covariates was wholly explained by standard prediction equations, using within-study adjustment did not diminish power.
The results were similar when the samples were truncated at the 80 per cent PPFEV cut-off. Power decreased overall owing to the decreased amount of variation in FEV, but the trends in power were the same as in the untruncated simulations. In fact, within-study adjustment led to even greater gains in power in the truncated analyses: up to 23.3 per cent in population-based analyses and up to 20.4 per cent in family-based analyses.
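One informal way to read these $R^2$ values is through a standard variance decomposition. It is not a formula taken from the study. It assumes that genotype is independent of the covariates and that $h^2$ denotes the fraction of the variance of the phenotype $Y$ explained by the SNP, with allele frequency $p$ and additive effect $a$. Removing covariates that explain a fraction $R^2$ of the variance shrinks the denominator and leaves the genetic variance untouched:

```latex
h^2 = \frac{2p(1-p)\,a^2}{\operatorname{Var}(Y)},
\qquad
h^2_{\text{resid}} = \frac{2p(1-p)\,a^2}{(1-R^2)\operatorname{Var}(Y)} = \frac{h^2}{1-R^2}.
```

Under these assumptions, $R^2 \approx 0.84$ for FEV gives $1/(1-0.84) = 6.25$, so the SNP explains roughly six times as large a share of the residual variance as of the raw variance. For PPFEV, $R^2 \approx 0.24$ gives $1/(1-0.24) \approx 1.32$, a smaller gain but still a real one.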

Data analysis: Childhood Asthma Management Program
We demonstrated the four methods of confounder adjustment using CAMP, a multicentre, randomised clinical trial including 1,041 children between five and 14 years of age with mild to moderate asthma. [34] The present analysis included 711 genotyped Caucasian trios. Each of six single nucleotide polymorphisms (SNPs) in the gene encoding interleukin 10 (IL-10), a gene previously associated with asthma, [35-43] was tested for association with each of four lung-function phenotypes (FEV, PPFEV, within-study adjusted FEV and within-study adjusted PPFEV) (Table 1). Within-study adjustment was carried out by regressing FEV on age, sex, weight and height (all recorded at baseline).
Figure 1 (caption): The relationship between FEV and confounders was modelled using the CAMP data and using the equations derived by Knudson et al. [4] Estimated power levels are for n trios simulated 10,000 times, with a type one error rate of 5 per cent. We simulated both family and population designs, each with and without truncation. Four methods of confounder adjustment were employed: FEV, PPFEV, residuals from regressing FEV on relevant covariates and residuals from regressing PPFEV on relevant covariates.
In a second analysis, all SNPs genotyped in the fat mass and obesity-associated (FTO) gene, which is associated with BMI, [44] were tested for association with six obesity phenotypes: weight, BMI and BMI z-scores (BMIZ), [45] each with and without within-study adjustment. Weight was adjusted for age, sex and mean-centred height. BMI and BMIZ were adjusted for age and sex. Neither age nor sex was a significant predictor of BMIZ; as a result, using within-study adjusted BMIZ was equivalent to using unadjusted BMIZ in this dataset. For both analyses, the family-based association test (FBAT) statistic was used, assuming an additive genetic model. Regression models were fit using SAS version 9.1. All genetic association tests were performed by HelixTree version 5.1.3.
After Bonferroni adjustment for multiple comparisons, none of the six SNPs in IL-10 were significantly associated with any of the FEV-derived phenotypes (Figure 2). Height, weight, sex and age explained 83.82 per cent of the variation in FEV and 17.44 per cent of the variation in PPFEV. For the SNP previously associated with FEV, rs3024496, [35] PPFEV residuals yielded the lowest p-value (p = 0.0135) followed by FEV residuals (p = 0.0317). The worst p-value for this SNP was obtained using PPFEV. As in simulations, the within-study adjusted phenotypes did best but, unlike in the simulations, adjusted PPFEV outperformed adjusted FEV and PPFEV did worse than FEV.
No SNPs in FTO were significant after Bonferroni correction. The SNP previously found to be associated with obesity, rs9939609, was not genotyped in CAMP. Examining quantile-quantile plots of the $-\log_{10}$ p-values revealed that the association signal was readily apparent in BMIZ (and the equivalent within-study adjusted BMIZ) and somewhat apparent in within-study adjusted BMI (Figure 3).
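For readers unfamiliar with the FBAT statistic, the following is a minimal sketch of its usual form for complete trios under an additive model. It is not the HelixTree implementation. The function name `fbat_z`, the choice of the sample mean as the trait offset and the normal approximation for the p-value are assumptions made for illustration. The statistic compares each offspring genotype with its expectation given the parental genotypes, weighted by the centred trait.

```python
import math
import numpy as np

def fbat_z(trait, father, mother, child):
    """FBAT-style Z statistic for trios with genotypes coded 0, 1 or 2."""
    trait = np.asarray(trait, dtype=float)
    father, mother, child = (np.asarray(g) for g in (father, mother, child))
    # Expected offspring genotype given parents under Mendelian transmission.
    expected = (father + mother) / 2.0
    # Each heterozygous parent contributes variance 1/4; homozygous parents none.
    variance = ((father == 1).astype(float) + (mother == 1)) / 4.0
    t = trait - trait.mean()                 # centred trait (offset = sample mean)
    s = np.sum(t * (child - expected))       # score statistic
    v = np.sum(t**2 * variance)              # its null variance given parents
    z = s / math.sqrt(v)                     # undefined if no informative trios
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided, normal approximation
    return z, p

# Toy trios (hypothetical settings) with a trait that depends on the child genotype.
rng = np.random.default_rng(seed=5)
n = 400
father = rng.binomial(2, 0.20, size=n)
mother = rng.binomial(2, 0.20, size=n)
child = rng.binomial(1, father / 2.0) + rng.binomial(1, mother / 2.0)
trait = 0.3 * child + rng.normal(0.0, 1.0, size=n)
z, p = fbat_z(trait, father, mother, child)
```

Passing within-study adjusted residuals as `trait` instead of the raw or standardised phenotype is the only change needed to compare the four phenotype definitions.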
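The coordinates of such a quantile-quantile plot can be computed as in the sketch below. The helper name `qq_points` and the plotting positions $(i - 0.5)/n$ are assumptions, since the text does not say how the expected quantiles were obtained. Under the null hypothesis the p-values are uniform, so observed values lying above the diagonal at the upper end indicate an association signal.

```python
import numpy as np

def qq_points(p_values):
    """Return (expected, observed) -log10 p-values, largest first."""
    p = np.sort(np.asarray(p_values, dtype=float))
    n = p.size
    # Expected uniform quantiles at plotting positions (i - 0.5) / n.
    expected = -np.log10((np.arange(1, n + 1) - 0.5) / n)
    observed = -np.log10(p)
    return expected, observed

# Hypothetical p-values, for illustration only.
expected, observed = qq_points([0.5, 0.01, 0.1])
```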

Discussion
Our results suggest that genetic association studies using standardised phenotypes can potentially avoid confounding and gain power by using within-study adjusted phenotypes, as opposed to the typical standardised form of the phenotype. Moreover, doing so will not decrease power. Through simulations, we showed that using within-study adjustment of FEV or PPFEV increased power in genetic association testing by more than using PPFEV. This was true in both population- and family-based designs and with and without an ascertainment condition on the phenotype. Even in the (unlikely) case where the FEV prediction equation used to calculate PPFEV truly determined the relationship between FEV and its confounders, using within-study adjustment did not decrease power. Although the previously associated IL-10 SNP, rs3024496, was only marginally significant in the data analysis, the signal was strongest when within-study adjustment was used. In the FTO analysis, the distribution of p-values from BMIZ remained unchanged after within-study adjustment, indicating that BMIZ effectively controlled for the available covariates in the CAMP study population. For weight and BMI, the association signal was enhanced by within-study adjustment. The data analysis results must be interpreted with caution, since we cannot be certain that any of the SNPs tested confer risk for decreased lung function or obesity.
Since within-study adjustment involves fitting a model to explain how the raw phenotype is related to confounders in the study population, it does not provide a measure of what might be considered normal. For this, it is still necessary to use standardisations based on a healthy population. For instance, in the CAMP study, a child with 'average' adjusted FEV would still be considered to have low lung function because all the children in CAMP have asthma. Therefore, within-study adjustment is advantageous only if there is no need to compare subjects' phenotypes with what might be considered normal in the general population. The advantage of within-study adjustment in large studies is that it can provide a more accurate relative measure of a complex phenotype, allowing study subjects to be compared among themselves. In genetics studies, this is precisely what is needed.
When using standardised phenotypes, the method of standardisation can also affect study results. Rosenfeld et al. showed that using different reference equations to calculate predicted FEV leads to differences in clinical assessment of individuals and in the results of cross-sectional and longitudinal analyses of cystic fibrosis. [46] Both the standardisation and method of within-study adjustment should be carefully thought out. Although we only considered the Knudson equations here, our simulation design allows the results to be generalised to other equations for predicted FEV.
In our simulations, within-study adjustment of standardised phenotypes was always equally or less powerful than within-study adjustment of raw phenotypes; however, our data analysis did not reflect this. The smallest p-value for the SNP previously associated with FEV was obtained using within-study adjusted PPFEV. Similarly, when we analysed the FTO SNPs, the most standardised phenotype, BMIZ, performed best (and within-study adjustment did not make a difference). Simulated data differ from real data because associations are simplistically modelled. Real data are much more complex. In real data, within-study adjustment does not fully capture the relationship between raw measurements and the complex trait of interest (eg between FEV and lung function or between weight and obesity). By standardising and then using the study population to adjust further, it is possible to make use of two sources of information: the reference population used to derive the standardisation and the study population. Standardisation may also be the only way to account for confounders not collected in a particular study. For these reasons, it may be advantageous to do genetic association testing on within-study adjusted, standardised phenotypes.