In spite of the multiple efforts to find genetic factors conferring susceptibility to complex diseases, the success of genetic association studies is still hampered by the difficulty of replicating findings in different populations. Among the plausible explanations for this lack of replication is the fact that the effects of environmental factors, which can interact with genetic factors, are not always taken into consideration. There is increasing interest in studying whether subjects with different genotypes differ in their susceptibility to environmental factors; however, problems of power and bias in the statistical estimation of gene-environment interaction effects persist.
High-quality information about individual environmental exposure is crucial for the assessment of gene-environment interactions. Failure to measure changes in exposure levels over time could lead to an underestimation of the role of the environment in the interaction. When both the endpoint and the exposure are time-dependent variables, repeated measurements of both can capture their temporal relationship and may overcome this problem. In addition, potential misclassification due to ambiguity in the definition of complex diseases may be avoided by using quantitative disease-related phenotypes as the outcomes of interest. For example, quantifying declines in lung function over time through repeated spirometric tests may provide insights into the pathogenesis of chronic obstructive pulmonary disease (COPD) or asthma. Many disease 'predictor' phenotypes are thought to change within subjects over time because of environmental factors, genetic factors and their potential interactions.
On the genetic side, population substructure is an important practical issue for genetic association studies. When the study population is not a single group of randomly mating individuals, several genetically distinct subgroups may be identified; the presence of these subpopulations is referred to as population substructure or stratification. Moreover, disease prevalence also tends to differ among these subgroups. Consequently, without adjustment for stratification, an allele can appear to be associated with the disease regardless of whether the genotype has any functional effect on that health outcome. By contrast, when the genotype distribution is homogeneous among groups, population substructure may not be an issue. For example, if people are randomly assigned to treatment groups, those groups are expected to be genetically similar; if, in addition, the response to treatment does not differ among the subgroups, bias due to population substructure is unlikely.
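To make the mechanism concrete, the following is a small simulation sketch in Python (our own illustration, not an analysis from this paper). It assumes two hypothetical subpopulations that differ both in allele frequency and in mean trait value, while the genotype has no effect on the trait in either; the allele frequencies, trait means, sample sizes, seed and the helper names `ols_slope` and `simulate_subpopulation` are arbitrary choices made for the example.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (simple regression with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def simulate_subpopulation(n, allele_freq, trait_mean, rng):
    """Genotypes under Hardy-Weinberg proportions (0/1/2 copies of the allele)
    and a trait that depends only on subpopulation membership, not on genotype."""
    genotypes = [sum(rng.random() < allele_freq for _ in range(2)) for _ in range(n)]
    traits = [rng.gauss(trait_mean, 1.0) for _ in range(n)]
    return genotypes, traits


rng = random.Random(2024)
# Hypothetical parameters: subpopulation B has a higher allele frequency AND a
# higher mean trait value, but the genotype has no effect in either group.
g_a, y_a = simulate_subpopulation(5000, allele_freq=0.2, trait_mean=0.0, rng=rng)
g_b, y_b = simulate_subpopulation(5000, allele_freq=0.6, trait_mean=1.0, rng=rng)

pooled = ols_slope(g_a + g_b, y_a + y_b)  # ignores stratification
within_a = ols_slope(g_a, y_a)            # analysis within subpopulation A
within_b = ols_slope(g_b, y_b)            # analysis within subpopulation B
print(f"pooled slope:   {pooled:.3f}")    # clearly non-zero: a spurious association
print(f"within A slope: {within_a:.3f}")  # close to zero
print(f"within B slope: {within_b:.3f}")  # close to zero
```

With these settings the pooled slope is expected to be roughly 0.36 per allele, whereas the within-subpopulation slopes scatter around zero. The practical difficulty is that the subgroups are often unknown, so a stratified analysis like this one is not always available.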
Another source of spurious associations is population admixture, that is, the mixing of different ancestries: when people from different ethnic groups have children together, the genomes of later generations combine genetic material from the original ancestral groups, and, consequently, allele frequencies at some loci are not homogeneously distributed in the study population. For example, it has been recognised that Latino populations have varying proportions of African, Native American and European ancestry. As with population substructure, if the risk of disease depends on ancestry, a high risk of disease may be erroneously associated with a high allele frequency; thus, in admixed populations, ethnicity may confound both genotype-outcome associations and the assessment of gene-environment interactions. The confounding may be positive or negative, so it can either exaggerate or mask a true association. Therefore, to identify true associations, population substructure and admixture must be taken into account in the analysis.
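The sketch below, again a hypothetical simulation rather than an analysis from this paper, illustrates confounding by continuous ancestry in an admixed sample. It assumes that each individual's ancestry proportion is known exactly (in practice it must be estimated from marker data), that the probability of carrying the allele rises linearly with that proportion, and that the trait falls with ancestry while the genotype itself has no effect. The correction shown, ordinary covariate adjustment for ancestry computed via the Frisch-Waugh-Lovell theorem, is only one of several possible corrections and is not the family-based approach developed here; all parameter values and helper names are our own.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (simple regression with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def residualize(values, covariate):
    """Residuals of `values` after a simple linear regression on `covariate`."""
    b = ols_slope(covariate, values)
    mc = sum(covariate) / len(covariate)
    mv = sum(values) / len(values)
    return [v - mv - b * (c - mc) for v, c in zip(values, covariate)]


rng = random.Random(11)
ancestry, genotype, trait = [], [], []
for _ in range(10000):
    q = rng.random()   # hypothetical ancestry proportion from group 1, assumed known
    p = 0.1 + 0.6 * q  # allele frequency increases with that ancestry
    ancestry.append(q)
    genotype.append(sum(rng.random() < p for _ in range(2)))  # 0/1/2 copies
    trait.append(-1.0 * q + rng.gauss(0.0, 1.0))  # trait falls with ancestry; no genetic effect

naive = ols_slope(genotype, trait)
# Frisch-Waugh-Lovell: the genotype coefficient in trait ~ 1 + genotype + ancestry
# equals the slope of the ancestry-residualized trait on the ancestry-residualized genotype.
adjusted = ols_slope(residualize(genotype, ancestry), residualize(trait, ancestry))
print(f"unadjusted genotype effect:        {naive:.3f}")     # negative: spurious
print(f"ancestry-adjusted genotype effect: {adjusted:.3f}")  # close to zero
```

Here the spurious association is negative (about -0.19 per allele in expectation); reversing the sign of the ancestry effect on the trait would make it positive, which is the sense in which the direction of confounding can go either way.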
With the increasing availability of genetic data, there is growing interest in modelling both marginal genetic effects and gene-environment interactions. Including interactions, when they exist, can increase the statistical power to detect both genetic and environmental effects. However, traditional statistical models for detecting significant main effects and interactions may not be fully adequate for studying genetics in admixed or stratified populations.
A variety of methods have been developed to account for the genetic substructure of human populations. Family-based designs provide an important means of avoiding confounding due to admixture. The simplest family-based design for testing association is the case-parent (or trio) design, which uses the genotypes of an affected offspring (the case) and of both parents; the outcome, however, is measured only in the offspring. Many of these methods were developed for cross-sectional designs, but they can be applied to repeated measurements through the two-step modelling approach. In the first step, the slope of the longitudinal outcome on the time-dependent environmental exposure is calculated for each subject, so that each subject has a single individual endpoint, the slope. In the second step, genetic methods for cross-sectional studies are applied with the slope as the single outcome.
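As a minimal sketch of the two-step approach, the Python simulation below generates repeated outcome and exposure measurements for each subject, computes each subject's slope of outcome on exposure (step 1), and then regresses those slopes on genotype (step 2). The true parameter values, sample sizes, seed and the plain least-squares second step are our own hypothetical choices; in the family-based setting, the second step would instead be a method that uses parental genotypes.

```python
import random


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(7)
# Hypothetical true parameters for the simulation (not taken from the paper).
BETA_E = -0.5   # effect of exposure on the outcome in subjects with no risk allele
BETA_GE = -0.3  # additional exposure effect per risk allele (G x E interaction)
N_SUBJECTS, N_VISITS = 400, 8

genotypes, slopes = [], []
for _ in range(N_SUBJECTS):
    g = sum(rng.random() < 0.3 for _ in range(2))  # 0/1/2 risk alleles
    baseline = rng.gauss(3.0, 0.5)                 # subject-specific intercept
    exposure = [rng.uniform(0.0, 2.0) for _ in range(N_VISITS)]
    outcome = [baseline + (BETA_E + BETA_GE * g) * e + rng.gauss(0.0, 0.3)
               for e in exposure]
    # Step 1: one summary endpoint per subject, the outcome-on-exposure slope.
    _, s = ols_fit(exposure, outcome)
    genotypes.append(g)
    slopes.append(s)

# Step 2: a cross-sectional model with the slope as the outcome. In this simple
# regression, the intercept estimates BETA_E and the slope estimates BETA_GE.
est_e, est_ge = ols_fit(genotypes, slopes)
print(f"estimated exposure effect: {est_e:.3f}")
print(f"estimated G x E effect:    {est_ge:.3f}")
```

Because each subject's slope equals the exposure effect plus the interaction effect times the genotype, the second step targets the gene-environment interaction directly; a main genetic effect on the level of the outcome does not change the slope and is therefore not captured by this endpoint.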
In this paper, we first provide a short review of different approaches for studying gene-environment interactions for quantitative traits, and then propose a method that aims to improve the assessment of main and gene-environment interaction effects by combining the advantages of longitudinal studies of continuous phenotypes with those of family-based designs. The approach extends ordinary linear mixed models (OLMM) for quantitative phenotypes to incorporate information from a case-parent design. We call the resulting model the 'adjusted linear mixed model' (ALMM), and through simulation we show that, even when population stratification is present, both main genetic and gene-environment interaction effects can be estimated without bias, and that the ALMM is more powerful than the two-step modelling approach.
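For orientation, one common form of an ordinary linear mixed model with a gene-environment interaction is written below. This is a generic random-intercept, random-slope specification in our own notation (subject $i$, visit $j$, genotype score $G_i$, exposure $E_{ij}$), given as an illustrative assumption; it is not the ALMM, which additionally incorporates case-parent information.

```latex
Y_{ij} = \beta_0 + \beta_G G_i + \beta_E E_{ij} + \beta_{GE}\, G_i E_{ij}
         + b_{0i} + b_{1i} E_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim N(\mathbf{0}, \mathbf{D}),
\qquad
\varepsilon_{ij} \sim N(0, \sigma^2)
```

Under this form, subject $i$'s slope of the outcome on exposure is $\beta_E + \beta_{GE} G_i + b_{1i}$, so $\beta_{GE}$ is the quantity that the two-step approach targets when it regresses subject-specific slopes on genotype, whereas $\beta_G$ describes a genetic effect on the level of the outcome.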
The broad objectives of this paper do not extend to giving technical details of the family-based approach and its extensions, or to giving an extensive explanation of linear mixed models. Rather, we present what we consider to be a widely applicable method for correctly assessing main genetic effects and gene-environment interactions for time-dependent quantitative traits in stratified populations. For this purpose, we use simulated repeated measurements of lung function, namely the forced expiratory flow between 25 and 75 per cent of vital capacity (FEF25-75), in asthmatic children exposed to ozone pollution, based on the distributions observed in a real cohort study conducted in Mexico City.
To set the stage for our methodology, we first provide a brief overview of some existing ordinary linear regression (OLR) models for testing main genetic effects and gene-environment interactions in cross-sectional studies; these models incorporate information about parental genotypes (case-parent or trio design) to adjust for admixture. We then briefly present the family-based association test (FBAT), which, applied as the second step after computing each subject's slope of the outcome on the exposure, provides an alternative method for analysing genetic associations over time. Next, we review ordinary linear mixed models (OLMM), a standard approach to the analysis of longitudinal data, and present the adjusted linear mixed model (ALMM) as an extension of the OLMM combined with the adjusted cross-sectional regression models. To show that the two-step modelling approach provides a valuable alternative for analysing longitudinal data, we explain the relationship between this approach and linear mixed models. Finally, we describe the simulation procedures and present our results and discussion.
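The FBAT statistic for a quantitative outcome can be sketched for trios with one offspring and additive allele coding as $S = \sum_i T_i (X_i - E[X_i \mid P_i])$, with null variance $V = \sum_i T_i^2 \operatorname{Var}(X_i \mid P_i)$ and $Z = S/\sqrt{V}$, where $X_i$ is the child's allele count, $P_i$ the parental genotypes and $T_i$ the outcome minus an offset (assumed here to be the sample mean). In the two-step approach, the outcome would be each child's slope. The function name `fbat_quantitative`, the tuple layout and the toy trios are hypothetical; this is not the interface of the FBAT software, and it omits multiple offspring, missing parents and non-additive genetic models.

```python
import math
from typing import List, Optional, Tuple

# (mother genotype, father genotype, child genotype, child outcome);
# genotypes are counts (0, 1 or 2) of the allele being tested.
Trio = Tuple[int, int, int, float]

# Alleles that a parent with a given genotype can transmit (0 or 1 copies).
_TRANSMISSIBLE = {0: {0}, 1: {0, 1}, 2: {1}}


def fbat_quantitative(trios: List[Trio], offset: Optional[float] = None):
    """FBAT-style statistic for one-offspring trios, quantitative outcome,
    additive coding. Returns (S, V, Z); Z is compared with N(0, 1)."""
    if offset is None:
        offset = sum(t[3] for t in trios) / len(trios)
    s = v = 0.0
    for mother, father, child, y in trios:
        possible = {a + b for a in _TRANSMISSIBLE[mother] for b in _TRANSMISSIBLE[father]}
        if child not in possible:
            raise ValueError(f"Mendelian inconsistency in trio {(mother, father, child)}")
        expected = (mother + father) / 2       # E[X | parental genotypes]
        n_het = (mother == 1) + (father == 1)  # only heterozygous parents add randomness
        variance = 0.25 * n_het                # Var[X | parental genotypes]
        t = y - offset                         # centred outcome (trait coding)
        s += t * (child - expected)
        v += t * t * variance
    z = s / math.sqrt(v) if v > 0 else float("nan")
    return s, v, z


# Toy example; the trio with two homozygous parents is uninformative.
trios = [(1, 1, 2, 1.0), (1, 0, 1, 0.5), (2, 1, 1, -1.0), (0, 0, 0, 2.0), (1, 2, 2, -2.5)]
s, v, z = fbat_quantitative(trios)
print(s, v, round(z, 3))  # S = 0.5, V = 2.375, Z is about 0.324
```

Conditioning on parental genotypes is what protects the test against stratification and admixture: the deviation of the child's genotype from its expectation given the parents depends only on Mendelian transmission, not on the family's ancestry.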