Genetic association studies in cancer: Good, bad or no longer ugly?

For some time, investigators have appreciated that genetic association studies in cancer are complex because of the multi-stage process of cancer and the daunting challenge of analysing genetic variants in population and family studies. Because of recent technological advances and annotation of common genetic variation in the human genome, it is now possible for investigators to study genetic variation and cancer risk in many different settings. While these studies hold great promise for unravelling multiple genetic risk factors that contribute to the set of complex diseases called cancer, it is also imperative that study design and methods of interpretation be carefully considered. Replication of results in sufficiently large, well-powered studies is critical if genetic variation is to realise the promise of personalised medicine, namely, using genetic data to individualise medical decisions. In this regard, the plausibility of validated genetic variants can only be established by the study of gene-gene and gene-environment interactions. The genetic association study in cancer has come a long way from the days of restriction fragment length polymorphisms, and now promises to scan an entire genome 'agnostically' in search of genetic markers for a disease or outcome. Nonetheless, the application and interpretation of these studies should be conducted cautiously.


Introduction
The promise of analysing common germ-line genetic variation and cancer risk has been accelerated by knowledge gained from annotating the draft sequence of the human genome. Genetic variation in different populations can now be used to search for genetic markers that associate with cancer risk, therapeutic response and outcome. This new paradigm, the study of complex diseases, such as cancer, by the analysis of common genetic variation, represents the first step in surveying the genome comprehensively. It is also known that the differences between individual human genomes additionally include other types of variation, such as microsatellite markers, insertions and deletions (from a single base to large regions of thousands of bases) and copy number variation, but, nevertheless, the first large-scale maps have been generated for single nucleotide polymorphisms (SNPs). 1,2 Since most common SNPs (with a minor allele frequency greater than 5 per cent in a studied population) are silent and have no apparent function, currently, testing for SNPs is directed at identifying markers of disease risk or outcome. 3-5 There is a subset of SNPs, however, that have functional consequences which can result in a subtle change in gene function, such as alteration of a transcription factor binding site in the promoter of a gene or in the coding sequence of a gene product.
One of the first major steps towards identifying the common SNPs for study was the establishment of the International HapMap Project, which has developed a fine-scale haplotype map of the human genome. 6 This project has genotyped more than 2.6 million SNPs in three distinct continental populations. In parallel, other initiatives have begun to sequence genes of great biological interest in search of common and uncommon SNPs. Sequence verification, while slower and more costly, has provided important insights into the spectrum of common and uncommon single-base nucleotide substitutions in the genome. For example, the National Cancer Institute SNP500 Cancer project is validating SNPs in genes implicated in cancer biology (http://snp500cancer.nci.nih.gov), 7 and the National Heart, Lung and Blood Institute's Seattle SNPs project (http://pga.mbt.washington.edu/) has focused on candidate genes and pathways that underlie the inflammatory response.

REVIEW
Genotyping technology has advanced significantly and it is now possible to genotype hundreds of thousands of SNPs in accurate, high-throughput platforms at lower prices. 8,9 In fact, there are commercial products available for interrogation of common genetic variation across the 'whole genome', utilising a strategy of surrogacy testing. Based on the HapMap Phase 2 data, 6 it is possible to take advantage of linkage disequilibrium across the genome by choosing a set of tagging SNPs as markers of genetic variation across the human genome. It is estimated that with this approach, at least 500,000 SNPs would be required to survey common genetic variation. 10,11 This significant expansion of knowledge of normal human genetic variation, together with technical advances, has created an opportunity to interrogate the genetic basis of cancer risk, response to therapy and outcome. Nonetheless, there are many issues in study design and analysis that must be carefully considered.

The complexities of genetic association studies in cancer
The study of genetic variation and its contribution to cancer risk is a daunting undertaking because of the need to combine large population-based studies with dense genetic analyses. Figure 1 shows many of the steps to consider in designing and interpreting a genetic association study in cancer.
Although the complexity of cancer as a disease has been described by others, the interaction between genes and the environment has not yet been explored in detail. 12,13 In any one type of cancer, there are often significant differences in age of onset, rapidity of tumour growth, presence of metastases, pathological appearance, gene expression patterns, somatic genetic changes, response to therapy and familial risk. Thus, the search for common factors that associate with genetic markers requires well-designed studies that address specific hypotheses.

Cancer genetics
Studies of familial cancer have provided great insights into cancer biology by mapping rare familial mutations that have been subsequently evaluated in the laboratory, thus adding plausibility to the observed disruption in function due to a mutation in one or more genes. These observations have also led to insights in sporadic cancers. In this regard, studies in rare paediatric cancers have yielded important insights. For example, the RB gene was the first tumour suppressor gene identified through a genetic association study. 14 Knudson's original description of the inheritance of retinoblastoma became the foundation of an excellent understanding of the role that RB plays as a tumour suppressor and transcriptional regulator. 15 Another such example is the Li-Fraumeni syndrome, which is characterised by family pedigrees with high rates of sarcoma and breast cancer, as well as leukaemia, brain tumours and adrenocortical carcinomas. 16,17 Subsequently, the identification of mutations in the TP53 gene in a majority of, but not all, patients with Li-Fraumeni syndrome led directly to an understanding of TP53 and its role as a critical transcription factor in normal cell growth, apoptosis and DNA repair. 18-20 The identification of familial breast cancer pedigrees through careful epidemiological study identified the BRCA1 and BRCA2 genes; 21,22 in turn, follow-up studies have generated important insights into the function of these genes in DNA repair. Mutations in these genes in family pedigrees are highly penetrant and are associated with a significant risk for breast and ovarian cancers. Common genetic variation in BRCA1 and BRCA2 also appears to contribute to the risk for sporadic breast cancer, albeit with a substantially smaller effect. For example, genetic variation in BRCA2 was shown to result in an increased risk for sporadic breast cancer in the Multiethnic Cohort (MEC). 23 Specifically, a single SNP in intron 24 was associated with a two-fold increased risk for breast cancer. This suggests that, even in the absence of a mutation that could change protein function or regulation, more subtle variants can serve as markers for increased risk for cancer.

Figure 1. The steps of a genetic association study in cancer. Issues related to each step are noted in the 'staircase'. If the end goal of an association study is personalised medicine, careful planning and analysis is crucial.

Savage and Chanock

SNPs as disease markers
Although the early history of SNP analysis was predicated on choosing candidate SNPs with known functional consequences, currently no functional information is available for the vast majority of SNPs. In fact, it is unlikely that most SNPs have functional consequences. 24 SNPs in certain genomic regions, such as promoters or intron-exon splice sites, could result in significant functional alterations in gene regulation, but the effort to validate this in the laboratory is arduous. It has been suggested by others that, in choosing SNPs for a genetic association study, one should cull from high-priority lists of SNPs with functional implications. 25 This approach has the potential to find the more highly penetrant SNPs in an association study, but is limited because it underutilises SNPs as genetic markers and, in particular, other untested SNPs that could be in linkage disequilibrium with the positive marker SNP. Until recently, many studies focused on non-synonymous SNPs because of potential amino acid changes that could affect protein structure and function. Non-synonymous SNPs contribute to the genetic diversity seen in the immune system 26 and potentially change the structure or function of the protein of interest; however, a large number of non-synonymous SNPs may be conservative and have minimal or no effect on gene function. 27,28 SNPs that change gene regulation have also been described. Examples include an SNP in the promoter of MDM2, a negative regulator of p53, which was shown to increase the affinity of the transcriptional activator Sp1, resulting in higher levels of MDM2 RNA and protein, 29 and synonymous variants in the human dopamine receptor 2 (DRD2) gene, which affect mRNA stability and translation. 30 Other such functional variants have been recently described, especially in pharmacogenomics. 31 It is possible that SNPs that result in subtle changes in gene regulation are of minimal consequence in the short term, but, over the life span of an individual, accumulated changes could be significant. It is quite likely, however, that even when the nuances of gene regulation are fully understood, the majority of SNPs will still serve best as genetic markers of disease.
An understanding of population-specific genetic variation in healthy individuals is critical in choosing SNPs to investigate in a study of cancer risk. It has been well established that the distribution of the incidence of specific cancers can vary greatly across global populations. While some of this has been ascribed to different environmental factors, it is also plausible that differences in the genetic variation of distinct populations could contribute. In many ways, large association studies in cancer are designed to analyse genetic profiles of common variation that has been shaped by unrelated factors. In this regard, the molecular evolution of SNPs reflects the specific history of populations, in particular the admixture of different populations over time. This latter issue has been exploited by some in the use of admixture markers to investigate cancers with a disparate incidence between populations. 32,33 Throughout evolution, humans have been subjected to different selective pressures (eg endemic pathogens or dietary needs), resulting in genetic variants which have been 'fine-tuned' in their ability to fight infection, reproduce and respond to other challenges. 27,34,35 This results in genetic differences between different populations around the world. Differences in the origin of groups within a study can be large enough to generate population stratification and thus add a potential confounding factor in the genetic epidemiology of complex disease. 36-38
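To illustrate how population stratification can confound an association, the following Python sketch pools two hypothetical subpopulations in which the risk allele has no effect within either group. All counts are invented for illustration, and the Mantel-Haenszel estimator is used here only as one standard stratified analysis; it is not a method described in this review.

```python
from typing import List, Tuple

# One 2x2 table per subpopulation (all counts are hypothetical):
# (carrier cases, carrier controls, non-carrier cases, non-carrier controls)
Table = Tuple[int, int, int, int]


def odds_ratio(table: Table) -> float:
    """Crude odds ratio (a*d)/(b*c) for a single 2x2 table."""
    a, b, c, d = table
    return (a * d) / (b * c)


def pool(tables: List[Table]) -> Table:
    """Collapse all strata into one table, ignoring ancestry."""
    a, b, c, d = (sum(column) for column in zip(*tables))
    return (a, b, c, d)


def mantel_haenszel_or(tables: List[Table]) -> float:
    """Mantel-Haenszel odds ratio, which compares cases and controls within strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Population 1: allele carried by 80% of cases and controls, 100 cases, 100 controls.
# Population 2: allele carried by 20% of cases and controls, 100 cases, 400 controls.
population_1: Table = (80, 80, 20, 20)
population_2: Table = (20, 80, 80, 320)
strata = [population_1, population_2]

crude_or = odds_ratio(pool(strata))
adjusted_or = mantel_haenszel_or(strata)
print(f"within-population ORs: {odds_ratio(population_1):.2f}, {odds_ratio(population_2):.2f}")
print(f"pooled (crude) OR: {crude_or:.3f}; Mantel-Haenszel OR: {adjusted_or:.3f}")
```

In this constructed example the allele frequency and the case-to-control ratio both differ between the two groups, so the pooled table suggests an association that disappears once the analysis is stratified.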

Multiple interactions
Other biomarkers and environmental influences which contribute to the multi-factorial nature of cancer, as well as other complex diseases, further complicate the study of genetic association and cancer risk. Gene-gene interactions are also crucial to cancer risk assessment. The recent report from the InterLymph Consortium showed the greatest risk for non-Hodgkin lymphoma to be in individuals homozygous for the TNF-308A allele and carrying at least one IL10-3575A allele (odds ratio [OR] 2.13). 39 The importance of gene-gene interactions was also demonstrated in a study of gastric cancer and cytokine gene SNPs. 40 Individuals with multiple polymorphisms of interleukin-1 (IL-1) receptor antagonist, tumour necrosis factor A and IL-10 had the greatest risk for gastric cancer, with ORs of 2.8 for one, 5.4 for two and 27.3 for three or four high-risk genotypes.
Gene-environment interactions add complexity to the interpretation of genetic association studies. One example is the investigation which has focused on the contribution of genetic variation in the N-acetyltransferase (NAT2) gene to the risk for specific cancers, especially bladder and lung cancer. 41 In particular, differences in the activity of NAT2 (ie rapid and slow acetylator genotypes) could explain the association between the NAT2 gene and tobacco smoke and subsequent risk for bladder cancer. Individuals with the slow acetylator phenotype have an increased risk for bladder cancer compared with individuals with the fast acetylator phenotype, especially when combined with tobacco use. 42,43 Interestingly, the type of tobacco appears to be important; for example, so-called black tobacco is more strongly associated with the observed effect of NAT2 genotypes. 42,43
Genetic association and other clinical studies often assess only two outcomes: affected or unaffected. This approach is useful in cancer studies because cancer is usually an all-or-none diagnosis at the time of the study. When intermediate precursors or quantitative traits of disease are added to the analysis, however, the complexity significantly increases. Mendelian randomisation is a concept that attempts to bring together independent inheritance of individual traits with environmentally modifiable exposures. 44,45 By using independent inheritance of traits, it is possible to reduce the confounding in studying exposure-disease associations. 46 Examples include studies of serum cholesterol, cancer risk and the APOE gene; 47 folate, homocysteine, coronary heart disease and the MTHFR gene; 44 and the relationship between alcohol, variation in the ALDH2 gene and oesophageal cancer. 48
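As a rough illustration of the Mendelian randomisation idea, the Python sketch below computes a ratio (Wald) estimate: the genotype-outcome association divided by the genotype-exposure association. The Wald ratio is one commonly used estimator and is an assumption of this sketch rather than a method specified in the review; the effect sizes are hypothetical, and the standard error uses a first-order approximation that ignores uncertainty in the genotype-exposure estimate.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    beta: float  # effect size (for example, a log odds ratio or a mean difference)
    se: float    # standard error of the effect size


def wald_ratio(gene_exposure: Estimate, gene_outcome: Estimate) -> Estimate:
    """Ratio estimate of the exposure-outcome effect using a genotype as the instrument.

    The standard error is a first-order approximation that treats the
    genotype-exposure association as if it were known without error.
    """
    if gene_exposure.beta == 0:
        raise ValueError("the genotype must be associated with the exposure")
    beta = gene_outcome.beta / gene_exposure.beta
    se = gene_outcome.se / abs(gene_exposure.beta)
    return Estimate(beta=beta, se=se)


# Hypothetical inputs: per-allele change in an exposure (arbitrary units)
# and per-allele log odds ratio for disease.
allele_on_exposure = Estimate(beta=0.5, se=0.05)
allele_on_disease = Estimate(beta=0.10, se=0.04)

causal = wald_ratio(allele_on_exposure, allele_on_disease)
print(f"log OR per unit of exposure: {causal.beta:.3f} (SE {causal.se:.3f})")
```

The estimate is only interpretable if the genotype influences disease solely through the exposure, which is the assumption that independent inheritance is intended to make plausible.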

Subject selection and sample size
In designing a study of genetic variation and cancer risk in a population, there are a number of critical factors to consider, such as sample size, population stratification, allele frequencies of the SNPs of interest, environmental risk factors and phenotype definition. In particular, a careful definition of the cancer phenotype to be studied is crucial. Genetic factors that contribute to low-grade prostate cancer could be different to those that contribute to high-grade prostate cancer. If so, a study in which low- and high-grade diseases are grouped together could miss a potential genetic contribution for one form of the disease. 49,50 While it may be difficult to ensure a study population that is as homogeneous as possible, it is crucial to limit confounding due to background genetic differences. Differences in genetic variation between ethnic groups have been well described and are due to a combination of evolutionary history, migration and admixture. 36,38,51 Efforts also need to be made to avoid population stratification, so that cases and controls have genetic backgrounds that are as similar as possible.
To address some of these issues, large cohort studies, such as the MEC, 52 are being established to create the large sample sizes needed. One strength of the MEC is that exposure and biomarker data on individuals from five different ethnic groups in Hawaii and California have been collected. This study is an immense resource for genetic epidemiology. Another such study is the National Cancer Institute (NCI)'s Breast and Prostate Cancer Cohort Consortium, consisting of over 5,000 breast cancer and 8,000 prostate cancer cases. The consortium's goal is to study genetic variation in genes in key pathways. 53 The Network of Investigator Networks, 54 sponsored by the Human Genome Epidemiology Network, seeks to pool analysis from multiple investigations for critical analysis and to address reproducibility issues. 55-57

SNP choice and interpreting the results
In order fully to understand the results of a genetic association study, all of the study endpoints described above must be considered to design a study with sufficient power to detect a measurable effect. So far, the majority of genetic association studies with common SNPs in cancer have reported modest associations, with ORs typically between 1 and 2. Examples of meta-analyses that found ORs in this range in lung cancer include XPD 751GG (OR 1.27) 58 and CYP1A1 exon 7 polymorphism (OR 1.15), 59 in breast cancer include XRCC3 T241M (OR 1.16) and BRCA2 N372H (OR 1.13) 60 and in gastric cancer include an approximately two-fold increased risk for the IL8-251A allele. 61-63 These studies illustrate the fact that the likelihood of finding a strong association (ie OR > 2) in a large study of a sporadic cancer is low, even for candidate genes with a strong prior. Since, by definition, SNPs are common genetic variants, individuals with a particular risk allele may never develop disease. Instead, it has become apparent that a large number of variants will each make a small contribution, perhaps evident in a population-attributable risk of 1-2 per cent per SNP. The consequence of searching for alleles with a moderate effect, namely an OR less than 1.8, is that studies have to be large and can, with rare exception, only address high frequency SNPs (ie SNPs with a minor allele frequency greater than 5 per cent). Moreover, the opportunity to examine gene-environment interactions should be considered as an important reason for conducting a study.
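The relationship between effect size, allele frequency and study size can be made concrete with a small calculation. The Python sketch below uses the standard normal-approximation formula for comparing two proportions in an unmatched case-control study with equal numbers of cases and controls, treating carriage of the risk allele as a binary exposure. The carrier frequencies, the significance level and the power are illustrative assumptions, and no adjustment for multiple testing is made.

```python
import math
from statistics import NormalDist


def carrier_freq_in_cases(p_controls: float, odds_ratio: float) -> float:
    """Carrier frequency among cases implied by the control frequency and the OR."""
    return odds_ratio * p_controls / (1.0 + p_controls * (odds_ratio - 1.0))


def cases_needed(p_controls: float, odds_ratio: float,
                 alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of cases (with an equal number of controls) required.

    Two-sided test of two proportions using the normal approximation.
    """
    if odds_ratio == 1.0:
        raise ValueError("an odds ratio of 1 cannot be detected at any sample size")
    p0 = p_controls
    p1 = carrier_freq_in_cases(p0, odds_ratio)
    p_bar = (p0 + p1) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_power * math.sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


# Hypothetical scenarios: 30% or 5% of controls carry the risk allele.
for carrier_frequency in (0.30, 0.05):
    for effect in (1.2, 1.5, 2.0):
        n = cases_needed(carrier_frequency, effect)
        print(f"carrier frequency {carrier_frequency:.2f}, OR {effect}: about {n} cases")
```

Under these assumptions the required number of cases rises steeply as the OR approaches 1 and as the allele becomes rarer; a genome-wide study would need a far smaller `alpha` and therefore larger numbers still.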
Biological plausibility is a critical consideration in choosing genes for either a candidate gene or pathway approach. So far, fewer than 2 per cent of genes have been studied, but with the advent of new tools for whole-genome scans, there is now an opportunity to look across the genome. Still, for many studies, SNPs have to be selected based on knowledge of the pattern of linkage disequilibrium across the gene or chromosomal region. It is fortunate that genetic association studies have increased rapidly in scope, moving away from a single SNP in a single gene to haplotype-tagging methods for SNP selection in pathways of genes or, in the near future, whole-genome scans of 500,000 or more SNPs per individual. In the end, whole-genome scans will identify markers that will need to be carefully mapped, similar to the approach for candidate gene studies.
One of the key issues in SNP association studies is replication of results. The literature is strewn with false-positive associations and reproducibility issues. One way to address the false-positive association problem is by using the probability of a false-positive report as a means to weight the likelihood that an SNP would be associated with disease based on knowledge of the gene and/or pathway. 64 The concept of false discovery rate (FDR) is an alternative, useful way of correcting for multiple testing comparisons without the stringent penalty stipulated by the Bonferroni correction. 65 The expected proportion of false rejections of the null hypothesis among the total number of rejections is used as a measure of global error. This method has been applied successfully to studies of qualitative 65 and quantitative 66 traits. Because of linkage disequilibrium between SNPs, the Bonferroni correction, which tests each SNP as an individual entity, may be too stringent, and an FDR approach may be better suited to multiple testing concerns in genetic association studies.
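The difference in stringency between the two corrections can be seen in a short example. The Python sketch below applies a Bonferroni threshold and the Benjamini-Hochberg step-up procedure, a widely used way of controlling the FDR, to the same set of p-values. The p-values are hypothetical, and the choice of the Benjamini-Hochberg procedure is an assumption of this sketch; the review does not name a specific FDR method.

```python
from typing import List


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject each test whose p-value is at most alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure for controlling the false discovery rate.

    Find the largest rank k with p_(k) <= k * alpha / m, then reject the
    k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * alpha / m:
            largest_rank = rank
    rejected = [False] * m
    for index in order[:largest_rank]:
        rejected[index] = True
    return rejected


# Hypothetical p-values for ten SNPs tested in one study.
snp_p_values = [0.001, 0.008, 0.012, 0.019, 0.30, 0.45, 0.60, 0.74, 0.88, 0.95]

bonferroni_hits = bonferroni(snp_p_values)
fdr_hits = benjamini_hochberg(snp_p_values)
print("Bonferroni rejections:", sum(bonferroni_hits))
print("Benjamini-Hochberg rejections:", sum(fdr_hits))
```

With these invented values the Bonferroni threshold retains fewer SNPs than the FDR procedure, which is the trade-off described above: more candidates survive for replication, at the cost of tolerating a controlled proportion of false discoveries.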
The whole-genome association study is based on the extremely high-throughput methods of genotyping hundreds of thousands of SNPs in each individual in the study. An advantage of this method is that the extent of genetic variation across the entire human genome can be evaluated at one time, in an 'agnostic manner'; namely without prior knowledge of the putative functional importance of a region. The NCI Cancer Genetic Markers of Susceptibility Strategic Initiative (http://cgems.cancer.gov) is a programme designed to conduct whole-genome scans in breast and prostate cancer, separately, and make the data available to the public. Built into the study is the availability of nearly 7,000 cases and 7,000 controls for each disease to conduct rapid replication of findings based on an initial scan of 1,200 cases and 1,200 controls per disease, drawn from prospective cohort studies. Over 500,000 SNPs will be analysed per subject. The choice of SNPs is based on tagging bins of SNPs using a pairwise correlation (r² > 0.8) in the North European group in HapMap Phase 2. 11
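The idea of choosing tagging SNPs by pairwise correlation can be sketched in a few lines of Python. The example below is a simplified greedy selection under stated assumptions: genotypes are coded as 0, 1 or 2 copies of the minor allele, r² is approximated by the squared Pearson correlation of those genotype codes rather than estimated from phased haplotypes, and the SNP names and genotypes are hypothetical. It is not the procedure used by HapMap or by any particular genotyping product.

```python
from statistics import mean
from typing import Dict, List


def r_squared(x: List[int], y: List[int]) -> float:
    """Squared Pearson correlation between two genotype vectors (0/1/2 coding)."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    return sxy * sxy / (sxx * syy)


def greedy_tag_snps(genotypes: Dict[str, List[int]], threshold: float = 0.8) -> List[str]:
    """Repeatedly pick the SNP that captures the most still-untagged SNPs."""
    names = list(genotypes)
    captures = {
        s: {t for t in names
            if s == t or r_squared(genotypes[s], genotypes[t]) >= threshold}
        for s in names
    }
    untagged = set(names)
    tags: List[str] = []
    while untagged:
        best = max(names, key=lambda s: len(captures[s] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags


# Hypothetical genotypes for eight individuals at five SNPs.
example_genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp2": [0, 1, 2, 0, 1, 2, 0, 1],  # identical to snp1
    "snp3": [0, 1, 2, 0, 1, 2, 0, 2],  # differs from snp1 in one individual
    "snp4": [2, 0, 1, 1, 0, 2, 0, 1],  # weakly correlated with the others
    "snp5": [2, 1, 0, 2, 1, 0, 2, 1],  # snp1 with the alleles swapped
}

tag_snps = greedy_tag_snps(example_genotypes)
print("tag SNPs:", tag_snps)
```

In this toy data set four of the five SNPs fall into one bin and can be represented by a single tag, which is the economy that allows a few hundred thousand markers to stand in for a much larger set of common variants.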

Reproducibility
As mentioned above, one of the most challenging aspects of genetic association studies in cancer is replication of study results. This is essential for a more thorough understanding of biological mechanisms and the development of preventive or treatment strategies. Many studies resulting in a possible association of a particular genetic variant with an increased risk of cancer have failed to be reproduced. Some of the reasons for this include small study size, population stratification, gene-environment interactions, linkage disequilibrium around the variant studied and other intrinsic study biases. For example, in an analysis of 201 studies of complex disease of 25 different associations, Lohmueller et al. found evidence for replication in just less than half of the studies. 67 Another review of genetic association studies in complex diseases also showed low reproducibility. 68 Meta-analyses and large investigator networks are crucial to address these issues. Recent meta-analyses have shown reproducibility of both positive and negative associations. These include a null association of GSTM1 deficiency in breast cancer, 69-71 prostate cancer 72 and colorectal cancer, 73 but positive associations of GSTM1 in leukaemia 74 and bladder cancer. 75 Meta-analyses have confirmed positive associations of IGF1 promoter [CA]n repeats in breast cancer, 76 the NAT2 slow-acetylator phenotype in bladder cancer 75 and polymorphisms in DNA-repair genes in breast cancer. 60,77 The InterLymph Consortium investigated SNPs in key immune pathway genes, TNF and IL10, in non-Hodgkin's lymphoma (NHL) and showed an increased risk for NHL in TNF-308A and IL10-3575A allele carriers. 39

A hopeful future
Well-designed, well-powered studies of genetic association in cancer hold great promise for advancing knowledge of cancer biology, genetic risk factors for cancer, therapeutic response and outcome. SNPs have the potential to be used as markers of disease risk, even in the absence of understanding the functional implications of the SNP. Studies of mutations in genes such as BRCA1 and TP53 in families have made profound impacts on our understanding of molecular and cellular biology. 19,22 While SNPs may not be associated with cancer risk to the same degree as a highly penetrant mutation in familial cancer, they will still contribute significantly to an understanding of a pathway or process in cancer biology. SNPs may confer as yet unknown subtle changes in gene function, transcription, intron-exon splicing or protein folding that, in the context of the right environmental exposure and/or in the appropriate genetic background of other variants, could have a significant effect on disease risk or outcome.
The public health implications of genetic association studies in cancer and other complex diseases are just beginning to emerge. 44 An excellent example of this is a study of age-related macular degeneration in which the population-attributable risk of genetic variation in the complement factor H gene is approximately 50 per cent. 78-80 A population-attributable risk this large has yet to be described for genetic variation in cancer, but it is possible. This, in combination with improved understanding of gene-gene and gene-environment interactions, will provide the basis for early diagnosis, intervention and prevention of cancer. It should be pointed out, however, that the promise of studying genetic variation in cancer cannot be realised without the careful collection and annotation of cases and controls in sufficiently large studies. For the low penetrant SNPs, replication of results will have to be followed by demonstration of plausibility before entering clinical testing.
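For readers who wish to see how a population-attributable risk relates to allele frequency and effect size, the Python sketch below uses Levin's formula, PAR = p(RR - 1) / (1 + p(RR - 1)), where p is the proportion of the population carrying the risk genotype and RR is the relative risk. The use of Levin's formula and all input values are assumptions made for illustration; they are not the figures reported in the studies cited here.

```python
def population_attributable_risk(carrier_frequency: float, relative_risk: float) -> float:
    """Levin's formula for the fraction of disease attributable to an exposure.

    carrier_frequency: proportion of the population carrying the risk genotype (0 to 1).
    relative_risk: risk in carriers divided by risk in non-carriers (for a rare
    disease, an odds ratio is often used as an approximation).
    """
    if not 0.0 <= carrier_frequency <= 1.0:
        raise ValueError("carrier_frequency must lie between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = carrier_frequency * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical scenarios, not taken from any cited study.
modest_variant = population_attributable_risk(carrier_frequency=0.10, relative_risk=1.2)
strong_variant = population_attributable_risk(carrier_frequency=0.50, relative_risk=3.0)
print(f"common variant with a modest effect: {modest_variant:.1%}")
print(f"common variant with a strong effect: {strong_variant:.1%}")
```

The two scenarios show why a common variant with a small relative risk accounts for only a few per cent of cases, whereas a value near 50 per cent requires a variant that is both common and strongly associated with disease.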
In conclusion, the tools for looking at common genetic variation are now available. Moreover, the opportunity to sequence large portions of the genome in many cases and controls is on the horizon. The genetic opportunities will best be realised when studies that include outcome and co-variates have been carried out, especially those that reflect the environmental contributions to cancer.

Note
This is a US Government work, and, as such, is in the public domain of the United States of America.