Genetic factors leading to chronic Epstein-Barr virus infection and nasopharyngeal carcinoma in South East China: Study design, methods and feasibility

Nasopharyngeal carcinoma (NPC) is a complex disease caused by a combination of Epstein-Barr virus chronic infection, the environment and host genes in a multi-step process of carcinogenesis. The identity of genetic factors involved in the development of chronic Epstein-Barr virus infection and NPC remains elusive, however. Here, we describe a two-phase, population-based, case-control study of Han Chinese from Guangxi province, where the NPC incidence rate rises to a high of 25-50 per 100,000 individuals. Phase I, powered to detect single gene associations, enrolled 984 subjects to determine feasibility, to develop infrastructure and logistics and to determine error rates in sample handling. A microsatellite screen of Phase I study participants, genotyped for 319 alleles from 34 microsatellites spanning an 18-megabase region of chromosome 4 (4p15.1-q12), previously implicated by a linkage analysis of familial NPC, found 14 alleles marginally associated with developing NPC or chronic immunoglobulin A production (p = 0.001-0.03). These associations lost significance after applying a correction for multiple tests. Although the present results await confirmation, the Phase II study population has tripled patient enrolment and has included environmental covariates, offering the potential to validate this and other genomic regions that influence the onset of NPC.

Introduction

Immunoglobulin (Ig) A antibodies to EBV viral capsid antigens (EBV/IgA/VCA) were found to serve as a predictive marker for the development of NPC in Chinese populations. 8 More than 95 per cent of adults in all ethnic groups across the world are healthy carriers of EBV. In high NPC incidence regions, EBV infection of the nasopharyngeal epithelium induces IgA antibodies against VCA, suggesting that reactivation of EBV replication at the mucosal surface precedes the development of NPC. Consistent with this, approximately 2.5 per cent of the general population are EBV/IgA/VCA antibody positive. Of these, less than 3 per cent will develop NPC, while >95 per cent of all NPC patients are EBV/IgA/VCA antibody positive. 9-14 In addition to EBV infection, case-control studies have indicated a role for environmental factors, including food preservatives (carcinogenic nitrosamines), salt-preserved fish and phorbol esters in herbs and plants that are commonly consumed among ethnic populations with the highest NPC rates. 15,16

Evidence for genetic modulation of NPC risk has accumulated recently. Familial aggregation of NPC has been observed in China and in other countries. 17-19 Familial aggregation of NPC is uncommon in low-risk or non-Chinese populations. The proportion of NPC cases with an affected first-degree relative is >5 per cent in south China, 7.2 per cent in Hong Kong, 6.0 per cent in Yulin and 5.9 per cent in Guangzhou. 20 Descendants of south Chinese immigrants to western countries show progressively lower risk, but their NPC incidence remains higher than that of the indigenous population, 21 suggesting both environmental and genetic components to disease susceptibility. Several studies have shown associations between HLA genes and NPC, 22-28 and the D6S1624 microsatellite within the HLA class I region has been associated with NPC. 29
Studies comparing age of NPC onset report conflicting results for familial versus sporadic NPC. In a study from Singapore comparing 200 probands with and without NPC-affected first-degree relatives, the age of onset was 48 and 49 years, respectively. 30 In another Chinese study, the average age of onset was 35.5 years in 32 Guangdong families with 4-5 relatives with NPC, compared with 46.6 years for sporadic cases. 20 In a third study, however, the age of onset decreased from 44.5 years to 40.4 years as the number of NPC-affected relatives increased from one to four. 31 There is, therefore, some suggestion that age of onset may be lower in families with one or more NPC-affected first-degree relatives.
A genome-wide linkage analysis of 20 NPC families from a high incidence region in Guangdong identified a susceptibility region on the short arm of chromosome 4. 32 Two chromosome 4p15.1-q12 markers, D4S405 and D4S3002, yielded high logarithm of the odds (LOD) scores (>3.5) by both parametric and multipoint non-parametric analysis in 70 per cent of the NPC families studied. A subsequent study of 18 families from Hunan province genotyped a panel of markers on the short arms of chromosomes 3, 9 and 4 that included D4S405 and D4S3002, and failed to detect an obvious susceptibility locus on 4p15.1-q12. 33 A region on chromosome 3p21.31-21.2 containing a tumour suppressor gene cluster, however, showed a modest association with NPC incidence. 33 Here, we describe the design of a new case-control study population recruited for the discovery of genetic factors that are involved in the development of chronic EBV infection and in the development of NPC. In a preliminary test to resolve the discrepancy between the two family-based studies, we performed a population-based case-control association analysis of 34 microsatellite markers within 4p15.1-q12 (Figure 1) to determine whether specific alleles within the region: 1) were associated with a propensity to develop chronic EBV replication, as evidenced by IgA antibodies against EBV viral capsid antigen (EBV/IgA/VCA); or 2) were associated with NPC susceptibility.

Study design
Enrolment into the study occurred in two collection phases. The Phase I pilot was powered to detect single gene associations. It was also designed to determine whether recruitment goals could be met, to assess the accuracy of data collection and sample handling, and to develop the infrastructure for a large international collaboration. Cases and controls (n = 984) were recruited in 2000 from Wuzhou City and Cangwu County, bordering the Xijiang River in Guangxi province, South East China. An effort was made to enrol triads consisting of a proband, an unaffected spouse and an adult child or parent. Family triads were enrolled for haplotype inference and for quality control assessment. Three clinically described disease categories were collected: 1) incident or prevalent biopsy-confirmed NPC (NPC+) cases (n = 350) who were EBV/IgA/VCA antibody positive (IgA+); 2) IgA+ cases (n = 288), defined as EBV/IgA/VCA antibody positive and NPC free at the time of study enrolment (EBV/IgA/VCA titres were confirmed by serological testing at enrolment); and 3) IgA− controls (n = 346). For each case, his or her spouse was tested for EBV/IgA/VCA antibodies, and the spouse and parent or adult child were invited to enrol. The IgA− group consisted of 346 spouses who were IgA− at the time of study enrolment (Table 1). A dominant model was selected for power calculations for two reasons: 1) if the true model is additive, there is little difference in power between an additive and a dominant model (data not shown); and 2) if the true model is dominant, a dominant model is the most powerful. We assumed a dominant genetic model and an allele frequency of at least 10 per cent. Under these assumptions, the numbers of NPC+, IgA+ and IgA− cases and controls provided >90 per cent power to detect associations with an odds ratio (OR) ≥3 at the p = 0.01 level for a two-tailed test (Table 2).
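A simple calculation can approximate the power statement above. The Python sketch below is not the authors' method. It assumes Hardy-Weinberg equilibrium in controls and defines the OR for carriers versus non-carriers of a single allele. It then applies a normal-approximation two-sample test to carrier proportions. The function names are hypothetical.

```python
"""Approximate power for a case-control test under a dominant model (sketch).

Assumptions (not taken from the study's own calculation): Hardy-Weinberg
equilibrium in controls, OR defined for carriers vs non-carriers of the
allele, and a two-sided two-sample z-test on carrier proportions.
"""
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def carrier_freq(allele_freq):
    """P(at least one copy of the allele) under HWE: 1 - (1 - q)^2."""
    return 1.0 - (1.0 - allele_freq) ** 2


def case_carrier_freq(p_control, odds_ratio):
    """Carrier frequency in cases implied by the control frequency and the OR."""
    odds = odds_ratio * p_control / (1.0 - p_control)
    return odds / (1.0 + odds)


def dominant_power(allele_freq, odds_ratio, n_cases, n_controls, alpha):
    """Normal-approximation power to detect the carrier OR at two-sided alpha."""
    p0 = carrier_freq(allele_freq)
    p1 = case_carrier_freq(p0, odds_ratio)
    # Pooled carrier proportion under the null hypothesis.
    p_bar = (n_cases * p1 + n_controls * p0) / (n_cases + n_controls)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n_cases + 1 / n_controls))
    se_alt = sqrt(p1 * (1 - p1) / n_cases + p0 * (1 - p0) / n_controls)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    delta = abs(p1 - p0)
    # The tiny probability of rejecting in the wrong direction is ignored.
    return _STD_NORMAL.cdf((delta - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    # Phase I-sized comparison: 350 NPC+ cases vs 346 IgA- controls.
    for or_ in (1.5, 2.0, 3.0):
        pw = dominant_power(0.10, or_, 350, 346, alpha=0.01)
        print(f"OR={or_}: power={pw:.3f}")
```

Under these assumptions, `dominant_power(0.10, 3.0, 350, 346, alpha=0.01)` returns a value well above 0.9, in line with the >90 per cent figure quoted for Phase I. Values will differ somewhat from Table 2 if the original calculation used a different test or approximation.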
Phase II enrolment was initiated in 2004, after the completion of Phase I collection. The Phase II design is a cross-sectional case-control study: family members were not recruited. A questionnaire capturing environmental factors, including occupational, dietary and tobacco exposures, was administered to each study participant at enrolment (Table 3). NPC cases were recruited from the Wuzhou Red Cross Hospital in collaboration with the Cancer Institute of Wuzhou, Wuzhou City, and the Cangwu Institute for Nasopharyngeal Carcinoma Control and Prevention, Cangwu County. NPC cases, IgA+ subjects and IgA− participants were recruited from cities and villages bordering the Xijiang River. Power was determined for single-gene associations and gene-environment interactions for participants in each group (Table 2). For single-gene associations, we assumed a 10 per cent allele frequency, a dominant genetic model and a two-sided test. Power to detect associations with an OR of 1.5-3.0 will range from 83 per cent to >99 per cent at p < 0.05, and from 35 per cent to >99 per cent at p < 0.001. For gene-environment interactions, there is power to detect effects under three conditions: the genotype and exposure each have a frequency ≥0.1, the main genotype and exposure effects each have an OR ≥1, and the interaction effect has an OR ≥3. 34 Exclusion criteria for Phases I and II were ethnicity other than Han Chinese, birth or residency for more than six months outside of the NPC endemic region, or failure to provide informed consent. Internal review board approval was obtained from all participating institutions. Informed consent was obtained from each study participant, or from their guardian for subjects between 16 and 18 years of age.
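One common way to parameterise the gene-environment scenario above is a logistic model with a multiplicative interaction term. The sketch below assumes this form, which the text does not specify, for disease status D, a binary genotype indicator G (carrier or not) and a binary exposure indicator E:

```latex
\operatorname{logit}\Pr(D = 1 \mid G, E) = \beta_0 + \beta_G G + \beta_E E + \beta_{GE}\, G E,
\qquad G, E \in \{0, 1\}

\mathrm{OR}_G = e^{\beta_G}, \quad \mathrm{OR}_E = e^{\beta_E}, \quad
\mathrm{OR}_{GE} = e^{\beta_{GE}}, \quad
\mathrm{OR}_{(G=1,E=1)\ \mathrm{vs}\ (G=0,E=0)} = e^{\beta_G + \beta_E + \beta_{GE}}
```

In this parameterisation, the scenario described corresponds to OR_G ≥ 1, OR_E ≥ 1 and OR_GE ≥ 3, with the frequency of G = 1 and of E = 1 each at least 0.1.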

Sample and data handling
A total of 10-20 ml of blood was collected in acid citrate dextrose (ACD) vacutainers for serology testing, direct DNA extraction and cryopreservation of peripheral blood [...] of the genotypes were determined from DNA directly extracted from whole blood.

Genetic association analyses
Allele frequencies were computed and compared between cases and controls using Pearson's χ² test or Fisher's exact test.
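The Python sketch below illustrates the allele-level comparison. It collapses one microsatellite allele to a 2×2 table of allele counts (the allele versus all other alleles, in cases and in controls). It then computes Pearson's χ² (1 df, no continuity correction) and a two-sided Fisher's exact p value. Forming the tables this way is an assumption, and the allele labels and counts in the example are hypothetical.

```python
"""Allele-level association tests on a 2x2 table of allele counts (sketch)."""
from math import comb, erfc, sqrt


def allele_table(case_counts, control_counts, allele):
    """Collapse allele counts to [[allele, other] in cases, [allele, other] in controls]."""
    a = case_counts.get(allele, 0)
    c = control_counts.get(allele, 0)
    b = sum(case_counts.values()) - a
    d = sum(control_counts.values()) - c
    return [[a, b], [c, d]]


def pearson_chi2(table):
    """Pearson chi-square (no continuity correction) and its 1-df p value.

    Assumes no row or column total is zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (obs - expected) ** 2 / expected
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p: sum of hypergeometric probabilities <= observed."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical allele counts (two alleles per person) at one microsatellite.
    cases = {"213": 40, "217": 25, "221": 135}
    controls = {"213": 22, "217": 30, "221": 148}
    t = allele_table(cases, controls, "213")
    print(t, pearson_chi2(t), fisher_exact_two_sided(t))
```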
ORs, 95 per cent confidence intervals (CIs) and p values, adjusted for age and sex, were computed by logistic regression for dominant and recessive genetic models using SAS PROC LOGISTIC software (SAS Institute, Cary, NC, USA). For the dominant model, ORs compared the combined homozygous and heterozygous genotypes carrying the allele against all other genotypes. When the minor allele frequency was ≥5 per cent, ORs were also calculated for the recessive model, comparing the homozygous genotype against all other genotypes. Conformance to Hardy-Weinberg equilibrium expectations was assessed for all loci. Tests for D' as a measure of linkage disequilibrium (LD) were conducted for allele pairs using SAS Genetics software (SAS Institute, Cary, NC, USA).
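The dominant and recessive codings described above can be sketched as follows for a single microsatellite allele. The study fitted age- and sex-adjusted logistic regression in SAS. This Python sketch computes only a crude (unadjusted) OR with a Woolf log-scale 95 per cent CI, and the function names are hypothetical.

```python
"""Dominant/recessive coding of one microsatellite allele and a crude OR (sketch).

Not the study's adjusted SAS analysis: no age or sex covariates are used.
"""
from math import exp, log, sqrt


def code_genotype(genotype, allele, model):
    """Return 1 if the genotype counts as 'exposed' to the allele under the model."""
    copies = sum(1 for a in genotype if a == allele)
    if model == "dominant":   # homozygous + heterozygous carriers vs all others
        return int(copies >= 1)
    if model == "recessive":  # homozygous carriers only vs all others
        return int(copies == 2)
    raise ValueError(f"unknown model: {model}")


def crude_or(case_genotypes, control_genotypes, allele, model, z=1.96):
    """Unadjusted OR with a Woolf (log-scale) confidence interval."""
    a = sum(code_genotype(g, allele, model) for g in case_genotypes)
    b = len(case_genotypes) - a
    c = sum(code_genotype(g, allele, model) for g in control_genotypes)
    d = len(control_genotypes) - c
    if 0 in (a, b, c, d):
        raise ValueError("empty cell; crude OR undefined without a correction")
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)


if __name__ == "__main__":
    # Hypothetical genotypes written as (allele size, allele size) tuples.
    cases = [(213, 217)] * 20 + [(217, 221)] * 10
    controls = [(213, 213)] * 10 + [(221, 221)] * 20
    print(crude_or(cases, controls, 213, "dominant"))
```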

Results
The Phase I pilot study enrolled participants from the Cancer Institute in Wuzhou City and the Cangwu Institute for Nasopharyngeal Carcinoma Control and Prevention, Cangwu County, in Guangxi province in the autumn of 2000. For NPC cases, 71.3 per cent of spouses and 81 per cent of adult children were enrolled. For cases with EBV/IgA/VCA titres consistent with chronic EBV infection, 72.4 per cent of spouses and 67.4 per cent of adult children or parents were enrolled. Complete triad sets were available for 366 NPC probands. As predicted for this highly endemic NPC region, 71.8 per cent of the NPC cases were male. Peripheral blood mononuclear cells (PBMCs) cryopreserved on-site were transported to the Laboratory of Genomic Diversity, National Cancer Institute (LGD-NCI) for EBV immortalisation. Of 633 transformation attempts, 83 per cent resulted in lymphoblastoid cell lines (LCLs).
Sample and genotyping errors were estimated by including 10 per cent duplicate samples, with one sample derived from DNA isolated directly from peripheral blood and the second from DNA isolated from LCLs. Less than 0.5 per cent mismatches were observed within duplicate samples, and all were resolved using family trios (data not shown). This indicates that tubes collected from a single individual were appropriately labelled and that error was not introduced during cell line development or sample handling. A second test for Mendelian errors was performed with PedCheck using the chromosome 4 microsatellite data (described below). Two unresolved Mendelian errors were observed within the 366 family triads. Near-complete genotyping and complete clinical data were available for 350 NPC cases, 288 IgA seropositives and 346 IgA seronegatives (Table 1).
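PedCheck performs a more thorough, multi-level search. The Python sketch below is only a simplified illustration of the basic Mendelian rule, applied to a triad in which the proband and spouse are the parents of the enrolled child: the child's genotype must contain one allele from each parent. For a proband enrolled with a parent, only the weaker check that parent and child share an allele applies. Function names and allele values are hypothetical.

```python
"""Simplified Mendelian-consistency checks for one microsatellite (sketch).

Not PedCheck's algorithm; genotypes are (allele, allele) tuples.
"""


def trio_consistent(child, mother, father):
    """True if the child can carry one allele from each parent."""
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)


def pair_consistent(child, parent):
    """Parent-child pair with the other parent missing: they must share an allele."""
    return bool(set(child) & set(parent))


def count_errors(trios):
    """Count inconsistent trios; each trio is (child, mother, father)."""
    return sum(not trio_consistent(*t) for t in trios)


if __name__ == "__main__":
    trios = [
        ((213, 217), (213, 221), (217, 225)),  # consistent
        ((213, 213), (217, 221), (213, 225)),  # mother lacks allele 213
    ]
    print(count_errors(trios))
```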
Phase II enrolment occurred between November 2004 and July 2005 in Guangxi province. Subjects were enrolled if at least one parent was from Guangxi or Guangdong province. NPC cases were identified as seroincident or seroprevalent cases presenting at Red Cross hospitals. IgA+ and IgA− controls were identified from field stations in cities and villages in the Xijiang River drainage. Table 3 presents summary data on environmental exposures for the Phase II NPC+, IgA+ and IgA− groups and the numbers of participants enrolled.
Using the Phase I cases and controls, we addressed whether a locus within the chromosome 4p15.1-q12 region contributes to the development of NPC or to the production of EBV/IgA/VCA in response to EBV replication. Thirty-four microsatellite loci were distributed over an 18 Mb region on chromosome 4p15.1-q12, at intervals of 10-3,500 kb (average 530 kb). Four Phase I genetic association comparisons were made:

1) NPC cases versus EBV/IgA/VCA seropositive controls (Table 4);
2) EBV/IgA/VCA seropositive cases without NPC versus EBV/IgA/VCA seronegative controls (Table 5);
3) NPC cases plus EBV/IgA/VCA seropositive cases versus EBV/IgA/VCA seronegative controls (Table 6); and
4) NPC cases versus EBV/IgA/VCA seronegative controls (data not shown).

No departures from Hardy-Weinberg equilibrium were observed. Alleles with at least one nominally significant result (p < 0.05) for either the dominant or recessive genetic model are reported in Tables 4-6. The results are presented without correction for multiple comparisons. The interrogated 4p15.1-q12 region was previously implicated as a susceptibility locus in a family-based study, and we were specifically testing the prior hypothesis that markers within the region would also be associated with NPC in a population-based study. 32 Note that associations with p > 0.0015 would not remain significant after correction for multiple comparisons across the 34 independent loci.
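The 0.0015 threshold corresponds to a Bonferroni correction of a family-wise significance level α = 0.05 across m = 34 loci. This is our reading; the text does not name the correction method:

```latex
\alpha^{*} = \frac{\alpha}{m} = \frac{0.05}{34} \approx 0.00147 \approx 0.0015
```

If each of the 319 genotyped alleles were instead counted as a separate test, the same calculation would give 0.05/319 ≈ 1.6 × 10⁻⁴.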

Linkage disequilibrium among the 34 loci
The spacing of markers varied from 10 to 3,500 kb, with denser coverage flanking the microsatellite markers with the highest LOD scores from the family study (Figure 1). We calculated two-point D' as a measure of LD between all alleles at neighbouring short tandem repeat loci. However, a D' value of 1 (complete LD) was observed for only 60 two-point allele combinations. Using HapMap single nucleotide polymorphism (SNP) data (http://www.hapmap.org), we examined whether the microsatellites fell within reasonably strong LD blocks, defined using an r² cut-off threshold of 0.8 between marker pairs. Only 11 of the 34 microsatellite markers occurred within an LD block; D4S396 and D4S401 occurred within the same 17 kb block. Of the two NPC-linked markers, 32 D4S405 was not within a block and D4S3002 occurred within an 8 kb block. The mean size of the blocks was 17.9 kb (range 8-50 kb).
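The Python sketch below makes the distinction between D' and r² concrete. It computes both for one allele at each of two loci, treating each multi-allelic marker as "this allele versus all others". It assumes the two-locus haplotype frequency has already been estimated, for example from phased family triads. It is not the SAS Genetics implementation, and the function name is hypothetical.

```python
"""Two-point LD between one allele at each of two loci (sketch).

p_a, p_b: allele frequencies at locus 1 and locus 2 (strictly between 0 and 1).
p_ab: frequency of the haplotype carrying both alleles (assumed pre-estimated).
"""


def ld_stats(p_a, p_b, p_ab):
    """Return (D, D', r^2); D' is signed, so |D'| = 1 means complete LD."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical frequencies: allele A (0.2) always sits on a haplotype with B (0.3).
    print(ld_stats(0.2, 0.3, 0.2))
```

In this example, D' = 1 while r² ≈ 0.58, which is below the 0.8 threshold used to define blocks. Complete LD by D' therefore does not imply block membership by r².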

Genetic association with persistent IgA+ status
To test the hypothesis that genetic factors may influence EBV/IgA/VCA formation in response to EBV infection, we compared genotype frequencies between 288 IgA+ cases and 346 IgA− controls (Table 1). Table 5 provides the allele frequencies, p values, ORs and 95 per cent CIs in cases and controls for significant results. Eleven alleles were significantly associated with IgA+ persistence: five risk alleles (OR 1.51-2.38; p = 0.004-0.040) and six protective alleles (OR 0.33-0.70; p = 0.002-0.050).
Because all NPC cases in our study were IgA+, we then pooled NPC and IgA+ cases to increase power. Our hypothesis was that alleles associated with IgA+ serostatus would be shared among NPC+IgA+ and NPC−IgA+ individuals. Significant associations are presented in Table 6: four alleles were associated with risk for IgA+ (OR 1.5-1.63; p = 0.001-0.030) and seven were protective (OR 0.46-0.76; p = 0.001-0.050). Ten of the IgA-associated alleles were shared between the two comparisons (Tables 5 and 6).

Discussion
We have described the design and recruitment efforts for a genetic association study of host genetic factors in chronic EBV infection leading to NPC. Subjects were born and living in a region with one of the world's highest incidence rates of NPC. This study was conducted in two phases. Phase I was a pilot study to explore the feasibility of conducting a cross-sectional study (Table 1). The pilot provided strong support for expanding the study: export permits for genetic material were obtained, sample handling was excellent (with few detectable errors) and recruitment goals were attainable. Upon the successful completion of Phase I, we made three changes. We extended the catchment area for IgA+ cases to cities and villages along the Xijiang River and its tributaries, and we enrolled more subjects. We also added a detailed questionnaire to capture environmental exposures that may interact with host genes in the development of NPC (Table 2). Complementing previous studies, we also attempted to determine whether Phase II of the study was powered to detect both gene-gene and gene-environment interactions.
To revisit the recent linkage analysis in NPC families implicating a susceptibility locus on chromosome 4p15.1-q12, we selected 34 microsatellite loci spanning the 18 Mb region at intervals of 10-3,500 kb. Unlike previous studies, we asked two questions. First, was the chromosome 4 region associated with EBV/IgA/VCA antibody formation? Second, was it associated with NPC incidence in the setting of EBV replication, as indicated by EBV/IgA/VCA? We identified several loci that showed nominally significant associations with either EBV/IgA/VCA or NPC status. The associations tended to be marginal for NPC (Table 4), with somewhat stronger associations observed for EBV/IgA/VCA (IgA+) (Tables 5 and 6).
Few NPC families have been identified outside of NPC endemic areas. More than 90 per cent of all NPC cases do not show familial aggregation or a family history, implying either environmental causes or geographical family clustering. Two family-based NPC linkage studies implicated different chromosomes as harbouring an NPC susceptibility locus. 32,33 The studies differed in strategy, but both drew on two separate high-incidence provinces of China. Both used multiple families with two or more NPC cases, with similar numbers of families and affected cases. Environmental exposures may differ between the two provinces, but it is unlikely that different environmental factors account for the lack of concordance between the studies. More likely, multiple genes predispose to chronic EBV replication and the development of NPC, each contributing only a small part of the total genetic influence. Family-based linkage studies are ideal for identifying single genes with large effects, but are relatively insensitive for localising genetic factors with small effects. By contrast, case-control association studies are ideal for identifying genetic factors with small or moderate effects once a candidate gene or region has been identified. 38 We cannot exclude the possibility that the chromosome 4 region contains causal alleles associated with chronic EBV replication or a predisposition to develop NPC. Marker associations within this region (Tables 4-6) may be tracking a susceptibility locus through LD. The microsatellite alleles predominantly occurred at very low frequencies, and haplotype inferences were unreliable. We therefore could not reliably assess haplotype associations with either EBV persistence or NPC (data not shown). A denser placement of polymorphic markers is required to survey the genetic variation of the region more thoroughly.
This study did not find associations with robust p values for NPC, so conclusions are tentative. Nevertheless, a number of loci showed moderate to strong risk, suggesting that this region warrants further attention, particularly for chronic EBV replication. Of potential interest is the association of two tightly linked microsatellites with IgA+ status (Tables 5 and 6, Figure 1). At D4S3347, two alleles (213 and 217) were associated with EBV/IgA/VCA, with three associations at p < 0.01 and one at p < 0.05. This suggests that these alleles may be tracking a potential causative allele. The D4S1577 locus, less than 20 kb away, also shows four significant associations (p < 0.05). Microsatellite D4S190 occurs within the oncogene ARHH and was associated with risk for EBV/IgA/VCA seropositive status but not with NPC. ARHH, a member of the ras homolog gene family, encodes a small GTP-binding protein belonging to the RAS superfamily and is transcribed only in haemopoietic cells. ARHH non-coding variants that may affect expression are observed in 46 per cent of diffuse large-cell lymphomas. 39 One or more variant ARHH alleles in LD with the associated D4S190-170 allele may modify EBV replication.
Given the similar geographical distribution of familial and non-familial NPC, both forms likely share similar aetiological risk factors, particularly environmental and viral factors. The genetic factors underpinning familial, early-onset and non-familial NPC susceptibility may also overlap. It is also possible that different genes contribute to familial NPC cases, analogous to the situation in breast cancer, where BRCA1 and BRCA2 account for only a small proportion of non-familial breast cancer cases. 40,41 The best approach to identifying NPC susceptibility factors may be the organisation of well-designed and highly powered case-control studies for whole-genome and targeted candidate gene association investigations, as we describe here.