Open Access

Characterisation of SNP haplotype structure in chemokine and chemokine receptor genes using CEPH pedigrees and statistical estimation

Human Genomics 2004, 1:195

https://doi.org/10.1186/1479-7364-1-3-195

Received: 16 January 2004

Accepted: 16 January 2004

Published: 1 March 2004

Abstract

Chemokine signals and their cell-surface receptors are important modulators of HIV-1 disease and cancer. To aid future case/control association studies, we aim to further characterise the haplotype structure of variation in chemokine and chemokine receptor genes. To perform haplotype analysis in a population-based association study, haplotypes must be determined by estimation, in the absence of family information or laboratory methods to establish phase. Here, we test the accuracy of estimates of haplotype frequency and linkage disequilibrium by comparing estimated haplotypes generated with the expectation maximisation (EM) algorithm to haplotypes determined from Centre d'Etude Polymorphisme Humain (CEPH) pedigree data. To do this, we have characterised haplotypes comprising alleles at 11 biallelic loci in four chemokine receptor genes (CCR3, CCR2, CCR5 and CCRL2), which span 150 kb on chromosome 3p21, and haplotypes of nine biallelic loci in six chemokine genes [MCP-1 (CCL2), Eotaxin (CCL11), RANTES (CCL5), MPIF-1 (CCL23), PARC (CCL18) and MIP-1α (CCL3)] on chromosome 17q11-12. Forty multi-generation CEPH families, totalling 489 individuals, were genotyped by the TaqMan 5'-nuclease assay. Phased haplotypes and haplotypes estimated from unphased genotypes were compared in 103 grandparents who were assumed to have mated at random.

For the 3p21 single nucleotide polymorphism (SNP) data, haplotypes determined by pedigree analysis and haplotypes generated by the EM algorithm were nearly identical. Linkage disequilibrium, measured by the D' statistic, was nearly maximal across the 150 kb region, with complete disequilibrium maintained at the extremes between CCR3-Y17Y and CCRL2-I243V. D'-values calculated from estimated haplotypes on 3p21 had high concordance with pairwise comparisons between pedigree-phased chromosomes. Conversely, there was less agreement between analyses of haplotype frequencies and linkage disequilibrium using estimated haplotypes when compared with pedigree-phased haplotypes of SNPs on chromosome 17q11-12. These results suggest that, while estimations of haplotype frequency and linkage disequilibrium may be relatively simple in the 3p21 chemokine receptor cluster in population samples, the more complex environment on chromosome 17q11-12 will require a higher resolution haplotype analysis.

Keywords

chemokine; SNP; haplotype estimation; pedigree analysis; linkage disequilibrium

Introduction

Chemokines are small intercellular signalling molecules that recruit immune cells to sites of inflammation and infection. The two major subfamilies of chemokine proteins are defined as CC, with two adjacent cysteine residues, or as CXC, with an intervening non-conserved amino acid. Other chemokine family members have cysteine residues separated by more than one intervening amino acid (eg CX3C or Fractalkine) [1, 2], or are characterised by having only one cysteine (eg XCL1 or Lymphotactin) [3, 4]. Chemokine receptors are defined by the subfamily of chemokine ligand that they bind. Both the chemokine and the chemokine receptor genes are generally clustered in four distinct chromosomal regions: CC on 17q11-21, CXC on 4q12-21, both CCR and CXCR on 3p21-24 and CXCR on 2q21-35.

Variation in chemokines, or their cell-surface receptors, influences an individual's susceptibility to HIV-1 infection and modulates progression to AIDS [5–11]. Chemokine signals are also important in the angiogenic [12–14] and metastatic [15, 16] processes of cancer. Therefore, describing the genetic variation and haplotype structure of chemokine and chemokine receptor gene clusters is necessary for further disease association analyses of these candidate genes.

The focus of the present analysis is to describe the structure of multi-single nucleotide polymorphism (SNP) haplotypes in chemokine genes on chromosome 17q11-12 and chemokine receptor genes on chromosome 3p21 in Centre d'Etude Polymorphisme Humain (CEPH) pedigrees (n = 489). Secondary to this goal is to use the empirically phased haplotypes to determine the accuracy of estimated measures of haplotype frequencies and linkage disequilibrium using the subset of CEPH grandparents (n = 103).

Samples and methods

Study samples

SNP screening and validation were performed using two population panels: a 16-individual panel (four European-Americans, four African-Americans, four Chinese and four self-identified Hispanic-Americans) and an 88-individual panel (30 African-Americans, 34 European-Americans and 24 Hispanics). Forty multi-generation CEPH families, a total of 489 individuals, were genotyped for 23 SNPs scattered over two gene clusters: CC-chemokines on 17q11-12 and CC-chemokine receptors on 3p21 (see Table 1). Genotype data from a subsample of 103 unrelated grandparents were used for comparative haplotype analyses. The use of all anonymous DNA samples was either reviewed by the NIH Internal Review Board or determined 'exempt' from review.
Table 1

Biallelic loci typed in CEPH pedigrees

| Haplotype position | Gene | NCBI locus link | Nucleotide/AA position | NCBI genbank number | NCBI contig | Contig position | NCBI dbSNP ss# | Allele 1 | CEPH GP frequency |
|---|---|---|---|---|---|---|---|---|---|
| 3p21 | | | | | | | | | |
| 1 | CCR3 | 1232 | Y17Y, A/G | NM_001837 | NT_05827 | 3997337 | 4987053 | A | 0.907 |
| | CCR2 | 1231 | -5983 G/A | U95626 | NT_05827 | 4083672 | | G | 1 |
| 2 | CCR2 | 1231 | -5048 G/T | U95626 | NT_05827 | 4084607 | 3918357 | G | 0.892 |
| | CCR2 | 1231 | -4866 C/G | U95626 | NT_05827 | 4084789 | 3918370 | C | 1 |
| 3 | CCR2 | 1231 | -3433 T/C | U95626 | NT_05827 | 4086222 | 3092964 | T | 0.802 |
| 4 | CCR2 | 1231 | V64I, C/T | NM_000647 | NT_05827 | 4089845 | 1799864 | C | 0.898 |
| 5 | CCR2 | 1231 | N260N, A/G | NM_000647 | NT_05827 | 4090435 | 1799865 | A | 0.696 |
| 6 | CCR5 | 1234 | 208 C/A | NM_000579 | NT_05827 | 4102477 | 2734648 | G | 0.631 |
| 7 | CCR5 | 1234 | 303 C/T | NM_000579 | NT_05827 | 4102572 | 1799987 | G | 0.524 |
| 8 | CCR5 | 1234 | 676 T/C | NM_000579 | NT_05827 | 4102945 | 1800023 | A | 0.631 |
| 9 | CCR5 | 1234 | L55Q, T/A | NM_000579 | NT_05827 | 4105194 | 1799863 | T | 0.976 |
| 10 | CCR5 | 1234 | D32 | NM_000579 | NT_05827 | | | NODEL | 0.905 |
| 11 | CCRL2 | 9034 | I243V, C/T | NM_003965 | NT_05827 | 4140934 | 3204850 | C | 0.902 |
| | CCRL2 | 9034 | 1137 C/G | NM_003965 | NT_05827 | 4141344 | | C | 1 |
| 17q11-12 | | | | | | | | | |
| 1 | MCP-1 (CCL2) | 6347 | -362 C/G | M37719 | NT_010799 | 7315787 | 2857656 | C | 0.718 |
| 2 | EOTAXIN (CCL11) | 6356 | -1382 C/T | Z92709 | NT_010799 | 7345226 | 4795895 | C | 0.777 |
| 3 | RANTES (CCL5) | 6352 | -8147 A/G | NM_002985 | NT_010799 | 8932972 | | A | 0.903 |
| 4(1) | MPIF-1 (CCL23) | 6368 | M106V, G/A | U85767 | NT_010799 | 9074064 | 1003645 | A | 0.832 |
| 5(2) | PARC (CCL18) | 6362 | -116 C/T | AB012113 | NT_010799 | 9125397 | 2015086 | C | 0.662 |
| 6(3) | PARC (CCL18) | 6362 | 81 G/A | AB012113 | NT_010799 | 9125563 | 2015070 | G | 0.97 |
| 7(4) | PARC (CCL18) | 6362 | 311 C/A | AB012113 | NT_010799 | 9125793 | 2015052 | A | 0.922 |
| 8(5) | PARC (CCL18) | 6362 | 6793 A/G | AB012113 | NT_010799 | 9132275 | 14304 | G | 0.909 |
| 9(6) | MIP-1A (CCL3) | 6348 | -1541 T/C | M23178 | NT_010799 | 9152727 | 1634497 | A | 0.705 |


Chemokine and chemokine receptor SNPs

Conditions for SNP detection in the CCR2 promoter

Four of the 23 SNPs included in the haplotype analysis (Table 1) have not previously been reported and were discovered by direct sequencing. Three kilobases of the CCR2 promoter region were amplified using the Invitrogen Platinum Taq™ kit in a panel of 16 individuals (32 chromosomes), including four European-Americans, four African-Americans, four Chinese and four Hispanic-Americans (self-identified). For 100 μL polymerase chain reactions (PCRs), 200 nM deoxyribonucleotide triphosphates (dNTPs), 200 nM of each primer, 400 nM MgSO4, 10 μL of 10 × Platinum Taq™ buffer and 1 μL Platinum Taq™ were mixed with approximately 100 ng of genomic DNA. Primer sequences for the 3 kb product were as follows: 5'-TCATCTGCTTCTTAATTGCCTTCAG-3' (forward) and 5'-CAGGGTTTCTCTAACATCTCCTGGT-3' (reverse). PCR was performed in a PE Biosystems 9700 thermocycler with long-range PCR conditions recommended for Platinum Taq™.

Sequencing was performed on a 3 kb segment at intervals of 400-500 bp with internal primers using the BigDye™ (Applied Biosystems) cycle sequencing kit with some modifications. Sequencing reactions were performed as follows: 15-30 ng of purified product was added to 10 μL reaction solution, which included 2 μL of BigDye™ mix, 1 μL of standard 5 × dilution buffer, 1.1 μL of 0.5 μM primer stock and double-distilled water (ddH2O) for the remaining volume. Reactions were cycled in a PE Biosystems 9700 thermocycler under the following conditions: 95°C for five minutes, and 30 cycles of 95°C for 30 seconds, 50°C for ten seconds and 60°C for four minutes. All individuals were sequenced for the entire 3 kb in both forward and reverse directions on an ABI 3700 capillary sequencer. Sequence trace files were analysed by the Phred/Phrap/Consed system [17–20], and PolyPhred was used to detect putative SNPs [21].

Eight SNPs (-5983 G/A, -5047 G/T, -4866 G/C, -4599 T/G, -4419 C/T, -4338 A/T, -3433 T/C and -3232 C/T) were confirmed by visual inspection of the CCR2 promoter sequence of the 16-individual screening panel. Five of these SNPs (-5983 G/A, -5047 G/T, -4866 G/C, -4599 T/G and -3433 T/C) were validated by direct sequencing in a larger sample set that comprised 88 individuals from three populations: 30 African-Americans, 34 European-Americans and 24 self-described Hispanics. The other three SNPs were not validated in the larger sample set, as they are in nearly complete linkage disequilibrium with at least one of the five SNPs chosen for further study. Four of the five validated SNPs listed in Table 1 (-5983 G/A, -5047 G/T, -4866 G/C and -3433 T/C) were successfully optimised for 5'-nuclease assays.

Conditions for screening putative SNPs

The remaining 19 SNPs listed in Table 1 were previously characterised in this laboratory by denaturing high performance liquid chromatography (dHPLC) or single-stranded conformation polymorphism (SSCP) analysis, or were taken from published works or public databases. Flanking primers were designed for a total of 22 polymorphisms from dbSNP [22] using Primer 3.0 from MIT, Cambridge, MA [23]. PCR was performed in 25 μL-scale reactions with the following components: 50 ng genomic DNA, 3 mM MgCl2, 200 nM dNTPs, 200 nM of each primer, 1 U TaqGold™ (Applied Biosystems) and 2.5 μL 10 × TaqGold™ Buffer. The cycling conditions (PE Biosystems 9700) for all reactions were as follows: a 95°C hold for ten minutes, then a touch-down cluster of 12 cycles (95°C for 30 seconds, 62-57°C (decreasing by 0.5°C every cycle) for one minute and 72°C for one minute), a standard cluster of 30 cycles (95°C for 30 seconds, 57°C for one minute and 72°C for one minute) and a final 72°C hold for seven minutes. PCR products were purified using 10 U exonuclease 1 and 2 U shrimp alkaline phosphatase (SAP) enzymes under the protocol specified by the Washington University Sequencing Center [24].

All purified reaction solutions were sequenced as follows: 15-30 ng of purified product was added to 10 μL reaction solution, which included 2 μL of BigDye™ mix, 1 μL of standard 5 × dilution buffer, 1.1 μL of 0.5 μM primer stock and ddH2O for the remaining volume. Reactions were cycled in a PE Biosystems 9700 thermocycler under the following conditions: 95°C for five minutes, and 30 cycles of 95°C for 30 seconds, 50°C for ten seconds, and 60°C for four minutes. Nine of the 22 primer pairs produced viable sequences, and the corresponding SNPs were polymorphic in at least one individual of the 16-individual population panel. Those 'confirmed' SNPs were further characterised by either sequencing or genotyping in the larger sample of 88 individuals (data not shown).

SNP genotyping

All 23 SNPs were genotyped using the 5'-nuclease assay under a set of universal assay conditions. Dual-labelled TaqMan™ (Applied Biosystems) probes with standard, Turbo and Minor-Groove Binding (MGB) chemistries were designed using Primer Express™ (Applied Biosystems). Previous analysis of genotyping accuracy using the TaqMan method revealed 14 discordances out of 1,165 duplicate genotype pairs, a 1.2 per cent error rate averaged over multiple TaqMan assays [25]. PCR conditions for genotyping (reaction components and cycling conditions), as described in Morin et al. (1999) and Clark et al. (2001), were used for all SNPs typed in this study [25, 26]. PCR was performed in 96-well plates that included positive genotypic controls (for both homozygote states and the heterozygote state for each SNP) and reactions with no DNA as a negative control. All 5'-nuclease assay plates were read on the ABI 7700 Sequence Detector, and analysed using the 'dye components' feature of the SDS v1.6.3 or v1.7 software package (Applied Biosystems). Genotype determinations for each reaction were made manually by visual inspection of a scatter-plot of the data, with reference to the results of the genotype control samples. CEPH pedigree data for all 23 genotyping assays were checked for concordance with Mendelian inheritance using PEDCHECK [27].

Haplotype analysis

Haplotype phase was determined using the CYRILLIC II pedigree drawing software (Cherwell Scientific) to establish the inheritance of multi-locus genotypes. The algorithm developed by Guo and Thompson (1992) was used to determine whether the distribution of whole haplotypes in the CEPH grandparent sample (n = 103) deviates from Hardy-Weinberg proportions [28]. Significance is determined by an exact test, with a cut-off of p = 0.05. Haplotype states and frequencies on both chromosomes 3p21 and 17q11-12 were estimated in sets of unphased genotype data by MLOCUS [29, 30], which uses the expectation-maximization (EM) algorithm [31], a maximum-likelihood based method. A previously described three-step procedure to determine the most likely set of haplotypes to describe the genotype data was used here to analyse the haplotype states and frequencies for all datasets [32]. Haplotype blocks on 3p21 were assessed using HaploBlock-Finder [33], which performs the four-gamete test (FGT) between each pair of SNPs to identify past recombination events [34]. The minimum-D' method [35, 36] (with minimum D' = 0.80) was also used to assess haplotype block structure in the 150 kb region of 3p21.
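The EM estimation step described above can be illustrated with a short sketch. This is not MLOCUS and does not reproduce its three-step procedure; the function names, the 0/1/2 genotype coding (number of copies of allele 2 at each locus) and the toy data are all assumptions made for illustration, and Hardy-Weinberg proportions are assumed when weighting the possible phase resolutions.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with one unphased genotype.

    genotype: tuple giving, per locus, the number of copies of allele '2' (0, 1 or 2).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = ["2" if g == 2 else "1" for g in genotype]
    pairs = set()
    for bits in product("12", repeat=len(het)):
        h1, h2 = base[:], base[:]
        for i, b in zip(het, bits):
            h1[i] = b
            h2[i] = "1" if b == "2" else "2"
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    expansions = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in expansions for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in expansions:
            # E-step: posterior weight of each phase resolution under HWE
            weights = [(1 if a == b else 2) * freq[a] * freq[b] for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes
        new = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < tol:
            break
    return freq


# Hypothetical two-locus sample: four 11/11, two 22/22 and two double heterozygotes.
genotypes = [(0, 0)] * 4 + [(2, 2)] * 2 + [(1, 1)] * 2
freqs = em_haplotype_frequencies(genotypes)
print({h: round(f, 4) for h, f in freqs.items()})
```

In this toy sample the double heterozygotes are the only ambiguous individuals, and the EM iterations assign them almost entirely to the 11/22 resolution because those two haplotypes are already common among the unambiguous individuals.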

Validation of haplotype estimation

Haplotype frequencies are determined by direct counting of whole chromosomes in the grandparents after haplotypes are established by pedigree analysis. Haplotypes were estimated using MLOCUS with unphased genotype data from these same individuals. Comparisons of the two methods were performed with genotype data from two regions: the chemokine cluster (six genes) on chromosome 17q11-12 and the chemokine receptor cluster (four genes) on chromosome 3p21. For the 17q11-12 data, two analyses were performed: one included all nine SNPs typed in all six genes arrayed over 2 Mb, and the other included only six of these SNPs in the 77 kb 'core' region of three genes (MPIF-1, PARC, MIP-1α) on 17q11-12. The analysis of the 3p21 chemokine receptor genes included 14 SNPs arrayed over 150 kb.

The IF and IH algorithm performance indices suggested by Excoffier and Slatkin (1995) were used to quantitatively evaluate the estimation results in the CEPH grandparents [37]. The IH index evaluates the performance of the algorithm to identify the actual haplotypes, and the IF statistic examines how close the estimated frequencies are to the pedigree haplotype frequencies. IH and IF values were calculated using only those haplotypes above the threshold frequency (1/2n). A mean squared error (MSE) statistic was also used to compare the estimated haplotype frequencies to the pedigree-derived frequencies [38]. To determine whether omitting those grandparents who could not be phased from the analysis generates skewed pedigree-derived haplotype frequencies, MLOCUS haplotype estimations of the total sample (n = 103) were compared to the 'phased-only' sample using the above-described performance indices.
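A minimal sketch of the three performance measures follows. It assumes the usual forms of the Excoffier and Slatkin indices, IF = 1 − ½ Σ|p̂ − p| and IH = 2(k_true − k_missed)/(k_true + k_est), with the MSE taken as the mean of the squared frequency differences; the function names are illustrative. The frequencies are transcribed from Table 2 (3p21, n = 103), and IF and MSE are computed here over every listed haplotype, which is an assumption that happens to reproduce the totals printed in that table.

```python
def similarity_index(true_freq, est_freq):
    """I_F: 1 minus half the summed absolute frequency differences."""
    haps = set(true_freq) | set(est_freq)
    return 1.0 - 0.5 * sum(
        abs(true_freq.get(h, 0.0) - est_freq.get(h, 0.0)) for h in haps
    )


def identity_index(true_freq, est_freq, threshold):
    """I_H: agreement between the sets of haplotypes at or above the threshold."""
    true_haps = {h for h, f in true_freq.items() if f >= threshold}
    est_haps = {h for h, f in est_freq.items() if f >= threshold}
    k_missed = len(true_haps - est_haps)
    return 2.0 * (len(true_haps) - k_missed) / (len(true_haps) + len(est_haps))


def mean_squared_error(true_freq, est_freq):
    """Mean of the squared differences between the two frequency sets."""
    haps = set(true_freq) | set(est_freq)
    return sum(
        (true_freq.get(h, 0.0) - est_freq.get(h, 0.0)) ** 2 for h in haps
    ) / len(haps)


# Pedigree-derived and MLOCUS frequencies as printed in Table 2.
pedigree = {
    "11111111111111": 0.2913, "11111112221111": 0.1893, "11112122221111": 0.1699,
    "11211211111111": 0.1019, "21111121211121": 0.0971, "11111111111211": 0.0874,
    "11111111112111": 0.0243, "11112121111111": 0.0194, "11112121211121": 0.0097,
    "11212122221111": 0.0049, "11111121211111": 0.0049,
}
mlocus = {
    "11111111111111": 0.2936, "11111112221111": 0.1909, "11112122221111": 0.1719,
    "11211211111111": 0.1001, "21111121211121": 0.0882, "11111111111211": 0.0919,
    "11111111112111": 0.0223, "11112121111111": 0.0186, "11112121211121": 0.0098,
    "11212122221111": 0.0049, "11111121211111": 0.0049, "11211211211111": 0.0022,
}

n = 103
threshold = 1.0 / (2 * n)  # 1/2n, about 0.004854
i_f = similarity_index(pedigree, mlocus)
i_h = identity_index(pedigree, mlocus, threshold)
mse = mean_squared_error(pedigree, mlocus)
print(f"IF = {i_f:.4f}, IH = {i_h:.4f}, MSE = {mse:.5f}")
```

The MLOCUS-only haplotype (frequency 0.0022) falls below the 1/2n threshold and therefore does not count against IH, while it still contributes to IF and MSE in this sketch.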

Estimating linkage disequilibrium in population data

D' statistics were calculated with phased haplotypes derived from pedigree analysis with DnaSP (v3.53) [39]. Linkage disequilibrium estimates generated by haplotypes determined by pedigree analysis in the CEPH grandparents were compared with those estimates calculated from MLOCUS reconstructed haplotypes in the same datasets. PAIRWISE was used to estimate linkage disequilibrium from the estimated haplotypes generated by MLOCUS [30]. PAIRWISE generates Lewontin's normalised D' statistic [40] and the p-value determined from an exact test of association between all pairs of polymorphic loci in the dataset.
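As an illustration of the statistic being compared, the sketch below computes a signed Lewontin's D' from the four two-locus haplotype counts. It is not the DnaSP or PAIRWISE code, and the function name is an assumption. The example counts were tallied from the pedigree haplotypes in Table 2 on the assumption that positions in the 14-character haplotype strings follow the Table 1 order of the 3p21 SNPs (position 7 = CCR2-N260N, position 8 = CCR5-208, position 1 = CCR3-Y17Y, position 13 = CCRL2-I243V).

```python
def lewontin_d_prime(n11, n12, n21, n22):
    """Signed D' from two-locus haplotype counts.

    n_xy is the number of chromosomes carrying allele x at the first locus
    and allele y at the second; D is measured for the 1-1 haplotype, so a
    negative value means allele 1 is associated with allele 2.
    """
    n = n11 + n12 + n21 + n22
    p_a = (n11 + n12) / n  # frequency of allele 1 at the first locus
    p_b = (n11 + n21) / n  # frequency of allele 1 at the second locus
    d = n11 / n - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d / d_max if d_max > 0 else 0.0


# CCR2-N260N vs CCR5-208, counts tallied from Table 2 (206 chromosomes).
d_centre = lewontin_d_prime(n11=104, n12=39, n21=27, n22=36)

# CCR3-Y17Y vs CCRL2-I243V, the two ends of the 150 kb region.
d_extremes = lewontin_d_prime(n11=184, n12=2, n21=0, n22=20)

print(round(d_centre, 3), round(d_extremes, 3))
```

Under the stated position mapping these counts give values that agree with the D' of 0.326 and the D' of 1 reported for these two comparisons in the Results.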

Results

Haplotype analysis of 3p21 SNPs

To determine the haplotype structure of SNPs in the 3p21 region, we typed 14 polymorphisms in the CEPH pedigrees. Eleven of the 14 SNPs were polymorphic and none of these SNPs deviated from Hardy-Weinberg equilibrium (HWE) at the p = 0.05 significance level. Haplotype phase was established for every grandparent in the sample (n = 103). Haplotype frequencies were then determined by direct counting of whole haplotypes (Table 2). Nine haplotypes explained nearly all of the variation (98 per cent) in the CEPH grandparents. The remaining 2 per cent is composed of two haplotypes that occur only once.
Table 2

Results from a comparison of pedigree-derived and estimated haplotype frequencies (n = 103).

| Haplotype number | Haplotype | GP count | Pedigree frequency | MLOCUS frequency | Similarity index | MSE |
|---|---|---|---|---|---|---|
| 1 | 11111111111111 | 60 | 0.2913 | 0.2936 | 0.0024 | 0.00001 |
| 2 | 11111112221111 | 39 | 0.1893 | 0.1909 | 0.0015 | 0.00000 |
| 3 | 11112122221111 | 35 | 0.1699 | 0.1719 | 0.0020 | 0.00000 |
| 5 | 11211211111111 | 21 | 0.1019 | 0.1001 | 0.0019 | 0.00000 |
| 4 | 21111121211121 | 20 | 0.0971 | 0.0882 | 0.0089 | 0.00008 |
| 6 | 11111111111211 | 18 | 0.0874 | 0.0919 | 0.0045 | 0.00002 |
| 8 | 11111111112111 | 5 | 0.0243 | 0.0223 | 0.0020 | 0.00000 |
| 7 | 11112121111111 | 4 | 0.0194 | 0.0186 | 0.0008 | 0.00000 |
| | 11112121211121 | 2 | 0.0097 | 0.0098 | 0.0001 | 0.00000 |
| | 11212122221111 | 1 | 0.0049 | 0.0049 | 0.0000 | 0.00000 |
| | 11111121211111 | 1 | 0.0049 | 0.0049 | 0.0000 | 0.00000 |
| | 11211211211111 | 0 | 0.0000 | 0.0022 | 0.0022 | 0.00000 |
| Total | | 206 | 1.0000 | 0.9993 | 0.9869 | 0.00001 |


The total similarity index (IF) and mean squared error (MSE) values are indicated at the bottom of the table. Haplotypes that are only present in MLOCUS estimates are denoted in italics. The haplotype number indicates the equivalent haplotype to those seven-SNP haplotypes discussed in Clark et al. (2001) [25].

The diplotypes, or multi-locus genotypes, were also counted in the CEPH grandparent sample. The diplotype combination of haplotypes 1 and 3 was the most frequent in the sample, at 13 per cent. In the CEPH grandparent sample, the 3p21 haplotypes were in HWE, as the randomisation test of the distribution of diplotypes yielded a non-significant p-value of 0.2708. When analysed individually, the 11 polymorphisms demonstrated no deviations from HWE in the CEPH grandparent sample.

Both haplotype block tests, the FGT and the minimal-D' method (set to the default of a minimum D' = 0.80), found a break between CCR2-N260N and CCR5-208. This indicates a past recombination event somewhere in the 20 kb between CCR2 and CCR5. The pedigree haplotypes support this: although no recombination event was directly observed in the pedigree data, one haplotype (11112121211121) appeared to be a recombinant of haplotypes 4 (21111121211121) and 7 (11112121111111).
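The four-gamete test mentioned above can be sketched in a few lines. This is only an illustration of the test itself, not of HaploBlock-Finder; the function name and the zero-based indices are assumptions, as is the mapping of string positions to SNPs (positions taken to follow the Table 1 order, so index 6 is CCR2-N260N, index 7 is CCR5-208, index 0 is CCR3-Y17Y and index 12 is CCRL2-I243V). The haplotypes are the eleven pedigree-derived haplotypes listed in Table 2.

```python
# The eleven pedigree-derived 3p21 haplotypes from Table 2.
PEDIGREE_HAPLOTYPES = [
    "11111111111111", "11111112221111", "11112122221111", "11211211111111",
    "21111121211121", "11111111111211", "11111111112111", "11112121111111",
    "11112121211121", "11212122221111", "11111121211111",
]


def four_gamete_test(haplotypes, i, j):
    """Return True if all four two-locus gametes are observed at sites i and j.

    Under an infinite-sites model without recurrent mutation, seeing all four
    gametes implies at least one past recombination between the two sites.
    """
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4


# Adjacent sites flanking the CCR2/CCR5 boundary (assumed indices 6 and 7).
break_between_ccr2_ccr5 = four_gamete_test(PEDIGREE_HAPLOTYPES, 6, 7)

# The two ends of the region (assumed indices 0 and 12).
break_between_extremes = four_gamete_test(PEDIGREE_HAPLOTYPES, 0, 12)

print(break_between_ccr2_ccr5, break_between_extremes)
```

With this mapping, all four gametes are seen between CCR2-N260N and CCR5-208, whereas only three are seen between the two ends of the region, which is consistent with the block break and the intact disequilibrium described in the text.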

Haplotype analysis of 17q11-12 SNPs

To characterise the chemokine loci on chromosome 17, haplotype analysis was performed using all nine SNPs (over a 2 Mb region), as well as a subset of six SNPs arrayed over the 73 kb region, which includes MPIF-1, PARC and MIP-1α. Conclusive phase was established for only 87 individuals of the 103 in the CEPH grandparent sample for nine-SNP haplotypes. A total of 70 per cent of the variation of the total sample (n = 103) was explained by 14 haplotypes (of nine SNPs) (Table 3). The remaining portion included 11 doubleton haplotypes (found in two individuals), ten singletons (occurred only once), as well as the 32 unphased chromosomes. When the analysis was reduced to six SNPs in the 73 kb region (Table 4), we were able to phase 96 grandparents by visual inspection of the pedigrees. Haplotype phase was not definitely assigned to seven of the 103 grandparents because two or more haplotype combinations could be inferred, given the diplotypes of their children, or because of missing data. Eight six-SNP haplotypes explain 90 per cent of the variation in the CEPH grandparent sample (n = 103), and 41 per cent of the total number of chromosomes carry the most common haplotype (111111) (Table 4). The remaining 10 per cent of the total number of chromosomes (2n = 206) comprises two doubletons, two singletons and the 14 unphased chromosomes.
Table 3

Comparison of pedigree-phased haplotypes for nine SNPs over 2 Mb of 17q11-12 in Centre d'Etude Polymorphisme Humain (CEPH) grandparents (n = 87) with MLOCUS estimates from unphased genotype data from these same individuals

| Haplotype | Count | Frequency | MLOCUS | Similarity index | MSE |
|---|---|---|---|---|---|
| 111111111 | 36 | 0.2069 | 0.2374 | 0.0305 | 0.0009 |
| 111111122 | 22 | 0.1264 | 0.1332 | 0.0067 | 0.0000 |
| 121111111 | 19 | 0.1092 | 0.1450 | 0.0358 | 0.0013 |
| 211111111 | 18 | 0.1034 | 0.0631 | 0.0404 | 0.0016 |
| 211111122 | 9 | 0.0517 | 0.0605 | 0.0088 | 0.0001 |
| 111211111 | 8 | 0.0460 | 0.0245 | 0.0214 | 0.0005 |
| 111111121 | 6 | 0.0345 | 0.0341 | 0.0004 | 0.0000 |
| 121122111 | 5 | 0.0287 | 0.0145 | 0.0143 | 0.0002 |
| 121111121 | 4 | 0.0230 | 0.0198 | 0.0032 | 0.0000 |
| 121211122 | 3 | 0.0172 | 0.0000 | 0.0172 | 0.0003 |
| 211122111 | 3 | 0.0172 | 0.0315 | 0.0143 | 0.0002 |
| 112111122 | 3 | 0.0172 | 0.0168 | 0.0004 | 0.0000 |
| 112111111 | 3 | 0.0172 | 0.0196 | 0.0023 | 0.0000 |
| 121111122 | 3 | 0.0172 | 0.0209 | 0.0037 | 0.0000 |
| 121211111 | 2 | 0.0115 | 0.0051 | 0.0064 | 0.0000 |
| 212111122 | 2 | 0.0115 | 0.0000 | 0.0115 | 0.0001 |
| 212111111 | 2 | 0.0115 | 0.0345 | 0.0230 | 0.0005 |
| 211221211 | 2 | 0.0115 | 0.0165 | 0.0050 | 0.0000 |
| 211111121 | 2 | 0.0115 | 0.0088 | 0.0027 | 0.0000 |
| 211211111 | 2 | 0.0115 | 0.0218 | 0.0103 | 0.0001 |
| 111121211 | 2 | 0.0115 | 0.0000 | 0.0115 | 0.0001 |
| 122211111 | 2 | 0.0115 | 0.0113 | 0.0002 | 0.0000 |
| 111122111 | 2 | 0.0115 | 0.0000 | 0.0115 | 0.0001 |
| 122111122 | 2 | 0.0115 | 0.0000 | 0.0115 | 0.0001 |
| 111211122 | 2 | 0.0115 | 0.0245 | 0.0130 | 0.0002 |
| 111111112 | 1 | 0.0057 | 0.0000 | 0.0057 | 0.0000 |
| 112211111 | 1 | 0.0057 | 0.0000 | 0.0057 | 0.0000 |
| 121222111 | 1 | 0.0057 | 0.0000 | 0.0057 | 0.0000 |
| 211211122 | 1 | 0.0057 | 0.0000 | 0.0057 | 0.0000 |
| 111221211 | 1 | 0.0057 | 0.0000 | 0.0057 | 0.0000 |
| 121111112 | 1 | 0.0057 | 0.0066 | 0.0009 | 0.0000 |
| 111222111 | 1 | 0.0057 | 0.0136 | 0.0078 | 0.0001 |
| 122111111 | 1 | 0.0057 | 0.0000 | 0.0057 | 0.0000 |
| 211122122 | 1 | 0.0057 | 0.0000 | 0.0057 | 0.0000 |
| 111121212 | 1 | 0.0057 | 0.0000 | 0.0057 | 0.0000 |
| Total | 174 | 1.0000 | 0.9636 | 0.8196 | 0.0002 |


The IF (similarity index) and the mean squared error (MSE) for the two haplotype analyses are indicated at the bottom of the table. Those haplotypes present only in the MLOCUS analysis (less than 1 per cent frequency) are not included.

Table 4

Comparison of MLOCUS estimated to pedigree-phased haplotypes (n = 96) for six SNPs in 79 kb 'core' region of 17q11-12

| No. | Haplotype | GP count | Frequency | MLOCUS | Similarity index | MSE |
|---|---|---|---|---|---|---|
| 1 | 111111 | 85 | 0.4427 | 0.4519 | 0.0092 | 0.00008 |
| 2 | 111122 | 46 | 0.2396 | 0.2632 | 0.0236 | 0.00056 |
| 3 | 211111 | 20 | 0.1042 | 0.0934 | 0.0108 | 0.00012 |
| 4 | 111121 | 12 | 0.0625 | 0.0610 | 0.0015 | 0.00000 |
| 5 | 122111 | 10 | 0.0521 | 0.0503 | 0.0018 | 0.00000 |
| 6 | 211122 | 6 | 0.0313 | 0.0148 | 0.0164 | 0.00027 |
| 7 | 221211 | 4 | 0.0208 | 0.0254 | 0.0045 | 0.00002 |
| 8 | 111112 | 3 | 0.0156 | 0.0063 | 0.0093 | 0.00009 |
| 9 | 121211 | 2 | 0.0104 | 0.0046 | 0.0058 | 0.00003 |
| 10 | 222111 | 2 | 0.0104 | 0.0174 | 0.0070 | 0.00005 |
| 11 | 121212 | 1 | 0.0052 | 0.0065 | 0.0013 | 0.00000 |
| 12 | 122122 | 1 | 0.0052 | 0.0000 | 0.0052 | 0.00003 |
| 13 | 211112 | 0 | 0.0000 | 0.0052 | 0.0052 | 0.00003 |
| Total | | 192 | 1.0000 | 1.0000 | 0.9491 | 0.00010 |


Those haplotypes that are only present in the MLOCUS estimation results are denoted in italics.

Diplotypes were assigned to all individuals for which phase was established (n = 96) for the six SNPs in MPIF-1, PARC and MIP-1α. The most frequent diplotype combination included haplotypes 1 and 2, at 28 per cent in the CEPH grandparents. There was no significant deviation from Hardy-Weinberg proportions for the six-SNP multi-site genotypes, with a randomisation p-value of 0.1102. When analysing the SNPs individually for HWE, one SNP, PARC (-116), showed a significant deviation using a χ2 test, at p = 0.012, which did not survive a Bonferroni multiple-test correction.
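The single-SNP test and the multiple-test correction can be sketched as follows. The genotype counts are hypothetical (the per-genotype counts are not given in the text), the function name is illustrative, and the number of tests used for the Bonferroni correction (six, one per SNP in the core region) is an assumption. The χ2 p-value for one degree of freedom is obtained from the complementary error function, so no external library is needed.

```python
import math


def hwe_chi_square(n_11, n_12, n_22):
    """Chi-square test (1 d.f.) of Hardy-Weinberg proportions at a biallelic SNP.

    Assumes the SNP is polymorphic in the sample, so no expected count is zero.
    """
    n = n_11 + n_12 + n_22
    p = (2 * n_11 + n_12) / (2 * n)  # frequency of allele 1
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_11, n_12, n_22)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts for 96 individuals with a heterozygote deficit.
chi2_example, p_example = hwe_chi_square(50, 30, 16)

# Counts in exact Hardy-Weinberg proportions give chi2 = 0 and p = 1.
chi2_null, p_null = hwe_chi_square(25, 50, 25)

# Bonferroni: compare the reported p = 0.012 with alpha divided by the number of tests.
alpha, n_tests = 0.05, 6  # n_tests = 6 is an assumption
survives_bonferroni = 0.012 < alpha / n_tests

print(round(chi2_example, 2), p_example < alpha, survives_bonferroni)
```

With six tests the corrected threshold is about 0.0083, so a nominal p-value of 0.012 is significant on its own but not after correction, as stated in the text.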

Validation of the EM algorithm on 3p21 and 17q11-12

To validate the accuracy of the EM algorithm, we compared the pedigree-derived haplotypes to those estimated haplotypes generated by MLOCUS. The 3p21 haplotype distributions were nearly identical to the estimated frequencies (Table 2). The similarity (IF) and identity (IH) indices were calculated for haplotypes in the CEPH grandparent sample (n = 103) for 14 SNPs. For the 14-SNP haplotypes in 3p21, as indicated in Table 2, the similarity index (IF) was 0.9869. An IF of 1.0 would indicate perfect concordance between the haplotype frequencies generated by the two methods. The identity index (IH) for these data was exactly 1.0, as all haplotypes derived by pedigree analysis were present in the MLOCUS results. One estimated haplotype was dropped from the analysis, as it was below the frequency threshold (1/2n = 0.004854), as suggested by Excoffier and Slatkin (1995) [37]. The MSE incorporates the overall difference in frequencies between actual (pedigree-derived) and estimated frequencies for all H haplotypes. The MSE for the 3p21 haplotypes was small (0.00001), which, again, indicates that the two frequency distributions are nearly identical.

As mentioned previously, phase could not be determined for the nine SNPs typed on chromosome 17q11-12 for all grandparents. Haplotype frequencies were determined, both by whole chromosome counting and by estimation, with data from 87 out of 103 individuals. The similarity index (IF) for the distribution of frequencies for the 43 haplotypes (nine SNPs) in this region is 0.8196, as indicated in Table 3. The haplotype estimation yielded 24 haplotypes with frequencies over the threshold value (1/(2n) = 0.0057) and missed 13 haplotypes that were present in the pedigree data. The IH statistic for these data is 0.7457. The MSE for the nine-SNP haplotypes is 0.0002, as indicated in Table 3. The EM algorithm also generated seven low frequency haplotypes (less than 1 per cent, not shown) that were not observed in the pedigree analysis. Constraining the MLOCUS analysis by removing these haplotypes did not significantly improve the MLE. This constrained analysis also resulted in the generation of other spurious low-frequency haplotypes, indicating that the EM algorithm could not effectively resolve haplotype phase for some individuals in the nine-SNP dataset.

Not surprisingly, paring the analysis down to the six SNPs in the 77 kb region that contains MPIF-1, PARC and MIP-1α yields more accurate haplotype estimates. Ninety-six grandparents were included in this analysis, as phase could not be determined for seven of the 103 individuals in the total sample. As indicated in Table 4, the IF statistic increased to 0.9491, and the IH of 0.9167 is closer to perfect identity (1.0). The MSE is also closer to zero, at 0.0001.
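As a rough check on the nine-SNP identity index, the calculation can be written out under two assumptions: that IH takes the usual Excoffier and Slatkin form, and that the number of true haplotypes is the 35 pedigree haplotypes listed in Table 3. The symbols k_true, k_est and k_missed are notation introduced here; the values 24 and 13 are those given in the text.

```latex
I_H = \frac{2\,(k_{\mathrm{true}} - k_{\mathrm{missed}})}{k_{\mathrm{true}} + k_{\mathrm{est}}}
    = \frac{2\,(35 - 13)}{35 + 24}
    = \frac{44}{59}
    \approx 0.746
```

Under these assumptions the result agrees with the reported value of 0.7457 to within rounding.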

Comparisons of MLOCUS haplotype estimates for 17q11-12

Omitting the unphased chromosomes from the pedigree haplotype frequency calculation of the 17q11-12 SNPs is a potential source of bias, as those individuals for whom complete resolution is not possible may have a higher per site heterozygosity than randomly sampled individuals. Additionally, those 'unphasable' individuals may carry haplotypes that are not present in the phased portion of the sample. To test if using only the phased individuals generates skewed 'pedigree-derived' 17q11-12 haplotype frequencies, MLOCUS haplotype frequency estimates were generated from both the total dataset of unphased genotypes (n = 103; data not shown) and those genotypes only from the phased individuals: n = 87 for the nine-SNP haplotypes (Table 3) and n = 96 for the six-SNP haplotypes (Table 4). Comparisons of nine-SNP MLOCUS haplotypes (above 1 per cent frequency) from the whole sample (n = 103) and the phased sample (n = 87) yielded an IH of 0.9729, an IF of 0.9313 and an MSE of 0.00007. The same comparison performed on the six-SNP haplotypes yielded an IH of 1, an IF of 0.9838 and an MSE of 0.00002. One nine-SNP haplotype present in the total sample (at a frequency of 0.015) was missed in the 'phased-only' sample, while in the six-SNP analysis, both sets of genotypes generated identical haplotypes. The potential bias of removing the unphased grandparents from the haplotype analyses appears to be slight, as the index values indicate that the haplotype frequencies generated by the two datasets (the complete sample and the 'phased-only' sample) are very similar, particularly for the six-SNP haplotypes.

Comparisons of methods to estimate linkage disequilibrium

Both phased haplotypes and unphased genotype data from the CEPH grandparents (n = 103) were used to estimate the extent of pairwise linkage disequilibrium (described by D') between SNPs in the chemokine receptor region on chromosome 3p21 and the chemokine cluster on chromosome 17q11-12. The D' statistic (above the diagonal) and the measure of statistical significance (p-value) (below the diagonal) are presented for pairwise comparisons of the 11 polymorphic sites in 3p21 in Table 5. Negative values indicate that there is disequilibrium between opposite alleles at the two SNPs (ie allele 1 at the first SNP and allele 2 at the second SNP, where the common allele is allele 1).
Table 5

Estimated D' values generated by two methods for all polymorphic loci in the 3p21 chemokine receptor gene region in the CEPH sample

Numbers above the diagonal for each table indicate the pairwise D' value for pairs of SNPs in the CEPH grandparent sample (n = 103). p values for each test are indicated below the diagonal. Values in the upper table (A) indicate D' values calculated in DnaSP from the pedigree-derived haplotypes. Values in the lower table (B) indicate D' values calculated using the PAIRWISE program from the MLOCUS haplotype frequency estimates. Those values in boxed cells indicate non-significant results. D' estimates in the lower table denoted in italics indicate differences from the values generated in DnaSP for that particular comparison of loci.

The D'-values generated from analyses of the 3p21 polymorphisms by the DnaSP and PAIRWISE programs were, for the most part, very similar. The three differences, noted in bold, are slight. As discussed previously, the haplotypes generated by the EM algorithm were essentially identical when compared with those discerned by pedigree analysis for the variants in this region. The analysis of both the haplotypes and the unphased genotype data indicated that linkage disequilibrium in this 150 kb region of 3p21 is high in the CEPH grandparents. There is intact linkage disequilibrium (D' = 1) between two SNPs at the extremes of the region (CCR3-Y17Y and CCRL2-I243V), preserved primarily on haplotype 4 (21111121211121). The relative loss of linkage disequilibrium in the centre of the region, between CCR2-N260N and two SNPs in the CCR5 promoter, CCR5-208 (D' = 0.326) and CCR5-676 (D' = 0.326), was detected by haplotype block analysis, indicating past recombination between these two genes.

It is not surprising that the DnaSP analysis of haplotypes on 17q11-12 indicated no evidence of long-range linkage disequilibrium between variants at the extremes of the 2 Mb region. There is significant linkage disequilibrium between the SNPs typed in MCP-1 and nearby Eotaxin (D' = -1) at the centromeric end of the region. Likewise, there is some significant allelic association between SNPs in MIP-1α, PARC and MPIF-1, which are within 77 kb of each other. The relative lack of association between more distal SNPs seems to have hampered the ability of the PAIRWISE analysis of unphased genotype data to accurately detect the extent of linkage disequilibrium, when compared with the DnaSP analysis of whole haplotypes. This lack of sensitivity is especially evident in the analyses of all nine SNPs, as the multitude of haplotypes (including spurious haplotypes generated by the EM estimation) created false-positive associations between distal variants (such as between MCP-1 and SNPs in PARC) (Tables 6 and 7).
Table 6

Estimated D' values generated by two methods for all nine SNPs in the 2 Mb chemokine gene region on chromosome 17q11-12 in CEPH grandparents (n = 87)

Numbers above the diagonal for each table indicate the pairwise D' value for pairs of SNPs in the CEPH grandparent sample. p values for each test are indicated below the diagonal. Values in the upper table (A) indicate D' values calculated in DnaSP from the pedigree-derived haplotypes. Values in the lower table (B) indicate D' values calculated using the PAIRWISE program from the MLOCUS haplotype frequency estimates. Those values in boxed cells indicate non-significant results. D' estimates in the lower table denoted in italics indicate differences from the values generated in DnaSP for that particular comparison of loci.

Table 7

Estimated D' values generated by two methods for six SNPs in the 79 kb 'core' region of three chemokine genes on chromosome 17q11-12 in CEPH

Numbers above the diagonal for each table indicate the pairwise D' value for pairs of SNPs in the CEPH grandparent sample (n = 96). p values for each test are indicated below the diagonal. Values in the upper table (A) indicate D' values calculated in DnaSP from the pedigree-derived haplotypes. Values in the lower table (B) indicate D' values calculated using the PAIRWISE program from the MLOCUS haplotype frequency estimates. Those values in boxed cells indicate non-significant results. D' estimates in the lower table denoted in italics indicate differences from the values generated in DnaSP for that particular comparison of loci.

Discussion

Given the potential accuracy of low-cost statistical methods, and the current high cost of molecular haplotyping and pedigree analysis, statistical estimation to determine haplotypes may be a cost-effective strategy for many gene regions. At a minimum, statistical estimation can be used to determine the overall need for molecular haplotyping and to specify where in the dataset molecular haplotyping would provide the most benefit [41-43]. Independent assessments of the effectiveness of the EM algorithm have been discussed at length [38, 44, 45]. Xu et al. (2002) discuss a comparison of three computational algorithms for estimating haplotype frequencies: the Clark (1990) rule-based method [47], the EM algorithm and the Stephens et al. (2001) Bayesian PHASE method [48]. Using previously described criteria [37], Xu et al. found that all three methods performed better for regions with a high degree of linkage disequilibrium, such as in the NAT2 gene, than for regions where linkage disequilibrium is not maintained (chromosome 8p22), when compared with haplotypes determined by molecular methods [46].

The purpose of the evaluation presented here is to establish the accuracy of statistical estimation in these chemokine and chemokine receptor gene clusters. Estimated haplotypes from unphased genotypes were compared with haplotypes derived empirically from pedigree analysis in the CEPH grandparent sample (n = 103). How the EM algorithm responds to irregular linkage disequilibrium, sample size, different levels of polymorphism and deviations from HWE is critical for the effectiveness of haplotype estimation [38]. These conditions will be affected by the genomic environment of the region of interest, the history of the population from which the samples were selected and the quality of the genotype data. While these validation results cannot control for all these variables, an attempt was made to explore how the EM algorithm responds to the conditions of the gene clusters studied on chromosomes 3p21 and 17q11-12 in a European-derived sample set.

A greater degree of linkage disequilibrium between SNPs, and therefore fewer haplotypes, increases the accuracy of the EM algorithm and aids subsequent estimates of measures of linkage disequilibrium (such as D'). This is evident from the results of estimations of haplotype frequency and linkage disequilibrium in the 150 kb region on 3p21. Relatively few haplotypes explain the variation between these SNPs, at least in the CEPH grandparent sample. Indeed, there is intact linkage disequilibrium at the extremes of this region, as CCR3-Y17Y and CCRL2-I243V have a pairwise D'-value of 1. The haplotype block analysis also indicates a fairly simple structure, as both tests applied here found only two blocks, with what appeared to be a past recombination event between CCR2 and CCR5.
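To make the estimation step concrete, the following is a minimal sketch of EM haplotype frequency estimation reduced to two biallelic loci, where the only phase-ambiguous individuals are double heterozygotes. It is a simplified illustration, not the MLOCUS program used in this study; the function name `em_two_locus`, the genotype coding (number of copies of allele '2' at each locus) and the convergence settings are assumptions made for this example, which also assumes a non-empty sample and random mating.

```python
def em_two_locus(genotypes, n_iter=100, tol=1e-10):
    """EM estimate of two-locus haplotype frequencies from unphased genotypes.

    `genotypes` is a non-empty list of (a, b) pairs giving the number of
    copies (0, 1 or 2) of allele '2' at locus A and locus B
    (hypothetical coding chosen for this sketch).
    Returns a dict of frequencies for haplotypes '11', '12', '21', '22'.
    """
    haps = ["11", "12", "21", "22"]
    fixed = dict.fromkeys(haps, 0.0)  # haplotype counts with unambiguous phase
    n_dh = 0                          # number of double heterozygotes
    for a, b in genotypes:
        if a == 1 and b == 1:
            n_dh += 1
            continue
        # At least one locus is homozygous, so both haplotypes are known.
        alleles_a = "11" if a == 0 else "22" if a == 2 else "12"
        alleles_b = "11" if b == 0 else "22" if b == 2 else "12"
        for x, y in zip(alleles_a, alleles_b):
            fixed[x + y] += 1

    total = 2 * len(genotypes)
    freq = dict.fromkeys(haps, 0.25)  # uninformative starting values
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote carries 11/22
        # rather than 12/21, given the current frequency estimates.
        cis = freq["11"] * freq["22"]
        trans = freq["12"] * freq["21"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: add the expected counts from the ambiguous individuals.
        new = {
            "11": (fixed["11"] + n_dh * w) / total,
            "22": (fixed["22"] + n_dh * w) / total,
            "12": (fixed["12"] + n_dh * (1 - w)) / total,
            "21": (fixed["21"] + n_dh * (1 - w)) / total,
        }
        converged = max(abs(new[h] - freq[h]) for h in haps) < tol
        freq = new
        if converged:
            break
    return freq
```

In this sketch, when the unambiguous individuals carry only the 11 and 22 haplotypes, the double heterozygotes are assigned almost entirely to the 11/22 phase, which mirrors the point above that few haplotypes and strong linkage disequilibrium leave little room for estimation error.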

The degree of linkage disequilibrium between SNPs is one of the most important factors in the ability of the EM algorithm to properly detect haplotypes in population samples [38, 44]. The analysis presented here shows that the EM algorithm accurately describes the haplotype structure and patterns of pairwise linkage disequilibrium on chromosome 3p21 (a region of higher linkage disequilibrium). As for chromosome 17, it is important to note that, because of the relatively few SNPs assessed (a total of nine), this analysis is a low-resolution evaluation of haplotypes and linkage disequilibrium across a large region (2 Mb). While including only the 'core' region of 17q11-12 yields more accurate estimates of haplotype frequencies and linkage disequilibrium, these analyses still include a relatively sparse sampling of SNPs (six in 77 kb). The results of the pedigree analysis indicate that, while haplotype estimations in the chemokine receptor cluster on 3p21 may be fairly straightforward, special care must be taken for any haplotype inference in the chemokine genes on chromosome 17. More SNP genotype data, especially in the chromosome 17 chemokine genes, will no doubt aid in further characterisation of variation and linkage disequilibrium in these gene regions, as well as improve the accuracy of future haplotype analyses.

Declarations

Acknowledgements

The authors would like to thank Bert Gold, Jim Lautenberger, Raymond Peterson and George Nelson for helpful discussion of haplotype analysis. Bill Modi, Noah Metheny and Julie Bergeron provided technical assistance with some TaqMan assays, and the LGD Cell Repository aided with DNA extraction. Jeff Long provided software and Carrie Pfaff performed necessary program modifications for this analysis. We also wish to thank the two anonymous reviewers for helpful comments.

Authors’ Affiliations

(1) Laboratory of Genomic Diversity, Human Genetics Section, National Cancer Institute
(2) Department of Human Genetics, University of Chicago, 515 CHSC

References

  1. Bazan JF, Bacon KB, Hardiman G, et al: 'A new class of membrane-bound chemokine with a CX3C motif'. Nature. 1997, 385: 640-644. 10.1038/385640a0.
  2. Pan Y, Lloyd C, Zhou W, et al: 'Neurotactin, a membrane-anchored chemokine upregulated in brain inflammation [published erratum appears in Nature (1997), Vol. 389, p.100]'. Nature. 1997, 387: 611-617. 10.1038/42491.
  3. Yoshida T, Imai T, Kakizaki H, et al: 'Molecular cloning of a novel C or gamma type chemokine, SCM-1'. FEBS Lett. 1995, 360: 155-159. 10.1016/0014-5793(95)00093-O.
  4. Yoshida T, Imai T, Kakizaki H, et al: 'Identification of single C motif-1/lymphotactin receptor XCR1'. J Biol Chem. 1998, 273: 16551-16554. 10.1074/jbc.273.26.16551.
  5. Cocchi F, DeVico AL, Garzino-Demo A, et al: 'Identification of RANTES, MIP-1 alpha, and MIP-1 beta as the major HIV-suppressive factors produced by CD8 + T cells'. Science. 1995, 270: 1811-1815. 10.1126/science.270.5243.1811.
  6. Dean M, Carrington M, Winkler C, et al: 'Genetic restriction of HIV-1 infection and progression to AIDS by a deletion allele of the CKR5 structural gene. Hemophilia Growth and Development Study, Multicenter AIDS Cohort Study, Multicenter Hemophilia Cohort Study, San Francisco City Cohort, ALIVE Study'. Science. 1996, 273: 1856-1862. 10.1126/science.273.5283.1856.
  7. Smith MW, Carrington M, Winkler C, et al: 'CCR2 chemokine receptor and AIDS progression'. Nat Med. 1997, 3: 1052-1053.
  8. Kostrikis LS, Huang Y, Moore JP, et al: 'A chemokine receptor CCR2 allele delays HIV-1 disease progression and is associated with a CCR5 promoter mutation'. Nature Med. 1998, 4: 350-353. 10.1038/nm0398-350.
  9. Martin MP, Dean M, Smith MW, et al: 'Genetic acceleration of AIDS progression by a promoter variant of CCR5'. Science. 1998, 282: 1907-1911.
  10. Winkler C, Modi W, Smith MW, et al: 'Genetic restriction of AIDS pathogenesis by an SDF-1 chemokine gene variant. ALIVE Study, Hemophilia Growth and Development Study (HGDS), Multicenter AIDS Cohort Study (MACS), Multicenter Hemophilia Cohort Study (MHCS), San Francisco City Cohort (SFCC)'. Science. 1998, 279: 389-393. 10.1126/science.279.5349.389.
  11. An P, Martin MP, Nelson GW, et al: 'Influence of CCR5 promoter haplotypes on AIDS progression in African-Americans'. Aids. 2000, 14: 2117-2122. 10.1097/00002030-200009290-00007.
  12. Strieter RM, Polverini PJ, Arenberg DA, et al: 'Role of C-X-C chemokines as regulators of angiogenesis in lung cancer'. J Leukoc Biol. 1995, 57: 752-762.
  13. Arenberg DA, Kunkel SL, Polverini PJ, et al: 'Interferon-gamma-inducible protein 10 (IP-10) is an angiostatic factor that inhibits human non-small cell lung cancer (NSCLC) tumorigenesis and spontaneous metastases'. J Exp Med. 1996, 184: 981-999. 10.1084/jem.184.3.981.
  14. Moore BB, Arenberg DA, Addison CL, et al: 'Tumor angiogenesis is regulated by CXC chemokines'. J Lab Clin Med. 1998, 132: 97-103. 10.1016/S0022-2143(98)90004-X.
  15. Wang JM, Chertov O, Proost P, et al: 'Purification and identification of chemokines potentially involved in kidney-specific metastases by a murine lymphoma variant: Induction of migration and NFKB activation'. Int J Cancer. 1998, 75: 900-907. 10.1002/(SICI)1097-0215(19980316)75:6<900::AID-IJC13>3.0.CO;2-6.
  16. Muller A, Homey B, Soto H, et al: 'Involvement of chemokine receptors in breast cancer metastasis'. Nature. 2001, 410: 50-56. 10.1038/35065016.
  17. Gordon D, Abajian C, Green P, et al: 'Consed: A graphical tool for sequence finishing'. Genome Res. 1998, 8: 195-202.
  18. Kwok PY, Carlson C, Yager JD, et al: 'Comparative analysis of human DNA variations by fluorescence-based sequencing of PCR products'. Genomics. 1994, 23: 138-144. 10.1006/geno.1994.1469.
  19. Ewing B, Green P: 'Base-calling of automated sequencer traces using phred. II. Error probabilities'. Genome Res. 1998, 8: 186-194.
  20. Ewing B, Hillier L, Wendl MC, et al: 'Base-calling of automated sequencer traces using phred. I. Accuracy assessment'. Genome Res. 1998, 8: 175-185.
  21. Nickerson DA, Tobe VO, Taylor SL, et al: 'PolyPhred: Automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing'. Nucleic Acids Res. 1997, 25: 2745-2751. 10.1093/nar/25.14.2745.
  22. NIH (2001-2003), dbSNP: National Center for Biotechnology Information, National Institutes of Health, [http://www.ncbi.nlm.nih.gov/SNP/]
  23. Whitehead Institute (2001-2003), Primer 3.0: Whitehead Institute, [http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi]
  24. Genome Sequencing Centre (2001), Exo-SAP Protocol: Genome Sequencing Center, Washington University, [http://genome.wustl.edu/tools/protocols/]
  25. Clark VJ, Metheny N, Dean M, Peterson R: 'Statistical estimation and pedigree analysis of CCR2-CCR5 haplotypes'. Hum Genet. 2001, 108: 484-493. 10.1007/s004390100512.
  26. Morin PA, Saiz R, Monjazeb A: 'High-throughput single nucleotide polymorphism genotyping by fluorescent 5' exonuclease assay'. Biotechniques. 1999, 544: 538-540. 542, 544.
  27. O'Connell JR, Weeks DE: 'PEDCHECK: A program for identification of genotype incompatibilities in linkage analysis'. Am J Hum Genet. 1998, 63: 259-266. 10.1086/301904.
  28. Guo SW, Thompson EA: 'Performing the exact test of Hardy-Weinberg proportion for multiple alleles'. Biometrics. 1992, 48: 361-372. 10.2307/2532296.
  29. Long JC, Williams RC, Urbanek M: 'An E-M algorithm and testing strategy for multiple-locus haplotypes'. Am J Hum Genet. 1995, 56: 799-810.
  30. Long JC: 'Multiple locus haplotype analysis (MLOCUS, OBSHAP, PAIRWISE), software and documentation distributed by the author'. 1999, Bethesda, MD, Section on Population Genetics and Linkage, Laboratory of Neurogenetics, NIAAA, National Institutes of Health.
  31. Dempster AP: 'Maximum-likelihood from incomplete data via the EM algorithm'. J R Stat Soc B. 1977, 39: 1-38.
  32. Peterson RJ, Goldman D, et al: 'Effects of worldwide population subdivision on ALDH2 linkage disequilibrium'. Genome Res. 1999, 9: 844-852. 10.1101/gr.9.9.844.
  33. Zhang K, Jin L: 'HaploBlockFinder: Haplotype block analyses'. Bioinformatics. 2003, 19: 1300-1301. 10.1093/bioinformatics/btg142.
  34. Wang N, Akey JM, Zhang K, et al: 'Distribution of recombination crossovers and the origin of haplotype blocks: The interplay of population history, recombination, and mutation'. Am J Hum Genet. 2002, 71: 1227-1234. 10.1086/344398.
  35. Daly MJ, Rioux JD, Schaffner SF, et al: 'High-resolution haplotype structure in the human genome'. Nat Genet. 2001, 29: 229-232. 10.1038/ng1001-229.
  36. Gabriel SB, Schaffner SF, Nguyen H, et al: 'The structure of haplotype blocks in the human genome'. Science. 2002, 296: 2225-2229. 10.1126/science.1069424.
  37. Excoffier L, Slatkin M: 'Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population'. Mol Biol Evol. 1995, 12: 921-927.
  38. Fallin D, Schork NJ: 'Accuracy of haplotype frequency estimation for biallelic loci, via the expectation-maximization algorithm for unphased diploid genotype data'. Am J Hum Genet. 2000, 67: 947-959. 10.1086/303069.
  39. Rozas J, Rozas R: 'DnaSP version 3: An integrated program for molecular population genetics and molecular evolution analysis'. Bioinformatics. 1999, 15: 174-175. 10.1093/bioinformatics/15.2.174.
  40. Lewontin RC: 'The interaction of selection and linkage. I. General considerations: Heterotic models'. Genetics. 1964, 49: 49-67.
  41. Michalatos-Beloin S, Tishkoff SA, Bentley KL, et al: 'Molecular haplotyping of genetic markers 10 kb apart by allele-specific long-range PCR'. Nucleic Acids Res. 1996, 24: 4841-4843. 10.1093/nar/24.23.4841.
  42. Tishkoff SA, Dietzsch E, Speed W, et al: 'Global patterns of linkage disequilibrium at the CD4 locus and modern human origins'. Science. 1996, 271: 1380-1387. 10.1126/science.271.5254.1380.
  43. Clark AG, Weiss KM, Nickerson DA, et al: 'Haplotype structure and population genetic inferences from nucleotide sequence variation in human lipoprotein lipase'. Am J Hum Genet. 1998, 63: 595-612. 10.1086/301977.
  44. Tishkoff SA, Pakstis AJ, Ruano G, Kidd KK: 'The accuracy of statistical methods for estimation of haplotype frequencies: An example from the CD4 locus'. Am J Hum Genet. 2000, 67: 518-522. 10.1086/303000.
  45. McKeigue PM: 'Efficiency of estimation of haplotype frequencies: Use of marker phenotypes of unrelated individuals versus counting of phase-known gametes'. Am J Hum Genet. 2000, 67: 1626-1627. 10.1086/316912.
  46. Xu W, Tse HF, Chan FH, et al: 'New Bayesian discriminator for detection of atrial tachyarrhythmias'. Circulation. 2002, 105: 1472-1479. 10.1161/01.CIR.0000012349.14270.54.
  47. Clark AG: 'Inference of haplotypes from PCR-amplified samples of diploid populations'. Mol Biol Evol. 1990, 7: 111-122.
  48. Stephens M, Smith NJ, Donnelly P, et al: 'A new statistical method for haplotype reconstruction from population data'. Am J Hum Genet. 2001, 68: 978-989. 10.1086/319501.

Copyright

© Henry Stewart Publications 2004
