A comprehensive literature review of haplotyping software and methods for use with unrelated individuals

Interest in the assignment and frequency analysis of haplotypes in samples of unrelated individuals has increased greatly as a result of the emphasis placed on haplotype analyses by, for example, the International HapMap Project and related initiatives. Although many computer programs for haplotype analysis applicable to samples of unrelated individuals are available, many of them have limitations and/or very specific uses. In this paper, the key features of available haplotype analysis software for use with unrelated individuals, as well as with pooled DNA samples from unrelated individuals, are summarised. Programs for haplotype analysis were identified through keyword searches on PubMed and various internet search engines, a review of citations from retrieved papers and personal communications, up to June 2004. Priority was given to functioning computer programs, rather than theoretical models and methods. The available software was considered in light of a number of factors: the algorithm(s) used, algorithm accuracy, assumptions, the accommodation of genotyping error, implementation of hypothesis testing, handling of missing data, software characteristics and web-based implementations. Review papers comparing specific methods and programs are also summarised. Forty-six haplotyping programs were identified and reviewed. The programs were divided into two groups: those designed for individual genotype data (a total of 43 programs) and those designed for use with pooled DNA samples (a total of three programs). The accuracy of the programs is assessed using various criteria, and the programs are categorised and discussed in light of: algorithm and method, accuracy, assumptions, genotyping error, hypothesis testing, missing data, software characteristics and web implementation.
Many available programs have limitations (eg some cannot accommodate missing data) and/or are designed with specific tasks in mind (eg estimating haplotype frequencies rather than assigning most likely haplotypes to individuals). It is concluded that the selection of an appropriate haplotyping program for analysis purposes should be guided by what is known about the accuracy of estimation, as well as by the limitations and assumptions built into a program.


Introduction
The completion of the Human Genome Project marks a significant milestone in genetic research, ushering in an era of research opportunities in the application of genomic technologies to medical and public health problems. [1][2][3] One area of application involves the identification and characterisation of DNA sequence variation and its relationship (or association) with, for example, disease susceptibility. Many initiatives have been put in place to facilitate relevant association studies, but the most important is the International HapMap Project (IHP). 4 The assignment and analysis of haplotype frequencies (ie the number of times alleles at different loci are observed together on the same chromosome in a sample of individuals) can not only lead to estimates of linkage disequilibrium (LD) strength, but can also be used as the basis for a number of additional analyses, such as the comparison of population genetic structures (eg immigration rates, genetic distances, etc), the consideration of chromosome phylogeny and the estimation of the age of mutations. [5][6][7][8][9][10][11][12][13][14][15] Moreover, the use of haplotypes may result in considerable savings in genotyping costs and gains in the power of an association study. 16-18 Unfortunately, many current genotyping technologies are unable to resolve the phase of maternal and paternal chromosomes in unrelated individuals, and hence the actual haplotypes an individual possesses may be in doubt. This ambiguity is referred to as the 'haplotype problem', and its complexity increases exponentially with the number of loci being studied. Although there are technologies that can be used to resolve phase unambiguously at the chromosome or DNA level, they tend to be cost prohibitive. 19-24

REVIEW PAPER © HENRY STEWART PUBLICATIONS 1479-7364. HUMAN GENOMICS. VOL 2. NO 1. 39-66 MARCH 2005

Haplotype analysis involving related individuals (individuals collected from families and/or pedigrees) potentially offers more information and certain advantages compared with analysis involving unrelated individuals. Family-based analysis, however, imposes additional challenges and may not be suitable for all study designs or research objectives. 5,25-27 A companion review that focuses on computer programs and issues related to haplotype analyses involving related individuals will follow. 28 For unrelated individuals, statistical procedures are therefore required both to estimate haplotype frequencies and to assign the most likely haplotypes to individuals from genotype data. 23,29,30 In this paper, available computer programs for haplotype frequency estimation will be considered, as well as programs for the assignment of haplotypes to unrelated individuals. The paper builds on an earlier review, 31 recent discussions of relevant algorithms 32,33 and articles comparing different procedures. 30,34-36 Some simple recommendations are made for addressing specific research questions using available software. Finally, web-based summaries of these evaluations are available and provide greater detail than that outlined here (URL: http://polymorphism.ucsd.edu/HapSoftwareReview/).
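To make the 'haplotype problem' concrete: if genotypes at biallelic loci are coded as 0, 1 or 2 copies of one allele (a coding assumed here for illustration), an individual heterozygous at k loci is compatible with 2^(k-1) distinct haplotype pairs, which is the exponential growth referred to above. The following minimal Python sketch enumerates these pairs; the function name `compatible_pairs` is hypothetical and not taken from any reviewed program.

```python
from itertools import product


def compatible_pairs(genotype):
    """Return the unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of 0/1/2 allele counts at biallelic loci (assumed coding).
    Haplotypes are returned as tuples of 0/1 alleles.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites carry the same allele on both chromosomes.
    base = [1 if g == 2 else 0 for g in genotype]
    if not het_sites:
        return [(tuple(base), tuple(base))]
    pairs = []
    # Fix allele 0 at the first heterozygous site on haplotype 1 so that each
    # unordered pair is listed once; the remaining heterozygous sites vary freely.
    for rest in product((0, 1), repeat=len(het_sites) - 1):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, (0,) + rest):
            h1[site], h2[site] = allele, 1 - allele
        pairs.append((tuple(h1), tuple(h2)))
    return pairs
```

For example, `len(compatible_pairs([1] * 5))` is 16, whereas a genotype with no heterozygous sites yields a single pair.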
The methods, features and limitations of the identified programs were evaluated using the original published articles describing the methods, the manuals associated with the software and articles comparing programs and methodologies. Accuracy of the methods used for estimating haplotype frequencies and assigning haplotypes to individuals was considered to be of particular importance. Ideally, an indirect (ie statistically based) haplotyping method should be validated against direct, DNA sequence-derived haplotype information. Although studies with simulated data are also informative, allowing discrimination of program performance under a variety of situations, without a 'gold standard' for comparison purposes it is hard to assess the true reliability of a method. The large number of reviewed programs precludes systematic testing of the identified programs' accuracy, performance and claims. The evaluation of this large group of programs is complicated by the diversity of methods used, the measures of algorithm reliability used, varying datasets and assumptions, and program characteristics that limit or prevent a program from working in all instances. The authors have endeavoured to provide a thorough review of the literature on haplotyping software for unrelated individuals, but it is acknowledged that not all original authors' claims have been validated (Supplemental Table S-A provides a brief summary of reviewed articles in which programs were actually compared). Thus, there is a reliance on some authors' claims that have not been independently verified. The majority of identified programs are freely available to academic and non-profit users. Finally, recommendations are provided for specific research objectives.

Evaluation criteria
The identified computer programs were evaluated on the basis of a number of criteria and/or software features. Many of these features and criteria were considered because they reflect items that should guide the use of particular haplotyping software.
1. Algorithms and methods: the analytical methods and algorithms implemented in the available programs are considered. Essentially, algorithms can be divided into two broad classes: parsimony and likelihood methods.
2. Accuracy: the accuracy of haplotyping algorithms is considered in terms of each algorithm's ability to estimate haplotype frequencies from a sample of unrelated individuals, as well as to assign haplotypes to particular individuals. Measures of accuracy are discussed briefly in the Accuracy section and are detailed on the abovementioned website (see Supplementary Table S-B).
3. Assumptions: haplotyping programs often make assumptions about, for example, Hardy-Weinberg equilibrium (HWE), LD, population history and recombination. These assumptions can affect the accuracy of haplotype frequency estimates and assignments.
4. Genotyping error: the accommodation of genotyping error in haplotype inference is considered. Programs that identify and accommodate genotyping errors are noted.
5. Hypothesis testing: not all programs have the ability to conduct statistical tests of hypotheses, so this feature is considered as well.
6. Missing data: the accommodation of missing data in haplotype analysis is considered.
7. Software characteristics: issues related to the usability of programs are considered, including computer system requirements, input data formats, interfaces, output, run time and sample size.
8. Web implementation: web-based implementations of available computer programs are considered.

Results
Forty-six haplotyping programs were identified and reviewed. The programs were divided into two groups: those designed for analyses involving individual genotype data from unrelated individuals (a total of 43 programs) and those designed for analysis of DNA pools (three programs in total). An overview of reviewed programs is presented in Tables 1-4 and in Supplemental Tables S1-S4, S-A and S-B (http://polymorphism.ucsd.edu/HapSoftwareReview/). Additional information on the software programs discussed in this paper, links and contact information for programs, all supplemental tables, updates to existing software and newly released software are available at the same website. The majority of identified programs for estimating haplotype frequencies and assigning them to individuals use methods rooted in likelihood theory (for estimation purposes, primarily the maximum likelihood approach). From a survey of the literature, it appears that most of the programs give similar results, although performance is not always consistent. No group or individual program appears to work well in all situations, or to have all the features one might like to see implemented in a haplotype analysis program. Accuracy and performance appear to be affected by the characteristics of the data to be analysed and of the population from which the individuals are sampled.

Haplotyping in unrelated populations
Algorithms and methods. A number of different analytical methods have been proposed for haplotype analysis involving unrelated individuals (see Table 1 and Supplemental Table S1). Definitive classification of haplotyping algorithms is difficult, since implemented algorithms are often modified and combined in programs. A broad distinction can be made, however, between algorithms based on parsimony and algorithms based on likelihood theory. An overview of each of these classes is provided below.
Methods based on parsimony: in 1990, Clark 37 proposed an innovative method of constructing haplotypes using a rule-based algorithm. This simple method uses the haplotypes of individuals whose phase is known with certainty (eg individuals homozygous at every locus or heterozygous at only one) to draw inferences about the most likely haplotypes of individuals whose haplotypes are ambiguous, given their genotype data. HAPINFREX, which employs Clark's method, is computationally fast and efficient and has been used in a great deal of research. 14,38 Limitations of the method include the requirement for unambiguous individuals in the study population, sensitivity to the order in which data are analysed, the inability to assign haplotypes to all individuals and potentially erroneous haplotype assignments. 37,39 To overcome these limitations, a pure parsimony extension, using integer linear programming, has been proposed 40,41 and implemented in the program HAPAR. 39 Extensions of parsimony methods take advantage of the 'perfect phylogeny' framework. 40 These programs apply the results of recent research indicating that recombination is uncommon within LD blocks 16-18 to achieve efficient and effective haplotype analysis. Perfect phylogeny haplotyping (PPH) reduces the haplotype analysis problem to a phylogeny problem 40 by assuming no recombination and infinite-sites mutation. Within this framework, unphased genotype data are reduced to a 'graph realisation problem', which GPPH solves using matroid theory and graph analysis, although a unique solution is not guaranteed. 40,42 A simpler alternative method based on graph analysis is employed by DPPH. 43 Since empirical data may violate the perfect phylogeny assumption, 44 the assumption is relaxed in HAP, 44 which implements an 'imperfect phylogeny' model, and in BPPH. 45 HAP constructs haplotypes within LD blocks using a maximum likelihood method.
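Clark's rule-based procedure can be sketched in a few lines. The Python code below is a simplified illustration under stated assumptions (0/1/2 genotype coding at biallelic loci; hypothetical function names), not the HAPINFREX implementation. It also exhibits two of the limitations noted above: ambiguous individuals are resolved in input order by taking the first compatible pair that contains an already-known haplotype, and individuals that never match a known haplotype remain unresolved.

```python
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with a 0/1/2-coded genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    if not het:
        return [(tuple(base), tuple(base))]
    pairs = []
    for rest in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        for site, a in zip(het, (0,) + rest):
            h1[site], h2[site] = a, 1 - a
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def clark_inference(genotypes):
    """Sketch of Clark's (1990) rule-based inference.

    Returns (resolved, known): resolved maps individual index -> inferred
    haplotype pair; individuals that cannot be resolved are simply absent.
    """
    known, resolved = set(), {}
    # Step 1: individuals with at most one heterozygous site have a single
    # compatible pair, so their haplotypes are known with certainty.
    for i, g in enumerate(genotypes):
        pairs = compatible_pairs(g)
        if len(pairs) == 1:
            resolved[i] = pairs[0]
            known.update(pairs[0])
    # Step 2: resolve ambiguous individuals carrying an already-known haplotype,
    # adding the complementary haplotype to the known set; repeat until no change.
    changed = True
    while changed:
        changed = False
        for i, g in enumerate(genotypes):
            if i in resolved:
                continue
            for h1, h2 in compatible_pairs(g):
                if h1 in known or h2 in known:
                    resolved[i] = (h1, h2)
                    known.update((h1, h2))
                    changed = True
                    break  # first match wins: one source of order dependence
    return resolved, known
```

A sample consisting only of individuals heterozygous at two or more loci returns an empty `resolved` dictionary, which is the "requirement of unambiguous individuals" in practice.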
Methods based on likelihood theory: the majority of programs that could be located are rooted in likelihood theory. Methods that exploit likelihood theory can be further broken down into maximum likelihood and Bayesian methods. The expectation-maximisation (EM) algorithm is the most widely used haplotyping algorithm based on likelihood theory. In 1995, three research groups separately implemented and published EM-based haplotyping programs: 3locus.PAS, 46 HAPLO 47 and MLHAPFRE. 48 Excoffier and Slatkin 48 present a discussion of the challenges and limitations of applying the EM algorithm to haplotype analysis. In brief, the EM method alternates two steps. In the expectation step, the current haplotype frequency estimates (initially, user-supplied or default starting values) are used to compute, for each individual, the probability of each haplotype pair consistent with that individual's genotype, assuming HWE. In the maximisation step, haplotype frequencies are re-estimated from the resulting expected haplotype counts. The two steps are iterated until the likelihood converges to a maximum.
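The EM iteration described above can be illustrated with a minimal, single-start sketch for biallelic loci coded 0/1/2. It is not the code of any of the programs reviewed here; the function names, uniform starting values and convergence tolerance are illustrative assumptions. Each genotype is expanded into its compatible haplotype pairs, the E-step weights each pair by its probability under HWE, and the M-step converts expected haplotype counts into new frequencies.

```python
import math
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with a 0/1/2-coded genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    if not het:
        return [(tuple(base), tuple(base))]
    pairs = []
    for rest in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        for site, a in zip(het, (0,) + rest):
            h1[site], h2[site] = a, 1 - a
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotype_frequencies(genotypes, max_iter=500, tol=1e-10):
    """Single-start EM estimate of haplotype frequencies, assuming HWE."""
    pairs_per_ind = [compatible_pairs(g) for g in genotypes]
    haplotypes = {h for pairs in pairs_per_ind for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform starting values
    two_n = 2 * len(genotypes)  # number of chromosomes
    prev_loglik = -math.inf
    for _ in range(max_iter):
        expected = defaultdict(float)
        loglik = 0.0
        for pairs in pairs_per_ind:
            # E-step: under HWE, P(h1, h2) = p_h1 * p_h2, doubled when h1 != h2.
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            loglik += math.log(total)
            for (a, b), w in zip(pairs, weights):
                expected[a] += w / total
                expected[b] += w / total
        # M-step: new frequency = expected haplotype count / number of chromosomes.
        freq = {h: expected[h] / two_n for h in haplotypes}
        if loglik - prev_loglik < tol:
            break
        prev_loglik = loglik
    return freq
```

Because EM can converge to a local maximum, a fuller implementation would repeat the run from several random starting frequencies and keep the solution with the highest likelihood, as noted in the text. Enumerating every compatible pair also shows why memory use grows quickly with the number of heterozygous loci.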
The EM algorithm has been shown to be accurate in simulations, 49 and produces haplotype frequency estimates comparable to molecular haplotype frequencies. 23,29,30 Moreover, much of the error in haplotype frequency estimation associated with the EM algorithm has been found to be due to sampling error. 29,40 The EM algorithm may occasionally miscall rare or low-frequency haplotypes. 29,30,49,50 Accuracy of the EM algorithm improves with increasing sample size. 49 The EM algorithm does have some limitations: it may converge to a local rather than the global maximum, requiring restarts to ensure that the global maximum is reached, 48,49 and its memory requirements may limit its utility with large numbers of subjects and large datasets. 48,51 Variants of the EM algorithm have been developed to overcome some of these constraints. The SNPHAP program handles the limitations by progressively expanding the subsets of markers and eliminating low-frequency haplotypes from consideration at each step (referred to as posterior and prior 'trimming'). 52 The THESIAS program uses a stochastic variant of the EM algorithm to overcome many of its limitations. 53 The PL-EM program combines a partition-ligation (PL) strategy with the EM algorithm to allow haplotyping of hundreds of loci. 54-56 The HPLUS program combines the EM likelihood function with an estimating equation and the PL model to handle the construction of large haplotypes with missing data efficiently. 55 The second class of likelihood algorithms is based on Bayesian estimators and Bayesian numerical strategies, such as Gibbs sampling. 51,57-61 Bayesian methods use different models or prior assumptions to model haplotype frequencies and, as such, can be tailored to different settings, thereby improving their accuracy. Bayesian haplotype analysis methods can be further subdivided into 'simple' and 'coalescent-based' methods.
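To illustrate the Gibbs sampling strategy used by Bayesian methods, the sketch below implements a simple collapsed Gibbs sampler: haplotype frequencies are given a symmetric Dirichlet prior (pseudo-count `alpha` per haplotype, an assumed value) and integrated out, and each ambiguous individual's haplotype pair is resampled in turn given the haplotypes currently assigned to everyone else. This corresponds to the 'simple' class (no population-history model); it is not the algorithm of HAPLOTYPER, PHASE or any other reviewed program, and all names and default settings are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with a 0/1/2-coded genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    if not het:
        return [(tuple(base), tuple(base))]
    pairs = []
    for rest in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        for site, a in zip(het, (0,) + rest):
            h1[site], h2[site] = a, 1 - a
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def gibbs_haplotypes(genotypes, n_iter=2000, burn_in=500, alpha=0.01, seed=1):
    """Collapsed Gibbs sampler for haplotype frequencies (illustrative only)."""
    if burn_in >= n_iter:
        raise ValueError("burn_in must be smaller than n_iter")
    rng = random.Random(seed)
    pairs_per_ind = [compatible_pairs(g) for g in genotypes]
    current = [rng.choice(pairs) for pairs in pairs_per_ind]  # random initial phase
    counts = Counter(h for pair in current for h in pair)
    two_n = 2 * len(genotypes)
    freq_sum, kept = Counter(), 0
    for it in range(n_iter):
        for i, pairs in enumerate(pairs_per_ind):
            if len(pairs) == 1:
                continue  # phase known: nothing to sample
            for h in current[i]:
                counts[h] -= 1  # remove individual i from the haplotype counts
            # Predictive weight of each pair given all other individuals' haplotypes;
            # the extra +1 term applies when both haplotypes are the same.
            weights = [(counts[a] + alpha) * (counts[b] + alpha + (a == b))
                       for a, b in pairs]
            current[i] = rng.choices(pairs, weights=weights)[0]
            for h in current[i]:
                counts[h] += 1
        if it >= burn_in:
            kept += 1
            for h, c in counts.items():
                freq_sum[h] += c / two_n
    return {h: s / kept for h, s in freq_sum.items() if s > 0}
```

Frequencies are averaged over the post-burn-in sweeps. Coalescent-based methods such as PHASE replace the Dirichlet prior with one that favours haplotypes similar to those already observed, which is the key difference between the two Bayesian subclasses.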
The simple methods make no assumption about the history of the populations from which samples of individuals have been drawn. Simple Bayesian programs include HAPLOTYPER and HAPLOREC. HAPLOTYPER uses a statistical method similar to EM. 57 HAPLOREC implements a Bayesian method using a variable-length Markov chain approach. 62 The coalescent-based Bayesian methods essentially take similarities between and among haplotypes into account. This class includes the widely used program PHASE. The latest version of PHASE (v2.0) incorporates an updated algorithm to improve accuracy and the PL algorithm to reduce run time. 59 A modified model, the neutral coalescent model, is implemented in SLHAP v1.0. 58 SLHAP v1.0 builds on PHASE v1.0, with modifications to improve computation time and to accommodate missing data. 58 Finally, Arlequin (version 3.0) draws on the coalescent model, exploiting a relaxed definition of similar haplotypes in an adaptive window approach. 60

Accuracy. The accuracy of available programs was assessed through consideration of published articles investigating haplotype frequency estimation and assignment accuracy, including comparisons with molecular and simulated haplotype data. Measuring the accuracy of a haplotyping method requires comparing observed haplotype assignments and/or frequency estimates with expected haplotypes. The 'gold standard' for comparison is DNA sequence-derived haplotype information. The advantage of using accurate molecular haplotype data is that no assumptions (such as those guiding simulations) are imposed, so the measured accuracy of a specific program is not influenced or biased by assumptions built into simulated data. Simulated data, in turn, facilitate additional testing, including the discrimination of program performance under a variety of situations and assumptions.
Comparison of accuracy between haplotyping programs is a taxing venture, complicated by a variety of issues. A significant challenge is that most programs have not been directly compared with each other (Supplemental Table S-A provides a brief overview of retrieved articles that compared the accuracy and performance of programs). Only a small set of programs is compared in each individual paper, and the accuracy and performance of these selected programs are often compared using different datasets and under varying conditions.
A further challenge is that numerous measures have been used to assess accuracy, and these vary across the reviewed publications. In brief, several measures of the global accuracy of frequency estimates/assignments were found: discrepancy, error rate, mean square error (MSE), similarity index I_f and similarity index I_S, in addition to several measures comparing the similarity of incorrect haplotype assignments to true haplotypes: Hamming distance ('error rate H'), similarity index I_G, single-site error rate and switch accuracy (see Supplemental Table S-B for detailed accuracy definitions). Divergent results may be attributable to the method of accuracy measurement. Unfortunately, no comparison of the different accuracy measures was identified in the reviewed literature.
To illustrate this, a relatively simple example of four articles that all focus on comparing the PHASE (v1.0) program with EM-based programs is provided here. The original publication describing PHASE (v1.0) reported that the program outperformed other haplotyping methods, reducing MSE by more than 50 per cent relative to the HAPINFREX program and a program implementing a standard EM algorithm. 51 A subsequent comparison 35 of PHASE v1.0 with a standard EM program, with accuracy measured by discrepancy error rates, showed that average error rates did not differ statistically between EM-based methods and PHASE v1.0. This finding held across simulated and phase-known data. 35 In rebuttal, Stephens et al. 63 showed that PHASE v1.0 outperforms HAPLOTYPER and PL-EM, with lower error rates on data simulated to fit a coalescent model. The results were reversed when a dataset of molecular haplotypes was used: HAPLOTYPER and PL-EM were comparable, with both outperforming PHASE v1.0. 57 As this example demonstrates, characteristics inherent to a specific dataset, whether molecular or simulated, influence the performance and accuracy of a program, and hence its perceived accuracy and performance. Moreover, the studies did not compare an identical set of programs. Both Stephens et al. 51 and Zhang et al. 35 employed their own standard versions of the EM algorithm, which should be comparable but may not have identical specifications. A further complication is that, while PL-EM is an EM-based program, it is one of several EM programs that have been modified to overcome performance problems of the EM algorithm, as discussed previously. Therefore, the improved performance of the EM-based program PL-EM relative to PHASE may not be generalisable to all EM-based programs.
Salem, Wessel and Schork Review

To overcome these problems, Stephens et al. 59 compared their updated version of PHASE (v2.0) with several programs, using the same datasets and measures of accuracy as published comparisons of PHASE v1.0 with other programs. 57,58 Overall, programs based on Bayesian principles, the EM algorithm and imperfect phylogeny performed similarly with sequence-derived and simulated haplotype data. As shown previously, 31 no program or algorithm clearly distinguished itself from the rest. While Clark's intuitive method has shown utility, the present assessment of the literature suggests that other methods offer distinct advantages. The performance of all programs is affected by model assumptions and population genetic parameters. The impact of these assumptions is discussed below.
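Because the accuracy measures listed above are defined differently across papers (Supplemental Table S-B gives the definitions used in this review), the following Python sketch shows only one common formulation of three of them, as an assumption: the individual error rate (the proportion of ambiguous individuals whose inferred haplotype pair is not exactly correct), the switch error rate (the proportion of adjacent heterozygous sites whose relative phase is inferred incorrectly; under this formulation switch accuracy is its complement) and the MSE of estimated haplotype frequencies. Function names are hypothetical.

```python
def is_ambiguous(pair):
    """An individual is ambiguous if it is heterozygous at more than one site."""
    h1, h2 = pair
    return sum(a != b for a, b in zip(h1, h2)) > 1


def individual_error_rate(true_pairs, inferred_pairs):
    """Share of ambiguous individuals whose inferred (unordered) pair is wrong."""
    ambiguous = [(t, e) for t, e in zip(true_pairs, inferred_pairs) if is_ambiguous(t)]
    if not ambiguous:
        return 0.0
    wrong = sum(set(t) != set(e) for t, e in ambiguous)
    return wrong / len(ambiguous)


def switch_error_rate(true_pairs, inferred_pairs):
    """Phase switches between consecutive heterozygous sites / opportunities."""
    switches = opportunities = 0
    for (t1, t2), (e1, _e2) in zip(true_pairs, inferred_pairs):
        het = [s for s in range(len(t1)) if t1[s] != t2[s]]
        # True where inferred haplotype 1 follows true haplotype 1 at this site.
        phase = [e1[s] == t1[s] for s in het]
        switches += sum(a != b for a, b in zip(phase, phase[1:]))
        opportunities += max(len(het) - 1, 0)
    return switches / opportunities if opportunities else 0.0


def frequency_mse(true_freq, est_freq):
    """Mean squared difference over all haplotypes seen in either distribution."""
    haps = set(true_freq) | set(est_freq)
    return sum((true_freq.get(h, 0.0) - est_freq.get(h, 0.0)) ** 2 for h in haps) / len(haps)
```

For a true pair (0000, 1111) inferred as (0011, 1100), the individual is counted as wrong, but only one of three possible switches has occurred, which illustrates why different measures can rank the same programs differently.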
Assumptions. This section focuses on several common assumptions incorporated in haplotyping programs. Departures from or violations of these assumptions may affect program accuracy and performance. The assumptions are related to each other; violation of one assumption may lead to violation of a second. For ease of evaluation and discussion, each assumption is addressed separately. Program assumptions (HWE, LD, population history, etc) are noted in Tables 1 and S1.
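Of these assumptions, HWE is the most straightforward to check on the genotype data before haplotyping. As a minimal sketch (assuming a biallelic locus and the usual chi-square goodness-of-fit test with one degree of freedom; the function name is hypothetical), the code below returns the test statistic, its p-value and the inbreeding coefficient F = 1 - H_obs/H_exp, whose sign separates excess homozygosity (F > 0) from excess heterozygosity (F < 0), two kinds of departure that affect haplotyping accuracy differently.

```python
import math


def hwe_test(n_AA, n_Aa, n_aa):
    """Chi-square HWE test (1 df) and inbreeding coefficient for one biallelic locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # F = 1 - observed heterozygosity / expected heterozygosity (0 if monomorphic).
    f = 1 - (n_Aa / n) / (2 * p * q) if 0 < p < 1 else 0.0
    return chi2, p_value, f
```

For example, genotype counts of 10, 80 and 10 give a chi-square statistic of 36 and F = -0.6, signalling strong excess heterozygosity. For small samples or rare alleles, an exact test is usually preferred to the chi-square approximation.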
Hardy-Weinberg equilibrium: as described in Tables 1 and 4, many programs, including all EM algorithm-based programs, assume HWE. Algorithms that assume HWE may be sensitive to departures from this assumption. Departures from HWE arise from either excess homozygosity or excess heterozygosity at a locus in a population. Measures of departure from HWE have been shown to correlate with the accuracy of haplotype frequency estimation and assignment. 57 Increases in homozygosity tend to decrease the number of ambiguous individuals (ie individuals whose phase cannot be determined with certainty) and have been shown to have little impact on the accuracy of the EM-based method, as measured by the MSE. 49,64 By contrast, accuracy decreases with HWE departures resulting from increased heterozygosity. A comparison of HAPINFREX, EM-DECODER, PHASE v1.0 and HAPLOTYPER in simulated data with varying HWE departures found that all methods showed increased error levels with excess heterozygosity. 57 HAPINFREX was the most vulnerable to HWE departures, underperforming particularly when homozygotes were few; its performance improves rapidly with increasing proportions of homozygotes in a population. 57 In data with a substantial proportion of homozygous individuals, HAPINFREX outperformed PHASE v1.0. 57 In an evaluation of HPLUS on simulated data with HWE departures, accuracy improved with increasing sample size, although little benefit was achieved with samples beyond 100 subjects. 55

Linkage disequilibrium and recombination: research suggests that recombination hotspots (that is, chromosomal segments with high levels of recombination) tend to be separated by extended LD or haplotype 'blocks' exhibiting little recombination and strong LD. This structuring of LD blocks may be common in the human genome. 16-18,65 Highly variable recombination rates in a small genomic region may violate assumptions of the current coalescent-based programs; 51,58 however, all methods may have problems constructing haplotypes across regions with high levels of recombination 57,60 and low LD. 36 While the majority of programs do not make explicit assumptions about LD, the performance of both EM methods 29,36,48,64 and PHASE v1.0 51 has been shown to improve with increasing LD. Comparisons of the accuracy of PHASE v1.0, HAPLOTYPER and Arlequin v3.0 showed that accuracy was adversely affected by increases in the recombination rate. 60 Doubling theta (θ), that is, the population-scaled mutation rate per locus, results in a 5-10 per cent decrease in accuracy for both Arlequin v3.0 and PHASE v1.0. By contrast, the global accuracy of HAPLOTYPER increased with theta in some situations. 60 In this comparison, Arlequin v3.0 demonstrated the highest accuracy in the presence of recombination, using a sliding-window approach to phase loci. The performance of HPLUS, measured by a similarity index, declined with increasing numbers of single nucleotide polymorphisms (SNPs) in a simulated dataset with recombination, although this trend was not observed with MSE. 55 The PL method used by HAPLOTYPER was shown to be insensitive to the presence of recombination hotspots, although extensive recombination may be problematic. 57 Accuracy improves, however, when hotspots are used as the partition sites. 54,57 PL-EM allows users to specify the partition size, thereby allowing partitioning at the hotspot. Focusing on DNA segments in LD offers a way to overcome the challenges and errors related to haplotyping in the presence of recombination hotspots.
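The strength of LD discussed here is usually summarised with the pairwise measures D' and r². As a minimal sketch, assuming two biallelic loci with alleles A/a and B/b and known two-locus haplotype frequencies (for example, EM estimates), they can be computed as follows; the function name is hypothetical.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r^2) from the four two-locus haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # allele A frequency at locus 1
    p_B = p_AB + p_aB  # allele B frequency at locus 2
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_AB - p_A * p_B
    # D_max depends on the sign of D (Lewontin's normalisation).
    if d > 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2
```

Complete LD (only AB and ab haplotypes, each at frequency 0.5) gives D' = r² = 1, while equal frequencies of all four haplotypes give D' = r² = 0. D' is returned with its sign; its absolute value is what is usually reported.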
Since recombination hotspots are not known in advance, automating the identification of LD block boundaries and haplotyping within blocks may offer significant benefits. 40,57 Several programs, notably HAP, SLHAP v1.0 and PHASE v2.0, have exploited this methodology. SLHAP v1.0 58 and HAP have been reported to improve the accuracy of inferred haplotypes. A related approach limits haplotype analysis to segments in LD. HAPLOREC, which is based on variable-length Markov chains, can obtain haplotype fragments of different lengths in different regions, depending on LD strength. 62 A drawback of these methods is that they may lead to a loss of phase information. 66 PHASE v2.0 incorporates a separate algorithm to accommodate recombination, based on the method proposed by Fearnhead and Donnelly. 67 Evaluation of linkage and recombination is an important first step in haplotype analysis. The HAP and HAPLOVIEW programs identify haplotype blocks in a graphical display. Data that contain recombination hotspots may pose a challenge to haplotyping software that assumes no recombination. Decreases in LD are correlated with increasing estimation error 36 and magnify the effects of genotyping error; 68 thus, although haplotyping with loci whose alleles are in low LD is important, haplotype estimates from such data may be unreliable. Further study in this area is required, particularly for intermediate LD levels: both on how the level of LD influences accuracy and on identifying the LD level above which accuracy improves. This is not trivial, especially if many loci are considered, each with a different degree of LD relative to the others.
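The no-recombination, infinite-sites assumption behind perfect phylogeny methods has a simple observable consequence: for any pair of biallelic sites, at most three of the four possible two-site haplotypes ('gametes') 00, 01, 10 and 11 can occur, so observing all four implies recombination or recurrent mutation. This 'four-gamete rule' is one way of delimiting haplotype blocks; HAPLOVIEW, for instance, offers a block definition based on it. The Python sketch below assumes phased (or estimated) haplotypes coded as 0/1 tuples, uses a simple greedy left-to-right partition and a hypothetical `min_freq` threshold for ignoring rare gametes; it is not the algorithm of any particular program.

```python
def four_gamete_violation(haplotypes, i, j, min_freq=0.0):
    """True if all four two-site gametes at sites i and j exceed min_freq."""
    n = len(haplotypes)
    counts = {}
    for h in haplotypes:
        key = (h[i], h[j])
        counts[key] = counts.get(key, 0) + 1
    present = [g for g, c in counts.items() if c / n > min_freq]
    return len(present) == 4


def greedy_blocks(haplotypes, min_freq=0.0):
    """Split sites into consecutive blocks with no four-gamete violation inside.

    Returns a list of (first_site, last_site) index pairs.
    """
    n_sites = len(haplotypes[0])
    blocks, start = [], 0
    for end in range(1, n_sites):
        # Close the current block if the new site conflicts with any site in it.
        if any(four_gamete_violation(haplotypes, i, end, min_freq)
               for i in range(start, end)):
            blocks.append((start, end - 1))
            start = end
    blocks.append((start, n_sites - 1))
    return blocks
```

With the haplotypes 000, 011, 110 and 111, sites 0 and 2 show all four gametes, so the sketch places a block boundary between sites 1 and 2.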

As one would expect, recombination leads to an increase in the number of haplotypes, including low-frequency haplotypes that are difficult to estimate accurately. 36,49,53 Increasing sample size may improve haplotyping accuracy in the presence of high recombination. 39 Finally, analysing chromosome segments on either side of a recombination hotspot is most likely the only currently viable option. 8

Population evolutionary history: several programs impose assumptions about the evolutionary history of the populations from which samples have been obtained, in order to improve program efficiency and accuracy and to simplify haplotype analysis. PHASE is the best-known example of a program that incorporates a model of population evolutionary history, in this case the coalescent model. 51,59 The SLHAP v1.0 58 and Arlequin v3.0 60 programs are also based on variants of the coalescent model. Several programs exploit the 'perfect phylogeny' concept. These programs (GPPH, DPPH and BPPH) are reported to be fast and accurate and to accommodate large numbers of markers. 40,42,43,45 The HAP program uses a relaxed model, imperfect phylogeny, to make the model more consistent with what is currently known about population evolutionary history. 44 The benefit of incorporating an evolutionary model, such as the coalescent model, is that it takes advantage of similarities between haplotypes; this is thought to yield more accurate haplotypes than other methods. 51,59 The disadvantage is that the behaviour of alleles in the short-term evolution of chromosomes may violate the model, potentially leading to errors. By contrast, HAPLOTYPER, HAPINFREX and HAPAR impose no assumptions about population evolutionary history. Program performance and accuracy may depend on whether or not the data fit the program's population assumptions. To illustrate, Stephens et al. 51 note that PHASE v1.0, by comparison with EM algorithm-based methods, would reduce error rates by 50 per cent when data fit the coalescent model. When PHASE was compared with PL-EM using similar data, the improvement in error rate was 26 per cent lower than that shown by Stephens et al. for data that fit the coalescent model. 54 The coalescent model is appropriate for stable populations that have evolved over long periods of time, but is less suitable for populations with past gene flow, stratification and/or migration. There is disagreement as to whether haplotyping programs based on the coalescent model are the most appropriate for accurate haplotyping. 35,51,57 Even when data do not fit the coalescent model, the performance of PHASE v1.0 is suggested to be no worse than that of EM methods. 63 Using simulated data that violate the coalescent model, Niu et al. 57 showed that HAPLOTYPER and EM-DECODER are more accurate than PHASE v1.0 and HAPINFREX. The decline in the performance of PHASE v1.0 in at least one of these instances may have been due to insufficient updates rather than to model assumptions. 59 The findings of Niu et al. were supported by a subsequent comparison of PHASE v1.0, HAPLOTYPER and Arlequin v3.0, 60 in which Arlequin v3.0 had the highest accuracy of the three programs when the coalescent model was violated. In a comparison of PHASE v1.0, HAPINFREX, HAPAR and HAPLOTYPER using data modelled to fit the coalescent model, PHASE v1.0 yielded the lowest error rate, followed by HAPAR. 39 The updated PHASE v2.0 demonstrated improved performance with molecular haplotype data, exceeding the performance of HAPLOTYPER, SLHAP v1.0 and the earlier version of PHASE. 59 An additional study assessed the performance of PHASE v1.0, HAPAR and HAPLOTYPER using data simulated to fit the phylogeny model, an evolutionary model related to the coalescent model. This comparison found that PHASE v1.0 had the lowest error rate, followed by HAPAR and HAPLOTYPER.
Error rates became similar for the three programs as sample size increased. 39 In summary, programs that assume a particular population evolutionary history should be used with care, since departures from model assumptions may have a significant impact on the accuracy of haplotype assignments and estimates. This in no way detracts from the utility and flexibility of these programs, but illustrates that model assumptions should be considered when these programs are used.
Genotyping error. Genotyping error is a form of misclassification that can have deleterious effects on the power of association analyses, 69-72 on LD measurements 69 and on haplotype analysis. 60,68,73,74 The power of SNP association studies decreases with even relatively small genotyping error rates. 71 A similar trend may exist for haplotype association studies, although further examination is required. Sample size requirements for varying SNP error rates and power levels can be examined at the Power for Association with Error (PAWE) website 70,71 (see Tables 2 and S2).
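Error rates of the kind referred to here are commonly estimated by genotyping a subset of individuals twice and comparing the calls. The sketch below computes the discordance rate d between duplicate calls (missing calls coded as `None` are skipped) and, under the simplifying assumptions that errors occur independently in the two replicates and that two erroneous calls never agree, converts it to a per-call error rate e using d = 1 - (1 - e)². Function names and the 0/1/2 genotype coding are illustrative assumptions.

```python
import math


def discordance_rate(calls_a, calls_b):
    """Fraction of duplicate genotype calls that disagree (None = missing, skipped)."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no duplicate pairs with both calls present")
    return sum(a != b for a, b in pairs) / len(pairs)


def per_call_error_rate(discordance):
    """Invert d = 1 - (1 - e)**2, assuming independent, never-agreeing errors."""
    if not 0.0 <= discordance <= 1.0:
        raise ValueError("discordance must lie in [0, 1]")
    return 1 - math.sqrt(1 - discordance)
```

Under these assumptions a discordance of 9.75 per cent corresponds to a per-call error rate of 5 per cent. Because errors that happen to agree go undetected, the back-calculated rate is, if anything, an underestimate.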
Most genotyping errors are due to allelic dropout (missing data) and the inability to score heterozygotes, resulting in an increased proportion of homozygotes. 73,75 Non-random distributions of missing genotypes represent an error in genotype assignments. Programs that deal with missing data often do so by assuming that data are missing at random. Spurious haplotypes may be introduced if loci with genotype errors are included in haplotype analysis. 60 Error rates of 5 per cent may bias haplotype estimates by as much as 30 per cent. 72 Genotyping error leads to a substantial loss in haplotype accuracy, particularly when LD is low and many rare haplotypes exist. 74 Haplotyping methods that favour similar haplotypes may be less sensitive to genotyping error. 60 Recently, Zou and Zhao 72 introduced an EM-based program that corrects haplotype frequency estimates for known genotype error rates, although determining genotyping error rates can be difficult in unrelated populations. 76-78 A common strategy is to genotype a subset of the study population twice to determine error rates; in a simulation study, genotyping as few as 25 individuals was shown to be sufficient for determining genotyping error. 76 Testing assay specificity and testing loci for HWE deviations are established methods for reducing genotyping error rates. 79 Finally, the accuracy and power of association analyses may be improved by incorporating genotyping uncertainty into haplotype inference to negate the effects of genotyping errors, as in GS-EM. 73
Salem, Wessel and Schork | REVIEW PAPER
Missing data. Current genotyping methods often result in missing data, owing to a variety of factors, including, for example, polymerase chain reaction dropouts, inability to score loci and systematic genotyping technology errors. Missing data complicate haplotype inference by increasing the difficulty and uncertainty of haplotype estimates. Missing data decrease the available information and may bias the haplotype assignment.
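The added uncertainty from missing genotypes can be made concrete by counting the haplotype pairs compatible with one individual's genotype. This sketch assumes biallelic SNPs coded as the number of copies of allele 1 (0, 1 or 2), with None marking a missing genotype; a missing site is allowed to take any of the three genotypes, so it multiplies the number of candidate pairs. The function name and coding are illustrative assumptions.

```python
from itertools import product


def compatible_pairs(genotype):
    """Return the set of unordered haplotype pairs consistent with a genotype.

    genotype: sequence of 0, 1, 2 (copies of allele 1) or None (missing).
    Haplotypes are strings of '0'/'1'.
    """
    # Allele options for (first, second) chromosome at each observed site.
    site_options = {0: [("0", "0")], 2: [("1", "1")],
                    1: [("0", "1"), ("1", "0")]}
    # A missing site could carry any allele on either chromosome.
    missing = [("0", "0"), ("0", "1"), ("1", "0"), ("1", "1")]
    choices = [missing if g is None else site_options[g] for g in genotype]
    pairs = set()
    for combo in product(*choices):
        h1 = "".join(a for a, _ in combo)
        h2 = "".join(b for _, b in combo)
        pairs.add(tuple(sorted((h1, h2))))  # ignore chromosome order
    return pairs


print(len(compatible_pairs([1, 1, 0])))     # 2: two heterozygous sites
print(len(compatible_pairs([1, 1, None])))  # 8: third site missing
```

A single missing genotype here quadruples the number of candidate resolutions, which is the extra ambiguity a program must resolve using information from the rest of the sample.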
The majority of programs score poorly in this area, as they are unable to accommodate any missing data (see Tables 1 and 4 for programs that accommodate missing data). Some of these programs deal with missing data by ignoring subjects with any missing marker data, leading to a loss of data. Most programs assume that missing data are missing at random (see the section above on genotyping error).
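The cost of discarding every subject with any missing marker can be estimated with a short calculation. Assuming, for illustration only, that each genotype call is missing independently with probability m (ie completely at random), a subject typed at L loci is complete with probability (1 - m)^L, so the expected fraction of subjects retained falls quickly as loci are added. The 2 per cent missing rate and locus counts below are hypothetical.

```python
def expected_complete_fraction(missing_rate, n_loci):
    """Expected share of subjects with no missing genotype, assuming
    independent missingness at every locus (a simplifying assumption)."""
    return (1 - missing_rate) ** n_loci


for n_loci in (5, 10, 25):
    kept = expected_complete_fraction(0.02, n_loci)
    print(f"{n_loci} loci, 2% missing per call: ~{kept:.0%} of subjects kept")
```

When missingness is not random (for example, concentrated at a poorly performing assay), complete-case filtering can also bias which subjects remain, not merely reduce their number.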
Accommodating missing data results in a performance decline, with increased memory requirements, longer run times and increased uncertainty. Several strategies have been proposed and implemented for haplotyping in the presence of missing data. The EM algorithm can be set to accommodate missing data; a discussion focusing on EM haplotyping and missing data is provided elsewhere. 80 Among EM-based programs, LOGINSERM_ESTIHAPLOE includes the option of ignoring individuals with missing data or of using them in haplotype inference, depending on research objectives, 80 whereas PL-EM allows users to specify the number of possible haplotype sets with a probability above a specific level. 54 By contrast, HAP H ignores missing markers in haplotype construction, and uses a maximum likelihood method to infer missing allele(s) to match common haplotypes. 44 The accuracy of HAP H was maintained with up to 10 per cent missing data. Arlequin v3.0 does not try to impute missing data in haplotype analysis, but rather ignores missing loci in the process. 60 This approach is sensitive to the amount of missing data: small decreases in accuracy with up to 2 per cent missing data become more noticeable at 4 per cent. Moreover, adding a subset of individuals with large amounts of missing data (20 per cent) has been shown to have a detrimental effect on haplotype analysis of the larger group with complete data. 60 A limitation of the original version of PHASE (v1.0) was that it could not accommodate missing data. 51 SLHAP v1.0, based on the methods of PHASE v1.0, includes modifications that allow missing data to be accommodated. 58 The updated version, PHASE v2.0, was also adapted to accept missing data; phase at unknown positions is randomised and any missing genotypes are imputed with random guesses. 59 The HAPLOREC program also handles missing data by matching haplotypes with missing data to known haplotypes, although missing alleles are not imputed. 62 The performance of HAPLOTYPER was shown to be stable in the presence of missing data, although caution should be exercised when missing data are included. 57 Excellent discussions of the challenges of haplotyping with missing data are presented elsewhere. 57,81 The inclusion of individuals with too much missing data (> 10 per cent) may have a detrimental effect on the reconstruction of phase of individuals without missing data. Finally, markers with non-random patterns of genotyping failure should be redesigned or dropped from the haplotyping set. 57,80
Software characteristics. In this section, issues related to the usability of programs are discussed. User-friendliness is an important issue in the selection of an appropriate haplotyping program. Relevant issues include computer system requirements, data format, interface, marker characteristics, run time and sample size.
Computer system requirements: as detailed in the 'platform' column in Tables 1 and 4, not all programs are available for all computer operating systems. The selection of a haplotyping program may therefore necessitate investment in new computer equipment and training. Compiling programs to run on new operating systems poses similar challenges.
Data input format: unfortunately, there is no standard data input format, and nearly all of the programs use a unique one. Converting data from one format to work with another is cumbersome and difficult. HIT and HAPLOSCOPE are platform programs that incorporate several haplotyping programs in one interface. These programs facilitate comparisons of programs on the same datasets.
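Much of the practical effort therefore goes into format conversion scripts. The sketch below converts a hypothetical tab-separated genotype table (one row per individual, one column per SNP, genotypes written as two letters such as 'AG', with '00' for missing) into a hypothetical layout with two lines per individual and alleles coded 1-4 (0 for missing). Neither layout is intended to match any specific program reviewed here; each program's own documentation defines its actual input format.

```python
import csv
import io


def to_two_row_format(table_text, missing="00"):
    """Convert 'ID<TAB>SNP1<TAB>SNP2...' rows with genotypes like 'AG'
    into two lines per individual with alleles coded 1-4 (0 = missing)."""
    code = {"A": "1", "C": "2", "G": "3", "T": "4"}
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    next(reader)  # skip the header row of SNP names
    out = []
    for row in reader:
        ind, genos = row[0], row[1:]
        first, second = [], []
        for g in genos:
            if g == missing:
                first.append("0")
                second.append("0")
            else:
                # Unphased genotype: which allele goes on which line is arbitrary.
                first.append(code[g[0]])
                second.append(code[g[1]])
        out.append(" ".join([ind] + first))
        out.append(" ".join([ind] + second))
    return "\n".join(out)


example = "ID\trs1\trs2\nP1\tAG\tCC\nP2\t00\tCT\n"
print(to_two_row_format(example))
```

Keeping such conversions in a small, tested function makes it easier to run the same dataset through several programs, which is the kind of comparison HIT and HAPLOSCOPE aim to simplify.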
User interface: the interface is an important component of the usability of a haplotyping program. Selection of a program will depend heavily on current knowledge or the ability to invest time in learning about a computer system. The majority of identified programs are command-prompt driven (see Tables 1 and 4). These interfaces tend to intimidate computer novices and non-computer scientists. Fortunately, several programs with a graphical user interface were identified, including Arlequin, HAPLOVIEW, HAPLOSCOPE and HPLUS. Finally, individuals familiar with SAS and S-PLUS may be interested in the SAS Genetics module and HAPLO.STATS programs, respectively.
Marker characteristics: many of the widely used haplotyping programs are limited to biallelic loci. Programs that accommodate multiallelic markers often have longer run times. Allele frequency is an important consideration in the selection of markers. Low allele frequencies result in low-frequency haplotypes that may have little value in explaining common disease variation. 49 Moreover, low-frequency haplotypes are difficult to estimate accurately, for a variety of reasons (eg sampling error, genotyping error, recombination and low LD). 29,30,36,49,50,53
Literature review of haplotyping software | REVIEW PAPER
Output: in addition to haplotype frequency estimates and assignments, many programs provide measures for evaluating the 'goodness of fit' of constructed haplotypes. A number of EM-based programs provide posterior probabilities of haplotype assignments, including GENECOUNTING, HPLUS, HAPLO.STATS, LDSUPPORT, MLOCUS, PL-EM and SNPHAP. Posterior probabilities are helpful for evaluating haplotype assignments and any subsequent analyses. Moreover, the probabilities can be used to weight and evaluate assigned haplotypes and frequency estimates. 25,82 Determination and interpretation of posterior probabilities is difficult for programs that use pseudo-Gibbs samplers, including Arlequin, HAPLOTYPER and PHASE. 51,57,60 Finally, Arlequin, HAPLO H, HPLUS and PL-EM provide variance estimates for the estimated haplotype frequencies.
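Posterior probabilities of haplotype assignments follow from Bayes' rule once haplotype frequencies have been estimated. As a minimal sketch, assuming Hardy-Weinberg equilibrium and frequencies that have already been estimated (the values below are invented), the posterior for each haplotype pair compatible with an individual's genotype is that pair's prior probability divided by the sum over all compatible pairs. The function name is hypothetical.

```python
def pair_posteriors(pairs, freqs):
    """Posterior probability of each compatible haplotype pair, assuming HWE.

    pairs: list of (h1, h2) tuples compatible with one individual's genotype.
    freqs: dict of estimated haplotype frequencies.
    """
    def prior(h1, h2):
        # Under HWE an unordered pair has probability f1*f2, doubled if h1 != h2.
        return freqs[h1] * freqs[h2] * (1 if h1 == h2 else 2)

    weights = [prior(h1, h2) for h1, h2 in pairs]
    total = sum(weights)
    return {pair: w / total for pair, w in zip(pairs, weights)}


# Hypothetical frequencies; a two-SNP double heterozygote is compatible
# with either 00/11 or 01/10.
freqs = {"00": 0.5, "11": 0.3, "01": 0.15, "10": 0.05}
post = pair_posteriors([("00", "11"), ("01", "10")], freqs)
print(post)
```

Here the 00/11 resolution receives about 0.95 of the posterior mass. Carrying both probabilities forward as weights, rather than keeping only the most likely pair, is one way to reflect assignment uncertainty in later analyses.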
Run time: another issue in assessing the performance of haplotyping programs involves the programs' use of memory and demands on the central processing unit. Run time is also affected by the complexity of the haplotyping problem, which increases with the number of loci. 48,51 Although the standard EM algorithm can in theory handle any number of polymorphic sites in a sample, in practice it is limited by memory requirements that increase exponentially with the number of loci. 48,49 Moreover, EM methods may require multiple restarts to avoid convergence to a local rather than the global optimum, increasing the time required to infer haplotypes. 48 Using a Gibbs sampler, PHASE v1.0 determines phase more efficiently than the EM algorithm and constructs haplotypes with a larger number of markers, although run times are lengthy. 51,58 PHASE has been widely recognised as having several useful features, but a very slow implementation. 51,55,58,60 In the original article describing PHASE v1.0, the program took minutes to hours to run, whereas an EM program and HAPINFREX took seconds. 51 Among Bayesian-based programs, with 50 subjects and 14-119 loci, HAPLOTYPER estimated haplotypes in seconds, Arlequin v3.0 in minutes and PHASE v1.0 in hours. 60 In comparisons of several programs on complete datasets from Reich et al., 16 HPLUS and HAPLOTYPER completed analysis in under one second, Arlequin v2.0 in less than one minute and PHASE v2.0 in 11 minutes. 55 Additional comparisons suggest that programs that implement modified EM algorithms, such as SNPHAP and PL-EM, have shorter run times than PHASE v1.0 on large datasets. HAPLOREC has run times similar to those of the modified EM programs. 62 The updated version of PHASE (v2.0) improves program performance, although it was still found to be slower than the other programs. 59 The phylogeny programs (GPPH, DPPH, BPPH and HAP H) have remarkably fast run times. 40,43-45 HAP H was shown to run faster than both HAPLOTYPER and PHASE v1.0 in a variety of situations. 44 Run times for all programs increased in the presence of missing data and multiallelic markers. 54,60,62
Sample size: both sample size and the number of loci are important considerations in the selection of haplotyping programs. Details on sample size and loci limits are listed in Tables 1 and S1. As the dataset grows, in either the number of markers or the number of subjects, run time increases. The accuracy of EM-based programs has been shown to improve with increasing sample size. 4,53 Likewise, the accuracy of HAPAR, HAPLOTYPER and PHASE v1.0 was shown to improve with increasing sample size. 39 The accuracy of estimates for low-frequency haplotypes also improves with increasing sample size. 30 While standard EM-based programs have no theoretical limit, in practice these programs are limited to fewer than 25 loci, due to memory and processing requirements. 48,49,51 HAPINFREX likewise has no theoretical size limit, although in practice the program may fail to start with large numbers of markers. 37 The parsimony program HAPAR overcomes the limitations of HAPINFREX, with accuracy improving with increasing sample size. 39 Programs that accommodate large datasets often sacrifice performance. Partition-ligation (PL), a divide-and-conquer strategy, has been proposed as an effective method of dealing with the construction of large haplotypes. 57 This and similar schemes have been implemented in both EM- 54-56 and Bayesian-based programs. 57,59,60,62 These programs are able to handle large datasets, although performance varies (see the run time discussion above).
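The memory and restart issues described above can be seen in a compact EM sketch for haplotype frequency estimation under HWE. This is a generic textbook-style illustration, not the implementation of any program reviewed here, and it assumes complete biallelic genotypes coded 0/1/2. It stores every haplotype pair compatible with each genotype; an individual heterozygous at h sites has 2^(h - 1) such pairs, which is where the exponential growth comes from. Random restarts are included because EM can stop at a local optimum; the function names and settings are hypothetical.

```python
import math
import random
from itertools import product


def phase_resolutions(g):
    """Unordered haplotype pairs (tuples of 0/1) consistent with genotype g."""
    het = [i for i, x in enumerate(g) if x == 1]
    base = [1 if x == 2 else 0 for x in g]
    pairs = []
    # Fixing the first heterozygous site avoids listing each pair twice,
    # leaving 2 ** (h - 1) pairs for h heterozygous sites.
    for bits in product((0, 1), repeat=max(len(het) - 1, 0)):
        h1, h2 = list(base), list(base)
        for site, b in zip(het, (0,) + bits):
            h1[site], h2[site] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotypes(genotypes, n_restarts=5, n_iter=200, seed=1):
    """EM estimate of haplotype frequencies; keeps the restart with the best log-likelihood."""
    rng = random.Random(seed)
    pair_lists = [phase_resolutions(g) for g in genotypes]  # memory grows here
    haps = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    best_ll, best_freq = -math.inf, None
    for _ in range(n_restarts):
        raw = {h: rng.uniform(0.1, 1.0) for h in haps}  # random positive start
        total = sum(raw.values())
        freq = {h: v / total for h, v in raw.items()}
        for _ in range(n_iter):
            counts = dict.fromkeys(haps, 0.0)
            for pairs in pair_lists:
                # E-step: posterior weight of each compatible pair under HWE.
                w = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
                s = sum(w)
                for (a, b), wi in zip(pairs, w):
                    counts[a] += wi / s
                    counts[b] += wi / s
            # M-step: expected haplotype counts divided by number of chromosomes.
            freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        ll = sum(math.log(sum(freq[a] * freq[b] * (1 if a == b else 2)
                              for a, b in pairs)) for pairs in pair_lists)
        if ll > best_ll:
            best_ll, best_freq = ll, freq
    return best_freq, best_ll


genos = [(0, 0)] * 10 + [(2, 2)] * 10 + [(1, 1)] * 2
freq, ll = em_haplotypes(genos)
print({h: round(f, 3) for h, f in freq.items()})
```

With these invented data the homozygotes pull the two double heterozygotes towards the 00/11 resolution, so haplotypes (0, 1) and (1, 0) end up with essentially zero frequency. Adding a locus doubles the candidate pairs for every individual heterozygous there, which is why divide-and-conquer schemes such as PL become attractive for long haplotypes.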
Hypothesis testing. Haplotyping in and of itself is usually not the final outcome of interest. The research objective dictates which subsequent analyses are needed. This section will focus on programs that combine haplotyping with hypothesis testing in genetic association studies (see Table 3 and Supplemental Table S3). All haplotype reconstruction methods will encounter a degree of misclassification error or uncertainty in haplotype assignments. 7,81,83 If uncertainty of assignments is ignored in subsequent analyses, it can lead to biased parameter estimates and inflated false-positive rates for statistically based hypothesis tests. 25,31,82,83 In situations where inferred haplotypes had high reliability, biased estimates were avoided, and the inferred haplotypes were found to be useful for hypothesis testing. 83 The imperfect phylogeny-based method in HAP H has been shown to assign accurate haplotypes 62 and has recently been updated to include association analysis of discrete and continuous phenotypes, although the potential for bias exists, due to uncertainty of haplotype assignments. Several programs avoid this pitfall by comparing estimated haplotype frequencies between two groups, 84,85 that is, a case-control model; these include EH, EHPLUS, FASTEHPLUS, GENECOUNTING, PHASE v2.0, the SAS Genetics module and SNPEM. Fallin et al. 10 demonstrated the advantages of this approach using the SNPEM program.
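The case-control comparison of estimated haplotype frequencies can be sketched as a likelihood ratio test: estimate frequencies by EM separately in cases, in controls and in the pooled sample, then refer 2[ln L(cases) + ln L(controls) - ln L(pooled)] to a chi-squared distribution with (number of haplotypes - 1) degrees of freedom. The sketch below is restricted to two biallelic SNPs (four haplotypes, so 3 df), assumes HWE within each group and uses invented genotypes; it illustrates the general idea rather than reproducing EH, SNPEM or any other program listed above.

```python
import math

HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def pairs_for(g):
    """Unordered haplotype pairs consistent with a two-SNP genotype (0/1/2 codes)."""
    return [(a, b) for i, a in enumerate(HAPS) for b in HAPS[i:]
            if (a[0] + b[0], a[1] + b[1]) == tuple(g)]


def pair_prob(a, b, freq):
    """Probability of an unordered haplotype pair under HWE."""
    return freq[a] * freq[b] * (1 if a == b else 2)


def em_loglik(genos, n_iter=200):
    """Run EM for one group and return the maximised log-likelihood."""
    freq = dict.fromkeys(HAPS, 0.25)
    for _ in range(n_iter):
        counts = dict.fromkeys(HAPS, 0.0)
        for g in genos:
            pairs = pairs_for(g)
            w = [pair_prob(a, b, freq) for a, b in pairs]
            s = sum(w)
            for (a, b), wi in zip(pairs, w):
                counts[a] += wi / s
                counts[b] += wi / s
        freq = {h: c / (2 * len(genos)) for h, c in counts.items()}
    return sum(math.log(sum(pair_prob(a, b, freq) for a, b in pairs_for(g)))
               for g in genos)


def haplotype_lrt(cases, controls):
    """LRT statistic; compare with chi-squared on 3 df (7.815 at alpha = 0.05)."""
    return 2 * (em_loglik(cases) + em_loglik(controls)
                - em_loglik(cases + controls))


cases = [(2, 2)] * 30 + [(1, 1)] * 10 + [(0, 0)] * 10
controls = [(0, 0)] * 30 + [(1, 1)] * 10 + [(2, 2)] * 10
stat = haplotype_lrt(cases, controls)
print(f"LRT = {stat:.2f}; exceeds 7.815: {stat > 7.815}")
```

Because the test compares frequencies estimated within each group, it never relies on assigning a single haplotype pair to each individual. It does, however, inherit the HWE and homogeneity assumptions of the underlying EM estimates.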
This methodology has been extended to allow adjustment for covariates. The Zaykin 82 program uses a likelihood ratio test statistic for association analysis of haplotypes and phenotypes. HAPLO.STATS 86,87 and THESIAS 53 also include a test for interaction with covariates, using a score and a likelihood ratio statistic, respectively. The HPLUS program is limited to qualitative phenotypes, and it provides odds ratio estimates. 55,83 The THESIAS program has recently been expanded to allow haplotype-based association analysis of survival outcomes. 88 Finally, Arlequin 60,89 incorporates numerous population genetics tests. Additional discussions on hypothesis testing with haplotypes are available. 82,86,90-94
Web-based programs. Several web-based haplotyping programs were identified and are presented in Table 2 and Supplemental Table S2. Web-based versions of haplotyping programs help researchers to circumvent many of the issues related to practical usability discussed previously. Web-based programs negate the need for the researcher to learn a computer language(s), purchase computer hardware/software, install and maintain programs or troubleshoot computer problems, thus allowing genetics researchers to focus on their research. Moreover, web-based programs usually employ graphical interfaces, allowing the computer layman to use a haplotyping program easily. Additionally, many of the identified web-based programs allow the user to have results sent via e-mail. Finally, additional websites were identified with links to programs, as well as the website for the supplemental tables, also presented in Table 2.

Haplotyping in pooled data
Haplotype analysis using pooled samples is possible, but requires that alleles are in strong LD, that pools are limited to a small number of individuals and that only a few of the possible allele combinations are present. 95 This requires actual genotyping of individuals to determine which haplotypes exist in the population of interest before testing for differences in allele frequencies between the two pooled samples. 95,96 Three programs for pooled samples were identified, as well as one technique, none of which were web-based (see Table 4 and Supplemental Table S4). All of the programs are only compatible with pools of one to six individuals, in which each pool comprises either cases or controls of unrelated individuals. There has been some discussion as to the number of individuals and SNPs that the pooling technique or algorithm can handle. 95-99 Pools of three to four individuals are optimal in terms of accuracy and efficiency; accuracy begins to decline beyond four individuals. 12 Zou and Zhao 72 point out that pooled samples are particularly susceptible to genotyping error and that consideration should be given to the impact of population stratification in pooled samples.
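One intuition for why accuracy falls as pools grow is to count how many haplotype compositions fit the pooled allele counts equally well. For two SNPs, a pool of n individuals holds 2n chromosomes; if a1 and a2 copies of allele 1 are observed at the two SNPs, every composition (n00, n01, n10, n11) with n10 + n11 = a1, n01 + n11 = a2 and all counts summing to 2n is consistent with the data. The sketch below assumes error-free allele counts and simply enumerates these compositions; the function name and the example counts are hypothetical.

```python
def pooled_compositions(pool_size, a1, a2):
    """Haplotype count vectors (n00, n01, n10, n11) consistent with pooled
    allele-1 counts a1 and a2 at two SNPs, assuming error-free counts."""
    n_chrom = 2 * pool_size
    comps = []
    for n11 in range(min(a1, a2) + 1):
        n10, n01 = a1 - n11, a2 - n11
        n00 = n_chrom - n11 - n10 - n01
        if n00 >= 0:
            comps.append((n00, n01, n10, n11))
    return comps


for size in (1, 2, 4, 6):
    # Hypothetical case: allele 1 on half the chromosomes at both SNPs.
    print(size, len(pooled_compositions(size, size, size)))
```

Allele counts alone cannot distinguish among these compositions, so larger pools leave more to be resolved by prior knowledge of which haplotypes exist, consistent with the requirement above to genotype individuals first.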

Discussion
While no single haplotyping program is ideal in all situations, this review found that currently available haplotyping programs should accommodate the research needs of most scientists. Although the programs share many similarities, significant differences were observed in their ability to handle various data characteristics and population genetic parameters. Each program had its own unique combination of features and limitations. It is hoped that researchers interested in haplotype analysis will use this paper as a guide for selecting the haplotype analysis program(s) most suitable for their research needs. Moreover, it is anticipated that this review will be an impetus for additional testing, development and improvement of haplotyping software.
The selection of haplotyping programs should be based on the research needs and characteristics of the data to be used for analysis. These criteria include research objectives, hypothesis testing, data assumptions, genotyping error, missing data and the computer expertise needed to implement programs. A suitable haplotyping program is one that generates the desired results (haplotype frequency estimates and/or assignments) and analyses. For hypothesis testing, several programs were identified that combine haplotype analysis with hypothesis testing, which should facilitate analysis. The accuracy of haplotyping programs varied under different assumptions and situations. Deviations from assumptions often resulted in declines in the performance of haplotyping programs; therefore, an important step in selecting a haplotyping program is the evaluation of the assumptions inherent in how the data were collected. This should identify programs that can accommodate limitations of the data or departures from assumptions.
Selection of the appropriate haplotyping program should also take into account the usability of a program. Assessment of this criterion is challenging because usefulness depends on a number of sub-criteria, discussed previously. Web-based programs and those with graphical user interfaces will generally be the easiest to use and have the best usability. Unfortunately, only a short list of programs may suit the needs of researchers. The usability of a program will also depend heavily on the researcher's computer expertise. In summary, the choice of haplotyping program should be based on identifying research needs and selecting the haplotyping program most appropriate to those requirements. Awareness of program assumptions and limitations should be an important factor in the final decision.
All of the programs reviewed assume genetic homogeneity of individuals in study populations. In brief, the basis of this assumption is that all individuals in a study population share a similar population history. Inclusion of individuals with dissimilar population histories will result in incorrect haplotype estimates due to, for example, LD differences and allele frequency differences between the populations. As an example, consider a hypothetical population of 200 individuals: half of African-American ancestry and half of European-American ancestry. The resulting haplotype estimates will not be correct for either the African-American or the European-American group. To obtain accurate haplotype estimates and assignments, the groups must be analysed separately. Further discussions on this topic are available elsewhere. 5,100-103
The majority of the reviewed programs are actively maintained and updated regularly. Haplotype analysis is a rapidly evolving field, with many new methods and programs emerging. Programs that are reviewed here may be modified or even completely revamped in the near future. Accurate and updated information on existing haplotyping programs will be maintained at http://polymorphism.ucsd.edu/Hap-SoftwareReview/. An important limitation of this project is that it relied on a review of the literature to evaluate the programs. Therefore, it was not possible to validate the accuracy, performance and claims of all individual programs. This review found that haplotype analysis programs have increased in number and have improved rapidly over the past decade. While existing haplotyping methods may accommodate research needs, many opportunities exist for improvement of haplotyping programs.
In particular, improvements are needed in accuracy (particularly for assignments), run time, accommodation of larger sample sets and numbers of loci, handling of missing data, incorporation of association testing, and the identification of and adjustment for genotyping error in haplotype estimates. In addition, an emerging question is how to construct haplotypes across large genomic regions, especially with substantial numbers of loci. Available methods include programs that use a block-based approach, methods that build large haplotypes by adding one locus at a time (eg SNPHAP) and programs that use the PL approach (eg HAPLOTYPER, PL-EM). Future studies are necessary to evaluate the different measures of accuracy directly, assess the influence of varying LD levels on accuracy and further assess the impact of departures from assumptions on program performance and accuracy. Ideally, future studies would evaluate several of the more commonly used programs in a standard fashion, allowing comparison across studies. This would facilitate comparison of programs and determination of the most appropriate program. Moreover, adoption of a universal data format would also be helpful. Finally, the use of standardised phase-known dataset(s), which developers of haplotyping programs could use to evaluate their programs, would assist in the selection, improvement and development of haplotyping programs. Potential sources include examples from the literature 4,18,65 and the HapMap project data (available at: www.hapmap.org).