Table 1 Description of unrelated haplotyping programs, divided into four classes based on method.

From: A comprehensive literature review of haplotyping software and methods for use with unrelated individuals

Program name | Algorithm | Output (a) | Missing data (b) | Assumptions (c) | Key features | Limitations | Max subjects, loci and type | Platform | Ref. (d)
Parsimony methods          
1. Simple parsimony          
HAPAR Parsimony HA No None Overcomes limitations of HAPINFERX May be susceptible to HWE departures Practical limit, biallelic PC/UNIX [39]
      Increasing sample size improves accuracy     
HAPINFERX Clark's HA No None Intuitive method, fast May fail to start Practical limit, biallelic/multiallelic UNIX [37]
      Reduced number of haplotypes Sensitive to data order    
      No limit on number of loci Unstable and erroneous estimates    
2. Phylogeny          
BPPH IP HA No IP Similar to HAPH User interface Practical limit, biallelic MAC [45]
      Speed     
DPPH PP HA No PP Handles large datasets Theoretical Practical limit, biallelic MAC [40, 43]
      Speed Strict population assumptions    
GPPH PP HA No PP Handles large datasets Theoretical Practical limit, biallelic MAC/PC/UNIX [40, 42]
      Speed Strict population assumptions    
HAPH IP HA/HF Yes HWE, IP Predicts haplotype blocks No probability for haplotype assignments Max 500 loci, Practical limit biallelic Web-based [44]
      Constructs haplotypes within blocks     
      Identifies block structure     
      Web-based     
Likelihood methods          
1. Maximum likelihood         
Arlequin v2.0 EM HA/HF No HWE Includes numerous population genetic analysis tools EM issues EM Practical Limits, biallelic/multiallelic JRE on MAC/PC/UNIX [89]
CHAPLIN ECM HF Yes HWE Graphical interface ECM algorithm needs to be compared with standard EM methods Practical limits, biallelic/multiallelic PC [91]
      Association tests     
      HWE assumption relaxed in case sample     
EH EM HF No HWE Estimates haplotype frequency EM issues No Max, 3-4 practical max, biallelic/multiallelic PC [85, 104]
      Compares case-control HF under different assumptions Must specify mode of inheritance and penetrance of disease    
EHPLUS EM HF No HWE Improves EH, more loci and polymorphic markers Long run times for permutation calculations Max 5 loci, 15 alleles in analysis PC/UNIX [84]
      Incorporates model-free analysis     
EM-DeCODER EM HA/HF No HWE Program with standard EM algorithm EM issues Max 15 loci, biallelic UNIX [57]
FASTEHPLUS EM HF No HWE Similar to EHPLUS, with speed improvements EM issues Max 5 loci, 15 alleles in analysis PC/UNIX [105]
GENECOUNTING EM HA/HF Yes HWE Provides posterior probabilities for assigned haplotypes Missing data limited to biallelic loci 10-15 loci practical limit, biallelic/ multiallelic PC/UNIX [106]
      Compares global and specific haplotypes between groups EM issues    
GCHAP EM HA/HF Yes HWE Haplotypes with zero likelihood dropped to improve speed and accuracy EM issues 20 loci practical limit, biallelic JRE on PC/UNIX [107, 108]
      Similar to SNPHAP     
GS-EM EM HA/HF Yes HWE Includes algorithm for assigning probability to genotype calls from several genotyping methods EM issues Practical limit, biallelic Web-based [73]
      Haplotypes constructed using assigned genotypes probability Limited to biallelic SNPs    
      Web-based     
HAPZ EM HA/HF Yes HWE Modified version of SNPHAP that accommodates multiallelic loci EM issues Practical limit, biallelic/multiallelic PC/UNIX [106]
HAPMAX MLE HF No HWE Ease of use Accommodates a limited number of SNPs 8 loci, biallelic PC [109]
       Interface    
HAPLOH EM HF Yes HWE Handles some missing data EM issues 10 loci, 40 alleles max, biallelic/multiallelic UNIX [47]
      Utilises pedigree data, if available     
      Calculates standard error     
HAPLOSCOPE EM/MCMC Platform program, incorporates SNPHAP and PHASE v1.0 See individual programs for limitations/features UNIX/Windows [110]
      Facilitates comparison/testing     
      Graphical interface, identifies tagging SNPs and LD blocks     
HAPLOVIEW EM+PL HA/HF Yes HWE Calculates pairwise LD EM issues 100s, practical limit, biallelic JRE on MAC/PC/UNIX [56]
      Checks for recombination     
      Identifies tagging SNPs     
      Accepts pedigree and unrelated genotype data     
HAPLO.STATS EM HA/HF Yes HWE Incorporates method similar to SNPHAP, with user inputs Requires knowledge of S-Plus 6.0 or R Practical limit, biallelic/ multiallelic S-PLUS 6.0 on UNIX/R on UNIX & PC [86]
      Separate programs that: EM issues    
      (1) assign haplotypes with posterior probability of assignments     
      (2) allow linear regression for trait to haplotype analysis     
      (3) calculates score statistic for haplotype phenotype association     
HIT EM/MCMC/ MC+PL Platform program, incorporates SNPHAP and PHASE v1.0 See individual programs for limitations/ features * [111]
      Facilitates comparison     
      Graphical interface, identifies tagging SNPs and LD blocks     
HPLUS EM+EE+PL HA/HF Yes HWE Provides posterior probabilities for assigned haplotypes Requires Matlab 100 loci, biallelic MATLAB on PC/ UNIX [55, 83]
      Compares haplotype frequencies between groups, adjusts for covariates EM issues    
      Utilises pedigree data, if available     
LDSUPPORT EM HA/HF Yes HWE Provides posterior probabilities for assigned haplotypes EM issues * UNIX [29, 112]
      Identifies LD blocks for haplotype reconstruction     
      Examines association with disease, automation speeds process     
LOGINSERM ESTIHAPLO EM HA/HF Yes HWE Program uses ML method to infer haplotypes for individuals with missing data EM issues Practical limit, biallelic/multiallelic PC/ UNIX [80]
      Offers option to exclude individuals with missing data     
MLHAPFRE EM HF Yes HWE Performance improves with presence of LD Incorporated into Arlequin 16 loci, biallelic JRE on Mac/PC/UNIX [48]
      Performs well with large sample size EM issues    
MLOCUS EM HA/HF Yes HWE Provides posterior probabilities for assigned haplotypes EM issues 11 loci, biallelic/multiallelic PC [46, 113]
      Notes observed vs. inferred haplotypes     
      Calculates pairwise LD     
OSLEM EM Yes No HWE Modified EM algorithm that runs 2 × faster EM issues Practical limit, biallelic Web-based [114]
PL-EM EM+PL HA/HF Yes HWE Combines PL with EM EM issues 100s, practical limit, biallelic PC/UNIX [54]
      EM-based version of HAPLOTYPER     
      Calculates variance of haplotype frequency estimates     
SAS Genetics EM HA/HF Yes HWE Provides posterior probabilities for assigned haplotypes Requires SAS Practical limit, biallelic/multiallelic SAS on PC/UNIX [115]
      Incorporates statistical tests and procedures EM issues    
SNPEM EM HF No HWE Estimates haplotype frequency by population EM issues 10 loci, biallelic UNIX [10]
      Compares global and specific haplotype between 2 groups     
SNPHAP EM HA/HF Yes HWE Uses posterior and prior trimming to handle large number loci EM issues Practical limit, biallelic UNIX [52]
      Provides posterior probabilities for assigned haplotypes     
THESIAS S-EM HF Yes HWE Stochastic EM avoids issues of standard EM programs S-EM algorithm needs to be compared with standard EM methods Practical limit, 20 loci, biallelic PC/UNIX [53, 88]
      Includes tests for haplotype-phenotype association     
      Accommodates large sample sizes     
WHAP EM Uses haplotype output from SNPHAP for association testing EM issues PC/UNIX [116]
      Allows weighted association analysis Requires separate haplotyping program    
Zaykin et al. EM HF No HWE Program on analysis of haplotype-phenotype association EM issues Practical limit, biallelic/multiallelic PC/UNIX [82]
       Subjects with missing data ignored    
Zou and Zhao MLE/EM HF Yes HWE Adjust haplotype frequency estimates for genotyping error Assumes genotyping errors are random Practical limits, biallelic/multiallelic * [68]
      Program also works for nuclear families Assumes error rates are known    
3locus.PAS EM HF Yes HWE Handles some missing data EM issues 3 loci, biallelic/ multiallelic PC/UNIX [46]
      Various tests available     
      Improves with increasing sample size     
2. Simple Bayesian          
HAPLOTYPER MC+PL HA/HF Yes HWE Uses PL algorithm to construct haplotypes with many loci Long run times 256 max, biallelic UNIX [57]
      Provides posterior probabilities for assigned haplotypes Posterior probabilities may be difficult to interpret    
HAPLOREC MC-VL HA/HF Yes HWE Uses variable length chain based on maximising LD Restarts avoid non-global optimum Practical limit, biallelic Java virtual machine, v1.4 or newer [62]
      Handles large number loci     
3. Coalescent-based Bayesian (e)
Arlequin v3.0 ELB HA/HF No Adaptive window Includes numerous population genetics analyses Long run times 1,000s, biallelic/multiallelic JRE on LINUX/PC/Mac [60, 89]
      Handles recombination     
PHASE v2.0 MCMC+PL HA/HF Yes Coalescent/HWE Improved run time Departures from coalescent model may impact performance Practical limit, biallelic/multiallelic PC/MAC/UNIX [59]
      Compares haplotype frequency between groups Posterior probabilities may be difficult to interpret    
      Handles recombination     
      Provides posterior probabilities for assigned haplotypes     
PHASE v1.0 MCMC HA/HF No Coalescent/HWE Incorporates pop-genetics and coalescence ideas Departures from coalescent model may impact performance Practical limit, biallelic/multiallelic UNIX [51]
      Incorporates known phase and trios pedigrees into analysis Slow run times    
      Provides posterior probabilities for assigned haplotypes Posterior probabilities may be difficult to interpret    
SLHAP v1.0 MCMC HA/HF Yes Neutral coalescent/HWE Similar to PHASE v1.0 Departures from coalescent model may impact performance Practical limit, biallelic/multiallelic UNIX [58]
      Missing data     
      Improved run time     
(a) Program haplotype output: individual assignment, frequency estimates or both.
(b) Ability of program to accept missing data.
(c) Program assumptions.
(d) List of references.
(e) Programs in this section make assumptions based on or draw inference from the coalescent model.
* Could not determine from available data.
See incorporated programs for features and limitations.
  8. EE: Estimating equation; ECM: Expectation conditional maximisation algorithm; ELB: Excoffier-Laval-Balding algorithm, Bayesian; EM: Expectation maximisation algorithm; EM issues: May be sensitive to HWE departures, long run times, and non-global max (requiring multiple restarts); HF: Haplotype frequency estimate; HA: Individual haplotype assignment; HWE: Hardy-Weinberg equilibrium; IP: Imperfect phylogeny-based method; JRE: Java runtime environment; LD: Linkage disequilibrium; MAC: Program runs on Apple computer; MC: Monte Carlo algorithm, Bayesian algorithm; MCMC: Markov Chain Monte Carlo algorithm, Bayesian algorithm; MC-VL: Monte Carlo-variable length chain algorithm, Bayesian Algorithm; MLE: Maximum likelihood estimation algorithm; PC: IBM compatible personal computer; PL: Partition ligation algorithm; PP: Perfect phylogeny-based method; Practical Limit: program has no upper limit on number of markers and/or subjects, however computational and practical considerations limit this value; S-EM: Stochastic EM algorithm; UNIX: Runs on Unix operating system, including Linux, FORTRAN, Solaris and others.
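Many of the programs in the table rely on the EM algorithm under an HWE assumption, and the "EM issues" entry is easier to picture with a toy version. The following is a minimal sketch in Python, not the implementation used by any listed program. It assumes biallelic loci, genotypes coded as counts of the alternate allele (0, 1 or 2) per locus, no missing data, and a uniform starting point. The function names `compatible_pairs` and `em_haplotype_frequencies` are hypothetical.

```python
from itertools import product


def compatible_pairs(genotype):
    """Return all unordered haplotype pairs consistent with one genotype.

    genotype: tuple of alternate-allele counts (0, 1 or 2), one per biallelic locus.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous loci are fixed: 0 -> 0, 2 -> 1
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b  # heterozygous locus: one allele on each haplotype
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=100, tol=1e-9):
    """Estimate haplotype frequencies by EM, assuming HWE (random pairing)."""
    pairs_per_subject = [compatible_pairs(g) for g in genotypes]
    haplotypes = sorted({h for pairs in pairs_per_subject for pair in pairs for h in pair})
    # Assumed start: uniform over all haplotypes compatible with the data.
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(n_iter):
        counts = dict.fromkeys(haplotypes, 0.0)
        for pairs in pairs_per_subject:
            # E-step: posterior probability of each phase resolution under HWE.
            weights = [(1 if a == b else 2) * freq[a] * freq[b] for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the 2N chromosomes.
        new_freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        delta = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if delta < tol:
            break
    return freq


if __name__ == "__main__":
    # Two loci: 4 subjects homozygous 0/0, 4 homozygous 1/1, 2 double heterozygotes.
    sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
    for hap, f in em_haplotype_frequencies(sample).items():
        print(hap, round(f, 4))
```

The number of phase resolutions per subject doubles with each additional heterozygous locus, which is one reason several EM programs in the table list practical limits of roughly 10 to 20 loci. A single uniform start, as used here, can also converge to a non-global maximum, so a practical run would repeat the estimation from several starting points.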