Table 1 Description of unrelated haplotyping programs, divided into four classes based on method.

From: A comprehensive literature review of haplotyping software and methods for use with unrelated individuals

Program name Algorithm Output (a) Missing data (b) Assumptions (c) Key features Limitations Max subjects, loci and type Platform Ref. (d)
Parsimony methods          
1. Simple parsimony          
HAPAR Parsimony HA No None Overcomes limitations of HAPINFERX May be susceptible to HWE departures Practical limit, biallelic PC/UNIX [39]
      Increasing sample size improves accuracy
HAPINFERX Clark's HA No None Intuitive method, fast May fail to start Practical limit, biallelic/multiallelic UNIX [37]
      Reduced number of haplotypes Sensitive to data order
      No limit on number of loci Unstable and erroneous estimates
2. Phylogeny          
BPPH IP HA No IP Similar to HAPH User interface Practical limit, biallelic MAC [45]
      Speed     
DPPH PP HA No PP Handles large datasets Theoretical Practical limit, biallelic MAC [40, 43]
      Speed Strict population assumptions    
GPPH PP HA No PP Handles large datasets Theoretical Practical limit, biallelic MAC/PC/UNIX [40, 42]
      Speed Strict population assumptions    
HAPH IP HA/HF Yes HWE, IP Predicts haplotype blocks No probability for haplotype assignments Max 500 loci, Practical limit biallelic Web-based [44]
      Constructs haplotypes within blocks
      Identifies block structure
      Web-based     
Likelihood methods          
1. Maximum likelihood         
Arlequin v2.0 EM HA/HF No HWE Includes numerous population genetic analysis tools EM issues Practical limits, biallelic/multiallelic JRE on MAC/PC/UNIX [89]
CHAPLIN ECM HF Yes HWE Graphical interface ECM algorithm needs to be compared with standard EM methods Practical limits, biallelic/multiallelic PC [91]
      Association tests     
      HWE assumption relaxed in case sample
EH EM HF No HWE Estimates haplotype frequency EM issues No Max, 3-4 practical max, biallelic/multiallelic PC [85, 104]
      Compares case-control HF under different assumptions Must specify mode of inheritance and penetrance of disease
EHPLUS EM HF No HWE Improves EH, more loci and polymorphic markers Long run times for permutation calculations Max 5 loci, 15 alleles in analysis PC/UNIX [84]
      Incorporates model-free analysis
EM-DeCODER EM HA/HF No HWE Program with standard EM algorithm EM issues Max 15 loci, biallelic UNIX [57]
FASTEHPLUS EM HF No HWE Similar to EHPLUS, with speed improvements EM issues Max 5 loci, 15 alleles in analysis PC/UNIX [105]
GENECOUNTING EM HA/HF Yes HWE Provides posterior probabilities for assigned haplotypes Missing data limited to biallelic loci 10-15 loci practical limit, biallelic/multiallelic PC/UNIX [106]
      Compares global and specific haplotypes between groups EM issues
GCHAP EM HA/HF Yes HWE Haplotypes with zero likelihood dropped to improve speed and accuracy EM issues 20 loci practical limit, biallelic JRE on PC/UNIX [107, 108]
      Similar to SNPHAP     
GS-EM EM HA/HF Yes HWE Includes algorithm for assigning probability to genotype calls from several genotyping methods EM issues Practical limit, biallelic Web-based [73]
      Haplotypes constructed using assigned genotype probabilities Limited to biallelic SNPs
      Web-based     
HAPZ EM HA/HF Yes HWE Modified version of SNPHAP that accommodates multiallelic loci EM issues Practical limit, biallelic/multiallelic PC/UNIX [106]
HAPMAX MLE HF No HWE Ease of use Accommodates a limited number of SNPs 8 loci, biallelic PC [109]
       Interface    
HAPLOH EM HF Yes HWE Handles some missing data EM issues 10 loci, 40 alleles max, biallelic/multiallelic UNIX [47]
      Utilises pedigree data, if available     
      Calculates standard error
HAPLOSCOPE EM/MCMC Platform program, incorporates SNPHAP and PHASE v1.0 See individual programs for limitations/features UNIX/Windows [110]
      Facilitates comparison/testing
      Graphical interface, identifies tagging SNPs and LD blocks
HAPLOVIEW EM+PL HA/HF Yes HWE Calculates pairwise LD EM issues 100s, practical limit, biallelic JRE on MAC/PC/UNIX [56]
      Checks for recombination
      Identifies tagging SNPs
      Accepts pedigree and unrelated genotype data
HAPLO.STATS EM HA/HF Yes HWE Incorporates method similar to SNPHAP, with user inputs Requires knowledge of S-Plus 6.0 or R Practical limit, biallelic/multiallelic S-PLUS 6.0 on UNIX/R on UNIX & PC [86]
      Separate programs that: EM issues
      (1) assign haplotypes with posterior probability of assignments
      (2) allow linear regression for trait to haplotype analysis
      (3) calculate score statistic for haplotype-phenotype association
HIT EM/MCMC/MC+PL Platform program, incorporates SNPHAP and PHASE v1.0 See individual programs for limitations/features * [111]
      Facilitates comparison
      Graphical interface, identifies tagging SNPs and LD blocks
HPLUS EM+EE+PL HA/HF Yes HWE Provides posterior probabilities for assigned haplotypes Requires Matlab 100 loci, biallelic MATLAB on PC/UNIX [55, 83]
      Compares haplotype frequencies between groups, adjusts for covariates EM issues
      Utilises pedigree data, if available
LDSUPPORT EM HA/HF Yes HWE Provides posterior probabilities for assigned haplotypes EM issues * UNIX [29, 112]
      Identifies LD blocks for haplotype reconstruction
      Examines association with disease, automation speeds process
LOGINSERM ESTIHAPLO EM HA/HF Yes HWE Program uses ML method to infer haplotypes for individuals with missing data EM issues Practical limit, biallelic/multiallelic PC/UNIX [80]
      Offers option to exclude individuals with missing data
MLHAPFRE EM HF Yes HWE Performance improves with presence of LD Incorporated into Arlequin 16 loci, biallelic JRE on Mac/PC/UNIX [48]
      Performs well with large sample size EM issues
MLOCUS EM HA/HF Yes HWE Provides posterior probabilities for assigned haplotypes EM issues 11 loci, biallelic/multiallelic PC [46, 113]
      Notes observed vs. inferred haplotypes
      Calculates pairwise LD     
OSLEM EM Yes No HWE Modified EM algorithm that runs 2 × faster EM issues Practical limit, biallelic Web-based [114]
PL-EM EM+PL HA/HF Yes HWE Combines PL with EM EM issues 100s, practical limit, biallelic PC/UNIX [54]
      EM-based version of HAPLOTYPER
      Calculates variance of haplotype frequency estimates
SAS Genetics EM HA/HF Yes HWE Provides posterior probabilities for assigned haplotypes Requires SAS Practical limit, biallelic/multiallelic SAS on PC/UNIX [115]
      Incorporates statistical tests and procedures EM issues
SNPEM EM HF No HWE Estimates haplotype frequency by population EM issues 10 loci, biallelic UNIX [10]
      Compares global and specific haplotypes between 2 groups
SNPHAP EM HA/HF Yes HWE Uses posterior and prior trimming to handle large numbers of loci EM issues Practical limit, biallelic UNIX [52]
      Provides posterior probabilities for assigned haplotypes
THESIAS S-EM HF Yes HWE Stochastic EM avoids issues of standard EM programs S-EM algorithm needs to be compared with standard EM methods Practical limit, 20 loci, biallelic PC/UNIX [53, 88]
      Includes tests for haplotype-phenotype association
      Accommodates large sample sizes
WHAP EM Uses haplotype output from SNPHAP for association testing EM issues PC/UNIX [116]
      Allows weighted association analysis Requires separate haplotyping program
Zaykin et al. EM HF No HWE Program on analysis of haplotype-phenotype association EM issues Practical limit, biallelic/multiallelic PC/UNIX [82]
       Subjects with missing data ignored
Zou and Zhao MLE/EM HF Yes HWE Adjusts haplotype frequency estimates for genotyping error Assumes genotyping errors are random Practical limits, biallelic/multiallelic * [68]
      Program also works for nuclear families Assumes error rates are known
3locus.PAS EM HF Yes HWE Handles some missing data EM issues 3 loci, biallelic/multiallelic PC/UNIX [46]
      Various tests available
      Improves with increasing sample size
2. Simple Bayesian          
HAPLOTYPER MC+PL HA/HF Yes HWE Uses PL algorithm to construct haplotypes with many loci Long run times 256 max, biallelic UNIX [57]
      Provides posterior probabilities for assigned haplotypes Posterior probabilities may be difficult to interpret
HAPLOREC MC-VL HA/HF Yes HWE Uses variable length chain based on maximising LD Restarts avoid non-global optimum Practical limit, biallelic Java virtual machine, v1.4 or newer [62]
      Handles large numbers of loci
3. Coalescent-based Bayesian (e)
Arlequin v3.0 ELB HA/HF No Adaptive window Includes numerous population genetics analyses Long run times 1,000s, biallelic/multiallelic JRE on LINUX/PC/Mac [60, 89]
      Handles recombination     
PHASE v2.0 MCMC+PL HA/HF Yes Coalescent/HWE Improved run time Departures from coalescent model may impact performance Practical limit, biallelic/multiallelic PC/MAC/UNIX [59]
      Compares haplotype frequency between groups Posterior probabilities may be difficult to interpret
      Handles recombination
      Provides posterior probabilities for assigned haplotypes
PHASE v1.0 MCMC HA/HF No Coalescent/HWE Incorporates pop-genetics and coalescence ideas Departures from coalescent model may impact performance Practical limit, biallelic/multiallelic UNIX [51]
      Incorporates known phase and trio pedigrees into analysis Slow run times
      Provides posterior probabilities for assigned haplotypes Posterior probabilities may be difficult to interpret
SLHAP v1.0 MCMC HA/HF Yes Neutral coalescent/HWE Similar to PHASE v1.0 Departures from coalescent model may impact performance Practical limit, biallelic/multiallelic UNIX [58]
      Missing data     
      Improved run time     
(a) Program haplotype output: individual assignment, frequency estimates or both.
(b) Ability of program to accept missing data.
(c) Program assumptions.
(d) List of references.
(e) Programs in this section make assumptions based on or draw inference from coalescent model.
* Could not determine from available data.
See incorporated programs for features and limitations.
  8. EE: Estimating equation; ECM: Expectation conditional maximisation algorithm; ELB: Excoffier-Laval-Balding algorithm, Bayesian; EM: Expectation maximisation algorithm; EM issues: May be sensitive to HWE departures, long run times, and non-global max (requiring multiple restarts); HF: Haplotype frequency estimate; HA: Individual haplotype assignment; HWE: Hardy-Weinberg equilibrium; IP: Imperfect phylogeny-based method; JRE: Java runtime environment; LD: Linkage disequilibrium; MAC: Program runs on Apple computer; MC: Monte Carlo algorithm, Bayesian algorithm; MCMC: Markov Chain Monte Carlo algorithm, Bayesian algorithm; MC-VL: Monte Carlo-variable length chain algorithm, Bayesian Algorithm; MLE: Maximum likelihood estimation algorithm; PC: IBM compatible personal computer; PL: Partition ligation algorithm; PP: Perfect phylogeny-based method; Practical Limit: program has no upper limit on number of markers and/or subjects, however computational and practical considerations limit this value; S-EM: Stochastic EM algorithm; UNIX: Runs on Unix operating system, including Linux, FORTRAN, Solaris and others.
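Many of the maximum likelihood programs listed above estimate haplotype frequencies with an EM algorithm under HWE. As an illustration only, the following is a minimal sketch of that idea for two biallelic loci with no missing data. It does not reproduce the implementation of any program in the table; the function name em_two_locus, the genotype coding (count of allele "1" at each locus) and the uniform starting frequencies are assumptions made for this example.

```python
from collections import Counter

HAPLOTYPES = ("00", "01", "10", "11")


def em_two_locus(genotypes, n_iter=100, tol=1e-10):
    """Estimate two-locus haplotype frequencies by EM, assuming HWE.

    genotypes: list of (g1, g2) pairs, each value the count (0, 1 or 2)
    of allele '1' at that locus. Hypothetical input format for this sketch.
    """
    counts = Counter(genotypes)
    n_hap = 2 * len(genotypes)

    # Haplotype counts from genotypes whose phase is unambiguous
    # (at most one heterozygous locus).
    base = dict.fromkeys(HAPLOTYPES, 0.0)
    for (g1, g2), n in counts.items():
        if (g1, g2) == (1, 1):
            continue  # double heterozygote: phase unknown, handled in E-step
        h1 = ("1" if g1 == 2 else "0") + ("1" if g2 == 2 else "0")
        h2 = ("1" if g1 >= 1 else "0") + ("1" if g2 >= 1 else "0")
        base[h1] += n
        base[h2] += n

    n_dh = counts[(1, 1)]  # number of double heterozygotes
    freqs = dict.fromkeys(HAPLOTYPES, 0.25)  # assumed uniform start

    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is 00/11
        # rather than 01/10, given current frequencies and HWE.
        cis = freqs["00"] * freqs["11"]
        trans = freqs["01"] * freqs["10"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5

        # M-step: re-estimate frequencies from expected haplotype counts.
        new = {
            "00": (base["00"] + n_dh * w) / n_hap,
            "11": (base["11"] + n_dh * w) / n_hap,
            "01": (base["01"] + n_dh * (1 - w)) / n_hap,
            "10": (base["10"] + n_dh * (1 - w)) / n_hap,
        }
        delta = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES)
        freqs = new
        if delta < tol:
            break
    return freqs
```

Because the result can depend on the starting frequencies, a single run such as this one may stop at a non-global maximum; this is one of the "EM issues" noted in the abbreviations, and is why multiple restarts from different starting values are commonly used.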