Table 1 Description of unrelated haplotyping programs, divided into four classes based on method.

From: A comprehensive literature review of haplotyping software and methods for use with unrelated individuals

| Program name | Algorithm | Output (a) | Missing data (b) | Assumptions (c) | Key features | Limitations | Max subjects, loci and type | Platform | Ref. (d) |
|---|---|---|---|---|---|---|---|---|---|
| Parsimony methods | | | | | | | | | |
| 1. Simple parsimony | | | | | | | | | |
| HAPAR | Parsimony | HA | No | None | Overcomes limitations of HAPINFERX; Increasing sample size improves accuracy | May be susceptible to HWE departures | Practical limit, biallelic | PC/UNIX | [39] |
| HAPINFERX | Clark's | HA | No | None | Intuitive method, fast; Reduced number of haplotypes; No limit on number of loci | May fail to start; Sensitive to data order; Unstable and erroneous estimates | Practical limit, biallelic/multiallelic | UNIX | [37] |
| 2. Phylogeny | | | | | | | | | |
| BPPH | IP | HA | No | IP | Similar to HAPH; Speed | User interface | Practical limit, biallelic | MAC | [45] |
| DPPH | PP | HA | No | PP | Handles large datasets; Speed | Theoretical; Strict population assumptions | Practical limit, biallelic | MAC | [40, 43] |
| GPPH | PP | HA | No | PP | Handles large datasets; Speed | Theoretical; Strict population assumptions | Practical limit, biallelic | MAC/PC/UNIX | [40, 42] |
| HAPH | IP | HA/HF | Yes | HWE, IP | Predicts haplotype blocks; Constructs haplotypes within blocks; Identifies block structure; Web-based | No probability for haplotype assignments | Max 500 loci, practical limit, biallelic | Web-based | [44] |
| Likelihood methods | | | | | | | | | |
| 1. Maximum likelihood | | | | | | | | | |
| Arlequin v2.0 | EM | HA/HF | No | HWE | Includes numerous population genetic analysis tools | EM issues | EM practical limits, biallelic/multiallelic | JRE on MAC/PC/UNIX | [89] |
| CHAPLIN | ECM | HF | Yes | HWE | Graphical interface; Association tests; HWE assumption relaxed in case sample | ECM algorithm needs to be compared with standard EM methods | Practical limits, biallelic/multiallelic | PC | [91] |
| EH | EM | HF | No | HWE | Estimates haplotype frequency; Compares case-control HF under different assumptions | EM issues; Must specify mode of inheritance and penetrance of disease | No max, 3-4 practical max, biallelic/multiallelic | PC | [85, 104] |
| EHPLUS | EM | HF | No | HWE | Improves EH, more loci and polymorphic markers; Incorporates model-free analysis | Long run times for permutation calculations | Max 5 loci, 15 alleles in analysis | PC/UNIX | [84] |
| EM-DeCODER | EM | HA/HF | No | HWE | Program with standard EM algorithm | EM issues | Max 15 loci, biallelic | UNIX | [57] |
| FASTEHPLUS | EM | HF | No | HWE | Similar to EHPLUS, with speed improvements | EM issues | Max 5 loci, 15 alleles in analysis | PC/UNIX | [105] |
| GENECOUNTING | EM | HA/HF | Yes | HWE | Provides posterior probabilities for assigned haplotypes; Compares global and specific haplotypes between groups | Missing data limited to biallelic loci; EM issues | 10-15 loci practical limit, biallelic/multiallelic | PC/UNIX | [106] |
| GCHAP | EM | HA/HF | Yes | HWE | Haplotypes with zero likelihood dropped to improve speed and accuracy; Similar to SNPHAP | EM issues | 20 loci practical limit, biallelic | JRE on PC/UNIX | [107, 108] |
| GS-EM | EM | HA/HF | Yes | HWE | Includes algorithm for assigning probability to genotype calls from several genotyping methods; Haplotypes constructed using assigned genotype probabilities; Web-based | EM issues; Limited to biallelic SNPs | Practical limit, biallelic | Web-based | [73] |
| HAPZ | EH | HA/HF | Yes | HWE | Modified version of SNPHAP that accommodates multiallelic loci | EM issues | Practical limit, biallelic/multiallelic | PC/UNIX | [106] |
| HAPMAX | MLE | HF | No | HWE | Ease of use | Accommodates a limited number of SNPs; Interface | 8 loci, biallelic | PC | [109] |
| HAPLOH | EM | HF | Yes | HWE | Handles some missing data; Utilises pedigree data, if available; Calculates standard error | EM issues | 10 loci, 40 alleles max, biallelic/multiallelic | UNIX | [47] |
| HAPLOSCOPE | EM/MCMC | † | † | † | Platform program, incorporates SNPHAP and PHASE v1.0; Facilitates comparison/testing; Graphical interface, identifies tagging SNPs and LD blocks | See individual programs for limitations/features | † | UNIX/Windows | [110] |
| HAPLOVIEW | EM+PL | HA/HF | Yes | HWE | Calculates pairwise LD; Checks for recombination; Identifies tagging SNPs; Accepts pedigree and unrelated genotype data | EM issues | 100s, practical limit, biallelic | JRE on MAC/PC/UNIX | [56] |
| HAPLO.STATS | EM | HA/HF | Yes | HWE | Incorporates method similar to SNPHAP, with user inputs; Separate programs that (1) assign haplotypes with posterior probability of assignments, (2) allow linear regression for trait to haplotype analysis, and (3) calculate score statistic for haplotype-phenotype association | Requires knowledge of S-Plus 6.0 or R; EM issues | Practical limit, biallelic/multiallelic | S-PLUS 6.0 on UNIX; R on UNIX and PC | [86] |
| HIT | EM/MCMC/MC+PL | † | † | † | Platform program, incorporates SNPHAP and PHASE v1.0; Facilitates comparison; Graphical interface, identifies tagging SNPs and LD blocks | See individual programs for limitations/features | † | * | [111] |
| HPLUS | EM+EE+PL | HA/HF | Yes | HWE | Provides posterior probabilities for assigned haplotypes; Compares haplotype frequencies between groups, adjusts for covariates; Utilises pedigree data, if available | Requires Matlab; EM issues | 100 loci, biallelic | MATLAB on PC/UNIX | [55, 83] |
| LDSUPPORT | EM | HA/HF | Yes | HWE | Provides posterior probabilities for assigned haplotypes; Identifies LD blocks for haplotype reconstruction; Examines association with disease, automation speeds process | EM issues | * | UNIX | [29, 112] |
| LOGINSERM ESTIHAPLO | EM | HA/HF | Yes | HWE | Program uses ML method to infer haplotypes for individuals with missing data; Offers option to exclude individuals with missing data | EM issues | Practical limit, biallelic/multiallelic | PC/UNIX | [80] |
| MLHAPFRE | EM | HF | Yes | HWE | Performance improves with presence of LD; Performs well with large sample size | Incorporated into Arlequin; EM issues | 16 loci, biallelic | JRE on MAC/PC/UNIX | [48] |
| MLOCUS | EM | HA/HF | Yes | HWE | Provides posterior probabilities for assigned haplotypes; Notes observed vs. inferred haplotypes; Calculates pairwise LD | EM issues | 11 loci, biallelic/multiallelic | PC | [46, 113] |
| OSLEM | EM | Yes | No | HWE | Modified EM algorithm that runs 2× faster | EM issues | Practical limit, biallelic | Web-based | [114] |
| PL-EM | EM+PL | HA/HF | Yes | HWE | Combines PL with EM; EM-based version of HAPLOTYPER; Calculates variance of haplotype frequency estimates | EM issues | 100s, practical limit, biallelic | PC/UNIX | [54] |
| SAS Genetics | EM | HA/HF | Yes | HWE | Provides posterior probabilities for assigned haplotypes; Incorporates statistical tests and procedures | Requires SAS; EM issues | Practical limit, biallelic/multiallelic | SAS on PC/UNIX | [115] |
| SNPEM | EM | HF | No | HWE | Estimates haplotype frequency by population; Compares global and specific haplotypes between 2 groups | EM issues | 10 loci, biallelic | UNIX | [10] |
| SNPHAP | EM | HA/HF | Yes | HWE | Uses posterior and prior trimming to handle large numbers of loci; Provides posterior probabilities for assigned haplotypes | EM issues | Practical limit, biallelic | UNIX | [52] |
| THESIAS | S-EM | HF | Yes | HWE | Stochastic EM avoids issues of standard EM programs; Includes tests for haplotype-phenotype association; Accommodates large sample sizes | S-EM algorithm needs to be compared with standard EM methods | Practical limit, 20 loci, biallelic | PC/UNIX | [53, 88] |
| WHAP | EM | † | † | † | Uses haplotype output from SNPHAP for association testing; Allows weighted association analysis | EM issues; Requires separate haplotyping program | † | PC/UNIX | [116] |
| Zaykin et al. | EM | HF | No | HWE | Program on analysis of haplotype-phenotype association | EM issues; Subjects with missing data ignored | Practical limit, biallelic/multiallelic | PC/UNIX | [82] |
| Zou and Zhao | MLE/EM | HF | Yes | HWE | Adjusts haplotype frequency estimates for genotyping error; Program also works for nuclear families | Assumes genotyping errors are random; Assumes error rates are known | Practical limits, biallelic/multiallelic | * | [68] |
| 3locus.PAS | EM | HF | Yes | HWE | Handles some missing data; Various tests available; Improves with increasing sample size | EM issues | 3 loci, biallelic/multiallelic | PC/UNIX | [46] |
| 2. Simple Bayesian | | | | | | | | | |
| HAPLOTYPER | MC+PL | HA/HF | Yes | HWE | Uses PL algorithm to construct haplotypes with many loci; Provides posterior probabilities for assigned haplotypes | Long run times; Posterior probabilities may be difficult to interpret | 256 max, biallelic | UNIX | [57] |
| HAPLOREC | MC-VL | HA/HF | Yes | HWE | Uses variable length chain based on maximising LD; Handles large numbers of loci | Restarts avoid non-global optimum | Practical limit, biallelic | Java virtual machine, v1.4 or newer | [62] |
| 3. Coalescent-based Bayesian (e) | | | | | | | | | |
| Arlequin v3.0 | ELB | HA/HF | No | Adaptive window | Includes numerous population genetics analyses; Handles recombination | Long run times | 1,000s, biallelic/multiallelic | JRE on LINUX/PC/MAC | [60, 89] |
| PHASE v2.0 | MCMC+PL | HA/HF | Yes | Coalescent/HWE | Improved run time; Compares haplotype frequency between groups; Handles recombination; Provides posterior probabilities for assigned haplotypes | Departures from coalescent model may impact performance; Posterior probabilities may be difficult to interpret | Practical limit, biallelic/multiallelic | PC/MAC/UNIX | [59] |
| PHASE v1.0 | MCMC | HA/HF | No | Coalescent/HWE | Incorporates pop-genetics and coalescence ideas; Incorporates known phase and trio pedigrees into analysis; Provides posterior probabilities for assigned haplotypes | Departures from coalescent model may impact performance; Slow run times; Posterior probabilities may be difficult to interpret | Practical limit, biallelic/multiallelic | UNIX | [51] |
| SLHAP v1.0 | MCMC | HA/HF | Yes | Neutral coalescent/HWE | Similar to PHASE v1.0; Missing data; Improved run time | Departures from coalescent model may impact performance | Practical limit, biallelic/multiallelic | UNIX | [58] |

(a) Program haplotype output: individual assignment, frequency estimates or both.

(b) Ability of program to accept missing data.

(c) Program assumptions.

(d) List of references.

(e) Programs in this section make assumptions based on or draw inference from coalescent model.

(*) Could not determine from available data.

(†) See incorporated programs for features and limitations.

Abbreviations: EE: Estimating equation; ECM: Expectation conditional maximisation algorithm; ELB: Excoffier-Laval-Balding algorithm, Bayesian; EM: Expectation maximisation algorithm; EM issues: May be sensitive to HWE departures, long run times, and non-global max (requiring multiple restarts); HF: Haplotype frequency estimate; HA: Individual haplotype assignment; HWE: Hardy-Weinberg equilibrium; IP: Imperfect phylogeny-based method; JRE: Java runtime environment; LD: Linkage disequilibrium; MAC: Program runs on Apple computer; MC: Monte Carlo algorithm, Bayesian algorithm; MCMC: Markov Chain Monte Carlo algorithm, Bayesian algorithm; MC-VL: Monte Carlo-variable length chain algorithm, Bayesian algorithm; MLE: Maximum likelihood estimation algorithm; PC: IBM compatible personal computer; PL: Partition ligation algorithm; PP: Perfect phylogeny-based method; Practical limit: Program has no upper limit on number of markers and/or subjects, however computational and practical considerations limit this value; S-EM: Stochastic EM algorithm; UNIX: Runs on Unix operating system, including Linux, FORTRAN, Solaris and others.
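
Most of the maximum likelihood programs listed rely on an EM algorithm under HWE. As a rough illustration of what that involves, the following is a minimal sketch for two biallelic loci only, not the implementation used by any program in the table. The function name `em_two_locus`, the genotype coding (count of allele 1 at each locus) and the uniform starting frequencies are all assumptions made for this example. With two loci, only double heterozygotes have ambiguous phase, so the E-step splits them between the two possible haplotype pairs in proportion to the current frequency estimates.

```python
from collections import Counter

# The four possible haplotypes at two biallelic loci (alleles coded 0/1).
HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def em_two_locus(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM, assuming HWE.

    genotypes: list of (g1, g2) tuples, where g1 and g2 are the number
    of copies (0, 1 or 2) of allele 1 at locus 1 and locus 2.
    Hypothetical helper written for illustration only.
    """
    counts = Counter(genotypes)
    n_chrom = 2 * len(genotypes)
    freqs = {h: 0.25 for h in HAPS}  # assumed uniform starting point

    for _ in range(n_iter):
        expected = {h: 0.0 for h in HAPS}
        for (g1, g2), n in counts.items():
            if g1 == 1 and g2 == 1:
                # E-step: double heterozygote is either 00/11 or 01/10.
                cis = freqs[(0, 0)] * freqs[(1, 1)]
                trans = freqs[(0, 1)] * freqs[(1, 0)]
                total = cis + trans
                w = cis / total if total > 0 else 0.5
                expected[(0, 0)] += n * w
                expected[(1, 1)] += n * w
                expected[(0, 1)] += n * (1 - w)
                expected[(1, 0)] += n * (1 - w)
            else:
                # Phase is unambiguous: read the two haplotypes directly.
                first = (g1 // 2, g2 // 2)
                second = ((g1 + 1) // 2, (g2 + 1) // 2)
                expected[first] += n
                expected[second] += n

        # M-step: new frequencies are expected counts per chromosome.
        new_freqs = {h: expected[h] / n_chrom for h in HAPS}
        delta = max(abs(new_freqs[h] - freqs[h]) for h in HAPS)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


if __name__ == "__main__":
    sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
    print(em_two_locus(sample))
```

The sketch uses a single starting point. Because EM can converge to a non-global maximum (one of the "EM issues" noted above), a practical analysis would typically repeat the run from several different starting frequencies and keep the solution with the highest likelihood.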