Table 1 Comparison of ABySS and SOAPdenovo programs

From: State of the art de novo assembly of human genomes from massively parallel sequencing data

Computing features

| Program | Language | Peak memory on single node | Time used | Availability |
| --- | --- | --- | --- | --- |
| ABySS | C++ | < 16 GB | 87 h on 8 × 21 CPU cores (much reduced in current version) | Open source |
| SOAPdenovo | C | 140 GB | 40 h on 32 cores | Free binary |

Algorithm and data structure

| Program | Pre-assembly error correction | de Bruijn graph | de Bruijn graph clean-up | Contiging | Scaffolding | Post-scaffolding process |
| --- | --- | --- | --- | --- | --- | --- |
| ABySS | N/A | Distributed; can handle colour-space reads | Blunt-end, bubbles and tiny repeats | Removes and reports ambiguous edges | Implicitly, branch-bound search to tackle repeats | N/A |
| SOAPdenovo | Integrated | Single; nucleotide reads only | Blunt-end, bubbles, low coverage links and tiny repeats | Cuts ambiguous edges at boundaries | Explicitly, mask repeats | Gap closure by local assembly |

Genome assembly parameters*

| Program | Contig N50 (bp) | Sum length of contigs | Scaftig N50 (bp) | Sum length of scaftigs | Scaffold N50 (bp) | Sum length of scaffolds | Genome coverage |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ABySS | 860 | 2.1 Gb | 1,499 | 2.18 Gb | Not reported | Not reported | 68% |
| SOAPdenovo | 886 | 2.1 Gb | > 4,000 | 2.37 Gb | > 4,000 | 2.38 Gb | 85% |

*: Measurements are based on 40 × 210 bp paired-end Illumina GA reads of the HapMap NA18507 individual, published in 2008 by Dr Bentley et al.
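
The contig, scaftig and scaffold N50 values in the table are length-weighted medians. The sketch below shows one common way to compute such a value, assuming the usual definition: the length L such that sequences of length L or longer together cover at least half of the total assembled length. The function name `n50` and the example lengths are illustrative only; neither the ABySS nor the SOAPdenovo code is reproduced here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Assumes the common definition: the largest length L such that
    sequences of length >= L sum to at least half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
example_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_contigs))
```

Because N50 depends on the total length of the set it is computed over, values are only directly comparable between assemblers when the sum lengths are similar, which is why the table reports the sum length next to each N50.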