
Table 2 Characteristic features that distinguish the six k-spectrum-based methods investigated in the present comparative study

From: A comparative study of k-spectrum-based error correction methods for next-generation sequencing data analysis

| Tools | Algorithm highlight | Data structure | Pros | Cons | Quality score | Target error type |
| --- | --- | --- | --- | --- | --- | --- |
| Reptile | Explores multiple alternative k-mer decompositions and contextual information of neighboring k-mers for error correction | Hamming graph | Contextual information can help resolve errors without increasing k and lowering local coverage | Uses a single core (non-parallelized) | Used | Substitution, deletion, insertion |
| Musket | Multi-stage correction: two-sided conservative, one-sided aggressive and voting-based refinement | Bloom filter | Multi-threading based on a master–slave model results in high parallel scalability | Uses a single static coverage cut-off to differentiate trusted k-mers from weak ones | Not used | Substitution |
| Bless | Counts k-mer multiplicity; corrects errors using Bloom filter; restores false positives | Bloom filter | High memory efficiency; handles genome repeats better; corrects read ends | Cannot automatically determine the optimal k value | Not used | Substitution, deletion, insertion |
| Bloocoo | Parallelized multi-stage correction algorithm (similar to Musket) | Blocked Bloom filter | Faster and uses less memory than Musket | Not extensively evaluated | Not used | Substitution |
| Trowel | Relies on quality values to identify solid k-mers; uses two algorithms (DBG and SBE) for error correction | Hash table | Corrects erroneous bases and boosts base qualities | Only accepts FASTQ files as input | Used | Substitution |
| Lighter | Random sub-fraction sampling; parallelized error correction | Pattern-blocked Bloom filter | No k-mer counting; near constant accuracy and memory usage | A user must specify k-mer length, genome length, and sub-sampling fraction α | Used | Substitution, deletion, insertion |
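Several entries in the table rest on the same two ideas: a coverage cut-off that separates trusted k-mers from weak ones, and a Bloom filter that stores the trusted set compactly. The Python sketch below illustrates those ideas for substitution errors only. It is not the implementation of any of the six tools; the names `kmer_spectrum`, `BloomFilter`, `build_trusted_filter` and `correct_kmer`, the filter size, the hash scheme and the cut-off value are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def kmers(read, k):
    """Yield every overlapping k-mer of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def kmer_spectrum(reads, k):
    """Count k-mer multiplicity across all reads (the k-spectrum)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


class BloomFilter:
    """Toy Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):  # sizes are illustrative
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_trusted_filter(reads, k, cutoff):
    """Insert k-mers seen at least `cutoff` times (a single static cut-off)."""
    trusted = BloomFilter()
    for kmer, count in kmer_spectrum(reads, k).items():
        if count >= cutoff:
            trusted.add(kmer)
    return trusted


def correct_kmer(kmer, trusted):
    """Return the unique trusted k-mer at Hamming distance 1, else the input.

    `trusted` is anything supporting `in` (a BloomFilter or a plain set).
    """
    if kmer in trusted:
        return kmer
    candidates = set()
    for i, base in enumerate(kmer):
        for alt in "ACGT":
            if alt != base:
                candidate = kmer[:i] + alt + kmer[i + 1:]
                if candidate in trusted:
                    candidates.add(candidate)
    # Ambiguous or hopeless cases are left uncorrected.
    return candidates.pop() if len(candidates) == 1 else kmer


if __name__ == "__main__":
    reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]  # last read has one substitution
    trusted = build_trusted_filter(reads, k=4, cutoff=3)
    print(correct_kmer("CGTT", trusted))
```

Because a Bloom filter can report a k-mer that was never inserted, a real tool needs some way to handle false positives (the table lists this as a distinct step for Bless). The sketch omits that step, along with insertion and deletion handling and any use of quality scores.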