A comparative study of k-spectrum-based error correction methods for next-generation sequencing data analysis

Akogwu, Isaac; Wang, Nan; Zhang, Chaoyang; Gong, Ping

doi:10.1186/s40246-016-0068-0

Human Genomics

Table 2 Characteristic features of the six k-spectrum-based methods investigated in the present comparative study that distinguish each method from the others


Tools	Algorithm highlight	Data structure	Pros	Cons	Quality score	Target error type
Reptile	Explore multiple alternative k-mer decompositions and contextual information of neighboring k-mers for error correction	Hamming graph	Contextual information can help resolve errors without increasing k and lowering local coverage	Uses a single core (non-parallelized)	Used	Substitution, deletion, insertion
Musket	Multi-stage correction: two-sided conservative, one-sided aggressive and voting-based refinement	Bloom filter	Multi-threading based on a master–slave model results in high parallel scalability	A single static coverage cut-off to differentiate trusted k-mers from weak ones	Not used	Substitution
Bless	Count k-mer multiplicity; correct errors using Bloom filter; restore false positives	Bloom filter	High memory efficiency; handles genome repeats better; corrects read ends	Cannot automatically determine the optimal k value	Not used	Substitution, deletion, insertion
Bloocoo	Parallelized multi-stage correction algorithm (similar to Musket)	Blocked Bloom filter	Faster and uses less memory than Musket	Not extensively evaluated	Not used	Substitution
Trowel	Rely on quality values to identify solid k-mers; use two algorithms (DBG and SBE) for error correction	Hash table	Corrects erroneous bases and boosts base qualities	Only accepts FASTQ files as input	Used	Substitution
Lighter	Random sub-fraction sampling; parallelized error correction	Pattern-blocked Bloom filter	No k-mer counting; near-constant accuracy and memory usage	A user must specify k-mer length, genome length, and sub-sampling fraction α	Used	Substitution, deletion, insertion
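The common idea behind all six tools is the k-spectrum: k-mers that occur often across the reads are treated as trusted (solid), rare ones as weak, and a read is corrected so that its k-mers become trusted. The following is a minimal sketch of that generic idea, not the algorithm of any tool in the table. It assumes an exact-count hash table (a Python Counter), a single static coverage cut-off (the design listed as a drawback of Musket), and substitution-only correction; the function names build_spectrum, is_trusted and correct_read are hypothetical.

```python
from collections import Counter

# Nucleotides tried as replacements during substitution correction.
BASES = "ACGT"


def kmers(seq, k):
    """Yield every overlapping k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_spectrum(reads, k):
    """Count k-mer multiplicities over all reads (an exact hash-table k-spectrum)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def is_trusted(kmer, counts, cutoff):
    """Static cut-off: trusted if seen at least `cutoff` times, otherwise weak."""
    return counts[kmer] >= cutoff


def correct_read(read, counts, k, cutoff):
    """Try every single-base substitution that makes all k-mers of the read trusted."""
    if all(is_trusted(km, counts, cutoff) for km in kmers(read, k)):
        return read  # already consistent with the spectrum
    candidates = []
    for i, original in enumerate(read):
        for base in BASES:
            if base == original:
                continue
            fixed = read[:i] + base + read[i + 1:]
            if all(is_trusted(km, counts, cutoff) for km in kmers(fixed, k)):
                candidates.append(fixed)
    # Conservative rule: accept a fix only if it is the unique candidate.
    return candidates[0] if len(candidates) == 1 else read
```

For example, with five copies of ACGTTGCA and one copy of ACGATGCA, k = 4 and a cut-off of 3, only ACGTTGCA has all of its 4-mers trusted, so the erroneous read is corrected to it. Real tools go well beyond this sketch: they use quality scores, multiple stages, or contextual information, and some of them also handle insertions and deletions.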
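Four of the six tools store k-mers in a Bloom filter or one of its variants rather than an exact hash table. The following is a minimal sketch of a plain Bloom filter over k-mer strings, illustrating why this structure is memory-efficient but can report false positives (which Bless explicitly restores). It is not the memory layout of Musket, Bless, Bloocoo or Lighter. The class name KmerBloomFilter and the SHA-256 double-hashing scheme are assumptions made for illustration.

```python
import hashlib


class KmerBloomFilter:
    """Plain Bloom filter over k-mer strings (illustrative layout only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # bit array packed into bytes

    def _positions(self, kmer):
        """Derive num_hashes bit positions by double hashing one SHA-256 digest."""
        digest = hashlib.sha256(kmer.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, kmer):
        """Set every bit position associated with kmer."""
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        """True means 'possibly present'; False means 'definitely absent'."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(kmer))
```

A query never misses an inserted k-mer, but an absent k-mer may be reported as present. For m bits, n distinct k-mers and h hash functions, the false-positive rate is approximately (1 - exp(-h*n/m))^h. Blocked variants, such as those listed for Bloocoo and Lighter, keep all bits of one k-mer inside a single small block to reduce cache misses, trading a slightly higher false-positive rate for speed.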


ISSN: 1479-7364
