
Table 4 Performance analysis of six k-spectrum-based error correctors as evaluated using six synthetic Illumina datasets

From: A comparative study of k-spectrum-based error correction methods for next-generation sequencing data analysis

| Dataset | Method | TP | FP | FN | Recall | Gain | Precision | F-score |
|---|---|---|---|---|---|---|---|---|
| EC-1, 36 bp, 70×, k = 19 | Reptile | 2335361 | 144751 | 451889 | 0.8378 | 0.7859 | 0.9416 | 0.8867 |
| | Lighter | 2695425 | 72843 | 91825 | 0.9671 | 0.9409 | 0.9737 | 0.9704 |
| | Bless | 2624659 | 48342 | 56279 | 0.9790 | 0.9610 | 0.9819 | 0.9805 |
| | Bloocoo | 2411701 | 22259 | 375549 | 0.8653 | 0.8573 | 0.9908 | 0.9238 |
| | Musket | 2701885 | 61096 | 85365 | 0.9694 | 0.9474 | 0.9779 | 0.9736 |
| | Trowel | 1246340 | 705438 | 1539825 | 0.4473 | 0.1941 | 0.6386 | 0.5261 |
| EC-2, 36 bp, 20×, k = 17 | Reptile | 681551 | 140039 | 114910 | 0.8557 | 0.6799 | 0.8296 | 0.8424 |
| | Lighter | 108241 | 58579 | 688220 | 0.1359 | 0.0624 | 0.6488 | 0.2247 |
| | Bless | 779824 | 18095 | 16637 | 0.9791 | 0.9564 | 0.9773 | 0.9782 |
| | Bloocoo | 689322 | 6454 | 107139 | 0.8655 | 0.8574 | 0.9907 | 0.9239 |
| | Musket | 767087 | 18182 | 29374 | 0.9631 | 0.9403 | 0.9768 | 0.9699 |
| | Trowel | 434885 | 19167 | 361576 | 0.5460 | 0.5220 | 0.9578 | 0.6955 |
| EC-3, 100 bp, 20×, k = 24 | Reptile | 105 | 461 | 876053 | 0.0001 | -0.0004 | 0.1855 | 0.0002 |
| | Lighter | 858125 | 2446 | 18033 | 0.9794 | 0.9766 | 0.9972 | 0.9882 |
| | Bless | 746 | 872860 | 875412 | 0.0008 | -0.9954 | 0.0009 | 0.0009 |
| | Bloocoo | 79790 | 3644539 | 796368 | 0.0911 | -4.0686 | 0.0214 | 0.0347 |
| | Musket | 873592 | 1645 | 2566 | 0.9971 | 0.9952 | 0.9981 | 0.9976 |
| | Trowel | 155 | 178354 | 876003 | 0.0002 | -0.2034 | 0.0009 | 0.0003 |
| BC-1, 56 bp, 50×, k = 27 | Reptile | 382043 | 22303 | 16602 | 0.9584 | 0.9024 | 0.9448 | 0.9515 |
| | Lighter | 331759 | 15470 | 141618 | 0.7008 | 0.6682 | 0.9554 | 0.8086 |
| | Bless | 429017 | 34018 | 11943 | 0.9729 | 0.8958 | 0.9265 | 0.9492 |
| | Bloocoo | 410156 | 24127 | 63221 | 0.8664 | 0.8155 | 0.9444 | 0.9038 |
| | Musket | 355015 | 47460 | 118362 | 0.7500 | 0.6497 | 0.8821 | 0.8107 |
| | Trowel | 55277 | 4976 | 26744 | 0.6739 | 0.6133 | 0.9174 | 0.7770 |
| BC-2, 100 bp, 120×, k = 31 | Reptile | 497425 | 116 | 208081 | 0.7051 | 0.7049 | 0.9998 | 0.8269 |
| | Lighter | 698089 | 159 | 7417 | 0.9895 | 0.9893 | 0.9998 | 0.9946 |
| | Bless | – | – | – | – | – | – | – |
| | Bloocoo | 27409 | 1278837 | 678097 | 0.0389 | -1.7738 | 0.0210 | 0.0272 |
| | Musket | 703882 | 68 | 1624 | 0.9977 | 0.9976 | 0.9999 | 0.9988 |
| | Trowel | 652845 | 108 | 52661 | 0.9254 | 0.9252 | 0.9998 | 0.9612 |
| DM, 100 bp, 10×, k = 21 | Reptile | 11702183 | 187733 | 517322 | 0.9577 | 0.9423 | 0.9842 | 0.9708 |
| | Lighter | 42 | 23055867 | 12224293 | 0.0000 | -1.8861 | 0.0000 | 0.0000 |
| | Bless | 11122683 | 126388 | 1101652 | 0.9099 | 0.8995 | 0.9888 | 0.9477 |
| | Bloocoo | – | – | – | – | – | – | – |
| | Musket | 11550483 | 163838 | 673852 | 0.9449 | 0.9315 | 0.9860 | 0.9650 |
| | Trowel | 1197127 | 384403 | 11027208 | 0.0979 | 0.0665 | 0.7569 | 0.1734 |
The first column shows the dataset ID, read length, genome coverage, and the optimal k estimated using KmerGenie. The values in the TP, FP, and FN columns are numbers of bases. Italicized values denote the best performer with regard to a specific evaluation measure for a dataset. The symbol “–” indicates that a method failed to process a specific dataset.
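
The derived columns can be recomputed from the base counts. The sketch below assumes the usual definitions, which this table page does not state: recall = TP / (TP + FN), gain = (TP - FP) / (TP + FN), precision = TP / (TP + FP), and F-score as the harmonic mean of precision and recall. The names `Counts` and `metrics` are illustrative only. Under these assumptions the code reproduces the EC-3 Musket row, and it shows why gain turns negative when a corrector introduces more errors (FP) than it removes (TP), as in the EC-3 Bloocoo row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Base-level counts for one method on one dataset (hypothetical helper)."""
    tp: int  # erroneous bases that were corrected
    fp: int  # bases that were wrongly changed
    fn: int  # erroneous bases left uncorrected


def metrics(c: Counts) -> dict[str, float]:
    """Recompute the derived columns; assumes tp > 0 so no denominator is zero."""
    total_errors = c.tp + c.fn  # all bases that needed correction
    recall = c.tp / total_errors
    gain = (c.tp - c.fp) / total_errors  # negative when fp > tp
    precision = c.tp / (c.tp + c.fp)
    f_score = 2 * precision * recall / (precision + recall)
    return {
        "recall": recall,
        "gain": gain,
        "precision": precision,
        "f_score": f_score,
    }


if __name__ == "__main__":
    # Counts taken from the EC-3 rows of the table.
    rows = {
        "Musket": Counts(tp=873592, fp=1645, fn=2566),
        "Bloocoo": Counts(tp=79790, fp=3644539, fn=796368),
    }
    for method, counts in rows.items():
        values = metrics(counts)
        print(method, {name: round(v, 4) for name, v in values.items()})
```

Not every row need agree with these formulas to the fourth decimal place, since the rounding and any read trimming applied by the original evaluation are not described here.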