Table 2 Statistics of the confusion matrices of the final deployed RF model for both internal and external cross-validation

From: Pharmacovariome scanning using whole pharmacogene resequencing coupled with deep computational analysis and machine learning for clinical pharmacogenomics

 

| Statistic | Internal cross-validation | External cross-validation |
| --- | --- | --- |
| Accuracy | 0.9808 | 0.9512 |
| 95% CI | (0.8974, 0.9995) | (0.8347, 0.994) |
| No information rate (NIR) | 0.6346 | 0.6341 |
| P-value (ACC > NIR) | 1.664e-09 | 2.309e-06 |
| Kappa | 0.9581 | 0.8918 |
| McNemar’s test P-value | 1.0000 | 0.4795 |
| Sensitivity | 1.0000 | 0.8667 |
| Specificity | 0.9474 | 1.0000 |
| Pos pred value | 0.9706 | 1.0000 |
| Neg pred value | 1.0000 | 0.9286 |
| Precision | 0.9705882 | 1.0000000 |
| Recall | 1.0000000 | 0.8666667 |
| F1 | 0.9850746 | 0.9285714 |
| Prevalence | 0.6346 | 0.3659 |
| Detection rate | 0.6346 | 0.3171 |
| Detection prevalence | 0.6538 | 0.3171 |
| Balanced accuracy | 0.9737 | 0.9333 |
| Area under the curve (AUC) | 0.9736842 | 0.5384848 |

  1. Note that although accuracy is high in both types of validation, overfitting to the training data in internal cross-validation produced an unrealistically high AUC. By contrast, external cross-validation, which increased the sample size, gave a more “close to real” picture of the RF model's performance for small cohorts.
  2. ‘Positive’ class: patients with ADRs. AUC, area under the curve; RF, random forest.
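
Most of the statistics in Table 2 follow directly from a 2×2 confusion matrix, so a short sketch may help readers see how they relate to one another. The source does not list the cell counts. The counts used below are an assumption, back-calculated from the reported accuracy, prevalence, sensitivity, and specificity (52 samples for internal and 41 for external cross-validation). The function name `confusion_stats` is hypothetical, and treating the accuracy P-value as a one-sided exact binomial test against the no information rate is likewise an assumption about how it was computed, not something the source states.

```python
from math import comb


def confusion_stats(tp, fp, fn, tn):
    """Summary statistics for a 2x2 confusion matrix.

    tp/fn are patients with ADRs (the 'Positive' class) predicted
    correctly/incorrectly; tn/fp are the same for patients without ADRs.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # identical to recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # identical to precision
    npv = tn / (tn + fn)
    prevalence = (tp + fn) / n

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    kappa = (accuracy - expected) / (1 - expected)

    # No information rate: the share of the larger class.
    nir = max(prevalence, 1 - prevalence)

    # Assumed test: one-sided exact binomial P(correct >= observed | p = NIR).
    correct = tp + tn
    p_value = sum(
        comb(n, k) * nir**k * (1 - nir) ** (n - k)
        for k in range(correct, n + 1)
    )

    return {
        "accuracy": accuracy,
        "nir": nir,
        "p_value": p_value,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
        "prevalence": prevalence,
        "detection_rate": tp / n,
        "detection_prevalence": (tp + fp) / n,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Back-calculated counts (an assumption; not given in the source).
internal = confusion_stats(tp=33, fp=1, fn=0, tn=18)
external = confusion_stats(tp=13, fp=0, fn=2, tn=26)

for name, stats in (("internal", internal), ("external", external)):
    print(name)
    for key, value in stats.items():
        print(f"  {key}: {value:.4g}")
```

With these assumed counts, the point statistics round to the tabulated values, which suggests the table is internally consistent. The 95% CI, McNemar’s test, and the AUC are left out of the sketch because they need either additional methods or the per-sample prediction scores, which the table does not provide.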