
Table 1 Characteristics of the protein variant effect prediction tools assessed in this study. The table lists each tool's score range and deleterious cutoff, training data, a summary of the features used and, where applicable, the machine learning method.

From: Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics

| Prediction tool | Score range | Deleterious score cutoff | Training data | Features | Machine learning method |
|---|---|---|---|---|---|
| GERP++ | −12.0 to 6.17 | >0.047 | None | Infers conserved or constrained elements from 33 mammalian genomes | |
| fitCons | 0 to 1 | >0.4 | None | Functional genomics data, mainly from chromatin analysis (e.g. ChIP-seq), and evolutionary conservation data | |
| SIFT | 0 to 1 (lower is more deleterious) | <0.05 | None | Conservation data (MSA of homologous sequences) transformed into a normalised probability matrix | |
| PolyPhen | 0 to 1 | >0.5 | HumVar, HumDiv | Conservation data (MSA of homologous sequences), protein functional domain data and protein structural features | Naïve Bayes classifier |
| CADD | 0 to 35+ | >15 | Simulated, SwissVar, HumVar | Integrates several annotations into a single score, e.g. SIFT, GERP++, PolyPhen, CpG distance, GC content | SVM |
| Condel | 0 to 1 | >0.5 | | Builds a unified classification by integrating the output of a collection of tools, e.g. SIFT, PolyPhen | Weighted average of normalised scores |
| REVEL | 0 to 1 | >0.5 | HGMD, EPS | HGMD and rare EPS variants used for training | Random forest |
| fathmm | 0 to 1 | >0.45 | HGMD, Swiss-Prot | Combines evolutionary conservation with disease-specific protein weights for intolerance to mutation | Hidden Markov models |
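The cutoffs in Table 1 do not all point the same way: for SIFT a score below the threshold is called deleterious, whereas for every other tool listed a score above it is. The following is a minimal sketch of applying these cutoffs to per-tool scores for one variant. It assumes the scores are already on the scales shown in the table and that the comparisons are strict, as the ">" and "<" signs suggest. The names `CUTOFFS`, `is_deleterious` and `call_variant` are hypothetical, and the example scores are invented for illustration. They are not data from the study.

```python
import operator
from typing import Callable, Dict, Tuple

# Deleterious cutoffs as listed in Table 1: tool -> (comparison, threshold).
# SIFT is the only tool here for which a *lower* score indicates a deleterious call.
# Strict comparisons (> and <) are assumed, following the table's notation.
CUTOFFS: Dict[str, Tuple[Callable[[float, float], bool], float]] = {
    "GERP++": (operator.gt, 0.047),
    "fitCons": (operator.gt, 0.4),
    "SIFT": (operator.lt, 0.05),
    "PolyPhen": (operator.gt, 0.5),
    "CADD": (operator.gt, 15.0),
    "Condel": (operator.gt, 0.5),
    "REVEL": (operator.gt, 0.5),
    "fathmm": (operator.gt, 0.45),
}


def is_deleterious(tool: str, score: float) -> bool:
    """Return True if `score` passes the Table 1 deleterious cutoff for `tool`."""
    compare, threshold = CUTOFFS[tool]
    return compare(score, threshold)


def call_variant(scores: Dict[str, float]) -> Dict[str, bool]:
    """Apply each tool's cutoff to a {tool: score} mapping for a single variant."""
    return {tool: is_deleterious(tool, s) for tool, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical scores for one missense variant (illustrative only).
    example = {"SIFT": 0.01, "PolyPhen": 0.32, "CADD": 22.4, "REVEL": 0.71}
    print(call_variant(example))
    # prints {'SIFT': True, 'PolyPhen': False, 'CADD': True, 'REVEL': True}
```

Storing the comparison operator alongside each threshold keeps SIFT's reversed direction in the data rather than in special-case code. This makes it easy to add other tools or adjust cutoffs without touching the calling logic.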