Table 1 Characteristics of the protein variant effect prediction tools assessed in this study. The table indicates their scoring ranges and thresholds, training data, summary information about features and, where applicable, machine learning method

From: Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics

| Prediction tool | Score range | Deleterious score cutoff | Training data | Features | Machine learning method |
| --- | --- | --- | --- | --- | --- |
| GERP++ | −12.0 to 6.17 | >0.047 | None | Infers conserved or constrained elements from 33 mammalian genomes | – |
| fitCons | 0 to 1 | >0.4 | None | Functional genomics data, mainly sourced from chromatin analysis (e.g. ChIP-seq), and evolutionary conservation data | – |
| SIFT | 1 to 0 | <0.05 | None | Conservation data (MSA of homologous sequences) transformed into a normalised probability matrix | – |
| PolyPhen | 0 to 1 | >0.5 | HumVar, HumDiv | Conservation data (MSA of homologous sequences), protein functional domain data and protein structural features | Naïve Bayes classifier |
| CADD | 0 to 35+ | >15 | Simulated, Swissvar, HumVar | Integrates several annotations into a single score, e.g. SIFT, GERP++, PolyPhen, CpG distance, GC content | SVM |
| Condel | 0 to 1 | >0.5 |  | Builds a unified classification by integrating the output of a collection of tools, e.g. SIFT, PolyPhen | Weighted average of normalised scores |
| REVEL | 0 to 1 | >0.5 | HGMD, ESP | HGMD and rare ESP variants used for training | Random forest |
| fathmm | 0 to 1 | >0.45 | HGMD, Swiss-Prot | Combines evolutionary conservation with disease-specific protein weights for intolerance to mutation | Hidden Markov models |
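
The cutoffs in Table 1 do not all point the same way: SIFT calls a variant deleterious when its score falls below the threshold, whereas every other tool listed calls it deleterious when the score is above the threshold. The following is a minimal sketch of how the cutoffs could be applied programmatically. The thresholds and directions are taken from the table; the names `Cutoff`, `CUTOFFS`, `is_deleterious` and `call_variant` are hypothetical helpers introduced here for illustration, and the comparisons are assumed to be strict, as the `>` and `<` signs in the table suggest.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Cutoff:
    """A deleterious-score threshold plus the direction of the comparison."""
    threshold: float
    compare: Callable[[float, float], bool]  # operator.gt or operator.lt


# Thresholds and directions as listed in Table 1.
# Assumption: comparisons are strict (">" / "<"), not inclusive.
CUTOFFS = {
    "GERP++": Cutoff(0.047, operator.gt),
    "fitCons": Cutoff(0.4, operator.gt),
    "SIFT": Cutoff(0.05, operator.lt),  # lower SIFT scores are deleterious
    "PolyPhen": Cutoff(0.5, operator.gt),
    "CADD": Cutoff(15, operator.gt),
    "Condel": Cutoff(0.5, operator.gt),
    "REVEL": Cutoff(0.5, operator.gt),
    "fathmm": Cutoff(0.45, operator.gt),
}


def is_deleterious(tool: str, score: float) -> bool:
    """Return True if `score` crosses the Table 1 cutoff for `tool`."""
    cutoff = CUTOFFS[tool]
    return cutoff.compare(score, cutoff.threshold)


def call_variant(scores: dict[str, float]) -> dict[str, bool]:
    """Apply the per-tool cutoff to each score of a single variant."""
    return {tool: is_deleterious(tool, score) for tool, score in scores.items()}


if __name__ == "__main__":
    # Made-up scores for one hypothetical variant, for illustration only.
    example = {"SIFT": 0.01, "PolyPhen": 0.92, "CADD": 12.3, "REVEL": 0.61}
    for tool, call in call_variant(example).items():
        print(f"{tool}: {'deleterious' if call else 'tolerated'}")
```

Keeping the comparison direction alongside each threshold avoids special-casing SIFT in the calling code and makes it harder to apply a cutoff in the wrong direction.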