Table 1 Summary and comparison of classification and clustering methods

From: A survey of computational tools for downstream analysis of proteomic and other omic datasets

Classification methods: PCA, ICA, RF, PLS, SVM. Clustering methods: K-means, hierarchical.

What does it do?
- PCA: Separates features into groups based on commonality and reports the weight of each component's contribution to the separation
- ICA: Separates features into groups by eliminating correlation and reports the weight of each component's contribution to the separation
- RF: Separates features into groups based on commonality; identifies important predictors
- PLS: Separates features into groups based on maximal covariation and reports the contribution of each variable
- SVM: Uses a user-specified kernel function to quantify the similarity between any pair of instances and create a classifier
- K-means: Separates features into clusters of similar expression patterns
- Hierarchical: Clusters treatment groups, features, or samples into a dendrogram

By what mechanism?
- PCA: Orthogonal transformation; transforms a set of correlated variables into a new set of uncorrelated variables
- ICA: Nonlinear, non-orthogonal transformation; standardizes each variable to unit variance and zero mean
- RF: Uses an ensemble classifier that consists of many decision trees
- PLS: Multivariate regression
- SVM: Finds a decision boundary maximizing the distance to nearby positive and negative examples
- K-means: Compares and groups magnitudes of changes in the means into K clusters, where K is defined by the user
- Hierarchical: Compares all samples using either agglomerative or divisive algorithms with distance and linkage functions

Strengths
- PCA: Unsupervised, nonparametric; useful for reducing dimensions before applying supervised methods
- ICA: Works well when other approaches do not because the data are not normally distributed
- RF: Robust to outliers and noise; gives useful internal estimates of error; resistant to overtraining
- PLS: Diverse experiments that share the same features are made comparable; variables can outnumber samples
- SVM: Robust to outliers; gives useful internal estimates of error; can exploit domain knowledge through appropriate kernel functions
- K-means: Easily visualized and intuitive; greatly reduces complexity; performs well when distance information between data points is important to clustering
- Hierarchical: Unsupervised; easily visualized and intuitive

Weaknesses
- PCA: Number of features must exceed number of treatment groups
- ICA: Features are assumed to be independent when they may actually be dependent
- RF: Does not allow missing data (requires imputation to replace missing values)
- PLS: Handles data containing outliers poorly
- SVM: Selection of an inappropriate kernel yields poor results
- K-means: Sensitive to initial conditions and to the specified number of clusters (K)
- Hierarchical: Does not provide feature contributions; not iterative and therefore sensitive to the choice of cluster distance measure and to noise/outliers

More information
- RF: Performance depends on the number of trees and varies among experiments
- PLS: Supervised; requires training and testing; groups are pre-defined
- SVM: Supervised; requires training and testing; many good kernel functions have been described, e.g., based on structural alignment
- K-means: Tools are available to determine the optimal cluster count (K)
- Hierarchical: User does not define the number of clusters

Sample size/data characteristics
- PCA: Unlimited sample size; data normally distributed
- ICA: Unlimited sample size; data non-normally distributed
- RF: Performs well on small sample sizes and is resistant to over-fitting
- PLS: Unlimited sample size; sensitive to outliers
- SVM: Performs well on small sample sizes and is resistant to over-fitting
- K-means: Performs best with a limited dataset, i.e., ~20 to 300 features
- Hierarchical: Performs best with a limited dataset, i.e., ~20 to 300 features or samples
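As a concrete starting point, the sketch below shows one way the methods compared in Table 1 might be applied to a small synthetic expression-style matrix using scikit-learn and SciPy. It is not taken from the survey: the simulated data, variable names, and parameter choices (number of components, trees, clusters, and the kernel) are illustrative assumptions rather than recommendations.

```python
# Minimal sketch (not part of the survey): the Table 1 methods applied to a
# synthetic 40-sample x 300-feature matrix. All data and parameters below are
# illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two simulated treatment groups of 20 samples each
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 300)),
               rng.normal(0.5, 1.0, size=(20, 300))])
y = np.array([0] * 20 + [1] * 20)

# Unsupervised transformations: PCA (orthogonal) and ICA (non-orthogonal)
pc_scores = PCA(n_components=2).fit_transform(X)
ic_sources = FastICA(n_components=2, random_state=0).fit_transform(X)

# Supervised models: PLS regression, random forest, kernel SVM
pls = PLSRegression(n_components=2).fit(X, y)
rf = RandomForestClassifier(n_estimators=500, oob_score=True,
                            random_state=0).fit(X, y)  # OOB gives an internal error estimate
svm = SVC(kernel="rbf").fit(X, y)                      # kernel choice strongly affects results

# Clustering: K-means (user supplies K) and agglomerative hierarchical clustering
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
Z = linkage(X, method="average", metric="euclidean")   # linkage matrix for a dendrogram
hc_labels = fcluster(Z, t=2, criterion="maxclust")     # cut the tree into two clusters

print("RF out-of-bag accuracy:", round(rf.oob_score_, 2))
print("K-means vs. hierarchical labels:", km_labels[:5], hc_labels[:5])
```

In line with the table, the supervised models (PLS, RF, SVM) need the group labels y for training, whereas PCA, ICA, K-means, and hierarchical clustering operate on the data matrix alone.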