From: A survey of computational tools for downstream analysis of proteomic and other omic datasets
Method | PCA | ICA | RF | PLS | SVM | K-means | Hierarchical |
---|---|---|---|---|---|---|---|
Category | Classification | Classification | Classification | Classification | Classification | Clustering | Clustering |
What does it do? | Separates features into groups based on commonality and reports the weight of each component’s contribution to the separation | Separates features into groups by eliminating correlation and reports the weight of each component’s contribution to the separation | Separates features into groups based on commonality; identifies important predictors | Separates features into groups based on maximal covariation and reports the contribution of each variable | Uses a user-specified kernel function to quantify the similarity between any pair of instances and create a classifier | Separates features into clusters of similar expression patterns | Clusters treatment groups, features, or samples into a dendrogram |
By what mechanism? | Orthogonal transformation; transforms a set of correlated variables into a new set of uncorrelated variables | Nonlinear, non-orthogonal transformation; standardizes each variable to unit variance and zero mean | Uses an ensemble classifier that consists of many decision trees | Multivariate regression | Finds a decision boundary maximizing the distance to nearby positive and negative examples | Compares and groups magnitudes of changes in the means into K clusters, where K is defined by the user | Compares all samples using either agglomerative or divisive algorithms with distance and linkage functions |
Strengths | Unsupervised, nonparametric, useful for reducing dimensions before using supervision | Works well when other approaches do not because data are not normally distributed | Robust to outliers and noise; gives useful internal estimates of error; resistant to overtraining | Diverse experiments that have the same features are made comparable; variables can outnumber features | Robust to outliers, gives useful internal estimates of error, can exploit knowledge of the domain if using appropriate kernel functions | Easily visualized and intuitive; greatly reduces complexity; performs well when distance information between data points is important to clustering | Unsupervised; easily visualized and intuitive |
Weaknesses | Number of features must exceed number of treatment groups | Features are assumed to be independent when they actually may be dependent | Does not allow missing data (requires imputation to replace missing values) | Fails to deal with data containing outliers | Selection of an inappropriate kernel yields poor results | Sensitive to initial conditions and specified number of clusters (K) | Does not provide feature contributions; not iterative, therefore, sensitive to cluster distance measures and noise/outliers |
More information | | | Performance depends on number of trees and varies among experiments | Supervised; requires training and testing; groups pre-defined | Supervised; requires training and testing; many good kernel functions have been described, e.g., based on structural alignment | Tools are available to determine the optimal cluster count (K) | User does not define the number of clusters |
Sample size/data characteristics | Unlimited sample size; data normally distributed | Unlimited sample size; data non-normally distributed | Performs well on small sample size and is resistant to over-fitting | Unlimited sample size; sensitive to outliers | Performs well on small sample size and is resistant to over-fitting | Performs best with a limited dataset, i.e., ~20 to 300 features | Performs best with a limited dataset, i.e., ~20 to 300 features or samples |
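
The K-means column can be made concrete with a minimal sketch in plain Python. It assumes the standard Lloyd iteration: assign each feature to its nearest centroid, then recompute each centroid as the mean of its members. The function name `kmeans`, the `seed` and `max_iter` parameters, and the toy `profiles` data are hypothetical and chosen only for illustration; they do not come from the survey. The randomly drawn starting centroids and the user-supplied `k` are the two sources of sensitivity the table lists as weaknesses.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, seed=0, max_iter=100):
    """Illustrative Lloyd-style K-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Initial centroids are k distinct points drawn at random;
    # a different seed can lead to a different final clustering.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical expression values for six features under two conditions.
profiles = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (5.0, 5.2), (5.1, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(profiles, k=2)
```

On well-separated toy data like this, the low and high profiles end up in different clusters whichever points are drawn first. On real omic data it is common to rerun with several seeds and values of `k` and compare the results.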