
Table 2 Common statistical methods and tests used in epidemiology, genetics, and metabolomics, with references to descriptive articles on their appropriate general use

From: Beyond genomics: understanding exposotypes through metabolomics

| Class of test | Type of test | Application/description | Refs |
| --- | --- | --- | --- |
| Descriptive | Mean, median, mode | The simplest measures used to describe basic features within data. Covered in all general statistical textbooks and used in most if not all scientific disciplines. | [67, 68, 69] |
| | Range, variance, SD | Describe the spread of data within a population. | |
| Inferential | z test, t test, chi-square | Test whether an observed mean, frequency, or proportion differs from a predetermined value. | |
| | ANOVA | Parametric method that tests the hypothesis that the means of two or more populations are equal. Frequently used to compare variance among groups relative to variance within groups. | |
| | Kruskal-Wallis | Rank-based non-parametric method that tests for statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable. | |
| Scaling | Centering, auto, pareto, log, MD | Data pretreatment methods that aim to reduce biological and analytical bias. | [70, 71] |
| Principal component | PCA | Unsupervised dimensional reduction procedure used to explain the maximum variance within complex datasets. | [72, 73, 74] |
| | Multiblock PCA | PCA extension designed to find the underlying relationships between sets of related data. | [65, 66, 75] |
| | ANOVA-PCA | Uses PC dimensional reduction to determine the effect of the experimental factors on multiple dependent variables. | [65, 76] |
| | PC-DFA | Supervised test that summarizes the differentiation between groups while overlooking within-group variation. | [65, 77, 78] |
| Regression | Linear | Summarizes and quantifies the relationship between two continuous variables. | [72, 79] |
| | PLS | Used to predict a set of dependent variables from a large set of independent variables. | [73, 77, 80, 81, 82] |
| | O-PLS | Orthogonal signal correction on PLS that maximizes the explained covariance on the first latent variable. | [77, 81, 83] |
| | PLS-R | Combines the predictive power of regression with the ability to deal with high dimensionality and multicollinearity of variables. | [77, 84] |
| | PLS-DA | Supervised approach to predicting discrete variables. | [77, 79, 83] |
| | LASSO | Parsimonious approach to variable selection and regularization in order to enhance interpretability and reduce noise. | [79, 80, 85, 86, 87] |
| | Elastic net | Variable reduction approach where strongly correlated predictors coalesce in or out of the model together. | [79, 80, 85, 87, 167] |
Definitions: SD standard deviation, MD median, PCA principal component analysis, ANOVA analysis of variance, PC-DFA principal component discriminant function analysis, PLS partial least squares (also known as projection to latent structures), O-PLS orthogonal PLS, PLS-R PLS regression, LASSO least absolute shrinkage and selection operator
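
As an illustration of the scaling row, the sketch below applies centering, autoscaling, and Pareto scaling to one variable at a time, using the commonly cited definitions: subtract the mean, then divide by the SD (auto) or by the square root of the SD (Pareto). The function names, the use of the sample SD, and the intensity values are assumptions made for this example and are not taken from the table or its references.

```python
import math
import statistics


def center(values):
    """Mean-center one variable: subtract its mean from every observation."""
    m = statistics.mean(values)
    return [v - m for v in values]


def autoscale(values):
    """Center, then divide by the sample SD (unit-variance scaling)."""
    sd = statistics.stdev(values)  # sample SD (n - 1); an assumption of this sketch
    return [c / sd for c in center(values)]


def pareto_scale(values):
    """Center, then divide by the square root of the sample SD."""
    sd = statistics.stdev(values)
    return [c / math.sqrt(sd) for c in center(values)]


# Hypothetical intensities for two metabolites measured in four samples.
low_abundance = [1.0, 2.0, 3.0, 4.0]
high_abundance = [100.0, 300.0, 500.0, 700.0]

auto_low = autoscale(low_abundance)
auto_high = autoscale(high_abundance)
pareto_high = pareto_scale(high_abundance)
```

After autoscaling, both variables have an SD of 1, so the high-abundance metabolite no longer dominates a later PCA or PLS model. Pareto scaling is a milder choice: the scaled variable keeps an SD equal to the square root of its original SD, so some of the original magnitude structure is preserved.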