Software for quantitative trait analysis
Human Genomics volume 2, Article number: 191 (2005)
This paper provides a brief overview of software currently available for the genetic analysis of quantitative traits in humans. Programs that implement variance components, Markov Chain Monte Carlo (MCMC), Haseman-Elston (H-E) and penetrance model-based linkage analyses are discussed, as are programs for measured genotype association analyses and quantitative trait transmission disequilibrium tests. The software compared includes LINKAGE, FASTLINK, PAP, SOLAR, SEGPATH, ACT, Mx, MERLIN, GENEHUNTER, Loki, Mendel, SAGE, QTDT and FBAT. Where possible, the paper provides URLs for acquiring these programs through the internet, details of the platforms for which the software is available and the types of analyses performed.
Localisation and characterisation of quantitative trait loci (QTLs) and causal polymorphisms influencing complex phenotypes are of major importance in statistical genetic analyses. Important steps in this process are linkage analyses of QTLs and association studies correlating phenotypes and genotypes. These investigations have been greatly facilitated by the development of a variety of computer software allowing for the fast and efficient analysis of quantitative traits.
This paper surveys software programs currently available for quantitative trait analysis. Given the rapid development of new programs, as well as the inevitable obsolescence of others, the focus here is necessarily limited to the software most widely used in the analysis of quantitative traits in humans. Also beyond the scope of this review are programs that are commonly used in studies of non-human model organisms and species of agricultural importance and that are designed primarily for the analysis of inbred, F2, half-sib and other specialised pedigrees.
The text of this paper has been organised by method, and is broadly divided into linkage methods and association methods. The linkage methods are further divided into parametric model-based, variance components and Haseman-Elston (H-E) methods. The association methods are categorised into measured genotype and transmission disequilibrium-based methods. Table 1 lists all of the software discussed in this paper and provides details regarding the methods implemented, the platforms for which software is available and a link to an online site for the program. The text provides a way to survey the available offerings by methodological approach, and Table 1 a way to search by program name. All of the programs discussed are freely available over the internet, with the exception of SAGE, for which there is a charge.
The most commonly used methods for quantitative trait linkage analysis in humans are the variance components and H-E approaches. It is also possible to use parametric, model-based methods for quantitative trait linkage analysis. These require the specification of allele frequencies at the trait locus and genotype-specific trait means. The LINKAGE, FASTLINK, Mendel, PAP and SAGE packages can be used for model-based linkage and joint segregation-linkage analysis of quantitative traits. As with analyses of discrete disease traits using these programs, large and complex family structures are easily accommodated, but computing time for multipoint linkage analyses increases exponentially with the number of genotyped markers analysed, due to the use of the Elston-Stewart algorithm. A new Java-based version, jPAP, is now available, although some of the functions of the original PAP have yet to be implemented in it.
At their simplest, variance components approaches model the covariance among family members as a function of unspecified aggregate additive genetic effects, effects due to a hypothetical gene in the region being tested for linkage, and a residual component that is uncorrelated among individuals and is sometimes described as an environmental component [1, 38, 40]. Most variance component programs use maximum likelihood methods to estimate these components of variance. It is possible to add numerous complexities onto variance component linkage models, including dominance genetic effects, epistatic interactions, gene-environment interactions, shared environment correlations, spouse correlations, corrections for non-normality of the trait distribution, estimation of empirical p-values, estimation of linkage power for a given study and multivariate models. Different programs provide automated routines for different subsets of these variance component extensions. GENEHUNTER automates the inclusion of dominance components. SOLAR has an epistasis option in its oligogenic multipoint linkage routine. SEGPATH makes it very easy to include a spouse correlation. SEGPATH, SOLAR, Mendel and ACT allow multivariate linkage analysis, in which the correlations between genetic and environmental components for multiple traits can be estimated. Mx has specialised routines for the analysis of twin data -- its original function -- although it has now been expanded to accommodate nuclear families.
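The covariance structure described above can be illustrated with a minimal Python sketch. It assumes the commonly used additive form in which the covariance between two relatives is the locus-specific IBD proportion times the QTL variance, plus twice the kinship coefficient times the aggregate additive genetic variance, plus a residual term that applies only to an individual's own variance. The function name, parameter names and numeric values are hypothetical and are not taken from any of the programs discussed.

```python
def expected_covariance(ibd_proportion, kinship, same_person,
                        var_qtl, var_polygenic, var_residual):
    """Expected trait covariance between two individuals (illustrative only).

    ibd_proportion: proportion of alleles shared IBD at the tested location
    kinship:        kinship coefficient (0.25 for full sibs, 0.5 with oneself)
    same_person:    True when computing an individual's own variance
    """
    cov = ibd_proportion * var_qtl + 2.0 * kinship * var_polygenic
    if same_person:
        # The residual component is uncorrelated among individuals,
        # so it contributes only on the diagonal.
        cov += var_residual
    return cov


# Hypothetical variance components that sum to a total variance of 1.0.
VAR_QTL, VAR_POLY, VAR_RESID = 0.2, 0.3, 0.5

# Full sibs sharing half of their alleles IBD at the tested location.
sib_cov = expected_covariance(0.5, 0.25, False, VAR_QTL, VAR_POLY, VAR_RESID)

# An individual's own trait variance.
self_var = expected_covariance(1.0, 0.5, True, VAR_QTL, VAR_POLY, VAR_RESID)
```

Extensions such as dominance or shared environment components would add further terms to this sum; they are omitted here to keep the sketch short.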
The MCMC-based approach implemented in Loki also estimates the variance due to a QTL, but additionally estimates the number of QTLs influencing the trait and their allele frequencies. This model is easily expanded to incorporate dominance effects, epistatic and gene-environment interactions and other, more complex models. Whereas most QTL linkage routines provide an LOD (logarithm of odds) score as a measure of the evidence in favour of a trait-influencing locus in the region being tested, Loki reports a posterior probability of there being a QTL in the region.
One of the greatest differences between specific implementations of the variance component linkage method is the source of the multipoint identity by descent (IBD) matrices that are used to estimate QTL-specific variance in a linkage analysis. GENEHUNTER and MERLIN use a Lander-Green algorithm, for which computing time becomes exponential with the number of non-founders in a pedigree. Generally, families larger than 20 or 25 individuals cannot be analysed in these programs without breaking up the pedigrees into smaller units. By using a sparse binary tree, as opposed to a full binary tree, MERLIN is able to accommodate larger families than GENEHUNTER. Mendel uses either the Elston-Stewart or the Lander-Green algorithm, depending on pedigree size. SOLAR uses an extension of the Fulker-Cardon interval approach to estimating multipoint IBDs [38, 42]. This allows both pedigrees of unlimited complexity and an unlimited number of genotyped markers. Although this approximation performs well for markers that are individually quite informative (such as short tandem repeats), it is not suitable for marker sets in which the markers are individually less informative (such as single nucleotide polymorphisms). Markov Chain Monte Carlo (MCMC) methods are used to estimate IBDs in Loki. These methods are also approximate, but are more precise than the Fulker-Cardon interval approach. Computation time for MCMC IBD estimation is linear in both the number of markers and the size of the pedigree, making it suitable for use with large complex pedigrees and unlimited numbers of markers. It is more computationally intensive than the interval approach, however, and may require weeks of computing time in the case of pedigrees of 100 individuals or more.
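To give a feel for why Lander-Green computations become impractical as pedigrees grow, the short Python sketch below counts the inheritance vectors for a pedigree, assuming the basic formulation with two meioses (one paternal, one maternal) per non-founder. The function name is hypothetical, and real programs apply reductions (founder symmetries, sparse trees) that are not modelled here, so the numbers are an upper bound for illustration rather than a statement about any program's actual limits.

```python
def inheritance_vector_count(non_founders):
    """Number of inheritance vectors in the unreduced Lander-Green state space.

    Each non-founder contributes two binary meiosis indicators, so the
    state space doubles twice for every non-founder added.
    """
    if non_founders < 0:
        raise ValueError("non_founders must be non-negative")
    return 2 ** (2 * non_founders)


# Growth of the state space with the number of non-founders (illustrative).
for n in (2, 5, 10, 15):
    print(n, inheritance_vector_count(n))
```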
A number of programs (SOLAR, ACT, SEGPATH and SAGE) are set up to use IBD matrices that are generated once per study and then stored, making it possible to import IBD matrices from a variety of sources if they are converted to the proper program-specific format.
Because a model with no QTL effects, with all genetic effects in an unspecified aggregate genetic component, forms the basis of comparison for the likelihood ratio test of linkage, most variance components programs also provide an overall estimate of the trait heritability.
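The comparison between the model with no QTL effect and the model including one can be sketched as follows. This is a minimal illustration that assumes the two maximised log-likelihoods (natural log) are already available from a fitting program; the function names and the example values are hypothetical.

```python
import math


def lod_score(loglik_null, loglik_linkage):
    """LOD score from two maximised natural-log likelihoods.

    The LOD is the base-10 logarithm of the likelihood ratio, so the
    difference in natural-log likelihoods is divided by ln(10).
    """
    return (loglik_linkage - loglik_null) / math.log(10.0)


def heritability(var_polygenic, var_residual):
    """Heritability under the null (no QTL) model: the fraction of the
    trait variance attributed to the aggregate additive genetic component."""
    return var_polygenic / (var_polygenic + var_residual)


# Hypothetical fitted values, for illustration only.
lod = lod_score(loglik_null=-1204.6, loglik_linkage=-1197.7)
h2 = heritability(var_polygenic=0.3, var_residual=0.7)
```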
The H-E linkage method, at its most basic, models the squared difference in siblings' trait values as a function of their IBD allele sharing at a particular chromosomal location. There have been a wide variety of extensions to the general H-E method. The 'revisited' H-E uses the mean corrected cross-product of the siblings' trait values. This was found to be less powerful than the original H-E in some cases, which led to the development of a variety of 'weighted' H-E tests using functions of the original and revisited H-E [46, 47]. Most recently, the H-E model has been extended to model the full variance-covariance matrix within a family. This latest version of the H-E is very similar to a variance component approach, the primary difference being that the various components are generally estimated by regression rather than maximum likelihood. Regression approaches should be computationally more efficient, whereas maximum likelihood approaches are, in theory, more powerful, although this difference is likely to be negligible in practice. Regression-based approaches may also be more robust to non-normality of the trait distribution. As with a variance components approach, the latest version of the H-E, in which the full variance-covariance matrix is modelled, is easily extended to include epistatic interactions, gene-environment interactions and so on. The original H-E linkage approach is implemented in SAGE and GENEHUNTER. The latest expansion of the H-E is also available in SAGE. The MERLIN REGRESS routine implements an H-E extension developed by Sham et al. that uses squared trait sums and differences between relative pairs.
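The original H-E idea can be sketched in a few lines of Python: regress the squared sib-pair trait difference on the proportion of alleles shared IBD and examine the slope, which is expected to be negative when the location is linked to a QTL. The function name and toy data are hypothetical, and the sketch uses ordinary least squares without the significance testing or weighting schemes that the programs above provide.

```python
def haseman_elston_slope(sib_pairs):
    """Least-squares slope of squared trait difference on IBD sharing.

    sib_pairs: sequence of (trait_sib1, trait_sib2, ibd_proportion) tuples.
    """
    x = [ibd for _, _, ibd in sib_pairs]
    y = [(t1 - t2) ** 2 for t1, t2, _ in sib_pairs]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("IBD sharing does not vary across pairs")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Toy data: pairs sharing more alleles IBD have more similar trait values.
toy_pairs = [
    (10.0, 12.0, 0.0),   # squared difference 4
    (11.0, 12.0, 0.5),   # squared difference 1
    (11.0, 11.0, 1.0),   # squared difference 0
]
slope = haseman_elston_slope(toy_pairs)
```

The 'revisited' form would replace the squared difference with the mean-corrected cross-product of the two trait values, in which case the expected sign of the slope under linkage is positive.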
The commonly used association methods for quantitative traits generally fall into two main categories: measured genotype approaches and transmission disequilibrium approaches. The measured genotype approach [40, 50] is a fixed-effects model in which genotype-specific trait means are estimated. An additive model, in which the heterozygote trait mean is constrained to be halfway between the means of the two homozygotes, provides a single degree of freedom test. This can be implemented through a covariate that takes the values of -1 and +1 for opposing homozygotes and 0 for heterozygotes. Thus, any quantitative trait program that permits the use of covariates can be used to test a measured genotype model. These include PAP, ACT, SEGPATH, SOLAR, SAGE, Mendel and MERLIN. Non-additive models are also easily investigated through the use of different codings of the covariate. Similar analyses could be carried out with any regression program, of course, but the packages listed above have the advantage of dealing with non-independence among family members. Any standard regression routine in a statistics package would be appropriate for measured genotype analyses in unrelated individuals but would provide incorrect p-values in the case of family data. In general, linkage programs that permit the use of covariates can be used to perform linkage analyses conditional on measured genotype as a way of testing the contribution of an associated marker to a linkage signal.
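The additive covariate coding described above can be written as a small Python helper. The function name and the allele labels are hypothetical; the resulting values would be supplied as a covariate column to whichever analysis program is being used.

```python
def additive_covariate(genotype, reference_allele):
    """Additive measured-genotype coding for a biallelic marker.

    genotype: pair of allele labels, e.g. ("A", "G")
    Returns +1 for the reference homozygote, 0 for the heterozygote and
    -1 for the opposing homozygote.
    """
    if len(genotype) != 2:
        raise ValueError("genotype must contain exactly two alleles")
    copies = sum(1 for allele in genotype if allele == reference_allele)
    return copies - 1


# Coding a few hypothetical genotypes with "A" as the reference allele.
codes = [additive_covariate(g, "A") for g in (("A", "A"), ("A", "G"), ("G", "G"))]
```

A non-additive model would use a different mapping, for example a second covariate that is 1 for heterozygotes and 0 otherwise to capture a dominance deviation.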
Note that measured genotype analyses are susceptible to population stratification. That is, if there are subgroups in the data that have different trait means, any marker that differs in allele frequency between those subgroups may show association, regardless of whether or not it is in linkage disequilibrium with a QTL. Such population substructure can be detected through analyses of unlinked markers by programs such as Structure.
Transmission disequilibrium tests (TDTs) were originally developed to provide a test for discrete trait association in the presence of linkage that was robust to population stratification [53, 54]. The TDT has been expanded to accommodate quantitative traits in a variety of ways [55–57]. Essentially, the quantitative trait TDT methods test whether the trait mean in offspring differs according to whether a particular allele was or was not transmitted by a parent heterozygous for that allele. The various methods differ by whether they require assumptions regarding the trait distribution and by the size of the families they can accommodate, from parent-child trios to arbitrary pedigrees. TDT analyses in FBAT require no assumptions regarding the trait distribution and are performed in nuclear families. Larger pedigrees can be used, but they will be decomposed into nuclear families for analysis and an empirical estimate of the variance of the test statistic will be used to account for the non-independence between nuclear families. The program QTDT implements a variety of quantitative trait TDT tests, including those described by Allison and by Rabinowitz. SOLAR has an extended pedigree-compatible TDT. The gamete competition model is a generalisation of the quantitative trait TDT that is applicable to general pedigrees and is implemented in Mendel.
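The core comparison behind the quantitative trait TDT can be illustrated with a deliberately simplified Python sketch: among offspring of parents heterozygous for an allele, compare the mean trait value of those who received the allele with the mean of those who did not. This is not the test statistic used by FBAT, QTDT, SOLAR or Mendel; the function name and data are hypothetical, and no variance estimate or p-value is computed.

```python
def transmission_mean_difference(offspring):
    """Difference in mean trait value by transmission status.

    offspring: sequence of (trait_value, allele_transmitted) tuples, one per
    offspring of a parent heterozygous for the allele of interest.
    Returns mean(transmitted) - mean(not transmitted).
    """
    received = [trait for trait, transmitted in offspring if transmitted]
    not_received = [trait for trait, transmitted in offspring if not transmitted]
    if not received or not not_received:
        raise ValueError("both transmission groups must be non-empty")
    return sum(received) / len(received) - sum(not_received) / len(not_received)


# Hypothetical offspring data, for illustration only.
toy_offspring = [(5.0, True), (7.0, True), (4.0, False), (2.0, False)]
difference = transmission_mean_difference(toy_offspring)
```

Because the comparison is made within families conditional on parental genotype, a difference of this kind is not produced by population stratification alone, which is the property that motivates the TDT family of methods.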
No review of this type can be completely comprehensive. The programs outlined above all have additional capabilities and unique subroutines that could not be detailed here. Many of these programs are still under active development, with new features being added all the time. Hopefully, this paper will have provided enough information about each program that the reader will be able to discern which ones are appropriate for their needs and merit further investigation through the internet links and references provided in Table 1. Of course, Table 1 itself is not a complete catalogue of the available software, although the most widely used packages have been included. There are many more quantitative trait analysis programs than can be feasibly discussed within a single brief paper, and there are new programs appearing constantly. There are several websites that maintain general lists of genetic analysis software; perhaps the most comprehensive of these is the genetic analysis software list started by Wentian Li at Columbia University which is now maintained at: http://www.nslij-genetics.org/soft/ with a mirror site at: http://linkage.rockefeller.edu/soft.
Amos CI: Robust variance-components approach for assessing genetic linkage in pedigrees. Am J Hum Genet. 1994, 54: 535-543.
Amos CI, Zhu DK, Boerwinkle E: Assessing genetic linkage and association with robust components of variance approaches. Ann Hum Genet. 1996, 60: 143-160. 10.1111/j.1469-1809.1996.tb01184.x.
de Andrade M, Amos CI, Thiel TJ: Methods to estimate genetic components of variance for quantitative traits in family studies. Genet Epidemiol. 1999, 17: 64-76. 10.1002/(SICI)1098-2272(1999)17:1<64::AID-GEPI5>3.0.CO;2-M.
Cottingham RW, Idury RM, Schaffer AA: Faster sequential genetic linkage computations. Am J Hum Genet. 1993, 53: 252-263.
Schaffer AA, Gupta SK, Shriram K, Cottingham RW: Avoiding recomputation in linkage analysis. Hum Hered. 1994, 44: 225-237. 10.1159/000154222.
Schaffer AA: Faster linkage analysis computations for pedigrees with loops or unused alleles. Hum Hered. 1996, 46: 226-235. 10.1159/000154358.
Dwarkadas S, Schaffer AA, Cottingham RW, et al: Parallelization of general-linkage analysis problems. Hum Hered. 1994, 44: 127-141. 10.1159/000154205.
Gupta SK, Schaffer AA, Cox AL, et al: Integrating parallelization strategies for linkage analysis. Comput Biomed Res. 1995, 28: 116-139. 10.1006/cbmr.1995.1009.
Becker A, Bar-Yehuda R, Geiger D: Random algorithms for the loop cutset problem. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, July 30-August 1, 1999. Edited by: Laskey, KB, Prade, H. 1999, Morgan Kaufmann, San Francisco, CA, 49-56.
Becker A, Geiger D, Schaffer AA: Automatic selection of loop breakers for genetic linkage analysis. Hum Hered. 1998, 48: 49-60. 10.1159/000022781.
Rabinowitz D, Laird N: A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Hum Hered. 2000, 50: 211-223. 10.1159/000022918.
Laird NM, Horvath S, Xu X: Implementing a unified approach to family-based tests of association. Genet Epidemiol. 2000, 19 (Suppl 1): S36-S42.
Horvath S, Xu X, Lake SL, et al: Family-based tests for associating haplotypes with general phenotype data: Application to asthma genetics. Genet Epidemiol. 2004, 26: 61-69. 10.1002/gepi.10295.
Horvath S, Xu X, Laird NM: The family based association test method: Strategies for studying general genotype-phenotype associations. Eur J Hum Genet. 2001, 9: 301-306. 10.1038/sj.ejhg.5200625.
Lake SL, Blacker D, Laird NM: Family-based tests of association in the presence of linkage. Am J Hum Genet. 2000, 67: 1515-1525. 10.1086/316895.
Lange C, Silverman EK, Xu X, et al: A multivariate family-based association test using generalized estimating equations: FBAT-GEE. Biostatistics. 2003, 4: 195-206. 10.1093/biostatistics/4.2.195.
Lunetta KL, Faraone SV, Biederman J, Laird NM: Family-based tests of association and linkage that use unaffected sibs, covariates, and interactions. Am J Hum Genet. 2000, 66: 605-614. 10.1086/302782.
Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES: Parametric and nonparametric linkage analysis: A unified multipoint approach. Am J Hum Genet. 1996, 58: 1347-1363.
Kruglyak L, Lander ES: Faster multipoint linkage analysis using Fourier transforms. J Comput Biol. 1998, 5: 1-7. 10.1089/cmb.1998.5.1.
Lathrop GM, Lalouel JM, Julier C, Ott J: Strategies for multilocus linkage analysis in humans. Proc Natl Acad Sci USA. 1984, 81: 3443-3446. 10.1073/pnas.81.11.3443.
Lathrop GM, Lalouel JM: Easy calculations of lod scores and genetic risks on small computers. Am J Hum Genet. 1984, 36: 460-465.
Lathrop GM, Lalouel JM, White RL: Construction of human linkage maps: Likelihood calculations for multilocus linkage analysis. Genet Epidemiol. 1986, 3: 39-52. 10.1002/gepi.1370030105.
Heath SC: Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am J Hum Genet. 1997, 61: 748-760. 10.1086/515506.
Lange K, Weeks D, Boehnke M: Programs for pedigree analysis: MENDEL, FISHER, and dGENE. Genet Epidemiol. 1988, 5: 471-472. 10.1002/gepi.1370050611.
Stringham HM, Boehnke M: Identifying marker typing incompatibilities in linkage analysis. Am J Hum Genet. 1996, 59: 946-950.
Lange K, Cantor R, Horvath S, et al: Mendel version 4.0: A complete package for the exact genetic analysis of discrete traits in pedigree and population data sets. Am J Hum Genet. 2001, 69 (Suppl): 504.
Abecasis GR, Cherny SS, Cookson WO, Cardon LR: Merlin -- Rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002, 30: 97-101. 10.1038/ng786.
Neale MC: The use of Mx for association and linkage analysis. GeneScreen. 2000, 1: 107-111. 10.1046/j.1466-9218.2000.00032.x.
Neale MC, Boker SM, Xie G, Maes HH: Mx: Statistical Modeling. 2003, Department of Psychiatry, VCU, Richmond, VA, 6
Posthuma D, de Geus EJ, Boomsma DI, Neale MC: Combined linkage and association tests in Mx. Behav Genet. 2004, 34: 179-196.
Hasstedt SJ: Pedigree Analysis Package. 2002, Department of Genetics, University of Utah, Salt Lake City, UT, 5
Hasstedt SJ: jPAP -- Pedigree Analysis Package for Java. 2004, Department of Genetics, University of Utah, Salt Lake City, UT
Abecasis GR, Cardon LR, Cookson WO: A general test of association for quantitative traits in nuclear families. Am J Hum Genet. 2000, 66: 279-292. 10.1086/302698.
Abecasis GR, Cookson WO, Cardon LR: Pedigree tests of transmission disequilibrium. Eur J Hum Genet. 2000, 8: 545-551. 10.1038/sj.ejhg.5200494.
S.A.G.E: Statistical Analysis for Genetic Epidemiology -- computer program package available from Statistical Solutions Ltd, Cork, Ireland. 2004
Province MA, Rao DC: General purpose model and a computer program for combined segregation and path analysis (SEGPATH): Automatically creating computer programs from symbolic language model specifications. Genet Epidemiol. 1995, 12: 203-219. 10.1002/gepi.1370120208.
Province MA, Rice TK, Borecki IB, et al: Multivariate and multilocus variance components method, based on structural relationships to assess quantitative trait linkage via SEGPATH. Genet Epidemiol. 2003, 24: 128-138. 10.1002/gepi.10208.
Almasy L, Blangero J: Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet. 1998, 62: 1198-1211. 10.1086/301844.
Elston RC, Stewart J: A general model for the genetic analysis of pedigree data. Hum Hered. 1971, 21: 523-542. 10.1159/000152448.
Hopper JL, Mathews JD: Extensions to multivariate normal models for pedigree analysis. Ann Hum Genet. 1982, 46: 373-383. 10.1111/j.1469-1809.1982.tb01588.x.
Lander ES, Green P: Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci USA. 1987, 84: 2363-2367. 10.1073/pnas.84.8.2363.
Fulker DW, Cardon LR: A sib-pair approach to interval mapping of quantitative trait loci. Am J Hum Genet. 1994, 54: 1092-1103.
Haseman JK, Elston RC: The investigation of linkage between a quantitative trait and a marker locus. Behav Genet. 1972, 2: 3-19. 10.1007/BF01066731.
Elston RC, Buxbaum S, Jacobs KB, Olson JM: Haseman and Elston revisited. Genet Epidemiol. 2000, 19: 1-17. 10.1002/1098-2272(200007)19:1<1::AID-GEPI1>3.0.CO;2-E.
Palmer LJ, Jacobs KB, Elston RC: Haseman and Elston revisited: The effects of ascertainment and residual familial correlations on power to detect linkage. Genet Epidemiol. 2000, 19: 456-460. 10.1002/1098-2272(200012)19:4<456::AID-GEPI15>3.0.CO;2-N.
Xu X, Weiss S, Wei LJ: A unified Haseman-Elston method for testing linkage with quantitative traits. Am J Hum Genet. 2000, 67: 1025-1028. 10.1086/303081.
Forrest WF: Weighting improves the new Haseman-Elston method. Hum Hered. 2001, 52: 47-54. 10.1159/000053353.
Wang T, Elston RC: Two-level Haseman-Elston regression for general pedigree data analysis. Genet Epidemiol. 2005, 29: 12-22. 10.1002/gepi.20075.
Sham PC, Purcell S, Cherny SS, Abecasis GR: Powerful regression-based quantitative-trait linkage analysis of general pedigrees. Am J Hum Genet. 2002, 71: 238-253. 10.1086/341560.
Boerwinkle E, Chakraborty R, Sing CF: The use of measured genotype information in the analysis of quantitative phenotypes in man, I. Models and analytical methods. Ann Hum Genet. 1986, 50: 181-194. 10.1111/j.1469-1809.1986.tb01037.x.
Almasy L, Blangero J: Exploring positional candidate genes: Linkage conditional on measured genotype. Behav Genet. 2004, 34: 173-177.
Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data. Genetics. 2000, 155: 945-959.
Falk CT, Rubinstein P: Haplotype relative risks: An easy reliable way to construct a proper control sample for risk calculations. Ann Hum Genet. 1987, 51: 227-233. 10.1111/j.1469-1809.1987.tb00875.x.
Spielman RS, McGinnis RE, Ewens WJ: Transmission test for linkage disequilibrium: The insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993, 52: 506-516.
Allison DB: Transmission-disequilibrium tests for quantitative traits. Am J Hum Genet. 1997, 60: 676-690.
Rabinowitz D: A transmission disequilibrium test for quantitative trait loci. Hum Hered. 1997, 47: 342-350. 10.1159/000154433.
Zhang S, Zhang K, Li J, et al: Test of association for quantitative traits in general pedigrees: The quantitative pedigree disequilibrium test. Genet Epidemiol. 2001, 21 (Suppl 1): S370-S375.
Sinsheimer JS, Blangero J, Lange K: Gamete-competition models. Am J Hum Genet. 2000, 66: 1168-1172. 10.1086/302826.
Preparation of this manuscript was supported in part by NIH grants MH59490, AA08403, HL70751 and MH61622. We are grateful to John Blangero and Thomas Dyer for their helpful comments and suggestions.
Almasy, L., Warren, D.M. Software for quantitative trait analysis. Hum Genomics 2, 191 (2005). https://doi.org/10.1186/1479-7364-2-3-191
- linkage analysis