A survey of current software for genetic power calculations
Jo Knight^{1}
Received: 2 March 2004
Accepted: 2 March 2004
Published: 2 March 2004
Abstract
Estimation of power is a key step in any study. This review briefly outlines the factors that affect power and the two main approaches for estimating it. There are a number of web-based tools and programs freely available to enable geneticists to perform power calculations, and the specifics of some of these are discussed here.
Keywords
power software
Introduction
The power of a study is the probability that it will detect an effect of a given size, and is therefore a subject of great importance. It is related to the magnitude of the effect, the sample size and the chosen level of statistical significance (ie the probability of a false-positive result). Ideally, calculations are carried out in the early stages of planning, in order to establish the number of people required.
In genetic studies, power is estimated either by asymptotic approaches or by simulation. The former employs closed equations, whereas the latter requires the creation of thousands of datasets with the same parameters as the population being studied; the proportion of simulated datasets yielding positive analysis results gives an estimate of the power. Simulation can be more accurate than closed equations if the investigator is able to use the correct parameters. As the parameters required (eg the frequency of the causative variant) are often unknown, however, choosing them is by no means a trivial task. Furthermore, simulation approaches are usually more computer intensive and time consuming. Both approaches are needed because of the diversity of calculations performed in the context of genetic studies: where asymptotic methods have not been established, or for some reason are not considered sufficient, simulation can be used.
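The contrast between the two approaches can be illustrated with a minimal sketch. It assumes a simple comparison of allele frequencies between cases and controls using a two-sided two-proportion z-test. The function names, the allele frequencies and the sample sizes are hypothetical and are not taken from any of the programs reviewed here.

```python
import math
import random
from statistics import NormalDist


def asymptotic_power(p_case, p_control, n_alleles, alpha=0.05):
    """Closed-equation (normal approximation) power of a two-sided
    two-proportion z-test with n_alleles sampled in each group."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    p_bar = (p_case + p_control) / 2
    # Standard error under the null (pooled) and under the alternative.
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_alleles)
    se_alt = math.sqrt(
        (p_case * (1 - p_case) + p_control * (1 - p_control)) / n_alleles
    )
    diff = p_case - p_control
    upper = 1 - nd.cdf((z_crit * se_null - diff) / se_alt)
    lower = nd.cdf((-z_crit * se_null - diff) / se_alt)
    return upper + lower


def simulated_power(p_case, p_control, n_alleles, alpha=0.05,
                    n_sims=2000, seed=1):
    """Simulation estimate: the proportion of simulated datasets in which
    the same z-test rejects the null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        x_case = sum(rng.random() < p_case for _ in range(n_alleles))
        x_ctrl = sum(rng.random() < p_control for _ in range(n_alleles))
        pooled = (x_case + x_ctrl) / (2 * n_alleles)
        se = math.sqrt(2 * pooled * (1 - pooled) / n_alleles)
        if se > 0 and abs(x_case - x_ctrl) / n_alleles / se > z_crit:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # Hypothetical design: 200 cases and 200 controls (400 alleles each).
    print("closed equation:", round(asymptotic_power(0.30, 0.20, 400), 3))
    print("simulation:     ", round(simulated_power(0.30, 0.20, 400), 3))
```

In this simple setting the two estimates should agree closely. The simulation takes far longer to run, but the same loop could be adapted to designs for which no closed equation is available.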
Despite the complexities, a variety of tools have been designed which allow investigators to estimate power using closed equations and/or to simulate a wide range of datasets. The purpose of this paper is to outline a number of these freely available programs and web-based utilities. Box 1 provides a summary of the tools, highlighting the nature of each utility, where it can be downloaded and brief information about what it can do.
The range of types of software available is the first thing to note. As well as standalone programs, there are web-based tools and downloadable Excel spreadsheets. Some are designed to perform simulations and some to calculate power from closed equations; others perform both tasks in addition to data analysis.
The software
SIMLINK and SLINK, written in 1990 and 1991, respectively, are the tools that have been available for the longest time [1–4]. Both are standalone programs and allow the user to carry out simulation studies on pedigrees to establish power for parametric linkage analysis; hence, they require the same information about the trait under study that such analysis requires. Since the development of these tools, a number of other simulation programs, with different requirements, have been written; for example, ASP, SIMLA, SIMNUC and GASP. Such programs can be used to assess the power of nonparametric linkage studies. In addition, the closed equations derived in 1990 by Risch [5] to calculate power for studies of affected siblings are programmed into a spreadsheet called POWTEST, available from Dave Curtis's website.
Closed equations for the detection of both linkage and association using variance components analysis have been encoded in the Genetic Power Calculator (GPC) [6]. Furthermore, GPC has an option to estimate the contribution to the test statistic of each sibship using trait data [7]. This allows ranking of sibships and hence provides a way of prioritising genotyping. An extension of this method is implemented in Merlin-Regress, where the expected LOD scores can be calculated for general pedigrees [8]. Merlin-Regress is also able to perform regression-based analysis for quantitative traits in phenotypically selected samples.
Box 1. Summary of available tools

ASP, SIMLA, SIMLINK, SIMNUC, SLINK
Type: Downloadable programs
Function: Simulation of pedigrees
Respective website addresses:

Genetic Power Calculator (GPC)
Type: Web-based utility
Function: Closed equations for linkage and association of qualitative or quantitative traits in the variance components framework; power for individual sibships with trait data and case-control; and TDT for binary traits and threshold-selected traits

Merlin-Regress
Type: Downloadable program
Function: Closed equations for expected LOD scores based on regression approaches

Power for Association With Errors (PAWE)
Type: Web-based utility
Function: Closed equations for case-control association with errors

PBAT
Type: Downloadable program
Function: Closed equation, simulation and analysis for family-based association studies

POWER
Type: Downloadable program
Function: Closed equations for studies of interactions

POWTEST
Type: Excel spreadsheet
Function: Closed equations for TDT and linkage with affected sib pairs

QUANTO
Type: Downloadable program
Function: Closed equations for studies of interactions

TDT calculator
Type: Downloadable program
Function: Closed equation and simulation for family-based association studies

UCLA stat calculator
Type: Web-based utility
Function: Closed equation for case-control association (as well as closed equations for other non-genetic study types)
Website address: http://calculators.stat.ucla.edu/powercalc/binomial/casecontrol/bcasecontrolpower.php

TDT, transmission disequilibrium test
The GPC is perhaps the utility capable of performing the widest range of power calculations. In addition to the functions already mentioned, it can also be used to calculate power for transmission disequilibrium tests (TDTs) of binary traits and TDT and case-control studies of threshold-selected quantitative traits. Calculating power for these tests in the GPC is advantageous, as the GPC takes linkage disequilibrium between the gene and the marker under study into account. This web-based utility calculates power from the information provided by the user and produces output that is concise and useful. Accompanying notes relate mainly to usage rather than theory, and direct the user to papers in which the latter is explained.
Family-based association studies are frequently used for gene mapping. Extensions of TDT allow for analysis of quantitative as well as dichotomous traits; inclusion of families with missing parents; and joint analysis of different types of families (eg single affected/multiple affected and discordant siblings). PBAT [9, 10] and the TDT calculator [11] allow the user to perform closed-form calculations and simulation for such studies. The closed equations are slightly different. In the paper that outlines the theory behind PBAT, the authors suggest their approach is more accurate than that of Chen, as it calculates the power of the actual test statistic whereas Chen computes the power of the expected statistic [10]. Lange and Laird suggest that, although this does not appear to make a lot of difference in smaller studies, there is a greater difference in large studies [10].
Both PBAT and the TDT calculator are standalone programs. PBAT has a very helpful and detailed web page that includes everything from downloading instructions to an explanation of how to use the program. Furthermore, PBAT can actually carry out family-based association tests. There is no documentation for the TDT calculator but it is easy to use.
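To make the idea of a TDT power calculation concrete, the following sketch compares a normal-approximation closed equation with an exact binomial calculation for the basic TDT. It is not the method implemented in PBAT, the TDT calculator or the GPC. It assumes that the marker is itself the causal variant (so linkage disequilibrium is ignored), that the number of transmissions from heterozygous parents is fixed, and that risk follows a multiplicative model, in which a heterozygous parent transmits the risk allele to an affected child with probability γ/(1 + γ), where γ is the genotype relative risk. All function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def transmission_prob(grr):
    """Probability that a heterozygous parent transmits the risk allele to
    an affected child, assuming a multiplicative model with genotype
    relative risk grr."""
    return grr / (1 + grr)


def tdt_power_normal(n_transmissions, grr, alpha=0.05):
    """Normal approximation to the power of the TDT statistic
    (b - c)^2 / (b + c), with b + c = n_transmissions held fixed."""
    nd = NormalDist()
    tau = transmission_prob(grr)
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Z = (b - c) / sqrt(n) has this mean and standard deviation.
    mean = math.sqrt(n_transmissions) * (2 * tau - 1)
    sd = 2 * math.sqrt(tau * (1 - tau))
    return (1 - nd.cdf((z_crit - mean) / sd)) + nd.cdf((-z_crit - mean) / sd)


def tdt_power_exact(n_transmissions, grr, alpha=0.05):
    """Exact power: sum the binomial probabilities of every outcome b for
    which the TDT statistic exceeds the chi-square (1 df) critical value."""
    tau = transmission_prob(grr)
    chi2_crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    n = n_transmissions
    power = 0.0
    for b in range(n + 1):
        statistic = (2 * b - n) ** 2 / n  # (b - c)^2 / (b + c) with c = n - b
        if statistic > chi2_crit:
            power += math.comb(n, b) * tau ** b * (1 - tau) ** (n - b)
    return power


if __name__ == "__main__":
    # Hypothetical design: 200 informative transmissions, relative risk 1.5.
    print("normal approximation:", round(tdt_power_normal(200, 1.5), 3))
    print("exact binomial:      ", round(tdt_power_exact(200, 1.5), 3))
```

The two figures differ slightly because the exact calculation respects the discreteness of the transmission counts. This is a much simpler matter than the distinction between the actual and expected test statistic discussed above, and it is shown only to illustrate that different closed equations for the same design need not agree exactly.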
Researchers are becoming increasingly interested in investigating the combined effects of genetics and the environment, as well as the interactions between different genes. At least two programs are available to calculate power for such studies: Quanto [12, 13] and a National Cancer Institute program called 'Power' [14]. These programs are designed for regression-based approaches. Quanto has the advantage of dealing with a wider range of study designs, including certain family-based populations as well as quantitative traits.
The final program that will be introduced here is Power for Association With Errors (PAWE) [15, 16]. This web-based utility, available on the Rockefeller website, incorporates an error model into its power calculations. It performs power and sample size calculations for genetic case-control association studies in the presence of genotyping errors, and determines how much genotyping errors cost the researcher, in terms of either decreased asymptotic power for a fixed sample size or the increased sample size needed to maintain constant asymptotic power.
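The cost of genotyping errors can be sketched with a deliberately simplified model. The code below does not reproduce PAWE's error models. It assumes that each allele is misread as the other allele with the same probability in cases and controls, which pulls the observed allele frequencies towards 0.5 and shrinks the observed difference between groups. Power is then computed with a two-sided two-proportion z-test. All names and numerical values are hypothetical.

```python
import math
from statistics import NormalDist


def observed_freq(p, err):
    """Allele frequency observed when each allele is misread as the other
    allele with probability err (a simplifying assumption)."""
    return p * (1 - err) + (1 - p) * err


def allele_test_power(p_case, p_control, n_alleles, alpha=0.05):
    """Normal-approximation power of a two-sided two-proportion z-test
    with n_alleles sampled in each group."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    p_bar = (p_case + p_control) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_alleles)
    se_alt = math.sqrt(
        (p_case * (1 - p_case) + p_control * (1 - p_control)) / n_alleles
    )
    diff = p_case - p_control
    return (1 - nd.cdf((z_crit * se_null - diff) / se_alt)
            + nd.cdf((-z_crit * se_null - diff) / se_alt))


def power_with_errors(p_case, p_control, n_alleles, err, alpha=0.05):
    """Power when the test is applied to error-prone observed alleles."""
    return allele_test_power(observed_freq(p_case, err),
                             observed_freq(p_control, err),
                             n_alleles, alpha)


def alleles_needed(p_case, p_control, err, target_power=0.8, alpha=0.05):
    """Smallest number of alleles per group reaching target_power."""
    if math.isclose(observed_freq(p_case, err), observed_freq(p_control, err)):
        raise ValueError("no observable difference between groups")
    n = 2
    while power_with_errors(p_case, p_control, n, err, alpha) < target_power:
        n += 1
    return n


if __name__ == "__main__":
    for err in (0.0, 0.01, 0.05):
        print("error rate", err,
              "power at 400 alleles per group:",
              round(power_with_errors(0.30, 0.20, 400, err), 3),
              "alleles per group for 80% power:",
              alleles_needed(0.30, 0.20, err))
```

Under these assumptions the output shows both costs described above: power falls as the error rate rises for a fixed sample size, and the sample size required to hold power at 80 per cent increases.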
This paper covers a variety of useful tools which should be helpful to geneticists attempting to perform power calculations; however, it is important not to become complacent. These calculations are, at best, an estimate of the power of the study, as the parameters used in them are often unknown. Furthermore, they will be imprecise when they do not take into account all of the factors that influence the magnitude of the effect. It is, therefore, encouraging to find recent programs, like PAWE, which continue to take steps to improve accuracy.
References
1. Boehnke M: 'Estimating the power of a proposed linkage study: A practical computer simulation approach'. Am J Hum Genet. 1986, 39: 513-527.
2. Ploughman LM, Boehnke M: 'Estimating the power of a proposed linkage study for a complex genetic trait'. Am J Hum Genet. 1989, 44: 543-551.
3. Ott J: 'Computer-simulation methods in human linkage analysis'. Proc Natl Acad Sci USA. 1989, 86: 4175-4178. 10.1073/pnas.86.11.4175.
4. Weeks DE, Ott J, Lathrop GM: 'SLINK: A general simulation program for linkage analysis'. Am J Hum Genet. 1990, 47: A204.
5. Risch N: 'Linkage strategies for genetically complex traits. II. The power of affected relative pairs'. Am J Hum Genet. 1990, 46: 229-241.
6. Purcell S, Cherny SS, Sham PC: 'Genetic Power Calculator: Design of linkage and association genetic mapping studies of complex traits'. Bioinformatics. 2003, 19: 149-150. 10.1093/bioinformatics/19.1.149.
7. Purcell S, Cherny SS, Hewitt JK, Sham PC: 'Optimal sibship selection for genotyping in quantitative trait locus linkage analysis'. Hum Hered. 2001, 52: 1-13. 10.1159/000053350.
8. Sham PC, Purcell S, Cherny SS, Abecasis GR: 'Powerful regression-based quantitative-trait linkage analysis of general pedigrees'. Am J Hum Genet. 2002, 71: 238-252. 10.1086/341560.
9. Lange C, DeMeo DL, Laird NM: 'Power and design considerations for a general class of family-based association tests: Quantitative traits'. Am J Hum Genet. 2002, 71: 1330-1341. 10.1086/344696.
10. Lange C, Laird NM: 'Power calculations for a general class of family-based association tests: Dichotomous traits'. Am J Hum Genet. 2002, 71: 575-584. 10.1086/342406.
11. Chen WM, Deng HW: 'A general and accurate approach for computing the statistical power of the transmission disequilibrium test for complex disease genes'. Genet Epidemiol. 2001, 21: 53-67. 10.1002/gepi.1018.
12. Gauderman WJ: 'Sample size requirements for matched case-control studies of gene-environment interaction'. Stat Med. 2002, 21: 35-50. 10.1002/sim.973.
13. Gauderman WJ: 'Sample size requirements for association studies of gene-gene interaction'. Am J Epidemiol. 2002, 155: 478-484. 10.1093/aje/155.5.478.
14. Garcia-Closas M, Lubin JH: 'Power and sample size calculations in case-control studies of gene-environmental interactions: Comments on different approaches'. Am J Epidemiol. 1999, 148: 689-693.
15. Gordon D, Finch SJ, Nothnagel M, Ott J: 'Power and sample size calculations for case-control genetic association tests when errors present: Application to single nucleotide polymorphisms'. Hum Hered. 2002, 54: 22-33. 10.1159/000066696.
16. Gordon D, Levenstien MA, Finch SJ, Ott J: 'Errors and linkage disequilibrium interact multiplicatively when computing sample sizes for genetic case-control association studies'. Pac Symp Biocomput. 2003, 8: 490-501. [http://psb.stanford.edu/psbonline]