A survey of current software for genetic power calculations

Knight, Jo

doi:10.1186/1479-7364-1-3-225

Software review
Published: 02 March 2004

A survey of current software for genetic power calculations

Jo Knight¹

Human Genomics volume 1, Article number: 225 (2004) Cite this article

1840 Accesses
4 Citations
Metrics details

Abstract

Estimation of power is a key step in any study. This review briefly outlines the factors that affect power and the two main approaches for estimating it. There are a number of web-based tools and programs freely available to enable geneticists to perform power calculations, and the specifics of some of these are discussed here.

Introduction

The power of a study is the probability that it will detect an effect of a given size, and is therefore a subject of great importance. It is related to the magnitude of the effect, the sample size and the chosen level of statistical significance (ie the probability of a false-positive result). Ideally, calculations are carried out in the early stages of planning, in order to establish the number of people required.

In genetic studies, power is estimated either by asymptotic approaches or by undertaking simulations. The former involves employing closed equations, whereas the latter requires the creation of thousands of datasets with the same parameters as the population being studied. (The proportion of simulated sets yielding positive analysis results gives an estimate of the power.) Simulation can be a more accurate approach than the use of closed equations if the investigator is able to use the correct parameters. As the parameters required (eg the frequency of the causative variant) are often unknown, however, this is by no means an inconsequential task. Furthermore, simulation approaches are usually more computer intensive and time consuming. Both approaches are required because of the diversity of calculations performed in the context of genetic studies. Where asymptotic methods have not been established, or for some reason are not considered sufficient, simulation can be used.

Despite the complexities, a variety of tools have been designed which allow investigators to estimate power using closed equations and/or to simulate a wide range of datasets. The purpose of this paper is to outline a number of these freely available programs and web-based utilities. Box 1 provides a summary of the tools, highlighting the nature of each utility, where they can be downloaded from and brief information about what they can do.

The range of types of software available is the first thing to note. As well as stand-alone programs, there are web-based tools and downloadable Excel spreadsheets. Some are designed to perform simulations and some to calculate power from closed equations, others perform both tasks in addition to data analysis.

The software

SIMLINK and SLINK, written in 1990 and 1991, respectively, are the tools that have been available for the longest time [1–4]. Both are stand-alone programs and allow the user to carry out simulation studies on pedigrees to establish power for parametric linkage analysis; hence, they require the same information about the trait under study that is requisite to such analysis. Since the development of these tools, a number of other simulation programs, with different requirements, have been written; for example, ASP, SIMLA, SIMNUC and GASP. Such programs can be used to assess the power of non-parametric linkage studies. In addition, the closed equations derived in 1990 by Risch [5] to calculate power for studies of affected siblings are programmed into a spreadsheet called POWTEST, available from Dave Curtis's website.

Closed equations for the detection of both linkage and association using variance components analysis have been encoded in the Genetic Power Calculator (GPC) [6]. Furthermore, GPC has an option to estimate the contribution to the test statistic of each sibship using trait data [7]. This allows ranking of sibships and hence provides a way of prioritising genotyping. An extension of this method is implemented in Merlin-Regress, where the expected LOD scores can be calculated for general pedigrees [8]. Merlin-Regress is also able to perform regression-based analysis for quantitative traits in phenotypically selected samples.

Box 1. Summary of available tools

ASP, SIMLA, SIMLINK, SIMNUC, SLINK

Genetic Power Calculator (GPC)

Web-based utility
Closed equations for linkage and association of qualitative or quantitative traits in the variance components framework; power for individual sibships with trait data and case-control; and TDT for binary traits and threshold-selected traits
http://statgen.iop.kcl.ac.uk/gpc/

Merlin-Regress

Downloadable program
Closed equations for expected LOD scores based on regression approaches
http://www.sph.umich.edu/csg/abecasis/Merlin/

Power for Association With Errors (PAWE)

Web-based utility
Closed equations for case-control association with errors
http://linkage.rockefeller.edu/joanne/pawe/

PBAT

Downloadable program
Closed equation, simulation and analysis for family-based association studies
http://www.biostat.harvard.edu/~clange/default.htm

POWER

Downloadable program
Closed equations for studies of interactions
http://dceg2.cancer.gov/POWER/

POWTEST

Excel spreadsheet
Closed equations for TDT and linkage with affected sib pairs
http://www.mds.qmw.ac.uk/statgen/dcurtis/software.html

QUANTO

Downloadable program
Closed equations for studies of interactions
http://hydra.usc.edu/gxe/

TDT calculator

Downloadable program
Closed equation and simulation for family-based association studies
http://biosun01.biostat.jhsph.edu/~wmchen/pc.html

UCLA stat calculator

Web-based utility
Closed equation for case control association (as well as closed equations for other non-genetic study types)
http://calculators.stat.ucla.edu/powercalc/binomial/case-control/b-case-control-power.php

TDT, transmission disequilbrium test

The GPC is perhaps the utility capable of performing the widest range of power calculations. In addition to the utilities already mentioned, it can also be used to calculate power for transmission disequilibrium tests (TDTs) of binary traits and TDTand case-control studies of threshold-selected quantitative traits. Calculating power for these tests in the GPC is advantageous, as the GPC takes linkage disequilibrium between the gene and the marker under study into account. This web-based utility calculates power from the information provided by the user and produces output that is concise and useful. Accompanying notes relate mainly to usage rather than theory, and direct the user to papers in which the latter is explained.

Family-based association studies are frequently used for gene mapping. Extensions of TDT allow for analysis of quantitative as well as dichotomous traits; inclusion of families with missing parents; and joint analysis of different types of families (eg single affected/multiple affected and discordant siblings). PBAT [9, 10] and the TDT [11] calculator allow the user to perform closed-form calculations and simulation for such studies. The closed equations are slightly different. In the paper that outlines the theory behind PBAT, the authors suggest their approach is more accurate than that of Chen, as it calculates the power of the actual test statistic whereas Chen computes the power of the expected statistic [10]. Lange and Laird suggest that, although this does not appear to make a lot of difference in smaller studies, there is a greater difference in large studies [10].

Both PBAT and TDT are stand-alone programs. PBAT has a very helpful and detailed web page that includes everything from downloading instructions to an explanation of how to use the program. Furthermore, PBAT can actually carry out family-based association tests. There is no documentation for the TDT calculator but it is easy to use.

Researchers are becoming increasingly interested in investigating the combined effects of genetics and the environment, as well as the interactions between different genes. At least two programs are available to calculate power for such studies, Quanto [12, 13] and a National Cancer Institute program called 'Power' [14]. These programs are designed for regression-based approaches. Quanto has the advantage of dealing with a wider range of study designs, including certain family-based populations as well as quantitative traits.

The final program that will be introduced here is Power Association With Errors (PAWE) [15, 16]. This web-based utility, available on the Rockefeller website, incorporates an error model into its power calculations. It computes power and sample size calculations for genetic case-control association studies in the presence of genotyping errors, and determines how much genotyping errors cost the researcher, in terms of decreased asymptotic power for a fixed sample size or increased sample size, to maintain constant asymptotic power.

This paper covers a variety of useful tools which should be helpful to geneticists attempting to perform power calculations; however, it is important not to become complacent. These calculations are, at best, an estimate of the power of the study, as the parameters used in them are often unknown. Furthermore, they will be imprecise when they do not take into account all of the factors that influence the magnitude of the effect. It is, therefore, encouraging to find recent programs, like PAWE, which continue to take steps to improve accuracy.

References

Boehnke M: 'Estimating the power of a proposed linkage study: A practical computer simulation approach'. Am J Hum Genet. 1986, 39: 513-527.
PubMed Central CAS PubMed Google Scholar
Ploughman LM, Boehnke M: 'Estimating the power of a proposed linkage study for a complex genetic trait'. Am J Hum Genet. 1989, 44: 543-551.
PubMed Central CAS PubMed Google Scholar
Ott J: 'Computer-simulation methods in human linkage analysis'. Proc Natl Acad Sci USA. 1989, 86: 4175-4178. 10.1073/pnas.86.11.4175.
Article PubMed Central CAS PubMed Google Scholar
Weeks DE, Ott J, Lathrop GM: 'SLINK: A general simulation program for linkage analysis'. Am J Hum Genet. 1990, 47: A204-
Google Scholar
Risch N: 'Linkage strategies for genetically complex traits. II. The power of affected relative pairs'. Am J Hum Genet. 1990, 46: 229-241.
PubMed Central CAS PubMed Google Scholar
Purcell S, Cherny SS, Sham PC: 'Genetic Power Calculator: Design of linkage and association genetic mapping studies of complex traits'. Bioinformatics. 2003, 19: 149-150. 10.1093/bioinformatics/19.1.149.
Article CAS PubMed Google Scholar
Purcell S, Cherny SS, Hewitt JK, Sham PC: 'Optimal sibship selection for genotyping in quantitative trait locus linkage analysis'. Hum Hered. 2001, 52: 1-13. 10.1159/000053350.
Article CAS PubMed Google Scholar
Sham PC, Purcell S, Cherny SS, Abecasis GR: 'Powerful regression-based quantitative-trait linkage analysis of general pedigrees'. Am J Hum Genet. 2002, 71: 238-252. 10.1086/341560.
Article PubMed Central CAS PubMed Google Scholar
Lange C, DeMeo DL, Laird NM: 'Power and design considerations for a general class of family-based association tests: Quantitative traits'. Am J Hum Genet. 2002, 71: 1330-1341. 10.1086/344696.
Article PubMed Central CAS PubMed Google Scholar
Lange C, Laird NM: 'Power calculations for a general class of family-based association tests: Dichotomous traits'. Am J Hum Genet. 2002, 71: 575-584. 10.1086/342406.
Article PubMed Central CAS PubMed Google Scholar
Chen WM, Deng HW: 'A general and accurate approach for computing the statistical power of the transmission disequilibrium test for complex disease genes'. Genet Epidemiol. 2001, 21: 53-67. 10.1002/gepi.1018.
Article CAS PubMed Google Scholar
Gauderman WJ: 'Sample size requirements for matched case-control studies of gene-environment interaction'. Stat Med. 2002, 21: 35-50. 10.1002/sim.973.
Article PubMed Google Scholar
Gauderman WJ: 'Sample size requirements for association studies of gene-gene interaction'. Am J Epidemiol. 2002, 155: 478-484. 10.1093/aje/155.5.478.
Article PubMed Google Scholar
Garcia-Closas M, Lubin JH: 'Power and sample size calculations in case-control studies of gene-environmental interactions: Comments on different approaches'. Am J Epidemiol. 1999, 148: 689-693.
Article Google Scholar
Gordon D, Finch SJ, Nothnagel M, Ott J: 'Power and sample size calculations for case-control genetic association tests when errors present: Application to single nucleotide polymorphisms'. Hum Hered. 2002, 54: 22-33. 10.1159/000066696.
Article PubMed Google Scholar
Gordon D, Levenstien MA, Finch SJ, Ott J: 'Errors and linkage disequilibrium interact multiplicatively when computing sample sizes for genetic case-control association studies'. Pac Symp Biocomput. 2003, 8: 490-501. [http://psb.stanford.edu/psb-online]
Google Scholar

Download references

Author information

Authors and Affiliations

Institute of Psychiatry, Social Genetic and Developmental Psychiatry Research Centre, De Crespigny Park, Denmark Hill, Box P080, London, SE5 8AF, UK
Jo Knight

Authors

Jo Knight
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jo Knight.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Knight, J. A survey of current software for genetic power calculations. Hum Genomics 1, 225 (2004). https://doi.org/10.1186/1479-7364-1-3-225

Download citation

Received: 02 March 2004
Accepted: 02 March 2004
Published: 02 March 2004
DOI: https://doi.org/10.1186/1479-7364-1-3-225

A survey of current software for genetic power calculations

Abstract

Introduction

The software

Box 1. Summary of available tools

ASP, SIMLA, SIMLINK, SIMNUC, SLINK

Genetic Power Calculator (GPC)

Merlin-Regress

Power for Association With Errors (PAWE)

PBAT

POWER

POWTEST

QUANTO

TDT calculator

UCLA stat calculator

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Human Genomics

Contact us

A survey of current software for genetic power calculations

Abstract

Introduction

The software

Box 1. Summary of available tools

ASP, SIMLA, SIMLINK, SIMNUC, SLINK

Genetic Power Calculator (GPC)

Merlin-Regress

Power for Association With Errors (PAWE)

PBAT

POWER

POWTEST

QUANTO

TDT calculator

UCLA stat calculator

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Human Genomics

Contact us