Genomics software: The view from 10,000 feet

Weale, Michael E

doi:10.1186/1479-7364-4-1-56

Software review
Published: 01 October 2009

Michael E Weale¹

Human Genomics volume 4, Article number: 56 (2009)

Abstract

The rate of change in genomics, and 'omics generally, shows no signs of slowing down. Related analysis software is struggling to keep pace. This paper provides a brief review of the field.

For the bioinformaticist, and still more so the traditional genetic epidemiologist, the big view on how to tackle genomic data analysis looks daunting. Only a few years ago, the genome-wide association study (GWAS) represented the overshadowing Everest on the landscape, and commentators fretted about the computational feasibility of analysing 500,000 or so single nucleotide polymorphisms (SNPs) against one phenotype variable. Now, the single-phenotype GWAS is a foothill from which to launch attacks on datasets of much greater scale and variety. A new term - systems genetics - has emerged to describe this expanded world view. Analytical tools for dealing with molecular data have always lagged behind the acceleration of high-throughput methods for generating them, and that seems especially true at the present time. Here, I provide an overview of the software currently available, and look ahead to future developments.

First, a look at more familiar territory. The software package PLINK [1] has become the favoured work-horse of GWAS analysis, thanks to the untiring efforts of Shaun Purcell to keep the software well documented, flexible, fast and compact in its use of data structures. Few other packages surpass PLINK as far as basic quality control and first-pass SNP-by-SNP analysis are concerned, and many more advanced features are available and are being expanded continuously. In addition to SNP probes, modern GWAS panels carry additional probe sets for interrogating copy number variation (CNV). PennCNV [2] is a popular software package for calling CNVs. Uncertainty in CNV calls poses downstream problems for association analysis, and software for dealing with this has been reviewed recently in this journal [3]. Another trend is towards imputation of SNPs that are not present on the GWAS panel but can be inferred via linkage disequilibrium (LD), also reviewed here recently [4]. Popular choices are Mach [5], Impute [6] and Beagle [7]. A more specialist imputation problem, but one of general interest due to the role of the immune response system in many diseases, is to call classical human leukocyte antigen (HLA) genotypes from SNPs typed in the HLA region of chromosome 6. Recently improved software from Gil McVean and colleagues is available for this [8].
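To make the idea of a first-pass SNP-by-SNP analysis concrete, the following is a minimal sketch of an allelic case-control test applied to each SNP in turn. It is not PLINK's implementation: the function names, the input layout (four allele counts per SNP) and the Bonferroni cut-off are assumptions chosen for illustration, and the SNP identifiers in the usage lines are hypothetical.

```python
import math


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p_value). Illustrative only; no continuity correction.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case = case_a + case_b
    row_ctrl = ctrl_a + ctrl_b
    col_a = case_a + ctrl_a
    col_b = case_b + ctrl_b
    if 0 in (row_case, row_ctrl, col_a, col_b):
        return 0.0, 1.0  # monomorphic SNP or empty group: nothing to test
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        row_case * row_ctrl * col_a * col_b
    )
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def scan(snp_counts, alpha=0.05):
    """Test every SNP and flag those passing a Bonferroni threshold.

    snp_counts maps a SNP identifier to (case_a, case_b, ctrl_a, ctrl_b).
    Returns (snp_id, chi2, p_value, passes) tuples sorted by p-value.
    """
    if not snp_counts:
        return []
    threshold = alpha / len(snp_counts)
    results = []
    for snp_id, counts in snp_counts.items():
        chi2, p_value = allelic_chi_square(*counts)
        results.append((snp_id, chi2, p_value, p_value < threshold))
    return sorted(results, key=lambda r: r[2])


# Hypothetical allele counts for two SNPs.
example = {"snp_1": (60, 40, 40, 60), "snp_2": (30, 70, 30, 70)}
for snp_id, chi2, p_value, passes in scan(example):
    print(snp_id, round(chi2, 3), p_value, passes)
```

With 500,000 or so SNPs the same loop applies unchanged; only the multiple-testing threshold becomes far more stringent, which is one reason dedicated, compact implementations are preferred in practice.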

SNP annotation tools provide the most straightforward window from GWAS hits, and also from sequence data, into the wider 'omic universe. Rachel Karchin has reviewed them recently [9]. The SNP Function Portal [10] provides one of the more comprehensive lists of annotation for each SNP, including those arising via LD proxy or 'tagging'. Other options include FastSNP [11], PupaSuite [12], SNPnexus [13], SNPinfo [14], SNPselector [15], F-SNP [16] and TAMAL [17]. WGAviewer [18] is geared specifically towards the analysis of GWAS results, and has a nice visual interface. All these tools struggle to keep up with the rapidly expanding set of available annotations. For example, several different datasets are now publicly available that combine GWAS SNP data with genome-wide gene expression data (so-called genetical genomics or expression quantitative trait locus [eQTL] data). Currently, however, no single tool can search all these datasets simultaneously. One option for the more proficient investigator is to keep one step ahead by using the Galaxy web tool [19] to design their own application for integrating different annotation tracks with their GWAS hits. SNAP [20] is a useful tool for feeding LD proxy information into such a custom-made Galaxy application.
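The notion of an LD proxy used above can be illustrated with a small sketch that computes the squared correlation r² between two biallelic SNPs from phased haplotypes and lists the SNPs that tag a given GWAS hit. This is not how SNAP or any other tool named here is implemented; the function names, the 0/1 allele coding, the r² threshold of 0.8 and the SNP identifiers are all assumptions made for illustration.

```python
def ld_r2(alleles_a, alleles_b):
    """Squared correlation r^2 between two biallelic SNPs.

    alleles_a and alleles_b hold 0/1 alleles for the same phased
    haplotypes, in the same order.
    """
    if len(alleles_a) != len(alleles_b) or not alleles_a:
        raise ValueError("need two non-empty allele lists of equal length")
    n = len(alleles_a)
    p_a = sum(alleles_a) / n
    p_b = sum(alleles_b) / n
    p_ab = sum(1 for a, b in zip(alleles_a, alleles_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # at least one SNP is monomorphic in this sample
    d = p_ab - p_a * p_b  # the LD coefficient D
    return d * d / denom


def find_proxies(hit, phased, min_r2=0.8):
    """Return (snp_id, r2) for every SNP in LD with `hit` at r2 >= min_r2.

    phased maps a SNP identifier to its list of 0/1 alleles per haplotype.
    """
    proxies = []
    for snp_id, alleles in phased.items():
        if snp_id == hit:
            continue
        r2 = ld_r2(phased[hit], alleles)
        if r2 >= min_r2:
            proxies.append((snp_id, r2))
    return sorted(proxies, key=lambda item: -item[1])


# Hypothetical phased haplotypes for three SNPs.
phased = {
    "snp_hit": [0, 0, 1, 1],
    "snp_tag": [0, 0, 1, 1],
    "snp_far": [0, 1, 0, 1],
}
print(find_proxies("snp_hit", phased))
```

Annotations attached to a proxy can then be carried over to the GWAS hit itself, which is the kind of join a custom annotation workflow would perform.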

Beyond SNP annotation, there are more formal attempts at linking genetic data into functional networks. These may be created from intrinsic sources, such as p-values for SNP-SNP interactions, or extrinsic sources, such as protein-protein interactions (reviewed here recently [21]) and gene ontology categories. A repository of various types of network data is available at http://www.pathwaycommons.org. While network visualisation tools were previously the domain of expensive commercial software, Cytoscape [22] has become an excellent freeware alternative. For formally testing the statistical significance of coincident patterns within these networks, there is a rapidly expanding literature and no consensus yet on the best approach to take. Two examples are ALIGATOR [23] and gene-set enrichment analysis (GSEA). The latter has been adapted from gene expression studies and applied to GWAS p-values. Web-based implementations are available at http://bioinfo.vanderbilt.edu/webgestalt and http://www.broadinstitute.org/gsea.
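As a rough illustration of what testing for coincident patterns can mean, the sketch below asks whether genes carrying significant GWAS signals are over-represented in a predefined gene set, using a one-sided hypergeometric test. This is a deliberately simpler technique than either ALIGATOR or GSEA and does not reproduce them; it also ignores gene size and LD between genes, which real methods must handle. The function names and gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return tail / total


def gene_set_enrichment_p(significant, gene_set, background):
    """One-sided over-representation p-value for a gene set.

    significant: genes with at least one SNP passing the chosen threshold
    gene_set:    genes in the category or pathway being tested
    background:  all genes covered by the study
    """
    background = set(background)
    significant = set(significant) & background
    gene_set = set(gene_set) & background
    overlap = len(significant & gene_set)
    return hypergeom_upper_tail(
        overlap, len(background), len(gene_set), len(significant)
    )


# Hypothetical example: 10 genes, a 3-gene category, 3 significant genes.
background = [f"gene_{i}" for i in range(10)]
category = ["gene_0", "gene_1", "gene_2"]
hits = ["gene_0", "gene_1", "gene_5"]
print(gene_set_enrichment_p(hits, category, background))
```

Because many gene sets are usually tested at once, the resulting p-values would still need correcting for multiple testing.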

How can one keep up to date in the rapidly changing world of genomic software? Certainly, review sections such as the one here in Human Genomics will help. Nucleic Acids Research publishes a useful annual review of web server applications [24], now also available online at http://bioinformatics.ca/links_directory. The Applications Note section of the journal Bioinformatics provides the best, but by no means the only, location for primary literature on new software. Looking ahead, software for handling high-throughput sequencing is an area where we can expect much development in the coming months. Bioinformatics has a useful online 'virtual issue' on tools for next generation sequencing, which is regularly updated, at http://www.oxfordjournals.org/our_journals/bioinformatics/nextgenerationsequencing.html. One wonders whether 10,000 feet will be high enough for a synoptic view in 12 months' time.

References

1. Purcell S, et al: PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007, 81: 559-575. 10.1086/519795.
2. Wang K, et al: PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 2007, 17: 1665-1674. 10.1101/gr.6861907.
3. Plagnol V: Association tests and software for copy number variant data. Hum Genomics. 2009, 3: 191-194.
4. Ellinghaus D, Schreiber S, Franke A, Nothnagel M: Current software for genotype imputation. Hum Genomics. 2009, 3: 371-380.
5. Li Y, Abecasis GR: Mach 1.0: Rapid haplotype reconstruction and missing genotype inference. Am J Hum Genet. 2006, S79: 2290-
6. Marchini J, Howie B, Myers S, McVean G, et al: A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet. 2007, 39: 906-913. 10.1038/ng2088.
7. Browning BL, Browning SR: A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet. 2009, 84: 210-223. 10.1016/j.ajhg.2009.01.005.
8. Leslie S, Donnelly P, McVean G: A statistical method for predicting classical HLA alleles from SNP data. Am J Hum Genet. 2008, 82: 48-56. 10.1016/j.ajhg.2007.09.001.
9. Karchin R: Next generation tools for the annotation of human SNPs. Brief Bioinform. 2009, 10: 35-52.
10. Wang P, et al: SNP Function Portal: A web database for exploring the function implication of SNP alleles. Bioinformatics. 2006, 22: e523-529. 10.1093/bioinformatics/btl241.
11. Yuan HY, et al: FASTSNP: An always up-to-date and extendable service for SNP function analysis and prioritization. Nucleic Acids Res. 2006, 34: W635-W641. 10.1093/nar/gkl236.
12. Conde L, et al: PupaSuite: Finding functional single nucleotide polymorphisms for large-scale genotyping purposes. Nucleic Acids Res. 2006, 34 (Web Server issue): W621-W625.
13. Chelala C, Khan A, Lemoine NR: SNPnexus: A web database for functional annotation of newly discovered and public domain single nucleotide polymorphisms. Bioinformatics. 2009, 25: 655-661. 10.1093/bioinformatics/btn653.
14. Xu Z, Taylor JA: SNPinfo: Integrating GWAS and candidate gene information into functional SNP selection for genetic association studies. Nucleic Acids Res. 2009, 37 (Web Server issue): W600-W605.
15. Xu H, et al: SNPselector: A web tool for selecting SNPs for genetic association studies. Bioinformatics. 2005, 21: 4181-4186. 10.1093/bioinformatics/bti682.
16. Lee PH, Shatkay H: F-SNP: Computationally predicted functional SNPs for disease association studies. Nucleic Acids Res. 2008, 36 (Database issue): D820-D824.
17. Hemminger BM, Saelim B, Sullivan PF: TAMAL: An integrated approach to choosing SNPs for genetic studies of human complex traits. Bioinformatics. 2006, 22: 626-627. 10.1093/bioinformatics/btk025.
18. Ge D, et al: WGAViewer: Software for genomic annotation of whole genome association studies. Genome Res. 2008, 18: 640-643. 10.1101/gr.071571.107.
19. Giardine B, et al: Galaxy: A platform for interactive large-scale genome analysis. Genome Res. 2005, 15: 1451-1455. 10.1101/gr.4086505.
20. Johnson AD, et al: SNAP: A web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics. 2008, 24: 2938-2939. 10.1093/bioinformatics/btn564.
21. Lehne B, Schlitt T: Protein-protein interaction databases: Keeping up with growing interactomes. Hum Genomics. 2009, 3: 291-297.
22. Shannon P, et al: Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13: 2498-2504. 10.1101/gr.1239303.
23. Holmans P, et al: Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder. Am J Hum Genet. 2009, 85: 13-24. 10.1016/j.ajhg.2009.05.011.
24. Benson G: Nucleic Acids Research Annual Web Server Issue in 2009. Nucleic Acids Res. 2009, 37: W1-W2. 10.1093/nar/gkp505.

Author information

Authors and Affiliations

Department of Medical and Molecular Genetics, King's College London, 8th Floor, Tower Wing, Guy's Hospital, London, SE1 9RT, UK
Michael E Weale

Corresponding author

Correspondence to Michael E Weale.

About this article

Cite this article

Weale, M.E. Genomics software: The view from 10,000 feet. Hum Genomics 4, 56 (2009). https://doi.org/10.1186/1479-7364-4-1-56

Received: 18 September 2009
Accepted: 18 September 2009
Published: 01 October 2009
DOI: https://doi.org/10.1186/1479-7364-4-1-56
