Software review
Genomics software: The view from 10,000 feet
Human Genomics volume 4, Article number: 56 (2009)
The rate of change in genomics, and 'omics generally, shows no signs of slowing down. Related analysis software is struggling to keep pace. This paper provides a brief review of the field.
For the bioinformaticist, and still more so the traditional genetic epidemiologist, the big view on how to tackle genomic data analysis looks daunting. Only a few years ago, the genome-wide association study (GWAS) represented the overshadowing Everest on the landscape, and commentators fretted about the computational feasibility of analysing 500,000 or so single nucleotide polymorphisms (SNPs) against one phenotype variable. Now, the single-phenotype GWAS is a foothill from which to launch attacks on datasets of much greater scale and variety. A new term - systems genetics - has emerged to describe this expanded world view. Analytical tools for dealing with molecular data have always lagged behind the acceleration of high-throughput methods for generating them, and that seems especially true at the present time. Here, I provide an overview of the software currently available, and look ahead to future developments.
First, a look at more familiar territory. The software package PLINK [1] has become the favoured work-horse of GWAS analysis, thanks to the untiring efforts of Shaun Purcell to keep the software well documented, flexible, fast and compact in its use of data structures. Few other packages surpass PLINK as far as basic quality control and first-pass SNP-by-SNP analysis are concerned, and many other, more advanced features are available and are being expanded continuously. In addition to SNP probes, modern GWAS panels are equipped with additional probe sets for interrogating copy number variation (CNV). PennCNV [2] is a popular package for calling these. CNV call uncertainty poses downstream problems for association analysis, and software for dealing with this has been reviewed recently in this journal [3]. Another trend is towards imputation of SNPs that are not present in the GWAS panel but can be inferred via linkage disequilibrium (LD), also reviewed here recently [4]. Popular choices are Mach [5], Impute [6] and Beagle [7]. A more specialist imputation problem, but one of general interest due to the role of the immune response system in many diseases, is to call classical human leukocyte antigen (HLA) genotypes from SNPs typed in the HLA region of chromosome 6. Recently improved software from Gil McVean and colleagues is available for this [8].
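To make the idea of a first-pass SNP-by-SNP analysis concrete, the following is a minimal Python sketch of a single-SNP allelic association test: a Pearson chi-square test with one degree of freedom on a 2x2 table of allele counts in cases and controls. It is an illustration only and is not PLINK's implementation. The function name, the SNP identifiers, the allele counts and the Bonferroni-style threshold of 0.05 divided by 500,000 tests are all assumptions made for the example.

```python
import math


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p_value). Hypothetical helper, not part of any package.
    """
    row_case = case_a + case_b
    row_ctrl = ctrl_a + ctrl_b
    col_a = case_a + ctrl_a
    col_b = case_b + ctrl_b
    n = row_case + row_ctrl
    if 0 in (row_case, row_ctrl, col_a, col_b):
        # Monomorphic SNP or an empty group: no test is possible.
        return 0.0, 1.0
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        row_case * row_ctrl * col_a * col_b
    )
    # For 1 degree of freedom the chi-square tail probability
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts: (case A, case B, control A, control B).
example_counts = {
    "rs_example_1": (60, 40, 40, 60),
    "rs_example_2": (50, 50, 50, 50),
}

N_TESTS = 500_000              # assumed number of SNPs on the panel
THRESHOLD = 0.05 / N_TESTS     # Bonferroni-style genome-wide threshold

results = {snp: allelic_chi_square(*c) for snp, c in example_counts.items()}
significant = [snp for snp, (_, p) in results.items() if p < THRESHOLD]
```

With counts this small, neither hypothetical SNP comes close to the genome-wide threshold, which illustrates why the multiple-testing burden of 500,000 or so tests shapes the sample sizes needed for GWAS.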
SNP annotation tools provide the most straightforward window from GWAS hits and also sequence data into the wider 'omic universe. A recent review is by Rachel Karchin [9]. The SNP Function Portal [10] provides one of the more comprehensive lists of annotation for each SNP, including those arising via LD proxy or 'tagging'. Other options include FastSNP [11], PupaSuite [12], SNPnexus [13], SNPinfo [14], SNPselector [15], F-SNP [16] and TAMAL [17]. WGAViewer [18] is geared specifically towards the analysis of GWAS results, and has a nice visual interface. All these tools struggle to keep up with the rapidly expanding set of available annotations. For example, several different datasets are now publicly available that combine GWAS SNP data with genome-wide gene expression data (so-called genetical genomics or expression quantitative trait locus [eQTL] data). Currently, however, no one tool integrates the ability to search all these datasets simultaneously. One option for the more proficient investigator is to keep one step ahead by using the Galaxy web tool [19] to design their own application for integrating different annotation tracks with their GWAS hits. SNAP [20] is a useful tool for feeding LD proxy information into such a custom-made Galaxy application.
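As a rough illustration of what integrating LD proxy information with annotation tracks involves, the Python sketch below expands each GWAS hit to its LD proxies and collects any annotation attached to the hit or to a proxy. It does not use the Galaxy or SNAP interfaces. The function name, the input layouts, the SNP identifiers and the r-squared cut-off of 0.8 are assumptions made for the example.

```python
from collections import defaultdict


def annotate_hits(hits, ld_pairs, annotations, r2_min=0.8):
    """Attach annotations to GWAS hits directly or through LD proxies.

    hits        : list of SNP identifiers
    ld_pairs    : iterable of (snp, proxy, r2) tuples
    annotations : dict mapping SNP identifier -> list of annotation labels
    Returns a dict mapping each hit to a list of (snp, label) pairs.
    """
    proxies = defaultdict(set)
    for snp, proxy, r2 in ld_pairs:
        if r2 >= r2_min:
            # LD is symmetric, so record the pair in both directions.
            proxies[snp].add(proxy)
            proxies[proxy].add(snp)

    annotated = {}
    for hit in hits:
        found = []
        # Check the hit itself first, then its proxies in sorted order.
        for snp in [hit] + sorted(proxies[hit]):
            for label in annotations.get(snp, []):
                found.append((snp, label))
        annotated[hit] = found
    return annotated


# Hypothetical inputs for illustration only.
example_hits = ["rs_hit"]
example_ld = [("rs_hit", "rs_proxy", 0.95), ("rs_hit", "rs_weak", 0.40)]
example_annotations = {
    "rs_proxy": ["eQTL for GENE_X"],
    "rs_weak": ["missense"],
}
example_result = annotate_hits(example_hits, example_ld, example_annotations)
```

In this toy example the hit itself carries no annotation, but it inherits the eQTL label from its strong proxy, while the weakly correlated SNP is ignored.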
Beyond SNP annotation, there are more formal attempts at linking genetic data into functional networks. These may be created from intrinsic sources, such as p-values for SNP-SNP interactions, or extrinsic sources, such as protein-protein interactions (reviewed here recently [21]) and gene ontology categories. A repository of network data of various types is available at http://www.pathwaycommons.org. While network visualisation tools were previously the domain of expensive commercial software, Cytoscape [22] has become an excellent freeware alternative. For formally assessing the statistical significance of coincident patterns within these networks, there is a rapidly expanding literature and no consensus yet on the best approach to take. Two examples are ALIGATOR [23] and gene-set enrichment analysis (GSEA). The latter has been adapted from gene expression studies and applied to GWAS p-values. Web-based implementations are available at http://bioinfo.vanderbilt.edu/webgestalt and http://www.broadinstitute.org/gsea.
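The simplest form of the enrichment question can be sketched as a hypergeometric over-representation test, shown below in Python. This is deliberately a simpler technique than either ALIGATOR or GSEA and makes no claim to reproduce them. The function name and all the numbers in the example are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_set, n_hits, n_overlap):
    """One-sided hypergeometric p-value for gene-set over-representation.

    n_universe : number of genes tested in total
    n_set      : number of those genes that belong to the gene set
    n_hits     : number of genes flagged by the study (e.g. near GWAS hits)
    n_overlap  : number of flagged genes that belong to the gene set
    Returns P(overlap >= n_overlap) if the flagged genes were drawn at random.
    """
    total = comb(n_universe, n_hits)
    upper = min(n_set, n_hits)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes, a 100-gene pathway,
# 50 flagged genes, 5 of which fall in the pathway.
example_p = hypergeom_enrichment_p(20_000, 100, 50, 5)
```

A test of this kind treats genes as independent and equally likely to be flagged. That assumption is questionable for GWAS data, where gene size and LD vary, and this is one reason why more elaborate approaches continue to be proposed.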
How can one keep up to date in the rapidly changing world of genomic software? Certainly, review sections such as the one here in Human Genomics will help. Nucleic Acids Research publishes a useful annual review of web server applications [24], now also available online at http://bioinformatics.ca/links_directory. The Applications Note section of the journal Bioinformatics provides the best, but by no means only, location for primary literature on new software. Looking ahead, software for handling high-throughput sequencing is an area where we can expect much development in the coming months. Bioinformatics has a useful online 'virtual issue' on tools for next generation sequencing, which is updated regularly at http://www.oxfordjournals.org/our_journals/bioinformatics/nextgenerationsequencing.html. One wonders whether 10,000 feet will be high enough for a synoptic view in 12 months' time.
1. Purcell S, et al: PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007, 81: 559-575. 10.1086/519795.
2. Wang K, et al: PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 2007, 17: 1665-1674. 10.1101/gr.6861907.
3. Plagnol V: Association tests and software for copy number variant data. Hum Genomics. 2009, 3: 191-194.
4. Ellinghaus D, Schreiber S, Franke A, Nothnagel M: Current software for genotype imputation. Hum Genomics. 2009, 3: 371-380.
5. Li Y, Abecasis GR: Mach 1.0: Rapid haplotype reconstruction and missing genotype inference. Am J Hum Genet. 2006, S79: 2290-
6. Marchini J, Howie B, Myers S, McVean G, et al: A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet. 2007, 39: 906-913. 10.1038/ng2088.
7. Browning BL, Browning SR: A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet. 2009, 84: 210-223. 10.1016/j.ajhg.2009.01.005.
8. Leslie S, Donnelly P, McVean G: A statistical method for predicting classical HLA alleles from SNP data. Am J Hum Genet. 2008, 82: 48-56. 10.1016/j.ajhg.2007.09.001.
9. Karchin R: Next generation tools for the annotation of human SNPs. Brief Bioinform. 2009, 10: 35-52.
10. Wang P, et al: SNP Function Portal: A web database for exploring the function implication of SNP alleles. Bioinformatics. 2006, 22: e523-529. 10.1093/bioinformatics/btl241.
11. Yuan HY, et al: FASTSNP: An always up-to-date and extendable service for SNP function analysis and prioritization. Nucleic Acids Res. 2006, 34: W635-W641. 10.1093/nar/gkl236.
12. Conde L, et al: PupaSuite: Finding functional single nucleotide polymorphisms for large-scale genotyping purposes. Nucleic Acids Res. 2006, 34 (Web Server issue): W621-W625.
13. Chelala C, Khan A, Lemoine NR: SNPnexus: A web database for functional annotation of newly discovered and public domain single nucleotide polymorphisms. Bioinformatics. 2009, 25: 655-661. 10.1093/bioinformatics/btn653.
14. Xu Z, Taylor JA: SNPinfo: Integrating GWAS and candidate gene information into functional SNP selection for genetic association studies. Nucleic Acids Res. 2009, 37 (Web Server issue): W600-W605.
15. Xu H, et al: SNPselector: A web tool for selecting SNPs for genetic association studies. Bioinformatics. 2005, 21: 4181-4186. 10.1093/bioinformatics/bti682.
16. Lee PH, Shatkay H: F-SNP: Computationally predicted functional SNPs for disease association studies. Nucleic Acids Res. 2008, 36 (Database issue): D820-D824.
17. Hemminger BM, Saelim B, Sullivan PF: TAMAL: An integrated approach to choosing SNPs for genetic studies of human complex traits. Bioinformatics. 2006, 22: 626-627. 10.1093/bioinformatics/btk025.
18. Ge D, et al: WGAViewer: Software for genomic annotation of whole genome association studies. Genome Res. 2008, 18: 640-643. 10.1101/gr.071571.107.
19. Giardine B, et al: Galaxy: A platform for interactive large-scale genome analysis. Genome Res. 2005, 15: 1451-1455. 10.1101/gr.4086505.
20. Johnson AD, et al: SNAP: A web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics. 2008, 24: 2938-2939. 10.1093/bioinformatics/btn564.
21. Lehne B, Schlitt T: Protein-protein interaction databases: Keeping up with growing interactomes. Hum Genomics. 2009, 3: 291-297.
22. Shannon P, et al: Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13: 2498-2504. 10.1101/gr.1239303.
23. Holmans P, et al: Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder. Am J Hum Genet. 2009, 85: 13-24. 10.1016/j.ajhg.2009.05.011.
24. Benson G: Nucleic Acids Research Annual Web Server Issue in 2009. Nucleic Acids Res. 2009, 37: W1-W2. 10.1093/nar/gkp505.