  • Software review

Genomics software: The view from 10,000 feet

Abstract

The rate of change in genomics, and 'omics generally, shows no signs of slowing down. Related analysis software is struggling to keep pace. This paper provides a brief review of the field.

For the bioinformaticist, and still more so the traditional genetic epidemiologist, the big view on how to tackle genomic data analysis looks daunting. Only a few years ago, the genome-wide association study (GWAS) represented the overshadowing Everest on the landscape, and commentators fretted about the computational feasibility of analysing 500,000 or so single nucleotide polymorphisms (SNPs) against one phenotype variable. Now, the single-phenotype GWAS is a foothill from which to launch attacks on datasets of much greater scale and variety. A new term - systems genetics - has emerged to describe this expanded world view. Analytical tools for dealing with molecular data have always lagged behind the acceleration of high-throughput methods for generating them, and that seems especially true at the present time. Here, I provide an overview of the software currently available, and look ahead to future developments.

First, a look at more familiar territory. The software package PLINK [1] has become the favoured work-horse of GWAS analysis, thanks to the untiring efforts of Shaun Purcell to keep the software well documented, flexible, fast and compact in its use of data structures. Few other packages surpass PLINK as far as basic quality control and first-pass SNP-by-SNP analysis are concerned, and many other, more advanced features are available and are being expanded continuously. In addition to SNP probes, modern GWAS panels are equipped with additional probe sets for interrogating copy number variation (CNV). PennCNV [2] is a popular software package for calling CNVs. CNV call uncertainty poses downstream problems for association analysis, and software for dealing with this has been reviewed recently in this journal [3]. Another trend is towards imputation of SNPs that are not present in the GWAS panel but can be inferred via linkage disequilibrium (LD), also reviewed here recently [4]. Popular choices are Mach [5], Impute [6] and Beagle [7]. A more specialist imputation problem, but one of general interest due to the role of the immune response system in many diseases, is to call classical human leukocyte antigen (HLA) genotypes from SNPs typed in the HLA region of chromosome 6. Recently improved software from Gil McVean and colleagues is available for this [8].
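
To make concrete what a first-pass SNP-by-SNP analysis involves, here is a minimal sketch in Python of a basic allelic chi-square test applied to each SNP in turn, with a Bonferroni threshold across all SNPs tested. It is an illustration only and is not taken from PLINK or any other package named above; the function names, the toy allele counts and the choice of Bonferroni correction are all assumptions made for this example.

```python
import math


def allelic_chi2(case_counts, control_counts):
    """1-df chi-square test on a 2x2 table of allele counts.

    Each argument is a (minor_allele_count, major_allele_count) pair.
    Returns (chi2, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        # Monomorphic SNP or empty group: the test is undefined.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def scan(snps, alpha=0.05):
    """Test every SNP and flag those passing a Bonferroni threshold."""
    threshold = alpha / len(snps)
    results = {}
    for snp_id, (cases, controls) in snps.items():
        chi2, p_value = allelic_chi2(cases, controls)
        results[snp_id] = (chi2, p_value, p_value < threshold)
    return results


# Hypothetical allele counts: (minor, major) in cases, then in controls.
toy_snps = {
    "snp_a": ((300, 700), (200, 800)),
    "snp_b": ((250, 750), (250, 750)),
}

for snp_id, (chi2, p_value, significant) in scan(toy_snps).items():
    print(snp_id, round(chi2, 2), p_value, significant)
```

With a panel of 500,000 SNPs, the same Bonferroni rule gives a per-SNP threshold of 0.05 / 500,000 = 1e-7, which is one reason why both sample size and computational speed matter at this scale.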

SNP annotation tools provide the most straightforward window from GWAS hits, and also from sequence data, into the wider 'omic universe. A recent review is by Rachel Karchin [9]. The SNP Function Portal [10] provides one of the more comprehensive lists of annotation for each SNP, including those arising via LD proxy or 'tagging'. Other options include FastSNP [11], PupaSuite [12], SNPnexus [13], SNPinfo [14], SNPselector [15], F-SNP [16] and TAMAL [17]. WGAviewer [18] is geared specifically towards the analysis of GWAS results, and has a nice visual interface. All these tools struggle to keep up with the rapidly expanding set of available annotations. For example, several different datasets are now publicly available that combine GWAS SNP data with genome-wide gene expression data (so-called genetical genomics or expression quantitative trait locus [eQTL] data). Currently, however, no one tool integrates the ability to search all these datasets simultaneously. One option for the more proficient investigator is to keep one step ahead by using the Galaxy web tool [19] to design their own application for integrating different annotation tracks with their GWAS hits. SNAP [20] is a useful tool for feeding LD proxy information into such a custom-made Galaxy application.

Beyond SNP annotation, there are more formal attempts at linking genetic data into functional networks. These may be created from intrinsic sources, such as p-values for SNP-SNP interactions, or extrinsic sources, such as protein-protein interactions (reviewed here recently [21]) and gene ontology categories. A repository of network data of various types is available at http://www.pathwaycommons.org. While network visualisation tools were previously the domain of expensive commercial software, Cytoscape [22] has become an excellent freeware alternative. For assessing the formal statistical significance of coincident patterns within these networks, there is a rapidly expanding literature and no consensus yet on the best approach to take. Two examples are ALIGATOR [23] and gene-set enrichment analysis (GSEA). The latter has been adapted from gene expression studies and applied to GWAS p-values. Web-based implementations are available at http://bioinfo.vanderbilt.edu/webgestalt and http://www.broadinstitute.org/gsea.
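
The general idea of testing whether GWAS hits cluster in a predefined gene set can be sketched with a simple hypergeometric over-representation test. This is deliberately simpler than, and not equivalent to, either ALIGATOR or GSEA, both of which use more elaborate procedures. The function names and the toy gene identifiers below are hypothetical, and the sketch assumes that every gene in the background is equally likely to be a hit, which ignores gene size and LD.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def enrichment_p(hit_genes, gene_set, background):
    """Over-representation p-value for hit genes falling inside a gene set.

    All three arguments are sets of gene identifiers; hits and the gene
    set are first restricted to the background.
    """
    hits = hit_genes & background
    members = gene_set & background
    overlap = len(hits & members)
    return hypergeom_upper_tail(overlap, len(background), len(members), len(hits))


# Hypothetical example: 20 background genes, a 5-gene pathway, 4 GWAS hit
# genes of which 3 lie in the pathway.
background = {f"gene{i}" for i in range(1, 21)}
pathway = {"gene1", "gene2", "gene3", "gene4", "gene5"}
hits = {"gene1", "gene2", "gene3", "gene17"}

print(enrichment_p(hits, pathway, background))
```

In practice a correction for the number of gene sets tested would also be needed, and permutation-based approaches are often preferred because they can respect gene size and LD structure.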

How can one keep up to date in the rapidly changing world of genomic software? Certainly, review sections such as the one here in Human Genomics will help. Nucleic Acids Research publishes a useful annual review of web server applications [24], now also available online at http://bioinformatics.ca/links_directory. The Applications Note section of the journal Bioinformatics provides the best, but by no means the only, location for primary literature on new software. Looking ahead, software for handling high-throughput sequencing is an area where we can expect much development in the coming months. Bioinformatics has a useful online 'virtual issue' on tools for next generation sequencing, which is updated regularly, at http://www.oxfordjournals.org/our_journals/bioinformatics/nextgenerationsequencing.html. One wonders whether 10,000 feet will be high enough for a synoptic view in 12 months' time.

References

  1. Purcell S, et al: PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007, 81: 559-575. 10.1086/519795.

  2. Wang K, et al: PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 2007, 17: 1665-1674. 10.1101/gr.6861907.

  3. Plagnol V: Association tests and software for copy number variant data. Hum Genomics. 2009, 3: 191-194.

  4. Ellinghaus D, Schreiber S, Franke A, Nothnagel M: Current software for genotype imputation. Hum Genomics. 2009, 3: 371-380.

  5. Li Y, Abecasis GR: Mach 1.0: Rapid haplotype reconstruction and missing genotype inference. Am J Hum Genet. 2006, S79: 2290-

  6. Marchini J, Howie B, Myers S, McVean G, et al: A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet. 2007, 39: 906-913. 10.1038/ng2088.

  7. Browning BL, Browning SR: A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet. 2009, 84: 210-223. 10.1016/j.ajhg.2009.01.005.

  8. Leslie S, Donnelly P, McVean G: A statistical method for predicting classical HLA alleles from SNP data. Am J Hum Genet. 2008, 82: 48-56. 10.1016/j.ajhg.2007.09.001.

  9. Karchin R: Next generation tools for the annotation of human SNPs. Brief Bioinform. 2009, 10: 35-52.

  10. Wang P, et al: SNP Function Portal: A web database for exploring the function implication of SNP alleles. Bioinformatics. 2006, 22: e523-529. 10.1093/bioinformatics/btl241.

  11. Yuan HY, et al: FASTSNP: An always up-to-date and extendable service for SNP function analysis and prioritization. Nucleic Acids Res. 2006, 34: W635-W641. 10.1093/nar/gkl236.

  12. Conde L, et al: PupaSuite: Finding functional single nucleotide polymorphisms for large-scale genotyping purposes. Nucleic Acids Res. 2006, 34 (Web Server issue): W621-W625.

  13. Chelala C, Khan A, Lemoine NR: SNPnexus: A web database for functional annotation of newly discovered and public domain single nucleotide polymorphisms. Bioinformatics. 2009, 25: 655-661. 10.1093/bioinformatics/btn653.

  14. Xu Z, Taylor JA: SNPinfo: Integrating GWAS and candidate gene information into functional SNP selection for genetic association studies. Nucleic Acids Res. 2009, 37 (Web Server issue): W600-W605.

  15. Xu H, et al: SNPselector: A web tool for selecting SNPs for genetic association studies. Bioinformatics. 2005, 21: 4181-4186. 10.1093/bioinformatics/bti682.

  16. Lee PH, Shatkay H: F-SNP: Computationally predicted functional SNPs for disease association studies. Nucleic Acids Res. 2008, 36 (Database issue): D820-D824.

  17. Hemminger BM, Saelim B, Sullivan PF: TAMAL: An integrated approach to choosing SNPs for genetic studies of human complex traits. Bioinformatics. 2006, 22: 626-627. 10.1093/bioinformatics/btk025.

  18. Ge D, et al: WGAViewer: Software for genomic annotation of whole genome association studies. Genome Res. 2008, 18: 640-643. 10.1101/gr.071571.107.

  19. Giardine B, et al: Galaxy: A platform for interactive large-scale genome analysis. Genome Res. 2005, 15: 1451-1455. 10.1101/gr.4086505.

  20. Johnson AD, et al: SNAP: A web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics. 2008, 24: 2938-2939. 10.1093/bioinformatics/btn564.

  21. Lehne B, Schlitt T: Protein-protein interaction databases: Keeping up with growing interactomes. Hum Genomics. 2009, 3: 291-297.

  22. Shannon P, et al: Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13: 2498-2504. 10.1101/gr.1239303.

  23. Holmans P, et al: Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder. Am J Hum Genet. 2009, 85: 13-24. 10.1016/j.ajhg.2009.05.011.

  24. Benson G: Nucleic Acids Research Annual Web Server Issue in 2009. Nucleic Acids Res. 2009, 37: W1-W2. 10.1093/nar/gkp505.

Author information

Corresponding author

Correspondence to Michael E Weale.

About this article

Cite this article

Weale, M.E. Genomics software: The view from 10,000 feet. Hum Genomics 4, 56 (2009). https://doi.org/10.1186/1479-7364-4-1-56

  • DOI: https://doi.org/10.1186/1479-7364-4-1-56