ArrayTrack development has continued with the addition of new utilities to address the growing and changing needs of FDA's research programmes. New features facilitate the management of preprocessed proteomics and metabolomics data. The single nucleotide polymorphism (SNP) and quantitative trait locus (QTL) libraries have been integrated to support pathway analysis and data mining for SNP-related studies. Extensive enhancements have also been made to manage and analyse the genetic profiling data related to bacterial food-borne pathogens. These enhancements are depicted in Figure 3 in relation to ArrayTrack's core functionality.
Support for proteomics and metabolomics data
Proteomics and metabolomics have grown steadily in importance in biomedical research, in parallel with microarrays. The integration of tri-omics data (ie genomics, proteomics and metabolomics data) has been a primary goal in systems biology for drug development and safety evaluation. To support this line of research and to review this type of data submitted by sponsors to the FDA through the VGDS -- or, as it has been renamed, the Voluntary Exploratory Data Submissions (VXDS) program -- ArrayTrack was previously modified to accommodate lists of proteins and metabolites. Additionally, a new systems biology function, called CommonPathway,[10] was added that enables the examination of common pathways and functional categories (eg gene ontology terms) shared by different data types.
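The idea behind a common-pathway search can be illustrated with a short sketch. It is not ArrayTrack's CommonPathway implementation; the annotation table, the identifiers and the function names `pathways_hit` and `common_pathways` are all hypothetical, chosen only to show how pathways shared by gene, protein and metabolite lists might be found by set intersection.

```python
from collections import defaultdict

# Hypothetical annotations (molecule identifier -> pathway names).
# Illustrative values only; they are not taken from ArrayTrack's libraries.
PATHWAY_ANNOTATIONS = {
    "gene:CYP1A1": {"Xenobiotic metabolism", "Tryptophan metabolism"},
    "gene:GSTA1": {"Glutathione metabolism", "Xenobiotic metabolism"},
    "protein:GSTA1": {"Glutathione metabolism", "Xenobiotic metabolism"},
    "protein:ALB": {"Thyroid hormone synthesis"},
    "metabolite:glutathione": {"Glutathione metabolism"},
    "metabolite:tryptophan": {"Tryptophan metabolism"},
}


def pathways_hit(molecules, annotations):
    """Map each pathway to the subset of `molecules` annotated to it."""
    hits = defaultdict(set)
    for molecule in molecules:
        for pathway in annotations.get(molecule, ()):
            hits[pathway].add(molecule)
    return dict(hits)


def common_pathways(lists_by_type, annotations):
    """Return pathways hit by every data type, with the supporting molecules."""
    per_type = {
        data_type: pathways_hit(molecules, annotations)
        for data_type, molecules in lists_by_type.items()
    }
    if not per_type:
        return {}
    shared = set.intersection(*(set(hits) for hits in per_type.values()))
    return {
        pathway: {t: sorted(per_type[t][pathway]) for t in per_type}
        for pathway in sorted(shared)
    }


# Hypothetical significant lists from three omics experiments.
significant = {
    "genes": ["gene:CYP1A1", "gene:GSTA1"],
    "proteins": ["protein:GSTA1", "protein:ALB"],
    "metabolites": ["metabolite:glutathione", "metabolite:tryptophan"],
}
result = common_pathways(significant, PATHWAY_ANNOTATIONS)
for pathway, members in result.items():
    print(pathway, members)
```

In this toy example only one pathway is supported by all three data types. A real analysis would presumably also rank shared pathways statistically, which this sketch does not attempt.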
ArrayTrack is now capable of analysing data from any mass spectrometry platform once the raw data have been processed for detection and quantification of peptides or metabolites -- an important step in the data analysis workflow [15, 16]. New tools have been created to simplify the handling of proteomics data from the two most popular database search programs, Mascot and Sequest, for detection of peptides. The tools convert output files from these two programs into ArrayTrack-readable files.
The same interpretation tools used for microarray data in ArrayTrack can be extended to proteomics and metabolomics data. Thus, by linking the results to gene, protein and pathway databases, researchers will be able to contextualise these results in the same way as those from gene expression experiments. Additionally, a unified interface helps to reduce the 'learning curve' associated with analysing new data types, giving researchers currently working on microarrays an incentive to move towards more integrated approaches that also encompass proteins and metabolites.
SNP and QTL libraries
Recent advances in microarray-based genotyping techniques have enabled researchers to scan rapidly for known SNPs across complete genomes. An efficient data-mining strategy and a set of sophisticated tools are needed to better understand and utilise the findings from genetic association studies.
One of the focuses in genetic association studies is to relate SNPs to genes and pathways in order to understand the underlying disease mechanisms. ArrayTrack has already provided a gene-pathway discovery platform. By integrating the SNP library, which contains annotation summary information for SNPs and their mapped relationship to genes, ArrayTrack now provides an integrated SNP-gene-pathway analysis platform for SNP studies.
A QTL is a region of DNA that is associated with a particular phenotypic trait. A common use for QTL data is to identify candidate genes underlying a trait within one or more QTLs. The identification of the SNP-gene-QTL relationship is the basis of tests to determine whether the gene/SNP is associated with the aetiology of a disease in animal models or human studies. The integration of SNP and QTL libraries into ArrayTrack enables dynamic mining of such complex biological interactions.
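Candidate-gene identification within a QTL amounts to an interval-overlap query, which can be sketched as follows. This is a minimal illustration under stated assumptions, not ArrayTrack's method: the `Region` type, the function names and all coordinates are hypothetical, and positions are assumed to be 1-based and inclusive on a shared genome build.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A named genomic interval; a SNP is a region with start == end."""
    name: str
    chrom: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive (assumed convention)


def overlaps(a, b):
    """True if the two regions share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def candidate_genes(qtls, genes):
    """For each QTL, list the genes that overlap it."""
    return {q.name: [g.name for g in genes if overlaps(q, g)] for q in qtls}


def snp_gene_qtl(snps, genes, qtls):
    """List (SNP, gene, QTL) triples where the SNP lies in a gene overlapping the QTL."""
    return [
        (s.name, g.name, q.name)
        for q in qtls
        for g in genes
        if overlaps(q, g)
        for s in snps
        if overlaps(s, g)
    ]


# Hypothetical coordinates, for illustration only.
qtls = [Region("qtl1", "chr1", 1_000, 5_000)]
genes = [
    Region("geneA", "chr1", 4_500, 6_000),  # overlaps qtl1
    Region("geneB", "chr1", 7_000, 8_000),  # same chromosome, outside qtl1
    Region("geneC", "chr2", 1_200, 1_800),  # different chromosome
]
snps = [
    Region("snp1", "chr1", 4_800, 4_800),   # inside geneA
    Region("snp2", "chr1", 7_500, 7_500),   # inside geneB
]

candidates = candidate_genes(qtls, genes)
triples = snp_gene_qtl(snps, genes, qtls)
print(candidates)
print(triples)
```

The nested scan is quadratic and is adequate only for small examples; at the scale of millions of SNPs, an indexed database query or an interval tree would be a more realistic choice.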
SNP and QTL libraries [17] have been constructed and incorporated into ArrayTrack. Data from several public repositories were collected in the SNP and QTL libraries and connected to other domain libraries (genes, proteins, metabolites and pathways) in ArrayTrack. Linking the data sets within ArrayTrack allows searching of SNP and QTL data, as well as their relationships to other biological molecules. The SNP library includes approximately 15 million human SNPs and their annotations, while the QTL library contains publicly available QTL data associated with specific phenotypes identified in mice, rats and humans. Case studies demonstrating the utility of these libraries have been reported [17, 18].
Support for microbial pathogen microarray data
Food-borne pathogens are a leading cause of illness in the USA. High-throughput microarray technology provides an effective way to identify, characterise and obtain a nearly complete snapshot of the genetic traits of bacterial strains, such as their pathogenicity, virulence or antimicrobial resistance. Such genome-wide insight is necessary for accurate identification and discrimination of pathogens that may contaminate the food supply.
ArrayTrack has been extended to support microbial genomics research using microarrays [19]. ArrayTrack's libraries have been populated with publicly available bioinformatics data on bacterial pathogen species. Data processing and visualisation tools have been enhanced with customised options to facilitate analysis of genetic profiling microarray data. Specifically, three new functions have been developed for these data: flag-based hierarchical clustering analysis (HCA), a flag concordance (FC) heat map and flag indicators in the mixed scatter plot (where 'flag' refers to a gene's presence or absence call). These functions are particularly effective for the identification and characterisation of bacterial pathogens from microarray genetic profiling data. The enhancements are displayed in Figure 4. For example, the Microbial Library (Figure 4C) is the newest addition to ArrayTrack's collection of libraries. Currently, it holds 270,000 gene records from a total of 84 bacterial strains: 30 Escherichia coli, 39 Salmonella enterica, ten Shigella spp. and five Vibrio spp. Thus, as a starting point, the Microbial Library is focused on these four bacterial genera that are common food-borne pathogens. ArrayTrack also holds microbial pathway information from the Kyoto Encyclopedia of Genes and Genomes (KEGG) for over 50 of these strains [20] and gene ontology information for the E. coli K12 substrain MG1655. The gene annotations and sequences for the Microbial Library were downloaded from the National Center for Biotechnology Information (NCBI) website.
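The notion of flag concordance can be made concrete with a small sketch. The exact definition used by ArrayTrack is not given here, so the code assumes one plausible reading: the concordance between two samples is the fraction of genes that receive the same presence ('P') or absence ('A') call. The sample names, the call strings and the function names are hypothetical.

```python
def flag_concordance(calls_a, calls_b):
    """Fraction of genes with identical presence/absence calls (assumed definition)."""
    if len(calls_a) != len(calls_b):
        raise ValueError("samples must be called on the same set of genes")
    if not calls_a:
        raise ValueError("at least one gene call is required")
    agreements = sum(a == b for a, b in zip(calls_a, calls_b))
    return agreements / len(calls_a)


def concordance_matrix(flags):
    """Return sample names and the symmetric matrix of pairwise concordances."""
    names = sorted(flags)
    matrix = [
        [flag_concordance(flags[row], flags[col]) for col in names]
        for row in names
    ]
    return names, matrix


# Hypothetical presence (P) / absence (A) calls for eight genes in three samples.
flags = {
    "strain_1": "PPAPAPPA",
    "strain_2": "PPAPAPAA",
    "strain_3": "APPAPAAP",
}

names, matrix = concordance_matrix(flags)
for name, row in zip(names, matrix):
    print(name, row)
```

A matrix of this kind is what a concordance heat map would display. Under the same assumption, one minus the concordance could serve as a distance between samples for flag-based hierarchical clustering.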