Each gene-specific record is generated from a global template that draws on a series of curator-specified software data modules tailored to the ALDH gene superfamily. These modules perform predefined database queries and generate all of the text, tables and graphics that make up a record in the end-user's web browser. At the time of writing, the software data modules registered in this global template generate the 'Synopses', 'Trivial names', 'Global accession identifiers', 'Molecular features and cataloguing', 'Accession identifier details', 'Human polymorphisms', 'Enzyme kinetics', 'Tissue expression profiles' and 'Reference list' sections for each ALDH gene.
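The template-and-module design can be pictured as an ordered registry: each module is registered under a section title, and rendering a record runs every module against that gene's data in turn. The sketch below is a minimal Python illustration under that assumption; the class `GlobalTemplate`, its `register` decorator, the dictionary-based record and the two toy modules are hypothetical and do not describe the actual software behind http://www.aldh.org.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical record type: one gene's data as a plain dictionary.
GeneRecord = Dict[str, object]
# A data module takes a gene record and returns the body of one section.
Module = Callable[[GeneRecord], str]


class GlobalTemplate:
    """Ordered registry of data modules shared by every gene record (sketch)."""

    def __init__(self) -> None:
        self._modules: List[Tuple[str, Module]] = []

    def register(self, title: str) -> Callable[[Module], Module]:
        """Decorator that appends a module under the given section title."""
        def decorator(func: Module) -> Module:
            self._modules.append((title, func))
            return func
        return decorator

    def section_titles(self) -> List[str]:
        return [title for title, _ in self._modules]

    def render(self, record: GeneRecord) -> str:
        """Run every registered module in registration order."""
        return "\n".join(
            f"<h2>{title}</h2>\n{module(record)}" for title, module in self._modules
        )


template = GlobalTemplate()


@template.register("Trivial names")
def trivial_names(record: GeneRecord) -> str:
    synonyms = record.get("synonyms") or []
    return ", ".join(synonyms) if synonyms else "None recorded"


@template.register("Reference list")
def reference_list(record: GeneRecord) -> str:
    return "\n".join(f"PMID {pmid}" for pmid in record.get("pmids") or [])
```

A decorator-based registry keeps the template generic: adding a new section to every gene record means registering one more function, without touching the rendering loop.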
Synopses
Each ALDH gene record begins with a living review of the literature, prepared and regularly updated by the authors, that describes the available structural and functional data. All information and illustrations are referenced to the original publication by hyperlinked PubMed identifiers (PMIDs), and full details of each reference are also given in the record's 'Reference list' section.
Trivial names
Over the past three decades, members of the ALDH gene superfamily have been referred to in the literature by a number of descriptive synonyms. This section associates these synonyms with the corresponding gene nomenclature.
Global accession identifiers
Accession identifiers for a given ALDH gene, transcript or peptide sequence are numerous because many disparate databases provide molecular sequence data. The 'Global accession identifiers' module tabulates these accessions as identifiers hyperlinked to their source databases, so that users can quickly reach all source data for an individual ALDH member.
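Tabulating accessions as hyperlinks amounts to mapping each (source database, identifier) pair onto that database's record URL. The following is a minimal Python sketch; the three URL patterns are assumptions based on the public NCBI and UniProt websites (each database's linking documentation should be checked), and `link_accessions` is a hypothetical helper, not part of the site's code.

```python
from html import escape
from typing import Dict, List, Tuple

# Assumed URL patterns for a few source databases; the patterns actually
# used by http://www.aldh.org are not stated in the text.
URL_TEMPLATES: Dict[str, str] = {
    "NCBI Nucleotide": "https://www.ncbi.nlm.nih.gov/nuccore/{acc}",
    "NCBI Protein": "https://www.ncbi.nlm.nih.gov/protein/{acc}",
    "UniProtKB": "https://www.uniprot.org/uniprotkb/{acc}",
}


def link_accessions(accessions: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (database, HTML cell) rows sorted by database, then accession."""
    rows = []
    for database, acc in sorted(accessions):
        template = URL_TEMPLATES.get(database)
        if template is None:
            # Unknown source: list the identifier without a hyperlink.
            rows.append((database, escape(acc)))
            continue
        url = template.format(acc=acc)
        rows.append((database, f'<a href="{escape(url)}">{escape(acc)}</a>'))
    return rows
```

Identifiers from databases without a known URL pattern are still listed, just without a link, so no accession is silently dropped from the table.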
Molecular features and cataloguing
For most accession identifiers, the molecular sequence data available from source databases give little indication of how well the sequence aligns to the genome or how it compares with other transcripts of the same gene. Publicly available alignment tools can detect whether the sequences of two or more transcripts differ at specific positions, but often cannot show exactly where they differ relative to the gene's alternative splicing pattern. This module uses server-side alignment software to characterise the integrity of each accession's sequence relative to the latest genome assembly for its species. Specifically, all transcripts of each ALDH gene for a given species in the http://www.aldh.org database are co-aligned to that species' genome assembly. These data are then used to generate dynamically a graphical representation of all transcripts for a particular gene and species, allowing the user to identify quickly the similarities and differences between alternatively spliced variants (Figure 1). The module then assigns every ALDH accession identifier to its alternatively spliced transcript variant (eg ALDH3A1_v1, ALDH3A1_v2, etc) and tabulates, for each identifier, its size, isoelectric point, number of exons, a FASTA sequence summary and any sequence anomalies (single nucleotide polymorphisms [SNPs], insertions or deletions) relative to the genome assembly.
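One way to assign accessions to alternatively spliced variants is to compare the splice junctions (the intron chain) implied by each transcript's exons after co-alignment to the genome assembly. The Python sketch below assumes the aligner has already produced inclusive genomic exon coordinates for each accession; the grouping rule, the numbering of variants in order of first appearance and the function names are illustrative assumptions, not the database's documented method.

```python
from typing import Dict, List, Tuple

# Inclusive genomic (start, end) coordinates of one aligned exon.
Exon = Tuple[int, int]


def intron_chain(exons: List[Exon]) -> Tuple[Tuple[int, int], ...]:
    """Return the splice junctions (exon end, next exon start) in genomic order."""
    ordered = sorted(exons)
    return tuple((left[1], right[0]) for left, right in zip(ordered, ordered[1:]))


def catalogue_variants(gene: str, transcripts: Dict[str, List[Exon]]) -> Dict[str, dict]:
    """Group accessions that share an intron chain into one named variant."""
    groups: Dict[Tuple[Tuple[int, int], ...], List[str]] = {}
    for accession, exons in transcripts.items():
        # dicts keep insertion order, so variants are numbered by first appearance
        groups.setdefault(intron_chain(exons), []).append(accession)

    catalogue: Dict[str, dict] = {}
    for number, (chain, accessions) in enumerate(groups.items(), start=1):
        catalogue[f"{gene}_v{number}"] = {
            "accessions": accessions,
            "exon_count": len(chain) + 1,
            # transcript length = sum of inclusive exon lengths
            "lengths": {
                acc: sum(end - start + 1 for start, end in transcripts[acc])
                for acc in accessions
            },
        }
    return catalogue
```

Keying on the intron chain rather than on exact exon boundaries means that two accessions differing only in how far their UTRs extend fall into the same variant, while any change in splice sites yields a new one.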
Accession identifier detail reports
Most source databases provide limited or vague details about an accession identifier's sequence. Consequently, the http://www.aldh.org database uses server-side scripting software to extract all available information from each accession identifier's sequence. This processing generates an individual detail report for each accession identifier, which opens as a new webpage in the end-user's browser. The report for each identifier is reached from the 'Molecular features' section by clicking 'Click here for graphical and tabular details' next to each catalogued accession identifier. The aim is to provide a succinct analysis of each accession's sequence. The report begins with a tabular summary of the accession identifier's source information. From the identifier's sequence, a graphical representation of the transcript's exon-intron structure is then generated, together with its size, coding sequence, genomic strand and locus (Figure 2). A multicoloured graphical representation of the transcript's peptide translation (Figure 3a) highlights the positions of any synonymous or non-synonymous polymorphisms relative to the genome assembly. A residue content summary for the peptide, displayed next to the sequence image, makes it easy to review the positional coordinates of any residue (eg cysteine residues; Figure 3b). Next, the 'Transcript sequence and structural features' section breaks the sequence into multiple linear representations and identifies every segment and its coordinates. These include the 5' and 3' untranslated regions (5' UTR, 3' UTR), coding sequence (CDS), triplet codons, translations, exon segments, polyA signals and tails, polymorphisms, insertions and deletions.
Additionally, hidden Markov models (HMMs) for the ALDH peptide domain are being incorporated into this section and into the graphical representation of the peptide sequence, to further strengthen the characterisation of the transcript and peptide sequences.
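The triplet-codon, translation and residue-content displays described above rest on two simple operations: splitting the coding sequence into codons and translating them, then recording where each residue occurs. The Python sketch below assumes the standard genetic code (NCBI translation table 1), a CDS that starts at its first base, and translation that stops at the first stop codon; the function names are hypothetical, and codons containing ambiguous bases are shown as 'X'.

```python
from collections import Counter
from typing import Dict, List

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(cds: str) -> List[str]:
    """Split a coding sequence into triplet codons (a trailing partial codon is dropped)."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def translate(cds: str) -> str:
    """Translate codons up to, but not including, the first stop codon."""
    peptide = []
    for codon in codons(cds):
        residue = CODON_TABLE.get(codon, "X")  # ambiguous bases -> X
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


def residue_summary(peptide: str) -> Dict[str, dict]:
    """Count each residue and record its 1-based positions in the peptide."""
    counts = Counter(peptide)
    return {
        aa: {"count": n, "positions": [i + 1 for i, r in enumerate(peptide) if r == aa]}
        for aa, n in sorted(counts.items())
    }
```

For example, `residue_summary(translate(cds))["C"]["positions"]` gives the 1-based coordinates of every cysteine in the translated peptide, the kind of lookup a residue content summary is meant to make easy.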
Substrates, inhibitors and enzyme kinetics
Characterisation of ALDH gene superfamily members typically includes kinetic measurements with an assortment of substrates and inhibitors, obtained using various test systems, isolation techniques, cofactors, tissues and species of origin. This module provides a tabular summary of all kinetic values reported in the literature, sorted by species. Each entry includes a hyperlinked PubMed identifier for the original literature source, for quick access.
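A species-sorted kinetics table is essentially a sort-and-group over literature records, each carrying its PubMed source. The following Python sketch assumes a flat, hypothetical record structure (`KineticValue`) holding the compound, whether it was tested as a substrate or an inhibitor, the parameter name, its value and unit, and the PMID; the PubMed URL form is the public one, but the field names, the grouping order and the placeholder values used with it are illustrative only.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Dict, List


@dataclass(frozen=True)
class KineticValue:
    species: str
    compound: str
    role: str        # "substrate" or "inhibitor"
    parameter: str   # e.g. "Km", "Vmax" or "Ki"
    value: float
    unit: str
    pmid: str


def pubmed_url(pmid: str) -> str:
    return f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"


def kinetics_by_species(values: List[KineticValue]) -> Dict[str, List[dict]]:
    """Sort by species (then role and compound) and group the rows per species."""
    ordered = sorted(values, key=lambda v: (v.species, v.role, v.compound))
    table: Dict[str, List[dict]] = {}
    # groupby only merges adjacent items, hence the sort above
    for species, rows in groupby(ordered, key=lambda v: v.species):
        table[species] = [
            {
                "compound": v.compound,
                "role": v.role,
                "parameter": v.parameter,
                "value": f"{v.value:g} {v.unit}",
                "source": pubmed_url(v.pmid),
            }
            for v in rows
        ]
    return table
```

Keeping the PMID on every row, rather than in a separate lookup, means each value in the rendered table carries its own link back to the original report.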
Reference list
Citing the original sources of all data in the http://www.aldh.org database is an important priority for the ALDH gene superfamily curators. The reference list module provides a bibliography for all data reported within each gene-specific record, including those from the 'Synopses' and 'Substrates, inhibitors and enzyme kinetics' modules, as well as additional references recommended by the curators. All references in this section are hyperlinked to the PubMed database via their PMIDs for ease of access.
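Assembling such a bibliography can be pictured as merging the PMIDs cited by each module with the curators' additional recommendations, dropping duplicates and linking each entry to PubMed. The Python sketch below assumes references are listed in order of first citation and rejects anything that is not a numeric PMID; `build_reference_list` and its inputs are hypothetical, while the PubMed URL form is the public one.

```python
from typing import Dict, Iterable, List


def build_reference_list(
    module_citations: Dict[str, Iterable[str]],
    curator_pmids: Iterable[str] = (),
) -> List[str]:
    """Merge PMIDs from all modules plus curator picks, keeping first-seen order."""
    merged: Dict[str, None] = {}  # dict used as an insertion-ordered set
    all_sources = list(module_citations.values()) + [curator_pmids]
    for pmids in all_sources:
        for raw in pmids:
            pmid = raw.strip()
            if not pmid.isdigit():
                raise ValueError(f"not a PubMed identifier: {raw!r}")
            merged.setdefault(pmid, None)
    return [
        f'<a href="https://pubmed.ncbi.nlm.nih.gov/{pmid}/">PMID {pmid}</a>'
        for pmid in merged
    ]
```

Validating that each PMID is numeric before building the link both catches data-entry errors and keeps arbitrary text out of the generated HTML.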