Analysis and update of the human aldehyde dehydrogenase (ALDH) gene family

The aldehyde dehydrogenase (ALDH) gene superfamily encodes enzymes that are critical for certain life processes and detoxification via the NAD(P)+-dependent oxidation of numerous endogenous and exogenous aldehyde substrates, including pharmaceuticals and environmental pollutants. Analysis of the ALDH gene superfamily in the latest databases showed that the human genome contains 19 putatively functional genes and three pseudogenes. A number of ALDH genes are upregulated as a part of the oxidative stress response and inexplicably overexpressed in various tumours, leading to problems during cancer chemotherapy. Mutations in ALDH genes cause inborn errors of metabolism (such as Sjögren-Larsson syndrome, type II hyperprolinaemia and γ-hydroxybutyric aciduria) and are likely to contribute to several complex diseases, including cancer and Alzheimer's disease. The ALDH gene products appear to be multifunctional proteins, possessing both catalytic and non-catalytic properties.


Introduction
Aldehyde dehydrogenases (ALDHs; EC 1.2.1.3) represent a group of enzymes that oxidise a wide range of endogenous and exogenous aldehydes to their corresponding carboxylic acids. 1 Endogenous aldehydes are formed during the metabolism of amino acids, carbohydrates, lipids, biogenic amines, vitamins and steroids. Biotransformations of a large number of drugs and environmental chemicals also generate aldehydes. Aldehydes are highly reactive electrophilic compounds which interact with thiol and amino groups; the resulting effects vary from physiological and therapeutic to cytotoxic, mutagenic or carcinogenic. In this respect, ALDHs efficiently oxidise and, in most instances, detoxify a significant number of chemically diverse aldehydes which otherwise would be harmful to the organism. Strong evidence supporting this notion comes from the fact that mutations in ALDH genes cause inborn errors of metabolism associated with clinical phenotypes, such as Sjögren-Larsson syndrome (SLS), type II hyperprolinaemia and γ-hydroxybutyric aciduria. 2 In addition, mutations in ALDH genes contribute to clinically relevant diseases such as cancer and Alzheimer's disease.
There are instances, however, in which ALDHs catalyse reactions yielding chemically reactive or bioactive metabolites that are essential to the organism. Several ALDH enzymes, including ALDH1A1, ALDH1A2 and ALDH1A3, catalyse the irreversible oxidation of retinal to retinoic acid. 3 Whereas the light-absorbing properties of retinal are a necessary element for vision, the carboxylic acid isomers, all-trans-retinoic acid and/or 9-cis-retinoic acid, serve as ligands for the retinoic acid receptor (RAR) and the retinoid X receptor (RXR) that mediate gene expression for growth and development. 4 The importance of ALDH enzymes in retinoic acid formation became evident from the fact that homozygous disruption of the mouse Aldh1a2 gene results in an embryonic lethal phenotype due to defects in early heart morphogenesis, 5,6 whereas Aldh1a3 null mice die shortly after birth, due to respiratory distress caused by choanal atresia. 7 Formation of retinoic acid and γ-aminobutyric acid (GABA) are among the most intriguing functions of ALDHs regarding bioactivation. GABA is implicated in the regulation of the GABAergic, dopaminergic and opioid systems. Even though the main pathway for GABA synthesis is the decarboxylation of L-glutamate, this neurotransmitter can also be formed from putrescine by direct oxidative deamination to give γ-aminobutyraldehyde, which is then converted into GABA by an ALDH. 8 All in all, the ALDH gene family represents a truly diverse group of proteins which are critical to metabolism.

Multiple function(s) of the ALDH enzymes
Although the major function of ALDH enzymes is the NAD(P)+-dependent aldehyde oxidation, it has become increasingly clear that some, if not most, ALDHs exhibit multiple functions (Figure 1). For example, ALDH1A1, ALDH2, ALDH3A1 and ALDH4A1 are known to catalyse ester hydrolysis, suggesting that the ALDHs may have more than one catalytic function. 9 Indeed, it has recently been suggested that ALDH2 also possesses nitrate reductase activity, which catalyses the formation of 1,2-glyceryl dinitrate and nitrite from nitroglycerin within mitochondria, leading to the production of cGMP and vasorelaxation. 10 Aside from their catalytic properties, ALDH proteins are capable of non-catalytic interactions with chemically diverse endogenous compounds and chemotherapeutic agents. In this context, ALDH1A1 has been identified as an androgen-binding protein prominently expressed in human genital fibroblasts; as a cholesterol-binding protein in bovine lens epithelium; and as a cytosolic thyroid hormone-binding protein in Xenopus. 11 ALDH1A1 has also been identified as a flavopyridol-binding protein in non-small cell lung carcinomas and as a daunorubicin-binding protein in rat liver. 1 Similar to ALDH1A1, ALDH2 also displays binding capabilities with exogenous compounds, which became evident from its identification as an acetaminophen-binding protein. 1 In addition, it has been suggested that some ALDHs may play a critical role in cellular homeostasis by maintaining redox balance. 12 For example, it has been proposed that ALDH3A1 may scavenge hydroxyl radicals via the -SH groups of Cys and Met residues, and that both ALDH3A1 and ALDH1A1 may contribute to the antioxidant capacity of the cell by generating NADPH and/or NADH. 13 The enzymatic activity of ALDH3A1 generates NADPH, which is linked to the regeneration of reduced glutathione (GSH) from its oxidised form (GSSG) via the glutathione reductase/peroxidase system. NAD(P)H may also function as a direct antioxidant by reducing glutathiyl radicals (GS•) or tyrosyl radicals. 14 The expression of ALDH3A1 and ALDH1A1 at very high concentrations in the mammalian cornea and lens (crystallins) has led to additional hypotheses regarding the multifunctional properties of these proteins, including a structural function contributing to transparency. 15,16 Finally, the ALDH7A1 gene product is similar to the green garden pea '26g protein' involved in the regulation of turgor pressure, suggesting that the ALDH7A1 protein might have osmoregulatory properties.

Evolution of the ALDH genes
ALDHs have a wide distribution in nature, ranging from bacteria and yeasts to plants and animals. 17 Sequence comparisons indicate extensive similarity between bacterial and human ALDHs and suggest that the superfamily has a common ancestral gene, dating back to ~3 billion years ago. 18 A systematic nomenclature scheme for the ALDH gene superfamily (in animals, plants, bacteria and yeasts) has been developed, based on evolutionary divergence, 18 which has been implemented with biannual updates 19,20 and is available via the internet (http://www.aldh.org).
ALDH proteins are conveniently classified into families and subfamilies based on the percentage of amino acid identity. Proteins sharing ≥40 per cent identity are assigned to a particular family designated by an Arabic numeral, whereas those sharing ≥60 per cent identity are classified in the same subfamily designated by a letter. These cut-off values follow the original recommendations by Margaret Dayhoff and were first applied to the cytochrome P-450 superfamily. 21 At present, more than 130 additional gene superfamilies and large gene families follow this same format.
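The identity cut-offs described above can be illustrated with a minimal Python sketch. The function name `classify_pair` and the returned labels are hypothetical and chosen only for illustration; the sketch assumes that a single pairwise amino acid identity value is already available and does not attempt to reproduce the full nomenclature procedure.

```python
def classify_pair(identity_percent: float) -> str:
    """Return the closest nomenclature level shared by two ALDH proteins.

    Thresholds follow the cut-offs quoted in the text:
    >= 60 per cent identity -> same subfamily (same letter),
    >= 40 per cent identity -> same family (same Arabic numeral).
    """
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_percent >= 60.0:
        return "same subfamily"
    if identity_percent >= 40.0:
        return "same family"
    return "different families"


# 72.3 per cent is the ALDH1L1/ALDH1L2 identity quoted later in this review.
print(classify_pair(72.3))  # same subfamily
```

As the ALDH2 case discussed in the Conclusions shows, historical names can override a purely threshold-based assignment, so a rule like this is a starting point and not the whole scheme.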

Endogenous functions of ALDHs
Antioxidants and oxidative stress increase the expression of certain ALDH genes, leading to increased protection of the cell against insult by environmental chemicals and drugs. 22 Increased expression of certain ALDHs in tumour cells, however, leads to decreased cellular sensitivity to cyclophosphamide and other oxazaphosphorines and, thus, to clinical problems in the treatment of cancer patients. 23 The reason for certain ALDHs (and other non-P450 members of the [Ah] battery) to be upregulated in some tumours 24 remains an enigma.
[Figure: Functions of ALDHs. Recoverable labels: cornea and lens crystallins; osmotic pressure.]
Numerous polymorphisms exist in the human ALDH genes, some of which cause inborn errors of metabolism and contribute to clinically relevant diseases. 2 Polymorphism in the ALDH2 gene is associated with altered acetaldehyde metabolism, alcohol-induced 'flushing' syndrome, decreased risk for alcoholism and increased risk of ethanol-induced cancers. The genetic ALDH2 deficiency has also been reported as a risk factor in late-onset Alzheimer's disease. 25 Epidemiological studies have revealed conflicting evidence about the association between the ALDH2 polymorphism and ethanol-induced hypertension. 11 Polymorphisms in the ALDH3A2, ALDH4A1, ALDH5A1 and ALDH6A1 genes are associated with metabolic diseases, which, in most cases, are characterised by neurological complications. Mutations in ALDH3A2 are the molecular basis for SLS, an autosomal recessive disorder characterised by congenital ichthyosis, mental retardation, spasticity, ocular abnormalities and pruritus. 26,27 Premature birth has also been observed in 73 per cent of children with SLS. 28 Loss of ALDH4A1 function causes type II hyperprolinaemia, an autosomal recessive disorder characterised by plasma accumulation of proline and Δ1-pyrroline-5-carboxylate, as well as neurological manifestations such as seizures and mental retardation. 29 Loss of ALDH5A1 function leads to γ-hydroxybutyric aciduria, a rare autosomal recessive disorder in GABA metabolism associated with accumulation of both GABA and γ-hydroxybutyric acid in blood serum and cerebrospinal fluid. 30 ALDH6A1 (methylmalonic semialdehyde dehydrogenase) deficiency is an inborn metabolic disorder that results in developmental delay. 31

Latest genes in the ALDH database
A search of the Human Gene Nomenclature Committee (HGNC) database using 'ALDH' produced 20 'hits': 19 ALDH genes plus AGPS (encoding alkylglycerone phosphate synthase). This latter entry appears in the database because one of its aliases is 'ALDHPSY'. A search of the HGNC database using 'aldehyde dehydrogenase' produced 21 hits: the 19 putatively functional ALDH genes plus two others, AASDHPPT and ADH5 (Table 1). After various analyses, it was concluded that these latter two, evolutionarily, do not belong to the ALDH gene superfamily. AASDHPPT was found to belong to the 4'-phosphopantetheinyl transferase superfamily (pfam01648; ACPS) and ADH5 belongs to the alcohol dehydrogenase family (pfam00107; ADH_zinc_N). These two genes appear in the HGNC database in response to the cue 'aldehyde dehydrogenase', because these two words are included within their names: aminoadipate-semialdehyde dehydrogenase-phosphopantetheinyl transferase and formaldehyde dehydrogenase. Interestingly, the Enzyme Commission (EC) database gives L-aminoadipate-semialdehyde dehydrogenase the number EC 1.2.1.31, meaning that it is closely related functionally to the other ALDH activities (EC 1.2.1.3).
The two most recently discovered ALDH genes are ALDH1L2 and ALDH16A1. The ALDH1L2 protein is very similar to ALDH1L1, which is better known as 10-formyltetrahydrofolate dehydrogenase (TFDH), a bifunctional enzyme formed from the fusion of two unrelated genes; TFDH is highly expressed in human liver, kidney and pancreas. 32 The deduced amino acid sequence of ALDH1L1 contains three domains, including the amino terminal (residues 1-203), which is approximately 30 per cent identical to phosphoribosylglycinamide formyltransferase (EC 2.1.2.2), and the carboxyl terminal (residues 417-902), which belongs to the ALDH superfamily. 33 The intermediate domain (residues 204-416) does not appear to have any known catalytic function, although it shows significant homology with the structural domain of a calmodulin-like protein. 34 This intermediate domain is apparently an essential structural element that aligns the two functional domains together for 10-formyl-tetrahydrofolate (10-FTHF) dehydrogenase activity, 35 the primary function of this enzyme. 31 This multidomain enzyme catalyses: (a) the NADP+-dependent oxidation of 10-FTHF to tetrahydrofolate (THF); (b) the NADP+-dependent oxidation of 2-propanal and acetaldehyde; and (c) the NADP+-independent hydrolysis of 10-FTHF to formate and THF. 34 ALDH1L1 is involved in formate metabolism, as well as the regulation of 10-FTHF and THF, which are principal sources of folate in the cell.
The ALDH1L2 gene encodes a protein that is 72.3 per cent identical with ALDH1L1 and is also a fusion gene, comprising three domains: (a) the formyl-trans-N-formyl transferase (pfam00551) at the amino terminal (residues 23-202); (b) the formyltransferase carboxyl terminal domain (pfam02911) in the middle (residues 226-327); and (c) the aldehyde dehydrogenase domain at the carboxyl terminal (residues 451-910). No functional data have yet been reported for the ALDH1L2 protein. It is worth mentioning that the BLAST scores for the ALDH domain in both the ALDH1L1 and ALDH1L2 genes are much higher than those for the other two domains, which is strong evidence to support the notion that these two genes should be listed as ALDH genes.
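To make the three-domain layout of ALDH1L1 and ALDH1L2 easier to inspect, the residue ranges quoted above can be held in a small lookup table. The following Python sketch is illustrative only: the names `DOMAINS` and `domain_at` are hypothetical, the domain labels are shortened paraphrases of the text, and the ranges are copied from this review and not retrieved from any database.

```python
from typing import Optional

# Residue ranges (1-based, inclusive) as quoted in the text above.
DOMAINS = {
    "ALDH1L1": [
        (1, 203, "amino-terminal domain"),
        (204, 416, "intermediate domain"),
        (417, 902, "carboxyl-terminal ALDH domain"),
    ],
    "ALDH1L2": [
        (23, 202, "amino-terminal formyl transferase domain (pfam00551)"),
        (226, 327, "formyltransferase carboxyl-terminal domain (pfam02911)"),
        (451, 910, "carboxyl-terminal ALDH domain"),
    ],
}


def domain_at(gene: str, residue: int) -> Optional[str]:
    """Return the domain containing `residue`, or None for linker regions."""
    for start, end, name in DOMAINS[gene]:
        if start <= residue <= end:
            return name
    return None


print(domain_at("ALDH1L1", 300))  # intermediate domain
print(domain_at("ALDH1L2", 210))  # None (falls between annotated domains)
```

Returning `None` for unannotated residues keeps the gaps between the ALDH1L2 domains explicit, since the text gives non-contiguous ranges for that protein.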
The ALDH16A1 gene, located at 19q13.33, was identified recently by the National Institutes of Health Mammalian Gene Collection (MGC) Program, which represents a multi-institutional effort to identify and sequence a cDNA clone containing a complete open reading frame for each human and mouse gene. 36 The ALDH16A1 gene encodes a protein of 802 amino acids (listed in databases as 'hypothetical protein MGC10204'), which is ~35 per cent identical to putatively membrane-anchored ALDHs found in bacterial species such as Sinorhizobium meliloti 102. Putative orthologues of ALDH16A1 are found in mouse, rat and chimpanzee, and exhibit around 72-74 per cent amino acid identity with the human ALDH16A1.
Finally, there are three pseudogenes (ALDH7A1P1, ALDH7A1P2 and ALDH7A1P3) located at Chr 5q14, 7q36 and 10q21, respectively, of which only ALDH7A1P1 meets the HGNC criteria for a pseudogene (at least 50 per cent amino acid identity across 50 per cent of the open reading frame); however, the names ALDH7A1P2 and ALDH7A1P3 are proposed because sequence homology with the ALDH7A1 gene is significantly higher than that with any other ALDH gene. An alternative nomenclature system for naming four different types of pseudogenes has recently been proposed. 37 As is commonly seen with pseudogenes in the mammalian genome, 37 all three of these pseudogenes are found at chromosomal locations that differ from that of the ALDH7A1 functional gene from which the pseudogenes clearly originated.
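The pseudogene criterion quoted above (at least 50 per cent amino acid identity across 50 per cent of the open reading frame) amounts to a two-threshold test. The Python sketch below expresses it under stated assumptions: the function name and parameters are hypothetical, and the example values are invented for illustration, not measurements for any of the three ALDH7A1 pseudogenes.

```python
def meets_pseudogene_criterion(
    identity_percent: float,
    orf_coverage_percent: float,
    min_identity: float = 50.0,
    min_coverage: float = 50.0,
) -> bool:
    """Apply the identity-and-coverage rule quoted in the text.

    Both thresholds must be met: the aligned region must span at least
    `min_coverage` per cent of the parent open reading frame and show at
    least `min_identity` per cent amino acid identity.
    """
    return (
        identity_percent >= min_identity
        and orf_coverage_percent >= min_coverage
    )


# Invented example values, for illustration only.
print(meets_pseudogene_criterion(62.0, 80.0))  # True
print(meets_pseudogene_criterion(62.0, 30.0))  # False: too little of the ORF is covered
```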

Conclusions
The authors' current analysis of the ALDH genes within the human genome is now probably complete, and it can be concluded that the human ALDH gene superfamily comprises 19 genes in 11 families and four subfamilies (Figure 2). The ALDH1 family contains six functional genes: the cytosolic ALDH1A1 and the mitochondrial ALDH1B1 may be involved in acetaldehyde metabolism; ALDH1A1 also participates in retinal oxidation and the detoxification of cyclophosphamide; the ALDH1A2 and ALDH1A3 proteins are integral to the oxidation of retinal to retinoic acid; the ALDH1L1 gene codes for 10-FTHF dehydrogenase; the ALDH1L2 gene product is very similar to that of ALDH1L1, but no functional data are available yet. The ALDH2 family has a single member, encoding the mitochondrial ALDH that exhibits the highest affinity for acetaldehyde and is critical in ethanol metabolism. Although ALDH2 officially qualifies as a seventh member of the ALDH1 family, its longstanding name of 'ALDH2' associated with ethanol metabolism has been grandfathered into the more recent nomenclature system based on evolutionary divergence. 18 The ALDH3A subfamily contains the dioxin-inducible ALDH3A1 and ALDH3A2, which are primarily involved in the oxidation of medium- and long-chain aliphatic and aromatic aldehydes. The ALDH3B subfamily consists of two structurally related genes, ALDH3B1 and ALDH3B2; as yet, there are no functional data for either gene product. ALDH5A1 encodes the succinic semialdehyde dehydrogenase. ALDH6A1 encodes the acetyl CoA-dependent methylmalonate semialdehyde dehydrogenase. The ALDH7A1 gene product, also known as 'antiquitin', is similar to the green garden pea 26g protein involved in the regulation of turgor pressure. ALDH8A1 appears to metabolise retinal. ALDH9A1 codes for an enzyme that participates in the metabolism of γ-aminobutyraldehyde and aminoaldehydes derived from polyamines. The ALDH16A1 gene encodes an 802-amino acid protein with as-yet unknown function. Finally, the ALDH18A1 gene encodes Δ1-pyrroline-5-carboxylate synthetase, which qualifies for classification in the ALDH superfamily based on the sequence homology of one of the protein domains (residues 361-772).
Figure 2. Dendrogram of the 19 human aldehyde dehydrogenase (ALDH) genes that are bona fide members of the ALDH superfamily. To avoid additional clutter, neither alternative splice variants of ALDH genes nor the three pseudogenes listed in Table 1 have been included in the construction of this tree. This neighbour-joining method gives various branches of different lengths, reflecting that evolutionary divergence is not the same between different branches of the gene tree.
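The gene symbols named in this review encode family, subfamily and member directly (for example, ALDH3A2 is family 3, subfamily A, member 2), so the family count given in the Conclusions can be checked mechanically. The Python sketch below is an illustration under stated assumptions: the regular expression and the helper `parse_symbol` are hypothetical conveniences, the gene list is transcribed from the text of this review, and pseudogene symbols such as ALDH7A1P1 are deliberately not handled.

```python
import re
from collections import defaultdict

# The 19 putatively functional human ALDH genes named in this review.
GENES = [
    "ALDH1A1", "ALDH1A2", "ALDH1A3", "ALDH1B1", "ALDH1L1", "ALDH1L2",
    "ALDH2", "ALDH3A1", "ALDH3A2", "ALDH3B1", "ALDH3B2", "ALDH4A1",
    "ALDH5A1", "ALDH6A1", "ALDH7A1", "ALDH8A1", "ALDH9A1", "ALDH16A1",
    "ALDH18A1",
]

# Family number, optional subfamily letter, optional member number.
SYMBOL = re.compile(r"^ALDH(\d+)([A-Z])?(\d+)?$")


def parse_symbol(symbol: str):
    """Split a symbol into (family, subfamily, member); ALDH2 has only a family."""
    match = SYMBOL.match(symbol)
    if match is None:
        raise ValueError(f"not a recognised ALDH gene symbol: {symbol}")
    family, subfamily, member = match.groups()
    return int(family), subfamily, int(member) if member else None


# Group the genes by family number.
families = defaultdict(list)
for gene in GENES:
    families[parse_symbol(gene)[0]].append(gene)

print(len(GENES), "genes in", len(families), "families")  # 19 genes in 11 families
print(families[1])  # the six ALDH1 genes
```

Grouping by the parsed family number reproduces the 19 genes and 11 families stated in the text, with ALDH2 counted as its own family in line with its grandfathered name.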