Analysis and update of the human aldehyde dehydrogenase (ALDH) gene family
© Henry Stewart Publications 2005
Received: 23 February 2005
Accepted: 23 February 2005
Published: 1 June 2005
The aldehyde dehydrogenase (ALDH) gene superfamily encodes enzymes that are critical for certain life processes and detoxification via the NAD(P)+-dependent oxidation of numerous endogenous and exogenous aldehyde substrates, including pharmaceuticals and environmental pollutants. Analysis of the ALDH gene superfamily in the latest databases showed that the human genome contains 19 putatively functional genes and three pseudogenes. A number of ALDH genes are upregulated as part of the oxidative stress response and, for reasons that remain unexplained, are overexpressed in various tumours, leading to problems during cancer chemotherapy. Mutations in ALDH genes cause inborn errors of metabolism -- such as Sjögren-Larsson syndrome, type II hyperprolinaemia and γ-hydroxybutyric aciduria -- and are likely to contribute to several complex diseases, including cancer and Alzheimer's disease. The ALDH gene products appear to be multifunctional proteins, possessing both catalytic and non-catalytic properties.
Aldehyde dehydrogenases (ALDHs; EC 1.2.1.3) represent a group of enzymes that oxidise a wide range of endogenous and exogenous aldehydes to their corresponding carboxylic acids. Endogenous aldehydes are formed during the metabolism of amino acids, carbohydrates, lipids, biogenic amines, vitamins and steroids. Biotransformations of a large number of drugs and environmental chemicals also generate aldehydes. Aldehydes are highly reactive electrophilic compounds which interact with thiol and amino groups; the resulting effects vary from physiological and therapeutic to cytotoxic, mutagenic or carcinogenic. In this respect, ALDHs efficiently oxidise and, in most instances, detoxify a significant number of chemically diverse aldehydes which otherwise would be harmful to the organism. Strong evidence supporting this notion comes from the fact that mutations in ALDH genes cause inborn errors of metabolism associated with clinical phenotypes -- such as Sjögren-Larsson syndrome (SLS), type II hyperprolinaemia and γ-hydroxybutyric aciduria. In addition, mutations in ALDH genes contribute to clinically relevant diseases such as cancer and Alzheimer's disease.
There are instances, however, in which ALDHs catalyse reactions yielding chemically reactive or bioactive metabolites that are essential to the organism. Several ALDH enzymes -- including ALDH1A1, ALDH1A2 and ALDH1A3 -- catalyse the irreversible oxidation of retinal to retinoic acid. Whereas the light-absorbing properties of retinal are a necessary element for vision, the carboxylic acid isomers, all-trans-retinoic acid and/or 9-cis-retinoic acid, serve as ligands for the retinoic acid receptor (RAR) and the retinoid X receptor (RXR), which mediate gene expression for growth and development. The importance of ALDH enzymes in retinoic acid formation became evident from the fact that homozygous disruption of the mouse Aldh1a2 gene results in an embryonic lethal phenotype due to defects in early heart morphogenesis [5, 6], whereas Aldh1a3 null mice die shortly after birth, due to respiratory distress caused by choanal atresia.
The formation of retinoic acid and of γ-aminobutyric acid (GABA) is among the most intriguing functions of ALDHs regarding bioactivation. GABA is implicated in the regulation of the GABAergic, dopaminergic and opioid systems. Even though the main pathway for GABA synthesis is the decarboxylation of L-glutamate, this neurotransmitter can also be formed from putrescine by direct oxidative deamination to give γ-aminobutyraldehyde, which is then converted into GABA by an ALDH. All in all, the ALDH gene family represents a truly diverse group of proteins which are critical to metabolism.
Aside from their catalytic properties, ALDH proteins are capable of non-catalytic interactions with chemically diverse endogenous compounds and chemotherapeutic agents. In this context, ALDH1A1 has been identified as an androgen-binding protein prominently expressed in human genital fibroblasts; as a cholesterol-binding protein in bovine lens epithelium; and as a cytosolic thyroid hormone-binding protein in Xenopus. ALDH1A1 has also been identified as a flavopiridol-binding protein in non-small cell lung carcinomas and as a daunorubicin-binding protein in rat liver. Like ALDH1A1, ALDH2 also binds exogenous compounds, as became evident from its identification as an acetaminophen-binding protein.
Table 1. Human ALDH genes listed in the Human Gene Nomenclature Committee database, plus three pseudogenes

Approved gene symbol    Approved gene name
AASDHPPT                Aminoadipate-semialdehyde dehydrogenase-phosphopantetheinyl transferase
ADH5                    Alcohol dehydrogenase 5 (class III), chi polypeptide
ALDH1A1                 Aldehyde dehydrogenase 1 family, member A1
ALDH1A2                 Aldehyde dehydrogenase 1 family, member A2
ALDH1A3                 Aldehyde dehydrogenase 1 family, member A3
ALDH1B1                 Aldehyde dehydrogenase 1 family, member B1
ALDH1L1                 Aldehyde dehydrogenase 1 family, member L1
ALDH1L2                 Aldehyde dehydrogenase 1 family, member L2
ALDH2                   Aldehyde dehydrogenase 2 family (mitochondrial)
ALDH3A1                 Aldehyde dehydrogenase 3 family, member A1
ALDH3A2                 Aldehyde dehydrogenase 3 family, member A2
ALDH3B1                 Aldehyde dehydrogenase 3 family, member B1
ALDH3B2                 Aldehyde dehydrogenase 3 family, member B2
ALDH4A1                 Aldehyde dehydrogenase 4 family, member A1
ALDH5A1                 Aldehyde dehydrogenase 5 family, member A1
ALDH6A1                 Aldehyde dehydrogenase 6 family, member A1
ALDH7A1                 Aldehyde dehydrogenase 7 family, member A1
ALDH7A1P1               Aldehyde dehydrogenase 7 family, pseudogene 1
ALDH7A1P2               Aldehyde dehydrogenase 7 family, pseudogene 2
ALDH7A1P3               Aldehyde dehydrogenase 7 family, pseudogene 3
ALDH8A1                 Aldehyde dehydrogenase 8 family, member A1
ALDH9A1                 Aldehyde dehydrogenase 9 family, member A1
ALDH16A1                Aldehyde dehydrogenase 16 family, member A1
ALDH18A1                Aldehyde dehydrogenase 18 family, member A1
ALDHs have a wide distribution in nature, ranging from bacteria and yeasts to plants and animals. Sequence comparisons indicate extensive similarity between bacterial and human ALDHs and suggest that the superfamily has a common ancestral gene dating back ~3 billion years. A systematic nomenclature scheme for the ALDH gene superfamily (in animals, plants, bacteria and yeasts) has been developed, based on evolutionary divergence; it has been implemented with biannual updates [19, 20] and is available via the internet (http://www.aldh.org).
ALDH proteins are conveniently classified into families and subfamilies based on the percentage of amino acid identity. Proteins sharing ≥ 40 per cent identity are assigned to a particular family designated by an Arabic numeral, whereas those sharing ≥ 60 per cent identity are classified in the same subfamily designated by a letter. These cut-off values follow the original recommendations by Margaret Dayhoff and were first applied to the cytochrome P-450 superfamily. At present, more than 130 additional gene superfamilies and large gene families follow this same format.
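The two cut-offs can be written down as a small decision rule. The following is a minimal sketch only: the function name and return labels are illustrative assumptions, and it deliberately ignores historical exceptions to the scheme (such as long-standing names that were retained).

```python
def classify_pair(percent_identity: float) -> str:
    """Relationship implied by the amino acid identity cut-offs.

    Illustrative helper (not part of any nomenclature tool):
    >= 60 per cent -> same subfamily, >= 40 per cent -> same family.
    """
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity >= 60.0:
        return "same subfamily"
    if percent_identity >= 40.0:
        return "same family"
    return "different family"


if __name__ == "__main__":
    # Example identity values chosen for illustration only.
    for identity in (72.3, 45.0, 35.0):
        print(identity, "->", classify_pair(identity))
```

Because the subfamily threshold is checked first, a pair above 60 per cent identity is reported at the most specific level, which by definition also places it in the same family.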
Antioxidants and oxidative stress increase the expression of certain ALDH genes, leading to increased protection of the cell against insult by environmental chemicals and drugs. Increased expression of certain ALDHs in tumour cells, however, leads to decreased cellular sensitivity to cyclophosphamide and other oxazaphosphorines and, thus, to clinical problems in the treatment of cancer patients. Why certain ALDHs -- and other non-P450 members of the [Ah] battery -- are upregulated in some tumours remains an enigma.
Numerous polymorphisms exist in the human ALDH genes, some of which cause inborn errors of metabolism and contribute to clinically relevant diseases. Polymorphism in the ALDH2 gene is associated with altered acetaldehyde metabolism, alcohol-induced 'flushing' syndrome, decreased risk for alcoholism and increased risk of ethanol-induced cancers. Genetic ALDH2 deficiency has also been reported as a risk factor in late-onset Alzheimer's disease. Epidemiological studies have revealed conflicting evidence about the association between the ALDH2 polymorphism and ethanol-induced hypertension. Polymorphisms in the ALDH3A2, ALDH4A1, ALDH5A1 and ALDH6A1 genes are associated with metabolic diseases, which, in most cases, are characterised by neurological complications. Mutations in ALDH3A2 are the molecular basis for SLS, an autosomal recessive disorder characterised by congenital ichthyosis, mental retardation, spasticity, ocular abnormalities and pruritus [26, 27]. Premature birth has also been observed in 73 per cent of children with SLS. Loss of ALDH4A1 function causes type II hyperprolinaemia, an autosomal recessive disorder characterised by plasma accumulation of proline and Δ1-pyrroline-5-carboxylate, as well as neurological manifestations such as seizures and mental retardation. Loss of ALDH5A1 function leads to γ-hydroxybutyric aciduria, a rare autosomal recessive disorder in GABA metabolism associated with accumulation of both GABA and γ-hydroxybutyric acid in blood serum and cerebrospinal fluid. ALDH6A1 (methylmalonic semialdehyde dehydrogenase) deficiency is an inborn metabolic disorder that results in developmental delay.
A search of the Human Gene Nomenclature Committee (HGNC) database using 'ALDH' produced 20 'hits': 19 ALDH genes plus AGPS (encoding alkylglycerone phosphate synthase). This latter entry appears in the database because one of its aliases is 'ALDHPSY'. A search of the HGNC database using 'aldehyde dehydrogenase' produced 21 hits -- the 19 putatively functional ALDH genes plus two others, AASDHPPT and ADH5 (Table 1). After various analyses, it was concluded that these latter two do not belong, evolutionarily, to the ALDH gene superfamily. AASDHPPT was found to belong to the 4'-phosphopantetheinyl transferase superfamily (pfam01648; ACPS) and ADH5 belongs to the alcohol dehydrogenase family (pfam00107; ADH_zinc_N). These two genes appear in the HGNC database in response to the cue 'aldehyde dehydrogenase' because these two words are included within their names: aminoadipate-semialdehyde dehydrogenase-phosphopantetheinyl transferase and formaldehyde dehydrogenase. Interestingly, the Enzyme Commission (EC) database gives L-aminoadipate-semialdehyde dehydrogenase the number EC 1.2.1.31 -- meaning that it is closely related functionally to the other ALDH activities (EC 1.2.1.3).
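The false positives described here are what plain substring matching on symbols, names and aliases would produce. The sketch below illustrates this on a toy subset of records; the record layout, the `text_search` function and the symbol-pattern filter are assumptions made for illustration, not the HGNC search implementation. The symbol filter is only a crude stand-in for the sequence-based analysis by which the authors actually excluded the extra genes.

```python
import re

# Toy records. Symbols, names and aliases are taken from the text;
# the dictionary layout itself is a hypothetical simplification.
RECORDS = [
    {"symbol": "ALDH1A1",
     "names": ["aldehyde dehydrogenase 1 family, member A1"]},
    {"symbol": "ALDH2",
     "names": ["aldehyde dehydrogenase 2 family (mitochondrial)"]},
    {"symbol": "AASDHPPT",
     "names": ["aminoadipate-semialdehyde dehydrogenase-"
               "phosphopantetheinyl transferase"]},
    {"symbol": "ADH5",
     "names": ["alcohol dehydrogenase 5 (class III), chi polypeptide",
               "formaldehyde dehydrogenase"]},
    {"symbol": "AGPS",
     "names": ["alkylglycerone phosphate synthase", "ALDHPSY"]},
]

# Matches symbols such as ALDH2 or ALDH16A1 (functional genes only).
SYMBOL_PATTERN = re.compile(r"^ALDH\d+(?:[A-Z]\d+)?$")


def text_search(query):
    """Case-insensitive substring match on symbol, names and aliases."""
    q = query.lower()
    return [
        rec["symbol"]
        for rec in RECORDS
        if q in rec["symbol"].lower()
        or any(q in name.lower() for name in rec["names"])
    ]


def keep_aldh_symbols(symbols):
    """Keep only symbols that follow the ALDH family/subfamily pattern."""
    return [s for s in symbols if SYMBOL_PATTERN.match(s)]


if __name__ == "__main__":
    print(text_search("ALDH"))
    print(text_search("aldehyde dehydrogenase"))
    print(keep_aldh_symbols(text_search("aldehyde dehydrogenase")))
```

In this toy set, the query 'ALDH' picks up AGPS through its alias, while 'aldehyde dehydrogenase' picks up AASDHPPT and ADH5 because the phrase is embedded in 'semialdehyde dehydrogenase' and 'formaldehyde dehydrogenase'.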
The two most recently discovered ALDH genes are ALDH1L2 and ALDH16A1. The ALDH1L2 protein is very similar to ALDH1L1, which is better known as 10-formyltetrahydrofolate dehydrogenase (TFDH), a bifunctional enzyme formed from the fusion of two unrelated genes; TFDH is highly expressed in human liver, kidney and pancreas. The deduced amino acid sequence of ALDH1L1 contains three domains -- including the amino terminal (residues 1-203), which is approximately 30 per cent identical to phosphoribosylglycinamide formyltransferase (EC 2.1.2.2), and the carboxyl terminal (residues 417-902), which belongs to the ALDH superfamily. The intermediate domain (residues 204-416) does not appear to have any known catalytic function, although it shows significant homology with the structural domain of a calmodulin-like protein. This intermediate domain is apparently an essential structural element that aligns the two functional domains together for 10-formyl-tetrahydrofolate (10-FTHF) dehydrogenase activity -- the primary function of this enzyme. This multidomain enzyme catalyses: (a) the NADP+-dependent oxidation of 10-FTHF to tetrahydrofolate (THF); (b) the NADP+-dependent oxidation of 2-propanal and acetaldehyde; and (c) the NADP+-independent hydrolysis of 10-FTHF to formate and THF. ALDH1L1 is involved in formate metabolism, as well as the regulation of 10-FTHF and THF, which are principal sources of folate in the cell.
The ALDH1L2 gene encodes a protein that is 72.3 per cent identical to ALDH1L1 and is also a fusion gene, comprising three domains: (a) the formyl-trans-N-formyl transferase (pfam00551) at the amino terminal (residues 23-202); (b) the formyltransferase carboxyl terminal domain (pfam02911) in the middle (residues 226-327); and (c) the aldehyde dehydrogenase domain at the carboxyl terminal (residues 451-910). No functional data have yet been reported for the ALDH1L2 protein. It is worth mentioning that the BLAST scores for the ALDH domain in both the ALDH1L1 and ALDH1L2 genes are much higher than those for the other two domains, which is strong evidence to support the notion that these two genes should be listed as ALDH genes.
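The domain boundaries quoted for ALDH1L1 and ALDH1L2 can be held in a small lookup table. This is a minimal sketch: the residue ranges are those given in the text, whereas the short domain labels and the helper functions `domain_at` and `domain_lengths` are illustrative names introduced here.

```python
# Residue ranges (1-based, inclusive) as quoted in the text.
# The domain labels are shorthand chosen for this sketch.
DOMAINS = {
    "ALDH1L1": [
        ("amino-terminal domain", 1, 203),
        ("intermediate domain", 204, 416),
        ("carboxyl-terminal ALDH domain", 417, 902),
    ],
    "ALDH1L2": [
        ("formyl transferase domain (pfam00551)", 23, 202),
        ("formyltransferase carboxyl-terminal domain (pfam02911)", 226, 327),
        ("aldehyde dehydrogenase domain", 451, 910),
    ],
}


def domain_at(protein, residue):
    """Return the label of the domain containing `residue`, or None."""
    for label, start, end in DOMAINS[protein]:
        if start <= residue <= end:
            return label
    return None


def domain_lengths(protein):
    """Length in residues of each annotated domain."""
    return {label: end - start + 1 for label, start, end in DOMAINS[protein]}


if __name__ == "__main__":
    print(domain_at("ALDH1L1", 300))
    print(domain_at("ALDH1L2", 210))  # falls between annotated domains
    print(domain_lengths("ALDH1L1"))
```

With these ranges the three ALDH1L1 domains tile residues 1 to 902 without gaps, whereas the ALDH1L2 annotations leave unassigned linker regions between domains.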
The ALDH16A1 gene, located at 19q13.33, was identified recently by the National Institutes of Health Mammalian Gene Collection (MGC) Program, which represents a multi-institutional effort to identify and sequence a cDNA clone containing a complete open reading frame for each human and mouse gene. The ALDH16A1 gene encodes a protein of 802 amino acids (listed in databases as 'hypothetical protein MGC10204'), which is ~35 per cent identical to putatively membrane-anchored ALDHs found in bacterial species such as Sinorhizobium meliloti 102. Putative orthologues of ALDH16A1 are found in mouse, rat and chimpanzee, and exhibit around 72-74 per cent amino acid identity with the human ALDH16A1.
Finally, there are three pseudogenes -- ALDH7A1P1, ALDH7A1P2 and ALDH7A1P3 -- located at Chr 5q14, 7q36 and 10q21, respectively, of which only ALDH7A1P1 meets the HGNC criteria for a pseudogene (at least 50 per cent amino acid identity across 50 per cent of the open reading frame); however, the names ALDH7A1P2 and ALDH7A1P3 are proposed because sequence homology with the ALDH7A1 gene is significantly higher than that with any other ALDH gene. An alternative nomenclature system for naming four types of pseudogenes has recently been proposed. As is commonly seen with pseudogenes in the mammalian genome [37], all three of these pseudogenes are found at chromosomal locations that differ from that of the ALDH7A1 functional gene from which the pseudogenes clearly originated.
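The pseudogene criterion quoted above combines an identity threshold with a coverage threshold. A minimal sketch of that check follows, assuming coverage is measured as aligned length divided by open reading frame length; the function name, parameters and example numbers are hypothetical and are not measurements for the ALDH7A1 pseudogenes.

```python
def meets_pseudogene_criterion(identity_pct, aligned_len, orf_len,
                               min_identity=50.0, min_coverage=50.0):
    """Check 'at least 50% identity across 50% of the ORF'.

    identity_pct: amino acid identity over the aligned region (0-100).
    aligned_len:  number of ORF residues covered by the alignment.
    orf_len:      length of the parent open reading frame in residues.
    """
    if orf_len <= 0:
        raise ValueError("orf_len must be positive")
    if not 0 <= aligned_len <= orf_len:
        raise ValueError("aligned_len must lie between 0 and orf_len")
    coverage_pct = 100.0 * aligned_len / orf_len
    return identity_pct >= min_identity and coverage_pct >= min_coverage


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(meets_pseudogene_criterion(62.0, 300, 500))  # both thresholds met
    print(meets_pseudogene_criterion(62.0, 200, 500))  # coverage too low
    print(meets_pseudogene_criterion(45.0, 400, 500))  # identity too low
```

Both conditions must hold together, which is why a candidate with high identity over only a short stretch of the open reading frame would not qualify under this rule.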
The authors' current analysis of the ALDH genes within the human genome is now probably complete, and it can be concluded that the human ALDH gene superfamily comprises 19 genes in 11 families and four subfamilies (Figure 2). The ALDH1 family contains six functional genes: the cytosolic ALDH1A1 and the mitochondrial ALDH1B1 may be involved in acetaldehyde metabolism; ALDH1A1 also participates in retinal oxidation and the detoxification of cyclophosphamide; the ALDH1A2 and ALDH1A3 proteins are integral to the oxidation of retinal to retinoic acid; the ALDH1L1 gene codes for 10-FTHF dehydrogenase; the ALDH1L2 gene product is very similar to that of ALDH1L1, but no functional data are available yet. The ALDH2 family has a single member, encoding the mitochondrial ALDH that exhibits the highest affinity for acetaldehyde and is critical in ethanol metabolism. Although ALDH2 officially qualifies as a seventh member of the ALDH1 family, its longstanding name of "ALDH2" associated with ethanol metabolism has been grandfathered into the more recent nomenclature system based on evolutionary divergence. The ALDH3A subfamily contains the dioxin-inducible ALDH3A1 and ALDH3A2, which are primarily involved in the oxidation of medium- and long-chain aliphatic and aromatic aldehydes. The ALDH3B subfamily consists of two structurally related genes, ALDH3B1 and ALDH3B2; as yet, there are no functional data for either gene product. ALDH5A1 encodes the succinic semialdehyde dehydrogenase. ALDH6A1 encodes the acetyl CoA-dependent methylmalonate semialdehyde dehydrogenase. The ALDH7A1 gene product, also known as 'antiquitin', is similar to the green garden pea 26 g protein involved in the regulation of turgor pressure. ALDH8A1 appears to metabolise retinal. ALDH9A1 codes for an enzyme that participates in the metabolism of γ-aminobutyraldehyde and aminoaldehydes derived from polyamines. The ALDH16A1 gene encodes an 802-amino acid protein with as-yet unknown function.
Finally, the ALDH18A1 gene encodes Δ1-pyrroline-5-carboxylate synthetase, which qualifies for classification in the ALDH superfamily based on the sequence homology of one of the protein domains (residues 361-772).
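The gene and family counts in this summary can be re-derived from the gene symbols alone, because each symbol encodes its family number and, where present, its subfamily letter. The sketch below does this for the 19 functional genes named in the article; the regular expression and the `parse_symbol` helper are illustrative assumptions, not part of any official nomenclature tool.

```python
import re
from collections import Counter

# The 19 putatively functional human ALDH genes named in the article.
FUNCTIONAL_GENES = [
    "ALDH1A1", "ALDH1A2", "ALDH1A3", "ALDH1B1", "ALDH1L1", "ALDH1L2",
    "ALDH2", "ALDH3A1", "ALDH3A2", "ALDH3B1", "ALDH3B2", "ALDH4A1",
    "ALDH5A1", "ALDH6A1", "ALDH7A1", "ALDH8A1", "ALDH9A1",
    "ALDH16A1", "ALDH18A1",
]

# Family number, then an optional subfamily letter and member number.
SYMBOL = re.compile(r"^ALDH(\d+)(?:([A-Z])(\d+))?$")


def parse_symbol(symbol):
    """Split a symbol into (family, subfamily letter, member number)."""
    match = SYMBOL.match(symbol)
    if match is None:
        raise ValueError(f"not a functional ALDH gene symbol: {symbol}")
    family, subfamily, member = match.groups()
    return int(family), subfamily, int(member) if member else None


# Number of genes per family, keyed by family number.
family_sizes = Counter(parse_symbol(s)[0] for s in FUNCTIONAL_GENES)

if __name__ == "__main__":
    print("genes:", len(FUNCTIONAL_GENES))
    print("families:", len(family_sizes))
    print("genes in ALDH1 family:", family_sizes[1])
```

Pseudogene symbols such as ALDH7A1P1 are intentionally rejected by this pattern, so the tally covers functional genes only.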
We thank Dr Elspeth Bruford and Tia Estey for valuable discussions and a critical reading of this manuscript. The writing of this paper was funded, in part, by NIH grants R01 EY11490 (VV) and P30 ES06096 (DWN).