Update of the human secretoglobin (SCGB) gene superfamily and an example of 'evolutionary bloom' of androgen-binding protein genes within the mouse Scgb gene superfamily

The secretoglobins (SCGBs) comprise a family of small, secreted proteins found only in mammals. There are 11 human SCGB genes and five pseudogenes. Interestingly, mice have 68 Scgb genes, four of which are clearly orthologous to human SCGB genes; the remainder represent an 'evolutionary bloom' and make up a large gene family represented by only six counterparts in humans. SCGBs are found in high concentrations in many mammalian secretions, including fluids of the lung, lacrimal gland, salivary gland, prostate and uterus. While the biological activities of most individual SCGBs have not been fully characterised, the findings to date suggest that this family has an important role in the modulation of inflammation, tissue repair and tumorigenesis. In mice, the large Scgb1b and Scgb2b gene families encode the androgen-binding proteins, which have been shown to play a role in mate selection. Although much has been learned about SCGBs in recent years, more research is clearly needed to understand the roles of these proteins in human health and disease. Such information is expected to reveal valuable novel drug targets for the treatment of inflammation, as well as biomarkers that might identify tissue damage or cancer.


Introduction
The secretoglobins (SCGBs) comprise a family of secreted proteins found only in mammals, including marsupials. The first SCGB to be discovered was identified in rabbits; it was first called blastokinin, 1 then uteroglobin, 2 and is now designated SCGB1A1 (in some early literature, the SCGB family is referred to as the 'uteroglobin' family). The term 'secretoglobin' was eventually coined to reflect characteristics that all family members have in common. The 'secreto-' portion of the name indicates that these proteins are secreted; a second reason has been proposed for this prefix, namely that their functions had largely remained a secret (Lehrer, R., personal communication). The suffix 'globin' was chosen because secretoglobins form dimers of two four-α-helix monomers, creating a hydrophobic binding pocket reminiscent of the globin fold, an eight-α-helix bundle with a pocket for a molecule such as a heme group. 3 Secretoglobins are found at high levels in many secretions, including those of the uterus, prostate, lungs, and lacrimal and salivary glands, and any given secretoglobin is often expressed in more than one tissue. For example, mRNA expression of every SCGB family member except SCGB1D2 has been demonstrated in human airways. 4 In general, the physiological and pathophysiological functions of most individual SCGBs remain to be defined. Nevertheless, roles currently ascribed to SCGBs include lung maintenance and repair, immune modulation and, at least in rodents, mate selection. Some SCGB family members, such as mammaglobin, have been used successfully as biomarkers for epithelial cancers.
SCGBs are small (approximately 10 kDa in humans) proteins that dimerise before secretion. The dimers are resistant to proteases, heat and extremes of pH. 5,6 The crystal structures of several SCGBs have been determined, including those of rabbit and rat uteroglobin (Protein Data Bank identifiers [PDB IDs]: 1UTG, 2UTG, 1UTR), rat Clara-cell-specific protein (CCSP) (PDB ID: 1CCD) and feline CH2 (Fel d 1) (PDB IDs: 1PUO, 1ZKR, 2EJN). 7 These proteins contain four α-helices and assemble into homo- or heterodimers oriented in an antiparallel fashion, held together by covalent disulphide bonds (via one to three conserved Cys residues) and non-covalent interactions. 8 The uteroglobin (UGB) dimer forms an internal hydrophobic cavity at the interface between the two subunits; this cavity binds hydrophobic ligands, including steroid hormones, some polychlorinated biphenyl metabolites, retinoids and various eicosanoid mediators of inflammation. 9,10 The four α-helices of each UGB subunit do not form a canonical four-helix bundle but, rather, a boomerang-shaped structure. The two subunits are joined in antiparallel fashion, with helices 3 and 4 forming the dimer interface. In the structure of SCGB1A1, six residues in each subunit (Phe6, Leu13, Tyr21, Phe28, Met41 and Ile63) have been identified as particularly important to the hydrophobic cavity. 11 All of these except Phe28, which probably functions in maintaining the dimer interface, are accessible to the ligand, and these five residues are involved in ligand binding. The aromatic residues Phe6 and Tyr21 are critical for binding and cannot be replaced by aliphatic amino acids. Conversely, Leu13 is accessible to solvent in the hydrophobic pocket and is commonly substituted by aromatic amino acids, suggesting that Leu13 may help determine ligand specificity.

Sources of secretoglobin genes and proteins
Protein sequences for human SCGBs were obtained from UniProt 12 through the HUGO Gene Nomenclature Committee (HGNC) website (http://www.genenames.org). Sequences for mouse SCGBs were retrieved from the National Center for Biotechnology Information (NCBI) Gene database (http://www.ncbi.nlm.nih.gov/gene) and from the supplementary data of Laukaitis et al. 13 Sequences were aligned with T-COFFEE in its most accurate mode, which combines multiple sources of sequence homology and structural information, where available. 14
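According to the figure captions, the aligned sequences were then analysed by neighbour-joining, with 10,000 bootstrap replicates, in the PHYLIP package. For readers unfamiliar with the method, the following is a minimal Python sketch of the classic neighbour-joining algorithm (Saitou and Nei), assuming a precomputed, symmetric distance matrix. It is an illustration only: it is not PHYLIP's implementation, it does not derive distances from an alignment, and it omits bootstrap resampling. The function name is hypothetical, and the five-taxon matrix in the usage example is a standard textbook toy matrix, not SCGB data.

```python
def neighbour_joining(labels: list[str], matrix: list[list[float]]) -> str:
    """Build an unrooted neighbour-joining tree and return it as Newick.

    labels: taxon names; matrix: symmetric pairwise distances, zero diagonal.
    """
    if len(labels) < 3:
        raise ValueError("neighbour-joining needs at least three taxa")
    nodes = list(labels)               # Newick fragments of the current clusters
    d = [list(row) for row in matrix]  # working copy of the distance matrix
    while len(nodes) > 3:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Q-criterion: join the pair (i, j) minimising (n-2)*d_ij - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node u.
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        merged = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from u to every remaining cluster k.
        du = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[r][s] for s in keep] for r in keep]
        for row, dist_to_u in zip(d, du):
            row.append(dist_to_u)
        d.append(du + [0.0])
        nodes = [nodes[k] for k in keep] + [merged]
    # Join the last three clusters at a single central node.
    x, y, z = nodes
    lx = (d[0][1] + d[0][2] - d[1][2]) / 2
    ly = (d[0][1] + d[1][2] - d[0][2]) / 2
    lz = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"


# Textbook five-taxon toy matrix (not SCGB data).
TOY_LABELS = ["a", "b", "c", "d", "e"]
TOY_MATRIX = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

if __name__ == "__main__":
    print(neighbour_joining(TOY_LABELS, TOY_MATRIX))
    # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

At each step the pair minimising the Q-criterion is joined, rather than simply the closest pair; this is what allows neighbour-joining to recover the correct topology from additive distances even when evolutionary rates differ between lineages.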

Human gene family members
As is commonly the case for a newly discovered protein family, SCGBs were originally named after the tissue in which they were most highly expressed; as a result, the same SCGB was often 'rediscovered' and named several times. In 2000, a standard nomenclature was established in which all proteins in the family were named SCGBs and assigned to families and subfamilies. 3 The nomenclature system was modelled on those used for the cytochrome P450 15,16 and nuclear hormone receptor 17 superfamilies, and was guided by the phylogenetic relationships of known SCGB family members assembled by Ni and colleagues. 18 This provided a convenient and systematic naming system for an entire superfamily. In this report, the most common names used for each protein are listed, along with their standardised names. The human genome contains 11 SCGB genes and five pseudogenes (Figure 1).

SCGB1A1 subfamily
UGB, also known as blastokinin and CCSP (SCGB1A1), was initially discovered in the rabbit uterus and is the founding family member. 2 For this reason, more is known about its biology than about that of many other SCGBs. SCGB1A1 proteins differ from other SCGBs in that they are homodimers, composed of two identical monomers, and their subunits lack the middle Cys residue found in other SCGBs. In humans, high SCGB1A1 levels are found in peripheral airway surface fluid, where it is one of the most abundant proteins; it is also expressed in the uterine endometrium and the prostate. 20 In the airways, SCGB1A1 is expressed in several cell types, especially Clara cells, and appears to play a role in immunomodulation, through regulation of cell infiltration, and in tissue repair after injury. 20 SCGB1A1 may also exert anti-tumorigenic activity. For example, ablation of the mouse Scgb1a1 gene is usually lethal in some strains, and survivors develop tumours. 21 Consistent with this, recombinant SCGB1A1 inhibits proliferation and invasion of some cancer cell lines. 20 Studies of Scgb1a1(-/-) knockout mice suggest that SCGB1A1 may protect against oxidative stress and exert anti-inflammatory actions, in addition to conferring resistance to pollutant-induced injury. 22 Interestingly, SCGB1A1 is downregulated early in infection, apparently to allow the body to respond. 23

SCGB1B subfamily
The human genome contains six genes that cluster phylogenetically with the mouse genes encoding androgen-binding proteins (Scgb1b and Scgb2b). These genes were described on the basis of genomic analysis and have been given SCGB4A designations. 24 Based on the phylogenetic clustering of their protein sequences, however, we propose that they be renamed with SCGB1B and SCGB2B designations, to reflect their similarity to the mouse proteins.
The SCGB1B subfamily includes SCGB1B1P (formerly ABPA1P), SCGB1B2P (formerly SCGB4A1P) and SCGB1B3 (formerly SCGB4A4). SCGB1B1P and SCGB1B2P are predicted to have become pseudogenes, whereas SCGB1B3 has no obvious inactivating mutations. Interestingly, however, SCGB1B2P is the only SCGB1B member with evidence for expression in expressed sequence tag (EST) databases.

Figure 1. Phylogenetic tree of human and mouse SCGBs. For simplicity, and to avoid clutter, only SCGB1B27 (ABPA27) and SCGB2B27 (ABPBG27) of the mouse androgen-binding protein (ABP) group are included. SCGB protein sequences were aligned using T-COFFEE 14 and analysed using the neighbour-joining method, with 10,000 bootstrap replicates, in the PHYLIP package. 19 Nodes with 50 per cent bootstrap confidence levels have been labelled.

SCGB1C1
SCGB1C1 has been localised to Bowman's glands in the olfactory mucosa, where it is thought to act as an odorant-binding protein whose ligands appear to be small, hydrophobic molecules. 25

SCGB1D subfamily
SCGB1D1 and SCGB1D2 are also known as lipophilin A and lipophilin B, respectively. The lipophilins form heterodimers with SCGB2A proteins, which further associate to form tetramers. They have been identified in the prostatic fluid of rats and in the lacrimal gland fluid of humans and rabbits; 26 little is known about their function.
SCGB1D4 is widely distributed throughout the body, but expression is particularly strong in the lymph node, tonsil, cultured lymphoblasts and ovary. It is inducible by interferon-γ in lymphoblast cells. SCGB1D4 appears to have immunological functions, including regulation of chemotactic migration and invasion. 27 There is one pseudogene in this subfamily, currently identified as SCGB1D1P1; we propose that it be renamed SCGB1D5P, consistent with the other members of this subfamily.

SCGB2A subfamily
SCGB2A1 is also known as lipophilin C. SCGB2A2, also known as mammaglobin, is expressed in a highly tissue-specific manner in breast epithelium, where it forms heterodimers with SCGB1D2. 28

SCGB2B subfamily
The SCGB2B gene subfamily includes SCGB2B1P (formerly ABPBG1P), SCGB2B2 (formerly SCGB4A2, SCGBL) and SCGB2B3P (formerly SCGB4A3P). Of the SCGB1B and SCGB2B subfamilies, only SCGBL is listed in the HGNC database, but we propose a name change to bring it into the SCGB superfamily naming system. There is only a single reference to SCGB2B2 in the literature, 24 but there is evidence for its expression in EST databases.

SCGB3A subfamily
SCGB3A1 and SCGB3A2, identified in 2002, have high structural homology with SCGB1A1. 29 Their expression appears to be localised principally to epithelial organs, such as the lung, mammary gland, trachea, prostate and salivary gland. 30 In the bronchial epithelium, their expression is decreased after injury. 29 It has been proposed that SCGB3A1 might share overlapping expression and functions with SCGB1A1. 29
SCGB3A1 is a candidate tumour-suppressor gene and a target gene of endothelial PAS domain protein 1 (EPAS1, formerly HIF2α). 31 SCGB3A1 expression is diminished in many human cancers (including lung, prostate, pancreatic and nasopharyngeal cancers), and hypermethylation of the SCGB3A1 promoter has been reported in many malignancies. 31 SCGB3A2 expression has been shown to be induced by T-helper 1 (Th1) cytokines but suppressed by proinflammatory and Th2 cytokines, 4 and a given cytokine can evoke different responses in different SCGBs. 4 Intranasal administration of recombinant SCGB3A2 suppresses allergen-induced lung inflammation, further highlighting similarities between SCGB3A2 and SCGB1A1. 32

Mouse gene family members

Scgb1a1
This gene encodes mouse UGB and is orthologous to human SCGB1A1.

Scgb1c1
This gene encodes a protein that is the mouse equivalent of human SCGB1C1.

Scgb1b and Scgb2b: the androgen-binding protein (ABP) family
Sixty-four of the 68 mouse Scgb genes belong to a family that has been called the ABP family. 13 These proteins are heterodimers consisting of two distinct types of subunit, SCGB1B (previously called ABPA-like) and SCGB2B (previously ABPBG-like). They were originally isolated from mouse saliva and described on the basis of their ability to bind androgens. 33 ABPs have since been shown to be expressed in glands of the face and neck, as well as in the prostate and ovary. 34 A role for ABPs in communication is supported by the expression of many Abpa (Scgb1b) and Abpbg (Scgb2b) mRNAs in the brain (olfactory lobe), sensory organs (olfactory epithelium, vomeronasal organ), glands of the head and neck (parotid, sublingual, submaxillary and lacrimal) and sexual tissues (prostate, ovary, and preputial and clitoral glands). 13

Scgb3a genes
Scgb3a1 and Scgb3a2 encode predicted proteins that align well with human SCGB3A1 and SCGB3A2 and are most likely orthologous to them.

Evolution
SCGB family members have highly divergent amino acid sequences, which complicates the identification of new members. To test whether all entries found were related to known SCGBs, a jackhmmer profile search (an iterated sequence profile search, seeded with human SCGB1A1) was performed; this confirmed group membership for all human and mouse SCGBs, with an expected value (E-value) of less than 0.001. 35 HomoloGene, 36 an NCBI resource that identifies groups of homologous genes across multiple species, currently recognises 21 SCGB clades (Figure 2). The SCGB genes all encode proteins with a similar structure. 18 Despite high amino acid sequence divergence, many structural features (such as helical bundles and the ability to dimerise) are retained. 11,18 This is consistent with a highly flexible and rapidly evolving gene superfamily, a property that is likely to have aided the evolution of its diverse functions.
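The membership check above applies an E-value cutoff of 0.001 to the results of a jackhmmer search. A minimal Python sketch of that filtering step follows, assuming the search was run with HMMER's `--tblout` option, whose whitespace-delimited per-sequence table uses `#` comment lines and holds the full-sequence E-value in its fifth column. The function name and the example rows are hypothetical; the rows are truncated to seven columns and their values are invented.

```python
def significant_hits(tblout_text: str, max_evalue: float = 1e-3) -> list[tuple[str, float]]:
    """Return (target name, full-sequence E-value) for hits below max_evalue.

    Assumes HMMER's --tblout layout: '#' starts a comment line, column 1 is
    the target name and column 5 is the full-sequence E-value.
    """
    hits = []
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"malformed tblout row: {line!r}")
        target, evalue = fields[0], float(fields[4])
        if evalue < max_evalue:  # strict cutoff, as in 'less than 0.001'
            hits.append((target, evalue))
    return hits


# Hypothetical rows, truncated to the first seven columns; values are invented.
EXAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias
seqA           -          SCGB1A1     -          3.1e-25  90.2   0.4
seqB           -          SCGB1A1     -          0.02     12.0   1.1
"""

if __name__ == "__main__":
    print(significant_hits(EXAMPLE_TBLOUT))  # [('seqA', 3.1e-25)]
```

Keeping the threshold as a parameter makes it easy to see how membership calls change if a looser or stricter cutoff is chosen.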
When SCGBs were named in 2000, six human SCGBs were described and divided into five groups, based on proposed evolutionary relationships. 18 Currently, there are 11 described human SCGB genes. Figure 1 shows mouse and human proteins on a phylogenetic tree for this family; in the case of the ABP proteins, SCGB1B27 (ABPA27) and SCGB2B27 (ABPBG27) represent the mouse SCGB1B and SCGB2B groups, respectively. Table 1 lists the chromosomal locations of the human SCGBs, together with only those mouse genes that are orthologous to them. Four human SCGBs have direct mouse orthologues; the ABP subfamily includes three human SCGB1Bs versus 30 mouse Scgb1bs, and three human SCGB2Bs versus 34 mouse Scgb2bs. The SCGB1D and SCGB2A subfamilies are present in the human genome but absent from the mouse.
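The gene counts quoted in this review can be cross-checked against the symbols it uses. The Python sketch below assumes only the names and numbers given in the text, including the proposed SCGB1B, SCGB2B and SCGB1D5P names and the four mouse orthologues described in the mouse section; it tallies human genes and pseudogenes (a trailing "P" marks a pseudogene) and the mouse total of four orthologues plus 30 Scgb1b and 34 Scgb2b genes. The helper names are hypothetical.

```python
from collections import Counter

# Human SCGB symbols as named, or proposed, in this review.
# A trailing "P" marks a pseudogene; SCGB1D5P is the proposed name for SCGB1D1P1.
HUMAN_SCGB = [
    "SCGB1A1",
    "SCGB1B1P", "SCGB1B2P", "SCGB1B3",
    "SCGB1C1",
    "SCGB1D1", "SCGB1D2", "SCGB1D4", "SCGB1D5P",
    "SCGB2A1", "SCGB2A2",
    "SCGB2B1P", "SCGB2B2", "SCGB2B3P",
    "SCGB3A1", "SCGB3A2",
]

# Mouse genes as stated in the text: four direct orthologues plus the ABP bloom.
MOUSE_ORTHOLOGUES = ["Scgb1a1", "Scgb1c1", "Scgb3a1", "Scgb3a2"]
MOUSE_SCGB1B = 30
MOUSE_SCGB2B = 34


def subfamily(symbol: str) -> str:
    """Subfamily prefix, e.g. 'SCGB1B' for 'SCGB1B3'."""
    return symbol[:6]


def is_pseudogene(symbol: str) -> bool:
    return symbol.endswith("P")


human_genes = [s for s in HUMAN_SCGB if not is_pseudogene(s)]
human_pseudogenes = [s for s in HUMAN_SCGB if is_pseudogene(s)]
per_subfamily = Counter(subfamily(s) for s in HUMAN_SCGB)
mouse_total = len(MOUSE_ORTHOLOGUES) + MOUSE_SCGB1B + MOUSE_SCGB2B

if __name__ == "__main__":
    print(len(human_genes), len(human_pseudogenes))          # 11 5
    print(per_subfamily["SCGB1B"], per_subfamily["SCGB2B"])  # 3 3
    print(mouse_total)                                       # 68
```

The tally reproduces the 11 genes and five pseudogenes stated for humans and the 68 Scgb genes stated for mouse (4 + 30 + 34).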
The ABP (Scgb1b/Scgb2b) family contains genes for two different types of subunit, ABPA (SCGB1B) and ABPBG (SCGB2B), 37 located adjacent to each other on mouse chromosome 7 (Table 2). This 'recent, phylogenetically independent proliferation of close paralogs, or lineage specific gene family expansion' is an example of an 'evolutionary bloom'. 37 Evolutionary blooms have been studied most notably in the large and diverse cytochrome P450 superfamily. 38 It has been suggested that such blooms might simply reflect a stochastic process. 37 The genes encoding any Scgb1b/Scgb2b pair tend to lie next to each other on the chromosome in a 'head-to-head' (3′-5′ | 5′-3′) orientation; these structures have been called 'modules'. 39 It appears that a single ancestral Scgb1b-Scgb2b module has expanded dramatically in some species (64 genes in mouse, 43 in rabbit). In other lineages, such as primates, it has given rise largely to pseudogenes, and in species such as the shrew and elephant it has been lost altogether. 13 Interestingly, humans have three such modules. Although at least two of these modules have become pseudogenes, the SCGB1B2-SCGB2B2 module might still be active, based on EST data. The mouse shows the most extensive expansion, which began in the ancestor of the genus Mus after its divergence from the rat, less than 17 million years ago, 13 and appears to have involved two different modes of duplication. 39
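Table 2 gives the strand and the start and end coordinates of each mouse ABP gene, and from these the orientation of an adjacent Scgb1b/Scgb2b pair can be read off. A minimal Python sketch follows, assuming each gene is described by its lower and higher genomic coordinates and a "+" or "-" strand; the class, function and gene names are hypothetical, and the coordinates in the example are invented rather than taken from Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str   # gene symbol (hypothetical in the example below)
    start: int    # lower genomic coordinate
    end: int      # higher genomic coordinate
    strand: str   # "+" (forward) or "-" (reverse)


def pair_orientation(g1: Gene, g2: Gene) -> str:
    """Classify the relative orientation of two neighbouring genes."""
    for g in (g1, g2):
        if g.strand not in ("+", "-"):
            raise ValueError(f"unknown strand {g.strand!r} for {g.symbol}")
    # Order the pair along the chromosome, left to right.
    left, right = sorted((g1, g2), key=lambda g: g.start)
    if left.strand == right.strand:
        return "tandem"        # both genes transcribed in the same direction
    if left.strand == "-" and right.strand == "+":
        return "head-to-head"  # 5' ends face each other: 3'-5' | 5'-3'
    return "tail-to-tail"      # 3' ends face each other: 5'-3' | 3'-5'


if __name__ == "__main__":
    # Invented coordinates for a hypothetical module.
    a = Gene("hypothetical_1b", 1_000, 1_500, "-")
    b = Gene("hypothetical_2b", 3_000, 3_600, "+")
    print(pair_orientation(a, b))  # head-to-head
```

Because the pair is sorted by position first, the result does not depend on the order in which the two genes are passed in.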

Association of SCGBs with disease
SCGBs have been linked to multiple disease states, either as participants or as biomarkers. SCGB1A1 may serve as an early biomarker for lung injury, owing to the regenerative role of the cells that secrete it. 20,40 In addition, SCGB1A1 may act as a tumour suppressor 41 and has been shown to be up- or downregulated in various human lung cancers. 42,43 SCGB1D2 has been reported to be upregulated in breast cancer, making it a potential marker for this malignancy. 44 In this context, panels of autoantibodies to tumour-associated antigens in breast cancer include SCGB1D2, which, in combination with other markers, may have diagnostic potential. By contrast, SCGB1D2 is downregulated in pituitary adenomas. 45

SCGB2A1 has been shown to be a prognostic marker in epithelial ovarian cancer 46,47 and endometrial cancer. 48 Because SCGB2A2 expression is highly specific to breast epithelial tissue, it has been proposed as a marker for detecting breast cancer metastases to sentinel lymph nodes and distant tissues. 49-51 SCGB2A1 overexpression has also been evaluated as a marker for breast cancer, with mixed conclusions. 28,52-55 SCGB3A1 has been shown to be differentially expressed in smokers with lung cancer, 56 and its decreased expression has been correlated with increased tumour burden in non-small-cell lung cancer. 31 An SCGB3A2 polymorphism has been associated with increased asthma risk in a Japanese population. 57,58 In chronic rhinosinusitis, SCGB3A2 levels in sino-nasal tissue are inversely correlated with the total number of infiltrating inflammatory cells and with symptom severity scores. 4

Figure 2. Phylogenetic tree of mouse androgen-binding proteins. Protein sequences were aligned using T-COFFEE 14 and analysed using the neighbour-joining method, with 10,000 bootstrap replicates, in the PHYLIP package. 19

Table 2. The mouse androgen-binding protein (ABP) family, with the newly proposed Scgb nomenclature. Indicated are the recommended gene symbol, previously published symbol, chromosomal location (Chr), strand, and start and end locations. These data were adapted from Laukaitis et al. 13 Genomic locations were updated from Build 36 (mm8) to Build 37 (mm9) of the mouse genome using the University of California, Santa Cruz (UCSC) BLAT tool. Records from the National Center for Biotechnology Information (NCBI) Gene database that correspond to mouse ABPs were aligned to these records using T-COFFEE 14 and analysed using the neighbour-joining method, with 10,000 bootstrap replicates, in the PHYLIP package; 19 analogous records are placed on the same line. In some cases, the start and end locations differ from those reported by Laukaitis et al., presumably because of differences in gene prediction algorithms.

Conclusions
The SCGBs represent an intriguing family of biologically active proteins. The relatively recent discovery of their anti-inflammatory and immunomodulatory functions, together with their potential as cancer biomarkers, underscores their physiological and pathophysiological importance. However, much remains to be elucidated about the actions of individual SCGBs. Further studies characterising the individual SCGBs are needed, and their results are likely to yield valuable targets for therapeutic intervention.
One of the most intriguing characteristics of the mammalian 'Abp' genes, the Scgb1b/Scgb2b subset of the SCGB gene family, is their evolutionarily independent expansions (so-called 'evolutionary blooms') in a number of mammalian lineages. Discovery of the reason for these blooms may lead to a better understanding of how these SCGBs function in different mammals.