Update on the olfactory receptor (OR) gene superfamily

The olfactory receptor (OR) gene superfamily is the largest in the human genome. The superfamily contains 390 putatively functional genes and 465 pseudogenes arranged into 18 gene families and 300 subfamilies. Even members within the same subfamily are often located on different chromosomes. OR genes are located on all autosomes except chromosome 20, and on the X chromosome but not the Y chromosome. The gene:pseudogene ratio is lowest in human, higher in chimpanzee and highest in rat and mouse, most likely reflecting the greater need for olfaction for survival in the rodent than in the human. The OR genes undergo allelic exclusion, with each sensory neurone usually expressing only one odourant receptor allele; the mechanism by which this phenomenon is regulated is not yet understood. The nomenclature system (based on evolutionary divergence of genes into families and subfamilies of the OR gene superfamily) has been designed similarly to that originally used for the CYP gene superfamily.


Introduction
Before 1980, the names of genes and classification of their encoded proteins were highly variable and non-systematic, especially to anyone slightly outside a particular field or to a new graduate student entering the field. Professor Margaret Oakley Dayhoff was a pioneer in attempting to create order out of chaos in the naming of genes and gene families by means of computerised protein alignments. 1 She was widely recognised as the founder of this new field of gene/protein classification, before her untimely death in 1983.
Cytochrome P450 (CYP) genes are conveniently arranged into families and subfamilies based on the percentage amino acid sequence identity. 2-7 Enzymes that share approximately 40 per cent or more identity are assigned to a particular family designated by an Arabic numeral, whereas those sharing approximately 55 per cent or more identity are grouped into a particular subfamily designated by a letter. For example, the sterol 27-hydroxylase enzyme and the 25-hydroxy-vitamin D3 1α-hydroxylase enzyme are both assigned to the CYP27 family because they share >40 per cent sequence identity. Furthermore, the sterol 27-hydroxylase is assigned to the CYP27 'A' subfamily and the 25-hydroxy-vitamin D3 1α-hydroxylase to the CYP27 'B' subfamily because their protein sequences are <55 per cent identical. If an additional enzyme were to be discovered that shared >55 per cent identity with the sterol 27-hydroxylase, then it would be named CYP27A2. If an additional enzyme were to be discovered that shared <55 per cent but >40 per cent identity with the sterol 27-hydroxylase as well as the 25-hydroxy-vitamin D3 1α-hydroxylase, then it would be named CYP27C1. The development and application of this delightfully logical system of nomenclature to the genes of many animals, plants and bacteria 8 has eliminated the confusion that often had plagued the naming of gene families and superfamilies. Subsequently, this 'divergent evolution' nomenclature system was adopted for several hundred other gene families and superfamilies, including the olfactory receptor superfamily.
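The naming rule described above can be illustrated with a minimal Python sketch. It is not the nomenclature committee's actual procedure: the function name `propose_name`, the strict `>` comparisons at the 40 and 55 per cent cutoffs, and the simplification to single-letter subfamilies are all assumptions made for illustration, and the identity values in the usage lines are invented.

```python
import re

# Approximate cutoffs quoted in the text (per cent amino acid identity).
FAMILY_CUTOFF = 40.0
SUBFAMILY_CUTOFF = 55.0

# Hypothetical name layout: root, family number, one subfamily letter, member number.
NAME_PATTERN = re.compile(r"^([A-Z]+)(\d+)([A-Z])(\d+)$")


def propose_name(identities):
    """Suggest a name for a new protein from its identity to named members.

    `identities` maps existing names (e.g. 'CYP27A1') to per cent identity
    with the new protein. Returns None when no member exceeds the family
    cutoff, because the text does not say how new family numbers are chosen.
    """
    if not identities:
        return None
    parsed = []
    for name, identity in identities.items():
        match = NAME_PATTERN.match(name)
        if match is None:
            raise ValueError(f"unrecognised gene name: {name}")
        root, family, subfamily, number = match.groups()
        parsed.append((root, int(family), subfamily, int(number), identity))

    # The closest known relative decides the family and, possibly, the subfamily.
    root, family, subfamily, _, best_identity = max(parsed, key=lambda e: e[4])
    if best_identity <= FAMILY_CUTOFF:
        return None
    relatives = [e for e in parsed if e[0] == root and e[1] == family]
    if best_identity > SUBFAMILY_CUTOFF:
        # Same subfamily: take the next free member number among the names supplied.
        next_number = max(e[3] for e in relatives if e[2] == subfamily) + 1
        return f"{root}{family}{subfamily}{next_number}"
    # Same family, new subfamily: take the next unused letter among the names supplied.
    next_letter = chr(max(ord(e[2]) for e in relatives) + 1)
    return f"{root}{family}{next_letter}1"


print(propose_name({"CYP27A1": 60.0, "CYP27B1": 45.0}))  # CYP27A2
print(propose_name({"CYP27A1": 48.0, "CYP27B1": 47.0}))  # CYP27C1
```

The sketch only knows about the members passed to it, so in practice the full list of already-assigned names would have to be supplied before a new number or letter could be chosen.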

Background and history
Vertebrate olfactory receptor (OR) genes represent a category of G-protein-coupled receptors (GPCRs) that contain seven transmembrane α-helical domains and function in the reception of innumerable odour molecules in the environment. 9 The OR gene superfamily is the largest in vertebrate genomes. 10-13 The genomic architecture of mammalian OR gene clusters shows an ancient evolutionary origin, preceding the marsupial-eutherian split; species-specific evolution has further shaped the different OR gene families, by means of both gains and losses of complete clusters, as well as expansion and contraction of existing clusters. 11 This dynamic flexibility is also reflected among individual humans; in a study that examined 51 candidate OR genes on DNA chips in 189 ethnically diverse subjects, a striking amount of population diversity was found. 14 Segregating pseudogenes (SPGs) are genes that segregate in populations between intact genes and pseudogenes, owing to a disruptive single nucleotide polymorphism (SNP). A range of 16-24 functional OR genes was found in this study alone, indicating that the OR gene superfamily is among the most pronounced examples of functional population diversity in the human genome. 14 Copy number variations (CNVs), another type of polymorphism, are also highly prevalent among human OR genes. 15,16 All these genomic events are evidence of a relatively recent process, whereby the extreme diminution of a functional repertoire in humans has occurred, a process which is presumably still ongoing.
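One way a single SNP can turn an intact allele into a pseudogene allele is by introducing a premature in-frame stop codon. The following Python sketch shows that check under stated assumptions: the 12-base sequence is an invented toy example rather than a real OR allele, the function names are hypothetical, and an allele is called 'disrupted' only when a stop codon appears before the final codon. Disruptive changes to conserved amino acids are not covered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_index(coding_sequence):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    seq = coding_sequence.upper()
    for start in range(0, len(seq) - 2, 3):
        if seq[start:start + 3] in STOP_CODONS:
            return start // 3
    return None


def apply_snp(coding_sequence, position, alt_base):
    """Return the sequence with the base at 0-based `position` replaced by `alt_base`."""
    return coding_sequence[:position] + alt_base + coding_sequence[position + 1:]


def is_disrupted(coding_sequence):
    """True if an in-frame stop codon occurs before the last complete codon."""
    n_codons = len(coding_sequence) // 3
    stop = first_stop_index(coding_sequence)
    return stop is not None and stop < n_codons - 1


# Toy reading frame (invented): ATG TGG CTG TAA
intact_allele = "ATGTGGCTGTAA"
# G>A at 0-based position 5 changes codon 2 from TGG (Trp) to TGA (stop).
disrupted_allele = apply_snp(intact_allele, 5, "A")

print(is_disrupted(intact_allele))     # False
print(is_disrupted(disrupted_allele))  # True
```

A locus at which both kinds of allele are present in a population would correspond to a segregating pseudogene in the sense used above.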
For most mammalian species, the ability to detect millions of different odourants is critical to their survival. Based on recent OR gene mining data in the platypus, opossum, cow and dog genomes, compared with those in the rat, mouse, macaque and human genomes, 13 we are now certain that there has been a substantial expansion of the OR gene superfamily since the mammalian radiation 100 million years ago.
The evolutionary change in the number of OR genes in insects is not nearly as extensive as that in mammals. Drosophila melanogaster has a relatively small receptor repertoire of 62 odourant receptors. 17 A comparison of 12 Drosophila species, encompassing 60 million years of divergence, shows that the number of functional OR genes has remained fairly stable. 18 Caenorhabditis elegans has a highly developed chemosensory system, which enables it to detect a wide variety of volatile (olfactory) and water-soluble (gustatory) cues associated with food, danger or other animals; between 500 and 1,000 different GPCRs are expressed in chemosensory neurones, and these may be supplemented by alternative sensory pathways as well. 19 The vertebrate OR gene repertoire has thus evolved from a subset of ancestral genes in the fly and worm. There appear to be three important periods in the evolution of the vertebrate olfactory system, as evidenced by comparative genomics: (1) the adaptation to land in amphibian ancestors; (2) the decline of olfaction in primates; and (3) the delineation of putative pheromone receptors concurrent with rodent speciation. 20 The gene:pseudogene ratio is lowest in human, higher in chimpanzee and highest in rat and mouse. This most likely reflects the necessity of olfaction for survival, more so in the rodent than in the human.
Whereas the chicken, platypus and primate genomes carry <400 functional OR genes, the opossum and rodent genomes, not surprisingly, contain between 1,000 and 1,210 functional OR genes. 11,13 Curiously, however, it is difficult to explain why the cow genome, with 970 functional OR genes, has more than the dog genome, with 811 functional OR genes, when dogs are considered to have such a keen sense of olfaction. 13 Thus, the number of OR genes in a species does not appear to be directly related to the environmental 'requirement' or to lifestyle.

Current bioinformatics about the OR gene superfamily
The OR gene superfamily comprises 18 gene families and 300 subfamilies (Table 1). Presently, there are 390 putatively functional (protein-coding) OR genes and 465 OR pseudogenes located in multiple clusters of varying sizes scattered throughout all autosomes except chromosome (Chr) 20, and on the X but not the Y chromosome. 21-23 The members of each subfamily have been placed therein because of divergent evolution, as described above. These subfamilies differ from CYP subfamilies, in that individual members within one subfamily are often located on two or more different chromosomes. The OR2T subfamily (Table 2) contains 16 functional genes, more than any other subfamily. Evolutionary divergence of each of the 18 gene families is illustrated in Figure 1.
Figure 1. A phylogenetic analysis of one representative from each family of the human OR gene repertoire. In this tree, one can see the following: (1) a general guideline for how the different families relate to one another (although this is very general, and the branching is not always this well defined); (2) the numbers near each branch denote the OR family number; (3) each pie chart size is scaled to represent the number of the OR genes inside that family (black = functional genes, grey = pseudogenes, yellow = segregating pseudogenes [SPGs]). SPGs are genes that segregate in populations between intact genes and pseudogenes, owing to a disruptive SNP. 34 This disruptive mutation can introduce a stop codon, or alter a highly conserved amino acid that is important for proper function of the protein. In Tables 1-5, the SPGs are counted as 'functional genes' or 'pseudogenes', according to the Human Genome Project public version. Additional information can be found at the HORDE database (http://bioportal.weizmann.ac.il/HORDE/).
Family OR12 genes are located only on chromosome 6. Family OR13 genes are located on chromosomes 9, 1, X and 10.
Family OR14 genes are located only on chromosome 1.

Note that many subfamilies contain only a single gene or only a single pseudogene (Tables 2-5). In fact, the OR7E subfamily has only one functional gene, and all the other 85 members are pseudogenes (Table 3). The OR7E subfamily is the largest subfamily in the human OR gene repertoire, and probably has expanded in the human genome through a series of segmental gene duplication events. 24 The newly described human OR14 gene family (Table 4) was realised after analysis of the platypus and opossum OR gene repertoires. This analysis revealed that six human OR functional genes and one OR pseudogene (which previously had been classified within the OR5 family) are actually derived from a distinct platypus OR gene family. 11,25 The evolutionary divergence of the OR14 gene family is shown in Figure 2.
The 'shotgun' splattering of OR genes throughout the human genome must have happened before speciation of Homo sapiens and the development of its 22 autosomes plus the X and Y chromosomes; this can be inferred from the high conservation of the OR genes' genomic organisation among marsupial and eutherian mammals, 11 and from the phylogenetic analysis of the platypus OR gene repertoire by comparison with that in mammals. 13,25 In contrast to this OR gene arrangement, the CYP gene subfamilies arose as syntenic clusters of members within a single chromosomal segment. This finding suggests that gene duplication events within CYP subfamilies occurred after mammalian speciation and development of the autosomes and sex chromosomes.
The two largest OR gene clusters are located on Chr 11, with 38 functional genes (51 per cent of total) on 11p (Cluster 11@5.0) and 44 functional genes (45 per cent) on 11q (Cluster 11@55.6). These genes are predominantly in OR families 51, 52, 55 and 56 (Table 5). Interspersed within these two clusters are dozens of non-OR-related genes. This intrusion of non-OR-related genes can also be seen in all other OR gene clusters throughout the genome.
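The counts quoted in this section can be turned into the gene:pseudogene ratio and the functional fraction with simple arithmetic. The short Python sketch below does this for the whole human repertoire (390 functional genes, 465 pseudogenes) and for the OR7E subfamily (one functional gene, 85 pseudogenes); the function names are hypothetical and the figures are only those stated in the text.

```python
def gene_pseudogene_ratio(functional, pseudogenes):
    """Ratio of functional genes to pseudogenes."""
    if pseudogenes == 0:
        raise ValueError("ratio is undefined when there are no pseudogenes")
    return functional / pseudogenes


def functional_fraction(functional, pseudogenes):
    """Fraction of all members (genes plus pseudogenes) that are functional."""
    total = functional + pseudogenes
    if total == 0:
        raise ValueError("no genes or pseudogenes given")
    return functional / total


# Counts as stated in the text.
human_repertoire = (390, 465)
or7e_subfamily = (1, 85)

for label, (genes, pseudo) in [("human OR repertoire", human_repertoire),
                               ("OR7E subfamily", or7e_subfamily)]:
    ratio = gene_pseudogene_ratio(genes, pseudo)
    fraction = functional_fraction(genes, pseudo)
    print(f"{label}: ratio {ratio:.2f}, functional fraction {fraction:.1%}")
```

The same two functions could be applied to counts from other species to compare gene:pseudogene ratios, provided the counts come from comparably annotated genome builds.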

Future directions: Additional subsets of sensory reception genes and identification of ligands
A recently appreciated discovery in olfaction is the unique specialisation of sensory neurones, such that each individual sensory neurone is stochastically chosen to express usually only one odourant receptor allele. This mechanism of 'allelic exclusion', by which mutually exclusive expression of odourant receptor genes is regulated, remains unclear at present. 20,26,27 The vomeronasal-1 receptor genes (VN1R) also encode GPCRs and, while they encode odourant receptors, they are evolutionarily distinct 20 from the very large OR gene superfamily. There are five VN1R genes and nine VN1R pseudogenes. The VN1R1, VN1R2 and VN1R4 genes and VN1R6P pseudogene are located at Chr 19q13.42; the VN1R10P, VN1R11P, VN1R12P, VN1R13P and VN1R14P pseudogenes are located on Chr 6p21; VN1R7P and VN1R8P are on Chr 21p11.2; VN1R3 is alone on Chr 16p11.2; VN1R5 is alone on Chr 1q44; and VN1R9P is alone on Chr 22. 28
At the present time, information about the ligands for mammalian OR genes is very limited. The smell of lemons (limonene), the perception of a floral or woody smell (acetophenone) 29 and the ability to smell isovaleric acid 30 have been mapped in the mouse to two specific genomic loci on Chr 4 (Iva1) and Chr 6 (Iva2). In humans, sensitivity to isovaleric acid was found to be highly associated with the OR11H7P segregating pseudogene, which is not syntenic with either Iva1 or Iva2. 31 Another recent study found that human OR7D4 is selectively activated in vitro by androstenone; interestingly, this study found that two non-synonymous SNPs account for a significant proportion of the variance in smell perception of androstenone. 32 Members of the gustatory receptor (Gr) gene family in Drosophila are expressed in chemosensory neurones and are known to mediate the perception of sugars, bitter substrates, carbon dioxide and pheromones. Intriguingly, some of these Gr genes have now been shown to be expressed in abdominal multi-dendritic neurones, hygroreceptive neurones of the arista, peripheral proprioceptive neurones in the legs, neurones in the larval and adult brain, and oenocytes. 33 Along these same lines, we and others have observed several OR genes being significantly up- or downregulated in the liver or kidney of knockout mouse lines, that is, in tissues not normally known to be involved in olfaction. It is therefore tempting to speculate that the receptors encoded by OR genes, as well as by Gr genes, might participate in detecting endogenous ligands.
The OR51, OR52, OR55 and OR56 genes are located only on chromosome 11.

UPDATE ON GENE COMPLETIONS AND ANNOTATIONS
Olender, Lancet and Nebert
Figure 2. A phylogenetic analysis of platypus, opossum and human OR genes for the new family 14 only. Opossum = black for intact genes, grey for pseudogenes. Platypus = red for intact genes, pink for pseudogenes. Human = blue. OR14 is an expansion of three ancestral OR gene subfamilies (A, B and C); the expansion, in both platypus and opossum, took place after speciation, whereas only one branch shows an orthologous relationship between platypus and human (marked with *). The tree was generated with MEGA4, using the nearest-neighbour-joining algorithm, and distances with the Poisson correction model. Bootstrap units are also indicated. 34 This tree is rooted with the OR51E1 gene. Only genes with no more than two frame disruptions were considered in the analysis.