Gene nomenclature by default, or BLASTing to Babel
© Henry Stewart Publications 2005
Received: 11 May 2005
Accepted: 11 May 2005
Published: 1 September 2005
The current proliferation of mammalian genomes is creating a nomenclature issue caused by naming genes based on their best BLAST hit to a gene in another annotated genome. The rat genome is relying heavily on the mouse genome for nomenclature, but not all rat genes have direct orthologues in the mouse; often, there are paralogous groups of genes -- due to expansions of that gene subfamily in one or the other genome. Many of these genes have already been assigned names in the rat, so that renaming them based on BLAST scores leads to duplicate sets of names. The supposed orthology created by name sharing across genomes is not always found. These inaccurate names are appearing in frequently used sites, such as the University of California Santa Cruz Genome Browser. The example of rat cytochrome P450 (Cyp) genes is presented here, but other gene families are also likely to be affected.
Keywords: gene nomenclature, cytochrome P450, rat genome, mouse genome, orthologue
The rat genome has been sequenced and assembled [1], creating a need for rat gene nomenclature. The obvious source of gene nomenclature for the rat would seem to be the mouse genome. Ideally, orthologues should have the same name. This logic has led to automated naming of rat genes, which causes problems of two kinds. First, the rat has long been an experimental animal. Genes from both rat and mouse had been sequenced and named for nearly 20 years before the genomes were sequenced. In the example of cytochromes P450, the first mammalian sequences, Cyp2b1 and Cyp2b2, were determined in the rat [2]. The mouse sequences began to appear two years later with Cyp1a1 and Cyp1a2 [3, 4]. The established nomenclature for CYP genes has been in place since 1987 [5–7], and these names have been used in publications for several years. Because the names were assigned independently, mostly in chronological order, orthologues do not always carry the same name.
The second nomenclature problem has to do with divergence over time between species' genomes. Here, mouse and rat will be discussed, but the same applies to other species pairs, such as human and rhesus monkey. When similar genes appear in gene clusters, the one-to-one relationship of the genes between mouse and rat is often broken, meaning that the orthology is broken. Compared with the 57 CYP genes of the human, the mouse has greatly expanded its set of Cyp genes to 102 full-length genes; the rat has been a little more conservative, with 87 Cyp genes. The solo genes in a mammalian CYP subfamily -- those that occur without related neighbours -- are strict orthologues, and so nomenclature by best reciprocal BLAST hit between mouse and rat is a viable strategy for them. This works for 31 mouse-rat gene pairs and one pseudogene. Eighty-seven rat genes cannot be matched up to 102 mouse genes as orthologue pairs, however, and this nomenclature method can be seen to fail in the gene clusters.
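The best reciprocal BLAST hit strategy mentioned above can be sketched in a few lines. This is only an illustration, not the pipeline used by any genome browser: the gene labels (`rat_solo`, `rat_dupA`, `mouse_single`, and so on) and the bit scores are hypothetical placeholders, and the input is assumed to be a simple table of pairwise scores in each search direction.

```python
from typing import Dict, List, Tuple

# (query gene, subject gene) -> BLAST bit score.
# All labels and scores below are hypothetical, for illustration only.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query gene to its single highest-scoring subject gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(rat_vs_mouse: Scores,
                         mouse_vs_rat: Scores) -> List[Tuple[str, str]]:
    """Return (rat, mouse) pairs in which each gene is the other's best hit."""
    rat_best = best_hits(rat_vs_mouse)
    mouse_best = best_hits(mouse_vs_rat)
    return sorted((rat, mouse) for rat, mouse in rat_best.items()
                  if mouse_best.get(mouse) == rat)


# One solo gene pair, plus a rat duplication facing a single mouse gene.
rat_vs_mouse: Scores = {
    ("rat_solo", "mouse_solo"): 900.0,
    ("rat_dupA", "mouse_single"): 800.0,
    ("rat_dupB", "mouse_single"): 780.0,
}
mouse_vs_rat: Scores = {
    ("mouse_solo", "rat_solo"): 900.0,
    ("mouse_single", "rat_dupA"): 800.0,
    ("mouse_single", "rat_dupB"): 780.0,
}

pairs = reciprocal_best_hits(rat_vs_mouse, mouse_vs_rat)
paired_rat = {rat for rat, _ in pairs}
unpaired_rat = sorted({rat for rat, _ in rat_vs_mouse} - paired_rat)

print(pairs)         # solo pair, plus one member of the duplicated group
print(unpaired_rat)  # the rat duplicate left without a reciprocal partner
```

In this toy input the solo pair is recovered cleanly, but the duplicated rat genes are not: one copy is paired with the mouse gene only because its score is marginally higher, and the other is left without a partner. Neither outcome reflects a true one-to-one orthology, which is the failure in gene clusters described above.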
Results and Discussion
Orthology between mouse and rat Cyp genes
Figure 2 shows the Cyp4abx clusters. Notice how the rat Cyp4a1 gene has given rise to three Cyp4a genes in the mouse. By contrast, mouse Cyp4a14 has duplicated, making Cyp4a2 and Cyp4a3 in the rat, based on BLAST similarities. The mouse cluster is further complicated by an approximately 100 kilobase duplication involving the Cyp4a12 and Cyp4a30 genes. This did not happen in the rat and there does not seem to be a Cyp4a30 equivalent in that animal -- unless it might be the rat Cyp4a33-ps pseudogene. There are seven Cyp gene clusters in the rat, some being even more complex than those described for the Cyp2d and Cyp4abx clusters.
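The way one-directional best-hit naming produces duplicate names in such a cluster can be shown with a small sketch. The mapping of rat Cyp4a2 and Cyp4a3 to mouse Cyp4a14 follows the BLAST similarities described above; the `rat_solo` and `mouse_solo` entries are hypothetical placeholders added for contrast, and the function name is an invention for this example.

```python
from collections import defaultdict
from typing import Dict, List

# Best mouse hit for each rat gene. The two Cyp4a entries follow the text;
# the "solo" entry is a hypothetical placeholder for an unclustered gene.
best_mouse_hit: Dict[str, str] = {
    "Cyp4a2": "Cyp4a14",
    "Cyp4a3": "Cyp4a14",
    "rat_solo": "mouse_solo",
}


def find_name_collisions(best_hit: Dict[str, str]) -> Dict[str, List[str]]:
    """Return mouse names that would be assigned to more than one rat gene."""
    by_mouse_name: Dict[str, List[str]] = defaultdict(list)
    for rat_gene, mouse_gene in best_hit.items():
        by_mouse_name[mouse_gene].append(rat_gene)
    return {name: sorted(rat_genes)
            for name, rat_genes in by_mouse_name.items()
            if len(rat_genes) > 1}


collisions = find_name_collisions(best_mouse_hit)
print(collisions)  # {'Cyp4a14': ['Cyp4a2', 'Cyp4a3']}
```

A check of this kind, run before any names are transferred, would flag the genes that need curation by hand rather than automatic renaming.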
The mouse and rat Cyp genes chosen as the example in this paper are by no means the only gene sets that will have this problem. In the 5th December, 2002 issue of Nature, in which the mouse genome was reported [10], Table 11 (p. 542) shows the top 50 InterPro domain families in mouse compared with those in human, fish, worm and fly. Cytochrome P450 is ranked 46th in the mouse and 52nd in human. The 45 other families that are more abundant than Cyp in the mouse will potentially have similar nomenclature issues. Fortunately, some of these groups (eg the homeobox genes) have a firmly established nomenclature and will not be renamed. It is not so clear what confusion will descend on the ATPase, kinase, zinc finger protein and the many other gene families.
Gene nomenclature committees have been established to impose order on gene families and on whole genomes, to prevent duplication of names and multiple uses of the same root symbol, and to provide an authority that can be trusted. Ignoring the existence of naming systems in order to assign hundreds, or thousands, of names quickly to rat genes to match genes in other genomes will come with a price, and the price will be failed communication and widespread confusion. These problems are not so different from those that must occur when a carefully constructed language is corrupted.
References
- Rat Genome Sequencing Project Consortium: Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 2004, 428: 493-521.
- Fujii-Kuriyama Y, Mizukami Y, Kawajiri K, et al: Primary structure of a cytochrome P-450: Coding nucleotide sequence of phenobarbital-inducible cytochrome P-450 cDNA from rat liver. Proc Natl Acad Sci USA. 1982, 79: 2793-2797. 10.1073/pnas.79.9.2793.
- Kimura S, Gonzalez FJ, Nebert DW: The murine Ah locus. Comparison of the complete cytochrome P1-450 and P3-450 cDNA nucleotide and amino acid sequences. J Biol Chem. 1984, 259: 10705-10713.
- Kimura S, Gonzalez FJ, Nebert DW: Mouse cytochrome P3-450: Complete cDNA and amino acid sequence. Nucleic Acids Res. 1984, 12: 2917-2928. 10.1093/nar/12.6.2917.
- Nebert DW, Adesnik M, Coon MJ, et al: The P450 gene superfamily: Recommended nomenclature. DNA. 1987, 6: 1-11. 10.1089/dna.1987.6.1.
- Nelson DR, Kamataki T, Waxman DJ, et al: The P450 superfamily: Update on new sequences, gene mapping, accession numbers, early trivial names of enzymes, and nomenclature. DNA Cell Biol. 1993, 12: 1-51. 10.1089/dna.1993.12.1.
- Nelson DR, Koymans L, Kamataki T, et al: P450 superfamily: Update on new sequences, gene mapping, accession numbers and nomenclature. Pharmacogenetics. 1996, 6: 1-42. 10.1097/00008571-199602000-00002.
- Nelson DR, Zeldin D, Hoffman S, et al: Comparison of cytochrome P450 (CYP) genes from the mouse and human genomes including nomenclature recommendations for genes, pseudogenes, and alternative-splice variants. Pharmacogenetics. 2004, 14: 1-18. 10.1097/00008571-200401000-00001.
- The Cytochrome P450 homepage available at http://drnelson.utmem.edu/cytochromeP450.html. (last accessed 30th March, 2005)
- Mouse Genome Sequencing Consortium: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420: 520-562. 10.1038/nature01262.