Methylation-mediated deamination of 5-methylcytosine appears to give rise to mutations causing human inherited disease in CpNpG trinucleotides, as well as in CpG dinucleotides
© Henry Stewart Publications 2010
Received: 28 April 2010
Accepted: 28 April 2010
Published: 1 August 2010
The cytosine-guanine (CpG) dinucleotide has long been known to be a hotspot for pathological mutation in the human genome. This hypermutability is related to its role as the major site of cytosine methylation with the attendant risk of spontaneous deamination of 5-methylcytosine (5mC) to yield thymine. Cytosine methylation, however, also occurs in the context of CpNpG sites in the human genome, an unsurprising finding since the intrinsic symmetry of CpNpG renders it capable of supporting a semi-conservative model of replication of the methylation pattern. Recently, it has become clear that significant DNA methylation occurs in a CpHpG context (where H = A, C or T) in a variety of human somatic tissues. If we assume that CpHpG methylation also occurs in the germline, and that 5mC deamination can occur within a CpHpG context, then we might surmise that methylated CpHpG sites could also constitute mutation hotspots causing human genetic disease. To test this postulate, 54,625 missense and nonsense mutations from 2,113 genes causing inherited disease were retrieved from the Human Gene Mutation Database (HGMD; http://www.hgmd.org). Some 18.2 per cent of these pathological lesions were found to be C → T and G → A transitions located in CpG dinucleotides (compatible with a model of methylation-mediated deamination of 5mC), an approximately ten-fold higher proportion than would have been expected by chance alone. The corresponding proportion for the CpHpG trinucleotide was 9.9 per cent, an approximately two-fold higher proportion than would have been expected by chance. We therefore estimate that ~5 per cent of missense/nonsense mutations causing human inherited disease may be attributable to methylation-mediated deamination of 5mC within a CpHpG context.
Man's yesterday may ne'er be like his morrow; Nought may endure but Mutability.
Percy Bysshe Shelley (1816) Mutability
The first hint that the cytosine-guanine (CpG) dinucleotide might constitute a hotspot for pathological mutations in the human genome came nearly 25 years ago with the finding that two different CGA → TGA (Arg → Term) nonsense mutations had recurred quite independently at different locations in the factor VIII (F8) gene causing haemophilia A. The potential generality of this phenomenon was supported by the finding that 12 of the 34 (35 per cent) single base-pair (bp) substitutions then known to cause human inherited disease were C → T and G → A transitions within CpG dinucleotides. Further studies soon confirmed that the CpG dinucleotide was a mutation hotspot in a variety of different human disease genes, including PAH, F9, LDLR, RB1, HPRT1 and DMD. As mutation data accumulated, CGA → TGA transitions were encountered particularly frequently as a cause of human genetic disease; such nonsense mutations are inherently more likely to come to clinical attention than missense mutations [9, 10].
From the outset, it was realised that the hypermutability of the CpG dinucleotide was related to its role as the major site of cytosine methylation in the human genome. The reason traditionally put forward to explain this association has been that, while cytosine spontaneously deaminates to uracil (which is efficiently recognised as a non-DNA base and removed by uracil-DNA glycosylase), the spontaneous deamination of 5-methylcytosine (5mC) yields thymine, thereby creating G•T mismatches whose removal by methyl-CpG binding domain protein 4 (MBD4) and/or thymine DNA glycosylase followed by base excision repair is error prone [12–16]. It remains possible, however, that mCpG transitions are not exclusively caused by the spontaneous deamination of 5-methylcytosine and may also arise through the action of other mechanisms and processes [17–19]. Irrespective of the precise nature of the underlying mechanism, Krawczak et al. (1998) estimated that the rate of CG → TG (and CG → CA on the other strand) transitions was five times the base mutation rate. Subsequent estimates of 5mC hypermutability, derived from various studies of polymorphism, disease mutations or evolutionary divergence, have ranged between four-fold and 15-fold [20–26].
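The strand relationship behind "CG → TG (and CG → CA on the other strand)" can be illustrated with a minimal Python sketch. The helper name `reverse_complement` is our own and is not taken from the study; the snippet only demonstrates the strand bookkeeping, not any mutational mechanism.

```python
# Illustrative only: why a C -> T change on one strand reads as G -> A on the other.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# CpG is its own reverse complement, so each strand carries a cytosine
# that can be methylated.
top = "CG"
bottom = reverse_complement(top)  # also "CG", read 5'->3' on the other strand

# Deamination of 5mC on the bottom strand converts its CG into TG ...
bottom_mutated = "T" + bottom[1:]

# ... which, read on the top strand, appears as CG -> CA.
top_after = reverse_complement(bottom_mutated)
print(top, "->", top_after)
```

This is why both C → T and G → A transitions are counted when a mutation is recorded on a single reference strand.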
It has been known for some time that cytosine methylation also occurs in the context of CpNpG sites in mammalian genomes, where N represents any nucleotide [27, 28]. Since the intrinsic symmetry of the CpNpG trinucleotide would support a semi-conservative model of replication of the methylation pattern (as with the CpG dinucleotide), it comes as no surprise that both maintenance and de novo methylation occur at CpNpG sites in mammalian cells. In their recent paper on the human methylome, Lister et al. reported abundant DNA methylation in CpHpG trinucleotides (where H = A, C or T). Specifically, some 17.3 per cent of 5mC in embryonic stem cells was found to occur within CpApG, CpCpG and CpTpG, with a further 7.2 per cent of 5mC occurring in CpHpH. Although Lister et al. suggested that non-CpG methylation is almost entirely lost upon differentiation (a conclusion based solely upon the analysis of foetal lung fibroblasts), others have noted CpNpG methylation within human genes in a variety of different somatic tissues [30, 31]. Although the extent of non-CpG methylation in the germline remains unclear, if we were to assume not only that CpHpG methylation occurs in the germline, but also that 5mC deamination can occur within a CpHpG context, then it is very likely that methylated CpHpG sites would constitute mutation hotspots. Indirect evidence that this might indeed be the case has come from a disproportionately high number of C → T and G → A transitions at CpNpG sites in studies of the human NF1 and BRCA1 genes. In the light of the above, we have revisited the question of CpG dinucleotide hypermutability and explored the potential contribution that CpHpG transitions might make to human inherited disease.
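As a rough illustration of what "within a CpG or CpHpG context" means for a single substitution, the following Python sketch classifies the cytosine affected by a C → T or G → A transition. The function name `methylation_context` and the zero-based position convention are assumptions made for this example; the procedure actually used to score the HGMD mutations may differ.

```python
def methylation_context(seq: str, pos: int) -> str:
    """Classify the cytosine behind a C->T or G->A transition at seq[pos].

    Returns "CpG", "CpHpG" or "other". A G on the given strand is treated
    as a C on the complementary strand, so its context is read leftwards.
    """
    seq = seq.upper()
    base = seq[pos]
    if base == "C":
        nxt = seq[pos + 1:pos + 2]    # empty string if past the end
        nxt2 = seq[pos + 2:pos + 3]
        if nxt == "G":
            return "CpG"
        if nxt in ("A", "C", "T") and nxt2 == "G":
            return "CpHpG"
    elif base == "G":
        prev = seq[pos - 1:pos] if pos >= 1 else ""
        prev2 = seq[pos - 2:pos - 1] if pos >= 2 else ""
        if prev == "C":
            return "CpG"
        # H (A, C or T) on the other strand pairs with T, G or A on this one.
        if prev in ("A", "G", "T") and prev2 == "C":
            return "CpHpG"
    return "other"


# Hypothetical toy sequences, not real gene data.
print(methylation_context("ACGA", 1))  # C followed by G
print(methylation_context("CAG", 0))   # C, then H = A, then G
print(methylation_context("CTG", 2))   # G whose complementary C lies in CpApG
```

Note that in CpCpG the first cytosine falls in a CpHpG context whereas the second falls in a CpG context, so each site is assigned according to the affected base.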
Table 1. Numbers of C → T and G → A mutations found in CpG dinucleotides and CpHpG trinucleotides in a dataset of 54,625 missense and nonsense mutations in 2,113 different human genes (HGMD) and the numbers of possible C → T and G → A mutations in CpG dinucleotides and CpHpG trinucleotides within the coding regions of the mutated genes. (Columns: number of mutations in / not in di/trinucleotide; table values not reproduced here.)
From the data presented in Table 1, we estimate that ~11.8 per cent of the 9,947 CpG mutations (ie 1,176) occurred within this dinucleotide by chance alone and hence would not necessarily have originated via the methylation-mediated deamination of 5mC. In a similar vein, we estimate that ~46 per cent (2,460) of the CpHpG mutations (5,402) occurred within these trinucleotides by chance alone and hence may not have originated via methylation-mediated deamination of 5mC. The other side of this particular coin, however, is that the remaining 54 per cent of the 5,402 observed CpHpG mutations in HGMD (ie the excess of 2,942 over expectation, or ~5 per cent of all the missense/nonsense mutations analysed) may well be attributable to methylation-mediated deamination of 5mC within a CpHpG context. As far as we are aware, this is the first (albeit crude) estimate of the potential impact of CpHpG mutations on human inherited disease.
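The arithmetic behind these estimates can be retraced with a short Python sketch. It uses only the counts quoted above (54,625 mutations in total; 9,947 observed and 1,176 chance-expected in CpG; 5,402 observed and 2,460 chance-expected in CpHpG). The function name `excess_summary` and the dictionary keys are our own labels, and the chance expectations are taken as given rather than rederived.

```python
TOTAL_MUTATIONS = 54_625  # missense/nonsense mutations analysed


def excess_summary(observed: int, expected: int, total: int) -> dict:
    """Summarise the excess of observed over chance-expected mutations."""
    excess = observed - expected
    return {
        "excess": excess,                        # mutations above chance
        "share_of_context": excess / observed,   # fraction of that context
        "share_of_all": excess / total,          # fraction of all mutations
        "fold_over_chance": observed / expected,
    }


cpg = excess_summary(observed=9_947, expected=1_176, total=TOTAL_MUTATIONS)
cphpg = excess_summary(observed=5_402, expected=2_460, total=TOTAL_MUTATIONS)

for name, result in (("CpG", cpg), ("CpHpG", cphpg)):
    print(
        f"{name}: excess={result['excess']}, "
        f"share of context={result['share_of_context']:.1%}, "
        f"share of all={result['share_of_all']:.1%}, "
        f"fold over chance={result['fold_over_chance']:.1f}"
    )
```

With these inputs the CpHpG excess is 5,402 − 2,460 = 2,942 mutations, about 54 per cent of the CpHpG mutations and about 5 per cent of all mutations analysed.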
Table 2. Numbers of C → T and G → A mutations found in CpG dinucleotides and CpHpG trinucleotides in a dataset of 1,766 regulatory mutations of 191 gene promoters (HGMD) and the numbers of possible C → T and G → A mutations in CpG dinucleotides and CpHpG trinucleotides. (Columns: number of mutations in / not in di/trinucleotide; the only surviving table value is 6.03 × 10^-9.)
Although two specific examples of non-CpG methylation altering the binding of transcription factors to promoter elements within human genes have so far been reported [36, 37], the functional role of most non-CpG methylation in the human genome is still unclear. Irrespective of the functionality or otherwise of this specific type of post-synthetic DNA modification in the human genome, it would appear that methylation of the CpHpG trinucleotide may leave a significant imprint on the spectrum of missense/nonsense mutations causing human genetic disease.
- Youssoufian H, Kazazian HH, Phillips DG, Aronis S, et al: Recurrent mutations in haemophilia A give evidence for CpG mutation hotspots. Nature. 1986, 324: 380-382. 10.1038/324380a0.
- Cooper DN, Youssoufian H: The CpG dinucleotide and human genetic disease. Hum Genet. 1988, 78: 151-155. 10.1007/BF00278187.
- Abadie V, Lyonnet S, Maurin N, Berthelon M, et al: CpG dinucleotides are mutation hot spots in phenylketonuria. Genomics. 1989, 5: 936-939. 10.1016/0888-7543(89)90137-7.
- Koeberl DD, Bottema CD, Ketterling RP, Bridge PJ, et al: Mutations causing hemophilia B: Direct estimate of the underlying rates of spontaneous germ-line transitions, transversions, and deletions in a human gene. Am J Hum Genet. 1990, 47: 202-217.
- Rideout WM, Coetzee GA, Olumi AF, Jones PA: 5-Methylcytosine as an endogenous mutagen in the human LDL receptor and p53 genes. Science. 1990, 249: 1288-1290. 10.1126/science.1697983.
- Mancini D, Singh S, Ainsworth P, Rodenhiser D: Constitutively methylated CpG dinucleotides as mutation hot spots in the retinoblastoma gene (RB1). Am J Hum Genet. 1997, 61: 80-87. 10.1086/513898.
- O'Neill JP, Finette BA: Transition mutations at CpG dinucleotides are the most frequent in vivo spontaneous single-based substitution mutation in the human HPRT gene. Environ Mol Mutagen. 1998, 32: 188-191. 10.1002/(SICI)1098-2280(1998)32:2<188::AID-EM16>3.0.CO;2-Y.
- Buzin CH, Feng J, Yan J, Scaringe W, et al: Mutation rates in the dystrophin gene: A hotspot of mutation at a CpG dinucleotide. Hum Mutat. 2005, 25: 177-188. 10.1002/humu.20132.
- Krawczak M, Ball EV, Cooper DN: Neighboring-nucleotide effects on the rates of germ-line single-base-pair substitution in human genes. Am J Hum Genet. 1998, 63: 474-488. 10.1086/301965.
- Mort M, Ivanov D, Cooper DN, Chuzhanova NA: A meta-analysis of nonsense mutations causing human genetic disease. Hum Mutat. 2008, 29: 1037-1047. 10.1002/humu.20763.
- Shen JC, Rideout WM, Jones PA: The rate of hydrolytic deamination of 5-methylcytosine in double-stranded DNA. Nucleic Acids Res. 1994, 22: 972-976. 10.1093/nar/22.6.972.
- Hendrich B, Hardeland U, Ng HH, Jiricny J, et al: The thymine glycosylase MBD4 can bind to the product of deamination at methylated CpG sites. Nature. 1999, 401: 301-304. 10.1038/45843.
- Waters TR, Swann PF: Thymine-DNA glycosylase and G to A transition mutations at CpG sites. Mutat Res. 2000, 462: 137-147. 10.1016/S1383-5742(00)00031-4.
- Walsh CP, Xu GL: Cytosine methylation and DNA repair. Curr Top Microbiol Immunol. 2006, 301: 283-315. 10.1007/3-540-31390-7_11.
- Cortázar D, Kunz C, Saito Y, Steinacher R, et al: The enigmatic thymine DNA glycosylase. DNA Repair. 2007, 6: 489-504. 10.1016/j.dnarep.2006.10.013.
- Boland MJ, Christman JK: Characterization of Dnmt3b:thymine-DNA glycosylase interaction and stimulation of thymine glycosylase-mediated repair by DNA methyltransferase(s) and RNA. J Mol Biol. 2008, 379: 492-504. 10.1016/j.jmb.2008.02.049.
- Shen JC, Rideout WM, Jones PA: High frequency mutagenesis by a DNA methyltransferase. Cell. 1992, 71: 1073-1080. 10.1016/S0092-8674(05)80057-1.
- Zhang X, Mathews CK: Effect of DNA cytosine methylation upon deamination-induced mutagenesis in a natural target sequence in duplex DNA. J Biol Chem. 1994, 269: 7066-7069.
- Pfeifer GP: Mutagenesis at methylated CpG sequences. Curr Top Microbiol Immunol. 2006, 301: 259-281. 10.1007/3-540-31390-7_10.
- Nachman MW, Crowell SL: Estimate of the mutation rate per nucleotide in humans. Genetics. 2000, 156: 297-304.
- Kondrashov AS: Direct estimates of human per nucleotide mutation rates at 20 loci causing Mendelian diseases. Hum Mutat. 2003, 21: 12-27. 10.1002/humu.10147.
- Tomso DJ, Bell DA: Sequence context at human single nucleotide polymorphisms: Overrepresentation of CpG dinucleotide at polymorphic sites and suppression of variation in CpG islands. J Mol Biol. 2003, 327: 303-308. 10.1016/S0022-2836(03)00120-7.
- Jiang C, Zhao Z: Directionality of point mutation and 5-methylcytosine deamination rates in the chimpanzee genome. BMC Genomics. 2006, 7: 316. 10.1186/1471-2164-7-316.
- Elango N, Kim SH, Vigoda E, Yi SV: Mutations of different molecular origins exhibit contrasting patterns of regional substitution rate variation. PLoS Comput Biol. 2008, 4: e1000015. 10.1371/journal.pcbi.1000015.
- Misawa K, Kikuno RF: Evaluation of the effect of CpG hypermutability on human codon substitution. Gene. 2009, 431: 18-22. 10.1016/j.gene.2008.11.006.
- Li JB, Gao Y, Aach J, Zhang K, et al: Multiplex padlock targeted sequencing reveals human hypermutable CpG variations. Genome Res. 2009, 19: 1606-1615. 10.1101/gr.092213.109.
- Woodcock DM, Crowther PJ, Diver WP: The majority of methylated deoxycytidines in human DNA are not in the CpG dinucleotide. Biochem Biophys Res Commun. 1987, 145: 888-894. 10.1016/0006-291X(87)91048-5.
- Clark SJ, Harrison J, Frommer M: CpNpG methylation in mammalian cells. Nat Genet. 1995, 10: 20-27. 10.1038/ng0595-20.
- Lister R, Pelizzola M, Dowen RH, Hawkins RD, et al: Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009, 462: 315-322. 10.1038/nature08514.
- Lee J, Jang SJ, Benoit N, Hoque MO, et al: Presence of 5-methylcytosine in CpNpG trinucleotides in the human genome. Genomics. 2010, 96: 67-72. 10.1016/j.ygeno.2010.03.013.
- Laurent L, Wong E, Li G, Huynh T, et al: Dynamic changes in the human methylome during differentiation. Genome Res. 2010, 20: 320-331. 10.1101/gr.101907.109.
- Rodenhiser DI, Andrews JD, Mancini DN, Jung JH, et al: Homonucleotide tracts, short repeats and CpG/CpNpG motifs are frequent sites for heterogeneous mutations in the neurofibromatosis type 1 (NF1) tumour-suppressor gene. Mutat Res. 1997, 373: 185-195. 10.1016/S0027-5107(96)00171-6.
- Cheung LW, Lee YF, Ng TW, Ching WK, et al: CpG/CpNpG motifs in the coding region are preferred sites for mutagenesis in the breast cancer susceptibility genes. FEBS Lett. 2007, 581: 4668-4674. 10.1016/j.febslet.2007.08.061.
- Stenson PD, Mort M, Ball EV, Howells K, et al: The Human Gene Mutation Database: 2008 update. Genome Med. 2009, 1: 13. 10.1186/gm13.
- Illingworth RS, Bird AP: CpG islands -- "A rough guide". FEBS Lett. 2009, 583: 1713-1720. 10.1016/j.febslet.2009.04.012.
- Clark SJ, Harrison J, Molloy PL: Sp1 binding is inhibited by (m)Cp(m)CpG methylation. Gene. 1997, 195: 67-71. 10.1016/S0378-1119(97)00164-9.
- Inoue S, Oishi M: Effects of methylation of non-CpG sequence in the promoter region on the expression of human synaptotagmin XI (syt11). Gene. 2005, 348: 123-134.