The MECP2 gene codes for methyl CpG binding protein 2, which regulates the activity of other genes during early brain development. Mutations in this gene have been associated with Rett syndrome, a form of autism. The purpose of this study was to investigate the role of evolutionarily conserved cis-elements in regulating the post-transcriptional expression of the MECP2 gene and to explore their possible correlations with a mutation that is known to cause mental retardation.
A bioinformatics approach was used to map evolutionarily conserved cis-regulatory elements in the transcribed regions of the human MECP2 gene and its mammalian orthologs. Cis-regulatory motifs, including G-quadruplexes, microRNA target sites, and AU-rich elements, have gained significant importance because of their roles in key biological processes and as therapeutic targets. In the 5′-untranslated region (5′-UTR) of the MECP2 mRNA, we discovered a highly conserved G-quadruplex that overlaps a known deletion in Rett syndrome patients with decreased levels of MeCP2 protein. We believe that this 5′-UTR G-quadruplex could be involved in regulating MECP2 translation. We mapped additional evolutionarily conserved G-quadruplexes, microRNA target sites, and AU-rich elements in key sections of both untranslated regions. Our studies suggest that translation, mRNA turnover, and development-related alternative polyadenylation of MECP2 are regulated, putatively through interactions of conserved cis-regulatory elements with their respective trans factors and through complex interactions among the trans factors themselves. We discovered highly conserved G-quadruplex motifs that were more prevalent near alternative splice sites than near constitutive splice sites of the MECP2 gene. We also identified a pair of overlapping G-quadruplexes at an alternative 5′ splice site that could potentially regulate alternative splicing both negatively and positively in the MECP2 pre-mRNAs.
A Rett syndrome mutation associated with decreased protein expression was found to overlap a conserved G-quadruplex. Our studies suggest that MECP2 post-transcriptional gene expression could be regulated by several evolutionarily conserved cis-elements, such as G-quadruplex motifs, microRNA target sites, and AU-rich elements. This phylogenetic analysis has provided some interesting and valuable insights into the regulation of the MECP2 gene involved in autism.
The methyl CpG binding protein 2 gene codes for the protein MeCP2, which is essential for normal brain development. This protein regulates the transcription of neuron-specific genes and is vital for the connections between nerve cells, where cell–cell communication takes place. Mutations in the MECP2 gene can cause a form of autism called Rett syndrome. Patients with this syndrome are typically females, in whom symptoms usually appear between 6 and 18 months of age. Rett syndrome patients experience a loss of acquired skills, impaired speech, and abnormal stereotypical movements. In some cases, young patients have experienced frequent seizures and mental retardation. Rett syndrome is in fact one of the most common causes of mental retardation in females.
Several types of mutations have been mapped to the MECP2 gene in affected patients [3, 4]. Many of these mutations affect the coding region and result in either a MeCP2 protein with altered function or a non-functional protein. Mutations that lead to altered gene expression have been mapped to the 5′- and 3′-untranslated regions (UTRs) [3, 5, 6]. Several mutations in the genomic MECP2 sequence lead to altered splicing of the gene.
Cis-regulatory motifs located in untranslated regions and near splice junctions are known to interact with RNA binding proteins to regulate post-transcriptional gene expression. Studying cis-element regulation of MECP2 gene expression can provide better insight into the molecular mechanism of MECP2 regulation and a deeper understanding of the genetic disorders caused by alterations in its expression.
Guanine-rich sequences can form highly stable structures. Instead of pairing into a Watson–Crick duplex, a nucleic acid sequence containing four tracts of consecutive guanines can fold into a G-quadruplex, in which guanines from the four tracts associate into stacked planar G-tetrads. G-quadruplexes are known to have important roles in biological processes and human disease, and they are being explored as therapeutic targets [8–11]. These structures have been found in telomeres, promoter regions, and other biologically important regions of DNA, where they influence DNA replication, transcription, and epigenetic mechanisms [12, 13]. Computationally predicted G-quadruplex structures have been reported in the MECP2 gene. However, the biological role of these motifs in MECP2 DNA remains to be determined. Recently, it became possible to quantitatively visualize the formation of genomic G-quadruplexes in living mammalian cells. RNA G-quadruplexes are more likely to form in vivo and are more stable than DNA G-quadruplexes. There is ample evidence for cis-regulatory roles of G-quadruplexes in post-transcriptional gene expression. RNA G-quadruplexes located in the 5′-UTR are known to be involved in regulated translation initiation [19, 20] as well as translational repression [21–23]. G-quadruplex motifs found in translated regions have been shown to affect the folding and proteolysis of the hERα protein. G-rich sequences in the 3′-UTR have been shown to influence polyadenylation, RNA turnover, and subcellular mRNA localization. A 3′-UTR polymorphism that affects G-quadruplex structure has been shown to modulate expression of the KiSS1 mRNA. There is evidence for a direct role of G-quadruplexes in regulated alternative splicing of fragile X mental retardation 1 (FMR1) transcripts and of transcripts of beta-site amyloid precursor protein (APP) cleaving enzyme 1 (BACE1), which is involved in Alzheimer disease.
Advances in bioinformatics techniques have made it possible to study the prevalence and distribution of G-quadruplex forming sequence motifs at the genome level [31–34]. Consequently, there has been a tremendous increase in published literature and reviews on this subject [34–36]. Large-scale computational studies have identified G-quadruplex forming sequences in both 5′- and 3′-UTRs. However, computational predictions have difficulty distinguishing between a G-quadruplex sequence motif that occurs by chance and one that forms a structure with a biological role in the cell.
In this study, we used a bioinformatics approach to map evolutionarily conserved G-quadruplex motifs, microRNA target sites, and AU-rich elements (AREs) in the transcribed regions of the human MECP2 gene and its mammalian orthologs. Identifying evolutionarily conserved motifs helps validate computational predictions, improves their accuracy, and provides evidence for their biological relevance. The goal of this project was to study the role of conserved cis-regulatory motifs in regulating the post-transcriptional expression of the MECP2 gene and to explore their possible correlations with a mutation that is known to cause mental retardation.
The translation and destabilization of a large number of eukaryotic mRNAs are known to be regulated via microRNA-mediated pathways, which have received significant attention. MeCP2 protein expression has been shown to be influenced by microRNA targeting. Similarly, AU-rich elements in the 3′-UTRs of developmentally expressed mRNAs have been associated with regulated mRNA stability. Therefore, in addition to G-quadruplexes, this project also investigated the roles of microRNA targeting and AREs as post-transcriptional regulators, as well as their interrelationships.
Results and discussion
Four mammalian MECP2 orthologs, from Homo sapiens, Canis lupus familiaris, Mus musculus, and Rattus norvegicus, were chosen for the current studies (Table 1). Although the MeCP2 protein orthologs were quite similar, nucleotide sequence similarity among the mRNAs was relatively lower because of variation in the 5′- and 3′-untranslated regions. The human, dog, and mouse MECP2 genes are known to have multiple isoforms; orthologous isoforms with comparable exon/intron structures were chosen for sequence alignments.
Table 1. A total of four MECP2 mammalian orthologs were chosen for the current studies
Species [RefSeq accession] | Nucleotide sequence (mRNA) identity to human | Protein sequence identity/similarity to human | Protein length (amino acids)
Homo sapiens [RefSeq:NM_004992.3]
Canis lupus familiaris [RefSeq:XM_848395.1]
Mus musculus [RefSeq:NM_001081979.1]
Rattus norvegicus [RefSeq:NM_022673.1]
RNAs were aligned pairwise by the semi-global method, which does not penalize end gaps; therefore, the percentage similarity calculations do not include differences in the lengths of untranslated regions. Values in brackets represent NCBI accession numbers of the corresponding mRNAs.
A conserved G-quadruplex in the 5′-UTR of MECP2 orthologs
A G-quadruplex whose position relative to the translation start site is highly conserved was discovered in the 5′-UTR of human, dog, and mouse MECP2 mRNAs (Figure 1). The existence of a conserved motif within an otherwise highly variable region suggests a functional role. This conserved G-quadruplex motif, which we named ‘CG’, is located 110 bases upstream of the translation initiation site in the human MECP2 mRNA and is likely to play a role in the regulation of translation. Several 5′-UTR G-quadruplexes have been reported to be involved in translational regulation. A G-quadruplex structure located in the 5′-UTR of human fibroblast growth factor 2 (FGF2) acts as an internal ribosomal entry site (IRES) for translation initiation. On the other hand, G-quadruplex formation can also inhibit translation, as reported for the NRAS oncogene, Yin Yang 1 (involved in tumorigenesis), and ADAM10 (responsible for anti-amyloidogenic processing of APP). The CG G-quadruplex conserved in the 5′-UTR of human, dog, and mouse MECP2 mRNA orthologs (Figures 1 and 2) is of particular interest because it maps to a known mutation in the MECP2 gene that leads to Rett syndrome. An 11-bp deletion (GCGAGGAGGAG) (Figure 2) in the 5′-UTR results in a lack of MeCP2 protein in about 25% of the tested cells, even though the mRNA is detectable and its coding sequence (CDS) is apparently intact.
We believe that the MECP2 5′-UTR G-quadruplex CG is the translation regulatory motif affected by the 11-bp deletion in some Rett syndrome patients. Nucleotide sequence mutations and polymorphisms that prevent G-quadruplex folding or change G-quadruplex conformation are known to affect gene expression [22, 28]. Two possible mechanisms may underlie G-quadruplex-mediated regulation of translation in the MECP2 mRNA. First, interaction of RNA binding proteins with G-quadruplexes in the 5′-UTR is known to modulate translation. For example, the nucleolin protein binds to G-rich sequences and positively influences protein translation. We tested several nucleolin targets with the quadruplex forming G-rich sequences (QGRS) Mapper software and found them to be capable of forming G-quadruplexes (data not shown). A disruption of the 5′-UTR G-quadruplex in the MECP2 mRNA could consequently lead to lower protein translation. The fragile X mental retardation protein (FMRP) is also known to regulate translation by binding to G-quadruplexes in its target mRNAs. Altered FMRP function can lead to atypical synapse development in the brain and impaired learning, resulting in mental retardation. Sequences from several other genes implicated in autism have been shown to form G-quadruplexes [44, 46]. A change in the 5′-UTR G-quadruplex region is likely to affect FMRP binding and hence translation of the MECP2 mRNA, possibly leading to genetic defects such as Rett syndrome.
Alternatively, the 5′-UTR G-quadruplex may be an important component of an IRES [19, 20] responsible for translation of the MECP2 mRNA. By disrupting the IRES, the 11-bp deletion in the G-quadruplex motif may then affect translation of the MECP2 mRNA.
Conserved G-quadruplexes in the coding region of MECP2 orthologs
We mapped several conserved G-quadruplexes within the CDS of the MECP2 mRNA orthologs. Three G-quadruplexes (‘X’, ‘Y’, and ‘Z’; Figure 1) were highly conserved within the MECP2 CDS of all four species. G-quadruplex ‘Y’ showed a high level of sequence conservation across the four mammalian species (Figure 3). Despite modest variation in sequence conservation, all three CDS G-quadruplexes were highly conserved in their position relative to the translation start site and at the level of predicted structure. G-quadruplexes within the coding regions of mRNAs are known to be involved in regulating RNA stability, translation, and protein folding.
Conserved cis-regulatory elements in the 3′-UTR of MECP2 orthologs
The MECP2 mRNAs analyzed in this work included two alternatively spliced isoforms each for the human, dog, and mouse orthologs and one MECP2 transcript for rat. Both mouse MECP2 isoforms and human isoform 1 have long 3′-UTRs (>8.5 kb). Both dog MECP2 isoforms, human isoform 2, and the rat mRNA have short 3′-UTRs (<0.5 kb). The longer MECP2 isoforms contain at least two polyadenylation signals and their corresponding cleavage/polyadenylation sites. Alternative polyadenylation of MECP2 can produce transcript isoforms with longer or shorter 3′-UTRs. The longer human isoform is more abundant in fetal neuronal tissues and is involved in brain development, whereas shorter transcripts are prevalent in the adult brain. Long 3′-UTRs are likely to play pivotal roles in post-transcriptional regulation of MECP2 mRNA, especially during early development, when gene expression needs to be tightly regulated. Therefore, this part of our project explored the capacity of the 3′-UTRs of MECP2 mammalian orthologs and isoforms to form evolutionarily conserved G-quadruplexes, especially near other conserved cis-regulatory elements: AREs, microRNA target sites, and alternative polyadenylation signals.
First, we studied the overall phylogenetic conservation of the MECP2 gene, particularly in the 3′-UTR. Sequence alignments among mammalian orthologs of MECP2 mRNAs showed that most of the MECP2 3′-UTR sequence is highly variable. However, regions surrounding polyadenylation signals/sites were much better conserved (data not presented). This suggests important biological roles for the conserved regions in regulating the alternative polyadenylation involved in the developmental regulation of MECP2.
The 3′-UTR of MECP2 is highly variable; however, the majority of the conserved cis-regulatory elements that we analyzed (microRNA target sites, AU-rich elements, and G-quadruplexes) mapped to evolutionarily conserved regions in the 3′-UTR of the long MECP2 isoform, which is involved in early brain development (Figure 4). All four mammalian orthologs of MECP2 were analyzed, but only data from the human and mouse isoforms are presented; alignments of human MECP2 with its dog and rat orthologs were very similar to the alignments between the human and mouse orthologs. The short human MECP2 mRNA isoform 2, expressed mostly in the adult brain, lacked conserved microRNA target sites, AREs, and G-quadruplexes. Our results suggest that these conserved cis-elements could have important regulatory roles in post-transcriptional MECP2 expression during early stages of brain development.
There is sufficient evidence to indicate a role for 3′-UTR G-quadruplexes in post-transcriptional regulation of gene expression [28, 43, 49–51]. G-quadruplexes in the 3′-UTR are known to regulate translation. Interactions between RNA binding proteins such as hnRNP F/H and quadruplex forming G-rich sequences are known to regulate splicing and 3′-end processing [49–51]. In our studies, a highly conserved G-quadruplex was found to be associated with one alternative poly(A) site but not with the second site (Figure 4). The conserved G-quadruplex was present 17 bases downstream of poly(A) site 1 (Figure 5), well within the range over which G-quadruplexes have been associated with regulation of cleavage/polyadenylation complex formation during 3′-end processing. Mutations of G-rich sequences in this region of MECP2 RNA have been shown to reduce polyadenylation efficiency in vivo. We did not find any G-quadruplex forming sequences within 200 bases downstream of alternative poly(A) site 2, which is responsible for the long isoform of the human MECP2 gene (Figure 6 and data not shown). These results suggest a role for the G-quadruplex in the alternative cleavage/polyadenylation associated with brain development-specific MECP2 gene expression. The mechanism of alternative 3′-end processing regulation may involve dynamic formation or resolution of the RNA G-quadruplex near poly(A) site 1 via specific helicases such as RHAU. The role of G-quadruplexes in polyadenylation can be modulated by interactions with different proteins. For example, while binding of hnRNP H/H′ to quadruplex forming G-rich sequences can enhance polyadenylation [49, 54], hnRNP F (which also has affinity for G-rich tracts) has been shown to interfere with polyadenylation.
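The positional check behind these observations (a G-quadruplex 17 bases downstream of poly(A) site 1, none within 200 bases of poly(A) site 2) can be written as a small Python sketch. The function name, the 0-based coordinate convention, and measuring the distance to the start of each predicted G-quadruplex are assumptions made for illustration; the example coordinates are hypothetical and are not MECP2 positions.

```python
from typing import List


def g4_offsets_downstream(polya_site: int, g4_starts: List[int],
                          window: int = 200) -> List[int]:
    """Return offsets (nt) of predicted G-quadruplex starts lying 1..window nt
    downstream of a poly(A) cleavage site.

    Hypothetical helper: coordinates are 0-based positions on one transcript,
    and distance is measured to the first nucleotide of each G-quadruplex.
    """
    return sorted(s - polya_site for s in g4_starts if 0 < s - polya_site <= window)
```

With a hypothetical poly(A) site at position 1000 and predicted G-quadruplexes starting at 900, 1017, and 1350, only the motif 17 nt downstream is reported; an empty list corresponds to the situation described for poly(A) site 2.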
Most of the evolutionarily conserved microRNA target sites were located in the 3′-UTR of the long isoform; many of them lie approximately 100 bp downstream of poly(A) site 1, which is closer to the MECP2 coding region (Figure 4). The translation and destabilization of a large number of eukaryotic mRNAs, especially those under strict expression control, are known to be regulated via microRNA-mediated pathways. Therefore, it was not surprising to discover microRNA target sites in the 3′-UTR of the developmentally regulated long MECP2 isoform. MicroRNA targeting of the long 3′-UTR MECP2 isoform has previously been shown to modulate MeCP2 protein levels in the developing human brain.
We noticed that most evolutionarily conserved G-quadruplexes were preferentially associated with conserved microRNA target sites in the 3′-UTR (Figure 4), suggesting a potential interplay between microRNAs/microRNPs (microribonucleoproteins) and G-quadruplex binding proteins. G-quadruplex binding proteins such as FXR1 (fragile X mental retardation syndrome-related protein 1, a paralog of FMRP, which is involved in mental retardation) are known to be part of microRNP complexes. FXR1 is also involved in directing microRNAs to AREs for regulation of translation. Therefore, some G-quadruplexes in the MECP2 3′-UTR may also have a regulatory role in mRNA translation.
Evolutionarily conserved ARE and miR-148/152 target sites were associated with the second alternative poly(A) site, which results in expression of the longer isoform (Figures 5 and 6). AU-rich elements in the 3′-UTRs of developmentally expressed mRNAs have been associated with regulated stability via the 3′–5′ exosome pathway following deadenylation. Cis-acting AREs can interact with a variety of proteins that either promote or delay ARE-mediated mRNA degradation (AMD). Recent studies and reviews have suggested that microRNAs can regulate post-transcriptional gene expression by targeting AMD as well as translation [60, 61]. The association of evolutionarily conserved miR-148/152 target sites with an ARE in the long isoform suggests potential cooperation between microRNAs/microRNPs and ARE-binding proteins (ARE-BPs) in ARE-mediated post-transcriptional regulation of MECP2 transcripts.
Conserved G-quadruplex motifs near splice sites of the MECP2 pre-mRNA orthologs
We focused our attention on the conserved G-quadruplex motifs located near splice sites, especially alternatively regulated ones. Human, dog, and mouse MECP2 orthologs are each known to have two alternatively spliced isoforms. Human MECP2 isoform 1 (also known as MECP2A) has an extra exon. This isoform is predominantly expressed in neurons during early development, whereas human isoform 2 is prevalent in adults in a variety of tissues, including the brain.
Many G-quadruplexes were mapped in the pre-mRNA isoforms of the four mammalian orthologs. A total of 33 G-quadruplexes conserved in all four mammalian orthologs were mapped near 18 constitutive and 6 alternative splice sites. The overall distribution of conserved G-quadruplexes was biased (Figure 7): conserved G-quadruplexes were more likely to be associated with alternative splice sites of the mammalian MECP2 orthologs, suggesting a possible biological role in regulated splicing. Almost all alternative splice sites of MECP2 mammalian orthologs were associated with at least one conserved G-quadruplex (Figure 8). Alternative splice site G-quadruplexes were roughly equally distributed between exons and introns.
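A comparison of G-quadruplex density around constitutive and alternative splice sites amounts to a per-class tally. The following Python sketch illustrates one way to compute it; the 100-nt window, the use of G-quadruplex start positions, and the function name are assumptions chosen for illustration, not parameters stated in the text, and the example coordinates are hypothetical.

```python
from typing import Dict, List, Tuple


def g4_by_site_class(sites: List[Tuple[int, str]], g4_positions: List[int],
                     window: int = 100) -> Dict[str, Tuple[int, int, float]]:
    """For each splice-site class, return (number of sites,
    sites with >= 1 conserved G-quadruplex within `window` nt,
    mean G-quadruplexes per site).

    Hypothetical helper: `sites` holds (position, class) pairs and
    `g4_positions` holds start positions of conserved G-quadruplexes.
    """
    summary: Dict[str, Tuple[int, int, float]] = {}
    for cls in sorted({c for _, c in sites}):
        positions = [p for p, c in sites if c == cls]
        # A G-quadruplex close to two sites is counted once for each site.
        counts = [sum(1 for g in g4_positions if abs(g - p) <= window)
                  for p in positions]
        with_g4 = sum(1 for c in counts if c > 0)
        summary[cls] = (len(positions), with_g4, sum(counts) / len(positions))
    return summary
```

For hypothetical sites at 100 and 1000 (constitutive) and 2000 (alternative) with G-quadruplexes at 150, 1950, 2030, and 5000, the constitutive class gives (2, 1, 0.5) and the alternative class gives (1, 1, 2.0). Normalizing per site matters because the two classes contain very different numbers of sites.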
G-quadruplex forming sequences have the potential to affect alternative tissue-specific splicing through their interactions with the hnRNP H family of proteins. For example, the hnRNP F protein, which has an affinity for quadruplex forming G-rich sequences, is needed for nervous tissue-specific alternative splicing. A G-quadruplex in FMR1 RNA can act as an exonic splicing enhancer for alternative splicing by binding its own protein product, FMRP, which is involved in mental retardation. An intronic G-quadruplex in the tumor suppressor TP53 gene is also responsible for alternative splicing. A G-quadruplex in the third exon of beta-site APP cleaving enzyme 1 (BACE1), which is involved in Alzheimer disease, has been shown to regulate splice site selection. Alternative splicing of the human and mouse MECP2 pre-mRNAs involves skipping of the second exon. Conserved G-quadruplexes were located near both splice sites of this skippable exon in the human and mouse MECP2 orthologs. One G-quadruplex (A) was located in the intron near the 3′ splice site, and two conserved overlapping G-quadruplexes (B/B′) were located in the exon near the 5′ splice site. These locations seem well suited for direct involvement in regulated, development-related alternative splicing via interactions with splice regulatory proteins. In one of the dog MECP2 isoforms, alternative splicing of a short intron interrupts the last exon, resulting in five rather than four exons (Figure 8). A conserved G-quadruplex was also discovered near the alternative 5′ splice site of this alternative intron. Our findings suggest that G-quadruplexes may well be involved in regulated alternative splicing of the MECP2 gene.
Multiple sequence alignments revealed that the three location-conserved G-quadruplexes near alternative splice sites (A, B/B′, and D; Figure 8) are also highly conserved at the sequence level in all mammalian MECP2 orthologs. In contrast, a highly stable G-quadruplex (C) that is not near an alternative splice site is less well conserved at the sequence level (Figure 9). These data indicate a difference between the G-quadruplexes found near alternative splice sites and other G-quadruplexes conserved in the same gene.
Location-conserved G-quadruplex B′ is also highly conserved at the sequence level in all four mammalian MECP2 orthologs (Figure 10). G-quadruplex B′ partially overlaps G-quadruplex B and also overlaps the second 5′ splice site of the MECP2 pre-mRNA (Figure 8). This site is known to be alternatively spliced in human and mouse MECP2 orthologs. The highly conserved G-quadruplex B lies 5 bases upstream of the alternative 5′ splice site in the human MECP2 pre-mRNA sequence (Figure 11), a convenient location for a G-quadruplex to function as an exonic splicing enhancer (ESE). Previous studies have demonstrated that G-quadruplex structures near splice sites in exons of genes expressed in the brain can act as ESEs by interacting with the FMRP protein. The B′ G-quadruplex, which is also highly conserved across the mammalian species, overlaps both the B G-quadruplex motif and the alternative 5′ splice site. Because the two motifs share nucleotides, only one of them is likely to be folded at any given time; quadruplexes B and B′ are therefore likely to be mutually exclusive. While G-quadruplex B can act as an ESE, B′, when formed, may inhibit alternative splicing, since this structure is likely to make the 5′ splice site unavailable. These data suggest that the B/B′ G-quadruplex pair can regulate alternative splicing both negatively and positively in the MECP2 pre-mRNAs.
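The mutual-exclusivity argument for B and B′ can be made explicit by enumerating which motifs could be folded at the same time and whether any folded motif covers the splice site. The Python sketch below is an illustration under stated assumptions: motifs that share any nucleotide are treated as unable to fold simultaneously, a folded motif covering the splice-site position is treated as blocking it, and the names and coordinates (including `B_prime` for B′) are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the pre-mRNA


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def folding_states(motifs: Dict[str, Interval],
                   splice_site: int) -> List[Tuple[Tuple[str, ...], bool]]:
    """Enumerate sets of motifs that can fold together (pairwise non-overlapping)
    and flag whether any folded motif covers the splice-site position."""
    names = sorted(motifs)
    states = []
    for k in range(len(names) + 1):
        for combo in combinations(names, k):
            # overlapping motifs share guanines, so they cannot both be folded
            if all(not overlaps(motifs[a], motifs[b]) for a, b in combinations(combo, 2)):
                blocked = any(motifs[n][0] <= splice_site < motifs[n][1] for n in combo)
                states.append((combo, blocked))
    return states
```

With hypothetical coordinates B = (100, 118), B′ = (110, 130), and a splice site at 123, the function returns three states: nothing folded, B folded with the splice site accessible, and B′ folded with the splice site covered. The combination of B and B′ is excluded because the two motifs overlap.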
Regulated alternative pre-mRNA splicing is an essential component of post-transcriptional gene expression and is important for many biological processes. MECP2 produces multiple isoforms, and its expression is highly regulated among different tissues, especially in the brain during different developmental stages. Our study has identified evolutionarily conserved G-quadruplexes associated with alternative splicing of MECP2 mammalian orthologs.
The goal of this project was to perform an evolutionary analysis of four MECP2 mammalian orthologs in order to identify conserved cis-regulatory elements that may regulate post-transcriptional expression of this gene, which is known to be associated with mental retardation syndromes. Our bioinformatics-based studies focused on G-quadruplexes, microRNA target sites, and AU-rich elements, which we mapped to the transcribed regions of MECP2 orthologs.
We identified a highly conserved G-quadruplex in the 5′-UTR of three mammalian MECP2 orthologs that overlapped a known 11-bp deletion in Rett syndrome patients with decreased levels of MeCP2 protein but normal transcripts. We believe that this 5′-UTR G-quadruplex could be involved in regulating MECP2 post-transcriptional expression, either as an IRES [19, 20] or by interacting with specific proteins such as nucleolin or FMRP. Altered levels of MeCP2 protein during early brain development can interfere with neuronal connections, leading to autism.
The majority of the conserved cis-regulatory elements analyzed (G-quadruplexes, microRNA target sites, and AREs) mapped to evolutionarily conserved regions of the otherwise variable 3′-UTR of the long MECP2 isoform, which requires tight regulation during early brain development. The short isoform, which has more stable expression in adults, lacks most of the conserved 3′-UTR cis-regulatory elements analyzed. Most evolutionarily conserved G-quadruplexes were preferentially associated with microRNA target sites, suggesting an interplay between microRNAs/microRNA ribonucleoproteins (miRNPs) and G-quadruplex binding proteins. A highly conserved G-quadruplex present selectively near alternative polyadenylation site 1 could be responsible for alternative polyadenylation, which is the primary mechanism of differential MECP2 expression in early brain development.
Evolutionarily conserved ARE and miR-148/152 target sites were associated with the second alternative poly(A) site, which results in expression of the longer isoform. Our data suggest that the stability and/or translation of the long MECP2 isoform, which is expected to be under strict post-transcriptional control, may be regulated through cooperation between microRNAs/miRNPs and ARE-BPs.
G-quadruplex locations were found to be highly conserved near alternative splice sites of the MECP2 gene. Location-conserved G-quadruplexes near alternative splice sites are also more highly conserved at the sequence level than G-quadruplexes found elsewhere in the MECP2 gene. We also discovered a bias in the overall distribution of conserved G-quadruplexes, which were more likely to be associated with alternative splice sites of the mammalian MECP2 orthologs. Our data suggest a possible biological role for G-quadruplexes in regulated alternative splicing of MECP2 pre-mRNAs. We identified a pair of overlapping G-quadruplexes at an alternative 5′ splice site that could regulate alternative splicing both negatively and positively in the MECP2 pre-mRNAs.
This phylogenetic analysis has provided some interesting and valuable insights into the post-transcriptional regulation of the MECP2 gene by conserved cis-regulatory elements. The findings can help further our understanding of mental retardation associated with this gene.
Several freely available public databases and bioinformatics sequence analysis tools were used for this project.
Sources of MECP2 gene-related information
Most of the gene- and sequence-related information was obtained from the database resources of the National Center for Biotechnology Information (NCBI). Nucleotide and amino acid sequences of the human MECP2 gene and its orthologs were obtained from the RefSeq database. The Entrez Gene database was used to obtain alternative MECP2 isoforms and gene-related information. Exon/intron patterns were compared between the mRNA isoforms of the respective MECP2 orthologs to identify alternative and constitutive splice sites. MECP2 orthologs were identified with the help of the HomoloGene database. Several allelic variations and mutations were mapped to the human MECP2 gene with the help of the OMIM database. RettBASE also provided a comprehensive collection of a wide variety of MECP2 mutations and phenotypes.
Pairwise sequence alignments were performed with a commercial program based on the Needleman–Wunsch algorithm. Unless otherwise specified, all pairwise alignments used the semi-global method rather than full global alignment because the lengths of untranslated regions vary across orthologous mRNAs. The ClustalW program was used for multiple sequence alignments.
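The semi-global variant can be illustrated with a short Python sketch of Needleman–Wunsch in which leading and trailing end gaps are free. This is not the commercial program used in the study; the scoring scheme (match +1, mismatch −1, linear gap −2) and the identity definition (identical columns divided by aligned columns between the first and last aligned residues, with internal gaps counted) are assumptions chosen for illustration.

```python
from typing import List, Tuple


def semiglobal_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, float]:
    """Needleman-Wunsch with free end gaps; returns (score, percent identity)."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # Row 0 and column 0 stay 0, so leading end gaps cost nothing.
    S: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(i, j),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
    # Trailing end gaps are free: the best cell may lie anywhere on the last row or column.
    ends = [(n, j) for j in range(m + 1)] + [(i, m) for i in range(n + 1)]
    i, j = max(ends, key=lambda cell: S[cell[0]][cell[1]])
    score = S[i][j]
    # Trace back; stopping at row/column 0 leaves leading end gaps uncounted.
    matches = columns = 0
    while i > 0 and j > 0:
        if S[i][j] == S[i - 1][j - 1] + sub(i, j):
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    identity = 100.0 * matches / columns if columns else 0.0
    return score, identity
```

For example, `semiglobal_align("ACGTACGT", "ACGT")` returns `(4, 100.0)`: the four unmatched nucleotides at one end of the longer sequence cost nothing, in the same way that differences in UTR length do not count against orthologous mRNAs.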
Mapping G-quadruplex sequence motifs
The QGRS Mapper software program and the G-rich sequence database (GRSDB) were used to map QGRS (predicted G-quadruplexes) in the mRNA and pre-mRNA sequences of the human MECP2 gene and its orthologs, and to generate information about the composition and distribution of QGRS in the nucleotide sequence entries. QGRS Mapper and GRSDB identify QGRS based on established algorithms that we have previously described in detail [31, 69]. Briefly, putative G-quadruplexes are identified using the motif $G_x N_{y_1} G_x N_{y_2} G_x N_{y_3} G_x$, which consists of four guanine (G) tracts of equal size $x$ interspersed with three loops of arbitrary nucleotides (N) of lengths $y_1$, $y_2$, and $y_3$. The size of each G-tract corresponds to the number of stacked G-tetrads forming the quadruplex structure.
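To make the four-G-tract motif described above concrete, the following Python sketch enumerates every match of it in a sequence. It is not the QGRS Mapper implementation: QGRS Mapper also scores candidates (the G-score) and reports a non-overlapping subset, whereas this sketch returns all overlapping candidates. The 30-nt maximum motif length, the minimum loop length of 1, and the names `G4Motif` and `find_g4_motifs` are assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class G4Motif(NamedTuple):
    start: int                   # 0-based start position in the input sequence
    tetrads: int                 # x: G-tract length = number of stacked G-tetrads
    loops: Tuple[int, int, int]  # (y1, y2, y3) loop lengths
    sequence: str


def find_g4_motifs(seq: str, min_tetrads: int = 2, max_length: int = 30,
                   min_loop: int = 1) -> List[G4Motif]:
    """Enumerate all (possibly overlapping) matches of GxNy1GxNy2GxNy3Gx.

    Hypothetical helper; max_length and min_loop are illustrative defaults.
    """
    seq = seq.upper()
    hits: List[G4Motif] = []
    for start in range(len(seq)):
        x = min_tetrads
        # four G-tracts plus three loops must fit within max_length
        while 4 * x + 3 * min_loop <= max_length and seq.startswith("G" * x, start):
            tract = "G" * x
            budget = max_length - 4 * x  # nucleotides left for the three loops
            for y1 in range(min_loop, budget - 2 * min_loop + 1):
                p2 = start + x + y1
                if not seq.startswith(tract, p2):
                    continue
                for y2 in range(min_loop, budget - y1 - min_loop + 1):
                    p3 = p2 + x + y2
                    if not seq.startswith(tract, p3):
                        continue
                    for y3 in range(min_loop, budget - y1 - y2 + 1):
                        p4 = p3 + x + y3
                        if seq.startswith(tract, p4):
                            end = p4 + x
                            hits.append(G4Motif(start, x, (y1, y2, y3), seq[start:end]))
            x += 1
    return hits
```

For example, `find_g4_motifs("GGAGGAGGAGG")` returns a single candidate with two tetrads and three 1-nt loops. Loops are allowed to contain guanines, so G-rich sequences typically yield many overlapping candidates, which is one reason a scoring step is needed.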
While quadruplexes with at least three G-tetrads have been accepted as stable structures, two-tetrad quadruplexes are not uncommon [70, 71]. In fact, stable two-tetrad RNA G-quadruplexes capable of significantly influencing gene expression in vivo have been reported. Lower stability may even allow more sensitive control of gene expression. However, two-tetrad motifs are expected to be far more prevalent in genomes than three-tetrad motifs, so we employed two approaches to weed out potential false positive predictions. First, all predicted G-quadruplexes with a G-score below a threshold of 13, representing the bottom 25% of all G-quadruplexes predicted in our lab across the entire human transcriptome (data not presented), were discarded. Second, only predicted G-quadruplexes that are phylogenetically conserved across at least three mammalian MECP2 orthologs were analyzed, providing support for our predictions.
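The two filters (a G-score threshold of 13 and conservation in at least three orthologs) can be expressed as a short Python sketch. G-scores are taken as given, for example copied from QGRS Mapper output, and are not recomputed here. The grouping of homologous predictions into a shared `group` label, the order of the filters (threshold applied to each prediction before counting species), and the class and function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

G_SCORE_THRESHOLD = 13  # predictions scoring below this are discarded
MIN_ORTHOLOGS = 3       # must be conserved in at least 3 of the 4 orthologs


@dataclass(frozen=True)
class PredictedG4:
    group: str      # hypothetical label shared by homologous predictions across orthologs
    species: str
    g_score: float  # taken as given (e.g. QGRS Mapper output); not recomputed here


def filter_predictions(preds: List[PredictedG4]) -> Dict[str, List[PredictedG4]]:
    # Filter 1: drop low-scoring predictions
    kept = [p for p in preds if p.g_score >= G_SCORE_THRESHOLD]
    # Group the surviving predictions by homology group
    groups: Dict[str, List[PredictedG4]] = {}
    for p in kept:
        groups.setdefault(p.group, []).append(p)
    # Filter 2: keep only groups represented in enough distinct species
    return {g: ps for g, ps in groups.items()
            if len({p.species for p in ps}) >= MIN_ORTHOLOGS}
```

Under this ordering, a group with predictions in human, dog, and mouse survives only if all three score at least 13; a single low-scoring ortholog can drop a group below the three-species requirement.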
It is widely accepted that the biological roles of G-quadruplexes depend primarily on their structure and location within the gene rather than on their sequence. The determinants of G-quadruplex homology are therefore expected to be similarities in their specific locations on the aligned transcripts, numbers of tetrads, loop lengths, and overall lengths, and these criteria were adopted to identify evolutionarily conserved G-quadruplexes.
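The four homology criteria can be combined into a single pairwise test, sketched below in Python. The text does not state numeric tolerances, so the values used here (a 10-column positional shift, a 2-nt difference per loop, a 4-nt difference in overall length, and an exact match in the number of tetrads) are illustrative assumptions, as are the class and function names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AlignedG4:
    aln_start: int               # start column on the aligned transcripts
    tetrads: int                 # number of G-tetrads
    loops: Tuple[int, int, int]  # loop lengths (y1, y2, y3)

    @property
    def length(self) -> int:
        return 4 * self.tetrads + sum(self.loops)


def is_homologous(a: AlignedG4, b: AlignedG4, max_shift: int = 10,
                  max_loop_diff: int = 2, max_length_diff: int = 4) -> bool:
    """Compare two predicted G-quadruplexes on the four criteria in the text.
    All tolerance values are illustrative assumptions."""
    if abs(a.aln_start - b.aln_start) > max_shift:  # location on the alignment
        return False
    if a.tetrads != b.tetrads:                      # number of tetrads
        return False
    if any(abs(la - lb) > max_loop_diff for la, lb in zip(a.loops, b.loops)):  # loop lengths
        return False
    return abs(a.length - b.length) <= max_length_diff  # overall length
```

Because sequence identity is deliberately not compared, two motifs with different loop sequences but matching positions and architecture are still counted as homologous, which reflects the structure-and-location view stated above.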
Polyadenylation signal and site mapping
Information on poly(A) signals and sites was obtained from either the NCBI Nucleotide database records or the polyA_DB database, which reports evolutionarily conserved sites.
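For orientation, candidate poly(A) signals can be located by scanning for the canonical AAUAAA hexamer and its common AUUAAA variant. This sketch is not the procedure used here (signal and site information came from NCBI records and polyA_DB); `find_pas` is a hypothetical helper, and a hexamer match alone does not show that a signal is actually used.

```python
import re

# Canonical poly(A) signal hexamer and its most frequent variant.
PAS_HEXAMERS = ("AAUAAA", "AUUAAA")


def find_pas(utr):
    """Return sorted (position, hexamer) pairs for candidate poly(A) signals.

    DNA input is converted to RNA (T -> U); a zero-width lookahead is used
    so that overlapping occurrences are all reported.
    """
    rna = utr.upper().replace("T", "U")
    hits = []
    for hexamer in PAS_HEXAMERS:
        for m in re.finditer(f"(?={hexamer})", rna):
            hits.append((m.start(), hexamer))
    return sorted(hits)


print(find_pas("CCAATAAAGGATTAAACC"))  # [(2, 'AAUAAA'), (10, 'AUUAAA')]
```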
AU-rich element mapping
AU-rich elements (AREs) were mapped onto the MECP2 mRNA orthologs using the ARED database [73, 74].
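AREs are commonly described in terms of the AUUUA pentamer core. The sketch below locates overlapping AUUUA pentamers and groups nearby ones; it is a simplification and does not reproduce the ARED pattern definitions. Both function names and the clustering rule (`max_step`) are hypothetical.

```python
import re


def find_are_pentamers(utr):
    """0-based start positions of every AUUUA pentamer (overlaps included)."""
    rna = utr.upper().replace("T", "U")
    return [m.start() for m in re.finditer(r"(?=AUUUA)", rna)]


def cluster_pentamers(starts, max_step=4):
    """Group pentamers whose starts are at most max_step nt apart.

    max_step=4 joins overlapping repeats such as AUUUAUUUA; this cluster
    rule is a hypothetical simplification, not the ARED definition.
    """
    clusters = []
    for s in starts:
        if clusters and s - clusters[-1][-1] <= max_step:
            clusters[-1].append(s)
        else:
            clusters.append([s])
    return clusters


starts = find_are_pentamers("AUUUAUUUAUUUAGGGGGAUUUA")
print(starts)                     # [0, 4, 8, 18]
print(cluster_pentamers(starts))  # [[0, 4, 8], [18]]
```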
Mapping microRNA target sites
MicroRNA target sites were mapped to the 3′-UTRs of the MECP2 mRNA orthologs using TargetScan [75, 76], which reports target sites conserved across multiple species.
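TargetScan predictions are built on seed matching, in which the target site is complementary to miRNA nucleotides 2-7 or 2-8, optionally followed by an A opposite miRNA position 1. The sketch below classifies such sites as 8mer, 7mer-m8, 7mer-A1, or 6mer for a single miRNA. It is not TargetScan: conservation across species and context scoring are not modelled, the function names are hypothetical, and the miRNA string is a toy input.

```python
COMP = {"A": "U", "C": "G", "G": "C", "U": "A"}


def revcomp(rna):
    """Reverse complement of an RNA string."""
    return "".join(COMP[b] for b in reversed(rna))


def seed_sites(utr, mirna):
    """Classify seed-matched sites of one miRNA in a 3'-UTR.

    Returns (start, site_type) pairs with 0-based starts; conservation and
    context scoring are not modelled.
    """
    utr = utr.upper().replace("T", "U")
    mirna = mirna.upper().replace("T", "U")
    core = revcomp(mirna[1:7])  # pairs with miRNA nucleotides 2-7
    m8 = COMP[mirna[7]]         # would pair with miRNA nucleotide 8
    sites = []
    i = utr.find(core)
    while i != -1:
        has_m8 = i > 0 and utr[i - 1] == m8      # position-8 match lies 5' of the core
        j = i + len(core)
        has_a1 = j < len(utr) and utr[j] == "A"  # A opposite miRNA position 1
        if has_m8 and has_a1:
            kind = "8mer"
        elif has_m8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((i - 1 if has_m8 else i, kind))
        i = utr.find(core, i + 1)
    return sites


toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # toy input for illustration
print(seed_sites("GGCUACCUCAGGUUUACCUCUU", toy_mirna))  # [(2, '8mer'), (14, '6mer')]
```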
JB was a high school student when the project began; he is now studying at Carnegie Mellon University. LD is a Professor of Mathematics and Computer Science at Ramapo College of New Jersey.
Abbreviations
ARE-mediated mRNA degradation
Amyloid precursor protein
Deoxyribonucleic acid
Exonic splicing enhancer
Fragile X mental retardation 1
Fragile X mental retardation protein
G-rich sequence database
Heterogeneous nuclear ribonucleoprotein
Internal ribosome entry site
Methyl CpG binding protein 2
National Center for Biotechnology Information
Online Mendelian Inheritance in Man
Quadruplex forming G-rich sequences
Yin Yang 1
John P. Stevens High School
Ramapo College of New Jersey
Carnegie Mellon University
Chadwick LH, Wade PA: MeCP2 in Rett syndrome: transcriptional repressor or chromatin architectural protein? Curr Opin Genet Dev. 2007, 17: 121-125. 10.1016/j.gde.2007.02.003.
Ben Zeev Ghidoni B: Rett syndrome. Child Adolesc Psychiatr Clin N Am. 2007, 16: 723-743. 10.1016/j.chc.2007.03.004.
Amberger J, Bocchini CA, Scott AF, Hamosh A: McKusick’s Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res. 2009, 37: D793-D796. 10.1093/nar/gkn665.
Hoffbuhr K, Devaney JM, LaFleur B, Sirianni N, Scacheri C, Giron J, Schuette J, Innis J, Marino M, Philippart M, Narayanan V, Umansky R, Kronn D, Hoffman EP, Naidu S: MeCP2 mutations in children with and without the phenotype of Rett syndrome. Neurology. 2001, 56: 1486-1495. 10.1212/WNL.56.11.1486.
Coutinho AM, Oliveira G, Katz C, Feng J, Yan J, Yang C, Marques C, Ataide A, Miguel TS, Borges L, Almeida J, Correia C, Currais A, Bento C, Mota-Vieira L, Temudo T, Santos M, Maciel P, Sommer SS, Vicente AM: MECP2 coding sequence and 3’UTR variation in 172 unrelated autistic patients. Am J Med Genet B Neuropsychiatr Genet. 2007, 144B: 475-483. 10.1002/ajmg.b.30490.
Gellert M, Lipsett MN, Davies DR: Helix formation by guanylic acid. Proc Natl Acad Sci U S A. 1962, 48: 2013-2018. 10.1073/pnas.48.12.2013.
Patel DJ, Phan AT, Kuryavyi V: Human telomere, oncogenic promoter and 5’-UTR G-quadruplexes: diverse higher order DNA and RNA targets for cancer therapeutics. Nucleic Acids Res. 2007, 35: 7429-7455. 10.1093/nar/gkm711.
Faudale M, Cogoi S, Xodo LE: Photoactivated cationic alkyl-substituted porphyrin binding to g4-RNA in the 5’-UTR of KRAS oncogene represses translation. Chem Commun (Camb). 2012, 48: 874-876. 10.1039/c1cc15850c.
Baral A, Kumar P, Pathak R, Chowdhury S: Emerging trends in G-quadruplex biology - role in epigenetic and evolutionary events. Mol Biosyst. 2013, 9 (7): 1568-1575. 10.1039/c3mb25492e.
Kumar P, Yadav VK, Baral A, Kumar P, Saha D, Chowdhury S: Zinc-finger transcription factors are associated with guanine quadruplex motifs in human, chimpanzee, mouse and rat promoters genome-wide. Nucleic Acids Res. 2011, 39: 8005-8016. 10.1093/nar/gkr536.
Saunders CJ, Friez MJ, Patterson M, Nzabi M, Zhao W, Bi C: Allele drop-out in the MECP2 gene due to G-quadruplex and i-motif sequences when using polymerase chain reaction-based diagnosis for Rett syndrome. Genet Test Mol Biomarkers. 2010, 14: 241-247. 10.1089/gtmb.2009.0178.
Biffi G, Tannahill D, McCafferty J, Balasubramanian S: Quantitative visualization of DNA G-quadruplex structures in human cells. Nat Chem. 2013, 5: 182-186. 10.1038/nchem.1548.
Wieland M, Hartig JS: RNA quadruplex-based modulation of gene expression. Chem Biol. 2007, 14: 757-763. 10.1016/j.chembiol.2007.06.005.
Mergny JL, De Cian A, Ghelab A, Sacca B, Lacroix L: Kinetics of tetramolecular quadruplexes. Nucleic Acids Res. 2005, 33: 81-94. 10.1093/nar/gki148.
Bugaut A, Balasubramanian S: 5’-UTR RNA G-quadruplexes: translation regulation and targeting. Nucleic Acids Res. 2012, 40: 4727-4741. 10.1093/nar/gks068.
Bonnal S, Schaeffer C, Créancier L, Clamens S, Moine H, Prats AC, Vagner S: A single internal ribosome entry site containing a G quartet RNA structure drives fibroblast growth factor 2 gene expression at four alternative translation initiation codons. J Biol Chem. 2003, 278: 39330-39336. 10.1074/jbc.M305580200.
Morris MJ, Negishi Y, Pazsint C, Schonhoft JD, Basu S: An RNA G-quadruplex is essential for cap-independent translation initiation in human VEGF IRES. J Am Chem Soc. 2010, 132: 17831-17839. 10.1021/ja106287x.
Kumari S, Bugaut A, Huppert JL, Balasubramanian S: An RNA G-quadruplex in the 5’ UTR of the NRAS proto-oncogene modulates translation. Nat Chem Biol. 2007, 3: 218-221. 10.1038/nchembio864.
Lammich S, Kamp F, Wagner J, Nuscher B, Zilow S, Ludwig AK, Willem M, Haass C: Translational repression of the disintegrin and metalloprotease ADAM10 by a stable G-quadruplex secondary structure in its 5’-untranslated region. J Biol Chem. 2011, 286: 45063-45072. 10.1074/jbc.M111.296921.
Halder K, Wieland M, Hartig JS: Predictable suppression of gene expression by 5’-UTR-based RNA quadruplexes. Nucleic Acids Res. 2009, 37: 6811-6817. 10.1093/nar/gkp696.
Endoh T, Kawasaki Y, Sugimoto N: Stability of RNA quadruplex in open reading frame determines proteolysis of human estrogen receptor alpha. Nucleic Acids Res. 2013, 41 (12): 6222-6231. 10.1093/nar/gkt286.
Arhin GK, Boots M, Bagga PS, Milcarek C, Wilusz J: Downstream sequence elements with different affinities for the hnRNP H/H’ protein influence the processing efficiency of mammalian polyadenylation signals. Nucleic Acids Res. 2002, 30: 1842-1850. 10.1093/nar/30.8.1842.
Subramanian M, Rage F, Tabet R, Flatter E, Mandel JL, Moine H: G-quadruplex RNA structure as a signal for neurite mRNA targeting. EMBO Rep. 2011, 12: 697-704. 10.1038/embor.2011.76.
Huijbregts L, Roze C, Bonafe G, Houang M, Le Bouc Y, Carel JC, Leger J, Alberti P, Roux N: DNA polymorphisms of the KiSS1 3’ untranslated region interfere with the folding of a G-rich sequence into G-quadruplex. Mol Cell Endocrinol. 2012, 351: 239-248. 10.1016/j.mce.2011.12.014.
Didiot MC, Tian Z, Schaeffer C, Subramanian M, Mandel JL, Moine H: The G-quartet containing FMRP binding site in FMR1 mRNA is a potent exonic splicing enhancer. Nucleic Acids Res. 2008, 36: 4902-4912. 10.1093/nar/gkn472.
Fisette JF, Montagna DR, Mihailescu MR, Wolfe MS: A G-rich element forms a G-quadruplex and regulates BACE1 mRNA alternative splicing. J Neurochem. 2012, 121: 763-773. 10.1111/j.1471-4159.2012.07680.x.
Kikin O, D’Antonio L, Bagga PS: QGRS Mapper: a web-based server for predicting G-quadruplexes in nucleotide sequences. Nucleic Acids Res. 2006, 34: W676-W682. 10.1093/nar/gkl253.
Kikin O, Zappala Z, D’Antonio L, Bagga PS: GRSDB2 and GRS_UTRdb: databases of quadruplex forming G-rich sequences in pre-mRNAs and mRNAs. Nucleic Acids Res. 2008, 36: D141-D148. 10.1093/nar/gkn705.
Huppert JL, Balasubramanian S: Prevalence of quadruplexes in the human genome. Nucleic Acids Res. 2005, 33: 2908-2916. 10.1093/nar/gki609.
Huppert JL: Structure, location and interactions of G-quadruplexes. FEBS J. 2010, 277: 3452-3458. 10.1111/j.1742-4658.2010.07758.x.
Huppert JL, Bugaut A, Kumari S, Balasubramanian S: G-quadruplexes: the beginning and end of UTRs. Nucleic Acids Res. 2008, 36: 6260-6268. 10.1093/nar/gkn511.
Zhang R, Su B: Small but influential: the role of microRNAs on gene regulatory network and 3’UTR evolution. J Genet Genomics. 2009, 36: 1-6. 10.1016/S1673-8527(09)60001-1.
Wada R, Akiyama Y, Hashimoto Y, Fukamachi H, Yuasa Y: miR-212 is downregulated and suppresses methyl-CpG-binding protein MeCP2 in human gastric cancer. Int J Cancer. 2010, 127: 1106-1114.
Khabar KS: The AU-rich transcriptome: more than interferons and cytokines, and its role in disease. J Interferon Cytokine Res. 2005, 25: 1-10. 10.1089/jir.2005.25.1.
Huang W, Smaldino PJ, Zhang Q, Miller LD, Cao P, Stadelman K, Wan M, Giri B, Lei M, Nagamine Y, Vaughn JP, Akman SA, Sui G: Yin Yang 1 contains G-quadruplex structures in its promoter and 5’-UTR and its expression is modulated by G4 resolvase 1. Nucleic Acids Res. 2011, 40 (3): 1033-1049.
Saxena A, de Lagarde D, Leonard H, Williamson SL, Vasudevan V, Christodoulou J, Thompson E, MacLeod P, Ravine D: Lost in translation: translational interference from a recurrent mutation in exon 1 of MECP2. J Med Genet. 2006, 43: 470-477. 10.1136/jmg.2005.036244.
Abdelmohsen K, Tominaga K, Lee EK, Srikantan S, Kang MJ, Kim MM, Selimyan R, Martindale JL, Yang X, Carrier F, Zhan M, Becker KG, Gorospe M: Enhanced translation by nucleolin via G-rich elements in coding and non-coding regions of target mRNAs. Nucleic Acids Res. 2011, 39: 8513-8530. 10.1093/nar/gkr488.
Darnell JC, Jensen KB, Jin P, Brown V, Warren ST, Darnell RB: Fragile X mental retardation protein targets G quartet mRNAs important for neuronal function. Cell. 2001, 107: 489-499. 10.1016/S0092-8674(01)00566-9.
Wang H, Ku L, Osterhout DJ, Li W, Ahmadian A, Liang Z, Feng Y: Developmentally-programmed FMRP expression in oligodendrocytes: a potential role of FMRP in regulating translation in oligodendroglia progenitors. Hum Mol Genet. 2004, 13: 79-89.
Nishimura Y, Martin CL, Vazquez-Lopez A, Spence SJ, Alvarez-Retuerto AI, Sigman M, Steindler C, Pellegrini S, Schanen NC, Warren ST, Geschwind DH: Genome-wide expression profiling of lymphoblastoid cell lines distinguishes different forms of autism and reveals shared pathways. Hum Mol Genet. 2007, 16: 1682-1698. 10.1093/hmg/ddm116.
Simonsson T: G-quadruplex DNA structures–variations on a theme. Biol Chem. 2001, 382: 621-628.
Coy JF, Sedlacek Z, Bachner D, Delius H, Poustka A: A complex pattern of evolutionary conservation and alternative polyadenylation within the long 3’-untranslated region of the methyl-CpG-binding protein 2 gene (MeCP2) suggests a regulatory role in gene expression. Hum Mol Genet. 1999, 8: 1253-1262. 10.1093/hmg/8.7.1253.
Bagga PS, Arhin GK, Wilusz J: DSEF-1 is a member of the hnRNP H family of RNA-binding proteins and stimulates pre-mRNA cleavage and polyadenylation in vitro. Nucleic Acids Res. 1998, 26: 5343-5350. 10.1093/nar/26.23.5343.
Millevoi S, Decorsière A, Loulergue C, Iacovoni J, Bernat S, Antoniou M, Vagner S: A physical and functional link between splicing factors promotes pre-mRNA 3’ end processing. Nucleic Acids Res. 2009, 37: 4672-4683. 10.1093/nar/gkp470.
Decorsière A, Cayrel A, Vagner S, Millevoi S: Essential role for the interaction between hnRNP H/F and a G quadruplex in maintaining p53 pre-mRNA 3’-end processing and function during DNA damage. Genes Dev. 2011, 25: 220-225. 10.1101/gad.607011.
Newnham CM, Hall-Pogar T, Liang S, Wu J, Tian B, Hu J, Lutz CS: Alternative polyadenylation of MeCP2: influence of cis-acting elements and trans-acting factors. RNA Biol. 2010, 7: 361-372. 10.4161/rna.7.3.11564.
Lattmann S, Giri B, Vaughn JP, Akman SA, Nagamine Y: Role of the amino terminal RHAU-specific motif in the recognition and resolution of guanine quadruplex-RNA by the DEAH-box RNA helicase RHAU. Nucleic Acids Res. 2010, 38: 6219-6233. 10.1093/nar/gkq372.
Bagga PS, Ford LP, Chen F, Wilusz J: The G-rich auxiliary downstream element has distinct sequence and position requirements and mediates efficient 3’ end pre-mRNA processing through a trans-acting factor. Nucleic Acids Res. 1995, 23: 1625-1631. 10.1093/nar/23.9.1625.
Veraldi KL, Arhin GK, Martincic K, Chung-Ganster LH, Wilusz J, Milcarek C: hnRNP F influences binding of a 64-kilodalton subunit of cleavage stimulation factor to mRNA precursors in mouse B cells. Mol Cell Biol. 2001, 21: 1228-1238. 10.1128/MCB.21.4.1228-1238.2001.
Han K, Gennarino VA, Lee Y, Pang K, Hashimoto-Torii K, Choufani S, Raju CS, Oldham MC, Weksberg R, Rakic P, Liu Z, Zoghbi HY: Human-specific regulation of MeCP2 levels in fetal brains by microRNA miR-483-5p. Genes Dev. 2013, 27: 485-490. 10.1101/gad.207456.112.
Stoecklin G, Colombi M, Raineri I, Leuenberger S, Mallaun M, Schmidlin M, Gross B, Lu M, Kitamura T, Moroni C: Functional cloning of BRF1, a regulator of ARE-dependent mRNA turnover. EMBO J. 2002, 21: 4709-4718. 10.1093/emboj/cdf444.
Peng SS, Chen CY, Xu N, Shyu AB: RNA stabilization by the AU-rich element binding protein, HuR, an ELAV protein. EMBO J. 1998, 17: 3461-3470. 10.1093/emboj/17.12.3461.
Bindra RS, Wang JTL, Bagga PS: Bioinformatics methods for studying microRNA and ARE mediated regulation of post-transcriptional gene expression. Int J Knowl Discov Bioinform. 2010, 1: 97-112.
von Roretz C, Gallouzi IE: Decoding ARE-mediated decay: is microRNA part of the equation? J Cell Biol. 2008, 181: 189-194. 10.1083/jcb.200712054.
Chou MY, Rooke N, Turck CW, Black DL: hnRNP H is a component of a splicing enhancer complex that activates a c-src alternative exon in neuronal cells. Mol Cell Biol. 1999, 19: 69-77.
Marcel V, Tran PL, Sagne C, Martel-Planche G, Vaslin L, Teulade-Fichou MP, Hall J, Mergny JL, Hainaut P, Van Dyck E: G-quadruplex structures in TP53 intron 3: role in alternative splicing and in production of p53 mRNA isoforms. Carcinogenesis. 2011, 32: 271-278. 10.1093/carcin/bgq253.
Acland AAR, Barrett T, Beck J, Benson DA, Bollin C, Bolton E, Bryant SH, Canese K, Church DM, Clark K, DiCuccio M, Dondoshansky I, Federhen S, Feolo M, Geer LY, Gorelenkov V, Hoeppner M, Johnson M, Kelly C, Khotomlianski V, Kimchi A, Kimelman M, Kitts P, Krasnov S, Kuznetsov A, Landsman D, Lipman DJ, Lu Z, Madden TL, Madej T, et al: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2013, 41: D8-D20.
Pruitt KD, Tatusova T, Brown GR, Maglott DR: NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2012, 40: D130-D135. 10.1093/nar/gkr1079.
Kankia BI, Barany G, Musier-Forsyth K: Unfolding of DNA quadruplexes induced by HIV-1 nucleocapsid protein. Nucleic Acids Res. 2005, 33: 4395-4403. 10.1093/nar/gki741.
Zarudnaya MI, Kolomiets IM, Potyahaylo AL, Hovorun DM: Downstream elements of mammalian pre-mRNA polyadenylation signals: primary, secondary and higher-order structures. Nucleic Acids Res. 2003, 31: 1375-1386. 10.1093/nar/gkg241.
Lee JY, Yeh I, Park JY, Tian B: PolyA_DB 2: mRNA polyadenylation sites in vertebrate genes. Nucleic Acids Res. 2007, 35: D165-D168. 10.1093/nar/gkl870.
Halees AS, El-Badrawi R, Khabar KS: ARED Organism: expansion of ARED reveals AU-rich element cluster variations between human and mouse. Nucleic Acids Res. 2008, 36: D137-D140. 10.1093/nar/gkn610.
Bakheet T, Williams BR, Khabar KS: ARED 3.0: the large and diverse AU-rich transcriptome. Nucleic Acids Res. 2006, 34: D111-D114. 10.1093/nar/gkj052.
Lewis BP, Burge CB, Bartel DP: Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005, 120: 15-20. 10.1016/j.cell.2004.12.035.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.