Germline mutations of the NF1 gene cause the tumour predisposition syndrome neurofibromatosis type 1 (NF1), which affects 1 in 3,000–4,000 individuals worldwide. The NF1 gene, spanning 283 kb on chromosome 17q11.2, contains 61 exons that give rise to a 12-kb mRNA transcript encoding neurofibromin [1, 2]. This 327-kDa (2,818-amino acid) protein is expressed in most tissues.
Neurofibromin is a tumour suppressor protein, reflecting its role as a key negative regulator of the cellular Ras signalling pathway (reviewed by Bennett et al.). More specifically, it is a Ras-specific GTPase-activating protein (GAP), with strong structural and sequence homologies to the GAP protein superfamily. It functions by downregulating Ras, thereby reducing overall cellular mitogenic signalling via the Ras pathway. Thus, any NF1 gene mutation that inactivates neurofibromin function may be expected to increase cellular levels of active Ras-GTP significantly, leading to uncontrolled cell growth and, potentially, tumorigenesis. Germline mutations in the SPRED1 gene have previously been detected in individuals with clinical features overlapping those of NF1 patients.
Consistent with Knudson's two-hit hypothesis, NF1 patients harbouring a heterozygous germline NF1 mutation develop neurofibromas upon somatic mutation of the second, wild-type NF1 allele. The somatic loss of the second NF1 allele in the progenitor cell (either the Schwann cell or its precursor), combined with haploinsufficiency in a variety of supporting cells [2, 7], is then required for tumour formation. The mutation rate at the NF1 locus is one of the highest reported in any human disorder; almost 50% of all NF1 patients exhibit a de novo NF1 mutation. The NF1 mutational spectrum is shaped in large part, and often in remarkably predictable ways, by the local DNA sequence environment. Nearly 1,300 different inherited mutations of the NF1 gene have been reported as a cause of NF1; these vary in size from deletions spanning more than a megabase to subtle single base-pair substitutions that alter an encoded amino acid or the function of a splice junction. In classical NF1 patients, NF1 gene mutations are detectable in 50% to 95% of individuals, depending upon the mutation detection techniques employed and the source of tissue used for analysis [2, 10]. In our own study, the underlying pathogenic NF1 mutation was detected in only approximately 92% of NF1 patients, despite sequencing the exons, splice junctions, untranslated regions and proximal promoter region of the NF1 gene, and screening for gross gene deletions and rearrangements as well as subtle lesions [11–13]. The undetected mutations in the remaining patients could in principle reside within a remote regulatory element at some distance from the NF1 gene.
In the human genome, not all regulatory elements occur immediately 5′ to the genes that they regulate; indeed, some are located at considerable distances from their cognate genes. A variety of micro-lesions causing human inherited disease have been found to occur >10 kb upstream of the corresponding disease genes. These include a total of eight mutations within a 1-kb region (termed the long-range or limb-specific enhancer, ZRS) approximately 979 kb 5′ to the transcription initiation site of the sonic hedgehog (SHH) gene. Far-upstream polymorphic variants that influence gene expression and impact on disease have also been documented. For example, the C>T functional single nucleotide polymorphism (SNP) 14.5 kb upstream of the interferon regulatory factor 6 (IRF6) gene, which is associated with cleft palate, alters the binding of transcription factor AP-2α. Similarly, a T>C functional SNP approximately 6 kb upstream of the α-globin-like HBM gene creates a binding site for the erythroid-specific transcription factor GATA1 and interferes with the activation of the downstream α-globin genes. Finally, a T>G functional SNP approximately 335 kb upstream of the MYC gene increases the risk of colorectal and prostate cancer by altering the binding strength of transcription factors TCF4 and/or TCF7L2 within a transcriptional enhancer. On the basis of these findings in a number of different genes, we postulated that remotely acting regulatory elements might be present far upstream of the NF1 gene and that these could be mutated in NF1 patients in whom no pathogenic mutations had been detected.
The NF1 gene contains a TATA-less promoter within a classic CpG island that extends from the proximal promoter into exon 1, and a 454-bp 5′ untranslated region. The CpG island is normally unmethylated, and the gene is actively transcribed in all tissues so far examined. Despite the high degree of conservation of the proximal regulatory sequences in rodent orthologs [20, 21], the distal upstream sequence, potentially harbouring additional regulatory elements, diverges quite significantly between different mammalian species.
Histone H3K27ac serves to distinguish active enhancers from inactive ones. Information on various functional elements encoded by the human genome, including regions enriched in histone H3K27ac, has recently become available through the ENCODE project resources (ENCODE Project Consortium 2011; http://genome.ucsc.edu/index.html), a comprehensive collection of experimental data produced using biochemical assays mapped to human genome data. High-throughput techniques that identify features characteristic of enhancers (e.g. specific histone modifications) have yielded up to 1.4 million putative enhancers in the human genome [23, 24].
It has long been hypothesized that communication between widely spaced genomic elements is facilitated by the spatial organization of chromosomes that brings genes and their cognate regulatory elements into close spatial proximity. The development of ‘chromosome conformation capture’ techniques has led to the discovery of many long-range interactions, both intra- and inter-chromosomal (reviewed in ). Recently, Lieberman-Aiden et al. employed a novel approach, termed Hi-C and based on chromosome conformation capture techniques, to probe the 3D architecture of the entire human genome. By coupling proximity-based ligation with massively parallel sequencing, they produced a catalogue of approximately 8.4 million inter- and intra-chromosomal interacting fragments. Approximately 6.7 million of these fragments were found to be of long range (>20 kb apart). A genome-wide matrix of DNA-DNA interactions (also known as a contact map), created by dividing the genome into 1-Mb regions and counting the number of interactions between these 1-Mb regions, was subsequently analyzed by Yaffe and Tanay to eliminate various biases in experimental procedure.
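The construction of a contact map by binning, as described above, can be illustrated with a minimal sketch. It assumes that each interacting fragment pair is given as two (chromosome, position) tuples; the function names and the toy coordinates are hypothetical and are not taken from the Hi-C dataset. The sketch yields raw counts only and does not attempt the bias correction applied by Yaffe and Tanay.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # 1-Mb regions, as described in the text


def bin_of(chrom, pos, bin_size=BIN_SIZE):
    """Return the (chromosome, bin index) that contains a base position."""
    return (chrom, pos // bin_size)


def contact_map(pairs, bin_size=BIN_SIZE):
    """Count interactions between pairs of bins.

    `pairs` is an iterable of ((chrom_a, pos_a), (chrom_b, pos_b)) tuples,
    one per interacting fragment pair. Each key is stored in sorted order
    so that (A, B) and (B, A) are counted as the same contact.
    """
    counts = Counter()
    for (chrom_a, pos_a), (chrom_b, pos_b) in pairs:
        a = bin_of(chrom_a, pos_a, bin_size)
        b = bin_of(chrom_b, pos_b, bin_size)
        counts[tuple(sorted((a, b)))] += 1
    return counts


# Hypothetical fragment pairs; coordinates are illustrative only.
toy_pairs = [
    (("chr17", 29_500_000), ("chr17", 28_200_000)),
    (("chr17", 28_900_000), ("chr17", 29_100_000)),
    (("chr17", 29_700_000), ("chr2", 5_300_000)),
]
cmap = contact_map(toy_pairs)
```

In this toy example, the first two pairs fall into the same pair of 1-Mb bins on chromosome 17 and are therefore counted together, whereas the third pair contributes a single inter-chromosomal contact.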
In this study, we postulated that remotely acting regulatory elements might occur within DNA fragments that interact with the DNA fragment containing the NF1 gene, and further that these elements might be located within an H3K27ac-enriched region. We first combined long-range DNA-DNA interaction data and ENCODE resources to predict in silico the location of novel remotely acting regulatory regions upstream of the NF1 gene. We then screened these regions for mutation in those NF1 patients in whom no mutations had been found in either the NF1 or SPRED1 genes.
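The in silico step outlined above amounts, in essence, to intersecting two sets of genomic intervals: fragments that interact with the NF1-containing fragment, and H3K27ac-enriched regions. A minimal sketch of such an intersection is given below. It assumes half-open (chromosome, start, end) intervals; the function names and coordinates are hypothetical and do not represent the actual pipeline or data used in this study.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def candidate_regions(interacting, enriched):
    """Return the parts of enriched regions that lie within interacting fragments.

    Both arguments are lists of (chrom, start, end) tuples. The result is a
    sorted list of (chrom, start, end) tuples, one per overlapping pair.
    """
    hits = []
    for chrom_f, f_start, f_end in interacting:
        for chrom_e, e_start, e_end in enriched:
            if chrom_f == chrom_e and overlaps(f_start, f_end, e_start, e_end):
                # Keep only the shared portion of the two intervals.
                hits.append((chrom_e, max(f_start, e_start), min(f_end, e_end)))
    return sorted(hits)


# Hypothetical intervals; coordinates are illustrative only.
interacting_fragments = [("chr17", 1000, 5000), ("chr17", 9000, 12000)]
h3k27ac_regions = [("chr17", 4500, 5500), ("chr17", 7000, 8000), ("chr2", 1000, 5000)]

candidates = candidate_regions(interacting_fragments, h3k27ac_regions)
```

The nested loop is adequate for a small number of intervals around a single locus; a genome-wide analysis would more plausibly rely on sorted intervals or an interval tree.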