Orientation, distance, regulation and function of neighbouring genes
© Henry Stewart Publications 2009
Received: 4 August 2008
Accepted: 4 August 2008
Published: 1 January 2009
The sequencing of the human genome has allowed us to observe globally and in detail the arrangement of genes along the chromosomes. There are multiple lines of evidence that this arrangement is not random, both in terms of intergenic distances and orientation of neighbouring genes. We have undertaken a systematic evaluation of the spatial distribution and orientation of known genes across the human genome. We used genome-level information, including phylogenetic conservation, single nucleotide polymorphism density and correlation of gene expression to assess the importance of this distribution. In addition to confirming and extending known properties of the genome, such as the significance of gene deserts and the importance of 'head to head' orientation of gene pairs in proximity, we provide significant new observations that include a smaller average size for intervals separating the 3' ends of neighbouring genes, a correlation of gene expression across tissues for genes as far as 100 kilobases apart and signatures of increasing positive selection with decreasing interval size surprisingly relaxing for intervals smaller than ~500 base pairs. Further, we provide extensive graphical representations of the genome-wide data to allow for observations and comparisons beyond what we address.
With the sequencing of the human genome, [1, 2] the identification of most human genes, a goal of geneticists for many years, has become a reality. Additionally, most human protein coding genes have been placed in the genome through sequence alignments, and their localisation and direction of transcription are known. The positioning of genes within the genome expands what we know for each gene to include the neighbouring genes and genomic context.
Compared to previous work, in this study we expanded the search for evidence of functional importance to all non-overlapping gene orientations and distances and investigated the properties of the intergenic intervals, as well as the genes themselves.
Materials and methods
Gene location data
Our primary data source was the University of California, Santa Cruz (UCSC) genome database and browser (UCSC Genome Bioinformatics, http://genome.ucsc.edu), [25, 26] and we used scripts written in Perl (http://www.perl.org) for data parsing and analysis. We used the March 2006 assembly of the human genome, which was annotated at the time of data acquisition using RefSeq version 21 (National Center for Biotechnology Information, Bethesda, MD) We downloaded information for all genes in RefSeq and used exon coordinates to define their start and end locations. We excluded from the analysis all genes located entirely within other genes. For overlapping genes, we searched for shared exons and, if present, we concatenated the genes into one. If no shared exons were identified, we analysed each gene only in relation to its non-overlapping neighbours. Of the remaining 17,531 intergenic intervals, 173 were removed from the analysis because they contained sequence gaps and/or were across centromeres, leaving us with a final set of 17,358 intervals ranging from one base pair (bp) to ~ 4.9 mega bp (Mbp). The distribution of intervals if genes were positioned at random was calculated using a random number generator to assign 17,358 random points on the total intergenic length (1.7 Mbp) and examining the distribution of the distances between them in multiple iterations.
Phylogenetic conservation data
We retrieved coordinates of phylogenetically conserved elements from the UCSC genome browser in a table generated by the PhastCons algorithm, which uses sequences from 28 species and a phylogenetic hidden Markov model to identify varying lengths of sequence likely to be conserved . Using these and the intergenic interval coordinates, we calculated the fraction of conserved bases for each interval, as well as separately for each half of each interval, closer to each of the neighbouring genes. The latter was performed to investigate whether in tail to head (TH) intervals there is a difference in conservation towards the end of one gene or the beginning of the next, the latter being expected to carry more regulatory elements.
Conserved transcription factor binding sites (TFBSs)
We retrieved the coordinates of predicted TFBSs from three-species (human/mouse/rat) comparisons  from the UCSC genome browser. The scores and threshold for identifying TFBSs are computed using the Transfac matrix database (v7.0). A binding site is considered to be conserved across the alignment if its score meets the threshold score for its binding matrix in all three species . We mapped the predicted TFBS locations to the intergenic intervals and also calculated TFBS statistics for each half of each interval.
Expression data from the Genomic Institute of the Novartis Research Foundation - Gene Express Atlas 2 (GNF2)
We obtained the GNF2 data  from the UCSC genome database. This dataset consists of genome-wide gene expression (mRNA) measurements from 79 different human healthy and diseased tissues. We excluded seven tissues because they were neoplastic and could confound our results. Data were available for both genes of the pair for 10,397 pairs. For these, we calculated the correlation coefficient r of the 72 expression level measurements. The calculated r was Fisher transformed to make the correlation coefficients normally distributed and allow the use in statistical tests for comparisons .
Microarray gene expression data
Data on RNA from the superior temporal lobe of 23 brain donors with no gross brain pathology was generated in our laboratory using the Illumina Sentrix HumanRef-8 Expression BeadChips (Illumina, San Diego, CA, USA), containing 24,000 genes recognised by the National Center for Biotechnology Information (NCBI) at the time of production. We extracted total RNA using Trizol (Invitrogen, Carlsbad, CA, USA) with additional purification on RNAeasy columns (Qiagen, Valencia, CA, USA). We assessed the quality of total RNA on an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA), and 0.5 μg of total RNA from each sample was labelled using the Illumina TotalPrep RNA Amplification Kit (Ambion, Austin, TX, USA) in a process of cDNA synthesis and in vitro transcription. We generated and labelled single-stranded RNA (cRNA) by incorporating biotin-16-UTP (Roche Diagnosics GmbH, Mannheim, Germany) and hybridised (16 hours) a total of 0.85 μg of biotin-labelled cRNA to the BeadChips. The hybridised biotinylated cRNA was detected with streptavidin-Cy3 and quantified using the Illumina's BeadStation 500GX Genetic Analysis Systems scanner. The primary Illumina data were returned from the scanner in the form of an '.idat' file, which contains single intensity data values per gene following the computation of a trimmed mean average for each probe type, represented on the array by a variable number of bead probes. We performed preliminary analyses of the scanned data using Illumina BeadStudio software, which returns a detection call, D, based on a comparison between the intensity of a single probe and the intensities of a large number of negative control beads. Genes with calls consistently below D = 0.98 were eliminated from further analysis, leaving data for 11,328 named genes expressed in the temporal lobe for analysis. Normalisation by Z-transformation was performed on each sample/array on a stand-alone basis .
This study was carried out in accordance with the Declaration of Helsinki (2000) of the World Medical Association and was approved by the appropriate institutional review board. Appropriate consent was obtained from human subjects.
Functional relatedness based on gene ontology (GO) and pathway membership data
We used the DAVID Bioinformatics website tools  (DAVID Bioinformatics Resources, http://david.abcc.ncifcrf.gov/home.jsp; National Cancer Institute, Frederick, MD, USA) to generate functional annotation clusters for all human RefSeq genes. We selected two annotation categories for our analysis: GO Biological Processes and the Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathways. For each annotation category, all clusters of size two or larger were considered and similarity scores for each pair of neighbouring genes were calculated. To take account of the fact that some clusters are more broadly defined than others, and thus contain more genes, we increased the score by one divided by the cluster size each time the gene pair was found together in a cluster, thus giving higher scores to pairs in smaller, narrowly defined functional clusters. Binary scores were also used for analysis, only considering whether the pair did or did not co-occur in any cluster.
Single nucleotide polymorphisms (SNPs)
We obtained SNP data from the UCSC genome database, querying for all SNPs available in the NCBI Single Nucleotide Polymorphism database (dbSNP) located within each interval. SNP density was measured in SNPs/kilobase (kb).
Tajima's D values  across the genome have been pre-calculated using data from the Perlegen dataset  and integrated into the UCSC browser . The data are only available for the May 2004 version of the human reference sequence, so this version had to be used for the present analysis. Values are calculated in adjacent 10 kb blocks and are likely to be inflated owing to the SNP selection criteria applied by Perlegen; however, they remain useful for comparisons between regions. We assigned values to the intergenic intervals by calculating the average Tajima's D values across each interval. When the 10 kb block covered the entire intergenic interval, the value of D for that interval was used for the region; when it contained more than one 10 kb block, the D values were weighted according to the size included in the interval and averaged.
We exported 18,123 mapped RefSeq genes from the March 2006 release of the UCSC genome database, which defined a total of 17,358 intergenic intervals between non-overlapping genes, excluding intervals containing centromeres or sequence gaps. A total of 568 gene pairs were overlapping without sharing exons, 510 of them transcribed from opposite strands and most often (375/510) overlapping at their 3' ends (Figure 1). It should be noted that, based on the results of the ENCODE project,  there are probably more overlapping pairs than we have identified using the current annotations. For consistency, as the ENCODE project is limited to a small fraction of the genome, we did not incorporate ENCODE data in our analyses. Therefore, our results must be viewed bearing in mind that more intervals should probably be excluded.
Orientation and size distribution
There were 8,877 TH, 4,366 HH and 4,115 TT intergenic intervals. The apparent deficit in TT intervals is because most (375/568) of the excluded overlapping gene pairs were orientated in TT fashion. Taking this into account, the number of neighbouring genes transcribed in the same orientation was no different from the number of those transcribed in opposite orientations (8,935 versus 8,991; chi-squared test, p = 0.7), and, overall, those transcribed in opposite orientations were equally divided between those with adjacent start sites and those with adjacent 3' untranslated regions (4,490 versus 4,501; chi-squared test, p = 0.9).
Biological processes and molecular function enrichment in HH genes closer than 500 base pairs
6.0 × 10-7
Nucleoside, nucleotide and nucleic acid metabolism
1.2 × 10-6
6.9 × 10-6
1.2 × 10-3
Other intracellular protein traffic
3.6 × 10-3
2.9 × 10-2
Chromatin packaging and remodelling
3.5 × 10-2
Protein complex assembly
3.7 × 10-2
Nucleic acid binding
3.51 × 10-9
1.67 × 10-2
1.97 × 10-2
3.06 × 10-2
Biological processes and molecular functions enriched in the 3,734 genes forming TT pairs closer than 10 kb, a size range significantly enriched in TT gene pairs
2.87 × 10-5
Protein metabolism and modification
4.44 × 10-5
Chromatin packaging and remodelling
5.13 × 10-4
Cell structure and motility
6.28 × 10-4
3.19 × 10-3
Nucleoside, nucleotide and nucleic acid metabolism
4.48 × 10-3
Intracellular protein traffic
9.02 × 10-3
2.01 × 10-2
Select regulatory molecule
5.07 × 10-7
1.16 × 10-6
2.46 × 10-6
1.78 × 10-4
Nucleic acid binding
1.18 × 10-3
Other enzyme regulator
2.10 × 10-3
8.47 × 10-3
1.25 × 10-2
1.63 × 10-2
Non-receptor serine/threonine protein kinase
1.91 × 10-2
We observed an increase in conservation for all types of intervals larger than 50 kb as they increased in size, peaking at around 500 kb. Higher conservation gene deserts has previously been reported [37, 38]. Here, we showed that the increase is gradual, starting well below the conventional gene desert threshold size of 500 kb. The correlation between distance and conservation was strong for intervals greater than 50 kb (r = 0.21; p = 1 × 10-55; distance in logarithmic scale), even when we excluded gene deserts (size 50-500 kb, r = 0.17; p = 6 × 10-32). At around 50 kb, conservation was at a minimum and increased again for all types of interval with decreasing size. Below 1 kb, we observed differences between the three interval types, with TT intervals showing the least conservation (for intervals < 1 kb, conservation in HH > TH, p = 0.003; TH > TT, p = 7.4 × 10-5; HH > TT, p = 6.4 × 10-13). When we examined the conservation on the two halves of TH intervals separately, we found the side adjacent to the 5' end of the next gene (which we term 'TH at H') to have the same degree of conservation as HH intervals, while the side adjacent to the 3' end of the previous gene ('TH at T') was about as conserved as TT intervals, which suggests that most of the conservation is likely to be due to elements regulating the downstream gene in this type of arrangement. Finally, we observed an unexpected reduction in conservation in intervals smaller than 250 bp, which was obvious in all three types, providing the most statistical evidence in HH intervals (HH 0-250 bp compared with HH 250-1,000 bp intervals, p = 3.1 × 10-5; for TT intervals, p = 0.003; and for HH intervals, p = 0.07). The possible implications of this novel observation will be addressed in our discussion.
Density of predicted TFBSs
Gene ontology and gene expression
Using the GNF2 data, we found that neighbouring genes with joint membership in any GO cluster (referred to as GO(+) genes) had stronger expression correlation than GO(-) genes (mean r = 0.19 versus 0.084; p < 10-39). This was most pronounced in TH-orientated genes (mean r = 0.2 versus 0.076; p < 10-34) but was significant in all three categories. It was also more pronounced in short intervals (< 50 kb) but remained significant in long intervals. The level of functional relatedness (see Methods section for scoring algorithm) and the correlation of expression were correlated with each other. This was weak in short intervals (< 50 kb) (r = 0.05; p = 0.056), but stronger in long intervals (r = 0.18; p = 5 × 10-5), especially for HH and TH pairs. Finally, negative expression correlations were not stronger or weaker for GO(+) pairs, suggesting no co-regulations in opposite directions. The microarray data of expression across individuals gave similar results. Neighbouring GO(+) genes had stronger expression correlation than GO(-) genes (mean r = 0.13 versus 0.07; p < 0.02), which was entirely due to TH-orientated genes (mean r = 0.17 versus 0.08; p < 0.004). This correlation of expression was, again, more pronounced in short intervals while in long intervals (50 kb) its strength correlated (but not significantly) with the level of functional relatedness. Negative expression correlations were not stronger or weaker for GO(+) pairs. The increased expression correlations in gene pairs with known functional relationships further supports that their coregulation is likely to reflect their functional relatedness.
We have described the properties of the three types of intergenic intervals with regard to their size distribution, phylogenetic conservation, predicted TFBSs and content of polymorphisms across sizes, as well as the expression and function of the flanking genes. Our data strongly reject the hypothesis of random gene arrangement in the genome.
We found that the size distribution of intergenic intervals is markedly different from the random expectation for all three types of intervals. In agreement with previous reports, [13, 14] we saw a bimodal distribution that clearly argues for the functional significance of the HH gene orientation, especially at short distances. This was further supported by the strong correlation of the expression of genes thus arranged, as shown in Figure 6. Among these genes, we replicated the finding of a very significant enrichment for products involved in DNA metabolism and repair [13, 14]. A novel finding was that TT intervals also have a markedly different size distribution than expected (Figure 2), with many more intervals shorter than 10 kb and a smaller average size than the other types. The strong over-representation of protein-modifying genes coding for hydrolases, kinases and transferases (Table 2) further argues for a biological significance that needs further investigation.
Our analysis confirmed that there are far more large intergenic intervals (gene deserts) than expected under a random gene distribution, and added to previous work by showing that the transition from small to large intergenic sizes is gradual, with no obvious threshold for defining a gene desert. The increase in phylogenetic conservation as interval sizes approach the size of gene deserts was also gradual. The increasing values of Tajima's D suggest that the positive conservation in large intervals and gene deserts is more likely due to balancing rather than purifying selection, a suggestion that is further supported by the lack of change in SNP density in contrast to increased conservation. It would be interesting to examine conservation within primates when enough genome sequences become available to investigate further the natural history of conservation in gene deserts. We found that gene deserts are equally represented in the three types of intervals, and there is no significant correlation of expression of the flanking genes across tissues or among individuals. This suggests that if control is exerted by deserts on the expression of immediate neighbours, it is probably not bidirectional. It is possible that the functional elements located in deserts are there because they need to be far from genes. To explore this possibility, we tested whether conservation near the ends of gene deserts (50 and 100 kb from next gene) is different from conservation in the middle of the desert. We did not observe significant differences, suggesting no strong bias against the presence of putative functional elements close to genes. The answer to the function (or not) of gene deserts remains unclear and further advances in this field of research are necessary.
We found significant correlation of expression of neighbouring genes closer than 100 kb, increasing with decreasing distance for all types of intervals and becoming strongest for HH below 1,000 bp (Figure 6). The similarly increased phylogenetic conservation suggests that part of this co-regulation might be due to DNA sequences. Conservation and co-regulation, however, do not reflect predicted TFBS density, which does not increase except for intervals below 5 Kb, not including TT intervals. Other types of shared regulatory elements and/or the propagation of chromatin states are possible explanations. It is tempting to speculate that, whatever the mechanism, the size distribution favouring smaller intervals and gene deserts might be related to the co-regulation of genes.
We found a stronger correlation of expression when gene pairs were functionally related, suggesting that their positioning and functional relatedness are relevant to their co-expression. The correlation of expression is strongest for intervals shorter than 50 Kb and most pronounced in TH gene pairs, which raises the possibility that it might, in part, be the result of tandem duplications generating paralogs. Such paralogs would be expected to have similar functions and to carry similar regulatory elements in their vicinity, providing an explanation for our observations. The well-discussed HH gene arrangement and co-regulation in short distances appears to be a special case that remains of interest. Our results on TFBS density and conservation show that there are either no shared TBFSs and other regulatory elements in HH pairs or that, if there are such elements, more of them are necessary in these types of pairs.
We observed a reduction in phylogenetic conservation in intervals shorter than ~250 bp in all three types. The opposite pattern was observed in SNP density data, which is consistent with reduced selective pressure at this distance. Tajima's D values  further supported this, as they increased in intervals smaller than 500 bp. The congruence of the three types of data is intriguing, yet it is important to consider some important limitations of the SNP and D data. The increase in SNP density in smaller intervals might reflect increased sequencing efforts in areas close to genes by independent researchers. It is hard to tease apart the number of SNPs detected by large-scale sequencing projects from those derived from smaller-scale gene-directed sequencing and reported in the databases. In our study, SNPs in areas flanking transcripts (often not included in gene sequencing projects) could bias toward higher SNP density, a possibility we cannot exclude. An increased SNP density might also reflect higher mutation rates in CG-rich areas close to gene promoters. We examined the fraction of SNPs forming a CpG with either allelic nucleotide in intervals smaller than 1 kb. We found that 32 per cent were in CpG dinucleotides, with 60 per cent of these (19.2 per cent of total) representing transitions from CG to TG. Among 9,000 random SNPs, 27 per cent were in CpG dinucleotides, with 80 per cent of these (21.8 per cent of total) representing transitions from CG to TG. Therefore, although there were more SNPs involving CpG in the short intervals, as expected by the higher CG content, there were fewer transitions from CG to TG, possibly reflecting the lack of CpG methylation in promoter sequences. Another limitation pertains to the available Tajima's D data that have been generated using the Perlegen dataset,  which is biased toward high allele frequency SNPs, so higher values are expected. Although the significance of the values per se cannot be evaluated, the bias is the same across the genome, allowing for comparisons across regions, which is the strategy we used. Perlegen's ascertainment scheme  might introduce an additional bias for Tajima's D close to genes because a fraction of SNPs were selected to be in transcribed sequences, regardless of allele frequency. These SNPs would influence the entire 10 kb block, and when that extended outside a gene, it would cause a reduction in D, especially in smaller intergenic intervals. This bias, however, is opposite to our observation of increased D in small intervals. We cannot fully address the limitations regarding SNP density; we feel, however, that the congruence of conservation, SNP density and Tajima's D data argues for the existence of a very interesting phenomenon of selective relaxation in very small intergenic intervals.
In conclusion, our systematic survey of the distribution and orientation of genes and the properties of intergenic intervals has revealed previously unknown properties of the genome, which might reflect unexplored regulatory mechanisms and evolutionary forces. As expected for this type of research, our observations have generated more questions than were answered. We hope that, as better annotation of the genome becomes available, our results will provide insight for the exploration and interpretation of the new information, and a stimulus for specific questions to be addressed experimentally.
This work was supported, in part, from an NIA award (R01-AG022099) to D.A., a NARSAD young investigator award to D.A. and an award from the Neurosciences Education and Research Foundation to D.A. We thank Drs David Valle and Andrew McCallion for critical suggestions on the manuscript.
- Venter JC, Adams MD, Myers EW, et al: The sequence of the human genome. Science. 2001, 291: 1304-1351. 10.1126/science.1058040.View ArticlePubMedGoogle Scholar
- Lander ES, Linton LM, Birren B, et al: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921. 10.1038/35057062.View ArticlePubMedGoogle Scholar
- Ben-Shahar Y, Nannapaneni K, Casavant TL, Scheetz TE, Welsh MJ: Eukaryotic operon-like transcription of functionally related genes in Drosophila. Proc Natl Acad Sci USA. 2007, 104: 222-227. 10.1073/pnas.0609683104.PubMed CentralView ArticlePubMedGoogle Scholar
- Blumenthal T: Gene clusters and polycistronic transcription in eukaryotes. Bioessays. 1998, 20: 480-487. 10.1002/(SICI)1521-1878(199806)20:6<480::AID-BIES6>3.0.CO;2-Q.View ArticlePubMedGoogle Scholar
- Blumenthal T: Operons in eukaryotes. Brief Funct Genomic Proteomic. 2004, 3: 199-211. 10.1093/bfgp/3.3.199.View ArticlePubMedGoogle Scholar
- Braastad CD, Leguia M, Hendrickson EA: Ku86 auto-antigen related protein-1 transcription initiates from a CpG island and is induced by p53 through a nearby p53 response element. Nucleic Acids Res. 2002, 30: 1713-1724. 10.1093/nar/30.8.1713.PubMed CentralView ArticlePubMedGoogle Scholar
- Connelly MA, Zhang H, Kieleczawa J, Anderson CW: The promoters for human DNA-PKcs (PRKDC) and MCM4: Divergently transcribed genes located at chromosome 8 band q11. Genomics. 1998, 47: 71-83. 10.1006/geno.1997.5076.View ArticlePubMedGoogle Scholar
- Galgoczy P, Rosenthal A, Platzer M: Human-mouse comparative sequence analysis of the NEMO gene reveals an alternative promoter within the neighboring G6PD gene. Gene. 2001, 271: 93-98. 10.1016/S0378-1119(01)00492-9.View ArticlePubMedGoogle Scholar
- Platzer M, Rotman G, Bauer D, et al: Ataxia-telangiectasia locus: Sequence analysis of 184 kb of human genomic DNA containing the entire ATM gene. Genome Res. 1997, 7: 592-605.PubMedGoogle Scholar
- Shimada T, Fujii H, Lin H: A 165-base pair sequence between the dihydrofolate reductase gene and the divergently transcribed upstream gene is sufficient for bidirectional transcriptional activity. J Biol Chem. 1989, 264: 20171-20174.PubMedGoogle Scholar
- Xu CF, Chambers JA, Solomon E: Complex regulation of the BRCA1 gene. J Biol Chem. 1997, 272: 20994-20997. 10.1074/jbc.272.34.20994.View ArticlePubMedGoogle Scholar
- Adachi N, Lieber MR: Bidirectional gene organization: A common architectural feature of the human genome. Cell. 2002, 109: 807-809. 10.1016/S0092-8674(02)00758-4.View ArticlePubMedGoogle Scholar
- Trinklein ND, Aldred SF, Hartman SJ, et al: An abundance of bidirectional promoters in the human genome. Genome Res. 2004, 4: 62-66.Google Scholar
- Koyanagi KO, Hagiwara M, Itoh T, Gojobori T, Imanishi T: Comparative genomics of bidirectional gene pairs and its implications for the evolution of a transcriptional regulation system. Gene. 2005, 353: 169-176. 10.1016/j.gene.2005.04.027.View ArticlePubMedGoogle Scholar
- Li YY, Yu H, Guo ZM, et al: Systematic analysis of head-to-head gene organization: Evolutionary conservation and potential biological relevance. PLoS Comput Biol. 2006, 2: e74Google Scholar
- Cho RJ, Campbell MJ, Winzeler EA, et al: A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell. 1998, 2: 65-73. 10.1016/S1097-2765(00)80114-8.View ArticlePubMedGoogle Scholar
- Kruglyak S, Tang H: Regulation of adjacent yeast genes. Trends Genet. 2000, 16: 109-111. 10.1016/S0168-9525(99)01941-1.View ArticlePubMedGoogle Scholar
- Williams EJ, Bowles DJ: Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 2004, 14: 1060-1067. 10.1101/gr.2131104.PubMed CentralView ArticlePubMedGoogle Scholar
- Roy PJ, Stuart JM, Lund J, Kim SK: Chromosomal clustering of muscle-expressed genes in Caenorhabditis elegans. Nature. 2002, 418: 975-979.PubMedGoogle Scholar
- Fukuoka Y, Inaoka H, Kohane IS: Inter-species differences of co-expression of neighboring genes in eukaryotic genomes. BMC Genomics. 2004, 5: 4-10.1186/1471-2164-5-4.PubMed CentralView ArticlePubMedGoogle Scholar
- Robb GB, Carson AR, Tai SC, et al: Post-transcriptional regulation of endothelial nitric-oxide synthase by an overlapping anti-sense mRNA transcript. J Biol Chem. 2004, 279: 37982-37996. 10.1074/jbc.M400271200.View ArticlePubMedGoogle Scholar
- Katayama S, Tomaru Y, Kasukawa T, et al: Antisense transcription in the mammalian transcriptome. Science. 2005, 309: 1564-1566.View ArticlePubMedGoogle Scholar
- Yelin R, Dahary D, Sorek R, et al: Widespread occurrence of antisense transcription in the human genome. Nat Biotechnol. 2003, 21: 379-386. 10.1038/nbt808.View ArticlePubMedGoogle Scholar
- Dahary D, Elroy-Stein O, Sorek R: Naturally occurring antisense: Transcriptional leakage or real overlap?. Genome Res. 2005, 15: 364-368. 10.1101/gr.3308405.PubMed CentralView ArticlePubMedGoogle Scholar
- Kent WJ: BLAT --the BLAST-like alignment tool. Genome Res. 2002, 12: 656-664.PubMed CentralView ArticlePubMedGoogle Scholar
- Pruitt KD, Tatusova T, Maglott DR: NCBI reference sequences (RefSeq): A curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007, 35: D61-D65. 10.1093/nar/gkl842.PubMed CentralView ArticlePubMedGoogle Scholar
- Siepel A, Bejerano G, Pedersen JS, et al: Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005, 15: 1034-1050. 10.1101/gr.3715005.PubMed CentralView ArticlePubMedGoogle Scholar
- Matys V, Fricke E, Geffers R, et al: TRANSFAC: Transcriptional regulation from patterns to profiles. Nucleic Acids Res. 2003, 31: 374-378. 10.1093/nar/gkg108.PubMed CentralView ArticlePubMedGoogle Scholar
- Su AI, Cooke MP, Ching KA, et al: Large-scale analysis of the human and mouse transcriptomes. Proc Natl Acad Sci USA. 2002, 99: 4465-4470. 10.1073/pnas.012025199.PubMed CentralView ArticlePubMedGoogle Scholar
- Fisher RA: Frequency distribution of the values of the correlation coefficient in samples of an indefinitely large population. Biometrika. 1915, 10: 507-521.Google Scholar
- Cheadle C, Vawter MP, Freed WJ, Becker KG: Analysis of microarray data using Z score transformation. J Mol Diagn. 2003, 5: 73-81. 10.1016/S1525-1578(10)60455-2.PubMed CentralView ArticlePubMedGoogle Scholar
- Dennis G, Sherman BT, Hosack DA, et al: DAVID: Database for Annotation Visualization and Integrated Discovery. Genome Biol. 2003, 4: P3-10.1186/gb-2003-4-5-p3.View ArticlePubMedGoogle Scholar
- Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989, 123: 585-595.PubMed CentralPubMedGoogle Scholar
- Hinds DA, Stuve LL, Nilsen GB, et al: Whole-genome patterns of common DNA variation in three human populations. Science. 2005, 307: 1072-1079. 10.1126/science.1105436.View ArticlePubMedGoogle Scholar
- Carlson CS, Thomas DJ, Eberle MA, et al: Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res. 2005, 15: 1553-1565. 10.1101/gr.4326505.PubMed CentralView ArticlePubMedGoogle Scholar
- Birney E, Stamatoyannopoulos JA, Dutta A, et al: Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007, 447: 799-816. 10.1038/nature05874.View ArticlePubMedGoogle Scholar
- Nobrega MA, Ovcharenko I, Afzal V, Rubin EM: Scanning human gene deserts for long-range enhancers. Science. 2003, 302: 413-10.1126/science.1088328.View ArticlePubMedGoogle Scholar
- Ovcharenko I, Loots GG, Nobrega MA, et al: Evolution and functional classification of vertebrate gene deserts. Genome Res. 2005, 15: 137-145. 10.1101/gr.3015505.PubMed CentralView ArticlePubMedGoogle Scholar