
From DNA to RNA to disease and back: The 'central dogma' of regulatory disease variation


Much of the focus of human disease genetics is directed towards identifying nucleotide variants that contribute to disease phenotypes. This is a complex problem, often involving contributions from multiple loci and their interactions, as well as effects due to environmental factors. Although some diseases with a genetic basis are caused by nucleotide changes that alter an amino acid sequence, in other cases, disease risk is associated with altered gene regulation. This paper focuses on how studies of gene expression variation might complement disease studies and provide crucial links between genotype and phenotype.


Understanding the causes of human disease is one of the most fundamental goals of modern medicine. Individuals differ with respect to disease susceptibility, disease progression and effectiveness of treatment. Identifying the factors contributing to these differences, and elucidating their interactions as they contribute to aspects of disease phenotype, is a precursor to improved prevention, detection and treatment of disease.

Much of the understanding of human disease derives from the study of those diseases that segregate in families in a Mendelian fashion, where the causative variants and the genes in which they reside have been identified through classical family linkage approaches [1] and through studies in large pedigrees and in isolated populations based on founder effects [2]. The vast majority of common diseases exhibit a more complex mode of inheritance, however, aggregating in families but rarely exhibiting Mendelian inheritance. Examples of diseases of this type include diabetes, obesity, schizophrenia and asthma. Understanding of these 'complex' diseases is improving, although still limited, but it is clear that genetic variation plays an important role in susceptibility to disease, for example in autoimmune and infectious diseases [3]. Most complex disease is thought to be caused by the combined effect of genetic variants at a few loci or multiple loci, each with only modest functional effects on susceptibility. Additional roles are played by environmental factors and their interactions. Mapping the genomic regions contributing to disease creates new directions for disease research and is an important step towards improving human health.

Approaches to identifying the genes involved in complex disease can be generally grouped into two categories: candidate gene studies and linkage/association studies. Candidate gene studies use knowledge about the biology of a disease, and about genes in physiologically or biochemically relevant pathways, and attempt to correlate genetic variation at these 'candidate genes' with disease phenotype. Unfortunately, for most diseases, this type of information is not available, or is not complete enough to prove widely useful, and incorporating it into the analysis is more likely to increase the noise than to reduce the search space for the disease. Genome-wide linkage studies and association analyses serve as alternative approaches to surveying the contribution to disease of genetic variants located anywhere in the genome. The genome-wide aspect means that these studies do not require any a priori hypothesis that a particular region is involved, although predictions about the potential effect of specific variants (eg non-synonymous single nucleotide polymorphisms [SNPs]) can be incorporated in the models. In this respect, these approaches are unbiased. Family-based linkage studies entail identifying genetic variants in families that co-segregate with disease more often than would be expected by chance. In general, linkage studies have achieved limited success in identifying genomic regions involved in complex disease, in part because they are underpowered to detect moderate genetic effects. Furthermore, because identification of a region or regions associated with the disease or trait requires identifying those alleles that segregate with the disease in families, which in turn depends on recombination within the families, it can be difficult to narrow a region exhibiting significant linkage. An alternative methodology is to perform association analysis, which looks for correlation of genetic variants with aspects of phenotype, but does not require a pedigree structure for the individuals. Association analyses are more powerful for the detection of common disease alleles with small to modest effects [4, 5], and increasingly are being used successfully in studies to identify genes contributing to disease [6, 7].

Although many phenotypic differences among individuals are attributable to variants in coding DNA [8], variants in non-coding DNA can have profound effects on phenotypes, including disease phenotypes. For example, regulatory variants affecting transcription initiation, splicing, RNA stability and translational efficiency are known to play roles in conditions including autoimmune disease (CTLA4 [9]), malaria (DARC [10]), various cancers (SMYD3 [11]) and other examples (shown in Table 1). In studies of such diseases, gene expression may serve as an intermediate phenotype between disease phenotype and genotype [30, 31]. Gene expression, or mRNA levels, can be modulated by variants in coding or non-coding DNA (eg transcription factors or binding sites within promoters). Whole-genome association studies of gene expression (expression quantitative trait loci [eQTL] mapping) may generate hypotheses for disease susceptibility by identifying those regions of the genome with functional effects on gene expression. These regions might then serve as candidates to be evaluated for association with disease phenotypes, as is discussed below.

Table 1 Genes with non-coding variants affecting disease

Resources for genome-wide analysis

An efficient approach to the study of human disease benefits from the use of shared resources. For example, in order to perform genome-wide linkage or association analyses, suitable DNA markers are required. The human genome is estimated to harbour more than 10 million SNPs, present at > 1 per cent frequency [32], and these SNPs are located throughout the genome in regions of coding and non-coding DNA. Publicly available databases of SNP alleles, assays and genotypes are accessible online (eg dbSNP [33] and HapMap [34]). High-throughput genotyping platforms and reductions in genotyping costs now make whole-genome genotyping feasible for large numbers of samples. Gene expression can also be quantified in a high-throughput manner using commercially available microarrays, permitting the detection of small differences in expression levels among samples.

The establishment of cell lines creates resources that can be used by multiple research groups from around the world to survey various cellular phenotypes. With respect to the study of gene expression, it is desirable to establish cell lines from different tissues because gene expression is highly dependent on developmental and cellular context and, indeed, some diseases manifest their phenotypes only in specific tissues. In addition, the cell perturbations that accompany the establishment of cell lines argue for also studying gene expression in primary tissues, although, clearly, the choice of sample depends on the purpose, stage and feasibility of a study, the sample size required and its availability. Although cell lines are imperfect proxies for the complete set of human tissues, data can be collected from them on a large scale with respect to sample size and reproducibility and can provide candidates for further study in other samples. Currently, there are relatively few data on gene expression across the diversity of healthy human tissues or across multiple individuals from different populations. These data from healthy individuals will provide important information on the range of naturally occurring gene expression variation and will serve as a baseline against which to compare disease-associated molecular phenotypes.

Statistical issues in genome-wide analysis

Although genome-wide association studies are thought to have more power than family-based linkage studies, they present strong challenges in the form of statistical interpretation. For example, a simple genome-wide association study may test hundreds of thousands of SNPs for association with a phenotype (or, more typically, multiple phenotypes), and more complicated models allowing for SNP-SNP interactions vastly increase the already large number of statistical tests. With such a large number of tests, the significance threshold must be adjusted to control the number of false-positive associations. Although procedures for multiple test correction exist -- for example, Bonferroni correction, false discovery rate [35, 36] and permutations of phenotypes relative to genotypes [37] -- it remains unclear which is the best method to apply in this context. It is also not trivial to infer the biological significance of an association from statistical significance, because allele frequencies, variance of the phenotype, density of markers and linkage disequilibrium (LD) can have a tremendous impact on the statistical significance inferred.
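As a rough illustration of how two of the correction procedures named above differ, the following minimal Python sketch applies a Bonferroni threshold and the Benjamini-Hochberg step-up procedure to a small set of p-values. The p-values, the function names and the 0.05 levels are illustrative assumptions, not values taken from any study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of tests declared significant at the given FDR level."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # All tests up to and including that rank are declared significant.
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for ten SNP-phenotype tests (illustration only).
p_values = [0.0001, 0.004, 0.012, 0.03, 0.2, 0.35, 0.5, 0.62, 0.8, 0.95]

threshold = bonferroni_threshold(0.05, len(p_values))
bonferroni_hits = [i for i, p in enumerate(p_values) if p <= threshold]
bh_hits = benjamini_hochberg(p_values, fdr=0.05)

print("Bonferroni hits:", bonferroni_hits)
print("Benjamini-Hochberg hits:", bh_hits)
```

In this toy example the false discovery rate procedure retains one more test than the Bonferroni threshold, which reflects the general trade-off between stringency and power rather than a recommendation for either method.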

Human genetic variation is structured into haplotypes, such that alleles at nearby loci often show strong statistical association with one another. Because of this association, known as LD, a large region may contain multiple SNPs exhibiting a significant association with a given phenotype. Although this structure of human genetic variation facilitates association mapping, it can complicate subsequent fine-scale mapping to narrow the associated region and locate the causal variant, as discussed below. Another concern in association studies is the potential for false associations caused by population stratification [38], so care must be taken to reduce these effects through appropriate experimental design and data analysis [39, 40].
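The strength of LD between two SNPs is commonly summarised by the squared correlation coefficient r^2 between their alleles. The sketch below computes it from phased haplotypes; the 0/1 allele coding, the function name and the toy haplotype samples are assumptions made for illustration.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs.

    haplotypes: list of (allele_at_snp_a, allele_at_snp_b) pairs, coded 0/1,
    one pair per phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


# Hypothetical samples: alleles always co-occur, or are combined at random.
perfect_ld = [(1, 1)] * 4 + [(0, 0)] * 6
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)] * 5

print(ld_r_squared(perfect_ld))
print(ld_r_squared(no_ld))
```

When r^2 is close to 1, the two SNPs carry almost the same information, which is why several SNPs in one region can show similar association signals and why the causal variant is hard to single out.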

eQTL mapping

The interrogation of gene expression to facilitate the design and interpretation of disease association studies can be a powerful tool for the identification of biologically functional variants and the interpretation of biological effects [41]. By introducing a quantifiable and easily measurable biological outcome, it is possible to assess the relevance of statistical significance and eliminate some of the issues raised above. The use of gene expression variation in the context of disease mapping can be viewed in two ways (Figure 1). On the one hand, hypotheses can be generated by discovering functional variation using eQTL mapping and subsequently testing those functional variants in large case-control samples. On the other hand, following a disease association study that has identified several signals located within regions of non-coding DNA, eQTL mapping can be used to interpret and dissect the functional effect of the candidate disease variants. Below, the two directions will be explored in more detail.

Figure 1

Flow charts demonstrating the use of genome-wide gene expression studies in relation to disease studies. (A) Generating hypotheses for disease studies. Expression quantitative trait loci (eQTL) mapping studies identify variable regions of the genome with functional effects on gene expression. Single nucleotide polymorphisms within these functional regions can be candidates for involvement in disease. (B) Supporting disease association studies. Disease mapping studies often identify non-coding regions of the genome exhibiting significant association with disease. eQTL studies can provide a link to an associated gene by providing annotation of the function of that non-coding region.

Generating functional candidates using eQTL mapping

Regions with functional effects on gene expression can be localised through the use of association mapping. Gene expression, or mRNA level, is a quantitative phenotype that can be assayed in multiple individuals. When the same individuals are surveyed for genetic variation at marker loci, for example SNPs, association analysis tests whether variation at each SNP can explain the observed phenotypic variation. The rationale behind this analysis is that markers themselves are either the causal variant or are highly correlated (in LD) with the causal variant.
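A single SNP-expression test of this kind can be sketched as a regression of expression level on allele count, with significance assessed by permuting phenotypes relative to genotypes. The additive 0/1/2 genotype coding, the function names, the number of permutations and the data below are all illustrative assumptions; published eQTL studies use a variety of models.

```python
import random


def slope(genotypes, expression):
    """Least-squares slope of expression on allele count (additive model)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    s_ge = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    return s_ge / s_gg


def permutation_p_value(genotypes, expression, n_perm=2000, seed=1):
    """Two-sided p-value from shuffling expression values across individuals."""
    rng = random.Random(seed)
    observed = abs(slope(genotypes, expression))
    shuffled = list(expression)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope(genotypes, shuffled)) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data: allele counts at one SNP and mRNA levels of one gene.
genotypes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
expression = [5.1, 4.9, 5.3, 5.0, 6.0, 6.2, 5.8, 6.1, 7.0, 6.9, 7.2, 7.1]

beta = slope(genotypes, expression)
p_value = permutation_p_value(genotypes, expression)
print(f"effect per allele: {beta:.4f}, permutation p-value: {p_value:.4f}")
```

In a genome-wide setting this test would be repeated for every SNP-gene pair of interest, which is where the multiple testing issues discussed earlier arise.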

Association mapping of gene expression variation has been successful in many species, including human [42-45], yeast [46-48], mouse [49-52], rat [53], fish [54, 55] and maize [52]. Together, these studies provide several striking observations related to the nature of functional variation influencing gene expression. First, variation in gene expression levels among individuals is common -- and much of that phenotypic variation has a genetic basis. Much of the association signal is located cis- to the gene of interest [45, 52, 56], although trans-acting variants have also been observed. Hotspots of gene regulation (ie regions of the genome influencing expression of several genes) have been observed in some [44], but not all, studies.

There are several ways in which the study of the regulation of gene expression can enhance disease studies as well as narrow the choice of candidate regions for disease association studies. Where information exists about the contribution of particular genes to a disease phenotype or susceptibility, understanding the regulatory control of those genes may assist in elucidating the complete set of effects. In addition, understanding the regulation of categories of genes, or genes of a particular pathway, may provide targets for further follow-up in disease studies. It may prove more time-efficient and cost-effective, however, to have a list of many potential functional variants located throughout the genome and to test them against a large number of diseases. Whole-genome eQTL studies can provide a list of regions of the genome with functional effects on the expression of known genes (Figure 1a). SNPs located within these regions can then serve as candidates for disease association studies, much in the same way that non-synonymous SNPs are often considered because of their potential functional effect. There are several advantages to this type of targeted approach over a whole-genome scan. First, because the number of SNPs to be genotyped in each individual is reduced, many more individuals can be surveyed in a disease study without vastly increasing costs. The reduction in the number of markers tested can also eliminate some of the problems of multiple test correction, so that more sensible thresholds can be used and smaller-effect variants can be detected. Secondly, any significant associations detected between SNP and disease phenotype provide both a mechanism (gene regulation) and the identity of the affected gene. Finally, the fact that potential causal regulatory variants were initially discovered in healthy individuals and subsequently have been associated with disease means that such variants are common and are likely to contribute significantly to the disease risk of the population.

The methodology above carries the risk of focusing only on certain types of genomic variants, when it is known that much of genome function remains unknown. A way to circumvent this problem is to use commercially available whole-genome SNP genotyping chips in disease studies and to incorporate the data on functional regulatory regions by performing the association analysis using Bayesian methods that assign different prior probabilities to SNPs on the array. Under such a scenario, SNPs located in regions with known functional effects on expression of specific genes -- as identified through eQTL studies -- would be assigned a higher prior probability of being associated with a phenotype. In addition, one might assign a higher prior probability to SNPs in known promoters, enhancers or transcription factors. Thus, one could focus on the effects of candidate variants without missing other important signals. Another substantial advantage of knowing regulatory variants before performing a genome-wide association study is that one can correlate phenotypes and regulatory networks and utilise such information in the statistical modelling of the disease.
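One simple way to picture the effect of such priors is to combine each SNP's evidence, expressed here as a Bayes factor, with a prior probability that depends on its annotation. The text does not specify a particular Bayesian method, so the use of Bayes factors, the prior values, the SNP identifiers and the function name below are all hypothetical.

```python
def posterior_probability(bayes_factor, prior):
    """Posterior probability of association from a Bayes factor and a prior."""
    prior_odds = prior / (1 - prior)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1 + posterior_odds)


# Hypothetical priors: SNPs in eQTL-defined regulatory regions are up-weighted.
BASELINE_PRIOR = 1e-4
EQTL_REGION_PRIOR = 1e-3

# Hypothetical SNPs with identical statistical evidence but different annotation.
snps = [
    {"id": "snp_A", "bayes_factor": 5000.0, "in_eqtl_region": True},
    {"id": "snp_B", "bayes_factor": 5000.0, "in_eqtl_region": False},
]

posteriors = {}
for snp in snps:
    prior = EQTL_REGION_PRIOR if snp["in_eqtl_region"] else BASELINE_PRIOR
    posteriors[snp["id"]] = posterior_probability(snp["bayes_factor"], prior)

for snp_id, post in posteriors.items():
    print(f"{snp_id}: posterior probability of association = {post:.3f}")
```

With identical evidence, the SNP in an annotated regulatory region ends up with the higher posterior probability, yet the unannotated SNP is still evaluated and can be detected if its evidence is strong enough.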

Supporting genome-wide disease association studies: Narrowing on disease-associated non-coding signals

Genome-wide association studies are now increasing in frequency [6, 57], and although it would have been preferable to have identified all functional regulatory variants in advance, investigators will be faced with the challenge of interpreting some of the strongest association signals. Many of the association studies have a multi-phase design, wherein a fraction of the SNPs with the top statistical significance in the first phase are genotyped in a subsequent phase in a new set of individuals. The statistical exercise must eventually give way to biological interpretation, however, and the identification of the causal variant will be necessary. Although most of the confirmed disease-causing variants are located in coding regions, this observation is due to an ascertainment bias in the ability to predict the potential functional consequences of nucleotide variation. As the human genome is composed of only ~3-5 per cent coding DNA, and studies increasingly attribute function to non-coding DNA, it might be expected that much of the disease-causing variation will be non-coding and that many of the significant peaks in an association analysis will fall in regions devoid of genes.

Disease association studies often identify non-coding regions of the genome exhibiting significant association with disease. The exploration of those non-coding regions will benefit from the survey of gene expression variation and how it relates to genetic variation (eQTL mapping). For any disease-associated non-coding region (eg from a case-control study), it is possible to test whether the disease-associated SNPs and haplotypes are also associated with gene expression variation of nearby genes (as identified from eQTL studies; Figure 2). This enables conclusions to be drawn about the nature of the function of the causal variant. For instance, if the same haplotype that appears to increase the disease risk also appears to be associated with high expression of a nearby gene, it is possible to start making some connections between the biology of the affected gene and the disease itself. Moreover, one can hypothesise (and hopefully test) how levels of expression of a gene might affect disease risk. This simple connection between the two types of study could provide not only the identity of the gene that is linked to the disease, but also the consequence of genome variation that linked the gene with the disease. It may also provide some clues about other candidates (upstream transcriptional regulators, interacting proteins etc).

Figure 2

The three panels show the signal of association (as -log p-value of individual single nucleotide polymorphism (SNP)-phenotype associations) of genomic regions with disease, gene expression of gene A and gene expression of gene B. From this plot, it is suggested that genetic variation that increases disease risk is also associated with gene expression variation of gene A (assuming that the associated SNPs and haplotypes are the same). This probably indicates that the disease risk is a regulatory effect and that the amount of transcript (or protein) of gene A is critical for the development of the disease.

Several studies illustrate the utility and validity of using gene expression variation for disease fine mapping. Two of these studies have focused on identifying functional nucleotide variation by focusing primarily on the regions surrounding each of a set of genes (cis-), but also considering other regions located trans- to the genes [42, 45]. These studies showed that a large fraction of genes (10-20 per cent) have significant variation that affects their expression in cis, and in some cases in trans. The regulatory variants that affect gene expression variation can be mapped with the same resolution as disease variants in genome-wide association studies because, in both cases, the resolution depends on the LD structure of the human populations. These studies, which allow the identification of regulatory haplotypes, need to be verified before functional experiments can be performed. The most appropriate way to perform a first-pass verification is to test whether allelic imbalance in expression is correlated with heterozygosity in the same SNPs as those that showed genotypic association with gene expression.
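One ingredient of such a verification is deciding whether the two alleles of a transcribed SNP are expressed unequally in an individual who is heterozygous for the candidate regulatory SNP. A minimal sketch, assuming allele-specific transcript counts are available and that balanced expression corresponds to a 50:50 ratio, is an exact binomial test; the counts and the function name are hypothetical, and real assays usually need a correction for technical bias.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k out of n, under success probability p.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so that ties are counted despite rounding.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts of transcripts carrying the reference allele,
# out of 20 transcripts sampled from each heterozygous individual.
imbalanced_p = binomial_two_sided_p(15, 20)
balanced_p = binomial_two_sided_p(10, 20)

print(f"15 of 20 transcripts: p = {imbalanced_p:.4f}")
print(f"10 of 20 transcripts: p = {balanced_p:.4f}")
```

Repeating this test across individuals, and checking that imbalance appears in heterozygotes for the associated SNP but not in homozygotes, would approximate the correlation described above.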

Even with this information, and with the knowledge that a causal variant affects gene regulation, identifying the exact DNA variant that causes the regulatory effect and subsequent increased disease risk remains a long way off. This is a stage where things become complex for many reasons. For example, although the genome-wide distribution of LD is quite variable, average LD in the human genome extends over large regions, which makes it challenging to fine map a causal variant in many regions. In the best case scenario, associated SNPs would be identified in a region of very low LD, thus reducing the number of potential causal variants to test subsequently. More often, an associated region of approximately 10-20 kilobases will be identified [45]. Although fine mapping in a population with reduced LD (eg Africans) might assist in identifying shorter associated regions, it is at this stage where extensive amounts of information about genome function are crucial. The diversity of methodologies for large-scale interrogation of the human genome for function is increasing; the resulting information will be very important for prioritising which of those associated DNA segments to focus on first.

Interpreting regulatory variation

The identification of the causal variant can benefit from incorporating information about genome function. Many studies to determine functionality within the human genome sequence are now in progress using high-throughput, genome-wide methodologies. The ENCyclopedia Of DNA Elements (ENCODE) project is the best example [58] of this type of study. The aim of this project is to attribute a functional identity to each nucleotide of the human genome. In its pilot phase, 1 per cent of the human genome (44 genomic regions) has been studied extensively for function, interspecies conservation and population genetic variation. The comprehensive analysis of these 44 regions will provide important clues for the pattern and structure of genome function and will allow predictions for the nature of variations behind complex disease and phenotypic variation. This, and other ongoing studies, will offer a first-pass annotation of functional elements in the human genome and will provide the framework for detailed characterisation of functional variation.

If an established and confirmed association of a region with disease and gene expression variation exists, and there is light annotation of the associated region for coding and non-coding elements, it is possible to apply brute force approaches to identify the specific DNA changes that are causal. A recommended strategy is to perform extensive resequencing of potentially functional segments of the region in high and low expressing individuals. The number of individuals required to be assayed depends on the magnitude of the functional effect and the predicted within-population frequency of the causal variant. This can be assisted by initial power calculations that allow prediction of what is likely to be identified, given the study design. The optimal approach seems to be to sample sequences from individuals at each of the two ends of the phenotypic (expression) distribution, and then proceed inwards.
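The sampling scheme described above, starting at the two ends of the expression distribution and moving inwards, can be sketched as a simple ordering of individuals. The individual identifiers, expression values and function name are hypothetical.

```python
def extremes_inward(expression_by_individual):
    """Order individuals from the two tails of the expression distribution
    towards the middle, alternating lowest and highest remaining values."""
    ranked = sorted(expression_by_individual, key=expression_by_individual.get)
    order = []
    low, high = 0, len(ranked) - 1
    while low <= high:
        order.append(ranked[low])
        if high != low:          # avoid listing the middle individual twice
            order.append(ranked[high])
        low += 1
        high -= 1
    return order


# Hypothetical expression levels of the gene of interest.
expression = {"ind1": 5.2, "ind2": 9.1, "ind3": 6.8, "ind4": 3.4, "ind5": 7.5}

resequencing_order = extremes_inward(expression)
print(resequencing_order)
```

Resequencing would proceed down this list until the power calculation suggests that enough high and low expressing individuals have been sampled.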

As soon as a set of genomic segments has been resequenced, one should look for variants that appear to have equal or better correlation with the phenotype than that observed in the initial association study. This can be determined by genotyping all of the potentially functional variants (identified in the resequencing approach in a subset of the individuals) in the original complete sample. Depending on the strength of these correlations, and where the highly correlated variants are located, the appropriate approach should be adopted for direct functional testing of causal haplotypes. Such approaches can include reporter constructs, binding assays, RNA stability assays and chromatin modification assays using all of the alternative haplotypes.


In this paper, some issues have been discussed that arise from the incorporation of gene expression variation data in disease studies. The overall message is that gene expression can greatly assist the discovery of disease variants, as well as the interpretation of the biological effects of causal variants. Further exploration of gene expression variation in more samples and more cell types will greatly enhance both our understanding of phenotypic variation in humans and also the nature of regulatory variation and its impact on complex disease.


  1. Jimenez-Sanchez G, Childs B, Valle D: Human disease genes. Nature. 2001, 409: 853-855. 10.1038/35057050.

  2. Gulcher JR, Kong A, Stefansson K: The role of linkage studies for common diseases. Curr Opin Genet Dev. 2001, 11: 264-267. 10.1016/S0959-437X(00)00188-X.

  3. Cooke GS, Hill AV: Genetics of susceptibility to human infectious disease. Nat Rev Genet. 2001, 2: 967-977. 10.1038/35103577.

  4. Risch N, Merikangas K: The future of genetic studies of complex human diseases. Science. 1996, 273: 1516-1517. 10.1126/science.273.5281.1516.

  5. Hirschhorn JN, Daly MJ: Genome-wide association studies for common diseases and complex traits. Nat Rev Genet. 2005, 6: 95-108.

  6. Klein RJ, Zeiss C, Chew EY, et al: Complement factor H polymorphism in age-related macular degeneration. Science. 2005, 308: 385-389. 10.1126/science.1109557.

  7. Ozaki K, Ohnishi Y, Iida A, et al: Functional SNPs in the lymphotoxin-alpha gene that are associated with susceptibility to myocardial infarction. Nat Genet. 2002, 32: 650-654. 10.1038/ng1047.

  8. Kasvosve I, Delanghe JR, Gomo ZA, et al: Transferrin polymorphism influences iron status in blacks. Clin Chem. 2000, 46: 1535-1539.

  9. Ueda H, Howson JM, Esposito L, et al: Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature. 2003, 423: 506-511. 10.1038/nature01621.

  10. Tournamille C, Le Van Kim C, Gane P, et al: Molecular basis and PCR-DNA typing of the Fya/fyb blood group polymorphism. Hum Genet. 1995, 95: 407-410.

  11. Tsuge M, Hamamoto R, Silva FP, et al: A variable number of tandem repeats polymorphism in an E2F-1 binding element in the 5' flanking region of SMYD3 is a risk factor for human cancers. Nat Genet. 2005, 37: 1104-1107. 10.1038/ng1638.

  12. Zhou XF, Cui J, DeStefano AL, et al: Polymorphisms in the promoter region of catalase gene and essential hypertension. Dis Markers. 2005, 21: 3-7.

  13. McDermott DH, Zimmerman PA, Guignard F, et al: CCR5 promoter polymorphism and HIV-1 disease progression. Multicenter AIDS Cohort Study (MACS). Lancet. 1998, 352: 866-870. 10.1016/S0140-6736(98)04158-0.

  14. Kostrikis LG, Neumann AU, Thomson B, et al: A polymorphism in the regulatory region of the CC-chemokine receptor 5 gene influences perinatal transmission of human immunodeficiency virus type 1 to African-American infants. J Virol. 1999, 73: 10264-10271.

  15. Sakuntabhai A, Turbpaiboon C, Casademont I, et al: A variant in the CD209 promoter is associated with severity of dengue disease. Nat Genet. 2005, 37: 507-513. 10.1038/ng1550.

  16. Carlson CS, Aldred SF, Lee PK, et al: Polymorphisms within the C-reactive protein (CRP) promoter region are associated with plasma CRP levels. Am J Hum Genet. 2005, 77: 64-77. 10.1086/431366.

  17. VanNess SH, Owens MJ, Kilts CD: The variable number of tandem repeats element in DAT1 regulates in vitro dopamine transporter density. BMC Genet. 2005, 6: 55.

  18. Kochi Y, Yamada R, Suzuki A, et al: A functional variant in FCRL3, encoding Fc receptor-like 3, is associated with rheumatoid arthritis and several autoimmunities. Nat Genet. 2005, 37: 478-485. 10.1038/ng1540.

  19. Kwok JB, Hallupp M, Loy CT, et al: GSK3B polymorphisms alter transcription and splicing in Parkinson's disease. Ann Neurol. 2005, 58: 829-839. 10.1002/ana.20691.

  20. Al-Zahrani A, Sandhu MS, Luben RN, et al: IGF1 and IGFBP3 tagging polymorphisms are associated with circulating levels of IGF1, IGFBP3 and risk of breast cancer. Hum Mol Genet. 2006, 15: 1-10. 10.1093/hmg/ddl043.

  21. Bennett ST, Lucassen AM, Gough SC, et al: Susceptibility to human type 1 diabetes at IDDM2 is determined by tandem repeat variation at the insulin gene minisatellite locus. Nat Genet. 1995, 9: 284-292. 10.1038/ng0395-284.

  22. Karim MA, Wang X, Hale TC, Elbein SC: Insulin promoter factor 1 variation is associated with type 2 diabetes in African Americans. BMC Med Genet. 2005, 6: 37.

  23. Przybylowska K, Kluczna A, Zadrozny M, et al: Polymorphisms of the promoter regions of matrix metalloproteinases genes MMP-1 and MMP-9 in breast cancer. Breast Cancer Res Treat. 2006, 95: 65-72. 10.1007/s10549-005-9042-6.

  24. Humphries SE, Luong LA, Talmud PJ, et al: The 5A/6A polymorphism in the promoter of the stromelysin-1 (MMP-3) gene predicts progression of angiographically determined coronary artery disease in men in the LOCAT gemfibrozil study. Lopid Coronary Angiography Trial. Atherosclerosis. 1998, 139: 49-56. 10.1016/S0021-9150(98)00053-7.

  25. Ye S, Eriksson P, Hamsten A, et al: Progression of coronary atherosclerosis is associated with a common genetic variant of the human stromelysin-1 promoter which results in reduced gene expression. J Biol Chem. 1996, 271: 13055-13060. 10.1074/jbc.271.22.13055.

  26. Borm ME, van Bodegraven AA, Mulder CJ, et al: A NFKB1 promoter polymorphism is involved in susceptibility to ulcerative colitis. Int J Immunogenet. 2005, 32: 401-405. 10.1111/j.1744-313X.2005.00546.x.

  27. Emison ES, McCallion AS, Kashuk CS, et al: A common sex-dependent mutation in a RET enhancer underlies Hirschsprung disease risk. Nature. 2005, 434: 857-863. 10.1038/nature03467.

  28. 28.

    Grice EA, Rochelle ES, Green ED, et al: Evaluation of the RET regulatory landscape reveals the biological relevance of a HSCR-implicated enhancer. Hum Mol Genet. 2005, 14: 3837-3845. 10.1093/hmg/ddi408.

  29.

    Knight JC, Udalova I, Hill AV, et al: A polymorphism that affects OCT-1 binding to the TNF promoter region is associated with severe malaria. Nat Genet. 1999, 22: 145-150. 10.1038/9649.

  30.

    Gottesman II, Gould TD: The endophenotype concept in psychiatry: Etymology and strategic intentions. Am J Psychiatry. 2003, 160: 636-645. 10.1176/appi.ajp.160.4.636.

  31.

    Watts JA, Morley M, Burdick JT, et al: Gene expression phenotype in heterozygous carriers of ataxia telangiectasia. Am J Hum Genet. 2002, 71: 791-800. 10.1086/342974.

  32.

    Kruglyak L, Nickerson DA: Variation is the spice of life. Nat Genet. 2001, 27: 234-236. 10.1038/85776.

  33.

  34.

  35.

    Benjamini Y, Hochberg Y: Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol. 1995, 57: 289-300.

  36.

    Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci USA. 2003, 100: 9440-9445. 10.1073/pnas.1530509100.

  37.

    Doerge RW, Churchill GA: Permutation tests for multiple loci affecting a quantitative character. Genetics. 1996, 142: 285-294.

  38.

    Cardon LR, Palmer LJ: Population stratification and spurious allelic association. Lancet. 2003, 361: 598-604. 10.1016/S0140-6736(03)12520-2.

  39.

    Pritchard JK, Stephens M, Rosenberg NA, Donnelly P: Association mapping in structured populations. Am J Hum Genet. 2000, 67: 170-181. 10.1086/302959.

  40.

    Tang H, Quertermous T, Rodriguez B, et al: Genetic structure, self-identified race/ethnicity, and confounding in case-control association studies. Am J Hum Genet. 2005, 76: 268-275. 10.1086/427888.

  41.

    Stranger BE, Dermitzakis ET: The genetics of regulatory variation in the human genome. Hum Genomics. 2005, 2: 126-131.

  42.

    Cheung VG, Spielman RS, Ewens KG, et al: Mapping determinants of human gene expression by regional and genome-wide association. Nature. 2005, 437: 1365-1369. 10.1038/nature04244.

  43.

    Monks SA, Leonardson A, Zhu H, et al: Genetic inheritance of gene expression in human cell lines. Am J Hum Genet. 2004, 75: 1094-1105. 10.1086/426461.

  44.

    Morley M, Molony CM, Weber TM, et al: Genetic analysis of genome-wide variation in human gene expression. Nature. 2004, 430: 743-747. 10.1038/nature02797.

  45.

    Stranger BE, Forrest MS, Clark AG, et al: Genome-wide associations of gene expression variation in humans. PLoS Genet. 2005, 1: e78. 10.1371/journal.pgen.0010078.

  46.

    Brem RB, Kruglyak L: The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proc Natl Acad Sci USA. 2005, 102: 1572-1577. 10.1073/pnas.0408709102.

  47.

    Brem RB, Yvert G, Clinton R, Kruglyak L: Genetic dissection of transcriptional regulation in budding yeast. Science. 2002, 296: 752-755. 10.1126/science.1069516.

  48.

    Yvert G, Brem RB, Whittle J, et al: Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat Genet. 2003, 35: 57-64.

  49.

    Cowles CR, Hirschhorn JN, Altshuler D, Lander ES: Detection of regulatory variation in mouse genes. Nat Genet. 2002, 32: 432-437. 10.1038/ng992.

  50.

    Doss S, Schadt EE, Drake TA, Lusis AJ: Cis-acting expression quantitative trait loci in mice. Genome Res. 2005, 15: 681-691. 10.1101/gr.3216905.

  51.

    Sandberg R, Yasuda R, Pankratz DG, et al: Regional and strain-specific gene expression mapping in the adult mouse brain. Proc Natl Acad Sci USA. 2000, 97: 11038-11043.

  52.

    Schadt EE, Monks SA, Drake TA, et al: Genetics of gene expression surveyed in maize, mouse and man. Nature. 2003, 422: 297-302. 10.1038/nature01434.

  53.

    Walker JR, Su AI, Self DW, et al: Applications of a rat multiple tissue gene expression data set. Genome Res. 2004, 14: 742-749. 10.1101/gr.2161804.

  54.

    Oleksiak MF, Churchill GA, Crawford DL: Variation in gene expression within and among natural populations. Nat Genet. 2002, 32: 261-266. 10.1038/ng983.

  55.

    Oleksiak MF, Roach JL, Crawford DL: Natural variation in cardiac metabolism and gene expression in Fundulus heteroclitus. Nat Genet. 2005, 37: 67-72.

  56.

    Yan H, Yuan W, Velculescu VE, et al: Allelic variation in human gene expression. Science. 2002, 297: 1143. 10.1126/science.1072545.

  57.

    Herbert A, Gerry NP, McQueen MB, et al: A common genetic variant is associated with adult and childhood obesity. Science. 2006, 312: 279-283.

  58.

    The ENCODE Project Consortium: The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004, 306: 636-640.


Author information



Corresponding author

Correspondence to Barbara E. Stranger.

About this article

Cite this article

Stranger, B.E., Dermitzakis, E.T. From DNA to RNA to disease and back: The 'central dogma' of regulatory disease variation. Hum Genomics 2, 383 (2006).

Keywords

  • gene expression
  • human disease
  • linkage mapping
  • association mapping