- Primary research
- Open Access
Studies of liver tissue identify functional gene regulatory elements associated to gene expression, type 2 diabetes, and other metabolic diseases
© The Author(s). 2019
- Received: 15 January 2019
- Accepted: 5 April 2019
- Published: 29 April 2019
Genome-wide association studies (GWAS) of diseases and traits have found associations to gene regions but not the functional SNP or the gene mediating the effect. Difference in gene regulatory signals can be detected using chromatin immunoprecipitation and next-gen sequencing (ChIP-seq) of transcription factors or histone modifications by aligning reads to known polymorphisms in individual genomes. The aim was to identify such regulatory elements in the human liver to understand the genetics behind type 2 diabetes and metabolic diseases.
The genome of liver tissue was sequenced using 10X Genomics technology to call polymorphic positions. Using ChIP-seq for two histone modifications, H3K4me3 and H3K27ac, and the transcription factor CTCF, and our established bioinformatics pipeline, we detected sites with significant difference in signal between the alleles.
We detected 2329 allele-specific SNPs (AS-SNPs) including 25 associated to GWAS SNPs linked to liver biology, e.g., 4 AS-SNPs at two type 2 diabetes loci. Two hundred ninety-two AS-SNPs were associated to liver gene expression in GTEx, and 134 AS-SNPs were located on 166 candidate functional motifs and most of them in EGR1-binding sites.
This study provides a valuable collection of candidate liver regulatory elements for further experimental validation.
- Regulatory SNPs
The understanding of the genetics behind the molecular mechanisms involved in many liver and metabolic diseases remains elusive. Genome-wide association studies (GWAS) of diseases and phenotypic traits have been effective in finding association to gene regions but not the functional SNP(s) or the gene(s) mediating the effect . This is likely due to heterogeneity within and between the study groups, for example, due to several functional common variants on a haplotype, common causative variants that differ between populations, or the contribution of rare personal variants . Another major obstacle is the need to study the regulatory mechanism in the proper tissue.
To date, tens of thousands of associations have been reported between variants and diseases , but the question arises whether for example the association of a SNP to breast cancer has a biological relevance if it is located in a regulatory element observed in the skeletal muscle. Moreover, gene regulatory mechanisms are often studied in cell lines derived from cancer cells, which have single nucleotide and copy number variants that drive cancer as well as additional genetic aberrations acquired during prolonged culturing. In addition, culturing cells in the lab also changes the expression of many genes and may activate regulatory elements rarely used in physiological conditions or inactive elements with variants driving associations.
The majority of the GWAS top associated variants are located in non-coding regions [4, 5] and often in high linkage disequilibrium (LD) with several other variants, making it difficult to pinpoint the real functional SNP(s). One way to find putative functional variants is to detect regions with allele-specific (AS) binding of transcription factors (TFs) or their surrogate histone modifications, suggesting a different regulatory downstream role based on the individual genotypes. ChIP-seq data for TFs and histone modifications provide snapshots of direct and indirect protein-DNA interactions allowing the identification of heterozygous SNPs with significant allele-specific signals (AS-SNPs).
Here, we present the results of the identification of AS-SNPs using a minimal set of ChIP-seq datasets produced for two histone modifications and one genome architectural protein in human liver tissue, providing a collection of liver-specific candidate regulatory variants for experimental validations.
In order to minimize the false positives, we filtered out AS-SNPs located in blacklisted regions from the Encyclopedia of DNA Elements (ENCODE) project or in highly duplicated regions such as those in close proximity to centromeres or telomeres. The liver-specific collection of AS-SNPs obtained in this way consisted of 2329 unique heterozygous SNPs (Additional file 6: Table S2).
Associations of AS-SNPs to liver-related diseases and gene expression
AS-SNPs detected associated to liver-specific GWAS traits
Number of AS-SNPs
Number of AS loci*
Blood protein levels
Type 2 diabetes
Liver enzyme levels (gamma-glutamyl transferase)
Fibrinogen levels (smoking status, alcohol consumption, or BMI interaction)
Response to hepatitis C treatment
C-reactive protein levels or total/LDL cholesterol levels (pleiotropy)
Serum metabolite levels
Primary biliary cholangitis
For instance, we identified 4 AS-SNPs at two genetic loci associated to type 2 diabetes (T2D) on chromosome 6 and 17. In both cases, the allele-specific signals identified variants that are likely to better explain the associations observed in GWAS.
These two AS-SNPs belong to a subset of 9 AS-SNPs (Additional file 8: Table S4) we identified in LD with GWAS SNPs which were also reported as liver eQTLs. This set of AS-SNPs represents candidate regulatory variants supported by LD association to liver-related diseases (GWAS SNPs) and with direct knowledge of the target of gene expression regulation (eQTL SNPs), hence designating ideal candidates for experimental validations.
Another two AS-SNPs associated to T2D were located on chromosome 17, where AS-SNPs rs28528789 and rs62075824 are in LD with the GWAS SNP rs12453394 in a similar fashion to the example reported above for chromosome 6 (Additional file 1: Figure S1). The two AS-SNPs are located in the first intron of the UBE2Z gene, which encodes for a ubiquitinating enzyme involved in signaling pathways and apoptosis. The ubiquitin-proteasome system has been suggested to play a role in the process of insulin resistance  and diabetes , making this locus of particular interest considering the close proximity of other ubiquitin-related genes like SNF8 and CALCOCO2. Another candidate target gene at this locus is GIP, which encodes for a potent stimulator of insulin secretion from pancreatic beta cells following food ingestion and nutrient absorption.
Genetic control of gene activity has been analyzed by the GTEx project in different tissues. The significant associations between SNPs and gene expression in the liver were collected from the GTEx project and intersected with the identified AS-SNPs. We found 292 unique AS-SNPs associated to gene expression (Additional file 9: Table S5). The genes with genetically controlled expression and associated to AS-SNPs were highly expressed in hepatocytes (ARCHS4 tissues, GTEx). We observed an enrichment in pathways involved in the regulation of immune response with several AS-SNPs associated to the expression of HLA genes.
Functional annotations of motifs
In order to obtain TFs active in the liver whose motifs were altered by the functional variants, we overlapped the collection of AS-SNP and TF motifs according to the funMotifs framework (see the “Methods” section). As a result, 595 AS-SNPs were annotated to the TF motifs in liver tissue. We obtained the functional score for each motif using the funMotifs framework. Using the functional score and other parameters introduced for the candidate functional motifs (Umer et al. “funMotifs: Tissue-specific transcription factor motifs”, submitted), we identified 134 variants in 166 functional TF motifs (Additional file 10: Table S6). The majority of TF motifs were located in transcription start site (TSS) regions (Additional file 2: Figure S2). The most recurrent motifs altered by AS-SNPs were observed for the TFs: EGR1, CTCF, KLF5, and ZNF263 (Additional file 3: Figure S3).
The early growth responsive gene-1 (EGR1) is a zinc finger TF that plays an important role in metabolic processes  like regulation of cholesterol biosynthesis genes in the liver  or insulin resistance in type 2 diabetes . Furthermore, downregulation of EGR1 has been associated to hepatocellular carcinoma (HCC) development . Motifs for EGR3, belonging to the family of EGR1, were also altered by AS-SNPs but to a lower extent.
Krüppel-like factors (KLFs) are TFs that regulate several metabolic pathways, and deregulation of KLFs has been linked to metabolic abnormalities, such as obesity, diabetes, and heart failure . KLF5 has been associated to the onset of fatty liver disease, promoting hepatic lipid accumulation . The motif analysis is aimed at identifying which TF could mediate the effect of the candidate regulatory AS-SNPs potentially altering a downstream gene expression. We identified 13 AS-SNPs altering functional TF motifs which are also reported as liver eQTLs in the GTEx catalog (Additional file 10: Table S6). One example is rs4886705 that is a reported eQTL for the MAN2C1 gene (Additional file 4: Figure S4).
MAN2C1 encodes for an enzyme involved in the catabolism of cytosolic-free oligosaccharides, which are released from the degraded proteins. Overexpression of MAN2C1 has been linked with high mannose levels in the cytosol that could interfere with glucose metabolism . The AS-SNP rs4886705 (A/G) alters a motif for HINFP, a zinc finger TF that interacts with the histone deacetylase complex and plays a role in transcription repression. It may potentially affect the repression of MAN2C1 leading to metabolic imbalance.
We identified 2329 heterozygous SNPs in a human liver sample that marked putative regulatory elements in the genome based on allele specificity measured through ChIP-seq experiments for histone modifications and TFs. Previous studies have indicated the extreme cell and tissue specificity of these regulators that can be active in one tissue and inactive in others, showing that experimental validations should be performed in the pertinent tissue . At the same time, it is worth to note that gene regulatory mechanisms are often studied in cell lines derived from cancer cells . These cell lines have single nucleotide and copy number variants that drive cancer and additional aberrations acquired during prolonged culturing, and all these variants could bias the interpretation of the molecular mechanisms . Here, we report a collection of variants that flag candidate regulatory elements for liver- and metabolic-related diseases identified in the pertinent tissue context, a healthy human liver sample. Based on the Hardy-Weinberg equilibrium, 33% of common polymorphic sites are heterozygous in one person so in fact we interrogate a reasonable fraction of functional gene regulatory elements that are present in the liver.
The intersection of the liver-specific AS-SNP collection with GWAS and eQTL SNPs was aimed at adding a biological relevance layer. As observed before, the SNPs reported in GWAS and expression studies were directly supported by allele-specific signals in less than 10% of the cases on average. The vast majority of the identified AS-SNPs were in LD with reported GWAS or eQTL SNPs and likely to be the regulatory variants driving the associations. We identified 25 and 292 unique AS-SNPs associated to diseases of the liver and metabolism and gene expression respectively, providing new insights into the molecular regulatory mechanisms. AS-SNPs identified in the human liver flagged 17 genomic loci for several different liver-related traits and diseases.
An example is two loci on chromosome 6 and 17 associated to T2D where AS-SNPs can help explain the association observed in GWAS. The integration with GTEx expression data suggested that the regulatory mechanisms at these loci could link T2D to less familiar pathways such as cationic transporters and drug uptake on chromosome 6 and ubiquitination on chromosome 17.
We also intersected the collection of AS-SNPs with significant variant-gene associations from the GTEx project. We observed a significant number of candidate AS-SNPs (~ 17%) associated to the expression of HLA genes that are expressed not only in immune cells but also in most other tissues and cells. Several experimental and clinical studies have shown how inflammation and tumor progression are working synergistically . An alteration of HLA gene expression can result in losing the ability to present antigens which have been reported to facilitate the metastatic process in cancer cells . Moreover, HLA genes are overexpressed in hepatocytes of the liver with chronic damage or inflammation [20, 21].
The motif analysis was aimed at identifying possible mediators of the regulatory functional activity at the selected AS-SNPs. The rationale is that in altering the sequence of TF binding motifs the AS-SNPs could affect the expression of a target gene. We used functional motif definitions that overlay TFBS with several experimental datasets (e.g., ChIP-seq data, DHSs, CAGE peaks, and TF expression data) going beyond a simple coordinate overlapping with reported TFBS defined from PWMs. We intersected our liver-specific collection of AS-SNPs with functional motifs and found 134 AS-SNPs altering 166 defined functional motifs mostly for TFs expressed in the liver and associated to liver metabolic pathways and development of hepatocellular carcinoma (HCC), such as TFs belonging to the EGR and KLF families. The functional motifs were defined in HepG2 cells, an HCC-derived cell line. This could have led to a definition of more liver cancer-specific motifs and represents a limitation of the method. However, the definition of the functional motifs in the pertinent liver tissue context compensates for the lack of available genomic datasets for human liver tissue. The majority of altered functional motifs were located in the TSS, in agreement with the nature of the allele-specific signal defining the altering AS-SNPs, which in most cases was H3K4me3, a histone modification marking promoters. Finally, we identified 13 AS-SNPs that altered functional motifs and were also associated to gene expression in the liver in the GTEx project. This subset of AS-SNPs represents an excellent starting point for experimental validation of a possible molecular mechanism of gene regulation offering an educated guess on the target (eQTL) and the mediator (funMotifs) of the regulatory process.
In conclusion, we presented a systematic strategy to find functional gene regulatory variants, the TFs that bind differentially between alleles, and possible target genes in human liver tissue. The collection of AS-SNPs presented here offers a set of candidate regulatory variants supported by several layers of evidence to prioritize experimental validations aimed at improving the knowledge of the molecular mechanisms of many metabolic and liver diseases.
Liver WGS and creation of diploid genome reconstruction
Human liver tissue was obtained from Prof. Per Artursson, Uppsala University. The whole genome of the liver tissue was sequenced using the 10X Genomics technology that relies on linked-reads to provide long-range information usually missing from standard approaches, such as phasing and resolution of haplotypes and structural variants. Genomic DNA was extracted from the liver sample and sequenced to a 36x mean depth coverage. The Chromium™ Software Suite was used to analyze (Long Ranger) and visualize (Loupe) the linked-read sequencing data. We use the diploid genome reconstruction module from the ALEA toolbox  that takes a list of phased variants and a reference genome as the inputs. We utilized the variants called by Long Ranger with a “PASS” quality and the Genome Reference Consortium Human Genome Build 37 (GRCh37) as the backbone reference to build two in silico personal genomes for this specific liver sample. ChIP-seq reads aligning to the reference and alternative genomes are referred to as G1 and G2 in Fig. 1 and Additional file 7: Table S3, Additional file 8: Table S4, Additional file 9: Table S5.
Aliquots of the tissue were grinded to a powder with liquid nitrogen, and ~ 40 mg or ~ 200 mg was utilized to prepare chromatin for histone modifications or TF ChIP using the Diagenode iDeal ChIP-seq kits for histones or TFs, respectively. We performed ChIP for two histone modifications: H3K27ac and H3K4me3, and a genome architectural protein: CTCF. Libraries were prepared from the enriched chromatin with NEBNext Ultra II DNA Library Prep Kit for Illumina (E7645S, NEB) following instructions from the manufacturer and sequenced on HiSeq 2500 system with 100-bp pair end sequencing (Macrogen). The read quality was assessed using Phred64/33 scores with a quality cutoff requirement of 20.
The AS-SNP discovery was adapted from our established modular pipeline  (available on http://bioinf.icm.uu.se/repositories.php, AS-SNP pipeline). In summary, (I) it realigns the ChIP-seq reads to two personal genomes derived from the reconstructed diploid genome, (II) it identifies heterozygous SNPs where the aligned read count differs statistically between the alleles, and (III) it filters out SNPs in blacklisted and duplicated genomic regions. All settings and controls are handled via a single configuration across modules.
The potentially functional TF motifs were identified using the funMotifs framework that collected TF motif annotations across the non-coding regions of the human genome in a tissue-specific manner (Umer et al. “funMotifs: Tissue-specific transcription factor motifs”, submitted http://bioinf.icm.uu.se:3838/funmotifs/). The AS-SNPs were overlaid onto the predefined TF motifs for each set of annotations received from a various data types: TF ChIP-seq data, DHSs, CAGE peaks, TF expression data, chromatin state, and information about replication domains. Annotations for the liver were obtained mainly from the HepG2 cell line data. For each of the TF motifs, a functionality score based on the weighted annotations was estimated. The TF motif was indicated as a candidate functional if the DNaseI signal was present on the motif, the TF was expressed, and the motif matching score changed at least 0.3. Furthermore, we required the TF binding event or the significant high functional score for the motif (no less than 2.55).
GWAS SNPs associated to selected liver- and metabolic-related traits from the NHGRI GWAS catalog . A total of 5051 unique GWAS SNPs were retrieved in addition to 56,958 SNPs in high LD (r2 > 0.8) with them. A comprehensive list of the selected traits is reported in Additional file 5: Table S1
Collections of eQTL SNPs from the GTEx project. eGenes and significant variant-gene associations based on permutations in the liver (GTEx Analysis v7 eQTL) were obtained for a total of 290,178 significant associations
1000 Genomes SNP collection (1000 Genomes project, phase3-shapeit2-mvncall-integrated-v5a.20130502)
List of signal artifact blacklisted ENCODE regions , centromeric and telomeric regions
ChromHMM  segmentations for the liver tissue from the Roadmap Epigenomics Projects (E066_25_imputed12marks_mnemonics.bed)
The computations were performed on resources provided by the SNIC through Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) under Project sens2017549. Prof. Per Artursson, Uppsala University, contributed the liver sample.
This work was supported by Astra Zeneca (C.W. and J.K.), the Swedish Diabetes Foundation (C.W.), EXODIAB (C.W.); the Family Ernfors Fund (C.W.); The Swedish Cancer Foundation (grant number 160518 to C.W.); The Borgströms-Hedströms foundation (M.C.); and The National Science Centre (grant number DEC-2015/16/W/NZ2/00314 to J.K.).
Availability of data and materials
The authors declare that the data supporting the findings of this study are available within the article and its additional files. The raw ChIP-seq data produced are being submitted to the GEO database (ChIP-seq data) and the dbGaP (liver WGS data) and are currently available from the corresponding author on reasonable request.
MC and NB performed most of the data analysis. CW conceived and led the study. JRBW and MC performed the ChIP-seq experiments. JK and GP contributed to the data analysis and interpretation. CK and SS contributed to the results’ interpretation. MC wrote the manuscript assisted by CW with input from all authors. All authors read and approved the final manuscript.
Ethics approval and consent to participate
The study was approved by the Uppsala regional ethics committee (2014/433).
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Edwards Stacey L, Beesley J, French Juliet D, Dunning Alison M. Beyond GWASs: illuminating the dark road from association to function. Am J Hum Genet. 2013;93(5):779–97.PubMedPubMed CentralView ArticleGoogle Scholar
- Cavalli M, Pan G, Nord H, et al. Allele-specific transcription factor binding to common and rare variants associated with disease and gene expression. Hum Genet. 2016;135:485–97.PubMedPubMed CentralView ArticleGoogle Scholar
- MacArthur J, Bowler E, Cerezo M, et al. The new NHGRI-EBI catalog of published genome-wide association studies (GWAS catalog). Nucleic Acids Res. 2017;45(D1):D896–901.PubMedView ArticleGoogle Scholar
- The ENCODE Project C. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57–74.View ArticleGoogle Scholar
- Maurano MT, Humbert R, Rynes E, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337(6099):1190–5.PubMedPubMed CentralView ArticleGoogle Scholar
- Younesy H, Möller T, Heravi-Moussavi A, et al. ALEA: a toolbox for allele-specific epigenomics analysis. Bioinformatics. 2014;30(8):1172–4.PubMedView ArticleGoogle Scholar
- Yang XD, Xiang DX, Yang YY. Role of E3 ubiquitin ligases in insulin resistance. Diabetes Obes Metab. 2016;18(8):747–54.PubMedView ArticleGoogle Scholar
- Marfella R, D’Amico M, Di Filippo C, et al. The possible role of the ubiquitin proteasome system in the development of atherosclerosis in diabetes. Cardiovasc Diabetol. 2007;6:35.PubMedPubMed CentralView ArticleGoogle Scholar
- Magee N, Zhang Y. Role of early growth response 1 in liver metabolism and liver cancer. Hepatoma Res. 2017;3(11):268.PubMedPubMed CentralView ArticleGoogle Scholar
- Gokey NG, Lopez-Anido C, Gillian-Daniel AL, Svaren J. Early growth response 1 (Egr1) regulates cholesterol biosynthetic gene expression. J Biol Chem. 2011;286(34):29501–10.PubMedPubMed CentralView ArticleGoogle Scholar
- Shen N, Yu X, Pan F-Y, Gao X, Xue B, Li C-J. An early response transcription factor, Egr-1, enhances insulin resistance in type 2 diabetes with chronic hyperinsulinism. J Biol Chem. 2011;286(16):14508–15.PubMedPubMed CentralView ArticleGoogle Scholar
- Wang F, Kuang Y, Salem N, Anderson PW, Lee Z. Cross-species hybridization of woodchuck hepatitis viral infection-induced woodchuck hepatocellular carcinoma using human, rat and mouse oligonucleotide microarrays. J Gastroenterol Hepatol. 2009;24(4):605–17.PubMedView ArticleGoogle Scholar
- Pollak NM, Hoffman M, Goldberg IJ, Drosatos K. Krüppel-like factors: crippling and uncrippling metabolic pathways. JACC. 2018;3(1):132–56.PubMedGoogle Scholar
- Kumadaki S, Karasawa T, Matsuzaka T, et al. Inhibition of ubiquitin ligase F-box and WD repeat domain-containing 7α (Fbw7α) causes hepatosteatosis through Krüppel-like factor 5 (KLF5)/peroxisome proliferator-activated receptor γ2 (PPARγ2) pathway but not SREBP-1c protein in mice. J Biol Chem. 2011;286(47):40835–46.PubMedPubMed CentralView ArticleGoogle Scholar
- Bernon C, Carré Y, Kuokkanen E, et al. Overexpression of Man2C1 leads to protein underglycosylation and upregulation of endoplasmic reticulum-associated degradation pathway. Glycobiology. 2011;21(3):363–75.PubMedView ArticleGoogle Scholar
- Andersson R, Gebhard C, Miguel-Escalada I, et al. An atlas of active enhancers across human cell types and tissues. Nature. 2014;507:455.PubMedPubMed CentralView ArticleGoogle Scholar
- Hynds RE, Vladimirou E, Janes SM. The secret lives of cancer cell lines. Dis Model Mech. 2018;11(11):dmm037366.PubMedPubMed CentralView ArticleGoogle Scholar
- Brodt P. Role of the microenvironment in liver metastasis: from pre- to prometastatic niches. Clin Cancer Res. 2016;22(24):5971.PubMedView ArticleGoogle Scholar
- McGranahan N, Rosenthal R, Hiley CT, et al. Allele-specific HLA loss and immune escape in lung cancer evolution. Cell. 2017;171(6):1259–71 e11.PubMedPubMed CentralView ArticleGoogle Scholar
- Kassel R, Cruise MW, Iezzoni JC, Taylor NA, Pruett TL, Hahn YS. Chronically inflamed livers up-regulate expression of inhibitory B7 family members. Hepatology. 2009;50(5):1625–37.PubMedPubMed CentralView ArticleGoogle Scholar
- Amiot L, Vu N, Samson M. Biology of the immunomodulatory molecule HLA-G in human liver diseases. J Hepatol. 2015;62(6):1430–7.PubMedView ArticleGoogle Scholar
- Kundaje A. A comprehensive collection of signal artifact blacklist regions in the human genome. 2013. ENCODE [hg19-blacklist-READMEdoc - EBI] Available online at: https://sites.google.com/site/anshulkundaje/projects/blacklists.Google Scholar
- Boyle AP, Hong EL, Hariharan M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012;22(9):1790–7.PubMedPubMed CentralView ArticleGoogle Scholar
- Ernst J, Kellis M. ChromHMM: automating chromatin state discovery and characterization. Nat Methods. 2012;9(3):215–6.PubMedPubMed CentralView ArticleGoogle Scholar