Prioritization of therapeutic targets for cancers using integrative multi-omics analysis

Background The integration of transcriptomic, proteomic, druggable genetic and metabolomic association studies facilitates a comprehensive investigation of the molecular features and shared pathways underlying cancer development and progression. Methods Transcriptome-wide association studies (TWAS), proteome-wide association studies (PWAS), summary-data-based Mendelian randomization (SMR) and Mendelian randomization (MR) were performed to identify genes significantly associated with cancers. The genes identified in these analyses were then subjected to phenotype scanning and enrichment analyses to explore their possible health effects and shared pathways. We also conducted MR analysis to investigate metabolic pathways related to cancers. Results A total of 24 genes (18 transcriptomic, 1 proteomic and 5 druggable genetic) showed significant associations with cancer risk. The genes identified across these methods were mainly enriched in the nuclear factor erythroid 2-related factor 2 (NRF2) pathway. Additionally, the biosynthesis of ubiquinol and urate was found to play an important role in gastrointestinal tumors. Conclusions This study identified a set of putatively causal genes and pathways relevant to cancers, shedding light on the shared biological processes of tumorigenesis and providing genetic evidence to prioritize anti-cancer drug development. Supplementary Information The online version contains supplementary material available at 10.1186/s40246-024-00571-2.


Background
As a prevalent chronic disease, cancer arises from the uncontrolled proliferation of abnormal cells and poses a significant threat to human health. There were nearly 19.3 million new cancer cases and almost 10.0 million cancer-related deaths in 2020 [1]. Because the mechanisms underlying tumorigenesis are complex and multifaceted, the treatment of cancers remains a challenge. Numerous ongoing clinical trials are assessing the efficacy of new drugs as cancer therapeutics. Despite these efforts, approximately 90% of drugs that progress into clinical trials ultimately fail, primarily due to insufficient efficacy or safety concerns. This contributes to an average cost of $1.3 billion to complete the development and commercialization of a new drug [2][3][4].
Increasing evidence suggests that drug targets with genetic support usually exhibit a higher success rate in clinical trials and ultimately deliver more effective treatments to patients in need [5][6][7]. As a powerful research approach, genome-wide association studies (GWAS) offer comprehensive genomic data, enabling researchers to investigate the molecules and pathways involved in the development of diseases. Analysis methods based on GWAS data, such as transcriptome-wide association studies (TWAS), proteome-wide association studies (PWAS), summary-data-based Mendelian randomization (SMR) and colocalization, have been widely used to inform potential drug targets, presenting unprecedented opportunities to develop novel drugs for many complex diseases [7,8]. In fact, some drugs developed on the basis of genetic research, such as those targeting PCSK9 [9], CCR5 [10] and ACE2 [11], have already yielded successful outcomes, advancing the treatment of related diseases.
In this study, we sought to identify novel therapeutic targets for cancers using multi-omics GWAS data. An integrative analysis was adopted to investigate candidate genes for cancers at the transcriptomic and proteomic levels. Using eQTL and pQTL data, we first performed TWAS and PWAS analyses separately to identify causal gene transcripts and proteins for cancers. SMR/MR, Bayesian colocalization and differential expression analysis were then used to further confirm these results. In addition, druggable genomic and metabolic data were included through MR analysis to enrich our findings. Our study comprehensively prioritized candidate genes for cancers based on multi-omics genetic data, contributing to a better understanding of the potential mechanisms and addressing challenges in the lengthy and costly process of novel drug development.

Study design and ethics
The overall study design and methods are presented in Fig. 1. The included studies have undergone ethical review and obtained approval from review committees.

Outcome sources
GWAS summary statistics for cancers were extracted from the FinnGen R9 release. The FinnGen study is a large-scale study that combines genome information with digital healthcare data of 500,000 Finnish individuals, aiming to identify new therapeutic targets and diagnostics for numerous diseases through genetic research and to improve human health [12]. A total of 16 types of cancer were included as outcomes in our study; details are presented in Additional file 1: Table S1.

TWAS (transcriptomics)
TWAS integrates GWAS and gene expression data to identify specific genes or genetic variants that contribute to the observed trait or disease [13]. Functional Summary-based Imputation (FUSION), a widely used tool for TWAS analysis, provides precomputed predictive models from multiple studies to facilitate testing associations throughout the transcriptome (http://gusevlab.org/projects/fusion/) [13]. In this study, gene expression weights generated from the Genotype-Tissue Expression Project version 8 (GTEx v8) served as the reference for associations between single nucleotide polymorphisms (SNPs) and gene expression, encompassing whole blood and the corresponding organ tissue panels. 1000 Genomes European samples (https://data.broadinstitute.org/alkesgroup/FUSION/LDREF.tar.bz2) were used to estimate the linkage disequilibrium (LD) between the prediction model and the SNPs at each GWAS locus. Each gene association (Z-test) was subjected to a permutation test with 2000 permutations. We set a Bonferroni-corrected threshold of P < (0.05/number of features) to limit the increased likelihood of false positive results that arises when conducting multiple statistical tests simultaneously (Additional file 1: Table S2).
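As an illustration of the summary-statistic test that FUSION-style TWAS relies on, the sketch below computes the gene-level statistic Z = w'z / sqrt(w'Σw) from expression weights w, GWAS Z-scores z and an LD matrix Σ. The function names and the toy numbers are hypothetical; this is a minimal sketch of the published formula, not the FUSION implementation.

```python
import math

def twas_z(weights, gwas_z, ld):
    """Gene-level TWAS Z-score: w'z / sqrt(w' Sigma w)."""
    n = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return numerator / math.sqrt(variance)

def two_sided_p(z):
    """Two-sided P value of a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Toy example (hypothetical values): two SNPs in modest LD.
weights = [0.5, -0.2]            # expression weights for the gene
gwas_z = [3.0, -1.0]             # GWAS Z-scores for the same SNPs
ld = [[1.0, 0.3], [0.3, 1.0]]    # LD (correlation) matrix
z_gene = twas_z(weights, gwas_z, ld)
p_gene = two_sided_p(z_gene)
```

The resulting P value would then be compared with the Bonferroni threshold of 0.05 divided by the number of tested features.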

SMR (transcriptomics)
As an extension of the MR concept, SMR analysis tests whether the effect of SNPs on cancers is mediated through gene expression, thereby prioritizing genes responsible for tumorigenesis. SMR analysis was conducted with the default settings through the command line interface (https://yanglab.westlake.edu.cn/software/smr/#Overview) [14]. The heterogeneity in dependent instruments (HEIDI) test was further employed to determine whether the identified association between gene expression and cancers was attributable to linkage. A two-sided P < 0.01 in the HEIDI test indicates that the association is most likely due to linkage [14].
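The SMR test described above can be sketched from summary statistics alone. Following the published SMR formulation, the effect of expression on the trait is estimated as the ratio of the GWAS effect to the eQTL effect at the top cis-eQTL, and the test statistic is approximately chi-square distributed with one degree of freedom. The function names and input values below are hypothetical, and the HEIDI P value is taken as given rather than computed.

```python
import math

def smr_test(b_gwas, se_gwas, b_eqtl, se_eqtl):
    """Return (b_smr, t_smr, p_smr) for one top cis-eQTL."""
    z_gwas = b_gwas / se_gwas
    z_eqtl = b_eqtl / se_eqtl
    b_smr = b_gwas / b_eqtl                       # expression -> trait effect
    t_smr = (z_gwas ** 2 * z_eqtl ** 2) / (z_gwas ** 2 + z_eqtl ** 2)
    p_smr = math.erfc(math.sqrt(t_smr / 2.0))     # chi-square (1 df) upper tail
    return b_smr, t_smr, p_smr

def passes_heidi(p_heidi, cutoff=0.01):
    """Keep associations NOT flagged as linkage (P_HEIDI >= cutoff)."""
    return p_heidi >= cutoff

# Hypothetical summary statistics for a single gene.
b_smr, t_smr, p_smr = smr_test(b_gwas=0.1, se_gwas=0.02, b_eqtl=0.5, se_eqtl=0.05)
```

A gene would be carried forward only if its SMR P value passes the significance threshold and `passes_heidi` returns true.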

Bayesian colocalization (transcriptomics)
To test whether the genetic associations with the identified genes and with cancers shared a single causal variant, we performed colocalization analyses for TWAS-significant and SMR-significant results [15]. All SNPs within 1 Mb upstream and downstream of each lead SNP were included in this analysis, and a posterior probability of H4 (PP.H4) > 0.8 indicated that the identified gene colocalized strongly with cancers. We defined genes that were TWAS-significant, SMR-significant and had PP.H4 > 0.8 as high confidence genes (HCG).
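To make the PP.H4 criterion concrete, the sketch below combines per-SNP log approximate Bayes factors for the two traits into posterior probabilities for hypotheses H0 to H4, in the manner of the standard Bayesian colocalization framework. The prior values (1e-4, 1e-4, 1e-5) are assumed here as commonly used defaults and are not stated in this study; the function names and toy Bayes factors are hypothetical.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def logdiffexp(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if a <= b:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [PP.H0, ..., PP.H4] from per-SNP log ABFs."""
    l_sum1 = logsumexp(labf1)
    l_sum2 = logsumexp(labf2)
    l_sum12 = logsumexp([a + b for a, b in zip(labf1, labf2)])
    l_h1 = math.log(p1) + l_sum1                  # trait 1 only
    l_h2 = math.log(p2) + l_sum2                  # trait 2 only
    l_h3 = math.log(p1) + math.log(p2) + logdiffexp(l_sum1 + l_sum2, l_sum12)
    l_h4 = math.log(p12) + l_sum12                # one shared causal variant
    logs = [0.0, l_h1, l_h2, l_h3, l_h4]
    denom = logsumexp(logs)
    return [math.exp(l - denom) for l in logs]

# Toy region with two SNPs: both traits peak at the first SNP.
shared = coloc_posteriors([10.0, 0.0], [10.0, 0.0])
# Toy region where the traits peak at different SNPs.
distinct = coloc_posteriors([10.0, 0.0], [0.0, 10.0])
```

In the first toy region the mass concentrates on H4, which is the pattern that the PP.H4 > 0.8 filter is designed to capture; in the second, H3 (distinct causal variants) outweighs H4.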

PWAS (proteomics)
The same FUSION workflow was applied for PWAS with the default settings and parameters. In this study, we analyzed 1348 circulating proteins from 7213 European Americans (EA) in the Atherosclerosis Risk in Communities study (http://nilanjanchatterjeelab.org/pwas/), combined with the corresponding European ancestry LD reference sample [16].

MR (proteomics)
Two-sample MR analyses were performed to capture the causal associations between plasma protein levels and cancer risk, using data from the study by Ferkingstad et al. [17], which measured 4907 different blood proteins in 35,559 Icelanders. Proteins with protein quantitative trait loci (pQTLs) were included in the MR analysis. The Wald ratio was used when a single pQTL was available for a given protein, and the inverse variance weighted (IVW) method was applied when multiple genetic instruments were available. As in the TWAS analysis, we also performed colocalization to further screen the above results. Genes that were PWAS-significant, MR-significant and had PP.H4 > 0.8 were defined as HCG.
Fig. 1 Study design and flow diagrams. The transcriptomic, proteomic, druggable genetic and metabolomic associations with cancers were identified through comprehensive methods. 18 gene transcripts (TWAS-significant, SMR-significant and PP.H4 > 0.8), 1 protein-coding gene (PWAS-significant, MR-significant and PP.H4 > 0.8) and 5 druggable genes (SMR-significant and PP.H4 > 0.8) were included in phenotype scanning and enrichment analysis. Additionally, we conducted two-sample MR analyses to identify 2 metabolic pathways significantly associated with cancers. TWAS transcriptome-wide association studies, PWAS proteome-wide association studies, MR Mendelian randomization, SMR summary-data-based Mendelian randomization, PP.H4 posterior probability that two traits are associated with a single causal variant, eQTL expression quantitative trait loci, GTEx v8 genotype-tissue expression project version 8, EA European American
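The choice between the Wald ratio and IVW estimators used in the proteomic MR can be sketched as follows. Each instrument is represented as a tuple of (SNP effect on the protein, SNP effect on the cancer outcome, standard error of the outcome effect). The function names and numbers are hypothetical, the standard errors use the usual first-order approximation, and the IVW estimator shown is the fixed-effect version; dedicated MR software should be preferred for real analyses.

```python
import math

def wald_ratio(b_exp, b_out, se_out):
    """Single-instrument estimate with a first-order standard error."""
    return b_out / b_exp, se_out / abs(b_exp)

def ivw(instruments):
    """Fixed-effect inverse variance weighted estimate over several instruments."""
    weights = [(bx / sy) ** 2 for bx, _, sy in instruments]
    ratios = [by / bx for bx, by, _ in instruments]
    total = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total
    return beta, math.sqrt(1.0 / total)

def mr_estimate(instruments):
    """Wald ratio for one pQTL, IVW when several pQTLs are available."""
    if len(instruments) == 1:
        return wald_ratio(*instruments[0])
    return ivw(instruments)

# Hypothetical instruments: (b_exposure, b_outcome, se_outcome).
single = mr_estimate([(0.5, 0.1, 0.02)])
multi = mr_estimate([(0.5, 0.1, 0.02), (0.25, 0.05, 0.02)])
```

With several instruments that all imply the same ratio, the IVW estimate equals that ratio while its standard error is smaller than that of any single Wald ratio.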

Druggable SMR (genomics)
The druggable genome was defined as the collection of genes encoding targetable proteins, including targets of compounds in clinical trials, approved medications and small compounds validated in preclinical experiments [18]. Focusing on this subset of genes, we aimed to identify further potential repurposing opportunities to inform trials in cancer patients. In this study, cis-eQTLs extracted from the eQTLGen Consortium were used to generate genetic instruments for the druggable genome. Within 1 Mb on either side of the encoded gene, common (minor allele frequency [MAF] > 1%) cis-eQTLs that showed a significant association (P < 5.0 × 10⁻⁸) with the expression of druggable genes were selected. HEIDI tests and colocalization were also applied in this section. We defined genes that were druggable SMR-significant and had PP.H4 > 0.8 as HCG.

Metabolome-wide MR (metabolomics)
To elucidate the metabolic mechanisms underlying tumorigenesis, we conducted metabolome-wide MR analyses of 205 metabolic pathways on the cancer outcomes (Additional file 1: Table S18). Genetic data for the metabolic pathways were obtained from a genome-wide association study in the Dutch Microbiome Project, which aims to characterize the interaction between host genetics and microbial composition and function [19]. We first selected instrumental variables (IVs) at P < 1 × 10⁻⁵ to ensure a comprehensive outcome. All IVs subsequently underwent linkage disequilibrium (LD) clumping (r² = 0.001; distance = 10,000 kb) to reduce the potential influence of SNP correlations. SNPs located within the major histocompatibility complex (MHC) region (chr6, 26-34 Mb) were excluded. Selected SNPs were required to have an F-statistic greater than 10.
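The instrument filters for the metabolome-wide MR can be sketched as a simple selection function. Two assumptions are made loudly here: the per-SNP F-statistic is approximated as (beta/se)², which this study does not specify, and, as is conventional, variants inside the MHC region are the ones dropped. LD clumping is not implemented because it requires an LD reference panel. The record fields and rsIDs are hypothetical.

```python
def f_statistic(beta, se):
    """Per-SNP instrument strength, approximated as (beta / se) ** 2."""
    return (beta / se) ** 2

def in_mhc(chrom, pos_bp, start_mb=26, end_mb=34):
    """True if the variant lies in the MHC region on chromosome 6."""
    return chrom == 6 and start_mb * 1_000_000 <= pos_bp <= end_mb * 1_000_000

def select_instruments(snps, p_max=1e-5, f_min=10.0):
    """Keep SNPs passing the P value, MHC and F-statistic filters."""
    return [
        s for s in snps
        if s["p"] < p_max
        and not in_mhc(s["chrom"], s["pos"])
        and f_statistic(s["beta"], s["se"]) > f_min
    ]

# Hypothetical SNP records for one metabolic pathway.
snps = [
    {"rsid": "rs_a", "chrom": 1, "pos": 1_500_000, "beta": 0.20, "se": 0.04, "p": 5.7e-7},
    {"rsid": "rs_b", "chrom": 6, "pos": 30_000_000, "beta": 0.25, "se": 0.04, "p": 4.1e-10},
    {"rsid": "rs_c", "chrom": 2, "pos": 9_000_000, "beta": 0.05, "se": 0.04, "p": 0.21},
]
kept = select_instruments(snps)
```

Only the first record survives in this toy example: the second falls in the MHC region and the third fails the P value filter.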

Differential expression analysis
We further performed a differential expression analysis to verify the role that the therapeutic targets play in specific tumors. The transcriptome RNA-seq and clinical data for the above genes were extracted from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/) and GTEx (https://gtexportal.org/home/).

Phenotype scanning
To investigate the possible health effects of the HCG, we conducted phenotype scanning via MR analyses using publicly available electronic health record data for 1293 health-related endpoints (number of cases > 1000) in FinnGen Release 5. To control the increased risk of type I error under multiple testing, we applied a Bonferroni correction, under which results with a P value below 0.05 divided by the number of health-related endpoints were considered significant. Hence, we reported all gene-trait associations significant at a Bonferroni-corrected threshold of P < 3.87 × 10⁻⁵ (0.05/1293).
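The threshold arithmetic for the phenotype scan can be checked with a short sketch; the function names are hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def is_significant(p_value, n_tests, alpha=0.05):
    """True if a P value survives Bonferroni correction for n_tests tests."""
    return p_value < bonferroni_threshold(n_tests, alpha)

# 1293 health-related endpoints, as in the phenotype scan.
threshold = bonferroni_threshold(1293)
```

Dividing 0.05 by 1293 gives approximately 3.87 × 10⁻⁵, matching the reported threshold.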

Enrichment analysis
Enrichment analysis was conducted in the Metascape database (http://www.metascape.org/) [20] to explore the shared mechanisms contributing to cancers, limiting the species to "Homo sapiens" and setting the P value cut-off to 0.01 and the minimum overlap to three.
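For intuition about how the P value cut-off and minimum overlap interact, the sketch below computes an over-representation P value from a hypergeometric tail. This assumes a hypergeometric model, which is a common choice for this kind of analysis; it is not Metascape's code, and the function names and gene counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_in_term, n_hits, n_overlap):
    """P(overlap >= n_overlap) when n_hits genes are drawn from the background."""
    upper = min(n_in_term, n_hits)
    total = comb(n_background, n_hits)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

def is_enriched(p_value, n_overlap, p_cutoff=0.01, min_overlap=3):
    """Apply the P value cut-off and minimum-overlap settings."""
    return p_value < p_cutoff and n_overlap >= min_overlap

# Hypothetical counts: 24 candidate genes, 4 of which fall in a 100-gene term,
# against a background of 20,000 genes.
p_term = hypergeom_enrichment_p(20_000, 100, 24, 4)
enriched = is_enriched(p_term, 4)
```

A term with a small P value but only one or two overlapping genes would still be rejected by the minimum-overlap rule.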

Transcriptomic association studies (TWAS, SMR, colocalization)
This study employed TWAS, SMR and colocalization to identify robust gene expression signatures associated with cancers. TWAS analysis revealed significant associations between the expression of 151 genes and cancers in whole blood and specific organ tissues. SMR analysis identified 52 genes whose expression in whole blood and specific organ tissues was associated with cancers. To test whether these genes and cancers shared a single causal variant, the TWAS-significant and SMR-significant results were subsequently refined through colocalization analyses. The expression of a total of 18 genes colocalized strongly with cancers (PP.H4 > 0.80); these genes were designated HCG. The HCG and detailed results are presented in Table 1 and Additional file 1: Tables S4-S8.

Proteomic association studies (PWAS, MR, colocalization)
As in the above analysis, a set of methods comprising PWAS, MR and colocalization was employed to establish reliable protein-coding gene signatures associated with cancers. A total of 17 protein-coding genes were identified through PWAS analysis, and 6 protein-coding genes showed significant results in MR analysis. Genes significant in both methods were carried forward to colocalization. Ultimately, only one protein-coding gene, PDCD6IP, colocalized strongly with cancers (PP.H4 > 0.80) and was identified as an HCG. Additional file 1: Tables S9-S11 summarize the detailed results of the above analysis.

Differential expression analysis
The results suggested that the investigated genes were significantly differentially expressed in specific tumors compared with normal tissues, except for APOBEC3A and NEK10. However, the analysis based solely on TCGA yielded the opposite result, with these two genes flagged as strongly significant for breast cancer. The box plots of differential expression are shown in Additional file 2: Figs. S4-S7.

Phenotype scanning
To investigate the potential effects of pharmacologically targeting the genetically identified genes, we carried out MR analyses with a Bonferroni-corrected threshold of P < 3.87 × 10⁻⁵ in the FinnGen Release 5 database. SNPs located near the genes EEFSEC and TPCN2 were associated with an increased risk of asthma and of autoimmune and inflammatory diseases. Conversely, SNPs located near the gene GPX were associated with a decreased risk of gastrointestinal-related diseases, such as inflammatory bowel disease and ulcerative colitis (Additional file 1: Table S15). Figure 2 summarizes the workflow of phenotype scanning on candidate genes.

Enrichment analysis
Functional enrichment analysis helps to summarize the common mechanisms underlying cancer development. The identified genes were mainly enriched in the nuclear factor erythroid 2-related factor 2 (NRF2) pathway, which regulates the cellular antioxidant response (Additional file 1: Tables S16 and S17, Additional file 2: Fig. S8).

Discussion
In this study, we leveraged genetic data to characterize the molecular features and pathways associated with cancers. Through a variety of analytical methods, we jointly prioritized 24 genes (18 transcriptomic, 1 proteomic and 5 druggable genome-wide genetic) and the NRF2 pathway as significantly associated with tumorigenesis. Furthermore, two metabolic pathways (PWY.5695..urate.biosynthesis.inosine.5..phosphate.degradation and UBISYN.PWY..superpathway.of.ubiquinol.8.biosynthesis..prokaryotic) were associated with an increased risk of gastrointestinal cancers.
In the gene transcript analysis, some results were supported by previous studies. As a glycosylphosphatidylinositol-anchored cell surface protein, PSCA is known to play a key role in intracellular signaling and tumor proliferation [21]. Indeed, CAR NK cells targeting PSCA have shown marked effectiveness in treating metastatic pancreatic cancer [22]. Nonetheless, few studies have examined the association between gastric cancer and PSCA, which warrants further investigation. Additionally, isoforms of the Iroquois-class homeodomain protein IRX4 were found to induce distinct functional programming, thereby contributing to suppressing the progression of prostate cancer [23,24]. Meanwhile, a bioinformatic study indicated a significant correlation between the expression of protein phosphatase 1 regulatory inhibitor subunit 14A (PPP1R14A) and the prognosis of patients with diverse tumor types across the TCGA cohort, adding to the understanding of our results [25].
Table 1 High confidence genes associated with cancers (TWAS significant, SMR significant, and PP.H4 > 0.8). High confidence genes reached the significance thresholds of TWAS analysis, SMR analysis and colocalization analysis. The expression weights for TWAS analysis and the cis-eQTLs for SMR analysis were generated from whole blood and the corresponding organ tissues in the GTEx v8 release.
Among the druggable genes identified in our study, APOBEC3A has received the most attention. APOBEC3-associated mutations play an important role in the development of breast cancer, and APOBEC3A was recently reported to be the main driver of these mutations [26][27][28]. Despite the conflicting results of the above analysis regarding the expression of APOBEC3A, we still encourage further exploration of APOBEC3A on the basis of previous studies. Meanwhile, targeted therapy-induced APOBEC3A increases genomic instability and drives the evolution of drug-tolerant persisters, suggesting that inhibition of APOBEC3A expression or activity may be an effective therapeutic strategy to reverse drug resistance [29]. In addition, GPX1, which is ubiquitously expressed in many tissues, has been reported to be aberrantly expressed in multiple cancers and to be closely associated with oncogenesis and cancer progression [30]. However, its impact on cancer susceptibility remains controversial [31][32][33][34][35]. Its dichotomous roles as both a tumor suppressor and a tumor promoter, depending on the cancer type, should be noted.
NRF2, a crucial regulator of the cellular antioxidant response, has been increasingly recognized as a driver of cancer progression, metastasis and therapy resistance [21,22,36]. NRF2 is reported to play a direct role in tumorigenesis through upregulation of its target genes and an indirect role through redox modulation [23,24]. Similarly, our results provide genetic evidence that further supports an important role of the NRF2 pathway in tumor-related physiological processes. These findings indicate that the NRF2 pathway warrants further investigation as a prognostic biomarker and a therapeutic target.
As one of the major hallmarks of malignancy, metabolic reprogramming plays a crucial role in tumor growth, progression and metastasis. To meet the enhanced requirements of the biological processes essential for proliferation and survival, cancer cells undergo intrinsic modifications of their metabolic properties and preferences by regulating the flow of metabolic pathways [25,37]. Notably, the biosynthesis of ubiquinol was highlighted as a specific pathway for the risk of various gastrointestinal tumors. Ubiquinol is reported to drive the oxidative tricarboxylic acid cycle and dihydroorotate dehydrogenase activity in the mitochondrial electron transport chain, which is necessary for tumor growth [38][39][40]. However, the association between ubiquinol and gastrointestinal tumors has not been reported and requires further study.
Table 2 Druggable genes significantly associated with cancers (SMR significant and coloc > 0.8). We performed SMR analysis and colocalization analysis to identify druggable genes significantly associated with cancers. The cis-eQTLs within 1 Mb on either side of the encoded gene, extracted from the eQTLGen Consortium, were used in SMR analysis. HEIDI tests and Bayesian colocalization were further conducted to assess the impact of pleiotropy. eQTL expression quantitative trait loci, OR odds ratio, PP.H4 posterior probability that two traits are associated with a single causal variant, SNP single nucleotide polymorphism
While clinical trials are usually regarded as the gold standard for evaluating treatment efficacy and safety, bioinformatics analysis serves a different purpose and complements the findings from clinical trials. It allows the exploration of large-scale genomic and molecular data, providing a comprehensive understanding of biological processes and disease mechanisms. Using computational tools and algorithms, bioinformatics can uncover complex relationships between genetic variations, gene expression patterns and disease phenotypes, which can help researchers to identify relevant potential drug targets. Moreover, by aggregating and analyzing data from various studies, bioinformatics can provide a broader perspective and increase statistical power, which may not be feasible within the confines of a single clinical trial. However, the limitations of bioinformatics analysis must be acknowledged. The reliability of the results depends on the quality and accuracy of the input data, as well as on the robustness of the analytical methods employed. Hence, results generated from bioinformatics analysis may require validation through experimental studies and clinical trials.
Some limitations need to be acknowledged. Firstly, due to the limited availability of multi-omics datasets, the reference data in our study predominantly consist of participants of European ancestry, meaning that the findings cannot be directly generalized to other ethnic groups. Secondly, although colocalization analysis and the HEIDI test were used to exclude potential bias arising from linkage disequilibrium, it is not possible to completely eliminate the impact of horizontal pleiotropy. Finally, results generated from bioinformatics analysis may be considered less reliable than those derived from rigorous clinical trials. Therefore, additional clinical trials are needed to further assess the efficacy and safety of these findings.
In conclusion, our study integrated transcriptomic, proteomic, druggable genetic and metabolomic association studies to explore the molecular features and shared pathways underlying cancer incidence and progression, advancing the development of new drugs.
Fig. 2 Phenotype scanning of the genes identified in the above analysis. Phenotype scanning was performed to investigate possible health effects of the genes identified in the previous analysis, using the publicly available electronic health record data corresponding to 1293 health-related endpoints (number of cases > 1000) in FinnGen Release 5
The genes significant in both TWAS and SMR analyses were then assessed in colocalization analyses to further test robustness. TWAS transcriptome-wide association study, SMR summary-data-based Mendelian randomization, eQTL expression quantitative trait loci, GTEx v8 genotype-tissue expression project version 8, OR odds ratio, PP.H4 posterior probability that two traits are associated with a single causal variant, SNP single nucleotide polymorphism

Table 3
The results of metabolome-wide MR on cancers. NSNP number of single nucleotide polymorphisms, OR odds ratio