Comparative study and meta-analysis of meta-analysis studies for the correlation of genomic markers with early cancer detection

A large number of common disorders, including cancer, have complex genetic traits, with multiple genetic and environmental components contributing to susceptibility. A literature search revealed that even among several meta-analyses, there were ambiguous results and conclusions. In the current study, we conducted a thorough meta-analysis gathering the published meta-analysis studies previously reported to correlate any random effect or predictive value of genome variations in certain genes for various types of cancer. The overall analysis initially aimed to identify associations (1) among genes which when mutated lead to different types of cancer (e.g. common metabolic pathways) and (2) between groups of genes and types of cancer. We have meta-analysed 150 meta-analysis articles which included 4,474 studies, 2,452,510 cases and 3,091,626 controls (5,544,136 individuals in total) including various racial groups and other population groups (native Americans, Latinos, Aborigines, etc.). Our results were not only consistent with previously published literature but also revealed novel correlations of genes with new cancer types. Our analysis revealed a total of 17 gene-disease pairs that are affected and generated gene/disease clusters, many of which proved to be independent of the criteria used, which suggests that these clusters are biologically meaningful.


Introduction
Cancer is the result of a complicated process that involves the accumulation of both genetic and epigenetic alterations in various genes [1]. The somatic genetic alterations in cancer include point mutations, small insertion/deletion events, translocations, copy number changes and loss of heterozygosity [2]. These changes either augment the action and/or expression of an oncoprotein or silence tumour suppressor genes. Single-nucleotide polymorphism (SNP) is the most common form of genetic variation in the human genome. Although common SNPs for disease prediction are not ready for widespread use [3], recent genome-wide association studies (GWASs) using high-throughput techniques have identified regions of the genome that contain SNPs with alleles that are associated with increased risk for cancer such as FGFR2 in breast cancer [4][5][6][7].
Knowledge of gene mutations that predispose to tumour initiation, development and progression would be an advantage in the treatment of cancer patients. Despite the complexity and variability of the cancer genome, numerous studies have examined the correlation of genome variation with cancer development and progression [8]. However, attempts to link genome variants with cancer prediction or detection have generated ambiguous results. A literature search revealed that even among several meta-analyses, there were unclear results and conclusions.
We have, therefore, conducted a thorough meta-analysis of meta-analysis studies previously reported to correlate the random effect or predictive value of genome variations in certain genes for various types of cancer. The aim of the overall analysis was the detection of correlations (1) among genes whose mutation might lead to different types of cancer (e.g. common metabolic pathways) and (2) between groups of genes and types of cancer.

Methods
We performed a thorough field synopsis by studying published meta-analysis studies involving the association of various types of cancer with SNPs located in certain genomic regions. For each published meta-analysis included in our study, we also investigated the number of patients (cases) and controls, date, type of study, study group details (e.g. gender, race, age, etc.), measures included, allele and genotype frequency and also the outcome of each study, i.e. if there was an association or not, the interactions noticed in each of these studies, etc.
We have meta-analysed 150 meta-analysis articles (Additional file 1), which included 4,474 studies, 2,452,510 cases and 3,091,626 controls (5,544,136 individuals in total). The included meta-analyses covered various racial groups, e.g. Caucasians, Far Eastern populations (Asian, Chinese, Japanese, Korean, etc.), African-American and other population groups (native Americans, Latinos, Aborigines, etc.). Three types of studies were included: (1) pooled analysis, (2) GWAS and (3) other studies, e.g. search in published reports. Collected data consisted of a list of genes, genomic variants and diseases with a known genotype-phenotype association (whether or not a given variation has an impact on susceptibility to a given disease). The principle of our study was to use data mining techniques to find groups (referred to as clusters hereafter) of genes or diseases that behave similarly according to related data. Such groupings make it possible to find different cancer types susceptible to similar genotypes as well as different genes associated with similar cancer types. Furthermore, our approach would facilitate predicting whether susceptibility to one type of cancer may be indicative of predisposition to another cancer type. Moreover, the association between a group of genes and a given phenotype may suggest that these genes interact or belong to the same biochemical pathway. In order to allow data mining analysis, genotype-phenotype associations had to be classified within a fixed set of categories, i.e. yes/small yes/may/no. Moreover, genes or diseases with fewer than two entries were not considered in our analysis since their clustering would not be meaningful.
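The data preparation described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the numeric scores assigned to the four categories, the single-pass filter, the treatment of missing gene-disease pairs as 0 and the cosine similarity measure are all assumptions made for illustration, and the gene names and records are hypothetical toy data.

```python
from collections import Counter
from math import sqrt

# Hypothetical numeric encoding of the four association categories (an assumption).
CATEGORY_SCORE = {"yes": 3, "small yes": 2, "may": 1, "no": 0}


def filter_records(records, min_entries=2):
    """Drop records whose gene or disease has fewer than `min_entries` records.

    This is a single pass over the original counts; it is not repeated
    after the removal.
    """
    gene_counts = Counter(gene for gene, _, _ in records)
    disease_counts = Counter(disease for _, disease, _ in records)
    return [
        (gene, disease, category)
        for gene, disease, category in records
        if gene_counts[gene] >= min_entries and disease_counts[disease] >= min_entries
    ]


def build_profiles(records):
    """Return the sorted disease list and one score vector per gene.

    Gene-disease pairs with no record are left at 0, i.e. treated like "no"
    (a simplification).
    """
    diseases = sorted({disease for _, disease, _ in records})
    column = {disease: i for i, disease in enumerate(diseases)}
    profiles = {}
    for gene, disease, category in records:
        vector = profiles.setdefault(gene, [0] * len(diseases))
        vector[column[disease]] = CATEGORY_SCORE[category]
    return diseases, profiles


def cosine(u, v):
    """Cosine similarity between two association profiles (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy records (gene, cancer type, category); not data from this study.
records = [
    ("GENE_A", "BC", "yes"), ("GENE_A", "LC", "may"),
    ("GENE_B", "BC", "yes"), ("GENE_B", "LC", "may"),
    ("GENE_C", "BC", "no"), ("GENE_C", "LC", "yes"),
    ("GENE_D", "GC", "yes"),  # single entry: removed by the filter
]

kept = filter_records(records)
diseases, profiles = build_profiles(kept)
similarity_ab = cosine(profiles["GENE_A"], profiles["GENE_B"])
similarity_ac = cosine(profiles["GENE_A"], profiles["GENE_C"])
```

In this toy example, GENE_A and GENE_B have identical profiles and would be grouped together by any similarity-based clustering, whereas GENE_C would not. A real analysis would pass such profiles to a dedicated clustering tool rather than compare pairs by hand.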
Then, data were processed using a state-of-the-art general purpose clustering tool, CLUTO [9]. Data analysis consisted of finding the tightest and most reliable groupings. Since CLUTO offers a wide range of methods, and many different scoring schemes can be used to estimate similarity between genotypes or phenotypes, the reliability of each cluster was assessed by its robustness to clustering criteria (details are provided in Additional file 1). As a consequence, each putative association has been qualified as either 'highly consistent' or 'moderately consistent'. The biological significance of those clusters was, first, evaluated using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) [10,11], a biological database and web resource of known and predicted protein-protein interactions. The STRING database contains information from numerous sources, including experimental data, computational prediction methods and public text collections. It is widely accessible, and it is regularly updated. Second, literature research was performed to complete this initial evaluation.
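One way to picture robustness to clustering criteria is to count how often two genes fall into the same cluster across several clustering runs. The sketch below is an illustration under stated assumptions and does not reproduce the procedure detailed in Additional file 1: the thresholds (all runs for 'highly consistent', at least half of the runs for 'moderately consistent') are hypothetical, the partitions are toy data, and no CLUTO call is shown.

```python
from itertools import combinations


def co_clustering_fraction(partitions):
    """For each pair of items, return the fraction of runs in which they share a cluster.

    `partitions` is a list of dicts mapping item -> cluster label,
    one dict per clustering run (e.g. one per method or similarity score).
    """
    items = sorted(partitions[0])
    fractions = {}
    for a, b in combinations(items, 2):
        together = sum(1 for run in partitions if run[a] == run[b])
        fractions[(a, b)] = together / len(partitions)
    return fractions


def label_pairs(fractions, high=1.0, moderate=0.5):
    """Label pairs by consistency; both thresholds are assumptions."""
    labels = {}
    for pair, fraction in fractions.items():
        if fraction >= high:
            labels[pair] = "highly consistent"
        elif fraction >= moderate:
            labels[pair] = "moderately consistent"
    return labels


# Three hypothetical clustering runs over four placeholder genes.
# Cluster labels are only meaningful within a run.
runs = [
    {"G1": 0, "G2": 0, "G3": 1, "G4": 1},
    {"G1": 1, "G2": 1, "G3": 0, "G4": 0},
    {"G1": 0, "G2": 0, "G3": 0, "G4": 1},
]

fractions = co_clustering_fraction(runs)
labels = label_pairs(fractions)
```

Here G1 and G2 are grouped together in every run and are labelled 'highly consistent', G3 and G4 are grouped in two of three runs and are labelled 'moderately consistent', and all other pairs receive no label.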

Results and discussion
In this study, we performed a meta-analysis of published meta-analysis studies to investigate possible correlations among genes and SNPs and various types of cancer, as well as among gene-gene and/or gene-environmental interactions. Furthermore, an advanced literature search was applied to evaluate the results of our meta-analysis. Our data were not only consistent with previously published literature but also revealed novel correlations of genes with new types of cancer. Our analysis showed a total of ten cancer-related genes that are affected (Table 1).

Correlation of SNPs' genes with various types of cancer
The association highlighted by our meta-analysis between the CYP2E1 gene and colorectal cancer (CRC), head and neck cancer (HNC) and liver cell carcinoma (LLC) is supported by published data [33][34][35][36][37][38][39][44,121]. An additional literature search to evaluate our initial results revealed novel correlations of the gene combination CYP2E1 and GSTM1 with prostate cancer (PC) susceptibility, lung cancer (LC) and bladder cancer (UBC) as shown in Table 2 [126][127][128]. A similar correlation was found in CRC using a knockdown model [32,40,41]. Studies not only confirm the possibility of an association between the CCND1 gene and breast cancer (BC) [25] but also suggest involvement with squamous cell carcinoma (SCC), oesophageal cancer (EC), oral cancer (OC) and malignant glioma (MG), arising from the interaction between the CCND1 and CCND3 genes [26][122][123][124]. This is further corroborated by mouse model studies that show an association of CCND1 with BC [25,27-31,153] and PC [125].
Moreover, for ERCC2, alongside the already confirmed association of the ERCC1 gene with BC and LC [14][15][16][17][21,22], our further literature search on humans also identified an association with OC [26] and with HNC [129][130][131]. There were no similar mouse studies that could confirm or overrule our findings.
Concerning TGFB1, apart from the association with BC [64], which was confirmed by our further literature search on humans and on mouse models [75,76], we also noticed associations with gastric dysplasia, LC, pancreatic cancer (PanC) and BC [77][143][144][145][146]. Also, an association of TGFB1 with CRC was found using a mouse model [147].

Correlations between groups of genes and various types of cancer
We have examined and confirmed the highly consistent gene clustering results through a further literature search via STRING. Our search revealed additional types of cancer, beyond the types that we studied in our meta-analysis, that seem to be related to pairs of genes. The STRING database reports a binding interaction between the GSTP1 and GSTM1 genes and activating interactions between the MMP2 and EGF genes, between the VEGFA and IL1B genes and between the MMP-9 and IL8 genes (Table 3). The application of our machine learning method has highlighted that those pairs of genes have similar association profiles and, therefore, might be involved in the same pathways. The genes that do not appear in the associations probably do not correlate with the presence of a certain type of cancer. First, in our meta-analysis, we observed that the interaction between the IL6 and TGFB1 genes was associated with the following types of cancer: BC, CRC, GC, LC and PC as shown in Table 4. Although a further literature search on humans could not validate our highly consistent results, we discovered that these interactions are associated with additional types of cancer, such as HNC [187], CRC [158], renal cancer (RC), small cell lung cancer [188], malignant melanoma (MM) [189][190][191][192] and OVCa [193]. Additionally, our further research on the interaction between the IL6 and TGFB1 genes in mouse models confirmed our initial results principally for BC [155][156][157] and PC [159] and revealed associations with epithelial cancer [194], skin tumour [195], LC [196], OVCa and cervical cancer (CC) [197,198] and HNSCC [199]. Second, we found that the interaction between MMP-2 and EGF was associated with LC, BC and GC (Table 4). Subsequently, with a further literature search, we confirmed the association with BC osteolysis [163,164] and also found new associations with EC [200], LC, RC and PC [162].
Furthermore, in some cases, we have observed the association of the aforementioned genes with OSCC [201]. In that study, EGF induced MMP-1 expression, which is required for type I collagen degradation. In addition, MMP-1 is also associated with human papillomavirus [202] and BC [165].
Another interesting interaction revealed by our analysis was between the VEGFA and IL1B genes, which were associated with BC and GC (Table 4). A further literature search did not find similar results, except for one report [171], but identified additional associations with HNC, ALL, laryngeal carcinoma and MM [203][204][205][206]. For the MMP-9 and IL8 interaction, no study confirmed our initial results for BC, CRC and GC in either humans or mouse models. We did observe, though, evidence for an association with nasopharyngeal carcinoma [171], LC [177,178] and UBC [207]. Similarly, we could not find any study that could support the interactions between MMP-1 and MMP-3 and between GSTP1 and GSTM1, although two studies confirmed that GSTP1 and GSTM1 interactions could be associated with BC [182,183] (Table 4).
Indications from further literature search on human models revealed associations for MMP-1 and MMP-3 with
These were identified in our meta-analysis. Their correlation with various cancer types is also shown. NA, not available.
We then attempted to depict the various types of cancer according to the number of SNPs and genes and/or gene clusters found in our meta-analysis to be meaningfully associated with certain cancer types. Our data indicate that BC is correlated more often than the other types of cancer both with the number of SNPs (Figure 1A) and with the number of genes or gene clusters (Figure 1B). This observation underlines the heterogeneity of BC, indicating that it is, most likely, not a single disease but a spectrum of related disease states.

Conclusions
In essence, our meta-analysis study generated clusters of genes and diseases, many of which proved to be independent of the criteria used, which suggests that these clusters are most likely biologically meaningful. Preliminary study of some clusters and of our results shows that indeed these genes interact. As regards the associations, with a further literature analysis on human and mouse models, we have also found meaningful gene associations related to other cancer types not previously reported in the literature, an observation that warrants further investigation.

Additional file
Additional file 1: Genes and cancer types included in this meta-analysis.

Competing interests
The authors declare that they have no competing interests.
Authors' contributions
ZL carried out the data collection, result analysis and participated in the manuscript preparation. EG participated in the manuscript preparation and data analysis. MF participated in the result and statistical analysis and manuscript revision. EK participated in the data collection and manuscript revision. JCN carried out the result and statistical analysis and participated in the manuscript preparation. HPK participated in the manuscript preparation. GPP participated in the design of the study, data analysis and manuscript preparation. CP conceived of the study, participated in its design and coordination as well as manuscript preparation. All authors read and approved the final manuscript.

Tables 1, 2, 3 and 4, it seems that the number of genome variations and genes is profoundly bigger in BC, probably indicating that this type of cancer is not a single disease but, most likely, a spectrum of related disease states.