Identification of subgroups along the glycolysis-cholesterol synthesis axis and the development of an associated prognostic risk model

Background Skin cutaneous melanoma (SKCM) is one of the most highly prevalent and complicated malignancies. Glycolysis and cholesterogenesis pathways both play important roles in cancer metabolic adaptations. The main aims of this study are to subtype SKCM based on glycolytic and cholesterogenic genes and to build a clinical outcome predictive algorithm based on the subtypes. Methods A dataset with 471 SKCM specimens was downloaded from The Cancer Genome Atlas (TCGA) database. We extracted and clustered genes from the Molecular Signatures Database v7.2 and acquired co-expressed glycolytic and cholesterogenic genes. We then subtyped the SKCM samples and validated the efficacy of subtypes with respect to simple nucleotide variations (SNVs), copy number variation (CNV), patients’ survival statuses, tumor microenvironment, and proliferation scores. We also constructed a risk score model based on metabolic subclassification and verified the model using validating datasets. Finally, we explored potential drugs for high-risk SKCM patients. Results SKCM patients were divided into four subtype groups: glycolytic, cholesterogenic, mixed, and quiescent subgroups. The glycolytic subtype had the worst prognosis and MGAM SNV extent. Compared with the cholesterogenic subgroup, the glycolytic subgroup had higher rates of DDR2 and TPR CNV and higher proliferation scores and MK167 expression levels, but a lower tumor purity proportion. We constructed a forty-four-gene predictive signature and identified MST-321, SB-743921, Neuronal Differentiation Inducer III, romidepsin, vindesine, and YM-155 as high-sensitive drugs for high-risk SKCM patients. Conclusions Subtyping SKCM patients via glycolytic and cholesterogenic genes was effective, and patients in the glycolytic-gene enriched group were found to have the worst outcome. A robust prognostic algorithm was developed to enhance clinical decisions in relation to drug administration.


Background
Skin cutaneous melanoma (SKCM) is a highly aggressive malignancy, and the mortality rate increases dramatically following metastasis [1]. In its early stage, melanoma can be successfully treated with surgery; however, once it has metastasized it needs to be treated with drugs [2]. Adjuvant systemic therapy is widely used in melanoma patients, especially those with stage III/IV. Immune checkpoint inhibitors and BRAF-targeted therapies have shown efficacy in curbing metastatic melanoma [3], but they often fail in many SKCM patients due to drug insensitivity and resistance, which is attributed to the heterogeneity of melanoma [4].
Building molecular subtypes and a risk model algorithm for SKCM could be an effective solution to determining clinical pathways. In this respect, the Cancer Genome Atlas (TCGA) group established a framework for classifying genomes into four subtypes: mutant BRAF, mutant RAS, mutant NF1, and Triple-WT (wild-type) [5]. Another study identified two immune subtypes that have opposite immune states and developed a prognostic fiveimmune-associated gene signature [6]. In addition, a twogene immune-related signature consisting of CCL8 and DEFB1 was constructed in 2020 [7], and the six genes (SH2D3A, TMEM201, LZTS1, CREG1, NIPA1 and HIST1H4E) model developed by another team are projected to play a vital role in the prognosis of uveal melanoma [8]. However, studies focusing on glycolysischolesterogenesis-related subtypes and associated prognostic models in melanoma are currently lacking.
Metabolic activity is pivotal to the developmental progress of tumors, and it contributes to tumor plasticity. The characteristics of tumor heterogeneity are reflected in cellular and metabolic aspects, including the differential tumor microenvironment (TME) and variable biological pathways [9,10]. Altered metabolic activities influence tumor progress, reflect the associated prognosis, and influence the drug therapeutic effect [11]. Evidence has shown that glycolysis contributes to the disease progress of melanoma and that restricting glycolytic activity acts as a therapy, preserves immune cell function, and improves the immune checkpoint blockage effects [12]. The research of Andreas Koc et al. showed that the expression level of Cyclin D1, which acts as an outcome biomarker in melanoma and indicates its proliferative and invasive extents, is associated with glucose transporter isoform 1 (GLUT1), glycolysis-related genes hexokinase 1 (HK1), lactate dehydrogenase A (LDH-A), and monocarboxylate transporters 1 (MCT1) [13]. Many malignancies, such as lung, gastrointestinal, and pancreatic cancer, harbor KRAS and defunctional TP53 oncogenes, both of which induce glycolytic pathways in malignancies [14][15][16][17]. Unlike healthy tissues, most cancer cells mainly produce energy through glycolysis at a high rate, and at the same time metabolite pyruvate is converted to lactic acid followed by fermentation in cytosol. This process is known as the Warburg effect; it is a hallmark of tumor evolution and it impacts drug efficacy [18].
The Mitochondrial pyruvate complex (MPC) can counteract the influence of glycolytic activity by transporting pyruvate to mitochondrion, and loss-of-function MPC is related to fast cancer cell growth and a poor outcome [19]. In the metabolic process, pyruvate, which is an intermediate molecule of the tricarboxylic acid cycle (TCA cycle), provides precursor citrate for further adipogenesis, including synthetic cholesterol and free fatty acids [20]. Evidence has shown that cholesterol, together with its metabolites and precursors, regulates tumorigenesis and promotes biological process, such as oncogene-driving pathways, ferroptosis, and TME in malignancies [21]. In this respect, cholesterol inhibitors, including statins, are utilized in tumor therapies [22]. However, the role of cholesterogenesis in cancer remains arguable, and the efficacy of statins in regulating cancer has shown mixed effects. In addition, distinct responses exist due to tumor heterogeneity [23,24]. However, the expression levels of mitochondrial pyruvate complex 1 (MPC1) and mitochondrial pyruvate complex 2 (MPC2) regulate malignancy outcomes [25], and this indicates the distinct performance of pyruvate flow in differentiated malignancy types. In this respect, it also shows that balanced glycolytic and cholesterogenic pathways jointly modulate tumor progression.
Based on previous studies of melanoma subtypes [5,6], we aimed to define new subtypes from a metabolic perspective by using different metabolic levels. We used a large-patient cohort from TCGA (https://www.cancer. gov/about-nci/organization/ccg/research/structuralgenomics/tcga) to explore novel subclassifications in melanoma based on glycolysis-cholesterogenesis differential expressed genes, and we then compared the characteristics of the discovered subtypes and further validated the subtyping efficacy and reproducibility. We determined that the subtype with the highest glycolysis and lowest cholesterol synthesis had the worst prognosis. We subsequently explored the nature of the glycolytic subgroup and finally developed a risk prognostic model based on the glycolytic subgroup to provide a quantitative method that involves an enhanced biological understanding and which can be used to develop clinical strategies for SKCM management.

Results
The identification of four metabolic subgroups along the glycolysis-cholesterol synthesis axis The gene expressions of 366 SKCM tumor samples obtained from the TCGA were used to screen the co-expressed glycolytic and cholesterogenic genes. Consensus clustering was conducted on 72 genes in the glycolytic pathway and 25 genes in the cholesterogenic pathway. According to the results, the best grouping scheme was obtained with 6 (k = 6) gene clusters (Fig.  1A), where the method used to determine the optimum k value is provided in the work of Wilkerson et al. [26]. Hierarchical clustering was then conducted based on the consensus matrix generated. As shown in Fig. 1B, genes in C5 (defined as glycolytic co-expressed genes) all belonged to the glycolytic pathway and genes in C6 (defined as cholesterogenic co-expressed genes) all belonged to the cholesterogenic pathway. The median expressions of these glycolytic and cholesterogenic coexpressed genes were then acquired from 366 SKCM tumor samples, and four metabolic subgroups were subsequently identified based on the median expressions (Fig. 1C). The expressed levels of these selected genes were visualized across four subgroups (Fig. 1D), and principal component analysis (PCA) was used to illustrate the difference between the four subgroups (Fig. 1E).
The results showed that patients in the glycolytic subgroup had the worst prognosis ( Fig. 1F and G), which indicates that SKCM with higher glycolysis and lower cholesterol synthesis may have more aggressive characteristics. The clinically relevant information relating to the four subtypes is presented in Table 1. We found that the overall survival (OS) rates and progress-free intervals (PFIs) differed significantly among patients in the different four subtypes (log-rank test, P = 0.024 and < 0.001). There were no statistical differences between the other clinical characteristics of patients, but this result could relate to the large number of categories used with respect to certain characteristics.

Differences in somatic mutations across the metabolic subgroups
As shown Fig. 2A, we found that greater numbers of simple nucleotide variations (SNVs) of MGAM occurred in the glycolytic subgroup. MGAM is a member of the glycoside hydrolase family 31, and it is involved in galactose metabolism and metabolic pathways [27]. No studies to date have revealed a correlation between the SNVs of MGAM and the progress of the cancer, and we thus considered that to evaluate MGAM in cancer, glycolysis might be a good entry point. We therefore selected genes that had significantly different copy number variation (CNV) statuses across the metabolic subgroups. Of these, DDR2 and TPR were filtered out for their high CNV frequencies in both the glycolytic and mixed subgroups. DDR2 promotes the migration and invasion of metastatic melanoma cells [28], and DDR2 inhibition makes tumors vulnerable to anti-PD-1 immunotherapy [29]. TPR can promote cell proliferation in colorectal cancer via binding with GSK3β [30]. In Fig.  2B and E, we found that amplifications or gains of DDR2 or TPR were greater in the glycolytic and mixed subgroups (chi-square test P value < 0.05). Not unexpectedly, the gene levels of DDR2 or TPR were found to be higher in the glycolytic or mixed subgroups, as shown in Fig. 2D and G (Kruskal-Wallis P value < 0.05). There is thus a relationship between these results and the worse prognoses of the glycolytic and mixed subgroups, and it is considered that when conducting research targeting DDR2 or TPR, the activity of glycolysis should be studied.

Positive correlation between tumor purity and metabolic status
According to the results of ESTIMATE, the immune score and stromal score were negatively correlated with the metabolic status, as shown in Fig. 3A and B (the mixed < the cholesterogenic < the glycolytic < the quiescent), and the result of the ESTIMATE score showing the sum of the immune and stromal scores is shown in Fig. 3C. The results indicate that there is a decrease in the number of immune cells and stromal cells infiltrated in the microenvironment of tumor tissue with an increase in the metabolic state. Tumor purity was then calculated using the ESTIMATE scores, and a higher metabolic status was found to be associated with higher tumor purity, as shown in Fig. 3D. This result suggests that a high metabolic status relates to high tumor tissue purity. In addition, tumors with high cholesterol synthesis were found to have a higher purity than tumors with high glycolysis (Fig. 3D).

Glycolytic subgroup has a higher proliferation level
As shown in Fig. 4A, the proliferation score was positively correlated with the median expression of cholesterogenic genes (rho = 0.38, P < 0.05) and glycolytic genes (rho = 0.44, P < 0.05), and a higher proliferation score level was found in the glycolytic subgroup, as shown in Fig. 4B (Kruskal-Wallis P < 0.05). Similarly, the expression level of MKI67 was positively correlated with the median expression of cholesterogenic genes (rho = 0.37, P < 0.05) and glycolytic genes (rho = 0.61, P < 0.05), as shown in Fig. 4C. Furthermore, the glycolytic group was found to have a higher expression level of MKI67, as shown in Fig. 4D (Kruskal-Wallis P < 0.05). These findings confirm that tumors in the glycolytic subgroups identified have greater proliferation abilities.

Gene expressed networks and biological activities related to glycolysis and cholesterol synthesis
In this study, four metabolic subgroups were identified via different metabolic statuses, as represented by the median expression of glycolytic and cholesterogenic genes. WGCNA was then conducted to find more genes related to glycolysis and cholesterol syntheses and to conduct subsequent research. According to Fig. 5A, when the soft threshold was equal to 3, the gene networks satisfied both a high degree of internal connectivity and a high gene similarity. The gene networks with similarity were then merged and six networks represented by different colors were finally identified (Fig.  5B). The relationships between the gene networks and the median expressions of cholesterogenic and glycolytic genes were then explored, and the turquoise and yellow networks were found to be most related to cholesterol synthesis and glycolysis, respectively (Fig. 5C). The gene significance and module membership of genes in the turquoise and yellow networks are shown in Fig. 5D and E, and the values of these variables exhibit strong positive correlations. It is considered that gene significance could reflect the representativeness of a gene in the corresponding phenotype, and that membership could represent the correlation between a gene and its networks. The genes in the turquoise and yellow networks were used in a KEGG enrichment analysis (P < 0.05). As shown in Fig. 6A, glycolysis and cholesterol synthesis are related to cellular processes, human disease, and genetic information processing (level 1 of the KEGG functional category). Furthermore, the p53 signaling pathway, cell cycle, cellular sentence, and homologous recombination are enriched in glycolysis, whereas the adherence junction, platinum drug resistance, and proteoglycans in cancer are enriched in cholesterol synthesis.

Development of a prognostic risk model
Patients in the glycolytic subgroup were found to have the worst prognosis, and we thus used the genes in the yellow network, which are most related to glycolysis, to train the risk model. The C-index was set as the reference in the cross validation to select the optimum least absolute shrinkage and selection operator (Lasso) model for the training group. As shown in Fig. 7A, the 44genes risk model had the highest C-index, and Fig. 7B shows the changes in the coefficients of different genes during the cross validation.
It was then possible to calculate the risk score of each patient to evaluate their risk level using the following equation, and the genes and their coefficients are shown in Table 2. Patients in the training group, internal validation group, and the GSE19234 data set were then ranked in an ascending order of risk score. Due to batch effect across different platforms, the median of the risk scores of each group was selected as the cut-off value to divide patients into high risk and low risk ( Fig. 7C-H).
The global expression levels of the 44 genes are shown in Fig. 7I-K. In the training group and internal validation groups, the high-risk patients had the worst prognosis for OS and PFI ( Fig. 8A-D). In the GSE19234 group, the high-risk patients also had the worse prognosis for OS (Fig. 8E). In all these groups, the model showed a good predictive ability for OS or PFI (AUC > 0.65), as shown in Fig. 8F-I. Unlike our model, Liao's immune-related model was unable to distinguish between high-risk and low-risk patients using an external dataset (GSE19234) (Fig. 9A, log-rank P = 0.876) [7], and their model provided a comparatively poorer performance in terms of calculating the AUC (Fig. 9B, 5 years AUC of Liao's model = 0.44). To validate our model's ability to obtain an independent prognostic factor, univariate and multivariate Cox regression analyses were conducted in the filtered international validation group (n = 92, patients with too many missing values of clinical information were removed) ( Table 3), and we found that pathological T, pathological N, pathological M, and the risk model were able to independently predict a patient's prognosis.

Predicted use of drugs based on prognostic risk model
We used compound data from CTRP and PRISM databases to predict which potential drugs could be The value of k represents the number of clusters during the consensus cluster. When the optimal k value is reached, the area under the CDF curve will not significantly increase with the increase of k value. B Heatmap depicting consensus clustering solution (k=6) for glycolysis and cholesterogenic genes in SCKM samples (n=469). C Scatter plot showing median expression levels of co-expressed glycolytic (x-axis) and cholesterogenic (y-axis) genes in each SKCM sample. Metabolic subgroups were assigned on the basis of the relative expression levels of glycolytic and cholesterogenic genes. D Heatmap depicting expression levels of co-expressed glycolytic and cholesterogenic genes across each subgroup. E PCA showed that patients in the different subtgroups were significantly different from each other. F Kaplan-Meier survival curves for patients in the different subgroups. Log-rank test P values are shown. The clinical outcome endpoint is OS. G Kaplan-Meier survival curves for patients in the different subgroups. Log-rank test P value is shown. The clinical outcome endpoint is PFI. SKCM, skin cutaneous melanoma; CDF, cumulative distribution function; PCA, principal components analysis; OS, overall survival; PFI, progression-free interval. And P < 0.05 is defined as statistically significant administered to patients with high-risk scores. As shown in Fig. 10A, MST-312, neuronal differentiation inducer III, and SB-743921 showed high sensitivity for patients with high-risk scores, and Fig. 10B shows that romidepsin, vindesine, and YM-155 were highly sensitive for patients with high-risk scores.

Discussion
In this research, we explored individual differences in SKCM from a metabolic perspective. Both glycolytic and cholesterogenic pathways orchestrate tumor metabolic adaptations and play roles in cancer progression and proliferation [31]. A previous study discussed the   Kruskal-wallis test P value is shown. P < 0.05 is defined as statistically significant metabolic differential characteristics of pancreatic cancer, highlighted the existence of metabolic heterogeneity in tumors, and determined distinct profiles based on glycolytic and cholesterogenic genes [32]. We therefore hypothesized that SKCM could be subclassified based on glycolytic and cholesterol synthetic genes. A previous study identified TCGA SKCM subtypes based on genomic mutations [5], whereas our study focused on metabolic function to sub-classify SKCM. The results supplement those already published on the SKCM subtypes of TCGA.
We extracted glycolytic and cholesterogenic genes from "REACTOME_GLYCOLYSIS" (n = 72) and "REACTOME_CHOLESTEROL_BIOSYNTHESIS" (n = 25) gene sets, respectively, which were obtained from the Molecular Signatures Database v7.2 [33,34]. To find the most representative genes and reduce noise, consistent clustering was used to reduce the number of genes. Ultimately six gene clusters were obtained (C1-6). C5 and C6 were then designated as the co-expressed cholesterogenic and glycolytic genes, respectively. Cluster C5 contained 9 cholesterogenic genes, including ACAT2, . The relationship of soft threshold and mean connectivity (right). B After the dynamic of cut and merged, a total of 6 gene modules were finally generated. C Heat map for the correlation of gene modules and phenotypes. D Scatter plot depicting the correlation between gene significance and module membership of genes in turquoise network. E Scatter plot depicting the correlation between gene significance and module membership of genes in yellow network. WGCNA, weighted correlation network analysis; TOM, topological overlap matrix. And P < 0.05 is defined as statistically significant CYP51A, DHCR24, DHCR7, HMGCR, HMGCS1, IDI1, MSMO1, and SQLE, and Cluster C6 contained 12 glycolytic genes, including GNPDA1, NUP188, NUP205, NUP214, NUP62, NUP85, POM121, POM121C, PPP2R1A, PPP2R5D, PRKACA, and RAE1. Multiple genes have been digested in melanoma research and found to interfere with melanoma proliferation [35][36][37][38][39][40][41][42][43]. Based on above genes, the samples were divided to four subtypes: cholesterogenic, glycolytic, mixed, and quiescent groups.
Unexpectedly, the best prognosis was found in patients with the quiescent subtype, desert of both two metabolic pathway-related genes. The glycolytic subtype had the worst clinical outcomes and a significantly higher MGAM SNV rate than the others. MGAM (maltase-glucoamylase) belongs to the glycoside hydrolase family 31 and acts as a target in type II diabetes [44]. A higher expression of MGAM has been noted in luminal A breast cancers, and a meta-analysis identified that MCAM mutations are associated with a positive clinical benefit in non-small lung cancer cases [45,46]. It is thus a promising focus for future glycolytic and cancer investigations.
As mentioned in the results, the CNV statuses of DDR2 and TPR were more significant in the glycolytic and mixed subtypes, which also support their association with a poor prognosis. It is of note that both higher glycolytic and cholesterogenic gene expressions indicated higher proliferation scores, although the slope of the glycolytic gene expressions was steeper.
Following subclassification and subsequent analyses, we succeeded in constructing four subtypes based on genes related to glycolysis and cholesterol synthesis pathways and ultimately recognized the glycolytic subtype as the group with the highest risk. We then tried to investigate further by analyzing the subclassifications, and we determined an optimum method for translating the results by constructing a risk predictive model.
The risk score modifier consisted of 44 genes relating to the glycolytic subtype, as it was found to have the worst prognosis. After conducting a literature review, we discovered that the roles of many genes in the classifier have been investigated with respect to melanoma and have been found to influence tumorigenesis. To be specific, CDCA8 was positively related to the risk scores, Fig. 6 The results of KEGG analysis in cholesterogenic genes and glycolytic genes. KEGG, Kyoto Encyclopedia of Genes and Genomes. P < 0.05 is defined as statistically significant   and it was previously validated as being associated with tumor proliferations in melanoma by Chao et al. [47]. Similarly, DEPDC1B SHCBP1, RRM2, PLK1, and UHRF1 have been found to promote melanoma proliferation and could be potential targets for treatment [48][49][50][51][52]; their coefficients were all positive, which implies that they had an additive effect on the risk scores. The model was further validated using both internal and external validation processes. Potential drugs for high-risk patients were then investigated, and MST-312, neuronal differentiation inducer III, SB-743921, romidepsin, vindesine, and YM-155 were finally identified as high-sensitivity drugs for patients with high-risk scores. In this respect, MST-321 (telomerase inhibitor) induces apoptosis in multiple malignancy cells, such as myeloma cells and breast cancer cells [53,54]. Neuronal Differentiation Inducer III is a neuronal differentiation inducing compound that can inhibit brain tumors through inducing brain stem cells differentiation [55], and SB-743921 (Molecular Formula: C31H33ClN2O3·HCl) acts as a potent and active KSP (kinesin spindle protein) inhibitor that inhibits multiple tumor growth and induces apoptosis [56,57]. The efficacy of romidepsin, vindesine, and YM-155 in targeting melanoma has been previously validated [58][59][60]. Therefore, all these drugs could be considered for use in the clinical management of high-score SKCM patients.
With respect to its heterogeneity and plasticity, SKCM is a complicated disease and therapy intolerance and insensitivity may influence therapeutic efficacy. Our study provides a novel subclassification of SKCM and an associated risk score signature based on metabolic genes that could enhance biological understanding and clinical strategies.
However, although the results of this study are positive, certain limitations exist. First, although our research analyzed a large cohort of patients and verified the efficacy of the method using a dataset from another platform, a long-term prospective investigation is essential before applying the risk model in a clinical setting. Second, we used the median expression values for genes C5 and C6 with a cutoff of 0 to define subtypes with  different glycolytic and cholesterogenic levels. However, the use of other clustering methods, such as K-means or Mclust, may be beneficial in the future. In addition, Lasso, smoothing clipped absolute deviation penalty (SCAD), and minimax concave penalty (MCP) regression methods are known to be the most effective for selecting variables. The objective function in Lasso is convex and easy to calculate, the coefficient of the compression independent variable is 0, and it has good robustness; therefore, it is especially practical. Several prognostic research studies have used Lasso when investigating tumor diseases [61][62][63]. Therefore, we used Lasso to develop our prognostic model in this study. However, the other two algorithms developed since Lasso, SCAD and MCP, are known to be occasionally better than Lasso when selecting important features [64]. Therefore, the failure in our study to use SCAD or MCP to screen the model is another limitation, and we will use these methods in future studies. Moreover, we determined several drugs that are highly sensitive in high-risk patients. Following a literature review, we found romidepsin, vindesine, and YM-155 have been shown to have antitumor effects in melanoma development, whereas investigations using MST-321, SB-743921 and Neuronal Differentiation Inducer III in SKCM are lacking. We therefore believe that they should be studied in the future.

Conclusion
In this study, a dataset with 471 SKCM specimens was downloaded from The Cancer Genome Atlas (TCGA) database. We extracted and clustered genes from the Molecular Signatures Database v7.2 and acquired coexpressed glycolytic and cholesterogenic genes. We then subtyped the SKCM samples, and validated the efficacy of subtypes with respect to simple nucleotide variations (SNVs), copy number variation (CNV), patients' survival statuses, tumor microenvironment, and proliferation scores. A survival analysis, SNV, CNV, and tumor environmental analyses were conducted, and the glycolytic subtype was found to have the worst prognosis in all the above aspects. We also explored the subtyping nature and built a 44-gene algorithm for predicting the SKCM prognosis in patients, and we validated the risk score signature in both internal and external independent cohorts. We

Methods and materials
Data acquisition and processing  [65]. According to the CDR, the clinical endpoints used for SKCM were selected as overall survival (OS) and progression-free interval (PFI). In addition, a dataset (GSE19234, n = 44) from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) with survival information was selected to validate the risk score model. In this study, we filtered out patients who had follow-up times of less than 30 days, and patients for whom survival information was missing were deleted.

Identification of metabolic subgroups
First, we extracted glycolytic and cholesterogenic genes from the "REACTOME_GLYCOLYSIS" (n = 72) and "REACTOME_CHOLESTEROL_BIOSYNTHESIS" (n = 25) gene sets, respectively, where were obtained from the Molecular Signatures Database v7.2 [33,34]. A consensus cluster was then conducted on these genes using the Con-sensusClusterPlus R package [26]. The number of subsamples was 100, the proportion of items to samples was 0.8, the proportion of features to samples was 1, and the hierarchical linkage method for subsampling and the consensus matrix was Ward. D2. The consensus cluster results provided the co-expressed glycolytic and cholesterogenic genes. We also identified four metabolic subgroups based on the median expression of these two types of coexpressed genes and defined them as mixed (glycolytic median > 0, cholesterogenic median > 0), cholesterogenic (glycolytic median < 0, cholesterogenic median > 0), glycolytic (glycolytic median > 0, cholesterogenic median < 0), and quiescent (glycolytic median < 0, cholesterogenic median < 0) subgroups. A principal components analysis (PCA) was then conducted on all of the protein-coding genes to determine if patients in these four subgroups differed from each other. Finally, survival analyses were conducted based on the OS and PFI clinical endpoints, and the log-rank test P values were calculated.
Analysis of SNV across metabolic subgroups SNV data were obtained via the workflow of MuSE Variant Aggregation and Masking. We extracted the SNV matrix and divided it into four groups based on the metabolic subgroups. The genes were then ranked according to mutation frequency in descending order, and the top ten genes were selected to investigate their mutation statuses in the four subgroups via the GenVisR R package [66].

CNV analysis across metabolic subgroups
First, we conducted the chi-square test on all genes with CNVs to select genes that had significantly different CNV statuses across the metabolic subgroups. Genes with expression levels that were not correlated with their CNV status were filtered out. Finally, genes with a high proportion of CNVs in the glycolytic or cholesterogenic subgroups were selected to reflect the characteristics of these subgroups.

Analysis of tumor microenvironment across metabolic subgroups
Kosuke et al. developed an algorithm to determine tumor purity based on gene expression information [67] that can be used in the R package, ESTIMATE. The gene expressions of SKCM patients (n = 467) were input to ESTIMATE, and four scores were output. In this respect, the immune score reflected the abundance of immune cells around the tumor, the stromal score reflected the abundance of stromal cells around the tumor, and the ESTIMATE score was the sum of the immune and stromal scores, and it was negatively correlated with tumor purity.

Analysis of the relationship between proliferation and metabolic subgroups
The proliferation score was acquired from the TCGA cohort research study [68]. The proliferative status was represented using the proliferation score and the expression of MKI67, which encodes KI67 and is the most common marker of cell proliferation. The correlation between the proliferative status and the median expression of glycolytic or cholesterogenic genes was explored using the Spearman's test, and the ggstatsplot R package was used to visualize the results of Spearman's rank correlation. The proliferative statuses across metabolic subgroups were visualized using the ggpubr R package, and the Kruskal-Wallis P values were calculated.
Weighted gene co-expression network analysis (WGCNA) All genes were arranged in descending order of variation between samples, and the top 5000 genes were selected for analysis using the R package, WGCNA [69]. According to the calculation conducted using this package, the soft threshold was set as 3. For gene module fusion, the cutoff value was set to 0.25. The results of WGCNA showed that gene modules were mostly correlated with the median expression of glycolytic and cholesterogenic genes, and this was thus selected for use in further research.

Enrichment analysis of glycolytic and cholesterogenic phenotypes
As mentioned in the "Weighted gene co-expression network analysis (WGCNA)" section, we selected gene modules that were mostly correlated with the median expression of glycolytic and cholesterogenic genes. We then conducted an enrichment analysis on these genes via the clusterProfiler R package based on the gene sets within the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [70,71].

Construction of a prognostic risk model via lasso
Genes in the yellow module were used to construct the risk model using Lasso, which is a popular machine learning method [72], and the glmnet R package was employed in this respect [73]. The parameters were set as follow: family = "cox", type.measure = "C", and parallel = TRUE, and the other parameters (not mentioned here) were set as default. Patients from the TCGA were divided into a training group (n = 310) and an internal validation group (n = 124) via the caret R package [74]. The model in the training group was generated by Lasso, and a survival analysis and time-dependent receiver operating characteristic curve (tdROC) were completed to validate the effectiveness of this model in the training group (n = 310), internal validation group (n = 124), and GSE19234 (n = 44). The proposed model was then compared to the existing one in GSE19234 (n = 44) in terms of the survival analysis and AUC using the Survival R package [7]. As a continuous risk score was obtained from the model, Cox test and log-rank test P values were simultaneously obtained. Since the risk scores were derived from training group, univariate and multivariate Cox regression analyses of the risk score and other clinical pathological variables were conducted only in the international validation group (n = 124). And due to some patients had too many missing values of clinical information, the patients in international validation group were filtered and 92 patients were screened out.
Identification of candidate components with higher drug sensitivity in patients with high-risk score The study of Paul Geeleher et al. demonstrated a method using only before-treatment baseline tumor gene expression data that could be used to predict the chemotherapeutic response of patients [75]. They tested a number of the plethora of common machine learning algorithms, including random forest, PAM, principal component regression, Lasso, ElasticNet regression, and ridge regression. Among these, ridge regression was consistently found to be the best performer, and it was noted to have the added advantage of being highly computationally efficient. In their method, the ridge regression tuning parameter was automatically selected, and to facilitate the use of their method, they developed an R package named pRRophetic [76]. With the help of pRRophetic R package, we predicted the sensitivity of SKCM patients from the TCGA to different agents using ridge regression based on data from CTRP2.0 and PRISM databases. Both datasets provided the area under the dose-response curve (AUC) values as a measure of drug sensitivity, and lower AUC values indicated an increased sensitivity to treatment. The components with significantly lower AUCs in high-risk patients were first selected, a Spearman's correlation test between the AUC and the risk score was conducted, and components with significantly negative rho (rho < − 0.3) were retained. Results from the CTRP2.0 and PRISM databases are shown separately.

Statistical analyses
All statistical analyses were conducted using R software 3.6.3. In all analyses, p < 0.05 was considered statistically significant. All of the R packages used in the various analytical procedures are listed in the associated corresponding chapters.