Exploration and validation of related hub gene expression during SARS-CoV-2 infection of human bronchial organoids

In the novel coronavirus pandemic, the high infection rate and high mortality have seriously affected people’s health and social order. To better explore the infection mechanism and treatment, the three-dimensional structure of human bronchus has been employed in a better in-depth study on severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). We downloaded a separate microarray from the Integrated Gene Expression System (GEO) on a human bronchial organoids sample to identify differentially expressed genes (DEGS) and analyzed it with R software. After processing with R software, Gene Ontology (GO) and Kyoto PBMCs of Genes and Genomes (KEGG) were analyzed, while a protein–protein interaction (PPI) network was constructed to show the interactions and influence relationships between these differential genes. Finally, the selected highly connected genes, which are called hub genes, were verified in CytoHubba plug-in. In this study, a total of 966 differentially expressed genes, including 490 upregulated genes and 476 downregulated genes were used. Analysis of GO and KEGG revealed that these differentially expressed genes were significantly enriched in pathways related to immune response and cytokines. We construct protein-protein interaction network and identify 10 hub genes, including IL6, MMP9, IL1B, CXCL8, ICAM1, FGF2, EGF, CXCL10, CCL2, CCL5, CXCL1, and FN1. Finally, with the help of GSE150728, we verified that CXCl1, CXCL8, CXCL10, CCL5, EGF differently expressed before and after SARS-CoV-2 infection in clinical patients. In this study, we used mRNA expression data from GSE150819 to preliminarily confirm the feasibility of hBO as an in vitro model to further study the pathogenesis and potential treatment of COVID-19. Moreover, based on the mRNA differentiated expression of this model, we found that CXCL8, CXCL10, and EGF are hub genes in the process of SARS-COV-2 infection, and we emphasized their key roles in SARS-CoV-2 infection. And we also suggested that further study of these hub genes may be beneficial to treatment, prognostic prediction of COVID-19.


Introduction
In late December 2019, Wuhan, China, reported an epidemic situation of viral pneumonia caused by unknown microbial pathogens. Its clinical manifestations are very similar to those of viral pneumonia, including fever, cough, shortness of breath, myalgia, and fatigue [1].
On January 7, the pathogen, which was now named as SARS-CoV-2 [2], of this outbreak was identified by the Chinese Center for Disease Control and Prevention (CDC) from a throat swab sample of a patient [3].
SARS-CoV-2 is a new member of the Coronaviridae family [4]. Similar to other coronaviruses, the main routes of transmissions are respiratory droplets, respiratory secretions, and direct contact [5]. Later, it was reported that SARS-CoV-2 could be isolated from fecal swabs, which explains its rapid spread. The rapid spread and high mortality are responsible for the massive global outbreak followed by the outbreak in China. Based on the situation, the World Health Organization (WHO) declared the outbreak to be a public health emergency of international concern on January 31, 2020 [6]. As of October 15, 2020, there are a total of 38,394,169 confirmed SARS-CoV-2 cases worldwide, including 1,089, 047 deaths [7].
Because the novel coronavirus has a huge impact on social order, the global economy and, especially, people's health, a large number of experiments have been done to study the susceptibility and mechanism of SARS-CoV-2. Many infection models were used to study the reaction and immunity of SARS-CoV-2 infected bodies and these models played an important role in these studies [8][9][10][11][12][13]. However, all the commonly used models have a limitation. Either they are suboptimal models to represent human bodies, or they are limited in number [10,11,14].
In order to better understand the interaction between human body and different pathogens, some gene expression research conducted in vivo or in affected patients has established a three-dimensional structural organ model composed of human stem cells [15][16][17]. These models have reproduced various cell components and the morphology as well as an arrangement of each kind of cell. They are believed to help in understanding the path mechanism and developing drugs and vaccines [15,18]. Such an advantaged model was also used to learn about SARS-CoV-2. By now, organoids of different apparatuses have been built and these organoids greatly improved researchers' knowledge toward SARS-CoV-2. A study of intestinal tract organoids indicated that people get infected by SARS-CoV-2 through the intestinal tract, and studies of nervous system-related organoids may explain SARS-CoV-2-infected patients' neurologic symptoms [8,9]. All these studies showed that organoids could be good models to study SARS-CoV-2.
Therefore, we hope to explore the interaction between SARS-COV-2 and human body through the bioinformatics analysis of human bronchial organoids' mRNA expression data from GSE150819 data set and preliminarily explore whether human bronchial organoids can be used as an in vitro model for the study of SARS-COV-2 infection.

Gene expression datasets
We searched the GEO (https://www.ncbi.nlm.nih.gov/ gds/) database to find relevant datasets for our study. Dataset GSE150819 contains genomic information of human bronchial organoids, which we are interested in. Among all samples in GSE150819, 6 samples are available for our study including 3 samples from SARS-CoV-2 infected organoids and 3 from uninfected organoids, and these 6 samples were all sequenced by GPL24676 Illumina nova seq 6000 (homo sapiens) platform. The sample source is bronchial organoid. The detailed information is listed in Table 1.

Identification of DEGs
After the gene expression data were obtained, we analyzed the data using the R package (R Foundation for Statistic Computing). Fold change (FC), P value, and false-positive rate were calculated to define DEGs between SARS-CoV-2 infected organoid samples and uninfected organoid samples. Finally, in the R language program, we selected the DEGs with logFC>1 (upregulated gene) or logFC<−1 (downregulated gene) calculated, while P < 0.05 was considered to be a significant difference and could be used to reduce the false-positive rate.

Functional enrichment analysis of DEGs
The Gene Ontology (GO) database, containing three sections: biological progress (BP), cellular component (CC), molecular function (MF), and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database are the most popular tools to learn about gene functions [19]. Therefore, two databases and the reference value P value < 0.05, which was considered as the significant difference, were used to further study the function of DEGs.

PPI network construction and hub gene analysis
The Search Tool for the Retrieval of Interacting Genes/ Proteins (STRING) (https://string-db.org/) is an online database collecting a large amount of protein interaction information. To learn about protein interaction, the STRING online tool was used. Protein interaction is admitted when the combined score was >0.4. We visualized the protein-protein interaction (PPI) network with the usage of Cytoscape software. To study protein interaction further, we used a plug-in to filter to hub genes. And they get the highest degree in the CytoHubba plugin analysis, which means having the highest degree of connectivity within the PPI network [20].

Validation of gene expression levels
We obtained the PPI network action diagram of differential gene expression, and screened out the central genes through the CytoHubba plug-in. These genes, as key genes, are the genes with the highest degree of connectivity or connecting multiple modules in the PPI network diagram. At the same time, we selected GSE150728 to detect peripheral blood mononuclear cells (PBMCs) and verify the expression of pivotal genes. The expression values of the corresponding central genes were found from the expression matrix and organized into a table, which was then produced into a verification map using the GraphPad Prism8 software.

Identification of DEGs
A data set (GSE150819) was obtained from the GEO database and analyzed in the R language. We found 966 DEGs between SARS-CoV-2 infected and uninfected group in total, among which 490 genes were upregulated and 476 genes were downregulated ( Table 2). Volcano plot and heatmap of DEGs were also established. A volcanic map was generated to show up (red) and down (green) genes between infection and normal control (Fig. 1a). Afterwards, we used the heatmap package in R to plot a heatmap based on the expression levels of DEGs in GSE150819 (Fig. 1b).

Functional enrichment analysis of DEGs
We analyzed the enrichment of DEGs, with two set databases GO and KEGG used. The first 10 biological processes of DEGs enrichment are shown in Fig. 2, including leukocyte migration, cell chemotaxis, and response to lipopolysaccharide (Fig. 2a). With regard to cellular components, the results showed that the collagen containing extra-cellular matrix was enriched, and the importance of extracellular matrix collagen components and plasma membrane in the progression of infection in the COVID-19 was gradually recognized (Fig. 2b). In addition, regarding the classification of molecular functions, DEGs focuses on the participation in the following functions: receptor-ligand activity, cytokine activity, and cytokine receptor binding (Fig. 2c). Importantly, KEGG analysis showed that the overlapping DEGs were mainly enriched in the cytokine receptor interaction and the viral protein interaction with cytokine and cytokine receptor (Fig. 2d). We described the difference in gene expression in the above pathway between the SARS-CoV-2 infected group and the uninfected group, respectively (Tables 3 and 4).

Electronic validation of hub gene by GSE150728
GSE150728, performed on peripheral blood mononuclear cells (PBMCs), was selected to validate the expression of hub gene. In GSE150728 database, CXCL1, EGF, CXCL10, CXCL8, and CCL5 were differently expressed before and after SARS-CoV-2 infection. Among these genes, CCL5 was the only gene, whose levels decreased. Anyhow the electronic validation support the finding that CXCL1, EGF, CXCL10, and CXCL8 were significantly expressed in SARS-CoV-2 infected bronchial organoid when compared to normal bronchial organoid (Fig. 4)

Discussion
Pneumonia caused by an infection in novel coronavirus has become a popular public health crisis in the world      [22]. Its early diagnosis and treatment are crucial for later disease progression. In this paper, the related molecular and infectious mechanisms of the human bronchus-like tube after its infection with novel coronavirus were discussed from the perspective of bioinformatics analysis.
In this study, we analyzed six samples from the GSE150819GEO database and found 966 differential genes, of which 490 were upregulated and 476 were downregulated, resulting in the establishment of a volcanic map. In addition, we performed GO and KEGG analyses on the dataset. Among them, biological processes are highly enriched in molecules with leukocyte migration, cell chemotaxis, response to lipopolysaccharide, and bacterial origin, which may be related to an inflammatory reaction [23][24][25]. In the inflammatory response, cytokines secreted by endothelial cells react with leukocyte ligands to express adhesion molecules, chemokines, and chemoattractants. Chemokines bind to glycosaminoglycans on the surface of epithelial cells, triggering leukocyte recruitment and inducing integrin activation. Second, leukocytes also enter the basement membrane and migrate to the mesenchyme, causing significant enrichment and expression in the collagen in the extracellular matrix, the outside of the plasma membrane, and the endoplasmic reticulum lumen of the cell components, which is directly related to the mechanism of pneumonia caused by the novel coronavirus [26]. In the KEGG analysis, the enrichment of cytokine-cytokine receptor interactions and chemokine signaling pathways correspond to the enrichment of cellular components and molecular functions. The high aggregation of the interaction of viral proteins with cytokines and cytokine receptors suggested that viral proteins might be involved in related processes such as inflammation factor storm and inflammation factor regulation. The expression of endoplasmic reticulum collagen in cellular components also suggested that viral proteins might affect the transport process of endoplasmic reticulum. As a pathway to promote cell function, NF−kappa B signaling pathways activation may be related to the synthesis of viral proteins. In summary, the potential host-virus protein interactions may provide us with new ideas and strategies for drug screening and the development of drugrelated research in SARS-CoV-2 [27]. At the same time, based on the PPI network diagram, we used the Cyto-Hubba plug-in to select 12 central genes with high connectivity, all of which occupied the core nodes of the network, suggesting that these central genes might play a pivotal role in the process of infection in the novel coronavirus [20]. There are several highly interacting genes in the PPI network, such as CXCL1, EGF, CXCL10, CXCL8, CCL5, and what calls for special attention is that CCL5 levels decreased in peripheral blood mononuclear cells (PBMCs), whose data is form GSE150728, the dataset we  used for electronic validation of hub gene. However, CCL5 was repeatedly reported to be increased in patients infected by SARS-CoV-2 [28], which supported the result we get from hBO. The unique inflammatory chemokine CCL5 (RANTES) induced at a later phase of inflammation guides white blood cells to migrate into inflammatory lesions during varied pathological processes and maintains local immune cells [29]. It has been demonstrated that CCL5 functions through activating the pathways including transcription factor 3 signaling transducers and activators, nuclear factor-κb, and MAPK pathways across three cell surface receptors named CCR1, CCR3, and CCR5. In addition, CCL5 can directly activate M1 (pro-inflammatory) polarization and prevent M2 (anti-inflammatory) polarization [30]. The lack of homeostatic CCL5 expression will largely influence the activated state of lung TRM (tissue-resident memory) T cell and natural killer cell components [31].
The chemokine CXCL1 plays a significant role in the immune response, mediating its function by combining with the CXCR2 receptor and GAG [32]. It highly regulates the transport, collection, and activation of neutrophilic granulocyte [33]. What is more, it activates the release of proteases as well as reactive oxygen species (ROS), killing the microorganisms at tissue [32]. But it is also associated with damage in numerous tissues, including the lung [34].
Data from previous coronavirus prevalence, including severe acute respiratory syndrome as well as the Middle East respiratory, and current data from the COVID-19 pandemic indicated that there could be pulmonary fibrosis after SARS-CoV-2 infection [35]. EGF may play an important role in pulmonary fibrosis after SARS-CoV-2 infection. Studies of SARS-CoV and SARS showed that abnormal expressions of EGFR are vital in the pathogenesis of fibrosis during coronavirus infection [36,37]. An animal study also reported EGFR potential role in enhancing disease when infected by SARS-CoV [37]. So, it is of great significance to study EGF function in SARS-CoV-2 infected body, which may help prevent pulmonary fibrosis and learn further about the pathogenesis of SARS-CoV-2 induced lung injury.
CXCL10, an interferon-inducible protein, increases greatly in SARS-CoV-2 infected patients [38,39]. Abnormal increase of CXCL10 abnormal is also incurred in SARS infected patients and indicates the worse clinical outcome [40,41]. A study conducted on mice showed that CXCL10 may be involved in fulminant lung injury which can be modified when CXCL10 was neutralized [42]. Back to SARS-CoV-2, serum concentrations of CXCL10, combining with GM-CSF and SARS-CoV-2 viral load are associated with day-28 mortality [43]. All these indicated that CXCL10 can be a potential indicator and a therapeutic target for SARS-CoV-2.
In our study, increased expressions of CXCL10 and CXCL8 are observed, which also seem to characterize COVID-19. CXCL8 (IL-8) is a member of the chemokine family, and its receptors are CXCR1/2 [41]. The combination of CXCL8 and CXCR1 or CXCR2 will lead to lung injury [22,44]. Once blocked, the injury reduces [22,44]. Hence, the study also indicates that CXR1/2 receptor stimulation has a positive effect on pulmonary recovery [41]. In this case, understanding its role in COVID-19 may contribute to clinical drug development. For patients suffering from ARDS, both CXCL8 and anti-CXCL8 levels increase which enhanced the survival of neutrophils [44]. CXCL8 plays a major role in the initial control of respiratory tract infection due to its chemotactic activity for neutrophils and monocytes [45]. And CXCL8 is considered a potential prognostic bio-marker for ARDS clinical course, which is a significant syndrome of SARS-CoV-2 [41].
In addition, this study has its limitations. The small sample size increases the error of the analysis results. If the analysis can be performed based on a large sample size, it may be possible to study the relationship between each central gene and the role of the pathway more fully so that the accuracy is better.
In conclusion, by studying the mRNA expression of human bronchial organoids, we have found that a series of DEGs (CXCL1, CCL5, EGF, CXCL10, CXCL8), which were also reported in SARS-CoV-2 infected patients, in vitro and in vivo models, plays an important role in the COVID-19 and are helpful for diagnosis and treatment. Our findings can better study the molecular mechanism of infection.

Conclusion
In summary, using a series of biological analyses, we have obtained a new understanding and description of human bronchial organoids infected with novel coronavirus, and through the analysis and verification of differential genes, we have identified that CXCL1, CCL5, EGF, CXCL10, and CXCL8 play a key role in the infection process in novel coronavirus and a series of pathways related to cytokine expression, leukocyte migration, and cytokine-cytokine receptor interaction have been significantly changed. These findings will help study the infection mechanism and therapeutic control in novel coronavirus. In order for these biomarkers and targets to become a powerful tool for clinically diagnosing and studying the infection mechanism, more samples should be used to further study the expression differences of these genes. What is more, the pathways and genes mentioned above also play an important role in patients, in vitro and in vivo models infected with SARS-COV-2, and they may be a key link in the pathogenesis of COVID-19. Therefore, we preliminarily concluded that the bronchial organ model can be used as an in vitro model for the study of SARS-COV-2.