Whole-genome sequencing of Chinese centenarians reveals important genetic variants in aging
WGS of centenarians for genetic analysis of aging

Genetic research on longevity has provided important insights into the mechanisms of aging and aging-related diseases. Pinpointing important genetic variants associated with aging could provide further insights for aging research. We performed whole-genome sequencing of 19 centenarians to investigate the genetic basis of human longevity. Using SKAT analysis, we found 41 genes that were significantly associated with longevity in centenarians compared with control genomes. Pathway enrichment analysis of these genes showed that immune-related pathways were enriched, suggesting that immune pathways might be critically involved in aging. HLA typing was then performed on the whole-genome sequencing data, and several HLA subtypes were found to be significantly overrepresented. Our study suggests a new mechanism of longevity and identifies potential genetic variants for further study.


Introduction
With the development of human genomics research, a large number of studies on the genetics of longevity have been conducted. Researchers in various countries have proposed many theories of the mechanisms of aging from different perspectives, involving oxidative stress, energy metabolism, signal transduction pathways, and immune responses, among others [1,2]. These mechanisms interact with each other and are influenced by heredity to some degree [2,3]. Identifying longevity-related biological markers is critical to an in-depth understanding of how carriers are protected against common diseases and/or how the aging process is slowed.
Previous studies have identified 300 to 750 longevity-related genes that are critically involved in a variety of biological processes, such as growth and development, energy metabolism, oxidative stress, maintenance of genomic stability, and neurocognition [4]. The main candidate gene is APOE, which is involved in lipoprotein metabolism [5,6]. Other candidates are involved in cell cycle regulation, cell growth and signal transduction, maintenance of genome stability, and endocrine-related pathways [7][8][9]. The candidates for longevity also include genes related to drug metabolism, genes involved in protein folding, stabilization, and degradation, and genes related to coagulation and the regulation of circulation [10]. In most cases, these genes or their polymorphic sites were examined in replication studies across multiple populations, which identified certain longevity-associated genes or pathways [4][5][6][7][8][9][10].
Longevity is also associated with immunity and inflammation [11,12]. The HLA genes form the human major histocompatibility complex (MHC), a gene family present in most vertebrate genomes that is closely related to the immune system [13]. An earlier study indicated that HLA may be the genetic basis for the specific immune response patterns associated with longevity [14]. Inflammatory cytokines, such as TNF-α, IL-1β, and IL-6, may be key players [15]. IL-10 limits and terminates the inflammatory response by inhibiting the action of T cells, monocytes, and macrophages; thus, genetic variants of this gene may also affect the longevity phenotype [16].
However, most current investigations of longevity factors in centenarians focus on a small number of candidate miRNAs, single tissues, or single samples, and only a few studies have systematically analyzed multiple tissues and copies at the whole-genome level [17][18][19].
Building on the results of a previous cohort study, we aimed to use whole-genome sequencing to conduct a genome-wide association analysis of centenarians. Our findings should help focus more precisely on the most important genetic factors and molecular mechanisms associated with longevity. The conclusions of this study can serve as a basis for public efforts to extend lifespan. Moreover, they will provide a scientific reference for further clinical research on disease treatment and the promotion of overall health.

SKAT analysis revealed significantly correlated genes in aging
The Illumina XTen platform (Illumina, San Diego, CA, USA) was used to sequence the whole genomes of 19 centenarians at an average depth of 30×. The sequencing quality metrics are provided in Supplementary Table 1, and baseline information on the centenarians is shown in Table 1. The identified variants were annotated, and non-synonymous variants affecting gene function were selected for association analysis.
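The variant selection step can be pictured as a filter over annotated variants. This is a minimal sketch rather than the study's pipeline: the annotation tool is not named, so the `AnnotatedVariant` record and the set of Sequence Ontology consequence terms treated as "non-synonymous, function-affecting" are assumptions.

```python
from dataclasses import dataclass

# Assumed set of consequence terms counted as protein-altering; the study's
# exact criteria are not given in the text.
NONSYNONYMOUS_TERMS = {
    "missense_variant",
    "stop_gained",
    "stop_lost",
    "start_lost",
    "frameshift_variant",
    "inframe_insertion",
    "inframe_deletion",
    "splice_acceptor_variant",
    "splice_donor_variant",
}


@dataclass
class AnnotatedVariant:
    """Hypothetical annotated-variant record."""
    gene: str
    position: str        # illustrative format, e.g. "chr1:12345:A>G"
    consequences: tuple  # Sequence Ontology consequence terms for this variant


def select_nonsynonymous(variants):
    """Keep variants with at least one protein-altering consequence."""
    return [v for v in variants if NONSYNONYMOUS_TERMS.intersection(v.consequences)]
```

A variant annotated as both `intron_variant` and `stop_gained` is kept, because any single protein-altering consequence is enough to pass the filter.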
The study design is shown in Fig. 1. Association analysis was applied to the WGS data to identify important genes and pathways. All centenarians and controls were East Asians of Chinese descent (Zhejiang Province, Southeast China). The association analysis consisted mainly of single-variant association tests, together with SKAT and burden tests for rare variants. These methods are commonly used in GWAS research, especially with case/control samples, and identify associated variants from differences in variant frequency between case and control samples. Single-variant tests are generally aimed at common variants and detect variants whose frequency differs between the groups and that may therefore contribute to the phenotype. Significantly associated variants can then be annotated and checked against the literature to determine how the related genes and variants affect gene function. Using the whole-genome sequencing data, this analysis was performed on variants with a minor allele frequency (MAF) > 1%. The control group consisted of 208 samples selected from the 1000 Genomes Project (1000G) East Asian population data [20]. Principal component analysis (PCA) was conducted to evaluate population stratification between the case and control groups (Supplementary Fig. 1). For rare, low-frequency variants, a rare-variant case/control association test was performed.
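The case/control logic described above can be illustrated with a small standard-library sketch. It assumes genotypes coded as alternate-allele dosages (0, 1, 2), applies a two-sided Fisher's exact test to allele counts for a single common variant (MAF > 1%), and, for rare variants, collapses each gene into a carrier/non-carrier indicator for a simple burden test. The function names are hypothetical and this is not the study's exact implementation.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


def minor_allele_frequency(dosages):
    """MAF from alternate-allele dosages pooled over all samples."""
    alt = sum(dosages) / (2 * len(dosages))
    return min(alt, 1 - alt)


def allelic_test(case_dosages, control_dosages):
    """Single-variant test: alt vs ref allele counts in cases and controls."""
    case_alt, ctrl_alt = sum(case_dosages), sum(control_dosages)
    case_ref = 2 * len(case_dosages) - case_alt
    ctrl_ref = 2 * len(control_dosages) - ctrl_alt
    return fisher_exact_two_sided(case_alt, case_ref, ctrl_alt, ctrl_ref)


def burden_test(case_genotypes, control_genotypes):
    """Carrier-based burden test for the rare variants of one gene.

    Each sample is a list of dosages over the gene's rare variants; a sample
    counts as a carrier if it has at least one rare allele.
    """
    case_carriers = sum(1 for g in case_genotypes if any(g))
    ctrl_carriers = sum(1 for g in control_genotypes if any(g))
    return fisher_exact_two_sided(
        case_carriers, len(case_genotypes) - case_carriers,
        ctrl_carriers, len(control_genotypes) - ctrl_carriers,
    )
```

In practice, variants would first be split with `minor_allele_frequency` at the 1% threshold: common ones go to `allelic_test`, and the rare ones in each gene are pooled for `burden_test`. The sketch does not adjust for the PCA-based stratification check.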

Immune system-related pathways were significantly enriched
The significant genes were subjected to pathway enrichment analysis using MSigDB gene sets for KEGG and Reactome pathways. As can be seen in Table 3, the associated genes were significantly enriched in pathways related to immune and inflammatory responses, such as those involving interferons, antibodies, and immunity.
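The text does not state which statistical test underlies the enrichment. A common choice for this kind of overrepresentation analysis is the one-sided hypergeometric test, sketched below under that assumption: it gives the probability of seeing at least k pathway genes among n significant genes drawn from a background of N genes, K of which belong to the pathway.

```python
from math import comb


def hypergeometric_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: significant genes annotated to the pathway
    n: total number of significant genes (the input list)
    K: pathway size (genes in the gene set, within the background)
    N: number of background genes
    """
    upper = min(n, K)
    # Exact integer arithmetic for the tail, then a single division.
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / comb(N, n)
```

For example, `hypergeometric_enrichment_p(k, 41, K, N)` would test one pathway against the 41 significant genes; the background size N and pathway size K depend on the gene-set collection and are placeholders here. Because many pathways are tested, the resulting p-values still need multiple-testing correction.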

HLA subtypes are correlated with aging
HLA typing was performed on the whole-genome sequencing data. The analysis of HLA type distribution, presented in Tables 3 and 4 and Fig. 2, showed that class II HLA genes were closely related to longevity. Among them, HLA-DRB1*13:02, HLA-DRB1*14:01, and HLA-DRB1*16:02 were significantly associated with longevity.
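One way to test whether an HLA allele is overrepresented among centenarians is to compare allele counts over all chromosomes (two per individual) between cases and controls. The sketch below assumes HLA genotypes have already been called from the WGS data as pairs of two-field alleles (the typing tool is not named in the text) and uses a one-sided Fisher's exact test; it illustrates the idea and is not necessarily the test behind Tables 3 and 4.

```python
from collections import Counter
from math import comb


def allele_counts(genotypes):
    """Count alleles over all chromosomes; each genotype is a 2-allele tuple."""
    return Counter(allele for pair in genotypes for allele in pair)


def overrepresentation_p(allele, case_genotypes, control_genotypes):
    """One-sided Fisher's exact p-value that `allele` is enriched in cases."""
    a = allele_counts(case_genotypes)[allele]        # copies in cases
    total_allele = a + allele_counts(control_genotypes)[allele]
    n_case = 2 * len(case_genotypes)                 # case chromosomes
    n = n_case + 2 * len(control_genotypes)          # all chromosomes
    # P(X >= a): draw n_case chromosomes from n, total_allele of which
    # carry the allele (hypergeometric upper tail).
    upper = min(n_case, total_allele)
    tail = sum(comb(total_allele, x) * comb(n - total_allele, n_case - x)
               for x in range(a, upper + 1))
    return tail / comb(n, n_case)
```

A call such as `overrepresentation_p("DRB1*13:02", centenarian_types, control_types)` would then be repeated for every observed allele, with correction for the number of alleles tested; `centenarian_types` and `control_types` are hypothetical inputs.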

Discussion
Research on the genetic mechanisms of longevity has been conducted from many perspectives, including studies of longevity-related genes, variants, and biological pathways [4,10]. With advances in NGS technology and analysis algorithms, a growing number of longevity-related genetic features can be identified, which will be useful for understanding the mechanisms of longevity and related diseases [21,22].
In our study, we used a small centenarian cohort to establish the association between genetic variants and longevity. SKAT analysis identified rare variants associated with the longevity phenotype. HLA-DRB1, HLA-DRB5, and PDE4DIP have previously been reported to be associated with longevity [23][24][25]. Importantly, HLA-DRB1 variants were specifically reported to be significantly enriched in a French centenarian study [26]. Further, through the analysis of HLA type distribution, we found several HLA-DRB1 subtypes that are more closely related to longevity. Other significantly associated genes found in this study, such as PABPC3, BAGE2, PADI4, CHI3L2, MUC17, WARS, and SIRPB1, have not previously been reported in relation to longevity, and their functions deserve further study. Pathway enrichment analysis showed an important association between immune-related pathways and the aging process. Previous studies have shown that immune and inflammatory responses are closely related to aging-related diseases (ARDs) [27,28]. The significant enrichment of these pathways suggests that longevity may be associated with protective variants in genes within them. Our results also show that whole-genome data can be mined further than the data of traditional SNP studies. For example, HLA types inferred from the sequencing data can also be tested for association with the phenotype, which illustrates one way to extend the mining of genome-wide data. Nonetheless, the relatively small sample size limited the power of our analysis, and further validation in large cohort studies is required.
Fig. 1 Overall design of the study. WGS was applied to 19 centenarians. An association test was used to select candidate variants/genes. Functional analysis was then applied.

Conclusions
In conclusion, the findings of our study provide novel insights into aging mechanisms, suggesting the involvement of several genes, pathways, and specific HLA subtypes that are worth further investigation.

Sample preparation
Whole-blood samples were collected from the centenarian cohort in Zhejiang Province, China, with individuals selected using a random number table. Ten percent of the centenarians, that is, 19 individuals, were chosen. All of them were free of major age-related diseases, i.e., cardiovascular or

Data preprocessing
Paired-end reads were aligned to the hg19 reference genome using the BWA-MEM algorithm (BWA v0.7.15-r1140) [29] and then sorted and indexed using SAMtools [30]. An in-house Python script was used to compute quality statistics, including mapping statistics, read quality, and panel capture efficiency.
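The alignment steps above correspond roughly to the following commands, built in Python and run with `subprocess`. This is a hypothetical sketch: the thread count, read-group string, and file names are assumptions, the reference FASTA must already be indexed with `bwa index`, and only the subcommands implied by the text (`bwa mem`, `samtools sort`, `samtools index`) are used.

```python
import subprocess


def build_commands(ref_fasta, fastq1, fastq2, sample, threads=8):
    """Return (argv, stdout_path) pairs for alignment, sorting and indexing.

    stdout_path is set only for `bwa mem`, which writes SAM to standard
    output. File names and the read-group string are illustrative.
    """
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"  # bwa expands the literal \t
    return [
        (["bwa", "mem", "-t", str(threads), "-R", read_group,
          ref_fasta, fastq1, fastq2], sam),
        (["samtools", "sort", "-@", str(threads), "-o", sorted_bam, sam], None),
        (["samtools", "index", sorted_bam], None),
    ]


def run_commands(commands):
    """Run the steps in order; check=True raises on the first non-zero exit."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as handle:
                subprocess.run(argv, check=True, stdout=handle)
```

Keeping command construction separate from execution makes the pipeline easy to inspect or log before anything runs. Piping `bwa mem` directly into `samtools sort` would avoid the intermediate SAM file.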
For each sample, the SAMtools pileup function was used to generate variant candidates at the corresponding sites. Among the candidates, we excluded SNP sites and low-depth sites (≤ eu), and we removed reads with low base quality (< Q30) or low mapping quality (< 40).
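The read- and site-level filters above can be expressed as simple predicates. In this sketch the `PileupRead` record is hypothetical (in practice the values would come from a pileup of the BAM file). The Q30 and MAPQ 40 cut-offs are taken from the text, and the depth threshold is left as a parameter because its value is garbled in the source.

```python
from dataclasses import dataclass

MIN_BASE_QUALITY = 30     # bases below Q30 are dropped
MIN_MAPPING_QUALITY = 40  # reads with mapping quality below 40 are dropped


@dataclass
class PileupRead:
    """Hypothetical per-read record at one pileup position."""
    base: str
    base_quality: int
    mapping_quality: int


def filtered_bases(reads):
    """Bases from reads that pass both the base- and mapping-quality cut-offs."""
    return [r.base for r in reads
            if r.base_quality >= MIN_BASE_QUALITY
            and r.mapping_quality >= MIN_MAPPING_QUALITY]


def passes_depth(reads, min_depth):
    """Keep a site only if its filtered depth exceeds min_depth.

    min_depth is a placeholder; sites with depth <= min_depth are excluded.
    """
    return len(filtered_bases(reads)) > min_depth
```

Applying the read filters before counting depth means that a site covered mostly by low-quality reads is also excluded as low-depth.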

SKAT analysis
The centenarian variant data were analyzed with SKAT. The genomic variants of each sample were functionally annotated to distinguish rare variants and variants affecting protein function. Next, the effect of the variants in each gene was scored. We used the three methods described above to test candidate variants and identify potential rare variants associated with the phenotype. Literature and databases were then searched to determine how the associated variants affect biological processes, and possible disease mechanisms were proposed.
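To make the gene-level scoring concrete, the following sketch computes the SKAT score statistic for a binary phenotype without covariates, Q = Σ_j w_j² (Σ_i G_ij (y_i − ȳ))², using the SKAT package's default weights w_j = Beta(MAF_j; 1, 25). It is a simplified illustration under stated assumptions: dosages are assumed to count the minor allele, and the p-value comes from label permutation rather than the Davies method used by default in the SKAT R package.

```python
import random
from math import exp, lgamma, log


def beta_density(x, a=1.0, b=25.0):
    """Beta(a, b) probability density at x, for 0 < x < 1."""
    log_beta_fn = lgamma(a) + lgamma(b) - lgamma(a + b)
    return exp((a - 1) * log(x) + (b - 1) * log(1 - x) - log_beta_fn)


def skat_q(genotypes, phenotypes, a=1.0, b=25.0):
    """SKAT score statistic for a binary trait with no covariates.

    genotypes: one list per sample of minor-allele dosages (0/1/2) for the
               variants of a single gene (assumed coding).
    phenotypes: 0/1 per sample (1 = case, e.g. centenarian).
    """
    n = len(phenotypes)
    y_mean = sum(phenotypes) / n  # fitted mean under the null model
    residuals = [y - y_mean for y in phenotypes]
    q = 0.0
    for j in range(len(genotypes[0])):
        column = [genotypes[i][j] for i in range(n)]
        maf = sum(column) / (2 * n)
        # Monomorphic variants carry no information; this also avoids log(0).
        weight = beta_density(maf, a, b) if 0 < maf < 1 else 0.0
        score = sum(g * r for g, r in zip(column, residuals))
        q += (weight * score) ** 2  # w_j^2 * S_j^2
    return q


def skat_permutation_p(genotypes, phenotypes, n_perm=1000, seed=0):
    """Permutation p-value for Q, used here in place of the Davies method."""
    rng = random.Random(seed)
    observed = skat_q(genotypes, phenotypes)
    labels = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if skat_q(genotypes, labels) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

The Beta(1, 25) weights up-weight the rarest variants, and squaring each weighted score makes Q robust to variants with opposite effect directions, which is what distinguishes SKAT from a burden test.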

Pathway enrichment analysis
The KEGG pathway enrichment was conducted using the DAVID Functional Annotation Bioinformatics Microarray Analysis.
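Because many pathways are tested at once, enrichment p-values are usually adjusted for multiple testing, and DAVID's functional annotation chart reports Benjamini-adjusted values among others. The text does not say which correction was applied, so the Benjamini-Hochberg procedure below is a minimal sketch of one common option.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # caps values at 1 and enforces monotonicity
    for rank in range(m, 0, -1):  # walk from the largest p to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Pathways whose adjusted value falls below the chosen false discovery rate (for example 0.05) would then be reported as significantly enriched.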