Next-generation sequencing analysis of the molecular spectrum of thalassemia in Southern Jiangxi, China

Background Thalassemia is an extremely prevalent monogenic inherited blood disorder in southern China. It is important to comprehensively understand the molecular spectrum of thalassemia in an area with such a high prevalence before taking appropriate actions for the prevention and treatment of this disorder. Herein, we explored the clinical feasibility of using next-generation sequencing (NGS) for large-scale population screening to illustrate the prevalence and spectrum of thalassemia in Southern Jiangxi. Methods Blood samples collected from 136,312 residents of reproductive age in Southern Jiangxi were characterized for thalassemia by NGS. A retrospective analysis was then conducted on blood samples determined to be positive for thalassemia. Results In total, 19,827 (14.545%) subjects were diagnosed as thalassemia carriers, and the thalassemia prevalence rate varied significantly by geographical region (p < 0.001). A total of 40 α-thalassemia genotypes, including 21 rare genotypes, were identified, with --SEA/αα being the most prevalent genotype. A total of 42 β-thalassemia genotypes, including 27 rare genotypes, were identified, with the most common mutation, IVS II-654 C > T, accounting for 35.257% of these β-thalassemia genotypes. Furthermore, 74 genotypes were identified among 608 individuals with combined α- and β-thalassemia. Notably, most individuals with rare thalassemia mutations had mildly abnormal hematologic parameters, including microcytic hypochromia. Conclusions Our findings demonstrate the great heterogeneity and diverse spectrum of thalassemia in Southern Jiangxi, emphasizing the importance and necessity of persistent prevention and control of thalassemia in this region. Additionally, our findings further suggest that NGS can effectively identify rare mutations and reduce the misdiagnosis rate of thalassemia. Supplementary Information The online version contains supplementary material available at 10.1186/s40246-023-00520-5.


Introduction
Thalassemia is a group of hereditary blood diseases caused by a defect in a globin gene that leads to reduced or absent synthesis of the globin peptide chains used to form hemoglobin, eventually resulting in clinical symptoms such as chronic hemolysis and anemia [1]. According to the type of globin involved, thalassemia can be classified into α-, β-, δ-, and γ-thalassemia [2]. Thalassemia mainly manifests as chronic progressive hemolytic anemia. The degree of anemia varies depending on the type and amount of hemoglobin synthesized. Thalassemia minor may manifest as mild anemia or present with no clinical symptoms, while thalassemia major often leads to severe anemia [1,3]. Hemoglobinopathies are widely prevalent in Mediterranean coastal areas, Africa, the Middle East, Southeast Asia, and southern China [4,5] and pose significant public health problems and burdens on the communities in these areas. Thus, a comprehensive illustration of the prevalence and genotype distribution of this disease is indispensable for the prevention and control of thalassemia. Early diagnosis of thalassemia is conducive to timely prevention and treatment of severe thalassemia [6]. To date, thalassemia carrier screening and genetic counseling have been demonstrated to be the most effective solutions to reduce thalassemia major [7,8].
Previous studies have shown that thalassemia in China is mainly distributed in southern regions, with the highest thalassemia carrier rate occurring in Guangxi Province [9]. Jiangxi Province consists of 11 cities with a total population of 45,188,600 spread over an area of 166,900 km² (http://www.gztj.gov.cn), and it borders the province of Anhui to the north, Zhejiang to the northeast, Fujian to the east, Guangdong to the south, Hunan to the west, and Hubei to the northwest (top left thumbnail of Fig. 1). Thalassemia has been reported to be highly prevalent in these neighboring provinces, including 16.450% in Guangdong [10], 10.780% in Hunan [11], and 6.800% in Fujian [12], and Jiangxi itself has been reported to have a total prevalence of 2.600% [13]. Ganzhou city, also known as the Gannan region, is the southernmost city in Jiangxi Province and the main gathering place of the Hakka people, and the carrier rate of thalassemia in this city has been reported to be as high as 9.490% [14]. However, previous studies have had some limitations due to their small sample sizes or the use of nonrepresentative populations [13,14]. The majority of the subjects enrolled were children or adults who visited hospitals for the diagnosis of various diseases. At present, a combination of reverse dot blot (RDB), gap-PCR, and fluorescence PCR melting curve analysis is the most commonly used approach for identifying thalassemic mutations [15,16]. The major limitation of these methods is that they only identify common variations. Additionally, thalassemia in the Gannan region population has not yet been investigated using a large-scale and comprehensive epidemiological survey [14]. Therefore, the spectrum of thalassemic variations in this region has not been comprehensively explored. Taken together, it is reasonable to assume that many types of thalassemia mutations may have been overlooked in previous studies.
Fig. 1 Detection rate of thalassemia and its distribution in Southern Jiangxi, China
Next-generation sequencing (NGS) enables rapid and high-throughput detection of genetic variants [17]. Recently, technologies such as whole genome sequencing (WGS), exome sequencing, or targeted enrichment panel sequencing have been widely applied in the molecular diagnosis of various genetic disorders [18,19]. Herein, we used NGS for the first time to analyze thalassemia distribution in 136,312 subjects of reproductive age enrolled from April 2019 to April 2021 in the Gannan region, which provides a theoretical basis for the screening, prevention, and treatment of thalassemia in other regions. Moreover, our findings demonstrated that NGS could be effectively used to identify rare mutations undetectable using traditional testing methods, potentially further reducing the misdiagnosis rate of thalassemia.

Participants
This study was approved by the Ethics Committee of the First Affiliated Hospital of Gannan Medical University. During the period between April 2019 and April 2021, a total of 75,955 couples, with at least one partner being a resident of the Southern Jiangxi region (Fig. 1), participated in the "Implementation Plan for the Free Gene Detection of Thalassemia in Ganzhou City (2019-2022)". A total of 136,312 subjects of Ganzhou origin were screened from these couples, including those of Han nationality (99.244%) or She and Hui nationalities (0.756%). The age distribution ranged from 18 to 54 years, with an average age of 26 years, and all individuals provided written informed consent prior to study enrollment. This study was also approved by the Ganzhou Municipal Health Commission and was conducted in accordance with the ethical guidelines for research on human subjects.

Genomic DNA extraction
Genomic DNA was extracted from 200 μL whole blood samples using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany). DNA extracts were then arrayed in 96-well plates, and concentrations were quantified using a Qubit 3.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA). We required all samples to have a DNA concentration > 10 ng/mL and an A260/A280 ratio between 1.8 and 2.0 for downstream use.
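The acceptance criteria above can be written as a simple filter. The following is a minimal sketch and not the authors' pipeline: the function name `passes_qc` and the sample readings are hypothetical, the concentration is taken in the unit stated in the text, and treating the A260/A280 bounds as inclusive is an assumption.

```python
def passes_qc(concentration: float, a260_a280: float) -> bool:
    """Return True if a DNA extract meets the stated thresholds.

    concentration: value in the unit given in the text (must exceed 10).
    a260_a280: purity ratio, accepted between 1.8 and 2.0
               (bounds treated as inclusive, which is an assumption).
    """
    return concentration > 10 and 1.8 <= a260_a280 <= 2.0


# Hypothetical readings: (sample id, concentration, A260/A280 ratio)
samples = [("S1", 25.0, 1.85), ("S2", 8.0, 1.90), ("S3", 30.0, 2.10)]

# Keep only the extracts that satisfy both criteria
accepted = [sid for sid, conc, ratio in samples if passes_qc(conc, ratio)]
print(accepted)
```

In this hypothetical set, S2 fails on concentration and S3 fails on purity.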

Thalassemia detection
A combined strategy of Gap-PCR and NGS was applied to detect thalassemia [11]. In brief, seven deletions were analyzed by Gap-PCR, and other mutations in the globin genes were analyzed by NGS. First, the full-length HBA1, HBA2, and HBB genes were amplified by PCR, yielding amplicons that spanned all the exons and introns of these three genes. Sequencing libraries were then constructed according to the MGISEQ-2000 sequencing library preparation protocol. Paired-end (100 bp) sequencing on the MGISEQ-2000 sequencer was used to generate the sequencing data [19].

Hematological analysis
First, 2 mL of peripheral venous blood was collected from each of 30,995 subjects using ethylene diamine tetraacetic acid K2 (EDTA-K2) anticoagulated tubes. Red cell indices were then determined using a SYSMEX XN1000 automatic blood cell analyzer (Kobe, Japan). Subjects with a red blood cell (RBC) mean corpuscular volume (MCV) < 82 fL and/or mean corpuscular hemoglobin (MCH) < 27 pg were considered positive for thalassemia. Subjects with an MCV ≥ 82 fL and an MCH ≥ 27 pg were considered negative.
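The screening rule above amounts to a two-threshold test on the red cell indices. A minimal sketch follows, assuming the cutoffs exactly as stated in the text; the function name `screen_positive` and the example values are hypothetical and are not study data.

```python
MCV_CUTOFF_FL = 82.0  # mean corpuscular volume cutoff from the text (fL)
MCH_CUTOFF_PG = 27.0  # mean corpuscular hemoglobin cutoff from the text (pg)


def screen_positive(mcv_fl: float, mch_pg: float) -> bool:
    """Positive if MCV < 82 fL and/or MCH < 27 pg; negative otherwise."""
    return mcv_fl < MCV_CUTOFF_FL or mch_pg < MCH_CUTOFF_PG


# Hypothetical red cell indices, for illustration only
print(screen_positive(75.0, 24.0))  # both indices low
print(screen_positive(85.0, 26.0))  # only MCH low
print(screen_positive(90.0, 30.0))  # both indices normal
```

Because the rule uses "and/or", a single low index is enough to flag a subject, and values exactly at the cutoffs are negative.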

Thalassemia genotype definition
Common α-thalassemia and β-thalassemia mutations were found to be prevalent in southern Chinese populations. Moreover, these mutations can often be identified using routine laboratory testing [20]. HBA and HBB genotype categories are defined in Additional file 1: Table S1.

Data analysis and statistics
Statistical analysis was conducted using SPSS 23.0 software. A Chi-square (χ²) test was used to evaluate the differences in detection rates (α-thalassemia, β-thalassemia, and combined α-/β-thalassemia) between different regions and genders. P values < 0.05 were considered significant.
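To illustrate the kind of comparison described, the sketch below computes a Pearson χ² test for a 2×2 table of carriers and non-carriers in two groups. It is an illustration only: the study used SPSS, the function name `chi_square_2x2` and the counts are hypothetical, no continuity correction is applied, and comparisons across more than two regions would need a larger table with more degrees of freedom.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table: [[a, b], [c, d]], e.g. rows = groups, columns = carrier / non-carrier.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: [carriers, non-carriers] in two regions
stat, p = chi_square_2x2([[180, 820], [120, 880]])
print(f"chi2 = {stat:.4f}, significant at 0.05: {p < 0.05}")
```

The p value is then compared with the 0.05 threshold stated above.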

MCV and MCH values in positive samples
In this study, the relationship between the genotypes of α/β globin mutations and MCV or MCH levels was also analyzed (Fig. 3). The MCV in most cases was lower than 82 fL, except for the genotypes Hb Phnom Penh/αα, αWSα/αα, -α3.7/αα, Hb Phnom Penh/-α3.7, 5'UTR + 43 to + 40 (− AAAC)/βN, IVS II-761 A > G/βN, CAP + 8 (C > T)/βN, IVS-II-848 C > T/βN, − 50 G > A/βN, and -α3.7/αα + CAP + 8 (C > T)/βN. However, the MCV values in individuals (1.5668%) with rare mutations were less than 82 fL, which emphasizes the importance of rare mutations in thalassemia carriers. The results for MCH were highly consistent with those for MCV. Moreover, most thalassemia carriers presented with abnormal MCV and MCH values (MCV < 82 fL and/or MCH < 27 pg), while a few had normal values, including subjects with the genotypes IVS-II-848 C > T/βN, IVS II-761 A > G/βN, and CAP + 8 (C > T)/βN. These results indicate that thalassemia carriers with these genotypes, especially those containing rare mutations, would be missed using routine hematological screening methods.
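One way to quantify which genotypes escape hematological screening is to tally, per genotype, the share of genotype-confirmed carriers whose indices are both normal. The sketch below assumes the 82 fL and 27 pg cutoffs from the text; the function name `missed_fraction_by_genotype` and all index values are hypothetical, and only the genotype labels are taken from the text.

```python
from collections import defaultdict


def missed_fraction_by_genotype(records, mcv_cutoff=82.0, mch_cutoff=27.0):
    """Fraction of confirmed carriers per genotype with normal MCV and MCH.

    records: iterable of (genotype, mcv_fl, mch_pg) tuples.
    A carrier counts as missed when MCV >= cutoff and MCH >= cutoff,
    i.e. when the hematological screen would call the subject negative.
    """
    totals = defaultdict(int)
    missed = defaultdict(int)
    for genotype, mcv, mch in records:
        totals[genotype] += 1
        if mcv >= mcv_cutoff and mch >= mch_cutoff:
            missed[genotype] += 1
    return {g: missed[g] / totals[g] for g in totals}


# Hypothetical carriers; index values are invented for illustration
records = [
    ("--SEA/αα", 68.0, 21.5),
    ("--SEA/αα", 70.2, 22.0),
    ("CAP + 8 (C > T)/βN", 88.0, 29.5),
    ("CAP + 8 (C > T)/βN", 80.5, 26.0),
]
result = missed_fraction_by_genotype(records)
print(result)
```

A genotype with a high missed fraction is one for which molecular testing adds the most over red cell indices alone.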

Discussion
Thalassemia is a common genetic disease causing significant public health problems and social burdens in endemic areas [21,22]. In recent years, the incidence of thalassemia has gradually decreased with the improvement and widespread popularization of genetic counseling and prenatal diagnosis (PND) technologies [23,24]. However, a high prevalence of thalassemia has still been reported in southern China due to the lack of PND and genetic counseling [9,18]. The overall prevalence of α-thalassemia, β-thalassemia, and α + β-thalassemia reported for China was 7.880%, 2.210%, and 0.480%, respectively [25]. Ganzhou is the southernmost city of Jiangxi Province, central China, and it is adjacent to Guangdong Province, which has one of the highest incidence rates of thalassemia in China [26]. Therefore, investigating the genotype and distribution of thalassemia in Ganzhou city is of great significance for providing a theoretical basis for PND and genetic counseling. In this study, NGS was applied for large-scale population screening to assess the frequency of thalassemia carriers among people in the Gannan region. The results demonstrated the great heterogeneity and wide spectrum of thalassemia in the Gannan population. The overall frequency of thalassemia was 14.545%, which was significantly higher than the nationwide figure (10.570%) [25]. Furthermore, the incidence of α-thalassemia (10.489%) was significantly higher than that of β-thalassemia (3.610%) in the Gannan region (p < 0.05), which was in accordance with previous studies [14]. These results indicate that thalassemia is a serious public health problem in the Gannan region. It is interesting to note that the prevalence rate of thalassemia decreased from the south to the north in this province. The region with the highest prevalence was Dingnan (18.317%), followed by Xunwu (17.723%). One reason for this trend may be that Dingnan and Xunwu are situated in southeastern Jiangxi Province at the junction of Fujian, Guangdong, and Jiangxi Provinces. The vast majority of the residents in Dingnan are Hakka people, who have been previously reported to have a high prevalence of thalassemia [27,28]. More importantly, with the application of NGS to a large population, our data more accurately reflect the prevalence of thalassemia and the distribution of rare thalassemia genotypes in the Gannan region.
The detection rate (10.489%) for α-thalassemia in this study was significantly higher than that previously reported in the Gannan region (7.190%) or in Jiangxi (2.600%) [13,14]. We attribute these differences to the different genetic screening methods and sample sizes used in these studies. In addition, we identified 40 distinct α-thalassemia genotypes with 21 different variations, among which --SEA/αα was the most common subtype, with a remarkable proportion of 54.105%, followed by -α3.7/αα (28.011%) and -α4.2/αα (8.687%), which was consistent with previous reports [14,29]. Apart from these common variation types, other variations with rare or novel mutations were also identified. Hb Phnom Penh, a rare variant caused by the insertion of an ATC (for isoleucine) between codons 117 and 118, was identified as a hotspot for nucleotide insertions within exon 3 of the α1-globin gene. It was first reported in the Cambodian population [30] but has been rarely reported in the mainland of China or Taiwan province [31,32]. --THAI (NC_000016.9:g.199800_233300del), which has been reported in southern China but not previously in Jiangxi Province, was also detected in this study [33-35]. Furthermore, we also detected other rare genotypes that have not been reported in Jiangxi Province, including CD 30 -GAG [-Glu], Init CD ATG > A-G, and α fusion. These novel findings greatly enrich the database of known thalassemia alleles in the Gannan region. A total of 35 β-thalassemia variations with 42 genotypes that had not been reported in our previous study using RDB gene chips were identified in this cohort. These results suggest that NGS is preferable to the RDB gene chip for the screening of rare variants [14]. The prevalence of β-thalassemia (3.36%) in this study was much higher than the reported average of 2.21% in China [25]. In addition to conventional β-thalassemia mutants, rare deletion variants, including Chinese Gγ+(Aγδβ)0, SEA-HPFH, and Taiwanese deletion, were also detected. Regarding β-thalassemia genotypes, IVS II-654 C > T/βN and CD 41/42 (− CTTT)/βN were the two most frequently detected β-thalassemia subtypes, accounting for 35.257% and 28.368%, respectively. The two most common mutations were likewise IVS II-654 C > T followed by CD 41/42 (− CTTT), in agreement with our previous observations [14]. It was interesting to note that these results were identical to those of the Hakka population in Meizhou, Guangdong Province [27,29], which implies that the prevalence of β-thalassemia and its genotype distribution are geographically associated. In addition to the higher detection rate, our study also detected some rare β-thalassemia mutations that had not been reported previously, such as − 50 G > A and 5'UTR + 43 to + 40 (− AAAC), which accounted for 11.339% of all β-thalassemia genotypes. Conventional thalassemia genetic testing methods would not be able to accurately determine the genotypes of these individuals.
With the development of NGS techniques in recent years, NGS has emerged as a powerful and increasingly affordable tool for prenatal screening [18,36]. To date, several studies have applied NGS to the study of thalassemia and have made great progress [19,37]. In our study, high-throughput thalassemia screening was conducted at a cost of $10 per sample. A total of 56 thalassemia mutations were identified, including 48 rare mutations. Traditional detection methods, such as RDB and Gap-PCR, can only detect 23 mutations [20] and therefore miss the remaining 33 mutations. In other words, 4.010% (795/19,827) of carriers would be missed or misdiagnosed using traditional screening methods. Traditionally, RBC analysis combined with hemoglobin electrophoresis and a description of clinical manifestations is used for preliminary screening of thalassemia. PCR or genome sequencing is then used to confirm positive cases before diagnosis [1]. Limited by the low sensitivity of hematological analysis and the disadvantages of PCR, a large number of novel or rare thalassemia variations would be missed or misdiagnosed using traditional screening methods. Our findings suggest that NGS can fill this gap by effectively identifying new mutations and reducing the rate of misdiagnosis.
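The figures quoted in this paragraph can be reproduced with a short calculation. The sketch below only restates the arithmetic of the text; the variable names are hypothetical, and the subtraction assumes, as the text implies, that all 23 mutations covered by traditional methods are among the 56 identified.

```python
total_mutations = 56         # mutations identified by NGS (from the text)
detectable_traditional = 23  # mutations covered by RDB and Gap-PCR (from the text)
carriers = 19_827            # thalassemia carriers identified (from the text)
carriers_missed = 795        # carriers missed by traditional methods (from the text)

# Mutations outside the reach of traditional methods
missed_mutations = total_mutations - detectable_traditional

# Share of identified carriers who would be missed or misdiagnosed
missed_rate = carriers_missed / carriers * 100

print(missed_mutations)
print(f"{missed_rate:.3f}%")
```

Note that the denominator is the number of identified carriers, not the 136,312 subjects screened.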
Recently, third-generation sequencing (TGS) has emerged as a promising method for identifying thalassemia mutations in GC-rich and highly homologous sequences, as well as complex structural mutations. However, it is expensive and time-consuming, which limits its availability in many diagnostic laboratories. In this regard, NGS-based thalassemia screening can benefit a large population with acceptably high accuracy at a relatively affordable cost.
In summary, our study was the first to apply NGS to comprehensively analyze thalassemia in a large population of the Gannan region, Jiangxi Province. We demonstrated a high genetic diversity and a high prevalence of thalassemia in this region, which will be of great significance for the prevention and control of thalassemia in Gannan and other high-prevalence areas. More importantly, the identification of rare and novel variations highlighted the necessity and significance of choosing NGS for thalassemia screening in large populations.

Table 1
Prevalence rate of α-thalassemia genotypes, phenotypes, and constituent ratios in Gannan populations

Table 2
Prevalence rate of β-thalassemia genotypes, phenotypes, and constituent ratios in Gannan populations
*Indicates genotypes with rare mutations