Whole-exome sequencing of BRCA-negative breast cancer patients and case–control analyses identify variants associated with breast cancer susceptibility

A Correction to this article was published on 19 December 2022

This article has been updated



For the majority of individuals with early-onset or familial breast cancer referred for genetic testing, the genetic basis of their familial breast cancer remains unexplained. To identify novel germline variants associated with breast cancer predisposition, whole-exome sequencing (WES) was performed.


WES was performed on 290 BRCA1/BRCA2-negative Singaporeans with early-onset breast cancer and/or a family history of breast cancer. Case–control analysis against the East-Asian subpopulation (EAS) from the Genome Aggregation Database (gnomAD) identified variants enriched in cases, which were further selected by occurrence in cancer gene databases. Variants were then re-evaluated in repeated case–control analyses using a second case cohort from the database of Genotypes and Phenotypes (dbGaP), comprising 466 early-onset breast cancer patients from the United States, and a Singapore SG10K_Health control cohort.


Forty-nine breast cancer-associated germline pathogenic variants in 37 genes were identified in Singapore cases versus gnomAD (EAS). Compared against SG10K_Health controls, 13 of the 49 variants remained significantly enriched (False Discovery Rate (FDR)-adjusted p < 0.05). Comparing these 49 variants in dbGaP cases against gnomAD (EAS) and SG10K_Health controls revealed 23 concordant variants that were significantly enriched (FDR-adjusted p < 0.05). Fourteen variants were consistently enriched in breast cancer cases across all comparisons (FDR-adjusted p < 0.05). Seven variants in GPRIN2, NRG1, MYO5A, CLIP1, CUX1, GNAS and MGA were confirmed by Sanger sequencing.


In conclusion, we have identified pathogenic variants in genes associated with breast cancer predisposition. Importantly, many of these variants were significant in a second case cohort from dbGaP, suggesting that the strategy of using case–control analysis to select variants could potentially be utilized for identifying variants associated with cancer susceptibility.


Breast cancer (BC) is the most common malignancy and the leading cause of cancer-associated mortality among women worldwide [1]. It accounts for one in four cancer cases and one in six cancer deaths among women, ranking first for incidence in the vast majority of countries [1]. Approximately 10–20% of all BC patients have a family history of cancer, with multiple family members affected across generations [2]. Germline mutations in specific genes such as BRCA1, BRCA2, CDH1, PALB2, PTEN and TP53 confer an increased risk of developing BC [3].

Recent advances in next-generation sequencing have led to reduced costs for multigene panel testing of cancer predisposition genes for individuals referred for genetic testing, resulting in a higher uptake of testing. However, it is estimated that pathogenic variants in known cancer predisposition genes only account for around 25% of hereditary BC cases [4, 5].

Whole-exome sequencing (WES) is revolutionizing our ability to identify novel genetic variants associated with cancer predisposition. To date, multiple candidate BC predisposition genes have been identified by WES, predominantly from studies on women of European ancestry [6, 7].

Here, we aimed to identify novel candidate BC predisposition genes and variants by performing WES on germline DNA from Asian BC patients referred for cancer genetic risk assessment but who were BRCA1/2-negative. Pathogenic variants identified from WES were filtered and prioritized using in silico bioinformatic tools, followed by case–control analysis and only significant variants in known cancer genes were selected for further analysis. Notably, we have identified pathogenic variants in our cases that had a statistically significant difference in frequency as compared to the Genome Aggregation Database (gnomAD) East-Asian (EAS) controls and Singaporean controls [8].


Demographics and clinical information on the study population

Information on the demographics, age at diagnosis, ethnicity, family history, and clinicopathological characteristics of the 290 BC cases is provided in Table 1. The study population consisted solely of females, and a large proportion were Chinese (69.3%). The age at first cancer diagnosis ranged from 19 to 75 years, with a mean and median age of 37.5 and 37 years, respectively. Of 290 patients, 65 (22.4%) presented with a family history (including first-degree, second-degree, and third-degree relatives) of BC, 23 (7.9%) with a family history of other cancers and 218 (75.2%) with no family history of breast or any other cancers (Table 1, Additional file 1: Fig. S1). Of the 290 BC cases, 225 patients (77.6%) had early-onset breast cancer (≤ 40 years).

Table 1 Demographics, clinical characteristics, and family history of patients

Filtering of candidate variants

Whole exome sequencing of 290 BC patients revealed 1,196,466 variants before filtering. Among these, 1,101,796 (92.1%) passed Dynamic Read Analysis for GENomics (DRAGEN) quality-control checks. Further filtering to retain functional variants with gnomAD (EAS) minor allele frequency (MAF) less than 1%, predicted pathogenic variants with scaled Combined Annotation-Dependent Depletion (CADD) score greater than 20, and variants in the known or predicted cancer gene lists in the Network of Cancer Genes (NCG) database, left only 2,496 variants (0.2% of the total; Fig. 1).

Fig. 1

Study design for the selection of variants and genes. (a) List of known or candidate cancer genes in the Network of Cancer Genes [9]. (b) The Cancer Gene Census list of the Catalogue of Somatic Mutations in Cancer (COSMIC) [10]. (c) List of cancer driver genes from Bailey et al. [38]. (d) List of cancer driver genes inferred with nucleotide context from Dietlein et al. [11]

The genes harboring our shortlisted variants were further prioritized using cancer gene databases such as the Catalogue of Somatic Mutations in Cancer (COSMIC), cancer driver genes based on nucleotide context, and computationally discovered and experimentally validated cancer driver genes [9] (Additional file 4: Table S1). Finally, we shortlisted only variants that were present in three or more patients. All variants were checked with IGV (Additional files 1, 2: Figs. S1, S2).

Identification of pathogenic germline variants

In total, we discovered 49 prioritized variants in 37 prioritized genes across 134 patients (Fig. 2; Additional file 4: Table S2). Most of these variants are nonsynonymous single nucleotide variants (SNVs) (42 variants, or 85.7%), with one frameshift insertion (2.0%), three frameshift deletions (6.1%), and three stop-gains (6.1%). Frameshift insertions, deletions, and stop-gains were prioritized regardless of their CADD score.

Fig. 2

Oncoplot of variants in prioritized candidate genes, showing the type and frequency of each variant. Rows represent genes and each column represents one case. Rows (bottom) show the age at diagnosis (diag), family history (FH) status for breast cancer (BC) and ovarian cancer (OC) and ethnicity for each case

All 42 nonsynonymous SNVs had CADD scores greater than 20. The remaining 7 variants which were not nonsynonymous SNVs also had CADD scores greater than 20, except for a frameshift deletion variant in HLA-A. Thirty variants were classified as variants of uncertain significance (VUS) (61.2%), two stop-gain mutations in KMT2C were considered pathogenic (4.1%), and the remaining variants were benign (14 variants, or 28.6%) or likely benign (3 variants, or 6.1%) (Table 2).

Table 2 Predicted pathogenicity and classifications from databases for 49 selected variants in 37 genes

Case–control analysis of the Singapore cases

Case–control analysis was performed for the 49 selected variants in our Singaporean cases against the gnomAD (EAS) and SG10K_Health control cohorts (Table 3). Apart from the two variants in BRD7 and NBEA that were not reported in gnomAD (EAS), all of the remaining 47 variants were significantly enriched in our cohort as compared to gnomAD (EAS). In the SG10K_Health control cohort, seven of our 49 selected variants were absent, including the aforementioned variants in BRD7 and NBEA as well as additional variants in KMT2C, GPRIN2, H3F3A, and MAF. Of the remaining 42 variants which could be found in SG10K_Health, 13 were significantly enriched at α = 0.05 in our cohort versus SG10K_Health (Table 3).

Table 3 Allele frequencies and case–control association analysis of 49 variants in 37 selected candidate genes

Case–control analysis using a breast cancer case cohort from dbGaP

Case–control analysis for the 49 germline variants identified from our Singapore breast cancer cohort was repeated using a case cohort from dbGaP (phs000822.v1.p1) against the same control cohorts (Table 3). Only 34 of our 49 variants were found in phs000822.v1.p1. Of these 34 variants, 26 were significantly enriched in phs000822.v1.p1 when compared against gnomAD (EAS) while eight did not reach statistical significance. Next, comparison of the 34 variants with SG10K_Health found 26 significantly enriched in phs000822.v1.p1, four unreported in SG10K_Health, and another four did not reach significance. These two sets of comparison were generally concordant, as 23 of the 26 significantly enriched phs000822.v1.p1 versus gnomAD (EAS) were also significantly enriched in comparison against SG10K_Health (Table 3). Altogether, 14 variants were significantly enriched in cases, or missing in the control cohorts, across all four sets of case–control comparisons. These variants were found in 89 out of 290 breast cancer patients (30.7%) where 24 of the 89 cases had more than one pathogenic variant (Additional file 4: Table S3).
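The selection of variants supported across all four comparisons can be illustrated with a minimal sketch. The comparison names, status labels and variant names below are hypothetical placeholders and are not taken from Table 3; the sketch only assumes that each variant has one outcome per comparison and that "significantly enriched" and "missing in the control cohort" both count as supporting evidence, as described above.

```python
# Hypothetical status labels; "absent_in_controls" mirrors the text's
# "missing in the control cohorts".
SUPPORTING = {"enriched", "absent_in_controls"}

# Two case cohorts x two control cohorts = four comparisons (names are illustrative).
COMPARISONS = (
    "SG_cases_vs_gnomAD_EAS",
    "SG_cases_vs_SG10K_Health",
    "dbGaP_cases_vs_gnomAD_EAS",
    "dbGaP_cases_vs_SG10K_Health",
)


def consistent_variants(results):
    """Return variants whose status is supportive in every one of the four comparisons."""
    kept = []
    for variant, statuses in results.items():
        # A comparison with no recorded status (e.g. variant not found in the
        # case cohort) yields None and therefore fails the check.
        if all(statuses.get(name) in SUPPORTING for name in COMPARISONS):
            kept.append(variant)
    return sorted(kept)


# Made-up example records, for illustration only.
example = {
    "variant_1": dict.fromkeys(COMPARISONS, "enriched"),
    "variant_2": {**dict.fromkeys(COMPARISONS, "enriched"),
                  "SG_cases_vs_SG10K_Health": "absent_in_controls"},
    "variant_3": {**dict.fromkeys(COMPARISONS, "enriched"),
                  "dbGaP_cases_vs_gnomAD_EAS": "not_significant"},
    "variant_4": {"SG_cases_vs_gnomAD_EAS": "enriched",
                  "SG_cases_vs_SG10K_Health": "enriched"},  # not found in dbGaP cases
}

print(consistent_variants(example))
```

In this toy input only the first two variants pass, because the third fails one comparison and the fourth has no result for the dbGaP case cohort.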

Variant validation by Sanger sequencing

Four of 14 significant variants were excluded from Sanger sequencing validation as these variants lie in highly repetitive regions (KMT2C, MUC4, and MAF) or highly polymorphic regions (HLA-DRB1). Seven of the remaining 10 variants, including GPRIN2 c.983G, NRG1 c.G172A, MYO5A c.A3960T, CLIP1 c.C80T, CUX1 c.C3317T, GNAS c.A266G and MGA c.C1883A, were confirmed by Sanger sequencing. However, variants in TPTE2, NBEA, and BRD7 could not be validated by Sanger sequencing, suggesting that these variants were likely false positives (Fig. 3).

Fig. 3

Sanger sequencing validation of variants identified by whole-exome sequencing. Representative sequencing chromatograms showing the different variants found in our breast cancer patients and of an unaffected control. A Seven variants were confirmed by Sanger sequencing. B Three variants failed to be validated by Sanger sequencing. Arrows indicate the position of the variant


Here, we report the largest WES study on germline DNA from Asian breast cancer patients who had undergone cancer risk assessment and were BRCA1 and BRCA2 mutation-negative. Our approach was to select only pathogenic variants that showed a statistically significant difference in frequency against gnomAD East-Asian controls and Singapore controls. This was followed by an additional prioritization step of selecting only variants occurring in well-documented cancer genes such as those listed in COSMIC, NCG and cancer driver gene databases [9,10,11].

In total, we have identified 49 rare pathogenic germline variants in 37 genes which were significantly enriched in breast cancer patients. These were all predicted to be pathogenic using in silico tools and all had a minor allele frequency of less than 1% or were unreported in gnomAD (EAS). We further validated these results with an independent United States-based case cohort obtained from dbGaP, of 466 early-onset breast cancer patients. Across four sets of comparisons involving two case and two control cohorts, 14 variants were consistently enriched in breast cancer cases (Table 3).

Of these 14 variants, seven variants in GPRIN2, NRG1, MYO5A, CLIP1, CUX1, GNAS, and MGA were confirmed by Sanger sequencing. To the best of our knowledge, the specific germline variants identified here have not been reported in any cancer-related study thus far. However, the respective genes have been implicated in many cancer types [12,13,14,15,16,17]. The NRG1 nonsynonymous SNV (rs113317778) lies in an immunoglobulin-like domain, while the other affected residues in GPRIN2 (rs4445576), CUX1 (rs782176246), GNAS (rs563844600), and MGA (rs61736074) are located within disordered protein regions, which lack a stable tertiary structure and adopt different structural conformations [18,19,20]. Interestingly, a computational study predicted that the GPRIN2 mutation (p.S328C) generates new microstructural elements in the disordered region, which may disrupt protein function or protein–protein interactions [20]. Other exome sequencing studies have also identified a damaging germline mutation in GPRIN2 (p.A233S) in Iranian patients with familial esophageal squamous cell carcinoma (ESCC) [21] as well as somatic mutations in melanoma samples [22].

Additionally, a frameshift deletion variant in TPTE2 (c.483delT) and two nonsynonymous SNVs in NBEA (c.C2317A) and BRD7 (c.A44C) could not be confirmed by Sanger sequencing. NBEA has segmental duplications on chr15, while BRD7 is mapped to segmentally duplicated regions on chr3 and chr6. Furthermore, the TPTE2 variant lies within a short 8-nucleotide homopolymer, and the gene has two segmental duplications, on chrY and chr21 [23]. Because of high sequence similarity, reads arising from segmental duplications may be misaligned, resulting in false-positive variant calls.

Seven nonsynonymous SNVs in RNF43, HLA-B, ERBB3, NTRK1, TET2, and DCC identified here have previously been implicated in various cancer types (Additional file 4: Table S4). For example, the HLA-B c.A161G variant, which was detected in 9 patients (3.1%) here, was also found to be associated with high-grade cervical preinvasive lesions and invasive cervical cancer in a recent genome-wide association study [24]. A different study reported that the ERBB3 c.A3355T variant was significantly associated with poor survival in ER-positive cases [25]. Nonetheless, none of these variants were significantly enriched in our case–control analyses.

Of our 49 variants, 4.1% (2/49) were classified as pathogenic and 61.2% (30/49) as VUS by InterVar. This high VUS rate is consistent with our previous study and with those of others on Asian populations [26, 27]. In a large US study on germline genetic testing, Asian patients had approximately two-fold more VUS compared to non-Hispanic White patients, at a VUS rate above 40% [27]. These substantially higher VUS rates in Asians may reflect the underlying lack of variant data from Asian control populations available for variant reclassification.

Besides the variants identified in this current study, WES has been performed to detect candidate variants in BRCA-negative patients from other populations. In a study on 7 families from France, Italy, Netherlands, Australia and Spain, investigators found 12 variants in genes involved in DNA repair, cell proliferation and survival, or cell cycle regulation [28]. Sequencing of 52 individuals from 17 Greek families with HBOC and further validation in additional cohorts from Canada, TCGA and the UK Biobank, led to the prioritization of missense variants in the SETBP1 and c7orf34 genes [29]. In another European study, 54 BRCA-negative families from Belgium underwent WES and 44% harbored variants in known cancer predisposition genes. In particular, it was observed that nonsense variants in cancer-associated genes involved in DNA repair were enriched in breast cancer patients as compared to controls [30]. From 113 families from Tunisia, eight BRCA-negative unrelated patients were selected for WES. Of 24 genes that were prioritized from WES data, five were selected based on their significant association with survival, as determined from analysis using TCGA data [31]. Notably, the strategies for the prioritization and filtering of genes/variants differ between studies with differing variants identified. It is possible that these variants could be population-specific or low penetrance variants.

Our study has limitations. First, we used an independent breast cancer cohort of US patients with early-onset breast cancer (35 years or younger) from dbGaP to validate the frequency of the 49 variants discovered in our cohort that were found to be associated with breast cancer. However, 17 of the 49 variants were not present in this dbGaP case cohort, possibly due to differences in genetic ancestry between the populations. Hence, further studies in additional Asian as well as European populations are necessary to validate the variants described in this current study. Secondly, DNA samples from family members of our cases were not available for segregation analysis. Thirdly, due to limited access to the SG10K_Health cohort, we used the gnomAD (EAS) population for variant filtering. The gnomAD (EAS) cohort comprises individuals of Korean, Japanese and Chinese descent, whereas our study population consisted of South-East Asians, mainly of Chinese, Malay and Indian ethnicity. Nonetheless, gnomAD (EAS) was the most suitable publicly available control population, and thus was selected.


In summary, the current study has identified 49 pathogenic variants in 37 genes associated with breast cancer predisposition, many of which have not been previously documented. Our study provides new insights into the genetic susceptibility to BC, and it is imperative that further studies in additional populations of diverse ethnic background be undertaken to determine the frequency of these variants, and to confirm their association with BC risk.

Materials and methods

Study participants

Two hundred and ninety breast cancer patients who fulfilled one or more of the following criteria were selected for WES: 1. having a family history of breast cancer in first- and/or second-degree relatives; 2. having bilateral breast cancer; and, 3. having early-onset breast cancer at the age of 40 years or below (Additional file 1: Fig. S1) [26]. Written informed consent was obtained from all participants and the study was approved by the SingHealth Centralised Institutional Review Board (CIRB Ref: 2018/2147).

Whole-exome sequencing

Genomic DNA was isolated from peripheral blood samples collected from breast cancer patients, as described previously [32, 33]. Library preparation and exome enrichment were carried out with the Agilent SureSelect Human All Exon V6 kit (Agilent Technologies, CA, USA) according to the Agilent SureSelect protocols. Paired-end sequencing (2 × 150 bp) of the enriched libraries was performed on the Illumina NovaSeq 6000 platform. Reads were aligned and variants were called with Illumina DRAGEN version 3.5.7 on the BaseSpace Sequence Hub cloud platform [34], with a median coverage of 80× per base.

Prioritization and filtering of variants

The variants were annotated for their transcript effects, CADD v1.3 scaled score [35], and gnomAD minor allele frequencies using ANNOVAR [36]. CADD v1.3 indel scores were filled in manually using the CADD web server. The American College of Medical Genetics and Association for Molecular Pathology (ACMG-AMP) classifications were obtained using InterVar [37]. We removed variants which did not pass DRAGEN’s default quality control checks, variants with gnomAD (EAS) MAF greater than 1%, and variants found in only two or fewer patients. Frameshift indels and stop-gains, together with nonsynonymous SNVs with a scaled CADD v1.3 score greater than 20, were chosen for further analysis. A scaled CADD score of 20 or above indicates that a variant ranks among the top 1% of most deleterious variants as scored by CADD.
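The filtering rules in this subsection can be sketched as a single predicate. This is only an illustration under stated assumptions and not the pipeline used in the study: the `Variant` record, its field names and the effect labels are hypothetical, and variants with a MAF of exactly 1% are excluded here because the text does not specify how that boundary was handled.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical effect labels; real annotation output uses its own vocabulary.
LOSS_OF_FUNCTION = {"frameshift_insertion", "frameshift_deletion", "stopgain"}


@dataclass
class Variant:
    gene: str
    effect: str
    passed_qc: bool                # passed the variant caller's default QC checks
    eas_maf: Optional[float]       # None = not reported in gnomAD (EAS)
    cadd_scaled: Optional[float]   # scaled CADD score, None if unavailable
    n_carriers: int                # number of patients carrying the variant


def keep_variant(v, maf_cutoff=0.01, cadd_cutoff=20.0, min_carriers=3):
    """Apply the QC, frequency, recurrence and predicted-effect filters in turn."""
    if not v.passed_qc:
        return False
    # Assumption: unreported variants are treated as rare; MAF == cutoff is dropped.
    if v.eas_maf is not None and v.eas_maf >= maf_cutoff:
        return False
    if v.n_carriers < min_carriers:
        return False
    # Frameshift indels and stop-gains are kept regardless of CADD score.
    if v.effect in LOSS_OF_FUNCTION:
        return True
    if v.effect == "nonsynonymous_SNV":
        return v.cadd_scaled is not None and v.cadd_scaled > cadd_cutoff
    return False


# Made-up records, for illustration only.
variants = [
    Variant("GENE_A", "nonsynonymous_SNV", True, 0.002, 25.1, 4),
    Variant("GENE_B", "nonsynonymous_SNV", True, 0.002, 12.0, 5),    # CADD too low
    Variant("GENE_C", "stopgain", True, None, 15.0, 3),              # kept despite CADD
    Variant("GENE_D", "frameshift_deletion", True, 0.05, 30.0, 6),   # too common
    Variant("GENE_E", "nonsynonymous_SNV", False, 0.001, 30.0, 10),  # failed QC
    Variant("GENE_F", "nonsynonymous_SNV", True, 0.001, 30.0, 2),    # too few carriers
]

kept = [v.gene for v in variants if keep_variant(v)]
print(kept)
```

Expressing the rules as one function keeps the thresholds (1% MAF, CADD 20, three carriers) in one place, which makes it easy to check how sensitive the shortlist is to each cutoff.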

Prioritization of candidate genes

From the genes of our prioritized variants, we selected only known or candidate cancer genes as listed by the NCG [9]. These genes were then further curated for those that were strongly implicated in cancer in at least one other cancer gene database: the COSMIC database [10], cancer driver genes based on nucleotide context [11], and computationally discovered and experimentally validated cancer driver genes [38] (Additional file 4: Table S1).
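A minimal sketch of this two-step gene selection follows, assuming each database has been reduced to a plain set of gene symbols. The function name, the list labels and the gene symbols are hypothetical placeholders; the actual database contents are not reproduced here.

```python
def prioritize_genes(variant_genes, ncg_genes, other_lists, min_support=1):
    """Keep genes listed in NCG and in at least `min_support` other cancer gene lists.

    Returns (gene, supporting_list_names) pairs, sorted by gene symbol.
    """
    selected = []
    for gene in sorted(set(variant_genes)):
        if gene not in ncg_genes:
            continue
        support = sorted(name for name, genes in other_lists.items() if gene in genes)
        if len(support) >= min_support:
            selected.append((gene, support))
    return selected


# Made-up inputs, for illustration only.
variant_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_A"]
ncg_genes = {"GENE_A", "GENE_B"}
other_lists = {
    "cosmic_census": {"GENE_A"},
    "nucleotide_context_drivers": {"GENE_A", "GENE_C"},
    "validated_drivers": set(),
}

print(prioritize_genes(variant_genes, ncg_genes, other_lists))
```

Here `GENE_B` is dropped because no second list supports it, and `GENE_C` is dropped because it is not in the NCG set, even though another list contains it.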

Manual checking with IGV

All prioritized variants were manually checked with Integrative Genomics Viewer (IGV) [39], except those in highly repetitive regions in MUC4 or KMT2C, or highly polymorphic genes HLA-A or HLA-DRB1, as their alignments were too complex (Additional file 2: Fig. S2). Variants suspected to be false positives were excluded (Additional file 3: Fig. S3).

Case–control analysis

Case–control analysis for the variants was performed for two breast cancer cohorts (cases described in this study and the phs000822.v1.p1 dataset from dbGaP) and two control cohorts (gnomAD (EAS) and SG10K_Health). The dataset from dbGaP is a breast cancer dataset of 466 patients with early-onset breast cancer (diagnosed on or before the age of 35) from the United States of America. The gnomAD (EAS) cohort (gnomAD v2.1.1) comprises 9,977 individuals of East Asian descent while the SG10K_Health cohort consists of whole genomes from 9,770 healthy Chinese, Indian, and Malay volunteers from Singapore [8].
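For a single variant, each case–control comparison reduces to a 2 × 2 table of alternate versus reference allele counts. The sketch below assumes diploid autosomal genotypes that are called in every individual, so that a cohort of N people contributes 2N alleles; in practice, per-site allele numbers in public databases can be lower than 2N. The cohort sizes come from the text above, while the alternate allele counts are invented for illustration.

```python
def allele_table(case_alt, n_cases, ctrl_alt, n_ctrls):
    """Build ((case_alt, case_ref), (ctrl_alt, ctrl_ref)) from alternate allele counts.

    Assumes 2 alleles per individual and no missing genotypes.
    """
    case_total = 2 * n_cases
    ctrl_total = 2 * n_ctrls
    if not (0 <= case_alt <= case_total and 0 <= ctrl_alt <= ctrl_total):
        raise ValueError("alternate allele count outside the possible range")
    return ((case_alt, case_total - case_alt), (ctrl_alt, ctrl_total - ctrl_alt))


def allele_frequency(alt_count, n_individuals):
    """Alternate allele frequency under the same 2N assumption."""
    return alt_count / (2 * n_individuals)


# 290 cases and 9,977 gnomAD (EAS) individuals are from the text;
# the allele counts 5 and 20 are hypothetical.
table = allele_table(case_alt=5, n_cases=290, ctrl_alt=20, n_ctrls=9977)
print(table)
print(allele_frequency(5, 290), allele_frequency(20, 9977))
```

The resulting table is the input expected by an exact test of association between allele and case status.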

Polymerase chain reaction and Sanger sequencing

Variants that were significant by case–control analysis were validated by polymerase chain reaction (PCR) and Sanger sequencing. PCR primer sets were designed using Primer-BLAST [40]. DNA amplification by PCR was performed using HotStartTaq (Qiagen, Venlo, Netherlands) or Q5 High-Fidelity (New England Biolabs, Ipswich, MA, USA) DNA polymerase, as described in the manufacturer’s protocol. Primer sequences and their respective cycling conditions are listed in Additional file 4: Table S5. The PCR products were then analyzed by 2% agarose gel electrophoresis and purified with ExoSAP-IT Express (Thermo Scientific, USA) prior to sequencing. Cycle sequencing reactions were performed using BigDye Terminator v3.1 kit (Applied Biosystems, Foster City, CA) and the sequencing products were analyzed on a Genetic Analyzer. DNA sequences were visualized and aligned using Geneious Prime version 2022.1.

Statistical analysis

For case–control analyses, a two-sided Fisher’s exact test was used and p values were adjusted for multiple testing using the Benjamini–Hochberg method [41].
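Both steps can be written in a few lines of standard-library Python. This is a sketch of the general techniques, not the code used in the study: the two-sided Fisher p value is computed as the total probability of all tables with the same margins that are no more likely than the observed one (one common definition of "two-sided"), and the Benjamini–Hochberg adjustment follows the usual step-up procedure. The function names and the example p values are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum tables that are as extreme or more extreme (small tolerance for float ties).
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Classic 4 + 4 example: the two-sided p value is 34/70.
print(fisher_exact_two_sided(3, 1, 1, 3))

# Illustrative raw p values for four variants.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A variant would then be called significantly enriched when its adjusted p value falls below 0.05, matching the FDR threshold stated in the abstract.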

Availability of data and materials

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.


ACMG-AMP: American College of Medical Genetics and Association for Molecular Pathology

BC: Breast cancer

CADD: Combined Annotation-Dependent Depletion

COSMIC: Catalogue of Somatic Mutations in Cancer

dbGaP: Database of Genotypes and Phenotypes

DRAGEN: Dynamic Read Analysis for GENomics




ESCC: Esophageal squamous cell carcinoma

FDR: False discovery rate

gnomAD: Genome Aggregation Database

IGV: Integrative Genomics Viewer

MAF: Minor allele frequency

NCG: Network of Cancer Genes

PCR: Polymerase chain reaction

SNV: Single nucleotide variant

VUS: Variants of uncertain significance

WES: Whole-exome sequencing


  1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.

  2. Rahman N, Stratton MR. The genetics of breast cancer susceptibility. Annu Rev Genet. 1998;32:95–121.

  3. Daly MB, Pilarski R, Yurgelun MB, Berry MP, Buys SS, Dickson P, et al. NCCN guidelines insights: genetic/familial high-risk assessment: breast, ovarian, and pancreatic, Version 1.2020. J Natl Compr Cancer Netw. 2020;18(4):380–91.

  4. Melchor L, Benítez J. The complex genetic landscape of familial breast cancer. Hum Genet. 2013;132(8):845–63.

  5. Nielsen FC, van Overeem HT, Sørensen CS. Hereditary breast and ovarian cancer: new genes in confined pathways. Nat Rev Cancer. 2016;16(9):599–612.

  6. Kiiski JI, Pelttari LM, Khan S, Freysteinsdottir ES, Reynisdottir I, Hart SN, et al. Exome sequencing identifies FANCM as a susceptibility gene for triple-negative breast cancer. Proc Natl Acad Sci. 2014;111(42):15172–7.

  7. Chandler MR, Bilgili EP, Merner ND. A review of whole-exome sequencing efforts toward hereditary breast cancer susceptibility gene discovery. Hum Mutat. 2016;37(9):835–46.

  8. Precision Health Research, Singapore SG10K. Precision Health Research, Singapore (PRECISE) [Internet]. [cited 2021 Jun 24]. Available from:

  9. Repana D, Nulsen J, Dressler L, Bortolomeazzi M, Venkata SK, Tourna A, et al. The Network of Cancer Genes (NCG): a comprehensive catalogue of known and candidate cancer genes from cancer sequencing screens. Genome Biol. 2019;20(1):1.

  10. Bamford S, Dawson E, Forbes S, Clements J, Pettett R, Dogan A, et al. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br J Cancer. 2004;91(2):355–8.

  11. Dietlein F, Weghorn D, Taylor-Weiner A, Richters A, Reardon B, Liu D, et al. Identification of cancer driver genes based on nucleotide context. Nat Genet. 2020;52(2):208–18.

  12. Jones MR, Williamson LM, Topham JT, Lee MKC, Goytain A, Ho J, et al. NRG1 gene fusions are recurrent, clinically actionable gene rearrangements in KRAS wild-type pancreatic ductal adenocarcinoma. Clin Cancer Res. 2019;25(15):4674–81.

  13. Sato N, Fujishima F, Nakamura Y, Aoyama Y, Onodera Y, Ozawa Y, et al. Myosin 5a regulates tumor migration and epithelial-mesenchymal transition in esophageal squamous cell carcinoma: utility as a prognostic factor. Hum Pathol. 2018;80:113–22.

  14. Izumi H, Matsumoto S, Liu J, Tanaka K, Mori S, Hayashi K, et al. The CLIP1-LTK fusion is an oncogenic driver in non-small-cell lung cancer. Nature. 2021;600(7888):319–23.

  15. Ramdzan ZM, Vickridge E, Faraco CCF, Nepveu A. CUT domain proteins in DNA repair and cancer. Cancers (Basel). 2021;13(12):552.

  16. Jin X, Zhu L, Cui Z, Tang J, Xie M, Ren G. Elevated expression of GNAS promotes breast cancer cell proliferation and migration via the PI3K/AKT/Snail1/E-cadherin axis. Clin Transl Oncol. 2019;21(9):1207–19.

  17. Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature. 2014;511(7511):543–50.

  18. UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49(D1):D480–9.

  19. van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, et al. Classification of intrinsically disordered regions and proteins. Chem Rev. 2014;114(13):6589–631.

  20. Li C, Clark LVT, Zhang R, Porebski BT, McCoey JM, Borg NA, et al. Structural capacitance in protein evolution and human diseases. J Mol Biol. 2018;430(18):3200–17.

  21. Khalilipour N, Baranova A, Jebelli A, Heravi-Moussavi A, Bruskin S, Abbaszadegan MR. Familial esophageal squamous cell carcinoma with damaging rare/germline mutations in KCNJ12/KCNJ18 and GPRIN2 genes. Cancer Genet. 2018;221:46–52.

  22. Wei X, Walia V, Lin JC, Teer JK, Prickett TD, Gartner J, et al. Exome sequencing identifies GRIN2A as frequently mutated in melanoma. Nat Genet. 2011;43(5):442–6.

  23. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res. 2002;12(6):996–1006.

  24. Bowden SJ, Bodinier B, Kalliala I, Zuber V, Vuckovic D, Doulgeraki T, et al. Genetic variation in cervical preinvasive and invasive disease: a genome-wide association study. Lancet Oncol. 2021;22(4):548–57.

  25. Varadi V, Bevier M, Grzybowska E, Johansson R, Enquist-Olsson K, Henriksson R, et al. Genetic variation in ALCAM and other chromosomal instability genes in breast cancer survival. Breast Cancer Res Treat. 2012;131(1):311–9.

  26. Wong ESY, Shekar S, Met-Domestici M, Chan C, Sze M, Yap YS, et al. Inherited breast cancer predisposition in Asians: multigene panel testing outcomes from Singapore. NPJ Genomic Med. 2016;1:15003.

  27. Kurian AW, Ward KC, Abrahamse P, Bondarenko I, Hamilton AS, Deapen D, et al. Time trends in receipt of germline genetic testing and results for women diagnosed with breast cancer or ovarian cancer, 2012–2019. J Clin Oncol. 2021;39(15):1631–40.

  28. Gracia-Aznarez FJ, Fernandez V, Pita G, Peterlongo P, Dominguez O, de la Hoya M, et al. Whole exome sequencing suggests much of non-BRCA1/BRCA2 familial breast cancer is due to moderate and low penetrance susceptibility alleles. PLoS One. 2013;8(2):e55681.

  29. Glentis S, Dimopoulos AC, Rouskas K, Ntritsos G, Evangelou E, Narod SA, et al. Exome sequencing in BRCA1- and BRCA2-negative greek families identifies MDM1 and NBEAL1 as candidate risk genes for hereditary breast cancer. Front Genet. 2019;10:1005.

  30. Shahi RB, De Brakeleer S, Caljon B, Pauwels I, Bonduelle M, Joris S, et al. Identification of candidate cancer predisposing variants by performing whole-exome sequencing on index patients from BRCA1 and BRCA2-negative breast cancer families. BMC Cancer. 2019;19(1):313.

  31. BenAyed-Guerfali D, Kifagi C, BenKridis-Rejeb W, Ammous-Boukhris N, Ayedi W, Khanfir A, et al. The identification by exome sequencing of candidate genes in BRCA-negative tunisian patients at a high risk of hereditary breast/ovarian cancer. Genes (Basel). 2022;13(8):1296.

  32. Ang P, Lim IHK, Lee T-C, Luo J-T, Ong DCT, Tan PH, et al. BRCA1 and BRCA2 mutations in an Asian clinic-based population detected using a comprehensive strategy. Cancer Epidemiol Biomarkers Prev. 2007;16(11):2276–84.

  33. Chan M, Ji SM, Yeo ZX, Gan L, Yap E, Yap YS, et al. Development of a next-generation sequencing method for BRCA mutation screening. J Mol Diagnostics. 2012;14(6):602–12.

  34. Miller NA, Farrow EG, Gibson M, Willig LK, Twist G, Yoo B, et al. A 26-hour system of highly sensitive whole genome sequencing for emergency management of genetic diseases. Genome Med. 2015;7(1):100.

  35. Rentzsch P, Schubach M, Shendure J, Kircher M. CADD-Splice—improving genome-wide variant effect prediction using deep learning-derived splice scores. Genome Med. 2021;13(1):31.

  36. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38(16):e164–e164.

  37. Li Q, Wang K. InterVar: clinical interpretation of genetic variants by the 2015 ACMG-AMP guidelines. Am J Hum Genet. 2017;100(2):267–80.

  38. Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, et al. Comprehensive characterization of cancer driver genes and mutations. Cell. 2018;173(2):371-385.e18.

  39. Robinson JT, Thorvaldsdóttir H, Wenger AM, Zehir A, Mesirov JP. Variant review with the integrative genomics viewer. Cancer Res. 2017;77(21):e31–4.

  40. Ye J, Coulouris G, Zaretskaya I, Cutcutache I, Rozen S, Madden TL. Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction. BMC Bioinform. 2012;13(1):134.

  41. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B. 1995;57(1):289–300.



We thank the Broad Institute for generating high-quality sequence data supported by NHGRI funds (grant # U54 HG003067) with Eric Lander as PI. The datasets used in this manuscript were obtained from dbGaP through accession number phs000822. We also thank Memorial Sloan Kettering Cancer Center and Massachusetts General Hospital for sample collection. The authors would like to thank Sabna Zihara for technical support in this study. We thank the SG10K_Health Investigators for providing the SG10K_Health data generated as part of the Singapore National Precision Medicine program funded by the Industry Alignment Fund (Pre-Positioning) (IAF-PP: H17/01/a0/007). The views expressed by the author(s) are not necessarily those of the National Precision Medicine investigators or institutional partners. We thank all investigators, staff members and study participants who made the National Precision Medicine Project possible.


This study was supported primarily by a grant from the Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP: I1801E0021), and partially by the NCCS Cancer Fund, both awarded to Ann Lee.

Author information



A.S.G.L., P.A., M.-H.T. and S.-C.L. conceived the study. R.J.T., P.-Y.O., J.S., C.W.L., P.A., M.-H.T. and S.-C.L. provided the study material. N.Y.L. and M.K.M. performed the data analysis. M.H., A.A.A., W.K.L. and M.W. performed experiments. N.Y.L., M.H., A.S.G.L. and A.A.A. wrote the manuscript. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Ann S. G. Lee.

Ethics declarations

Ethics approval and consent to participate

Written informed consent was obtained from all participants and the study was approved by the SingHealth Centralised Institutional Review Board (CIRB Ref: 2018/2147).

Consent for publication

Not applicable.

Competing interests

P.A. reports receiving travel support and/or honoraria from AstraZeneca, DKSH, Eisai, Bristol Myers Squibb, Lilly, Novartis, Pfizer, Roche, and MSD, all of which are outside the submitted work. M.-H.T. reports being a director, shareholder, and Chief Executive Officer of Lucence, outside of the submitted work. S.-C.L. reports grant support/research collaborations with Pfizer, Eisai, Taiho, ACT Genomics, Bayer, and MSD; advisory board/speaker invitations from Pfizer, Novartis, AstraZeneca, ACT Genomics, Eli Lilly, MSD, Roche, Eisai and Daiichi-Sankyo; and conference support from Amgen, Pfizer and Roche, all of which are outside the submitted work. All other authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: Figure 1 has been updated.

Supplementary Information

Additional file 1: Fig. S1. Detailed distribution of age at breast cancer diagnosis and family history. Patients diagnosed above 40 years of age without a family history of any cancer, and patients with diagnoses both at ≤ 40 years and at > 40 years of age, had bilateral breast cancer.

Additional file 2: Fig. S2. Representative IGV screenshots of unambiguous versus ambiguous alignments. (A) A heterozygous SNV with equal support for both reference and alternate bases; (B) a deletion, as indicated by clear gaps in the read alignment; and (C) an insertion, as represented by a thin vertical line flanked by mapped bases on both sides. Red boxes indicate where the variants are expected to appear. In comparison, the heterozygous nonsynonymous SNVs (D) MUC4 NM_018406.7:c.G8461A, (E) KMT2C NM_170606.3:c.C2689T, (F) KMT2C NM_170606.3:c.C2710T and (G) HLA-DRB1 NM_002124.3:c.C301T have fewer reads supporting the alternate base; the frameshift deletions (H) HLA-A NM_001242758.1:c.268delA and (I) HLA-DRB1 NM_002124.3:c.118_122del are not associated with any obvious gaps in read alignments; nor is the frameshift insertion (J) HLA-DRB1 NM_002124.3:c.126_127insTTAAGTTT represented by insertions in its read alignments.

Additional file 3: Fig. S3. Representative IGV screenshots of alignments supporting two likely false-positive frameshift insertions. (A) Alignment for PABPC1 NM_002568.4:c.1336_1337insACCTCATC and (B) alignment for CIC NM_015125.4:c.4778_4779insGG. Red boxes indicate where the insertion would have been expected to appear; red arrows point to the soft-clipped alignments that support the existence of these frameshift insertions. (C) Reads supporting the PABPC1 insertion map partially to both the PABPC1 and PABPC3 (reverse complement) genes on reference genome loci NC_000008.10:101,719,206-101,719,234 and NC_000013.11:25,097,536-25,097,508, respectively.

Additional file 4: Supplementary Table 1. Total number of potentially pathogenic variants discovered in each prioritized gene, and support for these genes across different cancer gene databases. Supplementary Table 2. Patient IDs for the patients with rare pathogenic variants in each gene. Supplementary Table 3. Clinical features and pathogenic variants identified in 89 breast cancer patients. Supplementary Table 4. Involvement in cancer for seven of our selected variants, as reported in the literature. Supplementary Table 5. PCR primers and cycling conditions used for Sanger sequencing.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Lee, N.Y., Hum, M., Amali, A.A. et al. Whole-exome sequencing of BRCA-negative breast cancer patients and case–control analyses identify variants associated with breast cancer susceptibility. Hum Genomics 16, 61 (2022).
