Investigating diagnostic sequencing techniques for CADASIL diagnosis

Cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL) is a cerebral small vessel disease caused by mutations in the NOTCH3 gene. Our laboratory has been undertaking genetic diagnostic testing for CADASIL since 1997. Work originally utilised Sanger sequencing methods targeting specific NOTCH3 exons. More recently, next-generation sequencing (NGS)-based technologies such as a targeted gene panel and whole exome sequencing (WES) have been used for improved genetic diagnostic testing. In this study, data from 680 patient samples were analysed for 764 tests utilising 3 different sequencing technologies. Sanger sequencing was performed for 407 tests, a targeted NGS gene panel which includes NOTCH3 exonic regions accounted for 354 tests, and WES with targeted analysis was performed for 3 tests. In total, 14.7% of patient samples (n = 100/680) were determined to have a mutation. Testing efficacy varied by method: Sanger sequencing identified mutations in 10.8% of tests (n = 44/407), the NGS custom panel identified mutations in 15.8% of tests (n = 56/354), and WES identified a likely pathogenic non-NOTCH3 variant in one of three tests (n = 1/3). Further analysis was then performed by stratifying the mutations detected at our facility by exon location, level of pathogenicity and classification as known or novel. A systematic review of NOTCH3 mutation testing data from 1997 to 2017 determined the diagnostic rate of pathogenic findings and found that the NGS-customised panel increases our ability to identify disease-causing mutations in NOTCH3.

Background
NOTCH3 (Notch homologue 3) encodes a large single-pass transmembrane receptor that transduces signals between cells [1]. It is highly conserved and critical for cell fate determination in embryonic development, the differentiation and maturation of functional arteries, and the biological processes of tissue injury and repair [1][2][3]. The expression of NOTCH3 is ubiquitous in adults; however, due to mutations associated with cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL), some studies suggest that NOTCH3 also plays a role in maintaining vascular homeostasis [1].
CADASIL is a cerebral small vessel disease affecting the vascular smooth muscle cells (VSMCs) and characterised by NOTCH3 mutations and/or the presence of granular osmiophilic material (GOM) [4]. The clinical signs and symptoms of CADASIL include recurrent subcortical ischaemic events; cognitive impairment, including dementia; migraine; motor disabilities such as gait disturbances, urinary incontinence and pseudobulbar palsy; encephalopathy; mood disturbances such as apathy or severe depression; and less commonly seen neurological manifestations such as seizures [5][6][7].
NOTCH3 encodes one of four NOTCH proteins in mammals and is a core component in Notch signalling, which is considered one of the 'elite' signalling pathways due to its high conservation across species [8]. The NOTCH3 protein comprises distinct structural domains: the extracellular domain (ECD), transmembrane domain and intracellular domain (ICD). The ECD is made up of the epidermal growth factor-like repeats (EGFRs) and LIN12/Notch repeats (LNR), whilst the ICD is made up of the recombining binding protein Janus kinase (RBPJK)-associated module (RAM) domain, ankyrin repeats, nuclear localization signals and a C-terminal PEST (proline, glutamate, serine, threonine) sequence [9]. Each domain has an integral role in Notch signalling: the EGFRs mediate ligand binding; the RAM domain physically interacts with an effector protein (e.g. RBPJ or CBF1); the ankyrin repeats mediate different protein-protein interactions; and the PEST domain promotes the degradation of the intracellular domain [10].
In NOTCH3 signalling, the ECD of the Notch protein (NECD) binds to a ligand and undergoes a conformational change which exposes a cleavage site for the metalloprotease ADAM17. This change initiates the S-2 cleavage event, through ADAM17, which liberates the ECD from the cell surface [2]. In healthy individuals with no pathogenic NOTCH3 mutation, the ECD-ligand complex is then removed from the extracellular matrix (ECM) through endocytosis by the ligand-presenting cell, whilst in CADASIL patients, this complex aggregates with other ECM proteins and forms the GOM [2]. Activation of the Notch receptor occurs through an S-3 cleavage event caused by a gamma secretase (e.g. presenilin), which liberates the Notch intracellular domain (NICD) from the cell membrane [11]. The NICD either translocates to the nucleus by binding with members of the coactivator complex (e.g. RBP/JK) or interacts with members of other signalling pathways [11,12].
The effect of NOTCH3 mutations on disease causation generally depends on the location and type of mutation within the gene. CADASIL patients have well-characterised cysteine-altering missense mutations within exons 2-24, which result in the gain or loss of a cysteine residue in 1 of the 34 EGFRs [4,13,14,15]. In comparison, truncating NOTCH3 mutations within exon 33 (often deletions of stop-loss mutations) which disrupt the NOTCH3 PEST domain are known to cause lateral meningocele syndrome (LMS) (MIM#130720) [16,17]. The disruption of the PEST domain presumably results in an increased half-life of the NICD and, as a result, prolonged NOTCH signalling [17]. Interestingly, this does not seem to be the case in CADASIL as NOTCH3 signalling does not seem to be impaired, despite causative mutations being primarily found in the ECD of the protein [18,19]. There are also several pathological hallmarks of CADASIL which include profound demyelination and axonal damage, as well as arteriopathy caused by degeneration of vascular smooth muscle cells (VSMCs) in the brain and peripheral organs [20][21][22]. Damage to VSMCs is also thought to cause progressive thickening of the arteriole walls, fibrosis and luminal narrowing in the medium and small arteries, eventually resulting in lacunar infarcts [23,24].
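The cysteine-altering rule described above (a missense change that gains or loses a cysteine) can be sketched in a few lines of Python. This is only an illustration: the function name `classify_missense` is hypothetical, the parser handles simple three-letter protein substitutions only, and it does not check whether the residue lies within an EGFR, which a diagnostic classification would also need.

```python
import re

# Three-letter protein substitution, e.g. "p.Arg141Cys".
_MISSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def classify_missense(hgvs_p: str) -> str:
    """Label a missense change as 'cys-altering' or 'cys-sparing'.

    A change is cysteine-altering when exactly one of the reference and
    alternate residues is Cys, i.e. a cysteine is gained or lost.
    Hypothetical helper; EGFR location is deliberately not checked.
    """
    match = _MISSENSE.match(hgvs_p)
    if match is None:
        raise ValueError(f"not a simple missense change: {hgvs_p!r}")
    ref, _position, alt = match.groups()
    return "cys-altering" if (ref == "Cys") != (alt == "Cys") else "cys-sparing"


# Variants named in this study.
for variant in ("p.Arg141Cys", "p.Arg153Cys", "p.Leu1518Met", "p.Glu2268Lys"):
    print(variant, classify_missense(variant))
```

The three recurrent arginine-to-cysteine changes are labelled cysteine-altering, whereas the exon 25 and exon 33 missense variants are labelled cysteine-sparing.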
Originally, CADASIL was diagnosed by the presence of granular osmiophilic material (GOM), which contains the ectodomain of the NOTCH3 protein, identifiable in the walls of small arteries via examination of tissue biopsy using electron or light microscopy [4,25]. However, sequencing of NOTCH3 is now used as a diagnostic tool, with studies finding congruence between NOTCH3 mutations and GOM in the diagnosis of CADASIL [26,27]. Where patients have no known identifiable NOTCH3 mutation, they can also be categorised as being CADASIL-like and, if a genetic cause is found, could be re-classified as having a similar condition (e.g. HTRA1 mutations in cerebral autosomal recessive arteriopathy with subcortical infarcts and leukoencephalopathy (CARASIL) or GLA mutations in Fabry disease) [28,29]. The Genomics Research Centre (GRC) currently undertakes diagnostic testing for familial hemiplegic migraine, epilepsy, CADASIL, episodic ataxia type 2 and spinocerebellar ataxia type 6, utilising Sanger sequencing, as well as a next-generation sequencing (NGS) 5-gene custom panel (CACNA1A, ATP1A2, SCN1A, NOTCH3 and KCNK18). The GRC also undertakes clinical whole exome sequencing (WES) to diagnose conditions with similar phenotypes to those that can be diagnosed using the NGS 5-gene panel [30]. The aim of this study was to analyse the number and types of mutations identified in patients referred for CADASIL testing across the three different sequencing techniques.
The NGS 5-gene custom panel identified mutations in 15.8% (n = 56/354) of patients screened for CADASIL across NOTCH3 (n = 53/56), CACNA1A (n = 2/56) and ATP1A2 (n = 1/56). This included 52 samples which had previously been tested by Sanger sequencing and where no causative mutations had been identified. The increased diagnostic rate in the samples was also found to be statistically significant (p value = 0.027) by a directional χ² analysis based on the hypothesis that the NGS 5-gene panel diagnostic rate will be greater than the Sanger sequencing diagnostic rate. Variants in exons 2-24 of NOTCH3 accounted for 92.45% (n = 49/53) of NOTCH3 mutations that have been reported in patients (Table 3). The remaining 3 NOTCH3 variants were identified in exon 25 (p.Leu1518Met) and exon 33 (p.Glu2268Lys) and as a deletion in intron 1 (part of the 5′ UTR sequenced from the panel). As the missense mutation in exon 33 does not result in a truncated protein that would disrupt the PEST region and the patient was not identified to have an LMS phenotype, it was considered unlikely that this variant is causative of LMS. In addition, 3 heterozygous missense mutations in other genes within the panel were identified (CACNA1A-p.Asp1723Asn and p.Ala987Ser; ATP1A2-p.Glu219Gln), suggesting that these patients have familial hemiplegic migraine (FHM), which has symptomatic features that overlap with CADASIL. Our analysis identified known HGMD disease-causing mutations in 38 of 56 tests (n = 38/56) (Table 3). NOTCH3 Cys-sparing mutations accounted for 11.1% (n = 5) of mutations identified, all within exons 2-24 (Tables 1 and 3). In addition, there were 3 commonly identified amino acid-changing mutations which accounted for n = 35/100 total variants (Table 1), namely Arg141Cys, Arg153Cys and Arg182Cys, which were identified in 16, 9 and 10 cases, respectively (Tables 2 and 3).
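As a simple arithmetic check, the proportions quoted in this study can be recomputed from the reported counts. The helper name `percent` and the dictionary labels below are illustrative only; the counts themselves are those given in the text.

```python
def percent(hits: int, total: int, digits: int = 1) -> float:
    """Return hits/total as a percentage rounded to `digits` places."""
    return round(100 * hits / total, digits)


# (hits, total, decimal places) taken from the counts reported in the text.
reported_counts = {
    "Sanger sequencing yield": (44, 407, 1),
    "NGS 5-gene panel yield": (56, 354, 1),
    "Overall yield per patient sample": (100, 680, 1),
    "NOTCH3 variants in exons 2-24": (49, 53, 2),
    "Three recurrent Arg>Cys changes": (16 + 9 + 10, 100, 1),
}

for label, (hits, total, digits) in reported_counts.items():
    print(f"{label}: {hits}/{total} = {percent(hits, total, digits)}%")
```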
All samples with the same mutation were followed up to check for related family members; however, there was no definitive evidence to suggest a relationship based on the clinical information received with the genetic testing request. Nevertheless, given the rare nature of CADASIL and the high number of samples with the same mutation, it is likely that some of these samples are from related individuals.
This work also yielded five previously unreported NOTCH3 variants (Table 4) identified through either the NGS 5-gene panel or by Sanger sequencing. n = 3/5
Table 1 The number of potential causal mutations identified by the two different sequencing techniques and stratified according to gender (M, male; F, female). *There is an overlap of samples completing multiple sequencing when there has been no mutation identified via the previous sequencing technique, which shows an improved diagnostic rate using the GRC NGS 5-gene panel compared to targeted exon Sanger sequencing.
In the study data set, there were three samples which were previously tested using the NGS 5-gene panel that had WES only completed with targeted analysis on NOTCH3 as well as COL4A1 and other specified genes.
All samples had previously been tested using the NGS gene panel, and no potential causative mutation had been identified. Of these, one sample was identified to have a variant of unknown significance in COL4A1 (p.Gly1198Arg) which was predicted to be pathogenic by in silico tools such as SIFT, PolyPhen and MutationTaster. There were no other clinically significant variants identified in the other genes requested for analysis that are known to cause related cerebral small vessel diseases (CSVDs), including HTRA1, HTRA4, COL4A1, COL4A2, ARX, TREX1, GLA and NOTCH3 in CAD-661, and NOTCH3, APP, COL4A1, COL4A2, TREX1, ARX, HTRA1, HTRA2, GLA or ITM2B in CAD-637. NOTCH3 was analysed for all three samples by WES, and the results showed 100% concordance with the NGS gene panel results for the variants identified.
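The score cut-offs stated in the methods for the in silico tools (SIFT score < 0.05, PolyPhen score > 0.8) can be expressed as a small per-tool check. This is a minimal sketch rather than the laboratory's pipeline: the function name `deleterious_calls` is hypothetical, the example scores are invented for illustration and are not the scores of any variant in this study, and MutationTaster and PredictSNP2 are omitted because no numeric cut-off is given for them.

```python
from typing import Dict, Optional

SIFT_MAX = 0.05      # SIFT scores below this are treated as deleterious
POLYPHEN_MIN = 0.8   # PolyPhen scores above this are treated as damaging


def deleterious_calls(sift: Optional[float],
                      polyphen: Optional[float]) -> Dict[str, Optional[bool]]:
    """Return a per-tool call; None means the tool gave no score ("?")."""
    return {
        "SIFT": None if sift is None else sift < SIFT_MAX,
        "PolyPhen": None if polyphen is None else polyphen > POLYPHEN_MIN,
    }


# Invented scores, for illustration only.
print(deleterious_calls(sift=0.01, polyphen=0.95))
print(deleterious_calls(sift=0.20, polyphen=None))
```

Keeping the tools separate, rather than collapsing them into one verdict, mirrors the way per-tool predictions are reported alongside each variant.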

Discussion
Sequencing of NOTCH3 is a critical component in the diagnosis of CADASIL. Initial diagnostic testing for NOTCH3 mutations was influenced by research conducted by Joutel et al. [31] and subsequent supporting literature which identified mutations clustering within exons 3 and 4 of the gene [15,32]. Because initial primary sequencing of NOTCH3 was limited to exons 3 and 4, there remains a bias towards exon 4 in the mutations detected via Sanger sequencing. The GRC NGS 5-gene custom panel data also support the clustering of mutations in exon 4; however, there is a greater spread of mutations across all NOTCH3 exons, with most of the identified mutations found within exons 2-24 [33].
The development and design of the NGS 5-gene panel in 2012 was completed as it allowed for a cost- and time-effective approach to identify mutations in any of the 33 NOTCH3 exons, as opposed to sequencing individual exons at an increased cost if no mutation is initially identified [30,34]. The ability of the custom panel to sequence all exons and flanking untranslated regions has led to an increased diagnostic rate, from 10.8 to 15.8% (p value = 0.027) (Table 1), and to the identification of previously unreported variants (Table 4). Whilst the majority of mutations identified through the gene panel were Cys-changing and located between exons 2 and 24, a number of variants were identified which do not disrupt the cysteine residues in the EGFRs. Cys-sparing mutations contradict the hypothesis that Cys-changing mutations in NOTCH3 are responsible for the disease mechanism in CADASIL; however, multiple case studies have identified Cys-sparing mutations in NOTCH3 (p.R61W, p.R75P, p.R213K, p.A1020P and p.T1098S) as a cause of CADASIL [35][36][37][38][39][40]. Other studies have also identified mutations located outside the EGFRs implicated as the cause for CADASIL and white matter disease, suggesting that there are other mechanisms which contribute to or cause the CADASIL phenotype [41,42]. The increase in mutations that do not affect cysteine residues or the EGFRs is reflected in updated proposed guidelines for CADASIL diagnosis, which suggest that non-cysteine-altering mutations should also be investigated carefully [43,44]. Variants in other genes in the panel (Table 3) were identified because clinicians requested further analysis of patients with no identifiable NOTCH3 mutation. This was seen with mutations identified in CACNA1A, ATP1A2 and COL4A1. Mutations in CACNA1A are known to cause familial hemiplegic migraine type 1 (FHM1) and episodic ataxia type 2 (EA2).
The clinical signs of FHM1 overlap significantly with CADASIL, with migraine reported in ~20-35% of CADASIL patients and some motor effects resembling those of stroke [45,46]. Due to a lack of prior clinical information, we cannot exclude other aetiologies for the ischaemic events, e.g. environmental or lifestyle stresses, vasoconstrictive drugs used as a prior treatment or a mutation in another gene that was not tested [45,47,48]. Another heterozygous gene mutation was identified in CAD-400 in ATP1A2, which is known to cause familial hemiplegic migraine type 2 (FHM2) (MIM#602481). A meta-analysis completed by Cole and Kittner [49] found an association of greater risk for ischaemic stroke in migraine sufferers.
Studies by Harriott et al. [50] failed to reproduce results when investigating ATP1A2 polymorphisms and stroke risk; however, they did concede that the data from the study is hypothesis-generating and further studies may be useful.
WES identified a heterozygous mutation in COL4A1, which is known to cause a cerebral small vessel disease (SVD) with symptoms including transient ischaemic attacks, adult-onset haemorrhagic stroke, periventricular brain abnormalities, white matter hyperintensities and leukoencephalopathy (including cerebellar hypoplasia, cerebral atrophy and vascular changes) [51][52][53]. Choi [54] highlighted phenotypic similarities between COL4A1 SVD and NOTCH3 mutations in CADASIL, showing that both conditions cause lacunar infarcts, cognitive deficits, intracerebral haemorrhage and migraine. The main pathological difference is that COL4A1 SVD involves a defect in the basement membrane, as opposed to the GOM found in the arteriole walls in CADASIL, which is difficult to determine unless a tissue biopsy is performed [4,54].
Despite the limited number of samples assessed in this study, we already have evidence that the use of WES can expand our capabilities to identify genetic causes of cerebral small vessel disease when CADASIL mutation testing is negative. We are also confident that this work is able to identify variants consistently across the different sequencing technologies as stringent validation of this work has been completed for accreditation for diagnostic testing through the National Association of Testing Authorities (NATA), Australia, and through previous work completed by Maksemous et al. [30]. However, one of the limitations in using WES for CADASIL-related conditions is the reliance on the clinician to request the genes for analysis and the potential non-specific symptoms of patients. It is important to identify the correct causative genetic mutation in CADASIL and related conditions as physicians need to be able to manage the symptoms of these disorders. One example related to a major CADASIL symptom is that migraine treatments should include non-steroidal anti-inflammatories (NSAIDs) or analgesics, whilst vasoconstrictors should be avoided due to an increased risk of inducing an ischaemic event [6]. This highlights the need to have open communication between the referring clinicians and the diagnostic testing facilities to ensure gene lists are ready for use in cases where further testing may be required, as it can have direct treatment/management ramifications for people affected. Furthermore, detailed phenotypic information is essential to augment the clinical and genetic testing information for improved diagnosis and reporting.
The percentage indicates how confident the tool is for determining a deleterious, neutral (N) or unknown (?) variant effect. Percentages listed with "(N)" indicate the percentage of confidence in calling a benign or non-damaging variant based on the in silico tool used. "?" indicates that the in silico tool could not determine whether the variant would be pathogenic or damaging.

Conclusions
The role of NOTCH3 testing in CADASIL diagnosis is important, and with advances in sequencing technology (from Sanger sequencing to NGS gene panels, WES and whole genome sequencing), we can continue to improve diagnostic success rates. However, the number of mutations we are able to identify in patients who are thought to be symptomatic is still quite low. This may be related to limitations associated with the gene panel caused by the small gap in coverage in exon 24 of NOTCH3; however, this limitation is unlikely to have a large impact as the coverage gap size and location are not known hotspots for NOTCH3 mutations in CADASIL. Mutations in other genes known to be associated with similar clinically presenting diseases (CACNA1A in FHM1, ATP1A2 in FHM2 and COL4A1 in COL4A1-associated leukoencephalopathy) have been identified through follow-up testing requested by clinicians. This supports the premise that the cause of the symptoms of CADASIL may be attributed to other related neurological disorders with overlapping symptoms. The GRC NGS 5-gene custom panel has shown complete concordance with Sanger sequencing whilst extending our capacity to detect mutations, resulting in an increase in the diagnostic rate from 10.8 to ~15.8%. Hence, NGS has increased our capabilities to identify NOTCH3 mutations causative of CADASIL, although the increased variety and relatively low diagnostic yield highlight that there may be other genes or mechanisms which contribute to or cause CADASIL. Future WES and whole genome sequencing may play an important role in identifying other genes implicated in this disorder.

Materials and methods
Patients were originally referred to the Genomics Research Centre NATA (National Association of Testing Authorities, Australia)-accredited diagnostic laboratory by physicians in Australia and New Zealand. Ethical approval for these studies is through QUT HREC (Approval Number 1400000748). Patient results were selected from internal de-identified records from January 1, 1997, to December 31, 2017, and were based on referrals for CADASIL or CADASIL-like symptoms and specific NOTCH3 testing. The results were excluded if the samples were identified to have also been sent for testing for familial hemiplegic migraine or if they were family members of previously investigated probands, investigated or used for confirmatory testing based on previous genetic testing for CADASIL. The results were stratified through the identification, exon location and mutation type within NOTCH3.
Samples from patients referred for CADASIL/NOTCH3 testing (n = 407) underwent initial Sanger sequencing of exons 3 and 4, unless another exon or an extended NOTCH3 analysis (sequencing of exons 2, 11, 18 and 19) was subsequently requested. The exons initially selected for analysis were based on mutational hotspots identified in NOTCH3 by Joutel et al. and Peters et al. [15,27,32]. The primer sets were designed to encompass part or all of the exon examined, as well as surrounding intronic material, with amplicon sizes of 193 bp for exon 2, 296 bp for exon 3, 488 bp for exon 4, 367 bp for exon 11, 258 bp for exon 18 and 350 bp for exon 19. The methods used for NOTCH3 Sanger sequencing have been previously described by Roy et al. [55]. Genomic DNA was extracted from peripheral blood lymphocytes using the QIAGEN QIAcube™ (Venlo, Netherlands). Samples were originally sequenced using the dideoxy method of Sanger et al. [56] with the ThermoFisher BigDye™ Terminator v3.1 Cycle Sequencing Kit (Thermo Fisher Scientific, Scoresby, Victoria, Australia) and were analysed following separation on an Applied Biosystems™ 310, 3130 or 3500 Series Genetic Analyzer (Thermo Fisher Scientific, Scoresby, Victoria, Australia) [55].
The  [30]. Pearson's chi-square test was also completed based on the hypothesis that there is a greater percentage of mutations identified by the NGS panel compared to Sanger sequencing.
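The comparison of diagnostic rates can be sketched with the standard library alone. The following is a minimal illustration, not the authors' analysis script: the function name `one_sided_chi2_2x2` is hypothetical, and the use of a Yates continuity correction is an assumption, since the text does not state whether one was applied. Under that assumption, the counts reported here (44/407 for Sanger sequencing and 56/354 for the NGS panel) give a one-sided p value close to the reported 0.027.

```python
import math


def one_sided_chi2_2x2(hits_a, total_a, hits_b, total_b, yates=True):
    """Chi-square test for two proportions, H1: rate_b > rate_a.

    Returns (chi2, one_sided_p). With 1 degree of freedom the two-sided
    p value is erfc(sqrt(chi2 / 2)); it is halved when the observed
    difference lies in the hypothesised direction.
    """
    observed = [hits_a, total_a - hits_a, hits_b, total_b - hits_b]
    n = total_a + total_b
    hits = hits_a + hits_b
    row_totals = [total_a, total_a, total_b, total_b]
    col_totals = [hits, n - hits, hits, n - hits]
    chi2 = 0.0
    for obs, row, col in zip(observed, row_totals, col_totals):
        expected = row * col / n
        diff = abs(obs - expected)
        if yates:  # continuity correction (assumed, not stated in the text)
            diff = max(diff - 0.5, 0.0)
        chi2 += diff ** 2 / expected
    two_sided = math.erfc(math.sqrt(chi2 / 2))
    in_direction = hits_b / total_b > hits_a / total_a
    one_sided = two_sided / 2 if in_direction else 1 - two_sided / 2
    return chi2, one_sided


# Sanger sequencing (a) versus NGS 5-gene panel (b), counts from the text.
chi2, p = one_sided_chi2_2x2(44, 407, 56, 354)
print(f"chi2 = {chi2:.2f}, one-sided p = {p:.3f}")
```

Passing `yates=False` gives the uncorrected Pearson statistic, which yields a smaller p value for the same counts.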
Whole exome sequencing (WES) was performed using the Ion AmpliSeq™ Exome Library Kit Plus (Carlsbad, CA, USA) according to the manufacturer's instructions (MAN0010084). Template preparation, enrichment and chip loading were performed using the Ion PI™ Hi-Q™ Chef Kit (Catalogue number A27198) on the Applied Biosystems Ion Chef (Carlsbad, CA, USA). Sequencing was undertaken on the Ion Proton™ platform (Carlsbad, CA, USA). Only genes requested by physicians were analysed in the WES data, and these included amyloid beta precursor protein (APP), aristaless-related homeobox (ARX), collagen type IV alpha 1 chain (COL4A1), collagen type IV alpha 2 chain (COL4A2), high-temperature requirement A serine peptidase 1 (HTRA1), high-temperature requirement A serine peptidase 2 (HTRA2), high-temperature requirement A serine peptidase 4 (HTRA4), three prime repair exonuclease 1 (TREX1), galactosidase alpha (GLA), NOTCH3 and integral membrane protein 2B (ITM2B), although not all of these genes were investigated in each patient sample.
Variant annotation for the NGS techniques was based on the use of population databases and in silico prediction tools for determining pathogenicity. Population databases used for analysis included 1000 Genomes (1000G), the Exome Aggregation Consortium database (ExAC) http://exac.broadinstitute.org and the Genome Aggregation Database (gnomAD) http://gnomad.broadinstitute.org/. In silico prediction tools used included SIFT (score < 0.05), PolyPhen (score > 0.8), MutationTaster and PredictSNP2 (which also includes CADD, DANN, FATHMM, FunSeq2 and GWAVA) [41,57,58,59]. Other databases for investigating variant effects included dbSNP https://www.ncbi.nlm.nih.gov/snp/, HGMD http://www.hgmd.cf.ac.uk/ac/index.php and OMIM https://www.omim.org/.

Authors' contributions
Conception and design of the manuscript were completed by PJD, HGS, LMH and LRG. Analysis of the sequence data to identify mutations was originally completed by NM and RAS. Investigation of the mutations to identify and stratify mutations was completed by PJD. Writing and editing of the manuscript were completed by PJD. Substantial editing was done by NM, RAS, HGS, LMH and LRG. Finally, all authors read and approved the submission of the manuscript to BMC Human Genomics.

Funding
Work completed for this manuscript is funded by the Australian National Health and Medical Research Council (NHMRC) Dora Lush Biomedical Sciences Postgraduate Scholarship which pays a stipend to PJD to complete their research.

Availability of data and materials
All data relevant for this study is included within this manuscript; any further information may be made available on request.

Ethics approval and consent to participate
Ethical approval for these studies is through the Queensland University of Technology (QUT) Human Research Ethics Committee (HREC) (Approval Number 1400000748).

Consent for publication
Not applicable.