Utility and limitations of exome sequencing as a genetic diagnostic tool for conditions associated with pediatric sudden cardiac arrest/sudden cardiac death
- Mindy H. Li1, 3,
- Jenica L. Abrudan1, 3,
- Matthew C. Dulik1, 3,
- Ariella Sasson4,
- Joshua Brunton1, 3,
- Vijayakumar Jayaraman1, 3,
- Noreen Dugan1, 2,
- Danielle Haley1, 2,
- Ramakrishnan Rajagopalan5, 6,
- Sawona Biswas5,
- Mahdi Sarmady4,
- Elizabeth T. DeChene5, 6,
- Matthew A. Deardorff1, 3,
- Alisha Wilkens1, 3,
- Sarah E. Noon1, 3,
- Maria I. Scarano1, 3,
- Avni B. Santani5, 6,
- Peter S. White1, 4, 7, 8,
- Jeffrey Pennington4,
- Laura K. Conlin5, 6,
- Nancy B. Spinner5, 6,
- Ian D. Krantz1, 3 and
- Victoria L. Vetter1, 2
© Li et al. 2015
Received: 12 May 2015
Accepted: 9 July 2015
Published: 19 July 2015
Conditions associated with sudden cardiac arrest/death (SCA/D) in youth often have a genetic etiology. While SCA/D is uncommon, a pro-active family screening approach may identify these inherited structural and electrical abnormalities prior to symptomatic events and allow appropriate surveillance and treatment. This study investigated the diagnostic utility of exome sequencing (ES) by evaluating the capture and coverage of genes related to SCA/D.
Samples from 102 individuals (13 with known molecular etiologies for SCA/D, 30 without known molecular etiologies for SCA/D, and 59 with other conditions) were analyzed following exome capture and sequencing at an average read depth of 100X. Reads were mapped to human genome GRCh37 using Novoalign, and post-processing and analysis were done using Picard and GATK. A total of 103 genes (2,190 exons) related to SCA/D were used as a primary filter. An additional 100 randomly selected variants within the targeted genes associated with SCA/D were also evaluated for depth of sequencing and coverage. Although the primary objective was to evaluate the adequacy of depth of sequencing and coverage of targeted SCA/D genes rather than primary diagnosis, all patients who had SCA/D (with known or unknown molecular etiologies) were evaluated with the project’s variant analysis pipeline to determine whether the molecular etiologies could be successfully identified.
The majority of exons (97.6 %) were captured and, on average, fully covered at a minimum sequencing depth of 20x. The proportion of unique genomic positions reported within poorly covered exons remained small (4 %). Exonic regions with lower coverage reflect the need to enrich these areas to improve coverage. Despite these limitations in coverage, we identified 100 % of cases with a previously known molecular etiology for SCA/D, and analysis of an additional 30 individuals with SCA/D but no known molecular etiology revealed a diagnostic answer in 5/30 (17 %). We also demonstrated that 95 of 100 randomly selected reported variants within our targeted genes would have been detected on ES based on our coverage analysis.
ES is a helpful clinical diagnostic tool for SCA/D given its potential to successfully identify a molecular diagnosis, but clinicians should be aware of the limitations of available platforms from both technical and diagnostic perspectives.
The rapid development of genomic sequencing and techniques such as massively parallel next-generation sequencing has decreased cost, improved efficiency, and increased the clinical and research use of genetic testing [1, 2]. Exome sequencing (ES), or sequencing the protein-coding portions of a human genome, has become an increasingly utilized approach for investigating Mendelian disorders . Studies report varying diagnostic ES success rates, ranging from 22.8 %  to 50 % . As costs continue to decline, the use of whole-genome sequencing will likely increase . The application of exome- and genome-level sequencing raises many challenges from both a technical execution and a diagnostic standpoint [7, 8], and the best use of this testing in clinical practice remains unclear . The role of genetic testing as a tool to investigate cardiovascular disease has received increased attention in recent years .
Sudden cardiac arrest/death (SCA/D) is uncommon in the young, occurring in an estimated 2000 individuals under 25 years of age annually in the US . Causes include inherited structural, functional, and electrical cardiac abnormalities [11–13]. There may be no significant medical history prior to the occurrence of SCA/D, and standard postmortem analysis may be unrevealing [12, 14, 15] in as many as 10-30 % of cases . Because of the strong association with genetic causes, a proactive family screening approach is important both to provide life-saving treatment and to help identify other affected family members [12, 13]. More than 100 genes have been associated with SCA/D [11, 17]. Guidelines for genetic testing for channelopathies and cardiomyopathies were published in 2011 .
Increased accessibility of ES warrants its examination as a possible front-line diagnostic tool for inherited conditions associated with SCA/D. Compared with the targeted gene sequencing and disease-specific comprehensive panels that are currently available , ES involves important differences. The workflow and challenges of completing ES are described in detail by Bamshad et al. . Importantly, only 1-2 % of the human genome consists of protein-coding sequence [3, 6, 19]; thus, these regions must undergo an exon-targeting “capture” process before being sequenced. Traditional sequencing methods for individual genes or panels do not require this step.
A second consideration is determining the “coverage” of these captured regions. Coverage (also known as “depth” of sequencing) refers to how many times a nucleotide meeting criteria for a high-quality base is represented in a random collection of raw sequences . This helps differentiate sequencing errors from true sequence variants: the higher the coverage, the more likely the captured base is accurate rather than a false read due to technical error. For example, a captured base “T” with 20x coverage is represented at least 20 times at that position across multiple raw sequences. Of note, “coverage” may also refer to the proportion of bases in a genomic sequence that are covered at a specific depth . For example, an exon with 90 % coverage at 20x has 90 % of its bases represented at least 20 times across multiple raw sequences. In this manuscript, unless otherwise specified, “depth of sequencing” refers to the number of times a nucleotide is represented, and “coverage” refers to the proportion of genomic sequence covered at a given depth.
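To make the two senses of “coverage” concrete, the following minimal Python sketch computes depth at a single position and the fraction of an exon's bases at or above 20x. It assumes reads are represented as hypothetical half-open (start, end) intervals on one chromosome, and it ignores the base- and mapping-quality filters that real tools apply before counting a base.

```python
from typing import Sequence, Tuple


def depth_at(reads: Sequence[Tuple[int, int]], position: int) -> int:
    """Depth of sequencing: number of reads whose half-open span [start, end) includes position."""
    return sum(1 for start, end in reads if start <= position < end)


def fraction_at_depth(depths: Sequence[int], min_depth: int = 20) -> float:
    """Coverage in the second sense: fraction of bases in a region with depth >= min_depth."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d >= min_depth) / len(depths)


# Three overlapping reads; position 95 lies inside all of them.
reads = [(0, 100), (50, 150), (90, 200)]
print(depth_at(reads, 95))  # 3

# A 10-base exon in which 9 bases reach 20x has 90 % coverage at 20x.
exon_depths = [25, 30, 22, 20, 40, 21, 35, 28, 20, 12]
print(fraction_at_depth(exon_depths))  # 0.9
```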
This research study was approved by The Children’s Hospital of Philadelphia (CHOP) Institutional Review Board (IRB). Samples from 102 pediatric individuals were enrolled under an IRB-approved protocol of informed consent at CHOP and de-identified. Detailed demographic information was not readily available because of the de-identification process. Of these 102 patients, 13 had a known molecular diagnosis for SCA/D; 59 had known molecular etiologies for other conditions, including hearing loss, intellectual disability, and mitochondrial disease (all known diagnoses were made by Clinical Laboratory Improvement Amendments (CLIA)-certified laboratories); and 30 did not have a known molecular etiology for SCA/D.
Exome capture, sequencing, and bioinformatics
Peripheral blood was collected from patients in sterile EDTA tubes (BD Vacutainer) by the Phlebotomy service at the Children’s Hospital of Philadelphia. Blood tubes were immediately stored at 4 °C, and genomic DNA (gDNA) was manually extracted using standard procedures with the Gentra Puregene Blood Kit Plus (Qiagen, 158489). gDNA quality was assessed by agarose gel electrophoresis and Nanodrop spectrophotometry, and gDNA was quantified using the Qubit system. Between 3 and 6 µg of gDNA from each sample was prepared and sent to the Beijing Genomics Institute (BGI) facility at CHOP. Exome capture was performed with the Agilent SureSelect V4 kit, and whole-exome sequencing was performed on Illumina HiSeq 2000 sequencers at an average depth of 100X. Sequencing reads were obtained in FASTQ format and examined with the Pediatric Genetic Sequencing Project (PediSeq) exome sequence coverage analysis pipeline.
The sequence reads were mapped to the human genome assembly GRCh37.p10 using Novoalign (V3.00.02) (www.novocraft.com), which has been shown to optimize alignment . Coverage statistics for each exon in the SCA/D gene BED file were generated using the GATK Depth of Coverage tool, version 2.2. Quality control steps to filter out poorly mapped reads included minimum base quality and minimum mapping quality settings of 20 (minBaseQuality 20/minMappingQuality 20) during variant calling, as well as removal of variants supported by fewer than 20 reads. Multi-sample variant calling was not performed because patients are clinically evaluated and analyzed independently. Refer to Additional file 1 for further details regarding the sequencing protocol and data processing.
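The depth-based removal of variants described above can be illustrated with a minimal sketch that reads the DP (total depth) key from the INFO column of a VCF data line and keeps only variants supported by at least 20 reads. Storing depth in INFO/DP is an assumption about the variant caller's output; the study's exact implementation is documented in Additional file 1, not here.

```python
from typing import Optional


def info_depth(vcf_line: str) -> Optional[int]:
    """Return the DP value from the INFO column (8th tab-separated field), or None if absent."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return None
    for entry in fields[7].split(";"):
        key, _, value = entry.partition("=")
        if key == "DP" and value:
            return int(value)
    return None


def passes_depth_filter(vcf_line: str, min_depth: int = 20) -> bool:
    """Keep a variant only if its read depth is known and at least min_depth."""
    depth = info_depth(vcf_line)
    return depth is not None and depth >= min_depth


kept = "1\t1000\t.\tA\tG\t50\tPASS\tDP=35;AF=0.5"
dropped = "1\t2000\t.\tC\tT\t50\tPASS\tDP=12;AF=0.5"
print(passes_depth_filter(kept), passes_depth_filter(dropped))  # True False
```

Treating a missing DP as a failure is a conservative design choice: a variant whose depth cannot be confirmed is not reported.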
The primary aim of this study was to evaluate the adequacy of depth of sequencing, coverage of targeted SCA/D genes, and platform efficacy, not to identify individual molecular diagnoses. Therefore, the main coverage analysis was completed on the first 72 samples, regardless of their underlying clinical findings and molecular etiologies. Of these 72 samples, the 13 cases with known molecular causes for SCA/D underwent additional analysis (described at the end of the Methods section). In addition, diagnostic analysis was performed on 30 patients without known causes for their SCA/D. Fifteen of these 30 patients (50 %) had undergone variable prior genetic workup that was non-diagnostic.
Curated List of 103 Genes Associated with SCA/D
We then evaluated the proportion of different variant types present within the 2,190 exons. This was critical because changes such as large insertions or deletions are not assigned specific genomic coordinates, and locating these changes can be problematic given the current limitations of ES technology. Because large insertions and deletions were unlikely to be detected by ES without separate, additional computational analyses, they were not included in the final analysis.
The next round of analysis focused on three main aspects: 1) how well the exons in the targeted genes were captured on the Agilent SureSelect V4 platform; 2) for the captured exons, how adequate the depth of sequencing was, using 20x (a nucleotide represented, on average, at least 20 times in a group of random raw sequences) as our standard for adequate depth of sequencing; and 3) what proportion of the captured exons met criteria for adequate sequencing coverage (the percentage of bases within an exon sequenced at an appropriate read depth, in this case 20x). Coverage scores for all 2,190 exons were obtained for each sample individually, and the data for each exon were then averaged across all samples.
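The capture and averaging steps can be sketched in Python as two small functions: one deciding whether an exon overlaps the capture kit's target intervals, and one averaging each exon's per-sample coverage score across samples. Treating any overlap as “captured” and representing targets as half-open (start, end) pairs on one chromosome are simplifying assumptions; the exon IDs and numbers are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def is_captured(exon: Interval, targets: Sequence[Interval]) -> bool:
    """Treat an exon as captured if it overlaps at least one capture-kit target interval."""
    start, end = exon
    return any(t_start < end and start < t_end for t_start, t_end in targets)


def average_per_exon(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Average each exon's per-sample coverage score (fraction of bases >= 20x) across samples."""
    return {exon_id: mean(per_sample) for exon_id, per_sample in scores.items()}


targets = [(100, 250), (400, 600)]
print(is_captured((200, 300), targets), is_captured((300, 400), targets))  # True False
print(average_per_exon({"exon_A": [1.0, 0.5, 0.75]}))  # {'exon_A': 0.75}
```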
Although the primary objective of this study was to evaluate the adequacy of depth of sequencing and coverage of targeted SCA/D genes rather than primary diagnosis, all enrolled patients with known molecular etiologies for SCA/D were evaluated with the project’s variant analysis pipeline to determine whether the molecular etiologies could be successfully identified. Project members completing this analysis were blinded to the patients’ known molecular diagnoses to avoid bias during evaluation. Because this group was relatively small (n = 13), additional variants within the targeted SCA/D genes were selected and evaluated for depth of sequencing and coverage to determine how well a random sample of potentially disease-related variants would be detected on ES. One hundred variants reported in the Human Gene Mutation Database (HGMD) within the 103 genes associated with SCA/D were randomly selected for analysis to ensure a varied distribution. Statistics on the capture and depth of sequencing at those specific variants were generated using the GATK Depth of Coverage tool, version 2.2.
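A reproducible random selection of reported variants, followed by a per-position depth lookup averaged over samples, might be sketched as follows. The fixed seed, the (chromosome, position) keys, the catalog of positions, and the per-sample depth dictionaries are all illustrative assumptions; the study generated the actual depth statistics with GATK.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Position = Tuple[str, int]  # (chromosome, 1-based position)


def sample_variants(variants: Sequence[Position], k: int = 100, seed: int = 0) -> List[Position]:
    """Draw k distinct variants uniformly at random; a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    return rng.sample(list(variants), k)


def mean_depth(position: Position, depths_by_sample: Sequence[Dict[Position, int]]) -> float:
    """Average depth at one position across samples; a position missing from a sample counts as 0x."""
    return mean(sample.get(position, 0) for sample in depths_by_sample)


catalog = [("1", p) for p in range(1000, 1500)]  # hypothetical list of 500 reported positions
chosen = sample_variants(catalog, k=100, seed=2015)
print(len(chosen), len(set(chosen)))  # 100 100
```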
Beyond the 72 individuals used in the primary coverage analysis, the additional 30 patients with a history of SCA/D but no known molecular cause were analyzed using the same variant analysis pipeline to determine diagnostic yield. Results were deemed “positive” if there were variants in SCA/D-related genes categorized as likely pathogenic or pathogenic, “uncertain” if the only variants in SCA/D-related genes were variants of uncertain significance (VUS), and “negative” if no VUS, likely pathogenic, or pathogenic variants were identified in SCA/D-related genes.
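The three result categories amount to a simple precedence rule, sketched here in Python. The label strings are hypothetical placeholders for whatever variant classifications the pipeline emits for variants in SCA/D-related genes.

```python
from enum import Enum
from typing import Iterable


class CaseResult(Enum):
    POSITIVE = "positive"
    UNCERTAIN = "uncertain"
    NEGATIVE = "negative"


def classify_case(variant_labels: Iterable[str]) -> CaseResult:
    """Assign a case-level result from the labels of variants found in SCA/D-related genes."""
    labels = set(variant_labels)
    # Any pathogenic or likely pathogenic variant makes the case positive.
    if labels & {"pathogenic", "likely pathogenic"}:
        return CaseResult.POSITIVE
    # Otherwise, variants of uncertain significance alone make the case uncertain.
    if "VUS" in labels:
        return CaseResult.UNCERTAIN
    # No VUS, likely pathogenic, or pathogenic variants: negative.
    return CaseResult.NEGATIVE


print(classify_case(["VUS", "likely pathogenic"]).value)  # positive
print(classify_case(["VUS"]).value)  # uncertain
print(classify_case([]).value)  # negative
```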
We were also interested in the proportion of unique genomic positions falling within exons that were captured but had poor depth of sequencing and thus potentially poor coverage. We considered an exon to have inadequate coverage (“not covered” or “no coverage”) when less than 40 % of its bases had a sequencing depth of at least 20x. Across the 72 samples, the number of unique genomic positions falling within captured exons in this category ranged from 44 to 587, with a median of 374. Averaged across all 72 samples, approximately 344 of 8538 (4 %) unique genomic positions fell within captured exons that had less than 40 % of bases sequenced at 20x depth (“no coverage”).
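The “no coverage” rule and the resulting count of unique genomic positions can be sketched as follows. Exon coordinates are hypothetical half-open (chromosome, start, end) tuples, and positions are deduplicated before counting so that several reported changes at one base are counted once; only the 40 % and 20x thresholds and the 344/8538 figures come from the text.

```python
from typing import Dict, Iterable, Set, Tuple

Exon = Tuple[str, int, int]  # (chromosome, start, end), half-open


def no_coverage_exons(fraction_at_20x: Dict[Exon, float], cutoff: float = 0.40) -> Set[Exon]:
    """Exons in which fewer than 40 % of bases reach a depth of 20x."""
    return {exon for exon, frac in fraction_at_20x.items() if frac < cutoff}


def unique_positions_in(positions: Iterable[Tuple[str, int]], exons: Set[Exon]) -> int:
    """Count distinct (chromosome, position) pairs falling inside any of the given exons."""
    return sum(
        1
        for chrom, pos in set(positions)
        if any(chrom == e_chrom and start <= pos < end for e_chrom, start, end in exons)
    )


fractions = {("1", 100, 200): 0.95, ("1", 300, 400): 0.25}
poor = no_coverage_exons(fractions)
reported = [("1", 150), ("1", 310), ("1", 310), ("1", 350)]
print(unique_positions_in(reported, poor))  # 2

# Averaged figure reported in the text: 344 of 8538 positions.
print(f"{344 / 8538:.1%}")  # 4.0%
```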
Randomly selected variants potentially associated with disease within targeted SCA/D genes with less than 20x sequencing depth
Ehlers-Danlos syndrome IV
c.3230G > T
c.274G > A
Arrhythmogenic right ventricular dysplasia
c.1237C > T
Complex II deficiency & Dilated cardiomyopathy, 1GG
c.1664G > A
c.2068C > T
Pathogenic or likely pathogenic variants identified on ES in samples without prior known molecular diagnosis
Phenotype of gene
BrS (Brugada syndrome), DCM (dilated cardiomyopathy), Familial atrial fib, Long QT
c.4867C > T
c.226G > A
Long QT, Short QT
c.1750G > A
Familial atrial fib, Long QT, Short QT
c.513C > A
Our results revealed a number of findings demonstrating the strengths and limitations of using ES as a diagnostic tool. First, a fair percentage (16.6 %) of the variants within the targeted genes reported in HGMD were large deletions or insertions without genomic positions, which would not be expected to be detected by ES due to the limitations of current technology. Although the ability to identify such changes will likely improve with better technology, clinicians should be aware of the platforms being used to capture exonic sequence, as well as the limitations of sequencing and variant-calling technologies in identifying certain types of mutations. In our analysis, among the reported variants that had a genomic position, the majority (89.8 %) were at unique genomic positions. We were primarily interested in the number of unique genomic positions because a position with good depth of sequencing would presumably be covered regardless of the base-pair change at that location.
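Separating reported variants without coordinates from those with coordinates, and then collapsing the latter to unique positions, can be sketched as follows. Representing an uncoordinated entry (such as a gross insertion or deletion) as None is an assumption made for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple


def summarize_positions(variants: Sequence[Optional[Tuple[str, int]]]) -> Dict[str, int]:
    """Tally variants lacking coordinates, variants with coordinates, and distinct positions."""
    located = [v for v in variants if v is not None]
    return {
        "without_position": len(variants) - len(located),
        "with_position": len(located),
        "unique_positions": len(set(located)),
    }


# Two different changes at ("1", 10) share one genomic position; None is an uncoordinated indel.
print(summarize_positions([("1", 10), ("1", 10), ("1", 20), None]))
# {'without_position': 1, 'with_position': 3, 'unique_positions': 2}
```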
Second, the results demonstrate that the majority of exons (97.6 %) within the targeted SCA/D genes were captured with our specific capture kit. The remaining exons (2.4 %) were not captured primarily because our capture kit did not target them, so these areas would not be expected to be detected even on subsequent sample runs. Some regions of DNA are difficult to capture because of their inherent sequence or structure (e.g., repetitive and GC-rich regions), resulting in a technical inability to target and capture every exon in the human genome. Thus, unique genomic positions reported in HGMD that fall within these non-captured regions could be missed. However, within the non-captured exons, there were only 40 genomic positions (0.5 % of 8578 total genomic positions) with published HGMD variants that would not be expected to be detected (Fig. 2b). This indicates that the process captured the majority of exons, with minimal reported HGMD positions missed due to capture issues alone. It is important to be mindful that individual capture kits aim to obtain consistent coverage of the desired targets, but capture and coverage of non-targeted regions will vary from run to run due to technological limitations.
Pathogenic variants identified on ES in samples with known molecular diagnosis
Phenotype of gene
Long QT, Short QT
c.1882G > A
Familial atrial fib, Long QT, Short QT
c.1552C > T
Long QT, Short QT
c.1838C > T
BrS, DCM, Familial atrial fib, Long QT
c.4978A > G
Familial atrial fib, Long QT, Short QT
c.704T > A
c.173G > A
c.226G > A
Familial atrial fib, Long QT, Short QT
c.1140G > T
DCM, HCM, LVNC
c.2572C > T
c.173G > A
c.59216T > A; c.94578delT
c.2347A > C
DCM, HCM, LVNC; DCM, HCM
ACTC: c.806T > C; TTN: c.11323G > A
We took an additional step of evaluating 100 random variants potentially associated with disease in our targeted SCA/D genes across all individual samples (Additional file 2). All 100 variants were successfully captured by the capture kit (Fig. 5). In terms of coverage, the majority (95/100) of the variants were sequenced adequately at 20x and thus would have been detected on ES. Of the five remaining variants with sequencing depths below 20x, only 2 had sequencing depths below 5x and would likely be missed on ES (Fig. 5; Table 2). Although 20x is often considered the ideal standard for sequencing depth, many analysis pipelines include variants with lower cutoffs; thus, depending on the analysis protocol used, up to 5 % of these particular variants may not have been detected on ES due to sequencing depth. In sum, although ES does not currently cover every base pair at 20x within our targeted genes, the likelihood of a missed variant due to coverage issues remains small.
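The depth tiers used here (adequate at 20x or more, reduced between 5x and 20x, likely missed below 5x) can be expressed as a small binning function. The tier names and example depths are hypothetical; only the 20x and 5x cutoffs come from the text.

```python
from collections import Counter
from typing import Iterable


def depth_tier(depth: float) -> str:
    """Bin a variant's sequencing depth using the 20x and 5x cutoffs."""
    if depth >= 20:
        return "adequate"
    if depth >= 5:
        return "reduced"
    return "likely missed"


def tally_tiers(depths: Iterable[float]) -> Counter:
    """Count how many variants fall into each depth tier."""
    return Counter(depth_tier(d) for d in depths)


example = [120, 85, 19, 7, 3]
print(tally_tiers(example))
```

Whether the "reduced" tier counts as detected depends on the pipeline's own depth cutoff, which is why the text gives a range rather than a single miss rate.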
There are a number of limitations to this study that should be considered. First, coverage data from this analysis were limited to the focused gene list curated by our team. Not all genes associated with SCA/D are included in this list, and new genes are frequently discovered as updated information becomes available. Since the completion of this study, we have added genes associated with SCA/D to our list. A pathogenic variant may exist in a patient but will not be detected on ES if the gene has not yet been associated with that particular phenotype and/or human disease. Genes differ in genomic location, size, number of exons, repeat and GC-rich regions, and other characteristics that can affect the ability to capture exons and achieve appropriate coverage. As new genes related to SCA/D are discovered, it is important to consider how they might best be captured and what is necessary to improve their coverage. Sims et al. reviewed issues of sequencing coverage in depth . Definitions of specific coverage “levels” also vary among institutions and should be taken into account when performing ES analysis. Additionally, available exome capture kits differ between vendors, though their performance has generally been found to be comparable , and reproducibility varies with each use, even with the same kits. Factors such as level of enrichment, genomic library details, and consistency of captured targets play a role .
Finally, as the goal of ES is to identify variants that may be pathogenic and disease causing, it is equally important to have efficient strategies and appropriate variant analysis pipelines. Without a solid analysis pipeline, consistent capture and high coverage alone are not adequate to detect meaningful variants. A pathogenic variant may be present but, depending on the filtering parameters, may not be found if it is not targeted in the analysis. This consideration will remain important even as the use of genome-level sequencing potentially expands beyond ES in the future. Genome sequencing would allow changes beyond the coding regions to be identified, and it would not face the technical challenges seen in ES such as exon capture and coverage. However, the number of potential variants to analyze would increase tremendously and would require more sophisticated analysis pipelines to filter and identify disease-causing changes.
Given the high genetic heterogeneity of conditions leading to SCA/D, genomic sequencing has the potential to provide invaluable clinical information to high-risk families and clinicians and to help identify at-risk individuals in whom management can help prevent future SCA/D. Our results revealed both the capabilities and limitations of using ES as a tool to evaluate genes related to conditions associated with SCA/D. Although ES is not currently fully comprehensive for our targeted genes compared with traditional single- or multi-gene panels, the majority of exons were still captured with commercially available kits and were, on average, fully covered at 20x sequencing depth. In addition, the proportion of HGMD unique genomic positions reported within poorly covered exons remained small. Exonic regions with lower coverage reflect the need to enrich these regions to improve coverage. Despite limitations in coverage, our results show that ES has strong potential to detect molecular changes: we identified 100 % of cases with known molecular etiologies for SCA/D in our small cohort. Additionally, in a cohort of 30 patients without a known molecular etiology for their SCA/D, we identified a likely etiology in 17 %. We also demonstrated that at least 95 of 100 randomly selected HGMD-reported variants would have been detected on ES based on coverage analysis. Overall, ES is a helpful genetic diagnostic tool for SCA/D in the clinical setting given its potential to successfully reveal a molecular diagnosis, but clinicians should be aware of the limitations of currently available platforms from both technical and diagnostic perspectives.
We would like to acknowledge the patients and their families who participated in this study.
This work was supported by NIH/NHGRI U01-HG006546 as part of the Clinical Sequencing Exploratory Research (CSER) consortium (IDK, NBS, VLV, JP, PSW) and by National Institute of General Medical Sciences (NIGMS) grant 5T32GM008638 (MHL). The funding bodies did not participate in the study design, collection, analysis and interpretation of data, in the writing of the manuscript, or in the decision to submit the manuscript for publication.
- Jamal SM, Yu J, Chong JX, Dent KM, Conta JH, Tabor HK, Bamshad MJ. Practices and Policies of Clinical Exome Sequencing Providers: Analysis and Implications. Am J Med Genet A. 2013;161A:n/a-n/a.
- Interpreting Secondary Cardiac Disease Variants in an Exome Cohort. Circ Cardiovasc Genet. 2013;6:337–46.
- Bamshad MJ, Ng SB, Bigham AW, Tabor HK, Emond MJ, Nickerson DA, et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet. 2011;12:745–55.
- Atwal PS, Brennan M, Cox R, Niaki M, Platt J, Homeyer M, et al. Clinical whole-exome sequencing: are we there yet? Genet Med. 2014;16:717–9.
- Clinical application of exome sequencing in undiagnosed genetic conditions. J Med Genet. 2012;49:353–61.
- Wang Z, Liu X, Yang B, Gelernter J. The Role and Challenges of Exome Sequencing in Studies of Human Diseases. Front Genet. 2013;4:160.
- Yang Y, Muzny DM, Reid JG, Bainbridge MN, Willis A, Ward PA, et al. Clinical Whole-Exome Sequencing for the Diagnosis of Mendelian Disorders. N Engl J Med. 2013;369:1502–11.
- Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, Lee C, et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature. 2009;461:272–6.
- Arndt A, MacRae CA. Genetic testing in cardiovascular diseases. Curr Opin Cardiol. 2014;29:235–40.
- Pediatric Sudden Cardiac Arrest. Pediatrics. 2012;129:e1094–e1102.
- Wilde AAM, Behr ER. Genetic testing for inherited cardiac disease. Nat Rev Cardiol. 2013;10:571–83.
- Contribution Of Inherited Heart Disease To Sudden Cardiac Death In Childhood. Pediatrics. 2007;120:E967–E973.
- Yield of Molecular and Clinical Testing for Arrhythmia Syndromes: Report of 15 Years' Experience. Circulation. 2013;128:1513–21.
- Kauferstein S, Kiehne N, Jenewein T, Biel S, Kopp M, König R, et al. Genetic analysis of sudden unexplained death: A multidisciplinary approach. Forensic Sci Int. 2013;229:122–7.
- Kumar S, Peters S, Thompson T, Morgan N, Maccicoca I, Trainer A, et al. Familial cardiological and targeted genetic evaluation: Low yield in sudden unexplained death and high yield in unexplained cardiac arrest syndromes. Heart Rhythm. 2013;10:1653–60.
- Tester DJ, Ackerman MJ. Postmortem Long QT Syndrome Genetic Testing for Sudden Unexplained Death in the Young. J Am Coll Cardiol. 2007;49:240–6.
- OMIM. Online Mendelian Inheritance in Man, OMIM®. Johns Hopkins University, Baltimore, MD. World Wide Web URL: http://www.omim.org.
- HRS/EHRA Expert Consensus Statement on the State of Genetic Testing for the Channelopathies and Cardiomyopathies: This document was developed as a partnership between the Heart Rhythm Society (HRS) and the European Heart Rhythm Association (EHRA). Europace. 2011;13:1077–109.
- Krawitz P, Mundlos S. Strategies for exome and genome sequence data analysis in disease-gene discovery projects. Clin Genet. 2011;80:127–32.
- International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.
- Sims D, Sudbery I, Ilott NE, Heger A, Ponting CP. Sequencing depth and coverage: key considerations in genomic analyses. Nat Rev Genet. 2014;15:121–32.
- Li H. Aligning Sequence Reads, Clone Sequences And Assembly Contigs With BWA-MEM. 03 2013. 1303.
- Stenson PD, Mort M, Ball EV, Shaw K, Phillips AD, Cooper DN. The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum Genet. 2013;133:1–9.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.