Copy number alterations detected by whole-exome and whole-genome sequencing of esophageal adenocarcinoma



Esophageal adenocarcinoma (EA) is among the leading causes of cancer mortality, especially in developed countries. High levels of somatic copy number alterations (CNAs) accumulate over decades during the progression from Barrett’s esophagus, the precursor lesion, to EA. Accurate identification of somatic CNAs is essential to understanding cancer development. Many studies have detected CNAs in EA using microarrays. Next-generation sequencing (NGS) technologies are believed to offer greater sensitivity and accuracy for CNA detection, yet no NGS-based CNA detection in EA has been reported.


In this study, we analyzed whole-exome sequencing (WES) and whole-genome sequencing (WGS) data from a published large-scale genomic study of EA to detect CNAs. Two specific comparisons were conducted. First, the recurrent CNAs based on WGS and WES data from 145 EA samples were compared with those found in five previous microarray-based studies. The majority of the previously identified regions were also detected in this study. Interestingly, some novel amplifications and deletions were discovered using the NGS data. In particular, SKI and PRKCZ, detected in a deletion region, are involved in the transforming growth factor-β pathway, suggesting their potential utility as novel biomarkers for EA. Second, we compared CNAs detected in WGS and WES data from the same 15 EA samples. No large-scale CNA was detected statistically more frequently by either WES or WGS, whereas more focal-scale CNAs were detected by WGS than by WES.


Our results suggest that NGS can replace microarrays for detecting CNAs in EA. WGS is superior to WES in that it offers finer resolution, although if the interest is in recurrent CNAs, WES can be preferable to WGS for its cost-effectiveness.

Background

Cancer arises from the gradual accumulation of somatic genomic instability and alterations, which eventually lead to carcinogenesis and cancer progression [1, 2]. Copy number alterations (CNAs), one form of somatic genomic alteration, are somatic changes in chromosome structure that result in gains or losses of copies of DNA segments. Detecting CNAs is important for understanding cancer development and identifying key driver events [3, 4]. Microarray technologies, including array comparative genomic hybridization (array CGH) and single nucleotide polymorphism (SNP) microarrays, have been widely used for CNA detection [5–7]. In array CGH, reference and test DNAs are fluorescently labeled and hybridized to arrays composed of bacterial artificial chromosome (BAC) clones, cDNA clones, or oligonucleotides; the signal ratio is used as an estimate of the copy number ratio. SNP microarrays are also based on hybridization, but a single sample is processed on each microarray, and intensity ratios are formed by comparing the intensity of the sample under investigation with that of a collection of reference samples or of all other samples in the study. Compared with array CGH, SNP arrays can have better resolution and also yield B allele frequencies, so that loss of heterozygosity (LOH) can be detected [7]. The resolution of these arrays is typically greater than 1 kb, depending on the density, distribution, and response characteristics of their probes. More recently, next-generation sequencing (NGS) technologies have offered single-nucleotide resolution and absolute read counts and can therefore provide more sensitive and accurate CNA results. Moreover, direct sequencing substantially increases the discovery of smaller structural variation events [8, 9]. It is believed that, with its ever-decreasing cost, NGS will ultimately replace microarrays in copy number analyses [10].

In this paper, we conduct CNA analyses using the published NGS data of 145 esophageal adenocarcinoma (EA) samples from [11], as no CNA analyses were reported in that paper. The incidence of EA has increased strikingly over the past 30–40 years, and EA is the seventh leading cause of cancer death among men in the USA [12]. Many studies of CNA detection in EA have been carried out using microarrays. Paulson et al. detected the 19 most frequent CNAs in 15 EA patients using BAC array data [13]. Beroukhim et al. created the Tumorscape Copy Number Portal, in which they collected more than 3000 copy number profiles from 26 cancer types using Affymetrix 250K StyI (Affymetrix, Santa Clara, CA) arrays [3]. They identified 33 recurrent CNAs (RCNAs), which appear in 44 EA samples more frequently than expected by chance. Dulak et al. detected 46 regions of significant recurrent gain and loss in 186 EA samples using 250K StyI arrays and SNP Array 6.0 arrays (Affymetrix) [14]. Zack et al. created the TCGA Copy Number Portal and identified RCNAs across multiple cancer types; they detected 88 RCNAs across 184 EA samples using Affymetrix SNP6 arrays [4, 15]. Frankel et al. detected 52 RCNAs in 54 EA samples using Illumina CytoSNP-12 arrays [16]. However, no CNA detection study of EA using NGS technologies has been published. In this study, we aim to fill this gap by analyzing the NGS data from [11] and comparing the results with the findings of the aforementioned papers.

Microarray-based CNA analyses nevertheless remain a common approach, possibly for two reasons: microarray technologies have been developed for a longer time, so the corresponding CNA detection methods are well established, and accurate CNA detection from NGS data can be challenging because of the complexity of sequencing data processing [17]. To the best of our knowledge, only a few studies have compared the CNA detection performance of microarrays and NGS side by side. Koboldt et al. detected CNAs in the coding regions of five ovarian tumors using a SNP array and two NGS platforms, whole-genome sequencing (WGS) and whole-exome sequencing (WES) [18]. They found that the majority of CNA events were consistently detected by the three platforms, and that more CNAs were detected by WGS than by the array. In another study, the authors detected germline copy number variations (CNVs) in 16 breast cancer cell lines using both array CGH and WES [19]. Four WES-based CNV detection methods were compared, with the regions detected by the array serving as the gold standard, and a greater number of focal-scale CNVs were detected using the array. These studies were conducted at the individual sample level. In this study, we are interested in detecting regions that recur frequently across multiple samples in NGS data and comparing them with previous findings from microarray-based studies. Such recurrent regions may contain real driver events that contribute to cancer development.

Furthermore, 15 samples (patients) in [11] were subjected to both WGS and WES, providing a valuable opportunity to compare CNA detection by WES and WGS, a question that has received little attention. Koboldt et al. found that a large proportion (79.53 %) of focal-scale CNAs detected by WES were also supported by WGS, and they recommended a WES-based approach, which is likely to detect additional platform-specific focal copy number changes missed by WGS and microarrays [18]. WES is an increasingly popular platform for studying tumor genomics because of its cost-effectiveness and the immediate interpretability of mutations in coding regions, and WES data have been shown to be usable for CNA analysis [19]. However, WES coverage is less uniform than WGS coverage, mostly because of exome capture, and exons are unevenly distributed across the genome, so CNAs over long intergenic regions are difficult to detect using WES. On the other hand, if the interest is in long CNA segments spanning multiple genes, it is not clear whether CNAs inferred from WES lose a substantial amount of information compared with WGS. This comparison may well depend on the cancer site and the length of the CNAs, since longer segments should be reliably detected by exome sequencing.

A number of bioinformatics and statistical methods have been developed for CNA detection using NGS data [17, 20–22], and they can be classified in several ways. Most methods detect CNAs at the individual sample level, usually based on read count ratios between a tumor sample and its matched normal sample. These methods can be further categorized according to the study design; commonly used ones include the following. (a) CNVnator [23], RDXplorer [24], and ReadDepth [25] detect CNAs in a single tumor sample. (b) CNAseg [26], SegSeq [27], ExomeCNV [28], HMMcopy [29], and VarScan 2 [18] identify CNAs in matched tumor-normal samples. Control-FREEC [30, 31] belongs to both classes (a) and (b), as it can work either with tumor-normal pairs or with tumor-only samples. By sequencing platform, CNVnator, SegSeq, RDXplorer, ReadDepth, and HMMcopy work with WGS data; ExomeCNV and VarScan 2 work with WES data; and Control-FREEC works with both. In addition to these methods for individual samples, other methods have been developed to detect RCNAs from multiple samples. These methods take the segments from all individual samples as input and identify the (merged) segments that appear more frequently across the population than expected by chance. Only a few RCNA methods have been developed specifically for NGS data, including JointSLM [32] and cn.MOPS [33]; they perform copy number analyses based on read counts of segments from multiple tumor samples and are usually applied to CNV detection. On the other hand, many RCNA detection methods originally developed for microarray platforms [34], including STAC [35], CMDS [36], and GISTIC2.0 [37], can also be adapted to NGS data.
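To make the read-count-ratio idea concrete, the following is a minimal Python sketch under simplifying assumptions: reads are represented only by their start positions on a single chromosome, counts are normalized by library size only (no GC-content or mappability correction, which tools such as Control-FREEC also apply), and the function names `window_counts` and `copy_ratios` are hypothetical, not the API of any tool listed above.

```python
import bisect
from typing import List, Optional, Tuple


def window_counts(positions: List[int], chrom_len: int, window: int,
                  step: int) -> List[Tuple[int, int]]:
    """Number of read start positions in each sliding window [start, start + window)."""
    pos = sorted(positions)
    counts = []
    for start in range(0, chrom_len - window + 1, step):
        n = bisect.bisect_left(pos, start + window) - bisect.bisect_left(pos, start)
        counts.append((start, n))
    return counts


def copy_ratios(tumor_pos: List[int], normal_pos: List[int], chrom_len: int,
                window: int, step: int) -> List[Tuple[int, Optional[float]]]:
    """Library-size-normalized tumor/normal read-count ratio per window (hypothetical helper)."""
    t_total, n_total = len(tumor_pos), len(normal_pos)
    if t_total == 0 or n_total == 0:
        raise ValueError("both samples need at least one read")
    tumor = window_counts(tumor_pos, chrom_len, window, step)
    normal = window_counts(normal_pos, chrom_len, window, step)
    ratios: List[Tuple[int, Optional[float]]] = []
    for (start, tc), (_, nc) in zip(tumor, normal):
        # The ratio is undefined where the normal sample has no reads in the window
        ratio = None if nc == 0 else (tc / t_total) / (nc / n_total)
        ratios.append((start, ratio))
    return ratios
```

Using a step smaller than the window gives overlapping windows, which smooths the ratio profile at the cost of correlated neighboring values.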

In this study, Control-FREEC is used to detect CNAs at the individual sample level in the WGS and WES data from [11], and the results are compared between the two sequencing platforms. Control-FREEC is flexible and powerful in that it corrects for multiple types of bias, accounting for GC content, mappability, and the matched normal sample, and it is among the most sensitive tools on both WGS and WES platforms [22]. GISTIC2.0, likely the most popular RCNA detection method, is used to detect RCNAs in both WGS and WES data, and the identified RCNAs are then compared with those reported previously using microarrays. Four of the five previous studies we compare against (all except [13]) used GISTIC2.0; by also choosing GISTIC2.0, we hope to alleviate the concern that differences observed in the NGS-based results arise from the use of different software and analytical methods.

Results

RCNA analysis

The estimated copy ratios of segments in the 145 WES and 15 WGS samples are shown in Fig. 1. We applied GISTIC2.0 to the copy ratio profiles to perform a permutation-based significance analysis and identify significantly amplified and deleted regions. The recurrent amplification and deletion regions are shown in Fig. 2 for the WES data and in Fig. 3 for the WGS data. With the residual q value threshold set at 0.1, we obtained 41/16 amplifications and 67/19 deletions in the WES/WGS data, respectively. Combining the WES and WGS results yielded 47 amplification and 74 deletion events.

Fig. 1

Segmented copy number ratio profiles in WES and WGS. The x-axis represents the samples. The y-axis represents the chromosomes. a WES data. b WGS data

Fig. 2

Genomic positions of RCNAs detected in the 145 WES samples. The x-axis represents the normalized amplification signals (top) and significance by q value (bottom). The green line indicates the significance cutoff at q = 0.25. a Amplification regions. b Deletion regions

Fig. 3

Genomic positions of RCNAs detected in the 15 WGS samples. The x-axis represents the normalized amplification signals (top) and significance by q value (bottom). The green line indicates the significance cutoff at q = 0.25. a Amplification regions. b Deletion regions

These newly identified genomic regions were compared with the results of all five microarray-based studies (Tables 1 and 2). The majority of the regions detected in our study (68 % of deletions and 74 % of amplifications) were also identified in those previous studies. Known cancer genes within these regions were identified according to the Cancer Gene Census [38], and the results are shown in the supplementary document (Additional file 1: Tables S1 and S2). Among all the detected regions, 13 amplification events were not reported in any of the previous studies; four of them (1p36.33, 12p13.31, 18p11.21, 8q24.3) had a residual q value less than 0.01. Twenty-nine deletion events were not identified previously, and ten of them (Xp22.33, 3p26.3, 6q22.31, 14q32.2, 1p21.1, 3p12.3, 6q12, Yq12, 6p12.3, 14p11.2) had a residual q value less than 0.01. Conversely, we examined whether the regions identified in the five previous studies were also detected using the NGS data. For example, we extracted the amplification regions (from Additional file 1: Table S2-C) and deletion regions (from Additional file 1: Table S4-B) reported in [14], checked whether these regions were detected using the sequencing data, and listed the q value for each region in Table 3. The genomic location of each region was converted from hg18 to hg19 using the University of California, Santa Cruz (UCSC) liftOver tool. All but four amplifications and four deletions overlapped with our results. The comparisons with the other four studies are listed in the supplementary document (Additional file 1: Tables S3–S6), which show that 58 % of the regions in [16], 95 % of the regions in [13], 64 % of the regions in [3], and 57 % of the regions in [4] were detected in our study. Thus, the majority of regions reported in previous microarray studies were detected using NGS data.
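The check of whether a previously reported region was "detected" can be sketched as an interval-overlap test. This is a minimal illustration, assuming that both region lists are already in the same genome build (e.g., after liftOver to hg19) and that any overlap of at least one base counts as a detection; the exact overlap criterion behind Table 3 and Tables S3–S6 is not stated here, and the helper names are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open, same genome build


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_detected(previous: List[Region], ours: List[Region]) -> float:
    """Fraction of previously reported regions overlapped by at least one of our regions."""
    if not previous:
        return 0.0
    hits = sum(1 for p in previous if any(overlaps(p, o) for o in ours))
    return hits / len(previous)
```

A stricter criterion (for example, a minimum reciprocal overlap) would lower the reported fractions, so the choice should be stated alongside any percentage.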

Table 1 Amplification RCNAs detected in the 145 WES and 15 WGS samples
Table 2 Deletion RCNAs detected in the 145 WES and 15 WGS samples
Table 3 Comparison of results of Dulak et al. [14] to our results

To generate a consensus list of regions, we mapped all genomic regions reported in the six studies (including ours) to cytobands and listed the regions appearing in at least three of them. The results are shown in Tables 4 and 5. Only two amplifications and six deletions in the consensus lists were not found in our study, and our result is the most consistent with the consensus regions, which suggests that NGS may be a more powerful approach for detecting RCNAs.
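The consensus step amounts to counting, for each cytoband, how many studies report it. A minimal sketch, assuming each study's results have already been mapped to a set of cytoband labels (that mapping, the toy study names, and the band labels in any example are illustrative assumptions; the helper names are hypothetical):

```python
from collections import Counter
from typing import Dict, List, Set


def consensus_bands(study_bands: Dict[str, Set[str]], min_studies: int = 3) -> List[str]:
    """Cytobands reported by at least `min_studies` studies (sorted as strings, not by position)."""
    counts = Counter(band for bands in study_bands.values() for band in set(bands))
    return sorted(band for band, n in counts.items() if n >= min_studies)


def missing_from(study: str, study_bands: Dict[str, Set[str]],
                 consensus: List[str]) -> List[str]:
    """Consensus cytobands that a given study did not report."""
    return [band for band in consensus if band not in study_bands[study]]
```

Running `missing_from` for each study gives the per-study count of consensus regions not found, which is the quantity used above to judge concordance with the consensus.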

Table 4 Consensus amplification RCNAs in six studies
Table 5 Consensus deletion RCNAs in six studies

Comparison of CNAs on WGS and WES

We detected CNAs in 15 tumor-normal sample pairs from both WGS and WES data using Control-FREEC and compared the results from the two platforms. The comparisons were made separately for large-scale CNAs, defined as those spanning more than 25 % of a chromosome arm, and focal-scale CNAs, defined as those shorter than 25 % of a chromosome arm. The lengths of large-scale CNAs range from 18.32 to 161.22 Mb (standard deviation 37.39 Mb), and those of focal-scale CNAs range from 0.001 to 50.65 Mb (standard deviation 2.50 Mb); more than 83 % of focal-scale CNAs are shorter than 1 Mb. For each detected CNA, we used the Kolmogorov-Smirnov (KS) test to assess the possibility that it arose by chance. We then searched the WGS and WES data of each sample for an event overlapping the detected CNA by at least 10 % of bases, i.e., we counted how many times the CNA appeared in the WGS data and in the WES data, and applied Fisher’s exact test to compare the detection frequency of each CNA between the two platforms.
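The per-CNA comparison described above can be sketched as follows. This is an illustrative reconstruction, not the code used in the study: the 10 % overlap is interpreted as 10 % of the query CNA's bases, intervals are half-open on a single chromosome, the two-sided Fisher's exact test is implemented directly with `math.comb` (summing all tables no more probable than the observed one), and all function names are hypothetical.

```python
from math import comb
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def is_large_scale(cna_length: int, arm_length: int, frac: float = 0.25) -> bool:
    """Large-scale if the CNA spans more than `frac` of its chromosome arm."""
    return cna_length > frac * arm_length


def detected_in(cna: Interval, calls: List[Interval], min_frac: float = 0.10) -> bool:
    """True if any call overlaps at least `min_frac` of the CNA's bases (assumed reading)."""
    need = min_frac * (cna[1] - cna[0])
    return any(min(cna[1], c[1]) - max(cna[0], c[0]) >= need for c in calls)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def compare_platforms(cna: Interval, wgs_calls: List[List[Interval]],
                      wes_calls: List[List[Interval]]) -> Tuple[int, int, float]:
    """Count samples in which each platform detects `cna`, then test the difference."""
    k_wgs = sum(detected_in(cna, calls) for calls in wgs_calls)
    k_wes = sum(detected_in(cna, calls) for calls in wes_calls)
    p = fisher_exact_two_sided(k_wgs, len(wgs_calls) - k_wgs,
                               k_wes, len(wes_calls) - k_wes)
    return k_wgs, k_wes, p
```

For example, a CNA found in 12 of 15 WGS samples but only 3 of 15 WES samples gives a two-sided p value of about 0.0028, whereas near-equal counts give p close to 1.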

The results for large-scale CNAs are shown in Table 6. In total, 19 regions were detected in the 15 EA samples. Counting how many times each of these CNAs was detected by WGS and by WES, we found that none was detected more frequently by one platform than by the other. In addition, the KS test gave an estimated false-positive detection rate of 0 for each identified CNA.

Table 6 Large-scale CNAs detected in WGS and WES

The results for focal-scale CNAs are shown in Table 7. WGS identified 21,197 focal-scale CNAs in the 15 samples, of which 3675 were detected statistically more frequently by WGS than by WES. WES identified 4371 focal-scale CNAs, of which 144 were detected more frequently by WES than by WGS. Checking the false-positive detection rates with the KS test, we found 19,694/3655 CNAs on WGS/WES with p values < 0.05; these CNAs are less likely to be spurious discoveries, and we considered only these CNAs in subsequent analyses. Among them, about 18 % of the CNAs detected by WGS were detected statistically more frequently by WGS than by WES, whereas only about 3 % of the CNAs detected by WES were detected more frequently by WES. We further tested whether the false-positive detection rates of small CNAs (<200 kb) differed between the two platforms using a one-tailed t test, which gave a p value of 2.2E−16 (with means 0.004 vs. 0.009), indicating that the false-positive detection rate of these small CNAs is significantly lower with WGS. One possible explanation is that WGS does not involve the exome-capture step of WES, so the local variation and bias of read coverage are smaller [39]. Moreover, unlike WGS, WES does not cover intronic regions and covers only 2.76 % of the whole genome. Finally, we investigated the effect of this lack of coverage on CNA detection by examining small CNAs that reside only in intronic regions. No CNAs detected by WES spanned only introns, whereas more than 7000 such CNAs were identified by WGS; however, only 22 % of these intronic CNAs were detected statistically more frequently by WGS.
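The intron-only criterion can be sketched as a simple interval check. This sketch assumes a single gene model whose exon intervals stand in for WES capture targets (real capture kits target a defined set of regions that does not exactly match annotated exons); the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def is_intron_only(cna: Interval, gene: Interval, exons: List[Interval]) -> bool:
    """True if the CNA lies inside the gene body but overlaps none of its exons."""
    inside_gene = gene[0] <= cna[0] and cna[1] <= gene[1]
    return inside_gene and not any(overlaps(cna, e) for e in exons)


def exome_targets_hit(cna: Interval, targets: List[Interval]) -> int:
    """Number of capture targets overlapping the CNA; WES yields reads to test only if > 0."""
    return sum(overlaps(cna, t) for t in targets)
```

An intron-only CNA hits zero targets, which is consistent with the observation that WES reported no such events: without captured reads inside the segment, there is no read-count signal to segment.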

Table 7 Focal-scale CNAs detected in WGS and WES

Discussion

In this study, we detected RCNAs using NGS data from 145 EA samples and compared them with those from the five microarray studies. The majority of the regions detected by microarrays overlapped the regions identified by NGS, and vice versa. Furthermore, across all six studies, we identified 22/51 consensus amplification/deletion regions, and our result was the most concordant with these consensus events. On the basis of these observations, we suggest that NGS can replace microarrays for detecting RCNAs in EA.

However, discrepancies generally exist when specific regions are compared across studies. Even the largest detected events are not consistent across platforms or across the different microarray studies. Two microarray studies [3, 14] identified their largest recurrent deletions at chr7:123.66–142.52 (Mb), which corresponds to chr7:105.14–128.47 (Mb) detected by both WGS and WES in our study. The largest deletion detected by WGS and WES is at chr16:29.48–90.35 (Mb), whereas only parts of this region, chr16:78.13–79.65 (Mb) (in [4, 14, 16]) and chr16:31.93–33.39 (Mb) (in [16]), were detected in the microarray studies. Some of these discrepancies may be caused by the different technologies used, such as the different hybridization and scanning methods applied in the microarray studies, the target-enrichment strategy used in WES, and biases due to GC content and uneven mappability across the genome in NGS. Although our study indicates a significant overlap between RCNAs detected from microarray and NGS data, rigorously comparing these RCNA calling approaches remains a challenge; a well-controlled study design, such as a spike-in experiment, should be applied in the future.

GISTIC analysis is often used to identify driver genes that contribute to cancer development. In this study, we found within the detected regions several potential driver genes reported in previous studies; they are listed in Table 8. We detected oncogenes such as EGFR, ERBB2, GATA6, KRAS, MYC, CDK6, MCL1, MET, MYB, and PRKCI, and tumor suppressor genes such as APC, ARID1A, ATM, CDKN2A, CDKN2B, PDE4D, and PTPRD, all of which were also identified in various previous microarray studies. In another study [11], the authors identified 26 significantly mutated genes from the 145 WES samples used in our study. Ten of them (TP53, CDKN2A, EYS, ARID1A, TLR4, ARID2, SYNE1, C6orf118, ACTL7B, and SCN10A) were also identified in our study, and three of the rest (SMAD4, TLL1, and SMARCA4) are located within 1 Mb of regions detected in this study. It is worth pointing out that some of the potential driver genes, such as ERBB2 and TP53, have been reported to be implicated in the progression from Barrett’s esophagus to EA [13]. However, CNA regions are usually large and contain many genes, so it is difficult to distinguish driver genes from passengers by studying copy numbers alone [40]. Although more common driver genes were detected in this study than in [16], the discrepancy still implies the need for an integrated approach to identifying driver genes of EA that considers CNA, mutation, gene expression, and methylation together.

Table 8 Potential driver genes reported in previous studies and corresponding RCNAs detected in this study

In addition to the common regions, we found some novel ones, including four amplification regions and ten deletion regions that appear with statistically high frequency in the population. These regions may provide further clues to the cancer genomics of EA. In particular, SKI and PRKCZ in 1p36.33 have been reported to contribute to the loss of function of TGFBR2 and SMAD4 in cancer [41]. TGFBR2 and SMAD4 are involved in the transforming growth factor (TGF)-β pathway and were identified as driver genes in gastric cancer [42] and colorectal cancer [43]. The novel deletion event on Yq12 identified in our study, together with previously found deletion events on the X chromosome (e.g., Xp21.1 and Xp21.2), may help explain the greater incidence of EA in males over the past three decades. For example, the DMD gene in Xp21.1 was identified as a driver gene in gastric cancer [42], and our result suggests that it may also contribute to EA development.

The recurrently detected regions are likely to harbor “common mutations” that are of great interest in cancer studies. However, each tumor can also carry private driver mutations specific to that patient. To examine this, we compared the CNAs detected at the individual sample level (Tables 6 and 7) with the recurrent events (Tables 1 and 2). Only about 25.2 % of individual deletions overlapped the identified deletion RCNAs, and, more strikingly, only 10.2 % of the amplifications detected at the individual sample level overlapped the amplification RCNAs. For large-scale events, 88.0 % of individual deletions overlapped the recurrent deletion events, but only 35 % of individual amplifications overlapped the recurrent amplification events. These observations imply that a considerable number of driver mutations in a specific tumor sample are not located in the recurrent regions, and personalized studies are required to identify these rare events.

In our study, the median spans of recurrent amplification/deletion events are 1.0/6.6 Mb for those detected by WES (and possibly WGS) and 0.2/2.1 Mb for those identified only from WGS (Tables 1 and 2). We also detected more small individual CNAs by WGS (Table 7). Compared with WES, WGS thus appears more powerful for detecting small events, especially those residing mostly in non-coding regions. A limitation of this comparison is that only 15 samples with both WGS and WES data were available; a larger sample size in future studies should allow the performance of WES relative to WGS to be calibrated more precisely.

Conclusions

In this study, we detected RCNAs in EA using the NGS data from [11] and compared the results with those from previous microarray studies. The majority of the events detected in our study were also detected in those previous studies, and novel regions and genes were found using NGS technologies. We also carefully compared WGS and WES for detecting CNAs at the individual sample level. Large-scale segments were consistently detected by both platforms, whereas WGS detected more focal events. Importantly, recurrent events at the population level appear to be successfully identified by WES. Given that WES costs much less than WGS and that mutations detected by WES are much more interpretable, our study suggests that WES is a viable platform for detecting recurrent copy number events in EA research.

Methods

Esophageal adenocarcinoma cancer data

The NGS data, including both WGS and WES data, were generated in [11] and deposited in the database of Genotypes and Phenotypes (dbGaP) (study accession: phs000598.v1.p1). The dataset comprises 145 matched tumor-normal sample pairs; 15 of them have both WGS and WES data, and the remaining 130 have only WES data. The EA samples, including those from the gastroesophageal junction, were not treated with chemotherapy or radiation before surgery. The tumor samples were examined by a board-certified pathologist to ensure a carcinoma content of >70 %. The samples were sequenced on multiple Illumina HiSeq flow cells to an average target exome coverage of ~80× for WES, and on the Illumina Genome Analyzer IIx or the Illumina HiSeq sequencer to an average coverage depth of ~30× for WGS. Details of the sample collection, DNA extraction, and sequencing procedures can be found in [11].

The raw sequence data were downloaded from dbGaP using the NCBI SRA Toolkit, aligned to the NCBI build 37 (hg19) reference using BWA [44], and processed following the GATK best practices. The BAM files with recalibrated base quality scores were used for CNA detection.

CNA detection methods

Control-FREEC was applied to both the WGS and WES data. It divided the genome into small contiguous regions using sliding windows, computed the read count profiles of the normal and tumor samples in each region, and normalized them to account for GC content and mappability. The read count ratios of tumors to matched normal samples were used as proxies for the copy number ratios, and a LASSO-based algorithm was used to segment the data. LASSO is a widely used regularized regression method that penalizes the absolute size of the regression coefficients [45]. Using LASSO, the copy number ratios were modeled as a piecewise constant step profile, and the positions with nonzero coefficients were taken as change points. For WES data, the window size and step size were set to 500 bp and 250 bp, respectively, as recommended by the authors; for WGS data, they were set to 2000 bp and 1000 bp. Control-FREEC estimates the normal cell contamination in tumor samples by comparing the observed and predicted copy numbers, and it uses the Kolmogorov-Smirnov test to assess the false-positive rate of each detected CNA. Control-FREEC can predict absolute copy numbers if ploidy information is provided; we used ABSOLUTE [46] to estimate the ploidy of the 15 EA samples from the WES data, and the results are listed in the supplement. In this study, however, we classified the identified CNAs by their status (amplification or deletion) rather than by their absolute copy numbers. By default, Control-FREEC ignored genomic regions with mappability below 0.85; hence, we did not consider the effect of unmappable regions in this study.
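The step-profile LASSO described above (an L1 penalty on the jump sizes of a piecewise constant profile) is equivalent to one-dimensional total-variation denoising. The following pure-Python sketch solves that problem by projected gradient on its dual; it illustrates LASSO-style change-point segmentation and is not Control-FREEC's implementation. The penalty `lam`, the iteration count, the tolerance, and the function name `tv_segment` are illustrative assumptions.

```python
from typing import List, Tuple


def tv_segment(y: List[float], lam: float, n_iter: int = 5000,
               tol: float = 1e-4) -> Tuple[List[float], List[Tuple[int, int, float]]]:
    """Piecewise-constant fit minimizing 0.5*sum((y - theta)**2) + lam*sum(|theta[i] - theta[i-1]|).

    Equivalent to a LASSO on the jump sizes of a step profile. Solved by projected
    gradient on the dual variable u (|u_k| <= lam), using theta = y - D^T u.
    """
    n = len(y)
    u = [0.0] * max(n - 1, 0)

    def primal(u: List[float]) -> List[float]:
        # theta_i = y_i - (u_{i-1} - u_i), with u_{-1} = u_{n-1} = 0
        return [y[i] - ((u[i - 1] if i > 0 else 0.0) - (u[i] if i < n - 1 else 0.0))
                for i in range(n)]

    for _ in range(n_iter):
        theta = primal(u)
        # Dual gradient step of size 1/4 (the spectral norm of D D^T is below 4),
        # followed by projection onto the box [-lam, lam]
        u = [min(lam, max(-lam, u[k] + 0.25 * (theta[k + 1] - theta[k])))
             for k in range(n - 1)]
    theta = primal(u)

    # Change points: positions where the fitted level jumps by more than `tol`
    segments: List[Tuple[int, int, float]] = []
    start = 0
    for i in range(1, n + 1):
        if i == n or abs(theta[i] - theta[i - 1]) > tol:
            segments.append((start, i, sum(theta[start:i]) / (i - start)))
            start = i
    return theta, segments
```

For a noiseless step from 1.0 to 2.0 at position 10 of 20 and `lam = 1.0`, the fit has a single change point at position 10, with each level shrunk toward the other by `lam / 10` (giving 1.1 and 1.9). Larger penalties merge more segments, so the penalty trades sensitivity to small segments against spurious breakpoints.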

GISTIC2.0 was used to identify regions with a statistically high frequency of copy number aberrations relative to background aberrations, evaluating both the frequency and the significance of aberrations. The G score measured both the frequency of aberrations and the magnitude of the copy number changes (log ratio intensity) across samples, and each location was scored separately for gains and losses. The locations in each sample were then permuted to simulate random aberrations, and this random distribution was compared with the observed statistic to identify statistically significant scores. A false discovery rate (FDR) multiple testing correction was applied to calculate a q-bound significance score (q value). Within each statistically significant region, a peak region with a maximal G score and a minimal q value was identified as the region most likely to contain the affected genes. In addition to the q value, GISTIC2.0 computed a residual q value, which measured the significance of a peak region after removing events that overlap with other, more significant peak regions on the same chromosome. The 145 WES samples were segmented using the circular binary segmentation (CBS) algorithm [47] and combined into a single segmentation file, whereas the 15 WGS samples were segmented using Control-FREEC as described above. The parameter settings were as follows: amplification threshold = 0.1, deletion threshold = 0.1, broad length cutoff = 0.98, remove X-chromosome = 0, and confidence level = 0.95.
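The permutation idea can be illustrated with a simplified G score: at each marker, the sum over samples of the aberration amplitude exceeding the threshold (0.1, as in the settings above), which reflects both frequency and magnitude. This is a sketch in the spirit of GISTIC, not GISTIC2.0 itself; the full tool also performs peak definition and residual q-value steps not shown here. The within-sample shuffling, the add-one p-value estimate, the Benjamini-Hochberg correction, and all function names are assumptions made for illustration.

```python
import bisect
import random
from typing import List, Tuple


def g_scores(log_ratios: List[List[float]], threshold: float = 0.1,
             gain: bool = True) -> List[float]:
    """Per-marker sum of aberration amplitudes beyond `threshold` across samples."""
    scores = [0.0] * len(log_ratios[0])
    for sample in log_ratios:
        for j, x in enumerate(sample):
            amp = x if gain else -x  # deletions are scored on the negated log ratio
            if amp > threshold:
                scores[j] += amp
    return scores


def permutation_pvalues(log_ratios: List[List[float]], threshold: float = 0.1,
                        gain: bool = True, n_perm: int = 200,
                        seed: int = 0) -> Tuple[List[float], List[float]]:
    """Empirical p-values from shuffling marker positions independently within each sample."""
    rng = random.Random(seed)
    observed = g_scores(log_ratios, threshold, gain)
    null: List[float] = []
    for _ in range(n_perm):
        shuffled = [rng.sample(sample, len(sample)) for sample in log_ratios]
        null.extend(g_scores(shuffled, threshold, gain))
    null.sort()
    total = len(null)
    # Add-one estimate: (1 + #null scores >= observed) / (1 + #null scores)
    pvals = [(1 + total - bisect.bisect_left(null, g)) / (1 + total) for g in observed]
    return observed, pvals


def bh_qvalues(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted q values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q
```

With ten samples that all carry an amplification at the same marker and no other aberrations, that marker receives the highest G score and a small q value, while every other marker has q = 1.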

Whenever possible, default parameters and recommended settings were used in the implementation of these tools.

References

  1. Nowell P. The clonal evolution of tumor cell populations. Science. 1976;194(4260):23–8.

  2. Greaves M, Maley CC. Clonal evolution in cancer. Nature. 2012;481(7381):306–13.

  3. Beroukhim R, Mermel CH, Porter D, Wei G, Raychaudhuri S, Donovan J, et al. The landscape of somatic copy-number alteration across human cancers. Nature. 2010;463(7283):899–905.

  4. Zack TI, Schumacher SE, Carter SL, Cherniack AD, Saksena G, Tabak B, et al. Pan-cancer patterns of somatic copy number alteration. Nat Genet. 2013;45(10):1134–40.

  5. Greshock J, Feng B, Nogueira C, Ivanova E, Perna I, Nathanson K, et al. A comparison of DNA copy number profiling platforms. Cancer Res. 2007;67(21):10173–80.

  6. Carter NP. Methods and strategies for analyzing copy number variation using DNA microarrays. Nat Genet. 2007;39:S16–21.

  7. Alkan C, Coe BP, Eichler EE. Genome structural variation discovery and genotyping. Nat Rev Genet. 2011;12(5):363–76.

  8. Korbel JO, Urban AE, Affourtit JP, Godwin B, Grubert F, Simons JF, et al. Paired-end mapping reveals extensive structural variation in the human genome. Science. 2007;318(5849):420–6.

  9. Hurd PJ, Nelson CJ. Advantages of next-generation sequencing versus the microarray in epigenetic research. Brief Funct Genomic Proteomic. 2009;8(3):174–83.

  10. Chin L, Hahn WC, Getz G, Meyerson M. Making sense of cancer genomic data. Genes Dev. 2011;25(6):534–55.

  11. Dulak AM, Stojanov P, Peng S, Lawrence MS, Fox C, Stewart C, et al. Exome and whole genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity. Nat Genet. 2013;45(5):478–86.

  12. Stewart BW, Wild CP. World cancer report 2014. Lyon, France: IARC Nonserial Publication; 2014.

  13. Paulson TG, Maley CC, Li X, Li H, Sanchez CA, Chao DL, et al. Chromosomal instability and copy number alterations in Barrett’s esophagus and esophageal adenocarcinoma. Clin Cancer Res. 2009;15(10):3305–14.

  14. Dulak AM, Schumacher S, van Lieshout J, Imamura Y, Fox C, Shim B, et al. Gastrointestinal adenocarcinomas of the esophagus, stomach and colon exhibit distinct patterns of genome instability and oncogenesis. Cancer Res. 2012;72(17):4383–93.

  15. TCGA Genome Data Analysis Center. SNP6 copy number analysis (GISTIC2). TCGA Genome Data Analysis Center. 2014. Accessed 15 Aug 2015.

  16. Frankel A, Armour N, Nancarrow D, Krause L, Hayward N, Lampe G, et al. Genome-wide analysis of esophageal adenocarcinoma yields specific copy number aberrations that correlate with prognosis. Genes Chromosomes Cancer. 2014;53(4):324–38.

  17. Liu B, Morrison CD, Johnson CS, Trump DL, Qin M, Conroy JC, et al. Computational methods for detecting copy number variations in cancer genome using next generation sequencing: principles and challenges. Oncotarget. 2013;4(11):1868–81.

  18. Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012;22(3):568–76.

  19. Guo Y, Sheng Q, Samuels DC, Lehmann B, Bauer JA, Pietenpol J, et al. Comparative study of exome copy number variation estimation tools using array comparative genomic hybridization as control. Biomed Res Int. 2013;2013:915636.

  20. Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, et al. A survey of tools for variant analysis of next-generation genome sequencing data. Brief Bioinform. 2014;15(2):256–78.

  21. Zhao M, Wang Q, Wang Q, Jia P, Zhao Z. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives. BMC Bioinformatics. 2013;14(Suppl 11):S1.

  22. Alkodsi A, Louhimo R, Hautaniemi S. Comparative analysis of methods for identifying somatic copy number alterations from deep sequencing data. Brief Bioinform. 2015;16(2):242–54.

  23. Abyzov A, Urban AE, Snyder M, Gerstein M. CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 2011;21(6):974–84.

  24. Yoon S, Xuan Z, Makarov V, Ye K, Sebat J. Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Res. 2009;19(9):1586–92.

  25. Miller CA, Hampton O, Coarfa C, Milosavljevic A. ReadDepth: a parallel R package for detecting copy number alterations from short sequencing reads. PLoS ONE. 2011;6(1):e16327.

  26. Ivakhno S, Royce T, Cox AJ, Evers DJ, Cheetham RK, Tavaré S. CNAseg-a novel framework for identification of copy number changes in cancer from second-generation sequencing data. Bioinformatics. 2010;26(24):3051–58.

  27. Chiang DY, Getz G, Jaffe DB, O'Kelly MJT, Zhao X, Carter SL, et al. High-resolution mapping of copy-number alterations with massively parallel sequencing. Nat Methods. 2009;6(1):99–103.

  28. Sathirapongsasuti JF, Lee H, Horst BAJ, Brunner G, Cochran AJ, Binder S, et al. Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV. Bioinformatics. 2011;27(19):2648–54.

  29. Ha G, Roth A, Lai D, Bashashati A, Ding J, Goya R, et al. Integrative analysis of genome-wide loss of heterozygosity and monoallelic expression at nucleotide resolution reveals disrupted pathways in triple-negative breast cancer. Genome Res. 2012;22(10):1995–2007.

  30. Boeva V, Zinovyev A, Bleakley K, Vert JP, Janoueix-Lerosey I, Delattre O, et al. Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics. 2011;27(2):268–69.

  31. Boeva V, Popova T, Bleakley K, Chiche P, Cappo J, Schleiermacher G, et al. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics. 2012;28(3):423–25.

  32. Magi A, Benelli M, Yoon S, Roviello F, Torricelli F. Detecting common copy number variants in high-throughput sequencing data by using JointSLM algorithm. Nucleic Acids Res. 2011;39(10):e65.

  33. Klambauer G, Schwarzbauer K, Mayr A, Clevert DA, Mitterecker A, Bodenhofer U, et al. cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate. Nucleic Acids Res. 2012;40(9):e69.

  34. Yuan X, Zhang J, Zhang S, Yu G, Wang Y. Comparative analysis of methods for identifying recurrent copy number alterations in cancer. PLoS ONE. 2012;7(12):e52516.

  35. Diskin SJ, Eck T, Greshock J, Mosse YP, Naylor T, Stoeckert CJ, et al. STAC: a method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments. Genome Res. 2006;16(9):1149–58.

  36. Zhang Q, Ding L, Larson DE, Koboldt DC, McLellan MD, Chen K, et al. CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data. Bioinformatics. 2010;26(4):464–69.

  37. Mermel CH, Schumacher SE, Hill B, Meyerson ML, Beroukhim R, Getz G. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 2011;12(4):R41.

  38. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, et al. A census of human cancer genes. Nat Rev Cancer. 2004;4(3):177–83.

  39. Meynert AM, Ansari M, FitzPatrick DR, Taylor MS. Variant detection sensitivity and biases in whole genome and exome sequencing. BMC Bioinformatics. 2014;15(1):247.

  40. Akavia UD, Litvin O, Kim J, Sanchez-Garcia F, Kotliar D, Causton HC, et al. An integrated approach to uncover drivers of cancer. Cell. 2010;143(6):1005–17.

  41. Milacic M, Haw R, Rothfels K, Wu G, Croft D, Hermjakob H, et al. Annotating cancer variants and anti-cancer therapeutics in reactome. Cancers. 2012;4(4):1180–211.

  42. Wang K, Yuen ST, Xu J, Lee SP, Yan HH, Shi ST, et al. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer. Nat Genet. 2014;46(6):573–82.

  43. Fleming NI, Jorissen RN, Mouradov D, Christie M, Sakthianandeswaren A, Palmieri M, et al. SMAD2, SMAD3 and SMAD4 mutations in colorectal cancer. Cancer Res. 2013;73(2):725–35.

  44. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25(14):1754–60.

  45. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B. 1996;58(1):267–88.

  46. Carter SL, Cibulskis K, Helman E, McKenna A, Shen H, Zack T, et al. Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol. 2012;30(5):413–21.

  47. Seshan VE, Olshen A. DNAcopy: a package for analyzing DNA copy data. R package version 1.40.0. 2010. Accessed 15 Aug 2015.


This work was directly supported by the National Institutes of Health (R01HL114901 to J.Y.D. and X.W.; P01CA53996 to J.Y.D.).

Author information


Corresponding author

Correspondence to Xiaoyu Wang.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JD, XL, XW, and YC conceived of the ideas in this manuscript. XW wrote the code and ran the analysis. CK, JD, SS, XBS, XL, XS, XW, and YC wrote the manuscript. All authors read and approved the final manuscript.

Additional file

Additional file 1:

Supplementary tables. Table S1. Cancer genes in detected amplification regions. Table S2. Cancer genes in detected deletion regions. Table S3. Comparison of results of Frankel to our results. Table S4. Comparison of results of Paulson to our results. Table S5. Comparison of results of Beroukhim to our results. Table S6. Comparison of results of Zack to our results. Table S7. Ploidy information of the 15 EA samples. (XLSX 57.4 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Wang, X., Li, X., Cheng, Y. et al. Copy number alterations detected by whole-exome and whole-genome sequencing of esophageal adenocarcinoma. Hum Genomics 9, 22 (2015).


  • Bacterial Artificial Chromosome
  • Single Nucleotide Polymorphism
  • Esophageal Adenocarcinoma
  • Driver Gene
  • Copy Number Ratio