Viral expression associated with gastrointestinal adenocarcinomas in TCGA high-throughput sequencing data
© Salyakina and Tsinoremas; licensee BioMed Central Ltd. 2013
Received: 21 August 2013
Accepted: 7 November 2013
Published: 27 November 2013
Up to 20% of cancers worldwide are thought to be associated with microbial pathogens, including bacteria and viruses. The widely used methods of viral infection detection are usually limited to a few a priori suspected viruses in one cancer type. To our knowledge, there have not been many broad screening approaches to address this problem more comprehensively.
In this study, we performed a comprehensive screening for viruses in nine common cancers using a multistep computational approach. Tumor transcriptome and genome sequencing data were available from The Cancer Genome Atlas (TCGA). Nine hundred fifty eight primary tumors in nine common cancers with poor prognosis were screened against a non-redundant database of virus sequences. DNA sequences from normal matched tissue specimens were used as controls to test whether each virus is associated with tumors.
We identified human papilloma virus type 18 (HPV-18) and four human herpes viruses (HHV) types 4, 5, 6B, and 8, also known as EBV, CMV, roseola virus, and KSHV, in colon, rectal, and stomach adenocarcinomas. In total, 59% of screened gastrointestinal adenocarcinomas (GIA) were positive for at least one virus: 26% for EBV, 21% for CMV, 7% for HHV-6B, and 20% for HPV-18. Over 20% of tumors were co-infected with multiple viruses. Two viruses (EBV and CMV) were statistically significantly associated with colorectal cancers when compared to the matched healthy tissues from the same individuals (p = 0.02 and 0.03, respectively). HPV-18 was not detected in DNA, and thus, no association testing was possible. Nevertheless, HPV-18 expression patterns suggest viral integration in the host genome, consistent with the potentially oncogenic nature of HPV-18 in colorectal adenocarcinomas. The estimated counts of viral copies were below one per cell for all identified viruses and approached the detection limit.
Our comprehensive screening for viruses in multiple cancer types using next-generation sequencing data clearly demonstrates the presence of viral sequences in GIA. EBV, CMV, and HPV-18 are potentially causal for GIA, although their oncogenic role is yet to be established.
Keywords: Cancer; Papilloma virus; Herpes virus
Viruses may be more commonly associated with malignant diseases than previously considered. Reported associations do not always mean that a virus is a direct cause of the cancer; they can be the result of contamination, viral infection without causal involvement (‘passenger’), or an indirect or direct causal relationship. Regardless of the causal relationship, viruses may have significant clinical implications in human cancers by contributing to dramatic changes in the microenvironment and immunosurveillance.
The main strategies to detect and type various viruses in cancers usually address individual protein biomarkers, serological tests, or DNA/RNA detection of one or a few viruses at a time. The major disadvantage of these strategies is failure to detect viruses not previously known to be associated with a particular cancer type. In this report, we introduce a new and substantially different way of addressing this problem by utilizing next-generation sequencing (NGS) data to detect both human and non-human nucleic acids in tumor specimens. This approach does not require any prior knowledge of the viruses involved and can identify all known viral genomes. NGS provides the opportunity to detect viral transcripts with high sensitivity in the host tissue at frequencies of less than 1 RNA molecule in 1 million. Whole genome or transcriptome tumor sequencing data provide a unique resource for the development of new and powerful methodologies to detect and characterize viruses in cancers.
‘Computational subtraction’ is the general concept for detecting infectious agents in the host NGS material. During this procedure, human and artifact sequences are removed from the NGS data and the remaining sequences are aligned to bacterial or viral references from existing databases. A few groups have implemented computational subtraction procedures for this purpose [4, 5]. To date, NGS data have been successfully used for virus identification in human papilloma virus (HPV)-associated squamous cell carcinomas [6–9] and hepatitis B virus (HBV)-mediated hepatocellular carcinomas, while other cancer types and viruses remain largely unexplored.
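The subtraction idea can be illustrated with a deliberately simplified sketch. It is not the pipeline used in this study: exact substring matching stands in for a mismatch-tolerant aligner, and the function name and all toy sequences are hypothetical.

```python
from collections import Counter


def subtract_and_classify(reads, host_refs, artifact_refs, viral_refs):
    """Toy computational subtraction.

    Reads found in any host or artifact reference are discarded; the
    remaining reads are matched against each viral reference.
    Exact substring matching is an assumption made for brevity; a real
    pipeline would use an aligner that tolerates mismatches.
    """
    background = list(host_refs.values()) + list(artifact_refs.values())
    hits = Counter()
    unassigned = []
    for read in reads:
        if any(read in seq for seq in background):
            continue  # subtracted: explained by host or artifact sequence
        matches = [name for name, seq in viral_refs.items() if read in seq]
        if len(matches) == 1:
            hits[matches[0]] += 1
        else:
            unassigned.append(read)  # no match, or ambiguous across viruses
    return hits, unassigned


if __name__ == "__main__":
    host = {"toy_human": "ACGTACGTTTGACCAGT"}
    vector = {"toy_vector": "GGGGCCCCGGGG"}
    viruses = {"toy_virus_A": "TTTTAAAACCCGGGTA", "toy_virus_B": "CATCATCATGGA"}
    reads = ["ACGTTTG", "GGCCCC", "AAAACCCG", "CATGGA", "NNNNNNN"]
    counts, leftover = subtract_and_classify(reads, host, vector, viruses)
    print(dict(counts), leftover)
```

Reads that match more than one viral reference are left unassigned here, which mirrors the general practice of discarding ambiguous hits rather than guessing their origin.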
In this study, we present a comprehensive screening for viruses in NGS data of nine common cancers in 1,007 patients, using The Cancer Genome Atlas (TCGA) data. TCGA is a joint project of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). The goal of the TCGA project is to collect and systematically explore the entire spectrum of genomic changes involved in more than 20 types of human cancers. Comprehensive genomic characterization has been published for three out of nine cancers studied so far (lung squamous cell carcinoma, colon, and rectum adenocarcinoma) [10, 11]. Colon and rectum adenocarcinoma were shown to belong to one cancer type based on their molecular profiles. In addition to the screening for viruses, we perform association analysis of identified viruses with tumor vs. paired non-malignant tissue from the same patients in order to determine whether the presence of a virus is significantly associated with tumors and not the normal cell types.
Results and discussion
Screening for viruses
Sample size for available DNA and RNA sequencing data in primary tumors and paired control samples
AML, acute myeloid leukemia
COAD, colon adenocarcinoma
KIRK, kidney renal clear carcinoma
KIRP, kidney renal papillary cell carcinoma
LUAD, lung adenocarcinoma
LUSK, lung squamous cell carcinoma
READ, rectum adenocarcinoma
STAD, stomach adenocarcinoma
UCEC, uterine corpus endometrioid carcinoma
Total number of next-generation sequencing reads/fragments available for gastrointestinal cancers organized by cancer and tissue type
Human genome coveragea
Viruses in gastrointestinal adenocarcinomas
All viruses identified in GIA are ubiquitous in the human population. The first encounter with EBV, CMV, and HHV-6B usually happens in early childhood and results in latent lifelong infection with a prevalence of over 90% in adults [15–21]. Similarly, KSHV infects between 7.2% and 49% of the population, depending on the geographical region [22–24]. At the same time, HPV is the most common sexually transmitted infection, with a lifetime prevalence of 71%, although unlike herpes virus infections, about 90% of HPV infections are cleared within 2 years without any consequences.
Nominal p values for association testing of virus with clinical and demographic phenotypes
Age at initial diagnosis: 5.0E−05*
History of colon polyps: 3.0E−03
CMV is also capable of transforming mammalian cells through various pathways and has been linked to colorectal cancer, although available evidence is scarce. KSHV causes Kaposi’s sarcoma and, to our knowledge, has not yet been associated with gastric adenocarcinomas. This may be a special case of STAD that needs further investigation.
Co-infection with multiple viruses
A substantial proportion of GIA was co-infected with multiple viruses: 46 (23%) COAD, 11 (19.3%) STAD, and 9 (12.7%) READ. Figure 2 shows how many tumors were co-infected. These multiple viruses may either co-exist in the same cancer cells or populate different cell types that compose or infiltrate the tumor. Szostek et al. suggest that co-infection with HHVs, especially CMV and EBV, may increase the probability of HPV-16 integration into the host genome during cervical cancer tumorigenesis. Similar mechanisms may be involved with EBV, CMV, and HPV-18 in colorectal adenocarcinomas and need to be tested in future studies. Alternatively, some of the identified viruses may preferentially infect cancer cells, taking advantage of the impaired immune environment of the tumors.
The estimated viral load for EBV, CMV, and HHV-6B in GIA was less than one viral copy per cell (vc/c) in all cases, with a maximum of 0.72 vc/c (HHV-6B) (Additional file 1: Table S1). The latter is equivalent to one viral genome per 1.39 human cells. These data support the hypothesis that only a small proportion of tumor cells carried a virus. The viral DNA abundance for EBV and CMV correlated with the proportion of total viral RNA reads (Additional file 3). Because no genomic data were available for the KSHV-positive tumor and no HPV-18 was detectable in genomic DNA from tumors or normal cells, no viral load could be calculated for these viruses. The HPV-18 genome (7,857 nt for RefSeq ID: NC_001357) is 20 to 30 times smaller than the HHV genomes (162,114–235,646 nt). The HPV-18 DNA quantity was most likely below the detection limit at the available sequencing depth. The lowest detection threshold for NGS studies is one sequence read aligning to the target genome. As a result, the probability of detecting a viral sequence in the host NGS data will be proportional to the target sequence length, the viral load, and the sequencing depth. Given the HPV-18 genome size and the average genome sequencing depth of GIA, the hypothetical average detection limit would be above 0.0047 vc/c (standard deviation (SD) = 0.0023), equivalent to 1 virus in 261 human cells (SD = 133). This estimate does not take into account possible data loss due to disproportionate filtering of highly polymorphic viral sequences that exceed the accepted number of mismatches, or of homopolymeric and repetitive regions. Thus, the viral load is most likely underestimated here.
Currently, there is no clear consensus on the minimum viral load indicative of viral causality in neoplasms. On the one hand, viral genome abundance and active expression of viral oncogenes are broadly believed to indicate much greater viral involvement in disease than the silent presence of a viral genome. On the other hand, according to the ‘hit-and-run’ mechanism, transient acquisition of a viral genome may be sufficient to induce malignant conversion. In the hit-and-run scenario, viruses may be partially or completely lost after they cause permanent damage to the host cell and are no longer necessary for the maintenance of the malignant state. The most reliable estimates of viral load in the literature relate to HPV-associated tumors. Quantitative PCR experiments report several HPV-18 copies per cell [8, 47]. However, Yoshida et al. suggested that very early HPV-18 DNA integration may result in lower copy numbers in cervical adenosquamous carcinoma (1.50–0.89 vc/c), leading to a more aggressive transformation with greater chromosomal instabilities, higher growth rates, and rapid progression.
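The dependence of detectability on genome length, viral load, and sequencing depth discussed in this section can be made concrete with a small calculation. The sketch below is an illustration, not the authors' procedure. It assumes that the expected number of viral reads equals viral load × per-cell human coverage × viral genome length / read length, and it adds a Poisson model for read sampling that is not part of the text. Function names and the example coverage value are hypothetical.

```python
import math


def expected_viral_reads(load_vcc, human_coverage, viral_genome_nt, read_nt=51):
    """Expected number of reads from a virus present at load_vcc copies per cell.

    Assumption: human_coverage refers to the haploid reference, so one
    diploid cell corresponds to human_coverage / 2.
    """
    per_cell_coverage = human_coverage / 2.0
    return load_vcc * per_cell_coverage * viral_genome_nt / read_nt


def detection_probability(load_vcc, human_coverage, viral_genome_nt, read_nt=51):
    """P(at least one viral read), assuming Poisson-distributed read counts."""
    lam = expected_viral_reads(load_vcc, human_coverage, viral_genome_nt, read_nt)
    return 1.0 - math.exp(-lam)


def one_read_limit(human_coverage, viral_genome_nt, read_nt=51):
    """Viral load (vc/c) at which exactly one viral read is expected."""
    return (read_nt / viral_genome_nt) / (human_coverage / 2.0)


if __name__ == "__main__":
    coverage = 2.0  # hypothetical human genome coverage
    # Genome lengths are the values quoted in the text.
    for name, size in (("HPV-18", 7857), ("HHV (shortest)", 162114)):
        print(name,
              one_read_limit(coverage, size),
              detection_probability(0.005, coverage, size))
```

Under these assumptions, a genome roughly 20 times longer reaches the one-read threshold at a viral load roughly 20 times lower, which is the proportionality argument made for HPV-18 versus the herpes viruses.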
Virus association with tumors
Counts of colorectal samples, positive (+) or negative (−) for identified viruses, within tumor’s DNA/RNA pairs
Tumor DNA (N = 117)
CI for two one-sided hypotheses (CL = 0.975)
Counts of colorectal samples, positive (+) or negative (−) for identified viruses, in matched tumor/normal specimen pairs
Matched control DNA N = 111 2 (blood:solid)
Adjusted p values for association
Tumors vs. all controls
Tumors vs. blood
Tumors vs. solid tissue
Virus integration into the human genome
For the remaining viruses detected in available genomic data, the number of identified reads was not sufficient for integration site detection (see Figure 3 and Additional file 2). Neither the viral nor the human genomes were covered without substantial gaps. As shown in Table 2, the median coverage of the human genome in colorectal samples was below 2x, and for the great majority of viruses with available whole genome sequencing data, only a small fraction of the genome was covered.
Our results clearly demonstrate the presence of viral sequences in GIA. EBV and CMV were statistically significantly associated with colorectal adenocarcinomas (CRAD). In addition, the expression pattern of HPV-18 was consistent with the genomic integration typical during oncogenesis. This supports the hypothesis that EBV, CMV, and HPV-18 are potentially oncogenic in GIA, although further studies are needed before a conclusion can be drawn about the pathophysiological role of the identified viruses in GIA. No viruses were identified in the remaining six cancer types.
Our results demonstrate the feasibility of NGS for the identification of viruses at very low levels in human tissue. Unlike PCR-based approaches, NGS data offer a unique opportunity to capture any viral nucleic acids present in the sample above the detection limit. Identification of viral infection is a first step in determining the role of viruses in cancer. The availability of comprehensive viral databases makes it possible to scan for a large number of candidates without the need for de novo assembly, with the restriction that novel viruses will not be detected.
Finally, in this study, we established an empirical detection limit for our computational pipeline. This information can be used in similar studies to calculate the required sequencing depth and the amount of material needed for a given viral genome size and expected viral load.
Whole transcriptome sequencing data for nine cancer types, comprising 1,007 patients, were obtained through TCGA (accessed on October 17, 2011). Table 1 summarizes specimen counts and abbreviations for the nine included cancer types. Additional transcriptome and whole genome sequencing data for three GIA were downloaded on November 1, 2012. Sequencing has been done using Illumina (Solexa, GAII, Illumina, Inc., San Diego, CA, USA) or SOLiD™ technology (Life Technologies, Carlsbad, CA, USA). A detailed description of the TCGA data can be found on the following TCGA websites: http://cancergenome.nih.gov/, https://tcga-data.nci.nih.gov/tcga/, as well as in two recently published studies on genomic characterization of three out of nine cancer types discussed here [10, 11]. Patient enrollment and utilization of data were conducted in accordance with TCGA human subjects protection and data access policies (http://cancergenome.nih.gov/PublishedContent/Files/pdfs/6.3.1_TCGA_Human_Subjects_and_Data_Access_policies_FINAL_011211.pdf).
Cancers selected for the TCGA study were chosen based on specific criteria that included (1) poor prognosis and high public health impact, and (2) availability of human tumor and matched normal tissue that meet TCGA standards for patient consent, quality, and quantity. The proportion of 60% tumor nuclei in the specimens was found to be sufficient by TCGA project organizers to generate high-quality data, in which the tumor’s signal can be distinguished from other cells’ signals when using NGS. Only primary, untreated tumors were collected. Samples were frozen quickly after surgery in order to prevent degradation of the RNA and DNA.
Whole genome sequencing data was available for 37.5% of the GIA (Table 1). Blood or germ line specimens derived from the same individual as the tumor specimens were used in the TCGA study to serve as paired normal controls when available (Table 1). DNA sequencing data from the blood or adjacent healthy tissue was available for 35% of the GIA specimens.
Transcriptome data (BAM files) generated by TCGA for a total of 1,007 cancer specimens were analyzed in an automated fashion on a computational cluster hosted by the High-Performance Computing core at the Center for Computational Science, University of Miami (http://ccs.miami.edu/). An IBM BladeCenter cluster was available for compute-intensive data analysis. The cluster, named Pegasus and running the Linux operating system, consisted of 280 computing nodes, each with 8 Xeon 2.6 GHz cores and 16 GB of memory, and 700 computing nodes, each with 4 Opteron 2.2 GHz cores and 4 GB of memory. These nodes are interconnected by Gigabit Ethernet and share a 21 TB NFS file system; together they provide an aggregate of 5,040 cores and 7.3 TB of memory. All computational tasks were submitted in parallel to the LSF job scheduler and resource management system. The computational pipeline is outlined in Figure 1. In total, 1,156 jobs were submitted to the cluster for steps I–V for DNA-seq and RNA-seq data; 25,245 CPU hours were used for data analysis.
Bamtools-1.0.2 and samtools-0.1.18 software were employed for converting data formats. Sequencing reads with phred-like quality scores q > 30 were utilized. TopHat (v.2.0.0) was consistently used for all transcriptome mapping steps. Multiple threads were used during alignment with option -p 8. When subtracting bacterial and viral sequences, we allowed TopHat to tolerate up to four mismatches per read, instead of the default of two in the alignment step, to allow for potentially higher mismatch rates due to mutations or imperfect match to the reference sequence. In addition, TopHat was instructed to use a .gtf file and not to look for novel transcript junctions by utilizing the ‘--no-novel-juncs’ flag. We combined reference fasta files into ‘supergenomes’ for vector sequences, bacterial genomes, and viral genomes for steps II, III, and IV, respectively (Figure 1). Each individual reference sequence in the ‘supergenome’ was treated as a chromosome. Reference files were indexed before the alignment steps. The bacterial reference file had to be split into two parts to reduce the memory needed for the indexing and mapping steps.
Even a single short read aligning to a viral reference was considered a successful detection if subsequent BLAST analysis against the NCBI nucleotide (nt) collection confirmed sequence similarity with the target of over 98% and at least one transcriptome in the cohort had more than 10 reads mapped to the viral genome reference. Sequences aligning to multiple organisms, known artificial (vector) sequences, or low-complexity sequences were considered false positives and removed.
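The acceptance rule for a viral detection combines a per-read check with a cohort-level check. A minimal sketch of that logic follows, assuming each candidate read carries a BLAST percent identity and that per-sample mapped read counts are available for each virus. The data structures, names, and example values are hypothetical and only illustrate the two criteria.

```python
def detected_viruses(read_hits, cohort_counts, min_identity=98.0, min_cohort_reads=10):
    """Return {sample: set of viruses} passing both acceptance criteria.

    read_hits: iterable of (sample, virus, blast_identity_percent), one entry
        per read that aligned to a viral reference.
    cohort_counts: {virus: {sample: mapped_read_count}} for the whole cohort.

    A virus is called in a sample if at least one of its reads has BLAST
    identity above min_identity AND at least one sample in the cohort has
    more than min_cohort_reads reads mapped to that virus.
    """
    supported = {
        virus
        for virus, per_sample in cohort_counts.items()
        if any(n > min_cohort_reads for n in per_sample.values())
    }
    calls = {}
    for sample, virus, identity in read_hits:
        if identity > min_identity and virus in supported:
            calls.setdefault(sample, set()).add(virus)
    return calls


if __name__ == "__main__":
    hits = [("s1", "EBV", 99.5), ("s1", "CMV", 97.0),
            ("s2", "HPV-18", 100.0), ("s2", "EBV", 99.0)]
    cohort = {"EBV": {"s1": 1, "s2": 25}, "CMV": {"s1": 40}, "HPV-18": {"s2": 3}}
    print(detected_viruses(hits, cohort))
```

In this toy input, sample s1 is called EBV-positive from a single read because another sample in the cohort supplies the cohort-level support, which is the situation the rule is designed to allow.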
Only three gastrointestinal cancer types, stomach (STAD), rectum (READ), and colon adenocarcinomas (COAD), which tested virus positive on the transcriptome level, proceeded to the whole genome analysis step (Table 1). Burrows-Wheeler Aligner (BWA, v.0.5.9) with default options was consistently used for genomic data alignment. Multiple threads were used when running BWA with option -t 4. Genomic read subtraction was performed in exactly the same fashion as described above for transcriptomes (Figure 1). In order to determine computational pipeline sensitivity to single nucleotide mismatches, sequence reads on EBV and HPV-18 with different mutation/mismatch rates and lengths were simulated and run through the computational subtraction and alignment steps (see Additional files 1 and 4).
Consensus sequence for EBV
EBV reference genomes include two very similar strains, HHV-4.1 (NC_007605) and HHV-4.2 (NC_009334). In order to capture as many reads as possible, we created a consensus sequence from both genome references and used it in place of the two original strain references in the computational pipeline.
Since available data comprised cases-only DNA/RNA and malignant/normal tissue pairs, a simple McNemar test implemented in R library ‘exact2x2’ (v1.1-1.0) was used for both equivalence and association testing. Equivalence testing for virus identification in DNA and RNA was performed using two one-sided exact McNemar tests with a confidence level of 0.975 at α = 0.05 (Bonferroni-corrected for four viruses). The null hypothesis of equivalence was rejected when, in at least one of the one-sided tests, the confidence interval (CI) did not include ‘1’. Association of virus with tumor vs. normal tissue was tested with a two-sided exact McNemar test. Since COAD and READ have been shown to belong to one cancer type based on their molecular profiles, we combined the two cohorts for association analysis to achieve a higher sample size. Association testing of virus presence in tumor/normal tissue was done using blood and solid tissue controls in separate tests, as well as blood and solid tissue combined. Bonferroni correction was done for nine tests (three types of control groups multiplied by three viruses identified in the whole genome data). Fisher’s exact test was used for association testing with clinicopathological and demographic variables. Age at initial diagnosis in virus-positive and virus-negative groups was compared using ANOVA. Bonferroni correction was applied.
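For paired presence/absence calls, the exact McNemar test reduces to a two-sided binomial test on the discordant pairs. The sketch below illustrates that calculation and a Bonferroni adjustment using only the Python standard library. It is not the ‘exact2x2’ implementation used in the study, and the counts in the example are invented for illustration.

```python
from math import comb


def exact_mcnemar_p(b, c):
    """Two-sided exact McNemar p value.

    b: pairs positive in tumor only; c: pairs positive in control only.
    Under the null hypothesis, b ~ Binomial(b + c, 0.5); concordant
    pairs do not contribute to the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def bonferroni(p_values, n_tests=None):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical discordant counts: 9 tumor-only and 1 control-only positives.
    p = exact_mcnemar_p(9, 1)
    print(p, bonferroni([p], n_tests=9))
```

Because only discordant pairs enter the test, a virus found equally often in tumor and matched control from the same individual yields no evidence of association, however prevalent it is.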
Estimation of viral load
where i is the species, R is the number of reads, L = 51 is the average read length in nucleotides, and G is the corresponding genome length in nucleotides. For the diploid human genome, C = C_H/2.
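The variable definitions above are consistent with computing genome coverage as reads × read length / genome length for each species and dividing viral coverage by the per-cell human coverage. A minimal sketch under that assumption follows; the exact form of the calculation is inferred from the definitions rather than quoted, and all read counts and genome sizes in the example are hypothetical.

```python
READ_LENGTH_NT = 51  # average read length L given in the text


def coverage(reads, genome_nt, read_nt=READ_LENGTH_NT):
    """Average depth for one species: reads * read length / genome length."""
    return reads * read_nt / genome_nt


def viral_copies_per_cell(viral_reads, viral_genome_nt,
                          human_reads, human_genome_nt,
                          read_nt=READ_LENGTH_NT):
    """Estimated viral load in viral copies per cell (vc/c).

    Assumption: viral load is viral coverage divided by human coverage
    per diploid cell, i.e. human coverage / 2.
    """
    c_virus = coverage(viral_reads, viral_genome_nt, read_nt)
    c_human = coverage(human_reads, human_genome_nt, read_nt)
    c_cell = c_human / 2.0
    return c_virus / c_cell


if __name__ == "__main__":
    # Hypothetical toy numbers chosen for easy arithmetic.
    load = viral_copies_per_cell(viral_reads=10, viral_genome_nt=510,
                                 human_reads=200, human_genome_nt=5100)
    print(load)
```

Note that the read length cancels in the ratio, so under this reading the estimate depends only on the read counts and the two genome lengths.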
AML: Acute myeloid leukemia
HHV: Human herpes virus
HPV-18: Human papilloma virus type 18
KIRK: Kidney renal clear carcinoma
KIRP: Kidney renal papillary cell carcinoma
KSHV: Kaposi’s sarcoma-associated herpes virus
LUSK: Lung squamous cell carcinoma
TCGA: The Cancer Genome Atlas
UCEC: Uterine corpus endometrioid carcinoma
vc/c: Viral copies per cell.
We would like to thank TCGA project organizers as well as all study participants. This work was supported by the Center for Computational Science (CCS), University of Miami and partially by a grant (1R03CA171052-01A1) from National Cancer Institute (NCI). We also thank the High-Performance Computing team at the Center for Computational Science, University of Miami, especially Joel P. Zysman, John Baringer, Pedro Davila, and Zongjun Hu for technical support, as well as Dr. Enrique A. Mesri (Microbiology and Immunology Department) and Dr. Jennifer Clarke (Epidemiology Department) for their intellectual contribution to the manuscript. Finally, we would also like to thank Camilo Valdes (CCS) for assistance with the simulations.
- Zur Hausen H: The search for infectious causes of human cancers: where and why. Virology. 2009, 392 (1): 1-10. 10.1016/j.virol.2009.06.001.
- Moore RA, Warren RL, Freeman JD, Gustavsen JA, Chenard C, Friedman JM, Suttle CA, Zhao YJ, Holt RA: The sensitivity of massively parallel sequencing for detecting candidate infectious agents associated with human tissue. PLoS One. 2011, 6 (5): e19838. 10.1371/journal.pone.0019838.
- Weber G, Shendure J, Tanenbaum DM, Church GM, Meyerson M: Identification of foreign gene sequences by transcript filtering against the human genome. Nature Genetics. 2002, 30 (2): 141-142.
- Kostic AD, Ojesina AI, Pedamallu CS, Jung J, Verhaak RG, Getz G, Meyerson M: PathSeq: software to identify or discover microbes by deep sequencing of human tissue. Nature Biotechnology. 2011, 29 (5): 393-396. 10.1038/nbt.1868.
- Chen Y, Yao H, Thompson EJ, Tannir NM, Weinstein JN, Su X: VirusSeq: software to identify viruses and their integration sites using next-generation sequencing of human cancer tissue. Bioinformatics. 2013, 29 (2): 266-267. 10.1093/bioinformatics/bts665.
- Agrawal N, Frederick MJ, Pickering CR, Bettegowda C, Chang K, Li RJ, Fakhry C, Xie TX, Zhang J, Wang J, et al: Exome sequencing of head and neck squamous cell carcinoma reveals inactivating mutations in NOTCH1. Science. 2011, 333 (6046): 1154-1157. 10.1126/science.1206923.
- Stransky N, Egloff AM, Tward AD, Kostic AD, Cibulskis K, Sivachenko A, Kryukov GV, Lawrence MS, Sougnez C, McKenna A, et al: The mutational landscape of head and neck squamous cell carcinoma. Science. 2011, 333 (6046): 1157-1160. 10.1126/science.1208130.
- Arron ST, Ruby JG, Dybbro E, Ganem D, DeRisi JL: Transcriptome sequencing demonstrates that human papillomavirus is not active in cutaneous squamous cell carcinoma. J Invest Dermatol. 2011, 131 (8): 1745-1753. 10.1038/jid.2011.91.
- Barzon L, Militello V, Lavezzo E, Franchin E, Peta E, Squarzon L, Trevisan M, Pagni S, Dal Bello F, Toppo S, Palù G: Human papillomavirus genotyping by 454 next generation sequencing technology. J Clin Virol. 2011, 52 (2): 93-97. 10.1016/j.jcv.2011.07.006.
- TCGARN: Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012, 489 (7417): 519-525. 10.1038/nature11404.
- TCGARN: Comprehensive molecular characterization of human colon and rectal cancer. Nature. 2012, 487 (7407): 330-337. 10.1038/nature11252.
- De Paoli P, Carbone A: Carcinogenic viruses and solid cancers without sufficient evidence of causal association. International Journal of Cancer Journal International du Cancer. 2013, 133 (7): 1517-1529. 10.1002/ijc.27995.
- Grulich AE, van Leeuwen MT, Falster MO, Vajdic CM: Incidence of cancers in people with HIV/AIDS compared with immunosuppressed transplant recipients: a meta-analysis. Lancet. 2007, 370 (9581): 59-67. 10.1016/S0140-6736(07)61050-2.
- Khoury JD, Tannir NM, Williams MD, Chen Y, Yao H, Zhang J, Thompson EJ, Meric-Bernstam F, Medeiros LJ, Weinstein JN, Su X: Landscape of DNA virus associations across human malignant cancers: analysis of 3,775 cases using RNA-Seq. Journal of Virology. 2013, 87 (16): 8916-8926. 10.1128/JVI.00340-13.
- Takeuchi K, Tanaka-Taya K, Kazuyama Y, Ito YM, Hashimoto S, Fukayama M, Mori S: Prevalence of Epstein-Barr virus in Japan: trends and future prediction. Pathol Int. 2006, 56 (3): 112-116. 10.1111/j.1440-1827.2006.01936.x.
- Linton MS, Kroeker K, Fedorak D, Dieleman L, Fedorak RN: Prevalence of Epstein-Barr Virus in a population of patients with inflammatory bowel disease: a prospective cohort study. Aliment Pharmacol Ther. 2013, 38: 1248-1254. 10.1111/apt.12503.
- Luzuriaga K, Sullivan JL: Infectious mononucleosis. The New England Journal of Medicine. 2010, 362 (21): 1993-2000. 10.1056/NEJMcp1001116.
- Lubeck PR, Doerr HW, Rabenau HF: Epidemiology of human cytomegalovirus (HCMV) in an urban region of Germany: what has changed?. Med Microbiol Immunol. 2010, 199 (1): 53-60. 10.1007/s00430-009-0136-3.
- Lopo S, Vinagre E, Palminha P, Paixao MT, Nogueira P, Freitas MG: Seroprevalence to cytomegalovirus in the Portuguese population, 2002–2003. Euro Surveill. 2011, 16 (25): 5-
- Staras SA, Dollard SC, Radford KW, Flanders WD, Pass RF, Cannon MJ: Seroprevalence of cytomegalovirus infection in the United States, 1988–1994. Clin Infect Dis. 2006, 43 (9): 1143-1151. 10.1086/508173.
- Oren I, Sobel JD: Human herpesvirus type 6: review. Clin Infect Dis. 1992, 14 (3): 741-746. 10.1093/clinids/14.3.741.
- Butler LM, Were WA, Balinandi S, Downing R, Dollard S, Neilands TB, Gupta S, Rutherford GW, Mermin J: Human herpesvirus 8 infection in children and adults in a population-based study in rural Uganda. The Journal of Infectious Diseases. 2011, 203 (5): 625-634. 10.1093/infdis/jiq092.
- Qu L, Jenkins F, Triulzi DJ: Human herpesvirus 8 genomes and seroprevalence in United States blood donors. Transfusion. 2010, 50 (5): 1050-1056. 10.1111/j.1537-2995.2009.02559.x.
- Baillargeon J, Leach CT, Deng JH, Gao SJ, Jenson HB: High prevalence of human herpesvirus 8 (HHV-8) infection in south Texas children. Journal of Medical Virology. 2002, 67 (4): 542-548. 10.1002/jmv.10136.
- CDC: Incidence, Prevalence, and Cost of Sexually Transmitted Infections in the United States. 2013, Atlanta: Centers for Disease Control and Prevention
- Uozaki H, Fukayama M: Epstein-Barr virus and gastric carcinoma–viral carcinogenesis through epigenetic mechanisms. Int J Clin Exp Pathol. 2008, 1 (3): 198-216.
- Camargo MC, Murphy G, Koriyama C, Pfeiffer RM, Kim WH, Herrera-Goepfert R, Corvalan AH, Carrascal E, Abdirad A, Anwar M, Hao Z, Kattoor J, Yoshiwara-Wakabayashi E, Eizuru Y, Rabkin CS, Akiba S: Determinants of Epstein-Barr virus-positive gastric cancer: an international pooled analysis. British Journal of Cancer. 2011, 105 (1): 38-43. 10.1038/bjc.2011.215.
- Fukayama M, Ushiku T: Epstein-Barr virus-associated gastric carcinoma. Pathol Res Pract. 2011, 207 (9): 529-537. 10.1016/j.prp.2011.07.004.
- Marquitz AR, Mathur A, Shair KH, Raab-Traub N: Infection of Epstein-Barr virus in a gastric carcinoma cell line induces anchorage independence and global changes in gene expression. Proc Natl Acad Sci U S A. 2012, 109 (24): 9593-9598. 10.1073/pnas.1202910109.
- Ryan JL, Jones RJ, Kenney SC, Rivenbark AG, Tang W, Knight ER, Coleman WB, Gulley ML: Epstein-Barr virus-specific methylation of human genes in gastric cancer cells. Infect Agent Cancer. 2010, 5: 27. 10.1186/1750-9378-5-27.
- Yuen ST, Chung LP, Leung SY, Luk IS, Chan SY, Ho J: In situ detection of Epstein-Barr virus in gastric and colorectal adenocarcinomas. The American Journal of Surgical Pathology. 1994, 18 (11): 1158-1163. 10.1097/00000478-199411000-00010.
- Grinstein S, Preciado MV, Gattuso P, Chabay PA, Warren WH, De Matteo E, Gould VE: Demonstration of Epstein-Barr virus in carcinomas of various sites. Cancer Research. 2002, 62 (17): 4876-4878.
- Karpinski P, Myszka A, Ramsey D, Kielan W, Sasiadek MM: Detection of viral DNA sequences in sporadic colorectal cancers in relation to CpG island methylation and methylator phenotype. Tumour Biol. 2011, 32 (4): 653-659. 10.1007/s13277-011-0165-6.
- Park JM, Choi MG, Kim SW, Chung IS, Yang CW, Kim YS, Jung CK, Lee KY, Kang JH: Increased incidence of colorectal malignancies in renal transplant recipients: a case control study. Am J Transplant. 2010, 10 (9): 2043-2050. 10.1111/j.1600-6143.2010.03231.x.
- Song LB, Zhang X, Zhang CQ, Zhang Y, Pan ZZ, Liao WT, Li MZ, Zeng MS: Infection of Epstein-Barr virus in colorectal cancer in Chinese. Ai Zheng. 2006, 25 (11): 1356-1360.
- Giuliani L, Ronci C, Bonifacio D, Di Bonito L, Favalli C, Perno CF, Syrjanen K, Ciotti M: Detection of oncogenic DNA viruses in colorectal cancer. Anticancer Res. 2008, 28 (2B): 1405-1410.
- Chen TH, Huang CC, Yeh KT, Chang SH, Chang SW, Sung WW, Cheng YW, Lee H: Human papilloma virus 16 E6 oncoprotein associated with p53 inactivation in colorectal cancer. World J Gastroenterol. 2012, 18 (30): 4051-4058. 10.3748/wjg.v18.i30.4051.
- Salepci T, Yazici H, Dane F, Topuz E, Dalay N, Onat H, Aykan F, Seker M, Aydiner A: Detection of human papillomavirus DNA by polymerase chain reaction and southern blot hybridization in colorectal cancer patients. J Buon. 2009, 14 (3): 495-499.
- Lee YM, Leu SY, Chiang H, Fung CP, Liu WT: Human papillomavirus type 18 in colorectal cancer. Journal of Microbiology, Immunology, and Infection = Wei mian yu gan ran za zhi. 2001, 34 (2): 87-91.
- Bodaghi S, Yamanegi K, Xiao SY, Da Costa M, Palefsky JM, Zheng ZM: Colorectal papillomavirus infection in patients with colorectal cancer. Clinical Cancer Research. 2005, 11 (8): 2862-2867. 10.1158/1078-0432.CCR-04-1680.
- Damin DC, Caetano MB, Rosito MA, Schwartsmann G, Damin AS, Frazzon AP, Ruppenthal RD, Alexandre CO: Evidence for an association of human papillomavirus infection and colorectal cancer. Eur J Surg Oncol. 2007, 33 (5): 569-574. 10.1016/j.ejso.2007.01.014.
- Collins D, Hogan AM, Winter DC: Microbial and viral pathogens in colorectal cancer. The Lancet Oncology. 2011, 12 (5): 504-512. 10.1016/S1470-2045(10)70186-8.
- Chen HP, Jiang JK, Chen CY, Chou TY, Chen YC, Chang YT, Lin SF, Chan CH, Yang CY, Lin CH, Lin JK, Cho WL, Chan YJ: Human cytomegalovirus preferentially infects the neoplastic epithelium of colorectal cancer: a quantitative and histological analysis. J Clin Virol. 2012, 54 (3): 240-244. 10.1016/j.jcv.2012.04.007.
- Zur Hausen H: Oncogenic DNA viruses. Oncogene. 2001, 20 (54): 7820-7823. 10.1038/sj.onc.1204958.
- Szostek S, Zawilinska B, Kopec J, Kosz-Vnenchak M: Herpesviruses as possible cofactors in HPV-16-related oncogenesis. Acta Biochim Pol. 2009, 56 (2): 337-342.
- Niller HH, Wolf H, Minarovits J: Viral hit and run-oncogenesis: genetic and epigenetic scenarios. Cancer Letters. 2011, 305 (2): 200-217. 10.1016/j.canlet.2010.08.007.
- Yoshida T, Sano T, Oyama T, Kanuma T, Fukuda T: Prevalence, viral load, and physical status of HPV 16 and 18 in cervical adenosquamous carcinoma. Virchows Arch. 2009, 455 (3): 253-259. 10.1007/s00428-009-0823-x.
- Lee JH, Kim SH, Han SH, An JS, Lee ES, Kim YS: Clinicopathological and molecular characteristics of Epstein-Barr virus-associated gastric carcinoma: a meta-analysis. J Gastroenterol Hepatol. 2009, 24 (3): 354-365. 10.1111/j.1440-1746.2009.05775.x.
- Mori Y: Recent topics related to human herpesvirus 6 cell tropism. Cell Microbiol. 2009, 11 (7): 1001-1006. 10.1111/j.1462-5822.2009.01312.x.
- Zur Hausen H: Papillomaviruses and cancer: from basic studies to clinical application. Nature Reviews Cancer. 2002, 2 (5): 342-350. 10.1038/nrc798.
- Barnett DW, Garrison EK, Quinlan AR, Stromberg MP, Marth GT: BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics. 2011, 27 (12): 1691-1692. 10.1093/bioinformatics/btr174.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25 (16): 2078-2079. 10.1093/bioinformatics/btp352.
- Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L: Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012, 7 (3): 562-578. 10.1038/nprot.2012.016.
- Sanjuan R, Nebot MR, Chirico N, Mansky LM, Belshaw R: Viral mutation rates. Journal of Virology. 2010, 84 (19): 9733-9748. 10.1128/JVI.00694-10.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
- Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25 (14): 1754-1760. 10.1093/bioinformatics/btp324.
- Fay MP: Confidence intervals that match Fisher’s exact or Blaker’s exact tests. Biostatistics. 2010, 11 (2): 373-374. 10.1093/biostatistics/kxp050.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.