Functional nsSNPs from carcinogenesis-related genes expressed in breast tissue: Potential breast cancer risk alleles and their distribution across human populations

Although highly penetrant alleles of BRCA1 and BRCA2 have been shown to predispose to breast cancer, the majority of breast cancer cases are assumed to result from the presence of low-moderate penetrant alleles and environmental carcinogens. Non-synonymous single nucleotide polymorphisms (nsSNPs) are hypothesised to contribute to disease susceptibility and approximately 30 per cent of them are predicted to have a biological significance. In this study, we have applied a bioinformatics-based strategy to identify breast cancer-related nsSNPs from 981 carcinogenesis-related genes expressed in breast tissue. Our results revealed a total of 367 validated nsSNPs, 109 (29.7 per cent) of which are predicted to affect the protein function (functional nsSNPs), suggesting that these nsSNPs are likely to influence the development and homeostasis of breast tissue and hence contribute to breast cancer susceptibility. Sixty-seven of the functional nsSNPs presented as commonly occurring nsSNPs (minor allele frequencies ≥ 5 per cent), representing excellent candidates for breast cancer susceptibility. Additionally, a non-uniform distribution of the common functional nsSNPs among different human populations was observed: 15 nsSNPs were reported to be present in all populations analysed, whereas another set of 15 nsSNPs was specific to particular population(s). We propose that the nsSNPs analysed in this study constitute a unique resource of potential genetic factors for breast cancer susceptibility. Furthermore, the variations in functional nsSNP allele frequencies across major population backgrounds may point to the potential variability of the molecular basis of breast cancer predisposition and treatment response among different human populations.


Introduction
Mutations of BRCA1 [1] and BRCA2 [2] confer high breast cancer risk to the carriers. Such highly penetrant mutations are responsible for only a small fraction (approximately 5-10 per cent) of all breast cancer cases, [3,4] however, suggesting the presence of other, yet to be identified, mutations in other breast cancer predisposition genes. [5][6][7] Mutations in a number of genes, such as p53, [8] ATM [6] and Chek2, [9] have also been shown to contribute to breast cancer risk in a very small fraction of breast cancer cases. So far, no other high-penetrant breast cancer susceptibility gene has been identified; however, genetic variations including single nucleotide polymorphisms (SNPs) have been hypothesised to act as low-moderate penetrant alleles and contribute to breast cancer, as well as other complex diseases. [7][10][11][12] Variations in protein sequence and function are mainly due to the non-synonymous form of SNPs (nsSNPs). The fraction of nsSNPs in the genome is relatively low (approximately 10 per cent of all coding SNPs), [13] but they are more likely to alter the structure, function and interaction of the proteins, and thus constitute a set of candidate genetic factors associated with disease predisposition. [14,15] Approximately 30 per cent of the nsSNPs are predicted to have biological consequences. [16][17][18] Several nsSNPs from the proteins acting in a variety of cellular pathways, such as apoptosis, [19] oxidative stress [20] and signal transduction, [21] have already been reported to be associated with an increased or decreased risk of breast cancer.
Several studies have described cancer-relevant nsSNPs; [22-25] however, to our knowledge they have not been studied in the context of expression of genes in a particular tissue. Clearly, in order for genes to be linked to a disease of a tissue, their protein products should somehow influence that particular tissue, either as exogenous proteins (such as hormones) or endogenous proteins (such as the proteins expressed in that tissue). [26,27] In this study, we have applied a bioinformatics-based strategy and identified potentially functional nsSNPs from endogenous carcinogenesis-related proteins expressed in breast tissue.

nsSNPs
The nsSNPs from the group of carcinogenesis-related genes expressed in breast tissue were retrieved from dbSNP build 120 (http://www.ncbi.nlm.nih.gov/SNP/). [31] Only the nsSNPs detected in ≥ 2 chromosomes in a sample panel of ≥ 40 chromosomes were included in this study (validated nsSNPs). Seventeen nsSNPs were found in both less and more than 5 per cent of the chromosomes analysed in different sample sets; for simplicity, we have classified such nsSNPs within the nsSNP set with ≥ 5 per cent minor allele frequencies throughout this paper.
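The validation and frequency rules above can be sketched in Python. This is only an illustration under stated assumptions: the `FrequencyEntry` record, the function names and the reading of "detected in ≥ 2 chromosomes" as a minor-allele count are hypothetical and are not taken from dbSNP or from the original analysis.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_PANEL_CHROMOSOMES = 40   # sample panel must contain at least 40 chromosomes
MIN_MINOR_OBSERVATIONS = 2   # minor allele seen on at least 2 chromosomes (assumed reading)
COMMON_MAF = 0.05            # 5 per cent minor allele frequency threshold


@dataclass
class FrequencyEntry:
    """One hypothetical frequency record for an nsSNP in one sample set."""
    minor_allele_count: int
    chromosomes: int

    @property
    def maf(self) -> float:
        return self.minor_allele_count / self.chromosomes


def validating_entries(entries: List[FrequencyEntry]) -> List[FrequencyEntry]:
    """Keep only the sample sets that satisfy both validation thresholds."""
    return [
        e for e in entries
        if e.chromosomes >= MIN_PANEL_CHROMOSOMES
        and e.minor_allele_count >= MIN_MINOR_OBSERVATIONS
    ]


def frequency_class(entries: List[FrequencyEntry]) -> Optional[str]:
    """Return None if not validated, else 'common' or 'rare'.

    An nsSNP seen both below and above 5 per cent in different sample
    sets is placed in the 'common' set, as described in the text.
    """
    valid = validating_entries(entries)
    if not valid:
        return None
    return "common" if any(e.maf >= COMMON_MAF for e in valid) else "rare"
```

For example, an nsSNP with one sample set at 2 per cent and another at 10 per cent would be classified as common under this sketch.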

PolyPhen analysis
The PolyPhen predictions [18] were retrieved from a pre-computed dbSNP-PolyPhen resource. All PolyPhen predictions were based on either alignment of at least five similar proteins (for a more reliable prediction) or structural parameters.

Results
The results obtained in this study are summarised in Table 1 and include only the validated nsSNPs with a reliable prediction made by the PolyPhen prediction tool (see Methods). A total of 367 nsSNPs from 189 carcinogenesis-related genes expressed in breast tissue are presented. A total of 109 nsSNPs (29.7 per cent) from 75 genes were predicted potentially to affect the protein function (functional nsSNPs). Additionally, 61.5 per cent (n = 67) of the potentially functional nsSNPs represented commonly occurring nsSNPs in the population (≥ 5 per cent minor allele frequency; Table 2). In this paper, we mainly discuss the commonly occurring functional nsSNPs; however, the list of rarely occurring functional nsSNPs can also be found in the supplementary table (www.ozceliklab.com/Breast_rare_nsSNPs/).
A fraction of protein products of genes bearing commonly occurring functional nsSNPs were found to be involved in one or more carcinogenesis-related biological pathways compiled by the CGAP-GAI [30] (Table 2). Such nsSNPs were mostly found in the proteins from DNA repair (three genes, four nsSNPs); metastasis (four genes, four nsSNPs); angiogenesis (seven genes, eight nsSNPs); pharmacology (seven genes, ten nsSNPs); and immunology (38 genes, 51 nsSNPs).
We have also analysed the distribution of the commonly occurring functional nsSNPs across human populations. For simplicity, we have categorised the frequency information obtained from different dbSNP entries into three major groups: African (African and African-American), Caucasian (Caucasian and European) and Asian (Chinese and East Asian) populations. Minor allele frequencies for nsSNPs were available for at least three different human populations for 30 out of 67 commonly occurring functional nsSNPs (Table 3).
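As a quick arithmetic check, the proportions can be recomputed from the counts reported here. The helper name `percent` is illustrative only. The result of 29.7 per cent for 109 of 367 is the figure given in the abstract.

```python
def percent(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


validated_nssnps = 367    # validated nsSNPs with a reliable prediction
functional_nssnps = 109   # predicted to affect protein function
common_functional = 67    # functional nsSNPs with >= 5 per cent minor allele frequency

functional_share = percent(functional_nssnps, validated_nssnps)   # 29.7
common_share = percent(common_functional, functional_nssnps)      # 61.5
```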
Table 3. Functional and common non-synonymous form of single nucleotide polymorphisms (nsSNPs) with frequency information available from different human populations. [31]
c The position of the amino acid substitution and the amino acids specified by the major and minor SNP alleles are indicated. The frequency information is as in dbSNP build 123 and is based on ≥ 40 chromosomes. Please note that the samples annotated as African and African-American; Caucasian and European; Chinese and East Asian are combined together here and are referred to as African, Caucasian and Asian, respectively. Whenever more than one entry was available for a group, only the information from the entries with the highest number of chromosomes is included here.
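The population grouping and the entry-selection rule described for Table 3 could be expressed along the following lines. This is a minimal sketch: the tuple layout of the input, the function name and the tie-breaking behaviour (the first entry seen is kept when chromosome counts are equal) are assumptions, not details taken from the original analysis.

```python
from typing import Dict, Iterable, Tuple

# Mapping of sample annotations to the three major groups named in the text.
POPULATION_GROUPS = {
    "African": "African",
    "African-American": "African",
    "Caucasian": "Caucasian",
    "European": "Caucasian",
    "Chinese": "Asian",
    "East Asian": "Asian",
}

MIN_CHROMOSOMES = 40  # frequency information is based on >= 40 chromosomes


def group_frequencies(
    entries: Iterable[Tuple[str, int, float]]
) -> Dict[str, float]:
    """Collapse (annotation, chromosomes, minor allele frequency) entries.

    For each major group, only the entry with the highest number of
    chromosomes is kept; unrecognised annotations and panels with fewer
    than 40 chromosomes are skipped.
    """
    best: Dict[str, Tuple[int, float]] = {}
    for annotation, chromosomes, maf in entries:
        group = POPULATION_GROUPS.get(annotation)
        if group is None or chromosomes < MIN_CHROMOSOMES:
            continue
        if group not in best or chromosomes > best[group][0]:
            best[group] = (chromosomes, maf)
    return {group: maf for group, (_, maf) in best.items()}
```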

Discussion
A portion of SNPs is considered to contribute to complex disease development. [7,10-12] SNPs in or around the candidate genes might be directly linked to a disease; however, not all SNPs are supposed to affect gene expression and function, so selection of those with potential effects is keenly debated. [32] Several studies have developed tools and/or systematically analysed nsSNPs to identify those that affect gene function based on evolutionary conservation or structural parameters. [16][17][18][33] PolyPhen [18] is one such web-based tool utilised to select the nsSNPs that are likely to affect protein function. In short, the PolyPhen predictions are based on protein alignments, structural parameters or sequence annotations. The sensitivity of PolyPhen has been reported to be approximately 82 per cent. [18] In this study, we hypothesised that the systematic analysis of candidate genes that are expressed in the affected tissue is likely to improve and enrich the identification of disease-susceptibility alleles. Accordingly, using a bioinformatics-based strategy, we identified the functional nsSNPs from a large number of genes related to the carcinogenesis-related pathways (DNA repair, cell cycle, signal transduction, etc), which are expressed in breast tissue. We propose that these potentially functional nsSNPs can result in abnormalities at the protein level, which are likely to affect the development, metabolism and homeostasis of the breast tissue, and thus can contribute to breast cancer susceptibility.
The genes with functional nsSNPs identified in this study were from a variety of carcinogenesis-related cellular pathways. According to this information, possible biological roles for these nsSNPs may be suggested. For example, nsSNPs from angiogenesis- and metastasis-related proteins may have roles in tumour growth and the development of metastatic tumours. [34,35] Additionally, DNA repair nsSNPs may lead to the accumulation of somatic mutations and thus can participate in cancer initiation and promotion. [34][35][36] Furthermore, together with the DNA repair nsSNPs, the nsSNPs from the pharmacology genes may also be good candidates for the studies targeting the efficacy, differential response and adverse effect of chemo-/radiotherapy in breast cancer. [37][38][39] The majority of the nsSNPs were from the genes related to immunological responses (74.6 per cent), which can both suppress and promote tumorigenesis. [34] It is likely that the larger number of the functional nsSNPs in immune system-related genes is a reflection of the large number of immunology genes in the breast tissue-expressed gene set (60 per cent).
A considerable number of genes with functional nsSNPs have been previously linked to breast cancer aetiology: ADM, [40] ADRB2, [41] ... [57] Therefore, we propose that the nsSNPs in Table 2 are excellent candidates as genetic factors involved in breast cancer initiation, promotion or progression. Additionally, some of these nsSNPs may be critical for breast cancer treatment outcome.
When the distribution of the commonly occurring functional nsSNPs was analysed, differences in the major alleles and the allele frequencies across human populations were observed. For example, 15 commonly occurring nsSNPs were found in all populations, whereas another set of 15 nsSNPs was specific to particular population(s). These differences might be reflections of either the age of the allele, founder effects or the dissimilar selective pressures acting on different populations. [58,59] Most importantly, the data also indicate that a common nsSNP with a potential biological consequence in our set was equally likely to be either prevalent across different human populations or limited to some populations. Clearly, the latter prompted us to conclude that the population-specific functional nsSNPs may contribute to the genetic predisposition in individuals with a specific background. This conclusion is consistent with previous studies in which genetic variations with significantly different allelic frequencies among populations were found to be associated with specific disease or differential drug responses. [60][61][62][63][64][65] This information may be particularly helpful to researchers in determining which nsSNPs may be relevant to utilise in specific population-based studies. In addition, although further analyses are required, it is tempting to speculate that these nsSNPs may be a part of the potential variability of the molecular basis of breast cancer predisposition and drug response among different human populations.
Data integration from several databases forms the basis of our strategy to determine functional SNPs of breast tissue-expressed genes. The quality and the quantity of the genomic data within individual databases influence the comprehensiveness of the combined data. The functional SNP list presented in this study is a result of data integration from three databases, namely TissueInfo, [29] Ensembl [28] and dbSNP. [31] The non-matching data fields (eg transcript identifiers) between TissueInfo, Ensembl and dbSNP have been the main source of missing data. For example, although BRCA1 was known to have a potentially functional SNP (predicted previously), this information has not been captured because of non-matching transcript identifier information for BRCA1 in the databases. Thus, incompatibility of data in different databases has been a rate-limiting factor for the bioinformatics-based strategies presented here. The improvement of the quality and the quantity of genomic data in the databases will prove beneficial for researching complex questions. Also, the genes presented in this paper are based on the expressed sequence tag information, which may lead to an underrepresentation of rarely expressed genes. [29,66] Data integration using other tissue expression databases is likely to enrich the quality of the data produced. Nevertheless, although it is possible that the SNPs presented here may not represent the most comprehensive list, the SNPs identified using the proposed strategy represent a valuable resource for studying the genetic predisposition to breast cancer.
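The split between nsSNPs found in all populations and those specific to particular population(s) can be illustrated with a small Python sketch. The text does not state the criterion for an allele being "found" in a population, so the cut-off used here (minor allele frequency above zero, adjustable through `present_above`) is an assumption, as are the function name and the input layout.

```python
from typing import Dict, List, Tuple


def distribution_class(
    group_maf: Dict[str, float], present_above: float = 0.0
) -> Tuple[str, List[str]]:
    """Classify one nsSNP by the populations in which its minor allele occurs.

    `group_maf` maps a population group to the minor allele frequency
    observed there. Returns a label and the sorted list of groups in
    which the allele is present under the assumed cut-off.
    """
    present = sorted(
        group for group, maf in group_maf.items() if maf > present_above
    )
    if len(present) == len(group_maf):
        return "all populations", present
    return "population-specific", present
```

For instance, an allele with frequencies of 0.12, 0.0 and 0.0 in the African, Caucasian and Asian groups would be labelled population-specific (African only) under this cut-off.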
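The effect of non-matching identifiers on the integrated list can be illustrated with a toy join in Python. This is a sketch under stated assumptions and not the pipeline used in the study: the three inputs stand in for TissueInfo, Ensembl and dbSNP exports, and every identifier and function name is a hypothetical placeholder.

```python
from typing import Dict, List, Tuple


def integrate(
    expressed_transcripts: List[str],
    transcript_to_gene: Dict[str, str],
    snps_by_transcript: Dict[str, List[str]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Join three sources on a shared transcript identifier.

    Returns the nsSNPs grouped by gene, plus the transcripts that could
    not be joined because the identifier was missing from one of the
    lookups. In this toy model, such transcripts are the ones whose
    data would be lost.
    """
    matched: Dict[str, List[str]] = {}
    unmatched: List[str] = []
    for transcript in expressed_transcripts:
        gene = transcript_to_gene.get(transcript)
        snps = snps_by_transcript.get(transcript)
        if gene is None or snps is None:
            unmatched.append(transcript)
            continue
        matched.setdefault(gene, []).extend(snps)
    return matched, unmatched


# Placeholder identifiers only; "T3" is expressed but keyed differently elsewhere.
matched, unmatched = integrate(
    ["T1", "T2", "T3"],
    {"T1": "GENE_A", "T2": "GENE_A", "T3": "GENE_B"},
    {"T1": ["snp_1"], "T2": ["snp_2"], "T3_alt": ["snp_3"]},
)
```

Reporting the unmatched transcripts alongside the joined data makes this kind of loss visible rather than silent.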

Conclusion
In conclusion, we have designed a novel strategy to identify potentially functional variants of cancer-related genes expressed in breast tissue. Our results demonstrated the presence of 109 nsSNPs with a potential biological consequence, 67 of which were frequent in human populations. We propose that, together with other genetic and environmental factors, these nsSNPs may be involved in breast cancer initiation and progression; thus, these nsSNPs represent the premium candidates as genetic variations of breast cancer predisposition. We also suggest that a considerable fraction of the nsSNPs may, in fact, be population-specific genetic variations.