Open Access

Human SNPs resulting in premature stop codons and protein truncation

Human Genomics 2006, 2:274

DOI: 10.1186/1479-7364-2-5-274

Received: 10 November 2005

Accepted: 10 November 2005

Published: 1 March 2006

Abstract

Single nucleotide polymorphisms (SNPs) constitute the most common type of genetic variation in humans. SNPs introducing premature termination codons (PTCs), herein called X-SNPs, can alter the stability and function of transcripts and proteins and are thus considered to be biologically important. Initial studies suggested strong selection against such variations/mutations. In this study, we undertook a genome-wide systematic screen to identify human X-SNPs using the dbSNP database. Our results demonstrated the presence of 28 X-SNPs with known minor allele frequencies in 28 genes. Eight X-SNPs (28.6 per cent) were predicted to cause transcript degradation by nonsense-mediated mRNA decay. Seventeen X-SNPs (60.7 per cent) resulted in moderate to severe truncation at the C-terminus of the proteins (deletion of > 50 per cent of the amino acids). The majority of the X-SNPs (78.6 per cent) represent commonly occurring SNPs, in contrast to the rarely occurring disease-causing PTC mutations. Interestingly, X-SNPs displayed a non-uniform distribution across human populations: eight X-SNPs were reported in three different human populations, whereas six X-SNPs were found exclusively in one or two populations. In conclusion, we have systematically investigated human SNPs introducing PTCs with respect to their possible biological consequences, distributions across different human populations and evolutionary aspects. We believe that the SNPs reported here are likely to affect gene/protein function, although their biological and evolutionary roles need to be investigated further.

Keywords

SNP; premature termination codons; nonsense-mediated mRNA decay; population distribution; evolutionary selection

Introduction

The Human Genome Project revealed the presence of a large number of genetic variations among individuals. Single nucleotide polymorphisms (SNPs) are the most common genetic variation; they occur, on average, once in every 400-1,000 base pairs along DNA [1-4]. The term 'polymorphism' traditionally refers to commonly occurring genetic variations (minor allele frequency approximately ≥ 1 per cent) in the population [5]. The density of SNPs varies among different genomic regions, and is thought to be dependent on both the mutation rate and the selective constraints on the region [6]. Currently, there is a strong interest in SNPs because they are hypothesised to contribute to differential disease risk and drug/treatment response among individuals [7, 8].

SNPs located in the coding regions of genes may have important biological consequences. For example, non-synonymous SNPs (nsSNPs) change the amino acid sequence and thus may affect protein function. Although many approaches and systematic analyses have been undertaken to identify nsSNPs with possible biological significance [9-12], to our knowledge no large-scale systematic analysis has been carried out to identify and characterise SNPs that introduce premature termination codons (PTCs; herein called X-SNPs). Both frameshift and nonsense mutations can introduce PTCs into open reading frames. As a result of PTCs, the stability of transcripts or proteins may be directly affected [13, 14]. Alternatively, the truncated proteins may act in a dominant-negative fashion [15]. Thus, PTCs can lead to either loss-of-function or gain-of-function by altering the stability and function of the transcripts/proteins.

Mendelian human diseases are associated with highly penetrant disease-causing genetic alterations that are found at very low frequencies (approximately < 1 per cent) in the population, most likely owing to strong selection against them [16]. In inherited human genetic disorders, approximately one-third of mutations introduce PTCs [17], which are considered to be deleterious. Similarly, the number of SNPs introducing PTCs in the human genome is estimated to be fairly low, and a previous study suggested strong evolutionary selection against X-SNPs [18]. Therefore, whether disease-related or not, PTCs are considered to affect proteins dramatically, potentially leading to biological abnormalities. In this study, our aim was to evaluate polymorphisms introducing PTCs in the human genome with respect to their potential biological consequences, distributions across different human populations and minor allele frequencies. As more X-SNPs are discovered and deposited in public SNP databases, it will be possible to analyse a larger number of X-SNPs and obtain more comprehensive data. Nevertheless, our results provide a unique catalogue of polymorphisms that merits further biological and epidemiological disease-association studies.

Methods

SNPs

SNPs annotated as 'premature termination codon SNPs' were retrieved from dbSNP build 120 (http://www.ncbi.nlm.nih.gov/SNP/) [19]. We refer to such SNPs as X-SNPs throughout this paper. There was a total of 977 X-SNPs in the dbSNP database; however, only 119 of them had minor allele frequency information. Of these, only the X-SNPs that were observed in at least two chromosomes, in a sample of ≥ 20 chromosomes, were analysed further (herein called validated X-SNPs). X-SNPs located on transcripts annotated as 'predictions', 'pseudogenes', 'similar to' or 'open reading frames' were excluded from this study. In total, 28 X-SNPs met all of the above requirements.
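The selection criteria above can be expressed as a simple filter. The sketch below is an illustration only, not the authors' retrieval pipeline: the record type `XSnpRecord`, its field names and the `rs_example` identifiers are hypothetical, and it assumes that "observed in at least two chromosomes" means the minor allele is carried by at least two chromosomes within the same panel that supplies the chromosome count.

```python
from dataclasses import dataclass
from typing import Optional

# Transcript annotations excluded in the Methods (substring match, case-insensitive).
EXCLUDED_ANNOTATIONS = ("prediction", "pseudogene", "similar to", "open reading frame")


@dataclass
class XSnpRecord:
    """Hypothetical record for one PTC-creating SNP; field names are illustrative only."""
    rs_id: str
    minor_allele_count: Optional[int]  # chromosomes carrying the minor allele; None if unknown
    sample_chromosomes: Optional[int]  # chromosomes genotyped in the same panel; None if unknown
    transcript_annotation: str


def is_validated(snp: XSnpRecord, min_minor: int = 2, min_sample: int = 20) -> bool:
    """Return True if the SNP passes the frequency and annotation filters."""
    if snp.minor_allele_count is None or snp.sample_chromosomes is None:
        return False  # no minor allele frequency information
    if snp.minor_allele_count < min_minor or snp.sample_chromosomes < min_sample:
        return False  # minor allele seen too rarely, or panel too small
    annotation = snp.transcript_annotation.lower()
    return not any(term in annotation for term in EXCLUDED_ANNOTATIONS)


records = [
    XSnpRecord("rs_example1", 4, 80, "angiotensinogen"),
    XSnpRecord("rs_example2", 1, 46, "ligase IV"),
    XSnpRecord("rs_example3", None, None, "unknown"),
    XSnpRecord("rs_example4", 10, 100, "similar to keratin"),
    XSnpRecord("rs_example5", 2, 20, "lipoprotein lipase"),
]
validated = [r.rs_id for r in records if is_validated(r)]
print(validated)  # ['rs_example1', 'rs_example5']
```

Keeping the thresholds as keyword arguments makes it easy to see how sensitive the final count is to the "≥ 2 of ≥ 20 chromosomes" choice.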

BLAST analyses

To map the SNPs onto transcripts, the SNP-flanking sequences of X-SNPs were blasted against the transcripts in GenBank (http://www.ncbi.nih.gov/Genbank/) [20] using the BLAST against gene transcripts tool (http://lpgws.nci.nih.gov:80/perl/blast2/) [21], as explained by Savas et al. [22]. One mismatch in the SNP-flanking sequence/transcript alignment was allowed. The SNP-flanking sequences were also blasted against the human genome using the NCBI BLAST tool (http://www.ncbi.nlm.nih.gov/BLAST/) [23] to ensure that the SNP sequences are not derived from multiple genomic regions [24], as explained in a further paper by Savas et al. [25].
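BLAST performs gapped local alignment with statistical scoring, and the sketch below does not reproduce it. It only illustrates, under the simplifying assumption of an ungapped alignment of the whole flanking sequence, what "one mismatch allowed" means, checking both strands of the flank. All function names are hypothetical.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def ungapped_hits(query: str, target: str, max_mismatches: int = 1) -> List[int]:
    """0-based offsets where query matches target without gaps and with <= max_mismatches."""
    query, target = query.upper(), target.upper()
    hits = []
    for start in range(len(target) - len(query) + 1):
        mismatches = 0
        for q_base, t_base in zip(query, target[start:start + len(query)]):
            if q_base != t_base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # no need to scan the rest of this window
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


def maps_to_transcript(flank: str, transcript: str, max_mismatches: int = 1) -> bool:
    """True if the flank, or its reverse complement, matches the transcript."""
    return bool(ungapped_hits(flank, transcript, max_mismatches)
                or ungapped_hits(reverse_complement(flank), transcript, max_mismatches))


transcript = "AAACCCGGGTTTACGTACGT"  # toy sequence, not a real transcript
print(ungapped_hits("CCCGAGTTT", transcript))  # [3]: a single mismatch at the SNP-like position
```

A single allowed mismatch accommodates the SNP base itself when the flank carries the allele that differs from the transcript.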

Alternatively spliced transcript variants (ASTVs)

Information on ASTVs was retrieved from the RefSeq resource of NCBI (http://www.ncbi.nlm.nih.gov/RefSeq/) [26].

Candidate transcripts for nonsense-mediated mRNA decay

The genomic structures of the transcripts were identified by blasting the transcript sequences against the human genome. Subsequent manual analysis of the exon-intron boundaries identified X-SNPs that can lead to nonsense-mediated mRNA decay (NMD): transcripts in which the SNP introduces a PTC located ≥ 50 nucleotides upstream of an exon-intron junction are considered candidates to undergo NMD [13, 27-29].
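The positional criterion can be written as a small predicate. This is a sketch under stated assumptions rather than the authors' manual procedure: positions are 1-based transcript (mRNA) coordinates, the hypothetical parameter `exon_ends` lists the last nucleotide of each exon, and the distance is measured from the last nucleotide of the premature stop codon to the end of the exon. Because a PTC lying ≥ 50 nucleotides upstream of any junction also lies ≥ 50 nucleotides upstream of the last one, only the last junction needs checking.

```python
from typing import Sequence


def is_nmd_candidate(stop_codon_end: int, exon_ends: Sequence[int], min_distance: int = 50) -> bool:
    """Apply the >= 50-nt rule: is the PTC far enough upstream of a downstream exon junction?

    stop_codon_end: 1-based transcript position of the last nucleotide of the PTC.
    exon_ends: 1-based transcript positions of the last nucleotide of each exon.
    """
    junctions = sorted(exon_ends)[:-1]  # the final exon end is the transcript end, not a junction
    if not junctions:
        return False  # single-exon transcript: no downstream junction, so no NMD by this rule
    # The last junction is the furthest downstream, so checking it is sufficient.
    return junctions[-1] - stop_codon_end >= min_distance


exon_ends = [100, 250, 400]  # hypothetical three-exon transcript
print(is_nmd_candidate(150, exon_ends))  # True: 100 nt upstream of the last junction
print(is_nmd_candidate(220, exon_ends))  # False: only 30 nt upstream
```

Raising `min_distance` to 55 gives the stricter end of the 50-55 nucleotide range cited in the Results.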

Results and discussion

Possible biological consequences of X-SNPs

Our systematic search of the dbSNP database [19] (build 120) yielded 28 validated X-SNPs from 28 genes (Table 1). Twenty-three of the genes bearing X-SNPs were found to code for a single transcript; the remaining five X-SNPs were found in genes undergoing alternative splicing: DSCR8-K79X, HPS4-R246X, IL17RB-Q484X, OAS2-W720X and TAP2-Q687X. With the exception of HPS4-R246X, all of these X-SNPs mapped onto the ASTV coding for the longest protein isoform. For 22 of the 28 X-SNPs, genotype information was available in the dbSNP database; for 12 of these, at least one homozygous sample was reported, suggesting that these X-SNPs do not affect fitness per se (see below; Table 1). In the remaining cases, genotyping of larger sample sets may help to elucidate whether the homozygous state is deleterious (i.e. the homozygotes are not viable) or whether the low allele frequency simply makes homozygotes hard to detect in small samples.
Table 1

Validated X-SNPs in the human genome.

Gene | Gene function (a) | Accession # (b) | Location | SNP ID (c) | Homozygosity (e) | X-SNP | Protein length (truncation) (f) | NMD (g) | CpG (h)
Frequency (d): allele frequencies per dbSNP sample panel, listed on the line below each entry

AGT | Cell signalling; hypertension | NM_000029.1 | 1q42-q43 | rs5039 | n/a | Q53X | 485 (89%) | + | +
Frequency: CEPH-MULTI-NATIONAL 184 chr.: G = 1.000, A = 0.000; HYP1-MULTI-NATIONAL 80 chr.: G = 0.950, A = 0.050

APOC4 | Lipid metabolism | NM_001646.1 | 19q13.2 | rs5164 | - | W47X | 127 (63%) | + | -
Frequency: CEPH-MULTI-NATIONAL 184 chr.: G = 1.000, A = 0.000; HYP1-MULTI-NATIONAL 80 chr.: G = 0.950, A = 0.050

CDH15 | Cell adhesion; morphogenetic processes | NM_004933.2 | 16q24.3 | rs2270416 | n/a | Y788X | 814 (3.2%) | - | +
Frequency: JBIC-allele-EAST ASIA 1500 chr.: G = 0.826, T = 0.174

CLCA3 | Transport | NM_004921.1 | 1p31-p22 | rs2292830 | n/a | Y84X | 262 (67.3%) | - | -
Frequency: JBIC-allele-EAST ASIA 1462 chr.: G = 0.569, C = 0.431

CYP2C19 | Transport; drug metabolism and synthesis of lipids | NM_000769.1 | 10q24.1-q24.3 | rs4986893 | n/a | W212X | 446 (52.5%) | - | -
Frequency: PAC1-EAST ASIA 46 chr.: G = 0.913, A = 0.087; CAUC1-MULTI-NATIONAL 60 chr.: G = 1.000, A = 0.000; AFR1-MULTI-NATIONAL 48 chr.: G = 0.979, A = 0.021; HISP1-CENTRAL/SOUTH AMERICA 44 chr.: G = 1.000, A = 0.000; P1-MULTI-NATIONAL 198 chr.: G = 0.975, A = 0.025

DSCR8 | Unknown | NM_032589.2 | 21q22.2 | rs2836172 | + | K79X | 91 (13.2%) | - | -
Frequency: NCBI|NIHPDR-NORTH AMERICA 20 chr.: A = 0.900, T = 0.100; AFD_EUR_PANEL-NORTH AMERICA 48 chr.: A = 1.000; AFD_AFR_PANEL-NORTH AMERICA 46 chr.: A = 0.783, T = 0.217; AFD_CHN_PANEL-NORTH AMERICA 48 chr.: A = 0.854, T = 0.146

EPHX1 | Aromatic compound catabolism; xenobiotic metabolism | NM_000120.2 | 1q42.1 | rs4986931 | - | W97X | 455 (78.7%) | + | -
Frequency: PAC1-EAST ASIA 46 chr.: A = 0.978, G = 0.022; P1-MULTI-NATIONAL 202 chr.: A = 0.990, G = 0.010; CAUC1-MULTI-NATIONAL 62 chr.: A = 1.000, G = 0.000; AFR1-MULTI-NATIONAL 48 chr.: A = 1.000, G = 0.000; HISP1-CENTRAL/SOUTH AMERICA 46 chr.: A = 0.978, G = 0.022

FUT2 | Carbohydrate metabolism; protein glycosylation | NM_000511.1 | 19q13.3 | rs1800030 | - | W297X | 346 (14.2%) | - | -
Frequency: PAC1-EAST ASIA 48 chr.: G = 0.979, A = 0.021; P1-MULTI-NATIONAL 202 chr.: G = 0.995, A = 0.005; CAUC1-MULTI-NATIONAL 60 chr.: G = 1.000, A = 0.000; AFR1-MULTI-NATIONAL 48 chr.: G = 1.000, A = 0.000; HISP1-CENTRAL/SOUTH AMERICA 46 chr.: G = 1.000, A = 0.000

HPS4 | Organelle biogenesis; protein stabilisation/targeting | NM_152843.1 | 22cen-q12.3 | rs3747129 | + | R246X | 528 (53.4%) | - | +
Frequency: JBIC-allele-EAST ASIA 1492 chr.: G = 0.798, A = 0.202; AFD_EUR_PANEL-NORTH AMERICA 48 chr.: G = 0.812, A = 0.188; AFD_AFR_PANEL-NORTH AMERICA 46 chr.: G = 0.978, A = 0.022; AFD_CHN_PANEL-NORTH AMERICA 48 chr.: G = 0.750, A = 0.250; HapMap-CEU-EUROPE 120 chr.: G = 0.825, A = 0.175

IL17RB | Immuno-regulatory activity; regulation of cell growth | NM_018725.2 | 3p21.1 | rs1043261 | + | Q484X | 502 (3.6%) | - | +
Frequency: JBIC-allele-EAST ASIA 1476 chr.: C = 0.902, T = 0.098; HapMap-CEU-EUROPE 120 chr.: C = 0.908, T = 0.092; AFD_EUR_PANEL-NORTH AMERICA 48 chr.: C = 0.938, T = 0.062; AFD_AFR_PANEL-NORTH AMERICA 46 chr.: C = 0.978, T = 0.022; AFD_CHN_PANEL-NORTH AMERICA 48 chr.: C = 0.792, T = 0.208

KRTAP1-1 | Cytoskeleton; intermediate filaments | NM_030967.2 | 17q12-q21 | rs3213755 | + | Q51X | 177 (71.2%) | - | -
Frequency: JBIC-allele-EAST ASIA 708 chr.: C = 0.617, T = 0.383; HapMap-CEU-EUROPE 120 chr.: G = 0.800, A = 0.200

LCE5A | Unknown | NM_178438.1 | 1q21.3 | rs2282298 | - | R79X | 118 (33.1%) | - | +
Frequency: JBIC-allele-EAST ASIA 1504 chr.: G = 0.979, A = 0.021; AFD_EUR_PANEL-NORTH AMERICA 48 chr.: C = 1.000; AFD_AFR_PANEL-NORTH AMERICA 46 chr.: C = 1.000; AFD_CHN_PANEL-NORTH AMERICA 48 chr.: C = 0.896, T = 0.104

LIG4 | DNA repair; cell cycle | NM_206937.1 | 13q33-q34 | rs2232636 | - | W46X | 911 (95%) | - | -
Frequency: PAC1-EAST ASIA 46 chr.: G = 1.000, A = 0.000; P1-MULTI-NATIONAL 202 chr.: G = 0.995, A = 0.005; CAUC1-MULTI-NATIONAL 62 chr.: G = 1.000, A = 0.000; AFR1-MULTI-NATIONAL 48 chr.: G = 0.979, A = 0.021; HISP1-CENTRAL/SOUTH AMERICA 46 chr.: G = 1.000, A = 0.000

LPL | Lipoprotein metabolism | NM_000237.1 | 8p22 | rs328 | + | S474X | 475 (0.2%) | - | -
Frequency: WIAF-CSNP-MITOGPOP5-MULTI-NATIONAL 112 chr.: C = 0.982, G = 0.018; JBIC-allele-EAST ASIA 1458 chr.: C = 0.860, G = 0.140; CEPH-MULTI-NATIONAL 184 chr.: C = 0.640, G = 0.360; AFD_EUR_PANEL-NORTH AMERICA 44 chr.: C = 0.727, G = 0.273; AFD_AFR_PANEL-NORTH AMERICA 42 chr.: C = 0.952, G = 0.048; AFD_CHN_PANEL-NORTH AMERICA 46 chr.: C = 0.935, G = 0.065

MAGEE2 | Unknown | NM_138703.2 | Xq13.3 | rs1343879 | - | E120X | 523 (77.1%) | - | -
Frequency: TSC_42_C-NORTH AMERICA 84 chr.: C = 0.950, A = 0.050; TSC_42_A-EAST ASIA 84 chr.: A = 0.650, C = 0.350; TSC_42_AA-NORTH AMERICA 84 chr.: C = 0.950, A = 0.050; HapMap-CEU-EUROPE 120 chr.: C = 0.983, A = 0.017

MS4A12 | Signal transduction | NM_017716.1 | 11q12 | rs2298553 | + | Q71X | 267 (73.4%) | + | -
Frequency: JBIC-allele-EAST ASIA 726 chr.: C = 0.585, T = 0.415; AFD_EUR_PANEL-NORTH AMERICA 48 chr.: C = 0.583, T = 0.417; AFD_AFR_PANEL-NORTH AMERICA 42 chr.: C = 0.548, T = 0.452; AFD_CHN_PANEL-NORTH AMERICA 48 chr.: C = 0.542, T = 0.458

OAS2 | Immune response | NM_016817.1 | 12q24.2 | rs15895 | + | W720X | 727 (1%) | - | -
Frequency: POOLED_CEPH-MULTI-NATIONAL 188 chr.: A = 0.668, G = 0.332; CEPH-MULTI-NATIONAL 184 chr.: C = 0.670, T = 0.330; SC_12_A-EAST ASIA 20 chr.: G = 1.000; SC_12_AA-NORTH AMERICA 24 chr.: G = 0.830, A = 0.170; SC_12_C-NORTH AMERICA 24 chr.: G = 0.710, A = 0.290; SC_95_C-NORTH AMERICA 184 chr.: C = 0.590, T = 0.410; AFD_EUR_PANEL-NORTH AMERICA 48 chr.: G = 0.562, A = 0.438; AFD_AFR_PANEL-NORTH AMERICA 46 chr.: G = 0.913, A = 0.087; AFD_CHN_PANEL-NORTH AMERICA 48 chr.: G = 1.000

OVCH2 | Proteolysis | NM_198185.1 | 11p15.4 | rs4509745 | + | W556X | 564 (1.4%) | - | -
Frequency: HapMap-CEU-EUROPE 120 chr.: T = 0.658, C = 0.342; HapMap-HCB-EAST ASIA 88 chr.: T = 0.705, C = 0.295; HapMap-JPT-EAST ASIA 88 chr.: T = 0.614, C = 0.386; HapMap-YRI-WEST AFRICA 120 chr.: C = 0.783, T = 0.217; AFD_EUR_PANEL-NORTH AMERICA 44 chr.: T = 0.568, C = 0.432; AFD_AFR_PANEL-NORTH AMERICA 46 chr.: C = 0.609, T = 0.391; AFD_CHN_PANEL-NORTH AMERICA 48 chr.: T = 0.583, C = 0.417

POLE2 | DNA repair | NM_002692.2 | 14q21-q22 | rs3218790 | - | K443X | 527 (15.9%) | + | -
Frequency: NIHPDR-NORTH AMERICA 170 chr.: A = 0.988, T = 0.012; HapMap-CEU-EUROPE 120 chr.: A = 1.000; HapMap-HCB-EAST ASIA 90 chr.: A = 1.000; HapMap-JPT-EAST ASIA 88 chr.: A = 1.000; HapMap-YRI-WEST AFRICA 120 chr.: A = 1.000

SERPINB11 | Serine-type endopeptidase inhibitor activity | NM_080475.1 | 18 | rs4940595 | + | E90X | 392 (77%) | + | -
Frequency: AfAm 12 chr.: C = 0.667, A = 0.333; Caucasian 24 chr.: A = 0.667, C = 0.333; Asian 12 chr.: C = 0.667, A = 0.333; CEPH 12 chr.: C = 0.667, A = 0.333; PDpanel 48 chr.: A = 0.521, C = 0.479; AFD_EUR_PANEL-NORTH AMERICA 48 chr.: T = 0.625, G = 0.375; AFD_AFR_PANEL-NORTH AMERICA 44 chr.: G = 0.545, T = 0.455; AFD_CHN_PANEL-NORTH AMERICA 48 chr.: G = 0.771, T = 0.229

SMUG1 | DNA repair | NM_014311.1 | 12q13.11-q13.3 | rs2233919 | - | Q3X | 270 (98.9%) | + | -
Frequency: NIHPDR-NORTH AMERICA 574 chr.: C = 0.986, T = 0.014; PDR90 166 chr.: C = 0.988, T = 0.012

SPTBN5 | Actin cytoskeleton organisation and biogenesis | NM_016642.1 | 15q21 | rs2271286 | - | Q72X | 3674 (98%) | - | -
Frequency: JBIC-allele-EAST ASIA 1482 chr.: G = 0.951, A = 0.049

TAP2 | Immune response; protein transport and assembly | NM_000544.2 | 6p21.3 | rs241448 | + | Q687X | 703 (2.3%) | - | -
Frequency: CEPH-MULTI-NATIONAL 184 chr.: T = 0.700, C = 0.300; WIAF-CSNP-MITOGPOP5-MULTI-NATIONAL 48 chr.: T = 0.812, C = 0.188

TAAR9 | Signal transduction | NM_175057.1 | 6q23.2 | rs2842899 | + | Q61X | 348 (82.5%) | - | -
Frequency: HapMap-CEU-EUROPE 120 chr.: T = 0.708, A = 0.292; HapMap-YRI-WEST AFRICA 120 chr.: T = 0.883, A = 0.117; AFD_EUR_PANEL-NORTH AMERICA 48 chr.: A = 0.812, T = 0.188; AFD_AFR_PANEL-NORTH AMERICA 46 chr.: A = 0.783, T = 0.217; AFD_CHN_PANEL-NORTH AMERICA 48 chr.: A = 0.854, T = 0.146

TLR5 | Immune response | NM_003268.3 | 1q41-q42 | rs5744168 | - | R392X | 858 (54.3%) | - | +
Frequency: D-0-NORTH AMERICA 48 chr.: C = 0.938, T = 0.062; E-0-NORTH AMERICA 40 chr.: C = 0.925, T = 0.075; E-1-EUROPE 6 chr.: C = 1.000

TRPM1 | Cation transport | NM_002420.3 | 15q13-q14 | rs3784589 | + | E1305X | 1533 (14.9%) | - | -
Frequency: JBIC-allele-EAST ASIA 1502 chr.: C = 0.965, A = 0.035; HapMap-CEU-EUROPE 120 chr.: C = 0.942, A = 0.058; HapMap-HCB-EAST ASIA 90 chr.: C = 1.000; HapMap-JPT-EAST ASIA 88 chr.: C = 0.955, A = 0.045; HapMap-YRI-WEST AFRICA 118 chr.: C = 0.958, A = 0.042; AFD_EUR_PANEL-NORTH AMERICA 48 chr.: C = 0.917, A = 0.083; AFD_AFR_PANEL-NORTH AMERICA 46 chr.: C = 0.913, A = 0.087; AFD_CHN_PANEL-NORTH AMERICA 48 chr.: C = 1.000

UNC93A | Unknown | NM_018974.2 | 6q27 | rs2235197 | n/a | W151X | 456 (66.9%) | - | -
Frequency: JBIC-allele-EAST ASIA 1484 chr.: G = 0.852, A = 0.148

ZNF34 | Gene expression | NM_030580.2 | 8q24.3 | rs2294120 | n/a | Q56X | 549 (89.8%) | + | +
Frequency: JBIC-allele-EAST ASIA 1494 chr.: C = 0.729, T = 0.271

Abbreviation: SNP = single nucleotide polymorphism.

a Gene functions were retrieved from the Entrez Gene database of NCBI [30].

b Accession numbers of the transcripts onto which the SNP-flanking sequences were mapped.

c SNP ID corresponds to the dbSNP database SNP identifiers.

d The frequency information is as posted in dbSNP build 124.

e Indicates whether a homozygous sample was reported for the corresponding X-SNP in any sample set, as collected from the 'summary of genotypes' section of the dbSNP database: 'n/a', no information was available; '+', a homozygous genotype was reported; '-', no homozygote was reported.

f Length (in amino acids) of the wild-type protein product. In parentheses is the percentage of the protein truncated at the C-terminus by the X-SNP.

g SNPs that may lead to nonsense-mediated mRNA decay are annotated by ' + '.

h SNPs occurring at CpG dinucleotides, which may therefore be mutational hot spots, are annotated by '+'.

We then carried out a theoretical evaluation of the possible biological consequences of the identified X-SNPs at the mRNA and protein levels. At the mRNA level, NMD is a surveillance system that specifically eliminates transcripts containing PTCs that arise from mutations in DNA or errors in RNA processing [15]. For a PTC to be recognised, NMD usually requires a downstream intron, with the PTC located at least 50-55 nucleotides upstream of the downstream exon-intron junction [27, 28]. Based on this 50-55 nucleotide rule, we analysed the locations of the X-SNPs with respect to the exon-intron boundaries and predicted that eight (28.6 per cent) X-SNPs (AGT-Q53X, APOC4-W47X, EPHX1-W97X, MS4A12-Q71X, POLE2-K443X, SERPINB11-E90X, SMUG1-Q3X and ZNF34-Q56X) may cause mRNA degradation via NMD. Thus, at least these eight X-SNPs are likely to result in loss of gene function. However, exceptions to this rule have also been reported [17], which suggests that the proportion of PTC-containing mRNAs undergoing degradation may, in fact, be larger. The reported allele frequencies of these X-SNPs ranged from rare (AGT-Q53X, 0-5 per cent; APOC4-W47X, 0-5 per cent; EPHX1-W97X, 0-2.2 per cent; POLE2-K443X, 0-1.2 per cent; SMUG1-Q3X, 1.2-1.4 per cent) to common (MS4A12-Q71X, 41.5-45.8 per cent; SERPINB11-E90X, 22.9-52.1 per cent; ZNF34-Q56X, 27 per cent) (Table 1). Individuals homozygous for two of these X-SNPs, namely MS4A12-Q71X and SERPINB11-E90X, were reported in dbSNP submissions, suggesting that, in such individuals, the levels of these truncated protein products are likely to be reduced by NMD.

If NMD does not occur and the transcripts are translated, the PTCs lead to truncation of the protein at the C-terminus, the consequences of which depend on the extent of the truncation. Seventeen (60.7 per cent) X-SNPs led to moderate to severe C-terminal truncation (deletion of > 50 per cent of the amino acid sequence), which is likely to alter protein structure and function radically (Table 1, Figure 1). As an extreme example, SMUG1-Q3X, if translated, would yield a peptide of only two amino acids, which would presumably be non-functional (loss of function). PTCs can also destabilise the protein products by altering the protein-folding state or kinetics [14, 31] and may cause proteolysis. In addition, they may act as dominant-negative mutations [15] or cause exon skipping and alter the open reading frame [32]. Alternatively, mRNA molecules bearing a PTC close to the 5' end can still be translated if an in-frame, translatable AUG start codon is present downstream of the PTC [33-35]. Such N-terminally truncated proteins can be fully or partially functional. For example, in the case of SMUG1-Q3X, there is an in-frame AUG at the 18th codon of the SMUG1 gene; experiments could determine whether an N-terminally truncated SMUG1 protein is produced and whether it is functional. In summary, the X-SNPs described in this study may affect the stability, structure and function of transcripts or their protein products, and experimental approaches are needed to evaluate their true biological effects.
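The truncation percentages in Table 1 appear to follow the simple formula (L - p) / L × 100, where L is the length of the wild-type protein and p is the codon position replaced by the stop codon; after rounding, it reproduces, for example, the AGT, CDH15, HPS4, LPL and SMUG1 entries. The text does not state this formula explicitly, so the sketch below should be read as an assumption; the function names are hypothetical.

```python
def truncation_percent(ptc_codon: int, protein_length: int) -> float:
    """Percentage of the wild-type protein lost at the C-terminus.

    Assumes truncation = (L - p) / L * 100, where p is the codon position of the
    premature stop and L the wild-type length; residues 1..p-1 are retained.
    """
    if not 1 <= ptc_codon <= protein_length:
        raise ValueError("PTC codon must lie within the coding sequence")
    return 100.0 * (protein_length - ptc_codon) / protein_length


def retained_residues(ptc_codon: int) -> int:
    """Number of amino acids translated before the premature stop codon."""
    return ptc_codon - 1


# Codon positions and wild-type lengths taken from Table 1.
print(round(truncation_percent(788, 814), 1))  # CDH15-Y788X: 3.2
print(round(truncation_percent(3, 270), 1))    # SMUG1-Q3X: 98.9
print(retained_residues(3))                    # SMUG1-Q3X leaves a 2-residue peptide
```

A truncation above 50 per cent is the threshold the text uses for "moderate to severe" truncation.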
Figure 1

Percentage truncation induced by the X-SNPs in the corresponding proteins. The X-SNPs that can cause mRNA degradation via NMD are indicated by * in the left margin of the histogram.

Possible evolutionary explanations of common X-SNPs

The small number of validated X-SNPs identified suggests that PTC-introducing variations occur infrequently in the human genome, which is consistent with selection against them [18]. However, in contrast to the rare PTC-introducing mutations observed in human diseases, most of the X-SNPs analysed in this study were common variations: 22 X-SNPs (78.6 per cent) had a minor allele frequency of ≥ 5 per cent in at least one sample panel (common X-SNPs), compared with only six rare X-SNPs (minor allele frequency < 5 per cent in all panels).
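The common/rare split can be sketched as follows. The assumption here is that the minor allele in each panel is simply the less frequent of the listed alleles (so a panel listing a single allele contributes a frequency of 0), and that an X-SNP is "common" if any panel reaches 5 per cent; function names are hypothetical and the example frequencies are copied from Table 1.

```python
from typing import Mapping, Sequence


def panel_maf(allele_freqs: Mapping[str, float]) -> float:
    """Minor allele frequency in one panel; a panel listing a single allele gives 0."""
    # Round to three decimals, the precision of the dbSNP frequencies in Table 1,
    # so that 1 - 0.950 compares equal to 0.05 despite floating-point error.
    return round(1.0 - max(allele_freqs.values()), 3)


def classify_x_snp(panels: Sequence[Mapping[str, float]], threshold: float = 0.05) -> str:
    """'common' if the minor allele frequency reaches the threshold in any panel, else 'rare'."""
    return "common" if max(panel_maf(p) for p in panels) >= threshold else "rare"


agt = [{"G": 1.000, "A": 0.000}, {"G": 0.950, "A": 0.050}]  # AGT-Q53X panels
sptbn5 = [{"G": 0.951, "A": 0.049}]                          # SPTBN5-Q72X panel
print(classify_x_snp(agt), classify_x_snp(sptbn5))  # common rare
```

With this rule, AGT-Q53X only just qualifies as common, which illustrates how close some X-SNPs sit to the 5 per cent boundary.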

How can we explain the abundance of such common (and perhaps deleterious) X-SNPs in the human population? Possible scenarios are summarised in Figure 2. One explanation could be that the truncated protein product remains functional despite the X-SNP. For instance, LPL-S474X is located only one amino acid before the natural termination codon; thus, it may not alter the protein's properties and may not be deleterious to cell function at all. Alternatively, the protein may not be essential for human fitness; in this case, selective pressure is relaxed, allowing alleles carrying premature stop codons to rise in frequency in human populations.
Figure 2

How can we explain the allele frequencies of the X-SNPs? This figure presents a summary of possible biological consequences of X-SNPs. For simplicity, both deleterious and slightly deleterious variations are annotated as deleterious.

Another possibility is that X-SNPs may affect protein function or the organism per se, but other factors modify their effects. Here, we assume that these PTCs include both strongly deleterious mutations, which are quickly removed from populations by selection, and slightly deleterious mutations, which are subject to both selection and drift [36]. For example, some X-SNPs may be hot-spot mutations, in which recurrent new mutations reintroduce (slightly) deleterious alleles and thus raise their frequency despite selection. To assess whether some of these X-SNPs might represent hot-spot mutations, we analysed the immediate flanking sequences of each X-SNP and found that 25 per cent (7/28) of X-SNPs (all common) had occurred at CpG dinucleotides (Table 1). These X-SNPs might therefore have arisen from spontaneous deamination of 5-methylcytosine to thymine and may represent hot-spot mutations [37].
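A minimal sketch of the CpG test, assuming each X-SNP is described by its 5' and 3' flanking sequences on the same strand together with its two alleles. The site is flagged if a C allele is immediately followed by G, or a G allele is immediately preceded by C (the CpG seen from the opposite strand, where deamination of the methylated C appears as G>A). The function name is hypothetical and this is not the authors' procedure.

```python
from typing import Iterable


def occurs_at_cpg(five_prime: str, alleles: Iterable[str], three_prime: str) -> bool:
    """True if one allele of the SNP forms a CpG dinucleotide with an adjacent base.

    five_prime / three_prime: flanking sequence immediately 5' and 3' of the SNP.
    """
    prev_base = five_prime[-1:].upper()  # base immediately 5' of the SNP ('' if none)
    next_base = three_prime[:1].upper()  # base immediately 3' of the SNP ('' if none)
    allele_set = {a.upper() for a in alleles}
    # C followed by G: the C of a CpG on this strand (C>T after deamination).
    # G preceded by C: the G of a CpG; the C on the other strand deaminates (G>A here).
    return ("C" in allele_set and next_base == "G") or ("G" in allele_set and prev_base == "C")


print(occurs_at_cpg("TTAT", ["C", "T"], "GAT"))  # True: C/T followed by G
print(occurs_at_cpg("TTAC", ["G", "A"], "TAT"))  # True: G/A preceded by C
print(occurs_at_cpg("TTAT", ["G", "A"], "TAT"))  # False
```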

Additionally, diploidy has been suggested to relax purifying selection and increase tolerance for PTCs [38], which predicts a recessive or loss-of-function effect. All but one of the genes in Table 1 (the exception being the X-linked MAGEE2) are located on autosomes, which may also help to explain the frequency of naturally occurring PTC polymorphisms in humans. Moreover, even if (slightly) deleterious in the homozygous state, some X-SNPs may confer a selective advantage on heterozygotes [39]. Alternatively, epistatic interactions with additional mutations, in either the same or different genes, may compensate for the (slightly) deleterious effects of an X-SNP [16, 40]. Furthermore, X-SNPs may be beneficial under present conditions, which would favour their positive selection and increase their allele frequencies. Finally, if a PTC is located near the 5' end of a gene and is followed closely by an in-frame initiation codon, translation can re-initiate and an N-terminally truncated peptide may be produced [33-35]. Depending on the nature and extent of the truncation, this peptide may be fully or partially functional and may thus rescue the phenotype completely or to some extent. Further studies are needed to elucidate the molecular basis of this discrepancy and to determine the biological differences between human disease-related mutations and naturally occurring stop codon-creating polymorphisms.
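The re-initiation scenario depends on an in-frame ATG downstream of the PTC. The sketch below only locates such a codon in a coding sequence; it assumes the sequence starts at the annotated start codon and does not model whether ribosomes actually re-initiate there (sequence context and distance from the PTC also matter). The function name and the toy sequences are hypothetical.

```python
from typing import Optional


def first_downstream_inframe_atg(cds: str, ptc_codon: int) -> Optional[int]:
    """1-based index of the first in-frame ATG codon after the PTC, or None.

    cds: coding sequence (DNA alphabet) starting at the annotated ATG.
    ptc_codon: 1-based codon position of the premature stop.
    """
    cds = cds.upper()
    n_codons = len(cds) // 3
    for codon_index in range(ptc_codon + 1, n_codons + 1):
        start = 3 * (codon_index - 1)
        if cds[start:start + 3] == "ATG":
            return codon_index
    return None


cds = "ATG" "GCT" "TAG" "GGC" "ATG" "AAA" "TAA"  # toy CDS; codon 3 is the PTC
print(first_downstream_inframe_atg(cds, 3))  # 5
print(first_downstream_inframe_atg("ATGTAGCATGCCTAA", 2))  # None: the only later ATG is out of frame
```

Applied to a real transcript, the returned codon index would correspond to the position of a candidate re-initiation site such as the in-frame AUG at codon 18 mentioned for SMUG1.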

Frequency spectrum of X-SNPs in different human populations

Comparison of the populations and minor allele frequencies of the X-SNP entries in the dbSNP database [19] revealed considerable variability across human populations, at least in some cases (Table 1). For example, HPS4-R246X, IL17RB-Q484X, LPL-S474X, MS4A12-Q71X, OVCH2-W556X, SERPINB11-E90X, TAAR9-Q61X and TRPM1-E1305X were detected in samples of African, Asian and Caucasian background. This might mean either that these X-SNPs were inherited from a common ancestor or that they represent hot-spot mutations (HPS4-R246X and IL17RB-Q484X occurred at CpG dinucleotides and thus might in fact be hot-spot mutations; see Table 1). By contrast, CYP2C19-W212X (African and Asian), EPHX1-W97X (Asian and Hispanic) and OAS2-W720X (African and European) were detected in some populations but not in others. In addition, three X-SNPs were found exclusively in one population: FUT2-R297X and LCE5A-R79X in Asian samples and LIG4-W46X in African samples. Either differential selection among populations or founder effects/genetic drift may explain the population spectrum of these SNPs [16, 41].
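The population comparison can be sketched as a lookup from sample panel to ancestry group. The mapping below is a hypothetical simplification covering only four of the dbSNP panel names from Table 1; panels of mixed or unspecified ancestry (for example P1-MULTI-NATIONAL) are deliberately left out, and a group counts as carrying the allele if any of its panels reports a non-zero minor allele frequency.

```python
from typing import Dict, Mapping, Set

# Hypothetical mapping of dbSNP panel names (from Table 1) to broad ancestry groups.
PANEL_ANCESTRY: Dict[str, str] = {
    "PAC1-EAST ASIA": "Asian",
    "AFR1-MULTI-NATIONAL": "African",
    "CAUC1-MULTI-NATIONAL": "Caucasian",
    "HISP1-CENTRAL/SOUTH AMERICA": "Hispanic",
}


def groups_carrying_allele(panel_mafs: Mapping[str, float],
                           ancestry: Mapping[str, str] = PANEL_ANCESTRY) -> Set[str]:
    """Ancestry groups in which the minor allele was observed (MAF > 0) in any mapped panel."""
    return {ancestry[panel] for panel, maf in panel_mafs.items()
            if maf > 0 and panel in ancestry}


# CYP2C19-W212X minor allele frequencies from Table 1.
cyp2c19 = {"PAC1-EAST ASIA": 0.087, "CAUC1-MULTI-NATIONAL": 0.0,
           "AFR1-MULTI-NATIONAL": 0.021, "HISP1-CENTRAL/SOUTH AMERICA": 0.0,
           "P1-MULTI-NATIONAL": 0.025}
print(sorted(groups_carrying_allele(cyp2c19)))  # ['African', 'Asian']
```

A "not detected" result from such a lookup reflects only the panels sampled, which for several X-SNPs are small (fewer than 50 chromosomes).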

Conclusion

In conclusion, we have evaluated SNPs in the human genome that introduce PTCs and can potentially affect the stability of transcripts and their protein products. Although there is considerable information on PTC-creating mutations in human genetic diseases, to date there has been no systematic study of PTC-causing polymorphisms in the human genome and their evolutionary and biological roles in humans. Our results indicate a marked difference between the allele frequencies of disease-causing PTC-creating mutations and those of PTC-creating polymorphisms. These X-SNPs were found in proteins with a variety of cellular functions (signal transduction, DNA repair, transcription, immune response, drug metabolism etc.; Table 1). A search of the literature and the Human Gene Mutation Database [42] showed that some of these genes have already been implicated in human diseases: AGT in essential hypertension [43]; HPS4 in Hermansky-Pudlak syndrome type 4 [44]; LPL in disorders of lipoprotein metabolism [45]; and TLR5 in pneumonia caused by Legionella pneumophila [46]. In the latter case, the TLR5-R392X SNP was functionally characterised, found to be defective in flagellin signalling and associated with susceptibility to pneumonia [46]. In the case of TAP2-Q687X, TAP2-Q687 was reported to be part of a haplotype associated with a reduced risk of insulin-dependent diabetes mellitus in a small sample set [47]. Our data suggest that the X-SNPs identified in this study may be deleterious; however, their true biological consequences and potential roles in human disease and health have yet to be verified experimentally.

Electronic database information

Declarations

Acknowledgements

The authors thank Baris Tuncertan and Mehjabeen Shariff for automatic retrieval of the SNPs from the dbSNP database and Stewart Cho for editing the manuscript. This work was supported by a grant (BCTR0100627) from the Susan Komen Breast Cancer Foundation, USA. Sevtap Savas is supported, in part, by a "CIHR Strategic Training Program Grant - The Samuel Lunenfeld Research Institute Training Program: Applying Genomics to Human Health" fellowship.

Authors’ Affiliations

(1)
Fred A. Litwin Centre for Cancer Genetics, Samuel Lunenfeld Research Institute, Mount Sinai Hospital
(2)
Department of Pathology and Laboratory Medicine, Mount Sinai Hospital
(3)
Department of Laboratory Medicine and Pathobiology, University of Toronto
(4)
Cancer Drug Development Laboratory, Translational Genomics Research Institute

References

  1. Gray IC, Campbell DA, Spurr NK: Single nucleotide polymorphisms as tools in human genetics. Hum Mol Genet. 2000, 9: 2403-2408. 10.1093/hmg/9.16.2403.View ArticlePubMed
  2. Miller RD, Kwok PY: The birth and death of human single-nucleotide polymorphisms: New experimental evidence and implications for human history and medicine. Hum Mol Genet. 2001, 10: 2195-2198. 10.1093/hmg/10.20.2195.View ArticlePubMed
  3. Taylor JG, Choi EH, Foster CB, Chanock SJ: Using genetic variation to study human disease. Trends Mol Med. 2001, 7: 507-512. 10.1016/S1471-4914(01)02183-9.View ArticlePubMed
  4. Shastry BK: SNP alleles in human disease and evolution. J Hum Genet. 2002, 47: 561-566. 10.1007/s100380200086.View ArticlePubMed
  5. Brookes AJ: The essence of SNPs. Gene. 1999, 234: 177-186. 10.1016/S0378-1119(99)00219-X.View ArticlePubMed
  6. Lercher MJ, Hurst LD: Human SNP variability and mutation rate are higher in regions of high recombination. Trends Genet. 2002, 18: 337-340. 10.1016/S0168-9525(02)02669-0.View ArticlePubMed
  7. Chakravarti A: Population genetics -- Making sense out of sequence. Nat Genet. 1999, 21 (1): 56-60.View ArticlePubMed
  8. Thomas FJ, McLeod HL, Watters JW: Pharmacogenomics: The influence of genomic variation on drug response. Curr Top Med Chem. 2004, 4: 1399-1409.PubMed
  9. Sunyaev S, Ramensky V, Koch I, et al: Prediction of deleterious human alleles. Hum Mol Genet. 2001, 10: 591-597. 10.1093/hmg/10.6.591.View ArticlePubMed
  10. Wang Z, Moult J: SNPs, protein structure, and disease. Hum Mutat. 2001, 17: 263-270. 10.1002/humu.22.View ArticlePubMed
  11. Ng PC, Henikoff S: Accounting for human polymorphisms predicted to affect protein function. Genome Res. 2002, 12: 436-446. 10.1101/gr.212802.PubMed CentralView ArticlePubMed
  12. Ramensky V, Bork P, Sunyaev S: Human non-synonymous SNPs: Server and survey. Nucleic Acids Res. 2002, 30: 3894-3900. 10.1093/nar/gkf493.PubMed CentralView ArticlePubMed
  13. Byers PH: Killing the messenger: New insights into nonsensemediated mRNA decay. J Clin Invest. 2002, 109: 3-6.PubMed CentralView ArticlePubMed
  14. Gregersen N, Bross P, Jorgensen MM, et al: Defective folding and rapid degradation of mutant proteins is a common disease mechanism in genetic disorders. J Inherit Metab Dis. 2000, 23: 441-447. 10.1023/A:1005663728291.View ArticlePubMed
  15. Schell T, Kulozik AE, Hentze MW: Integration of splicing, transport and translation to achieve mRNA quality control by the nonsense-mediated decay pathway. Genome Biol. 2002, 3: REVIEWS1006-PubMed CentralView ArticlePubMed
  16. Fay JC, Wu CI: Sequence divergence, functional constraint, and selection in protein evolution. Annu Rev Genomics Hum Genet. 2003, 4: 213-235. 10.1146/annurev.genom.4.020303.162528.
  17. Frischmeyer PA, Dietz HC: Nonsense-mediated mRNA decay in health and disease. Hum Mol Genet. 1999, 8: 1893-1900. 10.1093/hmg/8.10.1893.
  18. Sawyer SL, Berglind LC, Brookes AJ: Negligible validation rate for public domain stop-codon SNPs. Hum Mutat. 2003, 22: 252-254. 10.1002/humu.10256.
  19. Sherry ST, Ward MH, Kholodov M, et al: dbSNP: The NCBI database of genetic variation. Nucleic Acids Res. 2001, 29: 308-311. 10.1093/nar/29.1.308.
  20. Benson DA, Karsch-Mizrachi I, Lipman DJ, et al: GenBank: Update. Nucleic Acids Res. 2004, 32: D23-D26. 10.1093/nar/gkh045.
  21. Clifford RJ, Edmonson MN, Nguyen C, Buetow KH: Large-scale analysis of non-synonymous coding region single nucleotide polymorphisms. Bioinformatics. 2004, 20: 1006-1014. 10.1093/bioinformatics/bth029.
  22. Savas S, Ahmad F, Kim DY, et al: Candidate nsSNPs that can affect the functions and interactions of the cell cycle genes. Proteins. 2005, 58: 697-705.
  23. Wheeler DL, Church DM, Edgar R, et al: Database resources of the National Center for Biotechnology Information: Update. Nucleic Acids Res. 2004, 32: D35-D40. 10.1093/nar/gkh073.
  24. Estivill X, Cheung J, Pujana MA, et al: Chromosomal regions containing high-density and ambiguously mapped putative single nucleotide polymorphisms (SNPs) correlate with segmental duplications in the human genome. Hum Mol Genet. 2002, 11: 1987-1995. 10.1093/hmg/11.17.1987.
  25. Savas S, Kim DY, Ahmad MF, et al: Identifying functional genetic variants in DNA repair pathway using protein conservation analysis. Cancer Epidemiol Biomarkers Prev. 2004, 13: 801-807.
  26. Pruitt KD, Maglott DR: RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res. 2001, 29: 137-140. 10.1093/nar/29.1.137.
  27. Thermann R, Neu-Yilik G, Deters A, et al: Binary specification of nonsense codons by splicing and cytoplasmic translation. EMBO J. 1998, 17: 3484-3494. 10.1093/emboj/17.12.3484.
  28. Zhang J, Sun X, Qian Y, et al: At least one intron is required for the nonsense-mediated decay of triosephosphate isomerase mRNA: A possible link between nuclear splicing and cytoplasmic translation. Mol Cell Biol. 1998, 18: 5272-5283.
  29. Baker KE, Parker R: Nonsense-mediated mRNA decay: Terminating erroneous gene expression. Curr Opin Cell Biol. 2004, 16: 293-299. 10.1016/j.ceb.2004.03.003.
  30. Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez Gene: Gene-centered information at NCBI. Nucleic Acids Res. 2005, 33: D54-D58. 10.1093/nar/gni052.
  31. Williams RS, Chasman DI, Hau DD, et al: Detection of protein folding defects caused by BRCA1-BRCT truncation and missense mutations. J Biol Chem. 2003, 278: 53007-53016. 10.1074/jbc.M310182200.
  32. Liu HX, Cartegni L, Zhang MQ, Krainer AR: A mechanism for exon skipping caused by nonsense or missense mutations in BRCA1 and other genes. Nat Genet. 2001, 27: 55-58.
  33. Ozisik G, Mantovani G, Achermann JC, et al: An alternate translation initiation site circumvents an amino-terminal DAX1 nonsense mutation leading to a mild form of X-linked adrenal hypoplasia congenita. J Clin Endocrinol Metab. 2003, 88: 417-423. 10.1210/jc.2002-021034.
  34. Heppner Goss K, Trzepacz C, Tuohy TM, Groden J: Attenuated APC alleles produce functional protein from internal translation initiation. Proc Natl Acad Sci USA. 2002, 99: 8161-8166. 10.1073/pnas.112072199.
  35. Howard MT, Malik N, Anderson CB, et al: Attenuation of an amino-terminal premature stop codon mutation in the ATRX gene by an alternative mode of translational initiation. J Med Genet. 2004, 41: 951-956. 10.1136/jmg.2004.020248.
  36. Ohta T: Near-neutrality in evolution of genes and gene regulation. Proc Natl Acad Sci USA. 2002, 99: 16134-16137. 10.1073/pnas.252626899.
  37. Tomso DJ, Bell DA: Sequence context at human single nucleotide polymorphisms: Overrepresentation of CpG dinucleotide at polymorphic sites and suppression of variation in CpG islands. J Mol Biol. 2003, 327: 303-308. 10.1016/S0022-2836(03)00120-7.
  38. Xing Y, Lee CJ: Negative selection pressure against premature protein truncation is reduced by alternative splicing and diploidy. Trends Genet. 2004, 20: 472-475. 10.1016/j.tig.2004.07.009.
  39. Dean M, Carrington M, O'Brien SJ: Balanced polymorphism selected by genetic versus infectious human disease. Annu Rev Genomics Hum Genet. 2002, 3: 263-292. 10.1146/annurev.genom.3.022502.103149.
  40. Cordell HJ: Epistasis: What it means, what it doesn't mean, and statistical methods to detect it in humans. Hum Mol Genet. 2002, 11: 2463-2468. 10.1093/hmg/11.20.2463.
  41. Cavalli-Sforza LL, Feldman MW: The application of molecular genetic approaches to the study of human evolution. Nat Genet. 2003, 33: 266-275. 10.1038/ng1113.
  42. Stenson PD, Ball EV, Mort M, et al: Human Gene Mutation Database (HGMD): 2003 update. Hum Mutat. 2003, 21: 577-581. 10.1002/humu.10212.
  43. Jeunemaitre X, Soubrier F, Kotelevtsev YV, et al: Molecular basis of human hypertension: Role of angiotensinogen. Cell. 1992, 71: 7-20.
  44. Anderson PD, Huizing M, Claassen DA, et al: Hermansky-Pudlak syndrome type 4 (HPS-4): Clinical and molecular characteristics. Hum Genet. 2003, 113: 10-17.
  45. Otarod JK, Goldberg IJ: Lipoprotein lipase and its role in regulation of plasma lipoproteins and cardiac risk. Curr Atheroscler Rep. 2004, 6: 335-342. 10.1007/s11883-004-0043-4.
  46. Hawn TR, Verbon A, Lettinga KD, et al: A common dominant TLR5 stop codon polymorphism abolishes flagellin signaling and is associated with susceptibility to legionnaires disease. J Exp Med. 2003, 198: 1563-1572. 10.1084/jem.20031220.
  47. Colonna M, Bresnahan M, Bahram S, et al: Allelic variants of the human putative peptide transporter involved in antigen processing. Proc Natl Acad Sci USA. 1992, 89: 3932-3936. 10.1073/pnas.89.9.3932.

Copyright

© Henry Stewart Publications 2006