Novel variants of major drug-metabolising enzyme genes in diverse African populations and their predicted functional effects

Pharmacogenetics enables personalised therapy based on genetic profiling and is increasingly applied in drug discovery. Medicines are developed and used together with pharmacodiagnostic tools to achieve desired drug efficacy and safety margins. Genetic polymorphism of drug-metabolising enzymes such as cytochrome P450s (CYPs) and N-acetyltransferases (NATs) has been widely studied in Caucasian and Asian populations, yet studies on African variants have been less extensive. The aim of the present study was to search for novel variants of CYP2C9, CYP2C19, CYP2D6 and NAT2 genes in Africans, with a particular focus on their prevalence in different populations, their relevance to enzyme functionality and their potential for personalised therapy. Blood samples from various ethnic groups were obtained from the AiBST Biobank of African Populations. The nine exons and exon-intron junctions of the CYP genes and exon 2 of NAT2 were analysed by direct DNA sequencing. Computational tools were used for the identification, haplotype analysis and prediction of functional effects of novel single nucleotide polymorphisms (SNPs). Novel SNPs were discovered in all four genes, grouped to existing haplotypes or assigned new allele names, if possible. The functional effects of non-synonymous SNPs were predicted and known African-specific variants were confirmed, but no significant differences were found in the frequencies of SNPs between African ethnicities. The low prevalence of our novel variants and most known functional alleles is consistent with the generally high level of diversity in gene loci of African populations. This indicates that profiles of rare variants reflecting interindividual variability might become the most relevant pharmacodiagnostic tools explaining Africans' diversity in drug response.


Introduction
Pharmacogenetics describes patients' variation in response to therapy due to genetic factors.
Pharmacogenetics-based therapy is of special interest for drugs with narrow therapeutic indices, where impairment in metabolic activity might cause difficulties in dose adjustment, resulting in increased susceptibility to adverse drug reactions (ADRs). The cytochrome P450 enzymes (CYPs) metabolise more than 80 per cent of clinically used drugs and most of them exhibit functionally significant genetic polymorphisms. The genes encoding CYP2C9, CYP2C19 and CYP2D6, as well as N-acetyltransferase 2 (NAT2), have been most extensively studied across various populations. 1,2 The presence of novel variants remains to be ascertained in African populations, however, particularly rare (<1 per cent frequency) single nucleotide polymorphisms (SNPs), which may contribute to a better understanding of interindividual variation in the metabolism of drugs.
The human CYP2C subfamily contains four highly homologous genes (2C8, 2C9, 2C18 and 2C19), which are located in a cluster on chromosome 10. 3 CYP2C9 is the main CYP2C enzyme, constituting 20 per cent of total human liver microsomal P450 content. 4 CYP2C9 and CYP2C19 genes each contain nine exons and encode proteins of 490 amino acids in length. Although these genes are highly homologous (92 per cent), the enzymes differ in terms of substrate specificities. 5 Major variations in the occurrence of polymorphisms in both CYP2C9 and CYP2C19 genes have been reported in various populations. CYP2C9 variants CYP2C9*2 and CYP2C9*3 are the most common and occur at frequencies of 0.11 and 0.08, respectively, in Caucasians. 6 Population-based pharmacokinetics-pharmacodynamics modelling of their effects has been explored for revising labels of CYP2C9 substrate drugs. 7 Testing for CYP2C9 genotypes can be used to predict the starting dose of the anticoagulant drug warfarin to avoid excessive bleeding episodes. 8 Other drugs affected by CYP2C9 polymorphism are the antidiabetic agents glipizide and tolbutamide, the antiepileptic agent phenytoin, the antihypertensive drug losartan and non-steroidal anti-inflammatory drugs (NSAIDs) such as ibuprofen and diclofenac. 9 CYP2C19 metabolises omeprazole, diazepam and proguanil to a major extent. The common allelic variants, such as CYP2C19*2 and CYP2C19*3, cause reduced enzyme activity and contribute to the poor metabolism of substrate drugs. 10 A polymorphism in the promoter region has, however, been associated with increased enzyme activity. 11 Individuals carrying this variant may therefore require a higher dosage in order to achieve the therapeutic effect.
CYP2D6 metabolises a wide range of drugs, such as antiarrhythmic agents, tricyclic antidepressants, neuroleptics and anti-cancer agents. 12 CYP2D6 is the most polymorphic CYP, with alleles causing a spectrum of phenotypic responses. The presence of multiple copies of the gene results in individuals described as ultra-rapid metabolisers. For example, individuals carrying duplicated or multi-duplicated active CYP2D6 genes are very common among Ethiopians, compared with Caucasian, Oriental and other Black populations. 13 By contrast, whole gene deletions causing poor metaboliser phenotypes have been observed across all populations. The African-specific alleles CYP2D6*17 and CYP2D6*29 cause reduced enzyme activity; individuals homozygous for these alleles are classified as intermediate metabolisers.
Overall, Africans metabolise CYP2D6 substrates at a slower rate than Caucasians owing to the higher prevalence of these reduced-function alleles. 14 So far, NAT2 has been found to comprise 19 major known haplotypes. Important drugs metabolised by this enzyme include the anti-tuberculosis drug isoniazid and the antibiotic co-trimoxazole. Some polymorphisms of NAT2 have been shown to affect the acetylation of these drugs and this may result in toxic side effects. 15,16 The most commonly known alleles are NAT2*5, NAT2*6, NAT2*7 and the African-specific NAT2*14. In addition, other SNPs have been discovered and are awaiting characterisation of their phenotypic effects.
In clinical pharmacogenetics, we aim to optimise therapeutic outcome by prescribing drugs to patients at doses that are predicted to be efficacious and safe. Knowledge of the types of genetic variants of major drug-metabolising enzymes and their frequency in the population is therefore important for the design and deployment of pharmacodiagnostic tools to guide drug prescription. Only a few studies on genotype-phenotype relationships of drug effects have been carried out in African populations. 16-20 Therefore, limited knowledge of polymorphisms and their impact in Africans may underestimate the importance of clinical applications of pharmacogenetics. Here, we report novel variants of the CYP2C9, CYP2C19, CYP2D6 and NAT2 genes found in African populations and their predicted functional effects.

DNA samples
The study was carried out according to the Declaration of Helsinki (2000) of the World Medical Association and was approved by the Ethical Review Boards of Kenya, Nigeria, Tanzania and Zimbabwe. Informed consent was obtained from volunteers of the following ethnic groups: Hausa (20), Ibo (20), Luo (30), Maasai (13), San (40), Shona (23), Venda (9), Yoruba (20) and Tanzanian Mixed Bantu (12). Ethnicity was assigned based on the submission that parents and grandparents of the volunteers were of the same self-identified ethnic group. The exact numbers of samples analysed per gene are shown in Tables S1-S4 in the Appendix. DNA was extracted from whole blood samples stored in the AiBST Biobank of African Populations 21 using the QIAamp DNA Blood Mini Kit (Qiagen, KJ Venlo, The Netherlands).

PCR and sequencing
Primers were designed using SNPBox and Primer3 software. 22,23 Their specificity for each gene studied was confirmed by a BLAST analysis search and comparison of genomic sequences in the National Center for Biotechnology Information (NCBI) databases (http://blast.ncbi.nlm.nih.gov/Blast.cgi).
Identical primers were used for the polymerase chain reaction (PCR) and sequencing, except where otherwise stated (Table S5). First-step exon amplification mixtures (20 µl) contained 1x TiTaq buffer, 0.25 mM deoxyribonucleotide triphosphates, 0.5 µM of each primer and 0.25 units of TiTaq and DNA template (5 ng/µl). For PCR, an initial denaturation at 94 °C for two minutes was followed by 35 cycles at 94 °C for 30 seconds, 59 °C for 30 seconds and 72 °C for 60 seconds. Sequencing reactions were started at 96 °C for one minute followed by 25 cycles at 96 °C for ten seconds, 50 °C for five seconds and 60 °C for four minutes, and resolved on an ABI 3730 DNA Analyzer (Applied Biosystems, Brussels, Belgium).

Data analysis
Identification of SNPs was carried out using the novoSNP v2.1.9 software package. 24 Reference sequences were NC_000010.9 for CYP2C9, NC_000010.9 for CYP2C19, M33388 for CYP2D6 and NC_000008.9 for NAT2. All identified SNPs were compared with the NCBI Single Nucleotide Polymorphism database (dbSNP). 25 As SNPs can cause the introduction of pre-microRNA (miRNA) sites, this was included as part of the annotation in the novoSNP analysis procedure. Frequencies of SNPs were calculated using Genepop. 26 HaploView v3.31 27 was used to determine haplotypes from sequence genotype data. Linkage disequilibrium plots were generated to assess the extent to which SNPs were likely to be linked, and hence likely to occur on the same haplotype, with a logarithm of odds (LOD) score > 3. The HaploView Tagger tool was used to estimate which SNPs were likely to be tagged by a single SNP in a predicted haplotype (threshold r² > 0.8).
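The two quantities used in this analysis, allele frequency and the pairwise r² that underlies tag-SNP selection, can be illustrated with a minimal Python sketch. It is not the Genepop or HaploView implementation: it assumes that haplotypes are already phased and counted, whereas HaploView estimates haplotype frequencies from unphased genotypes. The function names and the counts in the usage lines are hypothetical.

```python
def allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """Frequency of the alternative allele from diploid genotype counts."""
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    return (n_het + 2 * n_hom_alt) / total_alleles


def ld_r_squared(n_11, n_10, n_01, n_00):
    """Pairwise r^2 between two biallelic SNPs from phased haplotype counts.

    n_11: haplotypes carrying the alternative allele at both SNPs
    n_10: alternative allele at SNP 1 only
    n_01: alternative allele at SNP 2 only
    n_00: reference allele at both SNPs
    """
    n = n_11 + n_10 + n_01 + n_00
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p1 = (n_11 + n_10) / n          # alternative allele frequency at SNP 1
    p2 = (n_11 + n_01) / n          # alternative allele frequency at SNP 2
    d = n_11 / n - p1 * p2          # linkage disequilibrium coefficient D
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom


# Hypothetical counts, for illustration only (not data from this study).
freq = allele_frequency(n_hom_ref=70, n_het=28, n_hom_alt=2)
r2 = ld_r_squared(n_11=30, n_10=0, n_01=0, n_00=70)
is_tagged = r2 > 0.8  # tagging threshold stated in the text
```

With the hypothetical counts above, the two SNPs always occur together, so one would be tagged by the other under the r² > 0.8 threshold.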

Prediction of functional effects of non-synonymous SNPs
Functional effects of non-synonymous SNPs were predicted using the Polyphen prediction programme 28 based on position-specific independent counts (PSIC) scores of multiple sequence alignments, as well as structural information, if available. The programme predicts the functional effects of SNPs based on occurrence in active/binding sites or in transmembrane regions, interference with disulphide or other bonds, compatibility with homologous sequences at that position, as well as mapping to known three-dimensional protein structures or validated homology models. Protein sequence accession numbers were obtained from Swiss-Prot 29 as CYP2C9: P11712; CYP2C19: P33261; CYP2D6: P10635; NAT2: P11245.

Allele nomenclature
Allele nomenclature is assigned according to the Human Cytochrome P450 (CYP) Allele Nomenclature Committee 30 and the Arylamine N-acetyltransferase Gene Nomenclature Committee. 31 Alleles, as borne by specific SNPs, are assigned numbers; for example, CYP2C9*9 defines the 10535A>G mutation, the presence of which results in the amino acid change H251R.
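For readers handling such annotations programmatically, the notation can be split into its parts with a short parser. This is a minimal sketch: the `Variant` class and the two parsing functions are hypothetical helpers, not part of any nomenclature committee tool, and they cover only simple substitutions such as 10535A>G and H251R (frameshifts such as K273fs are rejected).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    position: int  # nucleotide position in the reference sequence
    ref: str       # reference base
    alt: str       # variant base


_SNP_PATTERN = re.compile(r"^(\d+)([ACGT])>([ACGT])$")
_AA_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_snp(text):
    """Parse a nucleotide substitution such as '10535A>G'."""
    match = _SNP_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple substitution: {text!r}")
    position, ref, alt = match.groups()
    return Variant(int(position), ref, alt)


def parse_aa_change(text):
    """Parse an amino acid change such as 'H251R' (X denotes a stop codon)."""
    match = _AA_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple amino acid change: {text!r}")
    ref, position, alt = match.groups()
    return ref, int(position), alt


cyp2c9_star9 = parse_snp("10535A>G")
h251r = parse_aa_change("H251R")
```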

Results
In our African populations, novel and known SNPs were found in all drug-metabolising enzyme genes studied (Tables S1-S4). Novel SNPs for CYP2C9, CYP2C19 and CYP2D6 were grouped and assigned to haplotypes or groups of other known mutations if possible (Table 1). We mainly looked at the non-synonymous SNPs, since these are used to determine the eventual assignment of new functional alleles. CYP2C9 42519T>C (I327T) and 50341G>T (V490F) were assigned the new allele names CYP2C9*31 and CYP2C9*32, respectively. For CYP2C19, the 17869G>C/80161A>G (R186P/I331V) combination was assigned the new allele name CYP2C19*22. It was not possible to assign new haplotypes/alleles for CYP2C9 50294A>G (N474S) and CYP2C19 12690G>A (V113I) because their linkage with other alleles such as CYP2C9*9 (10535A>G; H251R) and CYP2C19*2 (19154G>A; P227P), respectively, could not be excluded. Known mutations also had some synonymous SNPs, as well as non-coding SNPs, grouped to them (eg CYP2C9*9, CYP2C19*12 and CYP2C19*13). It appears that the novel CYP2D6 1608G>A (V119M) SNP is found on the known CYP2D6*29 allele, which is defined by 1659G>A (V136M) and 3183G>A (V338M). This haplotype group was therefore assigned the new name CYP2D6*70. New alleles were not assigned for CYP2D6 1621G>T (R123L) and 4057G>A (G445E), since further work is required to fully establish the haplotypes. The novel NAT2 SNPs did not appear to be linked to any other SNPs. HaploView-determined tag SNPs for NAT2 were used to determine the major haplotype frequencies (Figure 1).
In CYP2C9 (Table S1), three out of six non-synonymous SNPs, 42519T>C (I327T), 50294A>G (N474S) and 50341G>T (V490F), were novel. Of these, the I327T and V490F changes are predicted to have a functional effect (Table 1); however, further inference of these amino acid changes with crystal structure information 32 and Gotoh's sequence alignments 33 indicates that they may not influence substrate recognition and binding. The most common non-synonymous CYP2C9 allele in this study was CYP2C9*9 (10535A>G; H251R), which is predicted to be damaging to enzyme function, although phenotypic studies in African individuals have shown no effect on the metabolism of the antiepileptic drug phenytoin. By contrast, the other known non-synonymous SNPs, such as CYP2C9*5 (42619C>G; D360E) and CYP2C9*6 (10601delA; K273fs) (Table S1), did cause reduced enzyme activity. 17
The two novel non-synonymous SNPs discovered in CYP2C19 (Table S2), 12690G>A (V113I) in exon 3 and 17869G>C (R186P) in exon 4, seem to cause very different effects on enzyme function, according to the physicochemical character of their amino acid changes (Table 1). Whereas the effect of V113I may be negligible, the change from the basic arginine to proline at position 186 seems to be functionally damaging, as predicted (PSIC score = 3.159).
Three novel non-synonymous SNPs were found in CYP2D6: 1608G>A (V119M), 1621G>T (R123L) and 4057G>A (G445E) (Table S3). Whereas the V119M and the R123L changes were predicted to have no effect on enzyme function (Table 1), they are located in the substrate recognition site SRS1. The G445E substitution may be functionally important (PSIC score = 3.063) owing to its close proximity to the 443 site, which is critical for the heme ligand binding in this enzyme according to the crystal structure. 34 Consistent with other African data, 35,36 the most common CYP2D6 haplotypes contributing to the variability of drug response were CYP2D6*2 (2850C>T; R296C and 4180G>C; S486T), CYP2D6*17 (1023C>T; T107I) and CYP2D6*29 (1659G>A; V136M and 3183G>A; V338M) (Table S3).
Four novel amino acid-changing SNPs were detected in NAT2 (Table S4). The 641C>T (T214I) change was predicted to have an effect on enzyme function (Table 1) because the amino acid at this position was predicted to be involved in coenzyme A ligand binding as part of the acetylation process. The 589C>T (R197X) change introduces a stop codon and hence no protein is expressed. The most common alleles of NAT2 in this study were NAT2*5 (341T>C; I114T) and NAT2*6 (590G>A; R197Q) (Table S4), which contribute largely to the slow acetylator phenotype in African populations. Figure 1 shows NAT2 haplotypes and their frequencies in the total population studied. The most common sub-haplotypes were NAT2*6A and NAT2*5B, which affect enzyme function, followed by the wild type NAT2*4 and NAT2*12A, which do not impair acetylation.
In addition to non-synonymous SNPs, numerous novel synonymous SNPs, as well as SNPs in introns and at splice site junctions, were identified. SNPs at splice site junctions were investigated, but none of the novel SNPs were located within the most critical -1 to -2 positions of the acceptor sites or the -2 to +4 positions of the donor sites.

Discussion
Major genetic variability in drug-metabolising enzymes has been reported in Caucasian and Asian populations. 37 The aim of this study was to search for novel variants of the highly polymorphic cytochrome P450 (CYP2C9, CYP2C19, CYP2D6) and N-acetyltransferase 2 (NAT2) genes in Africans, using representative samples from our newly established Biobank. 21 This analysis was focused on the occurrence of alleles in African populations, their potential effects on enzyme function and the applicability of such data to personalised therapy.

African populations
Certain SNPs or haplotypes that have been reported as prevalent and functionally important in other populations are rare or have not yet been detected in African populations. For example, CYP2C9*2 (R144C) and CYP2C9*3 (I359L), while extensively studied in Asian and Caucasian populations and identified as rare alleles in African Americans (frequency 1 per cent), were not found in Africans, neither in this study nor in a Beninese population. 38 By contrast, CYP2C9*5 (D360E) and CYP2C9*6 (K273fs) have been identified in African populations, although at low frequency (frequency = 0.01; Table S1). CYP2C9*5 causes impaired enzyme activity, 6 and CYP2C9*6, first found in African Americans, is associated with phenytoin toxicity. 39 The importance of CYP2C9*8 (R150H), CYP2C9*9 (H251R) and CYP2C9*11 (R335W), which were detected in limited studies in Africans 17 (partly including the present study), and the distribution of poor metabolisers in African populations remain unclear.
The US Food and Drug Administration (FDA) has recommended genotyping for CYP2C9*2 and CYP2C9*3 with the aim of better use of warfarin. 40 Since these variants are practically absent in populations of African origin, current pharmacodiagnostic kits that identify individuals carrying CYP2C9*2 and CYP2C9*3 may not be applicable in these populations. Test kits that detect CYP2C9*5, CYP2C9*6, CYP2C9*8, CYP2C9*9 and CYP2C9*11, as well as our novel SNPs, should be more predictive of the clinical response to CYP2C9 substrate drugs in Africans. Before such tools can be developed and deployed for clinical use, however, further studies are required to establish the frequencies of these alleles in larger African populations, in addition to genotype-phenotype studies to establish their functional relevance.
CYP2C19*2 (splicing defect) and CYP2C19*3 (W212X) have been recommended as biomarkers for the administration of certain CYP2C19 substrates. 41 The CYP2C19 poor-metaboliser phenotype is detected in two to four per cent of Caucasians and in about 20 per cent of Asians, and these two variants account for 99 per cent of these poor metaboliser phenotypes. 42,43 Whereas CYP2C19*2 was the most frequent known defective variant in our study (frequency = 0.15; Table S2), we and various genotype-phenotype correlation studies have found CYP2C19*3 to be rare in most African populations (frequency = 0.01; Table S2 44). We also identified one individual in the Maasai ethnic group who was heterozygous for this allele, and a few heterozygous individuals have previously been reported in a Tanzanian population. 20 Earlier data show that CYP2C19*2 accounts for over 70 per cent of slow metabolisers of S-mephenytoin. 45 The missing 30 per cent might be made up by CYP2C19*3 and other variants such as CYP2C19*12, CYP2C19*13 and CYP2C19*15, which would make these SNPs important contenders to include in genotyping panels for diagnostic purposes in Africans.
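To give a sense of scale for these allele frequencies, the expected proportion of individuals carrying two defective alleles can be approximated under Hardy-Weinberg equilibrium. The sketch below is purely illustrative and is not a result of this study: it assumes random mating, treats CYP2C19*2 and CYP2C19*3 as the only defective alleles and uses a hypothetical function name.

```python
def expected_two_defective(defective_allele_freqs):
    """Expected fraction of individuals with two defective alleles.

    Assumes Hardy-Weinberg equilibrium, so the fraction is q**2, where q is
    the combined frequency of all defective alleles at the locus.
    """
    q = sum(defective_allele_freqs)
    if not 0.0 <= q <= 1.0:
        raise ValueError("combined allele frequency must lie between 0 and 1")
    return q * q


# Frequencies reported in the text: CYP2C19*2 = 0.15 and CYP2C19*3 = 0.01.
fraction = expected_two_defective([0.15, 0.01])  # (0.15 + 0.01) squared
```

Under these assumptions the combined defective allele frequency is 0.16, giving an expected fraction of roughly 0.026, that is, about 2.6 per cent of individuals. Any additional defective alleles, such as those suggested above, would raise this estimate.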
Our analysis of diverse African populations confirmed that CYP2D6*17 (T107I) and CYP2D6*29 Based on the highly polymorphic CYP2D6, we used principal component analysis to investigate inter-ethnic variability. The fact that no significant differences were detected across ethnicities (data not shown) could be due to our small sample sizes; however, our data are consistent with a recent study illustrating that CYP2D6 shows a high frequency of altered activity variants but no clear population structure. 47 It may also imply that the phenotype status of those populations is not significantly different either.
It has been speculated that the variation in acetylator (NAT2) status across major world populations reflects differences in dietary habits or the environment. There is a high prevalence of slow and intermediate acetylators in African populations, however, due to the common NAT2*5 (I114T), NAT2*6 (R197Q) and NAT2*14 (R64Q) alleles, which contribute largely to the slow acetylator phenotype. This is consistent with our data (Figure 1) and with a recent study of sub-Saharan populations which also indicates that the NAT2*5B and NAT2*6A haplotypes are more common than the wild-type haplotype NAT2*4. 1,48

Enzyme function
Some mutations in coding regions cause amino acid changes that result in alterations of enzyme activity, substrate selectivity and, sometimes, protein stability. Ensuing functional differences cause different metaboliser phenotypes. So far, over 30 such variants have been reported for CYP2C9, approximately 20 for CYP2C19 and over 60 for CYP2D6. 30 We have predicted functional effects of novel non-synonymous SNPs discovered in this study (Table 1). These predictions were based on amino acid chemistry, conservation in the alignment of known sequences from the same protein families, and solved structures or homology modelling. Crystal structures of CYP2C9 and CYP2D6 have been reported 32,34 and structures of the other enzymes have been approximated by homology modelling. 49,50 It is assumed that such approximation is sufficiently accurate to predict functional effects in substrate recognition, binding and catalysis of reactions. 51 Amino acid changes with a PSIC score of less than 1 are assumed not to be involved in any functional sites and are predicted not to affect enzyme function (eg N474S in CYP2C9, V113I in CYP2C19, V119M in CYP2D6, and I158L and I270T in NAT2). Some changes with PSIC scores slightly above 1 may still have modest effects on enzyme function, for example R123L in CYP2D6 (PSIC score = 1.236). As shown in a previous study in which the CYP2D6 sequence was aligned with Gotoh's sequence, 33 however, this residue is involved in the substrate recognition site SRS1. 52 The T214I change in NAT2 (PSIC score = 1.257) seems to interfere with enzyme function because this residue is important for the interaction with the co-enzyme A ligand, according to homology model prediction.
The effect of the R186P change in CYP2C19 leads to a change in electrostatic charge and possibly geometry; hence, it is predicted to affect the protein dramatically, giving a high PSIC score. The high score observed for G445E in CYP2D6 might be due to its interaction with position 443, which is important for heme-ligand binding, 34 and therefore has a high probability of affecting enzyme function.
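The score-based reasoning in this section can be summarised as a simple banding rule. The sketch below is an illustration only and does not reproduce the Polyphen algorithm: the lower boundary of 1 comes from the text, whereas the upper cutoff of 2.0 and the category labels are assumptions introduced here, chosen so that the two high-scoring changes discussed above fall in the top band.

```python
def classify_psic(score, damaging_cutoff=2.0):
    """Band a PSIC score into a coarse predicted-effect category.

    Scores below 1 are treated as having no predicted effect, as stated in
    the text. The damaging_cutoff value is an assumption for illustration.
    """
    if score < 0:
        raise ValueError("PSIC score is expected to be non-negative here")
    if score < 1.0:
        return "no predicted effect"
    if score < damaging_cutoff:
        return "possible modest effect"
    return "likely damaging"


# PSIC scores quoted in the text for novel non-synonymous SNPs.
reported_scores = {
    "CYP2D6 R123L": 1.236,
    "NAT2 T214I": 1.257,
    "CYP2C19 R186P": 3.159,
    "CYP2D6 G445E": 3.063,
}

predictions = {name: classify_psic(s) for name, s in reported_scores.items()}
```

A score alone is not decisive: as noted above for R123L and T214I, structural context such as location in a substrate recognition site or a ligand-binding residue can indicate functional relevance even when the score is only slightly above 1.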
Whereas some defective splice site variants are well understood, for example CYP2D6*4 (1846G>A), which occurs at the zero acceptor position of exon 4, functional indications are less clear if mutations lie further away from splice site junctions. Rogan et al. 53 have used information theory analysis to show how other intronic and synonymous mutations may contribute to splice site effects in CYP genes. For example, the defective allele CYP2C19*2 (19154G>A) results in a synonymous mutation (P227P), yet it has been associated with reduced enzyme activity. Further investigations showed that this mutation introduces a cryptic splice site 40 nucleotides downstream, resulting in a truncated protein. We used information theory to analyse novel synonymous SNPs and intronic SNPs within the splice sites (-25 to +2 for exon acceptor sites and -3 to +6 for exon donor sites) of CYP2C9, CYP2C19 and CYP2D6 but did not find any significant effects on splice site recognition (data not shown).
Pre-miRNA sequences are involved in the regulation of protein expression. Mutations in these sequences, as well as insertions of new pre-miRNA sequences, could affect enzyme expression, yet CYP1B1 is the only CYP that has been found to be miRNA regulated so far. 54 In the present study, we did not find any pre-miRNA sequences introduced in the 3′ untranslated regions (UTRs), yet in CYP2C19, 18818T>C in intron 4 and 19332G>A in intron 5 introduce miRNA binding sites for hsa-mir-139 and hsa-mir-448, respectively (Table S2). Since miRNA binding sites mostly act within the 3′ UTR, however, these mutations would not be expected to have any effects.
In summary, our data, in conjunction with other studies of sub-Saharan Africans and African Americans, 17,19,55,56 indicate low heterogeneity in the frequency of functional mutations. In the genes studied, most functionally important SNPs have been found. What remains is to determine their prevalence across populations and to evaluate the functional effects of the novel SNPs. Expressing variant proteins and analysing their substrate turnover to show impaired enzymatic activity was beyond the scope of this study. We envisage that such analyses will strengthen our findings, however, and might become essential for the pharmacokinetic assessment of individual variants in order to meet regulatory requirements for diagnostic use.

Personalised therapy
Our data indicate the importance of CYP2C9, CYP2C19, CYP2D6 and NAT2 for genotype assessment, including the identified novel SNPs, so that optimisation of drug use in African populations can be considered under appropriate clinical scenarios. This could enable correct dose adjustment for individuals who are likely to experience ADRs owing to poor metabolism or an inadequate therapeutic effect owing to ultra-rapid metabolism. It is noteworthy, however, that other factors, which are not related to the newly identified SNPs but affect the clinical pharmacology of prescribed medications, may play a role in clinical ADRs or therapeutic failure.
The incorporation of CYP2C9 genotyping as part of pre-prescription diagnosis for individuals being treated with drugs metabolised by this enzyme 57 indicates the immediate utility of pharmacogenetics. Likewise, pre-prescription genotyping has been recommended for CYP2D6-metabolised drugs with a narrow therapeutic window, such as some antipsychotic agents. 58 NAT2 genotype information can be used to predict the phenotypic status of individuals to enable dose adjustment of anti-tuberculosis drugs such as isoniazid.

Conclusions
We have started to identify and catalogue novel variants (SNPs) of genes that are important in drug metabolism. We have confirmed African-specific variants but found modest variation between different African ethnicities, indicating similar metabolic profiles for most drugs, yet stressing inter-individual variability. The low frequency of our new CYP2C9, CYP2C19, CYP2D6 and NAT2 alleles seems to have reduced their impact at the population level. The generally high level of diversity in gene loci of African populations, however, indicates that rare variants (incidence of less than 1 per cent) and inter-individual variability might bear extra weight in explaining Africans' phenotypic diversity. As genome-wide association studies turn up new variants at high pace, the character of molecular diagnostics shifts from single genes to profiles, encompassing low frequency variants as their main constituents.
We have predicted the functional effects of nonsynonymous SNPs and suggest genotype-phenotype studies to investigate the effects of these SNPs in individuals. Eventually, we recommend the genotyping of African populations to establish the prevalence of functionally important haplotypes towards the development of relevant pharmacodiagnostic tools for these populations.

PRIMARY RESEARCH
Matimba et al.