e-PKGene: A knowledge-based research tool for analysing the impact of genetics on drug exposure

e-PKGene (http://www.pharmacogeneticsinfo.org) is a manually curated knowledge product developed in the Department of Pharmaceutics at the University of Washington, USA. The tool integrates information from the literature, public repositories, reference textbooks, product prescribing labels and clinical review sections of new drug approval packages. The database's easy-to-use web portal offers tools for visualisation, reporting and filtering of information. The database helps scientists to mine pharmacokinetic and pharmacodynamic information for drug-metabolising enzymes and transporters, and provides access to available quantitative information on drug exposure contained in the literature. It allows in-depth analysis of the impact of genetic variants of enzymes and transporters on pharmacokinetic responses to drugs and metabolites. This review gives a brief description of the database organisation, its search functionalities and examples of use.


Introduction
Differences in drug response among patients are common, often leading to challenges in optimising a dosage regimen for an individual patient. Genetic factors have long been known to cause interindividual differences in the pharmacokinetics, efficacy and adverse events of a number of drugs, 1 and drug-metabolising enzymes have been shown to be the greatest source of pharmacogenetic (PGx) variability identified to date. It is estimated that over half of the 170 genes with products affecting drug disposition are polymorphic, 2 and clinically important polymorphisms have been identified for most major enzymes involved in both phase I and phase II drug metabolism. 3 More recently, the polymorphic variability of several transporter proteins, such as the hepatic uptake transporter, organic anion-transporting polypeptide 1B1 (OATP1B1), has been shown to have an impact on the exposure to, and safety of, widely prescribed drugs. 4 Thus, incorporating the knowledge gained from PGx research to make decisions in drug development and clinical care has the potential to increase the safety and efficacy of drug treatment, and is central to the strategies of personalised medicine. 5
Despite current efforts to incorporate PGx information into drug development, clinical practice and cost-effective healthcare decisions, information uptake remains low. Translational research is required to move PGx discoveries effectively to evidence-based application in these areas. Translational research has been described as having four iterative phases with feedback loops, to allow integration of new knowledge. 6 Phase 1 (T1) and Phase 2 (T2) translational research informs the development of clinical interventions and evidence-based guidelines; Phase 3 (T3) research assesses the implementation of guidelines in health practice; and Phase 4 (T4) research evaluates the health outcomes of changes in practice following the implementation of guidelines. 6 All phases have become data intensive, with studies in PGx discovery increasing rapidly in both number and throughput. For example, at least 12 PGx genome-wide association studies were conducted in 2009. 7 In a review of 100,000 PubMed-listed publications on pharmacogenomics, fewer than 2 per cent were identified as original research manuscripts, 8 illustrating the difficulty that exists in locating reliable information for translational research pursuits.
e-PKGene (www.pharmacogeneticsinfo.org) is a manually curated knowledge-based product which facilitates easy access to, and search for, quantitative information contained in the PGx literature base. It provides in-depth analysis of the impact of genetic variants of metabolising enzymes and transporters on pharmacokinetics. The tool's focal point is the identification of the genetic variants that are best correlated with drug exposure. e-PKGene is designed directly to support drug development or Phase 1 translational research (T1). The tool also has the capacity to support other phases of translational research, however, including Phase 2 (T2) evidence-based evaluations, which can better predict patient responses by providing clinical recommendations. Details about e-PKGene design and content, examples of use and types of support provided to various types of users are described in the following sections.

Structure
The application has a typical multi-tier architecture in a Microsoft .NET environment. The web part of the database, which is accessed by the user over the internet, is hosted on a Microsoft Windows 2003 server running IIS and version 2.0 of the ASP.NET framework. All data are stored on a Microsoft SQL Server 2005 database. The use of the web facilitates worldwide access, as well as upgrades and updates. e-PKGene is being developed by scientists from the Department of Pharmaceutics, University of Washington, USA. e-PKGene allows for a high level of development flexibility through incorporating structured data and standardised representations of PGx knowledge with use of controlled terminologies. The tool uses hierarchical categorisation to characterise data sources and evidence.

Content
e-PKGene integrates information from the literature, public repositories, reference textbooks, product prescribing labels and clinical review sections of new drug approval packages. The current pilot version focuses on drug-metabolising enzymes and transporters that are routinely assessed in the context of drug development, and which are of interest to clinical practice and healthcare economics. The core content of e-PKGene is represented by published pharmacokinetics studies performed in human subjects (healthy volunteers or patients). Each research article (citation) may contain one or more pharmacokinetics studies in which one or more genes have been investigated. A study is defined as a set of assessments (pharmacokinetics, pharmacodynamics and safety) following the administration of a target compound to a well-defined population. The population is usually divided into a 'reference' group and an 'impaired' group. The reference group (the one to which the other groups are compared) consists of individuals who are either 'extensive metabolisers' or carriers of two copies of the wild-type allele of the gene of interest.
Each citation is assigned a unique identifier (Accession Number). Detailed records are manually curated from each citation, highlighting the study design, the population characteristics (ethnicity, health status, gender etc.), and the genotyping or phenotyping methods. For each population subgroup, pharmacokinetic parameters (area under the curve [AUC], clearance or plasma concentration), as determined by the study, are presented to the user. Pharmacodynamics, clinical outcome and side effects are also reported when provided in the citation, especially when their relationships with pharmacokinetics were investigated. The datasets made available to end-users are summarised in Table 1.
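The citation, study and population-group hierarchy described above can be sketched as a small data model. This is a minimal illustration only: the class names, field names, accession string and AUC values are hypothetical and do not reflect the actual e-PKGene database schema or any curated record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PopulationGroup:
    """One genotype- or phenotype-defined subgroup within a study."""
    label: str                 # e.g. "EM" (reference) or "PM" (impaired)
    is_reference: bool         # True for the group others are compared to
    n_subjects: int
    pk_parameters: Dict[str, float] = field(default_factory=dict)


@dataclass
class Study:
    """Assessments following administration of one compound to one population."""
    compound: str
    gene: str
    design: str
    groups: List[PopulationGroup] = field(default_factory=list)

    def reference_group(self) -> Optional[PopulationGroup]:
        # Return the first group flagged as reference, or None if absent.
        return next((g for g in self.groups if g.is_reference), None)


@dataclass
class Citation:
    """A research article holding one or more studies."""
    accession_number: str
    studies: List[Study] = field(default_factory=list)


# Hypothetical record; the numbers are invented for illustration.
example = Citation(
    accession_number="EXAMPLE-0001",
    studies=[
        Study(
            compound="tamoxifen",
            gene="CYP2D6",
            design="single-dose, healthy volunteers",
            groups=[
                PopulationGroup("EM", True, 12, {"AUC": 100.0}),
                PopulationGroup("PM", False, 6, {"AUC": 160.0}),
            ],
        )
    ],
)
```

Keeping the reference flag on the group itself makes it straightforward to compare every impaired group against the reference group of the same study.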
To allow an analysis of the impact of genetic variations on drug exposure, a comparison of the pharmacokinetic parameters between impaired groups and the reference group is systematically provided for each citation. All changes in PK parameters (Δ) are calculated as follows:

\[ \Delta\,(\%) = \frac{PK_{\mathrm{variant}} - PK_{\mathrm{ref}}}{PK_{\mathrm{ref}}} \times 100 \]

where PK_variant = PK parameter of the variant group and PK_ref = PK parameter of the reference group (control).
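As a worked illustration of this comparison, the sketch below assumes that the change is expressed as a percentage difference relative to the reference group. The function name and the input values are hypothetical.

```python
def percent_change(pk_variant: float, pk_ref: float) -> float:
    """Percentage change of a PK parameter in a variant group
    relative to the reference (control) group.

    Assumption: change = (PK_variant - PK_ref) / PK_ref * 100.
    """
    if pk_ref == 0:
        raise ValueError("reference PK parameter must be non-zero")
    return (pk_variant - pk_ref) / pk_ref * 100.0


# Invented values: an AUC of 150 in the variant group against 100 in the
# reference group is a 50 per cent increase in exposure.
increase = percent_change(150.0, 100.0)
decrease = percent_change(50.0, 100.0)
```

A positive value indicates a higher parameter value in the variant group than in the reference group, and a negative value indicates a lower one.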

Gene and allelic variants
e-PKGene defines each variant allele by a single nucleotide polymorphism (SNP) which is unique to the allele. This SNP is identified as the 'diagnostic SNP', but it is important to note that the diagnostic SNP may not be responsible for the observed functional changes. The reference SNP number (rs number, as assigned by the Single Nucleotide Polymorphism Database [dbSNP] 9 ) allows the identification of both the star (*) allele and the SNP. The star allele nomenclature has been adopted according to the Human Cytochrome P450 (CYP) Allele Nomenclature Committee. 10 Each SNP or variant allele is described by its nucleotide change, its location within the gene, its impact on the amino acid change in the protein and the mechanisms by which it affects its activity in vivo (Figure 1). When an allele is defined by a haplotype formed by a constellation of SNPs, each SNP is described as well. When available, the functional activity in vivo of the enzyme or transporter is assigned. For certain genes, such as CYP2D6, an 'activity score' is also assigned.
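The allele description above can be sketched as a record that pairs a star allele with its diagnostic SNP, plus an activity score computed for a diplotype. This is an illustrative sketch, not the e-PKGene implementation: the class and function names are hypothetical, the score is assumed to be the sum of the values of the two alleles, and the per-allele values shown are illustrative rather than taken from the database.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VariantAllele:
    gene: str
    star_allele: str
    diagnostic_rs: Optional[str]      # dbSNP rs number; None for the reference allele
    activity_value: Optional[float]   # None when in vivo function is not assigned


def activity_score(diplotype: Tuple[VariantAllele, VariantAllele]) -> Optional[float]:
    """Sum the activity values of both alleles (assumed scoring rule).

    Returns None if either allele has no assigned value.
    """
    values = [allele.activity_value for allele in diplotype]
    if any(v is None for v in values):
        return None
    return sum(values)


# Illustrative alleles; the activity values are assumptions for this sketch.
cyp2d6_ref = VariantAllele("CYP2D6", "*1", None, 1.0)
cyp2d6_null = VariantAllele("CYP2D6", "*4", "rs3892097", 0.0)
unassigned = VariantAllele("CYP2D6", "*X (hypothetical)", None, None)

score = activity_score((cyp2d6_ref, cyp2d6_null))
```

Returning None for unassigned alleles avoids reporting a score that looks complete when part of the functional information is missing.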

Examples of queries and output
The e-PKGene website allows initial searching based on compound (or metabolite), gene or population. Selection of a search category retrieves a list of all available options. For example, a search by gene generates an alphabetical list of all compounds and metabolites with available data. In the following example (Figure 2), a tamoxifen-based search is conducted. The tamoxifen query generates a results screen (Figure 3) which indicates the total [...] (Figure 4), which lists the variants studied and the moieties for which impact has been assigned. Impact is a binary YES/NO classification of genotype/compound pairs which is based on change of exposure between the reference group (wild type) and a variant group. This classification uses a predefined cut-off given by the statistical analysis performed by the authors (statistical significance). The classification is assigned by the database editorial team to provide users with a contextual framework which rapidly [...]
The citation listing contains complete reference information, as well as comments entered by the database editorial team when applicable. The full abstract from PubMed can be retrieved by clicking on the PubMed icon (circled in red). The initial 'effects' window shows summarised information for all studies contained within each citation (single citations often involve multiple studies if different doses or populations, or multiple enzyme systems are evaluated separately). The database 'impact assignment' is shown for each population examined.
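The binary impact classification can be illustrated with a short sketch. In e-PKGene the assignment is made by the editorial team from the authors' own statistical analysis; the function below only mimics that decision, assuming a reported p-value and a significance threshold (alpha). Both the function name and the default threshold of 0.05 are assumptions made for illustration.

```python
def assign_impact(reported_p_value: float, alpha: float = 0.05) -> str:
    """Return "YES" when the reported difference in exposure between the
    variant group and the reference group is statistically significant.

    alpha stands in for the cut-off used by the study authors (assumed).
    """
    if not 0.0 <= reported_p_value <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    return "YES" if reported_p_value < alpha else "NO"


# Invented p-values for two genotype/compound pairs.
impact_significant = assign_impact(0.01)
impact_not_significant = assign_impact(0.20)
```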
More detailed information is retrieved in the 'impact of pharmacokinetics' screen for an individual citation (Figure 6), where specific pharmacokinetic parameters (AUC, clearance or plasma concentration) for the selected compound and any active metabolites are displayed for all the studies. Additional information extracted from the citation is displayed in the 'full study set' section.
The 'full study set' tab (Figure 7) displays all information that has been captured for an individual study; this information may include study design, phenotyping and/or genotyping methods, alleles tested for (from which the user can infer the alleles that were not tested for) and additional population classifications. This information enables the user to determine the validity of the information by evaluating the study design and methods. While not all case reports are suitable for inclusion in the e-PKGene database, 'case report' will be listed as the study design in this section for those which can be entered into e-PKGene.
Searches can be performed by gene (Figure 8a) or population (Figure 8b), and will retrieve citations relevant to the search criteria. Allowing searches by compound, gene or population enables the user quickly to focus on the desired parameters. Pharmaceutical researchers may find the information useful in designing studies on human subjects by highlighting populations (ethnic and genotypic) that will require more in-depth examination. Similarly, clinicians may find this platform valuable for identifying drugs that may require dosing adjustment in subjects with a known ethnic background, genotype or phenotype.
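The three search entry points (compound, gene and population) can be pictured as optional filters over a set of curated records. The sketch below is a hypothetical in-memory illustration, not the query mechanism of the e-PKGene web portal; the record fields, accession strings and population labels are invented.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    accession: str
    compound: str
    gene: str
    population: str


def search(records: List[Record],
           compound: Optional[str] = None,
           gene: Optional[str] = None,
           population: Optional[str] = None) -> List[Record]:
    """Return records matching every criterion that was supplied."""
    def matches(value: str, wanted: Optional[str]) -> bool:
        # A criterion left as None places no restriction on the field.
        return wanted is None or value.lower() == wanted.lower()

    return [
        r for r in records
        if matches(r.compound, compound)
        and matches(r.gene, gene)
        and matches(r.population, population)
    ]


# Invented records for illustration.
records = [
    Record("EXAMPLE-0001", "tamoxifen", "CYP2D6", "Caucasian"),
    Record("EXAMPLE-0002", "tamoxifen", "CYP2C9", "Asian"),
    Record("EXAMPLE-0003", "codeine", "CYP2D6", "Asian"),
]

by_gene = search(records, gene="cyp2d6")
by_compound_and_population = search(records, compound="tamoxifen", population="Asian")
```

Treating each criterion as optional lets the same function serve a search by compound, by gene, by population or by any combination of the three.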
A summary section allows the user to access both gene and drug summaries. The gene summaries give a brief description of the gene, as well as the classification of its alleles and genetic description. The drug summaries show the metabolism of the drug and its metabolic scheme, as well as a table view with the maximum percentage change in AUC and/or clearance for a drug (and its metabolites when available) encountered in particular genotype variants (Figure 9).
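The table view in the drug summaries can be approximated by keeping, for each genotype variant and parameter, the change of largest magnitude across studies. This is a sketch under the assumption that 'maximum' refers to the largest absolute percentage change; the function name and the rows are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, float]  # (genotype variant, PK parameter, percent change)


def max_change_by_variant(rows: Iterable[Row]) -> Dict[Tuple[str, str], float]:
    """Keep the percentage change with the largest magnitude for each
    (variant, parameter) pair, preserving its sign."""
    best: Dict[Tuple[str, str], float] = {}
    for variant, parameter, change in rows:
        key = (variant, parameter)
        if key not in best or abs(change) > abs(best[key]):
            best[key] = change
    return best


# Invented percentage changes from three hypothetical studies.
rows = [
    ("CYP2D6 PM", "AUC", -20.0),
    ("CYP2D6 PM", "AUC", -75.0),
    ("CYP2D6 IM", "AUC", -30.0),
]

summary = max_change_by_variant(rows)
```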

Domains of use
In spite of the wealth of information relating variants of metabolising enzymes and transporters to pharmacokinetic parameters, it is rare for a genotype to be a critical determinant in selecting a drug dose. 11 While there are several instances of PGx and personalised medical applications, few have undergone the rigorous evaluations required for regulatory approval. 12 Even in cases where there is adequate knowledge on genotype-drug response-phenotype correlations to appear in drug labels, 13 PGx language is often for informational purposes only. This is because of what is described as the 'evidence dilemma', since there is often a lack of sufficient evidence to weigh the benefits and risks of population screening or routine use of PGx applications and decision making in clinical practice. 14 The web-based application e-PKGene emphasises the quantitative analysis of PGx data related to pharmacokinetics, pharmacodynamics and the safety of drugs in various populations. It provides access to information that is important to consider when evaluating genetic factors affecting drug exposure, populations affected and potential clinical relevance. The tool has the capacity to support all phases of translational research, but is of direct use to Phase 1 (T1) research, which seeks to move PGx discoveries into candidate health applications. Additionally, the tool will provide a portal for better understanding the scientific steps involved in applying PGx information and integrating genomics into drug development programmes: for example, validating drug targets, choosing and validating lead compounds, identifying optimal early patient populations, determining surrogate endpoints (biomarkers) for the design of clinical trials and predicting likely variability in clinical trials.
As the field of pharmacogenetics continues to expand and mature, the tool will grow into a large repository that will become invaluable in guiding both research and clinical practice decisions by providing the primary literature findings necessary for establishing genotype-phenotype relationships.