Survey of public domain software for docking simulations and virtual screening

Progress in functional genomics and structural studies on biological macromolecules are generating a growing number of potential targets for therapeutics, adding to the importance of computational approaches for small molecule docking and virtual screening of candidate compounds. In this review, recent improvements in several public domain packages that are widely used in the context of drug development, including DOCK, AutoDock, AutoDock Vina and Screening for Ligands by Induced-fit Docking Efficiently (SLIDE) are surveyed. The authors also survey methods for the analysis and visualisation of docking simulations, as an important step in the overall assessment of the results. In order to illustrate the performance and limitations of current docking programs, the authors used the National Center for Toxicological Research (NCTR) oestrogen receptor benchmark set of 232 oestrogenic compounds with experimentally measured strength of binding to oestrogen receptor alpha. The methods tested here yielded a correlation coefficient of up to 0.6 between the predicted and observed binding affinities for active compounds in this benchmark.


Introduction
Docking simulations and virtual screening are being routinely used in drug design, enabling rapid identification of hits and lead compounds. 1 -3 The goal of docking simulations is to determine the binding mode (bound conformation) and the strength of binding (binding affinity) between a ligand, which is typically assumed to be a small molecule, and a macromolecular receptor, such as a protein. 1,3,4 Given a resolved or modelled structure of a target receptor, virtual screening involves performing docking simulations for a large number of candidate compounds in order to identify putative leads. 2,5,6 These candidates can subsequently be characterised and validated by empirical binding and activity assays, and by assessing their toxicity, pharmacokinetics and other properties for further drug development. 7 -9 Many methods for molecular docking and virtual screening have been developed to date, including AutoDock, 10,11 DOCK, 12 -14 Flex, 15 Glide, 16 GOLD, 17 RosettaDock, 18 SLIDE 19,20 and Surflex. 21 These methods introduce various approximations to simplify the problem -for example, assuming a rigid body docking model in which the receptor structure is fixed. Rigid body docking allows one to speed-up computations by comparison with flexible docking (in which the receptor structure is allowed to move) by precomputing the forces experienced by the ligand on a grid. In general, docking simulations involve two main components: sampling algorithms to find plausible conformations of the complex, and scoring functions to estimate relative binding affinities and rank ligand poses. 1 The sampling of alternative conformation is coupled with the search for the optimal solutionthat is, poses with the highest binding affinity (or score), which typically involves solving a global optimisation problem. Consequently, various optimisation techniques -such as Monte Carlo, simulated annealing or genetic algorithms -are used in the context of docking simulations. 1 Atomic force fields that describe both intra-and intermolecular interactions in the system, 4 simplified solvation potentials and empirical scoring functions are typically used to estimate the strength of interactions between the ligand and receptor and to score alternative poses of the ligand. 6 Docking methods are being constantly improved in each of these aspects, with newer implementations also seeking to take advantage of current computing architectures. 3,6,22,23 Following the progress in the field, reviews and comparisons of docking methods are being published regularly. These surveys and evaluations focus primarily on re-docking experiments, benchmarking performance on sets of compounds with known binding affinities, and assessment of enrichment (into true binders) in the context of virtual screening. 2,3,5,9,24 -26 A general conclusion from these studies is that no single method outperforms other methods consistently, suggesting that different targets may require different combinations of sampling approaches and scoring functions for optimal performance. 2,6,23,24,26,27 This review focuses on several widely used public domain packages for docking and virtual screening. With recent improvements, these packages are becoming better integrated, more accurate and faster, and are poised to increase their impact as platforms of choice, especially within the academic community.

Docking programs surveyed
The public domain docking programs surveyed here are listed in Table 1. These include three well-benchmarked and widely used public packages -namely, AutoDock, 11 DOCK 12 and SLIDE 20 -as well as the recently developed AutoDock Vina (also referred to as AD Vina). 28 Below, some of the characteristics and distinct features of each program are summarised briefly. For detailed descriptions and documentation, the reader is referred to the original resources listed in Table 1.
While each of the programs surveyed here offers different solutions to both the sampling and scoring problem, they provide a similar overall functionality and range of options reflecting improvements in docking methods. In particular, 'flexible' models of both ligands and receptors are available in each of these programs, allowing one to sample not only different conformations of the ligand, but also of the receptor -for example, by means of introducing a subset of residues (active space) for which alternative conformations can also be sampled to find optimal complex structure. In addition, these public domain docking programs have evolved to take advantage of the prevalence of distributed computing and the emergence of new computing platforms, as discussed in the next section.
AutoDock offers well-benchmarked force fields (including the widely used Amber molecular force field in the latest version), efficient (grid-based) implementation of rigid body docking, a flexible (receptor) docking protocol with an active space consisting of residues which are allowed to undergo conformational changes, and efficient implementations of the Lamarckian genetic algorithm (GA) to search for the ligand pose with the highest binding affinity, while sampling and 'mixing' suboptimal solutions to solve the underlying global optimisation problem. 11 In spite of its name, AD Vina uses a very different search strategy, which couples Newton-type local minimisation steps with global optimisation, involving concurrent (on multicore central processing units [CPUs]) runs of a number of local minimisations which can be combined to improve sampling. 28 DOCK uses a well-tested shape-matching approach to sample alternative poses of the ligand, given the receptor structure. These alternative conformations of the complex subsequently can be assessed and scored using the Amber molecular force field coupled with several alternative implicit solvent models. 12 Some of the new features added in DOCK 6 include conjugate gradient minimisation and molecular dynamics simulation capabilities, ligand conformational entropy corrections, ligand and receptor desolvation and receptor flexibility. SLIDE 20 was designed to account efficiently for receptor flexibility. The mean-field theory-based model used in SLIDE implies that multiple conformations of the receptor contribute to the interactions with the ligand, as the search proceeds through alternative conformations of the complex. This strategy allows one to account implicitly for receptor flexibility and induced-fit complementarity between the ligand and receptor. Another advantage of SLIDE is that its force field accounts explicitly for hydrogen bonds of different types.

Fast implementations of docking for computational clusters
The speed of docking is clearly very important for virtual screening, and many recent improvements have aimed at taking full advantage of current distributed computing platforms. In this regard, DOCK (as of version 5) offers an efficient MPI-based parallel implementation which can be used on computational clusters. An MPI version of AutoDock has also been developed, 29 although it does not appear to be widely used. The advent of a novel parallel computing architecture -Compute Unified Device Architecture (CUDA) -which utilises graphics processing units (GPUs), spurred the development of molecular modelling software for this new platform as well, including a recently developed CUDA-enabled version of AutoDock. 30 Further improvements and developments in this regard are likely to improve the accuracy of virtual screening by enabling more extensive sampling and testing of the results.
It should be noted, however, that even serial codes can be used efficiently for virtual screening by exploiting the 'embarrassing parallelism' of docking; this involves simply screening subsets of compounds from in silico libraries on individual cluster nodes. In this context, multicore servers with hyperthreading have recently become a mainstay of scientific computing. AD Vina aims to use multithreading optimally to increase the speed of docking simulations. It has been reported to achieve a near 60-fold increase in speed (with a further near-linear decrease in CPU time when using multiple cores), compared with AutoDock, with similar (or better) accuracy. 28 In our own tests, similar increases were observed using the Xeon(R) X5570 2.93 GHz quad-core server, with the depth of sampling that allows one to achieve correlations between the predicted and experimental binding affinities reported by Fang et al. 3 for a benchmark set of 232 oestrogens. That benchmark is used to illustrate and compare the performance of the methods surveyed here.

Tools for the visualisation and analysis of protein -ligand docking
Limitations of the current methods, such as those concerning scoring functions for docking, make the analysis of predicted complexes (eg in terms of consistency of ligand poses across multiple runs) an integral part of docking simulations and virtual screening. Assessment of the results usually starts with a visual inspection of the predicted binding modes for hits, and subsequently can be extended using structural analysis, geometric clustering of predicted poses, re-scoring and cheminformatic approaches.
The output from ligand docking programs usually includes a coordinate file (in PDB format or other standard formats) and supplementary data, such as estimated binding affinity, clustering of alternative poses, etc. To view coordinate files, commonly used macromolecular visualisation programs, such as Jmol, 31 PyMol, 32 Visual Molecular Dynamics (VMD) 33 and others can be employed, whereas additional information may require ad hoc programs to be used for data analysis. Table 2 summarises several (free for academic use, with the exception of more advanced functionality in DockingServer) tools that enable the visualisation and assessment of predicted ligand poses in the protein-ligand complex, and provide other specifically designed features for the analysis of docking results.
Most tools discussed here are coupled with a specific docking program to facilitate setting up simulations and subsequent analysis of the results for that program. In this review, however, the survey of visualisation and analysis methods starts with LIGPLOT, 34 which is a simple tool that can be used in conjunction with any docking program (LIGPLOT does not provide any specific interfaces and the results are assumed to be in the PDB format). LIGPLOT automatically generates a detailed two-dimensional diagram of proteinligand interactions, with distances between interacting atoms and classification of interactions according to the type of interaction centres involved. Therefore, LIGPLOT can be easily coupled with docking pipelines to enable visualisation of docking poses.
The AutoDockTools (ADT) package was developed specifically for AutoDock. 11 ADT provides the possibility of viewing clustered alternative poses of a ligand accompanied by estimated docked energy, cluster size and the root mean square difference (RMSD) of the poses within a given cluster, as well as RMSD to the input (reference) structure. Alternative conformations of the ligand can be viewed using the animated mode. ADT can also represent the ligand using the isocontour rendering that facilitates the visual evaluation of the model as to whether pairwise interactions between atoms of the receptor and ligand are reasonable. vsLab 35 is another tool that provides an improved interface to AutoDock, including setting the docking simulations and analysing them. Results are represented using both tabular, easy-to-navigate views and graphical rendering by VMD. 33 DockingServer 36 automatically generates images of the docked ligands using VMD and provides the possibility for an interactive view using the Jmol applet. In addition, the server generates summary tables that can be used to assess the main forces driving a specific protein -ligand interaction. ViewDock combines the Chimera 37 rendering of docking models (including animation) with the interface for interactive analysis of multiple ligands docked to the receptor. The list of compounds and alternative poses can be sorted by the estimated binding energy, number of hydrogen bonds formed and other characteristics.
The POLYVIEW-MM server 38 combines multiple annotations for both the receptor structure and the protein -ligand interaction. Alternative poses of the ligand can be visualised and analysed using the Jmol applet or high-quality PyMol rendering, as both static slides and animations. The server also enables analysis of clustering patterns computed by AutoDock for potential re-scoring of docking poses, provides a quick interface interactively to map residues in contact with the ligand for each pose and can be used to visualise distances for atom -atom pairs in multiple poses. Additionally, the structure of the receptor (and ligand binding sites in particular) can automatically be annotated by mapping the location of structural domains, conserved residues, known (derived from structurally resolved complexes in PDB) and predicted (using SPPIDER 39 ) protein interaction sites and other structural and functional features.

Oestrogen receptor (ER) binding benchmark
Nuclear receptors (NRs) are transcription factors that are activated by sex hormones and play major

SOFTWARE REVIEW
Biesiada et al.
roles in embryogenesis, development and tumorigenesis. 40 Consequently, NRs constitute an important class of targets for drugs. Many such drugs have been specifically designed to compete with natural hormones for binding to ligand binding domains of NRs, including tamoxifen and raloxifen used in breast cancer. 8,40,41 NRs are also activated by numerous environmental factors, adding to the importance of studying their interactions with small molecules. In particular, ERs can be activated by many naturally occurring substances (eg genistein found in soybean) and industrially made compounds (eg bisphenol A [BPA]) which exhibit oestrogenic effects. 7,42,43 In this context, this paper will now assess whether current docking programs are sufficiently accurate to provide a basis for the prediction of activity and toxicity risks for potential oestrogens, using the National Center for Toxicological Research (NCTR) ER set of 232 structurally diverse natural, synthetic and environmental compounds with a wide range of biological activities that are potentially mediated by ERs. 7 This set provides an excellent benchmark for docking and structure -activity relationship (SAR) models, by virtue of comparing predicted ER binding affinities with experimental binding data obtained using a well validated in vitro rat uterine cytosol ER alpha competitive binding assay. 44 The NCTR ER benchmark also provides an opportunity to illustrate some of the features of the docking programs surveyed here. Based on experimentally observed ER relative binding affinities (ER-RBA), each chemical was classified as active (131 chemicals) or inactive (101 chemicals). The ER-RBA was calculated by dividing the concentration of chemical that causes 50 per cent inhibition of E2 binding (IC50) of the natural ER ligand, E2, by the IC50 of the competitor, and multiplying the ratio obtained by 100. Thus, ER-RBA . 100 means a binding affinity greater than that of E2. A compound was designated as inactive if 50 per cent inhibition was not reached within concentrations of competitor ranging from 1 nM to 1 mM (versus 1 nM E2), or if no activity was observed. 7 This classification provides an opportunity further to assess the capacity of docking programs successfully to discriminate true hits and inactive compounds, with implications for virtual screening.
The docking simulations for programs surveyed here were performed using E2-, raloxifen-and tamoxifen-bound crystal structures of the ligand binding domain of ER alpha (PDB codes 1ERE, 1ERR and 3ERT), using re-docking of native ligands to test if user-defined parameters and the simulation systems (including the grid box) had been properly set up for each of the programs. Default values were used for most parameters, with the exception of those effectively defining the depth of the search, which were gradually increased to provide better sampling until no further improvement, in terms of overall correlations, was obtained. For example, the results reported here for AutoDock were obtained using 50 Lamarckian GA runs, 10 million energy evaluation steps and a population size of 150. With this somewhat extensive sampling, the average running time per compound was 2,987 CPU seconds on a Xeon(R) X5570 2.93 GHz quad-core server (when using 1 CPU).
An example of the top scoring pose for tamoxifen, obtained using DOCK and the tamoxifenbound structure of the receptor, is shown in Figure 1. As can be seen from the figure, the predicted pose is very close to the native structure (RMSD of about 1.9 Å for all atoms of the ligand). Qualitatively similar results were obtained for other programs. In addition, the predicted binding affinities for E2, the ER antagonist tamoxifen and the weak oestrogen BPA were found to follow a clear trend, consistent with experimental data. 40 The strongest binding was found for tamoxifen, followed by relatively strong binding for E2 and much weaker binding for BPA. For example, rigid body docking with AutoDock yielded inhibition constants of 7.51 nM, 357 nM and 6.35 mM for tamoxifen, E2 and BPA, using tamoxifen-, E2-and raloxifen-bound structures, respectively. The results of docking for NCTR ER compounds obtained using an 'intermediate' ER alpha structure with a binding pocket similar (but not identical) to that of E2 (PDB code 1ERR 45 ) are summarised in Figures 2 and 3. For active compounds, Pearson correlation coefficients (CCs) between predicted and experimentally measured binding affinities of about 0.6 were obtained using SLIDE and rigid body AutoDock (see Figure 2), with somewhat lower CCs for flexible AutoDock, Vina and DOCK. It should be noted, however, that each method failed to generate results for a number of compounds. Therefore, these CCs cannot be compared directly, as they were computed on a slightly different subset of compounds in each case. A direct comparison of (rigid body) AutoDock and SLIDE is shown in Figure 2, and indicates high concordance between these two well-performing methods (CC of 73).
As can be seen from Figure 3, the distributions of scores for active and inactive compounds are shifted and, in fact, their means are significantly different, as assessed by a t-test ( p ¼ 0.00001). Qualitatively similar results are obtained for other programs, although SLIDE yields the best overall separation. Applying a threshold of 8 for a scaled SLIDE inhibition constant results in selecting about 80 per cent true positives (active compounds), as opposed to about 56 per cent expected for a randomly selected subset. At the same time, about 20 per cent of inactive compounds are predicted to be active ( predicted scores larger than 8). Thus, a significant false-positive rate is observed (even taking into account that some of the compounds classified as inactive may, in fact, be weak binders), reflecting the limitations of current docking  methods. It should also be noted that the falsepositive rate is likely to increase when docking arbitrary rather than potentially oestrogenic ligands, or when using receptor conformations that are different from a targeted bound state, as these factors contribute to rates of false positives (and true negatives) observed in virtual screening. 2,6,24 -27

Summary and conclusions
In this survey, several public domain docking programs, including AutoDock, AutoDock Vina, DOCK and SLIDE, were briefly described and their performance compared using the NCTR ER set, yielding encouraging (albeit moderate) correlations between predicted and experimentally observed binding affinities for active compounds (with CCs of up to 0.6). At the same time, however, all of the methods tested result in significant false-positive rates when used for discrimination between active and inactive compounds, underscoring the limitations of current docking methods. The importance of induced fit in the binding of oestrogens to ERs, 45 inherent limitations of scoring functions used by docking programs along with other factors are likely to contribute to the difficulties in virtual screening for candidate inhibitors of ERs and pose limitations for the prediction of oestrogenicity.
The field of toxicogenomics examines cellular responses to xenoestrogens, such as BPA, in order to provide a basis for the understanding of their mode(s) of action and for improved risk assessment. 8,42,46,47 Assessing the health risk of environmental chemicals remains challenging, however, as multiple complex pathways may be involved in integrating the responses to chemical exposures. 46 Therefore, SAR models are widely used to circumvent these limitations, and to predict the activity of chemical compounds based on their physicochemical and structural characteristics. 7,42,43,48 SAR methods typically rely on statistical and machine learning techniques to capture complex correlations between relevant descriptors and outcomes to be predicted. 48 In this context, molecular docking could provide estimates of the strength of physical interactions with relevant receptors, to be used as complementary features for the development of more accurate SAR models, even if the accuracy of binding affinity prediction is far from perfect.
In conclusion, while limitations remain (especially with regard to scoring functions), methodological advances and the development of more integrated, faster and easier-to-use current generation software packages for docking simulations, including the public domain programs surveyed here, are likely to expand further the realm and scope of applications of docking methods. These applications include virtual screening and drug development, functional annotations for apo-proteins, as well as the elucidation of the modes of action, and assessment of the toxicity, of environmental factors.