
Genome-wide scans for loci under selection in humans


Natural selection, which can be defined as the differential contribution of genetic variants to future generations, is the driving force of Darwinian evolution. Identifying regions of the human genome that have been targets of natural selection is an important step in clarifying human evolutionary history and understanding how genetic variation results in phenotypic diversity; it may also facilitate the search for complex disease genes. Technological advances in high-throughput DNA sequencing and single nucleotide polymorphism genotyping have enabled several genome-wide scans of natural selection to be undertaken. Here, some of the observations that are beginning to emerge from these studies will be reviewed, including evidence for geographically restricted selective pressures (ie local adaptation) and a relationship between genes subject to natural selection and human disease. In addition, the paper will highlight several important problems that need to be addressed in future genome-wide studies of natural selection.


Phenotypic diversity is a ubiquitous characteristic of natural populations. Individuals vary in almost every conceivable way, including physical appearance, behaviour, disease susceptibility, ability to detoxify drugs and perception of environmental stimuli [1]. Although environmental forces undoubtedly contribute to phenotypic variation, so too does genetic variation. Therefore, explaining the evolutionary forces that create, maintain and shape patterns of human genetic variation is of fundamental importance in understanding phenotypic variation [2].

An important goal in studies of human genetic variation is to identify loci that have been targets of natural selection due to their variable effects on the fitness of individuals throughout a population's history. Signatures of natural selection delimit regions of the genome that are, or have been, functionally important. Therefore, identifying such regions will facilitate the identification of genetic variation that contributes to phenotypic variation and help to functionally annotate the genome. Unfortunately, inferring the action of natural selection remains a challenge. This is likely to change in the near future, as high-throughput methods for cataloguing genetic variation on a genome-wide scale and new statistical tools for detecting selection have been, and continue to be, developed.

Much important work has been done on genome scans for natural selection in model organisms such as Drosophila [3–5]; this review, however, will focus on studies performed in human populations. First, the effects of natural selection and population history on patterns of genetic variation will be summarised, and some of the common statistical methods used to test for deviations from neutrality will be presented. Next, a critical evaluation of several empirical genome-wide scans for selection will be presented. Finally, the paper will highlight several important problems, both practical and conceptual, that need to be addressed in future studies.

Human genetic variation: The neutral expectation

The evolutionary sojourn of a newly arisen mutation depends upon how it affects the fitness of the individual who possesses it. The neutral theory of molecular evolution posits that the vast majority of polymorphisms in a population are selectively neutral and have no appreciable effects on fitness [6, 7]. Under neutrality, changes in allele frequency are governed by the stochastic effects of genetic drift in populations of finite size. Thus, the effective population size, Ne, and neutral mutation rate, μ0, determine levels of polymorphism within species and the rate of divergence between species [8]. In addition, mutations with small fitness effects behave as 'nearly neutral' if the product of Ne and |s| (where s measures the strength of selection) is < 1 [9, 10]. For human populations, Ne is approximately 10,000 and therefore |s| must be greater than 10⁻⁴ to overcome the stochastic effects of genetic drift. Because the neutral theory makes explicit and quantitative predictions about expected patterns of genetic variation within and between species, it is an indispensable tool in studies of natural selection. Specifically, the neutral theory provides an essential foundation for evaluating the evidence either for or against selection in empirical data, as it serves as the null hypothesis when exploring alternative evolutionary models [11, 12].
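
The threshold quoted above can be written out as a short worked calculation. This is only a restatement of the paragraph's own approximation; the constant on the right-hand side is taken to be 1, as in the text, although other formulations scale it differently.

```latex
|N_e s| < 1 \;\Rightarrow\; \text{drift dominates (nearly neutral)}
\qquad
|s| > \frac{1}{N_e} \approx \frac{1}{10^{4}} = 10^{-4} \;\Rightarrow\; \text{selection can overcome drift}
```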

In combination with the neutral theory, coalescent theory provides a powerful framework for conceptualising and making inferences about evolutionary forces. The coalescent is a stochastic model of gene genealogies [13–17] and has emerged as the primary analytical tool in studies of genetic variation. In classical population genetics theory, the initial state of a population is defined and one observes the evolution of the entire population by looking forward in time. By contrast, the coalescent is a sample-driven theory that traces the history of coalescent events backwards in time (see Figure 1 for some basic properties of the coalescent). Several excellent and detailed reviews of coalescent theory can be found elsewhere [18, 19]. Deviations from the standard neutral model distort the branch lengths, topology and coalescent times of gene genealogies, as described below.

Figure 1

Coalescent representation of a neutrally evolving sequence. (A) Explicitly tracing the history of a sample of alleles from the population, each progenitor allele is derived from a randomly chosen parental allele. Occasionally, two progenitor alleles are derived from the same parent, causing the lineages of these two alleles to unite or coalesce when they are followed backward in time from the present. Note that if progeny are derived from parents at random, the probability that two lineages coalesce increases as the number of distinct lineages increases and as the effective population size decreases. Thus, for a constant-sized population, a characteristic distribution of waiting times between coalescent events is expected. (B) The untangled, coalescent representation of (A) is created by treating lineages as branches, ignoring the intermediate ancestors between coalescent events. Under neutrality, mutational events (represented by shaded diamonds) are uniformly distributed throughout time, hence the number of mutations that occur on a branch is proportional to the length of the branch. Note that mutations occurring on external branches are rare, appearing on only a single allele, whereas mutations occurring on internal branches are common.
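
The waiting-time property described in the caption can be illustrated with a minimal simulation sketch. It assumes the standard neutral coalescent for a constant-sized diploid population, in which the waiting time while k lineages remain is exponentially distributed with rate k(k-1)/2 divided by 2Ne per generation. The function names and parameter values are hypothetical and are not taken from any of the studies reviewed here.

```python
import random


def simulate_coalescent_times(n_samples, ne, rng):
    """Return the waiting times (in generations) between successive
    coalescent events for a sample of n_samples alleles.

    Assumption: standard neutral coalescent, constant diploid
    effective population size ne (2 * ne gene copies).
    """
    times = []
    k = n_samples
    while k > 1:
        pairs = k * (k - 1) / 2          # number of lineage pairs that could coalesce
        rate = pairs / (2 * ne)          # coalescence rate per generation
        times.append(rng.expovariate(rate))
        k -= 1                           # one coalescence merges two lineages
    return times


def expected_tmrca(n_samples, ne):
    """Expected time to the most recent common ancestor, in generations."""
    return 4 * ne * (1 - 1 / n_samples)
```

Summing the returned waiting times gives one simulated time to the most recent common ancestor; averaged over many replicates it should approach `expected_tmrca`. More lineages or a smaller `ne` both raise the rate and shorten the waits, as the caption notes.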

Evolutionary forces perturb patterns of genetic variation

Natural selection and population demographic history perturb patterns of genetic variation relative to what is expected under a standard neutral model (constant-sized, randomly mating, panmictic population at mutation-drift equilibrium). Below, the way in which selection and demographic history affect patterns of genetic variation will be considered, from a coalescent point of view.

Theoretical studies have investigated the evolutionary dynamics of genetic variation subject to a variety of selective pressures, including purifying [20–22], positive [23–26] and balancing selection [27–30]. This paper will focus on positive and balancing selection, as these have been the primary types of selection that current genome-wide scans have studied. Positive selection acts to increase the frequency of advantageous alleles in a population. Strongly advantageous mutations are rapidly swept to fixation, hence the term 'selective sweep'. Importantly, through a process referred to as 'genetic hitchhiking', positive selection also affects patterns of neutral polymorphisms linked to an advantageous mutation [23]. Positive selection leads to a shallow star-like genealogy (Figure 2) with a decreased time to the most recent common ancestor. When the history of alleles is traced backwards in time, these effects are seen to be a direct consequence of the rapid coalescence of lineages in the small but expanding progenitor population [31]. The signature of positive selection includes reduced levels of genetic variation compared with neutral expectations [24–26], a skew in the allele frequency spectrum towards low-frequency alleles [32] (together with an excess of high-frequency derived alleles [33]) and elevated levels of linkage disequilibrium [34].

Figure 2

Effects of deviations from neutrality on gene genealogies. (A) Neutral evolution. (B) Population growth. (C) Population bottleneck. Here, only one ancestral lineage passes through the bottleneck, leading to a short tree with relatively long external branches. (D) Population subdivision. An initial population (represented by solid lines) separates into two subpopulations, which are denoted by dashed and solid lines. (E) Positive selection. An advantageous allele (represented by the dashed lines) sweeps through the population to fixation. Note that the genealogy of a selective sweep is similar to that produced by a population growth or bottleneck. (F) Balancing selection. An allele that is advantageous only in the heterozygous state (dashed line) appears in the population and is maintained at an intermediate frequency. Note that the genealogy under balancing selection is similar to population subdivision. Ne, effective population size.

Balancing selection occurs when polymorphisms are selectively maintained in a population. By contrast with positive selection, the genealogy of a locus subject to balancing selection is characterised by an increased time to the most recent common ancestor and long internal branches (Figure 2). The effect of balancing selection on gene genealogies can be understood by considering balanced alleles as distinct subpopulations, such that coalescence events can occur rapidly within a subpopulation but slowly between subpopulations [27]. The signature of balancing selection includes elevated levels of polymorphism relative to neutral expectations and a skew of the allele frequency distribution towards an excess of intermediate frequency alleles [29, 30, 35].

In addition to natural selection, population demographic history can also have strong influences on patterns of genetic variation, which often mimic the effect of natural selection [36, 37]. In other words, inferences of natural selection are confounded by population demographic history. For example, both positive selection and increases in population size have similar effects on gene genealogies (Figure 2); both processes therefore lead to an excess of low-frequency alleles in a population. In fact, strong positive selection can be thought of as a rapid population expansion of an advantageous allele as it sweeps through a population. Similarly, population structure and balancing selection both result in subdivided genealogies and therefore both processes are expected to result in an excess of intermediate-frequency alleles in a population (Figure 2). Population bottlenecks can lead to an excess of either low- or intermediate-frequency alleles relative to neutral expectations, depending on the age and severity of the bottleneck. Figure 2 demonstrates the effect of a severe and recent bottleneck, which forces all lineages to coalesce at the time of the size reduction and results in a genealogy that is similar to positive selection. Human populations clearly do not meet all of the assumptions of the standard neutral model; hence, rejecting the standard neutral model for a particular locus cannot be interpreted as unambiguous evidence for selection.

Detecting the signature of natural selection

Before presenting the results from genome-wide scans for natural selection, there now follows a brief description of some commonly used statistical methods designed to detect departures from neutrality, highlighting some of their strengths and limitations. The following is not meant to be an exhaustive discussion of such tests, and descriptions of many interesting and useful methods will not be included here. For further study, the reader is encouraged to see an excellent review by Kreitman [38].

Statistical tests of neutrality can broadly be classified into three categories, based upon the type of data that they use: 1) within-species tests; 2) within- and between-species tests; and 3) between-species tests (Table 1). The most common class of within-species tests compares summary statistics of the observed allele frequency distribution at a locus with the values expected under neutral evolution; it includes Tajima's D [39], Fu and Li's D and F [40], Fay and Wu's H [33] and Fu's Fs [41]. An attractive feature of these tests is that they do not require any a priori classification of functional versus non-functional sites, thus making them equally suitable for protein-coding and non-protein-coding regions. In a thorough examination of the power of Tajima's D and Fu and Li's D and F, Simonsen et al. [42] found that these tests can only detect selective sweeps in a narrow time interval in the recent past, and that they can only detect balancing selection if it has acted for a very long time period. Interestingly, Simonsen et al. also found that the power of these tests could drop below the nominal false-positive rate, α, if non-neutral evolution did not occur within these critical time windows, thus creating the undesirable scenario in which rejection of the null is more likely if the null is true than if it is false. Fu observed similar results, although he found that Fs was most powerful and performed better at detecting more ancient positive selection [41].
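
As an illustration of a site-frequency-spectrum summary statistic, the following is a minimal sketch of Tajima's D using the standard published formula. It assumes haplotypes are supplied as equal-length strings of '0' (ancestral) and '1' (derived) characters with no missing data; the function name and input format are hypothetical choices made for this example.

```python
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length '0'/'1' haplotype strings.

    Returns None when D is undefined (fewer than four sequences,
    or no segregating sites).
    """
    n = len(haplotypes)
    if n < 4:
        return None
    length = len(haplotypes[0])
    # Count the '1' allele at each site and keep only segregating sites.
    counts = [sum(h[i] == "1" for h in haplotypes) for i in range(length)]
    seg = [j for j in counts if 0 < j < n]
    s = len(seg)
    if s == 0:
        return None
    # pi: average number of pairwise differences between sequences.
    pi = sum(2 * j * (n - j) for j in seg) / (n * (n - 1))
    # Constants from Tajima's variance approximation.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # D contrasts pi with the segregating-sites estimator S / a1.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

A sample made up only of singletons gives a negative D (excess of rare alleles, as expected after a sweep or population growth), whereas variants at intermediate frequency give a positive D (as expected under balancing selection or population subdivision).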

Table 1 Statistical tests of neutrality

The site-frequency spectrum tests discussed above are confounded by demographic events such as population growth, bottlenecks and subdivision (Figure 2) and are rendered conservative by intra-locus recombination. The desire to estimate population demographic parameters, recombination rates and evolutionary parameters has prompted the development of maximum likelihood-based methods which use the complete data, rather than summary statistics [31, 43, 44]. These methods are computationally intensive and are not currently feasible for large datasets, but they potentially allow for substantial gains in statistical power relative to summary statistics methods and are likely to become increasingly important tools in the future (for a general discussion, see Felsenstein [45]).

Another within-species test that has been used to detect selection is to compare the variation in allele frequencies between populations, which can be quantified by the statistic FST. Under selective neutrality, FST is determined by genetic drift, whereas natural selection is a locus-specific force that can cause systematic deviations in FST values for a selected gene and nearby genetic markers. For example, geographically restricted directional selection may lead to an increase in FST of a selected locus, whereas balancing or species-wide directional selection may lead to a decrease in FST compared with neutrally evolving loci [46–50]. In a series of simulation experiments analysing two different FST test implementations, Beaumont and Balding found that this approach yielded sufficient power to detect positive selection provided that the selective coefficient was approximately five times larger than the migration rate, but that FST had little power to detect balancing selection [50].
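
To make the statistic concrete, here is a minimal sketch of FST for a single biallelic marker, computed as the proportional reduction in expected heterozygosity within subpopulations relative to the pooled population. It assumes equally weighted populations and known allele frequencies. It is a simple textbook form, not necessarily the estimator used in the studies cited here, and the function name is hypothetical.

```python
def fst_biallelic(freqs):
    """FST for one biallelic marker from per-population allele frequencies.

    freqs: list of frequencies (between 0 and 1) of the same allele,
    one per population. Populations are weighted equally (an assumption).
    Returns None if the marker is monomorphic overall.
    """
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)                 # pooled expected heterozygosity
    if h_total == 0:
        return None
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total
```

Identical frequencies in all populations give FST = 0, and alternately fixed alleles give FST = 1; loci under geographically restricted selection are expected to fall towards the upper end of this range.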

Positive selection is also expected to increase levels of linkage disequilibrium (LD) relative to neutral expectations. Recently, a new statistical test was developed, the long-range haplotype (LRH) test [34], which takes advantage of ancestral recombination events and the associated decay in LD to identify genes subject to positive selection. The rationale for this test is that a common allele with long-range LD potentially represents a site that has appeared recently and was driven to high frequency before recombination could erode LD. The LRH approach does not detect balancing selection, however, and the robustness of the test to non-neutral population demographics, the choice of haplotype-defining markers and phase misspecification has not been well studied.
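
The decay of LD around a core allele can be quantified with extended haplotype homozygosity (EHH), the statistic on which the LRH test is built. The sketch below assumes phased haplotypes given as equal-length strings with no missing data; the function name and argument layout are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """Extended haplotype homozygosity for one core allele.

    Among chromosomes carrying core_allele at core_index, returns the
    probability that two chromosomes drawn at random are identical at
    every marker from the core out to end_index (inclusive).
    Returns None if fewer than two chromosomes carry the core allele.
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return None
    lo, hi = sorted((core_index, end_index))
    # Group carriers by their haplotype over the interval.
    groups = Counter(h[lo:hi + 1] for h in carriers)
    # Fraction of carrier pairs that fall in the same group.
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)
```

EHH starts at 1 at the core and declines as recombination and mutation break up the ancestral haplotype; a common allele whose EHH remains high over a long distance is the pattern the test looks for.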

The second major class of neutrality tests compares levels of within-species polymorphism and between-species divergence and includes the Hudson–Kreitman–Aguadé (HKA) [51] and McDonald–Kreitman (MK) [52] tests. The HKA method tests the goodness of fit of the observed levels of polymorphism within species and the observed divergence between species to those predicted under neutral theory. In order to determine polymorphism and divergence expectations under neutrality, data are required from at least two loci in each species, so that a simultaneous estimate can be made of a time-since-speciation parameter and a relative population size parameter. Under the HKA test, rejection of the null is formally interpreted as elevated polymorphism at one locus or reduced polymorphism at the other, or excess divergence at one locus or limited divergence at the other. Thus, it may not be obvious which locus or which process is responsible for producing a statistically significant test. McDonald [53] has described improvements to the HKA test which may ameliorate this problem.

In the MK test, a 2 × 2 contingency table is formed to compare the number of non-synonymous and synonymous sites that are polymorphic within a species (PN and PS) and fixed between species (DN and DS). Under neutrality, the ratio of non-synonymous to synonymous sites that are polymorphic equals the ratio of non-synonymous to synonymous sites that are fixed (ie PN/PS = DN/DS). Under positive selection, however, these two ratios are no longer equal and DN/DS > PN/PS [54]. Among the strengths of the MK test are that it does not require assumptions about population demographic history (although under some circumstances the test can be adversely affected by increases in effective population size [54]) and is relatively insensitive to intra-locus recombination. Positive or purifying selection for codon usage may, however, bias the MK test [38].
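
The 2 × 2 comparison can be sketched as follows. Fisher's exact test is used here as one reasonable choice for testing the table; the text above does not specify which test statistic is applied, so this should be read as an assumption. The function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as observed.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def mk_test(dn, ds, pn, ps):
    """Return (DN/DS, PN/PS, p-value) for a McDonald-Kreitman table.

    Rows are fixed differences (DN, DS) and polymorphisms (PN, PS);
    columns are non-synonymous and synonymous. Assumes DS > 0 and PS > 0.
    """
    return dn / ds, pn / ps, fisher_exact_two_sided(dn, ds, pn, ps)
```

For example, with illustrative counts `mk_test(7, 17, 2, 42)` the fixed ratio exceeds the polymorphic ratio and the table departs significantly from the neutral expectation, which is the direction expected under positive selection.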

The final class of neutrality tests uses between-species data to test for adaptive protein evolution. The classic test of positive selection compares the rate of non-synonymous (amino acid-changing) substitutions in a gene (dn) with the rate of synonymous (silent) substitutions (ds). Under neutrality, the substitution rate at both categories of sites is the same, and dn/ds is expected to equal one; however, dn/ds < 1 for proteins subject to purifying selection and dn/ds > 1 for proteins under adaptive evolution. Although dn/ds > 1 provides strong evidence for adaptive protein evolution, it is a very conservative test, particularly if only a small number of codons have been selected for. The basic test has also been extended by Nielsen and Yang [55] and others to include models of codon and transition/transversion bias, to detect variation in dn/ds ratios among lineages and to identify specific codons under selection [56, 57].

Key advantages of genome-wide analyses

As alluded to above, distinguishing between the confounding effects of natural selection and population demographic history is difficult when studying a single locus. When many unlinked genes are considered, however, a clear strategy emerges. Population demographic history affects patterns of variation at all loci in a genome in a similar manner, whereas natural selection acts upon specific loci [12, 37, 46, 58]. Therefore, by sampling a large number of unlinked loci throughout the genome, empirical distributions of test statistics can be constructed and genes subject to locus-specific forces, such as natural selection, can be identified as outlier loci.
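
The outlier strategy described above can be sketched in a few lines. This is a minimal illustration, assuming one test statistic per locus and a fixed tail fraction; the function name, the input layout and the default threshold are hypothetical choices rather than those of any study reviewed here.

```python
def empirical_outliers(stats, tail_fraction=0.025):
    """Return (low_tail, high_tail) lists of locus names.

    stats: dict mapping locus name to its test statistic
    (for example Tajima's D or FST).
    tail_fraction: fraction of loci reported from each tail of the
    empirical distribution; at least one locus is reported per tail.
    """
    ranked = sorted(stats.items(), key=lambda item: item[1])
    k = max(1, int(len(ranked) * tail_fraction))
    low_tail = [name for name, _ in ranked[:k]]
    high_tail = [name for name, _ in ranked[-k:]]
    return low_tail, high_tail
```

Because a fixed fraction of loci is always reported, membership of a tail is evidence of being unusual relative to the rest of the genome, not proof of selection; candidates still need to be evaluated against a realistic demographic model.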

To provide some examples of how genome-wide analyses can facilitate inferences of natural selection, Figure 3 shows empirical distributions of Tajima's D, FST and dn/ds, along with their theoretical distributions, simulated under both standard neutral models and alternative demographic histories. Figure 3 highlights two important points. First, empirical distributions provide important information that can be used to infer population demographic history. For example, the demographic models used in simulating Tajima's D (Figure 3A and 3B) recapitulate the empirical distributions much more closely than data simulated under a standard neutral model. Secondly, outlier loci can be identified with greater precision and accuracy with more realistic models of human demographic history. Specifically, the best-fitting non-neutral distributions dramatically reduce the number of test statistics that are apparent outliers under neutrality. Conversely, some test statistics that do not appear to deviate from neutrality are outliers under the best non-neutral distributions. Thus, in principle, empirical distributions of test statistics can be used both to reduce the false-positive rate and to improve power. Although this general strategy has recently been dubbed 'population genomics', the theoretical foundation of searching for outlier loci to find targets of natural selection was outlined decades ago [46, 47].

Figure 3

Empirical distributions of some commonly used test statistics and their theoretical distributions. (A) Distribution of Tajima's D in 201 genes in an African-American sample. The empirical distribution is denoted with bars. The solid line indicates the distribution of Tajima's D simulated under a standard neutral model with recombination using the ms program [59]. The dashed line indicates the distribution of Tajima's D simulated under the best-fitting population demographic model from Akey et al. [60] (exponential expansion starting 50,000 years ago with a growth rate of 10⁻³ per generation). (B) Distribution of Tajima's D from the same 201 genes in a European-American sample. The best-fitting population demographic model is a bottleneck beginning 40,000 years ago with an inbreeding coefficient of 0.175 [60]. Data used to calculate Tajima's D in panels (A) and (B) were obtained from the SeattleSNPs project. (C) The empirical distribution of FST for 5,590 chromosome 7 single nucleotide polymorphisms (SNPs) obtained from the HapMap project [61]. The theoretical distributions of FST were simulated using ms [59]. An island migration model was assumed, with a constant migration parameter between each pair of populations. The solid line shows the expected distribution of FST under neutrality, whereas the dashed line shows the expected distribution under neutrality with an ascertainment bias favouring common SNPs. To approximate the ascertainment bias in the HapMap data, a 'double-hit' SNP discovery strategy was modelled [61] by randomly selecting four simulated chromosomes and only analysing SNPs where each allele was observed twice. The migration parameter was the same for both distributions and was chosen such that the mean of the biased FST distribution (0.138) closely matched the mean observed FST. (D) The empirical distribution of dn/ds from Clark et al. [62]. Only those genes with dn > 0.001 and ds > 0.001 are shown. The solid line shows the distribution of dn/ds estimated using the method of Yang and Nielsen [63] for neutrally evolving coding sequences simulated with the PAML program [64]. The dashed line shows the distribution of dn/ds for sequences under negative selection, with the magnitude of the selective force chosen such that the mean log10 dn/ds (−1.25) matched the mean of the Clark et al. [62] distribution. The length of the simulated sequences (450 codons) was chosen to match the mean length of sequences from Clark et al. [62].

In addition to providing empirical distributions, genome-wide scans for natural selection offer several additional advantages compared with single-locus studies. Genome-wide scans can suggest general principles about the types of variation that natural selection acts most forcefully upon. Datasets derived from an unbiased sampling of loci throughout the genome allow for the discovery of novel functional elements whose presence is revealed by evidence for selection. Whole-genome scans also have the potential to reveal networks of genes whose evolutionary histories are correlated due to their collaboration in executing cellular functions. Finally, it is important to stress that genome-wide analyses do not preclude single-locus analyses, and that achieving a detailed and thorough understanding of the selective and demographic forces acting upon a locus will necessitate focused single-locus analyses drawing from multiple scientific disciplines.

Genome scans for natural selection

Several genome-wide scans for natural selection have recently been performed and are summarised in Table 2. These studies have used a variety of different statistical approaches, data and populations, but are united by the common theme of sampling a large number of loci and making inferences of natural selection. Below, some of these studies will be considered in more detail, to highlight the salient results emerging from genome-wide scans for selection.

Table 2 Summary of genome-wide scans for selection

One of the first genome-wide screens for selection to be performed analysed 26,530 single nucleotide polymorphisms (SNPs), which were genotyped in three human populations: African-Americans, East Asians and European-Americans [65]. An empirical distribution of FST was constructed and outlier SNPs in gene regions were identified. As discussed above, geographically restricted selection (local adaptation) can accentuate levels of population structure by creating large differences in allele frequencies between populations. Conversely, balancing selection can lead to lower than expected levels of population structure. In total, 174 candidate selection genes were identified whose levels of population structure were significantly different compared with neutral expectations (156 genes had exceptionally high values of FST and 18 had exceptionally low values of FST). In addition, the average FST was significantly different between SNPs located in exons, introns and non-genic regions, which is consistent with the action of purifying selection. One limitation of this study was that it relied upon markers that were discovered in a small number of chromosomes, which can lead to significant ascertainment bias (ie in this case, an over-representation of intermediate-frequency alleles). Such ascertainment bias complicates inferences of natural selection, and, as the authors note, additional analyses are needed to confirm the signature of selection in these genes.

Three genome-wide scans for natural selection have also been performed with microsatellite markers [66–68], the largest of which analysed 5,257 microsatellite markers in 28 individuals of European descent [66]. A sliding window analysis across the genome revealed 43 bins that contained a significant reduction in heterozygosity relative to neutral expectations. Interestingly, the recombination rate in these 43 bins was significantly reduced compared with the genome-wide average, which is consistent with theoretical predictions that positive selection will be easier to detect in regions of the genome with low recombination rates [23].

The other two microsatellite-based genome-wide scans for selection included multiple populations and searched for evidence of local adaptation by identifying outlier loci that exhibited large levels of population structure relative to the empirical distribution of all loci. Specifically, Kayser et al. [67] studied 332 microsatellite markers in 47 Europeans and 47 Africans (23 Ethiopians and 24 South Africans). The test statistics RST, a multiallelic analogue of FST, and ln RV, which is the natural log of the ratio of the variances in allele size between two populations [69], were calculated for all loci. Numerous outlier loci were detected and 11 were studied further by genotyping additional microsatellite markers in these regions. The additional microsatellite analyses confirmed the large differences in genetic differentiation, which strengthens the hypothesis that outlier loci have been targets of geographically restricted selective pressures. Similarly, Storz et al. [68] analysed a total of 624 microsatellite loci that were previously genotyped in multiple populations from Africa, Europe and Asia. Again, measures of population structure were calculated for all markers (FST and an analogue to ln RV) and outlier loci were identified. In total, 13 outlier loci were found and all but one had significant reductions in heterozygosity in non-African populations; this was interpreted as evidence that local adaptation was more common outside of Africa. An important limitation of the microsatellite analyses is that the high mutation rate of microsatellites may obscure signatures of selection, except in low-recombining regions of the genome [70, 71].

In one of the largest gene-based genome-wide screens performed to date, Clark et al. [62] analysed 7,645 orthologous genes from humans, chimpanzees and mice (see also Figure 3D). Maximum-likelihood models were fitted to protein-coding DNA sequences to estimate rates of synonymous (ds) and non-synonymous (dn) substitutions. In total, 1,547 genes had dn/ds ratios > 1 in humans, which is commonly interpreted as evidence for positive selection, but the neutral model could be formally rejected at p < 0.05 for only six of these genes. Using an alternative statistical method with greater sensitivity, branch site models were fitted to the data in order to detect accelerated rates of dn/ds in the human lineage for a subset of nucleotide sites (ie dn/ds does not have to be > 1 for the entire gene). A total of 667 genes were identified as significant at p < 0.05 in this analysis; subsequent bioinformatics analyses revealed two interesting observations. First, accelerated rates of evolution were found for several functional classes of genes, including olfactory, nuclear transport and sensory perception. Secondly, genes with evidence for positive selection were enriched for genes that are associated with human diseases, as defined by the Online Mendelian Inheritance in Man (OMIM) database. OMIM primarily contains monogenic disease genes with large phenotypic effects, and it will therefore be interesting to see if these results also extend to complex disease genes. Indeed, signatures of natural selection have been described for several genes associated with various complex diseases [34, 72–78]. If complex disease genes are enriched for signatures of natural selection, finding targets of adaptive evolution may be a useful strategy for prioritising candidate genes in disease-mapping studies.

It is important to note that a recent theoretical study has suggested that maximum-likelihood branch site models may have a high false-positive rate [79] and, therefore, the 667 significant (at p < 0.05) genes in the study by Clark et al. [62] may contain a higher than anticipated fraction of false positives. In addition, increased rates of dn/ds along a lineage do not always indicate the action of positive selection and can also occur due to relaxation of purifying selection [79, 80]. As the authors point out, obtaining polymorphism data from human populations would provide further insight into the evolutionary history of these genes and help to clarify some of the issues raised above.

Local adaptation

An interesting observation that has consistently emerged from large-scale studies of selection is that local adaptation may be a more common feature of recent human evolutionary history than previously thought [52–60, 63–68]. Human populations have clearly had dramatic range expansions during the past 100,000 years that, at least theoretically, may have led to geographically restricted selective pressures, such as unique dietary, pathogenic and climatic challenges. Several genes that possess patterns of genetic variation consistent with local adaptation have previously been reported (Table 3) [60, 72–74, 76, 81–85]. As an illustrative example, Figure 4 shows patterns of genetic variation for a 115 kilobase region on chromosome 7q33 that possesses a striking signature of local adaptation in European-American populations [60]. Two of the genes in this region, TRPV5 and TRPV6, mediate the rate-limiting step of dietary calcium absorption [86, 87]; given the fact that lactase persistence and related metabolic pathways were selected for in northern European populations [83], they are particularly strong candidates for the gene or genes driving this pattern of local adaptation.

Table 3. Genes with evidence of local adaptation

Figure 4. Signature of local adaptation on chromosome 7q33. A graphical representation of genotypes is shown for 23 European-American (EA) and 24 African-American (AA) individuals across a 115 kilobase region on chromosome 7q33, which encompasses four genes. Rows correspond to individuals and columns denote a particular single nucleotide polymorphism (SNP). For each SNP, blue, red and yellow boxes indicate whether the individual is homozygous for the common allele, heterozygous or homozygous for the rare allele, respectively. Grey boxes indicate missing data. Notice the significant reduction in polymorphism in the European-American sample, which is consistent with the hypothesis that variation in one or more of these four genes conferred a selective advantage to European-Americans but not African-Americans. See Akey et al. [60] for more details. This figure was produced using genotype data from the SeattleSNPs project.

In addition, several studies have found that non-African populations possess more evidence for selection relative to African populations [60, 67, 68]. As most studies have considered only a single African population, however, it is difficult to determine whether the observed differences in the frequency of selective events between African and non-African populations are a general phenomenon or simply reflect the need to sample African populations more comprehensively. Furthermore, theoretical studies have demonstrated that the power to detect a recent selective sweep is greater than the power to detect an older sweep [41, 42, 88, 89]. Therefore, the frequency of selective events may be similar in African and non-African populations, but such events may be easier to detect in non-African populations if they occurred more recently.

Looking ahead: The HapMap project

The HapMap project is a large international collaboration to describe patterns of common haplotype variation throughout the human genome [61]. The initial goal of the HapMap project is to genotype 600,000 SNPs in 270 individuals: 90 individuals of northern and western European ancestry (30 trios consisting of two parents and an adult child), 90 Yoruban individuals from Ibadan, Nigeria (30 trios), 45 unrelated Japanese individuals from Tokyo, Japan, and 45 unrelated Han Chinese individuals from Beijing, China. Although the HapMap project was initially developed to facilitate the search for complex disease genes, it will provide a powerful resource for population genetics and evolutionary studies. Specifically, it will provide a unifying, publicly available resource of genome-wide variation data that can be interrogated systematically for signatures of natural selection. As numerous evolutionary analyses will undoubtedly be conducted on the HapMap data, results can be verified across studies, which will allow candidate selection genes to be prioritised for subsequent studies.

Future challenges

It is important to temper our enthusiasm for genome-wide scans of natural selection because several analytical and conceptual challenges remain. For example, as indicated above, thousands of hypothesis tests will be performed in a typical study and it is necessary to correct for multiple tests to avoid an unacceptably high false-positive rate. One particularly appealing approach is to control the false discovery rate [90, 91], which is more powerful than traditional methods such as Bonferroni corrections and has been used in a wide variety of genomics analyses. Furthermore, as numerous genome-wide scans for selection will be applied to common datasets, such as the HapMap, methods for combining results across studies would be invaluable.
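To make the contrast between the two corrections concrete, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure [90] alongside a Bonferroni correction. The function names and the p-values are hypothetical and chosen only for illustration; they do not come from any study discussed here.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking the hypotheses rejected at false discovery rate alpha."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


def bonferroni(p_values, alpha=0.05):
    """Return booleans marking the hypotheses rejected at family-wise error rate alpha."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


# Hypothetical p-values from six tests of neutrality.
toy_p_values = [0.001, 0.008, 0.012, 0.04, 0.2, 0.6]
print("BH rejections:        ", sum(benjamini_hochberg(toy_p_values)))
print("Bonferroni rejections:", sum(bonferroni(toy_p_values)))
```

On these toy values the false discovery rate procedure rejects one more hypothesis than Bonferroni, which illustrates the gain in power described above. The two methods control different error rates, so the extra rejections come at the cost of tolerating a stated proportion of false discoveries.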

A critical issue that has already arisen in current genome-wide scans for selection is the need to verify the signature of selection through replication studies and by alternative experimental approaches. The importance of follow-up studies cannot be overstated because in their absence we will simply be left with a list of interesting 'candidate selection genes'. The problem of follow-up replication in genome-wide studies is a general one that has been considered in linkage analysis [92] and genetic association studies [93]. Clearly, replication in independent samples from the same population is an important criterion that can be used to discard false positives that accumulate from the multiple testing inherent in genome scans. Genome-wide study designs are known to suffer from the 'winner's curse' phenomenon, however, whereby the effect sizes of statistically significant loci are systematically over-estimated [93, 94]. If such concerns are ignored, the statistical power of subsequent replication attempts is likely to be over-estimated, leading the community to place undue faith in the veracity of failed replication attempts. Even if signatures of selection are confirmed, it remains difficult to identify the specific variants that have been subject to selection. Ideally, suspected targets of selection will be functionally characterised, which will facilitate inferences on genotype–phenotype correlations and ultimately on how the putative selected alleles affect fitness. Finally, more powerful methods to estimate evolutionary parameters, such as the timing of selective events and the strength of selection, need to be developed.
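The winner's curse can be illustrated with a toy simulation. The sketch below assumes that every locus has the same true effect and that each estimate is normally distributed around it. The effect size, standard error, significance threshold and function name are all hypothetical choices, not values taken from the studies cited above.

```python
import random
import statistics


def simulate_winners_curse(true_effect=0.2, std_error=0.1, n_loci=20000,
                           z_threshold=3.0, seed=1):
    """Compare the mean estimated effect across all loci with that of 'significant' loci."""
    rng = random.Random(seed)
    # Each locus yields a noisy estimate of the same underlying effect.
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_loci)]
    # A locus is declared significant when its z-score exceeds the threshold.
    significant = [e for e in estimates if e / std_error > z_threshold]
    return statistics.mean(estimates), statistics.mean(significant), len(significant)


mean_all, mean_significant, n_significant = simulate_winners_curse()
print(f"true effect:                 0.2")
print(f"mean estimate, all loci:     {mean_all:.3f}")
print(f"mean estimate, significant:  {mean_significant:.3f} ({n_significant} loci)")
```

Because only estimates that happen to exceed the threshold are retained, their average must lie above the threshold itself and therefore above the true effect. A replication study powered on the inflated estimate would be under-powered for the true one, which is the concern raised above.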

In addition to the issues described above, it is important to note that all of the statistical methods and studies considered in this review are predicated upon simple theoretical models of natural selection. For example, tests such as Tajima's D search for signatures of selection that act on a single locus. Genes do not exist in isolation, however, and it is possible, perhaps even likely, that selection acts on combinations of alleles, a process that is referred to as epistatic selection [95]. Recently, two studies in Drosophila melanogaster demonstrated strong empirical evidence for epistatic selection [96, 97]. It seems likely that progress in reconstructing gene and protein networks will serve as a valuable guide in beginning to explore epistatic selection in humans.
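To show what a single-locus test of this kind involves, the following is a minimal sketch of Tajima's D [39] computed from a set of aligned sequences. It assumes equal-length sequences with no missing data and at least four samples. The function name and the toy alignments are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for equal-length aligned sequences without missing data."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for j in range(length) if len({s[j] for s in sequences}) > 1)
    if seg_sites == 0:
        return None  # D is undefined when there is no polymorphism

    # pi: mean number of differences between pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima's derivation of the variance of (pi - S/a1).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    theta_w = seg_sites / a1  # Watterson's estimator of theta
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - theta_w) / sqrt(variance)


# Toy alignments: one with only rare variants, one with intermediate-frequency variants.
rare_variants = ["AAAA", "TAAA", "ATAA", "AATA"]
balanced_variants = ["AAA", "AAA", "TTT", "TTT"]
print("rare variants:    ", tajimas_d(rare_variants))
print("balanced variants:", tajimas_d(balanced_variants))
```

An excess of rare variants gives a negative D and an excess of intermediate-frequency variants gives a positive D. The statistic summarises variation at one locus at a time, which is why it cannot by itself capture selection acting on combinations of alleles at different loci.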


The intersection of high-throughput methods to access human genetic variation on a genome-wide scale and statistical tools to identify signatures of natural selection will undoubtedly provide a deeper understanding of how adaptive processes helped to shape our genomes. Furthermore, the same resources used to scan the genome for signatures of selection will also provide a more comprehensive understanding of human demographic history, which will be necessary to understand how neutral and non-neutral evolutionary forces have interacted to shape extant patterns of human genetic and phenotypic diversity. Although many hurdles are likely to be encountered, the evolutionary insights obtained from genome-wide analyses will have implications for many contemporary issues, such as the functional annotation of the human genome and the discovery of complex disease genes.


  1. Valle D: Genetics, individuality, and medicine in the 21st century. Am J Hum Genet. 2004, 74: 374-381. 10.1086/382790.
  2. Bamshad M, Wooding SP: Signatures of natural selection in the human genome. Nat Rev Genet. 2003, 4: 99-111. 10.1038/nrg999.
  3. Harr B, Kauer M, Schlötterer C: Hitchhiking mapping: A population-based fine mapping strategy for adaptive mutations in Drosophila melanogaster. Proc Natl Acad Sci USA. 2002, 99: 12949-12954. 10.1073/pnas.202336899.
  4. Kauer MO, Dieringer D, Schlötterer C: A microsatellite variability screen for positive selection associated with the "Out of Africa" habitat expansion of Drosophila melanogaster. Genetics. 2003, 165: 1137-1148.
  5. Schofl G, Schlötterer C: Patterns of microsatellite variability among X chromosomes and autosomes indicate a high frequency of beneficial mutations in non-African D. simulans. Mol Biol Evol. 2004, 21: 1384-1390. 10.1093/molbev/msh132.
  6. Kimura M: Evolutionary rate at the molecular level. Nature. 1968, 217: 624-626. 10.1038/217624a0.
  7. King JL, Jukes TH: Non-Darwinian evolution. Science. 1969, 164: 788-798. 10.1126/science.164.3881.788.
  8. Kimura M: The Neutral Theory of Molecular Evolution. 1983, Cambridge University Press, Cambridge, UK.
  9. Ohta T: Slightly deleterious mutant substitutions in evolution. Nature. 1973, 246: 96-98. 10.1038/246096a0.
  10. Ohta T, Gillespie JH: Development of neutral and nearly neutral theories. Theor Popul Biol. 1996, 49: 128-142. 10.1006/tpbi.1996.0007.
  11. Otto SP: Detecting the form of selection from DNA sequence data. Trends Genet. 2000, 16: 526-529. 10.1016/S0168-9525(00)02141-7.
  12. Nielsen R: Statistical tests of selective neutrality in the age of genomics. Heredity. 2001, 86: 641-647. 10.1046/j.1365-2540.2001.00895.x.
  13. Kingman JFC: The coalescent. Stochastic Process Appl. 1982, 13: 235-248. 10.1016/0304-4149(82)90011-4.
  14. Kingman JFC: On the genealogy of large populations. J Appl Prob. 1982, 19A: 27-43.
  15. Hudson RR: Properties of a neutral allele model with intragenic recombination. Theor Popul Biol. 1983, 23: 183-201. 10.1016/0040-5809(83)90013-8.
  16. Hudson RR: Testing the constant-rate neutral allele model with protein sequence data. Evolution. 1983, 37: 203-217. 10.2307/2408186.
  17. Tajima F: Evolutionary relationship of DNA sequences in finite populations. Genetics. 1983, 105: 437-460.
  18. Fu YX, Li WH: Coalescing into the 21st century: An overview and prospects of coalescent theory. Theor Popul Biol. 1999, 56: 1-10. 10.1006/tpbi.1999.1421.
  19. Rosenberg NA, Nordborg M: Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nat Rev Genet. 2002, 3: 380-390. 10.1038/nrg795.
  20. Charlesworth B, Morgan MT, Charlesworth D: The effect of deleterious mutations on neutral molecular variation. Genetics. 1993, 134: 1289-1303.
  21. Hudson RR, Kaplan NL: Deleterious background selection with recombination. Genetics. 1995, 141: 1605-1617.
  22. Neuhauser C, Krone SK: The genealogy of samples in models with selection. Genetics. 1997, 145: 519-534.
  23. Maynard Smith J, Haigh J: The hitch-hiking effect of a favorable gene. Genet Res. 1974, 231: 1114-1116.
  24. Thomson G: The effect of a selected locus on a linked neutral locus. Genetics. 1977, 85: 752-788.
  25. Kaplan N, Hudson RR, Langley CH: The "hitchhiking effect" revisited. Genetics. 1989, 123: 887-899.
  26. Stephan W, Wiehe THE, Lenz MW: The effect of strongly selected substitutions on neutral polymorphism: Analytical results based on diffusion theory. Theor Popul Biol. 1992, 41: 237-254. 10.1016/0040-5809(92)90045-U.
  27. Nordborg M: Structured coalescent processes on different time scales. Genetics. 1997, 146: 1501-1514.
  28. Schierup MH, Vekemans X, Charlesworth D: The effect of subdivision on variation at multi-allelic loci under balancing selection. Genet Res. 2000, 76: 51-62. 10.1017/S0016672300004535.
  29. Kelly JK, Wade MJ: Molecular evolution near a two-locus balanced polymorphism. J Theor Biol. 2000, 204: 83-101. 10.1006/jtbi.2000.2003.
  30. Nordborg M, Innan H: The genealogy of sequences containing multiple sites subject to strong selection in a subdivided population. Genetics. 2003, 163: 1201-1213.
  31. Nielsen R: Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics. 2000, 154: 931-942.
  32. Braverman JM, Hudson RR, Kaplan NL, et al: The hitchhiking effect on the site frequency spectrum of DNA polymorphism. Genetics. 1995, 140: 783-796.
  33. Fay JC, Wu CI: Hitchhiking under positive Darwinian selection. Genetics. 2000, 155: 1405-1413.
  34. Sabeti PC, Reich DE, Higgins JM, et al: Detecting recent positive selection in the human genome from haplotype structure. Nature. 2002, 419: 832-837. 10.1038/nature01140.
  35. Takahata N, Nei M: Allelic genealogy under overdominant and frequency-dependent selection and polymorphism of major histocompatibility complex loci. Genetics. 1990, 124: 967-978.
  36. Tajima F: The effect of change in population size on DNA polymorphism. Genetics. 1989, 123: 597-601.
  37. Przeworski M, Hudson RR, Di Rienzo A: Adjusting the focus on human variation. Trends Genet. 2000, 16: 296-302. 10.1016/S0168-9525(00)02030-8.
  38. Kreitman M: Methods to detect selection in populations with applications to the human. Annu Rev Genomics Hum Genet. 2000, 1: 539-559. 10.1146/annurev.genom.1.1.539.
  39. Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989, 123: 585-595.
  40. Fu YX, Li WH: Statistical test of neutrality of mutations. Genetics. 1993, 133: 693-709.
  41. Fu YX: Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics. 1997, 186: 1997-2004.
  42. Simonsen KL, Churchill GA, Aquadro CF: Properties of statistical tests of neutrality for DNA polymorphism data. Genetics. 1995, 141: 413-429.
  43. Kuhner MK, Yamato J, Felsenstein J: Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics. 1995, 140: 1421-1430.
  44. Kuhner MK, Yamato J, Felsenstein J: Maximum likelihood estimation of population growth rates based on the coalescent. Genetics. 1998, 149: 429-434.
  45. Felsenstein J: Likelihood calculations on coalescents. Inferring Phylogenies. Edited by: Felsenstein J. 2004, Sinauer Associates, Sunderland, MA, 470-487.
  46. Cavalli-Sforza LL: Population structure and human evolution. Proc R Soc Lond B Biol Sci. 1966, 164: 362-379. 10.1098/rspb.1966.0038.
  47. Lewontin RC, Krakauer J: Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics. 1973, 74: 175-195.
  48. Weir BS, Cockerham CC: Estimating F-statistics for the analysis of population structure. Evolution. 1984, 38: 1358-1370. 10.2307/2408641.
  49. Vitalis R, Dawson K, Boursot P: Interpretation of variation across marker loci as evidence of selection. Genetics. 2001, 158: 1811-1823.
  50. Beaumont MA, Balding DJ: Identifying adaptive genetic divergence among populations from genome scans. Mol Ecol. 2004, 13: 969-980. 10.1111/j.1365-294X.2004.02125.x.
  51. Hudson RR, Kreitman M, Aguade M: A test of neutral molecular evolution based on nucleotide data. Genetics. 1987, 116: 153-159.
  52. McDonald JH, Kreitman M: Adaptive protein evolution at the Adh locus in Drosophila. Nature. 1991, 351: 652-654. 10.1038/351652a0.
  53. McDonald JH: Improved tests for heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence. Mol Biol Evol. 1998, 15: 377-384. 10.1093/oxfordjournals.molbev.a025934.
  54. Eyre-Walker A: Changing effective population size and the McDonald-Kreitman test. Genetics. 2002, 162: 2017-2024.
  55. Nielsen R, Yang Z: Likelihood models for detecting positive selected amino acid sites and applications to the HIV-1 envelope gene. Genetics. 1998, 148: 929-936.
  56. Nei M, Gojobori T: Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986, 3: 418-426.
  57. Suzuki Y, Gojobori T: A method for detecting positive selection at single amino acid sites. Mol Biol Evol. 1999, 16: 1315-1328. 10.1093/oxfordjournals.molbev.a026042.
  58. Andolfatto P: Adaptive hitchhiking effects on genome variability. Curr Opin Genet Dev. 2001, 11: 635-641. 10.1016/S0959-437X(00)00246-X.
  59. Hudson RR: Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics. 2002, 18: 337-338. 10.1093/bioinformatics/18.2.337.
  60. Akey JM, Eberle MA, Rieder MJ, et al: Population history and natural selection shape patterns of genetic variation in 132 genes. PLoS Biol. 2004, 2: 1591-1599.
  61. International HapMap Consortium: The international HapMap project. Nature. 2003, 426: 789-794. 10.1038/nature02168.
  62. Clark AG, Glanowski S, Nielsen R, et al: Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science. 2003, 302: 1960-1963. 10.1126/science.1088821.
  63. Yang Z, Nielsen R: Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000, 17: 32-43. 10.1093/oxfordjournals.molbev.a026236.
  64. Yang Z: PAML: A program package for phylogenetic analysis by maximum likelihood. Comput Appl BioSci. 1997, 13: 555-556.
  65. Akey JM, Zhang G, Zhang K, et al: Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 2002, 12: 1805-1814. 10.1101/gr.631202.
  66. Payseur BA, Cutter AD, Nachman MW: Searching for evidence of positive selection in the human genome using patterns of microsatellite variability. Mol Biol Evol. 2002, 19: 1143-1153. 10.1093/oxfordjournals.molbev.a004172.
  67. Kayser M, Brauer S, Stoneking M: A genome scan to detect candidate regions influenced by local natural selection in human populations. Mol Biol Evol. 2003, 20: 893-900. 10.1093/molbev/msg092.
  68. Storz JF, Payseur BA, Nachman MW: Genome scans of DNA variability in humans reveal evidence for selective sweeps outside of Africa. Mol Biol Evol. 2004, 21: 1800-1811. 10.1093/molbev/msh192.
  69. Schlötterer C: A microsatellite-based multilocus screen for the identification of local selective sweeps. Genetics. 2002, 160: 753-763.
  70. Schlötterer C, Wiehe T: Microsatellites, a neutral marker to infer selective sweeps. Microsatellites: Evolution and Applications. Edited by: Goldstein D, Schlötterer C. 1999, Oxford University Press, Oxford, UK, 238-248.
  71. Wiehe T: The effect of selective sweeps on the variance of the allele distribution of a linked multi-allele locus-hitchhiking of microsatellites. Theor Popul Biol. 1998, 53: 272-283. 10.1006/tpbi.1997.1346.
  72. Hamblin MT, Di Rienzo A: Detection of the signature of natural selection in humans: Evidence from the Duffy blood group locus. Am J Hum Genet. 2000, 66: 1669-1679. 10.1086/302879.
  73. Tishkoff SA, Varkonyi R, Cahinhinan N, et al: Haplotype diversity and linkage disequilibrium at human G6PD: Recent origin of alleles that confer malarial resistance. Science. 2001, 293: 455-462. 10.1126/science.1061573.
  74. Hamblin MT, Thompson EE, Di Rienzo A: Complex signatures of natural selection at the Duffy blood group locus. Am J Hum Genet. 2002, 70: 369-383. 10.1086/338628.
  75. Bamshad MJ, Mummidi S, Gonzalez E, et al: A strong signature of balancing selection in the 5' cis-regulatory region of CCR5. Proc Natl Acad Sci USA. 2002, 99: 10539-10544. 10.1073/pnas.162046399.
  76. Fullerton SM, Bartoszewicz A, Ybazeta G, et al: Geographic and haplotype structure of candidate type 2 diabetes susceptibility variants at the calpain-10 locus. Am J Hum Genet. 2002, 70: 1096-1106. 10.1086/339930.
  77. Rockman MV, Hahn MW, Soranzo N, et al: Positive selection on MMP3 regulation has shaped heart disease risk. Curr Biol. 2004, 14: 1531-1539. 10.1016/j.cub.2004.08.051.
  78. Nakajima T, Wooding S, Sakagami T, et al: Natural selection and population history in the human angiotensinogen gene (AGT): 736 complete AGT sequences in chromosomes from around the world. Am J Hum Genet. 2004, 74: 898-916. 10.1086/420793.
  79. Zhang J: Frequent false detection of positive selection by the likelihood method with branch-site models. Mol Biol Evol. 2004, 21: 1332-1339. 10.1093/molbev/msh117.
  80. Rooney AP, Zhang J: Rapid evolution of a primate sperm protein: Relaxation of functional constraint or positive Darwinian selection?. Mol Biol Evol. 1999, 16: 706-710. 10.1093/oxfordjournals.molbev.a026153.
  81. Gilad Y, Rosenberg S, Przeworski M, et al: Evidence for positive selection and population structure at the human MAO-A gene. Proc Natl Acad Sci USA. 2002, 99: 862-867. 10.1073/pnas.022614799.
  82. Rana BK, Hewett-Emmett D, Jin L, et al: High polymorphism at the human melanocortin 1 receptor locus. Genetics. 1999, 151: 1547-1557.
  83. Bersaglieri T, Sabeti PC, Patterson N, et al: Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet. 2004, 74: 1111-1120. 10.1086/421051.
  84. Stephens JC, Reich DE, Goldstein DB, et al: Dating the origin of the CCR5-Delta32 AIDS-resistance allele by the coalescence of haplotypes. Am J Hum Genet. 1998, 62: 1507-1515. 10.1086/301867.
  85. Rockman MV, Hahn MW, Soranzo N, et al: Positive selection on a human-specific transcription factor binding site regulating IL4 expression. Curr Biol. 2003, 13: 2118-2123. 10.1016/j.cub.2003.11.025.
  86. Nijenhuis T, Hoenderop JGJ, Nilius B, Bindels RJM: (Patho)physiological implications of the novel epithelial Ca2+ channels TRPV5 and TRPV6. Pflugers Arch. 2003, 446: 401-409. 10.1007/s00424-003-1038-7.
  87. van de Graaf SF, Hoenderop JG, Gkika D, et al: Functional expression of the epithelial Ca2+ channels (TRPV5 and TRPV6) requires association of the S100A10-annexin 2 complex. EMBO J. 2003, 22: 1478-1487. 10.1093/emboj/cdg162.
  88. Kim Y, Stephan W: Joint effects of genetic hitchhiking and background selection on neutral variation. Genetics. 2000, 155: 1415-1427.
  89. Przeworski M: The signature of positive selection at randomly chosen loci. Genetics. 2002, 160: 1179-1189.
  90. Benjamini Y, Hochberg Y: Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc B. 1995, 57: 289-300.
  91. Storey JD, Tibshirani R: Statistical significance for genome-wide experiments. Proc Natl Acad Sci USA. 2003, 100: 9440-9445. 10.1073/pnas.1530509100.
  92. Lander E, Kruglyak L: Genetic dissection of complex traits: Guidelines for interpreting and reporting linkage results. Nat Genet. 1995, 11: 241-247. 10.1038/ng1195-241.
  93. Lohmueller KE, Pearce CL, Pike M, et al: Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat Genet. 2003, 33: 177-182. 10.1038/ng1071.
  94. Goring HH, Terwilliger JD, Blangero J: Large upward bias in estimation of locus-specific effects from genomewide scans. Am J Hum Genet. 2001, 69: 1357-1369. 10.1086/324471.
  95. Lewontin RC, Kojima K: The evolutionary dynamics of complex polymorphisms. Evolution. 1960, 14: 458-472. 10.2307/2405995.
  96. Takano-Shimizu T, Kawabe A, Inomata N, et al: Interlocus nonrandom association of polymorphisms in Drosophila chemoreceptor genes. Proc Natl Acad Sci USA. 2004, 101: 14156-14161. 10.1073/pnas.0401782101.
  97. Zapata C, Nunez C, Velasco T: Distribution of nonrandom associations between pairs of protein loci along the third chromosome of Drosophila melanogaster. Genetics. 2002, 161: 1539-1550.


We thank Jennifer Madeoy, Dayna Akey and an anonymous reviewer for critical reading of the manuscript and providing valuable comments. J.R. is supported by the University of Washington Medical Scientist Training Program. J.M.A. is supported by a Pilot and Feasibility Award from the Clinical Nutrition Research Unit at the University of Washington.

Author information



Corresponding author

Correspondence to Joshua M. Akey.

Cite this article

Ronald, J., Akey, J.M. Genome-wide scans for loci under selection in humans. Hum Genomics 2, 113 (2005).



  • genetic variation
  • evolutionary genomics
  • natural selection
  • single nucleotide polymorphisms (SNPs)