Genome-wide scans for loci under selection in humans

Natural selection, which can be defined as the differential contribution of genetic variants to future generations, is the driving force of Darwinian evolution. Identifying regions of the human genome that have been targets of natural selection is an important step in clarifying human evolutionary history and understanding how genetic variation results in phenotypic diversity; it may also facilitate the search for complex disease genes. Technological advances in high-throughput DNA sequencing and single nucleotide polymorphism genotyping have enabled several genome-wide scans of natural selection to be undertaken. Here, some of the observations that are beginning to emerge from these studies are reviewed, including evidence for geographically restricted selective pressures (ie local adaptation) and a relationship between genes subject to natural selection and human disease. In addition, several important problems that need to be addressed in future genome-wide studies of natural selection are highlighted.


Introduction
Phenotypic diversity is a ubiquitous characteristic of natural populations. Individuals vary in almost every conceivable way, including physical appearance, behaviour, disease susceptibility, ability to detoxify drugs and perception of environmental stimuli. 1 Although environmental forces undoubtedly contribute to phenotypic variation, so too does genetic variation. Therefore, explaining the evolutionary forces that create, maintain and shape patterns of human genetic variation is of fundamental importance in understanding phenotypic variation. 2 An important goal in studies of human genetic variation is to identify loci that have been targets of natural selection because of their variable effects on the fitness of individuals throughout a population's history. Signatures of natural selection delimit regions of the genome that are, or have been, functionally important. Therefore, identifying such regions will facilitate the identification of genetic variation that contributes to phenotypic variation and help to functionally annotate the genome. Unfortunately, inferring the action of natural selection remains a challenge. This is likely to change in the near future, as high-throughput methods for cataloguing genetic variation on a genome-wide scale and new statistical tools for detecting selection have been, and continue to be, developed.
Much important work has been done on genome scans for natural selection in model organisms such as Drosophila; 3-5 this review, however, will focus on studies performed in human populations. First, the effects of natural selection and population history on patterns of genetic variation will be summarised, and some of the common statistical methods used to test for deviations from neutrality will be presented. Next, several empirical genome-wide scans for selection will be critically evaluated. Finally, several important problems, both practical and conceptual, that need to be addressed in future studies will be highlighted.

Human genetic variation: The neutral expectation
The evolutionary sojourn of a newly arisen mutation depends upon how it affects the fitness of the individual who possesses it. The neutral theory of molecular evolution posits that the vast majority of polymorphisms in a population are selectively neutral and have no appreciable effects on fitness. 6,7 Under neutrality, changes in allele frequency are governed by the stochastic effects of genetic drift in populations of finite size. Thus, the effective population size, $N_e$, and the neutral mutation rate, $\mu_0$, determine levels of polymorphism within species and the rate of divergence between species. 8 In addition, mutations with small fitness effects can be rendered 'nearly neutral' if the product of $N_e$ and $s$ (which measures the strength of selection) satisfies $|N_e s| < 1$. 9,10 For human populations, $N_e$ is approximately 10,000, and therefore $|s|$ must be greater than $10^{-4}$ to overcome the stochastic effects of genetic drift. Because the neutral theory makes explicit and quantitative predictions about expected patterns of genetic variation within and between species, it is an indispensable tool in studies of natural selection. Specifically, the neutral theory provides an essential foundation for evaluating the evidence either for or against selection in empirical data, as it serves as the null hypothesis when exploring alternative evolutionary models. 11,12 In combination with the neutral theory, coalescent theory provides a powerful framework for conceptualising and making inferences about evolutionary forces. The coalescent is a stochastic model of gene genealogies 13-17 and has emerged as the primary analytical tool in studies of genetic variation.
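The nearly-neutral threshold quoted above is simple arithmetic: selection begins to dominate drift roughly when $|N_e s| \ge 1$. The Python sketch below checks that rule of thumb for the approximate human value $N_e \approx 10{,}000$ given in the text. The function names are hypothetical, and the hard cutoff at 1 is a heuristic rather than a sharp biological boundary.

```python
def scaled_selection_strength(ne: float, s: float) -> float:
    """Return |N_e * s|, the product that determines whether drift or selection dominates."""
    return abs(ne * s)


def is_nearly_neutral(ne: float, s: float) -> bool:
    """Rule of thumb from the text: a mutation behaves as nearly neutral when |N_e * s| < 1."""
    return scaled_selection_strength(ne, s) < 1.0


def min_selection_coefficient(ne: float) -> float:
    """Smallest |s| at which selection begins to overcome drift, i.e. |s| = 1 / N_e."""
    return 1.0 / ne


if __name__ == "__main__":
    NE_HUMAN = 10_000  # approximate human effective population size quoted in the text
    print(f"threshold |s| = {min_selection_coefficient(NE_HUMAN):g}")
    for s in (1e-5, 1e-3, -1e-3):
        print(f"s = {s:g}: nearly neutral? {is_nearly_neutral(NE_HUMAN, s)}")
```

Because only the magnitude of $N_e s$ matters, the same threshold applies to weakly deleterious ($s < 0$) and weakly advantageous ($s > 0$) mutations.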
In classical population genetics theory, the initial state of a population is defined and one observes the evolution of the entire population by looking forward in time. By contrast, the coalescent is a sample-driven theory that traces the history of coalescent events backwards in time (see Figure 1 for some basic properties of the coalescent). Several excellent and detailed reviews of coalescent theory can be found elsewhere. 18,19 Deviations from the standard neutral model distort the branch lengths, topology and coalescent times of gene genealogies, as described below.
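The backward-in-time view of the coalescent can be made concrete with a small simulation. This minimal Python sketch assumes the standard neutral model (constant size, panmixia) and measures time in units of $2N_e$ generations. While $k$ lineages remain, the waiting time to the next coalescence is exponentially distributed with rate $k(k-1)/2$, so the expected time to the most recent common ancestor is $2(1 - 1/n)$. With $\theta = 4N_e\mu$, the expected number of segregating sites is $\theta \sum_{i=1}^{n-1} 1/i$. The function names are illustrative and are not taken from any particular package.

```python
import random


def expected_waiting_times(n: int) -> list[float]:
    """Expected time spent with k lineages (k = n, n-1, ..., 2), in units of 2N_e generations."""
    return [2.0 / (k * (k - 1)) for k in range(n, 1, -1)]


def simulate_waiting_times(n: int, rng: random.Random) -> list[float]:
    """One realisation of the neutral coalescent: an Exp(k(k-1)/2) waiting time per epoch."""
    return [rng.expovariate(k * (k - 1) / 2.0) for k in range(n, 1, -1)]


def total_branch_length(times: list[float], n: int) -> float:
    """Sum of all branch lengths: during the epoch with k lineages, k branches are growing."""
    return sum(k * t for k, t in zip(range(n, 1, -1), times))


def expected_segregating_sites(n: int, theta: float) -> float:
    """Watterson's expectation E[S] = theta * a1, with a1 = sum_{i=1}^{n-1} 1/i."""
    return theta * sum(1.0 / i for i in range(1, n))


if __name__ == "__main__":
    n, reps = 10, 20_000
    rng = random.Random(42)
    sims = [sum(simulate_waiting_times(n, rng)) for _ in range(reps)]
    print("expected TMRCA:", sum(expected_waiting_times(n)))
    print("simulated mean TMRCA:", sum(sims) / reps)
    print("E[S] for theta = 5:", expected_segregating_sites(n, 5.0))
```

The last epoch (two lineages) has an expected length of 1, about half of the whole TMRCA for large samples. This is why deep genealogical structure is shaped by few lineages and why a sweep or bottleneck that compresses early coalescences changes the genealogy so visibly.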
Evolutionary forces perturb patterns of genetic variation

Figure 1. Properties of the coalescent for a standard neutral model (constant-sized, randomly mating, panmictic population at mutation-drift equilibrium). (A) Explicitly tracing the history of a sample of alleles from the population, each progenitor allele is derived from a randomly chosen parental allele. Occasionally, two progenitor alleles are derived from the same parent, causing the lineages of these two alleles to unite, or coalesce, when they are followed backward in time from the present. Note that if progeny are derived from parents at random, the probability that two lineages coalesce increases as the number of distinct lineages increases and as the effective population size decreases. Thus, for a constant-sized population, a characteristic distribution of waiting times between coalescent events is expected. (B) The untangled, coalescent representation of (A) is created by treating lineages as branches, ignoring the intermediate ancestors between coalescent events. Under neutrality, mutational events (represented by shaded diamonds) are uniformly distributed throughout time, hence the number of mutations that occur on a branch is proportional to the length of the branch. Note that mutations occurring on external branches produce rare alleles, carried by only a single sampled allele, whereas mutations occurring on internal branches produce more common alleles.

Below, the way in which selection and demographic history affect patterns of genetic variation will be considered from a coalescent point of view. Theoretical studies have investigated the evolutionary dynamics of genetic variation subject to a variety of selective pressures, including purifying, 20-22 positive 23-26 and balancing selection. 27-30 This paper will focus on positive and balancing selection, as these have been the primary types of selection that current genome-wide scans have studied. Positive selection acts to increase the frequency of advantageous alleles in a population.
Strongly advantageous mutations are rapidly swept to fixation, hence the term 'selective sweep'. Importantly, through a process referred to as 'genetic hitchhiking', positive selection also affects patterns of neutral polymorphisms linked to an advantageous mutation. 23 Positive selection leads to a shallow, star-like genealogy (Figure 2) with a decreased time to the most recent common ancestor. Tracing the history of alleles backwards in time, these effects are a direct consequence of the rapid coalescence of lineages in the small but expanding progenitor population. 31 The signature of positive selection includes reduced levels of genetic variation compared with neutral expectations, 24-26 a skew in the allele frequency spectrum towards low-frequency (minor) alleles 32 (which, when ancestral states are known, includes an excess of high-frequency derived alleles 33) and elevated levels of linkage disequilibrium. 34 Balancing selection occurs when polymorphisms are selectively maintained in a population. By contrast with positive selection, the genealogy of a locus subject to balancing selection is characterised by an increased time to the most recent common ancestor and long internal branches (Figure 2). The effect of balancing selection on gene genealogies can be understood by considering balanced alleles as distinct subpopulations, such that coalescence events can occur rapidly within a subpopulation but slowly between subpopulations. 27 The signature of balancing selection includes elevated levels of polymorphism relative to neutral expectations and a skew of the allele frequency distribution towards an excess of intermediate-frequency alleles. 29,30,35 In addition to natural selection, population demographic history can also strongly influence patterns of genetic variation, often mimicking the effects of natural selection. 36,37 In other words, inferences of natural selection are confounded by population demographic history.
For example, both positive selection and increases in population size have similar effects on gene genealogies (Figure 2); both processes therefore lead to an excess of low-frequency alleles in a population. In fact, strong positive selection can be thought of as a rapid population expansion of an advantageous allele as it sweeps through a population. Similarly, population structure and balancing selection both result in subdivided genealogies, and therefore both processes are expected to result in an excess of intermediate-frequency alleles in a population (Figure 2). Population bottlenecks can lead to an excess of either low- or intermediate-frequency alleles relative to neutral expectations, depending on the age and severity of the bottleneck. Figure 2 demonstrates the effect of a severe and recent bottleneck, which forces all lineages to coalesce at the time of the size reduction and results in a genealogy similar to that produced by positive selection. Human populations clearly do not meet all of the assumptions of the standard neutral model; hence, rejecting the standard neutral model for a particular locus cannot be interpreted as unambiguous evidence for selection.
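Because positive selection, population growth, balancing selection and population structure each shift the allele frequency spectrum in a characteristic direction, it helps to see how simple summary statistics respond to such a shift. The Python sketch below is an illustration, not code from the studies reviewed. It takes an unfolded site-frequency spectrum, where entry $i$ counts sites at which the derived allele is carried by $i$ of $n$ sampled chromosomes, and computes three estimators of $\theta$: Watterson's $\theta_W$ (from the number of segregating sites), $\theta_\pi$ (mean pairwise differences) and Fay and Wu's $\theta_H$. It also returns the unnormalised difference $H = \theta_\pi - \theta_H$. Under the standard neutral model all three estimators share the same expectation. An excess of rare alleles pulls $\theta_\pi$ below $\theta_W$, and an excess of high-frequency derived alleles pushes $\theta_H$ above $\theta_\pi$. Polarising alleles as ancestral or derived requires an outgroup, which is assumed here.

```python
def theta_estimators(sfs: list[float]) -> dict[str, float]:
    """Estimate theta from an unfolded site-frequency spectrum.

    sfs[i - 1] is the number of sites where the derived allele appears in i of n
    sampled chromosomes, for i = 1 .. n - 1 (so n = len(sfs) + 1).
    """
    n = len(sfs) + 1
    pairs = n * (n - 1) / 2.0                  # number of chromosome pairs, C(n, 2)
    a1 = sum(1.0 / i for i in range(1, n))     # Watterson's harmonic-number constant
    counts = list(enumerate(sfs, start=1))     # (i, number of sites with i derived copies)
    theta_w = sum(sfs) / a1
    theta_pi = sum(i * (n - i) * c for i, c in counts) / pairs
    theta_h = sum(i * i * c for i, c in counts) / pairs
    return {"theta_w": theta_w, "theta_pi": theta_pi, "theta_h": theta_h,
            "fay_wu_h": theta_pi - theta_h}


if __name__ == "__main__":
    n, theta = 10, 5.0
    neutral = [theta / i for i in range(1, n)]   # expected SFS under the standard neutral model
    rare_excess = [20.0] + [0.0] * (n - 2)       # hypothetical spectrum: singletons only
    for label, sfs in (("neutral", neutral), ("rare excess", rare_excess)):
        print(label, theta_estimators(sfs))
```

The neutral input reproduces $\theta = 5$ for all three estimators, which makes it a convenient sanity check before applying the function to observed spectra.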
Detecting the signature of natural selection
Before the results from genome-wide scans for natural selection are presented, some commonly used statistical methods designed to detect departures from neutrality are briefly described, highlighting some of their strengths and limitations. The following is not meant to be an exhaustive discussion of such tests, and descriptions of many interesting and useful methods are not included here. For further study, the reader is encouraged to see an excellent review by Kreitman. 38 Statistical tests of neutrality can broadly be classified into three categories, based upon the type of data that they use: 1) within-species tests; 2) within- and between-species tests; and 3) between-species tests (Table 1). The most common class of within-species tests compares summary statistics of the observed allele frequency distribution at a locus with the values expected under neutral evolution; it includes Tajima's D, 39 Fu and Li's D and F, 40 Fay and Wu's H 33 and Fu's $F_S$. 41 An attractive feature of these tests is that they do not require any a priori classification of functional versus non-functional sites, thus making them equally suitable for protein-coding and non-protein-coding regions. In a thorough examination of the power of Tajima's D and Fu and Li's D and F, Simonsen et al. 42 found that these tests can only detect selective sweeps in a narrow time interval in the recent past, and that they can only detect balancing selection if it has acted for a very long time period. Interestingly, Simonsen et al. also found that the power of these tests could drop below the nominal false-positive rate, $\alpha$, if non-neutral evolution did not occur within these critical time windows, thus creating the undesirable scenario in which rejection of the null is more likely if the null is true than if it is false. Fu observed similar results, although he found that $F_S$ was the most powerful and performed better at detecting more ancient positive selection. 41 The site-frequency spectrum tests discussed above are confounded by demographic events such as population growth, bottlenecks and subdivision (Figure 2) and are rendered conservative by intra-locus recombination. The desire to estimate population demographic parameters, recombination rates and evolutionary parameters has prompted the development of maximum likelihood-based methods which use the complete data, rather than summary statistics. 31,43,44 These methods are computationally intensive and are not currently feasible for large datasets, but they potentially allow for substantial gains in statistical power relative to summary-statistic methods and are likely to become increasingly important tools in the future (for a general discussion, see Felsenstein 45).
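Tajima's D, the most widely used of these summary-statistic tests, contrasts two estimators of $\theta$: the mean number of pairwise differences, $\pi$, and Watterson's estimator, $S/a_1$, where $S$ is the number of segregating sites and $a_1 = \sum_{i=1}^{n-1} 1/i$. The Python sketch below follows Tajima's original variance normalisation. It assumes biallelic sites with no missing data, encoded as strings of 0s and 1s, and the helper names are hypothetical. Negative values indicate an excess of rare alleles (as expected after a sweep or population growth); positive values indicate an excess of intermediate-frequency alleles (as expected under balancing selection or population structure).

```python
import math
from itertools import combinations


def segregating_sites(haplotypes: list[str]) -> int:
    """Count alignment columns that are polymorphic among the sampled haplotypes."""
    return sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)


def mean_pairwise_differences(haplotypes: list[str]) -> float:
    """pi: average number of differing sites over all pairs of haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return diffs / len(pairs)


def tajimas_d(n: int, s: int, pi: float) -> float:
    """Tajima's D for a sample of n sequences with s segregating sites."""
    if s == 0:
        return 0.0  # D is undefined without polymorphism; 0.0 is a convention used here
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    samples = {
        "singletons only": ["00", "00", "00", "00", "10", "01"],
        "intermediate": ["00", "00", "00", "11", "11", "11"],
    }
    for label, haps in samples.items():
        d = tajimas_d(len(haps), segregating_sites(haps), mean_pairwise_differences(haps))
        print(f"{label}: D = {d:.3f}")
```

Both toy samples have the same $S$, so the sign of D is driven entirely by how that variation is distributed across frequencies, which is exactly the property that also makes D sensitive to demography.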
Another within-species test that has been used to detect selection is to compare the variation in allele frequencies between populations, which can be quantified by the statistic $F_{ST}$. Under selective neutrality, $F_{ST}$ is determined by genetic drift, whereas natural selection is a locus-specific force that can cause systematic deviations in $F_{ST}$ values for a selected gene and nearby genetic markers. For example, geographically restricted directional selection may lead to an increase in $F_{ST}$ at a selected locus, whereas balancing or species-wide directional selection may lead to a decrease in $F_{ST}$ compared with neutrally evolving loci. 46-50 In a series of simulation experiments analysing two different $F_{ST}$ test implementations, Beaumont and Balding found that this approach had sufficient power to detect positive selection provided that the selection coefficient was approximately five times larger than the migration rate, but that $F_{ST}$ had little power to detect balancing selection. 50 Positive selection is also expected to increase levels of linkage disequilibrium (LD) relative to neutral expectations. Recently, a new statistical test was developed, the long-range haplotype (LRH) test, 33 which takes advantage of ancestral recombination events and the associated decay in LD to identify genes subject to positive selection. The rationale for this test is that a common allele with long-range LD potentially represents a site that has appeared recently and was driven to high frequency before recombination could erode LD. The LRH approach does not detect balancing selection, however, and the robustness of the test to non-equilibrium demographic histories, the choice of haplotype-defining markers and phase misspecification has not been well studied.
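As an illustration of how allele-frequency differences translate into $F_{ST}$, the Python sketch below uses the simple heterozygosity-based form $F_{ST} = (H_T - H_S)/H_T$ for a single biallelic SNP. Here $H_S$ is the mean expected heterozygosity within populations and $H_T$ is the expected heterozygosity of the pooled population. This form is an assumption made for clarity: it ignores sample-size corrections, and the studies discussed in this review may have used other estimators (for example, Weir and Cockerham's). The function names are hypothetical.

```python
from typing import Optional


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) for a biallelic locus at allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_biallelic(freqs: list[float], weights: Optional[list[float]] = None) -> float:
    """(H_T - H_S) / H_T for one SNP, given its allele frequency in each population.

    weights are relative population sizes; equal weights are assumed if omitted.
    """
    if weights is None:
        weights = [1.0] * len(freqs)
    total = sum(weights)
    w = [x / total for x in weights]
    p_bar = sum(wi * p for wi, p in zip(w, freqs))  # pooled allele frequency
    h_t = expected_heterozygosity(p_bar)
    h_s = sum(wi * expected_heterozygosity(p) for wi, p in zip(w, freqs))
    if h_t == 0.0:
        return 0.0  # monomorphic in the pooled sample: no differentiation to measure
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical allele frequencies in two populations.
    for freqs in ([0.5, 0.5], [0.2, 0.8], [0.0, 1.0]):
        print(freqs, round(fst_biallelic(freqs), 3))
```

Identical frequencies give $F_{ST} = 0$ and fixed differences give $F_{ST} = 1$. Locally adaptive alleles are expected to push a SNP towards the upper end of the genome-wide distribution.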
The second major class of neutrality tests compares levels of within-species polymorphism and between-species divergence and includes the Hudson-Kreitman-Aguadé (HKA) 51 and McDonald-Kreitman (MK) 52 tests. The HKA method tests the goodness of fit of the observed levels of polymorphism within species and the observed divergence between species to those predicted under neutral theory. In order to determine polymorphism and divergence expectations under neutrality, data are required from at least two loci in each species, so that a simultaneous estimate can be made of a time-since-speciation parameter and a relative population size parameter. Under the HKA test, rejection of the null is formally interpreted as elevated polymorphism at one locus or reduced polymorphism at the other, or excess divergence at one locus or limited divergence at the other. Thus, it may not be obvious which locus or which process is responsible for producing a statistically significant test. McDonald 53 has described improvements to the HKA test which may ameliorate this problem.
In the MK test, a 2 × 2 contingency table is formed to compare the number of non-synonymous and synonymous sites that are polymorphic within a species ($P_N$ and $P_S$) and fixed between species ($D_N$ and $D_S$). Under neutrality, the ratio of non-synonymous to synonymous sites that are polymorphic equals the ratio of non-synonymous to synonymous sites that are fixed (ie $P_N/P_S = D_N/D_S$). Under positive selection, however, these two ratios are no longer equal and $D_N/D_S > P_N/P_S$. 54 Among the strengths of the MK test are that it does not require assumptions about population demographic history (although under some circumstances the test can be adversely affected by increases in effective population size 54) and that it is relatively insensitive to intra-locus recombination. Positive or purifying selection for codon usage may, however, bias the MK test. 38 The final class of neutrality tests uses between-species data to test for adaptive protein evolution. The classic test of positive selection compares the rate of non-synonymous (amino acid-altering) substitutions per non-synonymous site in a gene ($d_n$) with the rate of synonymous (silent) substitutions per synonymous site ($d_s$). Under neutrality, the substitution rate per site is the same in both categories, and $d_n/d_s$ is expected to equal one; however, $d_n/d_s < 1$ for proteins subject to purifying selection and $d_n/d_s > 1$ for proteins under adaptive evolution. Although $d_n/d_s > 1$ provides strong evidence for adaptive protein evolution, it is a very conservative test, particularly if only a small number of codons have been selected for. The basic test has also been extended by Nielsen and Yang 55 and others to include models of codon and transition/transversion bias, to detect variation in $d_n/d_s$ ratios among lineages and to identify specific codons under selection. 56,57

Key advantages of genome-wide analyses
As alluded to above, distinguishing between the confounding effects of natural selection and population demographic history is difficult when studying a single locus. When many unlinked genes are considered, however, a clear strategy emerges. Population demographic history affects patterns of variation at all loci in a genome in a similar manner, whereas natural selection acts upon specific loci. 12,37,46,58 Therefore, by sampling a large number of unlinked loci throughout the genome, empirical distributions of test statistics can be constructed and genes subject to locus-specific forces, such as natural selection, can be identified as outlier loci.
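The outlier strategy described above can be sketched in a few lines: rank every locus against the genome-wide empirical distribution of a statistic and flag those in an extreme tail. The Python sketch below computes empirical tail proportions. The 1% cutoff, the locus names and the choice of tail are hypothetical, chosen only for illustration. An empirical tail identifies loci that are unusual relative to the rest of the genome; it does not by itself establish that they were targets of selection.

```python
from bisect import bisect_left, bisect_right


def empirical_tail_proportions(stats: dict[str, float], upper: bool = True) -> dict[str, float]:
    """Fraction of all loci whose statistic is at least as extreme as each locus's value."""
    values = sorted(stats.values())
    m = len(values)
    out = {}
    for locus, x in stats.items():
        if upper:
            count = m - bisect_left(values, x)  # number of values >= x
        else:
            count = bisect_right(values, x)     # number of values <= x
        out[locus] = count / m
    return out


def flag_outliers(stats: dict[str, float], tail: float = 0.01, upper: bool = True) -> list[str]:
    """Loci falling in the chosen extreme tail of the empirical distribution."""
    props = empirical_tail_proportions(stats, upper)
    return sorted(locus for locus, p in props.items() if p <= tail)


if __name__ == "__main__":
    # Hypothetical per-gene statistics; use upper=True for high F_ST,
    # upper=False for unusually negative Tajima's D.
    fst = {f"gene{i:03d}": i / 100.0 for i in range(100)}
    print("high-F_ST outliers:", flag_outliers(fst, tail=0.02, upper=True))
```

Sorting once and using binary search keeps the cost at $O(m \log m)$, which matters when $m$ is tens of thousands of SNPs or genes.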
To provide some examples of how genome-wide analyses can facilitate inferences of natural selection, Figure 3 shows empirical distributions of Tajima's D, $F_{ST}$ and $d_n/d_s$, along with their theoretical distributions, simulated under both standard neutral models and alternative demographic histories. Figure 3 highlights two important points. First, empirical distributions provide important information that can be used to infer population demographic history. For example, the demographic models used in simulating Tajima's D (Figure 3A and B) recapitulate the empirical distributions much more closely than data simulated under a standard neutral model. Secondly, outlier loci can be identified with greater precision and accuracy with more realistic models of human demographic history. Specifically, distributions simulated under the best-fitting alternative demographic models dramatically reduce the number of test statistics that are apparent outliers under the standard neutral model. Conversely, some test statistics that do not appear to deviate from the standard neutral model are outliers under the best-fitting demographic models. Thus, in principle, empirical distributions of test statistics can be used both to reduce the false-positive rate and to improve power. Although this general strategy has recently been dubbed 'population genomics', the theoretical foundation of searching for outlier loci to find targets of natural selection was outlined decades ago. 46,47 In addition to providing empirical distributions, genome-wide scans for natural selection offer several other advantages compared with single-locus studies. Genome-wide scans can suggest general principles about the types of variation upon which natural selection acts most forcefully. Datasets derived from an unbiased sampling of loci throughout the genome allow for the discovery of novel functional elements whose presence is revealed by evidence for selection.
Whole-genome scans also have the potential to reveal networks of genes whose evolutionary histories are correlated owing to their collaboration in executing cellular functions. Finally, it is important to stress that genome-wide analyses do not preclude single-locus analyses, and that achieving a detailed and thorough understanding of the selective and demographic forces acting upon a locus will necessitate focused single-locus analyses drawing from multiple scientific disciplines.

Genome scans for natural selection
Several genome-wide scans for natural selection have recently been performed and are summarised in Table 2. These studies have used a variety of different statistical approaches, data and populations, but are united by the common theme of sampling a large number of loci and making inferences of natural selection. Below, some of these studies are considered in more detail, to highlight the salient results emerging from genome-wide scans for selection.
One of the first genome-wide screens for selection to be performed analysed 26,530 single nucleotide polymorphisms (SNPs), which were genotyped in three human populations: African-Americans, East Asians and European-Americans. 65 An empirical distribution of $F_{ST}$ was constructed and outlier SNPs in gene regions were identified. As discussed above, geographically restricted selection (local adaptation) can accentuate levels of population structure by creating large differences in allele frequencies between populations. Conversely, balancing selection can lead to lower than expected levels of population structure. In total, 174 candidate selection genes were identified whose levels of population structure differed significantly from neutral expectations (156 genes had exceptionally high values of $F_{ST}$ and 18 had exceptionally low values of $F_{ST}$). In addition, the average $F_{ST}$ differed significantly between SNPs located in exons, introns and non-genic regions, which is consistent with the action of purifying selection. One limitation of this study was that it relied upon markers that were discovered in a small number of chromosomes, which can lead to significant ascertainment bias (ie in this case, an over-representation of intermediate-frequency alleles). Such ascertainment bias complicates inferences of natural selection, and, as the authors note, additional analyses are needed to confirm the signature of selection in these genes.
Three genome-wide scans for natural selection have also been performed with microsatellite markers, 66-68 the largest of which analysed 5,257 microsatellite markers in 28 individuals of European descent. 66 A sliding-window analysis across the genome revealed 43 bins that contained a significant reduction in heterozygosity relative to neutral expectations. Interestingly, the recombination rate in these 43 bins was significantly reduced compared with the genome-wide average, which is consistent with theoretical predictions that positive selection will be easier to detect in regions of the genome with low recombination rates. 23

Table 2. Summary of genome-wide scans for selection (column headings include Data and Number).

The other two microsatellite-based genome-wide scans for selection included multiple populations and searched for evidence of local adaptation by identifying outlier loci that exhibited high levels of population structure relative to the empirical distribution of all loci. Specifically, Kayser et al. 67 studied 332 microsatellite markers in 47 Europeans and 47 Africans (23 Ethiopians and 24 South Africans). The test statistics $R_{ST}$, a multiallelic analogue of $F_{ST}$, and ln RV, the natural log of the ratio of the variances in allele size between populations, 69 were calculated for all loci. Numerous outlier loci were detected and 11 were studied further by genotyping additional microsatellite markers in these regions. The additional microsatellite analyses confirmed the large differences in genetic differentiation, which strengthens the hypothesis that the outlier loci have been targets of geographically restricted selective pressures. Similarly, Storz et al. 68 analysed a total of 624 microsatellite loci that had previously been genotyped in multiple populations from Africa, Europe and Asia. Again, measures of population structure were calculated for all markers ($F_{ST}$ and an analogue of ln RV) and outlier loci were identified. In total, 13 outlier loci were found and all but one had significant reductions in heterozygosity in non-African populations; this was interpreted as evidence that local adaptation was more common outside of Africa. An important limitation of the microsatellite analyses is that the high mutation rate of microsatellites may obscure signatures of selection, except in low-recombining regions of the genome. 70,71

In one of the largest gene-based genome-wide screens performed to date, Clark et al. 62 analysed 7,645 orthologous genes from humans, chimpanzees and mice (see also Figure 3D).
Maximum-likelihood models were fitted to protein-coding DNA sequences to estimate rates of synonymous ($d_s$) and non-synonymous ($d_n$) substitutions. In total, 1,547 genes had $d_n/d_s$ ratios > 1 in humans, which is commonly interpreted as evidence for positive selection, but the neutral model could be formally rejected at p < 0.05 for only six of these genes. Using an alternative statistical method with greater sensitivity, branch-site models were fitted to the data in order to detect accelerated rates of $d_n/d_s$ in the human lineage for a subset of nucleotide sites (ie $d_n/d_s$ does not have to be > 1 for the entire gene). A total of 667 genes were identified as significant at p < 0.05 in this analysis; subsequent bioinformatic analyses revealed two interesting observations. First, accelerated rates of evolution were found for several functional classes of genes, including those involved in olfaction, nuclear transport and sensory perception. Secondly, genes with evidence for positive selection were enriched for genes that are associated with human diseases, as defined by the Online Mendelian Inheritance in Man (OMIM) database. OMIM primarily contains monogenic disease genes with large phenotypic effects, and it will therefore be interesting to see if these results also extend to complex disease genes. Indeed, signatures of natural selection have been described for several genes associated with various complex diseases. 34,72-78 If complex disease genes are enriched for signatures of natural selection, finding targets of adaptive evolution may be a useful strategy for prioritising candidate genes in disease mapping studies.
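Enrichment claims of this kind are commonly assessed with a one-sided hypergeometric (Fisher's exact) test. The Python sketch below uses only the standard library. It computes the probability of observing at least the given overlap between genes flagged as selected and genes annotated as disease-associated, if the flagged genes were a random draw from all genes tested. It illustrates the general technique and is not the analysis of Clark et al.; the counts in the usage example are hypothetical.

```python
from math import comb


def hypergeometric_upper_tail(total: int, annotated: int, drawn: int, overlap: int) -> float:
    """P(X >= overlap) when `drawn` genes are sampled without replacement from `total`
    genes, of which `annotated` carry the annotation of interest (eg a disease link)."""
    upper = min(annotated, drawn)
    tail = sum(comb(annotated, x) * comb(total - annotated, drawn - x)
               for x in range(overlap, upper + 1))
    return tail / comb(total, drawn)


def fold_enrichment(total: int, annotated: int, drawn: int, overlap: int) -> float:
    """Fraction annotated among the drawn genes relative to the genome-wide fraction."""
    return (overlap / drawn) / (annotated / total)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    total, disease, selected, both = 1000, 100, 50, 10
    print("fold enrichment:", fold_enrichment(total, disease, selected, both))
    print("one-sided p-value:", hypergeometric_upper_tail(total, disease, selected, both))
```

Exact integer arithmetic via `math.comb` avoids underflow in the individual terms, although for very large gene sets a library routine for the hypergeometric tail would be faster.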
It is important to note that a recent theoretical study has suggested that maximum-likelihood branch-site models may have a high false-positive rate, 79 and, therefore, the 667 significant (at p < 0.05) genes in the study by Clark et al. 62 may contain a higher than anticipated fraction of false positives. In addition, an increased $d_n/d_s$ along a lineage does not always indicate the action of positive selection; it can also arise from relaxation of purifying selection. 79,80 As the authors point out, obtaining polymorphism data from human populations would provide further insight into the evolutionary history of these genes and help to clarify some of the issues raised above.

Local adaptation
An interesting observation that has consistently emerged from large-scale studies of selection is that local adaptation may be a more common feature of recent human evolutionary history than previously thought. 52-59,60,63-68 Human populations have clearly undergone dramatic range expansions during the past 100,000 years that, at least theoretically, may have led to geographically restricted selective pressures, such as unique dietary, pathogenic and climatic challenges. Several genes that possess patterns of genetic variation consistent with local adaptation have previously been reported (Table 3). 60,72-74,76,81-85 As an illustrative example, Figure 4 shows patterns of genetic variation for a 115 kilobase region on chromosome 7q33 that possesses a striking signature of local adaptation in European-American populations. 60 Two of the genes in this region, TRPV5 and TRPV6, mediate the rate-limiting step of dietary calcium absorption; 86,87 given that lactase persistence and related metabolic pathways were selected for in northern European populations, 83 they are particularly strong candidates for the gene or genes driving this pattern of local adaptation.
In addition, several studies have found that non-African populations possess more evidence for selection than African populations. 60,67,68 As most studies have considered only a single African population, however, it is difficult to determine whether the observed difference in the frequency of selective events between African and non-African populations is a general phenomenon or simply reflects the need to sample African populations more comprehensively. Furthermore, theoretical studies have demonstrated that the power to detect a recent selective sweep is greater than the power to detect an older one. 41,42,88,89 Therefore, the frequency of selective events may be similar in African and non-African populations, but such events may be easier to detect in non-African populations if they occurred more recently.

Looking ahead: The HapMap project
The HapMap project (http://www.hapmap.org/) is a large international collaboration to describe patterns of common haplotype variation throughout the human genome. 61 The initial goal of the HapMap project is to genotype 600,000 SNPs in 270 individuals: 90 individuals of northern and western European ancestry (30 trios consisting of two parents and an adult child), 90 Yoruba individuals from Ibadan, Nigeria (30 trios), 45 unrelated Japanese individuals from Tokyo, Japan, and 45 unrelated Han Chinese individuals from Beijing, China. Although the HapMap project was initially developed to facilitate the search for complex disease genes, it will provide a powerful resource for population genetics and evolutionary studies. Specifically, it will provide a unifying, publicly available resource of genome-wide variation data that can be systematically interrogated for signatures of natural selection. As numerous evolutionary analyses will undoubtedly be conducted on the HapMap data, results can be verified across studies, which will allow candidate selection genes to be prioritised for subsequent studies.

Table 3. Genes with evidence of local adaptation (columns: Gene; Potential selective pressure; Reference).

Figure 4. Rows correspond to individuals and columns denote a particular single nucleotide polymorphism (SNP). For each SNP, blue, red and yellow boxes indicate whether the individual is homozygous for the common allele, heterozygous or homozygous for the rare allele, respectively. Grey boxes indicate missing data. Notice the significant reduction in polymorphism in the European-American sample, which is consistent with the hypothesis that variation in one or more of these four genes conferred a selective advantage to European-Americans but not African-Americans. See Akey et al. 60 for more details. This figure was produced using genotype data from the SeattleSNPs project (http://pga.gs.washington.edu/).

Future challenges
It is important to temper our enthusiasm for genome-wide scans of natural selection because several analytical and conceptual challenges remain. For example, as indicated above, thousands of hypothesis tests will be performed in a typical study, and it is necessary to correct for multiple tests to avoid an unacceptably high false-positive rate. One particularly appealing approach is to control the false discovery rate, 90,91 which is more powerful than traditional methods such as Bonferroni correction and has been used in a wide variety of genomics analyses. Furthermore, as numerous genome-wide scans for selection will be applied to common datasets, such as the HapMap, methods for combining results across studies would be invaluable. A critical issue that has already arisen in current genome-wide scans for selection is the need to verify the signature of selection through replication studies and by alternative experimental approaches. The importance of follow-up studies cannot be overstated because, in their absence, we will simply be left with a list of interesting 'candidate selection genes'. The problem of follow-up replication in genome-wide studies is a general one that has been considered in linkage analysis 92 and genetic association studies. 93 Clearly, replication in independent samples from the same population is an important criterion that can be used to discard false positives that accumulate from the multiple testing inherent in genome scans. Genome-wide study designs are known to suffer from the 'winner's curse' phenomenon, however, whereby the effect sizes of statistically significant loci are systematically over-estimated. 93,94 If such concerns are ignored, the statistical power of subsequent replication attempts is likely to be over-estimated, leading the community to place undue weight on failed replication attempts as evidence against the original findings. Even if signatures of selection are confirmed, it remains difficult to identify the specific variants that have been subject to selection.
Ideally, suspected targets of selection will be functionally characterised, which will facilitate inferences on genotype-phenotype correlations and ultimately on how the putative selected alleles affect fitness. Finally, more powerful methods to estimate evolutionary parameters, such as the timing of selective events and the strength of selection, need to be developed.
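To make the contrast between false discovery rate control and Bonferroni correction concrete, the Python sketch below implements the Benjamini-Hochberg step-up procedure alongside a Bonferroni threshold. It is a minimal illustration that assumes independent (or positively dependent) tests, the setting in which the Benjamini-Hochberg procedure is known to control the false discovery rate. The p-values in the example are invented.

```python
def bonferroni(pvalues: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues: list[float], q: float = 0.05) -> list[bool]:
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate at q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda idx: pvalues[idx])
    # Find the largest rank r with p_(r) <= r * q / m; every test up to that rank is rejected,
    # even if an individual smaller p-value missed its own rank-specific threshold.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    pvals = [0.009, 0.019, 0.029, 0.039, 0.5]  # invented p-values for five loci
    print("Bonferroni:        ", bonferroni(pvals))
    print("Benjamini-Hochberg:", benjamini_hochberg(pvals))
```

With these five p-values Bonferroni retains only the smallest, whereas Benjamini-Hochberg retains four, illustrating the power advantage described in the text at the cost of tolerating a controlled proportion of false discoveries.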
In addition to the issues described above, it is important to note that all of the statistical methods and studies considered in this review are predicated upon simple theoretical models of natural selection. For example, tests such as Tajima's D search for signatures of selection acting on a single locus. Genes do not exist in isolation, however, and it is possible, perhaps even likely, that selection acts on combinations of alleles, a process referred to as epistatic selection. 95 Recently, two studies in Drosophila melanogaster demonstrated strong empirical evidence for epistatic selection. 96,97 It seems likely that progress in reconstructing gene and protein networks will serve as a valuable guide in beginning to explore epistatic selection in humans.

Conclusions
The intersection of high-throughput methods to access human genetic variation on a genome-wide scale and statistical tools to identify signatures of natural selection will undoubtedly provide a deeper understanding of how adaptive processes helped to shape our genomes. Furthermore, the same resources used to scan the genome for signatures of selection will also provide a more comprehensive understanding of human demographic history, which will be necessary to understand how neutral and non-neutral evolutionary forces have interacted to shape extant patterns of human genetic and phenotypic diversity. Although many hurdles are likely to be encountered, the evolutionary insights obtained from genome-wide analyses will have implications for many contemporary issues, such as the functional annotation of the human genome and the discovery of complex disease genes.