- Review
Genome-wide scans for loci under selection in humans
Human Genomics volume 2, Article number: 113 (2005)
Abstract
Natural selection, which can be defined as the differential contribution of genetic variants to future generations, is the driving force of Darwinian evolution. Identifying regions of the human genome that have been targets of natural selection is an important step in clarifying human evolutionary history and understanding how genetic variation results in phenotypic diversity; it may also facilitate the search for complex disease genes. Technological advances in high-throughput DNA sequencing and single nucleotide polymorphism genotyping have enabled several genome-wide scans of natural selection to be undertaken. Here, some of the observations that are beginning to emerge from these studies will be reviewed, including evidence for geographically restricted selective pressures (ie local adaptation) and a relationship between genes subject to natural selection and human disease. In addition, the paper will highlight several important problems that need to be addressed in future genome-wide studies of natural selection.
Introduction
Phenotypic diversity is a ubiquitous characteristic of natural populations. Individuals vary in almost every conceivable way, including physical appearance, behaviour, disease susceptibility, ability to detoxify drugs and perception of environmental stimuli [1]. Although environmental forces undoubtedly contribute to phenotypic variation, so too does genetic variation. Therefore, explaining the evolutionary forces that create, maintain and shape patterns of human genetic variation is of fundamental importance in understanding phenotypic variation [2].
An important goal in studies of human genetic variation is to identify loci that have been targets of natural selection due to their variable effects on the fitness of individuals throughout a population's history. Signatures of natural selection delimit regions of the genome that are, or have been, functionally important. Therefore, identifying such regions will facilitate the identification of genetic variation that contributes to phenotypic variation and help to functionally annotate the genome. Unfortunately, inferring the action of natural selection remains a challenge. This is likely to change in the near future, as high-throughput methods for cataloguing genetic variation on a genome-wide scale and new statistical tools for detecting selection have been, and continue to be, developed.
Much important work has been done on genome scans for natural selection in model organisms such as Drosophila;[3–5] this review, however, will focus on studies performed in human populations. Firstly, there will be a summary of the effects of natural selection and population history on patterns of genetic variation and some of the common statistical methods used to test for deviations from neutrality will be presented. Next, a critical evaluation of several empirical genome-wide scans for selection will be presented. Finally, the paper will highlight several important problems, both practical and conceptual, that need to be addressed in future studies.
Human genetic variation: The neutral expectation
The evolutionary sojourn of a newly arisen mutation depends upon how it affects the fitness of the individual who possesses it. The neutral theory of molecular evolution posits that the vast majority of polymorphisms in a population are selectively neutral and have no appreciable effects on fitness [6, 7]. Under neutrality, changes in allele frequency are governed by the stochastic effects of genetic drift in populations of finite size. Thus, the effective population size, Ne, and neutral mutation rate, μ0, determine levels of polymorphism within species and the rate of divergence between species [8]. In addition, mutations with small fitness effects can be rendered 'nearly neutral' if the product of Ne and |s| (where s measures the strength of selection) is less than 1 [9, 10]. For human populations, Ne is approximately 10,000 and therefore |s| must be greater than 10⁻⁴ to overcome the stochastic effects of genetic drift. Because the neutral theory makes explicit and quantitative predictions about expected patterns of genetic variation within and between species, it is an indispensable tool in studies of natural selection. Specifically, the neutral theory provides an essential foundation for evaluating the evidence either for or against selection in empirical data, as it serves as the null hypothesis when exploring alternative evolutionary models [11, 12].
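The threshold quoted above follows directly from the rule of thumb that drift dominates when the product of Ne and |s| is below one. The following is a minimal Python sketch of that arithmetic. The function names are illustrative only, and the scaling constant differs between theoretical treatments; the form given in the text is used here.

```python
def drift_threshold(effective_size: float) -> float:
    """Value of |s| at which Ne * |s| equals 1 (rule of thumb from the text)."""
    return 1.0 / effective_size


def is_nearly_neutral(effective_size: float, s: float) -> bool:
    """True when Ne * |s| < 1, i.e. drift is expected to dominate selection."""
    return effective_size * abs(s) < 1.0


if __name__ == "__main__":
    ne = 10_000  # approximate human effective population size quoted in the text
    print("threshold |s|:", drift_threshold(ne))
    for s in (1e-5, 1e-3, -1e-3):
        print(s, "nearly neutral" if is_nearly_neutral(ne, s) else "visible to selection")
```

The sign of s is ignored on purpose: under this rule, weakly deleterious and weakly advantageous mutations are equally at the mercy of drift.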
In combination with the neutral theory, coalescent theory provides a powerful framework for conceptualising and making inferences about evolutionary forces. The coalescent is a stochastic model of gene genealogies [13–17] and has emerged as the primary analytical tool in studies of genetic variation. In classical population genetics theory, the initial state of a population is defined and one observes the evolution of the entire population by looking forward in time. By contrast, the coalescent is a sample-driven theory that traces the history of coalescent events backwards in time (see Figure 1 for some basic properties of the coalescent). Several excellent and detailed reviews of coalescent theory can be found elsewhere [18, 19]. Deviations from the standard neutral model distort the branch lengths, topology and coalescent times of gene genealogies, as described below.
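To make the backwards-in-time view concrete, the sketch below simulates coalescence times under the standard neutral coalescent and compares the mean with the analytical expectation. It assumes the usual textbook scaling, which is not spelled out in the text: time is measured in units of 2N generations for a diploid population of size N, and while k lineages remain the waiting time to the next coalescence is exponential with rate k(k-1)/2. The function names are illustrative.

```python
import random
from math import comb


def expected_tmrca(n: int) -> float:
    """Expected time to the most recent common ancestor of n sampled lineages,
    in coalescent units; algebraically equal to 2 * (1 - 1/n)."""
    return sum(1.0 / comb(k, 2) for k in range(2, n + 1))


def simulate_tmrca(n: int, rng: random.Random) -> float:
    """Draw one TMRCA by summing exponential waiting times as the number of
    ancestral lineages falls from n to 1."""
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(comb(k, 2))
    return t


if __name__ == "__main__":
    rng = random.Random(1)
    n = 10
    sims = [simulate_tmrca(n, rng) for _ in range(20_000)]
    print("analytical mean:", expected_tmrca(n))
    print("simulated mean: ", sum(sims) / len(sims))
```

Selection and demography enter this picture by changing the coalescence rates, which is why they alter branch lengths and the time to the most recent common ancestor.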
Evolutionary forces perturb patterns of genetic variation
Natural selection and population demographic history perturb patterns of genetic variation relative to what is expected under a standard neutral model (a randomly mating population of constant size at mutation-drift equilibrium). Below, the way in which selection and demographic history affect patterns of genetic variation will be considered, from a coalescent point of view.
Theoretical studies have investigated the evolutionary dynamics of genetic variation subject to a variety of selective pressures, including purifying [20–22], positive [23–26] and balancing selection [27–30]. This paper will focus on positive and balancing selection, as these have been the primary types of selection that current genome-wide scans have studied. Positive selection acts to increase the frequency of advantageous alleles in a population. Strongly advantageous mutations are rapidly swept to fixation, hence the term 'selective sweep'. Importantly, through a process referred to as 'genetic hitchhiking', positive selection also affects patterns of neutral polymorphisms linked to an advantageous mutation [23]. Positive selection leads to a shallow star-like genealogy (Figure 2) with a decreased time to the most recent common ancestor. Tracing the history of alleles backwards in time, these effects are a direct consequence of the rapid coalescence of lineages in the small but expanding progenitor population [31]. The signature of positive selection includes reduced levels of genetic variation compared with neutral expectations, [24–26] a skew in the allele frequency spectrum towards low-frequency alleles [32] (together with an excess of high-frequency derived alleles [33]) and elevated levels of linkage disequilibrium [34].
Balancing selection occurs when polymorphisms are selectively maintained in a population. By contrast with positive selection, the genealogy of a locus subject to balancing selection is characterised by an increased time to the most recent common ancestor and long internal branches (Figure 2). The effect of balancing selection on gene genealogies can be understood by considering balanced alleles as distinct subpopulations, such that coalescence events can occur rapidly within a subpopulation but slowly between subpopulations [27]. The signature of balancing selection includes elevated levels of polymorphism relative to neutral expectations and a skew of the allele frequency distribution towards an excess of intermediate frequency alleles [29, 30, 35].
In addition to natural selection, population demographic history can also have strong influences on patterns of genetic variation, which often mimic the effect of natural selection [36, 37]. In other words, inferences of natural selection are confounded by population demographic history. For example, both positive selection and increases in population size have similar effects on gene genealogies (Figure 2); both processes therefore lead to an excess of low-frequency alleles in a population. In fact, strong positive selection can be thought of as a rapid population expansion of an advantageous allele as it sweeps through a population. Similarly, population structure and balancing selection both result in subdivided genealogies and therefore both processes are expected to result in an excess of intermediate-frequency alleles in a population (Figure 2). Population bottlenecks can lead to an excess of either low- or intermediate-frequency alleles relative to neutral expectations, depending on the age and severity of the bottleneck. Figure 2 demonstrates the effect of a severe and recent bottleneck, which forces all lineages to coalesce at the time of the size reduction and results in a genealogy that is similar to positive selection. Human populations clearly do not meet all of the assumptions of the standard neutral model; hence, rejecting the standard neutral model for a particular locus cannot be interpreted as unambiguous evidence for selection.
Detecting the signature of natural selection
Before presenting the results from genome-wide scans for natural selection, there now follows a brief description of some commonly used statistical methods designed to detect departures from neutrality, highlighting some of their strengths and limitations. The following is not meant to be an exhaustive discussion of such tests, and descriptions of many interesting and useful methods will not be included here. For further study, the reader is encouraged to see an excellent review by Kreitman [38].
Statistical tests of neutrality can broadly be classified into three categories, based upon the type of data that they use: 1) within-species tests; 2) within- and between-species tests; and 3) between-species tests (Table 1). The most common class of within-species tests compares summary statistics of the observed allele frequency distribution at a locus with the values expected under neutral evolution; it includes Tajima's D, [39] Fu and Li's D and F, [40] Fay and Wu's H [33] and Fu's Fs [41]. An attractive feature of these tests is that they do not require any a priori classification of functional versus non-functional sites, thus making them equally suitable for protein-coding and non-protein-coding regions. In a thorough examination of the power of Tajima's D and Fu and Li's D and F, Simonsen et al. [42] found that these tests can only detect selective sweeps in a narrow time interval in the recent past, and that they can only detect balancing selection if it has acted for a very long time period. Interestingly, Simonsen et al. also found that the power of these tests could drop below the nominal false-positive rate, α, if non-neutral evolution did not occur within these critical time windows, thus creating the undesirable scenario in which rejection of the null is more likely if the null is true than if it is false. Fu observed similar results, although he found that Fs was most powerful and performed better at detecting more ancient positive selection [41].
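As an illustration of how a site-frequency spectrum statistic is computed, the sketch below implements Tajima's D from a small set of aligned haplotypes. It follows the standard published formula, in which D is the difference between the mean number of pairwise differences and Watterson's estimator, divided by an estimate of its standard deviation. The input encoding ('0' for ancestral, '1' for derived) and the toy data are assumptions made for this example, and no significance test is attempted.

```python
from math import sqrt


def tajimas_d(haplotypes: list[str]) -> float:
    """Tajima's D for equal-length strings of '0'/'1' alleles, one per chromosome."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("the variance term is zero for fewer than 4 chromosomes")
    # Count of '1' alleles at each site, keeping only segregating sites.
    counts = [sum(h[j] == "1" for h in haplotypes) for j in range(len(haplotypes[0]))]
    counts = [c for c in counts if 0 < c < n]
    s = len(counts)
    if s == 0:
        raise ValueError("no segregating sites in the sample")
    # Mean number of pairwise differences (pi), summed over sites.
    pi = sum(2.0 * c * (n - c) / (n * (n - 1)) for c in counts)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: pi minus Watterson's estimator S / a1.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Toy data: five singletons (excess of rare alleles) versus five sites at 50%.
    rare = ["10000", "01000", "00100", "00010", "00001"] + ["00000"] * 5
    intermediate = ["11111"] * 5 + ["00000"] * 5
    print("rare-allele excess:        D =", round(tajimas_d(rare), 3))
    print("intermediate-allele excess: D =", round(tajimas_d(intermediate), 3))
```

An excess of rare alleles gives a negative D and an excess of intermediate-frequency alleles gives a positive D, which is the contrast between the sweep-like and balancing-selection-like spectra described earlier.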
The site-frequency spectrum tests discussed above are confounded by demographic events such as population growth, bottlenecks and subdivision (Figure 2) and are rendered conservative by intra-locus recombination. The desire to estimate population demographic parameters, recombination rates and evolutionary parameters has prompted the development of maximum likelihood-based methods which use the complete data, rather than summary statistics [31, 43, 44]. These methods are computationally intensive and are not currently feasible for large datasets, but they potentially allow for substantial gains in statistical power relative to summary statistics methods and are likely to become increasingly important tools in the future (for a general discussion, see Felsenstein [45]).
Another within-species test that has been used to detect selection is to compare the variation in allele frequencies between populations, which can be quantified by the statistic FST. Under selective neutrality, FST is determined by genetic drift, whereas natural selection is a locus-specific force that can cause systematic deviations in FST values for a selected gene and nearby genetic markers. For example, geographically restricted directional selection may lead to an increase in FST of a selected locus, whereas balancing or species-wide directional selection may lead to a decrease in FST compared with neutrally evolving loci [46–50]. In a series of simulation experiments analysing two different FST test implementations, Beaumont and Balding found that this approach yielded sufficient power to detect positive selection provided that the selective coefficient was approximately five times larger than the migration rate, but that FST had little power to detect balancing selection [50].
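The basic quantity can be sketched as follows for a single biallelic locus, using the textbook definition FST = (HT - HS) / HT, where HT is the expected heterozygosity of the pooled population and HS is the mean expected heterozygosity within populations. This is a simplification: it treats the allele frequencies as known, weights populations equally and applies no sample-size correction, unlike the estimators used in practice. The function name and example frequencies are hypothetical.

```python
def fst_biallelic(freqs: list[float]) -> float:
    """FST for one biallelic locus, given the frequency of one allele in each
    population (populations weighted equally)."""
    p_bar = sum(freqs) / len(freqs)
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    if h_total == 0.0:
        raise ValueError("locus is monomorphic across all populations")
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    print(fst_biallelic([0.5, 0.5]))  # identical frequencies: no differentiation
    print(fst_biallelic([0.1, 0.9]))  # strongly differentiated locus
    print(fst_biallelic([0.0, 1.0]))  # fixed difference between populations
```

Loci subject to geographically restricted selection would be expected to sit towards the high end of the genome-wide distribution of such values, and loci under balancing selection towards the low end.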
Positive selection is also expected to increase levels of linkage disequilibrium (LD) relative to neutral expectations. Recently, a new statistical test was developed, the long-range haplotype (LRH) test, [33] which takes advantage of ancestral recombination events and the associated decay in LD to identify genes subject to positive selection. The rationale for this test is that a common allele with long-range LD potentially represents a site that has appeared recently and was driven to high frequency before recombination could erode LD. The LRH approach does not detect balancing selection, however, and its robustness to non-neutral population demographics, to the choice of haplotype-defining markers and to phase misspecification has not been well studied.
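The decay of haplotype identity with distance from a core allele can be summarised by a haplotype homozygosity measure. The sketch below computes one such measure: the probability that two randomly chosen chromosomes carrying a given core allele are identical at every marker from the core out to a chosen distance. This is offered as an illustration of the rationale rather than as the exact statistic of the LRH test; the phased '0'/'1' encoding, the one-directional extension and the toy haplotypes are all assumptions.

```python
from collections import Counter
from math import comb


def haplotype_homozygosity(haplotypes: list[str], core: int,
                           core_allele: str, distance: int) -> float:
    """Probability that two chromosomes carrying `core_allele` at index `core`
    are identical from the core through `core + distance` (extending rightwards)."""
    carriers = [h[core:core + distance + 1] for h in haplotypes if h[core] == core_allele]
    if len(carriers) < 2:
        raise ValueError("need at least two chromosomes carrying the core allele")
    groups = Counter(carriers)  # copies of each distinct extended haplotype
    identical_pairs = sum(comb(c, 2) for c in groups.values())
    return identical_pairs / comb(len(carriers), 2)


if __name__ == "__main__":
    phased = ["1000", "1000", "1000", "1001", "0110", "0101"]
    for d in range(4):
        print(d,
              haplotype_homozygosity(phased, 0, "1", d),
              haplotype_homozygosity(phased, 0, "0", d))
```

In the toy data the '1' core allele keeps high homozygosity over the whole window while the '0' allele loses it quickly, which is the pattern expected when a common allele has risen in frequency faster than recombination could break down its haplotype.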
The second major class of neutrality tests compares levels of within-species polymorphism and between-species divergence and includes the Hudson-Kreitman-Aguadé (HKA) [51] and McDonald-Kreitman (MK) [52] tests. The HKA method tests the goodness of fit of the observed levels of polymorphism within species and the observed divergence between species to those predicted under neutral theory. In order to determine polymorphism and divergence expectations under neutrality, data are required from at least two loci in each species, so that a simultaneous estimate can be made of a time-since-speciation parameter and a relative population size parameter. Under the HKA test, rejection of the null is formally interpreted as elevated polymorphism at one locus or reduced polymorphism at the other, or excess divergence at one locus or limited divergence at the other. Thus, it may not be obvious which locus or which process is responsible for producing a statistically significant test. McDonald [53] has described improvements to the HKA test which may ameliorate this problem.
In the MK test, a 2 × 2 contingency table is formed to compare the number of non-synonymous and synonymous sites that are polymorphic within a species (PN and PS) and fixed between species (DN and DS). Under neutrality, the ratio of non-synonymous to synonymous sites that are polymorphic equals the ratio of non-synonymous to synonymous sites that are fixed (ie PN/PS = DN/DS). Under positive selection, however, these two ratios are no longer equal and DN/DS > PN/PS [54]. Among the strengths of the MK test are that it does not require assumptions about population demographic history (although under some circumstances the test can be adversely affected by increases in effective population size [54]) and is relatively insensitive to intra-locus recombination. Positive or purifying selection for codon usage may, however, bias the MK test [38].
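The 2 × 2 comparison can be sketched as follows. The text does not specify which test of independence is applied to the table; Fisher's exact test is used here as one reasonable choice, implemented directly from the hypergeometric distribution so that the example has no external dependencies. The counts in the usage example are hypothetical and chosen only to illustrate the case DN/DS > PN/PS.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def mk_test(dn: int, ds: int, pn: int, ps: int) -> dict[str, float]:
    """McDonald-Kreitman comparison; assumes ds > 0 and ps > 0.
    Rows are fixed/polymorphic, columns are non-synonymous/synonymous."""
    return {
        "fixed_ratio": dn / ds,        # DN/DS
        "polymorphic_ratio": pn / ps,  # PN/PS
        "p_value": fisher_exact_two_sided(dn, ds, pn, ps),
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from any study.
    print(mk_test(dn=20, ds=10, pn=5, ps=30))
```

A small p-value together with a fixed ratio that exceeds the polymorphic ratio corresponds to the pattern the text associates with positive selection; a significant result in the opposite direction would call for a different interpretation.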
The final class of neutrality tests uses between-species data to test for adaptive protein evolution. The classic test of positive selection compares the number of non-synonymous (amino acid-changing) substitutions per non-synonymous site in a gene (dn) with the number of synonymous (silent) substitutions per synonymous site (ds). Under neutrality, the substitution rate at both categories of sites is the same, and dn/ds is expected to equal one; however, dn/ds < 1 for proteins subject to purifying selection and dn/ds > 1 for proteins under adaptive evolution. Although dn/ds > 1 provides strong evidence for adaptive protein evolution, it is a very conservative test, particularly if only a small number of codons have been selected for. The basic test has also been extended by Nielsen and Yang [55] and others to include models of codon and transition/transversion bias, to detect variation in dn/ds ratios among lineages and to identify specific codons under selection [56, 57].
Key advantages of genome-wide analyses
As alluded to above, distinguishing between the confounding effects of natural selection and population demographic history is difficult when studying a single locus. When many unlinked genes are considered, however, a clear strategy emerges. Population demographic history affects patterns of variation at all loci in a genome in a similar manner, whereas natural selection acts upon specific loci [12, 37, 46, 58]. Therefore, by sampling a large number of unlinked loci throughout the genome, empirical distributions of test statistics can be constructed and genes subject to locus-specific forces, such as natural selection, can be identified as outlier loci.
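The outlier strategy can be sketched as follows: compute a test statistic for every locus, rank the loci, and flag those in the extreme tails of the empirical distribution. The tail fraction, the function name and the locus labels are assumptions made for this example. Note that some loci always fall in the tails by construction, so membership of a tail identifies candidates rather than demonstrating selection.

```python
def empirical_outliers(stats: dict[str, float], tail: float = 0.025) -> dict[str, list[str]]:
    """Return the loci in the lower and upper `tail` fractions of the
    empirical distribution of a per-locus test statistic."""
    ranked = sorted(stats, key=stats.get)
    # Number of loci per tail, rounded down (small epsilon guards float error).
    k = int(len(ranked) * tail + 1e-9)
    return {"low": ranked[:k], "high": ranked[len(ranked) - k:]}


if __name__ == "__main__":
    # Hypothetical statistic values for 100 loci.
    values = {f"locus{i}": float(i) for i in range(100)}
    flagged = empirical_outliers(values, tail=0.05)
    print("low tail: ", flagged["low"])
    print("high tail:", flagged["high"])
```

For a statistic such as FST, the upper tail would hold candidates for local adaptation and the lower tail candidates for balancing selection.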
To provide some examples of how genome-wide analyses can facilitate inferences of natural selection, Figure 3 shows empirical distributions of Tajima's D, FST and dn/ds, along with their theoretical distributions, simulated under both standard neutral models and alternative demographic histories. Figure 3 highlights two important points. First, empirical distributions provide important information that can be used to infer population demographic history. For example, the demographic models used in simulating Tajima's D (Figure 3A and 3B) recapitulate the empirical distributions much more closely than data simulated under a standard neutral model. Secondly, outlier loci can be identified with greater precision and accuracy with more realistic models of human demographic history. Specifically, the best-fitting non-neutral distributions dramatically reduce the number of test statistics that are apparent outliers under neutrality. Conversely, some test statistics that do not appear to deviate from neutrality are outliers under the best non-neutral distributions. Thus, in principle, empirical distributions of test statistics can be used both to reduce the false-positive rate and to improve power. Although this general strategy has recently been dubbed 'population genomics', the theoretical foundation of searching for outlier loci to find targets of natural selection was outlined decades ago [46, 47].
In addition to providing empirical distributions, genome-wide scans for natural selection offer several additional advantages compared with single-locus studies. Genome-wide scans can suggest general principles about the types of variation that natural selection acts most forcefully upon. Datasets derived from an unbiased sampling of loci throughout the genome allow for the discovery of novel functional elements whose presence is revealed by evidence for selection. Whole-genome scans also have the potential to reveal networks of genes whose evolutionary histories are correlated due to their collaboration in executing cellular functions. Finally, it is important to stress that genome-wide analyses do not preclude single-locus analyses, and that achieving a detailed and thorough understanding of the selective and demographic forces acting upon a locus will necessitate focused single-locus analyses drawing from multiple scientific disciplines.
Genome scans for natural selection
Several genome-wide scans for natural selection have recently been performed and are summarised in Table 2. These studies have used a variety of different statistical approaches, data and populations, but are united by the common theme of sampling a large number of loci and making inferences of natural selection. Below, some of these studies will be considered in more detail, to highlight the salient results emerging from genome-wide scans for selection.
One of the first genome-wide screens for selection to be performed analysed 26,530 single nucleotide polymorphisms (SNPs), which were genotyped in three human populations: African-Americans, East Asians and European-Americans [65]. An empirical distribution of FST was constructed and outlier SNPs in gene regions were identified. As discussed above, geographically restricted selection (local adaptation) can accentuate levels of population structure by creating large differences in allele frequencies between populations. Conversely, balancing selection can lead to lower than expected levels of population structure. In total, 174 candidate selection genes were identified whose levels of population structure were significantly different compared with neutral expectations (156 genes had exceptionally high values of FST and 18 had exceptionally low values of FST). In addition, the average FST was significantly different between SNPs located in exons, introns and non-genic regions, which is consistent with the action of purifying selection. One limitation of this study was that it relied upon markers that were discovered in a small number of chromosomes, which can lead to significant ascertainment bias (ie in this case, an over-representation of intermediate-frequency alleles). Such ascertainment bias complicates inferences of natural selection, and, as the authors note, additional analyses are needed to confirm the signature of selection in these genes.
Three genome-wide scans for natural selection have also been performed with microsatellite markers, [66–68] the largest of which analysed 5,257 microsatellite markers in 28 individuals of European descent [66]. A sliding window analysis across the genome revealed 43 bins that contained a significant reduction in heterozygosity relative to neutral expectations. Interestingly, the recombination rate in these 43 bins was significantly reduced compared with the genome-wide average, which is consistent with theoretical predictions that positive selection will be easier to detect in regions of the genome with low recombination rates [23].
The other two microsatellite-based genome-wide scans for selection included multiple populations and searched for evidence of local adaptation by identifying outlier loci that exhibited high levels of population structure relative to the empirical distribution of all loci. Specifically, Kayser et al. [67] studied 332 microsatellite markers in 47 Europeans and 47 Africans (23 Ethiopians and 24 South Africans). The test statistics RST, a multiallelic analogue of FST, and ln RV, which is the natural log of the variance in allele sizes between populations, [69] were calculated for all loci. Numerous outlier loci were detected and 11 were studied further by genotyping additional microsatellite markers in these regions. The additional microsatellite analyses confirmed the large differences in genetic differentiation, which strengthens the hypothesis that outlier loci have been targets of geographically restricted selective pressures. Similarly, Storz et al. [68] analysed a total of 624 microsatellite loci that were previously genotyped in multiple populations from Africa, Europe and Asia. Again, measures of population structure were calculated for all markers (FST and an analogue to ln RV) and outlier loci were identified. In total, 13 outlier loci were found and all but one had significant reductions in heterozygosity in non-African populations; this was interpreted as evidence that local adaptation was more common outside of Africa. An important limitation of the microsatellite analyses is that the high mutation rate of microsatellites may obscure signatures of selection, except in low-recombining regions of the genome [70, 71].
In one of the largest gene-based genome-wide screens performed to date, Clark et al. [62] analysed 7,645 orthologous genes from humans, chimpanzees and mice (see also Figure 3D). Maximum-likelihood models were fitted to protein-coding DNA sequences to estimate rates of synonymous (ds) and non-synonymous (dn) substitutions. In total, 1,547 genes had dn/ds ratios > 1 in humans, which is commonly interpreted as evidence for positive selection, but the neutral model could be formally rejected at p < 0.05 for only six of these genes. Using an alternative statistical method with greater sensitivity, branch site models were fitted to the data in order to detect accelerated rates of dn/ds in the human lineage for a subset of nucleotide sites (ie dn/ds does not have to be > 1 for the entire gene). A total of 667 genes were identified as significant at p < 0.05 in this analysis; subsequent bioinformatics analyses revealed two interesting observations. First, accelerated rates of evolution were found for several functional classes of genes, including olfactory, nuclear transport and sensory perception. Secondly, genes with evidence for positive selection were enriched for genes that are associated with human diseases, as defined by the Online Mendelian Inheritance in Man (OMIM) database. OMIM primarily contains monogenic disease genes with large phenotypic effects, and it will therefore be interesting to see if these results also extend to complex disease genes. Indeed, signatures of natural selection have been described for several genes associated with various complex diseases [34, 72–78]. If complex disease genes are enriched for signatures of natural selection, finding targets of adaptive evolution may be a useful strategy for prioritising candidate genes in disease-mapping studies.
It is important to note that a recent theoretical study has suggested that maximum-likelihood branch site models may have a high false-positive rate [79] and, therefore, the 667 significant (at p < 0.05) genes in the study by Clark et al. [62] may contain a higher than anticipated fraction of false positives. In addition, increased rates of dn/ds along a lineage do not always indicate the action of positive selection and can also occur due to relaxation of purifying selection [79, 80]. As the authors point out, obtaining polymorphism data from human populations would provide further insight into the evolutionary history of these genes and help to clarify some of the issues raised above.
Local adaptation
An interesting observation that has consistently emerged from large-scale studies of selection is that local adaptation may be a more common feature of recent human evolutionary history than previously thought [52–60, 63–68]. Human populations have clearly had dramatic range expansions during the past 100,000 years that, at least theoretically, may have led to geographically restricted selective pressures, such as unique dietary, pathogenic and climatic challenges. Several genes that possess patterns of genetic variation consistent with local adaptation have previously been reported (Table 3) [60, 72–74, 76, 81–85]. As an illustrative example, Figure 4 shows patterns of genetic variation for a 115 kilobase region on chromosome 7q33 that possesses a striking signature of local adaptation in European-American populations [60]. Two of the genes in this region, TRPV5 and TRPV6, mediate the rate-limiting step of dietary calcium absorption; [86, 87] given the fact that lactase persistence and related metabolic pathways were selected for in northern European populations, [83] they are particularly strong candidates for the gene or genes driving this pattern of local adaptation.
In addition, several studies have found that non-African populations possess more evidence for selection relative to African populations [60, 67, 68]. As most studies have considered only a single African population, however, it is difficult to determine whether the observed differences in the frequency of selective events between African and non-African populations are a general phenomenon or simply reflect the need to sample African populations more comprehensively. Furthermore, theoretical studies have demonstrated that the power to detect a recent selective sweep is greater than the power to detect an older sweep [41, 42, 88, 89]. Therefore, the frequency of selective events may be similar in African and non-African populations, but such events may be easier to detect in non-African populations if they occurred more recently.
Looking ahead: The HapMap project
The HapMap project (http://www.hapmap.org/) is a large international collaboration to describe patterns of common haplotype variation throughout the human genome [61]. The initial goal of the HapMap project is to genotype 600,000 SNPs in 270 individuals: 90 individuals of northern and western European ancestry (30 trios consisting of two parents and an adult child), 90 Yoruban individuals from Ibadan, Nigeria (30 trios), 45 unrelated Japanese individuals from Tokyo, Japan, and 45 unrelated Han Chinese individuals from Beijing, China. Although the HapMap project was initially developed to facilitate the search for complex disease genes, it will provide a powerful resource for population genetics and evolutionary studies. Specifically, it will provide a unifying publicly-available resource of genome-wide variation data to interrogate systematically for signatures of natural selection. As numerous evolutionary analyses will undoubtedly be conducted on the HapMap data, results can be verified across studies, which will allow prioritising candidate selection genes for subsequent studies.
Future challenges
It is important to temper our enthusiasm for genome-wide scans of natural selection because several analytical and conceptual challenges remain. For example, as indicated above, thousands of hypothesis tests will be performed in a typical study and it is necessary to correct for multiple tests to avoid an unacceptably high false-positive rate. One particularly appealing approach is to control the false discovery rate, [90, 91] which is more powerful than traditional methods such as Bonferroni corrections and has been used in a wide variety of genomics analyses. Furthermore, as numerous genome-wide scans for selection will be applied to common datasets, such as the HapMap, methods for combining results across studies would be invaluable.
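To illustrate why false discovery rate control is more powerful than a Bonferroni correction, the sketch below applies both to the same p-values. The Benjamini-Hochberg step-up procedure is used as one widely used way of controlling the false discovery rate; whether it is the specific method of the cited references is an assumption. The p-values are hypothetical.

```python
def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> list[bool]:
    """Benjamini-Hochberg step-up procedure; returns, in input order,
    whether each hypothesis is rejected at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Bonferroni correction: reject only when p <= alpha / number of tests."""
    return [p <= alpha / len(p_values) for p in p_values]


if __name__ == "__main__":
    p = [0.9, 0.001, 0.2, 0.012, 0.02]  # hypothetical per-locus p-values
    print("Benjamini-Hochberg:", benjamini_hochberg(p))
    print("Bonferroni:        ", bonferroni(p))
```

With these values the step-up procedure rejects three hypotheses and Bonferroni only one, at the cost of tolerating a controlled proportion of false positives among the rejections.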
A critical issue that has already arisen in current genome-wide scans for selection is the need to verify the signature of selection through replication studies and by alternative experimental approaches. The importance of follow-up studies cannot be overstated because in their absence we will simply be left with a list of interesting 'candidate selection genes'. The problem of follow-up replication in genome-wide studies is a general one that has been considered in linkage analysis [92] and genetic association studies [93]. Clearly, replication in independent samples from the same population is an important criterion that can be used to discard false positives that accumulate from the multiple testing inherent in genome scans. Genome-wide study designs are known to suffer from the 'winner's curse' phenomenon, however, whereby the effect sizes of statistically significant loci are systematically over-estimated [93, 94]. If such concerns are ignored, the statistical power of subsequent replication attempts is likely to be over-estimated, leading the community to place undue faith in the veracity of failed replication attempts. Even if signatures of selection are confirmed, it remains difficult to identify the specific variants that have been subject to selection. Ideally, suspected targets of selection will be functionally characterised, which will facilitate inferences on genotype-phenotype correlations and ultimately on how the putative selected alleles affect fitness. Finally, more powerful methods to estimate evolutionary parameters, such as the timing of selective events and the strength of selection, need to be developed.
In addition to the issues described above, it is important to note that all of the statistical methods and studies considered in this review are predicated upon simple theoretical models of natural selection. For example, tests such as Tajima's D search for signatures of selection acting on a single locus. Genes do not exist in isolation, however, and it is possible, perhaps even likely, that selection acts on combinations of alleles, a process that is referred to as epistatic selection [95]. Recently, two studies in Drosophila melanogaster demonstrated strong empirical evidence for epistatic selection [96, 97]. It seems likely that progress in reconstructing gene and protein networks will serve as a valuable guide in beginning to explore epistatic selection in humans.
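As a concrete example of such a single-locus test, the following Python sketch computes Tajima's D from the standard formula, which compares the mean number of pairwise differences with the number of segregating sites scaled by the sample size. It is a minimal illustration and not a replacement for established software: the function names and the toy alignment are assumptions, and the code ignores missing data, alignment gaps and sites with more than two alleles.

```python
from itertools import combinations
from math import sqrt


def summarize_alignment(sequences):
    """Return (n, S, pi) for a list of equal-length aligned sequences."""
    n = len(sequences)
    length = len(sequences[0])
    # S: number of sites at which more than one allele is observed.
    num_segregating = sum(
        1 for j in range(length) if len({s[j] for s in sequences}) > 1
    )
    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    return n, num_segregating, pi


def tajimas_d_from_summary(n, num_segregating, pi):
    """Tajima's D from sample size n, segregating sites S and diversity pi."""
    if n < 3 or num_segregating == 0:
        raise ValueError("need at least 3 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = num_segregating
    # Difference between the two diversity estimators, scaled by its
    # estimated standard deviation.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment (assumed): every variant is a singleton, i.e. an excess
# of rare alleles, which pushes D below zero.
toy_alignment = ["TAAA", "ATAA", "AATA", "AAAT"]
print(tajimas_d_from_summary(*summarize_alignment(toy_alignment)))
```

Because the statistic is computed from the variation within one region at a time, it cannot by itself detect selection acting jointly on combinations of alleles at different loci.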
Conclusions
The intersection of high-throughput methods to access human genetic variation on a genome-wide scale and statistical tools to identify signatures of natural selection will undoubtedly provide a deeper understanding of how adaptive processes helped to shape our genomes. Furthermore, the same resources used to scan the genome for signatures of selection will also provide a more comprehensive understanding of human demographic history, which will be necessary to understand how neutral and non-neutral evolutionary forces have interacted to shape extant patterns of human genetic and phenotypic diversity. Although many hurdles are likely to be encountered, the evolutionary insights obtained from genome-wide analyses will have implications for many contemporary issues, such as the functional annotation of the human genome and the discovery of complex disease genes.
References
Valle D: Genetics, individuality, and medicine in the 21st century. Am J Hum Genet. 2004, 74: 374-381. 10.1086/382790.
Bamshad M, Wooding SP: Signatures of natural selection in the human genome. Nat Rev Genet. 2003, 4: 99-111. 10.1038/nrg999.
Harr B, Kauer M, Schlötterer C: Hitchhiking mapping: A population-based fine mapping strategy for adaptive mutations in Drosophila melanogaster. Proc Natl Acad Sci USA. 2002, 99: 12949-12954. 10.1073/pnas.202336899.
Kauer MO, Dieringer D, Schlötterer C: A microsatellite variability screen for positive selection associated with the "Out of Africa" habitat expansion of Drosophila melanogaster. Genetics. 2003, 165: 1137-1148.
Schöfl G, Schlötterer C: Patterns of microsatellite variability among X chromosomes and autosomes indicate a high frequency of beneficial mutations in non-African D. simulans. Mol Biol Evol. 2004, 21: 1384-1390. 10.1093/molbev/msh132.
Kimura M: Evolutionary rate at the molecular level. Nature. 1968, 217: 624-626. 10.1038/217624a0.
King JL, Jukes TH: Non-Darwinian evolution. Science. 1969, 164: 788-798. 10.1126/science.164.3881.788.
Kimura M: The Neutral Theory of Molecular Evolution. 1983, Cambridge University Press, Cambridge, UK.
Ohta T: Slightly deleterious mutant substitutions in evolution. Nature. 1973, 246: 96-98. 10.1038/246096a0.
Ohta T, Gillespie JH: Development of neutral and nearly neutral theories. Theor Popul Biol. 1996, 49: 128-142. 10.1006/tpbi.1996.0007.
Otto SP: Detecting the form of selection from DNA sequence data. Trends Genet. 2000, 16: 526-529. 10.1016/S0168-9525(00)02141-7.
Nielsen R: Statistical tests of selective neutrality in the age of genomics. Heredity. 2001, 86: 641-647. 10.1046/j.1365-2540.2001.00895.x.
Kingman JFC: The coalescent. Stochastic Process Appl. 1982, 13: 235-248. 10.1016/0304-4149(82)90011-4.
Kingman JFC: On the genealogy of large populations. J Appl Prob. 1982, 19A: 27-43.
Hudson RR: Properties of a neutral allele model with intragenic recombination. Theor Popul Biol. 1983, 23: 183-201. 10.1016/0040-5809(83)90013-8.
Hudson RR: Testing the constant-rate neutral allele model with protein sequence data. Evolution. 1983, 37: 203-217. 10.2307/2408186.
Tajima F: Evolutionary relationship of DNA sequences in finite populations. Genetics. 1983, 105: 437-460.
Fu YX, Li WH: Coalescing into the 21st century: An overview and prospects of coalescent theory. Theor Popul Biol. 1999, 56: 1-10. 10.1006/tpbi.1999.1421.
Rosenberg NA, Nordborg M: Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nat Rev Genet. 2002, 3: 380-390. 10.1038/nrg795.
Charlesworth B, Morgan MT, Charlesworth D: The effect of deleterious mutations on neutral molecular variation. Genetics. 1993, 134: 1289-1303.
Hudson RR, Kaplan NL: Deleterious background selection with recombination. Genetics. 1995, 141: 1605-1617.
Neuhauser C, Krone SK: The genealogy of samples in models with selection. Genetics. 1997, 145: 519-534.
Maynard Smith J, Haigh J: The hitch-hiking effect of a favorable gene. Genet Res. 1974, 23: 23-35.
Thomson G: The effect of a selected locus on a linked neutral locus. Genetics. 1977, 85: 752-788.
Kaplan N, Hudson RR, Langley CH: The "hitchhiking effect" revisited. Genetics. 1989, 123: 887-899.
Stephan W, Wiehe THE, Lenz MW: The effect of strongly selected substitutions on neutral polymorphism: Analytical results based on diffusion theory. Theor Popul Biol. 1992, 41: 237-254. 10.1016/0040-5809(92)90045-U.
Nordborg M: Structured coalescent processes on different time scales. Genetics. 1997, 146: 1501-1514.
Schierup MH, Vekemans X, Charlesworth D: The effect of subdivision on variation at multi-allelic loci under balancing selection. Genet Res. 2000, 76: 51-62. 10.1017/S0016672300004535.
Kelly JK, Wade MJ: Molecular evolution near a two-locus balanced polymorphism. J Theor Biol. 2000, 204: 83-101. 10.1006/jtbi.2000.2003.
Nordborg M, Innan H: The genealogy of sequences containing multiple sites subject to strong selection in a subdivided population. Genetics. 2003, 163: 1201-1213.
Nielsen R: Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics. 2000, 154: 931-942.
Braverman JM, Hudson RR, Kaplan NL, et al: The hitchhiking effect on the site frequency spectrum of DNA polymorphism. Genetics. 1995, 140: 783-796.
Fay JC, Wu CI: Hitchhiking under positive Darwinian selection. Genetics. 2000, 155: 1405-1413.
Sabeti PC, Reich DE, Higgins JM, et al: Detecting recent positive selection in the human genome from haplotype structure. Nature. 2002, 419: 832-837. 10.1038/nature01140.
Takahata N, Nei M: Allelic genealogy under overdominant and frequency-dependent selection and polymorphism of major histocompatibility complex loci. Genetics. 1990, 124: 967-978.
Tajima F: The effect of change in population size on DNA polymorphism. Genetics. 1989, 123: 597-601.
Przeworski M, Hudson RR, Di Rienzo A: Adjusting the focus on human variation. Trends Genet. 2000, 16: 296-302. 10.1016/S0168-9525(00)02030-8.
Kreitman M: Methods to detect selection in populations with applications to the human. Annu Rev Genomics Hum Genet. 2000, 1: 539-559. 10.1146/annurev.genom.1.1.539.
Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989, 123: 585-595.
Fu YX, Li WH: Statistical test of neutrality of mutations. Genetics. 1993, 133: 693-709.
Fu YX: Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics. 1997, 147: 915-925.
Simonsen KL, Churchill GA, Aquadro CF: Properties of statistical tests of neutrality for DNA polymorphism data. Genetics. 1995, 141: 413-429.
Kuhner MK, Yamato J, Felsenstein J: Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics. 1995, 140: 1421-1430.
Kuhner MK, Yamato J, Felsenstein J: Maximum likelihood estimation of population growth rates based on the coalescent. Genetics. 1998, 149: 429-434.
Felsenstein J: Likelihood calculations on coalescents. Inferring Phylogenies. Edited by: Felsenstein J. 2004, Sinauer Associates, Sunderland, MA, 470-487.
Cavalli-Sforza LL: Population structure and human evolution. Proc R Soc Lond B Biol Sci. 1966, 164: 362-379. 10.1098/rspb.1966.0038.
Lewontin RC, Krakauer J: Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics. 1973, 74: 175-195.
Weir BS, Cockerham CC: Estimating F-statistics for the analysis of population structure. Evolution. 1984, 38: 1358-1370. 10.2307/2408641.
Vitalis R, Dawson K, Boursot P: Interpretation of variation across marker loci as evidence of selection. Genetics. 2001, 158: 1811-1823.
Beaumont MA, Balding DJ: Identifying adaptive genetic divergence among populations from genome scans. Mol Ecol. 2004, 13: 969-980. 10.1111/j.1365-294X.2004.02125.x.
Hudson RR, Kreitman M, Aguade M: A test of neutral molecular evolution based on nucleotide data. Genetics. 1987, 116: 153-159.
McDonald JH, Kreitman M: Adaptive protein evolution at the Adh locus in Drosophila. Nature. 1991, 351: 652-654. 10.1038/351652a0.
McDonald JH: Improved tests for heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence. Mol Biol Evol. 1998, 15: 377-384. 10.1093/oxfordjournals.molbev.a025934.
Eyre-Walker A: Changing effective population size and the McDonald-Kreitman test. Genetics. 2002, 162: 2017-2024.
Nielsen R, Yang Z: Likelihood models for detecting positive selected amino acid sites and applications to the HIV-1 envelope gene. Genetics. 1998, 148: 929-936.
Nei M, Gojobori T: Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986, 3: 418-426.
Suzuki Y, Gojobori T: A method for detecting positive selection at single amino acid sites. Mol Biol Evol. 1999, 16: 1315-1328. 10.1093/oxfordjournals.molbev.a026042.
Andolfatto P: Adaptive hitchhiking effects on genome variability. Curr Opin Genet Dev. 2001, 11: 635-641. 10.1016/S0959-437X(00)00246-X.
Hudson RR: Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics. 2002, 18: 337-338. 10.1093/bioinformatics/18.2.337.
Akey JM, Eberle MA, Rieder MJ, et al: Population history and natural selection shape patterns of genetic variation in 132 genes. PLoS Biol. 2004, 2: 1591-1599.
International HapMap Consortium: The international HapMap project. Nature. 2003, 426: 789-794. 10.1038/nature02168.
Clark AG, Glanowski S, Nielsen R, et al: Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science. 2003, 302: 1960-1963. 10.1126/science.1088821.
Yang Z, Nielsen R: Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000, 17: 32-43. 10.1093/oxfordjournals.molbev.a026236.
Yang Z: PAML: A program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997, 13: 555-556.
Akey JM, Zhang G, Zhang K, et al: Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 2002, 12: 1805-1814. 10.1101/gr.631202.
Payseur BA, Cutter AD, Nachman MW: Searching for evidence of positive selection in the human genome using patterns of microsatellite variability. Mol Biol Evol. 2002, 19: 1143-1153. 10.1093/oxfordjournals.molbev.a004172.
Kayser M, Brauer S, Stoneking M: A genome scan to detect candidate regions influenced by local natural selection in human populations. Mol Biol Evol. 2003, 20: 893-900. 10.1093/molbev/msg092.
Storz JF, Payseur BA, Nachman MW: Genome scans of DNA variability in humans reveal evidence for selective sweeps outside of Africa. Mol Biol Evol. 2004, 21: 1800-1811. 10.1093/molbev/msh192.
Schlötterer C: A microsatellite-based multilocus screen for the identification of local selective sweeps. Genetics. 2002, 160: 753-763.
Schlötterer C, Wiehe T: Microsatellites, a neutral marker to infer selective sweeps. Microsatellites: Evolution and Applications. Edited by: Goldstein D, Schlötterer C. 1999, Oxford University Press, Oxford, UK, 238-248.
Wiehe T: The effect of selective sweeps on the variance of the allele distribution of a linked multi-allele locus: hitchhiking of microsatellites. Theor Popul Biol. 1998, 53: 272-283. 10.1006/tpbi.1997.1346.
Hamblin MT, Di Rienzo A: Detection of the signature of natural selection in humans: Evidence from the Duffy blood group locus. Am J Hum Genet. 2000, 66: 1669-1679. 10.1086/302879.
Tishkoff SA, Varkonyi R, Cahinhinan N, et al: Haplotype diversity and linkage disequilibrium at human G6PD: Recent origin of alleles that confer malarial resistance. Science. 2001, 293: 455-462. 10.1126/science.1061573.
Hamblin MT, Thompson EE, Di Rienzo A: Complex signatures of natural selection at the Duffy blood group locus. Am J Hum Genet. 2002, 70: 369-383. 10.1086/338628.
Bamshad MJ, Mummidi S, Gonzalez E, et al: A strong signature of balancing selection in the 5′ cis-regulatory region of CCR5. Proc Natl Acad Sci USA. 2002, 99: 10539-10544. 10.1073/pnas.162046399.
Fullerton SM, Bartoszewicz A, Ybazeta G, et al: Geographic and haplotype structure of candidate type 2 diabetes susceptibility variants at the calpain-10 locus. Am J Hum Genet. 2002, 70: 1096-1106. 10.1086/339930.
Rockman MV, Hahn MW, Soranzo N, et al: Positive selection on MMP3 regulation has shaped heart disease risk. Curr Biol. 2004, 14: 1531-1539. 10.1016/j.cub.2004.08.051.
Nakajima T, Wooding S, Sakagami T, et al: Natural selection and population history in the human angiotensinogen gene (AGT): 736 complete AGT sequences in chromosomes from around the world. Am J Hum Genet. 2004, 74: 898-916. 10.1086/420793.
Zhang J: Frequent false detection of positive selection by the likelihood method with branch-site models. Mol Biol Evol. 2004, 21: 1332-1339. 10.1093/molbev/msh117.
Rooney AP, Zhang J: Rapid evolution of a primate sperm protein: Relaxation of functional constraint or positive Darwinian selection? Mol Biol Evol. 1999, 16: 706-710. 10.1093/oxfordjournals.molbev.a026153.
Gilad Y, Rosenberg S, Przeworski M, et al: Evidence for positive selection and population structure at the human MAO-A gene. Proc Natl Acad Sci USA. 2002, 99: 862-867. 10.1073/pnas.022614799.
Rana BK, Hewett-Emmett D, Jin L, et al: High polymorphism at the human melanocortin 1 receptor locus. Genetics. 1999, 151: 1547-1557.
Bersaglieri T, Sabeti PC, Patterson N, et al: Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet. 2004, 74: 1111-1120. 10.1086/421051.
Stephens JC, Reich DE, Goldstein DB, et al: Dating the origin of the CCR5-Delta32 AIDS-resistance allele by the coalescence of haplotypes. Am J Hum Genet. 1998, 62: 1507-1515. 10.1086/301867.
Rockman MV, Hahn MW, Soranzo N, et al: Positive selection on a human-specific transcription factor binding site regulating IL4 expression. Curr Biol. 2003, 13: 2118-2123. 10.1016/j.cub.2003.11.025.
Nijenhuis T, Hoenderop JGJ, Nilius B, Bindels RJM: (Patho)physiological implications of the novel epithelial Ca2+ channels TRPV5 and TRPV6. Pflügers Arch. 2003, 446: 401-409. 10.1007/s00424-003-1038-7.
van de Graaf SF, Hoenderop JG, Gkika D, et al: Functional expression of the epithelial Ca2+ channels (TRPV5 and TRPV6) requires association of the S100A10-annexin 2 complex. EMBO J. 2003, 22: 1478-1487. 10.1093/emboj/cdg162.
Kim Y, Stephan W: Joint effects of genetic hitchhiking and background selection on neutral variation. Genetics. 2000, 155: 1415-1427.
Przeworski M: The signature of positive selection at randomly chosen loci. Genetics. 2002, 160: 1179-1189.
Benjamini Y, Hochberg Y: Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc B. 1995, 57: 289-300.
Storey JD, Tibshirani R: Statistical significance for genome-wide experiments. Proc Natl Acad Sci USA. 2003, 100: 9440-9445. 10.1073/pnas.1530509100.
Lander E, Kruglyak L: Genetic dissection of complex traits: Guidelines for interpreting and reporting linkage results. Nat Genet. 1995, 11: 241-247. 10.1038/ng1195-241.
Lohmueller KE, Pearce CL, Pike M, et al: Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat Genet. 2003, 33: 177-182. 10.1038/ng1071.
Göring HH, Terwilliger JD, Blangero J: Large upward bias in estimation of locus-specific effects from genomewide scans. Am J Hum Genet. 2001, 69: 1357-1369. 10.1086/324471.
Lewontin RC, Kojima K: The evolutionary dynamics of complex polymorphisms. Evolution. 1960, 14: 458-472. 10.2307/2405995.
Takano-Shimizu T, Kawabe A, Inomata N, et al: Interlocus nonrandom association of polymorphisms in Drosophila chemoreceptor genes. Proc Natl Acad Sci USA. 2004, 101: 14156-14161. 10.1073/pnas.0401782101.
Zapata C, Nunez C, Velasco T: Distribution of nonrandom associations between pairs of protein loci along the third chromosome of Drosophila melanogaster. Genetics. 2002, 161: 1539-1550.
Acknowledgements
We thank Jennifer Madeoy, Dayna Akey and an anonymous reviewer for critical reading of the manuscript and providing valuable comments. J.R. is supported by the University of Washington Medical Scientist Training Program. J.M.A. is supported by a Pilot and Feasibility Award from the Clinical Nutrition Research Unit at the University of Washington.
Cite this article
Ronald, J., Akey, J.M. Genome-wide scans for loci under selection in humans. Hum Genomics 2, 113 (2005). https://doi.org/10.1186/1479-7364-2-2-113