Figure 3 | Human Genomics

Figure 3

From: Genome-wide scans for loci under selection in humans

Empirical distributions of some commonly used test statistics and their theoretical distributions. (A) Distribution of Tajima's D in 201 genes in an African-American sample. The empirical distribution is denoted with bars. The solid line indicates the distribution of Tajima's D simulated under a standard neutral model with recombination using the ms program [59]. The dashed line indicates the distribution of Tajima's D simulated under the best-fitting population demographic model from Akey et al. [60] (exponential expansion starting 50,000 years ago with a growth rate of 10^-3 per generation). (B) Distribution of Tajima's D from the same 201 genes in a European-American sample. The best-fitting population demographic model is a bottleneck beginning 40,000 years ago with an inbreeding coefficient of 0.175 [60]. Data used to calculate Tajima's D in panels (A) and (B) were obtained from the SeattleSNPs project (http://pga.gs.washington.edu/). (C) The empirical distribution of FST for 5,590 chromosome 7 single nucleotide polymorphisms (SNPs) obtained from the HapMap project [61]. The theoretical distributions of FST were simulated using ms [59]. An island migration model was assumed, with a constant migration parameter between each pair of populations. The solid line shows the expected distribution of FST under neutrality, whereas the dashed line shows the expected distribution under neutrality with an ascertainment bias favouring common SNPs. To approximate the ascertainment bias in the HapMap data, a 'double-hit' SNP discovery strategy was modelled [61] by randomly selecting four simulated chromosomes and only analysing SNPs where each allele was observed twice. The migration parameter was the same for both distributions and was chosen such that the mean of the biased FST distribution (0.138) closely matched the mean observed FST. (D) The empirical distribution of dn/ds from Clark et al. [62]. Only those genes with dn > 0.001 and ds > 0.001 are shown. The solid line shows the distribution of dn/ds estimated using the method of Yang and Nielsen [63] for neutrally evolving coding sequences simulated with the PAML program [64]. The dashed line shows the distribution of dn/ds for sequences under negative selection, with the magnitude of the selective force chosen such that the mean log10 dn/ds (-1.25) matched the mean of the Clark et al. [62] distribution. The length of the simulated sequences (450 codons) was chosen to match the mean length of sequences from Clark et al. [62].
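The 'double-hit' ascertainment step described for panel (C) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name double_hit_filter, the 0/1 haplotype encoding and the rng argument are assumptions made for illustration, and the simulated chromosomes are assumed to have already been produced elsewhere (for example by ms).

```python
import random


def double_hit_filter(haplotypes, panel_size=4, rng=None):
    """Return indices of SNPs that pass a 'double-hit' discovery filter.

    haplotypes: list of equal-length sequences of 0/1 alleles, one per
    simulated chromosome (hypothetical encoding, assumed for this sketch).
    """
    rng = rng or random.Random()
    # Randomly choose the discovery panel (four chromosomes in the caption).
    panel = rng.sample(range(len(haplotypes)), panel_size)
    n_sites = len(haplotypes[0])
    kept = []
    for site in range(n_sites):
        derived = sum(haplotypes[i][site] for i in panel)
        ancestral = panel_size - derived
        # Keep the SNP only if both alleles are seen at least twice in the
        # panel; with four chromosomes this means exactly two copies each.
        if derived >= 2 and ancestral >= 2:
            kept.append(site)
    return kept
```

Downstream statistics such as FST would then be computed only on the returned sites, which enriches the analysed set for common SNPs in the way the dashed line in panel (C) is meant to capture.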
