Figure A1 | Human Genomics

Figure A1

From: Inference of ancestry: constructing hierarchical reference populations and assigning unknown individuals


Example of modes detected for a subset of the total dataset (n = 10 runs at K = 4). The structure algorithm explored three modes in this dataset. The Pr(K) values for runs r_1 to r_n and an n × n matrix of similarity scores S for the set of runs are calculated. A symbol (specified in the key) is assigned to each run r and plotted at its -log10(Pr[K]) value on the x-axis, against S, the level of similarity calculated between r and each of the runs r_1 to r_n. This method expands upon previous systems for selecting a repeatable K in the sample space. Rosenberg et al. [6] also generated an n × n matrix of similarity values, but took the mean of all values in the matrix to assess the level of similarity among runs generated at various K. That approach was found to be restrictive, in that certain K values were rejected because a single disparate run pulled the overall similarity below the threshold selected for acceptance of the K (0.97). In the n × n matrix generated for this dataset, seven of the ten runs produced a similarity score of S = 1 relative to one another. The remaining three solutions were ≤ 80 per cent similar to the other seven and had a much lower likelihood. Screening K on the mean of all S scores (0.868, versus the minimum defined S threshold of 0.97) would disqualify this K from consideration as a viable representation of the structure of the dataset. Because a major mode of high probability and high similarity can nevertheless be identified within the dataset, this example illustrates the selection of a statistically likely and empirically stable substructure of the dataset.
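The contrast between screening on the mean of all S scores and looking for a major mode can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the matrix is hypothetical (seven mutually identical runs as in the caption, with assumed values of 0.8 and 0.7 for the remaining pairs, so the mean will not equal the reported 0.868), and the greedy grouping in `major_mode` is an assumed stand-in for whatever mode-detection procedure was actually used.

```python
from itertools import combinations

THRESHOLD = 0.97  # minimum similarity threshold quoted in the caption


def build_example_matrix():
    """Hypothetical 10 x 10 similarity matrix (illustrative values only)."""
    n = 10
    major = set(range(7))  # assumed: runs 0-6 form the major mode
    s = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if i in major and j in major:
            value = 1.0   # identical solutions within the major mode
        elif i in major or j in major:
            value = 0.8   # assumed similarity between a minor run and the mode
        else:
            value = 0.7   # assumed similarity among the three minor runs
        s[i][j] = s[j][i] = value
    return s


def mean_pairwise_similarity(s):
    """Mean of S over all distinct pairs of runs (diagonal excluded)."""
    pairs = list(combinations(range(len(s)), 2))
    return sum(s[i][j] for i, j in pairs) / len(pairs)


def major_mode(s, threshold=THRESHOLD):
    """Greedy sketch: largest group of runs whose pairwise S meets the threshold."""
    best = []
    n = len(s)
    for seed in range(n):
        group = [seed]
        for r in range(n):
            # Add a run only if it is similar enough to every current member.
            if r != seed and all(s[r][m] >= threshold for m in group):
                group.append(r)
        if len(group) > len(best):
            best = sorted(group)
    return best


if __name__ == "__main__":
    s = build_example_matrix()
    mean_s = mean_pairwise_similarity(s)
    print("mean S:", round(mean_s, 3), "passes mean screen:", mean_s >= THRESHOLD)
    print("major mode runs:", major_mode(s))
```

With these assumed values the mean-based screen rejects K = 4, while the grouping step still recovers the seven mutually consistent runs.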
