Fig. 1 | Human Genomics


From: Community data-driven approach to identify pathogenic founder variants for pan-ethnic carrier screening panels


Pipeline for generation and application of a carrier screening panel. Arrows indicate each processing step, and rectangles represent the data generated after each step. Left panel (ethnicity-specific cohort creation from an initial dataset of 6242 exomes): a quality control (QC) and removal of related samples left 3061 samples; b a machine learning model inferred each sample's ethnicity/ancestry, as detailed in the Methods section. The two largest ancestry groups were Ashkenazi Jewish (1013 samples) and Muslim Arab (613 samples); 11 additional inferred ethnicities had smaller numbers of samples. Middle panel (prevalent PFV candidates): c intersecting the cohort variants with ClinVar and Franklin Community submissions yielded 3847 reported P/LP variants, and the frequency of each P/LP variant was calculated per ancestry; d to focus only on novel PFVs, variants present in existing carrier screening panels were removed. Removing variants with a carrier frequency below 1/200 in the Ashkenazi Jewish or Muslim Arab ancestries resulted in 195 candidate tier 2 or 3 variants (Additional file 1: Table S2). Right panel (curation and novel PFV detection): e a semi-automated process filtered out variants with an overall gnomAD frequency > 0.5%, with three or more homozygous counts, or associated with mild conditions, resulting in 43 strong candidates for novel PFVs; f retrieval of real-world evidence from Franklin community members with homozygous samples, together with evidence from the literature, resulted in eight novel curated PFVs.
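The threshold-based filters in steps d and e can be illustrated with a short sketch. This is not the authors' implementation; it assumes that carrier frequency is the fraction of samples in an ancestry group carrying at least one copy of the variant, that a variant is kept in step d if it reaches 1/200 in *either* the Ashkenazi Jewish or the Muslim Arab group, and that the homozygous count refers to gnomAD. The `Variant` fields (including the `mild_condition` flag, which stands in for the manual judgment of condition severity) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

CARRIER_THRESHOLD = 1 / 200  # step d: minimum carrier frequency in a target ancestry
MAX_GNOMAD_AF = 0.005        # step e: overall gnomAD frequency must not exceed 0.5%
MAX_HOMOZYGOTES = 2          # step e: three or more homozygotes leads to removal
TARGET_ANCESTRIES = ("Ashkenazi Jewish", "Muslim Arab")


@dataclass
class Variant:
    """Hypothetical per-variant record; field names are illustrative only."""
    variant_id: str
    carriers: dict[str, int]   # ancestry -> number of samples carrying the variant
    gnomad_af: float           # overall gnomAD allele frequency (assumed field)
    homozygote_count: int      # homozygous count (assumed to come from gnomAD)
    in_existing_panel: bool    # already on an existing carrier screening panel
    mild_condition: bool       # hypothetical flag for mild-condition association


def carrier_frequency(v: Variant, ancestry: str, cohort_sizes: dict[str, int]) -> float:
    """Fraction of samples in `ancestry` that carry the variant (assumed definition)."""
    n = cohort_sizes.get(ancestry, 0)
    return v.carriers.get(ancestry, 0) / n if n else 0.0


def step_d(variants: list[Variant], cohort_sizes: dict[str, int]) -> list[Variant]:
    """Drop variants already on panels, then keep those reaching 1/200 in a target ancestry."""
    return [
        v for v in variants
        if not v.in_existing_panel
        and any(carrier_frequency(v, a, cohort_sizes) >= CARRIER_THRESHOLD
                for a in TARGET_ANCESTRIES)
    ]


def step_e(candidates: list[Variant]) -> list[Variant]:
    """Filter out common variants, variants with >= 3 homozygotes, and mild conditions."""
    return [
        v for v in candidates
        if v.gnomad_af <= MAX_GNOMAD_AF
        and v.homozygote_count <= MAX_HOMOZYGOTES
        and not v.mild_condition
    ]
```

With the cohort sizes from panel b (1013 Ashkenazi Jewish, 613 Muslim Arab), 6 carriers among Ashkenazi Jewish samples gives 6/1013 ≈ 0.0059, which passes the 1/200 = 0.005 threshold, whereas 3 carriers among Muslim Arab samples gives 3/613 ≈ 0.0049, which does not. Step f (gathering real-world and literature evidence) is manual curation and is not modeled here.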
