Skip to main content

Advertisement

A genome-wide association study of mitochondrial DNA copy number in two population-based cohorts

Article metrics

Abstract

Background

Mitochondrial DNA copy number (mtDNA CN) exhibits interindividual and intercellular variation, but few genome-wide association studies (GWAS) of directly assayed mtDNA CN exist.

We undertook a GWAS of qPCR-assayed mtDNA CN in the Avon Longitudinal Study of Parents and Children (ALSPAC) and the UK Blood Service (UKBS) cohort. After validating and harmonising data, 5461 ALSPAC mothers (16–43 years at mtDNA CN assay) and 1338 UKBS females (17–69 years) were included in a meta-analysis. Sensitivity analyses restricted to females with white cell-extracted DNA and adjusted for estimated or assayed cell proportions. Associations were also explored in ALSPAC children and UKBS males.

Results

A neutrophil-associated locus approached genome-wide significance (rs709591 [MED24], β (change in SD units of mtDNA CN per allele) [SE] − 0.084 [0.016], p = 1.54e−07) in the main meta-analysis of adult females. This association was concordant in magnitude and direction in UKBS males and ALSPAC neonates. SNPs in and around ABHD8 were associated with mtDNA CN in ALSPAC neonates (rs10424198, β [SE] 0.262 [0.034], p = 1.40e−14), but not other study groups. In a meta-analysis of unrelated individuals (N = 11,253), we replicated a published association in TFAM (β [SE] 0.046 [0.017], p = 0.006), with an effect size much smaller than that observed in the replication analysis of a previous in silico GWAS.

Conclusions

In a hypothesis-generating GWAS, we confirm an association between TFAM and mtDNA CN and present putative loci requiring replication in much larger samples. We discuss the limitations of our work, in terms of measurement error and cellular heterogeneity, and highlight the need for larger studies to better understand nuclear genomic control of mtDNA copy number.

Introduction

Mitochondria are the cellular organelles responsible for producing adenosine triphosphate (ATP), a ubiquitous substrate required for metabolism. ATP is the final product of the series of redox reactions that are facilitated by the complexes of the respiratory chain (RC), located on the cristae, the folded inner membrane of mitochondria.

Mitochondria possess their own genome (mtDNA), an extra-nuclear, double-stranded, circular DNA molecule of ~ 16.6 kb that is inherited maternally. Thirteen subunits contributing to complexes of the RC are encoded by mtDNA, and the entire mitochondrial genome is present at variable copy number in the cell. The relative copy number of mtDNA (mtDNA CN) may reflect differing energy requirements between cells: those from active tissues (e.g. liver, muscle, neuron) are observed to have higher mtDNA CNs compared to endothelial cells, which are comparatively quiescent [1,2,3].

Several nuclear genes are known to influence the regulation of mtDNA CN, and these are reviewed in detail elsewhere [1, 4,5,6]. These include POLG [4,5,6,7,8,9,10,11,12] and POLG2 [4, 5, 12], the catalytic and accessory subunits of DNA polymerase-gamma, the principal enzyme implicated in mtDNA replication. Other regulators include TFAM (mitochondrial transcription factor A) [4, 13,14,15,16,17], which initiates mtDNA replication, along with other factors TFB1M and TFB2M [4, 8, 18]. Regulators of these transcription factors include PGC-1α (peroxisome proliferators-activated receptor gamma coactivator 1 alpha) [4, 5, 8] and two nuclear respiratory factors (NRF-1, NRF-2) [4, 5, 8]. Moreover, maintenance of replication requires an adequate mitochondrial nucleotide supply [19]: nucleotides may be imported from the cytosol or salvaged by specific mitochondrial enzymes. Defective phosphorylation of deoxyribonucleosides by kinases encoded by DGUOK (deoxyguanosine kinase) and TK2 (thymidine kinase) leads to dysfunctional mitochondrial dNTP synthesis and key regulators of dNTP synthesis in the cytosol include the helicase C10orf2 (alias TWINK) [4,5,6,7,8,9,10, 12], along with thymidine phosphorylase (TYMP) [4, 5, 9] and the target of the p53-transcription factor, p53R2 (encoded by RRM2B) [4, 6, 7, 9, 12]. The role of succinyl CoA synthase deficiency as a cause of mtDNA depletion is less well understood, but mutations in the alpha and β subunits of succinyl CoA synthase genes (SUCLA2, SUCGL1) [6, 7, 9, 10] may be associated with mitochondrial nucleotide depletion [6].

To our knowledge, few genome-wide scans of mtDNA CN have been published, and those that exist are of relatively small sample size [16, 20, 21] or use in silico proxies for mtDNA CN without actual biological measurements [17]. We had access to directly assayed mtDNA CN in a diverse set of study groups and so performed hypothesis-generating genome-wide association studies (GWAS) in ~ 14,000 individual participants from the Avon Longitudinal Study of Parents and Children (ALSPAC) and the UK Blood Service (UKBS) cohort. For our main analyses, the two most comparable study groups of adult females were combined in a joint analysis (N = 6799, approximately 10 times larger than previous GWAS of directly assayed mtDNA CN) [20, 21], with results from the other groups presented as opportunistic, secondary analyses. It is known that cellular heterogeneity contributes to mtDNA CN: granulocytes have relatively few mitochondria, whereas lymphocytes are rich in mitochondria, and therefore in mtDNA [22]. Since we also had access to data on white cell proportions, estimated from methylation data in ALSPAC, and assayed directly in UKBS, we performed sensitivity analyses that considered DNA source (whole blood/white cells), and controlled for white cell proportions. Finally, we extracted two SNPs that were robustly related to mtDNA CN in a recent GWAS of mtDNA CN measured in silico [17], and compared our results to those published associations.

Participants and methods

Cohort details

ALSPAC is a prospective cohort of mothers and their children. Between 1991 and 1992, 14,541 women living in the former county of Avon, UK, were recruited during pregnancy, of whom 13,761 were enrolled into the study (women were aged between 16 and 43 years at recruitment when samples for mtDNA CN analyses were obtained). Further details are available in the cohort profile papers [23, 24], and the study website contains details of all data that are available through a fully searchable data dictionary: http://www.bristol.ac.uk/alspac/researchers/our-data/. Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees.

The UK Blood Service control group is part of the Wellcome Trust Case Control Consortium 2. The UK National Blood Service (UKBS) consists of 3091 unrelated, healthy individuals (aged 17–69 years when samples for mtDNA CN assay were obtained), recruited between September 2005 and February 2006. Informed consent was obtained from all participants in accordance with protocols approved by the Peterborough & Fenland Local Research Ethics Committee in September 2005.

DNA samples

Blood samples used for mtDNA CN assay were collected from ALSPAC mothers during routine antenatal care. Children included in this study had DNA sampled at either birth (from cord blood, these individuals are hereafter referred to as ALSPAC ‘neonates’) or at a follow-up research clinic assessment at mean age 7 (range 6–9 years, hereafter ALSPAC ‘6–9-year-olds’). Antenatal DNA from mothers was extracted using a phenol-chloroform method [25]. DNA from ALSPAC children was extracted using a phenol-chloroform (ALSPAC neonates) or salting-out method (ALSPAC 6–9-year-olds) [25]. DNA sources used for the mtDNA CN assay varied by age group: DNA from whole blood was used for 6–9-year-olds, ALSPAC mothers’ DNA was extracted from whole blood or white cells, and ALSPAC neonates had DNA extracted from white cells, as described previously [25].

UKBS blood samples were separated by density centrifugation, and white blood cells were retained to perform DNA extractions, as previously described, using a guanidine-chloroform-based method [26, 27]. Thus, in UKBS, the DNA source was white blood cells for all participants. Blood composition information for UKBS samples was provided by Willem Ouwehand at the University of Cambridge as part of an on-going collaboration with Patrick Chinnery. These details are also summarised in Table 1.

Table 1 Description of study groups and analysis structure in this paper

Genotype data

ALSPAC

ALSPAC mothers were genotyped on the Illumina Human660W-Quad array (Illumina, San Diego, CA, USA) at the Centre Nationale du Génotypage (CNG). ALSPAC children were genotyped with the Illumina HumanHap550-Quad array, by the Wellcome Trust Sanger Institute, Cambridge, UK, and the Laboratory Corporation of America, Burlington, NC, USA, using support from 23andMe. Genotypes were called using Illumina GenomeStudio®. Quality control (QC) was performed using PLINK v1.07 [28], phasing using ShapeIT (v2.r644) [29], and imputation was to the Haplotype Reference Consortium (v1.0), performed using IMPUTE (v3) (http://mathgen.stats.ox.ac.uk/impute/impute.html). The genome build used was GRCh37. Further details of genotype QC are given in Additional file 1.

GWAS were run separately in 5461 ALSPAC mothers, 3647 6–9-year-olds, and 2102 neonates (see Additional file 1 for details of selection into the study). Relatedness within each group of participants (mothers, neonates, and 6–9-year-olds) was assessed by identical-by-descent (IBD) proportions from a genetic relatedness matrix, calculated using the GCTA standard algorithm [30], based on 1.1 million HapMap3 best-guess tag SNPs, present at a combined allele frequency of > 0.01 and imputation quality > 0.8 in 17,842 individuals. Within each group (mothers, 6–9-year-olds, and neonates), participants were unrelated (IBD > 0.125; i.e. first-cousin level). A subset of children was related to the 5461 ALSPAC mothers: there were 1611 mother/6–9-year-old pairs related at IBD > 0.125 (1570 pairs IBD > 0.45) and 869 mother-neonate pairs related at IBD > 0.125 (839 IBD > 0.45). For some sensitivity analyses, a GWAS of a subset of 2833 mothers, who are unrelated to any 6–9-year-olds or neonates at IBD > 0.125, is used. SNPs were filtered by MAF < 0.01 and imputation score < 0.8 in all study groups [31], leaving 7,360,988; 7,410,776; and 7,361,275 SNPs in ALSPAC mothers, 6–9-year-olds, and neonates, respectively.

UKBS

The UKBS cohort was genotyped using the Illumina 1.2 M Duo platform. Raw genotype data (called using Illuminus [32], http://www.sanger.ac.uk/science/tools/illuminus) were downloaded from the European Genotype Archive (http://www.ebi.ac.uk/ega). QC was performed using PLINK v1.90 [33], phasing and imputation using EAGLE2 (v2.0.5) [34], and PBWT [35] (imputation was to the Haplotype Reference Consortium (v1.0) performed using the Sanger Imputation Server [https://imputation.sanger.ac.uk/]). The genome build was GRCh37. Further details are given in Additional file 1.

GWAS were run in 2671 UKBS individuals (1333 males, 1338 females). Individuals were unrelated at any level of IBD. After filtering by MAF < 0.01 and imputation quality > 0.8, 7,441,490 variants remained (7,369,986 males only and 7,373,492 females only).

Assay of mtDNA CN

For mtDNA CN assay details in ALSPAC and UKBS, see Additional file 1. Both cohorts had mtDNA CN assayed by quantitative PCR. Both assays used B2M as the single-gene reference, but mtDNA amplicons differed. Raw data are plotted in Additional file 2: Figure S1.

Despite differences in raw mtDNA CNs between study groups, validation analyses of the two adult cohorts suggested relative mtDNA CNs were reliable. Cross-validation of qPCR methodology between centres was performed on 384 random samples (169 from ALSPAC, 185 from UKBS). Samples were assayed by PG and AG in Bristol and RB in Newcastle, using cohort-specific protocols. There was moderate-to-good agreement between z-scores of mtDNA CNs assayed in ALSPAC by AG and PG (r[Spearman] = 0.68) and between those obtained from the two sets of exchanged plates (ALSPAC r[Spearman] = 0.58 [analysts: AG/RB] and UKBS r[Spearman] = 0.69 [analysts: PG/RB]) (Additional file 2: Figure S2). However, panel B of Additional file 2: Figure S2 suggests that there may have been some non-linearity when comparing the two assays. To control for absolute differences in mtDNA CNs, z-scored phenotypes were used in GWAS (after log-transformation to approximate normality). Z-scores were computed separately for ALSPAC mothers, 6–9-year-olds, neonates, and UKBS.

Statistical analysis

Genome-wide association study

GWAS were undertaken separately for ALSPAC mothers, ALSPAC 6–9-year-olds, ALSPAC neonates, UKBS females, and UKBS males. Additive models were fitted, using dosage data for ALSPAC and best-guess data for UKBS. The genome-wide significance threshold was p = 5e−08 [36].

Main analyses

We had access to several study groups with relevant data. Since these groups were diverse in nature and were of relatively small sample sizes (compared to some complex trait GWAS), we did not consider them as ‘discovery’ and ‘replication’ cohorts. Instead, after validating and harmonising data (see the ‘Assay of mtDNA CN’ section), we considered our main analyses to be hypothesis-generating GWAS of the two most comparable groups, i.e. all adult females (5461 ALSPAC mothers and 1338 UKBS females). This decision also took into account results from some preliminary analyses that suggested some possible differences between UKBS females and males (see Additional file 2: Figures S3 and S4), although we acknowledge that we have insufficient power to detect sex differences that are not potentially due to chance.

Thus, results from ALSPAC mothers and UKBS females were meta-analysed (‘Meta-analysis 1’), using random-effects models in order to protect against possible heterogeneity in effects between cohorts. Since ALSPAC mothers had DNA extracted from two sources, a sensitivity meta-analysis (‘Meta-analysis 2’, N = 4743) restricted the ALSPAC mothers to 3405 females with white cell DNA extracted by a phenol-chloroform method (i.e. the most comparable subgroup to UKBS females, all of whom had white cell DNA extracted by a guanidine-chloroform based method [26, 27]). Heterogeneity was assessed using Cochran’s Q statistic [37] and the I2 statistic [38]. Using the summary statistics from Meta-analysis 1, we estimated the variance explained in mtDNA CN using LD score regression [39].

Secondary analyses

Results from GWAS of 3647 ALSPAC children [6–9 years], 2102 ALSPAC neonates, and 1333 UKBS males are presented as secondary analyses. We applied the same p value threshold for genome-wide significance and also specifically looked at whether hits identified in these groups showed similar directions and magnitudes of association in the meta-analyses of adult females. However, we did not consider these groups to be suitable replication samples for the main analyses of adult females, given their small sample sizes and sex and age differences.

Look-up of top loci from a recent large GWAS of in silico mtDNA CN

A recent large GWAS (using a discovery sample of 10,560 Han Chinese females) identified TFAM (mitochondrial transcription factor A) and CDK6 (Cyclin Dependent Kinase 6), as loci strongly associated with mtDNA CN estimated in silico, from sequence data [17]. The replication sample for that study included 1753 ALSPAC children within the UK10K consortium [40]. There is overlap between that group and the 6–9-year-olds studied in the current analysis [17]. Using all available unrelated participants in the current study (i.e. excluding mother-child duos, N = 11,253 total), we meta-analysed the two lead SNPs at these loci and compared our results to those of this previous GWAS [17].

Covariates

Covariates included age at mtDNA CN assay, sex, and DNA concentration (ng/μL) [41] (ALSPAC only, measured by a PicoGreen® fluorescence-based method), and principal components (2 in ALSPAC, 4 in UKBS), to adjust for population structure. The ALSPAC mothers’ analysis was adjusted for DNA source (white cells or whole blood) (see Table 1).

For all analyses, effect sizes adjusted for cell proportions are also presented, since cell lineages vary in their average numbers of mitochondria [22]. In ALSPAC, cell proportions have been estimated in the ‘Accessible Resource for Integrated Epigenomics’ (ARIES) subset [42], using DNA methylation data from the Illumina Infinium HumanMethylation450 BeadChip (450 K) array and the Houseman method [43]. Five hundred forty-six mothers, 606 6–9-year-olds, and 82 neonates included in this study had cell proportion data. Proportions were estimated from methylation data derived from the same time points as mtDNA CNs were assayed (antenatally for mothers, at birth and ~ 7 years for children). In this study, we only performed cell proportion-adjusted analyses if participants had mtDNA CN assayed in the same DNA sample type that was used for cellular proportion estimation (i.e. white cells/whole blood) [44]. Lymphocytes (total of CD8T, CD4T, B lymphocytes) and neutrophils (granulocytes for neonates [45]) were included as covariates for mothers, with the addition of monocytes for 6–9-year-olds and neonates (monocytes were not used in the mothers’ analyses since this prevented model convergence). In UKBS, neutrophil and lymphocyte proportions derived from full blood count data were included [27, 46].

Software

GWAS were performed with SNPTESTv2.5.0 [47] and meta-analysed with META v1.7.0 [48]. The ‘qqman’ R package [http://cran.r-project.org/web/packages/qqman/] was used to create Manhattan/quantile-quantile (QQ) plots. The ‘bedtools’ ‘clusterBed’ function [https://github.com/arq5x/bedtools2] [49] was used to group SNPs associated at p < 1e-06 (except for the 'Meta-Analysis 2', see footnote of Table 2) into 1Mb clusters, which were then annotated with ANNOVAR [50] [http://annovar.openbioinformatics.org/en/latest/]. Regional association plots were produced with LocusZoom v1.3, using an hg19 reference (1000G March 2012) [51].

Power

An R implementation of the method used by Genetic Power Calculator [52, 53] was used to determine the minimum detectable effect sizes at 80% power, given the sample size of the study groups used in the main meta-analyses (‘http://www.cureffi.org/2012/12/05/power-for-gwas-and-extreme-phenotype-studies/’). This method requires ‘total QTL variance’ (i.e. the proportion of variance in a quantitative trait locus, Vq explained by the causal variant) as input. Assuming a standardised normal distribution (phenotype in SD units), the following is true:

$$ {V}_q={\beta}^2p $$

where β is the effect size (in SD units) and p is the minor allele frequency (MAF). Estimated minimal effect sizes for our given sample sizes, a range of Vq values was calculated, with a minimum value of 0.001 (equivalent to an effect size of 0.316 at MAF = 0.01 and 0.045 at MAF = 0.5). Linkage disequilibrium (LD) of 0.8 (measured by the D’ metric) was assumed between causal and tag variants. Power curves are shown in Additional file 2: Figure S5. Heat maps of power by effect size and MAF are shown in Additional file 2 Figure S6. For the largest (N = 6799 [5461 + 1338]) and smallest (N = 1333) GWAS undertaken in this paper (i.e. the main meta-analysis of adult females and the smallest secondary analysis of UKBS males, respectively), minimum Vq values detectable were 0.0091 and 0.0458. At MAF = 0.25, this is equivalent to effect sizes of 0.191 and 0.428. However, it should be noted that these calculations do not take account of measurement error in mtDNA CN, white cell heterogeneity, covariate adjustment, or the use of random-effects meta-analysis for the main GWAS.

Results

Main analysis

Manhattan/QQ plots for GWAS of ALSPAC mothers (for all 5461 mothers and for 3405 with white cell-extracted DNA), UKBS females, and the two meta-analyses are shown in Figs. 1 and 2. Values of lambda from genomic control calculations are provided in the appropriate figure legend. Regional association plots for loci identified from the meta-analyses are in Fig. 3. For strongest associations from separate GWAS of ALSPAC mothers and UKBS females, see Additional file 3: Tables S1, S2, S3, S4, and S5 and Additional file 1.

Fig. 1
figure1

a Manhattan (left)/quantile-quantile (QQ) plots (right) for ALSPAC (all mothers). λ = 0.995. ‘Minimally adjusted’ refers to the fact that these results are from the analysis that did not adjust for cell proportions. b Manhattan (left)/quantile-quantile (QQ) plots (right) for UKBS (females). λ = 1.011. ‘Minimally adjusted’ refers to the fact that these results are from the analysis that did not adjust for cell proportions. c Manhattan (left)/quantile-quantile (QQ) plots (right) for random-effects meta-analysis of ALSPAC (all mothers) and UKBS (females). λ = 0.995 and 1.011 for ALSPAC (all mothers) and UKBS females, respectively, and meta-analyses are corrected for these lambdas. ‘Minimally adjusted’ refers to the fact that these results are from the analysis that did not adjust for cell proportions

Fig. 2
figure2

a Manhattan (left)/quantile-quantile (QQ) plots (right) for ALSPAC (white cell mothers). λ = 0.992. ‘Minimally adjusted’ refers to the fact that these results are from the analysis that did not adjust for cell proportions. b Manhattan (left)/quantile-quantile (QQ) plots (right) for random-effects meta-analysis of ALSPAC (white cell mothers) and UKBS females. λ = 0.992 and 1.011 for ALSPAC (white cell mothers, see Fig. 2a) and UKBS females (see Fig. 1b), and meta-analyses are corrected for these lambdas. ‘Minimally adjusted’ refers to the fact that these results are from the analysis that did not adjust for cell proportions

Fig. 3
figure3

Regional association plots (created with LocusZoom). These three regional association plots (a, b, and c, created with LocusZoom) detail the top loci presented in the meta-analyses of all ALSPAC mothers and UKBS females (a, b) and of the one locus identified after restriction of the meta-analysis to ALSPAC mothers with DNA extracted from white cells, only (c). In each plot, the lead SNP (i.e. the SNP with the lowest p value) is annotated in purple, with other SNPs colour coded according to their values of linkage disequilibrium (in r2) with the lead SNP. Transformed, −log10p values and recombination rate (in centimorgans per megabase, cM/Mb) are shown on the left and right y-axes, respectively. A schematic of the genes in each region, along with coordinates and annotations, is shown at the bottom of each plot, with chromosomal location in megabases (Mb) along the x-axis. See Table 2 for more details of each locus

Results from the main meta-analysis of adult females (N = 6799, ‘Meta-analysis 1’) as well as the analysis restricted to those with white cell-extracted DNA (N = 4743, ‘Meta-analysis 2’) are given in Table 2. SNPs in associated regions were clustered into 1 Mb windows [49]. For each analysis, results are shown before and after cell-proportion adjustment. Annotations are given for loci within 200 kb [54]. Coordinates are hg19.

Table 2 Summary statistics for top SNPs associated with mtDNA CN in two random-effects meta-analyses

Meta-analysis 1: all adult females (N = 6799)

No SNP passed the genome-wide significance threshold of p < 5e−08. The top panel of Table 2 describes two top loci (p < 1e−06) identified from the main meta-analysis of all adult females: these included an intergenic locus on chr13 (lead SNP rs12873707, β [SE] 0.159 [0.030], p = 9.27e−08, I2 = 0) and a locus on chr17 (lead SNP rs709591, β [SE] − 0.084 (0.016) p = 1.54e−07, I2 = 0). A list of all SNPs associated at p < 1e−06 in these loci is given in Additional file 3: Table S6. Using summary statistics for all SNPs included in this meta-analysis, we used LD score regression to calculate the estimated total observed scale heritability as 0.052 [SE = 0.079].

Regional association plots (see panels A and B of Fig. 3) show the LD structure at these loci. Panel A shows that there are no nearby SNPs in LD with the lead chr13 variant. Panel B shows a large region of LD at the chr17 locus, spanning the genes PSMD3, CSF3, and MED24. SNPs in this region are associated with neutrophil count [27]; given this fact, and since cellular heterogeneity affects mtDNA CN [22, 55, 56], this locus was assessed in more detail, by extracting the SNP from all study groups (Table 3). Despite differences in the proportion of participants with information that enabled adjustment for cell type, there was consistency across study groups with each showing an approximate 50% attenuation of the effect size with adjustment for cell proportions.

Table 3 General concordance of association at rs709591, before adjustment for cell proportions

Meta-analysis 2: adult females with white cell-extracted DNA (N = 4743)

The bottom panel of Table 2 gives details of the strongest association in the meta-analysis restricted to ALSPAC mothers with DNA extracted from white cells (‘Meta-analysis 2’). The locus associated with mtDNA CN (p < 1e−06) was rs150387260, an intergenic variant (β [SE] 0.405 [0.084], p = 1.65e−06, I2 = 0, see also Additional file 3: Table S7). A regional association plot (panel C of Fig. 3) shows that few neighbouring SNPs are in substantial LD with this SNP. The nearest gene was RB1 Inducible Coiled-Coil 1 (RB1CC1), 78 kb upstream. The product of this tumour suppressor gene is implicated in cell growth, migration, proliferation, apoptosis, and autophagy [57]. RB1CC1 deletions are associated with increased numbers of mitochondria in haematopoietic stem cells [58] and mice [59] and with breast cancer in humans [60].

Effect of adjusting for cell proportions in these two meta-analyses

In Meta-analysis 1 (all adult females, regardless of DNA source), point estimates attenuated after cell proportion adjustment, although the reduced sample size meant that confidence intervals were wide. In contrast, after cell proportion adjustment for the association identified from Meta-analysis 2 (restricted to females with white cell-extracted DNA), confidence intervals were also wide, but the effect estimate was not attenuated (it increased slightly).

Secondary analyses

Table 4 shows results of GWAS in ALSPAC 6–9-year-olds, neonates, and UKBS males. Only genome-wide-significant loci are shown (for UKBS males, top loci at p < 1e−06 are shown, as there were no loci p < 5e−08). When comparing effect sizes between any hits in these groups with effects of the same SNPs in the main (adult female) analyses, it should be noted that there are ~ 1600 mother-child duos between ALSPAC mothers included in the main analysis and the 6–9-year-olds/neonates (see the ‘Methods’ section). Results of the associations of SNPs at a known neutrophil count locus (PSMD3, CSF3, MED24) identified in our main analyses in these secondary analysis groups are discussed above (see the ‘Main analysis’ section).

Table 4 Lead SNPs in secondary analyses

Associations in ALSPAC 6–9-year-olds

There were two genome-wide significant, intergenic loci in 6–9-year-olds (rs139045492 on chr20, β [SE] − 0.695 [0.119], p = 6.35e−09 and rs144874692 on chr3, β [SE] − 0.632 [0.112] p = 1.78e−08). The rs139045492 association persisted after cell proportion adjustment. The SNP rs139045492 was not directionally concordantly (or statistically significantly) associated with mtDNA CN in either of the adult female main meta-analyses. The other association (rs144874692), which attenuated after cell-proportion adjustment, was not available in the main analysis. Therefore, a proxy (rs144928561, r2 = 0.29, β [SE] − 0.328 [0.105], p = 0.002 in ALSPAC 6–9-year-olds) was used to explore whether there was any evidence of it being associated with mtDNA CN in the main analyses. This proxy SNP was not directionally concordantly associated in either main meta-analysis. Manhattan/QQ plots are available in Additional file 2: Figure S7. A list of all SNPs at p < 1e−06 in the 6–9-year-olds is available in Additional file 3: Table S8.

Associations in ALSPAC neonates

The locus most strongly associated with mtDNA CN in the ALSPAC neonates was at chromosome 19, in ABHD8 (abhydrolase-domain containing 8) (lead SNP: rs10424198, β [SE] − 0.262 (0.034), p = 1.40e−14). Sixteen SNPs were identified at genome-wide significance within BABAM1, ANKLE1, ABHD8, and MRPL34. Whilst the standard error for this SNP was large after cell-composition adjustment, this latter analysis was in a tiny sample (N = 82), and yet still the point estimate remained broadly consistent (cell-proportion adjusted β [SE] − 0.326 [0.193], p = 0.095). However, there was no evidence of a concordant association in either meta-analysis from the main analyses. Manhattan/QQ plots are available in Additional file 2: Figure S8. A list of SNPs associated with mtDNA CN at p < 1e−06 in neonates is available in Additional file 3: Table S9.

Associations in UKBS (males)

No SNPs were associated at p < 5e−08 in UKBS males. Two SNPs were associated at p < 1e−06: an intronic SNP in MYT1L (Myelin Transcription Factor 1 Like, β (SE) 0.401 (0.078), p = 2.94e−07) and two intergenic SNPs 121 kb upstream of PDZRN3 (PDZ Domain Containing Ring Finger 3, lead SNP β [SE] − 0.497 [0.099], p = 5.84e−07). Whilst these SNPs survived cell proportion adjustment, neither locus showed evidence of an association concordant in terms of direction and magnitude in either main meta-analysis. Manhattan and QQ plots for this analysis are available in Additional file 2: Figure S3. A list of SNPs at p < 1e−06 in UKBS males is available in Additional file 3: Table S10.

Look-up of top loci from a recent large GWAS of in silico mtDNA CN

The results of a look-up of two loci (rs11006126 and rs445) that are strongly associated with mtDNA CN measured in silico [17] are presented in Fig. 4. Estimates for these SNPs were extracted from and meta-analysed across all study groups (after removing ALSPAC mother-offspring pairs; total N for these meta-analyses = 11,253, see also the ‘Genotype data’ section). The SNP rs11006126 was associated with mtDNA CN across our study groups (β [SE] 0.046 [0.017], p = 0.007), but the effect size was considerably (~ 1/7 of the size) smaller than that observed in the discovery and replication cohorts of the in silico GWAS. There was no evidence of replication of rs445 across our study groups (β [SE] 0.021 [0.022], p = 0.328) compared with discovery and replication result from the in silico GWAS.

Fig. 4
figure4

Extraction and meta-analysis of two top loci from Cai et al. [17] from study groups in this GWAS. SNPs for two loci identified in a GWAS of in silico estimated mtDNA CN were extracted from each of the study groups in this cohort (NB: a smaller subset of 2833 ALSPAC mothers were used, since there were mother-child duos present between the original study group of 5461 and the two groups of ALSPAC children) total N = 11,253. Columns: SNP = rsID; gene = gene name; group = study group, beta = effect size; LCI/UCI = 95% confidence interval (lower, then upper bound); P = p value; and I2 = I2 metric for heterogeneity. Meta-analyses were by random-effects and are shown as black diamonds. For reference, the ALSPAC estimate from Cai et al. [17] is shown for each locus. This replication group included ALSPAC 6–9-year-olds (with mtDNA CN assayed from sequence data). Betas had to be harmonised, as those in Cai et al. [17] were given as SD change in mtDNA CN per SD increase in genotype. SD of genotype was estimated from allele frequencies provided for the cohort by Cai et al. [17] given as 0.342 for rs445 and 0.169 for rs11006126 (in the supplement of this paper). SDs were then calculated as √(2 × (1-MAF) × MAF) (evaluating to 0.53 and 0.67 for rs11006126 and rs445, respectively). Betas and standard errors were then transformed from those given in Table 1 of Cai et al. to a ‘per risk allele’ scale, by multiplying the given beta by (1/the estimated SD), i.e. rs11006126 = 0.179 × (1/0.53); rs445 = 0.110 × (1/0.67). CaiALSPAC = result from Cai et al. [17]. UKBSF/UKBSM = females and males in UKBS cohort

Discussion

We conducted the largest ever GWAS of directly assayed mtDNA CN, in two population-based cohorts. Although diversity between our study groups prevented us from conceptualising a traditional discovery and replication model, we considered our study as hypothesis generating and meta-analysed the two most comparable groups (ALSPAC mothers and UKBS females) as our main analyses and other subgroups as opportunistic, secondary analyses.

There were no genome-wide significant hits in the meta-analysis of adult females, but SNPs at a known neutrophil (and other leucocytes [61]) count locus (PSMD3, CSF3, MED24) [62] were associated with mtDNA CN at p ~ 2e−07. There is extended LD across these genes (see Fig. 3); although CSF3 (alias granulocyte colony-stimulating factor) may be the most likely biological candidate [63], a previous study in Japanese individuals reported that the neutrophil count associated SNP was most associated with PSMD3 expression [62]. It is established that cellular heterogeneity is related to mtDNA CN [22], and one of the strengths of this study is that we were able to control for this. This chr17 locus showed notable consistency of effect sizes across several study groups (UKBS males, UKBS females, ALSPAC neonates), corresponding to a ~ 0.08 reduction in SD units of log mtDNA CN per risk allele. However, there was notable (~ 50%) attenuation with adjustment for white cell proportions. Thus, a key finding of this study is the importance of undertaking GWAS of directly assayed mtDNA CN in sample sizes that are much larger and that also have measures of white cell proportions. Ideally, these studies should also be sufficiently large to enable exploration of any possible variation in nuclear genetic control of mtDNA copy number variation by sex and age.

Secondary analyses in ALSPAC children (6–9 years) and ALSPAC neonates and UKBS males revealed three genome-wide significant hits: two intergenic loci in ALSPAC 6–9-year-olds and a region containing ABHD8 (abhydrolase-domain containing-8) in ALSPAC neonates. None of the associations we observed at this locus in ALSPAC neonates were at all evident in the main meta-analysis of adult females. We only identified one published association of SNPs within ABHD8 with a disease trait (breast cancer) in the literature [64], although it is notable that this gene lies in head-to-head orientation with mitochondrial ribosomal protein L34, MRPL34. Another mitochondrial ribosomal protein, MRPL37, has previously been found to associate with mtDNA CN, which [20] raises the possibility that the BABAM1-ANKLE1-ABHD8-MRPL34 locus might be a true signal, despite the lack of evidence of effect in our main analyses. Mitochondrial ribosomal proteins are encoded by nuclear DNA, synthesised on cytoplasmic ribosomes, then imported into mitochondria, where they facilitate the translation of mitochondrial mRNA, in conjunction with the two mitochondrially encoded rRNAs [65]. Whilst there are no candidate disorders for MRPL34, other diseases, such as Parkinson’s disease (previously linked to reduced mtDNA CN [66]), are related to other mitochondrial ribosomal proteins [20, 65, 67]. This locus also included an additional neighbouring gene, GTP Binding Protein 3 (Mitochondrial) (GTPBP3), associated with mtDNA CN at p ~ 5e−07; rare mutations in this gene are known to cause a Leigh syndrome-like disorder [68]. However, it is clear that independent replication will be needed in order to confirm these associations, since our current results are in a small sample from our secondary analyses only, and could be attributable to ‘winner’s curse’ [69]. In addition, if such an association were to replicate in future GWAS, careful fine-mapping would be needed in order to further refine the signal and suggest putative causal genes out of those found to be associated.

The identification of the cell-count associated locus (lead SNP in MED24) could suggest that some loci, such as this chr17 locus, may be related to mtDNA CN only via their association with cell composition. It is noteworthy that the meta-analysis in which this locus was identified included ALSPAC mothers with DNA extracted from both white cells and whole blood: it is possible that combining individuals with DNA prepared from multiple sources could lead to preferential detection of loci associated with mtDNA CN predominantly via their association with cellular heterogeneity. When results from participants with more similar DNA sources are pooled, the power to detect loci associated with mtDNA CN independently of cell proportions may increase: we postulate that the BABAM1-ANKLE1-ABHD8-MRPL34 locus might be such a ‘cell-composition independent’ locus (although as noted above, this needs further exploration). For loci in this latter category, controlling for cell proportions may improve the signal-to-noise ratio of observed associations, and failure to control for cellular proportions may act akin to measurement error. Future work should seek to assert whether associations of known neutrophil loci with mtDNA CN are entirely due to cell composition in DNA samples or whether these loci are detected because mtDNA CN is causally related to leucocyte count [70]. Ideally, such studies would use directly assayed neutrophil data, as opposed to estimates of cellular composition. In addition, it might be useful to control associations for platelet count, since platelets are anucleate, but mitochondria-rich [71, 72]. However, we acknowledge that in adjusting for cellular composition, it is possible that we could induce bias, if mtDNA CN plays a causal role in determining cell counts. It is for this reason that we chose to present models adjusted and unadjusted for cellular composition in our work, and we compare the results throughout.

Beyond the possibility of true underlying genetic heterogeneity between our cohorts, several other technical factors may have limited their comparability. We believe that population stratification is unlikely to be a problem, as analyses controlled for principal components, and the correlation of MAFs for tops hits was high (see Additional file 2 Figure S9). Another technical difference between study groups is the DNA extraction method used. DNA extraction method has a considerable effect on mtDNA CN assay [73] and similar qPCR assays, including that of telomere length [74]. However, it is difficult to assess the extent to which DNA extraction method may have affected the performance of the mtDNA CN assay, since extraction method in ALSPAC was also related to age at DNA sampling, and it is possible that the genetic regulation of mtDNA CN varies across the life course.

When we combined all our study groups and looked at whether two hits from a previous in silico GWAS were replicated, we found some evidence for one (a SNP in mitochondrial transcription factor A (TFAM)), but not the other (CDK6) [17]. Notably, the previously reported SNP for the CDK6 locus (rs445) has also been shown to be associated with leucocyte counts [61]. TFAM is known as ‘a key activator of mitochondrial transcription’ and is involved in ‘promoter recognition by the mitochondrial RNA polymerase’ [16], yet despite the comparable sample sizes between our (N = 11,253) and the previous in silico (N = 10,442 discovery and N = 1753 replication) analyses, we observed a considerably smaller effect size for the TFAM hit (~ 1/7 the size of the replication effect from the previous study, despite partial overlap in participants between the previously published result and our sample). We postulate that this might flag an important limitation in our work, namely that of measurement error. If a substantial component of our assayed mtDNA CNs includes non-differential measurement error, we would expect to see attenuation of effect sizes in our results. This is possible in qPCR assays; indeed, although we demonstrated correlations in relative mtDNA CNs in a validation analysis, we saw some suggestion of non-linearity. Whilst our attempted replication of the previous in silico GWAS hit had 100% power to detect effect sizes of 0.338 SD units (for a SNP with a MAF of 0.169, i.e. the TFAM SNP), this power calculation will be overoptimistic if our mtDNA CN assays are affected by measurement error. Therefore, we might suggest that in silico measurement of mtDNA CN may have advantages over the directly assayed method in this instance.

Conclusion

In conclusion, we confirm an association of TFAM with mtDNA CN, and after performing a range of hypothesis-generating GWAS in diverse study groups, we present several putative regulators of mtDNA CN that will require further follow-up. However, we generally observed poor concordance across study groups. Overall, our main conclusion is that here we find no strong evidence to support our primary hypothesis of common loci regulating mtDNA CN in the study groups used here. We assess and discuss the possible implications of cellular heterogeneity on our results and present the directly assayed mtDNA CN assay as another example of a qPCR assay that may be subject to measurement error. These findings should be considered as possible power-limiting factors in GWAS studying the genomic regulation of mtDNA CN. Nonetheless, we believe that to fully understand nuclear genomic control of mtDNA CN variation, it is necessary to conduct GWAS of directly assayed mtDNA CN. Thus, our work (the largest GWAS to date) makes an important contribution in terms of future requirements to gain this knowledge, which is necessary for fuller understanding of the biology and potential clinical impact of subtle variation in mtDNA CN.

References

  1. 1.

    Dickinson A, Yeung KY, Donoghue J, Baker MJ, Kelly RD, McKenzie M, et al. The regulation of mitochondrial DNA copy number in glioblastoma cells. Cell Death Differ. 2013;20:1644–53 Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3824586/pdf/cdd2013115a.pdf.

  2. 2.

    Moyes CD, Battersby BJ, Leary SC. Regulation of muscle mitochondrial design. Cell Death Differ. 1998;201:299–307.

  3. 3.

    Xing J, Chen M, Wood CG, Lin J, Spitz MR, Ma J, et al. Mitochondrial DNA content: its genetic heritability and association with renal cell carcinoma. J Natl Cancer Inst. 2008;100:1104–12.

  4. 4.

    Carling PJ, Cree LM, Chinnery PF. The implications of mitochondrial DNA copy number regulation during embryogenesis. Mitochondrion. 2011;11:686–92.

  5. 5.

    Harvey A, Gibson T, Lonergan T, Brenner C. Dynamic regulation of mitochondrial function in preimplantation embryos and embryonic stem cells. Mitochondrion. 2011;11:829–38.

  6. 6.

    Rotig A, Poulton J. Genetic causes of mitochondrial DNA depletion in humans. Biochim Biophys Acta. 2009;1792:1103–8.

  7. 7.

    Venegas V, Wang J, Dimmock D, Wong L-J. Real-Time Quantitative PCR Analysis of Mitochondrial DNA Content. Curr Protoc Hum Genet. 2011;68(1):19.17.11–12.

  8. 8.

    Chiaratti MR, Meirelles FV, Wells D, Poulton J. Therapeutic treatments of mtDNA diseases at the earliest stages of human development. Mitochondrion. 2011;11:820–8.

  9. 9.

    Bornstein B, Area E, Flanigan KM, Ganesh J, Jayakar P, Swoboda KJ, et al. Mitochondrial DNA depletion syndrome due to mutations in the RRM2B gene. Neuromuscul Disord. 2008;18:453–9.

  10. 10.

    Spinazzola A, Invernizzi F, Carrara F, Lamantea E, Donati A, Dirocco M, et al. Clinical and molecular features of mitochondrial DNA depletion syndromes. J Inherit Metab Dis. 2009;32:143–58.

  11. 11.

    Oskoui M, Davidzon G, Pascual J, Erazo R, Gurgel-Giannetti J, Krishna S, et al. Clinical spectrum of mitochondrial DNA depletion due to mutations in the thymidine kinase 2 gene. Arch Neurol. 2006;63:1122–6.

  12. 12.

    Tyynismaa H, Ylikallio E, Patel M, Molnar MJ, Haller RG, Suomalainen A. A heterozygous truncating mutation in RRM2B causes autosomal-dominant progressive external ophthalmoplegia with multiple mtDNA deletions. Am J Hum Genet. 2009;85:290–5.

  13. 13.

    Choi YS, Hong JM, Lim S, Ko KS, Pak YK. Impaired coactivator activity of the Gly482 variant of peroxisome proliferator-activated receptor gamma coactivator-1alpha (PGC-1alpha) on mitochondrial transcription factor A (Tfam) promoter. Biochem Biophys Res Commun. 2006;344:708–12.

  14. 14.

    Alvarez V, Corao AI, Sanchez-Ferrero E, De Mena L, Alonso-Montes C, Huerta C, et al. Mitochondrial transcription factor A (TFAM) gene variation in Parkinson’s disease. Neurosci Lett. 2008;432:79–82.

  15. 15.

    Belin AC, Bjork BF, Westerlund M, Galter D, Sydow O, Lind C, et al. Association study of two genetic variants in mitochondrial transcription factor A (TFAM) in Alzheimer’s and Parkinson’s disease. Neurosci Lett. 2007;420:257–62.

  16. 16.

    Curran JE, Johnson MP, Dyer TD, Göring HHH, Kent JW, Charlesworth JC, et al. Genetic determinants of mitochondrial content. Hum Mol Genet. 2007;16:1504–14.

  17. 17.

    Cai N, Li Y, Chang S, Liang J, Lin C, Zhang X, et al. Genetic control over mtDNA and its relationship to major depressive disorder. Curr Biol. 2015;25:3170–7.

  18. 18.

    Grady JP, Murphy JL, Blakely EL, Haller RG, Taylor RW, Turnbull DM, et al. Accurate measurement of mitochondrial DNA deletion level and copy number differences in human skeletal muscle. PLoS One. 2014;9:e114462.

  19. 19.

    González-Vioque E, Torres-Torronteras J, Andreu AL, Martí R. Limited dCTP availability accounts for mitochondrial DNA depletion in mitochondrial neurogastrointestinal encephalomyopathy (MNGIE). PLoS Genet. 2011;7:e1002035.

  20. 20.

    López S, Buil A, Souto JC, Casademont J, Martinez-Perez A, Almasy L, et al. A genome-wide association study in the genetic analysis of idiopathic thrombophilia project suggests sex-specific regulation of mitochondrial DNA levels. Mitochondrion. 2014;18:34–40.

  21. 21.

    Workalemahu T, Enquobahrie DA, Tadesse MG, Hevner K, Gelaye B, Sanchez SE, et al. Genetic variations related to maternal whole blood mitochondrial DNA copy number: a genome-wide and candidate gene study. J Matern Neonatal Med. 2017;30:2433–9.

  22. 22.

    Pyle A, Burn DJ, Gordon C, Swan C, Chinnery PF, Baudouin SV. Fall in circulating mononuclear cell mitochondrial DNA content in human sepsis. Intensive Care Med. 2010;36:956–62.

  23. 23.

    Fraser A, Macdonald-Wallis C, Tilling K, Boyd A, Golding J, Davey Smith G, et al. Cohort Profile: the Avon Longitudinal Study of Parents and Children: ALSPAC mothers cohort. Int J Epidemiol. 2013;42:97–110.

  24. 24.

    Boyd A, Golding J, Macleod J, Lawlor DA, Fraser A, Henderson J, et al. Cohort Profile: the ‘children of the 90s’--the index offspring of the Avon Longitudinal Study of Parents and Children. Int J Epidemiol. 2013;42:111–27.

  25. 25.

    Jones RW, Ring S, Tyfield L, Hamvas R, Simmons H, Pembrey M, et al. A new human genetic resource: a DNA bank established as part of the Avon Longitudinal Study of Pregnancy and Childhood (ALSPAC). Eur J Hum Genet. 2000;8:653–60.

  26. 26.

    Burton PR, Clayton DG, Cardon LR, Craddock N, Deloukas P, Duncanson A, et al. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–78.

  27. 27.

    Nalls MA, Couper DJ, Tanaka TT, van Rooij FJA, Chen M-H, Smith AV, et al. Multiple loci are associated with white blood cell phenotypes. PLoS Genet. 2011;7:e1002113.

  28. 28.

    Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.

  29. 29.

    O’Connell J, Gurdasani D, Delaneau O, Pirastu N, Ulivi S, Cocca M, et al. A general approach for haplotype phasing across the full spectrum of relatedness. PLoS Genet. 2014;10:e1004234.

  30. 30.

    Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82.

  31. 31.

    Marchini J, Howie B. Genotype imputation for genome-wide association studies. Nat Rev Genet. 2010;11:499–511.

  32. 32.

    Teo YY, Inouye M, Small KS, Gwilliam R, Deloukas P, Kwiatkowski DP, et al. A genotype calling algorithm for the Illumina BeadArray platform. Bioinformatics. 2007;23:2741–6.

  33. 33.

    Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.

  34. 34.

    Loh P-R, Danecek P, Palamara PF, Fuchsberger C, Reshef YA, Finucane HK, et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat Genet. 2016;48:1443–8.

  35. 35.

    Durbin R. Efficient haplotype matching and storage using the positional Burrows-Wheeler transform (PBWT). Bioinformatics. 2014;30:1266–72.

  36. 36.

    Aschebrook-Kilfoy B, Argos M, Pierce BL, Tong L, Jasmine F, Roy S, et al. Genome-wide association study of parity in Bangladeshi women. PLoS One. 2015;10:e0118488.

  37. 37.

    Cochran WG. The combination of estimates from different experiments. Biometrics. 1954;10:101.

  38. 38.

    Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003;327:557–60.

  39. 39.

    Bulik-Sullivan BK, Loh P-R, Finucane HK, Ripke S, Yang J, Patterson N, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 2015;47:291–5.

  40. 40.

    Walter K, Min JL, Huang J, Crooks L, Memari Y, McCarthy S, et al. The UK10K project identifies rare variants in health and disease. Nature. 2015;526:82–90.

  41. 41.

    Malik AN, Shahni R, Rodriguez-de-Ledesma A, Laftah A, Cunningham P. Mitochondrial DNA as a non-invasive biomarker: accurate quantification using real time quantitative PCR without co-amplification of pseudogenes and dilution bias. Biochem Biophys Res Commun. 2011;412:1–7.

  42. 42.

    Relton CL, Gaunt T, McArdle W, Ho K, Duggirala A, Shihab H, et al. Data resource profile: accessible resource for integrated epigenomic studies (ARIES). Int J Epidemiol. 2015;44(4):1181–90.

  43. 43.

    Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinf. 2012;13:86.

  44. 44.

    Faul F, Erdfelder E, Lang A-G, Buchner A. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods. 2007;39:175–91.

  45. 45.

    Cardenas A, Allard C, Doyon M, Houseman EA, Bakulski KM, Perron P, et al. Validation of a DNA methylation reference panel for the estimation of nucleated cells types in cord blood. Epigenetics. 2016;11:773–9.

  46. 46.

    Gieger C, Radhakrishnan A, Cvejic A, Tang W, Porcu E, Pistis G, et al. New gene functions in megakaryopoiesis and platelet formation. Nature. 2011;480:201–8.

  47. 47.

    Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet. 2007;39:906–13.

  48. 48.

    Liu JZ, Tozzi F, Waterworth DM, Pillai SG, Muglia P, Middleton L, et al. Meta-analysis and imputation refines the association of 15q25 with smoking quantity. Nat Genet. 2010;42:436–40.

  49. 49.

    Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2.

  50. 50.

    Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38:e164.

  51. 51.

    Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, Gliedt TP, et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics. 2010;26:2336–7.

  52. 52.

    Sham PC, Cherny SS, Purcell S, Hewitt JK. Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data. Am J Hum Genet. 2000;66:1616–30.

  53. 53.

    Purcell S, Cherny SS, Sham PC. Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics. 2003;19:149–50.

  54. 54.

    Eberle MA, Rieder MJ, Kruglyak L, Nickerson DA, Gudjonsson S. Allele frequency matching between SNPs reveals an excess of linkage disequilibrium in genic regions of the human genome. PLoS Genet. 2006;2:e142.

  55. 55.

    Xu E, Sun W, Gu J, Chow WH, Ajani JA, Wu X. Association of mitochondrial DNA copy number in peripheral blood leukocytes with risk of esophageal adenocarcinoma. Carcinogenesis. 2013;34:2521–4.

  56. 56.

    Tin A, Grams ME, Ashar FN, Lane JA, Rosenberg AZ, Grove ML, et al. Association between mitochondrial DNA copy number in peripheral blood and incident CKD in the atherosclerosis risk in communities study. J Am Soc Nephrol. 2016;27(8):2467–73.

  57. 57.

    Stelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, et al. The GeneCards suite: from gene data mining to disease genome sequence analyses. Curr Protoc Bioinforma. 2016;54:1.30.1-1.30.33.

  58. 58.

    Liu F, Lee JY, Wei H, Tanabe O, Engel JD, Morrison SJ, et al. FIP200 is required for the cell-autonomous maintenance of fetal hematopoietic stem cells. Blood. 2010;116:4806–14.

  59. 59.

    Yao J, Jia L, Khan N, Lin C, Mitter SK, Boulton ME, et al. Deletion of autophagy inducer RB1CC1 results in degeneration of the retinal pigment epithelium. Autophagy. 2015;11:939–53.

  60. 60.

    Chano T, Kontani K, Teramoto K, Okabe H, Ikegawa S. Truncating mutations of RB1CC1 in human breast cancer. Nat Genet. 2002;31:285–8.

  61. 61.

    Astle WJ, Elding H, Jiang T, Allen D, Ruklisa D, Mann AL, et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell. 2016;167:1415–29 e19.

  62. 62.

    Okada Y, Kamatani Y, Takahashi A, Matsuda K, Hosono N, Ohmiya H, et al. Common variations in PSMD3-CSF3 and PLCB4 are associated with neutrophil count. Hum Mol Genet. 2010;19:2079–85.

  63. 63.

    Reiner AP, Lettre G, Nalls MA, Ganesh SK, Mathias R, Austin MA, et al. Genome-wide association study of white blood cell count in 16,388 African Americans: the continental origins and genetic epidemiology network (COGENT). PLoS Genet. 2011;7:e1002108.

  64. 64.

    Lawrenson K, Kar S, McCue K, Kuchenbaeker K, Michailidou K, Tyrer J, et al. Functional mechanisms underlying pleiotropic risk alleles at the 19p13.1 breast–ovarian cancer susceptibility locus. Nat Commun. 2016;7:12675.

  65. 65.

    Sylvester JE, Fischel-Ghodsian N, Mougey EB, O’Brien TW. Mitochondrial ribosomal proteins: candidate genes for mitochondrial disease. Genet Med. 2004;6:73–80.

  66. 66.

    Pyle A, Anugrha H, Kurzawa-Akanbi M, Yarnall A, Burn D, Hudson G. Reduced mitochondrial DNA copy number is a biomarker of Parkinson’s disease. Neurobiol Aging. 2016;38:216.e7–216.e10.

  67. 67.

    Kenmochi N, Suzuki T, Uechi T, Magoori M, Kuniba M, Higa S, et al. The human mitochondrial ribosomal protein genes: mapping of 54 genes to the chromosomes and implications for human disorders. Genomics. 2001;77:65–70.

  68. 68.

    Kopajtich R, Nicholls TJ, Rorbach J, Metodiev MD, Freisinger P, Mandel H, et al. Mutations in GTPBP3 cause a mitochondrial translation defect associated with hypertrophic cardiomyopathy, lactic acidosis, and encephalopathy. Am J Hum Genet. 2014;95:708–20.

  69. 69.

    Xiao R, Boehnke M. Quantifying and correcting for the winner’s curse in genetic association studies. Genet Epidemiol. 2009;33:453–62.

  70. 70.

    Knez J, Winckelmans E, Plusquin M, Thijs L, Cauwenberghs N, Gu Y, et al. Correlates of peripheral blood mitochondrial DNA content in a general population. Am J Epidemiol. 2016;183:138–46.

  71. 71.

    Urata M, Koga-Wada Y, Kayamori Y, Kang D. Platelet contamination causes large variation as well as overestimation of mitochondrial DNA content of peripheral blood mononuclear cells. Ann Clin Biochem. 2008;45:513–4.

  72. 72.

    Hurtado-Roca Y, Ledesma M, Gonzalez-Lazaro M, Moreno-Loshuertos R, Fernandez-Silva P, Enriquez JA, et al. Adjusting MtDNA quantification in whole blood for peripheral blood platelet and leukocyte counts. PLoS One. 2016;11:e0163770.

  73. 73.

    Guo W, Jiang L, Bhasin S, Khan SM, Swerdlow RH. DNA extraction procedures meaningfully influence qPCR-based mtDNA copy number determination. Mitochondrion. 2009;9:261–5.

  74. 74.

    Raschenberger J, Lamina C, Haun M, Kollerits B, Coassin S, Boes E, et al. Influence of DNA extraction methods on relative telomere length measurements and its impact on epidemiological studies. Sci Rep. 2016;6:25398.

Download references

Acknowledgements

We are extremely grateful to all the families who took part in ALSPAC, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses.

The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. DAL is a UK National Health Research Senior Investigator (NF-SI-0611-10196). This publication is the work of the authors, who will serve as guarantors for the contents of this paper.

Funding

The UK Medical Research Council and the Wellcome Trust (Grant ref.: 102215/2/13/2) and the University of Bristol provide core support for ALSPAC. Child GWAS data was generated by Sample Logistics and Genotyping Facilities at Wellcome Sanger Institute and LabCorp (Laboratory Corporation of America) using support from 23andMe. Mothers GWAS data generated using funding from Wellcome Trust (WT088806). A comprehensive list of grants funding is available on the ALSPAC website (http://www.bristol.ac.uk/alspac/external/documents/grant-acknowledgements.pdf).

UKBS used genotype data from cases and population controls that were generated by the Wellcome Trust Case Control Consortium 2 (PMID: 17554300, a full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk). The UK Blood Services Collection of Common Controls were funded by The Wellcome Trust.

This research was funded by a Medical Research Council (MRC) grant awarded to Santiago Rodriguez (MR/K002767/1). AG was funded by a Wellcome Trust PhD studentship (102433/Z/13/Z). RB was in receipt of a Wellcome Trust Centre for Mitochondrial Research PhD Studentship (Grant ref: 096919/Z/11/Z), as well as funding from The Barbour Foundation. GH is a PDUK Senior Research Fellow (Grant ref: F-1202) and receives funding from the Wellcome Trust Centre for Mitochondrial Research at Newcastle University (Grant ref: 096919/Z/11/Z). Work was carried out in the MRC Integrative Epidemiology Unit at the University of Bristol (MC_UU_12013/5 and MC_UU_12013/8), and at the Wellcome Trust Centre for Mitochondrial Research at Newcastle University. Additional support for study recruitment and sample collection was provided by the British Heart Foundation (SP/07/008/24066), UK Medical Research Council (G1001357), and Wellcome (WT092830M and WT088806). PFC is a Wellcome Trust Senior Fellow in Clinical Science (101876/Z/13/Z), and a UK NIHR Senior Investigator, who receives support from the Medical Research Council Mitochondrial Biology Unit (MC_UP_1501/2), the Medical Research Council (UK) Centre for Translational Muscle Disease (G0601943), and the National Institute for Health Research (NIHR) Biomedical Research Centre based at Cambridge University Hospitals NHS Foundation Trust and the University of Cambridge.

Availability of data and materials

The datasets generated and analysed during the current study are available on application to the ALSPAC and UKBS cohorts by bona fide researchers.

Author information

All authors contributed to the critical appraisal and revision of the manuscript. PAIG, RRB, KB, and ALG undertook laboratory assays. ALG, RRB, KB, and PAIG analysed the data. SR, GH, DAL, PFC, TRG, HJC, AP, and SMR supervised the work. SR, GH, PFC, and RA provided the funding. All authors read and approved the final manuscript.

Correspondence to Santiago Rodriguez.

Ethics declarations

Ethics approval and consent to participate

The ALSPAC study has ethical approval. Specific ethical approval for this particular project was not required, but permission to undertake the research was granted by the ALSPAC executive committee.

Consent for publication

Not applicable.

Competing interests

TRG reports funding from Sanofi, Biogen, and GlaxoSmithKline for projects unrelated to the work presented in this manuscript.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Appendices 1-3. (PDF 343 kb)

Additional file 2:

Supplementary Figures. (PDF 1250 kb)

Additional file 3:

Supplementary Tables. (XLSX 42 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Keywords

  • Mitochondrial DNA
  • ALSPAC
  • Genome-wide association study
  • Complex traits
  • Genetic epidemiology