To analyse the gene-longevity association, two datasets are needed: the genotype data and the univariate survival data for the individuals involved in the study. To improve the accuracy and power of the study, the longevity data for twins can additionally be analysed. Denoting longevity and non-longevity alleles at an autosomal locus by *a* and *A*, respectively, assume that the frequencies *P*_{
g
} of genotypes *AA*, *aA* or *Aa*, and *aa* at the moment of birth are *P*_{
AA
}, *P*_{
Aa
} and *P*_{
aa
}, respectively. If the Hardy-Weinberg equilibrium holds, then *P*_{
AA
} = (1-*P*_{
a
})^{2}, *P*_{
Aa
} = 2*P*_{
a
}(1-*P*_{
a
}) and {P}_{aa}={P}_{a}^{2}, where *P*_{
a
} is the frequency of the allele *a* at the moment of birth. We parameterise *P*_{
a
} as follows:

{P}_{a}=1-1/\left(1+{e}^{v+\delta x+R\phi \left(x,{x}_{0}\right)}\right),x=T-t.

(1)

In accordance with (1), the logit of *P*_{
a
} is a linear function of unknown parameters *R*, *ν* and *δ* with domain of definition **R**^{3}. This parameterisation includes the sudden change in the allele frequency by the value *Rφ*(*x, x*_{0}) in the cohort *T* - *x*_{0} and the slow linear cohort effect *ν + δx* of the allele frequency. Here, *T* stands for the year of data collection, *x* for the age, and *t* for the cohort. We assume that the value of *x*_{0} is known. The step function *φ*(*x*, *x*_{0}) is defined by the interval equations *φ*(*x*, *x*_{0}) = 1 for 0 ≤ *x* ≤ *x*_{0} and *φ*(*x*, *x*_{0}) = 0 for *x* >*x*_{0}.

To estimate the genotype frequencies for twin pairs, we need to calculate the bivariate survival functions. One possible approach to doing so is to use the correlated gamma-frailty model, which provides simple analytical expressions for the bivariate survival functions [7]. Assume that that individual's instantaneous risk of death *μ* for genotype *g*∈{*aa, Aa, AA*} at age *x*, as measured by the hazard of mortality, is *μ*(*x*, *Z*, *g*) = *Zμ*_{0, g}(*x*), where *Z* is the gamma distributed frailty (non-observed risk of mortality) with mean 1 and variance *σ*^{2}, and *μ*_{0, g}(*x*) is the baseline hazard. The univariate survival function {S}_{g}\left(x\right)=E{e}^{-Z{H}_{g}\left(x\right)}={\left(1+{\sigma}^{2}{H}_{g}\left(x\right)\right)}^{-1/{\sigma}^{2}} is the Laplace transform for the gamma probability density function at the point {H}_{g}\left(x\right)={\int}_{0}^{x}{\mu}_{0,g}\left(t\right)dt (cumulative hazard function). For related individuals, we assume that life spans *T*_{1} and *T*_{2} are conditionally independent, given frailties *Z*_{1}, *Z*_{2} and genotypes *g*_{1}, *g*_{2}. In general, frailties *Z*_{1} and *Z*_{2} have unequal variances. Below, we shall assume, for simplicity, that *Z*_{1} and *Z*_{2} are identically distributed. If *Corr*(*Z*_{1}, *Z*_{2}) = *ρ*, *E*(*Z*_{1}) = *E*(*Z*_{2}) = 1 and *Var*(*Z*_{1}) = *Var*(*Z*_{2}) = *σ*^{2}, then:

\begin{array}{c}P\left\{{T}_{1}>{x}_{1},\phantom{\rule{0.3em}{0ex}}{T}_{2}>{x}_{2}\right\}={S}_{g1,\phantom{\rule{0.3em}{0ex}}g2}\left({x}_{1},\phantom{\rule{0.3em}{0ex}}{x}_{2}\right)\hfill \\ \phantom{\rule{1em}{0ex}}=\frac{{S}_{g1}{\left({x}_{1}\right)}^{1-\rho}{S}_{g2}{\left({x}_{2}\right)}^{1-\rho}}{{\left({S}_{g1}{\left({x}_{1}\right)}^{-{\sigma}^{2}}+{S}_{g2}{\left({x}_{2}\right)}^{-{\sigma}^{2}}-1\right)}^{\rho /{\sigma}^{2}}}\hfill \end{array}

(2)

Here, *S*_{g1, g2}(*x*_{1}, *x*_{2}) is the bivariate survival function at ages *x*_{1} and *x*_{2} for twins with genotypes *g*_{1} and *g*_{2}, respectively. We relate cumulative hazard functions with some unknown function *H*_{0}(*x*) as follows:

{H}_{g}\left(x\right)={c}_{g}x+{a}_{g}{H}_{0}{\left(x\right)}^{{b}_{g}}

(3)

with unknown *a*_{
g
} ≥ 0, *b*_{
g
} ≥ 0 and *c*_{
g
} ≥ 0. Such parameterisation, where cumulative hazards *H*_{
g
}(*x*) rather than survival functions *S*_{
g
}(*x*) for different genotypes are parametrically related (eg {S}_{g}\left(x\right)={S}_{0}{\left(x\right)}^{{b}_{g}}, is more flexible and allows us to detect the antagonistic pleiotropic effect [6]. Without loss of generality, we can assume that *a*_{
AA
} = *b*_{
AA
} = 1.

For univariate and bivariate survival functions in the whole population, it holds that:

\begin{array}{c}S\left(x\right)=\sum _{g}{P}_{g}{S}_{g}\left(x\right),\hfill \\ {S}^{MZ}\left({x}_{1},{x}_{2}\right)=\sum _{g,g}{P}_{g,g}^{MZ}{S}_{g,g}^{MZ}\left({x}_{1},{x}_{2}\right)\hfill \\ {S}^{DZ}\left({x}_{1},{x}_{2}\right)=\sum _{g1,g2}{P}_{g1,g2}^{DZ}{S}_{g1,g2}^{DZ}\left({x}_{1},{x}_{2}\right)\hfill \end{array}

(4)

Here, *P*_{
g
}, {P}_{g,g}^{MZ} and {P}_{g1,g2}^{DZ} are the univariate and the bivariate genotype frequencies for monozygotic (MZ) and dizygotic (DZ) twin pairs, respectively, at the moment of birth. Since the frailty correlation *ρ* can be different for MZ and DZ twins, we use the upper index MZ or DZ in the notation for bivariate survival. Given univariate survival *S*(*x*) and parameters, we calculate the baseline cumulative hazard *H*_{0}(*x*) using the simple bisectional procedure [6]. For univariate genotype frequencies, we will use the values given above. To calculate the bivariate genotype frequencies, note that for MZ twin pairs, *g*_{1} = *g*_{2} = *g* and {P}_{g,g}^{MZ}={P}_{g}. Assuming independent transmission of the maternal and paternal alleles to the offspring, we can use the standard formulae for DZ twin pairs:

\begin{array}{c}{P}_{aa,aa}^{DZ}={P}_{aa}^{2}+\left(1/2\right){P}_{aa}{P}_{Aa}+\left(1/16\right){P}_{Aa}^{2}\hfill \\ {P}_{aa,Aa}^{DZ}={P}_{Aa,aa}^{DZ}=\left(1/2\right){P}_{aa}{P}_{Aa}+\left(1/8\right){P}_{Aa}^{2}\hfill \\ {P}_{aa,AA}^{DZ}={P}_{AA,aa}^{DZ}=\left(1/16\right){P}_{Aa}^{2}\hfill \\ {P}_{Aa,AA}^{DZ}={P}_{AA,Aa}^{DZ}=\left(1/2\right){P}_{Aa}{P}_{AA}+\left(1/8\right){P}_{Aa}^{2}\hfill \\ {P}_{Aa,Aa}^{DZ}=\left(1/2\right){P}_{aa}{P}_{Aa}+2{P}_{aa}{P}_{AA}\hfill \\ \phantom{\rule{0.2em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}+\left(1/2\right){P}_{Aa}{P}_{AA}+\left(1/4\right){P}_{Aa}^{2}\hfill \\ {P}_{AA,AA}^{DZ}={P}_{AA}^{2}+\left(1/2\right){P}_{AA}{P}_{Aa}+\left(1/16\right){P}_{Aa}^{2}\hfill \end{array}

(5)

The frequencies {\pi}_{g,g}^{MZ}\left(x\right) and {\pi}_{g1,g2}^{DZ}\left(x\right) of the genotype (*g*, *g*) and (*g*_{1}, *g*_{2}) at any age *x* for MZ and DZ twin pairs can be calculated from the formulae:

{\pi}_{g,g}^{MZ}\left(x\right)=\frac{{P}_{g}{S}_{g,g}^{MZ}\left(x,x\right)}{{S}^{MZ}\left(x,x\right)}

(6)

{\pi}_{g1,g2}^{DZ}\left(x\right)=\frac{{P}_{g1,g2}^{DZ}{S}_{g1,g2}^{DZ}\left(x,x\right)}{{S}^{DZ}\left(x,x\right)}

(7)

Assuming that the variance *σ*^{2} does not depend on genotype and zygosity, we have the following unknown vector parameter:

\begin{array}{c}\theta =(R,\delta ,\nu ,{a}_{aa},{a}_{aA+Aa},{b}_{aa},{b}_{aA+Aa},{c}_{aa},{c}_{aA+Aa},\hfill \\ \phantom{\rule{0.2em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}{c}_{AA},{\sigma}^{2},{\rho}_{MZ},{\rho}_{DZ}).\hfill \end{array}

Here, *ρ*_{
MZ
} and *ρ*_{
DZ
} are the frailty correlations for MZ and DZ twins. We estimate unknown vector parameter *θ* maximising the likelihood function:

Li{k}_{g}=\left(\prod _{i=1}^{{N}_{g}^{MZ}}{\pi}_{{g}_{i},{g}_{i}}^{MZ}\left({x}_{i},\theta \right)\right)\left(\prod _{i=1}^{{N}_{g}^{DZ}}{\pi}_{{g}_{i1,\phantom{\rule{0.3em}{0ex}}i2}}^{DZ}\left({x}_{i},\theta \right)\right)

(8)

(the maximum likelihood estimates [MLE]), where *x*_{
i
} is the age of twin pair *i* at the moment of data collection, {N}_{g}^{MZ} and {N}_{g}^{DZ} are the observed numbers of MZ and DZ twin pairs in the genetic dataset (twin pairs with known genotypes and ages), respectively. To choose the optimal model, we can use the likelihood ratio test for nested models and either the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) for non-nested models. Under the null hypothesis, we assume that *a*_{
aa
} = *a*_{
Aa
} = *b*_{
aa
} = *b*_{
Aa
} = 1 and *c*_{
aa
} = *c*_{
Aa
} = *c*_{
AA
} = 0. Significant deviation from this hypothesis can indicate a gene-longevity association.

If, in addition to genetic data, the data on the longevity of related individuals such as twins are also available, we can use this information to improve the accuracy of statistical estimates and to increase the power. Denote the life spans of the twin pair *i* in the demographic dataset by (*x*_{i1}, *x*_{i2}), where I=1,\dots ,{N}_{d}^{MZ} for MZ twin pairs and I=1,\dots ,{N}_{d}^{DZ} for DZ twin pairs. We assume that twin pairs in the sample are chosen at random and that all twins are deceased. Although the censored data are less informative than non-censored data, they can be also included in the analysis. The bivariate probability density function for a twin pair with longevities *x*_{i1 }and *x*_{i2 }can be calculated as follows:

\begin{array}{c}\frac{{\partial}^{2}{S}^{j}\left({x}_{i1},{x}_{i2}\right)}{\partial {x}_{i1}\partial {x}_{i2}}=\sum _{g1,g2}{P}_{g1,g2}^{j}\frac{{S}_{g1,g2}^{j}\left({x}_{i1},{x}_{i1}\right)\partial {S}_{g1}\left({x}_{1}\right)}{{S}_{g1}\left({x}_{1}\right){S}_{g2}\left({x}_{2}\right)\partial S\left({x}_{1}\right)}\hfill \\ \times \frac{\partial {S}_{g2}\left({x}_{2}\right)}{\partial S\left({x}_{2}\right)}\frac{\partial S\left({x}_{1}\right)}{\partial {x}_{1}}\frac{\partial S\left({x}_{2}\right)}{\partial {x}_{2}}\hfill \\ \times \left(1-{\rho}_{j}^{2}\right.+\frac{{\rho}_{j}\left(1-{\rho}_{j}\right)\left({S}_{g1}{\left({x}_{1}\right)}^{-{\sigma}^{2}}+{S}_{g2}{\left({x}_{2}\right)}^{-{\sigma}^{2}}\right)}{\left({S}_{g1}{\left({x}_{1}\right)}^{-{\sigma}^{2}}+{S}_{g2}{\left({x}_{2}\right)}^{-{\sigma}^{2}}-1\right)}\hfill \\ \left(\right)close=")">+\frac{{\rho}_{j}\left({\rho}_{j}+{\sigma}^{2}\right){S}_{g1}{\left({x}_{1}\right)}^{-{\sigma}^{2}}{S}_{g2}{\left({x}_{2}\right)}^{-{\sigma}^{2}}}{{\left({S}_{g1}{\left({x}_{1}\right)}^{-{\sigma}^{2}}+{S}_{g2}{\left({x}_{2}\right)}^{-{\sigma}^{2}}-1\right)}^{2}}\hfill \end{array}\n

(9)

with *j* = *MZ*, *DZ*. We can write the likelihood function for the demographic dataset in the form:

\begin{array}{c}Li{k}_{d}=\left(\prod _{i=1}^{{N}_{d}^{MZ}}\frac{{\partial}^{2}{S}^{MZ}\left({x}_{i1},{x}_{i2}\right)}{\partial {x}_{i1}\partial {x}_{i2}}\left({x}_{i1},{x}_{i2},\phantom{\rule{0.3em}{0ex}}\theta \right)\right)\\ \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\times \left(\prod _{i=1}^{{N}_{d}^{DZ}}\frac{{\partial}^{2}{S}^{DZ}\left({x}_{i1},{x}_{i2}\right)}{\partial {x}_{i1}\partial {x}_{i2}}\left({x}_{i1},{x}_{i2},\phantom{\rule{0.3em}{0ex}}\theta \right)\right)\end{array}

(10)

Now, unknown parameters can be found through maximising the joint likelihood function *Lik*_{
g
} × *Lik*_{
d
}.