Cross-comparison of the genome sequences from human, chimpanzee, Neanderthal and a Denisovan hominin identifies novel potentially compensated mutations

The recent publication of the draft genome sequences of the Neanderthal and a ~50,000-year-old archaic hominin from Denisova Cave in southern Siberia has ushered in a new age in molecular archaeology. We previously cross-compared the human, chimpanzee and Neanderthal genome sequences with respect to a set of disease-causing/disease-associated missense and regulatory mutations (Human Gene Mutation Database) and succeeded in identifying genetic variants which, although apparently pathogenic in humans, may represent a 'compensated' wild-type state in at least one of the other two species. Here, in an attempt to identify further 'potentially compensated mutations' (PCMs) of interest, we have compared our dataset of disease-causing/disease-associated mutations with their corresponding nucleotide positions in the Denisovan hominin, Neanderthal and chimpanzee genomes. Of the 15 human putatively disease-causing mutations that were found to be compensated in chimpanzee, Denisovan or Neanderthal, only a solitary F5 variant (Val1736Met) was specific to the Denisovan. In humans, this missense mutation is associated with activated protein C resistance and an increased risk of thromboembolism and recurrent miscarriage. It is unclear at this juncture whether this variant was indeed a PCM in the Denisovan or whether it could instead have been associated with disease in this ancient hominin.


Introduction
The recent publication of the draft sequence of the Neanderthal genome 1 ushered in a new age in molecular archaeology. 2,3 This achievement was followed closely by the publication of the draft genome sequence (1.9-fold coverage) of a 50,000-year-old archaic hominin from Denisova Cave in southern Siberia. 4 This hominin (a 'Denisovan') is thought to have been a member of a sister group of hominins to the Neanderthals, with whom they lived sympatrically during the Upper Pleistocene. 4-7 Denisovans appear to be more closely related to Neanderthals than to humans, having diverged from Neanderthals about 640,000 years ago and from extant Africans about 804,000 years ago. 4 Access to DNA sequence data from ancient hominins not only promises to revolutionise our knowledge of hominin relationships, but is also potentially informative in the context of exploring the molecular basis of human genetic disease. 8,9
We have previously cross-compared the human, chimpanzee and Neanderthal genome sequences with a set of disease-causing/disease-associated missense and regulatory mutations in order to identify genetic variants which, although apparently pathogenic in humans, may represent a 'compensated' wild-type state in at least one of the other two species ('potentially compensated mutations' [PCMs]). 10 PCMs correspond to variants that may have been deleterious for a certain period of evolutionary time but which persisted long enough in a given population or species to have become positively selected upon the introduction of a 'compensatory' nucleotide change. 8,11-14 Such compensatory changes are thought to be localised in the same gene as the PCM. 15 Not only do PCMs represent excellent candidates for recent population-specific selection (with different alleles having exhibited differential functional importance in different environments), but they may also furnish us with new insights into the genetic basis of susceptibility to common diseases. 8,14 Here, in an attempt to identify further PCMs of interest, we have compared a dataset of human mutations of putative pathological significance with their corresponding nucleotide positions in the Neanderthal, Denisovan and chimpanzee genomes.

Methods
Human Gene Mutation Database (HGMD) dataset
A total of 46,060 disease-causing mutations (DMs) or disease-associated mutations had been obtained from the HGMD 16 (http://www.hgmd.org) as of 13th May 2010. These data comprised 44,348 missense mutations from within the coding regions of 2,628 genes, and 1,712 single base-pair substitutions from within the regulatory regions (5′ and 3′ untranslated/flanking regions) of 807 genes. Some 42,595 of the mutations were disease-causing (41,960 missense and 635 regulatory), whereas 3,465 represented disease-associated or functional polymorphisms (2,388 missense and 1,077 regulatory) (Table 1). The latter were further ascribed to three distinct subcategories: (1) disease-associated polymorphisms (DPs), comprising variants reported to be in statistically significant (p < 0.05) association with a particular human disease state but lacking experimental evidence of functionality (for example, from expression studies); (2) disease-associated polymorphisms with experimental evidence of functionality (DFPs), such as altered in vitro gene expression or protein function; (3) functional polymorphisms (FPs) that have been shown in vitro or in vivo to affect the structure, function or expression of the gene or gene product but for which no statistically significant disease association has yet been reported (see http://www.hgmd.cf.ac.uk/docs/poly.html for further information).
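As a consistency check on the counts quoted above, the following minimal Python sketch re-derives the reported totals from the four class-by-type counts. The dictionary layout, the class labels and the helper name `total` are illustrative assumptions and are not part of the HGMD.

```python
# Counts as reported in the text (HGMD, 13th May 2010).
# The keys ("DM"/"polymorphism", "missense"/"regulatory") are illustrative labels.
counts = {
    ("DM", "missense"): 41_960,
    ("DM", "regulatory"): 635,
    ("polymorphism", "missense"): 2_388,
    ("polymorphism", "regulatory"): 1_077,
}


def total(counts, klass=None, kind=None):
    """Sum the counts, optionally restricted to one class and/or one mutation type."""
    return sum(
        n
        for (c, k), n in counts.items()
        if (klass is None or c == klass) and (kind is None or k == kind)
    )


print("all mutations:", total(counts))
print("missense:", total(counts, kind="missense"))
print("regulatory:", total(counts, kind="regulatory"))
print("disease-causing:", total(counts, klass="DM"))
print("polymorphisms:", total(counts, klass="polymorphism"))
```

The five printed totals can be compared directly with the figures given in the paragraph above (46,060; 44,348; 1,712; 42,595; 3,465).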

Identification of PCMs
A total of 8,280,851 nucleotide positions at which the Denisovan genome differs from either the human (NCBI36/hg18) or chimpanzee genome were downloaded from the website of the Max Planck Institute for Evolutionary Anthropology (http://bioinf.eva.mpg.de/download/DenisovaGenome/Denisova_Neandertal_catalog.tgz). 1,4 The human and the Denisovan hominin were found to exhibit the same nucleotide at 7,283,268 positions (87.95 per cent), so that the human-chimpanzee mismatches must have arisen before the divergence of modern [...] Denisovan, Neanderthal or chimpanzee was logged in the HGMD as disease causing or disease associated in modern humans (Table 2). From the remaining 3,075,115 sites, we identified 117 sites for which the apparent wild-type nucleotide in the Denisovan or chimpanzee was logged in the HGMD as disease causing or disease associated in modern humans (Table 3).
'Human': The Denisovan nucleotide, Neanderthal nucleotide and chimpanzee nucleotide were identical to a human DM/disease-associated mutation; 'Neanderthal': The Neanderthal nucleotide was identical to the human DM/disease-associated mutation, whereas both the chimpanzee nucleotide and the Denisovan nucleotide were identical to the human wild-type nucleotide; 'Denisovan': The Denisovan nucleotide was identical to the human DM/disease-associated mutation, whereas both the chimpanzee nucleotide and the Neanderthal nucleotide were identical to the human wild-type nucleotide; 'Ancient': Both the Denisovan nucleotide and the Neanderthal nucleotide were identical to the human DM/disease-associated mutation, whereas the chimpanzee nucleotide was identical to the human wild-type nucleotide; 'Chimpanzee': The chimpanzee nucleotide was identical to the human DM/disease-associated mutation, whereas both the Neanderthal nucleotide and the Denisovan nucleotide were identical to the modern human wild-type nucleotide; 'Denisovan and chimpanzee': Both the Denisovan nucleotide and the chimpanzee nucleotide were identical to the human DM/disease-associated mutation, whereas the Neanderthal nucleotide was identical to the human wild-type nucleotide; 'Neanderthal and chimpanzee': Both the Neanderthal nucleotide and the chimpanzee nucleotide were identical to the human DM/disease-associated mutation, whereas the Denisovan nucleotide was identical to the human wild-type nucleotide. Under coding sequence, 'a/b' means that there were a total number of 'b' mutations, of which 'a' were non-synonymous mutations (there were some synonymous mutations within the coding sequence; eg CM068190, CM077900). PCM, potentially compensated mutation; DM, disease-causing mutation; DP, disease-associated polymorphism lacking functional evidence; FP, polymorphism with functional evidence but lacking a reported disease association as yet; DFP, disease-associated polymorphism with functional evidence.
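The seven category definitions given in the table legend ('Human', 'Neanderthal', 'Denisovan', 'Ancient', 'Chimpanzee', 'Denisovan and chimpanzee', 'Neanderthal and chimpanzee') amount to a lookup on the set of species carrying the human mutant allele. The following is a minimal Python sketch of that lookup, not the pipeline actually used in this study; the function name `classify_site`, the single-letter species keys and the decision to return `None` for sites that fit no category are all assumptions made for illustration.

```python
# Map the set of species carrying the human mutant allele to the legend category.
# D = Denisovan, N = Neanderthal, C = chimpanzee (illustrative keys).
CATEGORIES = {
    frozenset({"D", "N", "C"}): "Human",
    frozenset({"N"}): "Neanderthal",
    frozenset({"D"}): "Denisovan",
    frozenset({"D", "N"}): "Ancient",
    frozenset({"C"}): "Chimpanzee",
    frozenset({"D", "C"}): "Denisovan and chimpanzee",
    frozenset({"N", "C"}): "Neanderthal and chimpanzee",
}


def classify_site(mutant, wild_type, denisovan, neanderthal, chimpanzee):
    """Return the legend category for one site covered by all three genomes.

    Returns None if any genome carries a base that is neither the human mutant
    nor the human wild-type allele, or if no genome carries the mutant allele.
    """
    observed = {"D": denisovan, "N": neanderthal, "C": chimpanzee}
    if any(base not in (mutant, wild_type) for base in observed.values()):
        return None
    carriers = frozenset(sp for sp, base in observed.items() if base == mutant)
    return CATEGORIES.get(carriers)


# SLC5A1 H615Q as described in the Results: Denisovan and chimpanzee carry the
# allele that is mutant in humans (G); the Neanderthal carries the wild-type (C).
print(classify_site("G", "C", denisovan="G", neanderthal="C", chimpanzee="G"))
```

Sites covered only by the Denisovan sequence would need a two-species variant of the same lookup; that case is deliberately left out of this sketch.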
Gene ontology (GO) enrichment analysis
A GO enrichment analysis of PCM-containing genes against a background of 2,688 human disease-causing genes was performed using the DAVID bioinformatics tool. 17 The statistical significance of each GO term was calculated using Fisher's exact test, and the resulting p-values were then adjusted for multiple testing by means of the Benjamini-Hochberg correction. 18
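To make the two statistical steps concrete, the following minimal Python sketch computes a plain one-sided Fisher's exact test for over-representation (via the hypergeometric tail) and then applies the Benjamini-Hochberg adjustment. It is an illustration of the general technique only and does not reproduce DAVID's own implementation; the function names and all gene counts in the example are hypothetical, apart from the background size of 2,688 genes quoted above.

```python
from math import comb


def fisher_enrichment_p(hits, sample_size, background_hits, background_size):
    """One-sided Fisher's exact p-value, P(X >= hits), for term over-representation.

    hits            -- genes in the sample annotated with the GO term
    sample_size     -- genes in the sample (e.g. PCM-containing genes)
    background_hits -- genes in the background annotated with the GO term
    background_size -- genes in the background
    """
    max_hits = min(sample_size, background_hits)
    favourable = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, sample_size - k)
        for k in range(hits, max_hits + 1)
    )
    return favourable / comb(background_size, sample_size)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 25 sample genes tested against three GO terms.
# Each tuple is (hits in sample, genes with the term in the background).
hypothetical_terms = [(12, 400), (5, 300), (2, 250)]
raw = [fisher_enrichment_p(h, 25, b, 2688) for h, b in hypothetical_terms]
print(raw)
print(benjamini_hochberg(raw))
```

Exact integer arithmetic is used for the hypergeometric tail, so the sketch avoids floating-point underflow for backgrounds of this size.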

Calculation of Wright's fixation index (F_ST) values
The F_ST measures the proportion of genetic diversity in a subdivided population that is attributable to allele frequency differences between subpopulations. Pairwise F_ST values have also been used as a measure of genetic distance between populations. In this context, the allele frequencies of polymorphic ancestral PCMs in selected populations were obtained from HapMap (http://hapmap.ncbi.nlm.nih.gov/) and pairwise F_ST values were estimated for each polymorphism using the small sample estimate proposed by Weir and Hill. 19 The significance of individual F_ST values was then assessed by reference to the empirical distribution of F_ST among all single nucleotide polymorphisms (SNPs) in HapMap.
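The idea behind a pairwise F_ST and its empirical significance can be sketched as follows. This is deliberately not the Weir and Hill small sample estimate used in the study: it is the basic Wright definition, (H_T - H_S)/H_T, for a biallelic site in two equally weighted populations with no sample-size correction. The function names and the toy values in the example are hypothetical.

```python
def wright_fst(p1, p2):
    """Basic Wright F_ST for one biallelic site from two allele frequencies.

    H_T is the expected heterozygosity of the pooled population and H_S the
    mean expected heterozygosity within the two populations (equal weights).
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        # Same allele fixed in both populations: no diversity to partition.
        return 0.0
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def empirical_p(observed, background):
    """Fraction of background F_ST values at least as large as the observed one."""
    return sum(1 for value in background if value >= observed) / len(background)


# Toy example: one site with allele frequencies 0.1 and 0.9, compared with a
# small made-up background distribution standing in for genome-wide SNPs.
fst = wright_fst(0.1, 0.9)
print(fst, empirical_p(fst, [0.0, 0.1, 0.2, 0.7]))
```

In practice the background would be the F_ST values of all HapMap SNPs for the same population pair, as described above.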

Results and discussion
Identification of PCMs in the Denisovan, Neanderthal and/or chimpanzee genomes
A total of 44,348 missense mutations from 2,628 genes and 1,712 putative regulatory mutations from 807 genes, which have been recorded in the HGMD as being either causative of (or associated with) a human inherited disease state, were cross-compared with the corresponding nucleotide positions in the Neanderthal, Denisovan and chimpanzee genomes. When the 197 PCMs covered by both the Denisovan and the Neanderthal sequences were considered, these included 129 of 143 PCMs identified in the Neanderthal genome (10/12 DMs, 65/73 DPs, 25/26 FPs, 29/32 DFPs), and 123 (62 per cent) PCMs for which the Denisovan, Neanderthal and chimpanzee wild-type nucleotides were identical to the human disease-causing/disease-associated mutant allele. Of the 117 PCMs covered only by the Denisovan sequence, there were 79 (67.5 per cent) for which both the Denisovan nucleotide and the chimpanzee nucleotide were identical to a human DM/disease-associated mutation. This may be indicative of either a bottleneck effect or selection during the evolution of the modern human lineage. Of the 197 PCMs, there was one mutation that was compensated only in the Neanderthal, one that was compensated only in the Denisovan, five that were compensated in both Neanderthal and Denisovan and 16 that were compensated only in the chimpanzee. There were also 18 mutations that differed between the Neanderthal and the Denisovan, which could imply that such mutations were identical-by-state (Tables 2 and 3).

Disease-causing PCMs
There were 16 human DMs that were found to be potentially compensated in the chimpanzee, Denisovan or Neanderthal (covered by both the Neanderthal and the Denisovan sequence) and 12 human DMs potentially compensated in the chimpanzee or Denisovan (covered only by the Denisovan sequence) (Table 4).
Of the human DMs that were potentially compensated in the chimpanzee, Denisovan or Neanderthal, only the putatively pathological F5 variant was specific to the Denisovan. In humans, this missense mutation, Val1736Met, is associated with activated protein C resistance and an increased risk of thromboembolism and recurrent miscarriage. 20,21 It is unclear at this juncture whether this variant was indeed a PCM in the Denisovan or whether it could instead have been associated with disease in this archaic hominin.
Even though Denisovans appear to be more closely related to Neanderthals than to humans, the Neanderthal and Denisovan were discrepant with respect to certain PCMs (eg the SLC5A1 H615Q variant associated with glucose-galactose malabsorption). In this case, the Denisovan (and the chimpanzee) possessed the allele that was mutant in humans (G), whereas the Neanderthal possessed the allele (C) which was wild-type in humans. In this context, it may be pertinent to mention that SLC5A1 is located on chromosome 22q12.3 within a region of putative gene flow from Neanderthal to Eurasian. 1
Some of the PCMs listed in Table 4 may well have been misclassified by the original authors as disease-causing in humans (especially those variants which have been allocated a '?' by the HGMD; see Table 4) when they were actually neutral polymorphisms; however, this is much less likely in the case of the 16 human disease-causing mutations that are covered by both the Neanderthal and Denisovan sequences. These mutant alleles would have had to have been maintained in both Neanderthal and Denisovan populations for the 640,000 years since these two hominins last shared a common ancestor, and this would have been unlikely if such variants had been neutral polymorphisms.
Statistically enriched GO terms were identified for genes containing human DMs identified as PCMs (Table 4) against a background of known disease-causing genes (from the HGMD) and are shown in Table S1. Five significantly enriched GO terms were found; all relate to the plasma membrane.
With respect to the DPs/FPs, 100 DPs, 39 FPs and 43 DFPs were covered by both the Neanderthal and Denisovan sequences (Table S2), while 52 DPs, 26 FPs and 27 DFPs were covered by the Denisovan but not the Neanderthal sequence (Table S3); these DPs/FPs may be relevant to human genetic disease.
Human variants with significantly different population frequencies at sites of PCMs
The F_ST was used to quantify the allele frequency differences for the different polymorphic PCMs between extant African, Asian and European populations. Alleles that have been the target of localised positive selection tend to exhibit unusually high F_ST values. 22 (Table 5).
Although four of these PCMs had already been identified in our previous comparative analysis of the human, chimpanzee and Neanderthal genomes, 10 two novel PCMs were identified in the putative cation exchanger SLC24A5 (DP) gene and in the alcohol dehydrogenase ADH1B (FP) gene. These genes have in common the GO terms GO:0046872, GO:0043169 and GO:0043167, terms which relate to metal ion binding, cation binding and ion binding, respectively. The SLC24A5 variant appears to be associated with increased skin pigmentation and predominates in African/East Asian populations. 25,26
In conclusion, using the newly reported genome sequence from a Denisovan hominin, we have identified a number of PCMs in the chimpanzee, Neanderthal and Denisovan. Those human PCMs that were ancestral (ie both the Denisovan nucleotide and the chimpanzee nucleotide were identical to the human DM/disease-associated mutation) could potentially be indicative of either the human lineage-specific loss of compensatory nucleotide changes within the respective genes carrying the PCM, or adaptive differences between modern humans and Denisovans.