Exploring the potential relevance of human-specific genes to complex disease

Although human disease genes generally tend to be evolutionarily more ancient than non-disease genes, complex disease genes appear to be represented more frequently than Mendelian disease genes among genes of more recent evolutionary origin. It is therefore proposed that the analysis of human-specific genes might provide new insights into the genetics of complex disease. Cross-comparison with the Human Gene Mutation Database (http://www.hgmd.org) revealed a number of examples of disease-causing and disease-associated mutations in putatively human-specific genes. A sizeable proportion of these were missense polymorphisms associated with complex disease. Since both human-specific genes and genes associated with complex disease have often experienced particularly rapid rates of evolutionary change, due either to weaker purifying selection or to positive selection, it is proposed that a significant number of human-specific genes may play a role in complex disease.

Human 'disease genes' have been known for some time to differ significantly from 'non-disease genes' in terms of their higher degree of evolutionary conservation. 1-3 Further, with respect to their evolutionary age, human disease genes appear not to be simply a random subset of all genes in the genome but are instead biased toward being of ancient (early metazoan) origin. 4 Concomitantly, a pronounced paucity of human lineage-specific genes is also evident among disease genes. 4 These initial findings were subsequently confirmed and elaborated upon by Cai et al., 5 who determined the approximate age of evolutionary emergence of all human genes and then compared disease genes with non-disease genes with respect to whether they were 'young', 'middle-aged' or 'old-aged'. For the purposes of their study, the origin of a given gene was determined by retracing its orthologues back to the species most distantly related to human. Genes that originated after the adaptive radiation of the Laurasiatheria were described as 'young', the term 'middle-aged' was employed to describe those genes whose origin went back to the bony fish, and genes that emerged at some stage between yeast and Ciona (a tunicate) were termed 'old-aged'. Using these fairly crude descriptors of gene age, Cai et al. 5 confirmed that Mendelian disease genes (ie those genes underlying single gene disorders) tend to be of more ancient evolutionary origin than non-disease genes. Among Mendelian disease genes, 'old-aged' genes were in the majority, closely followed by 'middle-aged' genes. By contrast, most genes involved in the aetiology of complex disease were found to reside in the 'middle-aged' category.
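The age-assignment logic described above (find the species most distantly related to human that still carries an orthologue, then bin the gene by that species) can be sketched as follows. This is a minimal illustration under stated assumptions, not Cai et al.'s pipeline: the species ladder, the exact placement of the boundaries between the three classes, and the function names are all hypothetical, and a real analysis would also have to cope with orthologue-calling errors.

```python
from typing import Iterable, Optional

# HYPOTHETICAL, simplified species ladder, ordered from the closest to the most
# distant relative of human. The species and the band boundaries are
# illustrative assumptions, not the species set used by Cai et al.
TAXA = [
    "chimpanzee", "macaque", "mouse", "dog",    # 'young': back to Laurasiatheria
    "opossum", "chicken", "frog", "zebrafish",  # 'middle-aged': back to bony fish
    "ciona", "fruit_fly", "yeast",              # 'old-aged': Ciona back to yeast
]

# Index of the most distant taxon in each of the two younger age bands.
YOUNG_LIMIT = TAXA.index("dog")
MIDDLE_LIMIT = TAXA.index("zebrafish")


def most_distant_orthologue(species: Iterable[str]) -> Optional[str]:
    """Return the ladder taxon most distantly related to human in which an
    orthologue was found, or None if no orthologue was found at all."""
    present = set(species)
    hits = [taxon for taxon in TAXA if taxon in present]
    return hits[-1] if hits else None


def gene_age(species: Iterable[str]) -> str:
    """Assign a crude age class from the most distant orthologue."""
    taxon = most_distant_orthologue(species)
    if taxon is None:
        # No orthologue anywhere on the ladder: treated here as the youngest
        # class (an assumption; such genes are candidate human-specific genes).
        return "young"
    rank = TAXA.index(taxon)
    if rank <= YOUNG_LIMIT:
        return "young"
    if rank <= MIDDLE_LIMIT:
        return "middle-aged"
    return "old-aged"


print(gene_age(["chimpanzee", "mouse", "zebrafish"]))  # middle-aged
print(gene_age(["chimpanzee", "yeast"]))               # old-aged
```

Only the most distant hit matters in this scheme, so a gene lost in some intermediate lineage is still dated by its deepest surviving orthologue.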
Although both Mendelian and complex disease genes were found to be under-represented in the 'young' category, the frequency of complex disease genes in this category was found to be more than twice that exhibited by the Mendelian disease genes. 5 Given this finding, we speculated that closer examination of the most recently acquired (ie human-specific) genes might well provide new insights into the genetics of complex disease.
Considerable efforts have been made to identify those genes that have been inactivated in the human lineage but which are still present in other higher primate species, including chimpanzee. 6-9 However, somewhat less attention has so far been paid either to human genes of relatively recent origin that are specific to the human lineage 10,11 or to those genes that have been retained in the human genome despite having been lost in other primate species. 12 The first reported attempts to identify putative human-specific gene duplications were those of Fortna et al. 13 and Cheng et al. 14 The dataset of Cheng et al. comprised 88 complete gene duplications that were considered to have occurred since the divergence of human and chimpanzee. Of these, 13 have been reported in association with human inherited disease (see the Human Gene Mutation Database (HGMD) 15). These are the genes encoding Fc fragment of IgG, high affinity Ia, receptor (FCGR1A), DEAD box 1 protein (DDX11), the fusion of exons 5-10 of the cholinergic nicotinic acetylcholine receptor alpha 7 subunit with exons A-E of family with sequence similarity 7A (CHRFAM7A), aryl sulphotransferase (SULT1A1), CC chemokine ligand 4-like 1 (CCL4L1), killer cell immunoglobulin-like receptor, three domains, long cytoplasmic tail, 1 (KIR3DL1), mammalian STE20-like kinase 1 (MST1), dopamine receptor D5 (DRD5), succinate dehydrogenase complex, subunit A (SDHA), survival of motor neuron 2 (SMN2), general transcription factor 2-I repeat domain-containing protein 2 (GTF2IRD2), neutrophil cytosolic factor 1 (NCF1), and aquaporin 7 (AQP7). Although Cheng et al. 14 presented expression data to support their claim that their 88 gene duplications involved functional duplicated gene copies, a question mark has remained over whether some of these genes may actually represent pseudogenes.
Probably the most reliable dataset of human-specific gene duplications so far produced is that of Itan et al. 16 These workers identified 138 human-specific complete gene duplications that appear to have occurred since the divergence of human and chimpanzee. Four of these human-specific genes are listed in the HGMD as having been reported in association with human inherited disease (Table 1). Indeed, these genes have been shown to harbour a number of different disease-causing mutations (DMs; including missense mutations and copy number variations) or disease-associated polymorphisms that may confer increased risk of a given disease state. In two of the four cases, the reported disease association was between a disease-associated polymorphism and a complex disease phenotype (ie susceptibility to infectious disease).
Intuitively, it might be supposed that those human genes that are present in more than one copy, as a consequence of a human lineage-specific increase in copy number, are better protected against the consequences of mutation in their functional parent genes by virtue of their newly acquired genetic redundancy. 27,28 Using fairly stringent selection criteria, at least 27 human genes have so far been identified as having experienced a human lineage-specific increase in copy number. 13,29-32 Contrary to expectation, nine of these genes are listed in the HGMD: AQP7, cadherin 12, type 2 (CDH12), CHRFAM7A, DRD5, FCGR1A, GTF2IRD2, neuronal apoptosis inhibitory protein (NAIP), NCF1 and occludin (OCLN). However, only five of them (FCGR1A, GTF2IRD2, NAIP, NCF1, OCLN) have been reported to harbour mutations that actually cause inherited disease (ie DMs; Table 2). It would be of considerable interest to ascertain (retrospectively) whether the particular patients in whom these mutations were described do indeed harbour extra functional copies of the relevant genes, or whether these mutations have come to clinical attention precisely because the disease genes are effectively single copy in these particular individuals. 27,28 An alternative explanation could, however, involve the loss of genetic redundancy through post-duplication functional divergence of the gene copies, leading to diversification through sub- or neo-functionalisation. 42-45 It is premature to speculate as to which of these postulates might explain the observation that five of the nine genes listed in Table 2 can harbour clinically important mutations.
Another approach to this issue is to identify a set of human genes whose orthologues have been lost from the chimpanzee genome and then to ascertain how many of these genes are known to be involved in human inherited disease. The initial analysis of the chimpanzee genome (Chimpanzee Sequencing and Analysis Consortium, 2005) identified a total of 55 genes that are present in human but have been lost (or irrevocably disrupted) in chimpanzee. It should be noted that these genes are not necessarily bona fide human-specific genes: although they are absent from the chimpanzee genome, they may still be present in the genomes of other higher primates. This notwithstanding, cross-checking with the HGMD revealed that eight of these genes are known to be associated with either human inherited disease or disease susceptibility (Table 3). In seven of these eight cases, the reported disease association was between a disease-associated polymorphism and a complex disease phenotype (susceptibility to infectious disease or autoimmune disease). Although the numbers involved are clearly small, there would appear to be at least a tendency for human-specific genes to harbour a greater-than-expected number of polymorphisms associated with complex disease by comparison with the number of mutations causing Mendelian disease (bearing in mind that disease-associated polymorphisms constitute only a small minority of the lesions logged in the HGMD; see legend to Table 3).
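One way to make the 'greater-than-expected' comparison concrete is a one-sided binomial tail probability. The sketch below uses the counts given in the text (eight HGMD-listed genes, seven of them listed via a complex-disease-associated polymorphism), but the background rate is a hypothetical placeholder, since no per-gene background figure is given here; the function name and the value of `p_background` are illustrative only.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Counts from the text: of the genes lost in chimpanzee, 8 are listed in the
# HGMD, and 7 of these via a complex-disease-associated polymorphism.
n_listed = 8
n_complex = 7

# HYPOTHETICAL background probability that an HGMD-listed gene is listed via a
# complex-disease-associated polymorphism. Illustrative assumption only; this
# is not a figure taken from the HGMD.
p_background = 0.1

observed_fraction = n_complex / n_listed
p_value = binomial_upper_tail(n_complex, n_listed, p_background)
print(f"observed fraction: {observed_fraction:.3f}")  # 0.875
print(f"P(X >= {n_complex}) = {p_value:.2e}")         # 7.30e-07
```

The binomial model treats each listed gene as an independent draw, which is doubtful for these data: BTNL2, HCP5 and MICA lie together at 6p21.3 and were probably lost in chimpanzee as a single region. The HGMD statistic quoted above also refers to lesions rather than genes, so any real background rate would need to be defined on the same per-gene basis.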
Another intriguing finding emerges if Tables 1-3 are considered together: NAIP, OCLN and SMN2 are all located at 5q13.2; GTF2IRD2 and NCF1 are both located at 7q11.23; and butyrophilin-like 2 (BTNL2), HLA complex P5 (HCP5) and MHC class I polypeptide-related sequence A (MICA) are all located at 6p21.3. The most likely explanation for the clustering of the human-specific genes at 5q13.2 and 7q11.23 is that both of these chromosomal regions became duplicated specifically in the human lineage. On the other hand, the clustering of the BTNL2, HCP5 and MICA genes at 6p21.3 is likely to be due to the loss of this chromosomal region in chimpanzee. The clustering of human-specific genes at 5q13.2 has been noted previously. 13 The present data, however, suggest that at least two additional genomic regions (7q11.23 and 6p21.3) contain clusters of human lineage-specific genes.
Abbreviations: Chrom. loc., chromosomal localisation; CNV, copy number variant; dbSNP, the Single Nucleotide Polymorphism database; DP, disease-associated polymorphism in statistically significant association with a particular disease state but lacking experimental evidence of functionality.
Table 2. Human genes identified as having experienced a human lineage-specific increase in copy number and which harbour mutations causing, or associated with, inherited disease. The nine genes listed were taken from a total of 27 human genes identified as having experienced a human lineage-specific increase in copy number. 13
Despite the above cited examples, there still appear to be relatively few 'young' (human-specific) genes among human disease genes. It may therefore be inferred that only a few young genes perform functions of sufficient importance for their mutation to come immediately to clinical attention. This view is supported by the observation that younger human, primate and mammalian genes tend to evolve more rapidly, and are subject to weaker purifying selection, than their more ancient counterparts. 45,60-62 If it is assumed that the underlying mutation rate does not differ markedly between these categories of gene, there are essentially two potential explanations for very young genes experiencing a particularly rapid rate of evolutionary change: weaker purifying selection or an increase in positive selection. The available evidence suggests that both explanations are likely to pertain. 62-65 The majority of newly duplicated genes experience a period of relaxed purifying selection, whereas only a relatively small proportion exhibit a signature of positive selection consequent to the acquisition of new biological functions. 66 Presumably, in between the initial post-duplicational redundancy and eventual neofunctionalisation through genetic divergence, there is a 'half-way house' state in which a gene is relatively free to explore the acquisition of new functions while still being constrained to some extent by selection against the loss of those functions it already possesses. 67 Whereas mutant alleles responsible for single gene disorders are usually under negative selection, alleles associated with complex disease appear either to have been under much less stringent purifying selection or even to have been subject to positive selection. 3,68,69 Genes that have experienced positive selection in the human lineage are likely to be characterised by human-specific functional adaptations.
Since genes that have been subject to positive selection during human evolution have frequently also been implicated in disease, 70 we may surmise that the underlying pathological mutations may sometimes have interfered with these newly acquired functions. Consistent with this interpretation, Lappalainen et al. 71 have reported a statistically significant correlation between those regions of the human genome that have experienced recent positive selection in northern European populations and those regions that have been implicated in complex disease. This observation is also compatible with the tendency observed by the present authors for human-specific genes to harbour a disproportionate number of polymorphic variants associated with complex disease.
In terms of their expression profiles, there appears to be a tendency for rapidly evolving younger genes to be tissue-specific, whereas the more slowly evolving older genes are more broadly expressed. 72-76 It remains to be seen whether complex disease genes differ from Mendelian disease genes in terms of their expression characteristics.
More generally, it has been proposed that 'taxonomically restricted genes' may play a role in the generation of morphological diversity, thereby enabling organisms to adapt to changing environmental conditions. 77 If this is also true for the human lineage-specific genes discussed here, it follows that the acquisition of mutations in these genes may well reduce an individual's capacity to deal with a rapidly changing environment. 78 This chimes well with the (essentially unrelated) idea that some forms of common disease susceptibility may be a consequence of ancient human adaptations to a long-term stable environment ('thrifty alleles'). These ancestral alleles may now increase the risk of common disease in a changed environment consequent to the recent shift to a modern lifestyle. 79 The present authors therefore suggest that it would be a worthwhile exercise to construct a complete lexicon of human-specific genes, since these loci may well provide a happy hunting ground for those seeking to identify genes that play a key role in complex disease.