Further statistical analysis for genome-wide expression evolution in primate brain/liver/fibroblast tissue
© Henry Stewart Publications 2004
Received: 18 December 2003
Accepted: 18 December 2003
Published: 1 May 2004
In spite of only a 1-2 per cent genomic DNA sequence difference, humans and chimpanzees differ considerably in behaviour and cognition. Affymetrix microarray technology provides a novel approach to addressing a long-term debate on whether the difference between humans and chimpanzees results from the alteration of gene expressions. Here, we used several statistical methods (distance method, two-sample t-tests, regularised t-tests, ANOVA and bootstrapping) to detect the differential expression pattern between humans and great apes. Our analysis shows that the pattern we observed before is robust against various statistical methods; that is, the pronounced expression changes occurred on the human lineage after the split from chimpanzees, and that the dramatic brain expression alterations in humans may be mainly driven by a set of genes with increased expression (up-regulated) rather than decreased expression (down-regulated).
Keywords: microarray, Affymetrix, differential expression, human evolution
Comparison of the human genome with those of closely related species, as well as distantly related species, has provided a better understanding of human evolution. Well before the era of modern molecular biology, great apes (chimpanzees, pygmy chimpanzees and gorillas) had already been recognised as humans' closest relatives. The divergence between human and chimpanzee is estimated to have occurred only 4.6-6.2 million years ago, based on the sequences of autosomal intergenic non-repetitive DNA in human, chimpanzee, gorilla and orangutan. Humans have nevertheless evolved considerable differences from chimpanzees in morphological appearance, behaviour, language and cognition, as well as in disease susceptibility (eg susceptibility to human immunodeficiency virus). Yet only about a 1.2 per cent difference [2–4], or up to a 5 per cent difference when insertions and deletions are included, appears in their genomic DNA sequences. This striking observation raises an interesting question concerning the genetic basis for the difference between humans and chimpanzees. The long-standing hypothesis of gene expression alteration remains attractive but still calls for hard evidence.
Recently, Enard et al. studied the gene expression patterns across several primate species by using microarray technologies. Their analysis suggested that it is the brain rather than the liver that has undergone a dramatic expression change in the human lineage compared with the chimpanzee lineage. The original work did not include an appropriate statistical assessment, however, so it is difficult to rule out other explanations, including random effects. We addressed this statistical issue using a standard two-sample t-test, showing that enhanced gene expression change in the human lineage, relative to the chimpanzee lineage, was indeed found only in the brain tissue and not in the liver tissue. Moreover, we found that, in the brain, induced gene expression alterations (up-regulation) in the human lineage are more frequent than reduced gene expression (down-regulation). Taken together, these studies [6, 7] not only support the long-standing notion that the difference for humanity lies in gene expression, but also have important implications for the evolution of the human brain.
Because of these important implications, we recognised that the statistical methodology needed to be re-examined carefully, owing to the very small sample size and the very large number of genes (simultaneous null hypotheses) used in the previous studies. In this paper, we applied several statistical methods to test whether the differential expression patterns across primates in the brain and liver tissues are essentially congruent across these tests. Moreover, as an independent dataset from primate fibroblast tissue was available, we incorporated it into our statistical analysis.
Materials and methods
Gene expression data
We revisited the Affymetrix U95A array data reported by Enard et al. (2002), which are available at http://www.email.eva.mpg.de/'khaitovi/supplement1.html. The array, which contains oligonucleotides from approximately 12,600 human genes, was hybridised to the brain and liver tissues of human (Homo sapiens), chimpanzee (Pan troglodytes) and orangutan (Pongo pygmaeus). The brain and liver tissues from three adult male humans, three adult male chimpanzees and one adult male orangutan were collected, and two independent isolations of RNA were performed for each individual and analysed independently. The detailed experimental procedures are given in the original work.
In addition, gene expression data from cultured fibroblasts of 18 humans, ten pygmy chimpanzees (Pan paniscus) and 11 gorillas (Gorilla gorilla), measured by Affymetrix U95Av2, were obtained from http://www.genome.org/cgi/content/full/13/7/1619/DC1.
Measurement of gene expression
The Bioconductor package affy (http://www.bioconductor.org), written in the R language, was used to read the primate Affymetrix data and convert probe level data to probe set (gene) level expression measurements. After background adjustment, normalisation and logarithm transformation, a gene-specific robust multichip average (RMA) measurement was used to represent the level of gene expression for each gene under certain analysis conditions. Furthermore, because the variation of expression measures between samples from the same individual is small, the average of the duplicates from one individual was used to represent the measure of expression for that individual.
Branch length analysis
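- (1) Absolute distance, which is the sum of all absolute differences between gene expression levels in two individuals, ie $d_{ij} = \sum_k |x_{ik} - x_{jk}|$, where $x_{ik}$ is the expression level of gene $k$ in individual $i$
- (2) Euclidean distance, $d_{ij} = \sqrt{\sum_k (x_{ik} - x_{jk})^2}$
- (3) Scaled Euclidean (statistical) distance, $d_{ij} = \sqrt{\sum_k (x_{ik} - x_{jk})^2 / s_k^2}$
where $s_k^2$ is the estimation of variance for the kth gene.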
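As an illustration only, the three distance measures described above can be sketched in Python. The function names and the toy expression values are hypothetical and are not taken from the original analysis; the per-gene variances are assumed to be supplied by the user.

```python
import math

def absolute_distance(x, y):
    """Sum of absolute per-gene differences between two individuals."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean_distance(x, y):
    """Square root of the summed squared per-gene differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def scaled_euclidean_distance(x, y, variances):
    """Euclidean distance with each squared difference divided by
    the estimated variance of that gene (assumed positive)."""
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, y, variances)))

# Hypothetical log-scale expression levels for three genes in two individuals.
individual_1 = [1.0, 2.0, 3.0]
individual_2 = [2.0, 4.0, 3.0]
gene_variances = [1.0, 4.0, 1.0]

d_abs = absolute_distance(individual_1, individual_2)
d_euc = euclidean_distance(individual_1, individual_2)
d_scaled = scaled_euclidean_distance(individual_1, individual_2, gene_variances)
```

Scaling by the gene-specific variance keeps a few highly variable genes from dominating the distance, which is presumably why the scaled version is also called the statistical distance.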
Testing for genes with differential expression between humans and chimpanzees
We have tested whether differences in expression among primate species are constant across different tissue types. The first step is to identify genes with an altered level of expression between species for each tissue type. For example, between humans and chimpanzees, the null hypothesis is H0: μHk = μCk, where μHk is the population mean of expression levels of gene k in the human species, and μCk is the population mean of expression levels of gene k in the chimpanzee species in a specific tissue (brain, liver or fibroblasts). Since our dataset was limited in the number of replicates, several statistical testing methods were applied to eliminate the potential bias due to violation of underlying assumptions.
In our earlier report, we adopted a standard two-sample t-test to detect genes with differential expression between humans and chimpanzees. For a given gene k, under the assumption of normality and independence of gene expression levels, the t-statistic follows a Student's t-distribution. A p value (the probability of seeing results as extreme as, or more extreme than, those actually observed if the null hypothesis were true) can be obtained by comparing the calculated t-statistic with a standard t-distribution. Given a certain level of significance (eg α = 5 per cent), one can declare a gene to be significantly differentially expressed in two species if the p value for this gene is less than α.
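A minimal Python sketch of such a test for a single gene is given below. It assumes the pooled-variance (equal-variance) form of the two-sample t-test, which is one common reading of 'standard'; the expression values are hypothetical. Because the Python standard library has no t-distribution function, the two-sided p value is obtained here by numerically integrating the t density with Simpson's rule, which is an implementation convenience rather than part of the original analysis.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(sample_1, sample_2):
    """Return (t, degrees of freedom) for the equal-variance two-sample t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample_1) - mean(sample_2)) / standard_error, df

def t_density(x, df):
    """Density of Student's t-distribution (suitable for small to moderate df)."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=2000):
    """Two-sided p value: 1 - 2 * integral of the density from 0 to |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)

# Hypothetical expression levels of one gene in three humans and three chimpanzees.
human = [8.1, 8.3, 8.2]
chimpanzee = [7.5, 7.6, 7.4]
t_stat, dof = pooled_t_statistic(human, chimpanzee)
p_value = two_sided_p_value(t_stat, dof)
significant = p_value < 0.05
```

With only three individuals per species the test has four degrees of freedom, so the variance estimate in the denominator is itself very noisy.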
Regularised t-test (Cyber-T) under a Bayesian framework
We are fully aware that oligonucleotide array experiments were replicated only a few times in this study, so that the empirical sample variance used in the t-statistic may be a poor estimator of the true variance among individuals within a species. Thus, a Bayesian probabilistic framework was applied to improve the variance estimation, which produced a regularised t-test. In addition to the empirical sample variance estimated from real observations, prior background variation for several 'pseudo-observations' (ie the local average of the variances for genes showing similar expression levels) was also taken into account. A regularised t-test was then performed by replacing the empirical sample variance in the previous t-test with the posterior variance estimator, as implemented in the Cyber-T software package (http://genomics.biochem.uci.edu/genex/cybert/).
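The idea of the regularised variance can be sketched as follows. This is not the Cyber-T code: it assumes that the posterior variance takes the form (ν0σ0² + (n − 1)s²)/(ν0 + n − 2), where s² is the empirical sample variance, σ0² the background variance and ν0 the number of pseudo-observations, and it assumes that the background variance is a simple moving average over genes ranked by mean expression. The window size, the value of nu0 and all numbers are hypothetical.

```python
import math

def local_background_variance(means, variances, window=3):
    """Average the variances of genes with neighbouring mean expression.

    Genes are ranked by mean expression and each gene receives the average
    variance of the genes within window // 2 ranks of it (itself included).
    """
    order = sorted(range(len(means)), key=lambda g: means[g])
    half = window // 2
    background = [0.0] * len(means)
    for rank, gene in enumerate(order):
        lo = max(0, rank - half)
        hi = min(len(order), rank + half + 1)
        neighbours = [variances[order[r]] for r in range(lo, hi)]
        background[gene] = sum(neighbours) / len(neighbours)
    return background

def regularised_variance(sample_var, n, background_var, nu0):
    """Assumed posterior variance combining n real and nu0 pseudo-observations."""
    return (nu0 * background_var + (n - 1) * sample_var) / (nu0 + n - 2)

def regularised_t(mean_1, var_1, n1, mean_2, var_2, n2):
    """t-statistic with the regularised variances in place of sample variances."""
    return (mean_1 - mean_2) / math.sqrt(var_1 / n1 + var_2 / n2)

# Hypothetical per-gene means and variances within one species.
gene_means = [5.0, 9.0, 7.0]
gene_variances = [0.1, 0.3, 0.2]
background = local_background_variance(gene_means, gene_variances, window=3)
posterior = regularised_variance(sample_var=0.01, n=3, background_var=0.04, nu0=10)
```

When a gene's sample variance happens to be very small by chance, the background term pulls it towards the variance typical of genes with similar expression, which reduces spuriously large t-statistics.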
ANOVA and bootstrap
The expression levels were described by a linear mixed model, $Y_{ijk} = \tau_i + \gamma_k + (\tau\gamma)_{ik} + s_{j(i)} + \varepsilon_{ijk}$, where $Y_{ijk}$ is the expression level from species i (i = 1, 2, 3 for human, chimpanzee and orangutan); individual j (j = 1, 2, 3 for human and chimpanzee and j = 1 for orangutan); and gene k (k = 1, 2, ..., 12,600). Two random terms are included in this linear model: $\varepsilon_{ijk}$ are random errors and $s_{j(i)}$ are random subject effects accounting for variation within species. We assume that the means of $s_{j(i)}$ and $\varepsilon_{ijk}$ are both equal to zero, and that the variances of $s_{j(i)}$ and $\varepsilon_{ijk}$ are $\sigma_s^2$ and $\sigma^2$, respectively. For the fixed (non-random) terms, $\tau_i$ represents the additive effect due to the ith species that is common to all genes; $\gamma_k$ represents the additive effect due to the kth gene that is common to all species; and the interaction terms $(\tau\gamma)_{1k}$ and $(\tau\gamma)_{2k}$ allow the effect of the kth gene to vary with species (the subscript being '1' for human and '2' for chimpanzee), such that we consider genes with a non-zero interaction contrast $(\tau\gamma)_{1k} - (\tau\gamma)_{2k}$ to be differentially expressed between human and chimpanzee. A bootstrapping approach was used to test the hypothesis $(\tau\gamma)_{1k} - (\tau\gamma)_{2k} = 0$.
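The following Python sketch illustrates one simplified way of bootstrapping the interaction contrast (τγ)1k - (τγ)2k. It is not the procedure used in this study. It assumes that the contrast for gene k is estimated by the difference between the two species' gene means after each has been centred on its species-wide mean, and that the null distribution can be approximated by resampling pooled, rescaled residuals. Subject effects and the variation of the species-wide means are ignored, and all data values are hypothetical.

```python
import random
from statistics import mean

def interaction_contrasts(human, chimp):
    """Per-gene estimates of the human-minus-chimpanzee interaction contrast.

    Each argument is a list of individuals; each individual is a list of
    expression levels, one per gene, in the same gene order.
    """
    n_genes = len(human[0])
    h_gene = [mean(ind[k] for ind in human) for k in range(n_genes)]
    c_gene = [mean(ind[k] for ind in chimp) for k in range(n_genes)]
    h_all, c_all = mean(h_gene), mean(c_gene)
    return [(h_gene[k] - h_all) - (c_gene[k] - c_all) for k in range(n_genes)]

def rescaled_residuals(group):
    """Deviations from each gene's species mean, inflated by sqrt(n / (n - 1))."""
    n = len(group)
    scale = (n / (n - 1)) ** 0.5
    residuals = []
    for k in range(len(group[0])):
        gene_mean = mean(ind[k] for ind in group)
        residuals.extend(scale * (ind[k] - gene_mean) for ind in group)
    return residuals

def bootstrap_p_values(human, chimp, n_boot=1000, seed=0):
    """Two-sided bootstrap p value for each gene's contrast."""
    rng = random.Random(seed)
    observed = interaction_contrasts(human, chimp)
    pool = rescaled_residuals(human) + rescaled_residuals(chimp)
    null = []
    for _ in range(n_boot):
        h = mean(rng.choice(pool) for _ in range(len(human)))
        c = mean(rng.choice(pool) for _ in range(len(chimp)))
        null.append(abs(h - c))
    return [(1 + sum(d >= abs(obs) for d in null)) / (n_boot + 1) for obs in observed]

# Hypothetical data: three individuals per species, three genes.
human = [[8.0, 5.0, 6.0], [8.2, 5.1, 6.1], [8.1, 4.9, 5.9]]
chimp = [[6.0, 5.0, 6.0], [6.1, 5.1, 6.1], [5.9, 4.9, 5.9]]
contrasts = interaction_contrasts(human, chimp)
p_values = bootstrap_p_values(human, chimp)
```

Adding one to both the numerator and the denominator keeps the estimated p value strictly positive, so a finite number of bootstrap samples never reports a p value of exactly zero.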
Predicting the phylogenetic location and the trend of expression change between human and chimpanzee (or pygmy chimpanzee) in brain, liver and fibroblast tissues
Overall expression changes in humans and chimpanzees: More changes detected in the brain
We mapped the change in the level of expression between human and chimpanzee onto the phylogenetic tree, where the branch length for each species, ie bH, bC or bO (or bG), was obtained using the least squares method given the pairwise distance matrices for 12,600 genes in different individuals (see Figure 1a, and the Methods section). Here, the branch length for each species can be interpreted as a measure of the overall alteration in gene expression that has occurred in that lineage. In particular, the ratio of the expression changes that have occurred in the human lineage to those that have occurred in the chimpanzee lineage (bH/bC) in the brain and the liver can serve as an important indicator of the alterations since the human-chimpanzee split. The branch ratios in brain tissue and liver tissue were estimated to be 1.95 and 1.01, respectively, using the absolute distance; 1.87 and 1.07, respectively, using the Euclidean distance; and 1.75 and 1.02, respectively, using the scaled Euclidean distance. Moreover, 1,000 bootstrap samples of the 12,600 genes confirmed that the ratio estimation is largely reliable (Figure 1b-1d). Consistent with the conclusion of Enard et al., the analysis of overall expression suggested that expression changes occurred more frequently in the human lineage than in the chimpanzee lineage in brain tissue, although this is not the case in liver tissue. In the independent analysis in fibroblasts, the estimated ratio of expression change in the human lineage to that in the chimpanzee lineage is about 1.3, which is higher than that in the liver but lower than that in the brain (data not shown).
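The branch ratio and its bootstrap can be illustrated with the Python sketch below. It makes simplifying assumptions that may differ from the analysis reported here: distances between species are taken as averages over all pairs of individuals, only the absolute distance is used, and the branch lengths come from the closed-form solution for a three-species tree, in which the three branch lengths reproduce the three pairwise distances exactly. All expression values are hypothetical.

```python
import random

def branch_lengths(d_hc, d_ho, d_co):
    """Branch lengths of a three-species tree from its pairwise distances."""
    b_h = (d_hc + d_ho - d_co) / 2
    b_c = (d_hc + d_co - d_ho) / 2
    b_o = (d_ho + d_co - d_hc) / 2
    return b_h, b_c, b_o

def mean_absolute_distance(group_a, group_b, genes):
    """Absolute distance over the given gene indices, averaged over all
    pairs formed by one individual from each group."""
    total = 0.0
    for a in group_a:
        for b in group_b:
            total += sum(abs(a[k] - b[k]) for k in genes)
    return total / (len(group_a) * len(group_b))

def branch_ratio(human, chimp, orang, genes):
    """Ratio bH/bC, or None when the chimpanzee branch is not positive."""
    d_hc = mean_absolute_distance(human, chimp, genes)
    d_ho = mean_absolute_distance(human, orang, genes)
    d_co = mean_absolute_distance(chimp, orang, genes)
    b_h, b_c, _ = branch_lengths(d_hc, d_ho, d_co)
    return b_h / b_c if b_c > 0 else None

def bootstrap_ratios(human, chimp, orang, n_boot=1000, seed=0):
    """Resample genes with replacement and recompute the ratio each time."""
    rng = random.Random(seed)
    n_genes = len(human[0])
    ratios = []
    for _ in range(n_boot):
        genes = [rng.randrange(n_genes) for _ in range(n_genes)]
        ratio = branch_ratio(human, chimp, orang, genes)
        if ratio is not None:  # skip degenerate resamples
            ratios.append(ratio)
    return ratios

# Hypothetical data: one individual per species, two genes.
human = [[4.0, 1.0]]
chimp = [[1.0, 2.0]]
orang = [[0.0, 0.0]]
observed_ratio = branch_ratio(human, chimp, orang, genes=[0, 1])
resampled = bootstrap_ratios(human, chimp, orang, n_boot=50)
```

The spread of the resampled ratios around the observed value gives a rough indication of how strongly the estimate depends on the particular set of genes on the array.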
Differentially expressed genes between humans and chimpanzees: More changes detected in the liver than in the brain
The number of differentially expressed genes between humans and chimpanzees, detected by different statistical methods
Significance level (α)
Lineage specific expression: Enhanced expression level for brain-expressed genes in the human lineage
The ratio of gene expression changes that have occurred in the human lineage to those that have occurred in the chimpanzee lineage (LH/LC)
Significance level (α)
Our comparative analysis of Affymetrix microarray data in human, chimpanzee and orangutan tissues (brain, liver and fibroblast) [6, 8] has provided fairly strong statistical evidence for the hypothesis that, after the split between humans and chimpanzees, the change of expression pattern in the human brain became more dramatic than that in the chimpanzee brain, as suggested by our previous study. Interestingly, these results are not only statistically significant but are also further supported by a more recent independent study.
Hsieh et al. have interpreted the finding, shown both in the present study and in previous studies [6–8], that more genes have undergone divergence between the human and chimpanzee lineages in the liver than in the brain tissue, as evidence that it is not clear that expression divergence has been accelerated in the human brain. Those authors appear to have confused two issues: (1) more genes are expressed in the liver than in the brain, which may be true for all primates, even mammals; and (2) there have been more expression changes in the human brain than in the chimpanzee brain after the speciation. The whole point of our discussion here, as well as that of previous studies [6–8, 14], concerns the second rather than the first issue, although the first issue itself is also very interesting. Indeed, no one is claiming that, between the human and the chimpanzee, the absolute number of differentially expressed genes in the brain is higher than that in all other tissues. What is important is that the asymmetric expression that has evolved between the human and the chimpanzee has so far been found only in the brain. Moreover, we have shown that induction (increased gene expression) in the human brain is much more frequent than repression (decreased gene expression), which is consistent with the finding of elevated gene expression levels in the human cortex. In this paper, we applied various statistical methods, including regularised t-tests and bootstrap methods, in addition to two-sample t-tests, to confirm these important results.
Different statistical tests are based on different assumptions, and some of these may not necessarily hold for a given dataset. Thus, the robustness of our main results across the various methods becomes important. As shown above, our results do appear to be robust. We also note that the selected sets of differentially expressed genes are largely consistent, although they do not match perfectly (not shown). Although the classification of the phylogenetic location of expression changes may differ somewhat between these methods, this hardly alters the ratio of expression changes between the human and the chimpanzee lineages. The enhanced expression level for some brain-expressed genes (up-regulation) in the human lineage may have played an important role in the emergence of human beings; this certainly deserves further investigation. Our study suggests that the analysis of microarray data provides a starting point for the identification of key regulators involved in the evolution of the human brain and for a better understanding of human evolution. Large-scale, multi-tissue and high-quality microarray data, with sufficient replicates, are essential for achieving this goal.
We thank Dan Nettleton and Yufeng Wang for a helpful discussion, and two anonymous reviewers for their insightful comments. We also thank Xiangyun Wang for his help with data retrieval. This work has been supported by an NIH grant.
- Chen FC, Li WH: 'Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees'. Am J Hum Genet. 2001, 68: 444-456. 10.1086/318206.
- King MC, Wilson AC: 'Evolution at two levels in humans and chimpanzees'. Science. 1975, 188: 107-116. 10.1126/science.1090005.
- Fujiyama A, Watanabe H, Toyoda A, et al: 'Construction and analysis of a human-chimpanzee comparative clone map'. Science. 2002, 295: 131-134. 10.1126/science.1065199.
- Ebersberger I, Metzler D, Schwarz C, et al: 'Genome-wide comparison of DNA sequences between humans and chimpanzees'. Am J Hum Genet. 2002, 70: 1490-1497.
- Britten RJ: 'Divergence between samples of chimpanzee and human DNA sequences is 5 per cent, counting indels'. Proc Natl Acad Sci USA. 2002, 99: 13633-13635. 10.1073/pnas.172510699.
- Enard W, Khaitovich P, Klose J, et al: 'Intra- and interspecific variation in primate gene expression patterns'. Science. 2002, 296: 340-343. 10.1126/science.1068996.
- Gu J, Gu X: 'Induced gene expression in human brain after the split from chimpanzee'. Trends Genet. 2003, 19: 63-65. 10.1016/S0168-9525(02)00040-9.
- Karaman MW, Houck ML, Chemnick LG, et al: 'Comparative analysis of gene-expression patterns in human and African Great Ape cultured fibroblasts'. Genome Res. 2003, 13: 1619-1630. 10.1101/gr.1289803.
- Irizarry RA, Hobbs B, Collin F, et al: 'Exploration, normalization, and summaries of high density oligonucleotide array probe level data'. Biostatistics. 2003, 4: 249-264. 10.1093/biostatistics/4.2.249.
- Kumar S, Tamura K, Nei M: 'MEGA: Molecular evolutionary genetics analysis software for microcomputers'. Comput Appl Biosci. 1994, 10: 189-191.
- Baldi P, Long AD: 'A Bayesian framework for the analysis of microarray expression data: Regularized t-test and statistical inferences of gene changes'. Bioinformatics. 2001, 17: 509-519. 10.1093/bioinformatics/17.6.509.
- Efron B, Tibshirani R: 'An Introduction to the Bootstrap'. 1993, Chapman & Hall, San Francisco, CA.
- Kerr MK, Churchill GA: 'Bootstrapping cluster analysis: Assessing the reliability of conclusions from microarray experiments'. Proc Natl Acad Sci USA. 2001, 98: 8961-8965. 10.1073/pnas.161273698.
- Caceres M, Lachuer J, Zapala MA, et al: 'Elevated gene expression levels distinguish human from non-human primate brains'. Proc Natl Acad Sci USA. 2003, 100: 13030-13035. 10.1073/pnas.2135499100.
- Hsieh WP, Chu TM, Wolfinger RD, et al: 'Mixed-model reanalysis of primate data suggests tissue and species biases in oligonucleotide-based gene expression profiles'. Genetics. 2003, 165: 747-757.