- Primary research
Further statistical analysis for genome-wide expression evolution in primate brain/liver/fibroblast tissue
Human Genomics volume 1, Article number: 247 (2004)
In spite of only a 1-2 per cent genomic DNA sequence difference, humans and chimpanzees differ considerably in behaviour and cognition. Affymetrix microarray technology provides a novel approach to addressing a long-standing debate on whether the difference between humans and chimpanzees results from alterations in gene expression. Here, we used several statistical methods (distance method, two-sample t-tests, regularised t-tests, ANOVA and bootstrapping) to detect the differential expression pattern between humans and great apes. Our analysis shows that the pattern we observed before is robust across these statistical methods; that is, pronounced expression changes occurred on the human lineage after the split from chimpanzees, and the dramatic brain expression alterations in humans may be mainly driven by a set of genes with increased expression (up-regulated) rather than decreased expression (down-regulated).
Comparison of the human genome with closely related species, as well as distantly related species, has provided a better understanding of human evolution. Well before the era of modern molecular biology, great apes (chimpanzees, pygmy chimpanzees and gorillas) had already been recognised as humans' closest relatives. The divergence between human and chimpanzee is estimated to have occurred only 4.6-6.2 million years ago, based on the sequences of autosomal intergenic non-repetitive DNA in human, chimpanzee, gorilla and orangutan. Humans have nevertheless evolved considerable differences from chimpanzees in morphological appearance, behaviour, language and cognition, as well as in disease susceptibility (eg susceptibility to human immunodeficiency virus). Yet only about a 1.2 per cent difference [2–4], or up to a 5 per cent difference including insertions and deletions, appears in their genomic DNA sequences. This striking observation raises an interesting question concerning the genetic basis for the difference between humans and chimpanzees. The long-standing hypothesis of gene expression alteration remains attractive but still calls for hard evidence.
Recently, Enard et al. studied the gene expression patterns across several primate species by using microarray technologies. Their analysis suggested that it is the brain rather than the liver that has a dramatic expression change in the human lineage compared with that in the chimpanzee lineage. The original work did not include an appropriate statistical assessment, however, so it is difficult to rule out the possibility of other explanations, including random effects. We addressed this statistical issue, using a standard two-sample t-test to show that, indeed, enhanced gene expression changes in the human lineage, rather than in the chimpanzee lineage, were found only in the brain tissue, and not in the liver tissue. Moreover, we found that, in brain, induced gene expression alterations (up-regulation) in the human lineage are more frequent than reduced gene expression (down-regulation). Put together, these studies [6, 7] not only support the long-standing notion that the difference for humanity lies in gene expression, but also have important implications for the evolution of the human brain.
Because of these important implications, we recognised that the statistical methodology needed to be re-examined carefully, due to the very small sample size and the very large number of genes (simultaneous null hypotheses) used in the previous studies. In this paper, we applied several statistical methods to test whether the differential expression patterns across primates in the brain and liver tissues are essentially congruent across these tests. Moreover, as an independent dataset in the primate fibroblast tissue was available, we incorporated it into our statistical analysis.
Materials and methods
Gene expression data
We revisited the Affymetrix U95A array data reported by Enard et al. (2002), which are available at http://www.email.eva.mpg.de/~khaitovi/supplement1.html. The array, which contains oligonucleotides from approximately 12,600 human genes, was hybridised to the brain and liver tissues of human (Homo sapiens), chimpanzee (Pan troglodytes) and orangutan (Pongo pygmaeus). The brain and liver tissues from three adult male humans, three adult male chimpanzees and one adult male orangutan were collected, and two independent isolations of RNA were performed for each individual and analysed independently. The detailed experimental procedures are given in the original work.
In addition, gene expression data from 18 humans, ten pygmy chimpanzees (Pan paniscus) and 11 gorillas (Gorilla gorilla) from cultured fibroblasts, measured by Affymetrix U95Av2, were obtained from http://www.genome.org/cgi/content/full/13/7/1619/DC1.
Measurement of gene expression
The Bioconductor package affy (http://www.bioconductor.org), written in the R language, was used to read the primate Affymetrix data and convert probe level data to probe set (gene) level expression measurements. After background adjustment, normalisation and logarithm transformation, a gene-specific robust multichip average (RMA) measurement was used to represent the level of expression for each gene under each analysis condition. Furthermore, because variation of expression measures between samples from the same individual is small, the average of the duplicates from one individual was used to represent the measure of expression for that individual.
Branch length analysis
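Treating each individual's expression profile as a point in a 12,600-dimensional space (one coordinate per gene), we defined the pairwise distance between the expression levels of two individuals i and j as follows, where $x_{ik}$ denotes the expression level of the kth gene in individual i and N = 12,600 is the number of genes:
Absolute distance, which is the sum of all absolute differences between gene expression levels in two individuals, ie
\[ d^{A}_{ij} = \sum_{k=1}^{N} \left| x_{ik} - x_{jk} \right| \]
Euclidean distance
\[ d^{E}_{ij} = \sqrt{\sum_{k=1}^{N} \left( x_{ik} - x_{jk} \right)^{2}} \]
Scaled Euclidean (statistical) distance
\[ d^{S}_{ij} = \sqrt{\sum_{k=1}^{N} \frac{\left( x_{ik} - x_{jk} \right)^{2}}{\hat{\sigma}_{k}^{2}}} \]
where $\hat{\sigma}_{k}^{2}$ is the estimation of variance for the kth gene.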
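The three distance measures used in this study (absolute, Euclidean and scaled Euclidean) can be sketched in a few lines of Python. This is only an illustration: the function names, the toy expression values and the per-gene variances are made up, and the square root in the two Euclidean forms is an assumption about the exact definition.

```python
import math

def absolute_distance(x, y):
    """Sum of absolute per-gene differences between two expression profiles."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean_distance(x, y):
    """Square root of the summed squared per-gene differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def scaled_euclidean_distance(x, y, gene_var):
    """Euclidean distance with each squared difference divided by that gene's variance."""
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, y, gene_var)))

# Toy log-scale profiles for three genes (hypothetical values, not real data).
human = [8.0, 5.0, 10.0]
chimp = [7.0, 5.0, 6.0]
gene_var = [1.0, 1.0, 4.0]  # hypothetical per-gene variance estimates

print(absolute_distance(human, chimp))
print(euclidean_distance(human, chimp))
print(scaled_euclidean_distance(human, chimp, gene_var))
```

Scaling by the per-gene variance keeps a few highly variable genes from dominating the distance, which is why the third gene contributes less to the scaled distance than to the unscaled one in this toy example.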
We then calculated all pairwise distances among individuals. For simplicity, the average measure from all individuals of each species was used to represent the gene expression level in that species. As shown in Figure 1a, the branch lengths of human, chimpanzee and orangutan/gorilla species (denoted by bH, bC and bO/G, respectively) can be obtained from pairwise distances using the method of least squares and the MEGA2.1 software (downloaded from http://www.megasoftware.net/). The ratio bH/bC can be interpreted as the ratio of expression changes that have occurred in the human lineage to those that have occurred in the chimpanzee lineage. Further, the reliability of this ratio estimation was examined by 1,000 bootstrap samples of 12,600 genes.
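For a tree with only three species, the least squares branch lengths fit the three pairwise distances exactly, so they can be written in closed form. The sketch below illustrates that calculation and a gene-level bootstrap of the bH/bC ratio. It is not the MEGA2.1 procedure itself: the function names, the use of the absolute distance and the toy species-level profiles are all assumptions made for illustration.

```python
import random

def branch_lengths(d_hc, d_ho, d_co):
    """Solve d_hc = bH + bC, d_ho = bH + bO, d_co = bC + bO for (bH, bC, bO)."""
    b_h = (d_hc + d_ho - d_co) / 2
    b_c = (d_hc + d_co - d_ho) / 2
    b_o = (d_ho + d_co - d_hc) / 2
    return b_h, b_c, b_o

def abs_dist(x, y, idx):
    """Absolute distance restricted to the genes listed in idx."""
    return sum(abs(x[k] - y[k]) for k in idx)

def ratio_bh_bc(h, c, o, idx):
    """bH/bC computed from the genes in idx (infinite if bC is zero)."""
    b_h, b_c, _ = branch_lengths(
        abs_dist(h, c, idx), abs_dist(h, o, idx), abs_dist(c, o, idx)
    )
    return b_h / b_c if b_c > 0 else float("inf")

def bootstrap_ratios(h, c, o, n_boot=1000, seed=0):
    """Resample genes with replacement and recompute bH/bC each time."""
    rng = random.Random(seed)
    n = len(h)
    return [
        ratio_bh_bc(h, c, o, [rng.randrange(n) for _ in range(n)])
        for _ in range(n_boot)
    ]

# Hypothetical species-level mean expression for three genes.
human = [4.0, 5.0, 6.0]
chimp = [1.0, 1.5, 7.0]
orang = [2.0, 2.0, 6.5]

point_estimate = ratio_bh_bc(human, chimp, orang, range(len(human)))
ratios = bootstrap_ratios(human, chimp, orang)
print(point_estimate, min(ratios), max(ratios))
```

The spread of the bootstrap ratios around the point estimate gives a rough sense of how much the estimated ratio depends on the particular set of genes sampled.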
Testing for genes with differential expression between humans and chimpanzees
We have tested whether differences in expression among primate species are constant across different tissue types. The first step is to identify genes with an altered level of expression between species for each tissue type. For example, between humans and chimpanzees, the null hypothesis is H0: μHk = μCk; where μHk is the population mean of expression levels of gene k in the human species, and μCk is the population mean of expression levels of gene k in the chimpanzee species in a specific tissue (brain, liver or fibroblasts). Since our dataset was limited in the number of replicates, several statistical testing methods were applied to reduce the potential bias due to violation of underlying assumptions.
In our earlier report, we adopted a standard two-sample t-test to detect genes with differential expression between humans and chimpanzees. For a given gene k, under the assumption of normality and independence of gene expression levels, the t-statistic follows a Student's t-distribution. A p value (the probability of seeing results as extreme as, or more extreme than, those actually observed if the null hypothesis were true) can be obtained by comparing the calculated t-statistic with a standard t-distribution. Given a certain level of significance (eg α = 5 per cent), one can declare a gene to be significantly differentially expressed in two species if the p value for this gene is less than α.
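As an illustration of the per-gene test, the sketch below computes a two-sample t-statistic and its degrees of freedom for one gene. It assumes the equal-variance (pooled) form of the test, which the text does not state explicitly, and the function name and expression values are hypothetical. The p value would then be read from a Student's t-distribution with the returned degrees of freedom, for example with a statistics library.

```python
import math
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample t-statistic with pooled variance, plus its degrees of freedom."""
    n_x, n_y = len(x), len(y)
    df = n_x + n_y - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / df
    se = math.sqrt(pooled_var * (1 / n_x + 1 / n_y))
    return (mean(x) - mean(y)) / se, df

# Hypothetical log-scale expression of one gene in three humans and three chimpanzees.
human = [8.1, 8.4, 8.3]
chimp = [7.2, 7.0, 7.4]

t_stat, df = pooled_t(human, chimp)
print(t_stat, df)
```

With three individuals per species there are only four degrees of freedom, which is why the variance estimate for any single gene is unstable.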
Regularised t-test (Cyber-T) under a Bayesian framework
We are fully aware that oligonucleotide array experiments were replicated only a few times in this study, so that the empirical sample variance used in the t-statistic may be a poor estimator of the true variance among individuals within a species. Thus, a Bayesian probabilistic framework was applied to improve the variance estimation, which produced a regularised t-test. In addition to the empirical sample variance estimated from real observations, prior background variation for several 'pseudo-observations' (ie, local average of the variances for genes showing similar expression levels) was also taken into account. A regularised t-test was then employed by replacing the empirical sample variance in the previous t-test by the posterior variance estimator, as implemented in the Cyber-T software package (http://genomics.biochem.uci.edu/genex/cybert/).
ANOVA and bootstrap
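The idea of combining a gene's own sample variance with a background variance can be sketched as follows. The weighting used here, (ν0·σ0² + (n − 1)·s²) / (ν0 + n − 2), is the point estimate we understand the regularised t-test of Baldi and Long to use; the function names, the window size and the value of ν0 (the number of pseudo-observations) are illustrative assumptions, not Cyber-T's actual interface or defaults.

```python
def background_variance(mean_levels, variances, k, window=3):
    """Average the sample variances of the genes closest to gene k in mean expression.

    The window includes gene k itself.
    """
    order = sorted(range(len(mean_levels)), key=lambda g: mean_levels[g])
    pos = order.index(k)
    half = window // 2
    neighbours = order[max(0, pos - half): min(len(order), pos + half + 1)]
    return sum(variances[g] for g in neighbours) / len(neighbours)

def posterior_variance(s2, n, sigma0_sq, nu0):
    """Blend the sample variance s2 (n observations) with the background variance."""
    return (nu0 * sigma0_sq + (n - 1) * s2) / (nu0 + n - 2)

# Hypothetical per-gene mean expression and sample variances for four genes.
mean_levels = [5.0, 9.0, 5.2, 5.1]
variances = [0.2, 1.0, 0.4, 0.6]

sigma0_sq = background_variance(mean_levels, variances, k=3)
print(sigma0_sq)
print(posterior_variance(s2=variances[3], n=3, sigma0_sq=sigma0_sq, nu0=7))
```

When n is small the background term dominates, so a gene whose sample variance happens to be tiny by chance no longer produces an inflated t-statistic.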
In addition to the t-test, we applied a non-parametric bootstrap approach for identifying significant differential expression. First we fitted a linear model
\[ Y_{ijk} = \tau_{i} + s_{j(i)} + \gamma_{k} + (\tau\gamma)_{ik} + \varepsilon_{ijk} \]
where $Y_{ijk}$ is the expression level from species i (i = 1, 2, 3 for human, chimpanzee and orangutan); individual j (j = 1, 2, 3 for human and chimpanzee and j = 1 for orangutan); and gene k (k = 1, 2, ..., 12,600). Two random terms are included in this linear model: $\varepsilon_{ijk}$ are random errors and $s_{j(i)}$ are random subject effects accounting for variation within species. We assume that the means of $s_{j(i)}$ and $\varepsilon_{ijk}$ are both equal to zero; the variances of $s_{j(i)}$ and $\varepsilon_{ijk}$ are $\sigma_{s}^{2}$ and $\sigma^{2}$, respectively. For the fixed (non-random) terms, $\tau_{i}$ represents the additive effect due to the ith species that is common to all genes; $\gamma_{k}$ represents the additive effect due to the kth gene that is common to all species; and the interaction terms $(\tau\gamma)_{1k}$ and $(\tau\gamma)_{2k}$ allow for the effect of the kth gene to vary with species (the subscript being '1' for human and '2' for chimpanzee), such that we consider genes with a non-zero interaction difference $(\tau\gamma)_{1k} - (\tau\gamma)_{2k}$ to be differentially expressed between human and chimpanzee. A bootstrapping approach was conducted to test the hypothesis $(\tau\gamma)_{1k} - (\tau\gamma)_{2k} = 0$.
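The logic of a bootstrap test can be illustrated with a much simpler stand-in than the mixed model above: for a single gene, centre each species on its own mean so that the null hypothesis of no difference holds, resample with replacement, and count how often the resampled difference is at least as large as the observed one. This is a generic sketch, not the model-based procedure used in this study; the function name, the number of resamples and the data are hypothetical.

```python
import random
from statistics import mean

def bootstrap_p_value(x, y, n_boot=2000, seed=0):
    """Two-sided bootstrap p value for a difference in means between x and y."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    # Centre each group on its own mean so the resampled data obey the null.
    x0 = [v - mean(x) for v in x]
    y0 = [v - mean(y) for v in y]
    hits = 0
    for _ in range(n_boot):
        xb = [rng.choice(x0) for _ in x0]
        yb = [rng.choice(y0) for _ in y0]
        if abs(mean(xb) - mean(yb)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_boot + 1)

# Hypothetical expression of one gene in three humans and three chimpanzees.
human = [8.1, 8.4, 8.3]
chimp = [7.2, 7.0, 7.4]
print(bootstrap_p_value(human, chimp))
```

Because no distribution is assumed for the expression values, this kind of test offers a check on conclusions drawn from the t-tests, at the cost of coarse p values when there are only a few individuals per species.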
Predicting the phylogenetic location and the trend of expression change between human and chimpanzee (or pygmy chimpanzee) in brain, liver and fibroblast tissues
We obtained sets of differentially expressed genes between human and chimpanzee in different tissues, based on the statistical tests described above. For each selected gene, we then used the orangutan (or gorilla) as a reference to infer the lineage in which the gene expression alteration occurred. We classified the selected differentially expressed genes between human and chimpanzee based on two additional tests for null hypotheses: (1) whether the gene expression in orangutan (or gorilla in fibroblast data) is different from that in human; and (2) whether the gene expression in orangutan (or gorilla) is different from that in chimpanzee. As shown in Figure 2, for a certain significance level (α), the genes with a significant difference between human and chimpanzee can be categorised into one of the four following groups, according to the most parsimonious rule. (1) Diversified group: gene expression levels in the three species are significantly different from each other; (2) Chimpanzee lineage (LC)-specific group: gene expression in orangutan (or gorilla) is significantly different from that in chimpanzee but not from that in human, suggesting the expression change may have occurred specifically in the chimpanzee lineage after the human-chimpanzee split; (3) Human lineage (LH)-specific group: gene expression in orangutan (or gorilla) is significantly different from that in human but not from that in chimpanzee, suggesting the expression change may have occurred in the human lineage; and (4) Unclassified group: expression in orangutan (or gorilla) is not significantly different from that in either chimpanzee or human. For each gene that belongs to group 2 or 3, we further inferred the direction of expression change: induced or repressed.
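The four-group rule can be written as a small decision function. The sketch below assumes that the gene has already been found to differ between human and chimpanzee and that p values for the two out-group comparisons are available; the function and argument names are hypothetical.

```python
def classify_gene(p_human_vs_outgroup, p_chimp_vs_outgroup, alpha=0.05):
    """Assign a human-chimpanzee differentially expressed gene to one of four groups."""
    human_differs = p_human_vs_outgroup < alpha
    chimp_differs = p_chimp_vs_outgroup < alpha
    if human_differs and chimp_differs:
        return "diversified"
    if chimp_differs:
        return "chimpanzee lineage-specific"
    if human_differs:
        return "human lineage-specific"
    return "unclassified"

def direction(lineage_mean, outgroup_mean):
    """Induced if the changed lineage is expressed above the out-group, else repressed."""
    return "induced" if lineage_mean > outgroup_mean else "repressed"

# Hypothetical gene: human differs from the out-group, chimpanzee does not.
print(classify_gene(0.001, 0.40))
print(direction(lineage_mean=9.2, outgroup_mean=7.5))
```

Counting the genes in the human lineage-specific and chimpanzee lineage-specific groups gives the numbers LH and LC, and splitting each of those by direction gives the induced and repressed counts.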
Overall expression changes in humans and chimpanzees: More changes detected in the brain
We mapped the change in the level of expression between human and chimpanzee onto the phylogenetic tree, where the branch length for each species, ie bH, bC or bO (or bG), was obtained using the least squares method given the pairwise distance matrices for 12,600 genes in different individuals (see Figure 1a, and the Methods section). Here, the branch length for each species can be interpreted as the measure of overall alteration in gene expression that has occurred in that lineage. In particular, the ratios of expression changes that have occurred in the human lineage to those that have occurred in the chimpanzee lineage (bH/bC) in the brain and the liver can serve as important indicators for the alterations since the human-chimpanzee split. The branch ratios in brain tissue and liver tissue were estimated to be 1.95 and 1.01, respectively, using the absolute distance; 1.87 and 1.07 using the Euclidean distance; and 1.75 and 1.02 using the scaled Euclidean distance. Moreover, 1,000 bootstrap samples of 12,600 genes confirmed that the ratio estimation is largely reliable (Figure 1b-1d). Consistent with the conclusion of Enard et al., the analysis of overall expression suggested that expression changes occurred in the human lineage more frequently than they occurred in the chimpanzee lineage in brain tissue, although this is not the case in liver tissue. In the independent analysis in fibroblasts, the estimated ratio of expression change in the human lineage to that in the chimpanzee lineage is about 1.3, which is higher than that in the liver but lower than that in the brain (data not shown).
Differentially expressed genes between humans and chimpanzees: More changes detected in the liver than in the brain
The two-sample t-test, the regularised t-test and the bootstrapping approach were employed to test the hypothesis that the expression level of a particular gene in the human lineage is the same as that in the chimpanzee lineage, and the significance level (p-value) was determined for each gene. Without consideration of the multiple-testing problem, the total number of genes predicted to be differentially expressed between humans and chimpanzees was determined by choosing the significance level α (p < α), as shown in Table 1. It is noteworthy that these methods revealed a congruent expression pattern regardless of the α value chosen. The largest number of differentially expressed genes was found in liver tissue and the smallest in brain tissue; this may reflect the stringent functional constraints on brain evolution. In general, the t-test appeared to be the most conservative method used, since in all brain, liver and fibroblast tissues, fewer genes with small p-values were detected, compared with the other two methods. To compare the methods directly, we calculated the correlation coefficient of the ranks of p-values. Overall, the correlations of p-values from any two methods were greater than 0.75, suggesting a reasonably high agreement between different testing methods.
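Agreement between two testing methods can be summarised by correlating the ranks of their p values across genes, which is the Spearman rank correlation. The sketch below shows one way to compute it with average ranks for ties; the function names and the p values are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = rank
        i = j + 1
    return ranks

def spearman(a, b):
    """Pearson correlation of the ranks of a and b."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)

# Hypothetical p values for five genes from two testing methods.
p_ttest = [0.010, 0.200, 0.040, 0.700, 0.050]
p_regularised = [0.005, 0.300, 0.020, 0.600, 0.100]
print(spearman(p_ttest, p_regularised))
```

Using ranks rather than raw p values means the comparison is unaffected by one method producing systematically smaller p values than another, as long as the ordering of genes is similar.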
Lineage specific expression: Enhanced expression level for brain-expressed genes in the human lineage
At different significance levels (α), a list of genes was shown to be expressed at statistically significant different levels between humans and chimpanzees (Table 1). By using orangutan (or gorilla, using fibroblast tissue) as a reference, these genes were classified into four different expression pattern groups: diversified, chimpanzee lineage-specific, human lineage-specific and unclassified (see the Methods section). The number of genes falling into each expression change group was counted using each statistical testing method. The expression changes of brain-expressed (or liver-expressed or fibroblast-expressed) genes in the human lineage versus those in the chimpanzee lineage, as measured by the ratio LH/LC, were calculated. For the conservative t-test method, the LH/LC ratio for brain-expressed genes ranged from 2.76 to 3.22 for α = 0.05-0.001 (Table 2); in each case the null hypothesis LH/LC = 1, which would imply that the expression changes occurring in the human and in the chimpanzee lineage are almost equal, was rejected at p < 0.001. By contrast, the LH/LC ratio for liver-expressed genes was virtually equal to one in all cases. This suggested that brain expression changes in the human lineage are more frequent than those in the chimpanzee lineage, regardless of the significance level (α) chosen. Analogously, the regularised t-test and the bootstrap method detected a similar pattern of increased expression changes in the human lineage in brain tissue (Table 2). Our multiple statistical methods have provided robust evidence for supporting the notion of dramatic expression changes for brain-expressed genes in the human lineage. In the independent analysis of cultured fibroblast cells, although the LH/LC ratios were consistent across all significance levels, they varied between different statistical testing methods (~ 1.7 using the standard t-test and ~ 1.1 using the regularised t-test).
For genes that have been identified as chimpanzee lineage-specific (LC) or human lineage-specific (LH), we can further infer the change in direction of the evolutionary event using the orangutan (or gorilla, using fibroblast tissue) as an out-group; that is, from low to high expression level (induction, denoted by I), or from high to low expression level (repression, R) (see Figure 2 for illustration). All three methods revealed that among the human lineage-specific expression changes, more genes had been induced than repressed in brain tissue, whereas there was no strong evidence for a differential induction/repression pattern in liver and fibroblast tissue (Figure 3). For example, using the two-sample t-test, the induction/repression (I/R) ratio in the brain ranged from 2.21 to 5.90 in the human lineage across the significance levels (α) assigned; this ratio was statistically greater than one. By contrast, for liver-expressed genes, the I/R ratio ranged from 0.86 to 1.33 in the human lineage, which was not significantly greater than one. The patterns in fibroblast-expressed genes were not clear, as they differed between the statistical methods used. Interestingly, in the chimpanzee lineage, the induction/repression (I/R) ratio fluctuated around two for both brain- and liver-expressed genes, and was sensitive to the significance level, while in fibroblast tissue the I/R ratio was close to, or lower than, one (Figure 3).
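The text does not state which test was used for the null hypothesis LH/LC = 1. One natural choice, shown here purely as an assumption, is an exact two-sided binomial test: if changes were equally likely on either lineage, each lineage-specific gene would fall on the human side with probability one half. The function name and the counts below are hypothetical and are not taken from Table 2.

```python
from math import comb

def binomial_two_sided_p(n_human, n_chimp):
    """Exact two-sided p value for H0: a lineage-specific gene is human with probability 1/2."""
    n = n_human + n_chimp
    k = max(n_human, n_chimp)
    # Upper tail P(X >= k) under Binomial(n, 1/2); doubled because the null is symmetric.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts: 30 human lineage-specific and 10 chimpanzee lineage-specific genes.
print(binomial_two_sided_p(30, 10))
```

The same function applies unchanged to counts of induced versus repressed genes when asking whether an induction/repression ratio differs from one.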
Our comparative analysis of Affymetrix microarray data in human, chimpanzee and orangutan tissues (brain, liver and fibroblast) [6, 8] has provided fairly strong statistical evidence for the hypothesis that after the split between humans and chimpanzees, the change of expression pattern in the human brain became more dramatic than that in the chimpanzee brain, as suggested by our previous study. Interestingly, these results are not only statistically significant, but are also further supported by a more recent independent study.
Hsieh et al. have interpreted the finding, shown both in the present study and in previous studies [6–8], that more genes have undergone divergence between the human and chimpanzee lineage in liver than in the brain tissue, as evidence that it is not clear that expression divergence has been accelerated in the human brain. Those authors appear to have confused two issues: (1) more genes are expressed in the liver than in the brain, which may be true for all primates, or even all mammals; and (2) there have been more expression changes in the human brain than in the chimpanzee brain after the speciation. The whole point of our discussion here, as well as that of previous studies [6–8, 14], concerns the second, rather than the first point, although the first issue itself is also very interesting. Indeed, no one is claiming that, between the human and the chimpanzee, the absolute number of differentially expressed genes in the brain is higher than that in all other tissues. What is important is that the asymmetric expression that has evolved between the human and the chimpanzee has so far only been found in the brain. Moreover, we have shown that induction (increased gene expression) in the human brain is much more frequent than repression (decreased gene expression), which is consistent with the finding of elevated gene expression levels in the human cortex. In this paper, we applied various statistical methods, including regularised t-tests and bootstrap methods, in addition to two-sample t-tests, to confirm these important results.
Different statistical tests are based on different assumptions, and some of these may not necessarily hold for a given dataset. Thus, the robustness of our main results across the various methods becomes important. As shown above, our results do appear to be robust. We also note that the selected sets of differentially expressed genes are largely consistent, although they do not match perfectly (not shown). In spite of the fact that the classification of phylogenetic location of occurrence of expression changes may differ somewhat between these methods, this hardly alters the ratio of expression changes between the human and the chimpanzee lineages. The enhanced expression level for some brain-expressed genes (up-regulation) in the human lineage may have played an important role in the emergence of human beings; this certainly deserves further investigation. Our study has suggested that the analysis of microarray data provides a starting point for the identification of key regulators involved in the evolution of the human brain and a better understanding of human evolution. Large-scale, multi-tissue and high-quality microarray data, with sufficient replicates, are essential for achieving this goal.
1. Chen FC, Li WH: 'Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees'. Am J Hum Genet. 2001, 68: 444-456. 10.1086/318206.
2. King MC, Wilson AC: 'Evolution at two levels in humans and chimpanzees'. Science. 1975, 188: 107-116. 10.1126/science.1090005.
3. Fujiyama A, Watanabe H, Toyoda A, et al: 'Construction and analysis of a human-chimpanzee comparative clone map'. Science. 2002, 295: 131-134. 10.1126/science.1065199.
4. Ebersberger I, Metzler D, Schwarz C, et al: 'Genome-wide comparison of DNA sequences between humans and chimpanzees'. Am J Hum Genet. 2002, 70: 1490-1497.
5. Britten RJ: 'Divergence between samples of chimpanzee and human DNA sequences is 5 per cent, counting indels'. Proc Natl Acad Sci USA. 2002, 99: 13633-13635. 10.1073/pnas.172510699.
6. Enard W, Khaitovich P, Klose J, et al: 'Intra- and interspecific variation in primate gene expression patterns'. Science. 2002, 296: 340-343. 10.1126/science.1068996.
7. Gu J, Gu X: 'Induced gene expression in human brain after the split from chimpanzee'. Trends Genet. 2003, 19: 63-65. 10.1016/S0168-9525(02)00040-9.
8. Karaman MW, Houck ML, Chemnick LG, et al: 'Comparative analysis of gene-expression patterns in human and African Great Ape cultured fibroblasts'. Genome Res. 2003, 13: 1619-1630. 10.1101/gr.1289803.
9. Irizarry RA, Hobbs B, Collin F, et al: 'Exploration, normalization, and summaries of high density oligonucleotide array probe level data'. Biostatistics. 2003, 4: 249-264. 10.1093/biostatistics/4.2.249.
10. Kumar S, Tamura K, Nei M: 'MEGA: Molecular evolutionary genetics analysis software for microcomputers'. Comput Appl Biosci. 1994, 10: 189-191.
11. Baldi P, Long AD: 'A Bayesian framework for the analysis of microarray expression data: Regularized t-test and statistical inferences of gene changes'. Bioinformatics. 2001, 17: 509-519. 10.1093/bioinformatics/17.6.509.
12. Efron B, Tibshirani R: 'An Introduction to the Bootstrap'. 1993, Chapman & Hall, San Francisco, CA.
13. Kerr MK, Churchill GA: 'Bootstrapping cluster analysis: Assessing the reliability of conclusions from microarray experiments'. Proc Natl Acad Sci USA. 2001, 98: 8961-8965. 10.1073/pnas.161273698.
14. Caceres M, Lachuer J, Zapala MA, et al: 'Elevated gene expression levels distinguish human from non-human primate brains'. Proc Natl Acad Sci USA. 2003, 100: 13030-13035. 10.1073/pnas.2135499100.
15. Hsieh WP, Chu TM, Wolfinger RD, et al: 'Mixed-model reanalysis of primate data suggests tissue and species biases in oligonucleotide-based gene expression profiles'. Genetics. 2003, 165: 747-757.
We thank Dan Nettleton and Yufeng Wang for a helpful discussion, and two anonymous reviewers for their insightful comments. We also thank Xiangyun Wang for his help with data retrieval. This work has been supported by an NIH grant.
About this article
Cite this article
Gu, J., Gu, X. Further statistical analysis for genome-wide expression evolution in primate brain/liver/fibroblast tissue. Hum Genomics 1, 247 (2004). https://doi.org/10.1186/1479-7364-1-4-247
- differential expression
- human evolution