Book Reviews

The aim of ‘Statistics for Microarrays’ is to explain the statistical methods commonly used for microarray analysis. The book is divided into two parts: the first, ‘Getting Good Data’, focuses on experimental design, normalisation and quality control issues. The second, ‘Getting Good Answers’, deals with higher level statistical inferences about the biological questions of interest, such as clustering samples or genes; assessing differential expression; and classification and prediction. The book opens with a useful section describing a number of experiments and associated datasets that are used throughout the book to illustrate the different stages in an analysis of a microarray dataset. An additional special feature is the inclusion of descriptions of R-language functions. Some of these are existing functions in standard R libraries; others are implementations by the authors of new methods developed in the book. The new functions are included in an R-package ‘smida’, which, along with the datasets, is made available online at the website for the book.

The book represents one of the first attempts to present a coherent exposition of the field of microarray data analysis, and is written in a clear and readable style. Although not a reference work, it has been written in such a way that one should not have to read the earlier chapters in order to understand the later ones. This idea that chapters or sections should be self-contained has been somewhat overdone and results in some statistical methods being explained more than once. For example, the definition of the t-statistic is given in both the classical and the Bayesian hypothesis sections in Chapter 8. The style is also rather repetitive within sections: for example, 'confounding' is explained well on page 37, yet a separate paragraph with the heading 'confounding' appears on page 38. In general, the use of cross-references within the book would have been helpful (eg Sammon plots are used at an early stage but are not introduced until later, with no cross-reference).
Chapters 3 and 4 on experimental design and normalisation are very good. There is a detailed discussion of the number of replicate arrays needed to detect a given level of fold change between experimental conditions; a discussion of the variability of pooled samples; and an extensive section on finding optimal designs for two-colour arrays. Normalisation is explained well, with plots illustrating the various reasons for normalisation. The discussion of single-channel arrays, in particular those of Affymetrix, is partly dealt with in separate subsections of Chapters 3 and 4. The resulting presentation is rather messy and also slightly misleading: the probes representing a gene are not technical replicates (as claimed in Chapter 3), but represent different subsequences of the sequence encoding a gene (as correctly stated in Chapter 4). The short description of some methods for estimating gene expression measures from oligonucleotide arrays at the end of the normalisation chapter does not do justice to the large literature on this subject.
The quality assessment chapter contains some instructive examples and good illustrations of the qualities of some of the most commonly used pairwise distance measures: the Euclidean, Manhattan and correlation distance measures. The chapter describes some interesting methods, such as Sammon mapping for dimensional reduction and 'false array images' for assessing array handling, but is not very clear on the former. Sammon mapping is used to illustrate several different possible reasons for poor-quality data; however, how the different possible reasons would be distinguished is not obvious.
There are two chapters on clustering methods, one on unsupervised methods used to group samples and/or genes and one on supervised methods of classification. The chapter on unsupervised clustering starts with a good discussion of different possible measures for calculating distances between clusters. It focuses mainly on hierarchical and partitioning around medoids (PAM)-type algorithms, with a brief discussion of model-based clustering at the end. The authors rightly warn against putting too much faith in agglomerative hierarchical clustering of genes, although the point could have been made better with some illustrations of where this may go wrong, in line with the good illustrations in the book on the virtues of various distance measures.
The topic of gene-filtering is covered in the chapter on supervised methods. This chapter briefly introduces a number of important concepts in classification theory, predictor evaluation and cross-validation. Attention is restricted to simple methods, such as principal component analysis, linear discriminant analysis and k-nearest neighbour classification for class prediction, and penalised and k-nearest neighbour regression for classifying and predicting continuous responses. Throughout the chapters on clustering, the authors present a number of new methods, particularly relating to the problem of selecting appropriate numbers of clusters or numbers of predictors, which are made available as R functions.
Differential gene expression is covered in a separate chapter. The standard varieties of t-test are described, along with guidance on which to use in various situations. Moreover, p-values and error rates (familywise and false discovery rates) are discussed and different methods of obtaining p-values (parametric, bootstrap and permutation) are given. This section provides a very good introduction to one of the most widely used methods for assessing differential expression. One drawback is that there is no discussion of methods (such as the significance analysis of microarrays [SAM] method) for stabilising gene variance estimates used in t-statistics, which are often used when very few replicate arrays are available.

© HENRY STEWART PUBLICATIONS 1479-7364. HUMAN GENOMICS. VOL 2. NO 1. 75-76 MARCH 2005
In summary, this book provides a good introduction to the statistical analysis of microarray data. The focus is primarily on cDNA arrays, although the higher-level analysis in the second half of the book can mostly be applied to oligonucleotide arrays as well. The more mathematically inclined reader may wish to refer to original papers for sophisticated discussion. The book should be well suited to the biologist or computer scientist who wants an overview of the problems encountered in analysing microarray data and to gain some understanding of the different methods available.

Anne-Mette

This book tells the history of studies of classical polymorphisms, surnames and church records of marriages (and particularly dispensations to marry relatives) that began in the small communities of the Parma Valley during the 1950s and 1960s, and were later extended to the Italian islands and, in more limited form, the whole of Italy. Although aspects of these studies have previously been published in several formats, this is the first full account of the background to the studies, the methods used, the results and the conclusions of the principal investigators. The publication of this book is therefore important, because these studies have formed a bedrock for late 20th-century population genetics. Human population genetics has been, and continues to be, marred by poorly designed sampling schemes. The careful design of these studies, together with statistical analyses and computer simulations that were often groundbreaking, remains an example to others.
Much of the book discusses consanguineous marriages, for example marriages between cousins. The discussion covers Roman, 'German' (Lombard) and Catholic Church law, as well as other social, economic and demographic factors that affect the prevalence of such marriages. The effects of both consanguinity and 'random inbreeding' (geographically restricted mate choice) on genetic drift are also studied. There is also a chapter discussing their effects on both normal and pathological phenotypes.
In these days of genome-wide genetic surveys and fast computational analyses, the painstaking effort required to collect and analyse these relatively sparse data seems unthinkable. Although we can now answer the questions with much greater precision, the basic issues of genetic variation on a fine geographical scale - and its relationship to demographic factors, drift, selection and, ultimately, to phenotypes of interest - are the same as they were 50 years ago, when the studies described here were just beginning.

David Balding
Imperial College London
London, UK