Comparison of human (and other) genome browsers

The sequence of the human genome provides a scaffold on which numerous annotations, such as the locations of genes, can be laid. Genome browsers have been created to allow the simultaneous display of multiple annotations within a graphical interface. In addition, they provide the ability to search for markers and sequences, to extract annotations for specific regions or for the whole genome, and to act as a central starting point for genomic research. This review describes the basic functionality of genome browsers and compares three of them: the University of California Santa Cruz (UCSC) Genome Browser, the Ensembl Genome Browser and the NCBI MapViewer.


Introduction
Genome browsers allow researchers to navigate the genome in a way analogous to navigating the internet with Internet Explorer or Mozilla. As with the internet, the amount of available genomic data is overwhelming, and browsers aim to make these data accessible to all researchers. The number and variety of annotations have increased dramatically, enabling a detailed view of many aspects of the genome. Of course, one of the primary annotations is still the location and structure of genes, but even this is not straightforward, because many (sometimes conflicting) sources of information necessitate several gene-related annotations. These include the locations of mRNA and expressed sequence tag (EST) sequences deposited in the major sequence databases, curated and evidence-based gene sets such as the Vertebrate Genome Annotation (VEGA) [1], RefSeq [2], MGC [3] and Ensembl [4], and computational predictions such as GenScan [5] and Twinscan [6].
There is a wide range of additional annotations. The locations of clones from bacterial artificial chromosome (BAC) and other clone libraries, sequence-tagged site (STS) markers from genetic maps [7][8][9] and estimated boundaries of cytogenetic bands [10] provide crucial mapping information. Alignments with genomic sequences from other species delineate regions of synteny and help to identify orthologous genes. Single nucleotide polymorphisms (SNPs) and other types of variation point to differences within a species. Locations of repetitive sequences, due both to retrotransposable elements and to simple repeats such as microsatellites, help to provide a more complete description of the genomic landscape. An incomplete listing of annotations is shown in Table 1. Browsers display these annotations simultaneously, so that a gene or region of interest can be investigated and appreciated in its genomic context.
Three browsers in particular, the University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) [11], the Ensembl Genome Browser (http://www.ensembl.org) [12] and the National Center for Biotechnology Information (NCBI) MapViewer (http://www.ncbi.nih.gov/mapview) [13], provide information portals for multiple genome sequences, including human. They share many common features but differ in significant ways. The following presents an overview and comparison of these browsers.

Genome browser comparisons
Genome browsers can be described and compared with respect to presentation, content and functionality. Presentation refers to how the data are displayed in a graphical format and to the overall structure of the website. Content refers to which data are accessible, such as particular genome sequences and annotations for a specific genome. Functionality refers to the tools available for mining the genome sequence and annotations, such as sequence and text searches and data extraction.
The UCSC, Ensembl and NCBI genome browsers aim to present genomic data in a manner that will facilitate research, but they do so in different ways. Table 2 summarises some of these differences, and a more complete, yet still high-level, discussion of them is presented below.

Presentation
UCSC features three types of browser: a genome browser, a gene family browser (Gene Sorter) and a proteome browser. The genome browser is the most widely used and will be the focus of this discussion, although this in no way implies that the other two are not valuable research tools. The primary web page of the genome browser consists of a graphic that displays annotations for a specified genomic region, surrounded by navigational buttons and links to tools. The navigational buttons allow zooming in and out or moving left or right along the genomic sequence. Within the graphic, annotations, also referred to as 'tracks', are displayed horizontally, with the genome sequence running from left to right. The locations of specific elements within annotations are primarily indicated by boxes, sometimes connected by lines to show relationships, as in gene structures (boxes = exons, lines = introns). Arrows indicate the forward or reverse strand, where applicable. Different colours and shading of boxes highlight properties of certain annotations, such as confidence in the underlying data (as in the Known Genes track) or quantitative values, as in the GC Percent track, which indicates differing levels of guanine (G) and cytosine (C) content. Clicking on an element within an annotation brings up a separate 'details' web page with specific information about that element and links to other databases and resources such as GenBank [13] and SwissProt [14]. The amount of this additional information varies between annotations.
Drop-down menus towards the bottom of the page, also accessible through a separate 'configuration' page, allow the selection of annotations to display in the graphic (Table 1).
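The GC Percent track mentioned above summarises base composition in windows along the sequence. A minimal sketch of that calculation follows, assuming non-overlapping windows of a caller-chosen size and excluding ambiguous bases (such as N in assembly gaps) from the denominator. The function name and these conventions are illustrative assumptions, not UCSC's actual track-building code.

```python
from typing import List, Tuple


def gc_percent_windows(seq: str, window: int) -> List[Tuple[int, float]]:
    """Return (0-based window start, GC percent) for consecutive windows.

    Assumed convention (hypothetical): only A/C/G/T count towards the
    denominator, and a window with no called bases is reported as NaN.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    results = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = sum(base in "GC" for base in chunk)        # G or C bases
        called = sum(base in "ACGT" for base in chunk)  # unambiguous bases
        pct = 100.0 * gc / called if called else float("nan")
        results.append((start, pct))
    return results
```

For example, `gc_percent_windows("GGCCAATTAT", 5)` reports a GC-rich first window and an AT-only second window; a browser would map such values to shading or bar height.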
Ensembl structures its site around 'Views'. For human, there are 22 Views that display different types of data and/or provide various functions. The primary View, analogous to the UCSC main browser page, is ContigView. Within this View are three graphic displays that provide information at different resolutions for a region of the genome. The Overview graphic displays multiple megabases (Mb), the Detailed view shows approximately 1 Mb and the Basepair view shows about 100 bases. As at UCSC, the genome is shown horizontally, with navigational buttons located within the Detailed view graphic. In the three graphics on this page, annotations are delineated by boxes, sometimes connected by lines and at other times contained within a larger box. In the Detailed and Basepair views, the DNA contigs annotation divides the graphic, with elements on the forward strand appearing above it and those on the reverse strand below. Clicking on an element in an annotation causes a small pop-up window to appear with some basic information and possibly links to other Views within Ensembl or to resources at other sites. For example, clicking on an Ensembl gene provides links to the GeneView, TranscriptView and ProtView pages, which contain additional information about the gene or a region of the gene. Menus at the top of the Detailed view graphic provide the ability to select specific annotations for display.
The primary display of NCBI's MapViewer differs significantly from both UCSC and Ensembl by orienting the genome sequence vertically. Again, boxes and lines indicate the positions of elements in annotations, also referred to as 'maps', which are presented in columns. Navigation of the genome is provided in a side bar to the left of the screen. Links within the annotations, as well as the LinkOut column, provide easy access to other relevant resources at the NCBI, such as Entrez Gene (formerly LocusLink) [15], Online Mendelian Inheritance in Man (OMIM) [16] and dbSNP [17]. A 'Maps & Options' button brings up a separate window, allowing one to select annotations to display.
Table 1. A sample of annotations found in one or more of the UCSC, Ensembl and NCBI genome browsers.

Content
The NCBI provides access to the largest number of genome sequence assemblies, including 11 vertebrates, five invertebrates, one protozoan, eight plants and 12 fungi. Ensembl and UCSC are more heavily slanted towards the larger eukaryotic genomes, providing access to a similar set of 13 vertebrate genomes and six (Ensembl) or 15 (UCSC) invertebrates, and do not cover the other classes of species.
Annotations available within the NCBI MapViewer primarily originate in the numerous databases available at the NCBI. MapViewer is therefore very tightly integrated with these data sources, some of which, such as the Mitelman Breakpoint annotation, are not available at the other sites. UCSC and Ensembl also present annotations that originate from outside resources, such as the databases at NCBI, but supplement these with numerous additional annotations contributed by in-house or third-party researchers.
The UCSC browser arguably contains the broadest set of annotations, especially in the area of cross-species comparisons. For example, the Conservation annotation, developed at and displayed only at UCSC, shows a measure of evolutionary conservation across eight vertebrate species, as determined by a phylogenetic hidden Markov model [18]. UCSC is also the official repository for, and displays data from, the ENCODE (Encyclopedia Of DNA Elements) project [19], with annotations ranging from histone modifications to regions of DNase I hypersensitivity.
The Ensembl browser contains the most extensive set of gene- and transcript-related data, with 14 of its 22 Views primarily focused on the presentation of gene- or protein-related data. There is tight integration with gene data originating from both the Ensembl genes annotation [4], a computationally generated, evidence-based set that Ensembl produces, and the VEGA project [1], a manual curation effort. The Ensembl browser also has the most extensive presentation of haplotype data, especially in its LDView, whose data were generated by the HapMap project [20].
The underlying genomic sequence of a given assembly is exactly the same at all three sites, but analogous annotations may differ. For example, locations of mRNA and EST sequences require an alignment to the genome sequence, and the precise alignment may vary depending on the alignment program used and the specific parameter settings within it. The three sites do not employ the same alignment methods, resulting in slight differences, although they agree in the vast majority of cases.
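The LDView mentioned above presents haplotype data in terms of linkage disequilibrium (LD) between pairs of markers. To illustrate the two standard pairwise measures, D' and r², the sketch below computes them from phased two-locus haplotypes, assuming biallelic SNPs and taking the alleles on the first haplotype as the reference alleles. The function name and input layout are hypothetical; this is not Ensembl's implementation.

```python
from typing import Sequence, Tuple


def pairwise_ld(haplotypes: Sequence[Tuple[str, str]]) -> Tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic loci from phased haplotypes.

    Each haplotype is (allele at locus 1, allele at locus 2). The alleles on
    the first haplotype are used as the reference alleles A and B.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    allele_a, allele_b = haplotypes[0]

    # Allele frequencies and the frequency of the A-B haplotype.
    p_a = sum(h[0] == allele_a for h in haplotypes) / n
    p_b = sum(h[1] == allele_b for h in haplotypes) / n
    p_ab = sum(h[0] == allele_a and h[1] == allele_b for h in haplotypes) / n

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")

    # D: deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b
    # D' scales D by the largest magnitude it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r_squared = d * d / denom
    return d, d_prime, r_squared
```

A sample in which one allele combination never occurs gives D' = 1 even when r² is well below 1, which is one reason LD displays may offer both measures.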

Functionality
There are many common functions that all three sites provide. Specific regions of interest can be quickly and easily displayed using keywords such as gene or marker names, exact base pair positions within chromosomes, or sequences via alignment programs like BLAST [21] (Ensembl and NCBI) or the BLAST-like alignment tool (BLAT) [22] (UCSC). Locations of paired primer sequences can be obtained via electronic polymerase chain reaction (ePCR) [23] (NCBI and Ensembl) or In-Silico PCR (isPCR) (UCSC). Associated FTP sites allow the download of complete genome sequences and annotations.
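Both ePCR and isPCR answer the same question: where in the genome would a given primer pair produce a product? The core idea can be sketched as finding the forward primer on one strand and then, downstream of it and within a maximum product size, the reverse complement of the reverse primer. This is a simplified illustration under stated assumptions (exact matches only, plus strand only, a hypothetical `max_product` default of 4,000 bases); the real tools can tolerate some mismatches and are optimised for genome-scale searches.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def in_silico_pcr(genome: str, fwd: str, rev: str,
                  max_product: int = 4000) -> List[Tuple[int, int]]:
    """Return predicted products as (start, end), 0-based, end exclusive.

    Hypothetical helper: exact matches on the given strand only.
    """
    genome = genome.upper()
    fwd = fwd.upper()
    rev_site = reverse_complement(rev)  # how the reverse primer site reads on this strand
    products = []
    start = genome.find(fwd)
    while start != -1:
        end_limit = start + max_product
        # The reverse primer site must lie downstream of the forward primer.
        site = genome.find(rev_site, start + len(fwd))
        while site != -1 and site + len(rev_site) <= end_limit:
            products.append((start, site + len(rev_site)))
            site = genome.find(rev_site, site + 1)
        start = genome.find(fwd, start + 1)
    return products
```

In the output, `end - start` is the predicted product length, including both primers.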
Annotation data can also be downloaded for particular regions. NCBI allows users to view annotations in a tabular format that can then be downloaded into a text file. Ensembl's BioMart [24] and the UCSC Table Browser [25] allow both simple downloads of annotations and the generation of quite complex datasets. These two tools also allow the uploading of files listing genomic regions, or names of genes or markers, for which annotation data, including the underlying sequence, can then be obtained.
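Retrieving annotations for a file of uploaded regions is, at its core, an interval-overlap query. The sketch below assumes annotations held in memory as hypothetical `Feature` records with 0-based, half-open coordinates; it sorts features by start position so that the scan for each region can stop early. BioMart and the Table Browser run such queries against indexed databases, so this illustrates the idea only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Feature(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive (half-open interval)
    name: str


def annotations_in_regions(regions: Iterable[Feature],
                           annotations: Iterable[Feature]) -> List[Tuple[str, str]]:
    """Return (region name, annotation name) for every overlapping pair."""
    # Group annotations by chromosome and sort each group by start position.
    by_chrom: Dict[str, List[Feature]] = defaultdict(list)
    for feat in annotations:
        by_chrom[feat.chrom].append(feat)
    for feats in by_chrom.values():
        feats.sort(key=lambda f: f.start)

    hits = []
    for region in regions:
        for feat in by_chrom.get(region.chrom, []):
            if feat.start >= region.end:
                break  # sorted by start: no later feature can overlap
            if feat.end > region.start:
                hits.append((region.name, feat.name))
    return hits
```

With half-open coordinates, intervals that merely abut, such as 100-200 and 200-250, do not overlap.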
UCSC and Ensembl allow researchers to display their own annotation information within the browser. A simple text file denoting the base pair locations of annotation elements is uploaded and used to create a corresponding temporary annotation within the graphic, which is essentially viewable only by the originator. In this way, researchers can view their own data within the context of all other available genomic data.
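At UCSC, one common format for such a file is BED: one tab-separated line per element giving the chromosome, a 0-based start and an end position (plus optional fields such as a name), which may be preceded by a `track` line that labels the annotation. The helper functions below are hypothetical and only build the text; the example track name, description and element are invented, and upload formats accepted by Ensembl are not covered.

```python
from typing import Iterable, Tuple


def browser_to_bed(start_1based: int, end_1based: int) -> Tuple[int, int]:
    """Convert a 1-based, fully closed browser range to BED's 0-based, half-open form."""
    return start_1based - 1, end_1based


def to_bed(track_name: str, description: str,
           elements: Iterable[Tuple[str, int, int, str]]) -> str:
    """Build minimal custom-track text (hypothetical helper).

    elements: (chrom, start, end, name) with 0-based start and exclusive end.
    """
    lines = [f'track name="{track_name}" description="{description}"']
    for chrom, start, end, name in elements:
        if not 0 <= start < end:
            raise ValueError(f"invalid interval for {name}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{name}")
    return "\n".join(lines) + "\n"
```

A region displayed in the browser as chr1:1,001-2,000 (1-based, inclusive) therefore becomes start 1000, end 2000 in BED, which is what `browser_to_bed` computes before the element is passed to `to_bed`.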
Ensembl provides the ability to view syntenic regions of two genomes simultaneously in its MultiContigView. The layout is similar to the ContigView described previously, but with data from two separate genomes displayed in the Detailed view graphic, and a Navigational view replacing the Overview with a zoomed-out display of the regions being analysed in both genomes.

Last words
This overview of the UCSC, Ensembl and NCBI genome browsers is by no means complete and is not meant to recommend one of these sites over the others. Users should explore the capabilities of each browser to determine which they prefer. In the end, the best browser is the one that allows a researcher to be most productive.
The genome browsers reviewed here provide access not only to human genome sequence data but also to annotations for an ever-growing set of species. Similar functionality is provided for every species' genome assembly, although the range of annotations varies dramatically.
These are by no means the only genome-related browsers available, but they are among the most comprehensive. Similar browsers with a narrower focus, such as a single organism, share many of the features and functions described above.
The quality of the publicly available data displayed in browsers is highly variable. Researchers must therefore view these data as critically as any other. Appropriate experimentation is required to test the accuracy of any hypothesis generated using these data. Nevertheless, genome browsers offer a powerful research tool for researchers worldwide.