Strategies for the detection of copy number and other structural variants in the human genome

Advances in genome scanning technologies are revealing that copy number variants (CNVs) and polymorphisms, ranging from a few kilobases to several megabases in size, are present in genomes at frequencies much greater than previously known. Discoveries of additional forms of genomic variation, including inversions, insertions, deletions and complex rearrangements, are also occurring at an increased rate. Along with CNVs, these sequence alterations are collectively known as structural variants, and their discovery has had an immediate impact on the interpretation of basic research and clinical diagnostic data. This paper discusses different methods, experimental strategies and technologies that are currently available to study copy number variation and other structural variants in the human genome.


Introduction
The capacity for targeted or en masse detection of variation in the human genome is dictated by the resolution of the available technologies. In the early years of human genetics, variation was detected by studying chromosomes under microscopes, with notable observations of aneuploidy, 1-3 heteromorphism 4 and fragile sites, 5 to name a few, dominating our knowledge base. The advent of molecular biology, and in particular nucleotide resolution analysis through DNA sequencing and genotyping, led to the discovery, characterisation and mapping of short tandem repeats (STRs; eg di-, tri- and tetranucleotide microsatellites) 6 and single nucleotide polymorphisms (SNPs). 7,8 STRs and SNPs, and the technologies used to detect them, are described in numerous comprehensive articles and reviews. 9-12 The latest advances in studying human genome variation have been in the examination of copy number variants (CNVs) (Figures 1 and 2) and other similarly sized structural changes along human chromosomes. In general, this class of variants refers to changes of an intermediate size, between micro- or minisatellites and microscopically visible changes (usually >1 kilobase [kb] and <3 megabases). Several recent investigations have found that these variants are much more frequent in the human genome than previously recognised 13-20 and in-depth descriptions of these variants and their properties can be found in several recent reviews. 21-29 Investigations of structural variants have been accompanied by a host of newly developed technologies and methodologies, with many of these latest-generation techniques being currently implemented and continually improved upon. As more techniques arise, the most commonly asked question seems to be, 'What is the most appropriate technique or experimental approach to address our specific question?'
Several factors can be used to distinguish different techniques and should be considered before embarking on a new study. These include the scope of the technique (does the method assess targeted or genome-wide variation?) and the resolution of the technique (what types and sizes of variants need to be analysed?) (Figure 2). Here, the techniques and methodologies currently available for the analysis of structural variation (in particular, copy number variation) in the human genome are reviewed. Important factors in these analyses are highlighted, while, at the same time, recommendations for these strategies are made, in some cases based on the authors' own personal experiences. Fluorescence in situ hybridisation (FISH) is not discussed here because it is well established and has been previously reviewed in detail. 30 It should be noted, however, that FISH is often the only way to assess certain forms of structural variation along the chromosomes.

Methods for detecting and scoring copy number variation
The discovery and characterisation of structural variation in the human genome has been driven by advances in methods that allow comprehensive scanning of an entire genome, along with targeted scans of defined loci. Originally, technologies such as Southern blot hybridisation (using conventional and pulsed-field gel electrophoresis), FISH and microsatellite scanning offered the best methods for revealing changes in DNA copy number. In some cases, particularly for disease diagnostics, these methodologies are still the standard and may continue to offer the only way to correctly resolve a complex rearrangement. Each of these approaches lacks scalability, however, and may also require parental DNA to facilitate data interpretation. Notwithstanding, the prevailing approaches for identifying CNVs have been either array-based or quantitative (primarily, polymerase chain reaction [PCR]-based) assays. The former is used primarily for genome scanning, while the latter is used for locus-specific testing (often with multiple loci screened simultaneously) in first-pass and confirmatory screening (for example, to confirm genome scan data). Additional methods for detecting CNVs, including what are referred to as computationally based and genotype-based assays, have come forth recently. A flowchart outlining the available methodologies is given in Figure 2, while some techniques are discussed in greater detail below and illustrated in Figure 3.

Array-based methods for detection of copy number variation
Array-based methods offer the most robust approach to scanning for CNVs in a genome-wide manner. The main differences between available platforms are in the type of DNA molecule found on the array (genomic DNA clones, cDNA, PCR products or oligonucleotides) and the type of hybridisation used (competitive hybridisation versus single-source hybridisation). These different platforms each have inherent advantages and disadvantages.

Comparative genomic hybridisation
Detection of DNA copy number variation between differentially labelled test and reference genomes has long been feasible using competitive in situ hybridisation to metaphase spreads. 34 This approach, in which fluorescence ratios between the two hybridised DNA sources reveal regions of gain or loss, is referred to as comparative genomic hybridisation (CGH). With the advent of microarrays, sample DNA can now be hybridised to arrays spotted with a multitude of DNA sequences. This not only increases the specificity of CGH, but it also provides a dramatic increase in resolution. Array-based CGH (aCGH) was first described in 1997. 35 Two years later, the first experiment to employ a genome-wide scanning approach utilised an array spotted with cDNA clones. 36 As the technology evolved, bacterial artificial chromosomes (BACs) became the most common type of genomic clone for spotting on the arrays (Figure 3A). 13,15,37-43 There are several advantages to using BAC clones, including their availability, coverage of the genome, known sequence identifiers (eg completed sequence or clone end-sequence) and FISH-readiness. Moreover, BACs and other large insert clones retain a genomic complexity that often results in a more robust signal-to-noise ratio profile.
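The ratio-based readout of CGH described above can be sketched in a few lines. This is only an illustration, not a published calling algorithm: the function name `call_log2_ratios`, the toy intensities and the log2 cut-off of 0.3 are all assumptions made for the example.

```python
import math

def call_log2_ratios(test, reference, threshold=0.3):
    """Label each probe 'gain', 'loss' or 'normal' from log2(test/reference).

    `threshold` is a hypothetical cut-off chosen for illustration only.
    """
    calls = []
    for t, r in zip(test, reference):
        ratio = math.log2(t / r)
        if ratio > threshold:
            calls.append("gain")
        elif ratio < -threshold:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls

# Toy fluorescence intensities for four probes (invented values).
test_signal = [1000.0, 1500.0, 500.0, 980.0]
ref_signal = [1000.0, 1000.0, 1000.0, 1000.0]
calls = call_log2_ratios(test_signal, ref_signal)
print(calls)
```

In practice, real array data would need normalisation and replicate handling before any such thresholding; the sketch only shows how a test/reference ratio maps onto gain or loss.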
There are now several BAC arrays available that provide partial or near-complete genome-wide coverage (genome-wide coverage is also referred to as clone 'tiling-path' resolution analysis). 38-42 There are also commercial BAC arrays available, both for genome-wide screening and for targeting regions involved in microdeletion syndromes (eg Spectral Genomics and Signature Genomics). Additionally, BAC arrays have been designed that target potential rearrangement hotspots that are flanked by highly identical segmental duplications 15 (Figure 3B). BAC arrays do, however, have some inherent disadvantages. First, even with dense coverage, the resolution of BAC arrays is limited by the size of the clones themselves. Since BAC clones are generally 80-200 kb in size, it can be difficult to identify CNVs smaller than ~50 kb. Higher-resolution analysis can be achieved by spotting shorter DNA molecules, such as cosmids, fosmids, cDNA, 36 PCR products 44,45 or oligonucleotides on the array. 14,31,46-49 Secondly, to the authors' knowledge, there are currently no available commercial services providing 'tiling-path' resolution arrays. Presumably, this is because of the inherent challenges in manufacturing this type of product. An academic enterprise from the British Columbia Cancer Research Centre (http://www.bccrc.ca/arraycgh/arrays.html) is the only provider of such arrays, although there may be other vendors that the authors are unaware of. Moreover, some groups offer access to their arrays on a collaborative basis.

Figure 1. Illustration of copy number differences between two homologous chromosomes (A and B). Coloured boxes indicate copy number changes, including tandem duplication (red) and two types of deletions: deletion of a tandem segment (green) and deletion of an interspersed segment (blue). Below the chromosomes is a line graph showing the results of a comparison of chromosome B with chromosome A. This is an idealised output of the results that could be obtained from a high-resolution comparative genomic hybridisation experiment.

Figure 2. Flowchart illustrating some of the factors that need to be considered when attempting to assess copy number variation. Hexagons are used to designate choices, and rounded rectangles indicate the major techniques that are discussed further in this review. Note the arrow from the bottom of the whole-genome scan that leads back to targeted analysis. This emphasises the fact that copy number variants identified through whole-genome scans can be confirmed or tested directly in large cohorts of individuals using targeted analyses. Abbreviations: BACs, bacterial artificial chromosomes; CGH, comparative genomic hybridisation; MAPH, multiplex amplifiable probe hybridisation; MLPA, multiplex ligation-dependent probe amplification; PCR, polymerase chain reaction; QMPSF, quantitative multiplex PCR of short fluorescent fragments; qPCR, quantitative PCR; SNP, single nucleotide polymorphism.

Figure 3. Comparison of methods used to detect novel copy number variations. Three methods employ comparative genomic hybridisation. In these methods, test (green bar) and reference (red bar) DNA are hybridised against: (A) a BAC array; 13 (B) a targeted BAC array; 15 or (C) a representational oligonucleotide array. 31 Probes of the array are drawn above the DNA region to which they hybridise. Darkened boxes within the test or reference sequence bars correspond to duplications only found in one of the two sequences. The colour of the probe indicates the relative hybridisation of the array to the assayed DNA, with yellow representing equal copy number, green representing a duplication of the test region and red representing a duplication of the reference region. (D) The single nucleotide polymorphism (SNP) gene chip can be used as a quantitative array 32 that assays only a test (green) DNA sequence. Quantification of copy number is obtained by comparing the intensity levels of 20 pairs of oligonucleotide probes per SNP (red lines) to data available for a reference set of control individuals. Both (C) and (D) use genomic representations created using restriction enzyme fragmentation followed by adaptor-mediated polymerase chain reaction (PCR). This PCR is optimised to amplify genomic DNA between 200 and 1,200 base pairs (bp). Thus, the regions illustrated are much smaller than in the other methods, demonstrated by their smaller scale. (E) and (F) use in silico genome comparisons. In (E), intra- or inter-assembly comparisons not only detect copy number changes (insertion/deletions; segments 2 and 3), but can also identify inversions (segments 7-9), sequence variations (segments 11a versus 11b) and rearrangements (segments 13 and 14). Segments 18-20 show how whole-genome shotgun (WGS) read depth can be used to detect duplications. A significant increase in the depth of WGS reads per 5 kilobases (kb) is often an indication of duplication (dark green box) in the genome. 33 In (F), the comparison is between the NCBI assembly (green) and a fosmid paired-end sequence (red) derived genome. 16 Fosmid paired-ends are shown above their corresponding sequence in the public assembly. The average size of the fosmid insert is ~40 kb. Significant deviations (<32 kb or >48 kb) from this mean could indicate a copy number variant and are highlighted. Abbreviations: BAC, bacterial artificial chromosome; Mb, megabase(s).

Carson et al.
Review

Spotting an array with oligonucleotides is one way of achieving an increased resolution. The use of oligonucleotide CGH arrays for copy number detection was first implemented in an assay format called representational oligonucleotide microarray analysis (ROMA) 14,31 (Figure 3C). Similarly to BAC arrays, ROMA utilises differentially labelled test and reference genome DNA competitively hybridised to an array. In order to increase the signal-to-noise ratio, however, the complexity of the input DNA is reduced. This is accomplished by a method called representation or whole-genome sampling. 50 In this method, the genomic DNA is fragmented with a specific restriction enzyme. The resulting fragments are then ligated with adaptors which act as primer binding sites in a subsequent PCR amplification. The amplification conditions are set only to amplify fragments up to ~1.2 kb. The initial version of this method used Bgl II digestion, which gives rise to an estimated 200,000 fragments in this size range. 31 Oligonucleotides for the array are then designed to match sequences present in the (complexity-reduced) representation sample. The published ROMA arrays have contained 85,000 oligonucleotides that are each 70 base pairs (bp) in length, giving a resolution of ~30 kb throughout the genome. A potential disadvantage of these first-generation ROMA arrays is that, in general, only unique regions in the genome were represented. This means that the ~5 per cent of the genome covered by low copy repeats (LCRs; also called segmental duplications) 33,51 would not generally be assayed, even though a large fraction of the existing copy number variation has been shown to reside in these regions. 13,15,52 The omission of duplicate regions will most likely be remedied in higher resolution arrays, spotted with a larger number of oligonucleotides, which are expected to be forthcoming (ROMA platforms using arrays with >300,000 probes are now being developed).
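The complexity-reduction step can be mimicked in silico. The sketch below is a simplified illustration under stated assumptions, not the ROMA design pipeline: it cuts a sequence at every Bgl II recognition site (AGATCT, cut after the first A) and keeps only fragments short enough to be amplified. The function name `representation_fragments`, the 1,200 bp ceiling taken as a hard limit and the toy sequence are all assumptions.

```python
import re

BGLII_SITE = "AGATCT"  # Bgl II recognition sequence; the enzyme cuts A^GATCT

def representation_fragments(sequence, max_len=1200):
    """Return (start, end) coordinates of fragments lying between adjacent
    Bgl II cut sites whose length does not exceed `max_len`."""
    # The cut falls one base into each recognition site.
    cuts = [m.start() + 1 for m in re.finditer(BGLII_SITE, sequence.upper())]
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b - a <= max_len]

# Toy sequence: two sites 106 bp apart, then a third site 2,006 bp further on.
toy = "CC" + BGLII_SITE + "G" * 100 + BGLII_SITE + "G" * 2000 + BGLII_SITE + "CC"
frags = representation_fragments(toy)
print(frags)
```

Only the short fragment survives in this toy case; the long one would drop out of the representation, which is why array probes are designed against the retained fragments.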
As reflected in the published literature, the ROMA technology seems to be primarily used by its inventor, Dr Michael Wigler, along with his collaborators, and has not yet been adopted by the majority of users.
In addition to the ROMA methodology, companies such as Agilent (http://www.agilent.com) and NimbleGen (http://www.nimblegen.com) have generated long oligonucleotide arrays for direct (non-representational) CGH analysis. Agilent now produces an array with ~43,000 60-mer probes with bias toward genic and non-repetitive euchromatic regions. 43,53,54 Experiments using this array can be performed with as little as 100 ng of input DNA per sample and provide a resolution that can discern large chromosomal aberrations but lacks the power to pick up small CNVs. To address the need for greater resolution, Agilent is expecting to release next-generation arrays with ~185,000 and >300,000 probes early next year. NimbleGen offers both whole-genome and custom-targeted 'fine-tiling' arrays containing 385,000 probes. 18,55 These provide a mean spacing of ~7-8 kb on the whole-genome array and can give a resolution better than 500 bp (with probes as densely spaced as 10 bp apart) in custom fine-tiling arrays.
NimbleGen uses probes that vary in size between 45 and 85 bp, such that their Tms are equalised at 76 °C ('isothermal'), thus providing more uniform probe performance. In the NimbleGen business model, however, the NimbleGen staff performs all of the hybridisation experiments, requiring customers to supply 1-3 µg of purified sample and reference genomic DNA. Data analysis is also heavily based on company-designed algorithms. Although this may be preferable for many laboratories, some may find it difficult to provide such a relatively large quantity of DNA. Other laboratories will not be able to provide DNA because of research ethics restrictions governing how their DNA samples can be handled and disseminated.

SNP chips
A slightly different approach to using oligonucleotide arrays for copy number detection is to use the hybridisation intensities from SNP genotyping platforms, such as those made by Affymetrix or Illumina. 32,56-62 Although both of these approaches use oligonucleotides spotted on an array, the SNP arrays differ from CGH in a number of fundamental ways. Most notably, the SNP arrays were originally designed for genotyping, so they do not directly compare two DNA sources through competitive hybridisation, as is the case with CGH. Instead, hybridisation intensities from a single DNA source are adapted to provide information about copy number through subsequent comparisons with a set of reference values from control individuals.
With the Affymetrix SNP arrays, input DNA preparation is similar to ROMA in that the DNA to be hybridised is reduced in complexity by restriction digestion followed by PCR-based amplification of fragments in a specific size range. These DNA fragments are then hybridised to an array where each SNP is represented by 20 probe pairs representing the match and mismatch alleles 57 (Figure 3D). Comparison of the intensity values from these match and mismatch probes to reference values from control individuals is used to acquire copy number data. The highest resolution array currently available assays roughly 500,000 SNPs, with an average spacing of ~6 kb. This high density of probes allows consecutive SNPs to be used to estimate copy number and effectively increases the signal-to-noise ratio. Although the distribution of probes along the euchromatic regions is relatively uniform, there is under-representation within segmentally duplicated regions. As reflected in numerous publications demonstrating applicability, the most common publicly available algorithms used to extract the copy number intensity data include the dChip 63 and CNAT 57 packages.
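The two ideas in this paragraph, comparison with control individuals and pooling of consecutive SNPs, can be illustrated with a minimal sketch. It is not the dChip or CNAT algorithm; the function names `log2_vs_reference` and `smooth`, the use of the per-SNP median of controls, the three-SNP window and the intensity values are all assumptions for illustration.

```python
import math
from statistics import mean, median

def log2_vs_reference(sample, reference_panel):
    """log2 ratio of one sample's intensity to the per-SNP median
    intensity of a panel of control individuals."""
    return [math.log2(s / median(refs)) for s, refs in zip(sample, reference_panel)]

def smooth(values, window=3):
    """Running mean over `window` consecutive SNPs; at the edges only the
    neighbours that exist are averaged."""
    half = window // 2
    return [mean(values[max(0, i - half): i + half + 1]) for i in range(len(values))]

# Six consecutive SNPs; the middle three show half the control intensity.
sample = [100, 100, 50, 50, 50, 100]
reference_panel = [[100, 110, 90]] * 6  # three hypothetical controls per SNP
ratios = log2_vs_reference(sample, reference_panel)
smoothed = smooth(ratios)
print(ratios)
print(smoothed)
```

Averaging neighbouring SNPs trades resolution for noise reduction, which is why probe density matters for the smallest variant that can be called.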
Instead of spotting oligonucleotides on glass slides, Illumina attaches probes to beads. The coated beads are pooled and then positioned randomly on an array, giving approximately 30-fold feature redundancy. Because the beads are positioned randomly, a series of hybridisations are needed to identify

Genotype-based methods for identifying deletions
Along with hybridisation intensities, SNP platforms also generate genotype data. These genotypes, when generated from multiple related individuals, afford additional opportunities to assess genomic deletions. With a single individual, genotyping will not detect deletions due to the fact that hemizygosity will be miscalled as homozygosity for the present allele (or, in the case of homozygous deletion, a null or failed genotype will be observed). By contrast, if parent-offspring trios are analysed, losses of heterozygosity 65,66 can be discovered when Mendelian inheritance is violated. These can lend support to regions identified as deletions when assessing hybridisation intensities. Naturally, non-Mendelian inheritance can also signify other genomic alterations, including segmental or complete uniparental isodisomy or heterodisomy. 67,68 Additionally, large collections of genotype data, such as those generated by the International HapMap Consortium, can be used to identify deletion polymorphisms. 18-20 Two recent studies have utilised this resource to identify deletions in the human genome by looking for violations of Mendelian inheritance, Hardy-Weinberg disequilibrium or null genotypes. 19,20 An earlier study of patient samples offers a prototypic example of how this approach can be used in a disease study. 69 Certain regions of the genome have extensive marker coverage, giving this approach a high resolution with the power to detect relatively small (as small as 1 kb) deletions.
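The trio-based reasoning above can be written down as a small check. This is a sketch only, not the method of the cited studies: genotypes are assumed to be two-letter strings, the SNP identifiers are hypothetical, and a flagged SNP is merely a candidate, since the text notes that non-Mendelian inheritance has other causes.

```python
from itertools import product

def mendelian_consistent(father, mother, child):
    """True if the child's genotype can be formed from one allele of each parent."""
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible

def candidate_deletion_snps(trio_genotypes):
    """Return SNP ids where inheritance is violated and the child is called
    homozygous, the pattern expected if a hemizygous child was miscalled."""
    hits = []
    for snp_id, (father, mother, child) in trio_genotypes.items():
        if child[0] == child[1] and not mendelian_consistent(father, mother, child):
            hits.append(snp_id)
    return hits

# Hypothetical trio genotypes as (father, mother, child).
trio = {
    "snp_hyp1": ("AA", "BB", "AA"),  # child should be AB: candidate deletion
    "snp_hyp2": ("AB", "AB", "AA"),  # consistent
    "snp_hyp3": ("AA", "BB", "AB"),  # consistent
}
hits = candidate_deletion_snps(trio)
print(hits)
```

A run of adjacent flagged SNPs would be more convincing than a single one, and any candidate would still need support from hybridisation intensities or a quantitative assay.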
Interestingly, many of the deletions appear to be in high linkage disequilibrium (LD) with neighbouring SNPs. 18,20,70 This could indicate that they have arisen on a single chromosome from an ancestral population. This is important, as it allows deletion carrier status to be inferred from SNP genotypes and further allows the investigation of associations between deletions and disease. It is also expected that other CNVs, such as insertions/deletions of regions with copy number greater than two, will show significant LD with neighbouring SNPs. To date, however, no reports have been published to support this assumption. Additionally, LD associations are expected to be more complex in regions that have many copies (such as segmental duplications), making complex disease association analyses more difficult.

Quantitative methods for locus-specific testing
Quantitative methods offer an effective approach to assessing variation at targeted loci. These can be used to assess copy number changes at known or proposed disease regions, often with high-throughput screening of large cohorts of samples. Alternatively, these methods can be used to confirm regions that have been identified in whole-genome scans using array-based approaches. In general, the major difference between quantitative methods is in how they analyse the input DNA: do they directly assay the genomic DNA with PCR or do they first hybridise with a targeted probe? These differences are discussed in more detail below.

Quantitative PCR
Real-time quantitative PCR (qPCR) has been used for many years in the quantification of gene expression. 71 The basis of qPCR is that the rate of amplification is proportionate to the number of template copies. By monitoring the amplification in 'real-time', it is possible to determine when the PCR reaction is in the exponential phase of amplification. It is in this phase that quantification of the starting template occurs; however, it is also necessary to use a control region with known copy number to adjust for variable starting amounts between samples. There are several types of qPCR assay, but they are all based on the same basic principle: an increase in PCR product is manifested as an increase in fluorescence, which can be monitored throughout the PCR reaction. Although most available protocols for real-time qPCR work well for scoring deletions and duplications, 72,73 they are, generally, not suitable for multiplexing. To facilitate scoring of multiple target regions in a single experiment, some novel approaches have been developed. These assays are similar in that the final product is separated by size within a gel, which is inherently more amenable to multiplexing than spectral separation of products.
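The role of the control region in qPCR can be made concrete with the widely used comparative threshold-cycle calculation. This calculation is not described in the review itself; it is swapped in here as an illustration and assumes perfect doubling of product per cycle. The function name `copy_number_from_ct`, the calibrator sample with two copies and all threshold-cycle values are assumptions.

```python
def copy_number_from_ct(ct_target, ct_control, cal_ct_target, cal_ct_control,
                        calibrator_copies=2):
    """Estimate target copy number in a test sample.

    Threshold cycles (Ct) for the target and a control region of known copy
    number are compared between the test sample and a calibrator sample that
    is assumed to carry `calibrator_copies` of the target. Assumes 100 per
    cent amplification efficiency (product doubles each cycle).
    """
    delta_test = ct_target - ct_control        # normalise to input amount
    delta_cal = cal_ct_target - cal_ct_control
    return calibrator_copies * 2 ** -(delta_test - delta_cal)

# Invented Ct values: the target crosses threshold one cycle later than in
# the calibrator (half the template), or one cycle earlier (twice the template).
deletion_estimate = copy_number_from_ct(26.0, 25.0, 25.0, 25.0)
duplication_estimate = copy_number_from_ct(24.0, 25.0, 25.0, 25.0)
print(deletion_estimate, duplication_estimate)
```

Real assays rarely amplify with perfect efficiency, so in practice efficiencies would be measured and replicate reactions averaged before a copy number is assigned.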
The simplest of these assays is called quantitative multiplex PCR of short fluorescent fragments (QMPSF). 74,75 PCR assays are designed to amplify up to ten target regions in parallel, with each product varying in length. One primer for each target is labelled with a 6-FAM (6-carboxyfluorescein) moiety, while the other primer carries a short stabilising tail sequence. PCR amplification is stopped within the exponential phase and products of different size are separated by electrophoresis. Each product is represented by a peak, and the peak height, relative to a reference, correlates with the amount of product. Because each reaction requires a labelled primer, experiments can become expensive if many regions are tested. To lower the cost, an alternative protocol has been developed, 76 in which all primers are designed with a specific hexa-decamer tail sequence. These tails then serve as templates in a subsequent amplification reaction, where universal labelled primers are used. Although this adds an extra step to the protocol, only a single FAM-labelled primer is needed, irrespective of the number of targets.
In MAPH, the test DNA is denatured, bound to a nylon filter and then hybridised with probes specific for the target region. Each probe has a different length but they all carry identical tail sequences, allowing subsequent amplification with fluorescently-labelled universal primers. After amplification, products are separated by size and quantified based on the fluorescence intensity ratio of target compared with control regions. Up to 40 loci can be interrogated simultaneously. 83 The main disadvantage of MAPH is the amount of work and optimisation needed to obtain a robust probe set. Each probe has to be cloned in order to add the universal tail sequences. Once a probe set has been developed, however, it can be used for high-throughput screening of all exons of a specific gene (eg DMD 84) for deletions or duplications in large patient cohorts.
MLPA is different from MAPH in that it is performed in solution; it is, however, still dependent on probe amplification. In MLPA, pairs of probes are made for each target. The two probes in each pair are designed to hybridise adjacent to each other at the target region. Through a DNA ligation step, a contiguous probe molecule is created. The probes carry a tail sequence that serves as a template for universal fluorescently-labelled primers in a subsequent amplification step. The resulting products can then be separated and quantified in the same manner as in MAPH. In the initial protocol for MLPA, each probe was cloned in a vector. A more recent advance demonstrates that synthesised probes work equally well, but there is a limit to the size of the probes that can be produced. This, however, can be overcome by introducing two different universal primer pairs, each labelled with a different fluorescent marker, thereby allowing for separation of the final products based on both size and wavelength of fluorescence. 85,86 MLPA has been successful using up to 40 probes in a single experiment. 80 As with MAPH, once a probe set has been developed, it works reproducibly in screening large cohorts of samples. 82 There are more than 50 commercially available pre-tested MLPA probe kits designed for many of the known microdeletion syndromes, as well as for genes where intragenic deletions and duplications are common causes of disease (http://www.mrc-holland.com).

Computationally-based methods for detecting structural variants
While the above techniques physically assay DNA molecules to assess copy number variation, it is also possible to evaluate genomes in silico by comparing DNA sequences. As more sequence data become available, this option will become more viable and popular. Three main strategies have emerged, utilising different types of sequence data: (a) sequence assemblies (Figure 3E); (b) clone end sequences (Figure 3F); and (c) sequence read depths (Figure 3E). These methods are, however, largely limited to the analysis of data that are already publicly available from large-scale sequencing initiatives. This is due to the current, hugely prohibitive cost of generating full sequence coverage or redundant clone library end-sequence data from an individual's DNA.
Whole-genome or chromosome assemblies have the benefit of being able to detect practically any type of variation, even down to the single nucleotide. This provides an advantage over array-based methods, where the resolution is dependent on the density of probes spotted on the array. In this strategy, sequence assemblies from two sources are compared computationally, allowing differences in sequence, copy number or orientation to be annotated. Although the majority of these strategies compare two human assemblies, such as the two distinct and near-complete assemblies of chromosome 7 87-89 and the HLA region, 90 it is also possible to detect large variations by comparing the human genome with its closest living relative, the chimpanzee. Although these interspecies comparisons primarily look for sequence variation between species, they can also identify polymorphic differences, such as inversions 17 and copy number differences. 91 Using clone end-sequences (eg fosmids) from a genomic library constructed from a single genome is another sensitive approach, which allowed the detection of variants as small as 8 kb in one study. 16 In this technique, end-sequences are anchored to the public genome assembly. The distance between the two ends can then be calculated, giving an observed or computed size of the clone. Because the approximate physical size of the clone is known (in the case of fosmids, the size is typically ~40 kb), large deviations between the computed and physical sizes can represent variations between the two genomes. Although some large deletions can be identified, this approach generally does not readily allow the detection of copy number increases larger than 40 kb.
On top of copy number variation, this method is capable of detecting some inversions by looking for end-sequences that have an incorrect orientation with respect to the public assembly. Along with genome assembly sequence comparisons, this represents the only published method (to the authors' knowledge) for genome-wide investigation of inversions. There is currently a National Institutes of Health-led large-scale genome initiative to end-sequence fosmid clones from numerous libraries prepared from different HapMap samples aimed at discovering structural variation in the human genome.
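The fosmid end-sequence logic can be summarised as a simple classifier. The sketch below is an assumption-laden illustration rather than the published pipeline: the 32 kb and 48 kb bounds are taken from the Figure 3 caption, while the function name `classify_clone`, the strand encoding and the interpretation labels (stated relative to the public assembly) are assumptions.

```python
def classify_clone(start, end, strands=("+", "-"), low=32_000, high=48_000):
    """Classify one fosmid from where its two end-sequences anchor on the
    public assembly.

    `start` and `end` are the anchored positions of the two ends on the same
    chromosome; `strands` gives their orientations, with ("+", "-") taken to
    be the expected arrangement.
    """
    if strands[0] == strands[1]:
        return "possible inversion"   # ends point the same way
    size = end - start                # computed clone size on the assembly
    if size > high:
        return "possible deletion"    # clone lacks sequence present in the assembly
    if size < low:
        return "possible insertion"   # clone holds sequence absent from the assembly
    return "concordant"

# Hypothetical anchored clones.
results = [
    classify_clone(1_000_000, 1_040_000),
    classify_clone(2_000_000, 2_060_000),
    classify_clone(3_000_000, 3_025_000),
    classify_clone(4_000_000, 4_040_000, strands=("+", "+")),
]
print(results)
```

Because a fosmid can only hold about 40 kb, extra sequence longer than the insert itself cannot be spanned, which matches the limitation on detecting large copy number increases noted above. Calls would normally be trusted only when several independent clones agree.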
Following the example of scanning for segmental duplications, 33 the third approach, assessing sequence-read depth, will probably become more relevant when whole-genome shotgun sequencing of multiple genomes becomes standard practice. 92 The rationale behind this technique is that regions of the assembly with greater read depth may be present in multiple copies and, in certain instances, the copy number of the region may vary between individuals.
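A read-depth scan of this kind can be sketched as follows. The 5 kb window follows the Figure 3 caption; everything else is an assumption for illustration, including the function name `high_depth_windows`, the use of the median window depth as a baseline and the 1.5-fold cut-off. It is not the statistical test used in the cited work.

```python
from collections import Counter
from statistics import median

def high_depth_windows(read_starts, region_length, window=5000, factor=1.5):
    """Return (window_start, depth) for windows whose read count exceeds
    `factor` times the median window depth."""
    counts = Counter(pos // window for pos in read_starts)
    n_windows = -(-region_length // window)  # ceiling division
    depths = [counts.get(i, 0) for i in range(n_windows)]
    cutoff = factor * median(depths)
    return [(i * window, depths[i]) for i in range(n_windows) if depths[i] > cutoff]

# Toy region of 20 kb: 10 reads in each window except the third, which has 25.
reads = []
for w, n in enumerate([10, 10, 25, 10]):
    reads += [w * 5000 + k * (5000 // n) for k in range(n)]
flagged = high_depth_windows(reads, region_length=20_000)
print(flagged)
```

A flagged window is only a hint of duplication; uneven cloning or sequencing coverage can produce the same signal, so confirmation by an independent method remains necessary.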
In all three of these sequence analysis strategies, the accuracy of the data will only be as good as the quality of the assemblies available and confirmation using a secondary method (eg quantitative methods listed above) will probably be required. It is also noteworthy that the study of SNPs located in LCRs, 93 some being paralogous sequence variants, 52 can also be used to detect simple and complex copy number differences arising from duplication, deletion or gene conversion.

Clinical diagnostic implications
The ultimate aim of genetic diagnostics is to evaluate the genomic content of a cell or group of cells as completely and accurately as possible. The objective is to provide either diagnostic insight into the phenotype of a patient or, alternatively, to provide predictive or prognostic insight into the patient's disease or developmental outcome. The upshot is that there have always been some implicit assumptions as to what constitutes a normal genome and a concomitant normal phenotype and hence, by extension, what would constitute an abnormal genome and a deleterious phenotype. For example, the location of the abl gene and BCR locus (either by molecular genetic or cytogenetic means) on chromosomes 9q34 and 22q11.2, respectively, is considered normal, whereas their co-localisation on a Philadelphia chromosome is considered abnormal and suggestive of chronic myelogenous leukaemia. Similarly, the presence of two copies of chromosome 21 in a metaphase preparation is considered consistent with a normal karyotype, whereas the presence of a third copy would be consistent with Down syndrome.
Whereas the majority of approved diagnostic tests query specific genomic loci of confirmed pathogenicity, such as in the previously mentioned examples, others assess loci only suspected to be associated with disease (eg some sub-telomeric deletions in mental retardation). 94-100 In the latter scenario, if a genomic variant is found at the locus in question in a proband, current convention assumes that such a variation is pathogenic if: (i) there is an obvious accompanying phenotypic abnormality in the proband and (ii) the variant is absent in both parents. If one of these parameters is untrue, however, the diagnostician faces a dilemma as to the potential pathogenicity of the genomic aberration. 101 The discovery of the phenomenon of CNVs, particularly in the context of whole-genome array-based analyses, will challenge conventional understanding of the genome and necessitate a cautionary note regarding assumptions about phenotype/genotype associations. Many questions will emerge, such as: 'Is the CNV detected in a proband truly the causative genomic aberration, or is it a benign CNV which deflected our attention from the more subtle malignant genomic aberration?' 'Is that genomic variant serendipitously detected in a normal healthy individual during the course of an unrelated genetic test cause for concern? Is it a prognosticator of a late-onset disease or, again, simply a benign structural variant?' 'Is that copy number polymorphism present in greater than 1 per cent of the population truly benign, or does it, in fact, have a clinical utility in identifying disease susceptibility?' In the authors' estimation, the major factors influencing which of the current methods will be used broadly for the detection of CNVs in a clinical diagnostic (or basic research) setting include accuracy, specificity, set-up time, assay cost, the extent of a genome required to be assayed and requirements for sample input.
Broad-based implementation, including that at the regulatory level, will also be influenced by patent restrictions. The flowchart in Figure 2 summarises some factors that need to be considered regarding the technologies and approaches described in this review. To obtain the most comprehensive analysis of both the microscopic and sub-microscopic copy number variation of a genome, it is the authors' belief that the paradigm likely to prevail will consist of: (i) a karyotype and array-based scan for global assessment of balanced and copy number-type unbalanced variants, respectively, followed by (ii) locus-specific confirmation using targeted FISH or quantitative PCR-based approaches. Obviously, where simple alterations are being tested for in defined phenotypes, such as the dosage-related microdeletion and duplication genomic disorders, 102,103 only those techniques yielding locus-specific data would be required.
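The two-criterion convention described above for loci only suspected of disease association can be written out as a short sketch. The class and function names are hypothetical and introduced purely for illustration, and the 'uncertain' label is our shorthand for the diagnostician's dilemma rather than a category defined in this review. It is not a clinical decision tool.

```python
from dataclasses import dataclass
from enum import Enum


class Call(Enum):
    PRESUMED_PATHOGENIC = "presumed pathogenic"
    UNCERTAIN = "uncertain significance"  # label assumed for illustration


@dataclass
class VariantObservation:
    """Findings for one variant at a locus only suspected to be disease-associated."""
    proband_has_phenotype: bool  # criterion (i): obvious phenotypic abnormality
    in_mother: bool              # variant detected in the mother
    in_father: bool              # variant detected in the father


def classify_by_convention(obs: VariantObservation) -> Call:
    """Apply the convention: pathogenic only if (i) and (ii) both hold."""
    absent_in_both_parents = not (obs.in_mother or obs.in_father)  # criterion (ii)
    if obs.proband_has_phenotype and absent_in_both_parents:
        return Call.PRESUMED_PATHOGENIC
    # If either criterion fails, the convention gives no answer.
    return Call.UNCERTAIN
```

Under this sketch, a variant inherited from an unaffected parent falls into the uncertain category even when the proband is affected, which is exactly the situation in which knowledge of benign CNVs in the general population becomes important.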

Unresolved issues and recommendations
Currently, there is no single approach that will allow all types of copy number or other structural variants to be identified. This is underscored by the surprisingly small degree of overlap between the published datasets, which in some cases assess identical samples. 28 Short of comparing 'finished' sequence assemblies generated from unique donor (and preferably haploid) sources, 26 a multitude of approaches will be required to generate and validate variants in a comprehensive manner. There is also the possibility that new technologies, such as those that utilise direct counting and/or sequencing of single DNA molecules, will provide superior resolution of CNVs. 92,104 Not only would these analyses be high resolution and high throughput but they could be highly informative, in that they would provide more accurate copy numbers, as opposed to the current techniques, which only annotate CNVs as gains or losses. Notwithstanding, the current repertoire of technologies has facilitated numerous significant studies of chromosome structure, 105 disease 64,100 and clinical diagnostics. 98,106,107 For a newcomer to this field, the authors recommend reading papers relevant to the biological question at hand and then assessing which approach(es) is/are most conducive to success for that project. Even the best-funded laboratories will not be able to establish and maintain the complete range of technologies, so selection of one or a few platforms will need to be made. In a way, the copy number and structural variation field is currently in the same state of flux as the SNP genotyping field was in the late 1990s to early 2000s. A general trend in technologies seems to be moving towards the development of long (>60-mer) oligonucleotide arrays containing very high probe coverage uniformly distributed across the genome (eg Agilent, Nimblegen platforms and others).
It will be very interesting to see how the market dictates the success and evolution of this type of platform versus SNP-based approaches (eg Affymetrix and Illumina), which also allow extraction of copy number information but have the added information of genotypes. There will also be progress in identifying surrogate SNPs 19,70 or other markers for the more complex structural variants, allowing typing in large sample collections. In some cases, the technology used will be dictated by what equipment already exists in local (core) laboratories. The authors consider this first review as providing a snapshot of what is happening in the field and foresee that a second edition summary in a year or so would look entirely different, due to rapid advances in the field.
When investing in the technologies described in this paper, there are considerations that extend outside those of most typical laboratory management decisions. For example, beyond the rather straightforward choice of targeted or genome-wide experiments (Figure 2) and then selection of the corresponding reagent (eg types of primers or microarrays), one must consider the training level of technicians, specialised equipment (eg hybridisation ovens, scanners), and extended warranty and service support for some expensive equipment that may require constant upgrading. Moreover, in the authors' experience, the weakest links in the current state of the technologies are the algorithms available for mining accurate structural variation data. Investments in this area are growing, such that interpretation of any data generated now will only improve.
Many other practical issues are worth discussing, including: (1) DNA must be of very high-quality preparation and, for genetic studies, preferably from peripheral blood (lymphoblastoid lines can introduce some transformation-derived alterations). Low-quality DNA will lead to major problems, whereas whole-genome amplified DNA seems to be suitable; 108-110 however, this assertion will require more analysis to confirm that copy number ratios are maintained. [...] 15 however, neither of these datasets is currently displayed by the major genome browsers (NCBI, UCSC). The numbers of samples and entries in these databases are still rudimentary but are expected to grow substantially in 2006. The quality of the data in the databases is fairly heterogeneous, with only a small proportion of variants being validated using a second technology. This will become even more of an issue as large genotyping datasets become a significant source for discovery of new structural variants. 64 (4) There are still limitations to all of the technologies.
Most approaches readily allow resolution of a single gain or loss in copy number away from diploidy. With more complex deviations, however, where the normal copy number may be five or six, the interpretation becomes increasingly imprecise. (5) The accuracy of mapping the breakpoints of a structural variant can be fairly wide-ranging, due to limitations inherent in the technique or the density and distribution of probes on an array. Moreover, because many of the breakpoint regions can overlap segmental duplications or gaps in the human sequence assembly, more detailed targeted analysis is often required. This review focuses on methodologies for the identification of CNVs in the human genome. If the goal is explicitly to fine-map the precise content and boundaries of the variable regions, however, additional experiments are required. These include the elucidation of the DNA sequence content in each copy of a region. Due to the nature of their design, certain approaches are more amenable to this type of analysis. For example, in the fosmid end-sequencing strategy, fosmid clones overlapping the variable interval are automatically available for sequencing experiments. Also, some of these duplicated sequences may already be present in the human genome raw sequences (as seen in the human sequence read-depth analysis 33 ), but are collapsed into a single copy in the current genome build. For some variations, however, such as duplications >100 kb, this process could be more tedious and possibly require extensive library preparation and/or screening, mapping and re-sequencing.
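The loss of precision at higher copy numbers noted in point (4) can be illustrated with simple arithmetic. The sketch below assumes an idealised ratio-based assay in which the measured signal is the log2 of test copy number over reference copy number, with no noise, no probe saturation and no sample mosaicism. That model, and the function name, are assumptions made for illustration and are not drawn from any specific platform discussed in this review.

```python
import math


def expected_log2_ratio(test_copies: int, reference_copies: int) -> float:
    """Idealised log2 signal ratio for a test sample against a reference sample."""
    return math.log2(test_copies / reference_copies)


# Compare a single-copy gain and loss against reference copy numbers of 2, 5 and 6.
for reference in (2, 5, 6):
    gain = expected_log2_ratio(reference + 1, reference)
    loss = expected_log2_ratio(reference - 1, reference)
    print(f"reference={reference}: gain {gain:+.3f}, loss {loss:+.3f}")
```

Under this model, a single-copy gain from the diploid state is expected at log2(3/2), roughly 0.58, whereas the same one-copy gain from a baseline of five copies is expected at log2(6/5), roughly 0.26. The expected signal shift for one extra copy is therefore less than half as large, which makes it harder to separate from experimental noise.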
The field of structural variation has now taken centre stage in human genetics and genomics research. The strengths and weaknesses of the different experimental and technical approaches discussed in this review, and also new ones, will be borne out through thorough scientific investigation. As more studies are undertaken, in particular through the investigation of the association between structural variants and disease, 111-113 a greater understanding of the nature of the human genome and its variability in the population will be achieved. As in many other genomic studies, advances in the analysis of CNVs and structural variants in the human genome will set the precedent for the study of genomes in other species.