How to design a national genomic project—a systematic review of active projects

Kovanda, Anja; Zimani, Ana Nyasha; Peterlin, Borut

doi:10.1186/s40246-021-00315-6

Review
Open access
Published: 24 March 2021

Human Genomics volume 15, Article number: 20 (2021)

Abstract

An increasing number of countries are investing efforts to exploit the human genome, in order to improve genetic diagnostics and to pave the way for the integration of precision medicine into health systems. The expected benefits include improved understanding of normal and pathological genomic variation, shorter time-to-diagnosis, cost-effective diagnostics, targeted prevention and treatment, and research advances.

We review the 41 currently active national projects with respect to their aims and scope, the number and age structure of included subjects, funding, data sharing goals and methods, and linkage with biobanks, medical data, and non-medical data (exposome). The main aims of ongoing projects were to determine normal genomic variation (90%), determine pathological genomic variation (rare disease, complex diseases, cancer, etc.) (71%), improve infrastructure (59%), and enable personalized medicine (37%). The number of subjects to be sequenced ranges substantially, from a hundred to over a million, in some cases representing a significant portion of the population. Approximately half of the projects report public funding, with the rest having various mixed or private funding arrangements. Ninety percent of projects report data sharing (public, academic, and/or commercial, with various levels of access), and projects plan on linking genomic data with medical data (78%), existing biobanks (44%), and/or non-medical data (24%) as the basis for enabling personal/precision medicine in the future.

Our results show substantial diversity in the analysed categories of 41 ongoing national projects. The overview of current designs will hopefully inform national initiatives in designing new genomic projects and contribute to standardisation and international collaboration.

Background

Genomic medicine is the use of genetic information to inform medical care or predict the risk of disease, and it has been strongly influenced by novel technologies such as whole-exome sequencing and whole-genome sequencing [1, 2]. These technologies have led to significant improvements in health systems, particularly in the diagnosis of rare genetic disorders and cancer [3,4,5,6,7], as well as in the development of precision medicine, which is the use of diagnostic tools and treatments targeted to the needs of the individual patient based on their genomics, epigenomics, proteomics, metabolomics, lipidomics, and other data such as environmental and lifestyle information [3, 8].

Thirty years ago, in 1990, the Human Genome Project was initiated with the primary goal of obtaining a highly accurate sequence of the human genome and identifying its genes [9, 10]. It was followed in 1998 by the Icelandic deCode Project, the first major attempt to link genomic data with other medical and non-medical data [11], and in 2010 by the UK10K project, a collaboration among several UK public and private institutions to identify genetic causes of rare diseases [12]. In 2015, the large precision medicine initiatives of the USA and China were started (to be completed within the next decade) [13,14,15,16]. In Europe, the initiative “Towards access to at least 1 million sequenced genomes in the EU by 2022” started in 2018 with the aim of sharing genomic information and best practices among member states [13, 14, 17, 18]. There are high expectations of the benefits of whole-genome sequencing for the development of precision medicine, including improved and cost-effective diagnostics and more targeted prevention and treatment. Nevertheless, few of the projected gains have been demonstrated, and no standards for designing national genome projects have been developed so far.

With this systematic review, we aimed to provide an overview of the available information on active national genome projects worldwide and to identify their common characteristics and differences, which could provide a basis for developing best practices and standards for the design of national projects and the sharing of national genome resources.

Materials and methods

The principles of the PRISMA model were used in the preparation of this work, where possible and appropriate (Fig. 1) [19].

Briefly, to identify existing national genomic projects, PubMed (www.ncbi.nlm.nih.gov/pubmed), Google, and European Genome-phenome Archive (EGA; https://ega-archive.org/) searches were performed in April 2020 using the search string: (<country name> [Title]) and (human genome project).

Country names were used in their English language form as listed on Wikipedia countries and dependencies site (https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population).
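The per-country search above amounts to filling each English country name into one fixed template. A minimal Python sketch, assuming nothing beyond the template quoted in the text; the helper `build_queries` and the three example country names are illustrative only, and the sketch does not submit anything to PubMed, Google, or EGA:

```python
# Hypothetical helper: builds the search strings described in the Methods.
# The template text is copied from the article; nothing is sent to any service.
TEMPLATE = "({country} [Title]) and (human genome project)"


def build_queries(country_names):
    """Return one search string per country name, in input order."""
    return [TEMPLATE.format(country=name) for name in country_names]


if __name__ == "__main__":
    # Illustrative subset; the review used the full Wikipedia list of countries.
    for query in build_queries(["Slovenia", "Estonia", "New Zealand"]):
        print(query)
```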

The following exclusion criteria were used to identify on-going projects: projects concluded before 2020, or planned without an imminent start date in 2021, were classified as ‘not currently on-going’; international projects and/or those providing only samples or sequencing facilities were classified as ‘international-scope projects’; and projects lacking information on the key features examined in this article (non-functional websites, announcements with insufficient information, no information in the English language) were classified as ‘limited scope projects’. All three authors analysed and co-reviewed the data, and any discrepancies and/or inconsistencies were resolved by consensus. Projects that were not currently on-going, of limited scope, or of international rather than national scope were excluded from the analysis (Fig. 1).
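The three exclusion rules can be read as a small decision procedure. The sketch below is one possible encoding, not the authors' procedure: the `Project` record and its fields are hypothetical, the rules are checked in the order listed with the first match winning (the text does not state a precedence), and a planned project with a start year of 2021 or earlier is assumed to count as imminent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Project:
    """Hypothetical record for one identified project (field names are illustrative)."""
    country: str
    concluded_year: Optional[int]      # year the project ended; None if not concluded
    started: bool                      # True once the project is running
    planned_start_year: Optional[int]  # for projects not yet started; None if no date given
    international: bool                # multi-country, or only supplies samples/sequencing
    key_info_available: bool           # False: dead website, vague announcement, no English info


def classify(p: Project) -> str:
    """Apply the three exclusion rules in the order listed; first match wins."""
    if p.concluded_year is not None and p.concluded_year < 2020:
        return "not currently on-going"
    # Assumption: a planned start in 2021 or earlier counts as "imminent".
    if not p.started and (p.planned_start_year is None or p.planned_start_year > 2021):
        return "not currently on-going"
    if p.international:
        return "international-scope project"
    if not p.key_info_available:
        return "limited scope project"
    return "included"


# Usage with two made-up records: only the running, national project is kept.
projects = [
    Project("Country A", None, True, None, False, True),
    Project("Country B", 2016, True, None, False, True),
]
included = [p.country for p in projects if classify(p) == "included"]
```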

The complete list of categories for all identified projects is given in Supplement Table 1.

The contents of the individual national project websites were reviewed for information pertaining to (1) the aims and scope of the individual project: determining normal and pathological genomic variation, infrastructure (including sequencing and analysis capacities, implementation of standards, data management, education, and integration of genomics into existing health-care systems), and the intention of facilitating personalized medicine; (2) the number and age structure of included subjects; (3) funding; (4) data sharing goals and methods; and (5) linkage with biobanks, medical data, and non-medical data.

A PRISMA flow-chart diagram was generated using the online template (http://www.prisma-statement.org/).

Shared aims of national genomic projects were visualized using an online Venn diagram tool (http://bioinformatics.psb.ugent.be/cgi-bin/liste/Venn/calculate_venn.htpl).
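What the Venn diagram displays can be reproduced with plain set operations: each region is one exact combination of aims, and the category totals are the counts behind percentages such as the 90% for normal variation. This is a minimal sketch on hypothetical toy data; the project identifiers and their aims are invented for illustration and do not correspond to any project in the review, and the aim codes are shorthand for the four categories in the Results.

```python
from collections import Counter

# Hypothetical toy input: project identifier -> set of aim codes.
AIMS = {
    "project_1": {"normal", "pathological", "infrastructure"},
    "project_2": {"normal"},
    "project_3": {"normal", "pathological"},
    "project_4": {"pathological", "infrastructure", "personalized"},
    "project_5": {"normal", "pathological"},
}


def venn_regions(aims_by_project):
    """Count projects per exact aim combination (one count per Venn region)."""
    return Counter(frozenset(aims) for aims in aims_by_project.values())


def category_totals(aims_by_project):
    """Count projects reporting each aim (the marginal totals of the diagram)."""
    totals = Counter()
    for aims in aims_by_project.values():
        totals.update(aims)
    return totals


if __name__ == "__main__":
    for aim, n in sorted(category_totals(AIMS).items()):
        print(f"{aim}: {n}/{len(AIMS)} projects")
```

Because a project with several aims falls into exactly one region but contributes to several totals, the region counts always sum to the number of projects, while the category totals generally do not.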

World maps of national genomic projects were constructed using the online tools available at Mapchart.net (https://mapchart.net/world.html).

Results

A total of 86 countries with genomic projects and/or genomic databases were identified among the 240 countries and territories searched, and 41 of the identified projects were currently active, according to the information provided on their respective websites [20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60] (Fig. 1). The remaining projects were either not active at the moment or were part of larger international projects (such as H3Africa) and hence not ‘national’ projects in the strict sense (Fig. 2). The full list of identified projects is given in Supplement Table 1, List of national projects.

Aims and scope

The aims of the national genomic projects fell into four major categories: (1) determining normal genomic variation, (2) determining pathological genomic variation (clinical cohorts such as rare diseases, cancer, complex diseases, etc.), (3) infrastructure, and (4) facilitating personalized and precision medicine (Fig. 3). Many country-specific aims were also identified, such as history/ethnic studies (Armenia, Brazil, Chile, Hong Kong, Iran, Malta, Mexico, New Zealand, Russia, Singapore, Vietnam) [20,21,22, 25, 31, 34, 41, 42, 45, 56, 61], drug discovery (Australia, Bahrain, Cyprus, Hong Kong, Japan, Malta, Switzerland, Thailand, UK) [23, 37, 39, 41, 43, 45, 46, 48, 60, 62], reparation efforts (Argentina) [63], or specific health-related goals (interactions with infectious diseases, e.g. malaria and tuberculosis in endemic countries) [64, 65].

Determining normal genomic variation

The most common aim (90%, 37/41) of national genomic projects was to investigate normal genomic variation by sequencing healthy participants. Because defining health in the context of genomic testing can be challenging, especially in the case of non-penetrant mutations and late-onset disorders, most national projects addressed this either by creating cohorts based on demographic data (9/41 projects) and linking them with medical data or applying specific exclusion criteria, or by specifically identifying healthy individuals (e.g. healthy parents from trio testing in rare diseases, or longitudinal health-tracking cohorts from previous studies) (Supplement Table 1).

Determining pathological genomic variation

The second most common aim was to determine pathological genomic variation by sequencing clinical cohorts (71%, 29/41). Seven of these 29 projects (24%) (France, UK, Australia, Hong Kong, New Zealand, Thailand, and Slovenia) clearly defined in advance the number of subjects they plan to include in their clinical cohorts, as well as the cohorts or pilot projects themselves. France will include 48 clinical cohorts [30], the UK project will include over 190 rare diseases and a cancer program [37], and, similarly, Australia will include 18 rare disease and cancer flagship projects [66]. In the remaining projects that aim to determine pathological genomic variation, the final cohorts will depend on various factors (funding, pilot initiatives, etc.) and are discussed further below.

Infrastructure

The third most common aim, reported by more than half of the projects (59%, 24/41), was the implementation of various infrastructural goals (Supplement Table 1). Infrastructural goals were not a homogeneous category and reflected each project's existing sequencing, data-analysis, and personnel capacities. Apart from increasing sequencing capacity itself, the most frequently reported infrastructural objectives were data management (79%, 19/24), followed by establishing standards of analyses (71%, 17/24) and education (54%, 13/24). Several additional projects (20%, 8/41) intended to address these goals without reporting them under ‘infrastructure’, probably reflecting cultural and conceptual differences in what is considered infrastructure.
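The percentages in this paragraph use two different denominators: all 41 projects for the share with infrastructural aims, and the 24 infrastructure projects for the individual objectives. A short check, assuming conventional rounding to the nearest whole percent (half up), reproduces each reported value from its count; the helper `pct` is illustrative, and the counts are those stated above.

```python
def pct(count: int, total: int) -> int:
    """Whole-number percentage, rounded half up, using integer arithmetic."""
    # floor(100 * count / total + 0.5) == (200 * count + total) // (2 * total)
    return (200 * count + total) // (2 * total)


# (count, denominator, reported %) as given in the Infrastructure paragraph
REPORTED = [
    (24, 41, 59),  # projects with infrastructural aims, out of all 41 projects
    (19, 24, 79),  # data management, out of the 24 infrastructure projects
    (17, 24, 71),  # standards of analyses
    (13, 24, 54),  # education
    (8, 41, 20),   # additional projects pursuing these goals under other headings
]

for count, total, reported in REPORTED:
    assert pct(count, total) == reported, (count, total, pct(count, total))
```

Integer arithmetic avoids floating-point edge cases at exact halves, which matters only for the rounding check, not for the reported figures themselves.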

Personalized and precision medicine

Finally, 37% (15/41) of the projects presented tangible plans for the development of personalized medicine, although most projects (85%, 35/41) reported personalized medicine as one of their rationales.

As part of the effort toward introducing personalized medicine, a further subset of countries (e.g. Australia, USA, Japan, and Switzerland) intends to use its genomic data for drug discovery/precision therapy (Supplement Table 1).

Number and age structure of the included subjects

Websites of 37 of the 41 national projects (90%) reported the total number of subjects to be included in the project. The number of included subjects ranged from a hundred to over a million, representing from 0.0001% to 32% of the population. Approximately half of the projects aimed to sequence more than 10,000 subjects, and approximately a quarter aimed to sequence 1000 or fewer (Table 1). Similarly, in terms of population percentage, only four countries aimed to sequence more than 1% of their population. Of the remaining countries, half aimed to sequence more than 0.02% and half less than 0.02% of their respective populations.
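The population percentages above are the planned number of subjects divided by the national population, times 100. A minimal sketch, with hypothetical helper names and a made-up example rather than values from Table 1; placing a value of exactly 0.02% in the lower band is an assumption, since the text does not say how ties were handled.

```python
def percent_of_population(subjects: int, population: int) -> float:
    """Planned number of sequenced subjects as a percentage of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * subjects / population


def population_band(percent: float) -> str:
    """Bands used in the text; exactly 0.02% goes to the lower band (assumption)."""
    if percent > 1.0:
        return "more than 1%"
    if percent > 0.02:
        return "0.02% to 1%"
    return "0.02% or less"


if __name__ == "__main__":
    # Made-up example: 100 subjects in a population of 100 million.
    share = percent_of_population(100, 100_000_000)
    print(f"{share:.4f}% -> {population_band(share)}")
```

For instance, 100 sequenced subjects in a population of 100 million corresponds to 0.0001%, the order of magnitude of the lower end of the reported range.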

Table 1 Numbers of genomes/WES per country and as a percent of the total population

Of the few projects with missing information on the number of subjects included, most were focused primarily on infrastructure, whereas in the remaining projects the exact number of included subjects was reported to be determined during the project (Supplement Table 1).

The age structure of healthy subjects was reported in five projects. In these projects, the most common strategy for determining normal genomic variation was to include the general adult population or existing health-tracking cohorts. For pathological genomic variation, some projects also planned to include minors (e.g. in rare diseases). For detailed information on the included cohorts, please see the ‘Discussion’ section.

Funding

Approximately half (51%, 21/41) of all national projects stated their total planned funding (Supplement Table 2). The declared amounts reflect the scope of the individual projects, ranging from USD 0.32 million to over USD 9.2 billion. Roughly half (49%, 20/41) of national genomic projects reported public funding, with some projects having mixed state and federal funding (Australia) or EU co-funding (e.g. Cyprus, Czech Republic) [35, 36, 46, 49, 57, 67]. The remaining national genomic projects reported either mixed public-private funding (44%, 18/41) (for example, the USA and Switzerland) or fully private funding (7%, 3/41) (Qatar, Ireland, and Vietnam) [13, 25, 30, 31, 33, 40, 50, 55, 62, 68, 69]. The private funding partners were diverse, including sequencing, investment, and insurance companies, as reviewed in the ‘Discussion’ section.

Data sharing goals and methods

Data sharing involves the analysis and curation of genomic and associated information obtained during the projects for public, academic, and/or commercial use with various levels of access. It inevitably raises ethical and legal issues, requires identifying stakeholders, and involves technical aspects and data security. Data sharing is an important aspect of national genomic projects, as most reported their main objective to be determining normal population genomic variation, which will enable the use of personalized and precision medicine. Of all projects, 90% (37/41) reported their intention to share the data obtained (Supplement Table 3), and over half (54%, 22/41) have already implemented some form of data sharing. Among the existing data-sharing solutions, the most common format was a database platform with various levels of access for the public, academia, and researchers, whereas the second most common solution was a fully public database containing anonymized or pooled genomic data. For example, Estonia reports that it will make its data and DNA available on request, pending approval by the Ethics Committee. On the other hand, several of the projects with private funding report that they will provide access to approved pharmaceutical/biotechnology companies and research groups (e.g. Ireland, Switzerland, USA).

Association with biobanks, medical, and non-medical data

The majority of the national projects plan to link their sequencing data with other medical data (78%, 32/41), existing or planned biobanks (54%, 22/41), and/or non-medical data (24%, 10/41), such as environmental and other factors, as the basis for enabling personal/precision medicine (Table 2, Fig. 4). Additional countries explicitly plan to establish or connect biobanks and databases during the course of their projects (for example, Australia and Slovenia) (Supplement Table 1). Finally, 56% (23/41) of projects reported their intention to unify or establish standards for analysis and thus make provisions for adequate data management, two key prerequisites for establishing personalized medicine.

Table 2 List of biobanks associated with national genomic projects
