Abstracts from the Human Genome Meeting 2018

New insights into the human heart development using a combined spatial and single-cell transcriptomics approach


Background
With the huge amount of genome-wide mutational data generated by cancer genomic sequencing studies, distinguishing cancer drivers from the vast majority of passengers is important. Existing cancer driver prediction methods each capture specific mutational aspects to discriminate potential cancer drivers. We explore an alternative way of performing this task.

Methods
We noted that mutational parameters (functional mutation ratio, mutation frequency and sample mutation recurrence) vary differently among mutant genes of different sizes. This led us to develop a novel algorithm, Mutant Gene Ranker (MuGeR), which compares multiple mutational parameters of a target gene against the corresponding background derived from a specific subset of genes using a sliding window approach, to estimate the likelihood that the target gene is a potential cancer driver. We applied MuGeR to The Cancer Genome Atlas (TCGA) datasets.
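The size-matched background comparison can be illustrated with a minimal sketch. This is not the MuGeR implementation: the window width, the use of a z-score, and the input layout are all assumptions made for illustration, and only one mutational parameter is scored at a time.

```python
from statistics import mean, pstdev

def size_matched_zscores(genes, half_window=2):
    """genes: list of (name, gene_length, value) tuples, where value is one
    mutational parameter (e.g. functional mutation ratio).
    Returns {name: z-score of value against genes of similar length}.
    Hypothetical helper, not part of MuGeR."""
    ordered = sorted(genes, key=lambda g: g[1])  # order genes by size
    scores = {}
    for i, (name, _length, value) in enumerate(ordered):
        lo = max(0, i - half_window)
        hi = min(len(ordered), i + half_window + 1)
        # Background: neighbours in the size ordering, excluding the target gene.
        background = [g[2] for j, g in enumerate(ordered[lo:hi], start=lo) if j != i]
        if len(background) < 2:
            scores[name] = 0.0
            continue
        sd = pstdev(background)
        # A flat background gives no scale to compare against; report 0 (assumption).
        scores[name] = (value - mean(background)) / sd if sd > 0 else 0.0
    return scores
```

Genes whose parameter stands far above that of similarly sized genes receive high scores; a full ranking would combine such scores across several parameters.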

Results
Empirical data on the TCGA datasets and comparison with the prioritization results of four existing tools (MuSiC, MutSig, TUSON Explorer and DOTS-Finder) suggested satisfactory performance of MuGeR. More importantly, we demonstrated the existence of specific patterns of mutational parameters across cancers.

Conclusions
Empirical data verified the usefulness of MuGeR in identifying potential cancer drivers. Moreover, our in-depth appraisal of the TCGA liver hepatocellular carcinoma datasets further highlighted the frequent mutational dysregulation of ubiquitin-related proteasomal degradation in driving hepatocarcinogenesis.

A2
Maximizing the information extraction of RNA-seq data for achieving personalized medicine
Tyler Weirick and Shizuka Uchida
Cardiovascular Innovation Institute, University of Louisville, Louisville, KY 40202, United States
Human Genomics 2018, 12(Suppl 1):A2
Through the efforts of large-scale sequencing projects around the world (e.g., ENCODE, FANTOM), we have learned that 90% of the human genome is transcribed as RNA, yet only a minor portion of these RNAs encode proteins. Likewise, the known processes affecting RNA and the functions of RNA have become increasingly complex. Given that RNA is an intermediate between genomic DNA and proteins, in principle, RNA sequencing (RNA-seq) should be able to reveal information about the genome (e.g., mutations, including single nucleotide polymorphisms (SNPs)) and the final products (i.e., proteins). Although RNA should in principle be an exact copy of genomic DNA, in reality RNA is modified by a variety of enzymes, resulting in the addition of a 5′ cap and a poly(A) tail as well as various forms of RNA modification (e.g., RNA editing). To study these multiple processes simultaneously, we created a computational pipeline to maximize the extraction of information from RNA-seq data. In this pipeline, we not only measure expression levels of RNA (i.e., genes and isoforms) but also detect A-to-I RNA editing sites and clusters of editing, or "editing islands", using our previously introduced tool RNAEditor (John, Brief Bioinform, 2016; Stellos, Nat Med, 2016), as well as circular RNAs (circRNAs) (Boeckel, Circ Res, 2015; Militello, Brief Bioinform, 2017), which arise from the backsplicing of exons and/or introns. Our pipeline is implemented in Snakemake, a workflow management system that confers numerous benefits, such as easy parallelization on HPC clusters and cloud computing environments, advanced error detection, and automatic deletion of intermediate files.
Through the detailed analysis of A-to-I RNA editing events, which convert adenosine (A) to inosine (I) in RNA but not in genomic DNA and appear as guanine (G) in place of A in RNA-seq reads, it is possible to correctly infer the actual mutations in an individual's genome, which differs from the reference genome used as the template for RNA-seq data analysis. These mutations should represent the individuality of humans. In this meeting, we would like to share our computational pipeline for the further analysis of RNA-seq data and for integrating the additional extracted information for use in personalized medicine, especially in relation to cardiovascular disease ("Cardiovascular Personalized and Precision Medicine").
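The distinction between an editing event and a genomic variant can be sketched as a simple rule. This is an illustrative assumption about how such a call might be made, not the logic of RNAEditor or of the authors' pipeline; the function name and arguments are hypothetical.

```python
def is_candidate_a_to_i(ref_base, rna_base, dna_base=None, strand="+"):
    """Return True if an RNA/reference mismatch looks like A-to-I editing.
    Inosine is read as G, so editing appears as A>G for plus-strand
    transcripts and as T>C for minus-strand transcripts."""
    expected = {"+": ("A", "G"), "-": ("T", "C")}
    ref_expected, rna_expected = expected[strand]
    if (ref_base.upper(), rna_base.upper()) != (ref_expected, rna_expected):
        return False
    # If the individual's genomic base is known and differs from the reference,
    # the mismatch reflects a genomic variant rather than an editing event.
    if dna_base is not None and dna_base.upper() != ref_expected:
        return False
    return True
```

Mismatches that fail this rule because the genomic base itself differs from the reference are the ones that point to an individual's own mutations.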

A3
Versatile instrument for single cell analysis and complex tissue microdissection
Stanislav L. Karsten, Zhongcai Ma, Lili C. Kudo
NeuroInDx, Inc., 20725 S Western Ave, Ste 100, Torrance, CA 90501, USA
Human Genomics 2018, 12(Suppl 1):A3
Collection of specific cells or subanatomical regions from complex heterogeneous tissues is a prerequisite for understanding the complex molecular mechanisms underlying health and disease. Single cell analysis (SCA) has become an essential part of cutting-edge biomedical research. Commercially available single cell collection and tissue microdissection technologies include laser-based systems and cell sorting instruments. However, these platforms are typically sample specific, complex and expensive, making integration within standard lab workflows difficult. Moreover, part of SCA is the investigation of single cell adhesion properties, which are key to substrate-mediated cell behavior and essential for understanding cellular properties in health and disease. Approaches for single cell adhesion measurement include atomic force microscopy, optical tweezers, and micropipette/capillary aspiration. Unfortunately, none of the existing instruments provides concurrent single cell acquisition and adhesion force measurement prior to preparation of the cell for downstream analysis (e.g. NGS). Based on our vacuum impulse based cell and tissue acquisition technology (UP8797644) we have developed a universal platform for single cell acquisition and analysis. The developed system collects individual cells from any adherent cultures grown in standard cell culture dishes in volumes as small as 15 nl, compatible with downstream SCA and NGS. The system may be used with a wide range of inverted microscopes, and the cells of interest can be identified based on morphology, location or labeling, including fluorescence. Here, individual cells were collected from human neuroblastoma SH-SY5Y, CHO, 3T3 and neural progenitor cell cultures.
Collected single cells were dispensed immediately into individual wells for clonal expansion. Clonal expansion of these single cells for 7 days revealed minimal effect on cellular viability (up to 99% when compared to dilution controls). Moreover, a trypan blue assay demonstrated survival rates similar to those in the re-cultivation studies. In addition, we extended the use of our technology to single cell adhesion strength measurement. Adhesion strength for individual cells from several cell cultures was measured using a sensor incorporated in the collection assembly. Measurement of cell adhesion force was based on the capillary aspiration techniques reported earlier. Both manual and automatic algorithms were developed in the instrument's software for rapid collection and deposition of target cells and tissue regions. The benefits of the proposed technology include cost-efficiency, simple operation, a complete workflow from single cell isolation to adhesion force measurement, compatibility with a wide range of inverted microscopes and use of standard plates and culture dishes.
Keywords: adhesion, single cell analysis, tissue, microdissection, archival tissue, region of interest, acquisition, dispensing

While copy number variants (CNVs) in human chromosomes have been under active research, relatively few studies have focused on medium-size variants such as microdeletions (chromosomal deletions in the 100-1000 bp range), and even less attention has been given to germline de novo microdeletions. Microdeletions of intermediate length (between 50 and 1,000 bp) are often termed the twilight zone and are detected with much less sensitivity. Previous studies have shown that microdeletions contribute to a significant number of diseases, but their mechanism of formation is largely unknown.
The whole genome sequences of the family trio cohort from Inova Translational Medicine Institute's (ITMI) Childhood Longitudinal Cohort Study provide a unique opportunity to identify pre-existing and de novo microdeletions in the human population. By leveraging state-of-the-art SV detection and genotyping algorithms, namely Delly, Manta, SV2 and svaba, as well as the family relationships in our data, we have created a comprehensive set of common and rare high-quality microdeletions in our cohort. Our analysis demonstrates that the common microdeletions (minor allele frequency >0.01) can be used to identify a subject's ancestry at the continental level using Principal Component Analysis (PCA). Next, we identified genomic factors that influence the density of common and rare microdeletions using non-overlapping windows of 1 Mb. We found that microdeletions are enriched in high-GC and simple-repeat regions. Interestingly, as opposed to duplications, microdeletion mutation rates are higher in late-replicating regions. Next, we show that the smaller microdeletions (<300 bp) prefer microhomology lengths of 2 to 3 bp (>30%) around the breakpoint, suggesting nonhomologous end joining (NHEJ), whereas the larger microdeletions prefer a microhomology length of 4 bp (21%), followed by 1 bp (19%). Finally, the GC richness and microhomologies underlying the junctions of the microdeletions are consistent with the possibility that the microdeletions were generated by mechanisms that also produce extrachromosomal circles of DNA called microDNAs. In summary, the curated set of high-quality microdeletions will enhance studies of their mutational mechanism, as well as of their disease association and functional impact.
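Microhomology at a deletion breakpoint can be measured as the number of bases by which the deletion can slide along the reference without changing the resulting sequence. The sketch below uses that common definition; it is an assumption, since the abstract does not state how microhomology length was computed, and the function name and coordinate convention are hypothetical.

```python
def microhomology_length(ref, start, end):
    """Microhomology (in bp) at a deletion removing ref[start:end]
    (0-based, half-open coordinates on the reference string)."""
    if not 0 <= start < end <= len(ref):
        raise ValueError("invalid deletion coordinates")
    # Slide right: the start of the deleted segment matches the sequence after it.
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    # Slide left: the sequence before the deletion matches the end of the segment.
    left = 0
    while start - 1 - left >= 0 and ref[start - 1 - left] == ref[end - 1 - left]:
        left += 1
    # Cap at the deletion length so tandem repeats are not over-counted.
    return min(left + right, end - start)
```

For example, deleting `GCTTTTT` from `AAAGCTTTTTGCCCC` leaves a 2 bp microhomology (`GC`), because the deleted segment and the sequence following it begin with the same two bases.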

Background
Resistant hypertension is defined as uncontrolled blood pressure despite the use of 3 antihypertensive agents, or the use of 4 antihypertensive agents regardless of blood pressure control. According to epidemiological reports from Western countries, the prevalence of resistant hypertension is about 15% among all hypertensive patients. Genetic factors are expected to play a greater role in patients with resistant hypertension than in hypertensive patients in general. However, the genetic basis of resistant hypertension remains unclear, whereas genome-wide association studies (GWAS) of hypertension have identified over 100 susceptibility loci.

Materials & methods
We used genome-wide variant data of 25,450 patients with hypertension obtained from the BioBank Japan project. These subjects belonged to two groups: group (1) had resistant hypertension requiring at least 4 antihypertensive medications of different drug classes to achieve blood pressure control (N=2,723), and group (2) had hypertension controlled with only 1 antihypertensive medication (N=21,470). After applying stringent quality control to the samples, we performed imputation using the 1000 Genomes Project Phase 3 data as reference and examined the association of about 800 million SNPs with imputation accuracy (Rsq) > 0.9 and MAF > 0.05 in both cases and controls.
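The variant inclusion criteria stated above can be written as a small filter. This is a minimal sketch: the dictionary keys are hypothetical field names, and only the two thresholds (Rsq > 0.9, MAF > 0.05 in both groups) come from the text.

```python
def passes_qc(variant, min_rsq=0.9, min_maf=0.05):
    """variant: dict with hypothetical keys 'rsq', 'maf_cases', 'maf_controls'.
    Keeps a variant only if imputation accuracy and the minor allele
    frequency in both cases and controls exceed the thresholds."""
    return (variant["rsq"] > min_rsq
            and variant["maf_cases"] > min_maf
            and variant["maf_controls"] > min_maf)
```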

Conclusions
We performed a GWAS using 25,450 patients with hypertension and identified 28 suggestive signals. Our findings will help to clarify the pathogenesis of resistant hypertension.
Keywords: resistant hypertension, genome-wide association study (GWAS), susceptibility

A6
A cell-based translocation assay system identifies active fractions from plant that rescue skeletal dysplasia phenotypes in an achondroplasia mouse model
Yi-Ching Lee 1 , Yun-Wen Lin 1,2 and Yuan-Tsong Chen 3,4

Background
Thyroid function plays a key role in the regulation of a wide range of biological processes in the human body. Circulating levels of thyroid-stimulating hormone (TSH) and free thyroxine (FT4) are used to assess thyroid function and have been investigated by genome-wide association studies, which identified risk alleles that partially explain their inter-individual variation. To dissect the causal role of thyroid function in the human phenome, we conducted a Mendelian randomization study of TSH and FT4 levels.

Materials and methods
We applied a two-sample Mendelian randomization (MR) approach based on risk alleles for TSH and FT4 levels in euthyroid (healthy) subjects and genome-wide data on 2,419 traits assessed in up to 337,199 individuals from the UK Biobank. Multiple MR methods were tested to verify the reliability of the results, and the MR-Egger regression intercept was used to verify the validity of the genetic instruments. A false discovery rate (FDR, q < 0.05) correction was applied for multiple testing, accounting for the number of traits tested and the number of MR methods applied.
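As an illustration of the two ingredients named above, the sketch below computes a fixed-effect inverse-variance weighted (IVW) MR estimate from per-variant summary statistics and Benjamini-Hochberg q-values. Both are assumptions: the abstract does not list which MR estimators were used besides MR-Egger, nor which FDR procedure was applied, so IVW and Benjamini-Hochberg are shown only as common choices.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error from
    per-variant effects on the exposure and on the outcome."""
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, den ** -0.5

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the order of the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

With one p-value per trait and MR method, associations with a q-value below 0.05 would be retained under the threshold given in the text.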

Conclusion
We provided novel data regarding the consequences of inter-individual variability in thyroid function. In particular, genetically determined TSH levels appear to be involved in the causal mechanisms of a wide range of phenotypic traits, in agreement with the recognized pervasive regulatory action of the thyroid in the human body.

Background: In tooth development and regeneration research, a key variable for successful tissue regeneration and engineering is the environment in which cells and tissues grow. The apical papilla is essential for tooth development; stem cells from the apical papilla (SCAP) can contribute to the formation of dentin/bone-like tissues and represent a good cell source for dental tissue regeneration. However, with current research techniques, the microenvironment cannot be simulated when stem cells are isolated and cultured in vitro. Disruption of the microenvironment may lead to failure of tooth regeneration. In this study, we focus on the differential gene expression between SCAP and apical papilla tissues in an attempt to identify the genes that are crucial for inducing SCAP.
Materials and methods: Gene chip analysis was performed on cultured SCAP and apical papilla tissue. With the patients' informed consent, SCAP and apical papilla tissue were collected separately from the third molars of five female patients aged 18-22 years. SCAP were isolated and cultured as previously described. Cell and tissue samples were subjected to Trizol RNA extraction for gene chip analysis, and the results were further verified by real-time RT-PCR.
Results: SCAP showed a discrete distribution and had stretched after 4 days. The cells were spindle-shaped, with some oval and polygonal forms. Gene chip data revealed 2325 differentially expressed genes; S1004A, FOXM1 and FGF5 were up-regulated, whereas CXCL14, IGF2 and BMP6 were down-regulated. The gene chip results for these six genes were further verified by RT-PCR analysis.
Conclusion: Genes were differentially expressed between apical papilla tissues and SCAP, suggesting the importance of the microenvironment for SCAP proliferation and differentiation. Our data may point to a possible artificial microenvironment that is useful for tooth regeneration.
Key words: Stem cells from apical papilla; Niche; Differentiation potential; Gene expression

The human leukocyte antigen (HLA) region is a genetically diverse region intimately involved in a variety of immune-related functions and known to be associated with disease predisposition. Due to the ubiquity of GWAS genotyping data, the expense of HLA allele typing and interest in the association between the HLA region and disease, statistical imputation of HLA alleles from genotypes is becoming indispensable. HLA imputation relies on a reference panel in which the association between genotypes and HLA alleles is known, and a model is built to capture the relationship between the two. Recent work has shown the importance of population-specific reference panels for HLA allele imputation. In this study, we use the HIBAG HLA imputation framework to build Han Chinese-specific reference panels for 8 HLA loci, including 3 loci in HLA class I and 5 loci in HLA class II. Using Han Chinese genotype data, we show that using the largest population-specific HLA dataset to date as a reference panel leads to increased accuracy in predicting HLA alleles and an increased number of HLA alleles that are predicted. We compare our reference panel to the existing HIBAG Pan Asian and multiethnic reference panels, and show that a Han Chinese-specific reference panel significantly improves on the existing HIBAG panels by increasing the call rate of the HLA alleles, increasing the number of HLA alleles imputed, and increasing the confidence in the imputed HLA alleles. We also provide a web interface where users can input their genotype data and impute HLA alleles using Han Chinese, Pan Asian or multiethnic reference panels. The authors hope the study demonstrates the importance of using population-specific reference panels and of increasing reference panel sample size for HLA imputation.
Furthermore, it is hoped that the ease of use in HLA imputation will allow more HLA association studies to be conducted and provide greater insight into the fine mapping of HLA alleles in GWA studies.

Background
Blood has largely been considered free of microorganisms. Recent analyses in humans, however, detected bacterial species in blood from healthy and diseased individuals, suggesting the existence of a resident blood microbiota regardless of sepsis. Among these bacteria, the phylum Proteobacteria has consistently been reported to dominate in the blood. In particular, lipopolysaccharide (LPS), which is found mainly in Proteobacteria, is associated with the chronic inflammation linked to obesity and other chronic metabolic diseases. Therefore, we focused on this phylum to clarify the bacterial component of the blood.

Materials and methods
The study was conducted with blood DNA from 240 healthy individuals. We performed whole genome sequencing (WGS) at 30x coverage as well as 16S rRNA gene amplicon sequencing to analyze the blood microbiota. About 99% of the WGS reads were mapped to the human reference genome; the remaining unmapped reads were then assigned taxonomic labels using the Kraken pipeline. The DADA2 pipeline of QIIME2 was used for 16S rRNA data analysis.
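Once the unmapped reads carry taxonomic labels, the relative abundance of each taxon is its share of the classified reads. The sketch below assumes a plain list of one label per read; this input layout is hypothetical and is not the output format of Kraken or QIIME2.

```python
from collections import Counter

def relative_abundance(read_taxa):
    """read_taxa: iterable of taxon labels, one per classified read.
    Returns {taxon: fraction of classified reads}."""
    counts = Counter(read_taxa)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}
```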

Results
Proteobacteria was the most abundant phylum in both the 16S rRNA metataxa and the WGS unmapped reads, at around 80%. Within the phylum Proteobacteria, we observed differences between the 16S metataxa and WGS results in the relative abundances of bacterial classes, families, and genera. Gammaproteobacteria were the major component (84.5%) in the WGS unmapped-read result, while Alphaproteobacteria (44.9%) and Betaproteobacteria (30.9%) were abundant in the 16S metataxa result.

Conclusions
The blood microbiota, even when present in very low abundance, may play an important physiological role. However, quantifying and characterizing the microbiome in human blood is difficult because blood contains mostly human DNA and only a low abundance of microbial DNA. Further investigation will therefore be needed to establish the concordance between 16S rRNA amplicon sequencing and WGS unmapped reads.

Background
Blood has generally been considered a sterile environment; however, a microbiome has consistently been detected in the blood of healthy individuals. That healthy individuals harbor a rich microbiota in their blood raises the question of the role of this microbiota and its impact on the risk of several diseases. To examine how blood microbiomes differ according to age, we characterized bacterial species in 37 blood samples.

Materials and methods
The study was conducted with blood DNA from 37 individuals from 5 families. The cohort consisted of healthy individuals in their 20s to 80s and included mono- and dizygotic twins. We performed whole genome sequencing (WGS) at 30X (10 samples), 60X (17 samples) and 90X (10 samples) to analyze the blood microbiome. About 99% of the WGS reads were mapped to the human reference genome (hg38). The remaining unmapped reads were then assigned taxonomic labels using Bracken (http://ccb.jhu.edu/software/bracken; v1.0.0).

Results
In all samples, Proteobacteria was the most abundant phylum (more than 90%), and Gammaproteobacteria were the major component (on average 57.6%) within this phylum. Of all phyla, Proteobacteria exhibited the most variation across samples. The older age group (over 60 years old) showed a high correlation with the distribution of Gammaproteobacteria (rho = 0.57, p-value = 0.0002) compared with the younger adult (under age 40) and middle-aged (aged 40-60) groups.
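The reported rho is presumably a Spearman rank correlation, although the abstract does not name the statistic or state exactly which variables were paired. Under that assumption, a minimal pure-Python sketch of the coefficient (with average ranks for ties) looks like this; the function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks of values, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Here `x` could hold ages and `y` the relative abundance of Gammaproteobacteria per sample; a p-value would additionally require a permutation test or a t-approximation.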

Conclusions
This study suggests that blood microbial profiles, especially those of Proteobacteria, may be strongly correlated with age, just as the composition of the human gut microbiota changes with age.

Objectives
About half of infertility cases are caused by male factors. Of these, about 90% are related to spermatogenic dysfunction, which leads to oligozoospermia or azoospermia. However, about 60% of them are classified as idiopathic, and no effective treatment has yet been established. An effective culture system for sperm production is therefore desired, both for research purposes and as a possible therapeutic option. Our group has succeeded in producing fertile sperm from newborn mouse testis in vitro with an organ culture method. However, its efficiency remains low compared with that in vivo, in spite of various efforts to improve the culture conditions. We now consider it both important and feasible to reveal the molecular dynamics underlying the complexity of spermatogenesis. Here, we examined the differences in gene expression profile between in vivo and in vitro testis, and tried to clarify the phenomena occurring specifically in the cultured testis.

Materials and Methods
We used Acr-Gfp transgenic mice, which express GFP in germ cells from the mid-pachytene stage of meiotic prophase onward, as the experimental animals. We cultured testis tissue from mice at 7 days postpartum (dpp) for about two weeks. These cultured testis samples, along with testis tissues taken directly from mice at 7, 14 and 21 dpp as in vivo samples, were analyzed by FACS after cell dissociation or subjected to microarray analysis.

Results
The number of GFP-positive cells, an indicator of spermatogenesis progression, was much lower in testis tissues cultured for 2 weeks than in 21 dpp testes, the in vivo counterpart, when analyzed by FACS. Principal component analysis of the microarray data showed that the gene expression profile of the cultured testis was more similar to that of 14 dpp than to that of 21 dpp in vivo testis, suggesting that in vitro spermatogenesis was retarded or mostly arrested at around the 14 dpp state. Through differential gene expression analysis between the in vitro and 14 dpp in vivo samples, we found that genes related to the innate immune system were significantly upregulated in the in vitro samples.

Conclusions
Our results revealed that in vitro spermatogenesis was delayed or arrested at a point corresponding to around 14 dpp in vivo. Furthermore, they suggested that the innate immune system compromises the efficiency of in vitro spermatogenesis. The data obtained in this study will be useful for developing new culture conditions that can support more efficient spermatogenesis.

A15
Disease gene discovery in amyotrophic lateral sclerosis using innovative next generation sequencing and genetic linkage strategies

Background
Amyotrophic lateral sclerosis (ALS; also known as motor neuron disease, MND; or Lou Gehrig's disease) is an ultimately fatal, genetically heterogeneous neurodegenerative disease. Approximately 10% of cases are hereditary (familial; FALS). While gene mutations are the only proven cause of disease, one third of ALS families carry an unidentified mutation. These families are often small and exhibit incomplete penetrance, both factors inhibiting traditional disease gene mapping.

Materials and methods
We employed innovative approaches using next generation sequencing (NGS) and custom bioinformatics to identify novel genetic contributors to FALS. A FALS family was recruited to the Macquarie University Neurodegenerative Disease Biobank under informed written consent, as approved by Macquarie University. Genome and exome sequencing was performed for two affected individuals and one obligate carrier. After standard bioinformatics processing, a custom bioinformatics pipeline was applied to both datasets. Shared variant analysis was performed, and the NGS reads for the resultant candidates were visualised to ensure correct genotype calling and appropriate annotation of multi-allelic variants. Subsequent filtering removed non-exonic variants and common genetic variants present in ethnically matched control cohorts. SNP-based linkage analysis was also performed in these individuals and eleven additional at-risk family members, using a parametric model and liability classes. Both NGS datasets were then subset to only those genomic regions with positive LOD scores, and shared variant analysis was repeated as above, with potential regulatory variants additionally retained.
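The shared variant analysis and the subsequent restriction to linkage regions can be sketched with set operations. This is an illustration under stated assumptions, not the authors' custom pipeline: the variant key `(chrom, pos, ref, alt)`, the control-frequency cutoff and the function name are all hypothetical.

```python
def shared_candidates(calls_by_individual, control_af, max_control_af=0.0, regions=None):
    """calls_by_individual: {individual: set of (chrom, pos, ref, alt)}.
    control_af: {variant: allele frequency in matched controls}.
    regions: optional list of (chrom, start, end) intervals, e.g. regions
    with positive LOD scores. Returns the set of candidate variants."""
    # Keep only variants carried by every sequenced family member supplied.
    shared = set.intersection(*calls_by_individual.values())
    # Drop variants seen in controls above the (assumed) frequency cutoff.
    kept = {v for v in shared if control_af.get(v, 0.0) <= max_control_af}
    if regions is not None:
        kept = {v for v in kept
                if any(c == v[0] and s <= v[1] <= e for c, s, e in regions)}
    return kept
```

Running the function once without `regions` and once with the linkage intervals mirrors the two passes described above.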

Results
Genome and exome sequencing identified approximately 8,000,000 and 185,000 variants across the individuals of this family, respectively. Custom bioinformatic analysis revealed 12 and 15 novel coding variants, for a total of 27 candidates. However, when overlaid with the linkage results, none of these variants was found within suggestive linkage regions, and they are therefore less likely to cause ALS. In total, 40 positive linkage peaks were observed across the genome. The shared variant analysis pipeline was therefore repeated, considering only the suggestive linkage regions and including regulatory region variants in addition to exonic variants. This resulted in 12 and one candidate(s), respectively, with the one exome candidate also present in the genome candidate list.

Conclusions
It is hoped that Sanger sequencing validation and in silico evaluation of the identified candidate variants will reveal a single mutation causing ALS in this family. By elucidating the remaining genetic contributors to ALS we hope to enhance our understanding of disease and inspire downstream studies, particularly therapeutic development.

A16
Optimized homology directed repair for treatment of inherited retinal diseases using the CRISPR/Cas9 system
Brian Rossmiller, Takeshi Iwata
Molecular and Cellular Biology Division, National Institute of Sensory Organs, Tokyo Medical Center, National Hospital Organization, 2-5-1 Higashigaoka, Meguro, 152-8902, Tokyo, Japan
Human Genomics 2018, 12(Suppl 1):A16
Nearly 1.5 million people worldwide are affected by hereditary vision impairment each year. For many of these, there is little to no effective treatment. Advancements in CRISPR/Cas-mediated site-specific gene editing provide a new and powerful tool for disease treatment but rely on inefficient homology directed recombination (HDR). The purpose of this project is to treat two models of inherited retinal degeneration using CRISPR/Cas9-mediated HDR. Prior to treatment, we will assess optimal conditions for use of the CRISPR/Cas9 system in the retina. These include age at injection, delivery vehicle (nanoparticle or adeno-associated virus (AAV)), and method of increasing homologous integration of the gene replacement (ligase IV inhibitors and neurotrophic factors). An array of non-viral delivery methods was screened, first by transfection of 2.5 micrograms of a plasmid containing cytomegalovirus (CMV) promoter-driven GFP into 661W cells and second in B6 mice. Mice were injected with the same plasmid at either postnatal day 5 or 15. Two weeks post injection, retinas were dissociated and analyzed by FACS. Assessment of sgRNA cleavage efficiency was performed using a novel dual luciferase approach with a molar ratio of 1:4 target to H1-sgRNA expression vector. Luciferase activity was measured 48 hours post-transfection. Each vector will be assessed in two animal models of inherited retinal degeneration. The first is a mouse model of retinitis pigmentosa, Rho (I307N), kindly provided by Dr. Nishina of the Jackson Laboratory. The second is a model of normal tension glaucoma, OPTN (E50K), produced in the Iwata lab.
Here, we have confirmed transfection of the CMV-GFP plasmid by nanoparticles into 661W cells by GFP expression. These results currently show P5 injections with either of two commercially available nanoparticles, TransIT-202 and TransIT-X2 from Mirus®, to be optimal for DNA delivery. The sgRNA cleavage assay showed H1-driven sgRNA to result in the highest knockdown, a 77% reduction in OPTN, whereas RNA-mediated knockdown resulted in 50% and 63% knockdown of OPTN and RHO, respectively. Our current nanoparticles have shown significant transfection in tissue culture. However, we will continue to screen several additional mixtures to improve efficiency. This is coupled with a significant reduction in target gene expression by both DNA- and RNA-derived sgRNA in both the RHO and OPTN models in vitro.

A17
Whole exome sequencing of 14 schizophrenia multiplex families in Japan
Miho Toyama 1 , Yuto Takasaki 1 , Branko Aleksic 1 , Tomoo Ogi 2 , Norio Ozaki 1
1 Department of Psychiatry, Nagoya University Graduate School of Medicine, Nagoya, Japan; 2 Department of Genetics, Nagoya University Research Institute of Environmental Medicine, Nagoya, Japan
Correspondence: Branko Aleksic (branko@med.nagoya-u.ac.jp)
Human Genomics 2018, 12(Suppl 1):A17

Background
Schizophrenia is a psychiatric disease with a relatively high heritability, estimated at up to around 80% [1]. Regarding the risk of onset, genetic studies of schizophrenia have suggested that rare variants might have large effect sizes [2], while common variants detected by genome-wide association studies have relatively small effect sizes. Exploring disease-associated rare variants has become more feasible with Whole Exome Sequencing (WES); however, it is still challenging to extract candidates efficiently from the large number of mutations detected. Thus, most such studies have focused on de novo genetic variants because of their interpretability. Nevertheless, it is also essential to investigate variants shared among patients in order to elucidate the association between transmitted variants and schizophrenia. To this end, WES in multiplex families is a promising approach for identifying disease-associated rare variants.

Materials and Methods
We defined a family with more than one patient with schizophrenia as a multiplex family. To explore the genetic risk factors shared among the patients, we performed a WES study using 29 patients with schizophrenia, one patient with obsessive compulsive disorder and ten healthy controls from 14 schizophrenia multiplex families in Japan. We focused on rare single nucleotide variants (SNVs) with allele frequency (AF) ≤1% in databases of the Japanese, East Asian and total populations, taking into account the differences in AF among them. SNVs shared only among patients as well as de novo SNVs in highly intolerant genes (residual variation intolerance score percentile [3] ≤25%) were selected as the final results.
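The two variant-level criteria above can be expressed as small predicates. This is a sketch under assumptions: the database names are placeholders, a missing frequency is treated as absence from that database, and "shared only among patients" is interpreted as carried by at least two patients and by no one else, which the abstract does not spell out.

```python
def is_rare(af_by_database, max_af=0.01):
    """True if the allele frequency is <= max_af in every database.
    A missing frequency (None) is treated as 0, i.e. not observed."""
    return all((af or 0.0) <= max_af for af in af_by_database.values())

def patient_only(carriers, patients):
    """True if the variant is carried by at least two patients (assumed
    minimum for 'shared') and by no non-patient."""
    return len(carriers) >= 2 and carriers <= patients
```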

Results
We identified 209 transmitted mutations, including two homozygous SNVs in the CNTN6 and MAOB genes, and three de novo SNVs in the CACNA1C, ODC1 and BRD4 genes. Among these, eight genes, including CNTN6 and CACNA1C, have been repeatedly reported to have rare variants associated with developmental disabilities and psychiatric disorders (Table 1).

Conclusions
Based on WES analysis of 14 schizophrenia multiplex families, we identified SNVs in genes that have been reported in the psychiatric field. Our results suggest that these genes might be strongly related to the pathogenesis of symptoms common to those disorders. We are now applying bioinformatics approaches for further interpretation of these results.

Ethics Approval
This study was approved by the Ethics Committees of Nagoya University Graduate School of Medicine.
Background
PCNA, a core protein in the DNA replication process, also interacts with multiple proteins involved in DNA repair, recombination and cell cycle regulation. Little is known about how mutations in human PCNA and its interacting partners affect protein interactions and phenotypic outcomes, particularly in cancers. The objectives of this study are to investigate i) the evolutionary constraints on residue-residue interactions by network topology, ii) the mutational impact of coevolving residues on PCNA and its partner proteins at structural and functional levels, and iii) the relationships of hub residues in the context of cancer-associated mutations.

Materials and Methods
Eukaryotic orthologous protein sequences of PCNA (38 organisms) and its partner proteins from the DNA repair pathway were collected from Ensembl and assembled into common sets. Coevolving residues of PCNA and nine sets of coevolving residues from three partners (FEN1, RFC1, RFC3-4 and POLD1-4) were determined with the CAPS2 server, using a correlation value r ≥ 0.6 and p < 0.01, subjected to bootstrapping. These sets of coevolving residues were matched against mutations reported in the COSMIC database and against dbSNP SNPs predicted to be deleterious by PolyPhen/SIFT, and were used to create intra and inter coevolving residue networks of PCNA and its interacting partners with the igraph R package.

Results
High betweenness residues (HBR) in the intra and inter coevolving networks were identified and analyzed. These critical residues hold the coevolving networks together. HBR and other node residues were compared with SNPs in the COSMIC database. Among 31 intra and 35 unique inter PCNA coevolving residues, we found that Q125 of the interdomain connecting loop (Fig. 1a) strongly coevolved (total degree 128) with residues of RFC1 (117), POLD2 (9), FEN1 (4) and RFC3 (2), and ranked among the top five HBR of the inter coevolving networks (Fig. 1b and c). The occurrence of Q125 in the PCNA intra coevolving residue network implies critical functional and structural roles. Among the PCNA coevolving residues, only Q125H (COSM5777338) and I88V (COSM21734) matched mutations reported in cancer samples, and PCNA A252T (COSM4712855) was found in the neighborhood of intra coevolving residues 248, 249 and 250. The impact of these mutations on PCNA interactions and their roles in cancer development, if any, remain to be tested experimentally.
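To make the notion of a high betweenness residue concrete, the sketch below computes betweenness centrality for a small undirected network. It is a plain-Python version of Brandes' algorithm, used here in place of the igraph R package named in the abstract; the node labels and edges in the usage note are toy placeholders, not data from the study.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an undirected edge list into {node: set(neighbours)}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []                       # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                     # accumulate dependencies back towards s
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every unordered pair of endpoints was counted from both ends
    return {v: b / 2 for v, b in bc.items()}
```

For example, `betweenness(build_adjacency([("hub", "a"), ("hub", "b"), ("hub", "c")]))` assigns all shortest paths between the three outer nodes to `"hub"`, which is the sense in which such residues "hold the networks together".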

Conclusion
Preliminary results of the PCNA coevolving residue network analysis suggest that residues of high betweenness and degree seem to be selected against disease-associated mutations.

Background: Colorectal cancer is one of the most common malignancies worldwide: it is the third most common cancer among males, the fourth most common among females, and the second most common cause of cancer-related mortality. Genetic factors, as well as exposure to environmental factors, are commonly associated with the development of colon cancer. Although current chemotherapeutics and modern targeted therapies have been developed against colon cancer, the response rates to these agents remain low, and the magnitude of tumor regression is variable and transient. De novo and acquired resistance to chemotherapeutics and targeted therapies, and toxicity to normal cells, are the major causes of treatment failure. Therefore, it is necessary to search for new and better treatments for colon cancer. To this end, attention could be drawn toward phytochemicals derived from folk medicine because of their safety. Curcumin and its analogues, because of their anti-cancer effect and their safety, could be used as efficient agents in cancer therapy. Research evidence suggests that curcumin affects a variety of biological pathways involved in apoptosis and tumor proliferation, and induces apoptosis in various cell lines, such as human prostate and lung cancer cell lines, through multiple apoptotic pathways. However, curcumin analogues such as 3,5-Bis (4-hydroxy-3-methoxybenzylidene)-N-methyl-4-piperidone (PAC) have been explored to improve curcumin's efficacy in chemoprevention and therapeutics of cancer by improving on its low systemic bioavailability. In the present study, we explore the effect of the curcumin analogue PAC on DNA repair signaling pathways in human colon cancer cells.
Methods: The effect of PAC on the viability of the human colon cancer LoVo cell line was assessed using the MTT colorimetric assay. Real-time PCR was used to study the expression of pro-apoptotic and anti-apoptotic genes at the mRNA level. The effect of PAC on DNA repair signaling pathways was examined by real-time PCR using the RT2 Profiler PCR Array and confirmed by quantitative-PCR (Q-PCR

Promoters are the site at which gene regulatory signals are integrated and the site of transcription initiation during gene expression. Gene expression changes are thought to underlie much phenotypic variation between species and individuals, and may be responsible for individual variation in disease susceptibility and prognosis. Using Cap Analysis of Gene Expression (CAGE) data [2], we have previously shown that the complete birth and death of promoter sequence has been a common occurrence since the divergence of human and mouse [1].
We are now integrating these evolutionary records with human variation data to assess the likely phenotypic role of these evolutionarily volatile but transcriptionally active elements. Human-specific promoters that show sequence turnover (inserted or deleted promoter sequence) are more likely to contain human variants that accompany sequence gain or loss (insertions, deletions, copy number variants) than variants which preserve the nucleotide content (single nucleotide polymorphisms). Promoters whose sequence has been inserted along the human lineage more frequently overlap these polymorphic sites than those whose sequence was deleted down the mouse lineage since the divergence from human (11/14 tests, significant by permutation testing, nominal p ≤ 0.05). Subsequent analyses of the derived allele frequency of these variants will reveal the selective pressures that these evolutionarily volatile promoters have recently been experiencing. Analysis of gene expression data from the GTEx and GEUVADIS consortia has revealed a specific enrichment of expression quantitative trait loci (eQTLs) within those promoters that show sequence turnover between human and mouse, but not within those that show functional turnover (where the underlying sequence is conserved). Despite this, promoters of all volatile evolutionary histories harbour variants which reduce gene expression relative to those contained within promoters which are conserved between human and mouse. We are currently analysing other sets of QTLs, such as those for transcription-factor binding, to determine the precise mechanism(s) by which human-specific promoters may function to suppress gene expression. By analysing the transcriptomic and potential phenotypic consequences of evolutionarily volatile promoters, we hope to better understand the effect of these common evolutionary events in subsequently driving biological variation within the human population.

Background
Griscelli syndrome (GS) is a rare autosomal recessive disorder characterized by hypopigmentation, manifesting as silver-gray hair, the presence of large clusters of pigment in the hair shaft, and the occurrence of either a primary neurological impairment or a severe immune disorder. Three different genetic forms have been reported in the literature: GS1 with an exclusively neurological phenotype, GS2 with immunological phenotypes, and GS3 without any associated primary neurological or immunological defect.

Genomic instability is a universal phenomenon across different types of cancers; however, its underlying mechanisms are diverse and not well understood. Previously, we and others have described an enrichment of head-to-tail somatic segmental tandem duplications (TDs), a configuration known as the tandem duplicator phenotype (TDP), in a subgroup of breast and ovarian cancers. Here, we performed a pan-cancer meta-analysis of 2,717 human cancer genomes using publicly available datasets, to determine the incidence and molecular features of the TDP across different tumor types. In total, the TDP is found in ~14% of cancers and is remarkably prevalent in triple negative breast cancer (50%), ovarian carcinoma (55%), and uterine carcinoma (45%). Detailed analysis of the TD span size in TDPs showed that TDP tumors could be subgrouped based on their TD span size profiles, which follow either a modal or a bimodal distribution with peak values corresponding to only three major discrete span size intervals: class 1 (~11 Kb), class 2 (231 Kb) and class 3 (1.7 Mb), none of which are found in non-TDP tumors. We determined commonly altered genes or pathways for each class of TDs and identified that TDPs with a prevalence of class

Exploring the functions of repressive chromatin interactions, we first found that repressive chromatin interactions are associated with transcriptional silencing, as compared to active chromatin interactions.
Next, we performed circular chromosome conformation capture (4C) on estrogen-induced MCF-7 breast cancer cells, focusing on several known estrogen-regulated genes. These cells did not show significant changes in chromatin interaction profile despite upregulation of estrogen-regulated genes, suggesting that some of these chromatin interactions may not be important in gene regulation but may instead play a role in structural maintenance. Looking at RNA-seq data from these cell lines, we found that the expression level of genes involved in chromatin interactions with repressive regions depended more on the state of the genes' promoters than on the distal interacting regions. Taken together, our results suggest a non-transcriptional, structural maintenance role for this class of chromatin interactions.

In genome-wide studies, a fundamental practice is to study a given genome in terms of its differences from a reference genome. Typically, this involves aligning short-read sequencing data to the reference and then performing a variant discovery procedure. Common goals include discovering polymorphisms in a population, finding somatic mutations in cancer cells, and quantifying allele-specific expression in single cells. However, these tasks involve numerous uncertainties due to sequence repetitiveness and reference biases. Here we present EAGLE, a statistical model that explicitly postulates the alternative genome and the reference genome as competing hypotheses and calculates the likelihood that the sequencing data support each hypothesis. The model considers candidate alternative genomes (i.e. variants), which may represent germ line mutations, mutations in normal vs cancer cells, or allele-specific genotypes. EAGLE is free from the details of the "pileup" and handles uncertainties related to ambiguous gaps (indels), multi-mapped reads (paralogs), and spurious mappings due to outside sequences (reference bias).
These models can aid in improving specificity in variant discovery tasks. Unlike most other methods, EAGLE considers the read sequence in its entirety and can evaluate phased variants. In addition, in the case of allele-specific expression, a read can be unambiguously assigned to the reference or the alternative genome based on its likelihood.
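The idea of scoring whole reads against two competing genome hypotheses can be illustrated with a deliberately simplified sketch. This is not EAGLE's actual model: it assumes a single substitution (so reference and alternative sequences have the same length), a fixed per-base error rate, and known read offsets, and every function name here is hypothetical.

```python
import math


def read_log_likelihood(read, haplotype, offset, error_rate=0.01):
    """Log-probability of observing `read` if it was sampled from `haplotype`
    at `offset`, with independent substitution errors at a uniform rate."""
    ll = 0.0
    for i, base in enumerate(read):
        if haplotype[offset + i] == base:
            ll += math.log(1.0 - error_rate)
        else:
            ll += math.log(error_rate / 3.0)  # error spread over the 3 other bases
    return ll


def log_odds_alt_vs_ref(reads, ref, alt, error_rate=0.01):
    """Sum over reads of log P(read | alt) - log P(read | ref).
    Positive values favour the alternative genome, negative the reference."""
    total = 0.0
    for read, offset in reads:
        total += read_log_likelihood(read, alt, offset, error_rate)
        total -= read_log_likelihood(read, ref, offset, error_rate)
    return total
```

Because each read is scored over its full length rather than column by column, the same comparison can assign an individual read to whichever genome gives it the higher likelihood.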
Altogether, EAGLE is useful in rigorously evaluating the degree to which sequencing data support any given genotype.

Histone H3 methylation at lysine 9 (H3K9) is a conserved epigenetic signal, mediating heterochromatin formation by tri-methylation and transcriptional silencing by di-methylation. Concerted interaction of histone lysine methyltransferases and demethylases regulates these processes and the downstream gene expression pathways. Gene knockout mouse models of the histone lysine methyltransferases involved in H3K9 mono- and di-methylation, viz. GLP (Ehmt1) and G9a (Ehmt2), showed autistic phenotypes and behavioral abnormalities. This evidence has underscored the contribution of the epigenetic component to the pathogenesis of autism spectrum disorders (ASD). Notably, loss-of-function mutations in histone lysine methyltransferases have also been a consistent observation in psychiatric disorders, intellectual disability, and developmental delays. We therefore examined the possible role of genes involved in the H3K9 methylation machinery in the etiology of ASD, and suggest that rare functional variants in these genes may be associated with ASD. Targeted resequencing of all exonic regions in the genes for histone lysine methyltransferases (EHMT1, EHMT2, WIZ, SETDB1, SUV39H1 and SUV39H2) and demethylases (KDM3A, KDM3B and PHF8) was performed using molecular inversion probe (MIP)-based next-generation sequencing (NGS) in Japanese ASD patients and controls. The variants detected were prioritized using in silico tools based on novelty and functionality, revealing several rare missense variants found exclusively in ASD patients.
In vitro functional analysis revealed that the rare missense variant p.Ala211Ser, located in the Pre-SET domain of SUV39H2, drastically reduced H3K9 methylation efficiency. Interestingly, SUV39H1 and SUV39H2 expression was found to be downregulated in BA21 and the dorsal raphe nucleus, respectively, in postmortem ASD brain samples compared with controls. Functional evaluation showed that the majority of the other variants in important domains had no impact on function. Therefore, prioritization of variants by in silico methods should be pursued carefully, owing to the presence of functionally neutral, rare private variants. To summarize, we have provided evidence for a putative role of SUV39H2 in ASD pathogenesis.
Keywords: Autism spectrum disorders, Histone lysine methyltransferases, Rare variant, H3K9 methylation
Ethics Approval
All participants gave informed, written consent to join the study, after receiving a full explanation of study protocols and objectives. The human genetics study was approved by the ethics committees of RIKEN and all participating institutes, and was conducted in accordance with the Declaration of Helsinki. The animal experimental procedures were approved by the RIKEN Animal Ethics Committee.

With significant advances in DNA sequencing and internet data exchange technology, we are experiencing a new era in the field of human genetics. Decades of eye genetics research have shown that genetic variation plays a significant role in eye diseases. This genetic involvement can be highly penetrant, as in Mendelian eye diseases, or act as a strongly associated risk factor, as in common eye diseases. In both cases, the patient's genome sequence is quickly determined, traced within a family and compared with millions of genome sequences collected around the world and stored in a database. However, most of the information originates from populations of European descent, and information on other ethnic groups is limited.
The Asian Eye Genetics Consortium (AEGC, http://asianeyegenetics.org) was established in 2014 to focus on eye research in Asia, the most populated region of the world, where only limited data on genetic variation in eye diseases have been reported. AEGC has the following goals and plans:

A30
a. Share genetic information in the Asian population to rapidly isolate common disease-associated variants,
b. Establish systems for accurate diagnosis and grouping of Asian eye diseases,
c. Establish systems for cost effective genetic analysis,
d. Develop a research-oriented database to collect, diagnose and catalog eye diseases in Asia,
e. Support and foster collaboration among Asian countries for the advancement of research that will provide genetic information in the Asian population,
f. Collaborate with other international or regional organizations with similar goals, and
g. Organize and hold regional congresses and other educational and scientific activities to promote the goals of the consortium.

Over one hundred eye researchers from eye institutions and hospitals in more than 20 countries, including Australia, China, India, Indonesia, Israel, Japan, Malaysia, Pakistan, Philippines, Saudi Arabia, Singapore, South Korea, Sri Lanka, Taiwan, Thailand, Turkey, UAE, and USA, are now participating. These groups are currently interacting and collaborating to develop programs to share and catalogue data and to work together to identify the genetic aspects of eye diseases in Asia. The consortium has brought together the collective thinking and ideas of researchers around the world who have an interest in genetic eye research in the Asian region.

A31
Identifying
Background
More than half of the genes in the human genome are transcribed into noncoding RNAs, which for a very long time were thought to be nonfunctional biological "dark matter". However, recent studies have shown that noncoding RNAs, especially long noncoding RNAs (lncRNAs), have key roles in various biological processes such as transcriptional regulation, enhancer-like functions, differentiation of cells and tissues, epigenetic regulation, cellular senescence, DNA replication and maintenance of the telomeric structure. Despite their biological significance, a comprehensive functional annotation of lncRNAs is still lacking, including the genomic functional elements regulated by each lncRNA. Information about the three-dimensional localization of a regulatory element in the nucleus can be used to determine its potential interacting partners or, in other words, to establish its functional territory. Therefore, as part of the FANTOM6 project, we have used genomic interaction data and combined them with various regulatory element databases to define the function of lncRNAs in various human cells.

Results
We have generated a high-resolution genomic interaction map using newly produced Hi-C data for somatic and stem cells, which was used to identify potential interacting partners of the lncRNAs, such as protein-coding genes, other lncRNAs, microRNAs, or enhancers, in human cells. Further, we have identified common and cell type-specific interacting partners of lncRNAs by comparing the interaction maps of different cell types. We are currently overlaying various databases for RNA binding proteins, transcription factor binding sites, and specific genomic patterns of lncRNAs to find the potential connecting link between two interacting genomic elements. To further validate the interacting partners, we have knocked down unbiasedly selected lncRNAs and performed transcriptome profiling using CAGE to assess whether the expression of genes interacting with lncRNAs (as indicated by Hi-C data) was affected by their knockdown. Our analysis reveals that, for many lncRNAs, interacting genes are indeed enriched among the differentially expressed genes, suggesting that those lncRNAs may have a direct or indirect effect in regulating the expression of these genes.

Conclusions
Our analysis has shown that lncRNA-transcribing regions are not isolated genomic regions but frequently interact with both protein-coding genes and other non-coding RNAs in the cells. Further, these interacting partners can be conserved or cell-type specific, showing the dynamic nature of the lncRNAs. Overall, we will discuss how integrating information about genomic contacts with different transcription regulatory elements can be used to predict the potential biological role of lncRNAs.

Circulating tumor DNA (ctDNA) is released into the blood from dying or dead cancer cells and is an emerging candidate cancer biomarker. Although next-generation sequencing is becoming the method of choice, the number of false positives caused by a high read error rate is a major obstacle to detecting ctDNA (somatic mutations at low frequency) among abundant cell-free DNA (cfDNA) in blood. We have developed a high-fidelity target sequencing system, Non-Overlapping Integrated Reads Sequencing System (NOIR-SS), to accurately identify and absolutely quantify individual DNA molecules using barcode sequence tags. To apply it to ctDNA detection in cancer patients, we designed a gene panel that targets hotspots of pancreatic cancer-related genes (comprising 2.8 kb of genomic DNA), and devised an assay system combining NOIR-SS with a bioinformatic variant filter, not only to detect variants but also to remove those unlikely to be tumor-specific: those either absent from, or occurring at low frequencies in, the Catalogue of Somatic Mutations in Cancer database. The performance of the system was evaluated using two independent cohorts (cohort 1: 57 pancreatic cancer patients (PCs) and 12 healthy individuals; cohort 2: 86 PCs and 20 patients with intraductal papillary mucinous neoplasm (IPMN)). Our system appeared to eliminate most non-tumor-specific mutations, and its sensitivity for detecting tumor-specific mutations was comparable to that of other methods based on digital PCR and deep sequencing.
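The general principle behind barcode-based error suppression and molecule counting can be sketched as follows. This illustrates the generic molecular-barcode consensus idea only, not the NOIR-SS implementation; the function names and both thresholds (`min_reads`, `min_agreement`) are hypothetical values chosen for the example.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_reads=2, min_agreement=0.8):
    """Group (barcode, base) observations at one genomic position by barcode
    and call one consensus base per original DNA molecule.

    Barcodes with too few reads, or without a clear majority base, are
    dropped: a sequencing error typically affects only some reads of a
    molecule, whereas a true variant is present in all of them."""
    groups = defaultdict(list)
    for barcode, base in reads:
        groups[barcode].append(base)

    calls = {}
    for barcode, bases in groups.items():
        if len(bases) < min_reads:
            continue
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            calls[barcode] = base
    return calls


def allele_molecule_counts(calls):
    """Count distinct molecules (not reads) supporting each allele."""
    return Counter(calls.values())
```

Counting molecules rather than reads is what makes the quantification absolute: PCR duplicates of one molecule share a barcode and contribute a single call.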
We hypothesize that most de novo constitutional non-recurrent small supernumerary marker chromosomes (sSMC) are the remnant of a supernumerary chromosome present in trisomic embryos, which undergoes a chromothripsis event resulting in its massive fragmentation, with loss of some portions and disordered reunion of the remaining ones. Based on the mainly maternal origin of trisomies, we also assume that, depending on which of the three homologues undergoes chromothripsis, the remaining two are either in maternal hetero-/isodisomy or of biparental origin. To investigate this, we collected DNA from 20 cases of non-recurrent de novo sSMCs, all in mosaic with a normal cell line and already defined by array-CGH, and from their parents. We performed whole genome paired-end sequencing (WGS) in the first 10 cases at 30-40x coverage. We also performed trio analysis with microsatellites spread along the entire chromosome from which the sSMC originated. In seven cases, namely sSMC2a, sSMC2b, sSMC7a, sSMC8a, sSMC17, sSMC18 and sSMC11, the markers consisted of a disordered assembly of two or more segments of their corresponding chromosomes. In one of them, sSMC11, the marker was derived from random fusion and disordered assembly of ten fragments spread along chromosome 11. In three cases, sSMC7b, sSMC1, and sSMC8b, only a single chromosomal region was involved in their construction. In each case the novel junctions were confirmed by PCR amplification and cloning of the breakpoints (bps), with fusion signatures indicating repair mechanisms such as non-homologous end joining and microhomology-mediated processes. Microsatellite and SNP analysis in the trios indicated a maternal origin of the marker with biparental origin of the related homologous chromosomes in 4 cases, and a paternal origin with maternal hetero/isodisomy of the related chromosomes in the remaining 5 cases.
The sSMC8b was of paternal origin whereas the homologous chromosomes 8 were biparental, indicating its postzygotic origin. Finally, these data demonstrate both a link between numerical and structural anomalies and that early lethal trisomies may leave a dramatic legacy in postnatal life.

Congenital heart disease (CHD) is one of the most common birth defects (1 in 100 live births) and is a major cause of heart disease. In the vast majority of cases, CHD is caused by abnormalities in multilayer molecular processes involving interactions between various signaling pathways, which are not fully understood. The lack of understanding of the regulatory network driving heart development often hinders the diagnosis of CHD and the development of novel therapeutic strategies.
To profile the genetic and epigenetic regulation of heart development, we used genomics methods to characterize the transcriptome and chromatin state in wild-type zebrafish and in several zebrafish lines carrying mutations in transcription factors responsible for heart development. Taking advantage of transgenic zebrafish lines with fluorescently labeled cardiomyocytes, we isolated specific cell fractions from zebrafish embryos and larvae at different stages of heart development.
We have developed and optimized strategies for RNA sequencing (RNA-seq) of cardiomyocytes as well as for the assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq), providing an in-depth analysis of the genetic and epigenetic factors regulating heart development on a genome-wide scale. This allowed us to identify gene regulatory networks as well as their downstream regulatory chromatin regions during key stages of heart development.

Cardiovascular diseases are the leading cause of death globally. Among them, heart arrhythmia and conduction disorders are among the most prevalent. The cardiac conduction system is a key component that coordinates and regulates proper heartbeat. Normal, rhythmic heart contraction depends on a highly specialized network of pacemaker cells responsible for the generation and transmission of electrical impulses throughout the heart. These impulses originate within two major pacemaker sites, the sinoatrial node (SAN) and the atrioventricular node (AVN). The AVN is a highly specialized conducting tissue that propagates the impulse while slowing its conduction considerably, ensuring that the atria have ejected their blood into the ventricles before the ventricles contract, which results in unidirectional blood flow. Any disturbance in the propagation of these impulses may lead to serious consequences such as heart attack, stroke and different types of arrhythmia. However, despite the prevalence of conduction system disorders, current knowledge about the development of the conduction system remains limited. In our study, we aim to elucidate the molecular mechanisms underlying the development and function of cardiac pacemaker cells, especially those that originate from the AVN. To achieve our goal, we use transgenic zebrafish lines as a model organism, which allows us to readily investigate AVN development by means of genomics.
I will present our ongoing analysis of the AVN RNA-seq data, which focuses on elucidating the molecular composition of these specialized cells as the first step to understanding their development and function. In the long run, we hope to establish the zebrafish as a model organism for research on cardiac arrhythmia in humans.

One of the most frequent causes of neonatal death is congenital heart defect. While many individual genes and transcription factors crucial to the process of heart formation have been identified, the manner in which they interact and orchestrate gene expression remains mostly unknown. Understanding the complex genetic regulatory network of heart development is necessary to track the causes of congenital heart defects. Here, we study the network of interactions between the chromatin landscape and the genetic elements that govern gene expression at early stages of heart formation, using the zebrafish as a model organism. Based on RNA-seq and ATAC-seq data sets generated in our laboratory, we are using computational methods such as digital genomic footprinting to elucidate the interplay between transcription factors and gene expression. Using this information, we seek to integrate our data sets and construct a model describing the relations between regulatory elements and their effect on gene expression. Our analyses focus on the core genes required for proper heart formation and the mechanisms regulating their expression. I will present our findings on the relationships between key elements involved in the process of heart development.

To describe the balance between components of the tumor microenvironment (TME) and to analyse the impact of both non-immune and immune cells, we systematically collected information on the related molecular mechanisms and represented it in the form of comprehensive network maps. The modular map of cancer associated fibroblasts (CAF), the non-immune component of the TME, is composed of 681 objects and 585 reactions.
The map covers the main functions of CAFs in tumors, including interactions with extracellular matrix components, signalling that coordinates the involvement of CAFs in tumor growth, and interactions of CAFs with the immune system. There are two types of functional modules on the CAF map: modules responsible for CAF activation, which are associated with pro-tumor activity, and modules involved in CAF inhibition, which therefore contribute to anti-tumor activity. In addition, we constructed signalling maps of macrophages, dendritic cells, myeloid-derived suppressor cells, natural killer cells, neutrophils and mast cells. These cell type-specific maps, integrated together and updated with the interactions and crosstalk between them and the map of the tumor cell, gave rise to a seamless, comprehensive meta-map of the innate immune response in cancer. The meta-map contains 1466 objects and 1084 reactions and depicts the signalling responsible for the anti- and pro-tumor activities of the innate immune system as a whole. The meta-map is represented in a geographical-like manner, with a hierarchical structure of two functional zones, each divided into meta-modules and smaller sub-modules. The network maps were applied to identify possible molecular mechanisms regulating the reprogramming of CAFs and innate immune cells in the TME in metastatic melanoma. Unsupervised statistical methods were applied to decompose single cell RNA-seq data from CAFs, natural killer cells and macrophages. Analysis and interpretation of expression patterns in the context of the network maps demonstrated the existence of numerous sub-populations within each cell type, characterized by different anti- and pro-tumor functional properties. These network-based polarization signatures correlated with the survival status of patients. We concluded that the tumor microenvironment may contain a range of CAFs and innate immune cells varying in their polarization status.
The fine-tuned balance between the subpopulations will dictate the overall impact of the TME on tumor evolution in each given case.

A38
Single cell RNA sequencing of T cells in Crohn's disease identifies tissue specific drug targets Uniken Venema, Werna T. 1

Introduction
Crohn's disease (CD) is a chronic inflammatory disease that predominantly causes inflammation in the terminal ileum [1]. Genome-wide association studies have identified around 200 CD risk loci [2]. These loci are enriched for genes involved in T cell signaling, highlighting the importance of T cells in CD pathology [3]. To gain true insight into the underlying pathomechanisms, it is crucial to study these T cells in the relevant tissue: the intestinal mucosa [4]. Characterization of mucosal T cells in a disease- and tissue-specific context at single cell resolution has not yet been performed for CD. We aimed to increase our insight into CD pathomechanisms and to identify novel drug targets by characterizing mucosal T cell populations from CD patients using single cell RNA sequencing (scRNA-seq).

Methods
We performed scRNA-seq of 5,292 CD3+ T cells isolated from peripheral blood (PBL) and from ileal mucosal biopsies of CD patients. Biopsies were dissociated and separated into T cells from the epithelium (IEL) and the lamina propria (LPL). scRNA-seq was performed with an adapted SmartSeq2 protocol, using 3'-end library generation and unique molecular identifiers, sequenced on the Illumina NextSeq500.

Results
Unsupervised clustering of our scRNA-seq data identified 5 distinct T cell types. The distribution of cell types differs between IEL, LPL and PBL: while cytotoxic T cells (CTL) dominate the IEL, the blood T cell reservoir is mainly composed of quiescent T cells, and T-helper 17 (Th17) cells are the largest population within the LPL. Th17 cells and CTLs show the highest proportion of differentially expressed known CD risk genes, such as CD69 and FOS in Th17 cells, and CX3CR7 and NKG7 in CTLs. Aligning potential IBD drug targets [5] to CTL signature genes, we find that Etrolizumab, but not Vedolizumab, appears to specifically target mucosal CTLs. Two potential drug repositioning opportunities for targeting mucosal Th17 cells are MSX-122, an anti-tumor CXCR4 antagonist, and Rivenprost, which targets PTGER4 and has been tested in UC patients (Fig. 1).

Conclusion
We have conducted the first detailed transcriptomic characterization of disease- and tissue-specific effector cells in Crohn's disease. We have profiled 5 distinct T cell types using single-cell transcriptome data. We show that CD risk genes are significantly overexpressed in ileal mucosal Th17 cells and CTLs and provide promising targets for future cell-specific therapies in CD patients.

Although the APOE region is the strongest genetic risk factor for Alzheimer's disease (AD), its pathogenic role remains poorly understood. Elucidating genetic predisposition to AD, which represents a subset of age-related diseases characteristic of the post-reproductive period, is hampered by the undefined role of evolution in establishing the molecular mechanisms of such diseases. This uncertainty is an inevitable source of evolution-related (selection-free) genetic heterogeneity in predisposition to AD.

Materials and methods
We examined linkage disequilibrium (LD) structures characterized by seven single nucleotide polymorphisms (SNPs) from the TOMM40-APOE-APOC1 locus in 2,661 AD-affected and 16,089 unaffected individuals, who were mainly older than 55 years, from four independent studies. We also compared LD structures in 16,089 older unaffected subjects and 5,303 younger individuals aged 20-55 years from two additional studies.

Results
Consistent with the undefined role of evolution in age-related diseases, we found that the LD structures, being heterogeneous, are significantly different in subjects with and without AD, p<2.0×10⁻³. The pattern of the significant difference represents a molecular signature of AD comprising SNPs from these genes. As this pattern was consistent in all studies, all pair-wise estimates of the LD difference in the pooled sample (except rs8106922 and rs405509) were significant after Bonferroni correction, p<2.4×10⁻³ (Fig. 1a). There was no significant difference in LD patterns between older unaffected subjects, characterized by exponentially increasing mortality rates, and the younger population, characterized by negligible mortality rates (Fig. 1b).
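For orientation, the pairwise LD measures and the Bonferroni threshold discussed here can be sketched as below. This is a minimal illustration and not the authors' pipeline: the function names and the example haplotype frequencies are hypothetical, and a family-wise significance level of 0.05 is an assumption. Under that assumption, seven SNPs give 21 pairs and a per-test threshold of 0.05/21, about 2.4×10⁻³, which agrees with the value quoted above.

```python
from math import comb, sqrt


def ld_stats(p_ab, p_a, p_b):
    """Return (D, signed D', r) for two biallelic SNPs.

    p_ab -- frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a  -- frequency of allele A at SNP 1
    p_b  -- frequency of allele B at SNP 2
    All inputs are illustrative; both SNPs must be polymorphic.
    """
    d = p_ab - p_a * p_b
    # D' rescales D by its largest attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r = d / sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r


def bonferroni_threshold(n_snps, alpha=0.05):
    """Per-test threshold when every SNP pair is tested once (alpha is assumed)."""
    n_pairs = comb(n_snps, 2)
    return alpha / n_pairs, n_pairs


threshold, n_pairs = bonferroni_threshold(7)
d, d_prime, r = ld_stats(p_ab=0.30, p_a=0.40, p_b=0.50)
print(n_pairs, round(threshold, 5), round(d, 3), round(d_prime, 3), round(r * r, 3))
```

Comparing LD between two groups, as done in the abstract, additionally requires a test on the difference of the estimates; that step is not shown here.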

Conclusions
The significant and highly heterogeneous molecular signature of AD provides evidence of a complex polygenic predisposition to AD in the TOMM40-APOE-APOC1 locus. Significant differences in pairwise LD in subjects with and without AD indicate SNPs, or their proxies, likely involved in AD pathogenesis. The lack of significant difference in LD patterns between the older and younger subjects indicates that the LD patterns in unaffected subjects are likely evolutionarily selected, whereas LD patterns in AD-affected subjects are likely the result of LD rearrangement in the modern environment. These findings are more consistent with an environment-sensitive complex haplotype than with a single genetic variant origin of AD in this locus.

Metabolic reprogramming is now well recognized as one of the 10 hallmarks of human cancer. The membrane transporters that enable efficient cellular metabolism and aid in nutrient sensing have received increasing attention in recent years. Membrane transporters can be divided into 3 main classes: ABC transporters, P-type ATPases and the solute carrier (SLC) family. The SLC group of transporters includes nearly 400 members organized into 52 families that mediate the transport of a wide variety of substrates across biological membranes. Until recently, little was known about the role of these SLC transporters in the mechanisms of cancer development and progression, and the genomic landscape and clinical relevance of SLC transporters in human cancer have not been investigated systematically. Here, we performed a comprehensive characterization of 396 SLC transporters by integrating multi-platform data across over 30 tumor types from The Cancer Genome Atlas. We observed frequent dysregulation of SLC transporters across cancer types and identified cancer type- and subtype-specific expression signatures.
We identified significant correlations of these SLC transporters with the activity of cancer metabolic pathways including oxidative phosphorylation and glycolysis, and explored their complex regulatory networks. We also examined the association of SLC transporters with known oncogenic pathways including the mTOR and AKT pathways and investigated the role of SLC transporters in driving tumor metastasis. We further evaluated the potential clinical relevance of these SLC transporters and identified a number of SLC transporters that were significantly associated with patients' clinical outcomes. Our study provided an in-depth understanding of the role of SLC transporters in cancer metabolic reprogramming, elucidated their mechanisms in cancer development and metastasis, and provided novel biomarkers and potential therapeutic targets that can be further developed for future clinical applications.

Background
Recent efforts from the FANTOM5 consortium have shown that specific transcriptional patterns found within intergenic unannotated regions can be associated with different classes of regulatory elements and non-coding RNAs. However, such studies of the transcriptome remain challenging for multiple reasons. First, current methods often rely on pre-existing transcript models and neglect both cell-type-specific transcripts and intronic signals. Second, there is a large diversity of transcriptome profiling protocols, each targeting only specific types of RNAs (polyA+ vs polyA-, long vs short, etc.). As a result, relating the signals obtained from one experiment to those of others is non-trivial, tedious and often done case by case by visual inspection of a genome browser view.

Method
We developed a new method based on dynamic Bayesian networks, SegRNA, which adds a new model to the Segway semi-automated genome annotation method. SegRNA identifies recurrent combinatorial patterns occurring across libraries of stranded RNA signals and produces one human-understandable annotation per strand. The SegRNA model is unsupervised and allows intronic and exonic patterns to be compared without relying on pre-existing transcript models. Additionally, SegRNA learns a model of the most likely transitions between all patterns, allowing us to describe the structure of larger domains made of individual RNA elements.
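The idea of assigning each genomic position to a pattern while accounting for likely transitions between patterns can be illustrated with a much simpler model than SegRNA itself. The sketch below is not SegRNA or Segway: it swaps in a plain two-state hidden Markov model (the simplest kind of dynamic Bayesian network) decoded with the Viterbi algorithm. The state names, the discretised "low"/"high" signal and all probabilities are hypothetical values chosen for illustration, not learned parameters.

```python
from math import log


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state path for a sequence of discrete observations (log space)."""
    scores = [{s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}]
    backpointers = []
    for o in obs[1:]:
        column, pointers = {}, {}
        for s in states:
            # Best predecessor state for ending in s at this position.
            prev = max(states, key=lambda p: scores[-1][p] + log(trans_p[p][s]))
            column[s] = scores[-1][prev] + log(trans_p[prev][s]) + log(emit_p[s][o])
            pointers[s] = prev
        scores.append(column)
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical two-label model over a binned, discretised RNA signal.
states = ("quiet", "transcribed")
start_p = {"quiet": 0.5, "transcribed": 0.5}
trans_p = {"quiet": {"quiet": 0.9, "transcribed": 0.1},
           "transcribed": {"quiet": 0.1, "transcribed": 0.9}}
emit_p = {"quiet": {"low": 0.9, "high": 0.1},
          "transcribed": {"low": 0.2, "high": 0.8}}

signal = ["low", "low", "high", "high", "high", "low", "low"]
print(viterbi(signal, states, start_p, trans_p, emit_p))
```

In this toy setting the transition table plays the role of the learned "most likely transitions between patterns"; the method described above works on multiple continuous signal tracks and learns its parameters without supervision.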

Results
We applied SegRNA to ENCODE and FANTOM5 transcriptome sequencing libraries from the K562 chronic myeloid leukemia cell line. These libraries were selected to represent the widest possible diversity of RNA, including nascent, capped, polyA+, polyA-, long and short RNAs. We found that some of the patterns rediscovered fundamental gene structures such as promoter, exonic and intronic regions in an unsupervised fashion. Furthermore, we found that about 10% of genomic signals identified by SegRNA fall within unannotated regions. Specifically, we identified a pattern enriched for known small RNAs which also occurs within intergenic, intronic and exonic regions that have not yet been characterized.

Discussion
SegRNA identified multiple patterns associated with promoter activity and higher gene structures such as gene body and exonic regions from annotated genes. Furthermore, SegRNA's unsupervised annotations allow one to explore intergenic transcriptional stranded patterns occurring across entire genomes, and collections of transcriptome libraries.

A42
PTSD re-experiencing in the Million Veterans Program sample: risk loci and biology Gelernter J. 1,2, Sun N. 3,4

Background
Posttraumatic stress disorder (PTSD) is a major problem among the veteran population and presents treatment challenges. The US Veterans Affairs (VA) Million Veteran Program (MVP) is building a large medical and genetic information database, currently with >620,000 consented participants. About 350,000 enrollees have genotype information available, linked to VA EHR data and questionnaire responses, making this the largest current sample for studying PTSD-relevant traits. PTSD symptoms are categorized into 3 major symptom clusters by DSM-IV criteria: intrusive re-experiencing of the trauma, avoidance of trauma-associated stimuli, and alterations in arousal or reactivity.

Materials and methods
We conducted a GWAS on the re-experiencing symptom cluster score based on a sum of 5 items from the PTSD Checklist (recurrent intrusive thoughts/dreams/flashbacks of trauma; emotional or physiological response to reminders of trauma), total score, 5-25. This is the symptom cluster most characteristic of PTSD.
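The phenotype definition above is a simple sum score. A minimal sketch follows, assuming each of the 5 items is rated on an integer scale from 1 to 5, which is what the stated 5-25 total range implies; the function name and the validation behaviour are illustrative choices, not part of the study description.

```python
def reexperiencing_score(items):
    """Sum of the 5 re-experiencing items; each item is assumed to be an integer 1-5."""
    if len(items) != 5:
        raise ValueError("expected exactly 5 item responses")
    if any(not isinstance(x, int) or not 1 <= x <= 5 for x in items):
        raise ValueError("each item must be an integer from 1 to 5")
    return sum(items)


print(reexperiencing_score([3, 2, 4, 1, 5]))
```

Rejecting incomplete or out-of-range responses, rather than imputing them, is one possible design choice; how missing items were handled in the study is not stated.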

Results
After cleaning, 146,660 European-Americans (EAs) and 19,983 African-Americans (AAs) were retained. In EAs, 8 distinct common-variant genomewide-significant (GWS) regions were identified, three of them with p<5x10E-10. These latter regions map to chrom. 3 (lead SNP rs2777888, 2.1E-11; gene CAMKV; the same SNP was previously implicated in "age at first birth"); chrom. 17 (lead SNP rs2532252, 4.5E-10), closest to KANSL1 but within a well-known long high-LD region (the site of an inversion common in EAs) that also includes CRHR1 (corticotropin releasing hormone receptor 1); and chrom. 18 (lead SNP rs2123392, 5.4E-11), at TCF4. Other significant associations were observed at KCNIP4, HSD17B11, MAD1L1, and SRPK2. TCF4 and MAD1L1 have both previously been GWS-associated with schizophrenia and other psychiatric traits. There were no GWS associations in the smaller AA sample, but when EA and AA subjects were meta-analyzed, the lead SNP in the region on chrom. 17 shifted to rs1724409 (with significance increasing to 3.6E-11). The chrom. 17 inversion is much less common in AAs than in EAs, and this added fine-mapping information. This new lead variant is intronic at CRHR1, a very strong functional candidate for PTSD. LD score regression analysis showed polygenic association with many psychiatric and behavioral traits, including "mood swings" (1.07E-56), neuroticism (1.28E-44), and "miserableness" (2.62E-30).

Conclusion
Meta-analyzing results from European- and African-ancestry subjects improved our ability to narrow an associated region in a biologically meaningful way. These results provide new insight into the biology of the most characteristic PTSD symptom cluster in what is the best-powered study undertaken to date.

Ethics Approval
Research involving MVP in general is approved by the VA Central IRB; the current project was also approved by local IRBs in Boston, San Diego, and West Haven (USA).

Objectives: Identification and analysis of fusion genes is essential for finding prognostic markers and therapeutic targets in cancer. ChimerDB is a knowledgebase for fusion genes encompassing analysis of deep sequencing data and manual curation. Methods: To build a knowledgebase of fusion genes, we curated fusion genes from public resources and reviewed them manually. We also utilized an advanced text mining method to build a literature database based on PubMed abstracts. A classifier to identify fusion genes from the candidate sentences was developed using machine learning techniques. Most importantly, in this update we analyzed RNA-Seq data of the TCGA project using two programs, FusionScan and StarFusion. Results: ChimerDB 3.1 is composed of three modules: ChimerKB, ChimerPub, and ChimerSeq. ChimerKB is a knowledgebase of 1,041 manually curated fusion genes with experimental evidence. Notably, 460 fusion breakpoints were catalogued in this update of ChimerDB 3.1. ChimerPub includes 2,767 fusion genes obtained from text mining of PubMed abstracts. ChimerSeq is the archival database of fusion gene candidates computationally identified from RNA-Seq data of the TCGA project. In this update, we expanded the patient samples substantially to include 31 cancer types and 7,775 patients. Based on our benchmark test, we used two reliable and efficient programs, FusionScan and StarFusion. Conclusion: The ChimerDB 3.1 update includes almost twice as many patients, reflecting more diverse types of cancer. We also added 460 fusion breakpoints to ChimerKB via manual curation. ChimerDB 3.1 is available at http://ercsb.ewha.ac.kr/fusiongene/.
The FANTOM projects have shown that the mammalian genome is pervasively transcribed and that the resulting transcripts largely consist of long non-coding RNAs (lncRNAs), the majority of which have no characterized function. LncRNAs are generally considered key regulatory elements, and a few of them have been reported to be involved in different cellular processes. However, their functional role is still elusive. Here, we investigated the role of lncRNAs in maintaining pluripotency and in neural differentiation in human induced pluripotent stem cells (hiPSCs). Based on the dynamic expression patterns during hiPSC differentiation to neurons, candidate lncRNAs were selected for functional perturbation. We used different knockdown strategies, such as anti-sense oligos (ASOs) and clustered regularly interspaced short palindromic repeats interference (CRISPRi), to perturb lncRNAs in human iPSCs. We evaluated the effect of knockdown by qPCR and single-molecule RNA fluorescence in situ hybridization (FISH). Knockdown of selected lncRNAs led to downregulation of the pluripotency marker gene Oct3/4, indicating a role in stem cell maintenance and neuronal differentiation. Analysis of chromosome conformation capture (HiC) data from hiPSCs further underscored direct interaction of selected lncRNAs with genes involved in the pluripotency network. The mechanistic role of these lncRNAs is being investigated by RNA immunoprecipitation sequencing (RIP-seq) and chromatin isolation by RNA purification sequencing (ChIRP-seq). Additionally, we are performing molecular phenotyping of these lncRNA perturbations by gene expression profiling with CAGE to reveal the dynamic regulatory functions involved in the maintenance of pluripotency and neural differentiation propensities. The nature of the different perturbation technologies, each of which has its own strengths and weaknesses, is also discussed.

A45
Functional study of peptidylarginine deiminase type 4 as a genetic risk factor for rheumatoid arthritis Akari Suzuki 1

Background
Rheumatoid arthritis (RA) is one of the most common autoimmune diseases and affects approximately 1% of the world population. RA is a chronic systemic inflammatory disease characterized by the inflammation of synovial joint tissues. Previously, we identified peptidylarginine deiminase type 4 (PADI4) as a susceptibility gene for RA by genome-wide association studies (1,2). PADI4 is highly expressed in immune cells, such as bone marrow, macrophages, neutrophils, and monocytes. Peptidyl-citrulline is an important molecule in RA because it is a target antigen of anti-citrullinated peptide antibodies (ACPAs), and only PADs (the proteins translated from PADI genes) can provide peptidyl-citrulline via modification of protein substrates. The aim of this study was to evaluate the importance of the PADI genes in the progression of RA.

Materials and Methods
We generated Padi4 knockout (Padi4−/−) DBA1J mice. Padi4−/− DBA1J and wild-type mice were immunized with bovine type II collagen (CII) to develop collagen-induced arthritis (CIA). We compared the incidence and severity scores, and measured expression levels of various inflammatory cytokines and Padi genes in immune cells by real-time TaqMan assay. Cytokine concentrations and CII antibodies in sera were also measured by enzyme-linked immunosorbent assay. We investigated the effect of PAD4 on the transcriptional pattern of macrophages from Padi4−/− mice by microarray.

Results
We demonstrated that the clinical disease score was significantly decreased in Padi4−/− mice and that Padi4 expression was induced by CII immunization. In Padi4−/− mice sera, anti-type II collagen (CII) IgM, IgG, and inflammatory cytokine levels were also significantly decreased compared with those in wild-type mice sera. Interestingly, Padi2 expression was induced in a compensatory manner in CD11b+ cells of Padi4−/− mice (3). We also identified the Fus gene as significantly differentially expressed between WT and Padi4−/− mice in two independent microarray tests.

Conclusions
On the basis of these studies, it appears that Padi4 enhances collagen-initiated inflammatory responses. Our results revealed that PAD4 affected the expression of various cytokines and also controlled Padi genes and the Fus gene.
Pathogenic variants in almost one hundred genes lead to hereditary hearing loss in humans worldwide. Due to the inaccessibility of the human inner ear, and the lack of an inner ear cell line to study the pathogenesis of deafness, the mouse has been a critical model for deciphering the mechanisms of hearing loss. Using the CRISPR/Cas9 genome editing tool, we have created several mouse models representing pathogenic variants associated with deafness in the Israeli Jewish and Palestinian Arab populations. These include Atoh1, a transcription factor essential for the development of cerebellar neurons and generation of inner ear hair cells, and Slc25a21, a gene that encodes the 2-oxoadipate mitochondrial carrier (ODC) that transports C5-C7 oxodicarboxylates across inner mitochondrial membranes. The non-syndromic hearing loss in the families ranges from congenital to childhood onset, and in severity from severe to profound. We designed sgRNAs directing Cas9 to cut at the mutation sites, and oligonucleotides containing the mutation, in order to facilitate homology-directed repair. These were injected into mouse zygotes, along with Cas9 RNA. Both heterozygous and homozygous founder mice were obtained and mated further to create lines for further experiments. Auditory brainstem response (ABR) measurements to evaluate hearing, behavioral tests to determine vestibular function, and scanning electron microscopy (SEM) of inner ears to examine the morphology of the hair cells were performed. If the hearing loss found in each family is mirrored in the CRISPR/Cas9 mice, this will verify these variants as the cause of deafness. Moreover, these mice will provide a relevant tool for studying the pathogenesis and mechanisms of deafness for each gene. Finally, the CRISPR/Cas9 editing tool may further be used to rescue hearing loss in mutant mice.
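The sgRNA design step mentioned above, choosing guides that direct Cas9 to cut at a mutation site, can be sketched as a search for candidate target sites near a position of interest. This is an illustrative sketch under stated assumptions, not the authors' design procedure: it assumes Streptococcus pyogenes Cas9 (20-nt protospacer followed by an NGG PAM, with the cut 3 bp upstream of the PAM), scans the forward strand only, and ignores off-target scoring. The function name, the distance cut-off and the toy sequence are hypothetical.

```python
import re


def find_protospacers(seq, target_pos, max_dist=10):
    """Forward-strand SpCas9 sites whose predicted cut lies near target_pos.

    Returns tuples (start, protospacer, pam, cut), all 0-based; `cut` is the
    index of the first base 3' of the predicted blunt cut.
    """
    seq = seq.upper()
    hits = []
    # A lookahead is used so that overlapping candidate sites are all reported.
    for m in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", seq):
        start = m.start()
        cut = start + 17  # 3 bp upstream of the PAM, which begins at start + 20
        if abs(cut - target_pos) <= max_dist:
            hits.append((start, m.group(1), m.group(2), cut))
    return hits


toy_sequence = "A" * 20 + "TGG" + "CCCC"  # hypothetical sequence, not a real locus
for hit in find_protospacers(toy_sequence, target_pos=17):
    print(hit)
```

A practical design would also scan the reverse complement and place the repair oligonucleotide so that the introduced mutation disrupts the PAM or protospacer, which reduces re-cutting of the edited allele.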
Despite growing appreciation of the importance of long non-coding RNAs (lncRNAs) in the progression and metastasis of many malignancies, knowledge of cancer-related lncRNAs remains limited. We have recently discovered a widespread class of vlincRNAs in human cells. These transcripts, with a minimal length of 50 kb and often reaching hundreds of kb, represent over 2000 different loci and cover at least 10% of the human genome. A subset of these transcripts regulated by endogenous retroviral (ERV) promoters appears to function in early embryonic development and cancer. Despite being a prominent class of human transcripts, most vlincRNAs remain functionally uncharacterized. Here, we present a robust functional annotation system that combines experimental and computational methods and aims to globally annotate all vlincRNAs in the human genome. Firstly, we integrated multiple RNA-seq expression datasets from public sources and our in-house database to identify the expression patterns of vlincRNAs. Secondly, based on the co-expression network among vlincRNAs and known genes, multiple functional enrichment tools were integrated to annotate the vlincRNAs. Finally, the system is constantly modified and improved by experimental validation to build a robust annotation platform for vlincRNAs. One of the major conclusions was that a very high proportion of vlincRNAs, not only the ones regulated by ERVs, participate in cancer-related pathways. This annotation system will significantly contribute to our understanding of the molecular mechanisms of lncRNA function and provide an important resource for new potential prognostic biomarkers and therapeutic targets in the field of cancer.

A48
Genomic characterization of biliary tract cancers identifies their driver genes, cell-of-origin, and predisposing mutations Hidewaki Nakagawa 1

Background & Aims: Biliary tract cancer (BTC) or cholangiocarcinoma is a rare cancer worldwide, but prevalent in some areas including Japan, where specific environmental risk factors, such as chronic inflammation and chemical exposure, are involved in BTC development. The genetic features of BTC remain poorly understood and the molecular profiles of BTCs are as heterogeneous as their pathology and biology, making large sample sizes necessary for comprehensive analysis and understanding of its molecular carcinogenesis and clinical associations. We performed large-scale genome sequencing analyses on BTCs to investigate their somatic and germline driver events and characterize their genomic landscape. Methods: We analyzed 412 BTC samples from Japanese and Italian populations, with 107 whole exome sequencing (WES), 39 whole genome sequencing (WGS), and targeted sequencing of a further 266 samples. The BTC subtypes were 136 intrahepatic cholangiocarcinomas (ICCs), 101 distal cholangiocarcinomas (DCCs), 109 peri-hilar types (PHCs), and 66 gallbladder or cystic duct cancers (GBCs/CDCs). We identified somatic alterations and searched for driver genes in BTCs, and found pathogenic germline variants of cancer-predisposing genes. We predicted the cell-of-origin for BTCs by combining somatic mutation patterns and epigenetic features. Results: We identified 32 significantly and commonly mutated genes including TP53, KRAS, SMAD4, NF1, ARID1A, PBRM1, and ATR, and FGFR2 fusion was detected in 7% of ICCs. Strong negative effects on overall survival were observed in patients harboring mutations in ARID1A (P=0.0011) and KRAS (P=0.0042). Smokers had significantly worse prognosis (P=4×10⁻³).
A novel focal deletion of MUC17 at 7q22.1 affected patient prognosis (P=3×10⁻⁴), which was confirmed by immunohistochemical analysis on the tissue microarrays. The cell-of-origin (COO) of a cancer can be determined by comparing the genomic distribution of mutations to the chromatin organization of specific cell types, as mutations are more likely to occur in open, transcriptionally active chromatin. COO predictions using WGS data of BTCs and liver cancers and 424 epigenetic features from the Epigenome Roadmap suggest a hepatocyte origin of hepatitis-related ICCs. Deleterious germline mutations of cancer-predisposing genes such as BRCA1, BRCA2, RAD51D, MLH1, or MSH2 were detected in 11% of BTC patients, indicating that BTC is a tumor related to Lynch syndrome and HBOC. Conclusions: BTCs have distinct genetic features including somatic events and germline predisposition. These findings could be useful for establishing treatment and diagnostic strategies for BTCs based on genetic information.

A49
Improved prediction of functional annotation of genes Rezvan Ehsani 1 , Finn Drabløs 2 1 University of Zabol, Zabol, Iran; 2 NTNU - Norwegian University of Science and Technology, Trondheim, Norway Human Genomics 2018, 12(Suppl 1):A49 A large number of genes do not have useful annotation, meaning that we do not have sufficiently reliable information about which processes the gene products may be involved in. This is in particular the case for non-coding RNA (ncRNA) genes, like the long ncRNA (lncRNA) genes, where most of the known genes are without any functional annotation. It is therefore of great interest to be able to predict the function of un-annotated genes. One possible approach is to use "guilt by association", i.e., to identify well-annotated genes that seem to be involved in similar processes as a given un-annotated gene. The association can, for example, be co-expression, indicating potential co-regulation. It is then possible to predict the function of the un-annotated gene using information from the well-annotated co-expressed genes. This has been tested in previous implementations, but with somewhat limited success. In particular, the lncRNAs are challenging because they can have a very cell type-specific expression pattern, which may bias co-expression analyses.
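The "guilt by association" idea can be sketched in a few lines. The example below is a simplified illustration and not the authors' implementation: it uses only Pearson and Spearman correlation, requires both to exceed a cut-off, and scores a GO term in the co-expressed set with a one-sided hypergeometric test. The gene names, expression values, cut-off and the choice of enrichment test are all assumptions made for the example.

```python
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def coexpressed(target, expression, min_r=0.9):
    """Genes whose Pearson and Spearman correlation with `target` both reach min_r."""
    ref = expression[target]
    return sorted(g for g, v in expression.items()
                  if g != target and pearson(ref, v) >= min_r and spearman(ref, v) >= min_r)


def enrichment_pvalue(k, n, K, N):
    """P(X >= k): k of n co-expressed genes carry a term held by K of N genes."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / comb(N, n)


# Hypothetical expression profiles across five samples.
expression = {
    "lncX":  [1, 2, 3, 4, 5],
    "geneA": [2, 4, 6, 8, 10],
    "geneB": [5, 4, 3, 2, 1],
    "geneC": [1, 3, 2, 5, 4],
}
print(coexpressed("lncX", expression))
print(enrichment_pvalue(k=2, n=2, K=3, N=10))
```

Terms that are significantly enriched among the co-expressed, well-annotated genes would then be transferred to the un-annotated gene as predictions.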
Here we present an improved approach for function prediction. We use several measures for estimating co-expression, including statistical (Pearson, Spearman) and geometrical ones (Sobolev, Riemannian). This gives improved identification of true co-expression. We then use an enrichment analysis to identify enriched GO terms in the co-expressed gene set, and use this to predict GO terms for the un-annotated gene. We have tested this approach on subsets of well-annotated protein-coding genes, using a leave-one-out procedure. For each gene the GO terms were predicted (without using the known terms), and the fit between predicted and known GO terms was measured using semantic similarity measures, in particular GOSemSim and TopoICSim. This showed good correlation between predicted and known GO terms, in particular for terms related to biological process (BP). The procedure was also tested on lncRNAs, in particular a set of five well-described lncRNAs tested in previous publications. The predicted GO terms showed good correspondence with published functional descriptions of these lncRNAs. This shows that it is possible to predict the function of both protein-coding and ncRNA genes, given a reliable set of expression data.

Understanding of genome regulation in native human cancer cells is severely limited by the availability of fresh tumor samples and by tissue heterogeneity. Previously, we analyzed gastric cancer tissue samples and discovered cryptic promoter activation events which drive expression of important developmental genes [1]. This indicated that ectopic expression of lineage-specific factors could contribute to epigenetic reprogramming of normal cells in cancer development.
However, it was unclear whether such cancer-associated epigenetic changes occurred in cancer cells or in other cell types in the bulk tissue sample. In this study, we developed a workflow for integrative chromatin immunoprecipitation sequencing (ChIP-seq) and RNA sequencing (RNA-seq) analysis of clinical tissue samples in combination with isolation of specific cell populations by laser microdissection (LMD). As a pilot study, we applied this workflow to the analysis of 6 squamous cell carcinoma and adenocarcinoma cases in which matched normal lung tissue, stromal and tumor cell parts could be isolated. ChIP-seq analyses of the tri-methyl histone H3 Lys4 (H3K4me3), acetyl histone H3 Lys27 (H3K27ac) and tri-methyl histone H3 Lys27 (H3K27me3) marks indicated cancer-associated activation of non-canonical promoter regions in lineage-specific transcription factors, driving expression of alternative forms of RNAs. These observations were further supported by strand-specific RNA-seq analysis of a larger number of cases. These cryptic promoter sites are also frequently found in polycomb complex 2 binding sites, as previously found in the gastric cancer study. Overall, these results demonstrated that ectopic induction of key developmental regulators by cryptic promoter activation is a common feature of multiple types of human cancers. To further study potential mechanisms of cancer-associated promoter and enhancer regulation, we analyzed single-nucleotide polymorphisms (SNPs) and somatic mutations. Our results indicated that the majority of cryptic promoter activation events were not directly driven by somatic mutations. However, we found many cancer-associated regulatory elements showing allele-specific histone modifications. This result indicated an influence of genetic background on enhancer and promoter regulation and the presence of functional SNPs.
From the perspective of human genome research, LMD-based isolation of clinical cell samples combined with an integrative ChIP-seq and RNA-seq approach could be highly useful for a broader range of clinical studies, as it allows effective filtering of candidate functional SNPs based on the accurate positioning of functional regulatory regions and allele-specific chromatin regulation.

Ethics Approval
This study was approved by University of Tsukuba's Ethics Board, approval numbers 1593-1 and H29-052.

mRNA-based switches (e.g., RNP L7Ae switches and miRNA switches) and the gene circuits built from them are gaining in popularity in various applications due to the advantages of RNA-only delivery and the dynamic responsiveness of post-transcriptional switches. However, the necessity of using modified rUTP and rCTP in the exogenous mRNAs to evade immune surveillance comes at the price of decreased translational output and a decreased dynamic range between the OFF and ON states of the encoded switch. This likely results from altered interaction with RNA binding proteins, including those of the ribosomal complex, and altered hybridization kinetics between complementary RNA strands. The relatively low dynamic range compared to pDNA- or replicon-encoded circuits means alternative modifications are highly sought after. We found that a single uridine substitute modification can restore protein translation to the level of unmodified exogenous mRNA. Furthermore, the same modification outperformed pseudouridine with respect to immune surveillance evasion. Finally, we tested combinations of modified rNTPs and found an optimal modification that could provide a higher dynamic range, equal to that of unmodified mRNA. When miRNA switches are used to detect cell-type-specific miRNA activity, this modification provides greater resolution in separating different cell types.
In future projects, we will further investigate the influence of various types of rNTP modifications on mRNA biogenesis using various 'omic' technologies, and RNA-targeted nuclease-dead CRISPR systems to detect and modify RNA modifications at single-base resolution. Using such systems will provide causative rather than correlative evidence linking a particular type of RNA modification to RNA biogenesis and various biological processes.

The discovery of extensive transcription of long non-coding RNAs (lncRNAs) and studies of their functional roles have suggested their involvement in gene regulation in different physiological processes, including pluripotency safeguarding and differentiation induction. This study aims to identify and characterize the lncRNAs which play regulatory roles in pluripotency and differentiation. Different categories of lncRNA, including intergenic non-coding RNAs (lincRNAs), promoter upstream transcripts (PROMPTs) and enhancer lncRNAs (eRNAs), were selected for knockdown (KD) in induced pluripotent stem (iPS) cells. For each lncRNA target (n=390), 5 or 10 LNA gapmeR anti-sense oligos (ASOs) were designed to suppress the target specifically. A high-throughput screening method adopting lipofection in 96-well culture plates and real-time imaging of each well was applied to 2285 ASO KDs in duplicate. RNA was isolated at 48 hr post-transfection, followed by qPCR using 3 pairs of primers for each target. Targets with at least two ASOs showing a sufficient KD level were considered successful KDs (43%, n=166). Targets with two ASOs showing KD and a growth rate change revealed by real-time imaging were considered to have a growth phenotype (8%, n=31). The KD efficiency was shown to correlate with cellular localization enrichment and exosome sensitivity.
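The screening rule described above, that a target counts as a successful KD when at least two of its ASOs reach a sufficient knockdown level, can be sketched as follows. This is only an illustration: the abstract does not state what counts as "sufficient", so the 50% cut-off, the target names and the efficiency values are hypothetical. The percentage helper simply re-derives the reported proportions from the reported counts (166 and 31 of 390 targets).

```python
def successful_targets(kd_by_target, min_kd=0.5, min_asos=2):
    """Targets with at least `min_asos` ASOs whose KD efficiency reaches `min_kd`.

    kd_by_target maps a target name to the KD efficiencies (0-1) of its ASOs;
    min_kd = 0.5 is an assumed cut-off, not a value given in the abstract.
    """
    return {target for target, efficiencies in kd_by_target.items()
            if sum(e >= min_kd for e in efficiencies) >= min_asos}


def percent_of(count, total):
    """Percentage rounded to the nearest integer."""
    return round(100 * count / total)


# Hypothetical qPCR-derived KD efficiencies for three targets with 5 ASOs each.
example = {
    "lnc_1": [0.8, 0.6, 0.1, 0.0, 0.2],
    "lnc_2": [0.7, 0.2, 0.1, 0.0, 0.3],
    "lnc_3": [0.9, 0.9, 0.9, 0.5, 0.4],
}
print(sorted(successful_targets(example)))
print(percent_of(166, 390), percent_of(31, 390))
```

Requiring two independent ASOs per target is a common way to reduce the chance that an observed effect is caused by an off-target activity of a single oligo.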
Among the successful KD targets, 123 were selected for transcriptomic analysis by low-quantity CAGE to unveil the underlying molecular phenotypes, focusing on self-renewal efficiency and differentiation potential. Although increasing, the number of lncRNAs reported to regulate pluripotency remains limited. This study will identify key factors in stem cell pluripotency and differentiation, providing clues to improve reprogramming and cell conversion.

Slovenians are a source population for interpreting different demographic events that happened in Europe, but not much is known about the genetic background and the demographic history of this population. Therefore, our aims are i) a detailed characterisation of the genetic structure of Slovenians in a broader European context using both Y chromosome and autosomal data, ii) a description of the past and present admixture pattern and iii) a survey of variants putatively under selection and associated with different traits/diseases.

Materials and Methods
Overall, 96 samples ranging from the Slovenian littoral to Lower Styria were genotyped for 713,599 markers using OmniExpress 24-V1 BeadChips. The Slovenian dataset was subsequently merged with the Human Origin dataset (REF) for a total of 2163 individuals.
Only populations with a minimum sample size of 10 individuals were retained, and related individuals were discarded. We also generated a dataset containing ancient genomes form.
Results
Y chromosome diversity splits into two major haplogroups, R1b and R1a, with the latter suggesting a genetic contribution from the steppe. Slovenian individuals are more closely related to Northern and Eastern European populations than to Southern European populations, even though the latter are geographically closer. This pattern is confirmed by an admixture and clustering analysis. We also identified a single stream of admixture events between the Slovenians and both Sardinians and Russians around 2630 BCE (2149-3112). We found a significant admixture event between the Yamnaya and the early Neolithic Hungarians dated around 1762 BCE (1099-2426), suggesting a strong contribution from the steppe to the foundation of the observed genetic diversity in modern Slovenians. Our selection study reveals significant hits on markers associated mainly with lipid traits and eye pigmentation when compared to South Europeans, such as HERC2 and FADS1-FADS2 alleles, responsible for blue eye colour and synthesis of long-chain unsaturated fatty acids, respectively.

Conclusions
Using our approach, we found several genes linked to diet that could explain possible selection forces on these alleles, which were introduced from different sources into the Slovenian genetic pool. Populations closely related to the Yamnaya and early Neolithic Hungarians contributed during the Bronze Age to the foundation of the observed genetic variability in modern Slovenians. This means that alleles for diseases or specific traits were also introduced into the Slovenian genetic pool during this period, such as pigmentation alleles, lactose tolerance and immune genes. When the comparison was done with Eastern Europeans, we discovered significant signals in PKD2L1 and IL6R, which are genes associated with taste and coronary artery disease.

Constructing gene regulatory networks is crucial to understanding the genetic architecture of complex traits. While most gene regulatory network constructions use only gene expression information, here we propose using additional genomic polymorphism data to facilitate the network construction.

Materials and methods
Taking advantage of both gene expression and genomic data, we propose a two-stage penalized least squares (2SPLS) method to build large systems of structural equations for network construction. The system can be constructed using large numbers of endogenous and exogenous variables at the first stage, followed by consistent selection of regulatory effects at the second stage.
We conducted simulation studies to compare our method with the adaptive lasso (AL) based algorithm [1] and the sparsity-aware maximum likelihood (SML) algorithm [2]. Both acyclic and cyclic networks were simulated, each including 300 endogenous variables. On average, each endogenous variable has one regulatory effect for sparse networks and three regulatory effects for dense networks. The effects were simulated from a uniform distribution over (-1, -0.5) ∪ (0.5, 1).
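The effect-size distribution described above can be illustrated with a minimal sketch. It assumes a sparse acyclic network in which the total number of effects equals the number of endogenous variables (one per variable on average); the helper names and the lower-triangular placement used to guarantee acyclicity are illustrative assumptions, not the authors' simulation code.

```python
import random


def sample_effect(rng):
    """Draw one effect uniformly from (-1, -0.5) U (0.5, 1)."""
    magnitude = rng.uniform(0.5, 1.0)  # magnitude in [0.5, 1]
    sign = rng.choice((-1.0, 1.0))     # either side with equal probability
    return sign * magnitude


def sample_acyclic_effects(n_genes, n_edges, seed=0):
    """Hypothetical helper: place effects only from lower- to higher-indexed
    genes (regulator j < target i), which rules out cycles."""
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(n_genes) for j in range(i)]
    chosen = rng.sample(pairs, n_edges)
    return {(i, j): sample_effect(rng) for (i, j) in chosen}


# Sparse setting: 300 endogenous variables, one effect per variable on average.
effects = sample_acyclic_effects(n_genes=300, n_edges=300)
```

A dense network in the sense of the abstract would use `n_edges=900` (three effects per variable on average) under the same assumptions.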

Results
For acyclic networks, 2SPLS has the greatest power when the sample size is small (n=100) and all three methods have similar performance when the sample size is large (n=1000). In addition, 2SPLS controls FDR under 20% while SML has low FDR (5%) for sparse acyclic networks with large sample size and large FDR (40%) for small sample size. Overall both 2SPLS and SML have better performance than AL in terms of FDR. For cyclic networks, 2SPLS has greater power than SML and AL for all sample sizes and lower FDR when the sample size is not large. SML has similar power as 2SPLS for sparse cyclic networks, though its power is much lower than that of 2SPLS for dense cyclic networks. SML has much higher FDR for inferring dense networks when sample size is small while the FDR is small when the sample size is large.

Conclusions
The method is computationally efficient and allows for parallel implementation. We demonstrate the superior performance of the method with computer-based simulation studies. In addition, the method was applied to yeast data with both genomic and gene expression information.

Material and methods
Diagnosis and molecular classification were established using the current international histopathological criteria and microarray expression profiling (PAM50). None of the patients received chemotherapy or radiotherapy before surgery. Two overlapping primer sets were used to sequence the entire mtDNA from paired peripheral blood and tumor samples using the MiSeq System. The study was approved by the Ethics Committees of the participating institutions.

Conclusions
According to our findings, the distribution of mtDNA germline mutations and mtDNA haplogroups in these patients is similar to that reported in the healthy Mexican population. Additionally, high heterogeneity in mtDNA mutations was observed among patients with breast cancer. This study constitutes the first report in which complete mtDNA sequences have been analyzed in Mexican women with breast cancer.

Background
Despite more than a decade of genome-wide association studies (GWAS), the proportion of segregating variants that contribute to variation of complex traits, the degree of polygenicity and the extent of pleiotropy are still largely unknown. Here we systematically analysed 3,795 GWAS summary statistics for 2,824 traits in 28 trait domains to (1) investigate the nature of the genetic architecture across hundreds of traits, and (2) chart the extent of pleiotropy on the single variant, gene and gene-set level. Findings are discussed in the context of the recently proposed omnigenic model [1].

Materials and methods
We compiled a database of 3,795 GWAS results for 2,824 unique traits across 28 domains. All trait names were manually synchronized, and GWAS results were checked using a standardized quality control pipeline. For within-trait comparisons, we used all GWAS results available per trait. For cross-trait comparisons, we used the largest available GWAS per trait, with at least 50,000 individuals.
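The cross-trait selection rule stated above (the largest available GWAS per trait, with at least 50,000 individuals) can be sketched as follows. The record layout, trait names and sample sizes are hypothetical and serve only to illustrate the rule.

```python
def largest_gwas_per_trait(studies, min_n=50_000):
    """Return, for each trait, the study with the largest sample size,
    ignoring studies with fewer than min_n individuals."""
    best = {}
    for study in studies:
        if study["n"] < min_n:
            continue  # too small for cross-trait comparisons
        current = best.get(study["trait"])
        if current is None or study["n"] > current["n"]:
            best[study["trait"]] = study
    return best


# Hypothetical toy records; not taken from the database described above.
studies = [
    {"trait": "height", "n": 130_000},
    {"trait": "height", "n": 250_000},
    {"trait": "rare_trait", "n": 12_000},
]
selected = largest_gwas_per_trait(studies)
```

Traits whose studies all fall below the threshold (here `rare_trait`) are simply absent from the result, so they remain available for within-trait comparisons only.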

Results
Within-trait analyses showed that, in general, the number of detected loci and SNP-based heritability increase with increasing sample size. However, there is wide variation across traits, with some traits showing a plateau and others displaying a decrease in SNP-based heritability, most likely due to the inclusion of heterogeneous phenotypes. Cross-trait analyses showed that 22% of detected lead SNPs have a minor allele frequency < 1%, yet there is large variation across traits. The majority of lead SNPs across all traits are in non-coding regions, while more pleiotropic SNPs have a higher probability of being coding SNPs. SNP-based heritability across all traits is not evenly distributed across the genome, showing enrichment in specific chromosomes, independent of physical length. The proportion of risk loci across all traits covers more than half the genome. Pleiotropy is ubiquitous: 58% of all coding genes are associated with at least one trait and 45% of genes are pleiotropic. SNPs and genes that are more pleiotropic are less tissue-specific, and the size of gene sets associated with a single trait is significantly smaller than the size of pleiotropic gene sets, supporting an omnigenic model [1]. Genetic correlations between traits show widespread overlap, yet generally cluster within trait domains.

Conclusions
We show widespread variation in genetic architecture and widespread pleiotropy across hundreds of complex traits. Our results provide novel insights into how genetic variation contributes to trait variation, backed by a vast amount of empirical results.

Obesity is a serious global health epidemic and a main risk factor for type 2 diabetes and non-alcoholic fatty liver disease. Obesity promotes hypertrophic dysfunctional adipocytes that predispose to ectopic storage of fat in the liver and epicardium, causing cardiometabolic disorders. The regulatory molecular mechanisms favoring intact adipose function are poorly understood in humans, preventing biology-based treatments of obesity. Our goal was to elucidate the regulation of human adipose transcriptomes to further the understanding of adipose biology. To this end, we first identified enhancer-promoter interactions in primary human white adipocytes, using promoter Capture Hi-C, to identify chromosomal regions regulating adipocyte gene expression. We then performed a cis-expression quantitative trait locus (eQTL) analysis using subcutaneous adipose RNA-sequencing data from the METSIM and GTEx cohorts (total n=612) to identify variants regulating adipose gene expression. Finally, by overlapping the interaction and cis-eQTL data, we localized those adipose cis-eQTL variants that reside within the enhancer-promoter interactions in primary human adipocytes. These interactions were enriched for enhancer (H3K4me1, H3K4me3, and H3K27ac) and repressor (H3K27me3, H3K9me3) histone marks. Using public DNase I hypersensitive site (DHS) data and the HOMER software, we also found that the DHSs within the adipocyte chromatin interactions are significantly enriched for 26 transcription factor (TF) motifs (FDR<5%), including multiple key TFs in adipose biology, such as CEBPB and PPARG.
To investigate whether the adipose cis-eQTL variants in the enhancer-promoter interactions explain a significant amount of local gene expression, we partitioned the heritability of local gene expression from human adipose tissue into multiple chromosomal categories using a modified version of the partitioned LD Score regression method. Our data show that open chromatin regions (i.e. DHSs) within these adipocyte chromosomal interactions are significantly enriched in the heritability of local gene expression (p-value<0.002). Overall, the adipose cis-eQTL variants in the enhancer-promoter interacting DHS regions explain a significant portion (4.6%) of the heritability of local adipose gene expression, even though they contain only 0.23% of variants genome-wide. Taken together, our integrative genomics data identify functionally highly relevant regulatory variants for local gene regulation and adipose biology. Our future studies aim to build regulatory looping units for each adipose-expressed gene and to associate the pooled set of variants within these regulatory units with obesity and related clinical traits in large human cohorts.
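To make the scale of this result concrete, the usual enrichment measure in partitioned heritability analyses is the proportion of heritability explained by a category divided by the proportion of variants it contains. The sketch below applies that ratio to the two percentages reported above; it assumes this standard definition applies and is not the authors' modified LD Score regression.

```python
def heritability_enrichment(prop_h2, prop_variants):
    """Fold enrichment: share of heritability divided by share of variants."""
    if prop_variants <= 0:
        raise ValueError("prop_variants must be positive")
    return prop_h2 / prop_variants


# 4.6% of the heritability of local expression in 0.23% of variants.
fold = heritability_enrichment(0.046, 0.0023)
print(f"approximate fold enrichment: {fold:.1f}")
```

Under this definition, the reported figures correspond to roughly a 20-fold enrichment.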

A59
Promoter-level expression atlas of skeletal muscle atrophy and

Background
Human life is often accompanied by a variety of forms of physical inactivity: a low-active lifestyle, bed rest, limb immobilization, or zero-gravity conditions for astronauts. Muscle disuse is associated with muscle atrophy, which leads to negative consequences for the locomotor system in mammals. Identifying the mechanisms underlying pathological changes in the muscular system requires complex molecular studies; we therefore aimed to study transcriptional changes in the muscles of rats during atrophy and subsequent recovery.

Materials and methods
Two types of muscles, "slow" (m. soleus) and "fast" (m. EDL), were examined in rats in normal conditions, after 1, 3 and 7 days of antiorthostatic hanging and after a subsequent 1, 3 and 7 days of recovery, using the CAGE (Cap Analysis of Gene Expression) method followed by Illumina HiSeq 2500 sequencing. After quality checking and filtering, CAGE reads were mapped to the current rat genome assembly rn6 (2014) using bwa and then clustered with a Python script. Further annotation of CAGE peaks, analysis of differential expression, and functional term enrichment were performed in the R environment. CAGE signal visualization for our experiments was done in the Zenbu browser. The study was approved by the Kazan Federal University Animal Care and Use Committee (Permit Number 2, dated 5 May 2015).

Results
We successfully sequenced CAGE libraries to a depth of >10M reads with a mapping ratio of ~80%. We identified 9971 expressed CAGE clusters and 5766 associated genes in the two muscle types. Differential expression of genes and their promoter activity varied strongly between m. EDL and m. soleus over the hanging-recovery time course: "slow" m. soleus showed no significant changes in transcriptional activity up to 7 days of hanging but shifted drastically during recovery, while "fast" m. EDL showed a quick and stable response to the stress and a fast recovery after return to normal conditions. The highest number of differentially expressed promoters and genes in m. soleus was found on the first day of recovery (~3k and 1.5k respectively), with strong enrichment for oxidation-reduction processes associated with the mitochondrion. In m. EDL, the strongest early response to hanging was 10 times lower than the changes in m. soleus. In addition, in functional terms, activated genes were mostly related to muscle structure organization.

Conclusions
This study provides the first systematic annotation of the promoter landscape and of genes activated in "fast" and "slow" muscle types under induced atrophy and subsequent recovery in rats. Our first results are consistent with known physiological effects of muscle disuse and might be used in further atrophy studies.

A60
Identification of population-stratified coding variants for low and high serum triglycerides in

Hypertriglyceridemia is an independent risk factor for coronary heart disease, and a major constituent of multiple metabolic disorders. Hypertriglyceridemia is especially prevalent in populations with Amerindian background such as Mexicans. However, Latinos have been grossly underrepresented in genomic studies and they are unlikely to benefit from the current push for precision medicine unless genetic studies become more inclusive. Furthermore, the different haplotype structure and allele frequencies (AF) in admixed populations can provide better mapping resolution than homogeneous groups, and rare risk variants in minority groups can shed light on the biological mechanism of disease. To this end, we investigated loci that we previously associated with lipid traits in Mexicans and showed to harbor variants with different frequencies between Latinos and Europeans. To assay low and rare frequency (AF>0.5%) coding variants of 28 genes that reside within these loci in Hispanics, we performed targeted and exome sequencing in 4,550 Mexicans. We focused on loss of function (LoF) and missense single-nucleotide variants (SNVs), which are more likely to have large effects, and then performed single-variant and gene-based association tests on hypertriglyceridemia (serum triglycerides (TGs) >200 mg/dL and <150 mg/dL for cases and controls, respectively). Overall, 89 SNVs were tested, and at the single-variant level, 10 SNVs passed the Bonferroni corrected P<0.05 threshold. Notably, all 10 SNVs were stratified in Latinos when compared to Europeans (AF difference >2X); 6 of the 10 were highly significant in the recent European exome-wide analysis (n>300,000), emphasizing the power of studying population-stratified regions in smaller diverse populations ascertained for the phenotype of interest.
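The case and control definitions and the multiple-testing threshold described above can be illustrated with a short sketch. The function names are hypothetical, and the handling of intermediate triglyceride values (excluded from the analysis) is an assumption inferred from the stated cut-offs.

```python
def tg_status(tg_mg_dl):
    """Classify a sample by serum triglycerides (mg/dL).

    Cases are >200 and controls are <150; values in between are assumed
    to be excluded and are returned as None.
    """
    if tg_mg_dl > 200:
        return "case"
    if tg_mg_dl < 150:
        return "control"
    return None


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# 89 SNVs tested at a family-wise level of 0.05.
threshold = bonferroni_threshold(0.05, 89)
print(f"per-variant threshold: {threshold:.2e}")
```

Requiring a Bonferroni corrected P<0.05 is equivalent to requiring an uncorrected p-value below this per-variant threshold.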
We also identified a new Mexican-specific TG risk variant, rs144966144, which is a missense variant (AF=5.3% in cases and AF=3.4% in controls) within ZNF259. Its frequency in Latinos is 3.7% (based on the gnomAD database), but it occurs at much lower frequency in other populations (<0.5%

Abstract
SINEUPs are a new class of regulatory long non-coding RNAs (lncRNAs) that can bind to a target mRNA in a sequence-specific manner and increase its protein expression without altering the target transcript level [1,2]. This phenomenon was initially reported for a mouse lncRNA antisense (AS) to the Uchl1 (Ubiquitin C-Terminal Hydrolase L1) gene, which upon binding to the sense mRNA facilitated its association with polysomes and enhanced UCHL1 protein production [1]. This translation up-regulatory function was found to be due to an inverted SINEB2 repeat of the B3 sub-family embedded at the 3′ end of AS Uchl1. Thus, SINEUPs comprise two important regions: i) the binding domain (BD), the region overlapping the sequence within the 5′ UTR and a few bases of the CDS of the target protein-coding mRNA; ii) the effector domain (ED), an inverted repeat of SINEB2 essential for SINEUP function [2]. In recent years, SINEUPs have proven their efficacy against a variety of exogenous and endogenous targets in a wide range of human and mouse cell lines and also in in vivo systems, yet the functional features of SINEUPs and their mechanism of action remain rather elusive [1,2,3]. In the current study, we examined key sequence and structural features of the SINEUP ED to better understand the underlying SINEUP biology. Previously, we short-listed more than 30 natural antisense transcripts (NATs) in the mouse FANTOM3 cDNA dataset with genomic features similar to those of AS Uchl1 [1]. Here, we report the SINEUP effect of a number of diverse SINEB2 sequences isolated from some of these NATs, which were tested in synthetic SINEUP-GFP in HEK293T cells.
We found that not only the B3 sub-family but also other sub-families of mouse SINEB2 display the SINEUP effect, suggesting that the SINEUP class is large. We also investigated the combinatorial and synergistic/additive effects of SINEB2 repeats on SINEUP function. As the functions of various RNA classes are known to be highly dependent on their structures, we further analyzed the structure of SINEUPs in human cells using the in vivo click selective 2′-hydroxyl acylation and profiling experiment (icSHAPE). We thereby report structural domains crucial for SINEUP function and present a consensus sequence and secondary structure model for the SINEUP RNA class. Moreover, we show phylogenetic clustering of different SINEB2 repeats based on their sequences and RNA secondary structures to evaluate the connection between SINE evolution and SINEUP function.

In view of the controversy related to the generation of off-target mutations by gene editing approaches, we tested the specificity of TALENs by disrupting a multi-copy gene family using only one pair of TALENs. We show here that TALENs do display a high level of specificity by simultaneously knocking out the function of the three genes (H2Afb3, Gm14920, H2Afb2, which are over 92% identical) that encode H2A.B.3. This represents the first described knockout of this histone variant.

Results
We designed and validated 19 TALEN pairs; group 1 consisted of 12 specific to the H2Afb3 gene and group 2 (7 TALENs) targeting all three H2A.B.3 genes. Activity and specificity of TALENs were tested employing a Dual-Luciferase Single Strand Annealing Assay (DLSSA) and by Cel1 cleavage assay, confirming specificity of group 1 TALENs to only cleaving H2Afb3, while group 2 showed nuclease activity against all 3 H2A.B.3 encoding genes. One pair of group 2 was chosen to knock out the function of all three H2A.B.3 genes in mice. In total, we obtained 9 out of 19 pups which showed the desired KO. Female pups were used as founders for continued breeding with WT males, resulting in a pure non-mosaic mouse colony by G3. We then used a combination of exome sequencing and molecular techniques to investigate the presence of off-target effects. Single males from three consecutive generations (G1-G3) had their exomes sequenced and combined SNV and InDel analysis was performed. Off-target effects in the founder generation would lead to a "dilution" of InDels in each consecutive generation. This dilution effect was not evident in either exome sequencing data. Despite the lack of H2A.B.3, KO mice are able to reproduce, but do so with much smaller litter sizes, an indicator that H2A.B.3 is required for fertility.

Conclusions
We successfully applied TALEN technology using one pair of TALENs to specifically and simultaneously disrupt three gene copies of a gene family. Many human diseases are caused by gene copy number variations, and the use of a limited number of genome editing tools might therefore help in the future treatment of such diseases. The use of only one pair of TALENs, rather than multiple pairs, reduces the complexity of the TALEN approach (and hence the cost), would reduce the likelihood of mosaicism and, most importantly, reduces the possibility of off-target mutations.

A63
Promoter

Background
Pathological left ventricular (LV) remodeling is accompanied by changes in size, shape and function and is usually associated with aortic stenosis, myocardial infarction, hypertension, or valvular heart disease. It occurs in up to 15% of the population and is associated with older age, male sex and diabetes. This remodeling includes myocyte hypertrophy, dilatation, fibrosis, and inflammation, which increase the risks of further ischemia, cardiomyopathy and heart failure. High LV mass is a major complication in different heart diseases and has a strong impact on longevity; it is also an important factor affecting survival after surgery or transplantation. For example, patients with aortic stenosis and a modified LV have a 5 times lower survival rate, and the mortality rate after liver transplantation is increased 4-fold in patients with left ventricular hypertrophy. However, the molecular mechanisms at the DNA regulation level underlying the maladaptive transition remain unclear, mostly due to the limited availability of heart samples.

Results
Our study aims to create a detailed map of the human heart promoterome. It will include samples from separated left atrium (LA) and ventricle from male and female, sick and healthy patients of different ages. To date, using CAGE (cap analysis of gene expression) sequencing of 15 samples, we were able to identify 13100 heart promoters, of which 12051 overlap promoters from the FANTOM5 project and could therefore be annotated by cell type (8309) and related diseases (3158). A set of ion exchangers, connexins and other heart-related genes (n=40) with known expression profiles (for example, KCNA5, KCNK3, GJD2, Irx4) was used for data validation. We defined LV- and LA-specific genes and showed that the functional annotation of these groups is consistent with the available literature: LA contains the terms "synapse", "neuron" and "signaling", while LV contains "sarcomere", "fiber" and "heart process". Enrichment analysis revealed that differentially expressed genes and promoters are associated with diseases such as ischemia, vascular disease, myopathy and aortic disease. De novo predicted heart enhancers (4388) are enriched for cardiac differentiation, maturation and regulation motifs related to transcription factors such as SRF, MEF2A and Foxk1.

Conclusions
These results indicate that the CAGE approach is a suitable tool for further heart studies at the promoter level and provides comprehensive and comparable data about molecular regulatory mechanisms in the human heart. Next, we plan to extend our project with an aging time course and LV hypertrophy samples. Finally, it will be a part of the promoterome atlas project for mammalian muscles.

To decipher the genetic architecture of a human disease, various types of omics data are generated. Two common omics data types are (a) genotypes at a large number of marker loci, and (b) expression levels for a large number of genes, often generated at the genome scale using microarrays. Although data generated by various omics platforms were traditionally analyzed separately, data integration and joint statistical analysis are now emphasized to obtain robust inferences. A problem in the joint statistical analysis of multi-type omics data, however, is that the sample sizes of different data types vary. One reason is that RNA is less stable than DNA. Therefore, in a disease genetic study, genotype data are commonly available from a larger number of individuals than gene-expression data. In addition, gene-expression assays are expensive, leading researchers to examine the genotype data first, arrive at a set of tentative inferences based on them, and then select a subset of individuals to be recruited for the gene-expression assay to analyze the specific research question at hand. Intuitively, the selected subset should carry the widest range of information relevant to the research question but otherwise be homogeneous. Clearly, the gene-expression profile of the individuals plays no role in the subset selection criteria, since the selection is conducted before the gene-expression assay. Thus, sample size variation poses a major challenge to biologists seeking to perform integrated analysis of omics data.
This motivated us to develop a statistical method integrating the available gene-expression information with genotype data generated in a case-control study. Our method is based entirely on omics data collected from the same individuals and does not depend on reference transcriptome data for imputation, which induces population-stratification bias [1,2].

Materials and Methods
We propose a novel two-step multi-locus association method using a latent variable, conjointly with logistic regression to integrate available genome-wide-marker information, with gene-expression data on a subset of individuals.

Results
We have derived the asymptotic distribution of our test statistic to speed up p-value calculation for real datasets. Our method detects novel single nucleotide polymorphisms (SNPs) associated with psoriasis along with SNPs already reported by genome-wide association studies (GWAS). We could identify these associated variants at a much smaller sample size than the corresponding published GWAS, using additional gene-expression information from a subset of genotyped individuals.

Conclusion
Extensive simulations reveal that our method is robust and powerful. Based on a real dataset [dbGAP; phs000019.v1.p1], we could identify a few novel psoriasis-associated SNPs that remain undetected by GWAS alone.

A65
Finding statistically significant marker combinations from genomic data for survival analysis

Background
Analysis of genomic data can provide valuable information such as genetic markers for both personalized and preventive medicine. However, with high-dimensional data, large correction factors for multiple testing are usually required for exhaustive assessment of possible markers. Additionally, comprehensive evaluation of marker combinations, which may provide useful insight into complex diseases such as cancer, has also become infeasible. To address these challenges that discourage true novel discoveries from data, we developed an algorithm for finding statistically significant single and multiple-marker interactions that may affect survival of individuals.

Methods
We adapted the Limitless Arity Multiple-testing Procedure (LAMP), a previously developed p-value correction technique for statistical evaluation of combination of markers which employs significant pattern mining for finding the optimal correction factor. We extended LAMP by incorporating the log-rank test using a newly introduced theoretical lower bound of the log-rank p-value, making LAMP more suitable to handle time and censored information common in survival data.
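For readers unfamiliar with the test being incorporated, the following is a minimal sketch of the standard two-group log-rank test on right-censored data, written with the Python standard library only. It is not the LAMP extension or the lower bound introduced by the authors; the function name and the toy data are illustrative assumptions. Group A can be thought of as the samples in which a candidate marker combination is highly expressed, and group B as the remaining samples.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up times; events_*: 1 if the event (e.g. death) was
    observed, 0 if the observation was censored.
    Returns (chi-square statistic with 1 df, p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers at risk just before time t in each group.
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (hypothetical): early events in group A, late events in group B.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                       [6, 7, 8, 9, 10], [1, 1, 1, 1, 1])
```

In a combinatorial search, such a p-value would be computed for every candidate marker combination, which is why a bound that allows most combinations to be pruned before testing matters.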

Results
The proposed method was applied to The Cancer Genome Atlas (TCGA) mRNA expression data for urothelial bladder carcinoma (BLCA) and acute myeloid leukemia (LAML), with 129 samples and 19637 genes and 173 samples and 18754 genes, respectively, with a focus on the highly expressed genes. We limit our results to detected single genes and gene combinations that are highly expressed in at least 10 individuals. The algorithm detected a total of 36 single and 2-gene combinations with statistically significant log-rank p-values for LAML, and 49 single and multiple-gene combinations of up to order 4 for BLCA. Identified LAML markers have a minimum unadjusted log-rank p-value of 4×10^-8, and identified BLCA markers of 6×10^-7. Among the selected genes are known oncogenes from the COSMIC Cancer Gene Census, such as DCTN1 and CCDC6 in the LAML results, and EGFR and GMPS in the BLCA results. Using the PPISURV tool from BioProfiling.de and the GSE32894 urothelial carcinoma data, we could verify the statistical significance of identified single and 2-gene combination markers from the BLCA data analysis. Out of 34 identified combinatorial markers also present in the GSE32894 data, 24 were statistically significant (p<0.05) when considering genes expressed at both high and low levels.

Conclusion
The algorithm we present is especially advantageous in detecting multiple-gene combination markers and sets no theoretical limit on the order of interaction. We expect that it can aid in the selection of potential prognostic markers for further clinical validation among large numbers of candidate tests.

However, the human genome sequence data are not readily compared with each other because of essential differences in data format and annotation. If we can integrate all three data sets into a single volume of data, we should be able to conduct a more detailed analysis of human genome sequence variation. The simple addition of individual samples leads to a total of 4,861 individuals (= 1,417+940+2,504 individuals). Moreover, although genomic data from Middle East (ME) populations are limited, the ME could be a key to understanding human history because of its unique location at the crossroads of Asia, Europe and Africa. With the aim of elucidating the evolutionary history of ME populations, we have developed computational tools for integrating these three human genome sequence data sets, including ME populations, into a single volume of data. Using those tools, we successfully integrated these data. We then constructed a phylogenetic tree of about 5,000 human individuals at the genome sequence level. As a result, we identified evolutionary clusters of the ME populations in relation to other major ethnic groups, with very interesting features. Here, we report the outcome of this kind of big data analysis with successful data integration, discussing the evolutionary significance of human genomic variation. We also present how to identify ME-specific variant alleles for particular diseases such as diabetes, using the ClinVar database.

Enhanced carbonyl stress is closely related to neuropsychiatric disorders. The enzyme encoded by GLO1 is responsible for catalyzing the formation of S-lactoyl-glutathione from the condensation of methylglyoxal and reduced glutathione.
GLO1 is localized to 6p21.2, a region that has emerged as a plausible autism-risk locus candidate. A common variant of GLO1 (rs2736654) has been associated with autism. Reduced expression of GLO1 has been detected in mood disorder patients. These findings strongly suggest that GLO1 is a promising candidate risk gene for neuropsychiatric disorders, because neuropsychiatric disorders including autism, mood disorders and schizophrenia share genetic risk factors.
In this study, we focused on the association between rare genetic variants of GLO1 and schizophrenia. First, we conducted exon-targeted resequencing of GLO1 with next-generation sequencing technology in 370 Japanese schizophrenia cases. Three heterozygous variants with allele frequencies of ≤1%, including one variant in the 3'-UTR, one missense variant, and one frameshift variant, were identified. We regarded c.365delC, the frameshift variant, as functionally relevant. We then performed association analyses on 1,799 cases and 1,764 controls with this variant. The variant was found in four cases and eight controls. There was no statistically significant association between c.365delC and schizophrenia (P=0.18). This frameshift variant in GLO1 might occur at near-polymorphic frequencies in Japanese populations, although further investigation by sample size expansion and biological analysis will be needed to exclude low-penetrance associations.
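The carrier counts reported above form a 2x2 table that can be examined with an exact test. The abstract does not state which test produced P=0.18, so the two-sided Fisher exact test sketched below is an assumption and is not expected to reproduce that value exactly; the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 carriers fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Rows: cases, controls; columns: carriers, non-carriers of c.365delC.
p_value = fisher_exact_two_sided(4, 1799 - 4, 8, 1764 - 8)
print(f"two-sided Fisher exact p-value: {p_value:.3f}")
```

With only 12 carriers among 3,563 individuals, any such test has limited power, which is consistent with the call above for larger samples.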

Ethics Approval
All procedures performed in this study involving human participants were approved by the Ethics Committee of the Nagoya University Graduate School of Medicine and conducted in accordance with the Declaration of Helsinki and its later amendments or comparable ethical standards. Written informed consent was obtained from the participants and from the parents of the patients under 20 years old.

The proliferation of next generation sequencing (NGS) into clinical research settings is critical for personalized medicine. As the numbers and types of diseases studied increase, the demand for accurate and reproducible variant-interpretation pipelines has also surged. In the Middle East, variant interpretation is complicated by the significantly elevated fraction of novel or low-frequency alleles not seen before, due to the lack of large shared datasets of population-matched controls. These considerations suggest that population-aware pipelines would be required to deliver personalized medicine at the point of care. To demonstrate this, we use whole exome sequencing (WES) to assess 48 patients with rare diseases from 30 families recruited as part of a Mendelian Disease pilot program in Qatar, a nation where multiple stakeholders are building expertise and capacity in genomic medicine. We enroll entire families (parents plus additional siblings), and develop population-aware pipelines to discover both de novo and recessive putative disease variants. Leveraging 1,370 sequenced Qatari controls, which we share in a public database, we demonstrate the impact of large population-specific allele resources in discriminating disease-causing from likely-benign variation. We show that for a given subject, even in the absence of sequenced family members, variant prioritization is improved up to 8-fold when our population-matched controls are considered, versus when other large public databases (e.g. dbSNP, ExAC, 1000 Genomes, gnomAD) are used, stressing the poor representation of Arabs in these databases and the importance of data sharing for personalized medicine. Importantly, we identify candidate disease variants in 21 of 30 families. In 81% of these, we discover novel pathogenic variants in known disease genes, pointing to significant allelic heterogeneity and founder mutations in Arabs. For another 6 of 30 families, we discover variants in genes reported to cause only part of the observed clinical presentation, suggesting possible phenotypic expansion of reported syndromes. Our results demonstrate the high diagnostic utility of WES in Arab populations and the dramatic improvement in variant prioritization conferred by using a large set of population-matched controls, which we share to improve personalized medicine in the Middle East.
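The role of population-matched controls in prioritization can be illustrated with a minimal sketch. It assumes, purely for illustration, that prioritization amounts to discarding candidates that are common in a control set; the function name, the variant identifiers, the allele frequencies and the 1% threshold are all hypothetical and are not taken from the study's pipeline.

```python
def prioritize_variants(candidates, control_af, max_af=0.01):
    """Keep candidates whose allele frequency in the controls is <= max_af.

    candidates: list of variant identifiers (hypothetical format).
    control_af: dict mapping variant identifier -> allele frequency in controls.
    Variants absent from the control set are treated as frequency 0 (assumption).
    """
    kept = []
    for variant in candidates:
        af = control_af.get(variant, 0.0)
        if af <= max_af:
            kept.append(variant)
    return kept


# Hypothetical data: the first variant looks novel in a global database
# but is common in population-matched controls, so it is filtered out.
candidates = ["chr1:1000:A>G", "chr2:2000:C>T", "chr3:3000:G>A"]
population_af = {"chr1:1000:A>G": 0.12, "chr2:2000:C>T": 0.004}
shortlist = prioritize_variants(candidates, population_af)
```

A variant that is rare or absent in global resources but common in matched controls is removed from the shortlist, which is one way a matched control set can shrink the candidate list.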

Ethics statement
The study was approved by the Institutional Review Board of Weill Cornell Medicine in Qatar.

Evaluation of tumor microenvironment by expression-based surrogate markers
Yi Huang, An Hsu, Pei-Yi Lin, Wan-Ting Huang, Hua-Chien Chen, Shu-Jen Chen
ACT Genomics Co, Ltd., Taipei, Taiwan
Correspondence: Shu-Jen Chen (sjchen@actgenomics.com)
Human Genomics 2018, 12(Suppl 1):A69
In recent years, several studies have revealed the impact of immune contexture on clinical outcome in human tumors. In addition, the immunoscore for colorectal cancer (CRC) could predict patient survival more accurately than microsatellite instability (MSI). However, a quantitative approach that could evaluate the tumor microenvironment (TME) of different types of solid tumors is still lacking. We developed a qPCR-based detection panel consisting of a set of immune-related genes and specific cancer biomarkers as surrogate markers to evaluate the TME. The CAGE profiles retrieved from FANTOM5 were used to assess the capability of this panel of genes to distinguish tissue types. Without any feature engineering, a random forest model separated the samples into individual types of primary cultured cells with an accuracy of more than 60%, and separated primary cultured immune cells from non-immune cells with an accuracy of 97%. Currently, we are optimizing the gene content based on feature importance and developing a prediction model based on the expression signatures to estimate immune contextures from mixtures of pure immune and cancer cells. Before using mixtures of pure immune and cancer cells, we are going to establish a theoretical model using the CAGE profiles of primary cultured immune cells mixed with cancer cell lines. Our ultimate goal is to apply our TME detection panel in clinical use.

Recently the Japan Agency for Medical Research and Development (AMED) started new database programs to collect genomic variation and disease phenotypes at the national level. It aims to share the data among researchers and develop high-level medical research in the area of cancer and rare diseases.
Simultaneously, legislative efforts towards protecting personal information have been underway in Japan and other countries. Our ELSI team, with funding from AMED, has been working on investigating these legal movements and preparing guidance to support researchers who want to share their data to advance research in the field.

Materials and methods
We carefully examined the Personal Information Protection Acts and related government guidelines. We contacted the Personal Information Protection Commission and discussed a variety of relevant legal issues. We also held an international meeting to survey comparable regulations in other countries, especially those in the United States and the United Kingdom. While preparing guidance, we discussed with researchers the ways and means of efficient data sharing that match the needs of their profession.

Results
We classified the laws and guidelines for different kinds of data and usage, and compiled them in a chart. For example, the data were categorized by the existence of consent of the subject and by data attributes such as identifiability of individuals. In general, data which was obtained with informed consent or data which did not contain individually-identifying information was allowed to be used without additional procedures when sharing the data. Omics data such as genomic data or transcriptomic data were difficult to classify, because they contain many types and there seems to be a variety of rules in different countries.

Conclusions
Usability of high-quality data is essential to innovation in medicine.
Integration of clinical data and biological data such as individual genomic data is one way to create such high-quality data. However, as an individual's data becomes more deeply integrated, the potential risks in terms of privacy increase, which would in turn increase the difficulty of obtaining consent. New mechanisms that complement the present scheme of privacy protection, which mainly consists of informed consent and de-identification of the data, will be needed for the safe gathering and sharing of high-quality data, which is necessary for the development of next-generation medicine.

Background
Skeletal muscle plays an important role in many vital processes and is maintained by numerous pathways regulating protein synthesis and degradation. Loss of skeletal muscle mass and strength is induced by various conditions, including muscle disuse resulting from bed rest or unloading, age-related muscle weakness (sarcopenia), cachexia, multiple muscular dystrophies, myopathies and denervation. Notably, muscle atrophy can affect specific fiber types, and different muscles have differential susceptibility to muscle wasting. However, the factors underlying these differences remain to be elucidated.

Materials and Methods
Within the framework of an international consortium, we aim to reconstruct the transcriptional network in a variety of muscles under normal and atrophic conditions using Cap Analysis of Gene Expression (CAGE) and small RNA expression analysis. For the analysis, we are selecting 20-30 target muscles according to several criteria. First, we are considering muscles that are frequently affected, or not affected at all, by disuse, sarcopenia, cachexia or muscle diseases. The second criterion is the accessibility of the skeletal muscle for biopsy. Since we aim to capture age-dependent muscle atrophic changes, two or more age groups will be examined. In addition, males and females will be compared. The research subjects are expected to be macaque, a subset of human muscles, and primary cell lines from patients with dystrophy. As a result, an atlas of promoters and enhancers of RNA transcription in human and animal muscles under normal and pathological conditions will be created.

Results and Conclusion
Determining the transcription factors and other genomic regulatory elements involved in the differential response of muscle types may lead to a better understanding of the pathogenesis of the various muscle wasting conditions. Thus, it may help to find new therapeutic targets for specific disorders.

A72
Immuno-genomic subtype of liver cancer correlates with mechanisms of immune suppression and
The recent clinical success of immune checkpoint inhibitors has demonstrated the importance of understanding anti-tumor immunity, and the immunological and clinical significance of cancer genomics needs to be explored. We analyzed the transcriptomes and whole genomes of liver cancers in order to find immunogenomic subtypes that provide mechanistic insights and prognostic power.

Methods
We have performed RNA-Seq of 234 liver cancers and 196 adjacent liver tissues, as well as whole-genome sequencing [1]. Cytolytic activity (CYT) within tissues was computed as the mean expression level of GZMA and PRF1. The immune cell fractions were computed by CIBERSORT on RNA-seq data. Neoantigens were predicted by NetMHCPan3.0 using somatic mutations profiles and HLA genotypes from whole genome sequencing.
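The CYT definition above can be sketched in a few lines. This is a minimal illustration that takes "mean expression level" at face value as the arithmetic mean of GZMA and PRF1; the optional geometric mean is an added assumption, and the sample names and expression values are hypothetical.

```python
from math import sqrt


def cytolytic_activity(gzma, prf1, geometric=False):
    """Return a CYT score from GZMA and PRF1 expression in the same units.

    By default the arithmetic mean is used, following the wording of the text.
    The geometric option is an assumption added for comparison only.
    """
    if gzma < 0 or prf1 < 0:
        raise ValueError("expression levels must be non-negative")
    if geometric:
        return sqrt(gzma * prf1)
    return (gzma + prf1) / 2.0


# Hypothetical expression values for illustration only.
samples = {
    "tumor_01": {"GZMA": 12.0, "PRF1": 8.0},
    "liver_01": {"GZMA": 40.0, "PRF1": 10.0},
}
cyt = {name: cytolytic_activity(v["GZMA"], v["PRF1"]) for name, v in samples.items()}
```

Whichever mean is chosen, it should be applied consistently to tumor and adjacent liver samples so that the two CYT values remain comparable.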

Results
CYT in tumors was associated with better overall survival (hazard ratio (HR) = 0.46, 95% CI 0.27-0.78) (Fig. 1). In contrast, CYT in adjacent liver tissues was associated with poor disease-free survival (HR = 2.187, 95% CI 1.07-4.48), probably due to the development of metachronous tumors originating from chronic hepatitis. Tumor CYT was significantly lower than liver CYT (p < 1×10⁻¹⁵), indicating immune suppression by tumors. Liver fibrosis and hepatitis C virus infection were associated with liver CYT but not with tumor CYT. The correlation between tumor CYT and liver CYT was positive but marginal (Pearson's correlation coefficient r = 0.18). Neither mutation burden nor the number of neoantigens was associated with tumor CYT (p > 0.05). To define immune subtypes, we examined mRNA levels of immune suppressive genes. Some tumors had excessive expression of IDO1, CSF2, and PTGS1 relative to tumor CYT and had poor prognosis, consistent with the involvement of myeloid-derived suppressor cells (MDSCs). IL2RA and TNFRSF4 also showed similar trends, possibly reflecting regulatory T cells (Tregs). The Treg fraction independently estimated using CIBERSORT was also associated with poor prognosis (p < 0.01). The absolute expression level of these genes did not have prognostic power, suggesting that the imbalance between positive and negative immune regulators is important for immunity against liver cancers.

Conclusion
The local imbalance between cytotoxic lymphocytes, MDSCs and Tregs could be used to define immune subtypes of liver cancer. The association between the immune subtype and somatic alterations will also be discussed.

The concept of ELSI was developed originally with the Human Genome Project at the NIH and later spread around the world. Unlike traditional bioethics, ELSI is an interdisciplinary field that studies the development of genomic medicine and considers its social implementation. However, during the last twenty years, the circumstances surrounding genomic medicine have changed drastically. International collaborative research is increasingly important, including data sharing and international biobanks. In order to respond to this situation, ELSI research must become international in its scope. One vital part of this is to grasp how ELSI research is practiced in other countries and regions. Understanding the diversity of approaches in ELSI research will lay the groundwork for more fruitful collaboration between researchers around the globe.

Materials and methods
We surveyed major organizations from around the world, including the National Human Genome Research Institute in the US, Genome Canada in Canada, the Economic and Social Research Council in the UK, the Netherlands Genomics Initiative in the Netherlands, the ELSA Norway network in Norway, the Australian Genomics Health Alliance in Australia and the Japan Agency for Medical Research and Development in Japan. Our primary research method was to survey the literature published by and about these institutions. We referred to papers and online information and compared the mission statements, research subjects, and achievements of each institution.

Results
As a result of the above examination, we were able to identify many of the differences and similarities of these institutions. One of our findings concerned the vast difference in the ELSI research agendas of different institutions, owing to the state of genetic research and the social conditions in the country. We were also able to confirm the importance of collaboration among foreign countries, as well as the benefits of cross-disciplinary programs between doctors and patients, experts and citizens, and the humanities and natural sciences.

Conclusions
Based on our research, we conclude that, along with the development of genomic medicine, ELSI research must be globalized, taking into account common goals and values on the one hand, and respecting diversity on the other.

A74
Fine
Many fine-mapping methods have been developed that aim to select a small set of variants underlying a genome-wide association peak with a high probability of containing the true causal variant. Due to a lack of real-world traits with a known causal variant, these methods are primarily tested through simulations that may not accurately reflect the underlying biology of genetic associations. Recently, many genetic associations with DNA methylation levels have been identified, including the case where a SNP in the methylated CpG site disrupts DNA methylation. We used this rare case of a known causal variant to test three differing fine-mapping approaches (the J-test, BSLMM and BIMBAM) by constructing a 95% credible set for the causal genetic variant and testing for the presence of the CpG site SNP. Using DNA methylation data from the Lothian Birth Cohorts of 1921 and 1936 (n=1366), we identified genetic associations with DNA methylation at 2266 CpG sites disrupted by a SNP with minor allele frequency greater than 10%. All three methods generated 95% credible sets containing the CpG site SNP in only ~50% of cases, compared to the expected 95% obtained from simulated data. The low coverage of the known causal variant in credible sets was replicated in an independent cohort (n=1886). A conditional analysis of the region surrounding the CpG site determined that 34% of methylation sites were associated with multiple independent genetic effects, invalidating assumptions made when attempting to fine-map a single causal variant. For those CpG sites with evidence for a single underlying genetic effect, the regional linkage disequilibrium structure showed a masking of the effect of the SNP at the CpG site. This occurs through the most associated SNP both affecting the methylation at the CpG site and having its allele associated with less methylation occurring at a higher frequency and in complete LD with the allele disrupting methylation at the CpG site SNP.
This masking was more evident in CpG islands, where the fine-mapping methods captured the CpG-SNP less often than in non-island regions (OR=1.6, p=2×10⁻³). The identified underlying genetic complexity when fine-mapping a "trivial" case of SNPs within CpG sites altering DNA methylation has consequences for the fine-mapping of variants underlying higher order traits, particularly as many associations to complex traits are likely to act through effects on genomic regulation such as altered DNA methylation.

Fig. 1 (abstract A72). Survival and CYT
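The credible-set check used in the fine-mapping abstract above can be sketched generically. The sketch assumes each variant in a region already has a posterior probability of being the single causal variant; how the J-test, BSLMM or BIMBAM derive such probabilities is not shown, and the variant names and values are hypothetical.

```python
def credible_set(posteriors, level=0.95):
    """Return the smallest set of variants whose posterior mass reaches `level`.

    posteriors: dict mapping variant id -> posterior probability of causality
    under a single-causal-variant assumption. Values are renormalised to sum to 1.
    """
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posterior probabilities must have a positive sum")
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob / total
        if cumulative >= level:
            break
    return chosen


# Hypothetical posteriors for one CpG site; "cpg_snp" is the known causal SNP.
posteriors = {"rsA": 0.60, "rsB": 0.30, "cpg_snp": 0.06, "rsC": 0.04}
cs95 = credible_set(posteriors)
covered = "cpg_snp" in cs95
```

Repeating the `covered` check over many CpG sites gives the empirical coverage, which can then be compared with the nominal 95%.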

A75
Identification of chromosomal structural rearrangements by low-pass whole genome sequencing in recurrent miscarriage couples
The majority of genetic loci underlying common disease risk act through changing gene expression, measured using bulk populations of mature cells. A crucial step that is missing is evidence of variation in the expression of genes as cells progress from a pluripotent to a mature state. This is especially important for cardiovascular disease, as the majority of cardiac cells have limited capacity for renewal after the neonatal period. To investigate the dynamic changes in gene expression across the cardiac lineage, we generated RNA-sequencing data captured from 43,168 single cells progressing through in vitro cardiac-directed differentiation from pluripotency. We developed two software tools for single-cell transcriptomics analysis: a novel and generalized unsupervised cell clustering approach (Clustering at Optimal Resolution, CORE) and a machine learning method for prediction of cell transition between clusters (single cell Global fate Potential of Subpopulations, scGPS). Using these methods, we were able to reconstruct the cell fate choices as cells transition from a pluripotent state to mature cardiomyocytes, uncovering intermediate cell populations that do not progress to maturity, and distinct cell trajectories that terminate in cardiomyocytes that differ in their contractile forces. These data reveal transcriptional networks underlying lineage derivation of mesoderm, definitive endoderm, vascular endothelium, cardiac precursors, and definitive cell types that comprise cardiomyocytes and a previously unrecognized cardiac outflow tract population. We identified the non-DNA binding homeodomain protein, HOPX, as functionally necessary for endothelial specification, and found that dysregulation of HOPX during in vitro cardiac-directed differentiation underlies the molecular and physiological immaturity of stem cell-derived cardiomyocytes.
Second, we identify new gene markers that denote lineage specification and demonstrate a substantial increase in their utility for cell identification over current pluripotent and cardiogenic markers. By integrating results from analysis of the single-cell lineage RNA-sequence data with population-based GWAS of cardiovascular disease and cardiac tissue eQTLs, we show that the pathogenicity of disease-associated genes is highly dynamic as cells transition across their developmental lineage, and exhibits variation between cell fate trajectories. Third, through the integration of single-cell RNA-sequence data with population-scale genetic data we have identified genes significantly altered at cell specification events, providing insights into a context-dependent role in cardiovascular disease risk. This study provides insights into genetic networks governing lineage progression of hPSC cardiac-directed differentiation by small molecule Wnt modulation, establishes a valuable data resource on cardiomyocyte differentiation, and presents novel analysis methods and software with broad applications.

Background
Technological advances trigger the generation of massively parallel genome-wide transcriptome data, known as RNA-sequencing data. Correct quantification of transcripts is the first and foremost problem in analyzing such data. However, it faces considerable challenges, mainly due to problems in comprehending the distribution of reads and ambiguity in mapping reads to the proper isoforms, which lead to problems in modeling and estimating transcript abundance. Moreover, estimation of isoform abundance using millions of reads needs to be done under very realistic conditions of the experiment. The reads that hold key information for this estimation problem may be of different types, single-end or paired-end.

Materials and Methods
In order to estimate this abundance, we have developed a statistical method for isoform-level quantification based on a maximum likelihood approach under very general conditions on the nature and distribution of reads. Our likelihood function is of multinomial type with indicators as latent variables. We adopt the EM algorithm to obtain exact estimates and, unlike existing methods, we avoid approximations or plug-in estimates in maximizing the likelihood function. Among other features, our proposed method is also robust to the assumed probability distribution of reads, thus requiring no prior knowledge about the read distribution.
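The idea of a multinomial likelihood with latent read-to-isoform indicators can be illustrated with a textbook EM iteration. This is a minimal sketch under simplifying assumptions, not the method described in the abstract: isoform lengths, read distributions and paired-end information are ignored, and the function name and toy reads are hypothetical.

```python
def em_isoform_abundance(compat, n_isoforms, n_iter=500, tol=1e-12):
    """Estimate isoform proportions from read-to-isoform compatibility.

    compat: list of sets; compat[i] holds the indices of isoforms that read i
    could have come from. Returns a list of proportions summing to 1.
    """
    if not compat or any(len(isoforms) == 0 for isoforms in compat):
        raise ValueError("every read must be compatible with at least one isoform")
    theta = [1.0 / n_isoforms] * n_isoforms
    for _ in range(n_iter):
        # E-step: split each read across its compatible isoforms in
        # proportion to the current abundance estimates.
        counts = [0.0] * n_isoforms
        for isoforms in compat:
            denom = sum(theta[j] for j in isoforms)
            for j in isoforms:
                counts[j] += theta[j] / denom
        # M-step: the multinomial maximum likelihood estimate is the
        # expected fraction of reads assigned to each isoform.
        new_theta = [c / len(compat) for c in counts]
        delta = max(abs(a - b) for a, b in zip(theta, new_theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


# Toy data: 3 reads unique to isoform 0, 1 unique to isoform 1, 4 ambiguous.
reads = [{0}] * 3 + [{1}] + [{0, 1}] * 4
theta = em_isoform_abundance(reads, n_isoforms=2)
```

In this toy case the unique reads fix the ratio between the two isoforms at 3:1, and the ambiguous reads are shared out in that same ratio at convergence.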

Results
We have studied our method extensively using simulated as well as real data. We ran simulations under various models with different distributions of reads. Our method shows very promising results in comparison to other available methods [1,2,3,4,5,6]. It outperforms other methods significantly, especially when the number of alternatively spliced isoforms is relatively large. It also shows superior performance when some isoforms have extremely low abundance. Our method shows good correlation with other methods on real data, especially with eXpress [3] and Salmon [4]. It shows high correlation with qRT-PCR estimates. The confidence intervals calculated using our method are narrower than the Cufflinks [1] estimates.

Conclusion
We propose a novel statistical method for isoform-level quantification from RNA-seq data. It surpasses other existing algorithms in terms of accuracy when the number of alternatively spliced isoforms is very large and also when some isoforms have extremely low abundance. It can work with a mixture of paired-end and single-end reads, and it also produces confidence intervals. It is robust with respect to the distribution of reads. Moreover, this method is scalable with respect to memory allocation and is computationally very fast, making it an extremely useful and feasible approach for practical implementation with real data.

Operable breast cancer (OBC) patients undergoing surgery in the luteal, as opposed to the follicular, phase of the menstrual cycle were reportedly conferred a survival benefit. Clinically, a single dose of depot hydroxyprogesterone administered prior to surgery was shown to improve disease-free survival in OBC patients. We aimed to understand the molecular basis of the action of pre-operative hydroxyprogesterone in OBC patients.

Methods
Whole transcriptome sequencing (RNA-Seq) was performed on paired breast tumour samples obtained from 18 OBC patients who were exposed to hydroxyprogesterone before surgery and 13 patients who underwent surgery without exposure to hydroxyprogesterone. RNA-Seq data were analysed using the Tuxedo protocol. An unpaired Student's t-test followed by Benjamini-Hochberg correction was performed to identify genes differentially expressed between the two groups, retaining those that passed stringent filters in the exposed group. Significantly altered genes were evaluated for their functional annotation and some of these were validated using qPCR.
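The Benjamini-Hochberg step mentioned above can be sketched as follows. This is a generic illustration of the standard adjustment applied to a list of per-gene p-values; the t-test itself is omitted, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone: each one is the minimum of p * m / rank over
    # all ranks at or above its own.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from an unpaired t-test.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
significant = [i for i, q in enumerate(adj_p) if q <= 0.05]
```

Genes whose adjusted p-value falls at or below the chosen false discovery rate (0.05 here, an assumption) would be carried forward to any further filtering.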

Results
Our study revealed 207 genes as significantly modulated due to hydroxyprogesterone. Of these, 142 genes were found to be up-regulated post-surgery among exposed patients, and down-regulated post-surgery among unexposed patients. Some of the major significantly enriched processes (with their genes forming an interaction network) included (a) response to: progesterone, steroid hormone, drug, regulation of hormone levels, (b) negative regulation of: tumour necrosis factor production, nitric oxide biosynthetic process, inflammatory response, (c) cellular response to: osmotic stress; oxidative stress; external stimulus, radiation; zinc ion; reactive oxygen species, hypoxia, (d) protein targeting to: endoplasmic reticulum; mitochondria, protein oligomerization and (e) gene expression, nonsense-mediated decay, nucleosome assembly, protein-DNA complex assembly. These results suggest that cellular stress is modulated by hydroxyprogesterone exposure. Network analysis predicted UBC as the central node, suggesting a stress-responsive behaviour due to hydroxyprogesterone in patients irrespective of their progesterone receptor status. A high correlation and a similar trend of expression level between RNA-Seq and qPCR assays were obtained for 10 out of 14 chosen genes. Validated genes included progesterone-modulated genes (TSPO, PLN and CLDN4).

Conclusions
Significantly modulated genes identified in our study provide an initial set of biological correlates to the finding of the mentioned clinical trial. Hydroxyprogesterone possibly primes tumours towards growth retardation after having sensitized them towards surgery-induced stress.

Background
REP522 is a small and elusive family of repeat elements with 368 copies in the genome. REP522 is an unclassified interspersed repeat of 1.8 kb in size, with a large palindrome of ~600 nt in the center. Previously, we reported that many REP522 repeats harbor bidirectional promoters that are activated in multiple cancer types [1]. Such REP522 elements are silent in normal cells, but become active promoters in cancer cells and drive the expression of multiple long noncoding RNAs and pseudogenes. Later, we showed by integrating DNA methylation and histone modification data that REP522 is epigenetically activated in lung cancer by DNA hypomethylation and active promoter histone modification [2]. However, the activation mechanism and the role of REP522 in cancer remain unknown.

Materials and methods
We performed in silico analysis of REP522 sequences and explored their potential to act as promoter and/or exon sequences using the latest transcriptome annotation data. We also analyzed the genomic localization and clustering of REP522 elements in the genome. Additionally, we used RNA-Seq data from 21 tumor types profiled by The Cancer Genome Atlas (TCGA) and performed comprehensive gene activation and correlation analyses in 7916 primary tumors and 725 normal tissue controls.

Results
We show that the large palindrome of ~600 nt in the center of REP522 repeats acts as a bidirectional promoter. Based on secondary structure prediction, the palindrome sequence can theoretically fold into a hairpin structure with ~300 nt arm length. The implication of the predicted hairpin structure is unknown, as the sequence is occupied by nucleosomes when a given REP522 element is transcriptionally active in cancer. Based on the RNA-seq data from TCGA, we show the frequency of REP522 activation across 21 cancer types and the correlations of expression of transcripts initiated from REP522 promoters.

Conclusions
The results from computational predictions and RNA-seq data analysis are encouraging and offer the first view into the role of REP522 in cancer. However, further studies, including wet-lab experiments and follow-up, are required to elucidate the function of REP522 and its mechanism of activation in cancer.

Due to environmental and genetic background, some diseases are known to show population-specific tendencies. In this study we focused on the genetic background of the Middle Eastern population. Cardiovascular diseases are a major cause of death and are associated with high mortality in the Middle East, for example in Saudi Arabia. We developed a method to detect the SNPs responsible for diseases in a particular population and applied it to the Middle Eastern population. First, to reveal the genetic background of the diseases in the Middle Eastern population, we integrated a genomic variation database and human health information. Although there are several projects on human genomic variation, we used the Human Genome Diversity Project (HGDP) database because it contains genetic variations of the Middle Eastern population. In addition, ClinVar provides information about genomic variation and human health. We integrated these two databases. Then, we performed statistical analysis such as Fisher's exact test to reveal SNPs significantly associated with a population of interest. As an example, we focused on cardiovascular diseases in the Middle Eastern population. We examined allele frequencies of variations and found promising SNPs that are related to the genes CHRNA3, CACNA1C, CACNB2, and MYH6. Interestingly, these genes code for proteins involved in the nicotinic acetylcholine receptor-signaling pathway. They are related to nicotine addiction, which contributes to the development of cardiovascular diseases. Considering the higher smoking rate in the Middle East, our results are convincing and our methods worked well.
We can apply this method to other populations and diseases for further studies and clinical applications.

The majority of the human genome encodes long non-coding RNAs (lncRNAs), yet most efforts to date have focused on functionally characterizing protein-coding genes. Growing experimental evidence suggests, however, that lncRNAs are key regulators in many biological processes and are widely implicated in human disease. Further, our recent computational work on a human atlas of lncRNAs shows their diverse genomic and epigenomic origins and provides functional evidence for nearly 20,000 lncRNA genes. It is thus clear that understanding the function of the non-coding part of the genome is required for a general understanding of the function and regulation of the whole human genome.

Results
The sixth edition of the FANTOM project (FANTOM6) aims to systematically elucidate the function of lncRNAs in human by subjecting them to high-throughput perturbations in multiple cell types, followed by profiling of the resulting transcriptional changes with Cap Analysis of Gene Expression (CAGE) to uncover their "molecular phenotypes". In the pilot phase, we used unbiased selection to choose hundreds of lncRNA genes (both published and novel entities discovered in our FANTOM5 project) and performed loss-of-function experiments in human skin fibroblasts. We found that suppression of more than half of the targeted genes resulted in an appreciable and non-random transcriptional change. Further, integrating the perturbation data with transcript features showed that the transcriptome response was independent of the lncRNA expression levels and cellular localization, and conservation also did not seem to play a role. Still, we found that some transcript properties, such as polyadenylation or specificity to the cell type of study, produced a higher response. We also noticed that perturbing antisense transcripts resulted in a larger number of perturbed genes. Pathway and gene ontology enrichment analyses of affected genes revealed that different lncRNAs are involved in distinct biological processes, yet some commonalities across them exist.

Conclusions
We present experimental evidence that long-ncRNA molecules are more biologically relevant than previously known by showing that their suppression in a cell type of interest non-randomly affected expression profiles of genes that function collectively in specific molecular/biological processes.

A83
Estimating frequency of pathogenic variants in a Japanese population by using the whole-genome reference panel of ToMMo
Yumi Yamaguchi-Kabata 1,2 , Jun Yasuda 1,2 , Osamu Tanabe 1,2 , Yoichi Suzuki 1,2,6 , Hiroshi Kawame 1,2 , Nobuo Fuse 1,2 , Masao Nagasaki 1,2,3 , Yosuke Kawai 1,2 , Kaname Kojima 1,2 , Fumiki Katsuoka 1,2 , Sakae Saito 1,2 , Inaho Danjoh 1,2 , Ikuko N Motoike 1,2,3 , Riu Yamashita 1,2,3 , Seizo Koshiba 1,2 , Daisuke Saigusa 1,2 , Gen Tamiya 1,2,7 , Shigeo Kure 1,2 , Nobuo Yaegashi 1,2 , Yoshio Kawaguchi 1 , Fuji Nagami 1 , Shinichi Kuriyama 1,2,5 , Junichi Sugawara 1,2 , Naoko Minegishi 1,2 , Atsushi Hozawa 1,2 , Soichi Ogishima 1,2 , Hideyasu Kiyomoto 1,2,8 , Takako Takai
Clarifying population frequencies of pathological variants is essential to construct infrastructure for genomic medicine for different populations; however, rare variants that may have biological or medical effects have not been clarified for the Japanese population. Here, by analyzing a whole-genome reference panel of a Japanese population based on 2,049 individuals (2KJPN, available at iJGVD; http://ijgvd.megabank.tohoku.ac.jp/), we characterized the genomes of Japanese individuals in terms of allele frequency of functional and pathological variants. Among the 28 million autosomal single nucleotide variants (SNVs) in 2KJPN, 6,862 SNVs were identified as pathologically annotated variants that overlap with known pathogenic variants in the Human Gene Mutation Database and ClinVar. In particular, we focus on 1) actionable genes recommended by the American College of Medical Genetics and Genomics (ACMG) for returning results of secondary findings, 2) congenital metabolic disorders for newborn screening (NBS), and 3) identifying genome-wide pathogenic variants showing higher frequencies in Japanese compared with other ethnic groups.
An automatic analysis of the 57 autosomal ACMG genes showed that 21% of the individuals had at least one reported pathogenic allele. However, after reviewing variants in several genes through a literature survey, we found that reevaluation is indispensable for constructing the information infrastructure of genomic medicine for the Japanese population. By focusing on 32 genes for 17 congenital metabolic diseases for newborn screening (NBS) in Japan, we identified reported pathogenic variants and estimated their carrier frequencies by variant filtering based on variant annotations and allele frequencies. Our results showed that variant frequencies were relatively higher in PCCA and SLC25A13 among the genes examined. To understand the genetic basis of different incidence rates among ethnic groups, further evaluation by appropriate variant interpretation is needed. In addition, we identified genome-wide pathogenic SNVs showing higher allele frequencies in 2KJPN than in other ethnic groups, and these SNVs included variants that are clinically important for personalized medicine and prevention for the Japanese (e.g., CDH1 for hereditary diffuse gastric cancer, APRT for APRT deficiency, and CETP for cholesterol ester transfer protein deficiency). Our results and ongoing activities on variant curation would lay the foundation for i) evaluating the relationships of genomic variants and disease prevalence, and ii) improving diagnostic strategies and genetic testing in Japan and East Asia.

[1]. I found that putative transcriptional target genes tend to include similar functions of genes. The normalized numbers of functional enrichments of putative transcriptional target genes changed according to the criteria of enhancer-promoter associations, and showed the same tendency in various cell types. DNA binding motif sequences of CTCF indicate their orientation bias at chromatin interaction anchors: the forward-reverse orientation is frequently observed.
The normalized numbers of functional enrichments were significantly larger in enhancer-promoter associations shortened at the genomic locations of forward-reverse orientation of CTCF binding sites than in enhancer-promoter associations shortened at CTCF binding sites without considering their orientation, in various cell types. The numbers of chromatin interactions from Hi-C experiments that overlapped with enhancer-promoter associations showed the same tendency. The expression level of transcriptional target genes predicted by each transcription factor also changed according to the criteria of enhancer-promoter associations. To predict DNA binding motif sequences of transcription factors and repeat DNA sequences involved in chromatin interactions, I examined the expression level of transcriptional target genes predicted by enhancer-promoter associations shortened at the genomic locations of DNA binding motif sequences of a transcription factor, instead of CTCF. I found that the forward-reverse orientation of cohesin (SMC3 and RAD21) as well as CTCF significantly affected the expression level of putative transcriptional target genes in various cell types. DNA binding motif sequences of other transcription factors and repeat DNA sequences significantly affected the expression level of putative transcriptional target genes in forward-reverse or reverse-forward orientation.

+ Both authors contributed equally to this work
Background: Osteoarthritis (OA) is a common joint disorder with increasing impact in an aging society. To understand its pathogenesis, molecular profiles of diseased tissues, such as RNA expression and DNA methylation, have been reported. On the other hand, chromatin variations in OA are so far unexplored, mainly due to the technical difficulties in applying traditional epigenomic tools to clinical samples.
Methods: In this study, we employed an epigenomic method, Assay for Transposase-Accessible Chromatin with high throughput sequencing (ATAC-seq), to characterize the genome-wide chromatin alterations associated with knee OA by comparing damaged to intact primary cartilage tissues in patients.
Results: In total, 109,215 open chromatin regions were detected, of which 4,450 were differentially accessible between normal and diseased cartilage tissues. We further defined as OA related 2,318 coding and noncoding genes with differentially accessible chromatin regions overlapping their promoters and enhancers; these genes significantly overlapped with those identified by RNA-seq in a previous study using different donors. Further analyses indicated that these genes are enriched for pathways regulating chondrogenesis, endochondral ossification, and mesenchymal stem cell differentiation. Consistently, analyses of differentially accessible regions distal to the TSS showed that bone-related enhancers and super-enhancers are dysregulated in OA diseased tissue. Moreover, integrating ATAC-seq data with publicly available databases allowed us to identify altered cis-regulatory elements as well as transcription factor binding, and to link them to important genes involved in these pathways. Furthermore, ATAC-seq data allowed us to prioritize functional SNPs relevant to OA discovered in previous GWAS studies. Taken together, abnormal self-healing in OA knee cartilage was observed in this study, suggesting that induced endochondral ossification-like cartilage-to-bone conversion is a characteristic of OA progression.
Conclusions: This study demonstrates that a genome-wide investigation of accessible chromatin regions can be a powerful method for probing changes of regulatory genomic elements in clinical samples.
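
The step of assigning differentially accessible regions to genes by promoter overlap can be illustrated with a minimal interval-overlap sketch. The coordinates, the gene names and the promoter window (2 kb upstream, 500 bp downstream of the TSS) are hypothetical choices made for illustration only; the abstract does not state how promoters were defined.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def promoter(chrom: str, tss: int, strand: str,
             upstream: int = 2000, downstream: int = 500) -> Region:
    """Strand-aware promoter window around a TSS (window sizes are assumptions)."""
    if strand == "+":
        return Region(chrom, max(0, tss - upstream), tss + downstream)
    return Region(chrom, max(0, tss - downstream), tss + upstream)


def genes_with_differential_promoter(peaks, genes):
    """Names of genes whose promoter window overlaps at least one peak."""
    hits = []
    for name, (chrom, tss, strand) in genes.items():
        window = promoter(chrom, tss, strand)
        if any(overlaps(window, peak) for peak in peaks):
            hits.append(name)
    return sorted(hits)


# Hypothetical differentially accessible regions and gene TSS positions.
peaks = [Region("chr1", 9_500, 10_200), Region("chr2", 50_000, 50_400)]
genes = {
    "GENE_A": ("chr1", 10_000, "+"),
    "GENE_B": ("chr2", 80_000, "-"),
    "GENE_C": ("chr2", 50_300, "-"),
}
print(genes_with_differential_promoter(peaks, genes))
```

A linear scan over peaks is adequate for a toy input; for the roughly 100,000 regions reported here, an interval tree or sorted sweep would be the more practical choice.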

A86
Rewiring MIR retrotransposons in long non-coding RNAs to regulate translation of paired antisense genes: the case of MAPT-AS1
Simone R 1,4 , Javad F

Background
Retrotransposons constitute ~42% of the human genome. They are non-randomly distributed within the transcriptome, enriched in introns and long non-coding RNAs (lncRNAs) compared to protein-coding genes. Little is known about the function of specific classes of retrotransposon-derived lncRNAs.

Materials and methods
We took an unbiased transcriptome-wide approach to investigate lncRNAs with exapted mammalian-wide interspersed repeats (MIR) and identified 1,197 lncRNA genes containing MIRs (MIR-lncRNAs). They form sense-antisense pairs with 1,045 protein-coding genes, most forming a network of interacting proteins. Gene ontology analysis revealed that MIR-lncRNAs overlap with genes commonly expressed in the nervous and immune systems. We focused on one neuronally restricted MIR-lncRNA, known as MAPT-AS1.
Using RNA sequencing and qRT-PCR, we assessed expression of MAPT-AS1 and MAPT in brain and differentiating human induced pluripotent stem cells. In neuroblastoma cell lines, we either silenced or stably expressed MAPT-AS1 splice variants and targeted deletion mutants to identify the essential functional domains of MAPT-AS1 and characterized them by western blot, qRT-PCR, polysome profiling and luciferase reporter assays.

Results
As an example relevant to human disease, we focused on MAPT-AS1, an alternatively spliced, cytoplasmic-enriched antisense lncRNA that overlaps with the 5'-untranslated region of the tau gene, MAPT. Using loss-of-function experiments and cell lines stably expressing variants of MAPT-AS1, we showed that it specifically represses MAPT translation. This repression requires both the overlapping 5' region of MAPT-AS1, complementary to the MAPT internal ribosome entry site (IRES), and a 3' inverted MIR element. A truncated 90nt MAPT-AS1 construct consisting of only the 5' overlapping region and the MIR retained the capacity to repress MAPT translation. Furthermore, two 7mer motifs within the MIR, complementary or identical to sequences within 18S ribosomal RNA, are essential for MAPT translational repression.

Conclusions
Our data suggest that MAPT-AS1 lncRNA functions by interfering with ribosome recruitment to the MAPT mRNA. MIR-lncRNAs frequently overlap with genes implicated in neurodegenerative diseases and may thus establish an additional layer of translational regulation, with implications for proteostasis in neurodegeneration.
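
The notion of 7mer motifs that are "complementary or identical" to sequences in a reference RNA can be made concrete with a short scan. This is only an illustrative sketch: both sequences below are invented toy strings, not the MIR element or 18S rRNA, and the abstract does not describe how the motifs were actually found.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int = 7) -> set:
    """All distinct k-mers contained in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_motifs(element: str, reference: str, k: int = 7):
    """List (position, motif, relation) for k-mers of `element` that are
    identical to, or reverse-complementary to, a k-mer of `reference`."""
    ref_kmers = kmers(reference, k)
    found = []
    for i in range(len(element) - k + 1):
        motif = element[i:i + k]
        if motif in ref_kmers:
            found.append((i, motif, "identical"))
        elif reverse_complement(motif) in ref_kmers:
            found.append((i, motif, "complementary"))
    return found


# Toy sequences (hypothetical, for illustration only).
reference = "GGAUCCGUAACCUGAAGG"
element = "AAAACCGUAACAAAACCUUCAGAAAA"
for position, motif, relation in shared_motifs(element, reference):
    print(position, motif, relation)
```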

A87
Eukaryotic genome annotation using RNA-seq and homology information
Keywords: bioinformatics, eukaryotic genome, genome annotation
Kentaro
Recent advances in large-scale sequencing technology have expanded opportunities to perform de novo sequencing of various organisms. Genome annotation is fundamental information for all genome-based analyses. RNA-seq data obtained from NGS and known protein sequences stored in public databases are very useful for constructing annotations. However, the numbers of raw RNA-seq reads and known proteins are large, and their similarities to the target genes vary, so it is difficult to combine these data to construct a reliable genome annotation. Here we present an automatic genome annotation pipeline that integrates RNA-seq data and protein sequences derived from various species. In the current ]) on three different species' genome (mammalian, insect, and fungi). The BUSCO scores were 97%, 96%, and 88% for the mammalian, insect, and fungal genomes, respectively. Our pipeline achieved high accuracy without complicated parameter settings, and will help to construct reliable genome annotations and to facilitate whole-genome analyses.

Background
Cognitive ability is highly heritable [1] and a major determinant of human health and well-being [2]. Recent genome-wide meta-analyses have identified 24 genomic loci linked to variation in general cognitive ability [3][4][5][6][7], but much about its genetic underpinnings remains to be discovered. Here, we present the largest genetic association study to date (N=279,930) of cognitive ability.

Materials and methods
We performed a genome-wide meta-analysis of 16 independent cohorts of European ancestry. Cognitive ability was assessed using various measures of performance in fluid domains of cognitive functioning. Despite sample and methodological variations, genetic correlations (rg) between cohorts were considerable (mean=0.63), and there was no evidence of heterogeneity between cohorts in the single nucleotide polymorphism (SNP) associations. GWAS was conducted following a standardized quality control procedure, including checks on missingness, genotyping errors, and relatedness, and corrections for possible population stratification. Meta-analysis was carried out in METAL.
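
To show what combining per-cohort SNP associations involves, the sketch below implements a fixed-effect inverse-variance-weighted meta-analysis for a single variant. This is one common weighting scheme and is shown as an assumption: the abstract does not state which scheme was used, and the effect sizes and standard errors are invented.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect meta-analysis of one SNP across cohorts.

    Each cohort is weighted by 1 / SE^2. Returns the pooled effect,
    its standard error, the z-score and a two-sided normal p-value.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return beta, se, z, p


# Hypothetical per-cohort estimates for a single SNP.
beta, se, z, p = inverse_variance_meta([0.02, 0.03], [0.01, 0.01])
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

Two cohorts with equal standard errors contribute equally, so the pooled effect is their mean and the pooled standard error shrinks by a factor of the square root of two.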

Results
We detected 12,701 genome-wide significant variants, represented by 246 independent lead SNPs in 206 genomic loci (191 novel). Roughly half of the significant SNPs are located within a gene, and 4.4% are highly likely to have a regulatory function. The results implicated 1041 genes via positional, eQTL and chromatin interaction mapping, and gene-based analyses. 89 genome-wide significant hits were exonic nonsynonymous variants, with clear implications on gene function. Stratified LD Score regression showed enrichment for heritability of SNPs located in conserved regions of the genome, coding regions, H3K9ac histone regions, and super-enhancers. While conserved regions have been implicated for cognitive ability and neuropsychiatric phenotypes, enrichment in coding regions has not previously been linked to variation in cognitive ability. Pathway analyses showed independent associations of three biological pathways (neurogenesis, neuron differentiation and synaptic structure), several brain tissues, and specific brain cell types (cortical and hippocampal pyramidal neurons, and striatal medium spiny neuron cells). We confirm previously reported genetic correlations with several neuropsychiatric traits, and used Mendelian randomization to test for credible causal associations. We found a strong protective effect of cognitive ability on ADHD and Alzheimer's dementia, bidirectional causation and strong pleiotropy between cognitive ability and schizophrenia, and a risk effect of cognitive ability on autism.

Conclusions
This study provides the most extensive meta-analysis of genetic variants associated with cognitive ability to date and offers novel insight into the underlying neurobiology and into causal relationships with other traits. The results are a rich resource of functionally relevant outcomes, which can provide starting points for further functional work.

Background
Hereditary breast and ovarian cancer (HBOC) is one of the most frequent monogenic disorders, which has led to the identification of a number of high-penetrance (BRCA1, BRCA2, PTEN, TP53, CDH1, and STK11) and moderate-penetrance (PALB2, ATM, BRIP1, and CHEK2) genes. The rapid development of high-throughput sequencing and the decreasing cost of analysis have led to the use of gene panel sequencing in clinical research and to the rapid accumulation of identified pathogenic mutations. The present study is the first attempt to use NGS to study hereditary cancer in different ethnic groups in Russia.

Materials and methods
Individuals were chosen to be included in this study according to the following criteria: (1) young age of disease onset, (2) the presence of relatives with breast or ovarian cancer diagnosis. The NimbleGen SeqCap EZ Choice kit ("Roche") was used for target enrichment, and sequencing was performed using Illumina MiSeq ("Illumina") using paired-end 2 × 251 nucleotide single-index sequencing. HGMD Professional 2017.1 and BIC databases were used to identify pathogenic mutations.

Results
We have launched the "Hereditary Oncogenomics Russia" project to study hereditary cancer syndromes in the Russian Federation, aiming to create a biobank of blood samples and a database of pathogenic mutations in cancer-related genes. Several cancer centers were recruited to collect samples from patients with hereditary cancer according to NCCN recommendations. All participants provided written informed consent. In the first step, we analyzed 378 samples from patients with breast or ovarian cancer. Of the 378 patients, 26% carried a mutation in the BRCA1 gene and 17% in the BRCA2 gene. Thus, less than 50% of cases might be explained by a pathogenic BRCA1 or BRCA2 mutation. However, in the majority of BRCA-negative cases, a mutation was found in other DNA repair-related genes, including CDK12, FANCL, APC, ATM, CDH1, MUTYH, PALB2, FANCI, CHEK2, and RAD1C.

Conclusions
Among patients fulfilling the NCCN criteria for hereditary breast or ovarian cancer, in addition to well-known founder mutations in the BRCA1 and BRCA2 genes, we found other frequent pathogenic alterations in these genes. Panel sequencing revealed that in BRCA-negative cases, one of the DNA repair-related genes might play a leading role in cancer development. Our first results stress the critical importance of a country-wide national project to identify the spectrum of germline pathogenic mutations in Russia, a multiethnic country, and of subsequently updating clinical recommendations for cancer-associated mutation screening strategies.

Background
Long-term sun exposure is thought to be a risk factor for age-related macular degeneration (AMD), which is a leading cause of blindness in developed countries. In rats, intense light exposure induces retinal degeneration resembling AMD pathology (an experimental retinal photic injury model). Susceptibility to this degeneration differs among rat strains. Therefore, we sought to identify the gene(s) that cause the difference in susceptibility and might be related to AMD onset.

Materials and methods
The retinal photic injury was induced by white light exposure (3000 lux, 3 hrs). The susceptibility was evaluated by Morris water maze test for eyesight and pathological analysis of the eyes for retinal degeneration. The genotyping of each rat individual was performed using polymorphic microsatellite and SNP markers which can distinguish the strain specificity. A targeted exome analysis was performed with a next generation sequencer.

Results
A rat strain susceptible to retinal photic injury (WKY) and a resistant strain (LEW) were crossed to produce F1 offspring. The F1 offspring were susceptible, indicating that susceptibility is a Mendelian autosomal trait that is dominant over resistance. We named the trait "the susceptibility to retina photic injury" and the responsible gene "Rpi1".
To identify the Rpi1 gene, first backcross (BC1) offspring were produced by mating F1 animals with the recessive LEW strain. Using the Morris water maze test before and after light exposure, susceptible individuals in the BC1 generation were selected to produce BC2 offspring. As backcrossing and selection of susceptible individuals were repeated, the fraction of the WKY genome gradually decreased. A BC11 offspring obtained in this way carried a 3.8-Mb WKY genome region within chromosome 5q36 as its sole WKY component, in which the Rpi1 gene must lie. We performed targeted exome analysis of this region to identify Rpi1 and found 18 loci with non-synonymous base changes in the coding regions of 15 genes. Using additional backcross individuals and another photic injury-susceptible strain, F344, we recently narrowed the candidates down to a single gene (tentatively Gene R) with a reasonable relationship between genotype and phenotype.
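
The gradual loss of WKY genome during repeated backcrossing follows a simple halving rule, sketched below. The calculation is a textbook expectation for loci unlinked to the selected region and is not taken from the abstract; it ignores linkage drag around Rpi1, which is exactly why a WKY segment persists near the selected gene.

```python
from fractions import Fraction


def expected_donor_fraction(generation: int) -> Fraction:
    """Expected share of donor (WKY) alleles at loci unlinked to the selected gene.

    generation 0 is the F1 (every locus heterozygous, so one half of the
    alleles are donor-derived); generation n is the BCn offspring. Each
    backcross to the recurrent parent (LEW) halves the expected share.
    """
    if generation < 0:
        raise ValueError("generation must be non-negative")
    return Fraction(1, 2) ** (generation + 1)


for n in (0, 1, 2, 11):
    label = "F1" if n == 0 else f"BC{n}"
    share = expected_donor_fraction(n)
    print(f"{label}: {share} (about {float(share):.4%})")
```

By BC11 the expected unlinked WKY share is 1/4096, which is consistent with the retained WKY material being confined to the selected region.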

Conclusions
We consider Gene R to be Rpi1 itself. Gene R has never been reported as a susceptibility gene for AMD. We are now working to obtain more direct evidence that Gene R is the causal gene for the difference in susceptibility to retinal photic injury in rats, and to examine its possible relationship with AMD.

Background
Invasive meningococcal disease (IMD) is a rare condition affecting children and young adults due to infection with Neisseria meningitidis, resulting in meningitis or sepsis. Although the majority of the general population is colonized by N. meningitidis, only a small minority go on to develop IMD, suggesting that those that succumb to invasive disease may possess an underlying genetic susceptibility. The notion of a genetic contribution to disease manifestation is supported by the finding that patients with congenital complement deficiencies are susceptible to recurrent IMD, yet these conditions are rare. With the aim of identifying other genetic factors underlying IMD, we carried out whole exome sequencing (WES) of approximately 300 IMD patients from an extensive and well characterized cohort of >2,000 childhood IMD patients.

Materials and Methods
The WES analysis focused on rare variants (MAF<0.01) predicted to have a detrimental effect on protein function. We undertook analysis of seven multiplex families identifying IBD variants. The rest of the index cases were analysed as a cohort and put through gene and pathway burden tests to identify any genes/pathways that were enriched in the collection of patients.
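
A gene burden test asks whether carriers of qualifying rare variants in a gene are over-represented among patients. The abstract does not say which burden test or comparison group was used, so the following is only a minimal sketch of one simple option, a one-sided Fisher's exact test on carrier counts, with a control group assumed and all counts invented.

```python
from math import comb


def fisher_exact_greater(case_carriers: int, case_total: int,
                         control_carriers: int, control_total: int) -> float:
    """One-sided Fisher's exact test for enrichment of carriers among cases.

    Sums hypergeometric probabilities of seeing at least `case_carriers`
    carriers among the cases, given the fixed table margins.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denominator = comb(total, carriers)
    numerator = 0
    for k in range(case_carriers, min(carriers, case_total) + 1):
        numerator += comb(case_total, k) * comb(control_total, carriers - k)
    return numerator / denominator


# Hypothetical counts: 9 of 300 patients versus 5 of 1,000 controls carry
# a rare predicted-damaging variant in the gene being tested.
p_value = fisher_exact_greater(9, 300, 5, 1000)
print(f"one-sided p = {p_value:.3g}")
```

In practice such a test would be repeated per gene or pathway and corrected for the number of tests performed.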

Results
These analyses revealed a number of patients harbouring mutations in known primary immunodeficiency genes (approx. 10%) as well as a novel configuration of mutations underlying the complement genes. Furthermore, novel mutations in the coagulation pathway and mucosal immunity genes were identified and functionally confirmed.

Conclusions
The identification of genes involved in IMD through WES has demonstrated the complex genetic architecture of meningococcal immunity and revealed some novel and unexpected genes/pathways that modulate disease susceptibility and severity. The results from this study provide us with a more comprehensive understanding of IMD pathogenesis.

is a challenge, because we deal with a very complex multifactorial network. Therefore, our goal is to develop test systems with limited complexity that are still informative in terms of molecular tumor development.
To reach this, we first have to singularize cells from tumor tissue. Within surgical tissue, tumor core tissue has to be discriminated from surrounding tissue by MRI or immunohistochemistry. Afterwards, microdissected tissue samples have to be dissociated by protease digest. In a next step, the mixture of cells has to be separated into the different cell types by immunopanning and/or FACS sorting. RNAseq helps to identify gene expression profiles specific for individual cells within the tumor compared to the tumor microenvironment. For us the most important parameters are the adhesion and migration activities of the various cell types of different origin on different matrices. To establish a test system as a starting point for a more sophisticated experimental setup to characterize tumorigenic tissues, we firstly isolated mesenchymal stem cells (MSC) from fat tissue or bone marrow, before transfer to microwells covered with electrodes to monitor cell adhesion by electrochemical impedance sensing. We found that pre-coating of the microwells with either collagen type I or IV induced different impedance profiles, corresponding to distinct differentiation pathways which was confirmed by histochemical staining as well as protein and gene expression analyses.
In another approach, we used impedance sensing to measure prostate or breast cancer cell migration towards the secretome of mesenchymal stem cells. MSC cell culture supernatant was fractionated by size exclusion and ion exchange chromatography. We identified type I and III collagen, fibronectin and laminin 421 as extracellular matrix proteins to be the main drivers of rapid cancer cell migration requiring as little as two hours for a full migration response. The challenge now is to adjust this experimental approach to individual cell types isolated from different tumor types and also to different matrices simulating the tumor microenvironment.

Deep Learning, as a class of machine learning algorithms and neural network architectures, shows high promise in biomedicine. In particular, Convolutional Neural Networks (CNNs) have been successfully applied in a wide range of domains, including the prediction of transcription factor binding sites, the identification of genomic variants, and bioimaging. CNN architectures are highly efficient in modeling 2D spatial information, discovering features of interest through layers of convolutional filter units inspired by the structure of the brain visual cortex. The convolution filters require a distance or neighbourhood function in the input feature space, which is a limitation to their general application to omics data. Overcoming existing approaches that convert omics data into images for subsequent CNN processing, we propose a new framework that uses direct feature remapping into appropriate metric spaces. We recently developed omicsCNN, a generalized CNN solution for omics data. The framework is implemented as a novel Keras layer, OmicsConv. On metagenomics 16S sequencing data, we applied the patristic distance to define the embedding and build a classifier of distinct subtypes of Inflammatory Bowel Disease, with superior accuracy to other machine learning methods.
Here we present a further development of the omicsCNN architecture customized for application to transcriptomics data. We applied the omicsCNN solution for transcriptomics to expression data (both microarrays and RNAseq) from the MicroArray/Sequencing Quality Control study (MAQC/SEQC) neuroblastoma dataset, including 498 patients (176 high risk), to predict event-free and overall survival endpoints. The framework uses Gene Ontology (GO) mutual semantic similarities combined with a gene co-expression measure, obtaining a multilayer network whose layers are computed from the three GO categories and co-expression. For each layer, the metric between two genes is given by the diffusion distance, which quantifies the likelihood that two random walkers originating in the two genes meet in another node of the net. The global distance in the network is defined as the L2 Euclidean product of the diffusion distances of each layer, thus enabling the OmicsConv layer. We evaluate the solution in terms of predictive accuracy, also comparing with a set of machine learning methods. We discuss the impact of omicsCNN as a natural deep learning setup for transcriptomics data modelling and evaluate insights from features discovered by the model.

Colorectal cancer (CRC) is the third most common cancer diagnosed in males and the second in females, with approximately 1.4 million cases diagnosed and over 690,000 deaths reported in 2012. The majority of CRCs are classified as adenocarcinoma (AC), and approximately 10-15% of CRC patients present with the mucinous carcinoma (MC) subtype. MC is characterised by secretion of mucin, the presence of which causes the cancer cells to spread faster and become more aggressive compared to the AC subtype. Therefore, colorectal MC has been associated with poor response to treatment, low survival rates, worse prognosis, and presentation at a more advanced stage of disease than non-mucinous adenocarcinoma.
However, knowledge of the epigenome-wide methylation profile associated with this type of malignancy is scarce. This study aims to determine differentially expressed and methylated genes in colorectal MC as compared to AC, to integrate the methylation status with gene expression profiles, and to correlate them with clinicopathologic features. We used the TCGA-generated methylation data based on Illumina Methylation 450 beadchip in β value format retrieved from and level 3 RNA sequencing dataset in RSEM (RNA-Seq by Expectation Maximization) format downloaded from Firebrowse. After filtering for only patients with paired methylation and gene expression datasets, a total of 289 CRC patients were identified. Two hundred fifty patients were diagnosed with AC and 39 were diagnosed with MC histology. Additionally, gene expression data from 173 normal colon samples and methylation data from 45 normal colon samples were included as comparisons. An unpaired t-test with Benjamini-Hochberg (BH) false discovery rate (FDR) multiple testing correction was performed using Bioconductor in R. We identified 1336 genes that were significantly differentially methylated in MC versus AC; 788 were hypermethylated and 548 were hypomethylated, of which 295 and 131 genes had Δβ ≥ 0.1 and Δβ ≤ -0.1, respectively. From RNAseq analysis we identified 3645 significantly differentially expressed genes, of which 2025 were upregulated and 1620 were downregulated. EIF6 was the most significantly hypermethylated gene (22.8% hypermethylation; p = 9.92E-07) in MC versus AC and its expression was downregulated -1.631 fold (p = 1.90E-04). On the other hand, BAG3 was the most significantly hypomethylated gene (20.8% hypomethylation, p = 1.65E-03), with a modest but significant upregulation of expression (0.309, p = 8.20E-03). Both genes are involved in carcinogenesis processes. These preliminary findings suggest that the aberrant DNA methylation of EIF6 and BAG3 might be involved in MC development.
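
The two filtering steps described for the methylation analysis, Benjamini-Hochberg correction followed by a Δβ effect-size cut-off, can be sketched as follows. The study itself used Bioconductor in R; this Python version is only an illustration, the 0.05 FDR threshold is an assumption (the abstract gives only the Δβ cut-offs of ±0.1), and the p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def classify(delta_beta: float, q_value: float,
             q_cut: float = 0.05, effect_cut: float = 0.1) -> str:
    """Label a gene by FDR and by the size of its methylation difference.

    delta_beta is the mean beta value in MC minus that in AC.
    q_cut is an assumed threshold; effect_cut follows the abstract.
    """
    if q_value >= q_cut:
        return "not significant"
    if delta_beta >= effect_cut:
        return "hypermethylated"
    if delta_beta <= -effect_cut:
        return "hypomethylated"
    return "significant, small effect"


# Hypothetical raw p-values and methylation differences for four genes.
raw_p = [0.01, 0.04, 0.03, 0.005]
delta = [0.23, -0.02, -0.21, 0.05]
q_values = benjamini_hochberg(raw_p)
for d, q in zip(delta, q_values):
    print(f"delta_beta={d:+.2f} q={q:.3f} -> {classify(d, q)}")
```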

Background
Major Depressive Disorder (MDD) is a notably complex illness with a lifetime prevalence of 14%. It is often chronic or recurrent and is thus accompanied by considerable morbidity, excess mortality, substantial costs, and heightened risk of suicide. MDD is a major cause of disability worldwide. Twin studies attribute ~40% of the variation in liability to MDD to additive genetic effects, and this proportion may be greater for recurrent, early-onset, and postpartum MDD. GWA studies of MDD have had notable difficulties in identifying loci. Previous findings suggest that an appropriately designed study should identify susceptibility loci.

Materials and methods
We conducted a Genome-Wide Association (GWA) meta-analysis in 130,664 MDD cases and 330,470 controls. We used the standard GWA mega/meta-analysis methods of the core PGC quality control, imputation, and analysis pipeline (i.e., "ricopili").

Results
We identified 44 independent loci that met criteria for statistical significance. We can make a strong case for the identification of RBFOX1 and NEGR1. The genetic findings were associated with clinical features of MDD, and implicated prefrontal and anterior cingulate cortex in the pathophysiology of MDD (regions exhibiting anatomical differences in MDD cases vs controls). Genes that are targets of antidepressant medications were strongly enriched for MDD association signals (P = 8.5e-10), suggesting the relevance of these findings for improved pharmacotherapy of MDD. Sets of genes involved in gene splicing and in creating isoforms were also enriched for smaller MDD GWA P-values, and these gene sets have also been implicated in schizophrenia and autism. Genetic risk for MDD was correlated with that for many adult and childhood onset psychiatric disorders. Our analyses suggested important relations of genetic risk for MDD with educational attainment, body mass, and schizophrenia: the genetic basis of lower educational attainment and higher body mass were putatively causal for MDD whereas MDD and schizophrenia reflected a partly shared biological etiology.

Conclusions
We present extensive analyses of these results which provide new insights into the nature of MDD. All humans carry lesser or greater numbers of genetic risk factors for MDD, and a continuous measure of risk underlies the observed clinical phenotype. MDD is not a distinct entity that neatly demarcates normalcy from pathology but rather a useful clinical construct associated with a range of adverse outcomes and the end result of a complex process of intertwined genetic and environmental effects. These findings help refine and define the fundamental basis of MDD.

Human pluripotent stem cell-derived cardiomyocytes show promise for clinical application, especially disease modeling and drug screening. To reach this potential, however, improved maturity and reduced cellular heterogeneity of in vitro differentiated cardiomyocytes are needed. Here, we addressed transcriptional heterogeneity in induced pluripotent stem cell (iPSC)-derived cardiomyocytes (iPSC-cardiomyocytes) by single-cell RNA sequencing. We successfully defined differentiation states at the single-cell level and clarified heterogeneities in the gene expression patterns of iPSC-cardiomyocytes, especially those cultured over long periods. We developed a novel maturation index to perform in silico sorting and score the maturation level of the cells. Additionally, we identified a new cell surface marker of mature iPSC-cardiomyocytes by which we could enrich well-matured iPSC-cardiomyocytes from a heterogeneous population of iPSC-cardiomyocytes. Our single-cell RNA sequencing approach could help clarify variation of in vitro differentiated cells and provide a new strategy to purify cells of interest.

Congenital diaphragmatic hernia (CDH), characterized by malformation of the diaphragm and hypoplasia of the lungs, is one of the most common and severe birth defects. This phenotype is associated with high mortality rates and long-term morbidity among survivors.
There is a growing body of evidence demonstrating that genetic factors contribute to many forms of CDH in humans, although the pathogenesis remains largely elusive. Small variants such as single nucleotide polymorphisms (SNPs) have been studied in recent whole exome sequencing (WES) efforts, but larger copy number variants (CNVs) have never been studied on a large scale at the genome level. In this study, to capture CNVs within CDH candidate regions in a cost-effective manner, we developed and tested a targeted array comparative genomic hybridization (aCGH) platform to identify CNVs within 140 candidate CNV regions across 196 patients with CDH and 987 normal control samples. As a result, we identified 6 significant CNVs from this patient-control study. Three of these 6 significant CNVs were found in multiple patients with CDH, but not in any controls, increasing the likelihood that they are disease-causing. These CDH-specific CNVs reveal high-priority candidate genes including HLX, LHX1, and HNF1B. The remaining 3 significant CNVs were found in multiple patients and at a low frequency in controls, but were significantly enriched in patients with CDH. The candidate genes within these significant CNVs form functional networks with other known CDH genes and play putative roles in both DNA binding/transcription regulation and embryonic organ development. Investment in functional studies will further our understanding of their contribution to the pathogenesis of CDH.

Background: The mouse model of Parkinson's disease (PD) using the neurotoxin 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP) provides an important and reliable tool to study the disease [1]. Herein, we used RNA sequencing (RNA-Seq) to evaluate the role played by alternative exon usage and splicing events in the MPTP mouse model of PD [2].
Materials and methods: MPTP and saline were administered to adult mice by intraperitoneal injection for the MPTP-treated and sham groups, respectively.
Whole striatum was collected from both MPTP-treated and sham groups at Day 4, and total RNA was extracted using TRIzol followed by phenol/chloroform. RNA-Seq libraries were prepared from total RNA using the ribosomal RNA depletion method. RNA-Seq was performed on the Illumina HiSeq X10 platform by sequencing individually constructed cDNA libraries with a configuration of 90 million 150 bp paired-end reads per sample. Reads were aligned to the full mouse genome (UCSC version mm10) and splice junctions were identified using RNA STAR version 2.5. Differential exon usage in the RNA-Seq data was detected using the R package DEXSeq. The significance level was set at a 5% false discovery rate. Functional analysis, including gene ontology (GO) enrichment and KEGG pathway analysis, was performed using DAVID bioinformatics resources 6.8. Selected exons and transcripts were validated using reverse transcription polymerase chain reaction.
Results: In total, 1048 exons in 873 genes showed significant differential usage with an absolute log2 (fold change) greater than or equal to 0.5, with 409 exons up-regulated and 639 exons down-regulated in the MPTP-treated group compared to the sham group. Among the genes with differential usage, 774 were protein-coding genes, and 99 were non-coding. KEGG pathway analysis using DAVID showed that these genes were mostly enriched in neurotransmission and axon guidance pathways, especially in the dopaminergic synapse pathway (Table 1 and Fig. 1), which had a 3.5-fold enrichment of genes. GO analysis in Table 2 revealed that the top three enriched biological processes were transport, nervous system development, and positive regulation of synapse assembly, which were in line with the KEGG pathway findings.
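As an illustration of the selection criteria described above (a 5% false discovery rate combined with an absolute log2 fold change of at least 0.5), the following Python sketch adjusts per-exon p-values and then filters by effect size. It is not the DEXSeq implementation: the use of the Benjamini-Hochberg procedure, the function names, and the exon identifiers are assumptions made only for this example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_exons(records, fdr=0.05, min_abs_log2fc=0.5):
    """Keep exons passing both the FDR and the |log2 fold change| cut-offs.

    `records` is a list of (exon_id, p_value, log2_fold_change) tuples;
    this layout is hypothetical and chosen only for the sketch.
    """
    padj = benjamini_hochberg([p for _, p, _ in records])
    selected = []
    for (exon_id, _, log2fc), q in zip(records, padj):
        if q < fdr and abs(log2fc) >= min_abs_log2fc:
            selected.append((exon_id, "up" if log2fc > 0 else "down"))
    return selected


# Made-up values for illustration; they are not data from the study.
example = [("E1", 0.001, 1.2), ("E2", 0.002, -0.7), ("E3", 0.02, 0.3)]
print(select_exons(example))
```

Applying the two thresholds together is what separates exons that are merely statistically detectable from those whose usage also changes by a meaningful amount.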
Conclusions: In the current study, our results revealed remarkable changes in exon usage in the striatum of the MPTP mouse model of PD, suggesting that alternative splicing could play a substantial role in synapse assembly, especially for dopaminergic synapses. The current study hence provides new insight into the importance of alternative splicing in PD.

Background
Gene expression profiles have been examined extensively in diseases including hematological malignancies. Although the diagnostic tests that subclassify leukemia have improved, leukemia patients occasionally exhibit different responses to treatment. In order to find more precise molecular markers, we performed differential gene expression profiling in human acute myeloid leukemia (AML) and chronic myeloid leukemia (CML) cell lines (HL60 and K562, respectively) using the NanoString nCounter® MAX Analysis System (NanoString Technologies, Seattle, WA).

Materials and methods
Total RNA was extracted from HL60 and K562 cell lines using the innu-PREP RNA mini kit (AJ Innuscreen GmbH, Germany) according to the manufacturer's instructions. The RNA quality was assessed on the 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA) and the concentration was determined with a Nanodrop spectrophotometer (ND-1000, Thermo Scientific, Wilmington, MA, USA). We performed gene expression profiling of 230 human cancer-related genes with six internal reference genes using the nCounter GX Human Cancer Reference Kit (NanoString Technologies). A total of 100 ng of RNA for each sample was prepared as per the manufacturer's instructions under the high sensitivity protocol. Normalization and subsequent data processing were performed using the nSolver™ Analysis Software v2.6 (NanoString Technologies). Differentially expressed mRNAs were identified by filtering on fold change (≥2.0) and p values (< 0.05).
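The normalization and filtering steps above can be sketched in Python as follows. This is a simplified illustration, not the nSolver procedure: it assumes that each sample is scaled by the geometric mean of its internal reference genes and that fold change is an expression ratio applied symmetrically (at least 2.0 up or at most 0.5 down). All function and gene names are hypothetical placeholders.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive counts."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize_counts(counts, reference_genes):
    """Scale one sample so its reference-gene geometric mean equals 1."""
    factor = geometric_mean([counts[g] for g in reference_genes])
    return {gene: value / factor for gene, value in counts.items()}


def fold_changes(sample_a, sample_b, reference_genes):
    """Ratio of normalized counts (a over b) for every non-reference gene."""
    a = normalize_counts(sample_a, reference_genes)
    b = normalize_counts(sample_b, reference_genes)
    return {g: a[g] / b[g] for g in a if g not in reference_genes and g in b}


def passes_filter(fold_change, p_value, min_fold=2.0, alpha=0.05):
    """Assumed symmetric rule: ratio >= min_fold or <= 1/min_fold, and p < alpha."""
    changed = fold_change >= min_fold or fold_change <= 1.0 / min_fold
    return changed and p_value < alpha


# Made-up counts for illustration only; not data from the study.
sample_a = {"REF1": 100, "REF2": 400, "GENE_X": 800, "GENE_Y": 50}
sample_b = {"REF1": 200, "REF2": 200, "GENE_X": 200, "GENE_Y": 100}
print(fold_changes(sample_a, sample_b, ["REF1", "REF2"]))
```

Scaling by reference genes before taking ratios keeps differences in RNA input between the two samples from being read as expression changes.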

Results
We identified distinctive gene expression patterns in K562 and HL60 cell lines. The most significantly upregulated genes in K562 cells included FGFR3, WT1, CCNA2, FGF2 and HSP90AB1, while CSF3R, BCL2A1, TNFSF10, AKT1 and GNAS were the most significantly downregulated. In HL60 cells, WT1, CCNA2, PRKAR1A, MYB and CHEK1 were the most significantly upregulated, while FOS, AKT1, GNAS, TP53 and IL1B were the most significantly downregulated genes. Several genes that were upregulated in K562 were found to be downregulated in HL60, such as FGF2, GATA1, IL6 and PIM1. FGFR1 and SPI1 were significantly upregulated in HL60 but significantly downregulated in the K562 cell line.

Conclusions
In conclusion, our results suggest that gene expression profiling identified FGF2, GATA1, IL6, PIM1, FGFR1 and SPI1 as differentially expressed between AML and CML cells. These findings may also help to assess future markers in developing therapies targeting mRNA.

Objectives: To determine the effect of SFRP1 protein on the progression of prostate cancer xenografts established with cells expressing the TMPRSS2-ERG fusion.
Methods: Cell culture. LNCaP and VCaP cell lines from ATCC were grown in RPMI 1640 and DMEM medium. Confluent cells were trypsinized and 3.5 × 10⁵ cells were inoculated into BALB/c nu/nu mice 4-6 weeks of age. Xenograft growth was recorded by measuring volume with the following formula: V = π/6 × (large diameter × [short diameter]²). Blood was taken from the tail of the mice to determine prostate-specific antigen (PSA) levels during the experiment. PSA was determined by ELISA with an Abnova kit designed for measuring human PSA. When the xenograft volume was around 300 mm³, castration surgery was performed. One week after castration, mice were randomized into two groups: 1) control group (vehicle) and 2) treatment group (SFRP1 administration). SFRP1 protein or vehicle was administered subcutaneously around the tumor tissue once per week during the experiment. At the end of the experiment, mice were sacrificed and tumor tissue was collected into two tubes for RNA and protein isolation for expression assays and one container with formalin for immunohistochemistry assays.
Results: SFRP1 protein administration in xenografts with cells expressing the TMPRSS2-ERG fusion promoted growth of tumor tissue compared to vehicle. Furthermore, PSA levels in blood increased with SFRP1 administration. Interestingly, we observed that SFRP1 administration promoted expression of the TMPRSS2-ERG, KLK3 and LEF-1 genes.

Genetic analysis of whole exome sequencing undertaken in familial IMD cases revealed a novel missense mutation in the BPIFA1 gene.
The encoded protein is specifically expressed in the human nasopharyngeal mucosa as part of the host innate immune defence against infectious agents in the airway. Previous studies have shown that BPIFA1 has a role in host defence against other Gram-negative bacteria and viruses, possibly through inhibition of biofilm formation or of the import of viral particles into epithelial cells. The objective of this study was to characterise the role of BPIFA1 in the context of meningococcal infection and to investigate whether the novel missense mutation leads to increased susceptibility to IMD in carriers.

Method
We tested the recombinant BPIFA1 activity on Nm biofilm formation as well as bacterial adhesion and invasion of airway epithelial cells. We used transfection in cultured cells, with recombinant expression plasmids encoding WT or mutant BPIFA1 to assess the functional effect at protein and mRNA levels.

Results
We showed that recombinant BPIFA1 significantly inhibits Nm biofilm formation. Using in vitro human colonisation models, we demonstrated that purified BPIFA1 is protective against Nm infection by deterring bacterial adherence to and invasion of epithelial cells. In addition, the mutant protein showed impaired anti-biofilm activity and reduced protection against adhesion to and invasion of epithelial cells.

Conclusion
Our results demonstrated that BPIFA1 plays an important host immune defence role against Nm infection. The identified missense mutation may have the potential to predispose carriers to invasive pathogenic infections such as IMD.

The cohesin complex plays a key role in modulating higher-order chromosome structure, thereby mediating sister chromatid cohesion and controlling developmental gene expression. Stable binding of cohesin to chromosomes requires acetylation of the cohesin subunit Smc3, and in vertebrates, two paralogues, Esco1 and Esco2, are responsible for this reaction. It is widely believed that the cohesin complex engaged in sister chromatid cohesion is acetylated in a DNA replication-coupled manner. Previous studies suggest that Esco2 is expressed specifically during S phase and shows physical interaction with the replication machinery, implying a primary role in the formation of cohesion. Here we provide the first evidence that Esco2 acetylates Smc3 in heterochromatic regions in S phase cells, by quantitative ChIP-seq analysis utilizing a spike-in control. Conventional ChIP-seq analysis of acetylated Smc3 identified very few binding sites in the HeLa cell genome, in sharp contrast to the binding profile of Esco1. To understand the genome regions where Esco2 functions during the progression of DNA replication, we compared quantitative ChIP-seq profiles of acetylated Smc3 between control and Esco2 knock-down cells.
Optimization of binning size enabled us to detect large-scale genome regions (a few Mb) where Esco2-dependent Smc3 acetylation occurred. These regions coincided with late-replicating, or heterochromatic, regions, and Esco2 was bound evenly to the same regions in early-to-mid, but not in late, S phase cells. Interestingly, the MCM replicative helicase also showed very similar binding dynamics in S phase cells, implying that Esco2 binding to heterochromatin depends on the pre-replication complex. Taken together, our results suggest that Esco2 fulfills replication-coupled cohesin acetylation primarily in heterochromatic regions, whereas Esco1 is rather enriched in transcriptionally active euchromatic regions (Minamino et al., 2015).

A105
Genetic analysis of the cause of semaphorin 5A downregulation in human glioblastomas and exploration of its tumor suppressor function for brain cancer treatment

RESULTS
While neither mutation nor significant change in mRNA levels of Sema5A was observed across different grades of astrocytoma specimens, a number of alternatively spliced variants of the Sema5A transcript were found to be expressed at levels comparable to the wild-type counterpart. Ectopic expression of these frameshift splice variants in HEK293 cells produced prematurely truncated isoforms of Sema5A protein, which are unstable and readily targeted for proteasomal degradation. In silico analysis of the Sema5A gene predicts it to be a target of several miRNAs that are strongly upregulated in glioblastomas. Forced expression of these candidate miRNAs in glioblastoma cells significantly reduces the Sema5A protein level, which can be restored by treating the cells with the corresponding anti-miR inhibitors. These findings suggest that both alternative splicing and translational repression by miRNA lead to a decline in Sema5A protein expression in glioblastomas. Importantly, both reconstitution of expression and exogenous supply of Sema5A are sufficient to suppress invasiveness and anchorage-independent growth of glioblastoma cells. To explore the therapeutic potential of Sema5A against glioblastomas, domains that exhibit highly potent anti-tumorigenic function have been identified.

CONCLUSIONS
Sema5A is downregulated in glioblastomas by aberrant splicing and miRNA-mediated gene silencing, which causes a loss of its tumor suppressor function. Identification of the anti-tumorigenic domains of Sema5A in this study allows further exploration of its therapeutic potential for brain cancer treatment.

Single-cell RNA-seq is a method to quantify expression profiles at single-cell resolution, and it is widely used in various biomedical research projects. Single-cell transcriptome profiles obtained in these projects have been published in public repositories by the original data producers.
These profiles are also useful for investigators who want to address their own questions that were not targeted in the original project. However, there are several hurdles to reusing the published single-cell resources. One issue is that these data tend to lack some essential metadata, for example, sample treatment methods and sequencing specifications. Another issue is that these datasets are processed by different computational methods, which makes it hard to compare and integrate datasets obtained by different groups.

To solve these issues, we reprocessed public single-cell RNA-seq data and constructed a new public database named "SCPortalen" (http://single-cell.clst.riken.jp/) to provide the reprocessed single-cell resources [1]. In our reprocessing, we manually curated the original metadata and filled in missing items by reading the original papers and sometimes contacting the authors directly. In this curation process, we also annotated single-cell samples with sample ontology terms (cell ontology, Uberon, and so on) to improve searchability and comparability. We then applied a uniform quality assessment procedure to all single-cell datasets in order to provide mutually comparable quality metrics. After that, we reprocessed all single-cell transcriptome data using a uniform preprocessing method including mapping to the latest reference genome, quantification of expression levels, and standard analyses such as PCA and t-SNE clustering. All curated metadata and reprocessing results are available in the SCPortalen database. Users can browse them in a web browser and download the files.
In the big-data era, how to use large-scale data, including data published by other research groups, will become an increasingly important strategy in research projects. We hope that our approach can serve as a guide for reusing published data for new science and that the database can contribute to many research projects.

Bulk analyses using high-throughput genomic and proteomic technologies have produced invaluable publicly available data on ageing. However, these data are limited in terms of accurately defining cellular states, development and disease state. Single-cell omics, on the other hand, helps overcome these obstacles and promises greater understanding of cellular processes. Consequently, we are currently generating a resource dataset for ageing studies with a focus on single-cell RNA-seq. We present here our initial results produced from various tissues taken at different ages of the mouse. One aspect of our analysis will be to determine whether we observe dynamic evolution of cellular heterogeneity throughout ageing, which may reveal key regulatory pathways that are gradually deregulated during ageing. We will also present our plan for obtaining single-cell ATAC-seq and multi-omic datasets (DNA methylation, CAGE, translatome (ribosome profiling), proteomics and metabolomics) for the selected cell types of interest, and for integration analysis. Additionally, a unique aspect of our dataset is the utilization of both germ-free and SPF mouse models, which will enable us to introduce and assess the effects of various environmental factors on the ageing process. Taken together, these data will help us better understand the overall genomic and epigenomic landscape changes taking place in the cell during ageing, and we hope that they will serve as invaluable resources for the scientific community.

A110
Background
ACC is an aggressive, rare malignancy with poor prognosis and limited treatment options. In 2015, a patient presented at our clinic with ACC and dual lung metastases. We performed whole genome sequencing (WGS) plus targeted sequencing of both metastases. Furthermore, we developed a liquid biopsy assay via ctDNA, which we have applied quarterly and tracked with droplet digital PCR. We hypothesised that a genomics-based strategy may identify optimal treatment options for this patient, and that serial liquid biopsy may detect relapse earlier than imaging-based approaches.
Materials and Methods
WGS was performed by HiSeq X of germline (~30X) and 2 metastases (~90X), followed by a custom Roche/NimbleGen panel designed to target ~400 cancer-associated genes (~0.9 Mb) and applied to germline (~900X), metastases (~570X) and ctDNA (~300X), which had been subjected to library preparation using KAPA or Rubicon protocols prior to sequencing on a HiSeq 2500. Droplet digital PCR analysis against variants of interest was performed using Bio-Rad's QX200 system. Bioinformatics followed GATK best practices for the identification of germline variants, with Strelka, Manta, Sequenza, and MANTIS analysis in the tumour (WGS), or Connor, Strelka and CNVkit for targeted sequencing.
Results
No germline cancer predisposition variants were detected. Targeted capture revealed 1306 variants within the metastases, of which 362 were shared. WGS of both metastases revealed hyperdiploid cancers with extensive loss of heterozygosity but few structural rearrangements. Loss-of-function mutations were identified in cancer-associated genes including TP53 and PTEN, and a splice-site mutation was observed in MSH2, resulting in tumour mismatch repair deficiency and consequent microsatellite instability. The quarterly ctDNA mutation profiles are quiet, as confirmed by droplet digital PCR, which is consistent with the patient being in remission.
Discussion
We present an extensive WGS/capture-sequencing analysis of an ACC patient with lung metastases. Encouragingly, a critical splice-site mutation detected in MSH2 appears to be a promising therapeutic target through immunotherapy, and we anticipate that ctDNA will sensitively reveal if the patient relapses.

Drug response profiling is a powerful technique for drug mode of action analysis and repositioning, which allows for the characterization of cellular drug responses at a molecular level. However, the existing methods have been hampered by their low resolution, at both genomic and cellular scales, and their inability to account for the inherent cellular heterogeneity. We now have an opportunity to overcome these limitations with the recent advances in single-cell high-throughput transcriptomics approaches, as we can now resolve minute differences in cellular make-up and internal states by measuring the expression levels of both existing and novel promoters, enhancers, and lncRNAs in an unbiased manner. Taken together, we can investigate the effects of population heterogeneity on the response to drugs and perform a more detailed mode of action analysis. In this study, we measure the drug response in 3 different cell types (fibroblasts, MCF7, and HepG2) to 2 types of histone deacetylase inhibitors (TSA and VPA) and examine how differential responses are produced. We employ the newly developed C1 CAGE method, which can robustly detect the expression of individual promoters and enhancers at a genome-wide scale in a single experiment.

Crohn's disease (CD) is a multi-factorial inflammatory bowel disease resulting from defects in mucosal immunity and intestinal epithelial barrier function. Genetic predisposition is important in susceptibility to CD. CARD15/NOD2 was the first CD susceptibility gene identified.
It has a protective effect in healthy individuals by recognizing lipopolysaccharide components of the cell wall of the pathogenic bacteria responsible for CD. In the present study we investigated 9 polymorphic SNPs in the CARD15/NOD2 gene in 64 Kuwaiti CD patients and 36 apparently normal controls of matched age and sex. The distribution of individual alleles between the tested groups was compared using the chi-square test. Only one SNP (IVS8+158) showed a statistically significant lower representation (P value < 0.0009) in the patient group, suggesting that this SNP is protective in our population. To the best of our knowledge, this is the first study to investigate the frequency and nature of CARD15/NOD2 mutations in Kuwaiti patients with CD. Further studies adding more SNPs are recommended to clarify the genetic nature of this devastating disorder in Kuwait.
Key Words: Crohn's disease, CARD15/NOD2 gene, single nucleotide polymorphism, Kuwait

Background
African-descendant breast cancer patients have exhibited a lower sensitivity to tamoxifen than European-ancestry patients. Recently, we reported an influence of interethnic DNA methylation differences on drug absorption, distribution, metabolism and excretion (ADME), suggesting that ethnicity should be carefully accounted for in pharmacoepigenomics [1]. This study aims to establish a mechanism for the differential sensitivity to tamoxifen in breast cancer patients from different ancestry populations through an interethnic difference in DNA methylation in ADME genes.

Materials and methods
We analyzed the whole DNA methylome and RNA transcriptome data of primary tumor tissues of 84 African-descendant and 508 European-descendant breast cancer patients from The Cancer Genome Atlas. A differential methylation analysis was performed to identify ethnicity-associated CpGs (E-CpGs) and a differential expression analysis was performed to identify differentially expressed genes. We constructed Bayesian networks using the PC algorithm and a Tabu-search-based algorithm to examine a causal effect of the identified E-CpGs on differentially expressed genes related to drug ADME.

Results
We identified 35 E-CpGs located in or close to 24 ADME genes, where E-CpG cg05834639 on SLC7A5 exhibited an interesting pattern: cg05834639 had a significantly lower methylation level in the African-descendant patients than in the European-descendant patients (p < 0.021). The methylation level of cg05834639 was negatively associated with the gene expression of SLC7A5 (correlation < -0.501); thus, the expression level of SLC7A5 was significantly higher in the African-descendant patients than in the European-descendant patients (p < 5.25 × 10⁻⁵). In the Bayesian network analysis, the results showed a stable directed link from E-CpG cg05834639 to SLC7A5 expression: a reproducibility rate of 66.8% using the PC algorithm and a reproducibility rate of 99.8% using the Tabu-search-based algorithm.
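The abstract does not define how the reproducibility rate was computed. One plausible reading, assumed here purely for illustration, is the fraction of repeated network fits (for example, fits on resampled data) in which the directed link from the CpG to the gene's expression appears. The Python sketch below counts that fraction; the function name, the node labels, and the representation of a network as a set of (parent, child) pairs are all hypothetical.

```python
def edge_reproducibility(networks, edge):
    """Fraction of learned networks that contain the directed `edge`.

    `networks` is a list of edge sets, one per repeated structure-learning
    run; each edge is a (parent, child) tuple. This layout is an assumption
    made for the sketch, not the authors' data format.
    """
    if not networks:
        raise ValueError("at least one learned network is required")
    hits = sum(1 for edges in networks if edge in edges)
    return hits / len(networks)


# Hypothetical runs: the link appears in two of four learned networks.
runs = [
    {("cg05834639", "SLC7A5_expr")},
    {("SLC7A5_expr", "cg05834639")},  # reversed direction does not count
    {("cg05834639", "SLC7A5_expr"), ("ancestry", "cg05834639")},
    set(),
]
print(edge_reproducibility(runs, ("cg05834639", "SLC7A5_expr")))
```

Because the tuples are ordered, a link recovered in the opposite direction is treated as a miss, which matches the abstract's emphasis on a directed relationship.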

Conclusions
Our results showed that cg05834639 on the ADME gene SLC7A5 was an E-CpG and directly regulated SLC7A5 expression. It was reported that higher SLC7A5 expression caused a poorer response to tamoxifen therapy in breast cancer patients [2]. Based on this evidence, this study has established a hypothetical mechanism for the ethnic disparity in response to tamoxifen therapy in breast cancer patients through an interethnic difference in DNA methylation in the ADME gene SLC7A5.
Childhood-Onset Psychosis (COP) is a very rare and debilitating disorder characterised by the onset of psychotic symptoms before age 14. Here we report early findings from a study exploring the potential of exome sequencing as a diagnostic tool in child psychiatry clinics. We hypothesise that some occurrences of COP are the result of known Mendelian disorders manifesting primarily with psychiatric symptoms.
In this study, we will extract saliva DNA from 100 trios, run pedigree analysis, examine rare schizophrenia-associated copy number variants (CNVs) and perform exome sequencing to identify rare (MAF<1%) inherited or de novo SNVs or INDELs. To date we have recruited 27 families (12 male and 15 female probands) and documented 3-generation pedigrees per case. All probands have a primary diagnosis of a psychotic illness, with 12/27 having a comorbid neurodevelopmental disorder. According to family report, 9/27 probands have a family history of a psychiatric diagnosis. Cognitive assessment (WASI-II) of the first 17 probands shows variation in their full-scale intelligence quotient (FSIQ), whereby 6/17 scored extremely low (FSIQ<70), 7/17 scored borderline to low average (FSIQ 70-90) and 4/17 scored average (FSIQ 90-110). In addition, 8/27 probands are on clozapine, highlighting their disease severity. Genetic analysis of the first 7 probands indicates that none have common schizophrenia-associated CNVs; however, exome sequence analysis revealed one proband to be harbouring a very rare pathogenic variant in the gene RET, giving rise to pheochromocytoma, a phenotype associated with the Mendelian disease multiple endocrine neoplasia type 2A, which can manifest with psychosis. The pathogenicity of the RET variant in causing psychosis in the proband will be investigated with further clinical assessments. The identification of rare, highly penetrant variants that predispose patients to psychosis suggests that exome sequencing may be useful in child psychiatric clinics as a diagnostic tool, as well as providing a platform for understanding the neurobiology underlying COP.

A118
Microarray molecular profiling to define pathogenesis and biomarkers of atherosclerosis
Maria S. Nazarenko 1,2,3, Aleksei A. Sleptcov 1, Anton V.
We hypothesize that microarray molecular profiling of vascular and blood cells of patients with severe atherosclerosis will be important to clarify the disease mechanisms and biomarkers.

Materials and methods
The Agilent SurePrint G3 Human CGH+SNP 2×400K and Illumina HumanMethylation27 BeadChip microarrays were used for DNA testing from advanced atherosclerotic plaques of carotid and coronary arteries, and healthy internal thoracic arteries. The results for certain loci were verified by quantitative real time PCR and bisulfite pyrosequencing with blood and vascular tissues. The study was approved by Research Institute of Medical Genetics' Ethics Board, approval number 0145.

Results
In total, we identified 90 high-confidence copy number variations (CNVs) in arteries. We found that 16% of CNV regions were enriched with genes previously associated mainly with CAD risk factors but not with the disease as a whole. The identified CNV genes were associated with immune/inflammation responses, olfactory transduction and metabolic pathways. The gain in chromosomal region 10q24.31 (ERLIN1) was not listed in the Database of Genomic Variants. Furthermore, two patients carried the gain in 10q24.31 (ERLIN1) that affected only the blood DNA. Overall, the disease-related DNA methylation effect size was relatively modest. Only a minority of differentially methylated genes have previously been linked to atherosclerosis-related diseases in gene-based association studies. The advanced atherosclerotic plaques, in comparison with the healthy arteries, were characterized by predominant DNA hypermethylation changes. These genes were annotated with muscle system process and positive regulation of cytosolic calcium ion concentration in Gene Ontology terms. In contrast, hypomethylated genes encode molecules belonging to different biological processes such as development, immune/inflammation responses, lipid storage, and programmed cell death. In advanced atherosclerotic plaques the most pronounced hypomethylation was registered in 2q31.1 (HOXD4/HOXD3/MIR10B). Moreover, methylation changes at this locus in blood cells were consistently associated with smoking and ischemic stroke.

Conclusions
Analysis of DNA extracted from the blood of patients with severe atherosclerosis indicated a possible somatic origin for CNVs. The vast majority of CpG sites were hypermethylated in the atherosclerotic plaques and connected with genes involved in smooth muscle cell function. Functional enrichment analysis of CNVs and differentially methylated genes demonstrated that the common term was related to immune/inflammation responses. Our data support the hypothesis that CNVs and DNA methylation could be new biomarkers to uncover susceptibility loci for future atherosclerosis association studies with larger cohorts, paired tissues and more sensitive single-cell methods.

Background
Macrophages and smooth muscle cells (SMCs) play a significant role in atherosclerosis. Genomic alterations arising in cells during the atherosclerotic process can be important for plaque instability. Our purpose was to assess genomic alterations in cells obtained from atherosclerotic plaques using a single-cell microarray technique.

Materials and methods
Laser capture microdissection of eight fresh frozen autopsies of coronary arteries with atherosclerosis was performed to obtain macrophages (n = 30-40 cells per sample; Anti-Human CD68 Antibody) and smooth muscle cells (Alpha-SMA Antibody, 1A4). Whole genome amplification (GenomePlex® WGA Kit) of these cells along with white blood cells from the same persons was performed with subsequent array-CGH (SurePrint G3 Human CGH 8x60K Microarray). The study was approved by Research Institute of Medical Genetics' Ethics Board, approval number 0159.

Results
We found several aneuploidies in macrophages (9 trisomies and 3 monosomies) and SMCs (1 trisomy and 8 monosomies). In two patients, SMCs had a normal karyotype. The ratio of duplications and deletions in macrophages and SMCs was 1:0.8 and 1:7, respectively. Comparative analysis of genomic rearrangements in macrophages demonstrated that duplication in chromosome region 9q34.13-q34.2, about 2Mb in size, was detected in macrophages in 6 patients.

Conclusions
Our study indicates that genomic alterations are diverse in their structure and are widely represented in cells involved in the atherosclerotic process. In macrophages obtained from atherosclerotic plaques, aneuploidy with a predominance of trisomy was found, while monosomies prevailed in SMCs. A significant prevalence of deletions in SMCs was found. The duplication in macrophages in chromosomal region 9q34.13-q34.2 contains genes that can potentially be associated with atherosclerosis.

Parkinson's disease (PD) is an age-related, chronic and progressive neurodegenerative disorder characterized by a loss of dopaminergic neurons, resulting in both non-motor and motor symptoms. While several genetic and environmental contributory risk factors have been identified, more exact methods for diagnosing and assessing prognosis of PD have yet to be discovered. Identifying altered expression of genes (both coding and non-coding) and pathways by analyzing transcriptomes of early-stage PD patients reveals potential novel biomarkers for PD diagnosis.

Methods
Changes in expression at the transcriptome level were assessed using no-amplification non-tagging cap analysis of gene expression (nAnT-iCAGE) over two years, starting with 22 PD patients and 10 healthy controls in the first year and with 17 PD patients and 10 healthy controls in the second year. Sequencing data was mapped to the hg38 human annotation, CAGE tags were clustered and differential expression analyses were performed with edgeR and DESeq2. Neural stem cell (NSC) and dopaminergic (DA) neuron pairs derived from iPS were studied in a similar manner. Gene ontology, gene set enrichment and network analyses were also performed.

Results
Blood transcriptome changes between PD patients and healthy controls are minimal, though a number of differentially expressed transcripts can be detected in early stage PD, including a number of non-coding RNAs. Differential expression was more readily apparent in iPS-derived NSC and DA neurons, though key differences appear between the different cell types. Some differentially regulated transcripts have known links to PD or other neurodegenerative disorders, though many are novel in nature, and may constitute putative biomarkers.

Conclusions
Our data suggests that though the blood transcriptome is largely similar between disease and healthy states, there is still benefit to thorough analyses for the study of neurodegenerative disease. We identified a number of potential candidate biomarkers that are now being validated.

Ethics Approval
The study was approved by the ethics evaluation committee of Juntendo University (Approval Number: 15-104) and the ethics review committee of RIKEN (H26-27).

Recent insights into the higher-order chromatin organisation in the human nucleus [1] allowed us to propose 3D-GNOME (3-Dimensional GeNOme Modeling Engine), a complete computational pipeline for 3D simulation using Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) data [2]. 3D-GNOME consists of three integrated components: a graph-distance-based heatmap normalization tool, a 3D modeling platform, and an interactive 3D visualization tool. Using ChIA-PET and Hi-C data derived from human B-lymphocytes, we demonstrate the effectiveness of 3D-GNOME in building 3D genome models at multiple levels, including the entire genome, individual chromosomes, and specific segments at megabase (Mb) and kilobase (kb) resolutions, as single average and ensemble structures. Further incorporation of CTCF-motif orientation and high-resolution looping patterns in 3D simulation provided additional support for the biological plausibility of the resulting topological structures. Additionally, we developed the 3D-GNOME web service [3], which generates 3D structures from 3C data and provides tools to visually inspect and annotate the resulting structures, in addition to a variety of statistical plots and heatmaps that characterize the selected genomic region. 3D-GNOME simulates the structure and provides a convenient user interface for further analysis. Alternatively, a user may generate structures using published ChIA-PET data for the GM12878 cell line by simply specifying a genomic region of interest. 3D-GNOME is freely available at http://3dgnome.cent.uw.edu.pl/, providing unique insights into the topological mechanisms of human variation and disease.
Finally, we present a not-yet-published catalogue of topological variability of human chromatin domains as observed in the 2,504 genomes sequenced in the 1000 Genomes Project [4]. We use a computational method that models the three-dimensional structure of chromatin on multiple scales, down to the high-resolution scale of individual chromatin loops. We propose the first approach to modelling conformational changes induced by structural variants at the level of chromatin loops. We show that structural variants, which strongly influence gene transcription levels, localize preferentially at genomic sites brought together by RNA polymerase II to create loops, and we show evidence that they have a significant influence on CTCF-mediated chromatin loops. The results are validated using Hi-C datasets for 1000 Genomes Project trios from the Bing Ren laboratory. Our unpublished results assess the impact of structural variants on the organization of chromatin topological domains and shed light on the mechanisms regulating gene transcription, among which the local arrangement of chromatin loops seems to be the leading one.
Specifically, we analyzed a 6.5-week-old human fetal heart, studying 4,000 dissociated single cells using the 10x Chromium technology. Principal component analysis of the single-cell gene expression clearly identified three clusters, which known marker genes assigned to outflow tract, atria and ventricles. Dimensionality reduction of the same information instead identified 17 clusters, revealing a high heterogeneity of the outflow tract. Cell fate trajectory inference identified three clear branches of cell differentiation. Progenitor cells and cardiomyocytes were ordered early and late in pseudotime, respectively, as expected, while the third branch was localized early in pseudotime and contained cells characterized by stemness. We confirmed and extended these results using Spatial Transcriptomics, a technology that visualizes gene expression information in tissue sections at 100 μm resolution. We studied the spatial gene expression of a second 6.5-week-old human heart, and of two additional human fetal hearts at 5 and 9 weeks of development. We identified a clear gene expression pattern determined by the heart structure, in which the outflow tract had a distinctive gene expression signature compared to atria and ventricles at all three time points. To conclude, the combination of Spatial Transcriptomics and single-cell sequencing demonstrates functional clusters related to the anatomical architecture of the heart. Moreover, our combined approach provides the spatial location of cells with stem cell signatures and complete gene profiles of progenitors committed to the cardiomyocyte lineage. Altogether, these findings offer new insights into the fields of human embryology and cardiovascular medicine, opening up questions for future studies in the field of myocardial regeneration.
It has been strongly suggested that chromatin conformations in nuclei play an important role in regulating gene expression, and many reports have demonstrated that disruption of these three-dimensional (3D) structures leads to disease states. Recent advances in high-throughput sequencing have made it possible to display 3D proximities among genomic loci as a chromatin contact matrix using the Hi-C analysis method. However, current Hi-C protocols require large-scale computational resources to produce high-resolution contact maps, and the repeatability of those contact matrices is still uncertain in current practical procedures. These drawbacks prevent frequent use of Hi-C analysis. To obtain repeatable matrices of chromatin conformations without heavy computation, we developed a new algorithm that predicts chromatin 3D structures using a deep neural network model with feature vectors based on genomic sequences, map positions on the reference genome sequence, and epigenetic data from chromatin immunoprecipitation sequencing (ChIP-Seq). To evaluate our algorithm, we established a model to predict the chromatin 3D structure of the nucleus in the human lymphoblastoid cell line GM12878, and assessed its ability to predict the existence of chromatin looping conformations. Map positions on the human reference sequence (UCSC hg19) and DNA sequences were obtained from prior reports (Rozowsky et al., 2012), and ChIP-Seq data associated with epigenetic status were obtained from the ENCODE datasets. These alignment data were vectorized and normalized before being input into our models. For supervised learning of the neural network model, we prepared combinations of genomic loci tagged as chromatin-loop-positive (N = 8526) using the HiCCUPS protocol as previously reported (Rao et al., 2014), and also prepared randomly selected combinations of genomic loci, other than the positively tagged ones, as chromatin-loop-negative (N = 131365).
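The construction of the chromatin-loop-negative set described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `sample_negative_pairs`, the representation of loci as integer bin indices, and the use of rejection sampling are all assumptions made for illustration.

```python
import random


def sample_negative_pairs(positive_pairs, loci, n_negative, seed=0):
    """Draw random locus pairs that are not tagged as loop-positive.

    Hypothetical helper: loci are assumed to be comparable identifiers
    (for example integer bin indices), and a pair is unordered.
    """
    rng = random.Random(seed)
    loci = sorted(set(loci))
    locus_set = set(loci)
    # Store every pair in a canonical (smaller, larger) order.
    positives = {tuple(sorted(pair)) for pair in positive_pairs}

    # Count how many candidate pairs remain once positives are excluded.
    blocked = sum(
        1 for a, b in positives
        if a != b and a in locus_set and b in locus_set
    )
    available = len(loci) * (len(loci) - 1) // 2 - blocked
    if n_negative > available:
        raise ValueError("not enough non-positive pairs to sample from")

    # Rejection sampling: redraw whenever a positive or duplicate pair appears.
    negatives = set()
    while len(negatives) < n_negative:
        a, b = rng.sample(loci, 2)
        pair = (a, b) if a < b else (b, a)
        if pair not in positives:
            negatives.add(pair)
    return sorted(negatives)
```

Rejection sampling is a reasonable choice when, as in the abstract, positive pairs are a small fraction of all possible pairs, because almost every draw is accepted.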
These data were randomly split into training, validation, and test data sets. We configured the binary classification model with the RMSprop optimizer and the binary cross-entropy loss function. Other hyper-parameters of the model were optimized using Bayesian optimization. Model training was continued until the validation loss no longer improved. We plotted ROC curves and measured the area under the ROC curve (AUC) to quantify prediction performance. The model we selected as the best-performing one showed high accuracy (96.25%), sensitivity (99.35%), and specificity (95.22%). The AUC of the model was 0.997 against a test data set. These results suggest that our algorithm for predicting chromatin conformation using a neural network is an effective method for analyzing chromatin 3D structures as a substitute for Hi-C analysis.

Model performance is measured by accuracy, sensitivity, specificity, and area under the receiver-operating characteristic curve (AUC) on the whole test data set. The training and prediction of each model were performed as follows: model A with only mapping positions on the reference genome as input data, model B with only genomic sequences, model C with only epigenomic data, and model D with mapping positions, genomic sequences, and epigenomic data. A cut-off value of 0.5 was used to determine the classification results. Representative results of each model are displayed.

The ROC curve of the best-performing model against the whole test data set is displayed as a red solid line. The blue dotted line indicates random guessing.
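The evaluation measures named above (accuracy, sensitivity, specificity at a 0.5 cut-off, and AUC) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, scores at or above the cut-off are assumed to count as loop-positive, and the AUC is computed with the pairwise rank (Mann-Whitney) formulation rather than by integrating a plotted curve.

```python
def classification_metrics(labels, scores, cutoff=0.5):
    """Accuracy, sensitivity and specificity at a fixed score cut-off.

    Assumes labels are 0/1, both classes are present, and a score
    greater than or equal to the cut-off is called positive.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= cutoff else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties count as one half. This pairwise loop is quadratic in the
    number of samples, which is fine for a small illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up labels and scores (not data from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_metrics = classification_metrics(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Because the negative class is much larger than the positive class here, sensitivity and specificity are more informative than accuracy alone, which is presumably why all three are reported alongside the AUC.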