Identification of compound heterozygous variants in the noncoding RNU4ATAC gene in a Chinese family with two successive foetuses with severe microcephaly

Background Over the last few years, whole-exome sequencing (WES) has been increasingly employed for clinical diagnosis. However, one caveat with its use is that it inevitably fails to detect disease-causative variants that occur within noncoding RNA genes. Our experience in identifying pathogenic variants in the noncoding RNU4ATAC gene, in a Chinese family where two successive foetuses had been affected by severe microcephaly, is a case in point. These foetuses exhibited remarkably similar phenotypes in terms of their microcephaly and brain abnormalities; however, the paucity of other characteristic phenotypic features had made a precise diagnosis impossible. Given that no external causative factors had been reported/identified during the pregnancies, we sought a genetic cause for the phenotype in the proband, the second affected foetus.
Results A search for chromosomal abnormalities and pathogenic copy number variants proved negative. WES was also negative. These initial failures prompted us to consider the potential role of RNU4ATAC, a noncoding gene implicated in microcephalic osteodysplastic primordial dwarfism type-1 (MOPD1), a severe autosomal recessive disease characterised by dwarfism, severe microcephaly and neurological abnormalities. Subsequent targeted sequencing of RNU4ATAC resulted in the identification of compound heterozygous variants, one being the most frequently reported MOPD1-causative mutation (51G>A), whereas the other was a novel 29T>C variant. Four distinct lines of evidence (allele frequency in normal populations, evolutionary conservation of the affected nucleotide, occurrence within a known mutational hotspot for MOPD1-causative variants and predicted effect on RNA secondary structure) allowed us to conclude that 29T>C is a new causative variant for MOPD1.
Conclusions Our findings highlight the limitations of WES in failing to detect variants within noncoding RNA genes and provide support for a role for whole-genome sequencing as a first-tier genetic test in paediatric medicine. Additionally, the identification of a novel RNU4ATAC variant within the mutational hotspot for MOPD1-causative variants further strengthens the critical role of the 5′ stem-loop structure of U4atac in health and disease. Finally, this analysis enabled us to provide prenatal diagnosis and genetic counselling for the mother’s third pregnancy, the first report of its kind in the context of inherited RNU4ATAC variants.


Background
Microcephaly is usually defined in terms of a head circumference more than two standard deviations below the mean for age and sex; it can occur in the womb or may develop during the first few years of life [1,2]. Abnormal growth of the head may occur as a consequence of a number of factors, both genetic and environmental (e.g. exposure to certain viruses such as rubella, drugs and alcohol during pregnancy) [3]. The genetic causes are highly heterogeneous; thus, a search for microcephaly in the Human Phenotype Ontology database [4] yielded 652 genes. Depending on the precise nature of the condition involved, microcephaly may be associated with seizures, developmental delay, intellectual disability or other problems. It may even be associated with substantial physical disability and premature death; there is no treatment for microcephaly. Therefore, it is extremely important to identify the genetic causes of severe microcephaly in affected families with a view to providing prenatal diagnosis and genetic counselling in subsequent pregnancies.
With the decreased cost of next-generation sequencing, whole-exome sequencing (WES) has rapidly evolved from its original application as a tool for gene discovery in research settings to an important diagnostic tool in a clinical context [5][6][7], especially for diseases that are characterised by a significant level of genetic heterogeneity [8]. However, one caveat with WES is that disease-causative variants which occur within noncoding RNA genes will invariably be missed [9,10]. Here, we highlight this issue by describing our experience of identifying novel compound heterozygous variants in the noncoding RNU4ATAC gene (OMIM #601428), in a Chinese family with two successive foetuses affected by severe microcephaly.

Family description
A 30-year-old woman was referred to our centre at the First Affiliated Hospital of Sun Yat-Sen University after her second foetus (II:2) had been found to have severe microcephaly at 24 gestational weeks (GW), just as her first one (II:1; Fig. 1a) had previously. Clinical findings in the two affected foetuses, who were terminated at 36 GW (II:1) and 30 GW (II:2) respectively, are illustrated in Fig. 2 and summarised in Table 1. However, no precise diagnosis of the underlying abnormality could be made based upon these clinical findings owing to the paucity of characteristic features beyond severe microcephaly. In the case of II:1, standard G-banding karyotyping using cord blood cells taken at 35 GW revealed a normal karyotype, whereas chromosomal microarray analysis failed to detect any pathogenic copy number variations; no further analyses were performed at the time.
The parents were of North Chinese origin, healthy and nonconsanguineous. Exposure to known causative environmental factors during pregnancy was neither reported nor identified. Taken together with the remarkably similar clinical phenotypes in the two affected foetuses (Fig. 2; Table 1), a genetic cause was considered to be likely. An extensive molecular genetic analysis was therefore performed on foetus II:2.
Extensive karyotyping and chromosomal microarray analysis failed to identify any chromosomal abnormality or pathogenic copy number variations in II:2
We first performed standard G-banding karyotyping using cord blood cells from II:2 (taken at 29 GW), but no chromosomal abnormalities were found. In the meantime, we also performed chromosomal microarray analysis using genomic DNA prepared from the cord blood cells taken from II:2. No pathogenic copy number variants were identified by reference to data available in OMIM [11], DGV [12] and DECIPHER [13].
WES also failed to reveal a genetic cause of the microcephaly in II:2
We further employed WES to search for putative causal variants in an unbiased and hypothesis-free manner. The resulting single-nucleotide variants (SNVs) and small insertions or deletions (indels) were prioritised according to the following criteria: (i) variants that cause nonsynonymous, frameshift or in-frame changes and variants that occur at splice sites; (ii) variants with a minor allele frequency of less than 5% according to either the 1000 Genomes Project [14] or the ESP5400 data of the National Heart, Lung, and Blood Institute GO Exome Sequencing Project [15]; (iii) in the case of missense variants, those predicted to be deleterious by PolyPhen-2 [16], SIFT [17] and MutationTaster [18]; and (iv) variants occurring in known microcephaly-causing or microcephaly-associated genes as well as in candidate genes selected on the basis of known biological, physiological or functional relevance to microcephaly. However, no variants survived this process of prioritisation.
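The four prioritisation criteria can be read as a chain of filters applied to each annotated variant. The following Python sketch is illustrative only and is not the pipeline used in this study: the `Variant` record, its field names, the consequence labels and the way the three predictor outputs are collapsed into a single flag are all assumptions made for the example, as is the reading of criterion (ii) as "rare in at least one of the two databases".

```python
from dataclasses import dataclass
from typing import Optional, Set

# Criterion (i): consequence classes that are retained.
# "missense" and "nonsense" together stand in for nonsynonymous changes.
RETAINED_CONSEQUENCES = {"missense", "nonsense", "frameshift",
                         "inframe_indel", "splice_site"}
MAF_CUTOFF = 0.05  # criterion (ii): minor allele frequency below 5%


@dataclass
class Variant:
    # Hypothetical record; real annotation output will use other field names.
    gene: str
    consequence: str
    maf_1000g: Optional[float]    # None = variant absent from the database
    maf_esp: Optional[float]
    predicted_deleterious: bool   # assumed summary of the in silico predictors


def passes_prioritisation(v: Variant, candidate_genes: Set[str]) -> bool:
    """Return True if the variant survives all four criteria."""
    # (i) coding or splice-site consequence
    if v.consequence not in RETAINED_CONSEQUENCES:
        return False
    # (ii) rare in at least one database (absent is treated as frequency 0)
    freqs = [0.0 if f is None else f for f in (v.maf_1000g, v.maf_esp)]
    if not any(f < MAF_CUTOFF for f in freqs):
        return False
    # (iii) missense variants must additionally be predicted deleterious
    if v.consequence == "missense" and not v.predicted_deleterious:
        return False
    # (iv) gene must be a known or candidate microcephaly gene
    return v.gene in candidate_genes
```

Note that a variant in a noncoding RNA gene would never reach such a filter if its locus is not captured and sequenced in the first place, which is the limitation discussed in this report.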
Targeted sequencing of the noncoding RNU4ATAC gene identified causal variants in II:2
After failing to detect any pathogenic lesion by karyotyping, chromosomal microarray analysis and WES, we began to consider the potential involvement of noncoding RNA genes in the aetiology of microcephaly. An extensive literature search resulted in the recognition of two such genes. The first was the miR-17-92a-1 cluster host gene (MIR17HG; OMIM #609415). Large-scale copy number variants that serve either to delete or duplicate the entire MIR17HG locus cause Feingold syndrome 2 (OMIM #614326), a rare autosomal dominant disorder characterised by variable combinations of microcephaly, limb malformations, oesophageal and duodenal atresias and learning disability [19][20][21][22][23][24][25][26]. Although the disease entity under study here is most consistent with a model of autosomal recessive inheritance, we nevertheless revisited our chromosomal microarray analysis data and confirmed the absence of large deletions or duplications involving the MIR17HG locus.
In what follows, we shall term this critical canonical stem region the mutational hotspot region for MOPD1-causative variants. We, therefore, speculated that variants in the RNU4ATAC gene, which would not have been detected by WES, might underlie the severe microcephaly in this family. Subsequent targeted testing of the RNU4ATAC gene by Sanger sequencing identified compound heterozygous variants, 29T>C (rs779143800) and 51G>A (rs188343279), in II:2. Carrier analysis confirmed that the two variants had been inherited from the mother and father, respectively (Fig. 1a). Although it was suspected that the affected foetus II:1 had also inherited these two variants, this could not be confirmed due to the non-availability of genetic material.
51G>A was among the first described MOPD1-causative variants [27,28] and represents the most common MOPD1-causative variant so far reported. By contrast, 29T>C has not been previously reported in MOPD1 patients. It has, however, been reported at a very low frequency in normal populations; thus, it is present in heterozygous form in two individuals in the Genome Aggregation Database [42], corresponding to an allele frequency of 1.6 × 10⁻⁵. Further, in common with all 11 previously reported MOPD1-causative RNU4ATAC SNVs, 29T>C affected one of the evolutionarily highly conserved positions of U4atac (Fig. 1b). Furthermore, and most importantly, 29T is located within the mutational hotspot region for MOPD1-causative variants (Fig. 1b). In this latter regard, our current understanding of the pathogenetic mechanism underlying the six known MOPD1-causing RNU4ATAC SNVs occurring within the mutational hotspot is that they abrogate U4atac snRNA function by disrupting the 5′ stem-loop structure [27,28,41]. Accordingly, we compared the potential effect of 29T>C on the 5′ stem-loop structure of U4atac with those of the aforementioned six known MOPD1-causative SNVs. To this end, wild-type and the seven mutant sequences spanning positions 20 to 58 (i.e. the sequence forming the 5′ stem-loop structure; Fig. 1b) of U4atac were separately subjected to Mfold analysis [43] under default conditions. All seven SNVs were predicted to significantly affect the 5′ stem-loop structure of U4atac as compared with the wild-type.
In particular, 29T>C was predicted to alter the secondary structure in the same way as the pathogenic nucleotide substitutions 30G>A, 50G>A and 53C>G; it was also predicted to alter the secondary structure in a similar way to the most common pathogenic 51G>A variant (Fig. 3). These observations, taken together, strongly suggest that 29T>C constitutes a novel causative variant for MOPD1.
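Preparing the inputs for such a comparison amounts to applying each substitution to the reference sequence and extracting positions 20 to 58. A minimal Python sketch of that step is given below; it does not perform the folding itself, and the function names are hypothetical. Importantly, `placeholder_reference()` returns a synthetic 130-nucleotide string that merely carries the stated reference bases at the positions named in the text; it is NOT the real U4atac sequence, which would have to be substituted before submitting anything to a structure prediction tool.

```python
import re

# 1-based, inclusive positions forming the 5' stem-loop (as stated in the text).
WINDOW_START, WINDOW_END = 20, 58


def apply_snv(reference: str, snv: str) -> str:
    """Apply a substitution written as '<position><ref>><alt>', e.g. '29T>C'."""
    match = re.fullmatch(r"(\d+)([ACGT])>([ACGT])", snv)
    if match is None:
        raise ValueError(f"unrecognised variant notation: {snv}")
    pos, ref, alt = int(match.group(1)), match.group(2), match.group(3)
    if not 1 <= pos <= len(reference):
        raise ValueError(f"position {pos} lies outside the reference")
    if reference[pos - 1] != ref:
        raise ValueError(f"reference base at {pos} is not {ref}")
    return reference[:pos - 1] + alt + reference[pos:]


def stem_loop_window(sequence: str) -> str:
    """Extract positions 20..58 (1-based, inclusive)."""
    return sequence[WINDOW_START - 1:WINDOW_END]


def build_folding_inputs(reference: str, snvs):
    """Return wild-type and mutant windows keyed by variant name."""
    inputs = {"wild-type": stem_loop_window(reference)}
    for snv in snvs:
        inputs[snv] = stem_loop_window(apply_snv(reference, snv))
    return inputs


def placeholder_reference() -> str:
    """Synthetic stand-in sequence; NOT the real U4atac sequence."""
    bases = ["A"] * 130
    for pos, base in {29: "T", 30: "G", 50: "G", 51: "G", 53: "C"}.items():
        bases[pos - 1] = base
    return "".join(bases)


inputs = build_folding_inputs(
    placeholder_reference(),
    ["29T>C", "30G>A", "50G>A", "51G>A", "53C>G"],
)
```

Checking the stated reference base before substitution guards against off-by-one errors in position numbering, which is an easy mistake to make when a variant is described relative to the transcript rather than to genomic coordinates.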

Prenatal diagnosis of the third pregnancy
Prenatal diagnosis was performed on the third foetus (II:3) (Fig. 1a). Genomic DNA was prepared from amniotic fluid cells taken by ultrasound-guided amniocentesis at 16 GW. However, neither the RNU4ATAC 51G>A variant nor the 29T>C variant was detected. Normal foetal growth was confirmed by continual ultrasound monitoring during the whole period of pregnancy (Fig. 2g, h). The third foetus was born healthy after 40 GW.
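The expected outcomes for any pregnancy of this couple follow from simple Mendelian segregation. As a sketch, and assuming that each parent is heterozygous for exactly one of the two variants (the mother for 29T>C and the father for 51G>A, as indicated by the carrier analysis) and transmits either allele with equal probability, the genotype distribution can be enumerated as follows; the label `wt` and the function name are illustrative.

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(maternal_alleles, paternal_alleles):
    """Enumerate equally likely allele combinations and return genotype probabilities."""
    counts = {}
    for m, p in product(maternal_alleles, paternal_alleles):
        genotype = tuple(sorted((m, p)))  # order of the two alleles is irrelevant
        counts[genotype] = counts.get(genotype, 0) + 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


# Assumed parental genotypes: one variant allele and one wild-type allele each.
dist = offspring_distribution(("29T>C", "wt"), ("51G>A", "wt"))
```

Under these assumptions, each pregnancy has a probability of 1/4 of inheriting both variants, 1/2 of carrying a single variant and 1/4 of inheriting neither, the last being the outcome observed in II:3.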

Discussion
In this study, we relate our experience of how the genetic cause was finally identified in a Chinese family presenting with two successive foetuses with severe microcephaly. In brief, negative findings from karyotyping, chromosomal microarray analysis and WES in foetus II:2 prompted us to consider the potential role of noncoding genes in causing microcephaly in the family. Consequently, targeted sequencing of the noncoding RNU4ATAC gene resulted in the identification of compound heterozygous variants, one being the most frequently reported MOPD1-causative 51G>A, the other being a novel 29T>C variant. Based upon four lines of evidence, namely allele frequency in normal populations, evolutionary conservation, occurrence within a known mutational hotspot for MOPD1-causative variants and predicted effect on the 5′ stem-loop structure, we were able to conclude with confidence that the newly found 29T>C variant represents a new causative variant for MOPD1. Here, we should like to make two additional points. First, in the context of in silico analysis, many algorithms have been designed to predict the functional consequences of intronic or missense variants found in protein-coding genes. However, these tools are inappropriate for use with the RNU4ATAC variants discussed here, whose functional consequences depend upon their potential effect on RNA secondary structure. Currently, Mfold analysis is the gold standard for performing RNA secondary structure predictions. Second, stringent standards and guidelines have been proposed for investigating the causality of sequence variants in human genetic disease [44,45]. Apart from the aforementioned four lines of evidence supporting causality of the detected compound heterozygous RNU4ATAC variants, we would like to add a new consideration. The genomic structure of RNU4ATAC is very simple, comprising only 130 nucleotides. All MOPD1-causative RNU4ATAC variants reported so far are located within these 130 nucleotides.
Most previous studies have reported homozygous or compound heterozygous RNU4ATAC variants in patients with a diagnosis, or suggestive diagnosis, of MOPD1 [27,28,[36][37][38][39][40]. Only very recently have RNU4ATAC variants been described in foetuses [41]; all four foetuses (two of whom were twins) had severe microcephaly together with some other brain and skeletal abnormalities including corpus callosum agenesis, short limb, brachydactyly and ossification delay, suggestive of a diagnosis of MOPD1. By contrast, the two foetuses in the family under study here showed only severe microcephaly and corpus callosum agenesis. The identification of compound heterozygous RNU4ATAC variants in II:2, therefore, provided a definite diagnosis of the disease that could not otherwise have been made merely on the basis of clinical findings.
In a more general context, our study adds to the increasing appreciation that variants in noncoding RNA genes are an underestimated cause of human inherited disease. Here, we further emphasise this point by citing a recent finding concerning the RNU4ATAC gene. Compound heterozygous RNU4ATAC variants have also been reported to cause Roifman syndrome (OMIM #300258) [10,46], a rare congenital association of antibody deficiency, spondyloepiphyseal chondro-osseous dysplasia, retinal dystrophy, poor pre- and postnatal growth and cognitive delay, which is phenotypically quite different from MOPD1. It should be noted that Roifman syndrome-causative compound heterozygous RNU4ATAC variants comprise one variant that is located within MOPD1-implicated structural elements and one variant that is located outside of MOPD1-implicated structural elements [10,46].

Conclusions
In a general context, our findings highlight one key limitation of WES, namely that it fails to detect disease-causative variants within noncoding RNA genes. This provides support for a role for whole-genome sequencing as a first-tier genetic test in paediatric medicine [9]. This is also the first report of MOPD1-causative RNU4ATAC variants in the Chinese population and the first report of prenatal diagnosis and genetic counselling provided for a subsequent

Karyotyping and chromosomal microarray analysis
Standard G-banding karyotyping was performed. The array experiments were performed using the high-resolution Affymetrix CytoScan HD microarray (Affymetrix Inc., Santa Clara, CA) in accordance with the manufacturer's protocols. The results were analysed using the Chromosome Analysis Suite software version 1.2.2; the reporting threshold of the copy number was set at 10 kb, with marker count at ≥ 50, as previously reported [47].
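The reporting threshold can be expressed as a simple predicate over each candidate copy number segment. The sketch below is an illustration rather than the logic of the analysis software: the function name is hypothetical, segment size is assumed to be computed as end minus start, and the 10 kb size threshold is assumed to be inclusive, which the text does not specify.

```python
MIN_SIZE_BP = 10_000   # reporting threshold: 10 kb
MIN_MARKERS = 50       # reporting threshold: marker count >= 50


def is_reportable(start: int, end: int, marker_count: int) -> bool:
    """Return True if a copy number segment meets both reporting thresholds."""
    size = end - start  # assumed definition of segment size
    return size >= MIN_SIZE_BP and marker_count >= MIN_MARKERS
```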

WES
Genomic DNA was fragmented randomly and then purified by means of the magnetic particle method. Sequences were captured by Agilent SureSelect version 4 (Agilent Technologies, Santa Clara, CA) according to the manufacturer's protocols. The DNA libraries, after enrichment and purification, were sequenced on the NextSeq500 sequencer according to the manufacturer's instructions (Illumina, San Diego). The sequencing reads were aligned to GRCh37.p10 using Burrows-Wheeler Aligner software (version 0.5.9) [48]. Local realignment and base quality recalibration of the Burrows-Wheeler aligned reads were then performed using the GATK IndelRealigner [49] and GATK BaseRecalibrator [50], respectively. SNVs and small indels were identified by the GATK UnifiedGenotyper [51]. Variants were annotated using the Consensus Coding Sequences Database at the National Center for Biotechnology Information [52].
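Because only captured regions are sequenced to useful depth, one practical check before relying on a negative WES result is whether a locus of interest lies within the capture design at all. The following sketch assumes that the design is available as BED-style half-open intervals (chromosome, 0-based start, end) and computes the fraction of a locus that they cover; the function name and the coordinates in the usage example are hypothetical and do not correspond to any real capture kit or to the RNU4ATAC locus.

```python
def covered_fraction(targets, chrom, start, end):
    """Fraction of the half-open locus [start, end) covered by BED-style targets."""
    if end <= start:
        raise ValueError("locus must have positive length")
    # Keep only targets overlapping the locus, clipped to its boundaries.
    clipped = sorted(
        (max(t_start, start), min(t_end, end))
        for t_chrom, t_start, t_end in targets
        if t_chrom == chrom and t_start < end and t_end > start
    )
    covered, cursor = 0, start
    for c_start, c_end in clipped:
        c_start = max(c_start, cursor)  # skip bases already counted
        if c_end > c_start:
            covered += c_end - c_start
            cursor = c_end
    return covered / (end - start)


# Hypothetical capture design and loci, for illustration only.
design = [("chr_demo", 1000, 1100), ("chr_demo", 1050, 1200)]
inside = covered_fraction(design, "chr_demo", 1000, 1200)
outside = covered_fraction(design, "chr_demo", 5000, 5130)
```

A covered fraction of zero for a gene of interest indicates that a negative WES result carries no information about that gene, which is the situation that prompted targeted sequencing in this study.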
Targeted sequencing of the RNU4ATAC gene
Primer sequences and PCR conditions are available upon request.

RNA secondary structure prediction
This was performed by means of Mfold analysis under default conditions [43].