Evidence that DNA repair genes, a family of tumor suppressor genes, are associated with evolution rate and size of genomes

Adaptive radiation and evolutionary stasis are characterized by very different evolution rates. The main aim of this study was to investigate whether any genes play a special role in a high or low evolution rate. The availability of animal genomes permitted comparison of the gene content of genomes of 24 vertebrate species that evolved through adaptive radiation (representing a high evolutionary rate) and of 20 vertebrate species that are considered living fossils (representing a slow evolutionary rate or evolutionary stasis). Mammals, birds, reptiles, and bony fishes were included in the analysis. Pathway analysis was performed for genes found to be specific to adaptive radiation or evolutionary stasis, respectively. Pathway analysis revealed that DNA repair and cellular response to DNA damage are important (false discovery rate = 8.35 × 10^−5 and 7.15 × 10^−6, respectively) for species that evolved through adaptive radiation. This was confirmed by further in silico genetic analysis (p = 5.30 × 10^−3). Nucleotide excision repair and base excision repair were the most significant pathways. Additionally, the number of DNA repair genes was found to be linearly related to the genome size and the protein number (proteome) of the 44 animals analyzed (p < 1.00 × 10^−4), which is compatible with Drake’s rule. This is the first study in which radiated and living fossil species have been genetically compared. Evidence has been found that cancer-related genes have a special role in radiated species. A linear association of the number of DNA repair genes with species genome size has also been revealed. These comparative genetics results can support the idea of punctuated equilibrium evolution. Electronic supplementary material: The online version of this article (10.1186/s40246-019-0210-x) contains supplementary material, which is available to authorized users.


Background
Adaptive radiation is a well-known phenomenon in evolutionary biology in which a taxon splits into multiple species that become adapted to a variety of environments within a short evolutionary time. Although this phenomenon is best known from islands, with classic examples such as Darwin's finches [1] and the Hawaiian Drosophila, other major adaptive radiations have occurred in animals such as cichlids, bats, and cetaceans [2,3,4,5]. It is very likely that common evolutionary and molecular processes have been followed in all taxa that have experienced adaptive radiation [6,7]. However, no such common molecular pathways have been identified so far.
Living fossil species and adaptive radiation can be considered two very different evolutionary strategies: a slow evolutionary rate versus a rapid evolutionary rate, respectively. Living fossils are characterized by morphological stasis, low taxonomic diversity, and a certain rarity. Quantitative criteria have been published recently [8,9]. The apparent absence of diversification and their morphological stability suggest highly effective adaptations that reduce the need for phenotypic change, regardless of environmental or genetic changes [8,10]. Living fossils are frequently cited as an example of evolutionary success and evolutionary stasis [11,12]. Evolutionary stasis is a common finding in the fossil record [13]. The punctuated equilibrium theory of evolution is based on these fossil observations [14,15]. Characteristic examples of taxa that most biologists consider living fossils are the crocodilians, the coelacanths, and the platypus (Ornithorhynchus). As in the case of adaptive radiation, little is known about any special genes that are under selection in living fossil species.
This study aimed mainly to identify any common molecular pathways that contribute to a special evolutionary process in animals. We are mostly interested in genes that are related to disease, since evolutionary studies may contribute to a better understanding of the function of those genes. We assumed that living fossil species (LF) and radiated species (R, those that have evolved through adaptive radiation) represent two animal categories with very different rates and forms of evolution. We took advantage of the many animal genomes that have been sequenced to date and performed an analytical comparative genetics study. Strict inclusion and statistical criteria were applied (see the "Methods" section). In total, 20 LF and 24 R vertebrate genomes (bony fishes, reptiles, birds, mammals) were analyzed. Interestingly, only one major genetic difference was revealed, and it relates to DNA repair genes, one of the most important categories of tumor suppressor genes.

Species included in this study and genome data
The literature was carefully searched for all animal species that can be characterized as living fossils (LF, slow evolutionary rate) or radiated (R, having experienced adaptive radiation). The additional inclusion criteria were a completed genome project and available annotation and gene symbol data (for reliable interspecies comparison). Annotation of the genomes was performed by the submitters under the same NCBI standards. For a reliable comparison, we included only animal classes with representative species in both the living fossil and the radiated group. The genome and gene data used for this work are current as of April 2019, according to the Genome and Gene databases of NCBI (https://www.ncbi.nlm.nih.gov/). In total, 44 species were included in this analysis.

Gene analysis
Official gene symbols were used for comparison among species. A custom algorithm was developed to find all genes common to the species of the LF group and all genes common to the species of the R group. Next, the two lists of common genes were compared through the "unique values" function of Excel 2016. This comparison produced two gene lists: genes that are common in LF but not found in R, and genes that are common in R but not found in LF. We considered that these genes are probably associated with a special type of evolutionary process. Genes were analyzed on a presence/absence basis; copy numbers were not considered. All gene lists can be found in Additional file 1: Table S1.
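The comparison described above can be illustrated with a minimal sketch. It assumes that each genome is available as a list of official gene symbols and that "not found in R" means absent from the list of R-common genes (the comparison of the two common lists). The function names, species labels, and gene symbols below are hypothetical placeholders, not the custom algorithm or the data used in this study.

```python
def common_genes(genomes):
    """Return the gene symbols present in every genome of a group.

    `genomes` maps a species label to an iterable of gene symbols.
    Only presence/absence matters; copy numbers are ignored.
    """
    gene_sets = [set(symbols) for symbols in genomes.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


def group_specific_genes(lf_genomes, r_genomes):
    """Compare the two lists of common genes (assumed interpretation)."""
    lf_common = common_genes(lf_genomes)
    r_common = common_genes(r_genomes)
    lf_specific = lf_common - r_common  # common in LF, not in the R list
    r_specific = r_common - lf_common   # common in R, not in the LF list
    return lf_specific, r_specific


# Toy input with placeholder symbols (not real annotation data).
lf_toy = {
    "lf_species_1": ["GENE_A", "GENE_B", "GENE_C"],
    "lf_species_2": ["GENE_A", "GENE_B", "GENE_D"],
}
r_toy = {
    "r_species_1": ["GENE_A", "GENE_E", "GENE_F"],
    "r_species_2": ["GENE_A", "GENE_E", "GENE_C"],
}
lf_only, r_only = group_specific_genes(lf_toy, r_toy)
print(sorted(lf_only), sorted(r_only))
```

Set intersection and set difference reproduce the presence/absence logic directly, so no spreadsheet step is needed in this sketch.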

Pathway analysis and DNA repair gene analysis
Panther 14.1 online software [16,17] was used for pathway analysis of the two unique gene lists (LF and R). The software analyzes the submitted gene lists with reference to the human genome. Two algorithms of the software were used: pathway and reactome profile analysis. Results were compared between LF and R to find any pathways that are unique to either of the two evolutionary processes. The statistical outcome is the false discovery rate (FDR), a type of adjusted p value. The significance level (alpha) was set to 0.0001 for highly reliable results.
To confirm whether DNA repair genes represent a major genetic difference between the two vertebrate categories, the genomes of all 44 species were analyzed for their content of DNA repair genes. An updated list of all 151 known DNA repair genes was used [18]. Content analysis (presence/absence) was performed using the official gene symbols. An extra search was performed using the gene aliases to recover any genes missed because of naming differences. Content analysis was performed through the "duplicate values" function of Excel 2016. Detailed results can be found in Additional file 2: Table S2.
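To make the notion of an FDR-adjusted p value concrete, the following sketch applies the Benjamini-Hochberg procedure to a list of raw p values and keeps the entries that pass alpha = 0.0001. The text does not state which FDR procedure the software applied, so the choice of Benjamini-Hochberg is an assumption, and the pathway names and p values are invented for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


ALPHA = 1e-4  # significance level used for the pathway analysis

# Invented example values, not results of this study.
toy_pathways = ["pathway_1", "pathway_2", "pathway_3", "pathway_4"]
toy_p = [2e-6, 0.03, 4e-5, 0.2]
toy_fdr = benjamini_hochberg(toy_p)
significant = [name for name, q in zip(toy_pathways, toy_fdr) if q <= ALPHA]
print(significant)
```

Each adjusted value is the raw p value scaled by the number of tests divided by its rank, capped so that the adjusted values never decrease as the raw values increase.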
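The presence/absence search with an alias fallback can be sketched as follows. This is an illustration under stated assumptions: the genome is a list of gene symbols, the alias table maps an official symbol to its alternative names, and all names below are hypothetical placeholders rather than entries from the DNA repair gene list [18].

```python
def repair_genes_present(genome_symbols, repair_genes, aliases=None):
    """Return the DNA repair genes found in a genome (presence/absence).

    A gene counts as present if its official symbol or any of its
    aliases appears among the genome's gene symbols.
    """
    aliases = aliases or {}
    present = set(genome_symbols)
    found = set()
    for gene in repair_genes:
        candidate_names = {gene} | set(aliases.get(gene, ()))
        if candidate_names & present:
            found.add(gene)
    return found


# Placeholder data (hypothetical symbols and aliases).
toy_genome = ["GENE_A", "OLD_NAME_B", "GENE_X"]
toy_repair_list = ["GENE_A", "GENE_B", "GENE_C"]
toy_aliases = {"GENE_B": ["OLD_NAME_B"]}

found = repair_genes_present(toy_genome, toy_repair_list, toy_aliases)
print(len(found), sorted(found))
```

Running the function once per species gives the per-genome counts that feed the group comparison and the regression on genome size.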

Statistical analysis
All statistical analyses for this work were performed with the statistical package Stata v.13 (StataCorp LLC, Texas, USA). The basic statistical analysis included univariate linear regression and the independent two-tailed t test. The heat map was produced through the "color gradient" function of Excel 2016. The significance level (alpha) was set to 0.01 for identifying the most significant categories of DNA repair genes.
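As an illustration of the independent t test, the sketch below computes the pooled-variance (Student) t statistic and its degrees of freedom for two groups of per-species gene counts. Whether equal variances were assumed in the original analysis is not stated, so the pooled form is an assumption, and the input numbers are invented. The two-tailed p value would then be read from the t distribution with the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(group_a, group_b):
    """Return (t statistic, degrees of freedom) for two independent samples.

    Uses the pooled-variance (equal variances) form of Student's t test.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (mean(group_a) - mean(group_b)) / standard_error
    return t_stat, df


# Invented counts for illustration (not data from this study).
toy_lf_counts = [1, 2, 3, 4, 5]
toy_r_counts = [2, 3, 4, 5, 6]
t_stat, df = pooled_t_test(toy_lf_counts, toy_r_counts)
print(t_stat, df)
```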

Species analyzed
Strict inclusion criteria were applied for the 44 species analyzed in this study. Several fossil and molecular studies, cited below, justify the classification as "living fossil" or "radiated." A more detailed description of "living fossil" species can be found in the book Living Fossils [19]. Additionally, the 20 LF species satisfy the quantitative living fossil criteria of [9]. Genome project information can be found in Table 1.

Gene and pathway analysis
Evolutionary stasis and rapid evolutionary speciation can be characterized as opposite evolutionary processes, or at least as very different evolutionary phenomena. This is the first study to compare these two very different categories of vertebrate species genetically. Gene or annotation information was inadequate for most invertebrate LF or R species, so they were not included in this study.
The procedure we followed is simple. We downloaded the annotated genome information for all 44 species. Then, we found the common genes in LF species and the common genes in R species, creating two separate gene lists (Additional file 1: Table S1). The next step was to compare the two lists to find any genes that are common in LF but not found in R species and genes that are common in R but not found in LF species. We consider that these genes may be under selection, since they are found only in species with a special evolutionary profile. In total, 1534 genes were found to be specific to LF species and 2263 genes to be specific to R species.
Analysis of the two final gene lists was performed with Panther 14.1 software, using two algorithms: pathways (biological processes) and reactome. We looked for unique biological processes and reactomes in LF- and R-specific genes, respectively. Using the strict criterion of FDR ≤ 0.0001, only one process/pathway was found to be significant in R-specific genes by both algorithms, namely DNA repair (DNA repair and cellular response to DNA damage; FDR = 8.35 × 10^−5 and 7.15 × 10^−6, respectively). No common significant pathways emerged from the biological processes and reactome analyses for LF-specific genes.
The step-by-step analysis and all analytical output can be found in Additional file 1: Table S1. The flowchart of the analysis can be found in Table 2.

DNA repair gene analysis
To confirm the pathway analysis results, we analyzed the 44 genomes for their content of DNA repair genes, using a list of all DNA repair genes known to date (updated list of Wood et al. [18]). Subcategories of DNA repair genes were also considered in the analysis. Detailed results can be found in Additional file 2: Table S2. The results strongly confirmed the pathway analysis (Table 3). The genomes of R species are significantly enriched in DNA repair genes (p = 5.3 × 10^−3). The most significant subcategories are nucleotide excision repair (p = 5.00 × 10^−4) and base excision repair (p = 9.80 × 10^−3). Many other subcategories appear to be significantly enriched in R species under the criterion of p < 0.05. Conserved DNA damage response and non-homologous end-joining are not significant at all (Table 3). A heat map shows that the genomes of R species are indeed enriched in DNA repair genes in comparison with those of LF species, especially for mammals, reptiles, and birds (Fig. 1).
The top 20 genes with the highest presence rate in R species relative to LF species can be found in Additional file 2: Table S2. Eleven of the top 20 (55%) are genes related to nucleotide excision repair and base excision repair. All gene rates are available in Additional file 2: Table S2.

Genome and proteome size analysis
Interestingly, the number of DNA repair genes is linearly related to the genome size and the number of proteins (p < 1.00 × 10^−4). We used genome and proteome data for the 44 species (Fig. 2). The two linear associations are independently significant, since genome size is not linearly related to the number of proteins (Fig. 2). It is well known that genome size is not related to organism complexity [63]; thus, we consider that this association is not due to increased complexity of large genomes. No association was found when the mean genome sizes of LF and R species were compared (results not shown). This result may also explain Drake's rule, which states that the density of accumulated mutations per generation (mutagenesis rate) is roughly inversely proportional to genome size [64][65][66]. Here, we found that larger genomes have more DNA repair genes (and possibly a lower mutagenesis rate, if DNA errors are corrected at a higher rate), which may explain Drake's rule, a rule that has remained unexplained for years.

Fig. 1 Heat map showing the quantity of DNA repair genes, from red to blue in ascending order, per species' genome (numbers at the top of the figure represent the species code that is found in Table 1). Each DNA repair gene pathway was analyzed separately in rows. Radiated species' genomes are richer in DNA repair genes. Analytical data can be found in Additional file 2: Table S2. M mammals, B&R birds and reptiles, BF bony fishes

Fig. 2 Linear regression analysis. The number of DNA repair genes is linearly related to genome size and protein number. As a negative control, we show that genome size is not linearly related to protein number
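The univariate linear regression reported in this section can be sketched with ordinary least squares. The example below fits one predictor (for instance genome size) against one response (for instance the number of DNA repair genes) and returns the slope, the intercept, and the coefficient of determination. The function name and the numbers are hypothetical and serve only to show the calculation; they are not the data of this study.

```python
def linear_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r_squared = (s_xy ** 2) / (s_xx * s_yy)
    return slope, intercept, r_squared


# Invented values for illustration only.
toy_genome_size = [1.0, 2.0, 3.0, 4.0]
toy_repair_count = [3.0, 5.0, 7.0, 9.0]
slope, intercept, r_squared = linear_fit(toy_genome_size, toy_repair_count)
print(slope, intercept, r_squared)
```

Repeating the fit with protein number as the predictor, and with genome size against protein number as the negative control, would mirror the three panels described for Fig. 2.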

Why DNA repair genes
There is evidence that LF species are evolving more slowly than R species. Additionally, some data show that mutagenesis and nucleotide diversity [59,67] may be higher in R species than in LF species and that some R species with very large bodies (whales) have duplicated DNA repair genes that protect them from cancer [68,69]. Based on these data, we could hypothesize that R species may be at risk because of a high mutation load. This could be balanced by having more DNA repair genes, which repair as much DNA damage as possible. It seems that DNA repair at the nucleotide level (nucleotide excision repair and base excision repair) is more important than other DNA repair pathways (Table 3, Additional file 2: Table S2). Another explanation is that LF species are probably better protected from spontaneous DNA changes: over the vast evolutionary time for which they have existed, stabilizing selection has shaped their genomes so that they are protected from random DNA changes that could alter their general morphological features. Certain genes in LF genomes may act in a canalizing way that keeps these species in a narrow state of development and evolution, since they are evolutionarily successful. R species are not characterized by those features, and they probably need more, or particular, DNA repair genes to continue to diversify under a non-deleterious mutagenesis rate. We could consider this the first evidence for genes related to punctuated equilibrium evolution (long evolutionary stasis followed by short bursts of speciation) [14,15].
The relation of the number of DNA repair genes to genome and proteome size is plausible, since larger genomes need more protection from spontaneous mutagenesis. This is the first time that a class of genes has been associated with genome size and number of proteins in animals.

Conclusions
A large number of genomes have been compared from the perspective of evolutionary stasis and adaptive radiation. The analysis indicates that DNA repair genes might play a previously unknown, significant role in evolution. It seems that more DNA repair genes are found in vertebrate taxa that have experienced recent adaptive radiation. Additionally, the number of DNA repair genes was found to be statistically associated with genome size and protein number in vertebrates. DNA repair genes are considered tumor suppressor genes. There is evidence that tumor suppressor genes are related to environmental adaptation in humans [70,71] and to selective pressures during the evolution of mammals [72]. Certain evolutionary processes may therefore be DNA repair-dependent, which points the way for future analyses and experiments.

Not applicable
Authors' contributions
KV conceived the study idea, analyzed and interpreted most of the data of this study, and prepared the first draft of the paper. HD contributed to genomic data retrieval and proofread the paper. CC implemented a custom bioinformatics algorithm for the initial comparative genomics data analysis and proofread the paper. All authors read and approved the final manuscript.

Funding
This research received no funding.

Availability of data and materials
All data generated or analyzed during this study are included in this published article and its supplementary information files.

Ethics approval and consent to participate
Not applicable

Competing interests
The authors declare that they have no competing interests.