Why are keratins important?

© The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

The human fertilized egg (zygote) is probably the most implausible cell on Earth, because it is equipped with all the genes needed to create a human body, which comprises ~37 trillion cells representing more than 200 distinctly different cell-types. This complexity emerges from how and when the regulatory genes become activated to express the essential structural genes at the right time during embryogenesis, fetogenesis, the postpartum period, and all the way to adulthood.
How does that one-cell zygote hold itself together? How do the nucleus and nucleolus remain separate from the cytoplasm, and how do all the cytoplasmic organelles persist as distinct subcellular bodies? Likewise, how do all the zygote's successor cells hold themselves together, despite their diversification into >200 cell-types that are localized uniquely into dozens of specific tissues? The answer, in part, lies in (regulatory) genes involved in signaling, adhesion, and the formation of filaments and fibrils, together with their (structural) gene products (proteins). If these types of Animalia proteins had not evolved (with origins as early as the first eukaryote) to hold cells together and keep cells organized within a particular tissue, life on Earth would be drastically different.
One large subset of these genes, responsible for keeping everything in its place, is the Intermediate Filament (IntFil) gene superfamily. When we were first invited to join as coauthors on the Ho et al. [1] project, we knew nothing about IntFil genes and their proteins, nor why anyone would want to study them.
IntFils arose during early metazoan evolution to provide mechanical support for plasma membranes that connect to and interact with other cells and the extracellular matrix. IntFils are ubiquitous structural components that make up, in a cell type-specific manner, the cytoskeletal infrastructure of all animal tissues. All IntFil proteins show a distinctly organized, extended α-helical conformation that is predisposed to form two-stranded coiled coils, the basic building blocks of highly flexible, stress-resistant cytoskeletal filaments.
In this issue, Ho et al. [1] studied the evolutionary history of IntFil genes. Although IntFils are divided into six types, the coauthors focused on the type I "acidic" and type II "basic" keratin genes, which are far more numerous and emerged more recently in evolution than the other four types.
The first keratin gene appeared in sponges, three keratin genes are found in arthropods, and more rapid increases in keratin gene number then occurred in lungfish and amphibian genomes, concomitant with the transition from sea animals to land animals 440 to 410 million years ago. The human genome has 27 of 28 type I keratin genes clustered at chromosome (Chr) 17q21.2, and all 26 type II keratin genes clustered at Chr 12q13.13. The mouse genome has 27 of 28 type I keratin genes clustered on Chr 11, and all 26 type II keratin genes clustered on Chr 15; all the mouse keratin genes are syntenic with the human keratin genes. On the other hand, the zebrafish genome has 18 type I keratin genes scattered on five chromosomes and three type II keratin genes on two chromosomes. The two clusters ("evolutionary blooms") of type I and type II keratin genes, each located along a chromosomal segment, have been found in all seven nonhuman mammalian genomes that have been examined to date, but not in fish genomes [1].
To make the cross-species trees, Ho et al. [1] used the interactive Fast-Fourier Transform method in MAFFT to build multiple sequence alignments. Evolutionary relationships were then estimated by Markov-chain Monte Carlo [10] in the Bayesian Phylogenetics program, sampling every 1,000 generations in parallel using the BEAGLE library [11], after which the potential scale reduction factor [12], which compares within-chain and between-chain variance, was used to evaluate whether sampling was sufficient. Finally, the sampled posteriors from the two independent executions were combined to generate a maximum clade-credibility tree [13], summarizing the posterior distribution of estimated evolutionary relationships and branch lengths.
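The sampling and convergence check described above can be illustrated with a minimal sketch. It is not the pipeline of Ho et al. [1]: it assumes a toy one-dimensional target (a standard normal) in place of a tree posterior, a plain random-walk Metropolis sampler, and the classic Gelman-Rubin form of the potential scale reduction factor. The function names `metropolis_chain` and `psrf` are hypothetical and chosen only for this illustration.

```python
import math
import random
from statistics import fmean, variance


def metropolis_chain(log_density, start, n_draws, step, rng):
    """Random-walk Metropolis: each draw depends only on the previous one."""
    x, log_p = start, log_density(start)
    draws = []
    for _ in range(n_draws):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        draws.append(x)
    return draws


def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for equal-length chains."""
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    means = [fmean(c) for c in chains]
    grand = fmean(means)
    # Between-chain variance: how far the chain means sit from each other.
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    # Within-chain variance: average sample variance inside each chain.
    within = fmean([variance(c) for c in chains])
    if within == 0:
        raise ValueError("chains have zero within-chain variance")
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, assumed only for reproducibility

    def log_normal(x):
        return -0.5 * x * x  # toy target: standard normal, up to a constant

    # Two independent runs from different starts; drop the first 1,000 draws.
    runs = [metropolis_chain(log_normal, s, 5000, 2.0, rng)[1000:]
            for s in (-3.0, 3.0)]
    print(f"PSRF = {psrf(runs):.3f}")
```

Values close to 1 suggest that the independent runs are sampling the same distribution; values well above 1 suggest that the runs have not yet mixed and that sampling should continue.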
This bioinformatics analysis led Ho et al. [1] to conclude that type I KRT18 most closely resembles the ancestral precursor of all other type I keratins, and that type II KRT8 most closely resembles the ancestral precursor of all other type II keratins. For other gene superfamilies that contain evolutionary blooms in which an ancestral ordering is difficult to resolve, the comparative genomics approach used in this publication might help determine which is the earliest-diverging gene in a cluster.
Lastly, comparative-genomics approaches on genes relevant to human health and disease can offer insight into the nature and etiology of specific disorders. Are there keratin gene variants known to cause human disease? Ho et al. [1] found that the ClinVar database currently lists 26 human disease-causing variants within the various domains of keratin proteins.
Fisk and Nebert Human Genomics (2022)
10. Markov-chain Monte Carlo (MCMC) sampling provides a class of algorithms for systematic random sampling from high-dimensional probability distributions. Unlike simple Monte Carlo sampling methods, which are able to draw independent samples from the distribution, MCMC methods draw samples in which the next sample depends on the existing sample, forming a Markov chain; this allows the algorithms to narrow in on the quantity that is being approximated from the distribution, even with a large number of random variables.
11. BEAGLE is a high-performance library that can perform the core calculations at the heart of most Bayesian and Maximum Likelihood phylogenetics packages. https://beagle-dev.github.io/
12. "Potential scale reduction factor" (PSRF) is an estimated factor by which the scale of the current distribution for the target distribution might be decreased, if the simulations were continued for an infinite number of iterations; each PSRF declines to 1 as the number of iterations approaches infinity. https://mc-stan.org/docs/2_18/reference-manual/notation-for-samples-chains-and-draws.html
13. Each clade within the tree is given a score, based on the fraction of times that it appears in the set of sampled posterior trees, and the product of these scores is then taken as the tree's score. The tree with the highest score is therefore designated the maximum clade-credibility tree (MCCT). https://beast2.blogs.auckland.ac.nz/summarizing-posterior-trees/
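The scoring rule in note 13 can be written out as a short sketch. It assumes that each sampled tree is given as a nested tuple of leaf names and uses a hypothetical four-taxon posterior sample (taxa "A" to "D"), which is illustrative only and not data from Ho et al. [1]. The helper names `clades` and `max_clade_credibility` are likewise hypothetical; production tools that summarize posterior trees also handle branch lengths, which this sketch ignores.

```python
import math
from collections import Counter


def clades(tree):
    """Return the set of clades (frozensets of leaf names) in a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):  # a leaf is a taxon name
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)  # every internal node defines one clade
        return members

    leaves(tree)
    return found


def max_clade_credibility(trees):
    """Pick the sampled tree whose product of clade frequencies is highest."""
    clade_sets = [clades(t) for t in trees]
    counts = Counter(c for cs in clade_sets for c in cs)
    n = len(trees)

    def log_score(cs):
        # Sum of logs equals the log of the product of clade credibilities.
        return sum(math.log(counts[c] / n) for c in cs)

    best = max(range(n), key=lambda i: log_score(clade_sets[i]))
    return trees[best], math.exp(log_score(clade_sets[best]))


if __name__ == "__main__":
    # Hypothetical posterior sample of five trees over taxa A-D.
    balanced = (("A", "B"), ("C", "D"))
    ladder = ((("A", "B"), "C"), "D")
    other = (("A", "C"), ("B", "D"))
    sample = [balanced, balanced, ladder, balanced, other]
    tree, score = max_clade_credibility(sample)
    print(tree, round(score, 2))
```

Working in log space avoids numerical underflow when trees contain many clades, since a product of many small frequencies would otherwise shrink toward zero.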

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.