- Single nucleotide polymorphism
- variant classification
- pathogenic
- VUS
- benign
- dominant
- recessive
- genetic heterogeneity
- incomplete penetrance
- Mendelian inheritance
- Complex inheritance
- human disease
Genes associated with dominantly inherited diseases are mainly found in regions of human chromosomes with a low density of single nucleotide polymorphism (SNP) hotspots, whereas genes associated with recessively inherited diseases are mainly found in regions dense in SNP hotspots (1). Therefore, SNP haplotypes are expected to play a role in allele-specific expression and to affect the principles of Mendelian or complex inheritance (1).
In continuation of this research, I am currently focusing on chromosome regions identified by tag SNPs. These tag SNPs were identified using a metaSNP approach, employing a pairwise linkage disequilibrium (LD)-based Tagger on datasets from the 1000 Genomes Project (this article is under CC BY 4.0 license) (2). A tag SNP is a representative SNP in a genomic region of high LD; it stands in for a group of co-inherited SNPs, called a haplotype. According to a recent study, tag SNPs among human populations can be considered representative of the human genome (3). Therefore, tag SNPs are highly informative not only within populations of the same continental group but also among reference populations (European, Middle Eastern, and Central/South Asian) and more distant and differentiated populations (Oceania, Americas) (3).
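To make the tagging idea concrete, the following minimal sketch shows one way a pairwise LD-based greedy selection could work. It is not the metaSNP or Tagger implementation of ref (2): the function names, the toy haplotypes, the SNP labels, and the r² threshold of 0.8 are all illustrative assumptions.

```python
from itertools import combinations

def r_squared(a, b):
    """Pairwise LD (r^2) between two biallelic SNPs coded 0/1 per haplotype."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:  # monomorphic SNP: LD is undefined, treat as 0
        return 0.0
    d = p_ab - p_a * p_b
    return d * d / denom

def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedy tagging: repeatedly pick the SNP that captures the most untagged SNPs."""
    names = list(haplotypes)
    captures = {s: {s} for s in names}  # every SNP captures itself
    for s, t in combinations(names, 2):
        if r_squared(haplotypes[s], haplotypes[t]) >= threshold:
            captures[s].add(t)
            captures[t].add(s)
    untagged = set(names)
    tags = {}
    while untagged:
        best = max(names, key=lambda s: len(captures[s] & untagged))
        covered = captures[best] & untagged
        tags[best] = sorted(covered)
        untagged -= covered
    return tags

# Hypothetical toy data: 8 haplotypes, 4 SNPs (not 1000 Genomes genotypes)
toy_haplotypes = {
    "snpA": [0, 0, 1, 1, 0, 1, 0, 1],
    "snpB": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to snpA
    "snpC": [1, 1, 0, 0, 1, 0, 1, 0],  # mirror image of snpA
    "snpD": [0, 1, 0, 1, 0, 1, 0, 1],  # only weakly correlated with the others
}
toy_tags = greedy_tag_snps(toy_haplotypes)
print(toy_tags)
```

In this toy example, snpA stands in for snpB and snpC, while snpD must be kept as its own tag, which is the sense in which a handful of tag SNPs can summarize a larger set of variants.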
A disease mutation is more likely to occur on a common haplotype (4), and the tag SNP variants reported by Elmas et al. comprise a small set of around 15-20 tag SNPs per chromosome that represents the genetic diversity of thousands of multi-population samples (2). I therefore explored whether the mode of inheritance within the most representative haplotypes captured by the tag SNPs across autosomal chromosomes aligns with the results of my previous study (1). To this end, I retrieved the genes containing or adjacent to each reported tag SNP of the autosomal chromosomes, as identified by metaSNP in the 1000 Genomes Project genotypes by Elmas et al. in supplementary table S5 of ref (2). For this search, I used the NCBI Genome Data Viewer (5) based on genome assembly GRCh37/hg19. The genes identified in this way were then searched in the OMIM database (6) for their association with a specific autosomal dominant (AD) or autosomal recessive (AR) disease.
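The gene lookup was carried out interactively in the NCBI Genome Data Viewer. Purely to illustrate the logic of "containing or adjacent", the sketch below assigns a SNP position to the gene that contains it or, failing that, to the nearest gene on the same chromosome. The function name, gene names, and coordinates are hypothetical toy values, not GRCh37/hg19 annotations.

```python
def annotate_snp(position, genes):
    """Return ('within', gene) if the SNP lies inside a gene,
    otherwise ('adjacent', nearest gene).

    genes: list of (name, start, end) tuples on one chromosome,
    with 1-based inclusive coordinates.
    """
    for name, start, end in genes:
        if start <= position <= end:
            return ("within", name)

    def gap(gene):
        # Distance from the SNP to the closest boundary of an intergenic neighbor
        _, start, end = gene
        return start - position if position < start else position - end

    nearest = min(genes, key=gap)
    return ("adjacent", nearest[0])

# Hypothetical genes on a single chromosome
toy_genes = [("GENE_A", 1000, 5000), ("GENE_B", 9000, 12000)]
intronic_hit = annotate_snp(3000, toy_genes)    # falls inside GENE_A
intergenic_hit = annotate_snp(8000, toy_genes)  # between genes, closer to GENE_B
print(intronic_hit, intergenic_hit)
```

Each gene returned by such a lookup would then be checked by hand against OMIM for an AD or AR disease association.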
In total, 462 tag SNPs have been reported across human autosomal chromosomes, the vast majority being intronic or intergenic variants (2). In the chromosomal regions related to these tag SNPs, I retrieved 19 AD and 37 AR OMIM diseases (Table I). According to my previous study, the chromosomes rank from lowest to highest density of SNP hotspot regions as follows: 13, 1, 14, 21, 18, 2, 20, 5, 12, 15, 17, 6, 10, 3, 11, 22, 4, 7, 9, 19, 8, and 16 (1). Because AD and AR diseases are distributed differently across chromosomes owing to differences in the frequency of SNP hotspot regions, which can affect the mode of inheritance (1), non-parametric tests (SPSS Inc., Chicago, IL, USA) were performed to examine differences in disease distribution among the chromosome regions captured by the tag SNPs. A one-sample Kolmogorov-Smirnov test revealed a statistically significant difference in the number of AD diseases (p<0.001) and AR diseases (p<0.001) among chromosomes. Furthermore, the Wilcoxon signed-rank test indicated a marginally significant difference in the number of both AD and AR diseases among chromosomes (p=0.047). These findings align with expectations, as autosomal chromosomes with different densities of SNP hotspot regions differ in the number of inherited diseases they carry (1), a phenomenon also reflected in chromosomal regions whose diversity is captured by tag SNPs.
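The tests above were run in SPSS. As a rough illustration of what a paired Wilcoxon signed-rank test computes, the sketch below implements a two-sided version with a normal approximation and tie correction, using only the Python standard library. The function name is illustrative, the choice of approximation is an assumption about a typical setup rather than a description of the SPSS run, and the per-chromosome counts are hypothetical toy values, not the data of Table I. The one-sample Kolmogorov-Smirnov test is not shown.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test (two-sided, normal approximation).

    Zero differences are dropped and tied |differences| share an average rank.
    Returns (W_plus, W_minus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over the run of equal absolute differences
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, w_minus, p_value

# Hypothetical AD and AR disease counts for six chromosomes (placeholders)
toy_ad = [1, 0, 2, 1, 0, 3]
toy_ar = [2, 3, 2, 4, 1, 1]
w_plus, w_minus, p_value = wilcoxon_signed_rank(toy_ad, toy_ar)
print(w_plus, w_minus, round(p_value, 3))
```

With only a few pairs, as in this toy input, an exact test would be preferable to the normal approximation; the approximation is used here only to keep the sketch short.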
Table I. Tag SNPs across autosomal chromosomes and related OMIM diseases.
It may appear strange to observe peaks of AD disorders on chromosomes 16 (n=4) and 19 (n=4) (Figure 1), because these chromosomes, which are among those with the highest density of SNP hotspots (1), would not typically be expected to harbor such high numbers of AD disorders. One possible explanation is that the mutations related to AD diseases corresponding to the tag SNPs of chromosomes 16 and 19 are evolutionarily novel de novo “dominant” deleterious variants that have not yet been removed from the population genomic repository (1). It is worth mentioning that chromosomes 16 and 19 have been characterized as having a significantly high local de novo mutation (DNM) rate, probably due to atypical mutational processes (7). Furthermore, chromosomes 8, 9, 16, and 19 have been reported to have the highest local DNM rates (7), in accordance with my previous classification of chromosomes by SNP hotspot density, in which they were also noted as the most polymorphic (1). To exclude the bias of their high DNM rates and to isolate the possible effect of SNP hotspots, I repeated the Wilcoxon signed-rank test without the AD and AR disease data of chromosomes 8, 9, 16, and 19. The analysis revealed a highly significant difference in the number of both AD and AR diseases among the remaining chromosomes (p=0.005).
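The exclusion step can be sketched as a simple filter over per-chromosome counts before the paired test is repeated. The chromosome order and the set of high-DNM chromosomes are taken from the text; the function name is illustrative, and only the two AD peaks stated in the text are filled in. All other counts default to 0 as placeholders and would need to be taken from Table I.

```python
# Chromosomes ordered from lowest to highest SNP hotspot density, as listed in ref (1)
DENSITY_ORDER = [13, 1, 14, 21, 18, 2, 20, 5, 12, 15, 17,
                 6, 10, 3, 11, 22, 4, 7, 9, 19, 8, 16]
# Chromosomes reported to have the highest local DNM rates, ref (7)
HIGH_DNM = frozenset({8, 9, 16, 19})

def paired_counts(ad_counts, ar_counts, exclude=frozenset()):
    """Return kept chromosomes with paired AD/AR counts, in density order."""
    kept = [c for c in DENSITY_ORDER if c not in exclude]
    ad = [ad_counts.get(c, 0) for c in kept]
    ar = [ar_counts.get(c, 0) for c in kept]
    return kept, ad, ar

# Only the AD peaks given in the text are entered; everything else is a placeholder
ad_counts = {16: 4, 19: 4}
ar_counts = {}  # placeholder: to be filled from Table I

kept, ad, ar = paired_counts(ad_counts, ar_counts, exclude=HIGH_DNM)
print(len(kept), kept)
```

The resulting `ad` and `ar` lists cover the 18 remaining chromosomes and are the paired inputs a repeated signed-rank test would take.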
Figure 1. The number of autosomal dominant (AD) and autosomal recessive (AR) diseases in chromosomal regions captured by tag SNPs.
In conclusion, the present short report underscores the significance of SNP haplotypic data in influencing the Mendelian or complex mode of inheritance, and their potential use in explaining inconsistencies in variant classification. Despite the sophisticated analysis and interpretation methods developed for variant classification based on evidence from population, computational, functional, and segregation data (8), variant classification often remains inconsistent among scientists (9, 10). The American College of Medical Genetics and Genomics (ACMG) recommends five variant classification categories (pathogenic, likely pathogenic, uncertain significance, likely benign, and benign) (11). Newly developed artificial intelligence (AI) algorithms should make use of SNP haplotypic phase as an evolutionary genetic clock and an informative index of crucial chromosomal regions, since the generation and expression of genetic variants at a specific genomic locus are affected by genomic features of the surrounding region, such as adjacent nucleotide context, histone and chromatin modifications, DNA replication timing, chromatin recombination, and de novo mutation rates (1, 7, 12). Therefore, loci with a high density of SNP hotspots not only point to chromosomal regions susceptible to mutability and to specific mutagenic mechanisms but, as I have previously presented, can also affect mutation expression (1, 12). By leveraging the wealth of knowledge regarding human genome features, AI algorithms could be the key step toward achieving a more uniform variant classification and could shed light on previously elusive heritability factors.
Footnotes
Conflicts of Interest
The Author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
- Received January 10, 2024.
- Revision received February 1, 2024.
- Accepted February 2, 2024.
- Copyright © 2024 International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC-ND) 4.0 international license (https://creativecommons.org/licenses/by-nc-nd/4.0).