ESP Pangenome

Citation:

@article {pmid42495135,
year = {2026},
author = {Eripogu, KK and Maharathi, P and Li, WH},
title = {Comparative genomics of Nocardia seriolae reveals a conserved metabolic core and extensive accessory genome plasticity.},
journal = {Frontiers in microbiology},
volume = {17},
number = {},
pages = {1881314},
pmid = {42495135},
issn = {1664-302X},
abstract = {INTRODUCTION: Fish nocardiosis is a chronic and economically significant bacterial disease in aquaculture, yet its genomic basis remains poorly resolved beyond single-species studies. It remains unclear whether fish-associated Nocardia share conserved persistence-associated features or exhibit lineage-specific genomic diversification.

MATERIALS AND METHODS: We conducted a comparative genomic analysis of 22 Nocardia genomes, including 20 N. seriolae isolates and single representatives of N. salmonicida and N. crassostreae. Genome-wide analyses included phylogenomics, gene-content comparison, pangenome analysis, functional annotation, virulence-associated homolog screening, genomic island detection, and secondary biosynthetic gene cluster prediction.

RESULTS: The conserved genome core was enriched in central metabolism, lipid-associated cell envelope biogenesis, iron acquisition, and stress-response pathways. Virulence-associated homologs were dominated by persistence-associated and metabolic functions, whereas classical toxin systems were limited, although several transport- and secretion-associated homologs were detected, consistent with their potential contribution to host interaction and intracellular persistence. Phylogenomic and gene-content analyses revealed clear species-level divergence but limited host-associated structuring within N. seriolae. Pangenome analysis supported a robust open pangenome structure (γ = 0.386), with extensive accessory gene diversity enriched in regulatory functions, mobile genetic elements, and secondary metabolic pathways. Genomic islands were dominated by insertion-sequence-associated genes, recombinases, regulators, and hypothetical proteins, whereas prophage- and toxin-related signatures were rare. Secondary metabolite analysis revealed extensive biosynthetic diversity, with most biosynthetic gene clusters showing low similarity to characterized reference pathways. However, ectoine- and nocobactin-associated pathways were broadly conserved.

CONCLUSION: These genome findings are consistent with a persistence-associated pathogenicity model in which fish-associated Nocardia, particularly N. seriolae, may depend more on metabolic resilience, stress adaptation, iron acquisition, and accessory genome plasticity than on classical toxin-mediated virulence. Collectively, the results highlight the importance of accessory genome diversification, iron acquisition, and stress adaptation in shaping host-associated lifestyles and provide a comparative genomic foundation for future functional investigations and aquaculture disease-management strategies.},
}

RevDate: 2026-07-23

Whelan FJ (2026)

How the social lives of bacteria affect their pangenome.

Essays in biochemistry pii:237850 [Epub ahead of print].

Although the study of microbes started with type strains and reference genomes, advances in sequencing technology and new interest in mixed microbial communities have made us aware that a single genome cannot and does not reflect the diversity of a given bacterial species. Bacteria rarely occupy an environmental or host niche alone and quickly diversify into strains upon colonization of a new niche. The genetic diversity present within a phylogenetically related set of bacterial strains (the 'pangenome') is influenced by the niche that they occupy and how they interact with the other microorganisms that they share that niche with. In this review, I examine how the social lives of bacteria can affect their genetic diversity and the bioinformatic techniques that we use to detect that diversity.

Additional Links: PMID-42488935

Citation:

@article {pmid42488935,
year = {2026},
author = {Whelan, FJ},
title = {How the social lives of bacteria affect their pangenome.},
journal = {Essays in biochemistry},
volume = {},
number = {},
pages = {},
doi = {10.1042/EBC20250039},
pmid = {42488935},
issn = {1744-1358},
support = {MR/Y016343/1//UK Research and Innovation (UKRI)/ ; SBF009\1062//Academy of Medical Sciences (The Academy of Medical Sciences)/ ; },
abstract = {Although the study of microbes started with type strains and reference genomes, advances in sequencing technology and new interest in mixed microbial communities have made us aware that a single genome cannot and does not reflect the diversity of a given bacterial species. Bacteria rarely occupy an environmental or host niche alone and quickly diversify into strains upon colonization of a new niche. The genetic diversity present within a phylogenetically related set of bacterial strains (the 'pangenome') is influenced by the niche that they occupy and how they interact with the other microorganisms that they share that niche with. In this review, I examine how the social lives of bacteria can affect their genetic diversity and the bioinformatic techniques that we use to detect that diversity.},
}

RevDate: 2026-07-23

Lao J, Zhang L, Huang X, et al (2026)

Longitudinal surveillance of antibiotic resistance and virulence evolution in Clostridioides difficile: a 4-year retrospective study of hospitalized patients in a tertiary hospital in China.

Microbiology spectrum [Epub ahead of print].

UNLABELLED: Clostridioides difficile (C. difficile) is the primary pathogen responsible for nosocomial infectious diarrhea and pseudomembranous colitis. In China, metronidazole and vancomycin are the preferred treatments for C. difficile infection (CDI). This study aimed to investigate the evolution of vancomycin (VA) and metronidazole (MTZ) resistance, as well as the longitudinal changes in virulence over time, using next-generation sequencing, drug susceptibility tests, and analysis of resistance and virulence genes. Additionally, we monitored the emergence of the highly virulent C. difficile strain RT027 and the spread and potential outbreak of C. difficile in the hospital setting. A random stratified sampling method was used to select 114 fecal samples from inpatients at Affiliated Hangzhou First People's Hospital, School of Medicine, Westlake University, between 2021 and 2024. Clinical data from the enrolled patients were also collected. We conducted antigen and toxin protein detection for C. difficile, strain isolation and identification, drug sensitivity tests, whole genome sequencing, and bioinformatics analysis. This included comparisons of drug resistance genes, detection of toxin genes, and the construction of phylogenetic trees based on pan-genome analysis to investigate the resistance and toxin gene variations in C. difficile. Among the 114 samples collected from Affiliated Hangzhou First People's Hospital, School of Medicine, Westlake University, no vancomycin- or metronidazole-resistant strains were identified. However, the average minimum inhibitory concentration (MIC) of C. difficile to vancomycin increased annually (H = 33.208, P < 0.05). The average MIC of C. difficile to metronidazole was highest in 2022 but decreased in 2023 and 2024 (H = 41.990, P < 0.05). Notably, in 2024, one C. difficile strain exhibited an MIC for metronidazole at the resistance threshold (2.00 μg/mL). Further Spearman correlation analysis of the strain years with drug sensitivity results revealed a positive correlation between strain years and the MIC levels of vancomycin and metronidazole (r = 0.528, P < 0.05; r = 0.377, P < 0.05). The proportion of toxin-producing strains increased annually, with 100% of strains in 2024 producing toxins, representing the highest proportion compared to the previous three years (χ² = 11.75, P < 0.05). Both vancomycin and metronidazole remain effective for the treatment of CDI in clinical practice. However, the sensitivity of C. difficile to these two drugs is gradually decreasing, and the rate of toxin gene carriage is also rising in clinical cases. No hospital outbreaks of C. difficile infections were identified in this study.

IMPORTANCE: Clostridioides difficile has developed resistance to multiple antibiotics, including cephalosporins, clindamycin, and fluoroquinolones. This has exacerbated the global antibiotic resistance crisis. In China, according to current treatment guidelines, vancomycin and metronidazole are the preferred first-line drugs for treating C. difficile infections. However, there are reports indicating the emergence of new resistance to both vancomycin and metronidazole. Although there is extensive research on the long-term antibiotic resistance of C. difficile abroad, research on the continuous monitoring of antibiotic resistance and potential outbreaks of C. difficile in China is relatively limited. To fill this gap, we studied positive C. difficile strains from a tertiary general hospital in China. Through Next-Generation Sequencing (NGS), drug sensitivity testing, and analysis of drug resistance and virulence genes, we revealed the evolution of C. difficile's resistance to vancomycin and metronidazole, as well as changes in virulence, and monitored the spread within the hospital and potential outbreaks of C. difficile.

Additional Links: PMID-42489458

Citation:

@article {pmid42489458,
year = {2026},
author = {Lao, J and Zhang, L and Huang, X and Du, G and Yang, W and Lin, B and Wu, S and Zhao, H and Xiang, G and Wang, L and Wang, X},
title = {Longitudinal surveillance of antibiotic resistance and virulence evolution in Clostridioides difficile: a 4-year retrospective study of hospitalized patients in a tertiary hospital in China.},
journal = {Microbiology spectrum},
volume = {},
number = {},
pages = {e0049426},
doi = {10.1128/spectrum.00494-26},
pmid = {42489458},
issn = {2165-0497},
abstract = {UNLABELLED: Clostridioides difficile (C. difficile) is the primary pathogen responsible for nosocomial infectious diarrhea and pseudomembranous colitis. In China, metronidazole and vancomycin are the preferred treatments for C. difficile infection (CDI). This study aimed to investigate the evolution of vancomycin (VA) and metronidazole (MTZ) resistance, as well as the longitudinal changes in virulence over time, using next-generation sequencing, drug susceptibility tests, and analysis of resistance and virulence genes. Additionally, we monitored the emergence of the highly virulent C. difficile strain RT027 and the spread and potential outbreak of C. difficile in the hospital setting. A random stratified sampling method was used to select 114 fecal samples from inpatients at Affiliated Hangzhou First People's Hospital, School of Medicine, Westlake University, between 2021 and 2024. Clinical data from the enrolled patients were also collected. We conducted antigen and toxin protein detection for C. difficile, strain isolation and identification, drug sensitivity tests, whole genome sequencing, and bioinformatics analysis. This included comparisons of drug resistance genes, detection of toxin genes, and the construction of phylogenetic trees based on pan-genome analysis to investigate the resistance and toxin gene variations in C. difficile. Among the 114 samples collected from Affiliated Hangzhou First People's Hospital, School of Medicine, Westlake University, no vancomycin- or metronidazole-resistant strains were identified. However, the average minimum inhibitory concentration (MIC) of C. difficile to vancomycin increased annually (H = 33.208, P < 0.05). The average MIC of C. difficile to metronidazole was highest in 2022 but decreased in 2023 and 2024 (H = 41.990, P < 0.05). Notably, in 2024, one C. difficile strain exhibited an MIC for metronidazole at the resistance threshold (2.00 μg/mL). Further Spearman correlation analysis of the strain years with drug sensitivity results revealed a positive correlation between strain years and the MIC levels of vancomycin and metronidazole (r = 0.528, P < 0.05; r = 0.377, P < 0.05). The proportion of toxin-producing strains increased annually, with 100% of strains in 2024 producing toxins, representing the highest proportion compared to the previous three years (χ² = 11.75, P < 0.05). Both vancomycin and metronidazole remain effective for the treatment of CDI in clinical practice. However, the sensitivity of C. difficile to these two drugs is gradually decreasing, and the rate of toxin gene carriage is also rising in clinical cases. No hospital outbreaks of C. difficile infections were identified in this study.

IMPORTANCE: Clostridioides difficile has developed resistance to multiple antibiotics, including cephalosporins, clindamycin, and fluoroquinolones. This has exacerbated the global antibiotic resistance crisis. In China, according to current treatment guidelines, vancomycin and metronidazole are the preferred first-line drugs for treating C. difficile infections. However, there are reports indicating the emergence of new resistance to both vancomycin and metronidazole. Although there is extensive research on the long-term antibiotic resistance of C. difficile abroad, research on the continuous monitoring of antibiotic resistance and potential outbreaks of C. difficile in China is relatively limited. To fill this gap, we studied positive C. difficile strains from a tertiary general hospital in China. Through Next-Generation Sequencing (NGS), drug sensitivity testing, and analysis of drug resistance and virulence genes, we revealed the evolution of C. difficile's resistance to vancomycin and metronidazole, as well as changes in virulence, and monitored the spread within the hospital and potential outbreaks of C. difficile.},
}

RevDate: 2026-07-23
CmpDate: 2026-07-23

Liu Q, Lian J, Tang H (2026)

Evolutionary and pan-genomic analysis of the bZIP gene family in 21 Camellia sinensis.

Frontiers in plant science, 17:1885680.

Basic leucine zipper (bZIP) transcription factors are important regulators of plant development and stress responses, yet their evolutionary dynamics in tea plant have largely been inferred from a single reference genome. Here, we performed a broad evolutionary and pan-genomic analysis of the bZIP family using 1,015 plant genomes and 21 Camellia sinensis genomes. Across plants, 81,340 bZIP genes were identified, revealing broad conservation of this family across major lineages and a significant copy-number expansion in angiosperms. In tea, 1,635 non-redundant bZIP genes were identified, with 73-88 members per genome, indicating an overall conserved family size among tea germplasms. Phylogenetic analysis classified these genes into 13 subfamilies, among which S, A, D, I and G represented the major expanded groups. Orthogroup analysis resolved 77 bZIP orthogroups, including 22 core orthogroups and 55 dispensable orthogroups, suggesting substantial hidden variation despite stable total gene numbers. WGD/segmental duplication was the dominant expansion mechanism, accounting for 59.20% of tea bZIP genes, followed by dispersed duplication. Copy-number variation was widespread, with 72 of 77 orthogroups showing CNV across genomes. Most homologous gene pairs evolved under purifying selection, whereas dispensable genes exhibited relatively relaxed constraints compared with core genes. Transcriptome analysis in 'Shuchazao' revealed tissue-biased expression and divergent drought responses, with A, S, I and M subfamilies showing stronger PEG-induced responsiveness. Together, these results establish a pan-genome-informed bZIP resource and highlight CNV, duplication mode and dispensable gene variation as potential drivers of tea bZIP diversification.

Additional Links: PMID-42490954

Citation:

@article {pmid42490954,
year = {2026},
author = {Liu, Q and Lian, J and Tang, H},
title = {Evolutionary and pan-genomic analysis of the bZIP gene family in 21 Camellia sinensis.},
journal = {Frontiers in plant science},
volume = {17},
number = {},
pages = {1885680},
pmid = {42490954},
issn = {1664-462X},
abstract = {Basic leucine zipper (bZIP) transcription factors are important regulators of plant development and stress responses, yet their evolutionary dynamics in tea plant have largely been inferred from a single reference genome. Here, we performed a broad evolutionary and pan-genomic analysis of the bZIP family using 1,015 plant genomes and 21 Camellia sinensis genomes. Across plants, 81,340 bZIP genes were identified, revealing broad conservation of this family across major lineages and a significant copy-number expansion in angiosperms. In tea, 1,635 non-redundant bZIP genes were identified, with 73-88 members per genome, indicating an overall conserved family size among tea germplasms. Phylogenetic analysis classified these genes into 13 subfamilies, among which S, A, D, I and G represented the major expanded groups. Orthogroup analysis resolved 77 bZIP orthogroups, including 22 core orthogroups and 55 dispensable orthogroups, suggesting substantial hidden variation despite stable total gene numbers. WGD/segmental duplication was the dominant expansion mechanism, accounting for 59.20% of tea bZIP genes, followed by dispersed duplication. Copy-number variation was widespread, with 72 of 77 orthogroups showing CNV across genomes. Most homologous gene pairs evolved under purifying selection, whereas dispensable genes exhibited relatively relaxed constraints compared with core genes. Transcriptome analysis in 'Shuchazao' revealed tissue-biased expression and divergent drought responses, with A, S, I and M subfamilies showing stronger PEG-induced responsiveness. Together, these results establish a pan-genome-informed bZIP resource and highlight CNV, duplication mode and dispensable gene variation as potential drivers of tea bZIP diversification.},
}

RevDate: 2026-07-24

McKindles K, Seto K, Ahrendt S, et al (2026)

Single-cell genomics, metagenomics, and transcriptomics of Rhizophydium megarrhizum, an obligate fungal parasite of Planktothrix agardhii.

Aquatic ecology, 60(3):92.

UNLABELLED: Chytrids (phylum Chytridiomycota) are zoosporic fungi that play key roles as parasites of aquatic microorganisms, yet they are understudied and genomic resources for algal-infecting chytrids remain scarce. Here, we present the first comparative genomic analysis of multiple isolates of a single chytrid species (order Rhizophydiales) infecting the cyanobacterium Planktothrix agardhii. Isolates were collected from Sandusky Bay, Lake Erie, across two bloom years (2018 and 2019). Using single cell sequencing and metagenomic assembly, we generated individual genomes averaging 15.36 ± 0.12 Mbp in size with ~ 75% completeness, and a pangenome. Gene ontology analyses highlighted the presence of categories related to cellular structure, biosynthetic regulation, and interspecies interactions. As a preliminary exploration of gene expression during infection, we also performed RNA sequencing on a subset of size-sorted samples. These data suggest that chytrids consistently express high levels of cytoskeletal genes, alongside numerous hypothetical proteins, and that zoospores may upregulate carbohydrate-binding proteins implicated in host recognition. On the host side, P. agardhii showed transcriptional shifts in pathways associated with buoyancy and nutrient acquisition, patterns that could represent defensive adjustments or parasite-driven manipulation. Together, this study generates reference genomes for Planktothrix-infective chytrids, identifies conserved gene content across isolates from different bloom years, and provides preliminary transcriptomic insights into parasite and host responses. These resources lay the foundation for deeper investigations into chytrid genome evolution, infection biology, and their ecological roles in shaping cyanobacterial bloom dynamics.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s10452-026-10329-8.

Additional Links: PMID-42494985

Citation:

@article {pmid42494985,
year = {2026},
author = {McKindles, K and Seto, K and Ahrendt, S and Salamov, A and Chovatia, M and Wang, M and Barry, K and Grigoriev, IV and McKay, RM and James, TY},
title = {Single-cell genomics, metagenomics, and transcriptomics of Rhizophydium megarrhizum, an obligate fungal parasite of Planktothrix agardhii.},
journal = {Aquatic ecology},
volume = {60},
number = {3},
pages = {92},
pmid = {42494985},
issn = {1386-2588},
abstract = {UNLABELLED: Chytrids (phylum Chytridiomycota) are zoosporic fungi that play key roles as parasites of aquatic microorganisms, yet they are understudied and genomic resources for algal-infecting chytrids remain scarce. Here, we present the first comparative genomic analysis of multiple isolates of a single chytrid species (order Rhizophydiales) infecting the cyanobacterium Planktothrix agardhii. Isolates were collected from Sandusky Bay, Lake Erie, across two bloom years (2018 and 2019). Using single cell sequencing and metagenomic assembly, we generated individual genomes averaging 15.36 ± 0.12 Mbp in size with ~ 75% completeness, and a pangenome. Gene ontology analyses highlighted the presence of categories related to cellular structure, biosynthetic regulation, and interspecies interactions. As a preliminary exploration of gene expression during infection, we also performed RNA sequencing on a subset of size-sorted samples. These data suggest that chytrids consistently express high levels of cytoskeletal genes, alongside numerous hypothetical proteins, and that zoospores may upregulate carbohydrate-binding proteins implicated in host recognition. On the host side, P. agardhii showed transcriptional shifts in pathways associated with buoyancy and nutrient acquisition, patterns that could represent defensive adjustments or parasite-driven manipulation. Together, this study generates reference genomes for Planktothrix-infective chytrids, identifies conserved gene content across isolates from different bloom years, and provides preliminary transcriptomic insights into parasite and host responses. These resources lay the foundation for deeper investigations into chytrid genome evolution, infection biology, and their ecological roles in shaping cyanobacterial bloom dynamics.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s10452-026-10329-8.},
}

RevDate: 2026-07-24
CmpDate: 2026-07-24

Sebastian PJ, Schlesener C, Byrne BA, et al (2026)

Beyond AMR and virulence databases: genome wide associations of closely related Vibrio alginolyticus and emerging Vibrio diabolicus provide framework for identifying novel genetic markers.

Frontiers in microbiology, 17:1796882.

Vibrio alginolyticus is a frequently implicated species for vibriosis in humans and diverse wildlife, but it has previously been difficult to distinguish from the closely related and emerging Vibrio diabolicus. Comparisons of both species, including antimicrobial resistance (AMR) and virulence characterizations, are scarce and impeded by intraspecies diversity, minimal genomes, discordant classification methods, and gene databases with limited utility to understudied species. The species identities of 3,442 public domain genomes (SRA files) within the Harveyi clade were re-evaluated using genomic methods. Public genomes identified as V. diabolicus and V. alginolyticus were combined with previously published genomes isolated from humans, sea otters (Enhydra lutris), or coastal environments (V. diabolicus n = 88, V. alginolyticus n = 163, Vibrio parahaemolyticus n = 287) for pangenome-wide association studies to identify species-specific gene clusters (95% identification threshold). Additional genome wide associations with isolation source (humans versus sea otters) were investigated, including AMR and virulence related gene clusters. Genomic reclassification identified 29 of 150 misclassified public domain V. alginolyticus genomes, including 26 reclassified as V. diabolicus. In total, 28 previously misclassified V. diabolicus genomes (n = 37 total) were identified, including 10 human-derived strains. GWAS identified 643 and 477 gene clusters specific to V. alginolyticus and V. diabolicus, respectively, while some multilocus sequence analysis (MLSA) gene clusters were non-specific. Gene clusters (n = 109) associated with either V. alginolyticus isolated from humans or sea otters were identified including one annotated to a multidrug resistance gene (mdtk_1). No V. diabolicus gene clusters were associated with host species after multiple comparison correction, although pre-correction associations related to antimicrobial resistance were detected (cat_1, ampC). The genomic methods of classification presented provide accurate species identification for V. diabolicus and V. alginolyticus beyond current MLSA/MLST schemes, although target species-specific genes were identified that may be useful for improved future schemes. While limited sample size of V. diabolicus hampered the ability to detect host associated markers, the GWAS approach employed provides a reusable framework for discovering insights into host adaptation and prioritizing target genes for future functional AMR and virulence validation experiments in both species.

Additional Links: PMID-42495129

Citation:

@article {pmid42495129,
year = {2026},
author = {Sebastian, PJ and Schlesener, C and Byrne, BA and Miller, M and Weimer, BC and Johnson, CK},
title = {Beyond AMR and virulence databases: genome wide associations of closely related Vibrio alginolyticus and emerging Vibrio diabolicus provide framework for identifying novel genetic markers.},
journal = {Frontiers in microbiology},
volume = {17},
number = {},
pages = {1796882},
pmid = {42495129},
issn = {1664-302X},
abstract = {Vibrio alginolyticus is a frequently implicated species for vibriosis in humans and diverse wildlife, but it has previously been difficult to distinguish from the closely related and emerging Vibrio diabolicus. Comparisons of both species, including antimicrobial resistance (AMR) and virulence characterizations, are scarce and impeded by intraspecies diversity, minimal genomes, discordant classification methods, and gene databases with limited utility to understudied species. The species identities of 3,442 public domain genomes (SRA files) within the Harveyi clade were re-evaluated using genomic methods. Public genomes identified as V. diabolicus and V. alginolyticus were combined with previously published genomes isolated from humans, sea otters (Enhydra lutris), or coastal environments (V. diabolicus n = 88, V. alginolyticus n = 163, Vibrio parahaemolyticus n = 287) for pangenome-wide association studies to identify species-specific gene clusters (95% identification threshold). Additional genome wide associations with isolation source (humans versus sea otters) were investigated, including AMR and virulence related gene clusters. Genomic reclassification identified 29 of 150 misclassified public domain V. alginolyticus genomes, including 26 reclassified as V. diabolicus. In total, 28 previously misclassified V. diabolicus genomes (n = 37 total) were identified, including 10 human-derived strains. GWAS identified 643 and 477 gene clusters specific to V. alginolyticus and V. diabolicus, respectively, while some multilocus sequence analysis (MLSA) gene clusters were non-specific. Gene clusters (n = 109) associated with either V. alginolyticus isolated from humans or sea otters were identified including one annotated to a multidrug resistance gene (mdtk_1). No V. diabolicus gene clusters were associated with host species after multiple comparison correction, although pre-correction associations related to antimicrobial resistance were detected (cat_1, ampC). The genomic methods of classification presented provide accurate species identification for V. diabolicus and V. alginolyticus beyond current MLSA/MLST schemes, although target species-specific genes were identified that may be useful for improved future schemes. While limited sample size of V. diabolicus hampered the ability to detect host associated markers, the GWAS approach employed provides a reusable framework for discovering insights into host adaptation and prioritizing target genes for future functional AMR and virulence validation experiments in both species.},
}

RevDate: 2026-07-21
CmpDate: 2026-07-21

Awuah D, Hounkpe A, Anane-Asamoah J, et al (2026)

The use of artificial intelligence in advancing molecular biology in Africa: a narrative review.

Molecular genetics and genomics : MGG, 301(1):.

Artificial intelligence (AI) is rapidly becoming a core methodological pillar of molecular biology and precision medicine, and Africa is a uniquely consequential setting for this transition because the continent combines the world's greatest human genomic diversity with the most severe underrepresentation of that diversity in the datasets and reference resources on which AI models are built and benchmarked. This narrative review examines, for a genetics and genomics readership, where AI-driven methods are already strengthening African molecular biology, where the supporting evidence remains preliminary, and what is required to translate technical capability into scientifically robust and equitable benefit. The central argument is that AI is especially consequential in African molecular biology, not simply because it automates analysis, but because it can help unlock insight from African genomic diversity, pathogen biology, and clinically relevant multi-omics data that remain underrepresented in global models. Across core molecular domains, AI is accelerating protein structure prediction, high-throughput variant calling and pan-genomic reference construction, genome-wide association analysis, transcriptomic interpretation, drug discovery, and CRISPR guide design. African initiatives such as H3Africa, the African Genome Variation Project, H3ABioNet, and the H3D Centre show that locally generated datasets and African-led computational pipelines can already support meaningful discovery, from improved variant interpretation to structure-guided therapeutic prioritization. At the same time, persistent barriers remain, including underrepresentation of African genomes in training data and reference genomes, uneven computational infrastructure, limited interdisciplinary training, fragmented governance, and the risk that AI-derived benefits will remain inaccessible to the populations whose data enable them. We conclude that the future impact of AI in African molecular biology will depend less on adopting global tools in the abstract and more on building African-led datasets, validation pipelines, governance frameworks, and translational pathways that make molecular discovery both scientifically robust and equitably useful. Looking ahead, the central perspective offered by this review is that Africa's exceptional genomic diversity should be treated as a scientific asset rather than an analytical liability: realising this will require population-representative pan-genome references, sustained computational capacity, and governance structures that ensure African populations are not only the source of the underlying data but also the principal beneficiaries of the discoveries it enables.

Additional Links: PMID-42481850

Citation:

@article {pmid42481850,
year = {2026},
author = {Awuah, D and Hounkpe, A and Anane-Asamoah, J and Dakubo, WK and Adu, IK and Barnie, PA and Ansah, EO and Fosu, K and Aidoo, CO and Essien, V and Adam, Y and Ninson, E and Quansah, R and Kyei, F},
title = {The use of artificial intelligence in advancing molecular biology in Africa: a narrative review.},
journal = {Molecular genetics and genomics : MGG},
volume = {301},
number = {1},
pages = {},
pmid = {42481850},
issn = {1617-4623},
mesh = {Humans ; *Artificial Intelligence/trends ; Africa ; *Molecular Biology/methods/trends ; Genomics/methods ; Genome, Human/genetics ; Genome-Wide Association Study ; },
abstract = {Artificial intelligence (AI) is rapidly becoming a core methodological pillar of molecular biology and precision medicine, and Africa is a uniquely consequential setting for this transition because the continent combines the world's greatest human genomic diversity with the most severe underrepresentation of that diversity in the datasets and reference resources on which AI models are built and benchmarked. This narrative review examines, for a genetics and genomics readership, where AI-driven methods are already strengthening African molecular biology, where the supporting evidence remains preliminary, and what is required to translate technical capability into scientifically robust and equitable benefit. The central argument is that AI is especially consequential in African molecular biology, not simply because it automates analysis, but because it can help unlock insight from African genomic diversity, pathogen biology, and clinically relevant multi-omics data that remain underrepresented in global models. Across core molecular domains, AI is accelerating protein structure prediction, high-throughput variant calling and pan-genomic reference construction, genome-wide association analysis, transcriptomic interpretation, drug discovery, and CRISPR guide design. African initiatives such as H3Africa, the African Genome Variation Project, H3ABioNet, and the H3D Centre show that locally generated datasets and African-led computational pipelines can already support meaningful discovery, from improved variant interpretation to structure-guided therapeutic prioritization. At the same time, persistent barriers remain, including underrepresentation of African genomes in training data and reference genomes, uneven computational infrastructure, limited interdisciplinary training, fragmented governance, and the risk that AI-derived benefits will remain inaccessible to the populations whose data enable them. 
We conclude that the future impact of AI in African molecular biology will depend less on adopting global tools in the abstract and more on building African-led datasets, validation pipelines, governance frameworks, and translational pathways that make molecular discovery both scientifically robust and equitably useful. Looking ahead, the central perspective offered by this review is that Africa's exceptional genomic diversity should be treated as a scientific asset rather than an analytical liability: realising this will require population-representative pan-genome references, sustained computational capacity, and governance structures that ensure African populations are not only the source of the underlying data but also the principal beneficiaries of the discoveries it enables.},
}

MeSH Terms:

Humans
*Artificial Intelligence/trends
Africa
*Molecular Biology/methods/trends
Genomics/methods
Genome, Human/genetics
Genome-Wide Association Study

RevDate: 2026-07-22
CmpDate: 2026-07-22

Biswas R, Sinha SS, Roy A, et al (2026)

An integrated subtractive genomics and immunoinformatics approach for designing a universal multi-epitope vaccine against Brucella spp.

Frontiers in bioinformatics, 6:1818265.

INTRODUCTION: Brucella spp. are Gram-negative bacteria accountable for brucellosis in immunocompromised individuals and livestock. Due to the slow-growing latent phenotype, current antibiotics are insufficient to treat the infection. The lack of an approved vaccine for human use against this pathogen represents a significant public health concern and indicates the urgent need for novel prophylactic interventions.

METHODOLOGY: In this study, the reverse vaccinology method was combined with pan-genome analysis to identify potential vaccine targets. Proteins have been screened for antigenicity, solubility, immunogenicity, and subcellular localization. B cell and T cell epitopes exhibiting high immunogenicity and solubility have been identified. Multi-epitope vaccine constructs have been evaluated and further analyzed depending on their physicochemical properties. Molecular docking, conformational dynamics, in silico cloning, and immune simulations were conducted to identify the optimal vaccine candidate.

RESULTS: Four proteins, trigger factor, outer membrane protein assembly factor BamA, urease subunit beta (UreB), and urease subunit alpha (UreC1) were considered for potential vaccine targets. A total of 26 B cell and 97 T cell epitopes with notable immunogenicity and solubility have been shortlisted. Twelve multi-epitope vaccine constructs were generated, among which Vc7 has been chosen based on structural and physicochemical properties. Molecular docking analysis revealed a good correlation with 2FSE and 2Z65, which were further analyzed to reveal that Vc7 exhibited stronger binding affinity (-135.24 kcal/mol) towards 2FSE, mediated by hydrophobic contacts, salt bridges, and intermolecular hydrogen bonds, making it the ideal vaccine complex and validated through a 150 ns molecular dynamics simulation. In silico cloning established construct compatibility, and immune simulation confirmed Vc7's potential to elicit T cell, B cell, antibody, and cytokine-mediated responses.

CONCLUSION: Vc7 has been identified as a structurally stable and highly immunogenic construct, suggesting its potential as a universal multi-epitope vaccine candidate for the prevention of brucellosis.

Additional Links: PMID-42482969

Citation:

@article {pmid42482969,
year = {2026},
author = {Biswas, R and Sinha, SS and Roy, A and Ramaiah, S and Anbarasu, A},
title = {An integrated subtractive genomics and immunoinformatics approach for designing a universal multi-epitope vaccine against Brucella spp.},
journal = {Frontiers in bioinformatics},
volume = {6},
number = {},
pages = {1818265},
pmid = {42482969},
issn = {2673-7647},
abstract = {INTRODUCTION: Brucella spp. are Gram-negative bacteria accountable for brucellosis in immunocompromised individuals and livestock. Due to the slow-growing latent phenotype, current antibiotics are insufficient to treat the infection. The lack of an approved vaccine for human use against this pathogen represents a significant public health concern and indicates the urgent need for novel prophylactic interventions.

METHODOLOGY: In this study, the reverse vaccinology method was combined with pan-genome analysis to identify potential vaccine targets. Proteins have been screened for antigenicity, solubility, immunogenicity, and subcellular localization. B cell and T cell epitopes exhibiting high immunogenicity and solubility have been identified. Multi-epitope vaccine constructs have been evaluated and further analyzed depending on their physicochemical properties. Molecular docking, conformational dynamics, in silico cloning, and immune simulations were conducted to identify the optimal vaccine candidate.

RESULTS: Four proteins, trigger factor, outer membrane protein assembly factor BamA, urease subunit beta (UreB), and urease subunit alpha (UreC1) were considered for potential vaccine targets. A total of 26 B cell and 97 T cell epitopes with notable immunogenicity and solubility have been shortlisted. Twelve multi-epitope vaccine constructs were generated, among which Vc7 has been chosen based on structural and physicochemical properties. Molecular docking analysis revealed a good correlation with 2FSE and 2Z65, which were further analyzed to reveal that Vc7 exhibited stronger binding affinity (-135.24 kcal/mol) towards 2FSE, mediated by hydrophobic contacts, salt bridges, and intermolecular hydrogen bonds, making it the ideal vaccine complex and validated through a 150 ns molecular dynamics simulation. In silico cloning established construct compatibility, and immune simulation confirmed Vc7's potential to elicit T cell, B cell, antibody, and cytokine-mediated responses.

CONCLUSION: Vc7 has been identified as a structurally stable and highly immunogenic construct, suggesting its potential as a universal multi-epitope vaccine candidate for the prevention of brucellosis.},
}

RevDate: 2026-07-22

Yu M, Wang Y, Jiang L, et al (2026)

Genomic identification, pangenome analysis, and antimicrobial resistance of Aeromonas spp. isolated from food and foodborne outbreaks.

International journal of food microbiology, 460:111979 pii:S0168-1605(26)00360-0 [Epub ahead of print].

Aeromonas spp. are extensively distributed across diverse aquatic environments and recognized as pathogens capable of causing diseases in aquatic animals. Pathogenic Aeromonas causes foodborne gastroenteritis in humans and can also lead to extra-intestinal infections. However, accurate identification of Aeromonas species remains challenging. This study aimed to accurately identify Aeromonas spp. and compare their virulence gene profiles, antimicrobial resistance patterns, and molecular evolutionary relationships. A total of 42 Aeromonas isolates were obtained from retail food and foodborne disease outbreaks. They were initially identified using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) and further confirmed by genomic methods. Average nucleotide identity (ANI) can accurately identify Aeromonas species. However, a higher ANI threshold is required to distinguish closely related species. The genus Aeromonas was found to possess an open pan-genome, enabling the acquisition of new genetic elements and enhancing environmental adaptability. All isolates encoded β-lactamase resistance genes, and 90.5% (38/42) of these were conferred resistance to ampicillin and amoxicillin-sulbactam, with the 95% confidence interval (CI) of 77.9%-96.2%. Some strains harbored antimicrobial resistance genes, such as mcr, tetE, sul, qnr, and so forth, and conferred resistance to the corresponding antibiotics. Some strains contained mobile elements carrying antimicrobial resistance gene clusters, such as transposon Tn5393 and antibiotic-resistant plasmids, providing mechanistic insights into their potential for horizontal antimicrobial gene transfer and adaptive evolution. Certain Aeromonas species possessed numerous virulence genes, including ast, hlyA, rtx, aerA, and hutX, and genes encoding flagellar, pili, and secretion systems. A. dhakensis, A. salmonicida, A. hydrophila, A. veronii, and A. enteropelogenes were predicted to have higher virulence potential. In contrast, A. caviae, the main Aeromonas species associated with foodborne disease outbreaks, exhibited relatively fewer virulence genes. This study emphasized the pathogenic potential and antimicrobial resistance profiles of Aeromonas species. Continuous monitoring of resistance patterns and contamination levels in food products is crucial for minimizing infection risks and preventing disease outbreaks caused by Aeromonas spp.
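
Editorial note on the reported interval: the abstract gives 90.5% (38/42) with a 95% CI of 77.9%-96.2% but does not name the interval method. The figures are consistent with a Wilson score interval, so the sketch below assumes that method; the function name `wilson_interval` is illustrative and is not taken from the paper.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score interval for a binomial proportion.

    Assumption: the abstract does not state which method was used;
    Wilson is chosen here because it reproduces the reported bounds.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / total + z ** 2 / (4 * total ** 2)
    ) / denom
    return centre - half_width, centre + half_width


# 38 resistant isolates out of 42, as reported in the abstract.
low, high = wilson_interval(38, 42)
print(f"{low:.1%} to {high:.1%}")  # prints "77.9% to 96.2%"
```

The Wilson interval is asymmetric around the point estimate, which is why the upper bound sits closer to 90.5% than the lower bound does.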

Additional Links: PMID-42485680

Citation:

@article {pmid42485680,
year = {2026},
author = {Yu, M and Wang, Y and Jiang, L and Liu, D and Hu, B and Yang, X},
title = {Genomic identification, pangenome analysis, and antimicrobial resistance of Aeromonas spp. isolated from food and foodborne outbreaks.},
journal = {International journal of food microbiology},
volume = {460},
number = {},
pages = {111979},
doi = {10.1016/j.ijfoodmicro.2026.111979},
pmid = {42485680},
issn = {1879-3460},
abstract = {Aeromonas spp. are extensively distributed across diverse aquatic environments and recognized as pathogens capable of causing diseases in aquatic animals. Pathogenic Aeromonas causes foodborne gastroenteritis in humans and can also lead to extra-intestinal infections. However, accurate identification of Aeromonas species remains challenging. This study aimed to accurately identify Aeromonas spp. and compare their virulence gene profiles, antimicrobial resistance patterns, and molecular evolutionary relationships. A total of 42 Aeromonas isolates were obtained from retail food and foodborne disease outbreaks. They were initially identified using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) and further confirmed by genomic methods. Average nucleotide identity (ANI) can accurately identify Aeromonas species. However, a higher ANI threshold is required to distinguish closely related species. The genus Aeromonas was found to possess an open pan-genome, enabling the acquisition of new genetic elements and enhancing environmental adaptability. All isolates encoded β-lactamase resistance genes, and 90.5% (38/42) of these were conferred resistance to ampicillin and amoxicillin-sulbactam, with the 95% confidence interval (CI) of 77.9%-96.2%. Some strains harbored antimicrobial resistance genes, such as mcr, tetE, sul, qnr, and so forth, and conferred resistance to the corresponding antibiotics. Some strains contained mobile elements carrying antimicrobial resistance gene clusters, such as transposon Tn5393 and antibiotic-resistant plasmids, providing mechanistic insights into their potential for horizontal antimicrobial gene transfer and adaptive evolution. Certain Aeromonas species possessed numerous virulence genes, including ast, hlyA, rtx, aerA, and hutX, and genes encoding flagellar, pili, and secretion systems. A. dhakensis, A. salmonicida, A. hydrophila, A. veronii, and A. enteropelogenes were predicted to have higher virulence potential. In contrast, A. caviae, the main Aeromonas species associated with foodborne disease outbreaks, exhibited relatively fewer virulence genes. This study emphasized the pathogenic potential and antimicrobial resistance profiles of Aeromonas species. Continuous monitoring of resistance patterns and contamination levels in food products is crucial for minimizing infection risks and preventing disease outbreaks caused by Aeromonas spp.},
}

RevDate: 2026-07-22
CmpDate: 2026-07-22

Tran TTH, Hoang TH, Tran MH, et al (2026)

VN1K is a pangenome-informed multi-omics and phenomics resource for the Vietnamese population.

Nature communications, 17(1):.

The population of Vietnam remains underrepresented in global genomic databases. Here, we present VN1K, a resource of multi-omics and phenotypic information for 1011 unrelated Vietnamese individuals. We present high-depth short-read whole-genome sequencing data for all samples along with various -omics datasets. Using a high-sensitivity variant detection pipeline, which includes a pangenome graph reference and a deep-learning framework, we identify approximately 42 million variants with 7 million short insertions/deletions and 90 thousand structural variants. VN1K also features a whole-genome methylation profile based on long read sequencing. We create a genotype imputation panel with high accuracy on the Vietnamese population, allowing us to identify variants with significantly different allele frequencies in the Vietnamese population compared to other populations. We establish the functional relevance of some of these variants, particularly those in genes associated with genetic disorders, immune diseases, and drug responses, by integrating the allele frequency differences with known genotype-phenotype associations and clinical annotations. Further, we map various loci related to hepatitis B virus infection, triglyceride levels, LDL-C levels, serum glucose levels, HbA1c levels, and levels of two liver enzymes (ALT and AST). The VN1K dataset is accessible via genome.vinbigdata.org, an integrated platform with both linear and graph-based genome browsers.

Additional Links: PMID-42486871

Citation:

@article {pmid42486871,
year = {2026},
author = {Tran, TTH and Hoang, TH and Tran, MH and Pham, TM and Nguyen, NN and Vu, GM and Duong, VC and Vu, QT and Nguyen, NT and Vu, HQ and Nguyen, TM and Nguyen, TK and Nguyen, SV and Dang, T and Nguyen, H and Do, T and Le, C and Nguyen, DT and Nguyen, HTT and Le, NQ and Nguyen, QH and Le, LT and Pham, T and Vu, DM and Le, HTT and Ngo, TD and Nguyen, LT and Hoang, Y and Dao, DX and Phan, GH and Tran, T and Tran, Q and Ha, CT and Nguyen, L and Luu, HN and Dao, M and Le, L and Le, VS and Thuy Duong, N and Nguyen, Q and Le, DH and Vu, V and Vo, NS},
title = {VN1K is a pangenome-informed multi-omics and phenomics resource for the Vietnamese population.},
journal = {Nature communications},
volume = {17},
number = {1},
pages = {},
pmid = {42486871},
issn = {2041-1723},
mesh = {Humans ; Vietnam ; Multiomics ; Gene Frequency ; *Phenomics/methods ; *Genome, Human/genetics ; Whole Genome Sequencing ; Polymorphism, Single Nucleotide ; Phenotype ; Genotype ; Genomics ; Databases, Genetic ; Southeast Asian People ; },
abstract = {The population of Vietnam remains underrepresented in global genomic databases. Here, we present VN1K, a resource of multi-omics and phenotypic information for 1011 unrelated Vietnamese individuals. We present high-depth short-read whole-genome sequencing data for all samples along with various -omics datasets. Using a high-sensitivity variant detection pipeline, which includes a pangenome graph reference and a deep-learning framework, we identify approximately 42 million variants with 7 million short insertions/deletions and 90 thousand structural variants. VN1K also features a whole-genome methylation profile based on long read sequencing. We create a genotype imputation panel with high accuracy on the Vietnamese population, allowing us to identify variants with significantly different allele frequencies in the Vietnamese population compared to other populations. We establish the functional relevance of some of these variants, particularly those in genes associated with genetic disorders, immune diseases, and drug responses, by integrating the allele frequency differences with known genotype-phenotype associations and clinical annotations. Further, we map various loci related to hepatitis B virus infection, triglyceride levels, LDL-C levels, serum glucose levels, HbA1c levels, and levels of two liver enzymes (ALT and AST). The VN1K dataset is accessible via genome.vinbigdata.org, an integrated platform with both linear and graph-based genome browsers.},
}

MeSH Terms:

Humans
Vietnam
Multiomics
Gene Frequency
*Phenomics/methods
*Genome, Human/genetics
Whole Genome Sequencing
Polymorphism, Single Nucleotide
Phenotype
Genotype
Genomics
Databases, Genetic
Southeast Asian People

RevDate: 2026-07-22

Bian J, Yang G, Xu D, et al (2026)

A pangenome of tetraploid wheat reveals the genetic architecture underlying domestication and genomic diversity for breeding.

Nature genetics [Epub ahead of print].

Tetraploid wheat (Triticum turgidum L., BBAA), a key pasta crop, serves as an untapped genetic resource with rich genomic diversity for hexaploid bread wheat improvement. Here we de novo assembled 12 genomes spanning all 10 recognized tetraploid wheat (genome BBAA) subspecies, and a graph-based pangenome was constructed. Chromosome rearrangements drove subgenome asymmetry and shaped genomic divergence, with an average of 0.25 million structural variations per accession, predominantly attributable to transposon activity. Using 736 globally distributed tetraploid wheat accessions, we identified locally adapted subgroups with untapped breeding potential and discovered a novel retrotransposon‑induced loss‑of‑function Btr1-A allele responsible for convergent adaptation of non-brittle rachis. Genome-wide association studies identified 287 loci associated with 32 traits. A homeodomain-leucine zipper transcription factor HAT14-B that enhances both spikelet number and grain size was identified. This subspecies-wide pangenome enriches Triticeae AB subgenome resources and facilitates the discovery and application of agronomically important genetic variations.

Additional Links: PMID-42486906

Citation:

@article {pmid42486906,
year = {2026},
author = {Bian, J and Yang, G and Xu, D and Zhang, Y and Jia, Y and Zhang, G and Qin, Z and Zhang, S and Wang, Y and Jiang, M and Pan, Y and Chen, B and Liu, F and Lu, Y and Ding, S and Wang, J and Yuan, C and Chen, Y and Liu, S and He, H and Wang, S and Zhu, Q and Chen, S and Sang, Q and Deng, XW and Mao, H and Nie, X and Song, B},
title = {A pangenome of tetraploid wheat reveals the genetic architecture underlying domestication and genomic diversity for breeding.},
journal = {Nature genetics},
volume = {},
number = {},
pages = {},
pmid = {42486906},
issn = {1546-1718},
support = {SYS202206//Natural Science Foundation of Shandong Province (Shandong Provincial Natural Science Foundation)/ ; 32372100//National Natural Science Foundation of China (National Science Foundation of China)/ ; awarded//National Natural Science Foundation of China (National Science Foundation of China)/ ; },
abstract = {Tetraploid wheat (Triticum turgidum L., BBAA), a key pasta crop, serves as an untapped genetic resource with rich genomic diversity for hexaploid bread wheat improvement. Here we de novo assembled 12 genomes spanning all 10 recognized tetraploid wheat (genome BBAA) subspecies, and a graph-based pangenome was constructed. Chromosome rearrangements drove subgenome asymmetry and shaped genomic divergence, with an average of 0.25 million structural variations per accession, predominantly attributable to transposon activity. Using 736 globally distributed tetraploid wheat accessions, we identified locally adapted subgroups with untapped breeding potential and discovered a novel retrotransposon‑induced loss‑of‑function Btr1-A allele responsible for convergent adaptation of non-brittle rachis. Genome-wide association studies identified 287 loci associated with 32 traits. A homeodomain-leucine zipper transcription factor HAT14-B that enhances both spikelet number and grain size was identified. This subspecies-wide pangenome enriches Triticeae AB subgenome resources and facilitates the discovery and application of agronomically important genetic variations.},
}

RevDate: 2026-07-22

Anonymous (2026)

A tetraploid wheat pangenome reveals genomic variation and domestication footprints.

Nature genetics [Epub ahead of print].

Additional Links: PMID-42486907

Citation:

@article {pmid42486907,
year = {2026},
author = {},
title = {A tetraploid wheat pangenome reveals genomic variation and domestication footprints.},
journal = {Nature genetics},
volume = {},
number = {},
pages = {},
pmid = {42486907},
issn = {1546-1718},
}

RevDate: 2026-07-22
CmpDate: 2026-07-22

Alfiky A, de la Rosa JMO, M Sadek (2026)

A dual-tier plasmid network model underpins the evolutionary success of pandemic Klebsiella pneumoniae ST11.

Scientific reports, 16(1):.

The convergence of antimicrobial resistance and hypervirulence in high-risk Klebsiella pneumoniae clones represents a major public health threat. However, evolutionary mechanisms enabling specific lineages to achieve pandemic dominance remain unclear. In this study, we integrated pangenomics and network analysis across 1,010 complete genomes from 38 countries. Species-wide dynamics revealed an extremely open pangenome (α = 0.59). In contrast, the dominant ST11 lineage, representing 30% of isolates, exhibited extremely low within-lineage phylogenetic diversity, consistent with a recent clonal expansion concentrated in East Asia. The East Asian ST11 lineage exhibited the lowest pangenome diversity (α = 0.86) associated with fixation of persistence and plasmid-stabilization systems and purging of redundant defense mechanisms. This configuration sustains a dual-tier plasmid network comprising a lineage-anchored IncFII(pHN7A8) replicon for vertical stability alongside high-connectivity hubs such as IncFIB(K) facilitating horizontal gene transfer. Chromosomal integration and tandem amplification of key resistance determinants (blaKPC-2, blaCTX-M-15) further reinforced this architecture. Consequently, 34.2% of isolates exhibited convergence of carbapenem resistance and hypervirulence. Within the East Asian ST11 clade, two dominant sub-lineages emerged: KL47:O13 (25.5%) and KL64:O2α (72%). Despite lower IncFII(pHN7A8) penetrance, KL64 became the dominant sub-lineage, indicating that factors beyond plasmid carriage, possibly including surface antigen properties, contribute to its epidemiological success. These findings indicate that ST11 success arises from synergy between species-wide pangenome openness and lineage-specific genomic optimization, and highlight plasmid network topology as a complementary framework for genomic surveillance of adaptive clonal expansion.
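
Editorial note on the openness exponent: the abstract reports α = 0.59 species-wide and α = 0.86 for the East Asian ST11 lineage but does not state the model behind α. A common convention, assumed here, is the power-law (Heaps' law) form in which the number of new genes found on adding the N-th genome follows n = κ·N^(−α), with α ≤ 1 read as an open pangenome and smaller α as more open. The sketch below fits α by least squares on log-transformed counts. The input data are synthetic, generated only to show that the fit recovers a known exponent; they are not the study's data, and `fit_heaps_alpha` and κ = 500 are illustrative choices.

```python
import math


def fit_heaps_alpha(genome_counts, new_genes):
    """Fit log(n) = log(kappa) - alpha * log(N) by ordinary least squares.

    Assumption: the power-law model n = kappa * N**(-alpha); the abstract
    does not say which formulation the authors used.
    """
    xs = [math.log(n) for n in genome_counts]
    ys = [math.log(g) for g in new_genes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    kappa = math.exp(mean_y - slope * mean_x)
    return -slope, kappa


# Synthetic new-gene counts drawn from an exact power law (not real data).
genomes = list(range(2, 51))
synthetic_new_genes = [500.0 * n ** -0.59 for n in genomes]

alpha, kappa = fit_heaps_alpha(genomes, synthetic_new_genes)
is_open = alpha <= 1.0  # under the assumed convention
print(f"alpha = {alpha:.2f}, kappa = {kappa:.0f}, open = {is_open}")
```

Under this convention both reported values (0.59 and 0.86) fall below 1, which matches the abstract's description of the ST11 lineage as less diverse than the species as a whole without being closed.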

Additional Links: PMID-42486965

Citation:

@article {pmid42486965,
year = {2026},
author = {Alfiky, A and de la Rosa, JMO and Sadek, M},
title = {A dual-tier plasmid network model underpins the evolutionary success of pandemic Klebsiella pneumoniae ST11.},
journal = {Scientific reports},
volume = {16},
number = {1},
pages = {},
pmid = {42486965},
issn = {2045-2322},
mesh = {*Klebsiella pneumoniae/genetics/pathogenicity/drug effects/classification ; *Plasmids/genetics ; *Klebsiella Infections/epidemiology/microbiology ; Phylogeny ; *Evolution, Molecular ; Pandemics ; Genome, Bacterial ; Humans ; Virulence/genetics ; Gene Transfer, Horizontal ; },
abstract = {The convergence of antimicrobial resistance and hypervirulence in high-risk Klebsiella pneumoniae clones represents a major public health threat. However, evolutionary mechanisms enabling specific lineages to achieve pandemic dominance remain unclear. In this study, we integrated pangenomics and network analysis across 1,010 complete genomes from 38 countries. Species-wide dynamics revealed an extremely open pangenome (α = 0.59). In contrast, the dominant ST11 lineage, representing 30% of isolates, exhibited extremely low within-lineage phylogenetic diversity, consistent with a recent clonal expansion concentrated in East Asia. The East Asian ST11 lineage exhibited the lowest pangenome diversity (α = 0.86) associated with fixation of persistence and plasmid-stabilization systems and purging of redundant defense mechanisms. This configuration sustains a dual-tier plasmid network comprising a lineage-anchored IncFII(pHN7A8) replicon for vertical stability alongside high-connectivity hubs such as IncFIB(K) facilitating horizontal gene transfer. Chromosomal integration and tandem amplification of key resistance determinants (blaKPC-2, blaCTX-M-15) further reinforced this architecture. Consequently, 34.2% of isolates exhibited convergence of carbapenem resistance and hypervirulence. Within the East Asian ST11 clade, two dominant sub-lineages emerged: KL47:O13 (25.5%) and KL64:O2α (72%). Despite lower IncFII(pHN7A8) penetrance, KL64 became the dominant sub-lineage, indicating that factors beyond plasmid carriage, possibly including surface antigen properties, contribute to its epidemiological success. These findings indicate that ST11 success arises from synergy between species-wide pangenome openness and lineage-specific genomic optimization, and highlight plasmid network topology as a complementary framework for genomic surveillance of adaptive clonal expansion.},
}

MeSH Terms:

*Klebsiella pneumoniae/genetics/pathogenicity/drug effects/classification
*Plasmids/genetics
*Klebsiella Infections/epidemiology/microbiology
Phylogeny
*Evolution, Molecular
Pandemics
Genome, Bacterial
Humans
Virulence/genetics
Gene Transfer, Horizontal

RevDate: 2026-07-23
CmpDate: 2026-07-23

Elsakhawy OK, Abouelkhair MA, SA Kania (2026)

Whole genome sequencing and molecular characterization of two Bacillus licheniformis strains isolated from hot springs in Yellowstone National Park.

Frontiers in bioinformatics, 6:1867986.

Bacillus licheniformis is a Gram-positive, endospore-forming bacterium with broad biotechnological applications. Thermophilic environments such as hot springs may harbor strains with unique biosynthetic capabilities relevant to drug discovery. In this study, we isolated two B. licheniformis strains (S3 and S4) from the Five Sisters hot spring in Yellowstone National Park (68 °C and 65 °C, pH 8) and performed whole-genome sequencing using both the Oxford Nanopore long read and Illumina platforms. Hybrid de novo assembly using Unicycler yielded genome sizes of 4.80 Mbp (S3, 14 contigs) and 4.79 Mbp (S4, 22 contigs); GC contents were 45.12% and 45.10%, and N50 values were 4,546,802 bp and 2,415,736 bp, for S3 and S4, respectively. Both strains were assigned to Multi-Locus Sequence Typing sequence type ST-42. Pangenome comparison with 61 complete B. licheniformis genomes revealed an open pangenome of 10,374 genes, with 3,272 core genes, 430 soft core, 1,250 shell, and 5,422 cloud genes. AMRFinderPlus identified the blaP, encoding a class A beta-lactamase and its regulatory elements (blaI and blaR1); erm(D), encoding a 23S rRNA methyltransferase conferring macrolide-lincosamide-streptogramin B resistance; and catA, encoding a chloramphenicol O-acetyltransferase that inactivates chloramphenicol through acetylation in both strains. A chromosomal arsBC locus was identified in both B. licheniformis S3 and S4, consistent with the arsenic-rich geothermal environment of Five Sisters hot spring. These findings highlight the biosynthetic potential of B. licheniformis strains isolated from extreme environments and provide a genomic foundation for future exploration of novel bioactive compounds with potential applications in drug discovery, agriculture, and biotechnology.

Additional Links: PMID-42487679

Citation:

@article {pmid42487679,
year = {2026},
author = {Elsakhawy, OK and Abouelkhair, MA and Kania, SA},
title = {Whole genome sequencing and molecular characterization of two Bacillus licheniformis strains isolated from hot springs in Yellowstone National Park.},
journal = {Frontiers in bioinformatics},
volume = {6},
number = {},
pages = {1867986},
pmid = {42487679},
issn = {2673-7647},
abstract = {Bacillus licheniformis is a Gram-positive, endospore-forming bacterium with broad biotechnological applications. Thermophilic environments such as hot springs may harbor strains with unique biosynthetic capabilities relevant to drug discovery. In this study, we isolated two B. licheniformis strains (S3 and S4) from the Five Sisters hot spring in Yellowstone National Park (68 °C and 65 °C, pH 8) and performed whole-genome sequencing using both the Oxford Nanopore long read and Illumina platforms. Hybrid de novo assembly using Unicycler yielded genome sizes of 4.80 Mbp (S3, 14 contigs) and 4.79 Mbp (S4, 22 contigs); GC contents were 45.12% and 45.10%, and N50 values were 4,546,802 bp and 2,415,736 bp, for S3 and S4, respectively. Both strains were assigned to Multi-Locus Sequence Typing sequence type ST-42. Pangenome comparison with 61 complete B. licheniformis genomes revealed an open pangenome of 10,374 genes, with 3,272 core genes, 430 soft core, 1,250 shell, and 5,422 cloud genes. AMRFinderPlus identified the blaP, encoding a class A beta-lactamase and its regulatory elements (blaI and blaR1); erm(D), encoding a 23S rRNA methyltransferase conferring macrolide-lincosamide-streptogramin B resistance; and catA, encoding a chloramphenicol O-acetyltransferase that inactivates chloramphenicol through acetylation in both strains. A chromosomal arsBC locus was identified in both B. licheniformis S3 and S4, consistent with the arsenic-rich geothermal environment of Five Sisters hot spring. These findings highlight the biosynthetic potential of B. licheniformis strains isolated from extreme environments and provide a genomic foundation for future exploration of novel bioactive compounds with potential applications in drug discovery, agriculture, and biotechnology.},
}

RevDate: 2026-07-23
CmpDate: 2026-07-23

Lagad RR, Rafi S, A Goswami (2026)

Genomic-island cassette architecture provides interpretable signal for exploratory classification of poultry-associated Enterococcus cecorum lineages.

Frontiers in microbiology, 17:1882753.

BACKGROUND: Enterococcus cecorum is an emerging poultry pathogen whose antimicrobial resistance and host-associated traits are often carried on genomic islands. Standard comparative genomics workflows usually reduce genomes to unordered gene inventories and may miss informative neighborhood structure within island-associated modules.

METHODS: We tested whether GI (genomic island)-anchored cassette organization provides signal for distinguishing pathogenic from commensal poultry-associated E. cecorum lineages. We encoded genomic-island-anchored cassette organization as 84 genome-level summary features and evaluated this representation in 145 genomes (95 commensal, 50 pathogenic) using locked 5-fold genome-grouped cross-validation.

RESULTS: The cassette-summary Random Forest model achieved an area under the receiver operating characteristic curve (AUROC) of 0.918 ± 0.067, outperforming GI burden (AUROC 0.791 ± 0.050) and assembly-quality (AUROC 0.743 ± 0.015) baselines and performing similarly to a corrected AMR gene-content baseline (AUROC 0.906 ± 0.044). A conservative GI-restricted gene product presence/absence proxy achieved AUROC 0.887 ± 0.083, while a full joint-run pangenome GPA baseline remains a necessary future benchmark. Fragmentation-controlled analyses confirmed cassette signal remained informative after quality filtering (AUROC 0.827 in assemblies with ≤50 contigs; n = 91), while leave-one-BioProject-out validation yielded AUROC 0.694, indicating that deployment in novel surveillance contexts requires prospective validation. SHapley Additive exPlanations (SHAP) analysis localized discriminant signal to GI-anchored modules enriched for AMR cargo, mobility load, and GI AMR density.

CONCLUSION: These results suggest that cassette architecture captures signal consistent with biologically meaningful genomic organization beyond bulk island burden and supports its use as an interpretable exploratory representation for surveillance-oriented analysis of poultry-associated E. cecorum, while prospective validation in independent surveillance collections and a full joint-run pangenome gene presence/absence benchmark remain necessary before operational deployment claims can be made.

Additional Links: PMID-42487707

Citation:

@article {pmid42487707,
year = {2026},
author = {Lagad, RR and Rafi, S and Goswami, A},
title = {Genomic-island cassette architecture provides interpretable signal for exploratory classification of poultry-associated Enterococcus cecorum lineages.},
journal = {Frontiers in microbiology},
volume = {17},
number = {},
pages = {1882753},
pmid = {42487707},
issn = {1664-302X},
abstract = {BACKGROUND: Enterococcus cecorum is an emerging poultry pathogen whose antimicrobial resistance and host-associated traits are often carried on genomic islands. Standard comparative genomics workflows usually reduce genomes to unordered gene inventories and may miss informative neighborhood structure within island-associated modules.

METHODS: We tested whether GI (genomic island)-anchored cassette organization provides signal for distinguishing pathogenic from commensal poultry-associated E. cecorum lineages. We encoded genomic-island-anchored cassette organization as 84 genome-level summary features and evaluated this representation in 145 genomes (95 commensal, 50 pathogenic) using locked 5-fold genome-grouped cross-validation.

RESULTS: The cassette-summary Random Forest model achieved an area under the receiver operating characteristic curve (AUROC) of 0.918 ± 0.067, outperforming GI burden (AUROC 0.791 ± 0.050) and assembly-quality (AUROC 0.743 ± 0.015) baselines and performing similarly to a corrected AMR gene-content baseline (AUROC 0.906 ± 0.044). A conservative GI-restricted gene product presence/absence proxy achieved AUROC 0.887 ± 0.083, while a full joint-run pangenome GPA baseline remains a necessary future benchmark. Fragmentation-controlled analyses confirmed cassette signal remained informative after quality filtering (AUROC 0.827 in assemblies with ≤50 contigs; n = 91), while leave-one-BioProject-out validation yielded AUROC 0.694, indicating that deployment in novel surveillance contexts requires prospective validation. SHapley Additive exPlanations (SHAP) analysis localized discriminant signal to GI-anchored modules enriched for AMR cargo, mobility load, and GI AMR density.

CONCLUSION: These results suggest that cassette architecture captures signal consistent with biologically meaningful genomic organization beyond bulk island burden and supports its use as an interpretable exploratory representation for surveillance-oriented analysis of poultry-associated E. cecorum, while prospective validation in independent surveillance collections and a full joint-run pangenome gene presence/absence benchmark remain necessary before operational deployment claims can be made.},
}

RevDate: 2026-07-23
CmpDate: 2026-07-23

Timkina E, Palyzová A, Marešová H, et al (2026)

Genomic signatures of radiation stress adaptations in Kocuria rhizophila: insights from strain 301 of the Jáchymov radon springs.

Frontiers in microbiology, 17:1814458.

Several strains of Kocuria rhizophila have been reported to tolerate ionizing radiation and other environmental stresses; however, this phenotype is unevenly distributed across the species. Here, we characterize K. rhizophila strain 301, isolated from chronically radioactive radon springs in Jáchymov (Czech Republic). Strain 301 displayed exceptional stress resilience, retaining approximately 10% viability after exposure to 1.0 kGy of γ-irradiation, and maintaining ~20% survival following 30 days of desiccation. These responses sharply contrasted with the pronounced sensitivity of the type strain K. rhizophila TA68. Complete genome sequencing produced a single circular chromosome of 2.77 Mbp. Comparative genomic analyses revealed extensive duplication of genes involved in DNA repair, antioxidant defense, and metal ion homeostasis, including multiple paralogs of uvrA, uvrD, sodA, and Mn/Fe transport systems. In contrast to the paradigm established by Deinococcus radiodurans, strain 301 lacks radiation-specific or lineage-exclusive genes, instead suggesting that resilience may be associated with quantitative reinforcement of conserved cellular pathways. Pan-genome analysis further demonstrated a closed K. rhizophila pangenome, with strain 301 forming a distinct phylogenetic lineage. Together, these findings position K. rhizophila 301 as a model system of adaptation to chronic radiation exposure and illustrate how sustained environmental pressure may promote modifications and adaptations based on core functions rather than novel innovative genetic traits.

Additional Links: PMID-42487716

Citation:

@article {pmid42487716,
year = {2026},
author = {Timkina, E and Palyzová, A and Marešová, H and Zavala-Meneses, SG and Maťátková, O and Jarošová Kolouchová, I},
title = {Genomic signatures of radiation stress adaptations in Kocuria rhizophila: insights from strain 301 of the Jáchymov radon springs.},
journal = {Frontiers in microbiology},
volume = {17},
number = {},
pages = {1814458},
pmid = {42487716},
issn = {1664-302X},
abstract = {Several strains of Kocuria rhizophila have been reported to tolerate ionizing radiation and other environmental stresses; however, this phenotype is unevenly distributed across the species. Here, we characterize K. rhizophila strain 301, isolated from chronically radioactive radon springs in Jáchymov (Czech Republic). Strain 301 displayed exceptional stress resilience, retaining approximately 10% viability after exposure to 1.0 kGy of γ-irradiation, and maintaining ~20% survival following 30 days of desiccation. These responses sharply contrasted with the pronounced sensitivity of the type strain K. rhizophila TA68. Complete genome sequencing produced a single circular chromosome of 2.77 Mbp. Comparative genomic analyses revealed extensive duplication of genes involved in DNA repair, antioxidant defense, and metal ion homeostasis, including multiple paralogs of uvrA, uvrD, sodA, and Mn/Fe transport systems. In contrast to the paradigm established by Deinococcus radiodurans, strain 301 lacks radiation-specific or lineage-exclusive genes, instead suggesting that resilience may be associated with quantitative reinforcement of conserved cellular pathways. Pan-genome analysis further demonstrated a closed K. rhizophila pangenome, with strain 301 forming a distinct phylogenetic lineage. Together, these findings position K. rhizophila 301 as a model system of adaptation to chronic radiation exposure and illustrate how sustained environmental pressure may promote modifications and adaptations based on core functions rather than novel innovative genetic traits.},
}

RevDate: 2026-07-20
CmpDate: 2026-07-20

Scherer J, Corá RK, Bonatto D, et al (2026)

Pangenome Dynamics and Functional Diversification in the Marine Genus Pseudoalteromonas: Association to Colony Pigmentation.

Marine biotechnology (New York, N.Y.), 28(4):.

Pseudoalteromonas species are ecologically versatile marine bacteria widely recognized for their capacity to synthesize diverse bioactive metabolites and psychrophilic enzymes with biotechnological relevance. Here, we present a comprehensive comparative genomic analysis of 53 reference genomes to elucidate the dynamics, functional diversity, and biosynthetic potential of this genus. Reference genomes representing each deposited species were selected in order to avoid bias associated with unequal numbers of genomes per species. Pangenome reconstruction revealed an open structure comprising a small core genome (1,350 gene families) and a large proportion of accessory and strain-specific genes, reflecting extensive genomic plasticity. Functional annotation indicated that accessory regions are enriched for genes involved in secondary metabolism, stress adaptation, and environmental resilience. Notably, biosynthetic gene cluster (BGC) mining uncovered a rich repertoire of potentially novel RiPPs and other secondary metabolite clusters, underscoring Pseudoalteromonas as a promising source of unexplored bioactive compounds. Statistical analyses revealed that pigmented strains harbor significantly higher numbers of BGCs compared to non-pigmented strains, while only a weak and non-significant correlation was observed between BGC abundance and carbohydrate-active enzyme (CAZyme) content. No significant effect of the isolation source was detected on either BGC or CAZyme distributions. Together, these findings provide new insights into the genomic basis of ecological adaptation and metabolic diversification in Pseudoalteromonas, supporting the role of pigmentation as a proxy for enhanced biosynthetic potential, while carbohydrate utilization capabilities evolve more independently and offering a framework for targeted bioprospecting of marine-derived metabolites with industrial and environmental applications.

Additional Links: PMID-42474582

Citation:

@article {pmid42474582,
year = {2026},
author = {Scherer, J and Corá, RK and Bonatto, D and Macedo, AJ},
title = {Pangenome Dynamics and Functional Diversification in the Marine Genus Pseudoalteromonas: Association to Colony Pigmentation.},
journal = {Marine biotechnology (New York, N.Y.)},
volume = {28},
number = {4},
pages = {},
pmid = {42474582},
issn = {1436-2236},
mesh = {*Pseudoalteromonas/genetics/metabolism ; *Genome, Bacterial ; Multigene Family ; Phylogeny ; *Pigmentation/genetics ; Secondary Metabolism/genetics ; },
abstract = {Pseudoalteromonas species are ecologically versatile marine bacteria widely recognized for their capacity to synthesize diverse bioactive metabolites and psychrophilic enzymes with biotechnological relevance. Here, we present a comprehensive comparative genomic analysis of 53 reference genomes to elucidate the dynamics, functional diversity, and biosynthetic potential of this genus. Reference genomes representing each deposited species were selected in order to avoid bias associated with unequal numbers of genomes per species. Pangenome reconstruction revealed an open structure comprising a small core genome (1,350 gene families) and a large proportion of accessory and strain-specific genes, reflecting extensive genomic plasticity. Functional annotation indicated that accessory regions are enriched for genes involved in secondary metabolism, stress adaptation, and environmental resilience. Notably, biosynthetic gene cluster (BGC) mining uncovered a rich repertoire of potentially novel RiPPs and other secondary metabolite clusters, underscoring Pseudoalteromonas as a promising source of unexplored bioactive compounds. Statistical analyses revealed that pigmented strains harbor significantly higher numbers of BGCs compared to non-pigmented strains, while only a weak and non-significant correlation was observed between BGC abundance and carbohydrate-active enzyme (CAZyme) content. No significant effect of the isolation source was detected on either BGC or CAZyme distributions. Together, these findings provide new insights into the genomic basis of ecological adaptation and metabolic diversification in Pseudoalteromonas, supporting the role of pigmentation as a proxy for enhanced biosynthetic potential, while carbohydrate utilization capabilities evolve more independently and offering a framework for targeted bioprospecting of marine-derived metabolites with industrial and environmental applications.},
}

MeSH Terms:

*Pseudoalteromonas/genetics/metabolism
*Genome, Bacterial
Multigene Family
Phylogeny
*Pigmentation/genetics
Secondary Metabolism/genetics

RevDate: 2026-07-20

Tan B, Zafra C, C Ng (2026)

Comparative genomics of the Nap2-2B clade reveals substrate partitioning and niche diversification among uncultured hydrocarbon-degrading Desulfotomaculales.

Scientific reports pii:10.1038/s41598-026-63016-x [Epub ahead of print].

Uncultured Nap2-2B bacteria (order Desulfotomaculales; formerly family Peptococcaceae) are frequently detected in methanogenic hydrocarbon-degrading environments, yet their metabolic diversity remains poorly understood. Here, we analysed 17 GTDB r232 metagenome-assembled genomes (MAGs) from four genera within this clade. A bac120 phylogeny places Nap2-2B as a monophyletic family-level lineage within Desulfotomaculales. Glycyl radical enzyme phylogeny and operon context reveal strict substrate partitioning: SCADC1-2-3 encodes alkylsuccinate synthase for aliphatic hydrocarbon activation, 46-80 and UBA4053 encode benzylsuccinate synthase for aromatic activation, and JAIMBK01 lacks hydrocarbon activation genes but retains complete dissimilatory sulfate reduction pathway genes. Pangenome-level pathway reconstruction identifies complementary cofactor biosynthetic potential, notably in cobalamin and pantothenate biosynthesis, consistent with possible cofactor complementation. Genome-scale metabolic modeling suggests that the alkane-degrading SCADC1-2-3 lineage can support syntrophic hexane degradation, whereas the aromatic lineage cannot grow on the alkane FBA test because it lacks AssA and PFOR. A parallel aromatic-substrate FBA for 46-80 MAGs did not yield growth under minimal curation, reflecting the greater complexity of the downstream benzoyl-CoA pathway. Together, these data support a syntrophic guild structured by substrate partitioning, putative cofactor complementation, and distinct electron-disposal strategies that may shape methanogenic hydrocarbon attenuation in anoxic tailings environments.

Additional Links: PMID-42477049

Citation:

@article {pmid42477049,
year = {2026},
author = {Tan, B and Zafra, C and Ng, C},
title = {Comparative genomics of the Nap2-2B clade reveals substrate partitioning and niche diversification among uncultured hydrocarbon-degrading Desulfotomaculales.},
journal = {Scientific reports},
volume = {},
number = {},
pages = {},
doi = {10.1038/s41598-026-63016-x},
pmid = {42477049},
issn = {2045-2322},
abstract = {Uncultured Nap2-2B bacteria (order Desulfotomaculales; formerly family Peptococcaceae) are frequently detected in methanogenic hydrocarbon-degrading environments, yet their metabolic diversity remains poorly understood. Here, we analysed 17 GTDB r232 metagenome-assembled genomes (MAGs) from four genera within this clade. A bac120 phylogeny places Nap2-2B as a monophyletic family-level lineage within Desulfotomaculales. Glycyl radical enzyme phylogeny and operon context reveal strict substrate partitioning: SCADC1-2-3 encodes alkylsuccinate synthase for aliphatic hydrocarbon activation, 46-80 and UBA4053 encode benzylsuccinate synthase for aromatic activation, and JAIMBK01 lacks hydrocarbon activation genes but retains complete dissimilatory sulfate reduction pathway genes. Pangenome-level pathway reconstruction identifies complementary cofactor biosynthetic potential, notably in cobalamin and pantothenate biosynthesis, consistent with possible cofactor complementation. Genome-scale metabolic modeling suggests that the alkane-degrading SCADC1-2-3 lineage can support syntrophic hexane degradation, whereas the aromatic lineage cannot grow on the alkane FBA test because it lacks AssA and PFOR. A parallel aromatic-substrate FBA for 46-80 MAGs did not yield growth under minimal curation, reflecting the greater complexity of the downstream benzoyl-CoA pathway. Together, these data support a syntrophic guild structured by substrate partitioning, putative cofactor complementation, and distinct electron-disposal strategies that may shape methanogenic hydrocarbon attenuation in anoxic tailings environments.},
}

RevDate: 2026-07-20

Rathna V, Kukreti A, Prasannakumar MK, et al (2026)

Correction: Pan-genome and antibiotic resistance insights into Xanthomonas citri pv. punicae pathotypes.

BMC microbiology, 26(1):.

Additional Links: PMID-42477548

Citation:

@article {pmid42477548,
year = {2026},
author = {Rathna, V and Kukreti, A and Prasannakumar, MK and C, M and R, K and Patil, SS and Venkateshbabu, G and Devanna, P and J, H and Banakar, SN and Narayan, A and Vaidya, K and Rymbai, S and Mahesh, HB and Soolanayakanahally, RY and Kagale, S},
title = {Correction: Pan-genome and antibiotic resistance insights into Xanthomonas citri pv. punicae pathotypes.},
journal = {BMC microbiology},
volume = {26},
number = {1},
pages = {},
pmid = {42477548},
issn = {1471-2180},
}

RevDate: 2026-07-21

Guan J, Li X, Miao H, et al (2026)

Pangenome-resolved structural variation drives adaptation and trait evolution in cucumber.

Nature genetics [Epub ahead of print].

Cucumber (Cucumis sativus L.) is a global vegetable crop and powerful model for sex determination, fruit development and vascular biology. We present high-quality genome assemblies for 125 cultivated and wild accessions, capturing worldwide genetic diversity. Syntenic gene family analysis characterized 37,897 gene families and revealed haplotype diversity shaped by geographic expansion. Comparative analyses uncovered copy-number variations linked to local adaptation, including a CsFT tandem duplication promoting early flowering at higher latitudes. This resource reduces reference bias, enabling the annotation of resistance loci and the discovery of CsCcu, a nucleotide-binding leucine-rich repeat-type R gene conferring scab resistance. We cataloged 135,597 structural variations and quantified their regulatory effects, with ~30% driving trait diversification among geographic groups. Integrating structural variations into genome-wide association studies identified 172 quantitative trait loci for 38 agronomic traits, including a rare long terminal repeat insertion regulating fruit length via CsSPL1. Our findings provide a genomic toolkit for cucumber evolution research and precision breeding.

Additional Links: PMID-42481758

Citation:

@article {pmid42481758,
year = {2026},
author = {Guan, J and Li, X and Miao, H and Yan, X and Liu, X and Gu, X and Visser, RGF and Bai, Y and Huang, S and Zhang, Z and Dong, S and Zhang, S},
title = {Pangenome-resolved structural variation drives adaptation and trait evolution in cucumber.},
journal = {Nature genetics},
volume = {},
number = {},
pages = {},
pmid = {42481758},
issn = {1546-1718},
abstract = {Cucumber (Cucumis sativus L.) is a global vegetable crop and powerful model for sex determination, fruit development and vascular biology. We present high-quality genome assemblies for 125 cultivated and wild accessions, capturing worldwide genetic diversity. Syntenic gene family analysis characterized 37,897 gene families and revealed haplotype diversity shaped by geographic expansion. Comparative analyses uncovered copy-number variations linked to local adaptation, including a CsFT tandem duplication promoting early flowering at higher latitudes. This resource reduces reference bias, enabling the annotation of resistance loci and the discovery of CsCcu, a nucleotide-binding leucine-rich repeat-type R gene conferring scab resistance. We cataloged 135,597 structural variations and quantified their regulatory effects, with ~30% driving trait diversification among geographic groups. Integrating structural variations into genome-wide association studies identified 172 quantitative trait loci for 38 agronomic traits, including a rare long terminal repeat insertion regulating fruit length via CsSPL1. Our findings provide a genomic toolkit for cucumber evolution research and precision breeding.},
}

RevDate: 2026-07-18

Muscò A, Longhi G, Selleri E, et al (2026)

Mucin O-glycan degradation by GH101 underpins mucosal persistence of Bifidobacterium bifidum PRL2010.

Applied microbiology and biotechnology pii:10.1007/s00253-026-13964-1 [Epub ahead of print].

Host-derived mucin O-glycans constitute a key chemical component of the human intestinal niche and are continuously encountered by gut-adapted bifidobacterial cells, thereby playing a pivotal role in host-microbe interactions. Among these microbes, Bifidobacterium bifidum is one of the most persistent members of the human gut microbiota. Comparative genomic analyses of mucosa-associated strains of this species have revealed a gene conserved across its pangenome, predicted to encode a GH101 glycoside hydrolase involved in mucin degradation. In this study, we performed a molecular characterization of the GH101 enzyme encoded by B. bifidum PRL2010. Insertional mutagenesis of the GH101 gene from the PRL2010 genome resulted in a marked reduction in bacterial adhesion to mucin-secreting epithelial cells, along with a significant growth difference when N-acetyl-galactosamine was provided as the sole carbon source. These findings indicate that GH101 mediates the initial cleavage of O-GalNAc core 1 structures, representing a crucial step for the adhesion to human mucosa. However, this enzyme represents only one component of the broader set of glycosidases required for the complete degradation of host mucins. Overall, our results establish a mechanistic link between metabolic specialization and ecological fitness, providing insights into how mucin-adapted commensals may contribute to interactions with the human gut mucosa. KEY POINTS: • The GH101 glycoside hydrolase of Bifidobacterium bifidum PRL2010 plays a pivotal role in host mucin utilization by initiating the cleavage of O-GalNAc core 1 structures. • Insertional mutant of the GH101 gene impairs both mucosal adhesion and growth on N-acetyl-galactosamine. • These findings reveal a mechanistic connection between mucin glycan metabolism and ecological fitness by mucin-adapted bifidobacterial commensals.

Additional Links: PMID-42470485

Citation:

@article {pmid42470485,
year = {2026},
author = {Muscò, A and Longhi, G and Selleri, E and Gennaioli, E and Lugli, GA and Tarracchini, C and Bianchi, MG and Bussolati, O and Ventura, M and Turroni, F},
title = {Mucin O-glycan degradation by GH101 underpins mucosal persistence of Bifidobacterium bifidum PRL2010.},
journal = {Applied microbiology and biotechnology},
volume = {},
number = {},
pages = {},
doi = {10.1007/s00253-026-13964-1},
pmid = {42470485},
issn = {1432-0614},
abstract = {Host-derived mucin O-glycans constitute a key chemical component of the human intestinal niche and are continuously encountered by gut-adapted bifidobacterial cells, thereby playing a pivotal role in host-microbe interactions. Among these microbes, Bifidobacterium bifidum is one of the most persistent members of the human gut microbiota. Comparative genomic analyses of mucosa-associated strains of this species have revealed a gene conserved across its pangenome, predicted to encode a GH101 glycoside hydrolase involved in mucin degradation. In this study, we performed a molecular characterization of the GH101 enzyme encoded by B. bifidum PRL2010. Insertional mutagenesis of the GH101 gene from the PRL2010 genome resulted in a marked reduction in bacterial adhesion to mucin-secreting epithelial cells, along with a significant growth difference when N-acetyl-galactosamine was provided as the sole carbon source. These findings indicate that GH101 mediates the initial cleavage of O-GalNAc core 1 structures, representing a crucial step for the adhesion to human mucosa. However, this enzyme represents only one component of the broader set of glycosidases required for the complete degradation of host mucins. Overall, our results establish a mechanistic link between metabolic specialization and ecological fitness, providing insights into how mucin-adapted commensals may contribute to interactions with the human gut mucosa. KEY POINTS: • The GH101 glycoside hydrolase of Bifidobacterium bifidum PRL2010 plays a pivotal role in host mucin utilization by initiating the cleavage of O-GalNAc core 1 structures. • Insertional mutant of the GH101 gene impairs both mucosal adhesion and growth on N-acetyl-galactosamine. • These findings reveal a mechanistic connection between mucin glycan metabolism and ecological fitness by mucin-adapted bifidobacterial commensals.},
}

RevDate: 2026-07-18
CmpDate: 2026-07-18

Sisay T, Berhan A, Mihrete K, et al (2026)

Genomic regulation of the diphtheria toxin gene and its implications for molecular diagnostics and surveillance in low-resource settings.

Molecular biology reports, 53(1):.

Corynebacterium diphtheriae remains a significant, though often underestimated, public health concern, particularly in low- and middle-income countries. The pathogenicity of the disease is primarily determined by diphtheria toxin (DT), which is produced by the tox gene, a bacteriophage-associated element, and is tightly regulated by the iron-dependent transcriptional repressor DtxR, encoded by the dtxR gene. Despite extensive investigation into the molecular biology of DT, its regulation within the broader genomic organization, as well as its implications for diagnostic methods and surveillance strategies, have not yet been fully elucidated. This review consolidates existing evidence regarding the genomic context and molecular regulation of the tox gene, encompassing chromosomal organization, variability in GC content, genomic islands, and mechanisms of horizontal gene transfer. Significant attention is focused on lysogenic conversion mediated by corynephages and regulatory pathways responsive to iron. We also evaluate both established and novel molecular diagnostic approaches, including PCR, real-time PCR, sequencing technologies, and isothermal amplification methods like loop-mediated isothermal amplification (LAMP). Recent genomic discoveries, including pan-genome variation, CRISPR-Cas mechanisms, and the emergence of non-toxigenic tox-bearing strains are analyzed in relation to diagnostic precision and epidemiological surveillance. Understanding the genomic regulation and evolutionary dynamics of toxin production is essential for improving diagnostic accuracy and strengthening surveillance systems, particularly in resource-limited settings where diphtheria is often underdiagnosed and underreported.

Additional Links: PMID-42470537

Citation:

@article {pmid42470537,
year = {2026},
author = {Sisay, T and Berhan, A and Mihrete, K and Hunie, E and Bizuye, A},
title = {Genomic regulation of the diphtheria toxin gene and its implications for molecular diagnostics and surveillance in low-resource settings.},
journal = {Molecular biology reports},
volume = {53},
number = {1},
pages = {},
pmid = {42470537},
issn = {1573-4978},
mesh = {*Diphtheria Toxin/genetics/metabolism ; *Corynebacterium diphtheriae/genetics/pathogenicity ; Humans ; *Diphtheria/diagnosis/genetics/microbiology ; Genomic Islands ; Gene Expression Regulation, Bacterial ; Bacterial Proteins/genetics/metabolism ; Genome, Bacterial ; DNA-Binding Proteins ; },
abstract = {Corynebacterium diphtheriae remains a significant, though often underestimated, public health concern, particularly in low- and middle-income countries. The pathogenicity of the disease is primarily determined by diphtheria toxin (DT), which is produced by the tox gene, a bacteriophage-associated element, and is tightly regulated by the iron-dependent transcriptional repressor DtxR, encoded by the dtxR gene. Despite extensive investigation into the molecular biology of DT, its regulation within the broader genomic organization, as well as its implications for diagnostic methods and surveillance strategies, have not yet been fully elucidated. This review consolidates existing evidence regarding the genomic context and molecular regulation of the tox gene, encompassing chromosomal organization, variability in GC content, genomic islands, and mechanisms of horizontal gene transfer. Significant attention is focused on lysogenic conversion mediated by corynephages and regulatory pathways responsive to iron. We also evaluate both established and novel molecular diagnostic approaches, including PCR, real-time PCR, sequencing technologies, and isothermal amplification methods like loop-mediated isothermal amplification (LAMP). Recent genomic discoveries, including pan-genome variation, CRISPR-Cas mechanisms, and the emergence of non-toxigenic tox-bearing strains are analyzed in relation to diagnostic precision and epidemiological surveillance. Understanding the genomic regulation and evolutionary dynamics of toxin production is essential for improving diagnostic accuracy and strengthening surveillance systems, particularly in resource-limited settings where diphtheria is often underdiagnosed and underreported.},
}

MeSH Terms:

*Diphtheria Toxin/genetics/metabolism
*Corynebacterium diphtheriae/genetics/pathogenicity
Humans
*Diphtheria/diagnosis/genetics/microbiology
Genomic Islands
Gene Expression Regulation, Bacterial
Bacterial Proteins/genetics/metabolism
Genome, Bacterial
DNA-Binding Proteins

RevDate: 2026-07-18

Singh J, Gudi S, Maughan PJ, et al (2026)

A reference-grade chromosome-level genome assembly of a historically important U.S. bread wheat cultivar Timstein.

Scientific data pii:10.1038/s41597-026-07930-9 [Epub ahead of print].

We report a near telomere-to-telomere high quality genome assembly of the historically important spring wheat cultivar, Timstein, generated using PacBio HiFi long-read sequencing data followed by Hi-C scaffolding. The assembly spanned 14.76 Gb, accounting for all 21 chromosomes of the A, B, and D subgenomes. Gene annotations identified around 105 K high-confidence (HC) gene models. The genome was comprised of ~85% transposable elements, primarily from the Gypsy, Copia, and CACTA families. For each subgenome, the BUSCO completeness score ranging from 97.4 to 99.6% and LTR Assembly Index (LAI) values surpassing 13 indicated the assembly quality was reference grade. Synteny analysis with IWGSC Chinese Spring (CS) RefSeq v2.1 revealed strong chromosomal collinearity between two genomes. Timstein has been extensively studied in classical genetics research for stem and leaf rust resistance and septoria nodorum blotch (SNB) susceptibility. This high-quality genome assembly provides cultivar-resolved references that can expand the wheat pangenome, supports structural and functional genomics studies, and enables fine mapping of disease resistance/susceptibility loci for their utilization in wheat genetics and breeding programs.

Additional Links: PMID-42471315

Citation:

@article {pmid42471315,
year = {2026},
author = {Singh, J and Gudi, S and Maughan, PJ and Seneviratne, S and Running, KLD and Singh, G and Gill, U and Faris, JD and Gupta, R},
title = {A reference-grade chromosome-level genome assembly of a historically important U.S. bread wheat cultivar Timstein.},
journal = {Scientific data},
volume = {},
number = {},
pages = {},
doi = {10.1038/s41597-026-07930-9},
pmid = {42471315},
issn = {2052-4463},
support = {3060-21000-046-000D//Agricultural Research Service/ ; ND02243//National Institute of Food and Agriculture/ ; },
abstract = {We report a near telomere-to-telomere, high-quality genome assembly of the historically important spring wheat cultivar Timstein, generated using PacBio HiFi long-read sequencing data followed by Hi-C scaffolding. The assembly spanned 14.76 Gb, accounting for all 21 chromosomes of the A, B, and D subgenomes. Gene annotation identified around 105 K high-confidence (HC) gene models. The genome comprised ~85% transposable elements, primarily from the Gypsy, Copia, and CACTA families. For each subgenome, BUSCO completeness scores ranging from 97.4% to 99.6% and LTR Assembly Index (LAI) values surpassing 13 indicated that the assembly is of reference-grade quality. Synteny analysis with IWGSC Chinese Spring (CS) RefSeq v2.1 revealed strong chromosomal collinearity between the two genomes. Timstein has been extensively studied in classical genetics research for stem and leaf rust resistance and septoria nodorum blotch (SNB) susceptibility. This high-quality genome assembly provides cultivar-resolved references that can expand the wheat pangenome, supports structural and functional genomics studies, and enables fine mapping of disease resistance/susceptibility loci for their utilization in wheat genetics and breeding programs.},
}

RevDate: 2026-07-20
CmpDate: 2026-07-20

Young CE, O'Sullivan H, Alattas H, et al (2026)

Refining Salinivibrio pangenome dynamics and biotechnological potential through comparative analysis.

Microbial genomics, 12(7):.

Current understanding of genomic diversity within the halophilic genus Salinivibrio relies predominantly on draft genomes, with only seven complete genomes among the 62 publicly available. Previous pangenome analysis suggested a closed genomic structure while concluding that Salinivibrio lacks polyhydroxyalkanoate (PHA) degradation capacity despite possessing biosynthesis genes. Here, we present eight complete Salinivibrio genomes from Pearse Lakes (Rottnest Island, Western Australia) generated using Oxford Nanopore long-read sequencing, alongside re-analysis of 38 high-quality public genomes (≥90% completeness and ≤5% contamination cut-off). Pangenome analysis revealed a more open structure than previously reported, with a core genome comprising 25% of total gene clusters and an accessory genome accounting for 71%. Panstripe analysis demonstrated significant temporal signal in gene gain and loss events associated with phylogenetic branch length (core: P=1.72×10⁻⁴; tip: P=2.64×10⁻¹⁴). All 46 genomes contained complete PHA biosynthesis operons (phaB-phaA-phaP-phaC) with high sequence conservation under strong purifying selection (Z=30.30, P<0.001). In a genome that readily gains and loses genes, this conservation indicates that PHA synthesis is a maintained pathway, which is difficult to reconcile with a previous report that Salinivibrio lacks PHA degradation capacity. We therefore searched the genomes by Hidden Markov Model-based homology rather than standard annotation and identified seven putative depolymerases that form a single accessory cluster in 15% of strains, all previously annotated as 3-oxoadipate enol-lactonase-2. These candidates retained all catalytic residues characteristic of active depolymerases but are divergent from reference PHA depolymerases, which could explain why annotation missed them. They remain putative and require biochemical confirmation.
Both the expanded pangenome and these candidates emerged from standardized homology-based re-analysis, showing that annotation-dependent approaches can overlook genomic diversity and divergent enzyme families in non-model organisms. Together, these results establish Salinivibrio as a genomically dynamic genus with potential for halophilic bioplastic production.
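
The quality cut-off quoted in this abstract (≥90% completeness and ≤5% contamination) amounts to a simple two-threshold filter. The Python sketch below illustrates that filter only; the record type, strain names, and values are hypothetical placeholders rather than data from the study, and the abstract does not say which tool produced the completeness and contamination estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomeQC:
    """Hypothetical per-genome quality record (not from the study)."""
    name: str
    completeness: float   # estimated completeness, in percent
    contamination: float  # estimated contamination, in percent


def passes_qc(genome, min_completeness=90.0, max_contamination=5.0):
    """Return True when a genome meets both thresholds (inclusive)."""
    return (genome.completeness >= min_completeness
            and genome.contamination <= max_contamination)


def filter_high_quality(genomes):
    """Keep only genomes that satisfy the cut-off, preserving input order."""
    return [g for g in genomes if passes_qc(g)]


# Made-up example records, for illustration only.
EXAMPLE_GENOMES = [
    GenomeQC("strain_A", 99.1, 0.4),  # passes both thresholds
    GenomeQC("strain_B", 88.0, 1.2),  # fails: completeness below 90%
    GenomeQC("strain_C", 97.5, 6.3),  # fails: contamination above 5%
    GenomeQC("strain_D", 90.0, 5.0),  # passes: thresholds are inclusive
]

if __name__ == "__main__":
    for genome in filter_high_quality(EXAMPLE_GENOMES):
        print(genome.name)
```

Both bounds are treated as inclusive here because the abstract states them with ≥ and ≤.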

Additional Links: PMID-42474466

Citation:

@article {pmid42474466,
year = {2026},
author = {Young, CE and O'Sullivan, H and Alattas, H and Tiwari, R and Macrae, A and Murphy, DV and Scott, C},
title = {Refining Salinivibrio pangenome dynamics and biotechnological potential through comparative analysis.},
journal = {Microbial genomics},
volume = {12},
number = {7},
pages = {},
doi = {10.1099/mgen.0.001786},
pmid = {42474466},
issn = {2057-5858},
mesh = {*Genome, Bacterial ; Phylogeny ; *Vibrionaceae/genetics/classification/metabolism ; Multigene Family ; Polyhydroxyalkanoates/metabolism/biosynthesis ; Biotechnology ; Lakes/microbiology ; Operon ; Western Australia ; },
abstract = {Current understanding of genomic diversity within the halophilic genus Salinivibrio relies predominantly on draft genomes, with only seven complete genomes among the 62 publicly available. Previous pangenome analysis suggested a closed genomic structure while concluding that Salinivibrio lacks polyhydroxyalkanoate (PHA) degradation capacity despite possessing biosynthesis genes. Here, we present eight complete Salinivibrio genomes from Pearse Lakes (Rottnest Island, Western Australia) generated using Oxford Nanopore long-read sequencing, alongside re-analysis of 38 high-quality public genomes (≥90% completeness and ≤5% contamination cut-off). Pangenome analysis revealed a more open structure than previously reported, with a core genome comprising 25% of total gene clusters and an accessory genome accounting for 71%. Panstripe analysis demonstrated significant temporal signal in gene gain and loss events associated with phylogenetic branch length (core: P=1.72×10⁻⁴; tip: P=2.64×10⁻¹⁴). All 46 genomes contained complete PHA biosynthesis operons (phaB-phaA-phaP-phaC) with high sequence conservation under strong purifying selection (Z=30.30, P<0.001). In a genome that readily gains and loses genes, this conservation indicates that PHA synthesis is a maintained pathway, which is difficult to reconcile with a previous report that Salinivibrio lacks PHA degradation capacity. We therefore searched the genomes by Hidden Markov Model-based homology rather than standard annotation and identified seven putative depolymerases that form a single accessory cluster in 15% of strains, all previously annotated as 3-oxoadipate enol-lactonase-2. These candidates retained all catalytic residues characteristic of active depolymerases but are divergent from reference PHA depolymerases, which could explain why annotation missed them. They remain putative and require biochemical confirmation.
Both the expanded pangenome and these candidates emerged from standardized homology-based re-analysis, showing that annotation-dependent approaches can overlook genomic diversity and divergent enzyme families in non-model organisms. Together, these results establish Salinivibrio as a genomically dynamic genus with potential for halophilic bioplastic production.},
}

MeSH Terms:

*Genome, Bacterial
Phylogeny
*Vibrionaceae/genetics/classification/metabolism
Multigene Family
Polyhydroxyalkanoates/metabolism/biosynthesis
Biotechnology
Lakes/microbiology
Operon
Western Australia

RevDate: 2026-07-17
CmpDate: 2026-07-17

Sirén J, Paten B, Human Pangenome Reference Consortium (2026)

GBZ-base and GAF-base: Indexed pangenome file formats.

bioRxiv : the preprint server for biology pii:2026.07.10.737775.

MOTIVATION: Existing pangenome file formats are designed for batch processing. Graphs must be loaded into memory, and alignment files must be read sequentially. Indexed file formats that can be used directly from disk would be more appropriate for interactive applications.

RESULTS: We propose GBZ-base and GAF-base - SQLite-backed file formats comparable to GBZ and GAF. GBZ-base supports efficient extraction of local subgraphs, and GAF-base lets us extract all alignments to the subgraph. Additionally, GAF-base is smaller than any other file format for sequence-to-graph alignments.

Available from https://github.com/jltsiren/gbz-base and https://crates.io/crates/gbz-base under the MIT license.
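
The idea of an indexed, disk-backed format that serves local subgraph queries without loading the whole graph can be illustrated with a toy SQLite example in Python. This is only a sketch of the general technique: the table layout, column names, and helper functions below are hypothetical and do not reflect the actual GBZ-base or GAF-base schema or API.

```python
import sqlite3
from collections import deque


def build_toy_graph(conn):
    """Create a tiny sequence graph in SQLite (hypothetical schema)."""
    conn.executescript("""
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, sequence TEXT NOT NULL);
        CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL);
        CREATE INDEX edges_src ON edges (src);
        CREATE INDEX edges_dst ON edges (dst);
    """)
    conn.executemany(
        "INSERT INTO nodes (id, sequence) VALUES (?, ?)",
        [(1, "GAT"), (2, "T"), (3, "A"), (4, "CA"), (5, "GG")],
    )
    conn.executemany(
        "INSERT INTO edges (src, dst) VALUES (?, ?)",
        [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)],
    )
    conn.commit()


def extract_subgraph(conn, start, radius):
    """Return sorted ids of nodes within `radius` edges of `start`.

    Each step is an indexed lookup, so only the neighbourhood of the
    query node is read rather than the whole graph.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == radius:
            continue
        rows = conn.execute(
            "SELECT dst FROM edges WHERE src = ? "
            "UNION SELECT src FROM edges WHERE dst = ?",
            (node, node),
        ).fetchall()
        for (neighbour,) in rows:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return sorted(seen)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_toy_graph(connection)
    print(extract_subgraph(connection, 1, 1))
```

The in-memory database keeps the sketch self-contained; pointing `sqlite3.connect` at a file path would give the on-disk behaviour the abstract describes.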

Additional Links: PMID-42465382

Citation:

@article {pmid42465382,
year = {2026},
author = {Sirén, J and Paten, B and {Human Pangenome Reference Consortium}},
title = {GBZ-base and GAF-base: Indexed pangenome file formats.},
journal = {bioRxiv : the preprint server for biology},
volume = {},
number = {},
pages = {},
doi = {10.64898/2026.07.10.737775},
pmid = {42465382},
issn = {2692-8205},
abstract = {MOTIVATION: Existing pangenome file formats are designed for batch processing. Graphs must be loaded into memory, and alignment files must be read sequentially. Indexed file formats that can be used directly from disk would be more appropriate for interactive applications.

RESULTS: We propose GBZ-base and GAF-base - SQLite-backed file formats comparable to GBZ and GAF. GBZ-base supports efficient extraction of local subgraphs, and GAF-base lets us extract all alignments to the subgraph. Additionally, GAF-base is smaller than any other file format for sequence-to-graph alignments.

Available from https://github.com/jltsiren/gbz-base and https://crates.io/crates/gbz-base under the MIT license.},
}

RevDate: 2026-07-17
CmpDate: 2026-07-17

Hunt M, Torres MDT, Alikhan NF, et al (2026)

AllTheBacteria: a community resource empowers biology and discovers novel peptide antibiotics.

bioRxiv : the preprint server for biology pii:2024.03.08.584059.

Public microbial genomes encode an immense record of biological diversity, evolution and molecular function, but much of this information remains difficult to reuse because raw sequencing data are not uniformly assembled, quality controlled, annotated or searchable at scale. Here we present AllTheBacteria, an open, community-built resource that transforms public bacterial short-read whole-genome sequencing reads into a uniformly processed discovery platform. The current analysed release contains 2,440,377 high-quality bacterial and archaeal genomes from 11,273 species, together with standardized taxonomic assignments, genome annotations, antimicrobial resistance calls, antiphage-defence annotations, protein structure predictions and AI-ready sequence tables. We show that this infrastructure enables applications that would otherwise be impractical, from global sequence search and outbreak contextualization to pangenome method development, antimicrobial resistance reservoir mapping and antiphage-defence ecology. As a stringent experimental demonstration, we mined 3,919,096 encrypted peptide fragments from AllTheBacteria proteomes using our deep learning model APEX 1.1, identifying 1,867 candidates with predicted antimicrobial activity. We synthesized 24 representative peptides and tested them against 20 clinically relevant bacterial strains, including antibiotic-resistant pathogens. Multiple peptides showed low-micromolar activity, membrane-responsive conformational transitions and selective envelope perturbation. A lead molecule, ATB20, reduced Acinetobacter baumannii burden in a murine skin abscess model with efficacy comparable to polymyxin B and no overt toxicity. Together, these results establish AllTheBacteria as both a foundational community resource for microbiology and a renewable engine for AI-guided antimicrobial discovery.

Additional Links: PMID-42465405

Citation:

@article {pmid42465405,
year = {2026},
author = {Hunt, M and Torres, MDT and Alikhan, NF and Anderson, D and Andreani, ML and Blom, J and Bouras, G and Brinkman, FSL and Carroll, LM and Croxen, MA and Floto, RA and Hall, MB and Hawkey, J and Horsfield, ST and Jia, B and Lacey, JA and Lee, HS and Lima, L and MacAlasdair, N and Mallawaarachchi, S and Matlock, W and Moustafa, AM and Petit, R and Ramnath, V and Raghuram, V and Russell, MJ and Sanderson, T and Saratto, T and Schwengers, O and Seemann, T and Shaw, LP and Shen, W and Thomson, N and Tonkin-Hill, G and Toussaint, J and Viet, TL and von Wachsmann, J and Wan, F and Weimann, A and Wheatley, RM and Wiatrak, M and Xie, O and de la Fuente-Nunez, C and Lees, JA and Iqbal, Z},
title = {AllTheBacteria: a community resource empowers biology and discovers novel peptide antibiotics.},
journal = {bioRxiv : the preprint server for biology},
volume = {},
number = {},
pages = {},
doi = {10.1101/2024.03.08.584059},
pmid = {42465405},
issn = {2692-8205},
abstract = {Public microbial genomes encode an immense record of biological diversity, evolution and molecular function, but much of this information remains difficult to reuse because raw sequencing data are not uniformly assembled, quality controlled, annotated or searchable at scale. Here we present AllTheBacteria, an open, community-built resource that transforms public bacterial short-read whole-genome sequencing reads into a uniformly processed discovery platform. The current analysed release contains 2,440,377 high-quality bacterial and archaeal genomes from 11,273 species, together with standardized taxonomic assignments, genome annotations, antimicrobial resistance calls, antiphage-defence annotations, protein structure predictions and AI-ready sequence tables. We show that this infrastructure enables applications that would otherwise be impractical, from global sequence search and outbreak contextualization to pangenome method development, antimicrobial resistance reservoir mapping and antiphage-defence ecology. As a stringent experimental demonstration, we mined 3,919,096 encrypted peptide fragments from AllTheBacteria proteomes using our deep learning model APEX 1.1, identifying 1,867 candidates with predicted antimicrobial activity. We synthesized 24 representative peptides and tested them against 20 clinically relevant bacterial strains, including antibiotic-resistant pathogens. Multiple peptides showed low-micromolar activity, membrane-responsive conformational transitions and selective envelope perturbation. A lead molecule, ATB20, reduced Acinetobacter baumannii burden in a murine skin abscess model with efficacy comparable to polymyxin B and no overt toxicity. Together, these results establish AllTheBacteria as both a foundational community resource for microbiology and a renewable engine for AI-guided antimicrobial discovery.},
}

RevDate: 2026-07-17
CmpDate: 2026-07-17

Bhure M, Shukla N, Purohit H, et al (2026)

Genome-wide investigation of outbreak-associated Vibrio cholerae in Gujarat, India identifies antimicrobial resistance genes, virulence determinants, and mobile genetic elements.

Frontiers in microbiology, 17:1851551.

This study investigates the 2024 cholera outbreak in Gujarat, India, utilizing combined whole-genome analysis of clinical Vibrio cholerae isolates and wastewater surveillance. A total of 69 V. cholerae isolates were recovered from affected patients, predominantly belonging to the O1 serogroup (51 isolates). Antimicrobial susceptibility testing (AST) of 34 isolates revealed complete resistance to ampicillin and partial resistance to cotrimoxazole, whereas all isolates were susceptible to doxycycline, ciprofloxacin, chloramphenicol, tetracycline, and gentamicin. Whole-genome sequencing of 20 selected isolates revealed that the isolates belong to the seventh pandemic El Tor (7PET) lineage, sequence type ST69. Phylogenomic analyses using a multi-method approach, combining core genes, Composition Vector (CV) Tree, SNPs, and multilocus sequence typing (MLST), showed tight clustering with limited diversity among the isolates. All isolates contained 13-15 antimicrobial resistance genes, with high genotype-phenotype concordance for most antibiotics, although discordance was observed for ciprofloxacin, cotrimoxazole, and chloramphenicol. Sixteen genes were identified as virulence factors, and 11 isolates also had ctxA/ctxB. All isolates also had two to four integrative conjugative elements (ICEs) containing antimicrobial resistance genes (ARGs) and important Vibrio cholerae pathogenicity islands (VPI-1, VPI-2) and Vibrio cholerae seventh pandemic islands (VSP-1, VSP-2). The pangenome analysis highlights extensive genomic flexibility within the species, likely driven by horizontal gene transfer and ecological adaptation; however, further outbreak-specific investigations are required to determine their direct role in the current outbreak. The detection of ctxA-positive signals in 20% (28/140) of wastewater samples suggests a possible surveillance signal during the outbreak.
These results highlight the presence of antimicrobial-resistant 7PET O1 El Tor strains in Gujarat outbreaks and support continued genomic monitoring to guide focused public health interventions in endemic areas. Furthermore, this study underscores the importance of wastewater surveillance for monitoring V. cholerae.

Additional Links: PMID-42466126

Citation:

@article {pmid42466126,
year = {2026},
author = {Bhure, M and Shukla, N and Purohit, H and Patel, N and Chavda, P and Mistry, M and Shingala, H and Solanki, B and Shah, C and Joshi, M and Joshi, C and Bagatharia, S and Pandit, R},
title = {Genome-wide investigation of outbreak-associated Vibrio cholerae in Gujarat, India identifies antimicrobial resistance genes, virulence determinants, and mobile genetic elements.},
journal = {Frontiers in microbiology},
volume = {17},
number = {},
pages = {1851551},
pmid = {42466126},
issn = {1664-302X},
abstract = {This study investigates the 2024 cholera outbreak in Gujarat, India, utilizing combined whole-genome analysis of clinical Vibrio cholerae isolates and wastewater surveillance. A total of 69 V. cholerae isolates were recovered from affected patients, predominantly belonging to the O1 serogroup (51 isolates). Antimicrobial susceptibility testing (AST) of 34 isolates revealed complete resistance to ampicillin and partial resistance to cotrimoxazole, whereas all isolates were susceptible to doxycycline, ciprofloxacin, chloramphenicol, tetracycline, and gentamicin. Whole-genome sequencing of 20 selected isolates revealed that the isolates belong to the seventh pandemic El Tor (7PET) lineage, sequence type ST69. Phylogenomic analyses using a multi-method approach, combining core genes, Composition Vector (CV) Tree, SNPs, and multilocus sequence typing (MLST), showed tight clustering with limited diversity among the isolates. All isolates contained 13-15 antimicrobial resistance genes, with high genotype-phenotype concordance for most antibiotics, although discordance was observed for ciprofloxacin, cotrimoxazole, and chloramphenicol. Sixteen genes were identified as virulence factors, and 11 isolates also had ctxA/ctxB. All isolates also had two to four integrative conjugative elements (ICEs) containing antimicrobial resistance genes (ARGs) and important Vibrio cholerae pathogenicity islands (VPI-1, VPI-2) and Vibrio cholerae seventh pandemic islands (VSP-1, VSP-2). The pangenome analysis highlights extensive genomic flexibility within the species, likely driven by horizontal gene transfer and ecological adaptation; however, further outbreak-specific investigations are required to determine their direct role in the current outbreak. The detection of ctxA-positive signals in 20% (28/140) of wastewater samples suggests a possible surveillance signal during the outbreak.
These results highlight the presence of antimicrobial-resistant 7PET O1 El Tor strains in Gujarat outbreaks and support continued genomic monitoring to guide focused public health interventions in endemic areas. Furthermore, this study underscores the importance of wastewater surveillance for monitoring V. cholerae.},
}

RevDate: 2026-07-16

Zhao P, Peng C, Gao Y, et al (2026)

Pangenome Graph Reveals the Structural Variation Landscape in 2929 Cattle Samples and Its Impact on Gene Regulation.

Genomics, proteomics & bioinformatics pii:8735849 [Epub ahead of print].

Structural variations (SVs) represent a significant source of genomic diversity, with demonstrated roles in livestock gene expression and traits. However, a comprehensive understanding of the SV landscape across large sample sets and its impact on gene regulation in cattle remains incomplete. This study aimed to construct high-fidelity pangenome graphs by integrating both assembly-based and whole-genome sequencing (WGS) derived SV catalogs. We evaluated the efficacy of pangenome graphs for SV genotyping and identified 80,328 high-quality SVs from a cohort of 2929 samples. We systematically characterized these SVs, including their linkage disequilibrium with single nucleotide polymorphisms (SNPs), functional annotations, formation mechanisms, and genomic distributions. Furthermore, we generated paired WGS (24.4 ×) and blood RNA-seq data in 170 Simmental cattle. Utilizing our pangenome graphs, we identified 637 SV-expression quantitative trait loci (SV-eQTL), which accounted for 10.81% of expression heritability of target genes, with 38.09% of the effects linked to promoter/enhancer regions. Forty-six of these SV-eQTL were replicated using CattleGTEx results through SV imputation using a joint SNP-SV reference panel. Notably, insertions in the GHSR gene were significantly associated with its expression levels, likely linked to Bos indicus cattle adaptation to heat tolerance. Our findings provide novel insights into the SV landscape and its contribution to gene regulation, underscoring its importance in cattle genetics and genomics.

Additional Links: PMID-42461009

Citation:

@article {pmid42461009,
year = {2026},
author = {Zhao, P and Peng, C and Gao, Y and Dai, S and Yang, L and He, J and Yu, D and Wang, Z and Wang, X and Zhou, Y and Fang, L and Liu, GE},
title = {Pangenome Graph Reveals the Structural Variation Landscape in 2929 Cattle Samples and Its Impact on Gene Regulation.},
journal = {Genomics, proteomics & bioinformatics},
volume = {},
number = {},
pages = {},
doi = {10.1093/gpbjnl/qzag065},
pmid = {42461009},
issn = {2210-3244},
abstract = {Structural variations (SVs) represent a significant source of genomic diversity, with demonstrated roles in livestock gene expression and traits. However, a comprehensive understanding of the SV landscape across large sample sets and its impact on gene regulation in cattle remains incomplete. This study aimed to construct high-fidelity pangenome graphs by integrating both assembly-based and whole-genome sequencing (WGS) derived SV catalogs. We evaluated the efficacy of pangenome graphs for SV genotyping and identified 80,328 high-quality SVs from a cohort of 2929 samples. We systematically characterized these SVs, including their linkage disequilibrium with single nucleotide polymorphisms (SNPs), functional annotations, formation mechanisms, and genomic distributions. Furthermore, we generated paired WGS (24.4 ×) and blood RNA-seq data in 170 Simmental cattle. Utilizing our pangenome graphs, we identified 637 SV-expression quantitative trait loci (SV-eQTL), which accounted for 10.81% of expression heritability of target genes, with 38.09% of the effects linked to promoter/enhancer regions. Forty-six of these SV-eQTL were replicated using CattleGTEx results through SV imputation using a joint SNP-SV reference panel. Notably, insertions in the GHSR gene were significantly associated with its expression levels, likely linked to Bos indicus cattle adaptation to heat tolerance. Our findings provide novel insights into the SV landscape and its contribution to gene regulation, underscoring its importance in cattle genetics and genomics.},
}

RevDate: 2026-07-16

Michoud G, Geers A, Peter H, et al (2026)

Evolutionary radiation of Polaromonas from mountain glaciers downstream.

Current biology : CB pii:S0960-9822(26)00816-X [Epub ahead of print].

Habitat transitions are central to microbial ecology and evolution and have been extensively studied across vastly different environments, such as between saline and non-saline environments. However, microbial habitat transitions along other large-scale environmental gradients remain poorly studied. This is particularly true for transitions involving the cryosphere, despite building evidence suggesting the Cryogenian as important for evolutionary radiation. Here, we investigated ecosystem transitions and the related genomic adaptations of the cosmopolitan cryospheric Polaromonas bacterium. We constructed a pangenome from 282 high-quality genomes, sourced from glaciers, glacier-fed streams (GFSs), lakes, wetlands, groundwater, rivers, and soils. Phylogenetic reconciliation suggested that the ancestral Polaromonas genome radiated from glacier ecosystems into various downstream environments through multiple independent transitions. These transitions were likely marked by extensive horizontal gene transfer and gene loss, with mobile genetic elements such as plasmids and prophages playing key roles in genomic diversification. Predicted ancestral genomes encoded versatile metabolic and stress-response capacities, which support adaptation to fluctuating and extreme conditions in the various cryospheric habitats. Compared to the ancestral Polaromonas genome, distinct genomic signatures were associated with specific habitats: GFS lineages possess expanded stress-tolerance repertoires, glacier lineages gained chemolithotrophic and anaerobic pathways, lake and wetland genomes acquired phototrophic functions, and soil lineages expanded substrate transport and stress tolerance. Together, our findings highlight the role of genomic plasticity in the ecological success of Polaromonas and also underscore the cryosphere as a potential evolutionary cradle from which lineages dispersed and adapted to downstream aquatic and terrestrial environments.

Additional Links: PMID-42462719

Citation:

@article {pmid42462719,
year = {2026},
author = {Michoud, G and Geers, A and Peter, H and Thorpe, AC and Zhong, ZP and Rich, V and Battin, TJ},
title = {Evolutionary radiation of Polaromonas from mountain glaciers downstream.},
journal = {Current biology : CB},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.cub.2026.06.070},
pmid = {42462719},
issn = {1879-0445},
abstract = {Habitat transitions are central to microbial ecology and evolution and have been extensively studied across vastly different environments, such as between saline and non-saline environments. However, microbial habitat transitions along other large-scale environmental gradients remain poorly studied. This is particularly true for transitions involving the cryosphere, despite building evidence suggesting the Cryogenian as important for evolutionary radiation. Here, we investigated ecosystem transitions and the related genomic adaptations of the cosmopolitan cryospheric Polaromonas bacterium. We constructed a pangenome from 282 high-quality genomes, sourced from glaciers, glacier-fed streams (GFSs), lakes, wetlands, groundwater, rivers, and soils. Phylogenetic reconciliation suggested that the ancestral Polaromonas genome radiated from glacier ecosystems into various downstream environments through multiple independent transitions. These transitions were likely marked by extensive horizontal gene transfer and gene loss, with mobile genetic elements such as plasmids and prophages playing key roles in genomic diversification. Predicted ancestral genomes encoded versatile metabolic and stress-response capacities, which support adaptation to fluctuating and extreme conditions in the various cryospheric habitats. Compared to the ancestral Polaromonas genome, distinct genomic signatures were associated with specific habitats: GFS lineages possess expanded stress-tolerance repertoires, glacier lineages gained chemolithotrophic and anaerobic pathways, lake and wetland genomes acquired phototrophic functions, and soil lineages expanded substrate transport and stress tolerance. Together, our findings highlight the role of genomic plasticity in the ecological success of Polaromonas and also underscore the cryosphere as a potential evolutionary cradle from which lineages dispersed and adapted to downstream aquatic and terrestrial environments.},
}

RevDate: 2026-07-16

Ashraf H, Doerr D, Ebler J, et al (2026)

Building and applying pangenome references to capture genetic diversity.

Nature reviews. Genetics [Epub ahead of print].

Reference genomes serve as a coordinate system and are central to almost all analyses in genomics. However, linear reference genomes are based on a single individual or a small number of individuals and do not represent genetic diversity. Recent advances in de novo genome assembly, powered by long-read sequencing technologies, now enable the sequence reconstruction of many genomes to reference quality. These pangenomes integrate sequences from multiple individuals into graph-based or multi-haplotype representations, capturing genetic variation beyond a single linear reference. The widespread adoption of such pangenome references, which encode a diverse set of haplotypes, thus removes biases and enables the discovery of variants relative to all included haplotype backgrounds. The emergence of corresponding computational tools for analysing structural variants and complex genetic loci opens up opportunities in genome-wide association studies and rare-disease genetics. Here we review these opportunities, as well as challenges concerning pangenomes that need to be addressed by the research community.

Additional Links: PMID-42463514

Citation:

@article {pmid42463514,
year = {2026},
author = {Ashraf, H and Doerr, D and Ebler, J and Marschall, T},
title = {Building and applying pangenome references to capture genetic diversity.},
journal = {Nature reviews. Genetics},
volume = {},
number = {},
pages = {},
pmid = {42463514},
issn = {1471-0064},
abstract = {Reference genomes serve as a coordinate system and are central to almost all analyses in genomics. However, linear reference genomes are based on a single individual or a small number of individuals and do not represent genetic diversity. Recent advances in de novo genome assembly, powered by long-read sequencing technologies, now enable the sequence reconstruction of many genomes to reference quality. These pangenomes integrate sequences from multiple individuals into graph-based or multi-haplotype representations, capturing genetic variation beyond a single linear reference. The widespread adoption of such pangenome references, which encode a diverse set of haplotypes, thus removes biases and enables the discovery of variants relative to all included haplotype backgrounds. The emergence of corresponding computational tools for analysing structural variants and complex genetic loci opens up opportunities in genome-wide association studies and rare-disease genetics. Here we review these opportunities, as well as challenges concerning pangenomes that need to be addressed by the research community.},
}

RevDate: 2026-07-17

Wang H, Liang J, Li X, et al (2026)

A pan-genomic perspective: comprehensive dissection of the FIG superfamily unravels its evolution, expansion, and environmental adaptation in wheat.

BMC plant biology pii:10.1186/s12870-026-09530-6 [Epub ahead of print].

BACKGROUND: As a major global food crop, wheat faces dual pressures from population growth and deteriorating agricultural environments in efforts to increase its yield. The members of the FIG superfamily are associated with photosynthetic carbon metabolism and stress-responsive pathways in plants.

RESULTS: In this study, we conducted the first systematic pan-genomic analysis to investigate the evolution and function of the FIG superfamily in wheat. The results revealed that this family underwent significant expansion during plant evolution from aquatic to terrestrial habitats and from lower to higher forms, with functional divergence apparently predating the green algal stage as inferred from phylogenetic patterns. The members of this family were primarily derived from three ancestral species, and their expansion was largely driven by whole-genome duplication. Most members were found to be under purifying selection, whereas TaVTC4-4B was subject to positive selection. This gene is constitutively highly expressed in green tissues, and its promoter is enriched with cis-regulatory elements associated with light responsiveness, JA/ABA signaling, and stress responses. Expression analysis indicated its strong responsiveness to salt stress.

CONCLUSIONS: This study elucidates the evolutionary trajectory and functional landscape of the wheat FIG superfamily, laying a theoretical foundation for the potential genetic improvement of photosynthetic efficiency and stress resilience in wheat.

Additional Links: PMID-42464074

Citation:

@article {pmid42464074,
year = {2026},
author = {Wang, H and Liang, J and Li, X and Ou, G and Xing, J and Lang, Y and Fan, Y and Shi, X and Liu, R and Wang, L and Rao, Y and Guan, H and Qu, M and Song, J and Lu, K and Zhou, M},
title = {A pan-genomic perspective: comprehensive dissection of the FIG superfamily unravels its evolution, expansion, and environmental adaptation in wheat.},
journal = {BMC plant biology},
volume = {},
number = {},
pages = {},
doi = {10.1186/s12870-026-09530-6},
pmid = {42464074},
issn = {1471-2229},
support = {312529Y052//Talent Startup Funding of Jinhua Academy, Zhejiang Chinese Medical University/ ; },
abstract = {BACKGROUND: As a major global food crop, wheat faces dual pressures from population growth and deteriorating agricultural environments in efforts to increase its yield. The members of the FIG superfamily are associated with photosynthetic carbon metabolism and stress-responsive pathways in plants.

RESULTS: In this study, we conducted the first systematic pan-genomic analysis to investigate the evolution and function of the FIG superfamily in wheat. The results revealed that this family underwent significant expansion during plant evolution from aquatic to terrestrial habitats and from lower to higher forms, with functional divergence apparently predating the green algal stage as inferred from phylogenetic patterns. The members of this family were primarily derived from three ancestral species, and their expansion was largely driven by whole-genome duplication. Most members were found to be under purifying selection, whereas TaVTC4-4B was subject to positive selection. This gene is constitutively highly expressed in green tissues, and its promoter is enriched with cis-regulatory elements associated with light responsiveness, JA/ABA signaling, and stress responses. Expression analysis indicated its strong responsiveness to salt stress.

CONCLUSIONS: This study elucidates the evolutionary trajectory and functional landscape of the wheat FIG superfamily, laying a theoretical foundation for the potential genetic improvement of photosynthetic efficiency and stress resilience in wheat.},
}

RevDate: 2026-07-17

Solun GK, Dogrusoz U, Bingöl Z, et al (2026)

PG2: algorithms and a web-based tool for effective layout and visual analysis of pangenome graphs.

BMC bioinformatics pii:10.1186/s12859-026-06555-4 [Epub ahead of print].

BACKGROUND: The advent of cost-effective whole-genome assembly has enabled the creation of comprehensive pangenomes with resolved haplotypes across various organisms. This technological leap drives the refinement of tailored methodologies to manage the intricate sequences and variations in extensive collections of related genomes. These methodologies often utilize graphical representations of pangenomes to enhance algorithms for tasks like sequence alignment, visualization, and functional genomics. Leveraging the insights provided by pangenomes, these approaches exhibit improved efficiency in bioinformatics tasks such as read mapping, variant calling, and genotyping. Pangenome graphs are positioned to become invaluable assets in genomics, offering seamless reconciliation of diverse sequence and coordinate systems. While their potential to replace linear reference genomes is uncertain, their adaptability ensures their utility in future pangenomic models. Utilizing graphs for visual representation aids in exploring critical insights and identifying key patterns, facilitating analysis by highlighting connections, trends, and patterns for improved comprehension.

RESULTS: Towards this goal, we present algorithms to effectively lay out and visually analyze pangenome graphs. We then introduce an open-source, flexible, easy-to-use, web-based platform named PG2 (PanGenoGrapher) that implements these algorithms.

CONCLUSIONS: Our primary objective here is to incorporate the capabilities of advanced visualization techniques into the analysis of pangenome graphs, thereby enhancing their utility and accessibility for researchers and practitioners in the field. The source code and user guide are openly available on GitHub at https://github.com/iVis-at-Bilkent/pangenographer. A publicly accessible sample deployment is hosted at http://pg2.cs.bilkent.edu.tr. In addition, a demonstration video illustrating the primary use cases of PG2 is available at https://www.youtube.com/watch?v=yCd7-aGY6CQ.

Additional Links: PMID-42464101

Citation:

@article {pmid42464101,
year = {2026},
author = {Solun, GK and Dogrusoz, U and Bingöl, Z and Alkan, C},
title = {PG2: algorithms and a web-based tool for effective layout and visual analysis of pangenome graphs.},
journal = {BMC bioinformatics},
volume = {},
number = {},
pages = {},
doi = {10.1186/s12859-026-06555-4},
pmid = {42464101},
issn = {1471-2105},
abstract = {BACKGROUND: The advent of cost-effective whole-genome assembly has enabled the creation of comprehensive pangenomes with resolved haplotypes across various organisms. This technological leap drives the refinement of tailored methodologies to manage the intricate sequences and variations in extensive collections of related genomes. These methodologies often utilize graphical representations of pangenomes to enhance algorithms for tasks like sequence alignment, visualization, and functional genomics. Leveraging the insights provided by pangenomes, these approaches exhibit improved efficiency in bioinformatics tasks such as read mapping, variant calling, and genotyping. Pangenome graphs are positioned to become invaluable assets in genomics, offering seamless reconciliation of diverse sequence and coordinate systems. While their potential to replace linear reference genomes is uncertain, their adaptability ensures their utility in future pangenomic models. Utilizing graphs for visual representation aids in exploring critical insights and identifying key patterns, facilitating analysis by highlighting connections, trends, and patterns for improved comprehension.

RESULTS: Towards this goal, we present algorithms to effectively lay out and visually analyze pangenome graphs. We then introduce an open-source, flexible, easy-to-use, web-based platform named PG2 (PanGenoGrapher) that implements these algorithms.

CONCLUSIONS: Our primary objective here is to incorporate the capabilities of advanced visualization techniques into the analysis of pangenome graphs, thereby enhancing their utility and accessibility for researchers and practitioners in the field. The source code and user guide are openly available on GitHub at https://github.com/iVis-at-Bilkent/pangenographer. A publicly accessible sample deployment is hosted at http://pg2.cs.bilkent.edu.tr. In addition, a demonstration video illustrating the primary use cases of PG2 is available at https://www.youtube.com/watch?v=yCd7-aGY6CQ.},
}

RevDate: 2026-07-17
CmpDate: 2026-07-17

Rana P, Chaudhary C, Pruthi R, et al (2026)

Molecular insights and translational opportunities to enhance heat tolerance in rice.

The plant genome, 19(3):e70281.

Heat stress is an increasingly serious threat to rice (Oryza sativa L.) productivity, yet the genetic and regulatory architecture underlying thermotolerance remain poorly resolved and fragmented across studies. Earlier research focused on individual pathways or specific developmental stages; however, recent advances now support an integrated understanding of heat stress adaptation in rice. This review synthesizes emerging insights into molecular physiology, regulatory signaling, epigenetic memory, and genome-scale variation associated with thermotolerance. We highlight the interconnected roles of calcium reactive oxygen species signaling, heat shock transcription factor networks, translational regulation, and chromatin-based stress memory in shaping reproductive-stage tolerance and maintaining grain quality under elevated temperatures. The review also emphasizes the value of pangenome analyses and structural variant discovery for identifying heat-responsive genes and regulatory elements absent from single-reference genomes. In addition, genome-wide association studies, haplotype-based breeding, genomic selection, and CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-based genome editing are discussed as promising approaches for functional validation and deployment of favorable alleles controlling polygenic heat resilience. Despite these advances, several challenges continue to hinder translation into breeding-ready outcomes, including limited field-based validation of candidate genes, poor integration of multi-omics datasets into predictive breeding frameworks, and insufficient understanding of reproductive-stage regulatory networks. Furthermore, genotype × environment interactions, together with trade-offs among yield, grain quality, and stress resilience, strongly influence the stability and transferability of thermotolerance traits across diverse agroecological environments. 
By integrating mechanistic insights with genome-scale diversity and predictive breeding tools, this review outlines a genomics-enabled roadmap for developing heat-resilient rice cultivars under intensifying global warming and supporting sustainable global rice production.

Additional Links: PMID-42464962

Citation:

@article {pmid42464962,
year = {2026},
author = {Rana, P and Chaudhary, C and Pruthi, R and Sharma, M and Singh, B and Kondi, RKR and Bulle, M and Subudhi, PK},
title = {Molecular insights and translational opportunities to enhance heat tolerance in rice.},
journal = {The plant genome},
volume = {19},
number = {3},
pages = {e70281},
doi = {10.1002/tpg2.70281},
pmid = {42464962},
issn = {1940-3372},
support = {2023-68012-39002//Strengthening Agricultural Systems/ ; //National Institute of Food and Agriculture/ ; },
mesh = {*Oryza/genetics/physiology ; *Thermotolerance/genetics ; Plant Breeding ; Heat-Shock Response/genetics ; Gene Expression Regulation, Plant ; },
abstract = {Heat stress is an increasingly serious threat to rice (Oryza sativa L.) productivity, yet the genetic and regulatory architecture underlying thermotolerance remain poorly resolved and fragmented across studies. Earlier research focused on individual pathways or specific developmental stages; however, recent advances now support an integrated understanding of heat stress adaptation in rice. This review synthesizes emerging insights into molecular physiology, regulatory signaling, epigenetic memory, and genome-scale variation associated with thermotolerance. We highlight the interconnected roles of calcium reactive oxygen species signaling, heat shock transcription factor networks, translational regulation, and chromatin-based stress memory in shaping reproductive-stage tolerance and maintaining grain quality under elevated temperatures. The review also emphasizes the value of pangenome analyses and structural variant discovery for identifying heat-responsive genes and regulatory elements absent from single-reference genomes. In addition, genome-wide association studies, haplotype-based breeding, genomic selection, and CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-based genome editing are discussed as promising approaches for functional validation and deployment of favorable alleles controlling polygenic heat resilience. Despite these advances, several challenges continue to hinder translation into breeding-ready outcomes, including limited field-based validation of candidate genes, poor integration of multi-omics datasets into predictive breeding frameworks, and insufficient understanding of reproductive-stage regulatory networks. Furthermore, genotype × environment interactions, together with trade-offs among yield, grain quality, and stress resilience, strongly influence the stability and transferability of thermotolerance traits across diverse agroecological environments. 
By integrating mechanistic insights with genome-scale diversity and predictive breeding tools, this review outlines a genomics-enabled roadmap for developing heat-resilient rice cultivars under intensifying global warming and supporting sustainable global rice production.},
}

MeSH Terms:

*Oryza/genetics/physiology
*Thermotolerance/genetics
Plant Breeding
Heat-Shock Response/genetics
Gene Expression Regulation, Plant

RevDate: 2026-07-17
CmpDate: 2026-07-17

Guarracino A, Gyamfi A, Human Pangenome Reference Consortium, et al (2026)

Concerted evolution and unorthodox recombination of human subtelomeres.

bioRxiv : the preprint server for biology pii:2026.07.10.737660.

Human subtelomeres contain duplicated sequence that is shared among the ends of non-homologous chromosomes and provides a substrate for ectopic exchange [1-6]. However, incomplete reference assemblies and chromosome-by-chromosome analyses have prevented a population-scale view of the extent and organization of subtelomeric exchange [7-9]. Here we apply a reference-free pangenome approach to 465 near-complete human assemblies, comparing every chromosome end against every other, and find that high-identity pseudo-homolog regions occur on 41 of 48 chromosome arms. These regions form structured sequence communities in which previously described exchange systems appear as local peaks within a broader continuum. Human and mouse chromosome-contact maps show preferential proximity between subtelomeres with similar sequences. In mouse meiosis this proximity is strongest at the zygotene bouquet, when telomeres cluster at the nuclear envelope; in human data it persists even in adjacent flanks that lack the shared sequence used to define each pair. In a three-generation telomere-to-telomere pedigree, whole-genome comparison identifies putative recombination between subtelomeric regions on non-homologous chromosomes that matches this community organization, while recovering the obligate Xp/Yp PAR1 recombination in the male germline. These results generalize known subtelomeric exchange systems into a near-ubiquitous architecture and support recurrent ectopic exchange as a genome-wide force in the concerted evolution of human chromosome ends.

Additional Links: PMID-42465281

Citation:

@article {pmid42465281,
year = {2026},
author = {Guarracino, A and Gyamfi, A and {Human Pangenome Reference Consortium} and Garrison, E},
title = {Concerted evolution and unorthodox recombination of human subtelomeres.},
journal = {bioRxiv : the preprint server for biology},
volume = {},
number = {},
pages = {},
doi = {10.64898/2026.07.10.737660},
pmid = {42465281},
issn = {2692-8205},
abstract = {Human subtelomeres contain duplicated sequence that is shared among the ends of non-homologous chromosomes and provides a substrate for ectopic exchange [1-6]. However, incomplete reference assemblies and chromosome-by-chromosome analyses have prevented a population-scale view of the extent and organization of subtelomeric exchange [7-9]. Here we apply a reference-free pangenome approach to 465 near-complete human assemblies, comparing every chromosome end against every other, and find that high-identity pseudo-homolog regions occur on 41 of 48 chromosome arms. These regions form structured sequence communities in which previously described exchange systems appear as local peaks within a broader continuum. Human and mouse chromosome-contact maps show preferential proximity between subtelomeres with similar sequences. In mouse meiosis this proximity is strongest at the zygotene bouquet, when telomeres cluster at the nuclear envelope; in human data it persists even in adjacent flanks that lack the shared sequence used to define each pair. In a three-generation telomere-to-telomere pedigree, whole-genome comparison identifies putative recombination between subtelomeric regions on non-homologous chromosomes that matches this community organization, while recovering the obligate Xp/Yp PAR1 recombination in the male germline. These results generalize known subtelomeric exchange systems into a near-ubiquitous architecture and support recurrent ectopic exchange as a genome-wide force in the concerted evolution of human chromosome ends.},
}

RevDate: 2026-07-17
CmpDate: 2026-07-17

Shivakumar VS, Langmead B, Human Pangenome Reference Consortium (2026)

Navigating the pangenome coordinate system with Shredtools.

bioRxiv : the preprint server for biology pii:2026.07.03.736354.

Existing notions of pangenome coordinates rely on hard-to-compute multiple sequence alignments. On the other hand, pangenome-wide exact unique matches (multi-MUMs) can be computed efficiently, and represent conserved stretches of columns in the underlying MSA. We introduce Shredtools, which uses multi-MUMs as pangenome waypoints and allows for sophisticated queries in pangenome coordinates. Its primary query is extract, which takes an interval of one sequence and extracts the smallest window containing it that is syntenic pangenome-wide. Shredtools' extract query can extract a gene region from 476 human genomes in half a second. Other queries help to refine these results, by finding local exact matches to improve the density of multi-MUM coverage ("enhance") and by selectively discarding sequences to improve the precision of the syntenic region ("zoom"). The Shredtools web interface (available at https://vikshiv.github.io/shredtools) allows for client-side handling of extract queries with index queries handled via simple and fast HTTP Range requests, simplifying usage and enabling pangenome-scale discoveries.
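
The idea of using multi-MUMs as waypoints can be illustrated with a minimal Python sketch. It is not Shredtools' implementation or API: the `Waypoint` type, the `extract` function, and the in-memory list of waypoints are hypothetical, and the sketch assumes the waypoints are non-overlapping, forward-strand, and sorted in the same order in every sequence. Given an interval on one sequence, it finds the nearest waypoints flanking the interval and reports the window they bound in each sequence.

```python
from bisect import bisect_left, bisect_right
from typing import List, NamedTuple, Optional, Tuple


class Waypoint(NamedTuple):
    """Hypothetical record for one multi-MUM: an exact match shared by all sequences."""
    length: int
    starts: Tuple[int, ...]  # start offset of the match in each sequence


def extract(
    waypoints: List[Waypoint], seq_idx: int, start: int, end: int
) -> Optional[List[Tuple[int, int]]]:
    """Return a half-open (start, end) window per sequence, bounded by flanking waypoints.

    Assumes waypoints are sorted, non-overlapping, and collinear in every sequence.
    Returns None when the interval is not flanked by waypoints on both sides.
    """
    q_starts = [w.starts[seq_idx] for w in waypoints]
    q_ends = [w.starts[seq_idx] + w.length for w in waypoints]
    # Last waypoint that begins at or before the interval start.
    left = bisect_right(q_starts, start) - 1
    # First waypoint that ends at or after the interval end.
    right = bisect_left(q_ends, end)
    if left < 0 or right >= len(waypoints):
        return None
    lw, rw = waypoints[left], waypoints[right]
    return [(lw.starts[s], rw.starts[s] + rw.length) for s in range(len(lw.starts))]


# Toy example with two sequences and three waypoints (made-up coordinates).
toy = [
    Waypoint(10, (0, 5)),
    Waypoint(10, (100, 120)),
    Waypoint(10, (200, 260)),
]
print(extract(toy, seq_idx=0, start=50, end=105))
```

Binary search over sorted waypoint coordinates keeps each query logarithmic in the number of waypoints, which is one plausible reason such lookups can be fast; how the real tool indexes and stores its matches is not described in the abstract.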

Additional Links: PMID-42465356

Citation:

@article {pmid42465356,
year = {2026},
author = {Shivakumar, VS and Langmead, B and {Human Pangenome Reference Consortium}},
title = {Navigating the pangenome coordinate system with Shredtools.},
journal = {bioRxiv : the preprint server for biology},
volume = {},
number = {},
pages = {},
doi = {10.64898/2026.07.03.736354},
pmid = {42465356},
issn = {2692-8205},
abstract = {Existing notions of pangenome coordinates rely on hard-to-compute multiple sequence alignments. On the other hand, pangenome-wide exact unique matches (multi-MUMs) can be computed efficiently, and represent conserved stretches of columns in the underlying MSA. We introduce Shredtools, which uses multi-MUMs as pangenome waypoints and allows for sophisticated queries in pangenome coordinates. Its primary query is extract, which takes an interval of one sequence and extracts the smallest window containing it that is syntenic pangenome-wide. Shredtools' extract query can extract a gene region from 476 human genomes in half a second. Other queries help to refine these results, by finding local exact matches to improve the density of multi-MUM coverage ("enhance") and by selectively discarding sequences to improve the precision of the syntenic region ("zoom"). The Shredtools web interface (available at https://vikshiv.github.io/shredtools) allows for client-side handling of extract queries with index queries handled via simple and fast HTTP Range requests, simplifying usage and enabling pangenome-scale discoveries.},
}

RevDate: 2026-07-15
CmpDate: 2026-07-15

Wang G, Zhu N, Sun X, et al (2026)

Pan-Genome and Transcriptome-Guided Analysis Reveals Duplication-Driven Evolution and Candidate MYB-bHLH Modules Associated with Fruit Development in Pear.

Plants (Basel, Switzerland), 15(13): pii:plants15131961.

Gene duplication and subsequent selection are central to genome evolution and transcription factor diversification, but the conservation and divergence of the basic helix-loop-helix (bHLH) family in pear remain unclear from a pan-genome perspective. Here, we performed a pan-genome and transcriptome-guided analysis across 15 pear genome assemblies, including Asian pear, European pear, and hybrid/haplotype assemblies. Genome-wide duplicated gene pairs were classified into different duplication types, and Ka, Ks, and Ka/Ks values were calculated to establish an evolutionary background for duplicated pear genes. Based on this framework, 3222 bHLH were identified and grouped into evolutionary clades and orthologous gene groups. The pear bHLH family contained conserved core members and variable dispensable members, indicating both functional conservation and genome diversification. Duplication and Ka/Ks analyses showed that WGD/segmental duplication contributed to bHLH expansion and that most duplicated PbrbHLH gene pairs were constrained by purifying selection. By integrating 17-tissue and fruit-development transcriptomes from three pear cultivars, 39 fruit-development-associated PbrbHLHs were selected. Co-expression analysis with 185 PbrMYBs identified candidate MYB-bHLH co-expression modules from the available pear fruit-development transcriptomes. These results provide an evolutionary framework for pear bHLH diversification and candidate regulatory modules for future functional studies.
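
As a reading aid for the Ka/Ks results above, the following Python sketch shows how a duplicated gene pair is conventionally labelled from its Ka and Ks values: a ratio below 1 suggests purifying selection, near 1 neutral evolution, and above 1 positive selection. The function name `classify_selection` and the tolerance `tol` around 1 are illustrative assumptions, not parameters from the study, and the sketch takes Ka and Ks as already estimated rather than computing them from alignments.

```python
def classify_selection(ka: float, ks: float, tol: float = 0.05) -> str:
    """Label a gene pair by its Ka/Ks ratio.

    `tol` is an illustrative band around 1 treated as neutral (an assumption).
    """
    if ks <= 0:
        # The ratio is undefined when no synonymous substitutions are observed.
        return "undefined"
    omega = ka / ks
    if omega < 1 - tol:
        return "purifying"
    if omega > 1 + tol:
        return "positive"
    return "neutral"


# Made-up values for illustration only.
for ka, ks in [(0.02, 0.20), (0.10, 0.10), (0.30, 0.20)]:
    print(ka, ks, classify_selection(ka, ks))
```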

Additional Links: PMID-42452169

Citation:

@article {pmid42452169,
year = {2026},
author = {Wang, G and Zhu, N and Sun, X and Qi, K and Guo, Z},
title = {Pan-Genome and Transcriptome-Guided Analysis Reveals Duplication-Driven Evolution and Candidate MYB-bHLH Modules Associated with Fruit Development in Pear.},
journal = {Plants (Basel, Switzerland)},
volume = {15},
number = {13},
pages = {},
doi = {10.3390/plants15131961},
pmid = {42452169},
issn = {2223-7747},
support = {BK20230757//the Natural Science Foundation of Jiangsu Province, China/ ; BT-2025-TCYC-0065//Xinjiang Tianchi Talents Program/ ; },
abstract = {Gene duplication and subsequent selection are central to genome evolution and transcription factor diversification, but the conservation and divergence of the basic helix-loop-helix (bHLH) family in pear remain unclear from a pan-genome perspective. Here, we performed a pan-genome and transcriptome-guided analysis across 15 pear genome assemblies, including Asian pear, European pear, and hybrid/haplotype assemblies. Genome-wide duplicated gene pairs were classified into different duplication types, and Ka, Ks, and Ka/Ks values were calculated to establish an evolutionary background for duplicated pear genes. Based on this framework, 3222 bHLH were identified and grouped into evolutionary clades and orthologous gene groups. The pear bHLH family contained conserved core members and variable dispensable members, indicating both functional conservation and genome diversification. Duplication and Ka/Ks analyses showed that WGD/segmental duplication contributed to bHLH expansion and that most duplicated PbrbHLH gene pairs were constrained by purifying selection. By integrating 17-tissue and fruit-development transcriptomes from three pear cultivars, 39 fruit-development-associated PbrbHLHs were selected. Co-expression analysis with 185 PbrMYBs identified candidate MYB-bHLH co-expression modules from the available pear fruit-development transcriptomes. These results provide an evolutionary framework for pear bHLH diversification and candidate regulatory modules for future functional studies.},
}

RevDate: 2026-07-15
CmpDate: 2026-07-15

Wu N, Feng Y, Ning X, et al (2026)

Pan-Genome Analysis Reveals Evolutionary Dynamics and Functional Divergence of the NAC Gene Family in Soybean.

Plants (Basel, Switzerland), 15(13): pii:plants15132010.

Soybean (Glycine max) is an important model crop for studying plant functional genes, such as the NAC transcription factor (TF) gene family. The NAC transcription factor (TF) family is one of the largest plant-specific TF families and plays critical roles in plant growth, development, and stress responses. In this study, we performed a pan-genome-wide analysis of NAC genes using 29 soybean genomes. A total of 5051 NAC genes were identified and clustered into 245 orthologous gene groups (OGGs), including 58 core, 88 soft-core, 32 shell, and 67 cloud groups. Based on phylogenetic relationships, the representative NAC OGGs were assigned to 18 subfamilies, 17 of which contained soybean NAC genes. Gene duplication analysis indicated that whole-genome duplication (WGD)/segmental duplication was the predominant driver of NAC family expansion, accounting for 90.88% of duplication events. Approximately 39.30% of NAC genes carried at least one intact transposable element (TE) within 2 kb upstream or downstream regions. NAC genes with copy number variation (CNV) harbored more nearby TEs than non-CNV genes (1.54 vs. 1.31 TEs per gene), and dispensable NAC genes contained more nearby TEs than core NAC genes (1.59 vs. 1.33 TEs per gene). These results indicate a significant association between local TE abundance and NAC gene CNV or dispensability. Selection pressure analysis showed that dispensable NAC genes had higher Ka, Ks, and Ka/Ks values than core genes, suggesting relatively relaxed evolutionary constraints. Expression profiling across six tissues revealed distinct transcriptional patterns among NAC subfamilies. Structurally conserved subfamilies generally showed broader expression, whereas structurally divergent subfamilies displayed greater expression variability. 
Regulatory network and Gene Ontology (GO) enrichment analyses suggested that conserved subfamilies were mainly associated with stress responses, while divergent subfamilies were related to cell wall regulation, signal transduction, and ion homeostasis. Further analysis of Wm82 drought RNA-seq data prioritized several putative drought-responsive NAC candidates, including Glyma.16G043200, Glyma.06G248900, Glyma.07G050600, Glyma.12G206900, and Glyma.18G261300. Overall, these findings elucidate the mechanisms of expansion and the functional divergence of the NAC gene family at the soybean pan-genome level, providing a theoretical basis for understanding NAC gene evolution and facilitating future crop improvement.
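
The core, soft-core, shell, and cloud categories used above are assigned from how many genomes contain each orthologous gene group. The abstract does not state the cut-offs, so the Python sketch below uses assumed thresholds (core: all genomes; soft-core: at least 90% but not all; cloud: at most 10%; shell: everything in between). The function names and the `presence` mapping are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def classify_group(
    n_present: int,
    n_genomes: int,
    soft_core_frac: float = 0.9,  # assumed cut-off, not taken from the study
    cloud_frac: float = 0.1,      # assumed cut-off, not taken from the study
) -> str:
    """Assign a pan-genome category from the number of genomes containing a group."""
    if not 0 < n_present <= n_genomes:
        raise ValueError("n_present must be between 1 and n_genomes")
    if n_present == n_genomes:
        return "core"
    frac = n_present / n_genomes
    if frac >= soft_core_frac:
        return "soft-core"
    if frac <= cloud_frac:
        return "cloud"
    return "shell"


def summarize(presence: Dict[str, Set[int]], n_genomes: int) -> Counter:
    """Count groups per category; `presence` maps group id -> genome ids containing it."""
    return Counter(classify_group(len(g), n_genomes) for g in presence.values())


# Toy presence table for 29 genomes (made-up group ids).
toy = {"OGG_a": set(range(29)), "OGG_b": set(range(27)), "OGG_c": {0}}
print(summarize(toy, n_genomes=29))
```

Keeping the thresholds as parameters makes it easy to compare conventions, since different pan-genome studies draw the soft-core and cloud boundaries differently.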

Additional Links: PMID-42452216

Citation:

@article {pmid42452216,
year = {2026},
author = {Wu, N and Feng, Y and Ning, X and Yao, D},
title = {Pan-Genome Analysis Reveals Evolutionary Dynamics and Functional Divergence of the NAC Gene Family in Soybean.},
journal = {Plants (Basel, Switzerland)},
volume = {15},
number = {13},
pages = {},
doi = {10.3390/plants15132010},
pmid = {42452216},
issn = {2223-7747},
abstract = {Soybean (Glycine max) is an important model crop for studying plant functional genes, such as the NAC transcription factor (TF) gene family. The NAC transcription factor (TF) family is one of the largest plant-specific TF families and plays critical roles in plant growth, development, and stress responses. In this study, we performed a pan-genome-wide analysis of NAC genes using 29 soybean genomes. A total of 5051 NAC genes were identified and clustered into 245 orthologous gene groups (OGGs), including 58 core, 88 soft-core, 32 shell, and 67 cloud groups. Based on phylogenetic relationships, the representative NAC OGGs were assigned to 18 subfamilies, 17 of which contained soybean NAC genes. Gene duplication analysis indicated that whole-genome duplication (WGD)/segmental duplication was the predominant driver of NAC family expansion, accounting for 90.88% of duplication events. Approximately 39.30% of NAC genes carried at least one intact transposable element (TE) within 2 kb upstream or downstream regions. NAC genes with copy number variation (CNV) harbored more nearby TEs than non-CNV genes (1.54 vs. 1.31 TEs per gene), and dispensable NAC genes contained more nearby TEs than core NAC genes (1.59 vs. 1.33 TEs per gene). These results indicate a significant association between local TE abundance and NAC gene CNV or dispensability. Selection pressure analysis showed that dispensable NAC genes had higher Ka, Ks, and Ka/Ks values than core genes, suggesting relatively relaxed evolutionary constraints. Expression profiling across six tissues revealed distinct transcriptional patterns among NAC subfamilies. Structurally conserved subfamilies generally showed broader expression, whereas structurally divergent subfamilies displayed greater expression variability. 
Regulatory network and Gene Ontology (GO) enrichment analyses suggested that conserved subfamilies were mainly associated with stress responses, while divergent subfamilies were related to cell wall regulation, signal transduction, and ion homeostasis. Further analysis of Wm82 drought RNA-seq data prioritized several putative drought-responsive NAC candidates, including Glyma.16G043200, Glyma.06G248900, Glyma.07G050600, Glyma.12G206900, and Glyma.18G261300. Overall, these findings elucidate the mechanisms of expansion and the functional divergence of the NAC gene family at the soybean pan-genome level, providing a theoretical basis for understanding NAC gene evolution and facilitating future crop improvement.},
}

RevDate: 2026-07-15

Calvigioni M, Rossi V, Celandroni F, et al (2026)

Identification and in-depth characterization of clinical isolates of Peribacillus frigoritolerans.

Microbiology spectrum [Epub ahead of print].

UNLABELLED: Peribacillus frigoritolerans is a bacterial species commonly found in the environment and used as a plant-growth promoter and biocontrol agent in agriculture. Recent evidence has proven that Peribacillus spp. are also able to cause severe infections in humans, thus emerging as new human pathogens. In this study, for the first time, 10 P. frigoritolerans strains were isolated from human samples (both superficial and sterile deep body sites) and characterized in terms of morphology, lifestyle, genetics, and virulence. The molecular identification by MALDI-TOF mass spectrometry and 16S rRNA gene sequencing was inconclusive, while whole-genome sequencing was effective in properly identifying isolates within the species P. frigoritolerans. The pangenome analysis provided an overview of the virulence potential of P. frigoritolerans, revealing the presence of genes involved in antibiotic resistance and toxin/exoenzyme production. Phenotypically, the strains displayed different features and behaviors, indicating strain-specific properties and high intra-species variability. A part of the strains exhibited virulence factors, being able to swim and swarm, form biofilms, and produce enzymes and toxins. Antibiotic susceptibility testing revealed resistance to ampicillin for all strains and resistance to erythromycin and clindamycin for some of them. Antimicrobial activity against Gram-positive bacteria and fungi was demonstrated, further corroborating the presence of putative bacteriocin/antimicrobial peptide-encoding genes. An association between the overall virulence potential and infection site/severity was hypothesized. Altogether, these findings highlight the extreme diversity within the species, reveal the strain-dependent pathogenic potential of P. frigoritolerans, and support its role as a candidate human pathogen.

IMPORTANCE: This study provides insights into the infectious role of Peribacillus frigoritolerans, an almost unknown bacterial species with agrobiotechnological potential but no history of human infections. This is the first report of P. frigoritolerans isolation from human clinical samples. Ten P. frigoritolerans strains were herein characterized for their morphology, lifestyle, genetics, and virulence, highlighting an extreme intra-species variability and the potential to act as pathogens in humans. Importantly, this study points out the need for unconventional methods for proper identification of this species, since traditional techniques result inconclusive. Resistance to commonly prescribed antibiotics was also evidenced, confirming the importance of antimicrobial testing on clinical isolates. This study lays the foundation for a more in-depth characterization of Peribacillus spp. in the clinical context.

Additional Links: PMID-42454910

Citation:

@article {pmid42454910,
year = {2026},
author = {Calvigioni, M and Rossi, V and Celandroni, F and Barnini, S and Lupetti, A and Mazzantini, D and Ghelardi, E},
title = {Identification and in-depth characterization of clinical isolates of Peribacillus frigoritolerans.},
journal = {Microbiology spectrum},
volume = {},
number = {},
pages = {e0024526},
doi = {10.1128/spectrum.00245-26},
pmid = {42454910},
issn = {2165-0497},
abstract = {UNLABELLED: Peribacillus frigoritolerans is a bacterial species commonly found in the environment and used as a plant-growth promoter and biocontrol agent in agriculture. Recent evidence has proven that Peribacillus spp. are also able to cause severe infections in humans, thus emerging as new human pathogens. In this study, for the first time, 10 P. frigoritolerans strains were isolated from human samples (both superficial and sterile deep body sites) and characterized in terms of morphology, lifestyle, genetics, and virulence. The molecular identification by MALDI-TOF mass spectrometry and 16S rRNA gene sequencing was inconclusive, while whole-genome sequencing was effective in properly identifying isolates within the species P. frigoritolerans. The pangenome analysis provided an overview of the virulence potential of P. frigoritolerans, revealing the presence of genes involved in antibiotic resistance and toxin/exoenzyme production. Phenotypically, the strains displayed different features and behaviors, indicating strain-specific properties and high intra-species variability. A part of the strains exhibited virulence factors, being able to swim and swarm, form biofilms, and produce enzymes and toxins. Antibiotic susceptibility testing revealed resistance to ampicillin for all strains and resistance to erythromycin and clindamycin for some of them. Antimicrobial activity against Gram-positive bacteria and fungi was demonstrated, further corroborating the presence of putative bacteriocin/antimicrobial peptide-encoding genes. An association between the overall virulence potential and infection site/severity was hypothesized. Altogether, these findings highlight the extreme diversity within the species, reveal the strain-dependent pathogenic potential of P. frigoritolerans, and support its role as a candidate human pathogen.

IMPORTANCE: This study provides insights into the infectious role of Peribacillus frigoritolerans, an almost unknown bacterial species with agrobiotechnological potential but no history of human infections. This is the first report of P. frigoritolerans isolation from human clinical samples. Ten P. frigoritolerans strains were herein characterized for their morphology, lifestyle, genetics, and virulence, highlighting an extreme intra-species variability and the potential to act as pathogens in humans. Importantly, this study points out the need for unconventional methods for proper identification of this species, since traditional techniques result inconclusive. Resistance to commonly prescribed antibiotics was also evidenced, confirming the importance of antimicrobial testing on clinical isolates. This study lays the foundation for a more in-depth characterization of Peribacillus spp. in the clinical context.},
}

RevDate: 2026-07-16
CmpDate: 2026-07-16

Cárdenas JP, Vidal-Veuthey B, Meza K, et al (2026)

In-silico analysis of Bifidobacterium bifidum strain 900791 genome in the context of the B. bifidum pangenome.

Frontiers in cellular and infection microbiology, 16:1744409.

INTRODUCTION: Bifidobacterium bifidum is a key member of the human gut microbiota with well-recognised roles in intestinal homeostasis, glycan metabolism, and immunomodulation. Strain 900791, isolated from the meconium of a Siberian infant, has been used as a commercial probiotic ingredient for decades and has demonstrated, in previously published clinical trials, an ability to improve lactose tolerance and reduce gastrointestinal symptoms in both children and adults; however, the genomic basis for these properties has not been characterised.

METHODS: We present the complete genome sequence and comprehensive in silico genomic and pangenomic analysis of B. bifidum strain 900791. Hybrid sequencing was used for genome assembly. Phylogenomic analysis, including core genome MLST (cgMLST), integrated 229 high-quality B. bifidum genomes, representing the largest dataset for this species to date. Functional annotation and carbohydrate-active enzyme (CAZyme) profiling, antimicrobial resistance prediction, and bioinformatic screening for probiotic-associated genomic features were also performed.

RESULTS: Hybrid sequencing yielded a single circular chromosome of 2,280,092 bp, comprising 1,852 protein-coding sequences. Phylogenomic analysis revealed that strain 900791 belongs to a clonal subgroup of nine closely related strains (>99% cgMLST identity), consistent with a geographically structured lineage. The species pangenome comprised 4,152 orthogroups and a core of 1,450 gene families; 23 orthogroups were exclusive to the 900791 clonal subgroup, including a predicted lantibiotic biosynthetic cluster. CAZyme profiling identified glycoside hydrolase families associated with human milk oligosaccharide degradation (GH2, GH20, GH33, GH84), mucin glycan cleavage (including ten GH families), and lactose metabolism (GH2, GH42). Safety assessment identified only species-typical resistance to mupirocin and rifampicin, with no acquired resistance markers. Bioinformatic screening of the clonal subgroup detected the presence of adhesion-associated proteins, acid resistance systems, bile salt tolerance determinants, oxidative stress response proteins, and two putative bacteriocin gene clusters.

DISCUSSION: These findings provide a genomic framework consistent with the documented clinical role of strain 900791 in lactose tolerance and support its further investigation as a candidate probiotic. The probiotic-associated features identified here may help explain its observed properties and represent priority targets for experimental validation in future in vitro and in vivo studies.
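The core and pangenome counts reported in this abstract can be related with a short calculation. The sketch below is illustrative only: it assumes a strict core definition (a family present in every genome), which the abstract does not state, and it treats the reported "orthogroups" and "gene families" as the same unit. The toy family and genome labels are hypothetical.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(
    presence: Dict[str, Set[str]], n_genomes: int
) -> Tuple[Set[str], Set[str]]:
    """Split gene families into core and accessory sets.

    Assumption: "core" means present in all n_genomes genomes. Real
    pipelines often use a softer cutoff such as 95% or 99%.
    """
    core = {fam for fam, genomes in presence.items() if len(genomes) == n_genomes}
    accessory = set(presence) - core
    return core, accessory


# Hypothetical toy data: family name -> genomes that carry it.
toy_presence = {
    "famA": {"g1", "g2", "g3"},
    "famB": {"g1", "g2", "g3"},
    "famC": {"g1"},
    "famD": {"g2", "g3"},
}
core, accessory = partition_pangenome(toy_presence, n_genomes=3)

# Counts as reported in the abstract (4,152 orthogroups, 1,450 core families).
reported_pangenome = 4152
reported_core = 1450
core_fraction = reported_core / reported_pangenome
print(f"core share of pangenome: {core_fraction:.1%}")
```

Under these assumptions the core accounts for roughly a third of the reported pangenome, with the remainder being accessory or lineage-specific families.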

Additional Links: PMID-42459333

Citation:

@article {pmid42459333,
year = {2026},
author = {Cárdenas, JP and Vidal-Veuthey, B and Meza, K and Almonacid, DE and Gutkevich, A},
title = {In-silico analysis of Bifidobacterium bifidum strain 900791 genome in the context of the B. bifidum pangenome.},
journal = {Frontiers in cellular and infection microbiology},
volume = {16},
number = {},
pages = {1744409},
pmid = {42459333},
issn = {2235-2988},
mesh = {*Bifidobacterium bifidum/genetics/classification/isolation & purification ; *Genome, Bacterial ; Phylogeny ; Humans ; Probiotics ; Computational Biology ; Genomics ; Multilocus Sequence Typing ; Whole Genome Sequencing ; Computer Simulation ; Sequence Analysis, DNA ; Drug Resistance, Bacterial ; },
abstract = {INTRODUCTION: Bifidobacterium bifidum is a key member of the human gut microbiota with well-recognised roles in intestinal homeostasis, glycan metabolism, and immunomodulation. Strain 900791, isolated from the meconium of a Siberian infant, has been used as a commercial probiotic ingredient for decades and has demonstrated, in previously published clinical trials, an ability to improve lactose tolerance and reduce gastrointestinal symptoms in both children and adults; however, the genomic basis for these properties has not been characterised.

METHODS: We present the complete genome sequence and comprehensive in silico genomic and pangenomic analysis of B. bifidum strain 900791. Hybrid sequencing was used for genome assembly. Phylogenomic analysis, including core genome MLST (cgMLST), integrated 229 high-quality B. bifidum genomes, representing the largest dataset for this species to date. Functional annotation and carbohydrate-active enzyme (CAZyme) profiling, antimicrobial resistance prediction, and bioinformatic screening for probiotic-associated genomic features were also performed.

RESULTS: Hybrid sequencing yielded a single circular chromosome of 2,280,092 bp, comprising 1,852 protein-coding sequences. Phylogenomic analysis revealed that strain 900791 belongs to a clonal subgroup of nine closely related strains (>99% cgMLST identity), consistent with a geographically structured lineage. The species pangenome comprised 4,152 orthogroups and a core of 1,450 gene families; 23 orthogroups were exclusive to the 900791 clonal subgroup, including a predicted lantibiotic biosynthetic cluster. CAZyme profiling identified glycoside hydrolase families associated with human milk oligosaccharide degradation (GH2, GH20, GH33, GH84), mucin glycan cleavage (including ten GH families), and lactose metabolism (GH2, GH42). Safety assessment identified only species-typical resistance to mupirocin and rifampicin, with no acquired resistance markers. Bioinformatic screening of the clonal subgroup detected the presence of adhesion-associated proteins, acid resistance systems, bile salt tolerance determinants, oxidative stress response proteins, and two putative bacteriocin gene clusters.

DISCUSSION: These findings provide a genomic framework consistent with the documented clinical role of strain 900791 in lactose tolerance and support its further investigation as a candidate probiotic. The probiotic-associated features identified here may help explain its observed properties and represent priority targets for experimental validation in future in vitro and in vivo studies.},
}

MeSH Terms:

*Bifidobacterium bifidum/genetics/classification/isolation & purification
*Genome, Bacterial
Phylogeny
Humans
Probiotics
Computational Biology
Genomics
Multilocus Sequence Typing
Whole Genome Sequencing
Computer Simulation
Sequence Analysis, DNA
Drug Resistance, Bacterial

RevDate: 2026-07-14

Oles RE, Carrillo Terrazas M, Loomis LR, et al (2026)

Comparative genomic analysis of Bacteroides fragilis from intestinal and extra-intestinal sites.

Microbiology spectrum [Epub ahead of print].

Bacteroides fragilis, a key member of the human gut microbiota, contributes to host health by maintaining intestinal homeostasis. Yet, it is also the most frequently isolated anaerobe in clinical infections. These contrasting roles raise questions about the genetic and ecological factors that explain why this common symbiont is disproportionately linked to infection. We analyzed 813 Division I B. fragilis genomes, including 147 new isolates from intestinal and extra-intestinal sites. Infection-associated isolates spanned all phylogroups, indicating no pathogenic lineage. We identified 16 phylogroups, distinguished by genes associated with capsule biosynthesis and interbacterial competition. Additionally, differential metabolomic analysis identified 12 metabolites associated with isolation source, while a microbial genome-wide association study uncovered 44 genes enriched in isolates from extra-intestinal sites, providing the first population-scale markers tied to clinical recovery sites. These results do not implicate a pathogenic lineage; instead, they point to associational links between accessory modules and recovery from extra-intestinal sites under permissive host conditions. This work underscores how genomic diversity and ecological context may jointly shape the clinical impact of gut commensals.

IMPORTANCE: Bacteroides fragilis, a human gut resident, is paradoxically one of the most frequent anaerobes recovered from bloodstream and abscess infections. The genetic features that enable frequent recovery from extra-intestinal sites remain poorly defined. Using comparative genomic and metabolomic analyses of strains from intestinal and extra-intestinal sources, we show that strains isolated from infections are phylogenetically dispersed rather than restricted to a single lineage. We observe lineage-linked differences in capsule loci and competition systems, which suggests constrained gene flow and lineage-specific adaptation within the gut. Additionally, a subset of genes and metabolites is enriched among extra-intestinal isolates. Together, these findings suggest that extra-intestinal survival among B. fragilis strains reflects the interplay between species-wide genomic diversity and permissive host conditions, rather than the emergence of a single pathogenic lineage.

Additional Links: PMID-42446195

Citation:

@article {pmid42446195,
year = {2026},
author = {Oles, RE and Carrillo Terrazas, M and Loomis, LR and Zuffa, S and Neal, MJ and Hsu, C-Y and Vasquez Ayala, A and Tribelhorn, C and Belda-Ferre, P and Zhao, J and Bryant, M and Zemlin, J and Young, J and Dulai, PS and Sandborn, WJ and Sivagnanam, M and Pride, D and Zengler, K and Dorrestein, PC and Knight, R and Chu, H},
title = {Comparative genomic analysis of Bacteroides fragilis from intestinal and extra-intestinal sites.},
journal = {Microbiology spectrum},
volume = {},
number = {},
pages = {e0030626},
doi = {10.1128/spectrum.00306-26},
pmid = {42446195},
issn = {2165-0497},
abstract = {Bacteroides fragilis, a key member of the human gut microbiota, contributes to host health by maintaining intestinal homeostasis. Yet, it is also the most frequently isolated anaerobe in clinical infections. These contrasting roles raise questions about the genetic and ecological factors that explain why this common symbiont is disproportionately linked to infection. We analyzed 813 Division I B. fragilis genomes, including 147 new isolates from intestinal and extra-intestinal sites. Infection-associated isolates spanned all phylogroups, indicating no pathogenic lineage. We identified 16 phylogroups, distinguished by genes associated with capsule biosynthesis and interbacterial competition. Additionally, differential metabolomic analysis identified 12 metabolites associated with isolation source, while a microbial genome-wide association study uncovered 44 genes enriched in isolates from extra-intestinal sites, providing the first population-scale markers tied to clinical recovery sites. These results do not implicate a pathogenic lineage; instead, they point to associational links between accessory modules and recovery from extra-intestinal sites under permissive host conditions. This work underscores how genomic diversity and ecological context may jointly shape the clinical impact of gut commensals.

IMPORTANCE: Bacteroides fragilis, a human gut resident, is paradoxically one of the most frequent anaerobes recovered from bloodstream and abscess infections. The genetic features that enable frequent recovery from extra-intestinal sites remain poorly defined. Using comparative genomic and metabolomic analyses of strains from intestinal and extra-intestinal sources, we show that strains isolated from infections are phylogenetically dispersed rather than restricted to a single lineage. We observe lineage-linked differences in capsule loci and competition systems, which suggests constrained gene flow and lineage-specific adaptation within the gut. Additionally, a subset of genes and metabolites is enriched among extra-intestinal isolates. Together, these findings suggest that extra-intestinal survival among B. fragilis strains reflects the interplay between species-wide genomic diversity and permissive host conditions, rather than the emergence of a single pathogenic lineage.},
}

RevDate: 2026-07-14

Frolova M, Maguire B, Duessmann H, et al (2026)

HumanFilt: a multi-reference host depletion pipeline improves Fusobacterium detection accuracy in tumor WGS data sets.

mSystems [Epub ahead of print].

UNLABELLED: The study of tumor-associated microbiomes using whole-genome sequencing (WGS) has attracted considerable attention, but microbial signal detection remains controversial due to host contamination and methodological artifacts. As the necessity of human-read removal becomes increasingly evident, many groups now include this step in their data pre-processing workflows. In this work, we introduce an open-source tool, HumanFilt, designed for rigorous host-read removal and apply it to the re-analysis of WGS data from 10 mucinous rectal adenocarcinoma cases originally published by Reynolds et al. The workflow integrates k-mer-based classification (Kraken2), quality and adapter trimming (Trim Galore), vector filtering (BBDuk/UniVec_Core), and duplicate removal (FastUniq). After reducing data complexity, a multi-aligner, multi-reference approach (BWA-MEM/GRCh38, Bowtie2/T2T-CHM13, and Minimap2/Human Pangenome Reference Consortium v1.1) removes remaining host sequences, collectively eliminating more than 99.9% of human-derived reads. Although the additional alignment steps eliminated only a small fraction of total reads, they consistently removed millions of residual sequences per sample, underscoring the importance of rigorous filtering in data sets where non-human reads are a small minority. Taxonomic profiling with PathSeq and MetaPhlAn revealed reproducible enrichment of Fusobacterium species in tumor versus matched normal tissues, and comparison before and after filtering showed that this tumor-over-normal pattern was preserved despite an overall reduction in RPM values. Simulation analyses further showed that HumanFilt preserved more than 99.7% of true Fusobacterium signals, supporting high specificity without meaningful false-negative loss of microbial reads. In direct comparison with Deacon and NoHuman, HumanFilt achieved the most stringent host-read removal but also removed a greater proportion of PathSeq-classified Fusobacterium reads, highlighting the trade-off between maximal host depletion and preservation of ambiguous microbial signal. Cross-validation with immunofluorescence analysis using pan-Fusobacterium (detecting both Fusobacterium animalis and Fusobacterium nucleatum) and F. nucleatum-specific antibodies showed general consistency with Fusobacterium subspecies detected by WGS. Compared to the unfiltered analysis, host depletion markedly reduced artificial microbial signals in normal samples while preserving tumor-associated Fusobacterium, resulting in a more reliable microbial profile.

IMPORTANCE: We developed an open-source tool that enables rapid removal of human-derived sequences and applied it to rectal cancer whole-genome sequencing data. This approach reduced false microbial signals while preserving true tumor-associated Fusobacterium, and simulation analyses showed that it retained more than 99.7% of true Fusobacterium reads. Comparison with Deacon and NoHuman showed that HumanFilt achieved more stringent host depletion but also highlighted the trade-off between aggressive host filtering and preservation of ambiguous microbial signal. We also observed general consistency between the sequencing results and immunofluorescence staining in tissue. Together, these findings provide a more reliable basis for studying tumor-bacteria interactions.
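The multi-reference filtering idea described in this abstract can be illustrated with plain set logic. This is a minimal sketch, not the HumanFilt implementation: it does not run any aligner, the read IDs and per-reference hit sets are hypothetical toy data, and the function name `deplete_host` is invented for illustration. It assumes a read is discarded if any one of the references flags it as human.

```python
from typing import Dict, Iterable, List, Set


def deplete_host(read_ids: Iterable[str], host_hits: Dict[str, Set[str]]) -> List[str]:
    """Return the reads that no reference flagged as host-derived.

    host_hits maps a reference label to the set of read IDs that aligned
    to it. Assumption: a hit against any single reference is enough to
    discard the read, so the flagged set is the union over references.
    """
    flagged: Set[str] = set()
    for hits in host_hits.values():
        flagged |= hits
    return [read for read in read_ids if read not in flagged]


# Hypothetical toy input: five reads and hits against three references.
reads = ["r1", "r2", "r3", "r4", "r5"]
hits_by_reference = {
    "GRCh38": {"r1", "r2"},
    "T2T-CHM13": {"r2", "r3"},
    "HPRC v1.1": {"r4"},
}

kept = deplete_host(reads, hits_by_reference)
removed_fraction = 1 - len(kept) / len(reads)
print(kept, f"{removed_fraction:.0%} removed")
```

In this toy case each reference catches at least one read that the others miss, which is the motivation the abstract gives for combining several references instead of relying on a single one.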

Additional Links: PMID-42446235

Citation:

@article {pmid42446235,
year = {2026},
author = {Frolova, M and Maguire, B and Duessmann, H and Rayan, M and Burke, C and Korashy, HM and Raman, A and Gallagher, WM and Burke, JP and Furney, SJ and Prehn, JHM},
title = {HumanFilt: a multi-reference host depletion pipeline improves Fusobacterium detection accuracy in tumor WGS data sets.},
journal = {mSystems},
volume = {},
number = {},
pages = {e0055026},
doi = {10.1128/msystems.00550-26},
pmid = {42446235},
issn = {2379-5077},
abstract = {UNLABELLED: The study of tumor-associated microbiomes using whole-genome sequencing (WGS) has attracted considerable attention, but microbial signal detection remains controversial due to host contamination and methodological artifacts. As the necessity of human-read removal becomes increasingly evident, many groups now include this step in their data pre-processing workflows. In this work, we introduce an open-source tool, HumanFilt, designed for rigorous host-read removal and apply it to the re-analysis of WGS data from 10 mucinous rectal adenocarcinoma cases originally published by Reynolds et al. The workflow integrates k-mer-based classification (Kraken2), quality and adapter trimming (Trim Galore), vector filtering (BBDuk/UniVec_Core), and duplicate removal (FastUniq). After reducing data complexity, a multi-aligner, multi-reference approach (BWA-MEM/GRCh38, Bowtie2/T2T-CHM13, and Minimap2/Human Pangenome Reference Consortium v1.1) removes remaining host sequences, collectively eliminating more than 99.9% of human-derived reads. Although the additional alignment steps eliminated only a small fraction of total reads, they consistently removed millions of residual sequences per sample, underscoring the importance of rigorous filtering in data sets where non-human reads are a small minority. Taxonomic profiling with PathSeq and MetaPhlAn revealed reproducible enrichment of Fusobacterium species in tumor versus matched normal tissues, and comparison before and after filtering showed that this tumor-over-normal pattern was preserved despite an overall reduction in RPM values. Simulation analyses further showed that HumanFilt preserved more than 99.7% of true Fusobacterium signals, supporting high specificity without meaningful false-negative loss of microbial reads. 
In direct comparison with Deacon and NoHuman, HumanFilt achieved the most stringent host-read removal but also removed a greater proportion of PathSeq-classified Fusobacterium reads, highlighting the trade-off between maximal host depletion and preservation of ambiguous microbial signal. Cross-validation with immunofluorescence analysis using pan-Fusobacterium (detecting both Fusobacterium animalis and Fusobacterium nucleatum) and F. nucleatum-specific antibodies showed general consistency with Fusobacterium subspecies detected by WGS. Compared to the unfiltered analysis, host depletion markedly reduced artificial microbial signals in normal samples while preserving tumor-associated Fusobacterium, resulting in a more reliable microbial profile.

IMPORTANCE: We developed an open-source tool that enables rapid removal of human-derived sequences and applied it to rectal cancer whole-genome sequencing data. This approach reduced false microbial signals while preserving true tumor-associated Fusobacterium, and simulation analyses showed that it retained more than 99.7% of true Fusobacterium reads. Comparison with Deacon and NoHuman showed that HumanFilt achieved more stringent host depletion but also highlighted the trade-off between aggressive host filtering and preservation of ambiguous microbial signal. We also observed general consistency between the sequencing results and immunofluorescence staining in tissue. Together, these findings provide a more reliable basis for studying tumor-bacteria interactions.},
}

RevDate: 2026-07-14
CmpDate: 2026-07-14

Gregory JB, Harrison JW, Uehling JK, et al (2026)

Phylogenetically diverse Mucorales-Mycetohabitans endosymbiotic interactions identified from whole-genome sequencing using a targeted metagenomic assembly pipeline.

Microbial genomics, 12(7):.

Endosymbiotic bacteria of the genus Mycetohabitans are obligate intracellular associates of Mucorales fungi, yet the understanding of their diversity, distribution and evolutionary dynamics is in its infancy. By screening 1,696 public sequencing datasets from Mucorales fungi, we detected Mycetohabitans in 46 fungal accessions spanning 5 host taxa across the fungal genera Rhizopus and Apophysomyces. These included 13 previously unreported associations. Genome reconstruction yielded 38 Mycetohabitans metagenome-assembled genomes (MAGs), of which 34 were of high quality. Incorporating these MAGs into genome-based species delimitation expanded known Mycetohabitans diversity from four to nine species-level clusters, including novel host-associated lineages. Re-examination of fungal host identities revealed frequent misidentification of isolates in fungal collection catalogues and/or misannotation in GenBank, with nearly a quarter of positive datasets requiring correction through internal transcribed spacer and genome-scale verification. Host-symbiont associations were non-random under this revised framework, with significant structure detected by contingency analysis and ParaFit. MAG-focused pangenome analysis revealed an open pangenome and mosaic lineage-associated functional traits, including variation in metabolism, secretion, cell-envelope systems, metal resistance, antimicrobial-resistance-associated functions and mobile elements. The most distinctive lineage comprised two Apophysomyces-associated MAGs, provisionally named M. apophysomyceticola, which showed pronounced genome reduction compared with other sampled Mycetohabitans spp. and loss of multiple central metabolic, nutrient assimilation, cofactor biosynthesis, catabolic, stress-response and defence pathways, consistent with reduced metabolic flexibility and increased host dependence. Together, these results show that Mycetohabitans symbioses are more geographically widespread, taxonomically diverse and functionally differentiated than previously recognized. More broadly, this work demonstrates the value of public sequencing repositories for uncovering hidden fungal-bacterial symbioses, while emphasizing that repository-derived patterns must be interpreted considering host misidentification, uneven sampling and incomplete metadata. Overall, our work establishes a global framework for Mycetohabitans diversity and function, with implications for fungal ecology, evolution and clinical mycology.

Additional Links: PMID-42446350

Citation:

@article {pmid42446350,
year = {2026},
author = {Gregory, JB and Harrison, JW and Uehling, JK and Farrer, RA and Ballou, ER},
title = {Phylogenetically diverse Mucorales-Mycetohabitans endosymbiotic interactions identified from whole-genome sequencing using a targeted metagenomic assembly pipeline.},
journal = {Microbial genomics},
volume = {12},
number = {7},
pages = {},
pmid = {42446350},
issn = {2057-5858},
mesh = {*Symbiosis/genetics ; Phylogeny ; *Mucorales/genetics/classification/physiology ; Whole Genome Sequencing/methods ; Metagenomics/methods ; Metagenome ; Genome, Fungal ; },
abstract = {Endosymbiotic bacteria of the genus Mycetohabitans are obligate intracellular associates of Mucorales fungi, yet the understanding of their diversity, distribution and evolutionary dynamics is in its infancy. By screening 1,696 public sequencing datasets from Mucorales fungi, we detected Mycetohabitans in 46 fungal accessions spanning 5 host taxa across the fungal genera Rhizopus and Apophysomyces. These included 13 previously unreported associations. Genome reconstruction yielded 38 Mycetohabitans metagenome-assembled genomes (MAGs), of which 34 were of high quality. Incorporating these MAGs into genome-based species delimitation expanded known Mycetohabitans diversity from four to nine species-level clusters, including novel host-associated lineages. Re-examination of fungal host identities revealed frequent misidentification of isolates in fungal collection catalogues and/or misannotation in GenBank, with nearly a quarter of positive datasets requiring correction through internal transcribed spacer and genome-scale verification. Host-symbiont associations were non-random under this revised framework, with significant structure detected by contingency analysis and ParaFit. MAG-focused pangenome analysis revealed an open pangenome and mosaic lineage-associated functional traits, including variation in metabolism, secretion, cell-envelope systems, metal resistance, antimicrobial-resistance-associated functions and mobile elements. The most distinctive lineage comprised two Apophysomyces-associated MAGs, provisionally named M. apophysomyceticola, which showed pronounced genome reduction compared with other sampled Mycetohabitans spp. and loss of multiple central metabolic, nutrient assimilation, cofactor biosynthesis, catabolic, stress-response and defence pathways, consistent with reduced metabolic flexibility and increased host dependence. 
Together, these results show that Mycetohabitans symbioses are more geographically widespread, taxonomically diverse and functionally differentiated than previously recognized. More broadly, this work demonstrates the value of public sequencing repositories for uncovering hidden fungal-bacterial symbioses, while emphasizing that repository-derived patterns must be interpreted considering host misidentification, uneven sampling and incomplete metadata. Overall, our work establishes a global framework for Mycetohabitans diversity and function, with implications for fungal ecology, evolution and clinical mycology.},
}

MeSH Terms:

*Symbiosis/genetics
Phylogeny
*Mucorales/genetics/classification/physiology
Whole Genome Sequencing/methods
Metagenomics/methods
Metagenome
Genome, Fungal

RevDate: 2026-07-15
CmpDate: 2026-07-15

Belkova N, Smurova N, Zugeeva R, et al (2026)

Pathogenic Potential of Pseudoxanthomonas kaohsiungensis Strain IMB-1 Based on Whole-Genome Sequencing.

Biology, 15(13): pii:biology15131010.

Mass spectrometry and high-throughput sequencing have been introduced into clinical bacteriology. We characterized strain IMB-1, previously isolated from the cerebrospinal fluid of a child, as Pseudoxanthomonas kaohsiungensis and analyzed its biological properties, resistance phenotype, and complete genome. The IMB-1 strain displayed amylolytic, weak lipolytic activities, and it exhibited a phenotypic resistance profile only for aminoglycosides. The dDDH calculation based on the complete genome sequence showed that strain IMB-1 was closely grouped with the type strain P. kaohsiungensis DSM 17583, and the dDDH (d4) value was 70.1%. A comparative pan-genome analysis was performed for four P. kaohsiungensis genomes, revealing a substantial shared core genome. The IMB-1 genome contained 508 unique gene clusters, representing the largest strain-specific gene set among the analyzed genomes, suggesting genomic plasticity and adaptation to the host-associated environment. Genome annotation revealed genes responsible for antibiotic, disinfecting agent, and antiseptic resistance. Gene clusters exhibiting the potential to form biofilms, adhere to the epithelial surface, and exhibit resistance to stress factors were identified. Our study demonstrates that strain IMB-1 is a potential opportunistic pathogen with significant pathogenic potential. The application of high-resolution whole-genome sequencing data in public health for pathogen identification and monitoring can improve the accuracy of infection source determination, reduce the scale and burden of outbreaks, and identify and quantify antimicrobial resistance in pathogens.
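The dDDH figure in this abstract is easier to read against the species boundary that is conventionally applied to such values. The sketch below assumes the commonly used 70% dDDH cutoff for species delineation; that cutoff is not stated in the abstract, and the function name `same_species` is hypothetical.

```python
# Assumption: the widely used 70% dDDH cutoff for species delineation.
# The abstract reports the value only; it does not state the cutoff.
SPECIES_DDH_THRESHOLD = 70.0


def same_species(ddh_percent: float, threshold: float = SPECIES_DDH_THRESHOLD) -> bool:
    """Return True when a dDDH estimate meets or exceeds the species cutoff."""
    return ddh_percent >= threshold


reported_ddh = 70.1  # dDDH (d4) between IMB-1 and DSM 17583, as reported
margin = reported_ddh - SPECIES_DDH_THRESHOLD
print(same_species(reported_ddh), f"margin: {margin:.1f} percentage points")
```

Under this assumed cutoff the reported value clears the boundary by only about 0.1 percentage points, so the species assignment sits close to the threshold.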

Additional Links: PMID-42450558

Citation:

@article {pmid42450558,
year = {2026},
author = {Belkova, N and Smurova, N and Zugeeva, R and Klimenko, E and Grigorova, E and Dorzhieva, M and Nemchenko, U},
title = {Pathogenic Potential of Pseudoxanthomonas kaohsiungensis Strain IMB-1 Based on Whole-Genome Sequencing.},
journal = {Biology},
volume = {15},
number = {13},
pages = {},
doi = {10.3390/biology15131010},
pmid = {42450558},
issn = {2079-7737},
support = {126020216227-3//Ministry of Science and Higher Education of the Russian Federation/ ; },
abstract = {Mass spectrometry and high-throughput sequencing have been introduced into clinical bacteriology. We characterized strain IMB-1, previously isolated from the cerebrospinal fluid of a child, as Pseudoxanthomonas kaohsiungensis and analyzed its biological properties, resistance phenotype, and complete genome. The IMB-1 strain displayed amylolytic, weak lipolytic activities, and it exhibited a phenotypic resistance profile only for aminoglycosides. The dDDH calculation based on the complete genome sequence showed that strain IMB-1 was closely grouped with the type strain P. kaohsiungensis DSM 17583, and the dDDH (d4) value was 70.1%. A comparative pan-genome analysis was performed for four P. kaohsiungensis genomes, revealing a substantial shared core genome. The IMB-1 genome contained 508 unique gene clusters, representing the largest strain-specific gene set among the analyzed genomes, suggesting genomic plasticity and adaptation to the host-associated environment. Genome annotation revealed genes responsible for antibiotic, disinfecting agent, and antiseptic resistance. Gene clusters exhibiting the potential to form biofilms, adhere to the epithelial surface, and exhibit resistance to stress factors were identified. Our study demonstrates that strain IMB-1 is a potential opportunistic pathogen with significant pathogenic potential. The application of high-resolution whole-genome sequencing data in public health for pathogen identification and monitoring can improve the accuracy of infection source determination, reduce the scale and burden of outbreaks, and identify and quantify antimicrobial resistance in pathogens.},
}

RevDate: 2026-07-13

Chaichana N, Singkhamanan K, Wonglapsuwan M, et al (2026)

Genome-guided characterization of Weissella species highlights strain-specific functional traits relevant to food fermentation and probiotics.

Microbiology spectrum [Epub ahead of print].

The genus Weissella comprises heterofermentative lactic acid bacteria (LAB) widely distributed in fermented foods, plant-associated environments, and host-associated niches. However, the functional diversity, safety-related genomic features, and evolutionary dynamics of the genus remain incompletely understood at the genomic level. This study performed a comprehensive comparative genomic analysis of 347 high-quality Weissella genomes representing 16 assigned species to elucidate their genomic diversity, pan-genome architecture, and functional potential relevant to food fermentation, probiotics, and biotechnology. Pan-genome analysis revealed an open pan-genome following a power-law model with an exponent of 0.304, dominated by shell and cloud genes, indicating substantial genomic plasticity and ongoing gene acquisition. In silico safety screening showed that the vast majority of genomes lacked detectable antimicrobial resistance or virulence-associated genes under the applied criteria, indicating a low detectable burden of these genetic determinants across the data set. Moreover, functional annotation revealed conserved carbohydrate-active enzyme (CAZyme) repertoires dominated by glycoside hydrolases (GHs) and glycosyltransferases (GTs), alongside species-specific variation in carbohydrate-binding modules (CBMs). Comparative synteny analysis identified six distinct architectures of exopolysaccharide (EPS) biosynthesis loci, with W. cibaria more frequently harboring regulator-rich, putatively complete EPS gene clusters. Putative probiotic-associated genetic markers were unevenly distributed across species, mainly involving stress resistance, vitamin biosynthesis, GIT tolerance, and oxidative stress resistance, whereas GABA production and antimicrobial metabolite-associated markers were not detected in the data set. Hence, this genome-resolved analysis provides a systematic framework for understanding Weissella diversity and supports a genome-guided, strain-level approach for prioritizing candidate strains for future experimental validation in food, health, and biotechnological applications.

IMPORTANCE: Microbial traits relevant to food fermentation, probiotic performance, and safety emerge from complex interactions among core metabolism, accessory genes, and genome plasticity; however, these relationships remain poorly resolved at the genus scale. By analyzing 347 high-quality genomes spanning 16 Weissella species, this study reveals how an open pan-genome, extensive accessory gene diversity, and lineage-specific gene architectures shape functional potential across this lactic acid bacterial genus. Key traits, including EPS biosynthesis, carbohydrate utilization capacity, stress resilience, and biosynthetic gene cluster distribution, are unevenly structured across species and strains, emphasizing the need for genome-guided strain selection rather than taxonomic assumptions. Integrating phylogenomics, gene content variation, and functional profiling links genome evolution with applied phenotypes. These findings advance understanding of how genomic diversity underpins ecological adaptation and biotechnological performance and provide a blueprint for selecting safe and functionally optimized strains in food and health applications.

Additional Links: PMID-42439591

Citation:

@article {pmid42439591,
year = {2026},
author = {Chaichana, N and Singkhamanan, K and Wonglapsuwan, M and Pomwised, R and Surachat, K},
title = {Genome-guided characterization of Weissella species highlights strain-specific functional traits relevant to food fermentation and probiotics.},
journal = {Microbiology spectrum},
volume = {},
number = {},
pages = {e0033126},
doi = {10.1128/spectrum.00331-26},
pmid = {42439591},
issn = {2165-0497},
abstract = {The genus Weissella comprises heterofermentative lactic acid bacteria (LAB) widely distributed in fermented foods, plant-associated environments, and host-associated niches. However, the functional diversity, safety-related genomic features, and evolutionary dynamics of the genus remain incompletely understood at the genomic level. This study performed a comprehensive comparative genomic analysis of 347 high-quality Weissella genomes representing 16 assigned species to elucidate their genomic diversity, pan-genome architecture, and functional potential relevant to food fermentation, probiotics, and biotechnology. Pan-genome analysis revealed an open pan-genome following a power-law model with an exponent of 0.304, dominated by shell and cloud genes, indicating substantial genomic plasticity and ongoing gene acquisition. In silico safety screening showed that the vast majority of genomes lacked detectable antimicrobial resistance or virulence-associated genes under the applied criteria, indicating a low detectable burden of these genetic determinants across the data set. Moreover, functional annotation revealed conserved carbohydrate-active enzyme (CAZyme) repertoires dominated by glycoside hydrolases (GHs) and glycosyltransferases (GTs), alongside species-specific variation in carbohydrate-binding modules (CBMs). Comparative synteny analysis identified six distinct architectures of exopolysaccharide (EPS) biosynthesis loci, with W. cibaria more frequently harboring regulator-rich, putatively complete EPS gene clusters. Putative probiotic-associated genetic markers were unevenly distributed across species, mainly involving stress resistance, vitamin biosynthesis, GIT tolerance, and oxidative stress resistance, whereas GABA production and antimicrobial metabolite-associated markers were not detected in the data set. Hence, this genome-resolved analysis provides a systematic framework for understanding Weissella diversity and supports a genome-guided, strain-level approach for prioritizing candidate strains for future experimental validation in food, health, and biotechnological applications.

IMPORTANCE: Microbial traits relevant to food fermentation, probiotic performance, and safety emerge from complex interactions among core metabolism, accessory genes, and genome plasticity; however, these relationships remain poorly resolved at the genus scale. By analyzing 347 high-quality genomes spanning 16 Weissella species, this study reveals how an open pan-genome, extensive accessory gene diversity, and lineage-specific gene architectures shape functional potential across this lactic acid bacterial genus. Key traits, including EPS biosynthesis, carbohydrate utilization capacity, stress resilience, and biosynthetic gene cluster distribution, are unevenly structured across species and strains, emphasizing the need for genome-guided strain selection rather than taxonomic assumptions. Integrating phylogenomics, gene content variation, and functional profiling links genome evolution with applied phenotypes. These findings advance understanding of how genomic diversity underpins ecological adaptation and biotechnological performance and provide a blueprint for selecting safe and functionally optimized strains in food and health applications.},
}

RevDate: 2026-07-13

Zeng Z, Wang J, Norbu N, et al (2026)

Comparative genomics clarifies phylogenetic relationships and genome evolution in Hippophae from the Qinghai-Tibet plateau.

Molecular phylogenetics and evolution pii:S1055-7903(26)00160-0 [Epub ahead of print].

The uplift of the Qinghai-Tibet Plateau (QTP) and associated climatic oscillations have shaped plant evolution, yet how adaptive strategies diversify along elevational, moisture, and latitudinal gradients within a single clade remains poorly understood. Hippophae (Elaeagnaceae) is distributed along these gradients, with all members sharing Frankia-mediated nitrogen-fixing symbiosis and dioecy. Here, we assembled chromosome-level genomes of H. neurocarpa, H. rhamnoides subsp. yunnanensis, and H. rhamnoides subsp. turkestanica, and integrated them with four published assemblies, covering seven taxa (species and subspecies) of the genus. Whole-genome phylogenies dated the major intra-generic divergences to 6.71-2.83 Ma, coinciding with QTP uplift and Asian monsoon intensification during the late Miocene-Pliocene. Two ancient whole-genome duplications (ca. 35-40 and 25-30 Ma) predated the radiation, with retained paralogs significantly enriched in plant hormone signaling, ABA/MAPK cascades, cold-stress response, and reactive oxygen species metabolism. LTR retrotransposons showed a cross-species insertion peak at ∼0.2 Ma, and their flanking genes were repeatedly associated with ABA signaling, cold and UV-B responses, and flavonoid metabolism, suggesting a possible link to Pleistocene oscillations. Pan-genome analysis revealed core gene families comprising ∼55% of the pan-genome, while variable families differed along ecological gradients, with high-elevation H. tibetana harboring the highest proportion of species-specific genes. Lineage-specific positively selected genes were enriched in DNA damage repair, translational fidelity, and stress signaling. Together, these findings suggest that multidirectional ecological divergence in Hippophae was associated with ancient duplicate retention, transposon-associated regulatory variation, and lineage-specific selection, exemplifying rapid adaptive diversification in plants of the QTP and adjacent regions.

Additional Links: PMID-42442563

Citation:

@article {pmid42442563,
year = {2026},
author = {Zeng, Z and Wang, J and Norbu, N and Bonjor, N and Tan, X and Zhang, S and Li, W and Zhang, T and Zhang, W and Qiong, L},
title = {Comparative genomics clarifies phylogenetic relationships and genome evolution in Hippophae from the Qinghai-Tibet plateau.},
journal = {Molecular phylogenetics and evolution},
volume = {},
number = {},
pages = {108690},
doi = {10.1016/j.ympev.2026.108690},
pmid = {42442563},
issn = {1095-9513},
abstract = {The uplift of the Qinghai-Tibet Plateau (QTP) and associated climatic oscillations have shaped plant evolution, yet how adaptive strategies diversify along elevational, moisture, and latitudinal gradients within a single clade remains poorly understood. Hippophae (Elaeagnaceae) is distributed along these gradients, with all members sharing Frankia-mediated nitrogen-fixing symbiosis and dioecy. Here, we assembled chromosome-level genomes of H. neurocarpa, H. rhamnoides subsp. yunnanensis, and H. rhamnoides subsp. turkestanica, and integrated them with four published assemblies, covering seven taxa (species and subspecies) of the genus. Whole-genome phylogenies dated the major intra-generic divergences to 6.71-2.83 Ma, coinciding with QTP uplift and Asian monsoon intensification during the late Miocene-Pliocene. Two ancient whole-genome duplications (ca. 35-40 and 25-30 Ma) predated the radiation, with retained paralogs significantly enriched in plant hormone signaling, ABA/MAPK cascades, cold-stress response, and reactive oxygen species metabolism. LTR retrotransposons showed a cross-species insertion peak at ∼0.2 Ma, and their flanking genes were repeatedly associated with ABA signaling, cold and UV-B responses, and flavonoid metabolism, suggesting a possible link to Pleistocene oscillations. Pan-genome analysis revealed core gene families comprising ∼55% of the pan-genome, while variable families differed along ecological gradients, with high-elevation H. tibetana harboring the highest proportion of species-specific genes. Lineage-specific positively selected genes were enriched in DNA damage repair, translational fidelity, and stress signaling. Together, these findings suggest that multidirectional ecological divergence in Hippophae was associated with ancient duplicate retention, transposon-associated regulatory variation, and lineage-specific selection, exemplifying rapid adaptive diversification in plants of the QTP and adjacent regions.},
}

RevDate: 2026-07-13

Kawarizadeh A, Marenda MS, Bailey KE, et al (2026)

Extensively drug-resistant and extended spectrum β-lactamase-producing Raoultella terrigena from the reproductive tract of a mare in Australia.

Scientific reports pii:10.1038/s41598-026-60203-8 [Epub ahead of print].

Raoultella terrigena is a bacterium commonly isolated from environmental sources, with rare reports of isolation from clinical samples. Multidrug resistance is also rare. In this study, R. terrigena was obtained from the uterine lavage of a thoroughbred mare. Broth microdilution and disc diffusion methods were used to determine susceptibility to antimicrobials. Whole genome sequencing was performed to identify and characterise the presence of mobile genetic elements, antimicrobial resistance and virulence genes. Pan-genome analysis allowed comparison of the genome of this isolate with other R. terrigena isolates. Phenotypic resistance was detected to cefazolin, ceftiofur, cefotaxime, ticarcillin/clavulanic acid, tetracycline, doxycycline, gentamicin, sulfamethoxazole/trimethoprim and chloramphenicol. The genome included two plasmids (IncQ1/IncU and IncFII/rep_cluster_2078 types). Also identified in the IncQ1/IncU plasmid were genes conferring resistance to aminoglycosides, fluoroquinolones, rifamycin, phenicols, cephalosporins, diaminopyrimidines, sulfonamides and tetracycline. Except for enrofloxacin susceptibility, the presence of resistance genes was consistent with observed phenotypes. Genes conferring resistance to antiseptics, biocides and metals were also detected. Phylogenetic analysis of the combined dataset, including the isolate in this study and publicly available genomes, revealed two distinct clades, with equine isolates clustering together with other animal-derived isolates. The emergence of antimicrobial resistance in rare opportunistic pathogens underlines the importance of continuous monitoring of such bacteria and emphasizes the need for developing antimicrobial stewardship programs in veterinary settings.

Additional Links: PMID-42443270

Citation:

@article {pmid42443270,
year = {2026},
author = {Kawarizadeh, A and Marenda, MS and Bailey, KE and Hardefeldt, LY and Mitchell, K and Chicken, C and Blishen, A and Bushell, RN and Young, ND and Gilkerson, JR},
title = {Extensively drug-resistant and extended spectrum β-lactamase-producing Raoultella terrigena from the reproductive tract of a mare in Australia.},
journal = {Scientific reports},
volume = {},
number = {},
pages = {},
doi = {10.1038/s41598-026-60203-8},
pmid = {42443270},
issn = {2045-2322},
abstract = {Raoultella terrigena is a bacterium commonly isolated from environmental sources, with rare reports of isolation from clinical samples. Multidrug resistance is also rare. In this study, R. terrigena was obtained from the uterine lavage of a thoroughbred mare. Broth microdilution and disc diffusion methods were used to determine susceptibility to antimicrobials. Whole genome sequencing was performed to identify and characterise the presence of mobile genetic elements, antimicrobial resistance and virulence genes. Pan-genome analysis allowed comparison of the genome of this isolate with other R. terrigena isolates. Phenotypic resistance was detected to cefazolin, ceftiofur, cefotaxime, ticarcillin/clavulanic acid, tetracycline, doxycycline, gentamicin, sulfamethoxazole/trimethoprim and chloramphenicol. The genome included two plasmids (IncQ1/IncU and IncFII/rep_cluster_2078 types). Also identified in the IncQ1/IncU plasmid were genes conferring resistance to aminoglycosides, fluoroquinolones, rifamycin, phenicols, cephalosporins, diaminopyrimidines, sulfonamides and tetracycline. Except for enrofloxacin susceptibility, the presence of resistance genes was consistent with observed phenotypes. Genes conferring resistance to antiseptics, biocides and metals were also detected. Phylogenetic analysis of the combined dataset, including the isolate in this study and publicly available genomes, revealed two distinct clades, with equine isolates clustering together with other animal-derived isolates. The emergence of antimicrobial resistance in rare opportunistic pathogens underlines the importance of continuous monitoring of such bacteria and emphasizes the need for developing antimicrobial stewardship programs in veterinary settings.},
}

RevDate: 2026-07-13

Chen H, Xing L, Guan C, et al (2026)

Graph-based pan-genome reveals structural variations associated with agronomic traits in mung bean.

Nature genetics, 58(7):1696-1710.

Mung bean (Vigna radiata) is a globally important legume crop valued for its short growing cycle, nitrogen-fixing capacity and high nutritional value, particularly in developing countries. Here we report a comprehensive graph-based pan-genome assembled from 11 genetically diverse global accessions. The framework captures 75,268 gene families (50.86% core, 35.19% dispensable and 13.95% private) and 66,862 nonredundant structural variants. Integrating these structural variants and single nucleotide polymorphisms, genome-wide association studies across five environments identified candidate genes for 20 agronomic traits, underscoring the pivotal roles of these variants in driving mung bean domestication and improvement. Mechanistically, we demonstrate that a 68-bp promoter insertion in VrTIFY6B and a 136-bp promoter deletion in VrPGIP1 regulate flavonoid content and confer bruchid resistance, respectively. These genomic resources and actionable functional variants provide a powerful toolkit to accelerate mung bean improvement through marker-assisted breeding, genomic selection and genome editing to address global food security.

Additional Links: PMID-42432247

Citation:

@article {pmid42432247,
year = {2026},
author = {Chen, H and Xing, L and Guan, C and Liu, Y and Chen, T and Cao, R and Song, Q and Barmukh, R and Mir, RR and Hu, L and Zhou, B and Luo, G and Xu, D and Yin, F and Yuan, X and Wang, S and Chen, X and Wang, L and Jiao, C and Du, H and Zhou, M and Varshney, RK and Cheng, X},
title = {Graph-based pan-genome reveals structural variations associated with agronomic traits in mung bean.},
journal = {Nature genetics},
volume = {58},
number = {7},
pages = {1696-1710},
pmid = {42432247},
issn = {1546-1718},
support = {UMU2403-009RTX//Grains Research and Development Corporation (Grains Research & Development Corporation)/ ; UOQ2403-012RTX//Grains Research and Development Corporation (Grains Research & Development Corporation)/ ; 332572308//National Natural Science Foundation of China (National Science Foundation of China)/ ; 32241041//National Natural Science Foundation of China (National Science Foundation of China)/ ; 31401442//National Natural Science Foundation of China (National Science Foundation of China)/ ; },
abstract = {Mung bean (Vigna radiata) is a globally important legume crop valued for its short growing cycle, nitrogen-fixing capacity and high nutritional value, particularly in developing countries. Here we report a comprehensive graph-based pan-genome assembled from 11 genetically diverse global accessions. The framework captures 75,268 gene families (50.86% core, 35.19% dispensable and 13.95% private) and 66,862 nonredundant structural variants. Integrating these structural variants and single nucleotide polymorphisms, genome-wide association studies across five environments identified candidate genes for 20 agronomic traits, underscoring the pivotal roles of these variants in driving mung bean domestication and improvement. Mechanistically, we demonstrate that a 68-bp promoter insertion in VrTIFY6B and a 136-bp promoter deletion in VrPGIP1 regulate flavonoid content and confer bruchid resistance, respectively. These genomic resources and actionable functional variants provide a powerful toolkit to accelerate mung bean improvement through marker-assisted breeding, genomic selection and genome editing to address global food security.},
}

RevDate: 2026-07-11

Adjei MO, Guan C, Dilshad A, et al (2026)

Pan-genomic analysis reveals ecological adaptation and biocontrol of Bacillus pumilus.

Journal of plant physiology, 325:154834 pii:S0176-1617(26)00147-1 [Epub ahead of print].

Bacillus pumilus is known for its ecological resilience and plant-beneficial traits; however, the genomic basis of its biocontrol potential remains unclear. Here, we performed a comparative pan-genome analysis of ecologically diverse B. pumilus strains to explore the genetic determinants underlying plant association and antimicrobial activity. The species exhibited an open pan-genome containing 6035 gene clusters, including a conserved core genome of 3078 clusters and a diverse accessory genome. Functional annotation revealed conserved gene clusters involved in biofilm formation, sporulation, auxin biosynthesis, metal detoxification, short-chain fatty acid regulation, and nutrient-sensing regulators. Genome mining further identified conserved biosynthetic gene clusters encoding antimicrobial compounds and siderophores, including bacilysin and bacillibactin, which are associated with pathogen suppression and plant protection. Several of these strains also possessed unique gene clusters linked to nutrient acquisition, metal detoxification, and environmental adaptation. The universal presence of the chloramphenicol resistance gene (cat86) underscores a conserved adaptive trait. These findings indicate that the ecological versatility of B. pumilus is driven by a stable core genome combined with a dynamic accessory genome enriched in its secondary metabolite pathways, highlighting its potential for sustainable agriculture and biological control. This study presents original research findings based on comparative pan-genome analysis of Bacillus pumilus strains.

Additional Links: PMID-42435517

Citation:

@article {pmid42435517,
year = {2026},
author = {Adjei, MO and Guan, C and Dilshad, A and Fan, B},
title = {Pan-genomic analysis reveals ecological adaptation and biocontrol of Bacillus pumilus.},
journal = {Journal of plant physiology},
volume = {325},
number = {},
pages = {154834},
doi = {10.1016/j.jplph.2026.154834},
pmid = {42435517},
issn = {1618-1328},
abstract = {Bacillus pumilus is known for its ecological resilience and plant-beneficial traits; however, the genomic basis of its biocontrol potential remains unclear. Here, we performed a comparative pan-genome analysis of ecologically diverse B. pumilus strains to explore the genetic determinants underlying plant association and antimicrobial activity. The species exhibited an open pan-genome containing 6035 gene clusters, including a conserved core genome of 3078 clusters and a diverse accessory genome. Functional annotation revealed conserved gene clusters involved in biofilm formation, sporulation, auxin biosynthesis, metal detoxification, short-chain fatty acid regulation, and nutrient-sensing regulators. Genome mining further identified conserved biosynthetic gene clusters encoding antimicrobial compounds and siderophores, including bacilysin and bacillibactin, which are associated with pathogen suppression and plant protection. Several of these strains also possessed unique gene clusters linked to nutrient acquisition, metal detoxification, and environmental adaptation. The universal presence of the chloramphenicol resistance gene (cat86) underscores a conserved adaptive trait. These findings indicate that the ecological versatility of B. pumilus is driven by a stable core genome combined with a dynamic accessory genome enriched in its secondary metabolite pathways, highlighting its potential for sustainable agriculture and biological control. This study presents original research findings based on comparative pan-genome analysis of Bacillus pumilus strains.},
}

RevDate: 2026-07-11

Corbín-Agustí P, Álvarez-Herrera M, Román-Écija M, et al (2026)

A metabolic model based on a pangenome core reveals putative conserved biochemical features of the phytopathogen Xylella fastidiosa.

Microbiological research, 312:128616 pii:S0944-5013(26)00180-1 [Epub ahead of print].

Xylella fastidiosa is a xylem-limited phytopathogenic bacterium responsible for severe diseases in many economically important crops. Despite its impact, its metabolism remains poorly characterized due to fastidious growth and the limited availability of defined culture media. Here, we reconstruct the first pangenome-based genome-scale metabolic model for X. fastidiosa, integrating conserved metabolic functions from 18 strains across five subspecies. The resulting consensus model, iXfcore, is manually curated and used to explore the species' metabolic capabilities. Model simulations predict minimal nutritional requirements that guide us in the formulation of defined media to assess biofilm formation in vitro, supporting the utility of the resulting predictions. Network analysis also identifies a previously undescribed model-predicted candidate pathway for acetate assimilation, consistent with genomic evidence but requiring further empirical validation. In addition, the model predicts the overproduction of polyamines, compounds linked to virulence in other phytopathogens. Experimental analyses confirm polyamine production in multiple X. fastidiosa strains in vitro, providing the first evidence of polyamine detection in culture supernatants of this phytopathogen. Overall, iXfcore provides a systems-level framework to investigate X. fastidiosa metabolism, generate testable hypotheses on its physiology and putative virulence-associated traits, and support future strain-specific models and studies of host-pathogen metabolic interactions.

Additional Links: PMID-42435656

Citation:

@article {pmid42435656,
year = {2026},
author = {Corbín-Agustí, P and Álvarez-Herrera, M and Román-Écija, M and Álvarez, P and Tortajada, M and Landa, BB and Peretó, J},
title = {A metabolic model based on a pangenome core reveals putative conserved biochemical features of the phytopathogen Xylella fastidiosa.},
journal = {Microbiological research},
volume = {312},
number = {},
pages = {128616},
doi = {10.1016/j.micres.2026.128616},
pmid = {42435656},
issn = {1618-0623},
abstract = {Xylella fastidiosa is a xylem-limited phytopathogenic bacterium responsible for severe diseases in many economically important crops. Despite its impact, its metabolism remains poorly characterized due to fastidious growth and the limited availability of defined culture media. Here, we reconstruct the first pangenome-based genome-scale metabolic model for X. fastidiosa, integrating conserved metabolic functions from 18 strains across five subspecies. The resulting consensus model, iXfcore, is manually curated and used to explore the species' metabolic capabilities. Model simulations predict minimal nutritional requirements that guide us in the formulation of defined media to assess biofilm formation in vitro, supporting the utility of the resulting predictions. Network analysis also identifies a previously undescribed model-predicted candidate pathway for acetate assimilation, consistent with genomic evidence but requiring further empirical validation. In addition, the model predicts the overproduction of polyamines, compounds linked to virulence in other phytopathogens. Experimental analyses confirm polyamine production in multiple X. fastidiosa strains in vitro, providing the first evidence of polyamine detection in culture supernatants of this phytopathogen. Overall, iXfcore provides a systems-level framework to investigate X. fastidiosa metabolism, generate testable hypotheses on its physiology and putative virulence-associated traits, and support future strain-specific models and studies of host-pathogen metabolic interactions.},
}

RevDate: 2026-07-13

Li D, Wang YQ, Huang Y, et al (2026)

A Minimalist Core and Lineage-Specific Expansion Underpin the Adaptive Evolution of the Cyclic Nucleotide-gated Channel (CNGC) Family: Insights from the Tea (Camellia sinensis) Pan-Genome.

Journal of agricultural and food chemistry [Epub ahead of print].

Cyclic nucleotide-gated channels (CNGCs) are critical Ca²⁺-permeable cation channels that orchestrate plant stress responses, yet their evolutionary dynamics in tea plants (Camellia sinensis) remain unknown. Here we dissect the CNGC family using a pan-genome of 28 tea accessions. We identified 867 CNGC genes grouped into 30 orthogroups, revealing a dual evolutionary strategy: an ultraconstrained, two-gene core (CsCNGC8/17) under strong purifying selection ensuring signaling fidelity, and Group IVc, a novel clade (39.4% of the family), exhibiting lineage-specific motif reconfiguration, dispersed duplications and elevated Ka/Ks ratios. Expression profiling suggested that Group IVc members are broadly upregulated across multiple stresses, whereas core genes display condition-specific responses. This unique architecture (balancing a minimalist, conserved core with a massively expanded, plastic periphery) provides a robust adaptive solution to the lifelong stress exposure characteristic of perennial tea plants. These findings offer critical insights into stress-response evolution and valuable molecular targets for breeding resilient tea cultivars.

Additional Links: PMID-42438126

@article {pmid42438126,
year = {2026},
author = {Li, D and Wang, YQ and Huang, Y and Jin, Y and Zhao, YJ and Ren, N and Lu, QH and Li, CL and Zheng, XQ and Li, QS},
title = {A Minimalist Core and Lineage-Specific Expansion Underpin the Adaptive Evolution of the Cyclic Nucleotide-gated Channel (CNGC) Family: Insights from the Tea (Camellia sinensis) Pan-Genome.},
journal = {Journal of agricultural and food chemistry},
volume = {},
number = {},
pages = {},
doi = {10.1021/acs.jafc.6c03626},
pmid = {42438126},
issn = {1520-5118},
abstract = {Cyclic nucleotide-gated channels (CNGCs) are critical Ca[2+]-permeable cation channels that orchestrate plant stress responses, yet their evolutionary dynamics in tea plants (Camellia sinensis) remain unknown. Here we dissect the CNGC family using a pan-genome of 28 tea accessions. We identified 867 CNGC genes grouped into 30 orthogroups, revealing a dual evolutionary strategy: an ultraconstrained, two-gene core (CsCNGC8/17) under strong purifying selection ensuring signaling fidelity, and Group IVc, a novel clade (39.4% of the family), exhibiting lineage-specific motif reconfiguration, dispersed duplications and elevated Ka/Ks ratios. Expression profiling suggested that Group IVc members are broadly upregulated across multiple stresses, whereas core genes display condition-specific responses. This unique architecture, balancing a minimalist, conserved core with a massively expanded, plastic periphery, provides a robust adaptive solution to the lifelong stress exposure characteristic of perennial tea plants. These findings offer critical insights into stress-response evolution and valuable molecular targets for breeding resilient tea cultivars.},
}

RevDate: 2026-07-10
CmpDate: 2026-07-10

Plender EG, Prodanov T, Lin J, et al (2026)

Complex structural variation, phylogeny, and disease associations of the mucin pangenome.

medRxiv : the preprint server for health sciences pii:2026.07.01.26356476.

Mucins are large glycoproteins that provide hydration and barrier function to epithelial tissues. Although genetically heterogeneous, all mucins harbor a large exon composed of variable number tandem repeats (VNTRs). Short-read sequencing has limited our understanding of mucin VNTR diversity and makes disease association studies challenging. We leverage 296 long-read phased genome assemblies to characterize 14 mucin family members, achieving ≥97% accuracy across 572 haplotypes. Phylogenetic haplogroup analysis reveals extraordinary structural heterozygosity, with MUC4 harboring the greatest allelic diversity (n=240 distinct lengths) and MUC12 the greatest size range (Δ = 55,233 bp; 23,080 amino acids). Ten mucins show significant population stratification (pFDR < 0.05). At the MUC4/MUC20 locus, we characterize higher-order structural variation, including a recurrent inversion, copy number variation, and interlocus gene conversion. Optimized genotyping achieves ≥95% haplogroup concordance across 10 loci. We apply this to 4,637 deeply phenotyped cystic fibrosis patients and identify a significant association between short MUC1 VNTRs and severe disease (p=0.0056), demonstrating the pangenome's utility for complex locus genotyping and disease discovery.

Additional Links: PMID-42428052

@article {pmid42428052,
year = {2026},
author = {Plender, EG and Prodanov, T and Lin, J and Wong, I and Wertz, J and Gordon, WW and Bamshad, MJ and Munson, KM and O'Neal, WK and Bloom, JD and Marschall, T and Eichler, EE},
title = {Complex structural variation, phylogeny, and disease associations of the mucin pangenome.},
journal = {medRxiv : the preprint server for health sciences},
volume = {},
number = {},
pages = {},
doi = {10.64898/2026.07.01.26356476},
pmid = {42428052},
abstract = {Mucins are large glycoproteins that provide hydration and barrier function to epithelial tissues. Although genetically heterogeneous, all mucins harbor a large exon composed of variable number tandem repeats (VNTRs). Short-read sequencing has limited our understanding of mucin VNTR diversity and makes disease association studies challenging. We leverage 296 long-read phased genome assemblies to characterize 14 mucin family members, achieving ≥97% accuracy across 572 haplotypes. Phylogenetic haplogroup analysis reveals extraordinary structural heterozygosity, with MUC4 harboring the greatest allelic diversity (n=240 distinct lengths) and MUC12 the greatest size range (Δ = 55,233 bp; 23,080 amino acids). Ten mucins show significant population stratification (pFDR < 0.05). At the MUC4/MUC20 locus, we characterize higher-order structural variation, including a recurrent inversion, copy number variation, and interlocus gene conversion. Optimized genotyping achieves ≥95% haplogroup concordance across 10 loci. We apply this to 4,637 deeply phenotyped cystic fibrosis patients and identify a significant association between short MUC1 VNTRs and severe disease (p=0.0056), demonstrating the pangenome's utility for complex locus genotyping and disease discovery.},
}

RevDate: 2026-07-10
CmpDate: 2026-07-10

Zhu W, Xu G, Gu L, et al (2026)

Phenotypic and comparative genomic characterization of a human biliary-derived Kosakonia radicincitans isolate.

Frontiers in microbiology, 17:1885996.

INTRODUCTION: Kosakonia radicincitans is primarily recognized as a plant-associated and environmentally adapted member of Enterobacteriaceae, whereas recovery from human clinical specimens remains uncommon. Routine biochemical identification may misassign recently reclassified Kosakonia species to closely related Enterobacter taxa.

METHODS: We characterized strain ZJG61129, a K. radicincitans isolate recovered from bile during endoscopic retrograde cholangiopancreatography (ERCP) in a patient with choledocholithiasis. Colony morphology, VITEK 2 Compact identification, MALDI-TOF MS identification and antimicrobial susceptibility testing were performed. Whole-genome sequencing, average nucleotide identity (ANI), digital DNA-DNA hybridization (dDDH), 16S rRNA and core-genome phylogeny analyses, whole-genome comparison, pan-genome analysis, and resistance/virulence-associated gene screening were used for taxonomic confirmation and genomic characterization.

RESULTS: ZJG61129 formed smooth, moist colonies on blood agar and MacConkey agar. VITEK 2 Compact assigned the isolate to the Enterobacter cloacae complex with 92% confidence, whereas MALDI-TOF MS identified it as K. radicincitans with a score of 2.259. The genome consisted of a single circular chromosome of 5,510,622 bp with a GC content of approximately 54.0%. ANI and dDDH analyses confirmed assignment to K. radicincitans, and 16S rRNA together with core-genome phylogeny placed ZJG61129 within the Kosakonia lineage. Comparative genomics showed a conserved genomic backbone, substantial genomic plasticity, and no clear source-associated genomic structuring in the current dataset. ZJG61129 was susceptible to all antimicrobial agents tested. In silico resistance and virulence gene analyses did not indicate high-risk acquired resistance or a clearly high-virulence genotype.

DISCUSSION: ZJG61129 represents a human biliary-derived, environmental-like K. radicincitans isolate that may be misassigned by routine biochemical identification. Genome-based analysis is valuable for accurate recognition of uncommon, recently reclassified Enterobacteriaceae from biliary specimens.

Additional Links: PMID-42428306

@article {pmid42428306,
year = {2026},
author = {Zhu, W and Xu, G and Gu, L and Geng, X and Dai, Y and Qin, F and Pang, X and Zheng, Y and Chen, L and Zhu, X},
title = {Phenotypic and comparative genomic characterization of a human biliary-derived Kosakonia radicincitans isolate.},
journal = {Frontiers in microbiology},
volume = {17},
number = {},
pages = {1885996},
pmid = {42428306},
issn = {1664-302X},
abstract = {INTRODUCTION: Kosakonia radicincitans is primarily recognized as a plant-associated and environmentally adapted member of Enterobacteriaceae, whereas recovery from human clinical specimens remains uncommon. Routine biochemical identification may misassign recently reclassified Kosakonia species to closely related Enterobacter taxa.

METHODS: We characterized strain ZJG61129, a K. radicincitans isolate recovered from bile during endoscopic retrograde cholangiopancreatography (ERCP) in a patient with choledocholithiasis. Colony morphology, VITEK 2 Compact identification, MALDI-TOF MS identification and antimicrobial susceptibility testing were performed. Whole-genome sequencing, average nucleotide identity (ANI), digital DNA-DNA hybridization (dDDH), 16S rRNA and core-genome phylogeny analyses, whole-genome comparison, pan-genome analysis, and resistance/virulence-associated gene screening were used for taxonomic confirmation and genomic characterization.

RESULTS: ZJG61129 formed smooth, moist colonies on blood agar and MacConkey agar. VITEK 2 Compact assigned the isolate to the Enterobacter cloacae complex with 92% confidence, whereas MALDI-TOF MS identified it as K. radicincitans with a score of 2.259. The genome consisted of a single circular chromosome of 5,510,622 bp with a GC content of approximately 54.0%. ANI and dDDH analyses confirmed assignment to K. radicincitans, and 16S rRNA together with core-genome phylogeny placed ZJG61129 within the Kosakonia lineage. Comparative genomics showed a conserved genomic backbone, substantial genomic plasticity, and no clear source-associated genomic structuring in the current dataset. ZJG61129 was susceptible to all antimicrobial agents tested. In silico resistance and virulence gene analyses did not indicate high-risk acquired resistance or a clearly high-virulence genotype.

DISCUSSION: ZJG61129 represents a human biliary-derived, environmental-like K. radicincitans isolate that may be misassigned by routine biochemical identification. Genome-based analysis is valuable for accurate recognition of uncommon, recently reclassified Enterobacteriaceae from biliary specimens.},
}

RevDate: 2026-07-10

Arnoux J, Mainguy J, Bry L, et al (2026)

Panorama: A robust pangenome-based method for predicting and comparing biological systems across species.

PLoS computational biology, 22(7):e1013856 pii:PCOMPBIOL-D-25-02697 [Epub ahead of print].

Over the last decade, the expansion in the number of available genomes has profoundly transformed the study of genetic diversity, evolution, and ecological adaptation in prokaryotes. However, traditional bioinformatic approaches based on the analysis of individual genomes are showing their limitations when faced with the sheer scale of the data. To overcome these constraints, the concept of pangenome has emerged, offering a comprehensive framework to capture the full genetic repertoire of a species. In this study, we present PANORAMA, an innovative pangenomic tool designed to exploit pangenome graphs, enabling their annotation and comparison to explore the genomic diversity of several species. Based on the PPanGGOLiN pangenome graphs, PANORAMA integrates advanced methods for rule-based prediction of macromolecular systems and comparative analysis of conserved features between different pangenomes, such as spots of insertion. We illustrate the use of PANORAMA on a dataset of 941 Pseudomonas aeruginosa genomes, evaluating its performance against reference defense system prediction tools such as PADLOC and DefenseFinder. The analysis was then extended to a larger set, including four species of Enterobacteriaceae (>6,000 genomes), demonstrating PANORAMA's ability to annotate, compare, and explore the diversity and distribution of biological systems across multiple species. This work provides new methods for the large-scale comparative study of microbial genomes and highlights the relevance of pangenome approaches in deciphering their evolutionary dynamics. PANORAMA is freely available and accessible at: https://github.com/labgem/PANORAMA.

Additional Links: PMID-42430441

@article {pmid42430441,
year = {2026},
author = {Arnoux, J and Mainguy, J and Bry, L and Fernandez de Grado, Q and Hoblos, Y and Vallenet, D and Calteau, A},
title = {Panorama: A robust pangenome-based method for predicting and comparing biological systems across species.},
journal = {PLoS computational biology},
volume = {22},
number = {7},
pages = {e1013856},
doi = {10.1371/journal.pcbi.1013856},
pmid = {42430441},
issn = {1553-7358},
abstract = {Over the last decade, the expansion in the number of available genomes has profoundly transformed the study of genetic diversity, evolution, and ecological adaptation in prokaryotes. However, traditional bioinformatic approaches based on the analysis of individual genomes are showing their limitations when faced with the sheer scale of the data. To overcome these constraints, the concept of pangenome has emerged, offering a comprehensive framework to capture the full genetic repertoire of a species. In this study, we present PANORAMA, an innovative pangenomic tool designed to exploit pangenome graphs, enabling their annotation and comparison to explore the genomic diversity of several species. Based on the PPanGGOLiN pangenome graphs, PANORAMA integrates advanced methods for rule-based prediction of macromolecular systems and comparative analysis of conserved features between different pangenomes, such as spots of insertion. We illustrate the use of PANORAMA on a dataset of 941 Pseudomonas aeruginosa genomes, evaluating its performance against reference defense system prediction tools such as PADLOC and DefenseFinder. The analysis was then extended to a larger set, including four species of Enterobacteriaceae (>6,000 genomes), demonstrating PANORAMA's ability to annotate, compare, and explore the diversity and distribution of biological systems across multiple species. This work provides new methods for the large-scale comparative study of microbial genomes and highlights the relevance of pangenome approaches in deciphering their evolutionary dynamics. PANORAMA is freely available and accessible at: https://github.com/labgem/PANORAMA.},
}

RevDate: 2026-07-10

Ni Z, Zhang Z, Ning M, et al (2026)

An Anas pangenome graph reveals the role of structural variations in duck domestication.

Poultry science, 105(10):107388 pii:S0032-5791(26)01019-9 [Epub ahead of print].

Despite the significant phenotypic divergence between domestic ducks and their wild progenitors, the structural variants (SVs) underlying this differentiation remain largely unexplored due to the limitations of linear references in capturing genetic diversity across the genus Anas. Current pangenome efforts either lack base-level resolution or are constrained by genomic anchoring, hindering the exploration of genetic signatures of domestication. Here, we assembled chromosome-level genomes for a mallard (Anas platyrhynchos) and an eastern spot-billed duck (Anas zonorhyncha) that retained substantial domestic components. By integrating these with 14 high-quality Anas assemblies, we constructed a base-level resolution pangenome graph encompassing 243.4 Mb of non-reference sequences. Our graph-based pipeline demonstrated superior mapping performance and identified 67,165 population-level SVs, representing a five-fold increase in detection sensitivity compared to linear-based methods. Notably, SVs explained more genetic variance (PC1: 8.75% vs. 3.25%) and provided finer resolution of local ancestry than single nucleotide polymorphisms (SNPs), highlighting the unique advantage of SVs in resolving duck population structure. Leveraging the graph, we recovered 163 ancestral SVs that are prevalent in distant species but persist at low frequencies in domestic and wild populations. The loss of these ancestral sequences in gravity sensing and cilium assembly likely provided the physiological plasticity necessary for domestication. Moreover, highly divergent exonic SVs exhibited a non-random distribution, preferentially accumulating in UTRs over CDS regions. Specifically, highly divergent SVs located in UTRs of TMEM123 and FAM13A may be associated with viral resistance and the regulation of adipogenesis. 
Similarly, we identified a highly divergent 75 bp insertion located 3.66 kb upstream of the FAM184B gene, which may be associated with changes in muscle growth during duck domestication. In conclusion, the study establishes a comprehensive base-resolution pangenome resource for the Anas. Our findings reveal that SVs are essential for resolving complex evolutionary histories and suggest that SV-mediated regulatory evolution is an important driver of rapid phenotypic change during duck domestication.

Additional Links: PMID-42431169

@article {pmid42431169,
year = {2026},
author = {Ni, Z and Zhang, Z and Ning, M and Xu, W and Lu, L and Ai, H and Huang, Y},
title = {An Anas pangenome graph reveals the role of structural variations in duck domestication.},
journal = {Poultry science},
volume = {105},
number = {10},
pages = {107388},
doi = {10.1016/j.psj.2026.107388},
pmid = {42431169},
issn = {1525-3171},
abstract = {Despite the significant phenotypic divergence between domestic ducks and their wild progenitors, the structural variants (SVs) underlying this differentiation remain largely unexplored due to the limitations of linear references in capturing genetic diversity across the genus Anas. Current pangenome efforts either lack base-level resolution or are constrained by genomic anchoring, hindering the exploration of genetic signatures of domestication. Here, we assembled chromosome-level genomes for a mallard (Anas platyrhynchos) and an eastern spot-billed duck (Anas zonorhyncha) that retained substantial domestic components. By integrating these with 14 high-quality Anas assemblies, we constructed a base-level resolution pangenome graph encompassing 243.4 Mb of non-reference sequences. Our graph-based pipeline demonstrated superior mapping performance and identified 67,165 population-level SVs, representing a five-fold increase in detection sensitivity compared to linear-based methods. Notably, SVs explained more genetic variance (PC1: 8.75% vs. 3.25%) and provided finer resolution of local ancestry than single nucleotide polymorphisms (SNPs), highlighting the unique advantage of SVs in resolving duck population structure. Leveraging the graph, we recovered 163 ancestral SVs that are prevalent in distant species but persist at low frequencies in domestic and wild populations. The loss of these ancestral sequences in gravity sensing and cilium assembly likely provided the physiological plasticity necessary for domestication. Moreover, highly divergent exonic SVs exhibited a non-random distribution, preferentially accumulating in UTRs over CDS regions. Specifically, highly divergent SVs located in UTRs of TMEM123 and FAM13A may be associated with viral resistance and the regulation of adipogenesis. 
Similarly, we identified a highly divergent 75 bp insertion located 3.66 kb upstream of the FAM184B gene, which may be associated with changes in muscle growth during duck domestication. In conclusion, the study establishes a comprehensive base-resolution pangenome resource for the Anas. Our findings reveal that SVs are essential for resolving complex evolutionary histories and suggest that SV-mediated regulatory evolution is an important driver of rapid phenotypic change during duck domestication.},
}

RevDate: 2026-07-07

Mohammad SF, Ali F, Shynara M (2026)

Pangenome-guided immunoinformatics design and in silico characterization of a multi-epitope vaccine candidate against Acinetobacter baumannii with nanoparticle assembly potential.

Scientific reports pii:10.1038/s41598-026-59935-4 [Epub ahead of print].

Acinetobacter baumannii is a critical multidrug-resistant pathogen causing severe healthcare infections with high mortality, yet no licensed vaccine exists. This study aims to identify universally conserved surface antigens through pangenome analysis, predict immunogenic epitopes using integrated machine learning, and computationally design and in silico characterize a self-assembling nanoparticle vaccine with dual adjuvants. A computational framework integrating pangenome analysis of 712 complete genomes, epitope prediction, and structural vaccinology was employed to design a multi-epitope nanoparticle vaccine candidate for experimental evaluation. Pangenome analysis identified 3894 core genes with 42 outer membrane proteins, prioritizing OmpA, BamA, and OmpW as antigen targets. Protein language models predicted conformational B-cell epitopes, while NetMHCpan-4.2 predicted T-cell epitopes across 125 HLA alleles. The final construct (AB-VAX-01, 289 amino acids) incorporates 15 epitopes fused with dual adjuvants (RS09 TLR4 and cGAMP STING agonists) and a foldon domain for nanoparticle assembly. Microsecond molecular dynamics simulations with replicates demonstrated stability with TLR4 and STING. Conservation analysis across all 712 genomes showed 96.2-100% epitope identity. Immune simulations predicted Th1-biased responses with 94.2% global population coverage. In silico cloning confirmed favorable codon adaptation parameters. This in silico characterized vaccine construct represents a promising candidate requiring experimental validation against A. baumannii infections.

Additional Links: PMID-42414399

@article {pmid42414399,
year = {2026},
author = {Mohammad, SF and Ali, F and Shynara, M},
title = {Pangenome-guided immunoinformatics design and in silico characterization of a multi-epitope vaccine candidate against Acinetobacter baumannii with nanoparticle assembly potential.},
journal = {Scientific reports},
volume = {},
number = {},
pages = {},
doi = {10.1038/s41598-026-59935-4},
pmid = {42414399},
issn = {2045-2322},
support = {SDXHQD2025086//Shandong Xiehe University,P.R.China/ ; },
abstract = {Acinetobacter baumannii is a critical multidrug-resistant pathogen causing severe healthcare infections with high mortality, yet no licensed vaccine exists. This study aims to identify universally conserved surface antigens through pangenome analysis, predict immunogenic epitopes using integrated machine learning, and computationally design and in silico characterize a self-assembling nanoparticle vaccine with dual adjuvants. A computational framework integrating pangenome analysis of 712 complete genomes, epitope prediction, and structural vaccinology was employed to design a multi-epitope nanoparticle vaccine candidate for experimental evaluation. Pangenome analysis identified 3894 core genes with 42 outer membrane proteins, prioritizing OmpA, BamA, and OmpW as antigen targets. Protein language models predicted conformational B-cell epitopes, while NetMHCpan-4.2 predicted T-cell epitopes across 125 HLA alleles. The final construct (AB-VAX-01, 289 amino acids) incorporates 15 epitopes fused with dual adjuvants (RS09 TLR4 and cGAMP STING agonists) and a foldon domain for nanoparticle assembly. Microsecond molecular dynamics simulations with replicates demonstrated stability with TLR4 and STING. Conservation analysis across all 712 genomes showed 96.2-100% epitope identity. Immune simulations predicted Th1-biased responses with 94.2% global population coverage. In silico cloning confirmed favorable codon adaptation parameters. This in silico characterized vaccine construct represents a promising candidate requiring experimental validation against A. baumannii infections.},
}

RevDate: 2026-07-09

Pierre B, Bacilieri R, This D, et al (2026)

Pangenomic analyses in the cultivated grapevine confirm high genomic collinearity and extensive dispensable gene content likely involved in adaptation.

G3 (Bethesda, Md.) pii:8729008 [Epub ahead of print].

Pangenomes have now been developed for several horticultural crops, yet the extent to which genome diversity in sequence and organization contributes to plant adaptation and major agronomic traits remains poorly understood. Here, we assembled the genomes of nine cultivated grapevine varieties and compared the genomes of 15 cultivated grapevine varieties for variation in gene and TE content. We found that genomic collinearity is highly conserved among varieties. We still observed substantial variation across genomes. Notably, we identified across varieties 55,662 orthologous genes, of which 55.3% appear to be dispensable. Dispensable genes are enriched for functions related to adaptation to biotic and abiotic constraints, suggesting that they may play a role in adaptation. Comparing our results with a recently published study, we found substantial differences, with ∼12.6% of the genes we classified as core genes being classified as dispensable genes in this other study. We then constructed a pangenome graph and used it to perform genome-wide association studies for three important traits in grapevine production, which allowed us to include large structural variants as markers in the analyses. We identified 32 loci that we did not detect when we used the PN40024 genome as a reference, 20 of which are newly reported associations. Overall, our results indicate that despite recent advances in characterizing plant pangenomes, current gene classification into core and dispensable gene categories should be taken with caution. They also highlight the value of incorporating structural variants into GWAS, to better characterize the genetic architecture of agronomic traits.

Additional Links: PMID-42423175

@article {pmid42423175,
year = {2026},
author = {Pierre, B and Bacilieri, R and This, D and Lacombe, T and Segura, V and This, P and Mocoeur, A and Villoutreix, R and Sarah, G},
title = {Pangenomic analyses in the cultivated grapevine confirm high genomic collinearity and extensive dispensable gene content likely involved in adaptation.},
journal = {G3 (Bethesda, Md.)},
volume = {},
number = {},
pages = {},
doi = {10.1093/g3journal/jkag168},
pmid = {42423175},
issn = {2160-1836},
abstract = {Pangenomes have now been developed for several horticultural crops, yet the extent to which genome diversity in sequence and organization contributes to plant adaptation and major agronomic traits remains poorly understood. Here, we assembled the genomes of nine cultivated grapevine varieties and compared the genomes of 15 cultivated grapevine varieties for variation in gene and TE content. We found that genomic collinearity is highly conserved among varieties. We still observed substantial variation across genomes. Notably, we identified across varieties 55,662 orthologous genes, of which 55.3% appear to be dispensable. Dispensable genes are enriched for functions related to adaptation to biotic and abiotic constraints, suggesting that they may play a role in adaptation. Comparing our results with a recently published study, we found substantial differences, with ∼12.6% of the genes we classified as core genes being classified as dispensable genes in this other study. We then constructed a pangenome graph and used it to perform genome-wide association studies for three important traits in grapevine production, which allowed us to include large structural variants as markers in the analyses. We identified 32 loci that we did not detect when we used the PN40024 genome as a reference, 20 of which are newly reported associations. Overall, our results indicate that despite recent advances in characterizing plant pangenomes, current gene classification into core and dispensable gene categories should be taken with caution. They also highlight the value of incorporating structural variants into GWAS, to better characterize the genetic architecture of agronomic traits.},
}

RevDate: 2026-07-09
CmpDate: 2026-07-10

Hu WS, An SH, Kim DW, et al (2026)

Comparative genomics and phenotypes of Listeria monocytogenes isolated from enoki mushrooms in South Korea and China.

Food microbiology, 140:105182.

Listeria monocytogenes is a gram-positive and facultatively anaerobic foodborne pathogen causing listeriosis. The detection of L. monocytogenes in enoki mushrooms sourced from South Korea and China is particularly critical, given their confirmed implication in recent serious listeriosis outbreaks across the global food supply chain. This study investigates the prevalence, genetic diversity, and phenotypical characteristics of L. monocytogenes in enoki mushrooms from South Korea and China. Out of 129 samples, 24 (18.6%) tested positive, with contamination rates of 17.5% in Korean mushrooms and 19.7% in Chinese mushrooms. Whole-genome sequencing, cgMLST, and pan-genome analyses resolved lineage and sequence-type distributions, revealing predominant serogroup 1/2a (83.3%) and lineage II (90.9%). The pan-genomic assessment of L. monocytogenes strains originating from diverse geographical locations indicated the presence of open genomes, which establishes a strong genetic underpinning for adaptation to varied environments. These strains carry a multitude of virulence genes that significantly contribute to their heightened pathogenic potential. The phylogenetic tree further demonstrated that these highly related Korean and Chinese isolates were intricately intermingled with outbreak-related strains from the USA, Canada, and Europe, confirming a minimal core-genome genetic distance across the global supply chain. This highly homogenous clone, which was ultimately traced back to enoki mushrooms in South Korea and China, suggests that the globalization of the food trade is the primary driver of its rapid, international dissemination.

Additional Links: PMID-42425654

@article {pmid42425654,
year = {2026},
author = {Hu, WS and An, SH and Kim, DW and Im, HU and Lee, SH and Li, YB and Liu, ZX and Wang, ZW and Lee, AR and Kim, UI and Je, HJ and Seo, YJ and San Moon, J and Hong, S and Kang, YJ and Koo, OK},
title = {Comparative genomics and phenotypes of Listeria monocytogenes isolated from enoki mushrooms in South Korea and China.},
journal = {Food microbiology},
volume = {140},
number = {},
pages = {105182},
doi = {10.1016/j.fm.2026.105182},
pmid = {42425654},
issn = {1095-9998},
mesh = {*Listeria monocytogenes/genetics/isolation & purification/classification/drug effects ; China ; Phylogeny ; Republic of Korea ; *Genome, Bacterial ; *Agaricales ; Phenotype ; Genomics ; Genetic Variation ; Food Microbiology ; Whole Genome Sequencing ; Virulence Factors/genetics ; Food Contamination/analysis ; },
abstract = {Listeria monocytogenes is a gram-positive and facultatively anaerobic foodborne pathogen causing listeriosis. The detection of L. monocytogenes in enoki mushrooms sourced from South Korea and China is particularly critical, given their confirmed implication in recent serious listeriosis outbreaks across the global food supply chain. This study investigates the prevalence, genetic diversity, and phenotypical characteristics of L. monocytogenes in enoki mushrooms from South Korea and China. Out of 129 samples, 24 (18.6%) tested positive, with contamination rates of 17.5% in Korean mushrooms and 19.7% in Chinese mushrooms. Whole-genome sequencing, cgMLST, and pan-genome analyses resolved lineage and sequence-type distributions, revealing predominant serogroup 1/2a (83.3%) and lineage II (90.9%). The pan-genomic assessment of L. monocytogenes strains originating from diverse geographical locations indicated the presence of open genomes, which establishes a strong genetic underpinning for adaptation to varied environments. These strains carry a multitude of virulence genes that significantly contribute to their heightened pathogenic potential. The phylogenetic tree further demonstrated that these highly related Korean and Chinese isolates were intricately intermingled with outbreak-related strains from the USA, Canada, and Europe, confirming a minimal core-genome genetic distance across the global supply chain. This highly homogenous clone, which was ultimately traced back to enoki mushrooms in South Korea and China, suggests that the globalization of the food trade is the primary driver of its rapid, international dissemination.},
}

MeSH Terms:

*Listeria monocytogenes/genetics/isolation & purification/classification/drug effects
China
Phylogeny
Republic of Korea
*Genome, Bacterial
*Agaricales
Phenotype
Genomics
Genetic Variation
Food Microbiology
Whole Genome Sequencing
Virulence Factors/genetics
Food Contamination/analysis

RevDate: 2026-07-10

Lu T, Li C, Wei H, et al (2026)

NTM-DB: A Comprehensive Non-tuberculosis Mycobacteria Genomic Database.

Genomics, proteomics & bioinformatics pii:8729461 [Epub ahead of print].

Non-tuberculous mycobacteria (NTM) are a major group of environmental bacteria, approximately one-third of which cause serious human infections, particularly respiratory diseases. The global rise in the prevalence and severity of NTM infections has posed a major public health challenge. While high-throughput sequencing has generated vast genomic data on NTM, there remains a lack of comprehensive resources for cross-species genomic analysis. To address these limitations, we developed a specialized database, the Non-tuberculosis Mycobacteria Genomic Database (NTM-DB), tailored for NTM researchers and clinicians. NTM-DB offers the most comprehensive collection of NTM genomic and bioinformatic resources, including 16,469 genome assemblies (13,134 newly assembled genomes), 189 type/standard strain genomes representing 177 species and 12 subspecies, 705 multi-locus sequence typing (MLST) types, 33,240 resistance genes, and 74,315 virulence genes. A user-friendly interactive website was constructed to enable efficient browsing, MLST profiling, searching, online analysis, and downloading of the aforementioned data. Notably, with online analysis tools, users can perform customized genotyping, cross-species phylogeny, pan-genome, and virulence and drug resistance gene annotation analyses using our data and/or their uploaded data. Overall, with its comprehensive data, intuitive interface, and powerful analysis tools, NTM-DB serves as an important resource and reference for NTM researchers and clinicians, thereby improving the diagnosis and treatment of various NTM-related diseases and supporting both scientific discovery and clinical practice. NTM-DB is publicly accessible at https://ngdc.cncb.ac.cn/ntmdb.

Additional Links: PMID-42426966

Citation:

@article {pmid42426966,
year = {2026},
author = {Lu, T and Li, C and Wei, H and Zhang, Y and Fan, Z and Jiang, X and Wang, J and Wang, P and Shang, K and Huang, Y and Yang, H and Tuohetaerbaike, B and Li, Y and Niu, H and Zhang, W and Wen, H and Sheng, Y and Xiao, J and Chen, F},
title = {NTM-DB: A Comprehensive Non-tuberculosis Mycobacteria Genomic Database.},
journal = {Genomics, proteomics \& bioinformatics},
volume = {},
number = {},
pages = {},
doi = {10.1093/gpbjnl/qzag062},
pmid = {42426966},
issn = {2210-3244},
abstract = {Non-tuberculous mycobacteria (NTM) are a major group of environmental bacteria, approximately one-third of which cause serious human infections, particularly respiratory diseases. The global rise in the prevalence and severity of NTM infections has posed a major public health challenge. While high-throughput sequencing has generated vast genomic data on NTM, there remains a lack of comprehensive resources for cross-species genomic analysis. To address these limitations, we developed a specialized database, the Non-tuberculosis Mycobacteria Genomic Database (NTM-DB), tailored for NTM researchers and clinicians. NTM-DB offers the most comprehensive collection of NTM genomic and bioinformatic resources, including 16,469 genome assemblies (13,134 newly assembled genomes), 189 type/standard strain genomes representing 177 species and 12 subspecies, 705 multi-locus sequence typing (MLST) types, 33,240 resistance genes, and 74,315 virulence genes. A user-friendly interactive website was constructed to enable efficient browsing, MLST profiling, searching, online analysis, and downloading of the aforementioned data. Notably, with online analysis tools, users can perform customized genotyping, cross-species phylogeny, pan-genome, and virulence and drug resistance gene annotation analyses using our data and/or their uploaded data. Overall, with its comprehensive data, intuitive interface, and powerful analysis tools, NTM-DB serves as an important resource and reference for NTM researchers and clinicians, thereby improving the diagnosis and treatment of various NTM-related diseases and supporting both scientific discovery and clinical practice. NTM-DB is publicly accessible at https://ngdc.cncb.ac.cn/ntmdb.},
}

RevDate: 2026-07-10

Weis KS, Kaur A, Ghosh P, et al (2026)

Genome evolution in plant pathogenic bacteria.

Genome biology and evolution pii:8729594 [Epub ahead of print].

Bacterial plant pathogens have ravaged crops since the dawn of agriculture and continue to pose a serious threat today. Bacteria and their plant hosts have co-evolved in an evolutionary arms race, with artificial selection due to agriculture tipping the scale in favor of the pathogen. This review gives an overview of plant pathogenic bacterial diversity, showing that pathogenicity has independently evolved numerous times, and that there is not one unifying trait determining plant pathogenicity. Instead, these bacteria represent repeated, independent evolutionary transitions driven by life in complex ecological networks, that include plant hosts, insect vectors, microbial competitors, and highly heterogenous abiotic environments. Their genomes reflect this interplay through a dynamic balance of architecture and flux. These structural features, along with highly variable pangenomes, capture the balance between genome stability and flux imposed by ecological constraints and epidemiological dynamics. Horizontal gene transfer via conjugative plasmids, prophages, integrative and conjugative elements, transposons, and in some lineages, natural competence, remains the major source of adaptive novelty, enabling rapid remodeling of virulence repertoires, metabolic capabilities, and antibiotic or heavy metal resistance genes. These changes create distinct selective landscapes. Agricultural practices such as chemical use, host resistance deployment, or seed trade, can drive recurrent bottlenecks, expansions, and admixture events that leave strong genomic signatures in pathogens. Finally, this review explores the genomic differences enabling the divergence of lifestyles, while also acknowledging knowledge gaps and future directions of research on the evolution of bacterial plant pathogens.

Additional Links: PMID-42427100

Citation:

@article {pmid42427100,
year = {2026},
author = {Weis, KS and Kaur, A and Ghosh, P and Potnis, N},
title = {Genome evolution in plant pathogenic bacteria.},
journal = {Genome biology and evolution},
volume = {},
number = {},
pages = {},
doi = {10.1093/gbe/evag174},
pmid = {42427100},
issn = {1759-6653},
abstract = {Bacterial plant pathogens have ravaged crops since the dawn of agriculture and continue to pose a serious threat today. Bacteria and their plant hosts have co-evolved in an evolutionary arms race, with artificial selection due to agriculture tipping the scale in favor of the pathogen. This review gives an overview of plant pathogenic bacterial diversity, showing that pathogenicity has independently evolved numerous times, and that there is not one unifying trait determining plant pathogenicity. Instead, these bacteria represent repeated, independent evolutionary transitions driven by life in complex ecological networks, that include plant hosts, insect vectors, microbial competitors, and highly heterogenous abiotic environments. Their genomes reflect this interplay through a dynamic balance of architecture and flux. These structural features, along with highly variable pangenomes, capture the balance between genome stability and flux imposed by ecological constraints and epidemiological dynamics. Horizontal gene transfer via conjugative plasmids, prophages, integrative and conjugative elements, transposons, and in some lineages, natural competence, remains the major source of adaptive novelty, enabling rapid remodeling of virulence repertoires, metabolic capabilities, and antibiotic or heavy metal resistance genes. These changes create distinct selective landscapes. Agricultural practices such as chemical use, host resistance deployment, or seed trade, can drive recurrent bottlenecks, expansions, and admixture events that leave strong genomic signatures in pathogens. Finally, this review explores the genomic differences enabling the divergence of lifestyles, while also acknowledging knowledge gaps and future directions of research on the evolution of bacterial plant pathogens.},
}

RevDate: 2026-07-10
CmpDate: 2026-07-10

Butler G, Ramakrishnan S, Collins T, et al (2026)

Indirect genomic effects shape cancer risk across species.

bioRxiv : the preprint server for biology pii:2026.06.29.735167.

Tumour prevalence varies dramatically throughout the animal kingdom despite broadly conserved cellular and developmental processes, raising the question of how evolution has shaped susceptibility [1,2]. Here, we link macroevolutionary variation in tumour prevalence to gene-level selection by integrating comparative genomics data from 109 species of birds and mammals using a Bayesian phylogenetic framework to estimate pangenome-wide rates of genetic evolution across >150 million years of evolutionary change. We identify 3,206 genes in which natural selection is associated with shifts in tumour prevalence, more than 80% of which are linked to reduced prevalence, suggesting pervasive selection for cancer suppression. Using causal phylogenetic inference, we show that genes associated with reduced tumour prevalence act predominantly through indirect effects on body size, revealing growth as a key mediator of cancer risk across species. In contrast, genes associated with increased tumour prevalence exert direct effects independent of body size. Finally, at the species-level, we demonstrate that exceptionally low rates of benign tumours do not necessarily coincide with reduced malignancy, revealing that benign and malignant tumour processes are evolutionarily decoupled. Together, these results reveal how natural selection has fine-tuned the link between genotype, phenotype, and cancer risk across species.

Additional Links: PMID-42427579

Citation:

@article {pmid42427579,
year = {2026},
author = {Butler, G and Ramakrishnan, S and Collins, T and Baker, J and Amend, SR and , and Schatz, MC and Venditti, C and Pienta, KJ},
title = {Indirect genomic effects shape cancer risk across species.},
journal = {bioRxiv : the preprint server for biology},
volume = {},
number = {},
pages = {},
doi = {10.64898/2026.06.29.735167},
pmid = {42427579},
issn = {2692-8205},
abstract = {Tumour prevalence varies dramatically throughout the animal kingdom despite broadly conserved cellular and developmental processes, raising the question of how evolution has shaped susceptibility [1,2]. Here, we link macroevolutionary variation in tumour prevalence to gene-level selection by integrating comparative genomics data from 109 species of birds and mammals using a Bayesian phylogenetic framework to estimate pangenome-wide rates of genetic evolution across >150 million years of evolutionary change. We identify 3,206 genes in which natural selection is associated with shifts in tumour prevalence, more than 80% of which are linked to reduced prevalence, suggesting pervasive selection for cancer suppression. Using causal phylogenetic inference, we show that genes associated with reduced tumour prevalence act predominantly through indirect effects on body size, revealing growth as a key mediator of cancer risk across species. In contrast, genes associated with increased tumour prevalence exert direct effects independent of body size. Finally, at the species-level, we demonstrate that exceptionally low rates of benign tumours do not necessarily coincide with reduced malignancy, revealing that benign and malignant tumour processes are evolutionarily decoupled. Together, these results reveal how natural selection has fine-tuned the link between genotype, phenotype, and cancer risk across species.},
}

RevDate: 2026-07-10
CmpDate: 2026-07-10

Lu S, Liao WW, DeGorter MK, et al (2026)

Pangenome-based human genome analysis improves trait association and genomic prediction.

bioRxiv : the preprint server for biology pii:2026.07.01.735728.

The Human Pangenome Reference Consortium has generated 462 open-access reference genomes and a variation graph that represents differences among them, providing a substrate for pangenome-based analysis methods that overcome the longstanding limitation of comparing all genomic data to a single linear reference. A key unresolved question is the extent to which these approaches can improve trait mapping. We investigate this using the genetics of gene expression variation as a model. We developed a graph-based method (EdgeDepth) for associating sequence variation with traits using short-read genome sequencing data, and show that it captures complex forms of genetic variation missed by other methods. We evaluated trait mapping performance using 430 samples with deep RNA-seq data, and found that pangenomic methods enable the detection of expression quantitative trait loci involving multiallelic indels and structural variants, leading to increased power at a subset of genes. These include 812 genes (7.9% of total) with ≥20% improvement in statistical significance relative to the 1000 Genomes Project callset, and 185 (1.8%) with a 50% improvement, 10 of which are candidates to explain prior GWAS results. Notably, these analyses implicate GBAP1 pseudogene copy number as a causal factor in Crohn's disease, likely via miRNA-mediated regulation of GBA1, which explains prior GWAS results based on flanking SNPs. The inclusion of pangenome-specific variation also improved the performance of gene expression prediction models, with median variance explained increasing from 10.1% to 12.5%, and 14.6% of genes showing significant improvement (Δr² > 0.05). Taken together, these results suggest that integration of pangenomic methods into human genetic studies will improve trait association and genomic prediction at a meaningful subset of genes.

Additional Links: PMID-42427655

Citation:

@article {pmid42427655,
year = {2026},
author = {Lu, S and Liao, WW and DeGorter, MK and Goddard, PC and Ebler, J and Lu, TY and , and Chaisson, MJP and Marschall, T and Montgomery, SB and Stitziel, NO and Hall, IM},
title = {Pangenome-based human genome analysis improves trait association and genomic prediction.},
journal = {bioRxiv : the preprint server for biology},
volume = {},
number = {},
pages = {},
doi = {10.64898/2026.07.01.735728},
pmid = {42427655},
issn = {2692-8205},
abstract = {The Human Pangenome Reference Consortium has generated 462 open-access reference genomes and a variation graph that represents differences among them, providing a substrate for pangenome-based analysis methods that overcome the longstanding limitation of comparing all genomic data to a single linear reference. A key unresolved question is the extent to which these approaches can improve trait mapping. We investigate this using the genetics of gene expression variation as a model. We developed a graph-based method (EdgeDepth) for associating sequence variation with traits using short-read genome sequencing data, and show that it captures complex forms of genetic variation missed by other methods. We evaluated trait mapping performance using 430 samples with deep RNA-seq data, and found that pangenomic methods enable the detection of expression quantitative trait loci involving multiallelic indels and structural variants, leading to increased power at a subset of genes. These include 812 genes (7.9% of total) with ≥20% improvement in statistical significance relative to the 1000 Genomes Project callset, and 185 (1.8%) with a 50% improvement, 10 of which are candidates to explain prior GWAS results. Notably, these analyses implicate GBAP1 pseudogene copy number as a causal factor in Crohn's disease, likely via miRNA-mediated regulation of GBA1, which explains prior GWAS results based on flanking SNPs. The inclusion of pangenome-specific variation also improved the performance of gene expression prediction models, with median variance explained increasing from 10.1% to 12.5%, and 14.6% of genes showing significant improvement (Δr² > 0.05). Taken together, these results suggest that integration of pangenomic methods into human genetic studies will improve trait association and genomic prediction at a meaningful subset of genes.},
}

RevDate: 2026-07-07
CmpDate: 2026-07-07

Wang MX, Kille B, Nute MG, et al (2026)

Seqwin: ultrafast identification of signature sequences in microbial genomes.

Bioinformatics (Oxford, England), 42(Supplement_1):.

MOTIVATION: Polymerase chain reaction (PCR) enables rapid, cost-effective diagnostics but requires prior identification of genomic regions that allow sensitive and specific detection of target microbial groups, herein referred to as microbial signature sequences. We introduce Seqwin, an open-source framework designed to automate microbial genome signature discovery. Tens of thousands of microbial genomes are now available for a single species, limiting the application of existing manual and automated approaches for identifying signatures. Modern approaches that are capable of leveraging all available microbial genomes will ensure sensitive and accurate DNA signature identification and enable robust pathogen detection for clinical, environmental, and public health applications.

RESULTS: Seqwin builds weighted pan-genome minimizer graphs and uses a traversal algorithm to identify signature sequences that occur frequently in target genomes but remain rare in non-targets. Unlike earlier tools that depend on strict presence or absence of sequences, Seqwin accommodates natural sequence variation and scales to very large genome collections. When applied to genomes from C. difficile, M. tuberculosis, and S. enterica, Seqwin recovered more high-quality signatures than alternative methods with lower computational burden. Seqwin's analysis of nearly 15 000 S. enterica genomes yielded over 200 candidate signatures in three minutes. Seqwin provides an open-source solution for the long-standing need for scalable microbial signature discovery and diagnostic assay design.

Seqwin is available on GitHub (https://github.com/treangenlab/Seqwin) and can be installed via Bioconda (https://bioconda.github.io/recipes/seqwin/README.html). Benchmarking datasets, outputs, and scripts are available on Zenodo (https://doi.org/10.5281/zenodo.19874011).
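
The idea of selecting minimizers that are frequent in target genomes but rare in non-targets can be illustrated with a toy sketch. This is not Seqwin's implementation: it omits the weighted graph and the traversal step, and the function names, the window parameters `k` and `w`, and the two frequency thresholds are all hypothetical choices made for illustration.

```python
from collections import Counter


def minimizers(seq, k=5, w=4):
    """Return the set of (w, k)-minimizers of seq: the lexicographically
    smallest k-mer in every window of w consecutive k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return set()
    if len(kmers) < w:
        # Sequence too short for a full window: fall back to one minimizer.
        return {min(kmers)}
    return {min(kmers[i:i + w]) for i in range(len(kmers) - w + 1)}


def candidate_signatures(targets, non_targets, k=5, w=4,
                         min_target_frac=0.9, max_non_target_frac=0.1):
    """Keep minimizers seen in most target genomes and few non-targets.
    Thresholds are illustrative assumptions, not values from the paper."""
    # minimizers() returns a set, so each genome is counted at most once.
    target_counts = Counter(m for g in targets for m in minimizers(g, k, w))
    other_counts = Counter(m for g in non_targets for m in minimizers(g, k, w))
    n_targets = len(targets)
    n_others = max(len(non_targets), 1)  # avoid division by zero
    return sorted(
        m for m, count in target_counts.items()
        if count / n_targets >= min_target_frac
        and other_counts[m] / n_others <= max_non_target_frac
    )


if __name__ == "__main__":
    targets = ["ACGTACGTAC", "ACGTACGTAC"]
    non_targets = ["ACGACGACGA"]
    print(candidate_signatures(targets, non_targets, k=3, w=2))
```

Using frequency thresholds rather than strict presence or absence is what lets a scheme like this tolerate natural sequence variation among the target genomes.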

Additional Links: PMID-42412815

Citation:

@article {pmid42412815,
year = {2026},
author = {Wang, MX and Kille, B and Nute, MG and Zhou, S and Stadler, LB and Treangen, TJ},
title = {Seqwin: ultrafast identification of signature sequences in microbial genomes.},
journal = {Bioinformatics (Oxford, England)},
volume = {42},
number = {Supplement_1},
pages = {},
pmid = {42412815},
issn = {1367-4811},
support = {//NIH/ ; R21-AI190938/AI/NIAID NIH HHS/United States ; P01-AI152999/AI/NIAID NIH HHS/United States ; IIS-2239114//NSF/ ; EF-2126387//NSF/ ; /LM/NLM NIH HHS/United States ; T15LM007093//Training Program in Biomedical Informatics and Data Science/ ; },
mesh = {*Software ; Algorithms ; *Genome, Bacterial ; *Genome, Microbial ; Sequence Analysis, DNA/methods ; *Genomics/methods ; Mycobacterium tuberculosis/genetics ; },
abstract = {MOTIVATION: Polymerase chain reaction (PCR) enables rapid, cost-effective diagnostics but requires prior identification of genomic regions that allow sensitive and specific detection of target microbial groups, herein referred to as microbial signature sequences. We introduce Seqwin, an open-source framework designed to automate microbial genome signature discovery. Tens of thousands of microbial genomes are now available for a single species, limiting the application of existing manual and automated approaches for identifying signatures. Modern approaches that are capable of leveraging all available microbial genomes will ensure sensitive and accurate DNA signature identification and enable robust pathogen detection for clinical, environmental, and public health applications.

RESULTS: Seqwin builds weighted pan-genome minimizer graphs and uses a traversal algorithm to identify signature sequences that occur frequently in target genomes but remain rare in non-targets. Unlike earlier tools that depend on strict presence or absence of sequences, Seqwin accommodates natural sequence variation and scales to very large genome collections. When applied to genomes from C. difficile, M. tuberculosis, and S. enterica, Seqwin recovered more high-quality signatures than alternative methods with lower computational burden. Seqwin's analysis of nearly 15 000 S. enterica genomes yielded over 200 candidate signatures in three minutes. Seqwin provides an open-source solution for the long-standing need for scalable microbial signature discovery and diagnostic assay design.

Seqwin is available on GitHub (https://github.com/treangenlab/Seqwin) and can be installed via Bioconda (https://bioconda.github.io/recipes/seqwin/README.html). Benchmarking datasets, outputs, and scripts are available on Zenodo (https://doi.org/10.5281/zenodo.19874011).},
}

MeSH Terms:

*Software
Algorithms
*Genome, Bacterial
*Genome, Microbial
Sequence Analysis, DNA/methods
*Genomics/methods
Mycobacterium tuberculosis/genetics

RevDate: 2026-07-07
CmpDate: 2026-07-07

Sanaullah A, Brown NK, Shakya P, et al (2026)

RLBWT-based LCP computation in compressed space for terabase-scale pangenome analysis.

Bioinformatics (Oxford, England), 42(Supplement_1):.

MOTIVATION: Lossless full text indexes are utilized in a myriad of applications in bioinformatics. The continuously decreasing cost of generating biological data has resulted in the need to build full text indexes on biological datasets of increasing size. Many compressed full text indexes have been developed to address this problem. In particular, run-length Burrows-Wheeler transform (RLBWT) based compressed full text indexes have seen wide development and adoption. However, the construction of these RLBWT-based compressed full text indexes is still computationally expensive, sometimes prohibitively so, even for current dataset sizes.

RESULTS: Therefore, we present algorithms for the construction of RLBWT-based compressed full text indexes and their supporting data structures in compressed space. The algorithms have a space complexity of O(r) words and run in O(n) time for repetitive datasets, where r is the number of runs in the BWT, n is the length of the text, and repetitive datasets are those with n/r ∈ Ω(log n). We provide the first algorithm to compute LCP-related information for repetitive datasets in optimal time and O(r) space, greatly reducing memory requirements. The key idea behind this algorithm is the utilization of r samples of the inverse suffix array at regular intervals. For example, on the Human Pangenome Reference Consortium Release 2 dataset, this reduces peak memory from 2135 GiB to 170 GiB (12.6x reduction) compared to the previous best method (pfp-thresholds).

The implementation is available at https://github.com/ucfcbb/TeraTools.
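
The quantities n (text length) and r (number of runs in the BWT) can be made concrete with a small sketch. It builds the BWT naively from sorted rotations, which is far from the compressed-space construction the abstract describes and is only practical for toy inputs. The function names and the `$` sentinel are illustrative assumptions.

```python
def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform via sorted rotations.
    Quadratic memory; for illustration only, not the paper's algorithm.
    Assumes the sentinel does not occur in text and sorts before all
    other characters."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s):
    """Collapse s into (character, run length) pairs."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1
        else:
            runs.append([ch, 1])
    return [(ch, length) for ch, length in runs]


if __name__ == "__main__":
    for text in ("banana", "ACGT" * 8):
        transformed = bwt(text)
        n = len(transformed)                     # text length incl. sentinel
        r = len(run_length_encode(transformed))  # number of BWT runs
        print(f"{text[:12]!r}: n={n}, r={r}, n/r={n / r:.2f}")
```

On repetitive input the BWT groups identical characters into long runs, so r stays small while n grows; that gap is what makes O(r)-space data structures attractive for pangenome-scale collections.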

Additional Links: PMID-42412836

Citation:

@article {pmid42412836,
year = {2026},
author = {Sanaullah, A and Brown, NK and Shakya, P and Deegutla, A and Naseri, A and Langmead, B and Zhi, D and Zhang, S},
title = {RLBWT-based LCP computation in compressed space for terabase-scale pangenome analysis.},
journal = {Bioinformatics (Oxford, England)},
volume = {42},
number = {Supplement_1},
pages = {},
pmid = {42412836},
issn = {1367-4811},
support = {R01HG010086/NH/NIH HHS/United States ; R01HG011392/NH/NIH HHS/United States ; },
mesh = {*Algorithms ; *Computational Biology/methods ; *Data Compression/methods ; *Genomics/methods ; },
abstract = {MOTIVATION: Lossless full text indexes are utilized in a myriad of applications in bioinformatics. The continuously decreasing cost of generating biological data has resulted in the need to build full text indexes on biological datasets of increasing size. Many compressed full text indexes have been developed to address this problem. In particular, run-length Burrows-Wheeler transform (RLBWT) based compressed full text indexes have seen wide development and adoption. However, the construction of these RLBWT-based compressed full text indexes is still computationally expensive, sometimes prohibitively so, even for current dataset sizes.

RESULTS: Therefore, we present algorithms for the construction of RLBWT-based compressed full text indexes and their supporting data structures in compressed space. The algorithms have a space complexity of O(r) words and run in O(n) time for repetitive datasets, where r is the number of runs in the BWT, n is the length of the text, and repetitive datasets are those with n/r ∈ Ω(log n). We provide the first algorithm to compute LCP-related information for repetitive datasets in optimal time and O(r) space, greatly reducing memory requirements. The key idea behind this algorithm is the utilization of r samples of the inverse suffix array at regular intervals. For example, on the Human Pangenome Reference Consortium Release 2 dataset, this reduces peak memory from 2135 GiB to 170 GiB (12.6x reduction) compared to the previous best method (pfp-thresholds).

The implementation is available at https://github.com/ucfcbb/TeraTools.},
}

MeSH Terms:

*Algorithms
*Computational Biology/methods
*Data Compression/methods
*Genomics/methods

RevDate: 2026-07-06
CmpDate: 2026-07-06

Soto-Serrano A, Vincze T, Roberts RJ, et al (2026)

Comparative genomics and methylome profiling of Pseudolactococcus laudensis reveal signatures of niche adaptation and strain-level variation in mobile genetic elements and phage defence.

Microbial genomics, 12(7):.

Pseudolactococcus laudensis (formerly named Lactococcus laudensis) is an emerging lactic acid bacterium first isolated from raw milk in 2015 and subsequently detected in vegetables and dairy mesophilic starter cultures. Despite its recurrent isolation from diverse environments, the genetic basis of its niche adaptation, horizontal gene transfer and phage defence remains unexplored. Here, we perform the first comparative genomic and epigenomic analysis of P. laudensis using complete genomes of a plant-derived isolate (MCRI-603), a milk isolate (DSM 28961) and 20 strains from a Danish dairy mesophilic starter culture. Genomes were annotated and analysed using pangenomics, Clustering of Orthologous Genes and methylome profiling. Average nucleotide identity, pangenome and Clustering of Orthologous Genes analyses revealed niche-associated structure: dairy starter strains formed a tight cluster, while the plant isolate MCRI-603 and milk isolate DSM 28961 were more similar to each other than to the starter culture group. The pangenome comprised 4,946 genes, with 1,396 core genes. Dairy starter strains showed markedly elevated numbers of insertion sequences, pseudogenes, plasmids and genomic islands relative to MCRI-603, which was plasmid-free and carried very few insertion sequence elements or genomic islands. DSM 28961 displayed pseudogene count similar to the dairy starter strains but markedly fewer transposases. These patterns are consistent with a plant-associated origin of P. laudensis and progressive dairy specialization via mobile genetic element acquisition. The P. laudensis mobilome was found to carry key niche-related traits. Lactose utilization operons were plasmid-encoded, whereas exopolysaccharide-encoding loci, opp oligopeptide transport systems and several defence loci, including clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins (CRISPR-Cas), were consistently encoded within chromosomal integrative elements. 
All strains harboured prophage-like elements, including putatively intact prophages in 13 of them, and ~67% of 238 predicted antiphage systems resided on mobile genetic elements, underscoring their central role in phage defence. Restriction-modification systems dominated the defensome, and three strains encoded CRISPR-Cas systems (including type III-A and type I-C), indicating a higher prevalence than has been reported for Lactococcus lactis and Lactococcus cremoris, where CRISPR-Cas has rarely been observed. Methylome analysis identified 43 distinct motifs, of which 25 were novel. The P. laudensis methylome was overwhelmingly dominated by N⁶-methyladenine, and most motifs were short, non-palindromic and largely associated with type III restriction-modification systems and some type I and II subtypes. Nearly all strains exhibited distinct methylation profiles, including those isolated from the same dairy starter culture, highlighting extensive epigenetic diversification in dairy environments. Altogether, the data reveals a highly dynamic genomic and epigenomic landscape in P. laudensis, greatly shaped by mobile genetic elements, and provides a foundation for future work in this species and other Pseudolactococci.

Additional Links: PMID-42405957

Citation:

@article {pmid42405957,
year = {2026},
author = {Soto-Serrano, A and Vincze, T and Roberts, RJ and Krych, L and Mahony, J and Deptula, P},
title = {Comparative genomics and methylome profiling of Pseudolactococcus laudensis reveal signatures of niche adaptation and strain-level variation in mobile genetic elements and phage defence.},
journal = {Microbial genomics},
volume = {12},
number = {7},
pages = {},
doi = {10.1099/mgen.0.001779},
pmid = {42405957},
issn = {2057-5858},
mesh = {*Bacteriophages/genetics ; Genome, Bacterial ; Milk/microbiology ; *Interspersed Repetitive Sequences ; Genomics/methods ; Animals ; Gene Transfer, Horizontal ; DNA Methylation ; Adaptation, Physiological/genetics ; *Lactococcus/genetics/virology ; Phylogeny ; },
abstract = {Pseudolactococcus laudensis (formerly named Lactococcus laudensis) is an emerging lactic acid bacterium first isolated from raw milk in 2015 and subsequently detected in vegetables and dairy mesophilic starter cultures. Despite its recurrent isolation from diverse environments, the genetic basis of its niche adaptation, horizontal gene transfer and phage defence remains unexplored. Here, we perform the first comparative genomic and epigenomic analysis of P. laudensis using complete genomes of a plant-derived isolate (MCRI-603), a milk isolate (DSM 28961) and 20 strains from a Danish dairy mesophilic starter culture. Genomes were annotated and analysed using pangenomics, Clustering of Orthologous Genes and methylome profiling. Average nucleotide identity, pangenome and Clustering of Orthologous Genes analyses revealed niche-associated structure: dairy starter strains formed a tight cluster, while the plant isolate MCRI-603 and milk isolate DSM 28961 were more similar to each other than to the starter culture group. The pangenome comprised 4,946 genes, with 1,396 core genes. Dairy starter strains showed markedly elevated numbers of insertion sequences, pseudogenes, plasmids and genomic islands relative to MCRI-603, which was plasmid-free and carried very few insertion sequence elements or genomic islands. DSM 28961 displayed pseudogene count similar to the dairy starter strains but markedly fewer transposases. These patterns are consistent with a plant-associated origin of P. laudensis and progressive dairy specialization via mobile genetic element acquisition. The P. laudensis mobilome was found to carry key niche-related traits. 
Lactose utilization operons were plasmid-encoded, whereas exopolysaccharide-encoding loci, opp oligopeptide transport systems and several defence loci, including clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins (CRISPR-Cas), were consistently encoded within chromosomal integrative elements. All strains harboured prophage-like elements, including putatively intact prophages in 13 of them, and ~67% of 238 predicted antiphage systems resided on mobile genetic elements, underscoring their central role in phage defence. Restriction-modification systems dominated the defensome, and three strains encoded CRISPR-Cas systems (including type III-A and type I-C), indicating a higher prevalence than has been reported for Lactococcus lactis and Lactococcus cremoris, where CRISPR-Cas has rarely been observed. Methylome analysis identified 43 distinct motifs, of which 25 were novel. The P. laudensis methylome was overwhelmingly dominated by N[6]-methyladenine, and most motifs were short, non-palindromic and largely associated with type III restriction-modification systems and some type I and II subtypes. Nearly all strains exhibited distinct methylation profiles, including those isolated from the same dairy starter culture, highlighting extensive epigenetic diversification in dairy environments. Altogether, the data reveals a highly dynamic genomic and epigenomic landscape in P. laudensis, greatly shaped by mobile genetic elements, and provides a foundation for future work in this species and other Pseudolactococci.},
}

MeSH Terms:

*Bacteriophages/genetics
Genome, Bacterial
Milk/microbiology
*Interspersed Repetitive Sequences
Genomics/methods
Animals
Gene Transfer, Horizontal
DNA Methylation
Adaptation, Physiological/genetics
*Lactococcus/genetics/virology
Phylogeny

RevDate: 2026-07-06
CmpDate: 2026-07-06

Li Z, Qiao Z, Wang Z, et al (2026)

Identification of CAMTA transcription factors and functional analysis of OsCAMTA4 in rice blast and salt stress.

Planta, 264(2):.

The OsCAMTA4 gene regulates salt and blast resistance in rice without yield loss via calcium and ABA signaling. As a key regulatory hub in the calcium signaling pathway, calmodulin-binding transcription activator (CAMTA) responds to diverse stresses and developmental signals. However, its roles in rice salt and rice blast stress responses remain largely unclear. Here, we characterized the rice CAMTA family genome-wide. Using the 3 K Rice Pan-genome and 3,000 Rice Functional Gene Haplotype Databases, we found seven core CAMTA genes are prevalent across 2,978 accessions but unevenly distributed among subgroups, with their three high-frequency haplotypes exerting distinct regulatory effects on key agronomic traits. The seven OsCAMTA genes show spatiotemporally specific responses to drought and cold stress. RT-qPCR revealed that OsCAMTA4 expression specifically was downregulated under rice blast but upregulated under salt stress. Overexpression of OsCAMTA4 enhanced salt tolerance by increasing seed germination rate, root length, proline content, and transcript levels of ABA signaling pathway genes, while decreasing malondialdehyde and hydrogen peroxide (H2O2) contents. Additionally, OsCAMTA4 knockout improved rice blast resistance by increasing proline and H2O2 accumulation and expression of disease resistance-related genes. The OsCAMTA4 protein is localized in the nucleus and interacts with OsCML2, suggesting it mediates stress responses via calcium ion (Ca[2+]) signaling. Notably, the actual presence of the OsCAMTA4 gene has no significant effect on rice yield over wild type, supporting its potential for improving salt tolerance and disease resistance without yield loss. Thus, it provides a new target for breeding broad-spectrum stress-resistant rice.

Additional Links: PMID-42406130

Citation:

@article {pmid42406130,
year = {2026},
author = {Li, Z and Qiao, Z and Wang, Z and Huang, R and Tan, J and Zhou, Q and Huang, X and Sheng, F and Du, X},
title = {Identification of CAMTA transcription factors and functional analysis of OsCAMTA4 in rice blast and salt stress.},
journal = {Planta},
volume = {264},
number = {2},
pages = {},
pmid = {42406130},
issn = {1432-2048},
support = {2021BBA224//the Major Projects of Technological Innovation in Hubei Province/ ; },
mesh = {*Oryza/genetics/physiology/microbiology ; *Plant Proteins/genetics/metabolism ; Gene Expression Regulation, Plant ; *Transcription Factors/genetics/metabolism ; *Salt Stress/genetics ; *Plant Diseases/microbiology/genetics ; Abscisic Acid/metabolism ; Salt Tolerance/genetics ; Hydrogen Peroxide/metabolism ; Disease Resistance/genetics ; Stress, Physiological ; Magnaporthe/physiology ; },
abstract = {The OsCAMTA4 gene regulates salt and blast resistance in rice without yield loss via calcium and ABA signaling. As a key regulatory hub in the calcium signaling pathway, calmodulin-binding transcription activator (CAMTA) responds to diverse stresses and developmental signals. However, its roles in rice salt and rice blast stress responses remain largely unclear. Here, we characterized the rice CAMTA family genome-wide. Using the 3 K Rice Pan-genome and 3,000 Rice Functional Gene Haplotype Databases, we found seven core CAMTA genes are prevalent across 2,978 accessions but unevenly distributed among subgroups, with their three high-frequency haplotypes exerting distinct regulatory effects on key agronomic traits. The seven OsCAMTA genes show spatiotemporally specific responses to drought and cold stress. RT-qPCR revealed that OsCAMTA4 expression specifically was downregulated under rice blast but upregulated under salt stress. Overexpression of OsCAMTA4 enhanced salt tolerance by increasing seed germination rate, root length, proline content, and transcript levels of ABA signaling pathway genes, while decreasing malondialdehyde and hydrogen peroxide (H2O2) contents. Additionally, OsCAMTA4 knockout improved rice blast resistance by increasing proline and H2O2 accumulation and expression of disease resistance-related genes. The OsCAMTA4 protein is localized in the nucleus and interacts with OsCML2, suggesting it mediates stress responses via calcium ion (Ca[2+]) signaling. Notably, the actual presence of the OsCAMTA4 gene has no significant effect on rice yield over wild type, supporting its potential for improving salt tolerance and disease resistance without yield loss. Thus, it provides a new target for breeding broad-spectrum stress-resistant rice.},
}

MeSH Terms:

*Oryza/genetics/physiology/microbiology
*Plant Proteins/genetics/metabolism
Gene Expression Regulation, Plant
*Transcription Factors/genetics/metabolism
*Salt Stress/genetics
*Plant Diseases/microbiology/genetics
Abscisic Acid/metabolism
Salt Tolerance/genetics
Hydrogen Peroxide/metabolism
Disease Resistance/genetics
Stress, Physiological
Magnaporthe/physiology

RevDate: 2026-07-06

Mahamud SMI, Sahim M, Sanam M, et al (2026)

Genomic landscape of 340 virulent Acinetobacter bacteriophages reveals anti-CRISPR-enriched candidates for therapeutic prioritization.

European journal of clinical microbiology & infectious diseases : official publication of the European Society of Clinical Microbiology [Epub ahead of print].

PURPOSE: Carbapenem-resistant Acinetobacter baumannii (CRAB) represents a critical global health threat for which existing antibiotics are increasingly inadequate. This study aimed to establish a comprehensive genomic framework for the rational prioritization of virulent Acinetobacter bacteriophages as therapeutic candidates.

METHODS: We performed large-scale comparative genomic analysis of 340 virulent Acinetobacter bacteriophages, integrating phylogenetic reconstruction, pangenome analysis, CRISPR spacer-based host interaction mapping, Anti-CRISPR protein identification, and systematic antimicrobial resistance (AMR) gene screening.

RESULTS: Genome sizes spanned a nearly 20-fold range, with a significant negative correlation between genome size and GC content (R² = 0.139, ρ = -0.630). Phylogenetic analysis revealed extensive divergence across multiple lineages with no dominant clade. Pangenome analysis identified 20,982 unique protein families, of which 76.2% were cloud genes, confirming a highly open genome architecture. CRISPR spacer matching yielded 1,480 high-confidence matches across 100 phage genomes, providing molecular evidence of broad historical infectivity. Anti-CRISPR profiling identified Acinetobacter phage XC1 as an exceptional therapeutic candidate harboring 55 predicted Anti-CRISPR proteins with canonical regulatory locus architecture. AMR screening identified 21 distinct AMR gene homologs (Loose RGI hits, 22.5 to 47.1% amino acid identity) distributed heterogeneously across the dataset, confirming abundant therapeutically clean candidates while flagging a subset warranting further scrutiny before therapeutic exclusion.

CONCLUSION: These findings provide a multi-criteria genomic framework for rational phage candidate prioritization against multidrug-resistant Acinetobacter infections, with direct implications for evidence-based phage therapy development.
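
The "cloud genes" share reported in the RESULTS above is a frequency class of gene (protein) families. The sketch below shows how such classes are commonly derived from a presence/absence table. It assumes Roary-style cut-offs (core at 99% of genomes or more, soft core from 95%, shell from 15%, cloud below 15%). The abstract does not state which tool or thresholds the authors used, so the function names, thresholds and toy data here are illustrative assumptions only.

```python
def classify_families(presence, n_genomes, core=0.99, soft_core=0.95, shell=0.15):
    """Map each gene family to a frequency class from its genome membership.

    presence: dict of family name -> iterable of genome identifiers.
    Thresholds are assumed Roary-style defaults, not taken from the paper.
    """
    classes = {}
    for family, genomes in presence.items():
        freq = len(set(genomes)) / n_genomes
        if freq >= core:
            classes[family] = "core"
        elif freq >= soft_core:
            classes[family] = "soft_core"
        elif freq >= shell:
            classes[family] = "shell"
        else:
            classes[family] = "cloud"
    return classes


def cloud_fraction(classes):
    """Share of families labelled 'cloud'."""
    return sum(1 for c in classes.values() if c == "cloud") / len(classes)


# Toy data: 5 hypothetical families across 100 hypothetical genomes.
toy = {
    "famA": range(100),  # present everywhere
    "famB": range(96),
    "famC": range(50),
    "famD": range(3),
    "famE": [7],         # singleton
}
toy_classes = classify_families(toy, n_genomes=100)
print(toy_classes, cloud_fraction(toy_classes))
```

A pangenome dominated by cloud families, as described for these phages, is one where most families fall into the last branch.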

Additional Links: PMID-42406312

Citation:

@article {pmid42406312,
year = {2026},
author = {Mahamud, SMI and Sahim, M and Sanam, M and Oishi, JF and Tabassum, J and Shams, NB and Ansary, MM},
title = {Genomic landscape of 340 virulent Acinetobacter bacteriophages reveals anti-CRISPR-enriched candidates for therapeutic prioritization.},
journal = {European journal of clinical microbiology & infectious diseases : official publication of the European Society of Clinical Microbiology},
volume = {},
number = {},
pages = {},
pmid = {42406312},
issn = {1435-4373},
abstract = {PURPOSE: Carbapenem-resistant Acinetobacter baumannii (CRAB) represents a critical global health threat for which existing antibiotics are increasingly inadequate. This study aimed to establish a comprehensive genomic framework for the rational prioritization of virulent Acinetobacter bacteriophages as therapeutic candidates.

METHODS: We performed large-scale comparative genomic analysis of 340 virulent Acinetobacter bacteriophages, integrating phylogenetic reconstruction, pangenome analysis, CRISPR spacer-based host interaction mapping, Anti-CRISPR protein identification, and systematic antimicrobial resistance (AMR) gene screening.

RESULTS: Genome sizes spanned a nearly 20-fold range, with a significant negative correlation between genome size and GC content (R² = 0.139, ρ = -0.630). Phylogenetic analysis revealed extensive divergence across multiple lineages with no dominant clade. Pangenome analysis identified 20,982 unique protein families, of which 76.2% were cloud genes, confirming a highly open genome architecture. CRISPR spacer matching yielded 1,480 high-confidence matches across 100 phage genomes, providing molecular evidence of broad historical infectivity. Anti-CRISPR profiling identified Acinetobacter phage XC1 as an exceptional therapeutic candidate harboring 55 predicted Anti-CRISPR proteins with canonical regulatory locus architecture. AMR screening identified 21 distinct AMR gene homologs (Loose RGI hits, 22.5 to 47.1% amino acid identity) distributed heterogeneously across the dataset, confirming abundant therapeutically clean candidates while flagging a subset warranting further scrutiny before therapeutic exclusion.

CONCLUSION: These findings provide a multi-criteria genomic framework for rational phage candidate prioritization against multidrug-resistant Acinetobacter infections, with direct implications for evidence-based phage therapy development.},
}

RevDate: 2026-07-07

Wang C, Huang D, Dong Z, et al (2026)

Pan-genomic and transcriptomic analyses reveal subfunctionalization of CBP/p300-like histone acetyltransferases in soybean seed development.

BMC plant biology pii:10.1186/s12870-026-09449-y [Epub ahead of print].

BACKGROUND: Soybean is a crucial global source of protein and oil. The CBP/p300 histone acetyltransferases (HACs) are key transcriptional regulators, yet their diversity and functions in soybean remain unexplored at a pan-genomic level.

RESULTS: Here, we constructed a pan-genomic resource for the HAC gene family across 29 wild, landrace, and cultivated soybean accessions, identifying 142 HAC genes. These genes are confined to chromosomes 7, 8, 15, and 19, indicating strong evolutionary constraints. Phylogenetic analysis divided HACs into five subgroups with distinct domain architectures: Group 4-5 retain full CBP/p300 domains, whereas Group 1-3 show progressive domain loss. Pan-transcriptomic analyses revealed an expression dichotomy: Group 3-5 are broadly expressed, while Group 1-2 exhibit endosperm-specific expression during early seed development, suggesting specialized roles in nutrient transfer and embryogenesis. Notably, elite cultivars (e.g., Wm82, ZH13) have lost Group 2 homologs preserved in wild soybeans, highlighting domestication-driven erosion of epigenetic diversity. Co-expression network analysis prioritized Wm82-HAC1 (Group 1) as a candidate gene coordinating nutrient metabolism and seed maturation pathways.

CONCLUSION: Our study provides the first comprehensive panorama of epigenetic regulators in the soybean pan-genome. Our findings reveal how subfunctionalization and domestication help shape the HAC regulatory network in soybean, highlighting wild germplasm as a valuable reservoir for recovering lost alleles (Group2 homologs) and identifying Wm82-HAC1 (Group 1) as a prime target for precision breeding of seed traits.

Additional Links: PMID-42410509

Citation:

@article {pmid42410509,
year = {2026},
author = {Wang, C and Huang, D and Dong, Z and Chen, Y},
title = {Pan-genomic and transcriptomic analyses reveal subfunctionalization of CBP/p300-like histone acetyltransferases in soybean seed development.},
journal = {BMC plant biology},
volume = {},
number = {},
pages = {},
doi = {10.1186/s12870-026-09449-y},
pmid = {42410509},
issn = {1471-2229},
support = {32201799//National Natural Science Foundation of China/ ; },
abstract = {BACKGROUND: Soybean is a crucial global source of protein and oil. The CBP/p300 histone acetyltransferases (HACs) are key transcriptional regulators, yet their diversity and functions in soybean remain unexplored at a pan-genomic level.

RESULTS: Here, we constructed a pan-genomic resource for the HAC gene family across 29 wild, landrace, and cultivated soybean accessions, identifying 142 HAC genes. These genes are confined to chromosomes 7, 8, 15, and 19, indicating strong evolutionary constraints. Phylogenetic analysis divided HACs into five subgroups with distinct domain architectures: Group 4-5 retain full CBP/p300 domains, whereas Group 1-3 show progressive domain loss. Pan-transcriptomic analyses revealed an expression dichotomy: Group 3-5 are broadly expressed, while Group 1-2 exhibit endosperm-specific expression during early seed development, suggesting specialized roles in nutrient transfer and embryogenesis. Notably, elite cultivars (e.g., Wm82, ZH13) have lost Group 2 homologs preserved in wild soybeans, highlighting domestication-driven erosion of epigenetic diversity. Co-expression network analysis prioritized Wm82-HAC1 (Group 1) as a candidate gene coordinating nutrient metabolism and seed maturation pathways.

CONCLUSION: Our study provides the first comprehensive panorama of epigenetic regulators in the soybean pan-genome. Our findings reveal how subfunctionalization and domestication help shape the HAC regulatory network in soybean, highlighting wild germplasm as a valuable reservoir for recovering lost alleles (Group2 homologs) and identifying Wm82-HAC1 (Group 1) as a prime target for precision breeding of seed traits.},
}

RevDate: 2026-07-07

Furtado KL, Gilbert JA, Neal M (2026)

Bioactive environments to combat antimicrobial resistance: artificial intelligence and model-driven microbial biocontrol for living materials.

Journal of applied microbiology pii:8726148 [Epub ahead of print].

Antimicrobial resistance (AMR) continues to outpace development of new therapeutics. Many interventions focus on treating infection after it occurs, but resistant pathogens often emerge, persist, and spread within reservoirs, such as built environments. Microbial biocontrol offers a complementary, upstream strategy by reshaping ecological interactions to suppress the colonization, persistence, and transmission of AMR pathogens. Currently, biocontrol design relies upon the presumed functionality of probiotic genera across diverse environments despite limited experimental validation, alongside heuristic model predictions that prioritize efficiency over sensitivity. These approaches yield inconsistent outcomes, reflecting the context-dependent nature of microbial behavior. We review how advances in metabolic modeling and artificial intelligence (AI), in conjunction with experimental data, enable adaptable, context-aware biocontrol design with iterative design-test-learn cycles for optimization. We outline the ecological principles underlying microbial competition, highlighting Bacillus as a robust biocontrol chassis due to its biosynthetic capacity, stress tolerance, and genetic tractability. We then discuss how genome-scale, pan-genome-scale, and metabolism-and-expression models provide mechanistic insight into competitive fitness, metabolic trade-offs, and persistence. AI advances these approaches by extracting patterns from multi-omic datasets to build specific, yet versatile, foundation models (FMs) that guide strain and/or consortium selection for specific built environments. Moreover, these tools facilitate safe biocontrol deployment by enabling risk assessment of persistence, ecological displacement, and horizontal gene transfer (HGT), particularly for engineered living materials (ELMs) and bioactive building surfaces. 
Ultimately, AI-guided modeling and systems-level design provide scalable frameworks for developing durable, preventive strategies against AMR, shifting the focus from reactive treatment toward proactive control of pathogen ecology.

Additional Links: PMID-42411838

Citation:

@article {pmid42411838,
year = {2026},
author = {Furtado, KL and Gilbert, JA and Neal, M},
title = {Bioactive environments to combat antimicrobial resistance: artificial intelligence and model-driven microbial biocontrol for living materials.},
journal = {Journal of applied microbiology},
volume = {},
number = {},
pages = {},
doi = {10.1093/jambio/lxag161},
pmid = {42411838},
issn = {1365-2672},
abstract = {Antimicrobial resistance (AMR) continues to outpace development of new therapeutics. Many interventions focus on treating infection after it occurs, but resistant pathogens often emerge, persist, and spread within reservoirs, such as built environments. Microbial biocontrol offers a complementary, upstream strategy by reshaping ecological interactions to suppress the colonization, persistence, and transmission of AMR pathogens. Currently, biocontrol design relies upon the presumed functionality of probiotic genera across diverse environments despite limited experimental validation, alongside heuristic model predictions that prioritize efficiency over sensitivity. These approaches yield inconsistent outcomes, reflecting the context-dependent nature of microbial behavior. We review how advances in metabolic modeling and artificial intelligence (AI), in conjunction with experimental data, enable adaptable, context-aware biocontrol design with iterative design-test-learn cycles for optimization. We outline the ecological principles underlying microbial competition, highlighting Bacillus as a robust biocontrol chassis due to its biosynthetic capacity, stress tolerance, and genetic tractability. We then discuss how genome-scale, pan-genome-scale, and metabolism-and-expression models provide mechanistic insight into competitive fitness, metabolic trade-offs, and persistence. AI advances these approaches by extracting patterns from multi-omic datasets to build specific, yet versatile, foundation models (FMs) that guide strain and/or consortium selection for specific built environments. Moreover, these tools facilitate safe biocontrol deployment by enabling risk assessment of persistence, ecological displacement, and horizontal gene transfer (HGT), particularly for engineered living materials (ELMs) and bioactive building surfaces. 
Ultimately, AI-guided modeling and systems-level design provide scalable frameworks for developing durable, preventive strategies against AMR, shifting the focus from reactive treatment toward proactive control of pathogen ecology.},
}

RevDate: 2026-07-07
CmpDate: 2026-07-07

Harviainen J, Sena F, Moumard C, et al (2026)

Scalable computation of ultrabubbles in pangenomes by orienting bidirected graphs.

Bioinformatics (Oxford, England), 42(Supplement_1):.

MOTIVATION: Pangenome graphs are increasingly used in bioinformatics, ranging from environmental surveillance and crop improvement to the construction of population-scale human pangenomes. As these graphs grow in size, methods that scale efficiently become essential. A central task in pangenome analysis is the discovery of variation structures. In directed graphs, the most widely studied such structures, superbubbles, can be identified in linear time. Their canonical generalization to bidirected graphs, ultrabubbles, more accurately models DNA reverse complementarity. However, existing ultrabubble algorithms are quadratic in the worst case.

RESULTS: We show that all ultrabubbles in a bidirected graph containing at least one tip or one cutvertex (a common property of pangenome graphs) can be computed in linear time. Our key contribution is a new linear-time orientation algorithm that transforms such a bidirected graph into a directed graph of the same size, in practice. Orientation conflicts are resolved by introducing auxiliary source or sink vertices. We prove that ultrabubbles in the original bidirected graph correspond to weak superbubbles in the resulting directed graph, enabling the use of existing linear-time algorithms. Our approach achieves speedups of up to 25× over the ultrabubble implementation in vg, and of >200× over BubbleGun, enabling scalable pangenome analyses. For example, on the v2.0 pangenome graph constructed by the Human Pangenome Reference Consortium from 232 individuals, after reading the input, our method completes in under 3 min, while vg requires >1 hour, and four times more RAM.

Our method is implemented in the BubbleFinder tool github.com/algbio/BubbleFinder, via the new ultrabubbles subcommand.
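
For readers unfamiliar with the directed-graph structure the abstract builds on, the following is a minimal brute-force check of the classical superbubble definition (reachability, matching, acyclicity, minimality) for one ordered pair of vertices. It is not the paper's linear-time algorithm, does not perform the bidirected-to-directed orientation, does not cover the "weak" superbubble variant, and is unrelated to the BubbleFinder or vg interfaces. All function names and the toy graphs are assumptions made for illustration.

```python
from collections import defaultdict


def _reach(adj, start, stop):
    """Vertices reachable from `start` along `adj`, never expanding `stop`."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        if v == stop:
            continue  # paths may end at `stop` but not pass through it
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def _is_acyclic(adj, nodes):
    """Kahn's algorithm restricted to the subgraph induced by `nodes`."""
    indeg = {v: 0 for v in nodes}
    for v in nodes:
        for w in adj.get(v, ()):
            if w in nodes:
                indeg[w] += 1
    ready = [v for v in nodes if indeg[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for w in adj.get(v, ()):
            if w in nodes:
                indeg[w] -= 1
                if indeg[w] == 0:
                    ready.append(w)
    return removed == len(nodes)


def _candidate(adj, radj, s, t):
    """Reachability, matching and acyclicity for the ordered pair (s, t)."""
    if s == t:
        return False
    forward = _reach(adj, s, t)
    if t not in forward:
        return False
    backward = _reach(radj, t, s)
    return forward == backward and _is_acyclic(adj, forward)


def is_superbubble(adj, s, t):
    """True if (s, t) is a minimal superbubble of the directed graph `adj`."""
    radj = defaultdict(list)
    for v, outs in adj.items():
        for w in outs:
            radj[w].append(v)
    if not _candidate(adj, radj, s, t):
        return False
    interior = _reach(adj, s, t) - {s, t}
    return not any(_candidate(adj, radj, s, u) for u in interior)


# Toy graphs: a simple two-branch bubble, and the same bubble with a dead-end tip "x".
bubble = {"s": ["a", "b"], "a": ["t"], "b": ["t"]}
with_tip = {"s": ["a", "b"], "a": ["t", "x"], "b": ["t"]}
print(is_superbubble(bubble, "s", "t"), is_superbubble(with_tip, "s", "t"))
```

The check costs one traversal per candidate pair, so enumerating all pairs this way is far from linear. It is useful only for validating small examples against the definition.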

Additional Links: PMID-42412794

Citation:

@article {pmid42412794,
year = {2026},
author = {Harviainen, J and Sena, F and Moumard, C and Politov, A and Schmidt, S and Tomescu, AI},
title = {Scalable computation of ultrabubbles in pangenomes by orienting bidirected graphs.},
journal = {Bioinformatics (Oxford, England)},
volume = {42},
number = {Supplement_1},
pages = {},
doi = {10.1093/bioinformatics/btag235},
pmid = {42412794},
issn = {1367-4811},
support = {101169716//European Union/ ; 351156//Research Council of Finland/ ; //Helsinki Institute of Information Technology/ ; },
mesh = {*Algorithms ; Humans ; *Genomics/methods ; *Computational Biology/methods ; },
abstract = {MOTIVATION: Pangenome graphs are increasingly used in bioinformatics, ranging from environmental surveillance and crop improvement to the construction of population-scale human pangenomes. As these graphs grow in size, methods that scale efficiently become essential. A central task in pangenome analysis is the discovery of variation structures. In directed graphs, the most widely studied such structures, superbubbles, can be identified in linear time. Their canonical generalization to bidirected graphs, ultrabubbles, more accurately models DNA reverse complementarity. However, existing ultrabubble algorithms are quadratic in the worst case.

RESULTS: We show that all ultrabubbles in a bidirected graph containing at least one tip or one cutvertex (a common property of pangenome graphs) can be computed in linear time. Our key contribution is a new linear-time orientation algorithm that transforms such a bidirected graph into a directed graph of the same size, in practice. Orientation conflicts are resolved by introducing auxiliary source or sink vertices. We prove that ultrabubbles in the original bidirected graph correspond to weak superbubbles in the resulting directed graph, enabling the use of existing linear-time algorithms. Our approach achieves speedups of up to 25× over the ultrabubble implementation in vg, and of >200× over BubbleGun, enabling scalable pangenome analyses. For example, on the v2.0 pangenome graph constructed by the Human Pangenome Reference Consortium from 232 individuals, after reading the input, our method completes in under 3 min, while vg requires >1 hour, and four times more RAM.

Our method is implemented in the BubbleFinder tool github.com/algbio/BubbleFinder, via the new ultrabubbles subcommand.},
}

MeSH Terms:

*Algorithms
Humans
*Genomics/methods
*Computational Biology/methods

RevDate: 2026-07-03

Yang L, Wang J, Kuhn K, et al (2026)

Pangenome-based structural variant imputation enables large-scale genotype-phenotype studies in dairy cattle.

Nature communications pii:10.1038/s41467-026-75219-x [Epub ahead of print].

Pangenomes of several species have been assembled recently, facilitating the detection and genotyping of structural variants. As part of the FarmGTEx Project, we previously constructed a Holstein pangenome (H20D) based on 40 phased haploid assemblies. Here, we use this breed specific pangenome to genotype 93,059 structural variants from whole-genome sequences of 1,571 cattle. We then develop a Holstein pangenome variation imputation reference panel we name HolPIP. Leveraging HolPIP, we impute 86.65% (68,354/78,886) of structural variants for 50,299 bulls with Beagle R[2] ≥ 0.8. Using these imputed structural variants and phenotypes for 43 complex traits, we conduct GWAS, identifying 1,225 structural variant-trait associations. We next use fine-mapping to prioritize 32 high-confidence candidate structural variants, including a 75-bp deletion in ANKRD11 linked to dairy form, rump width, and stature, as well as an insertion in DHX32 associated with RNA metabolism. Compared to SNPs across various functional annotations, structural variants show a stronger genome-wide enrichment across most complex traits in cattle, suggesting that structural variants may have an important contribution to the genetic basis of dairy traits.
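
The imputation rate quoted in the abstract can be reproduced from the two counts it gives. The short check below uses only those figures; the variable names are illustrative.

```python
# Counts quoted in the abstract: structural variants imputed with Beagle R2 >= 0.8,
# out of the structural variants considered for imputation.
imputed_well = 68_354
considered = 78_886

rate = imputed_well / considered
rate_label = f"{rate:.2%}"
print(rate_label)
```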

Additional Links: PMID-42399265

Citation:

@article {pmid42399265,
year = {2026},
author = {Yang, L and Wang, J and Kuhn, K and Li, W and Zanton, G and Neupane, M and Boschiero, C and Cole, JB and Li, B and Li, C and Baldwin Vi, RL and Van Tassell, CP and Rosen, BD and Smith, TPL and Jiang, J and Fang, L and Ma, L and Liu, GE},
title = {Pangenome-based structural variant imputation enables large-scale genotype-phenotype studies in dairy cattle.},
journal = {Nature communications},
volume = {},
number = {},
pages = {},
doi = {10.1038/s41467-026-75219-x},
pmid = {42399265},
issn = {2041-1723},
support = {2019-67015-29321//United States Department of Agriculture | National Institute of Food and Agriculture (NIFA)/ ; 2021-67015-33409//United States Department of Agriculture | National Institute of Food and Agriculture (NIFA)/ ; },
abstract = {Pangenomes of several species have been assembled recently, facilitating the detection and genotyping of structural variants. As part of the FarmGTEx Project, we previously constructed a Holstein pangenome (H20D) based on 40 phased haploid assemblies. Here, we use this breed specific pangenome to genotype 93,059 structural variants from whole-genome sequences of 1,571 cattle. We then develop a Holstein pangenome variation imputation reference panel we name HolPIP. Leveraging HolPIP, we impute 86.65% (68,354/78,886) of structural variants for 50,299 bulls with Beagle R[2] ≥ 0.8. Using these imputed structural variants and phenotypes for 43 complex traits, we conduct GWAS, identifying 1,225 structural variant-trait associations. We next use fine-mapping to prioritize 32 high-confidence candidate structural variants, including a 75-bp deletion in ANKRD11 linked to dairy form, rump width, and stature, as well as an insertion in DHX32 associated with RNA metabolism. Compared to SNPs across various functional annotations, structural variants show a stronger genome-wide enrichment across most complex traits in cattle, suggesting that structural variants may have an important contribution to the genetic basis of dairy traits.},
}

RevDate: 2026-07-04
CmpDate: 2026-07-04

Mukadam H, Bhujbal T, Bhattacharyya K, et al (2026)

Systems-level genomic and functional characterization of lipopeptide biosurfactant pathways in Bacillus albus MITWPUB5.

World journal of microbiology & biotechnology, 42(7):.

This study describes the system level genomic analysis, chemical and pathway characterisation of a biotechnologically important, yet lesser-known biosurfactant producer, Bacillus albus MITWPUB5. The organism is an isolate of hydrocarbon-contaminated soil. Thin Layer Chromatography (TLC), Fourier Transform Infrared Spectroscopy (FTIR), and Liquid Chromatography-Mass Spectrometry (LC-MS) analysis revealed that the biosurfactant produced by this isolate belongs to lipopeptide class. The biosurfactant's nature was found to be anionic, with a high emulsification index (76%) and stability across a wide range of abiotic conditions such as temperatures, pH levels, and salt concentrations. Comprehensive genomics coupled with metabolic pathway analysis revealed genes encoding key enzymes in fatty acid and lipoprotein biosynthetic pathways, and Non-Ribosomal Peptide (NRP) synthesis. We present a novel computational approach which combines biosynthetic gene cluster prediction, NRPS module analysis to decode conserved catalytic domains and their encoded substrate that govern lipopeptide biosurfactant synthesis in B. albus. Comparative genome analysis of globally available B. albus strains followed by phylogenetic analysis revealed the conservation and distribution of genes encoding lipopeptide biosurfactant. Pangenome analysis uncovered an open structure in B. albus, where conserved fatty acid biosynthesis genes formed a stable core while NRPS genes exhibited strain-specific distribution within the accessory genome. Collectively, these findings demonstrate biotechnological relevance of MITWPUB5 as a promising source of an environmentally stable lipopeptide biosurfactant. The study contributes to the United Nations Sustainable Development Goals (UNSDGs) by highlighting the potential of an ecologically sustainable, biologically derived biosurfactant that can reduce reliance on synthetic surfactants. 
Such integrative framework can facilitate deeper insights into metabolic pathways, regulatory networks, and optimization strategies governing biosurfactant synthesis.

Additional Links: PMID-42400755

Citation:

@article {pmid42400755,
year = {2026},
author = {Mukadam, H and Bhujbal, T and Bhattacharyya, K and Gaikwad, S},
title = {Systems-level genomic and functional characterization of lipopeptide biosurfactant pathways in Bacillus albus MITWPUB5.},
journal = {World journal of microbiology \& biotechnology},
volume = {42},
number = {7},
pages = {},
pmid = {42400755},
issn = {1573-0972},
mesh = {*Biosurfactants/metabolism/chemistry ; *Lipopeptides/chemistry/biosynthesis/metabolism/genetics ; *Bacillus/genetics/metabolism/classification/isolation & purification ; Phylogeny ; Genomics ; Biosynthetic Pathways/genetics ; Genome, Bacterial ; Multigene Family ; Fatty Acids/biosynthesis ; Soil Microbiology ; Peptide Synthases/genetics ; Chromatography, Thin Layer ; *Surface-Active Agents/metabolism/chemistry ; Metabolic Networks and Pathways/genetics ; Bacterial Proteins/genetics/metabolism ; },
abstract = {This study describes the system-level genomic analysis and the chemical and pathway characterisation of a biotechnologically important, yet lesser-known biosurfactant producer, Bacillus albus MITWPUB5. The organism is an isolate from hydrocarbon-contaminated soil. Thin Layer Chromatography (TLC), Fourier Transform Infrared Spectroscopy (FTIR), and Liquid Chromatography-Mass Spectrometry (LC-MS) analysis revealed that the biosurfactant produced by this isolate belongs to the lipopeptide class. The biosurfactant was found to be anionic, with a high emulsification index (76%) and stability across a wide range of abiotic conditions such as temperatures, pH levels, and salt concentrations. Comprehensive genomics coupled with metabolic pathway analysis revealed genes encoding key enzymes in fatty acid and lipoprotein biosynthetic pathways, and in Non-Ribosomal Peptide (NRP) synthesis. We present a novel computational approach that combines biosynthetic gene cluster prediction with NRPS module analysis to decode the conserved catalytic domains and their encoded substrates that govern lipopeptide biosurfactant synthesis in B. albus. Comparative genome analysis of globally available B. albus strains followed by phylogenetic analysis revealed the conservation and distribution of genes encoding lipopeptide biosurfactant. Pangenome analysis uncovered an open structure in B. albus, where conserved fatty acid biosynthesis genes formed a stable core while NRPS genes exhibited strain-specific distribution within the accessory genome. Collectively, these findings demonstrate the biotechnological relevance of MITWPUB5 as a promising source of an environmentally stable lipopeptide biosurfactant. The study contributes to the United Nations Sustainable Development Goals (UNSDGs) by highlighting the potential of an ecologically sustainable, biologically derived biosurfactant that can reduce reliance on synthetic surfactants. Such an integrative framework can facilitate deeper insights into metabolic pathways, regulatory networks, and optimization strategies governing biosurfactant synthesis.},
}

MeSH Terms:

*Biosurfactants/metabolism/chemistry
*Lipopeptides/chemistry/biosynthesis/metabolism/genetics
*Bacillus/genetics/metabolism/classification/isolation & purification
Phylogeny
Genomics
Biosynthetic Pathways/genetics
Genome, Bacterial
Multigene Family
Fatty Acids/biosynthesis
Soil Microbiology
Peptide Synthases/genetics
Chromatography, Thin Layer
*Surface-Active Agents/metabolism/chemistry
Metabolic Networks and Pathways/genetics
Bacterial Proteins/genetics/metabolism

RevDate: 2026-07-05

Cruz-Medrano MG, Sánchez-Reyes A, Manzanares-Leal GL, et al (2026)

Pangenome analysis of Nocardia brasiliensis reveals phylogenetic divergence, high genomic diversity and widespread distribution of biosynthetic gene clusters involved in secondary metabolite biosynthesis.

Molecular phylogenetics and evolution pii:S1055-7903(26)00156-9 [Epub ahead of print].

Actinobacteria are a diverse and heterogeneous group of bacteria with complex taxonomy that produce most of the natural products used in medicine. Although comparative genomic studies of Nocardia species have been reported, comprehensive species-level analyses integrating phylogenomics, pangenome structure, and biosynthetic gene cluster distribution in N. brasiliensis remain limited. In this study, we performed phylogenomic orthology inference, analyzed pangenome composition, and evaluated the potential of Nocardia brasiliensis as a source of secondary metabolites using comparative genomics. Four clinical strains from Mexico and 22 publicly accessible genomes were included. Genomic identification was performed, orthologous genes were identified, core genome and pangenome composition were estimated, and phylogenomic orthology inference was assessed. All genomes were searched for known BGCs, secondary metabolites were predicted, and data on reported biological activity were collected. A pangenome comprising 17,715 clusters was calculated, with the core genome accounting for 22.76% and the cloud genome for 48.17%. The trend in the gene accumulation curve indicated that the species had an open pangenome, as the continuous increase in gene clusters with the addition of new genomes suggests a high level of genomic diversity and ongoing gene acquisition within the species, reflecting its capacity for environmental adaptation and evolutionary plasticity. Phylogenomic analysis showed that geographical origin and isolation conditions affect evolutionary divergence within N. brasiliensis. Computational BGC prediction detected PKS, NRPS, NAPAA, terpenes, aminopolycarboxylic acids, hybrids, and other clusters coding for secondary metabolites with antimicrobial activity (ε-Poly-L-lysine, brasiliquinones A-B), antitumor activity (rhizomides A-C, anthramycin), antioxidant activity (isorenieratene), and a fertilizer for calcareous soils ([S,S]-EDDS). The results reveal significant genomic diversity and a wide distribution of biosynthetic clusters within the Nocardia brasiliensis pangenome, demonstrating its genomic plasticity and the variability in metabolic potential across strains.
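
The core and cloud fractions reported in this abstract come from binning gene clusters by how many genomes carry them. A minimal sketch of that binning follows, assuming a presence/absence table and Roary-style cut-offs (core at 99% or more of genomes, soft core 95-99%, shell 15-95%, cloud below 15%). The abstract does not state which tool or thresholds the authors used, and the function name and toy data are hypothetical.

```python
from collections import Counter


def partition_pangenome(presence, n_genomes):
    """Bin gene clusters by the fraction of genomes that carry them.

    presence: dict mapping cluster id -> set of genome ids carrying it.
    The thresholds are assumed (Roary-style), not taken from the abstract.
    """
    bins = Counter()
    for genomes_with_cluster in presence.values():
        freq = len(genomes_with_cluster) / n_genomes
        if freq >= 0.99:
            bins["core"] += 1
        elif freq >= 0.95:
            bins["soft_core"] += 1
        elif freq >= 0.15:
            bins["shell"] += 1
        else:
            bins["cloud"] += 1
    return dict(bins)


# Toy example: 4 clusters across 10 genomes (illustrative data only).
genomes = [f"g{i}" for i in range(10)]
toy = {
    "c1": set(genomes),      # in 10 of 10 genomes -> core
    "c2": set(genomes[:5]),  # in 5 of 10 genomes -> shell
    "c3": {"g0"},            # in 1 of 10 genomes -> cloud
    "c4": {"g3"},            # in 1 of 10 genomes -> cloud
}
counts = partition_pangenome(toy, n_genomes=len(genomes))
```

Applied to the reported totals, 22.76% of 17,715 clusters is roughly 4,032 core clusters and 48.17% is roughly 8,533 cloud clusters.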

Additional Links: PMID-42402314

Citation:

@article {pmid42402314,
year = {2026},
author = {Cruz-Medrano, MG and Sánchez-Reyes, A and Manzanares-Leal, GL and Ramírez-Durán, N},
title = {Pangenome analysis of Nocardia brasiliensis reveals phylogenetic divergence, high genomic diversity and widespread distribution of biosynthetic gene clusters involved in secondary metabolite biosynthesis.},
journal = {Molecular phylogenetics and evolution},
volume = {},
number = {},
pages = {108686},
doi = {10.1016/j.ympev.2026.108686},
pmid = {42402314},
issn = {1095-9513},
abstract = {Actinobacteria are a diverse and heterogeneous group of bacteria with complex taxonomy that produce most of the natural products used in medicine. Although comparative genomic studies of Nocardia species have been reported, comprehensive species-level analyses integrating phylogenomics, pangenome structure, and biosynthetic gene cluster distribution in N. brasiliensis remain limited. In this study, we performed phylogenomic orthology inference, analyzed pangenome composition, and evaluated the potential of Nocardia brasiliensis as a source of secondary metabolites using comparative genomics. Four clinical strains from Mexico and 22 publicly accessible genomes were included. Genomic identification was performed, orthologous genes were identified, core genome and pangenome composition were estimated, and phylogenomic orthology inference was assessed. All genomes were searched for known BGCs, secondary metabolites were predicted, and data on reported biological activity were collected. A pangenome comprising 17,715 clusters was calculated, with the core genome accounting for 22.76% and the cloud genome for 48.17%. The trend in the gene accumulation curve indicated that the species had an open pangenome, as the continuous increase in gene clusters with the addition of new genomes suggests a high level of genomic diversity and ongoing gene acquisition within the species, reflecting its capacity for environmental adaptation and evolutionary plasticity. Phylogenomic analysis showed that geographical origin and isolation conditions affect evolutionary divergence within N. brasiliensis. Computational BGC prediction detected PKS, NRPS, NAPAA, terpenes, aminopolycarboxylic acids, hybrids, and other clusters coding for secondary metabolites with antimicrobial activity (ε-Poly-L-lysine, brasiliquinones A-B), antitumor activity (rhizomides A-C, anthramycin), antioxidant activity (isorenieratene), and a fertilizer for calcareous soils ([S,S]-EDDS). The results reveal significant genomic diversity and a wide distribution of biosynthetic clusters within the Nocardia brasiliensis pangenome, demonstrating its genomic plasticity and the variability in metabolic potential across strains.},
}

RevDate: 2026-07-06
CmpDate: 2026-07-06

Vijayakumar S, S Ramaiah (2026)

A comparative genomic approach to identify determinants of meropenem resistance in Klebsiella pneumoniae using pan-genome-wide association analysis.

Frontiers in microbiology, 17:1851170.

Carbapenem-resistant Klebsiella pneumoniae (CRKP) is a serious global health threat associated with both nosocomial and community-acquired infections. The reduced efficacy of meropenem due to diverse resistance mechanisms emphasizes the need to investigate genetic determinants and their association with meropenem susceptibility, which is explored in this study. We have performed pan-genome analysis and genome-wide association studies on 350 K. pneumoniae genomes with corresponding antimicrobial susceptibility data to elucidate the genetic basis of meropenem resistance. A high prevalence of sequence types, such as ST101, ST11, ST147, and ST383, was observed among meropenem-resistant genomes. KL17 and KL64 were the predominant capsular types associated with resistance phenotypes. Class-A β-lactamases were widely distributed across both resistant and susceptible genomes. Carbapenemases, including NDM and KPC variants, were predominantly detected in meropenem-resistant genomes. The pan-genome exhibited an open structure, with mobilome (12.49%) and defence-related genes (7.53%) predominant in the accessory genome. Regarding alterations in outer membrane porins, over half of the resistant genomes showed predicted truncations in OmpK35 (56.13%). Additionally, OmpK36 in resistant genomes exhibited GD, TD, SD, and D amino acid insertions that were absent in susceptible genomes. Genome-wide association analyses identified several genes significantly associated with meropenem resistance, including blaNDM-1, ble, trpF, cutA, groL, and groS, along with blaOXA and multiple transposases. Overall, this study provides a comprehensive genomic framework for understanding meropenem resistance in K. pneumoniae, highlighting the interplay between carbapenemase production and porin modifications. These findings emphasize the necessity of ongoing genomic surveillance and improvement in effective therapeutic strategies to combat multidrug-resistant infections.
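
A pan-genome-wide association of the kind summarized in this abstract asks, gene by gene, whether presence of the gene is enriched among resistant genomes. The abstract does not name the statistical test or software, so the sketch below simply assumes a one-sided Fisher's exact test on a 2x2 table, a common choice for binary gene presence against a binary phenotype. The function name and the counts are illustrative only.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test for a 2x2 table.

                      resistant  susceptible
        gene present      a           b
        gene absent       c           d

    Returns P(X >= a) under the hypergeometric null, i.e. the probability
    of seeing at least `a` resistant carriers by chance.
    """
    carriers = a + b   # genomes carrying the gene
    resistant = a + c  # resistant genomes
    n = a + b + c + d
    total = comb(n, carriers)
    upper = min(carriers, resistant)
    tail = sum(
        comb(resistant, k) * comb(n - resistant, carriers - k)
        for k in range(a, upper + 1)
    )
    return tail / total


# Made-up counts: the gene is in all 3 resistant and none of 3 susceptible genomes.
p_value = fisher_enrichment_p(3, 0, 0, 3)
```

With hundreds of genomes and thousands of accessory genes, the per-gene p-values would also need a multiple-testing correction, which this sketch leaves out.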

Additional Links: PMID-42404801

Citation:

@article {pmid42404801,
year = {2026},
author = {Vijayakumar, S and Ramaiah, S},
title = {A comparative genomic approach to identify determinants of meropenem resistance in Klebsiella pneumoniae using pan-genome-wide association analysis.},
journal = {Frontiers in microbiology},
volume = {17},
number = {},
pages = {1851170},
pmid = {42404801},
issn = {1664-302X},
abstract = {Carbapenem-resistant Klebsiella pneumoniae (CRKP) is a serious global health threat associated with both nosocomial and community-acquired infections. The reduced efficacy of meropenem due to diverse resistance mechanisms emphasizes the need to investigate genetic determinants and their association with meropenem susceptibility, which is explored in this study. We have performed pan-genome analysis and genome-wide association studies on 350 K. pneumoniae genomes with corresponding antimicrobial susceptibility data to elucidate the genetic basis of meropenem resistance. A high prevalence of sequence types, such as ST101, ST11, ST147, and ST383, was observed among meropenem-resistant genomes. KL17 and KL64 were the predominant capsular types associated with resistance phenotypes. Class-A β-lactamases were widely distributed across both resistant and susceptible genomes. Carbapenemases, including NDM and KPC variants, were predominantly detected in meropenem-resistant genomes. The pan-genome exhibited an open structure, with mobilome (12.49%) and defence-related genes (7.53%) predominant in the accessory genome. Regarding alterations in outer membrane porins, over half of the resistant genomes showed predicted truncations in OmpK35 (56.13%). Additionally, OmpK36 in resistant genomes exhibited GD, TD, SD, and D amino acid insertions that were absent in susceptible genomes. Genome-wide association analyses identified several genes significantly associated with meropenem resistance, including blaNDM-1, ble, trpF, cutA, groL, and groS, along with blaOXA and multiple transposases. Overall, this study provides a comprehensive genomic framework for understanding meropenem resistance in K. pneumoniae, highlighting the interplay between carbapenemase production and porin modifications. These findings emphasize the necessity of ongoing genomic surveillance and improvement in effective therapeutic strategies to combat multidrug-resistant infections.},
}

RevDate: 2026-07-03
CmpDate: 2026-07-03

Pajic P, O Gokcumen (2026)

Evolutionary genetic approaches to analyze mucins.

Methods in enzymology, 732:569-588.

Mucins are heavily glycosylated proteins that form protective mucus barriers at host-environment interfaces. Mucin genes frequently contain exonic variable number tandem repeat (exVNTR) domains that encode peptides enriched in proline, threonine, and serine. These repeat domains create substantial challenges for comparative and population genetic analyses because short-read sequencing often collapses repeat arrays and obscures haplotype structure. Recent advances in long-read sequencing, pangenome resources, and specialized VNTR analysis tools now enable systematic investigation of mucin genetics and evolution. In this chapter, we present practical protocols for identifying candidate mucin genes across species, annotating mucin exVNTRs from long-read genome assemblies, and genotyping exVNTR alleles in large short-read sequencing cohorts. We further outline analytical strategies for evaluating natural selection acting on mucin repeat domains. Together, these protocols enable systematic identification, structural resolution, and evolutionary analysis of mucin variation across species and human populations.

Additional Links: PMID-42399062

Citation:

@article {pmid42399062,
year = {2026},
author = {Pajic, P and Gokcumen, O},
title = {Evolutionary genetic approaches to analyze mucins.},
journal = {Methods in enzymology},
volume = {732},
number = {},
pages = {569-588},
doi = {10.1016/bs.mie.2026.03.009},
pmid = {42399062},
issn = {1557-7988},
mesh = {*Mucins/genetics/chemistry ; Humans ; *Evolution, Molecular ; Animals ; Minisatellite Repeats ; Selection, Genetic ; Glycosylation ; },
abstract = {Mucins are heavily glycosylated proteins that form protective mucus barriers at host-environment interfaces. Mucin genes frequently contain exonic variable number tandem repeat (exVNTR) domains that encode peptides enriched in proline, threonine, and serine. These repeat domains create substantial challenges for comparative and population genetic analyses because short-read sequencing often collapses repeat arrays and obscures haplotype structure. Recent advances in long-read sequencing, pangenome resources, and specialized VNTR analysis tools now enable systematic investigation of mucin genetics and evolution. In this chapter, we present practical protocols for identifying candidate mucin genes across species, annotating mucin exVNTRs from long-read genome assemblies, and genotyping exVNTR alleles in large short-read sequencing cohorts. We further outline analytical strategies for evaluating natural selection acting on mucin repeat domains. Together, these protocols enable systematic identification, structural resolution, and evolutionary analysis of mucin variation across species and human populations.},
}

MeSH Terms:

*Mucins/genetics/chemistry
Humans
*Evolution, Molecular
Animals
Minisatellite Repeats
Selection, Genetic
Glycosylation

RevDate: 2026-07-03
CmpDate: 2026-07-03

Kokroko N, Jayanti R, Sapoval N, et al (2026)

Kente: A Graph-based Pangenomic Approach for Horizontal Gene Transfer Detection in Microbiomes.

bioRxiv : the preprint server for biology pii:2026.06.22.733643.

MOTIVATION: Horizontal gene transfer (HGT) shapes bacterial evolution and microbial ecosystems, yet detecting HGT within microbiomes remains a challenge due to fragmented metagenomic assemblies, reference bias, reliance on gene boundaries, and limited ability to model structural mosaicism and patterns across genomes.

METHODS: We present Kente, a novel pangenome graph-based framework designed for HGT detection that aligns metagenomic assembly contigs to a curated database of >600 genus-level bacterial pangenome graphs constructed using minigraph. Kente infers local taxonomic composition along contigs using alignment evidence and classifies candidate transfers using structured clade-transition topologies (e.g., A-B-A sandwich, open tips, and mosaic patterns). A complementary intra-genus module detects inter-species transfers within a single genus graph using segment-level clade annotations.

RESULTS: Across simulated intra- and inter-genus transfer scenarios, Kente achieves higher precision and comparable recall relative to existing gene-centric microbiome HGT detection approaches while reducing false positives from fragmented assemblies. Application to real human gut metagenomes (HMP2, n = 26) demonstrates Kente's ability to detect candidate cross-lineage transfer regions in complex microbial communities. Runtime profiling shows near-linear scaling with input size, enabling efficient analysis of large metagenomic assemblies.

https://github.com/treangenlab/Kente.
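
The clade-transition idea in the METHODS paragraph can be illustrated by collapsing the ordered clade labels along a contig into runs and naming the resulting pattern. The sketch below is an assumed simplification for illustration only: the category rules, the function name, and the labels are hypothetical and are not taken from the Kente code base.

```python
from itertools import groupby


def classify_transitions(labels):
    """Classify a contig from the ordered clade labels of its aligned segments.

    The category names echo the abstract, but the rules here are an assumed
    simplification, not Kente's actual logic.
    """
    # Collapse consecutive repeats: A, A, B, B, A -> A, B, A
    runs = [clade for clade, _ in groupby(labels)]
    if len(runs) <= 1:
        return "no_transfer"
    if len(runs) == 2:
        return "open_tip"   # A-B: foreign segment at a contig end
    if len(runs) == 3 and runs[0] == runs[2]:
        return "sandwich"   # A-B-A: foreign segment flanked by the same clade
    return "mosaic"         # anything more complex


example = classify_transitions(["A", "A", "B", "B", "A"])
```

A real implementation would also have to weigh alignment evidence per segment before trusting each label, as the abstract indicates.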

Additional Links: PMID-42395547

Citation:

@article {pmid42395547,
year = {2026},
author = {Kokroko, N and Jayanti, R and Sapoval, N and Nute, MG and Nakhleh, L and Treangen, TJ},
title = {Kente: A Graph-based Pangenomic Approach for Horizontal Gene Transfer Detection in Microbiomes.},
journal = {bioRxiv : the preprint server for biology},
volume = {},
number = {},
pages = {},
doi = {10.64898/2026.06.22.733643},
pmid = {42395547},
issn = {2692-8205},
abstract = {MOTIVATION: Horizontal gene transfer (HGT) shapes bacterial evolution and microbial ecosystems, yet detecting HGT within microbiomes remains a challenge due to fragmented metagenomic assemblies, reference bias, reliance on gene boundaries, and limited ability to model structural mosaicism and patterns across genomes.

METHODS: We present Kente, a novel pangenome graph-based framework designed for HGT detection that aligns metagenomic assembly contigs to a curated database of >600 genus-level bacterial pangenome graphs constructed using minigraph. Kente infers local taxonomic composition along contigs using alignment evidence and classifies candidate transfers using structured clade-transition topologies (e.g., A-B-A sandwich, open tips, and mosaic patterns). A complementary intra-genus module detects inter-species transfers within a single genus graph using segment-level clade annotations.

RESULTS: Across simulated intra- and inter-genus transfer scenarios, Kente achieves higher precision and comparable recall relative to existing gene-centric microbiome HGT detection approaches while reducing false positives from fragmented assemblies. Application to real human gut metagenomes (HMP2, n = 26) demonstrates Kente's ability to detect candidate cross-lineage transfer regions in complex microbial communities. Runtime profiling shows near-linear scaling with input size, enabling efficient analysis of large metagenomic assemblies.

https://github.com/treangenlab/Kente.},
}

RevDate: 2026-07-01
CmpDate: 2026-07-01

Ardalani O, Phaneuf PV, Krishnan KJ, et al (2026)

Annotating the pangenome reveals the diversity in the genetic basis for metabolic enzymes.

Science advances, 12(27):eaeb3363.

Affordable sequencing has flooded public databases with bacterial genomes, yet species-scale maps that connect gene content variation to metabolic functions essential to biotechnology and systems biology remain scarce. We address this gap by building a pangenome-wide gene-protein-reaction association and applying it to 2377 Escherichia coli genomes to reconstruct a pangenome-scale metabolic model (panGEM). We validate panGEM against Biolog carbon source utilization assays, achieving ≈0.99 precision in growth/no-growth predictions. Using panGEM, we identify >11,000 rare metabolic genes, yet only 35 metabolic reactions are rare. To explain the mismatch, we examined rare genes and found that most are pseudogenes or diverged orthologs acquired by horizontal gene transfer (HGT). Results indicate a recurrent loss-reacquisition cycle in which a core allele is lost/pseudogenized and its function is restored by HGT, preserving function without expanding the reactome, generating genetic heterogeneity in a small subset (~3.6%) of reactions, marking selection pressure hotspots of metabolism. Thus, pangenome annotation reveals the evolutionary dynamics that shape the genetic basis of metabolism.
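
The mismatch between many rare genes and few rare reactions follows from how gene-protein-reaction rules work: a reaction counts as present if any gene able to catalyze it is present. A minimal sketch of that bookkeeping is given below, assuming each reaction maps to a set of interchangeable genes (an OR rule). The identifiers and toy genomes are hypothetical, and real rules also include AND terms for enzyme complexes, which are omitted here.

```python
def reaction_frequencies(gpr, genomes):
    """Fraction of genomes in which each reaction is supported.

    gpr: dict reaction -> set of alternative genes (treated as OR).
    genomes: dict genome id -> set of genes present.
    """
    n = len(genomes)
    return {
        rxn: sum(1 for genes in genomes.values() if genes & alternatives) / n
        for rxn, alternatives in gpr.items()
    }


def gene_frequencies(genomes):
    """Fraction of genomes carrying each gene."""
    n = len(genomes)
    counts = {}
    for genes in genomes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene: count / n for gene, count in counts.items()}


# Toy data: one genome lost the core allele and carries a diverged ortholog.
genomes = {
    "s1": {"geneA"},
    "s2": {"geneA"},
    "s3": {"geneA"},
    "s4": {"geneA_hgt"},
}
gpr = {"R1": {"geneA", "geneA_hgt"}}

rxn_freq = reaction_frequencies(gpr, genomes)
gene_freq = gene_frequencies(genomes)
```

In this toy case the replacement gene is rare (one genome in four) while the reaction it supports is present in every genome, which is the pattern the abstract describes at pangenome scale.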

Additional Links: PMID-42384800

Citation:

@article {pmid42384800,
year = {2026},
author = {Ardalani, O and Phaneuf, PV and Krishnan, KJ and Pride, D and Nielsen, LK and Palsson, BO},
title = {Annotating the pangenome reveals the diversity in the genetic basis for metabolic enzymes.},
journal = {Science advances},
volume = {12},
number = {27},
pages = {eaeb3363},
pmid = {42384800},
issn = {2375-2548},
mesh = {*Escherichia coli/genetics/metabolism/enzymology ; *Genome, Bacterial ; *Genetic Variation ; *Metabolic Networks and Pathways/genetics ; *Molecular Sequence Annotation ; Gene Transfer, Horizontal ; Evolution, Molecular ; *Enzymes/genetics/metabolism ; },
abstract = {Affordable sequencing has flooded public databases with bacterial genomes, yet species-scale maps that connect gene content variation to metabolic functions essential to biotechnology and systems biology remain scarce. We address this gap by building a pangenome-wide gene-protein-reaction association and applying it to 2377 Escherichia coli genomes to reconstruct a pangenome-scale metabolic model (panGEM). We validate panGEM against Biolog carbon source utilization assays, achieving ≈0.99 precision in growth/no-growth predictions. Using panGEM, we identify >11,000 rare metabolic genes, yet only 35 metabolic reactions are rare. To explain the mismatch, we examined rare genes and found that most are pseudogenes or diverged orthologs acquired by horizontal gene transfer (HGT). Results indicate a recurrent loss-reacquisition cycle in which a core allele is lost/pseudogenized and its function is restored by HGT, preserving function without expanding the reactome, generating genetic heterogeneity in a small subset (~3.6%) of reactions, marking selection pressure hotspots of metabolism. Thus, pangenome annotation reveals the evolutionary dynamics that shape the genetic basis of metabolism.},
}

MeSH Terms:

*Escherichia coli/genetics/metabolism/enzymology
*Genome, Bacterial
*Genetic Variation
*Metabolic Networks and Pathways/genetics
*Molecular Sequence Annotation
Gene Transfer, Horizontal
Evolution, Molecular
*Enzymes/genetics/metabolism

RevDate: 2026-07-02
CmpDate: 2026-07-02

Belhadj MSE, Messaoudi O, Yousfi M, et al (2026)

Comparative genome mining and metabolomics reveal divergent NRPS-derived fengycin biosynthetic gene clusters in Bacillus halotolerans.

FEMS microbes, 7:xtag037.

Microbial multidrug resistance is a major public health concern, underscoring the urgent need for new antimicrobial natural products. In this study, strain F11, identified as Bacillus halotolerans, was selected based on its strong antimicrobial activity and taxonomic identification. Whole-genome sequencing revealed a single circular chromosome of 4.15 Mb with a GC content of 43.82%, encoding 4122 predicted proteins. Pangenome analysis identified 17 unique genes. Genome mining predicted 10 biosynthetic gene clusters (BGCs), including a complete fengycin cluster. Comparative analyses using BiG-SCAPE/CORASON and clinker revealed evolutionary divergence within the fengycin BGCs, including those identified in B. halotolerans F11 and B. halotolerans HMB20199. This divergence was further supported by NRPS substrate specificity predictions, which revealed two amino acid variations at positions 6 and 8 in the predicted fengycin decapeptide of strain B. halotolerans F11 compared to the canonical sequence. In contrast, B. halotolerans HMB20199 exhibited a mosaic fengycin-iturin hybrid organization, characterized by an extended NRPS assembly line comprising 19 modules. Furthermore, untargeted metabolomic profiling of B. halotolerans F11 detected 9719 metabolites, of which 3453 were successfully annotated. Integration of genomic and metabolomic datasets enabled the correlation of two compounds, bacillaene and bacillibactin, with their corresponding BGCs. However, the lack of detection of fengycin, surfactin, and subtilosin A was attributable to methodological constraints. Collectively, these findings expand our understanding of B. halotolerans strains as promising genomic reservoirs of novel NRPS-derived lipopeptides and highlight Algerian Sahara soils as a valuable source of antimicrobial natural products.

Additional Links: PMID-42389721

Citation:

@article {pmid42389721,
year = {2026},
author = {Belhadj, MSE and Messaoudi, O and Yousfi, M and Bakhrouf, A},
title = {Comparative genome mining and metabolomics reveal divergent NRPS-derived fengycin biosynthetic gene clusters in Bacillus halotolerans.},
journal = {FEMS microbes},
volume = {7},
number = {},
pages = {xtag037},
pmid = {42389721},
issn = {2633-6685},
abstract = {Microbial multidrug resistance is a major public health concern, underscoring the urgent need for new antimicrobial natural products. In this study, strain F11, identified as Bacillus halotolerans, was selected based on its strong antimicrobial activity and taxonomic identification. Whole-genome sequencing revealed a single circular chromosome of 4.15 Mb with a GC content of 43.82%, encoding 4122 predicted proteins. Pangenome analysis identified 17 unique genes. Genome mining predicted 10 biosynthetic gene clusters (BGCs), including a complete fengycin cluster. Comparative analyses using BiG-SCAPE/CORASON and clinker revealed evolutionary divergence within the fengycin BGCs, including those identified in B. halotolerans F11 and B. halotolerans HMB20199. This divergence was further supported by NRPS substrate specificity predictions, which revealed two amino acid variations at positions 6 and 8 in the predicted fengycin decapeptide of strain B. halotolerans F11 compared to the canonical sequence. In contrast, B. halotolerans HMB20199 exhibited a mosaic fengycin-iturin hybrid organization, characterized by an extended NRPS assembly line comprising 19 modules. Furthermore, untargeted metabolomic profiling of B. halotolerans F11 detected 9719 metabolites, of which 3453 were successfully annotated. Integration of genomic and metabolomic datasets enabled the correlation of two compounds, bacillaene and bacillibactin, with their corresponding BGCs. However, the lack of detection of fengycin, surfactin, and subtilosin A was attributable to methodological constraints. Collectively, these findings expand our understanding of B. halotolerans strains as promising genomic reservoirs of novel NRPS-derived lipopeptides and highlight Algerian Sahara soils as a valuable source of antimicrobial natural products.},
}

RevDate: 2026-07-02

Parkin IAP, AG Sharpe (2026)

Building up pangenome analysis block by block.

Nature genetics [Epub ahead of print].

Additional Links: PMID-42393219

Citation:

@article {pmid42393219,
year = {2026},
author = {Parkin, IAP and Sharpe, AG},
title = {Building up pangenome analysis block by block.},
journal = {Nature genetics},
volume = {},
number = {},
pages = {},
pmid = {42393219},
issn = {1546-1718},
}

RevDate: 2026-07-03
CmpDate: 2026-07-03

Mohanty SK, Marin MG, Smeds L, et al (2026)

Variation and selection at predicted G-quadruplexes across the human pangenome.

bioRxiv : the preprint server for biology pii:2026.06.18.733261.

G-quadruplexes (G4s), non-canonical DNA structures whose sequence motifs occupy approximately 1% of the human genome, are important for myriad cellular functions, including regulating transcription and replication. Yet they also contribute to genomic instability by increasing mutations and structural variation. Despite their significance, G4 motifs have not been studied in detail across multiple human genomes. Here, we conducted a comprehensive analysis of presence/absence and sequence variation, measured selection strength, and evaluated gene expression regulation potential for predicted G4s (pG4s) across population groups in the second release of the Human Pangenome Reference Consortium dataset, comprising high-quality, near-telomere-to-telomere diploid genomes from 231 individuals worldwide, along with three reference assemblies. Across the human pangenome, we identified over 353 million pG4s, including 1.15 million pG4s absent from reference assemblies but shared across other haplotypes. Our analysis revealed that pG4 sharing patterns recapitulate human population structure: African individuals displayed lower levels of pG4 sharing than non-Africans, whereas East Asian individuals exhibited higher levels of sharing. By analyzing the site frequency spectrum across various genomic annotations, we computed and compared selection coefficients (Sd) at pG4 vs. non-pG4 sites. As expected, the strongest purifying selection (Sd ≥ 10) was detected at protein-coding exons, where pG4 sites had similar or lower selection coefficients compared with those for non-pG4 sites. Strikingly, this pattern reversed at regulatory regions: although purifying selection was weaker overall at promoters, introns, enhancers, and replication origins (1 ≤ Sd < 10), pG4 sites at these regions experienced stronger selection than non-pG4 sites, suggesting that pG4s play functional roles outside coding sequences. Additionally, by integrating pG4 data with long-read transcriptome profiles from this large cohort, we found that pG4s located at promoters and at (or near) exon-intron junctions may influence variation in gene expression levels and transcript isoforms, respectively, across individuals in the human pangenome. Leveraging extensive population-scale data, our research illuminates the fundamental importance and functional relevance of G4s across human genomes.
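
For readers unfamiliar with how G4 motifs are predicted from sequence, the sketch below scans for the widely used canonical pattern: four runs of at least three guanines separated by loops of 1 to 7 bases. The abstract does not say which predictor the authors used, so the pattern, function name, and test sequence are assumptions for illustration only.

```python
import re

# Canonical motif: four G-runs (>= 3 Gs each) separated by loops of 1-7 bases.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_pg4(seq):
    """Return (start, end) spans of non-overlapping motif matches on the given strand."""
    return [match.span() for match in G4_PATTERN.finditer(seq.upper())]


# Four telomere-like repeats form one canonical motif.
telomeric = "GGGTTAGGGTTAGGGTTAGGG"
spans = find_pg4(telomeric)
```

This only scans the strand it is given; motifs on the opposite strand appear as C-runs and would need a scan of the reverse complement.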

Additional Links: PMID-42395539

Citation:

@article {pmid42395539,
year = {2026},
author = {Mohanty, SK and Marin, MG and Smeds, L and Chiaromonte, F and Huber, CD and Makova, KD},
title = {Variation and selection at predicted G-quadruplexes across the human pangenome.},
journal = {bioRxiv : the preprint server for biology},
volume = {},
number = {},
pages = {},
doi = {10.64898/2026.06.18.733261},
pmid = {42395539},
issn = {2692-8205},
abstract = {G-quadruplexes (G4s), non-canonical DNA structures whose sequence motifs occupy approximately 1% of the human genome, are important for myriad cellular functions, including regulating transcription and replication. Yet they also contribute to genomic instability by increasing mutations and structural variation. Despite their significance, G4 motifs have not been studied in detail across multiple human genomes. Here, we conducted a comprehensive analysis of presence/absence and sequence variation, measured selection strength, and evaluated gene expression regulation potential for predicted G4s (pG4s) across population groups in the second release of the Human Pangenome Reference Consortium dataset, comprising high-quality, near-telomere-to-telomere diploid genomes from 231 individuals worldwide, along with three reference assemblies. Across the human pangenome, we identified over 353 million pG4s, including 1.15 million pG4s absent from reference assemblies but shared across other haplotypes. Our analysis revealed that pG4 sharing patterns recapitulate human population structure: African individuals displayed lower levels of pG4 sharing than non-Africans, whereas East Asian individuals exhibited higher levels of sharing. By analyzing the site frequency spectrum across various genomic annotations, we computed and compared selection coefficients (S_d) at pG4 vs. non-pG4 sites. As expected, the strongest purifying selection (S_d ≥ 10) was detected at protein-coding exons, where pG4 sites had similar or lower selection coefficients compared with those for non-pG4 sites. Strikingly, this pattern reversed at regulatory regions: although purifying selection was weaker overall at promoters, introns, enhancers, and replication origins (1 ≤ S_d < 10), pG4 sites at these regions experienced stronger selection than non-pG4 sites, suggesting that pG4s play functional roles outside coding sequences. Additionally, by integrating pG4 data with long-read transcriptome data profiles from this large cohort, we found that pG4s located at promoters and at (or near) exon-intron junctions may influence variation in gene expression levels and transcript isoforms, respectively, across the human pangenome individuals. Leveraging extensive population-scale data, our research illuminates the fundamental importance and functional relevance of G4s across human genomes.},
}

RevDate: 2026-07-01

Kurt IC, Guner H, Erdem ZA, et al (2026)

Genomic evidence of ecological flexibility and cross-niche CRISPR spacerome targeting phage-plasmid hybrids in Latilactobacillus curvatus.

BMC genomics pii:10.1186/s12864-026-13098-8 [Epub ahead of print].

BACKGROUND: Latilactobacillus curvatus is a lactic acid bacterium with a remarkable ability to persist in diverse niches, including fermented foods and gut. Despite its industrial and potential probiotic relevance, the genomic underpinnings of its cross-niche adaptability remain poorly characterized.

METHODS: We conducted a species-contextualized comparative genomic analysis of 53 L. curvatus strains from food and gut isolates. This analysis integrated pangenome structure, metabolic repertoire, CRISPR-Cas immunity profiles, and mobilome analysis. Additionally, binding mode predictions and dynamics simulations were used to evaluate the theoretical binding energies of bacteriocins to the BamA target.

RESULTS: Phylogenomics revealed a polyphyletic population structure, indicating that long-term evolution is not strictly niche-specific. In contrast, genome-wide similarity showed clustering by isolation source, highlighting horizontal gene transfer (HGT) as a plausible contributor to niche adaptation. We identified a highly active mobilome, encompassing diverse plasmids, IS elements, and multiple intact prophages, reflecting high genomic plasticity characteristic of a multihabitat lifestyle. CRISPR-Cas systems were widespread, and analysis of 2,029 spacers revealed a broad immune repertoire targeting mobile genetic elements represented in fermented food, gut, and environmental datasets. We also identified spacer matches to phage-plasmid hybrid-like elements, highlighting the diversity of mobile genetic elements associated with the L. curvatus spacerome.

CONCLUSION: Our study reveals genomic features consistent with ecological flexibility in L. curvatus, including high genomic plasticity and a broad CRISPR spacer repertoire. Rather than demonstrating strict niche-specific evolution or a causal mechanism for cross-niche persistence, these findings support the hypothesis that this species has experienced diverse interactions with mobile genetic elements across multiple ecological contexts.

Additional Links: PMID-42380749

Citation:

@article {pmid42380749,
year = {2026},
author = {Kurt, IC and Guner, H and Erdem, ZA and Can, O and Gumustop, I and Sirin, A and Erol, I and Kotil, ES and Ortakci, F},
title = {Genomic evidence of ecological flexibility and cross-niche CRISPR spacerome targeting phage-plasmid hybrids in Latilactobacillus curvatus.},
journal = {BMC genomics},
volume = {},
number = {},
pages = {},
doi = {10.1186/s12864-026-13098-8},
pmid = {42380749},
issn = {1471-2164},
support = {MGA-2024-45355//Bilimsel Araştırma Projeleri Birimi, İstanbul Teknik Üniversitesi/ ; },
abstract = {BACKGROUND: Latilactobacillus curvatus is a lactic acid bacterium with a remarkable ability to persist in diverse niches, including fermented foods and gut. Despite its industrial and potential probiotic relevance, the genomic underpinnings of its cross-niche adaptability remain poorly characterized.

METHODS: We conducted a species-contextualized comparative genomic analysis of 53 L. curvatus strains from food and gut isolates. This analysis integrated pangenome structure, metabolic repertoire, CRISPR-Cas immunity profiles, and mobilome analysis. Additionally, binding mode predictions and dynamics simulations were used to evaluate the theoretical binding energies of bacteriocins to the BamA target.

RESULTS: Phylogenomics revealed a polyphyletic population structure, indicating that long-term evolution is not strictly niche-specific. In contrast, genome-wide similarity showed clustering by isolation source, highlighting horizontal gene transfer (HGT) as a plausible contributor to niche adaptation. We identified a highly active mobilome, encompassing diverse plasmids, IS elements, and multiple intact prophages, reflecting high genomic plasticity characteristic of a multihabitat lifestyle. CRISPR-Cas systems were widespread, and analysis of 2,029 spacers revealed a broad immune repertoire targeting mobile genetic elements represented in fermented food, gut, and environmental datasets. We also identified spacer matches to phage-plasmid hybrid-like elements, highlighting the diversity of mobile genetic elements associated with the L. curvatus spacerome.

CONCLUSION: Our study reveals genomic features consistent with ecological flexibility in L. curvatus, including high genomic plasticity and a broad CRISPR spacer repertoire. Rather than demonstrating strict niche-specific evolution or a causal mechanism for cross-niche persistence, these findings support the hypothesis that this species has experienced diverse interactions with mobile genetic elements across multiple ecological contexts.},
}

RevDate: 2026-07-01

Hanaya Alsuwaidi A, Mousa M, Olbrich M, et al (2026)

National genomic projects in Asia and Africa: a review.

Human genomics pii:10.1186/s40246-026-01007-9 [Epub ahead of print].

National genome projects (NGPs) are increasingly shaping precision medicine by improving representation of population-specific genetic diversity. This review compiles findings from NGPs across Asia and Africa, regions that remain underrepresented in global genomic databases despite their extensive demographic and genetic diversity. A total of 53 studies from 24 countries were identified to understand (1) the genomic approach utilized, (2) novel findings that have emerged, and (3) strategies for improving research in these regions. The NGPs implement population-based variome databases (20 NGPs), linear reference genome assemblies (8 NGPs), and graph-based pangenome assemblies (1 NGP). Novel variants ranged between 0.28% (China) and 19.6% (Iran), whereas rare variants accounted for up to 88.9% of the detected variants in the Chinese population. Each NGP documents its country's evolutionary and migration history, which impacts disease frequency and pharmacogenomic variants. Clinically, NGPs revealed strong population stratification in disease-associated and pharmacogenomic variants. For example, the GJB2 rs72474224 hearing-loss variant ranged from 13% in Vietnam and 12% in Hong Kong to 0.0894% in Turkey, while the VKORC1 rs9923231 pharmacogenomic variant reached 89.2% in Taiwan but was 20%-25% in European-related Russian subpopulations. These findings demonstrate that clinically relevant allele frequencies, pathogenicity assessments, and drug-response markers differ substantially across ancestries. This review highlights ongoing efforts and strategies to enhance the representativeness of genomic data through NGPs in Asia and Africa. We also suggest future directions for national projects, including integrating family-based studies, multi-omic data, and standardized pipelines to accelerate discovery and support the equitable implementation of precision medicine.

Additional Links: PMID-42381083

Citation:

@article {pmid42381083,
year = {2026},
author = {Hanaya Alsuwaidi, A and Mousa, M and Olbrich, M and Marzouka, NAD and Wohlers, I and Ibrahim, S and Alsafar, H},
title = {National genomic projects in Asia and Africa: a review.},
journal = {Human genomics},
volume = {},
number = {},
pages = {},
doi = {10.1186/s40246-026-01007-9},
pmid = {42381083},
issn = {1479-7364},
support = {8434000474//Abu Dhabi Executive Office/ ; },
abstract = {National genome projects (NGPs) are increasingly shaping precision medicine by improving representation of population-specific genetic diversity. This review compiles findings from NGPs across Asia and Africa, regions that remain underrepresented in global genomic databases despite their extensive demographic and genetic diversity. A total of 53 studies from 24 countries were identified to understand (1) the genomic approach utilized, (2) novel findings that have emerged, and (3) strategies for improving research in these regions. The NGPs implement population-based variome databases (20 NGPs), linear reference genome assemblies (8 NGPs), and graph-based pangenome assemblies (1 NGP). Novel variants ranged between 0.28% (China) and 19.6% (Iran), whereas rare variants accounted for up to 88.9% of the detected variants in the Chinese population. Each NGP documents its country's evolutionary and migration history, which impacts disease frequency and pharmacogenomic variants. Clinically, NGPs revealed strong population stratification in disease-associated and pharmacogenomic variants. For example, the GJB2 rs72474224 hearing-loss variant ranged from 13% in Vietnam and 12% in Hong Kong to 0.0894% in Turkey, while the VKORC1 rs9923231 pharmacogenomic variant reached 89.2% in Taiwan but was 20%-25% in European-related Russian subpopulations. These findings demonstrate that clinically relevant allele frequencies, pathogenicity assessments, and drug-response markers differ substantially across ancestries. This review highlights ongoing efforts and strategies to enhance the representativeness of genomic data through NGPs in Asia and Africa. We also suggest future directions for national projects, including integrating family-based studies, multi-omic data, and standardized pipelines to accelerate discovery and support the equitable implementation of precision medicine.},
}

RevDate: 2026-06-30
CmpDate: 2026-06-30

Wu H, Lei Z, Chen S, et al (2026)

Genomic landscape and phylogenetic insights of Burkholderia pseudomallei over two decades in southern China and its global surveillance.

Emerging microbes & infections, 15(1):2691358.

Melioidosis, caused by Burkholderia pseudomallei, is an endemic infectious disease with high mortality in tropical and subtropical regions. Large-scale epidemiological data remain insufficient in China, while comprehensive data integrating genomic epidemiology are rare worldwide. Herein, we performed a retrospective analysis of 554 culture-confirmed melioidosis cases in southern China from 2003 to 2022. Genomic characteristics and their relationship with antimicrobial susceptibility and clinical characteristics were analyzed via whole genome sequencing. Core-genome SNP phylogenies were constructed from recombination-masked alignments and compared with 3,573 publicly available global B. pseudomallei genomes to define their population structure and phylogeographic patterns. Melioidosis predominantly affected male patients (86.8%, 481/554) and those aged 45-64 years (57.7%). Bacteremia (OR=5.91, p<0.001), diabetes mellitus (OR=2.27, p=0.008), and pulmonary infection (OR=2.26, p=0.005) were identified as risk factors for mortality. Antimicrobial susceptibility testing showed B. pseudomallei exhibited high in vitro susceptibility to imipenem (100%) and ceftazidime (99.6%). Pan-genome analysis confirmed chromosomal functional compartmentalization of the bipartite genome, and genome-wide association study identified high-confidence genetic markers (OR >3 or <0.33) significantly associated with mortality and bacteremia. Furthermore, global phylogenomic analysis identified 10 evolutionary clusters; Chinese isolates were significantly enriched in Cluster 1, a clade shared with Thai strains, and were phylogenetically distinct from Cluster 5, which was predominantly composed of Australian isolates. In summary, this large-scale genomic and clinical analysis provides the most comprehensive overview of melioidosis in southern China. The genomic analysis highlighted substantial regional and global genetic diversity, and phylogeographic structuring of B. pseudomallei, underscoring the importance of continued genomic surveillance.

Additional Links: PMID-42377320

Citation:

@article {pmid42377320,
year = {2026},
author = {Wu, H and Lei, Z and Chen, S and Wang, X and Huang, H and Xiang, D and Tan, W and Chen, J and Chen, C and Qin, M and Wen, Q and Lu, B},
title = {Genomic landscape and phylogenetic insights of Burkholderia pseudomallei over two decades in southern China and its global surveillance.},
journal = {Emerging microbes & infections},
volume = {15},
number = {1},
pages = {2691358},
doi = {10.1080/22221751.2026.2691358},
pmid = {42377320},
issn = {2222-1751},
mesh = {*Burkholderia pseudomallei/genetics/classification/drug effects/isolation & purification ; *Melioidosis/epidemiology/microbiology ; Humans ; *Phylogeny ; China/epidemiology ; Male ; *Genome, Bacterial ; Anti-Bacterial Agents/pharmacology ; Middle Aged ; Whole Genome Sequencing ; Female ; Retrospective Studies ; Polymorphism, Single Nucleotide ; Microbial Sensitivity Tests ; Phylogeography ; Genomics ; Aged ; },
abstract = {Melioidosis, caused by Burkholderia pseudomallei, is an endemic infectious disease with high mortality in tropical and subtropical regions. Large-scale epidemiological data remain insufficient in China, while comprehensive data integrating genomic epidemiology are rare worldwide. Herein, we performed a retrospective analysis of 554 culture-confirmed melioidosis cases in southern China from 2003 to 2022. Genomic characteristics and their relationship with antimicrobial susceptibility and clinical characteristics were analyzed via whole genome sequencing. Core-genome SNP phylogenies were constructed from recombination-masked alignments and compared with 3,573 publicly available global B. pseudomallei genomes to define their population structure and phylogeographic patterns. Melioidosis predominantly affected male patients (86.8%, 481/554) and those aged 45-64 years (57.7%). Bacteremia (OR=5.91, p<0.001), diabetes mellitus (OR=2.27, p=0.008), and pulmonary infection (OR=2.26, p=0.005) were identified as risk factors for mortality. Antimicrobial susceptibility testing showed B. pseudomallei exhibited high in vitro susceptibility to imipenem (100%) and ceftazidime (99.6%). Pan-genome analysis confirmed chromosomal functional compartmentalization of the bipartite genome, and genome-wide association study identified high-confidence genetic markers (OR >3 or <0.33) significantly associated with mortality and bacteremia. Furthermore, global phylogenomic analysis identified 10 evolutionary clusters; Chinese isolates were significantly enriched in Cluster 1, a clade shared with Thai strains, and were phylogenetically distinct from Cluster 5, which was predominantly composed of Australian isolates. In summary, this large-scale genomic and clinical analysis provides the most comprehensive overview of melioidosis in southern China. The genomic analysis highlighted substantial regional and global genetic diversity, and phylogeographic structuring of B. pseudomallei, underscoring the importance of continued genomic surveillance.},
}

MeSH Terms:

*Burkholderia pseudomallei/genetics/classification/drug effects/isolation & purification
*Melioidosis/epidemiology/microbiology
Humans
*Phylogeny
China/epidemiology
Male
*Genome, Bacterial
Anti-Bacterial Agents/pharmacology
Middle Aged
Whole Genome Sequencing
Female
Retrospective Studies
Polymorphism, Single Nucleotide
Microbial Sensitivity Tests
Phylogeography
Genomics
Aged

RevDate: 2026-06-30

Bandsode V, Qumar S, Singh A, et al (2026)

Global genomic surveillance of Salmonella in the environment: assessing virulence and antimicrobial resistance at scale.

mBio [Epub ahead of print].

Salmonella is a globally distributed zoonotic pathogen with widespread environmental persistence; however, genomic characterization of environmental isolates from underrepresented regions remains limited. Current global data sets are predominantly populated with genomes from high-income countries, restricting our ability to resolve evolutionary trajectories, ecological adaptations, and emerging antimicrobial resistance (AMR). We performed a comparative genomic analysis of 1,399 high-quality Salmonella genomes, integrating 54 newly sequenced isolates from India (representing surface water and soil samples) with global data sets. Phenotypic analysis showed that 55.6% of the Indian isolates were multidrug-resistant, and 72.2% displayed strong biofilm-forming capacity. Integration of global genomes revealed extensive phylogenetic interspersion, reflecting widely distributed lineages shaped by shared ancestry or environmentally mixed Salmonella populations. The pangenome comprised 20,915 genes, with a core of 3,394 genes and a large accessory genome (>16,001 cloud genes). Serogroups B and C2-C3 dominated globally and carried the broadest AMR repertoires. While efflux-associated and regulatory resistance genes were conserved across subspecies, acquired determinants such as aminoglycoside-modifying enzymes, tet(A/B), sul genes, and rare extended-spectrum β-lactamases (ESBLs) varied by serogroup. Detection of mcr-1, mcr-5, and mcr-9 highlights early circulation of colistin resistance in environmental reservoirs. Core virulence loci (SPI-1/SPI-2) remained uniformly conserved, whereas accessory modules, including spv and pef operons, siderophore systems (iro, iuc/iut), and stress-response genes, showed serogroup-specific enrichment. Plasmidome analysis revealed marked diversity, dominated by IncF and colicinogenic plasmids, with serogroup-specific patterns, suggesting niche adaptation and horizontal gene transfer. Overall, environmental Salmonella constitute a globally connected and genetically dynamic reservoir where conserved virulence backbones coexist with rapidly evolving resistance and plasmid repertoires. These findings position environmental surveillance as a cornerstone of One Health preparedness for tackling high-risk, pathogenic lineages of Salmonella.

IMPORTANCE: Salmonella inhabiting environmental niches, such as water and soil, remain underexplored despite their potential role in pathogen gene pool evolution and infection burden. Using a global data set that includes newly sequenced genomes of isolates from India, we show that environmental populations are active evolutionary reservoirs that maintain a conserved virulence core while rapidly exchanging antimicrobial resistance genes via horizontal gene transfer. The detection of early-stage colistin resistance and multidrug-resistant lineages in global ecosystems identifies these environments as potential early-warning systems for emerging clinical threats. Our findings demonstrate that Indian environmental strains of Salmonella are deeply interconnected with global lineages, underscoring the need for global surveillance. Collectively, genomic epidemiology as described herein reinforces a One Health framework and highlights environmental surveillance as a critical requirement in the context of high-risk pathogens such as Salmonella.

Additional Links: PMID-42378013

Citation:

@article {pmid42378013,
year = {2026},
author = {Bandsode, V and Qumar, S and Singh, A and Das, D and Quadriya, H and Nyambero, M and Gawai, V and Mahapatra, A and Semmler, T and Rani, PS and Ahmed, N},
title = {Global genomic surveillance of Salmonella in the environment: assessing virulence and antimicrobial resistance at scale.},
journal = {mBio},
volume = {},
number = {},
pages = {e0114226},
doi = {10.1128/mbio.01142-26},
pmid = {42378013},
issn = {2150-7511},
abstract = {Salmonella is a globally distributed zoonotic pathogen with widespread environmental persistence; however, genomic characterization of environmental isolates from underrepresented regions remains limited. Current global data sets are predominantly populated with genomes from high-income countries, restricting our ability to resolve evolutionary trajectories, ecological adaptations, and emerging antimicrobial resistance (AMR). We performed a comparative genomic analysis of 1,399 high-quality Salmonella genomes, integrating 54 newly sequenced isolates from India (representing surface water and soil samples) with global data sets. Phenotypic analysis showed that 55.6% of the Indian isolates were multidrug-resistant, and 72.2% displayed strong biofilm-forming capacity. Integration of global genomes revealed extensive phylogenetic interspersion, reflecting widely distributed lineages shaped by shared ancestry or environmentally mixed Salmonella populations. The pangenome comprised 20,915 genes, with a core of 3,394 genes and a large accessory genome (>16,001 cloud genes). Serogroups B and C2-C3 dominated globally and carried the broadest AMR repertoires. While efflux-associated and regulatory resistance genes were conserved across subspecies, acquired determinants such as aminoglycoside-modifying enzymes, tet(A/B), sul genes, and rare extended-spectrum β-lactamases (ESBLs) varied by serogroup. Detection of mcr-1, mcr-5, and mcr-9 highlights early circulation of colistin resistance in environmental reservoirs. Core virulence loci (SPI-1/SPI-2) remained uniformly conserved, whereas accessory modules, including spv and pef operons, siderophore systems (iro, iuc/iut), and stress-response genes, showed serogroup-specific enrichment. Plasmidome analysis revealed marked diversity, dominated by IncF and colicinogenic plasmids, with serogroup-specific patterns, suggesting niche adaptation and horizontal gene transfer. Overall, environmental Salmonella constitute a globally connected and genetically dynamic reservoir where conserved virulence backbones coexist with rapidly evolving resistance and plasmid repertoires. These findings position environmental surveillance as a cornerstone of One Health preparedness for tackling high-risk, pathogenic lineages of Salmonella.

IMPORTANCE: Salmonella inhabiting environmental niches, such as water and soil, remain underexplored despite their potential role in pathogen gene pool evolution and infection burden. Using a global data set that includes newly sequenced genomes of isolates from India, we show that environmental populations are active evolutionary reservoirs that maintain a conserved virulence core while rapidly exchanging antimicrobial resistance genes via horizontal gene transfer. The detection of early-stage colistin resistance and multidrug-resistant lineages in global ecosystems identifies these environments as potential early-warning systems for emerging clinical threats. Our findings demonstrate that Indian environmental strains of Salmonella are deeply interconnected with global lineages, underscoring the need for global surveillance. Collectively, genomic epidemiology as described herein reinforces a One Health framework and highlights environmental surveillance as a critical requirement in the context of high-risk pathogens such as Salmonella.},
}

RevDate: 2026-06-30

Caballero-Villalobos J, Ryan EG, Leonard FC, et al (2026)

Staphylococcus aureus bovine-adapted lineage CC97 is associated with farms with a high within-herd prevalence of subclinical mastitis in Ireland.

Journal of dairy science pii:S0022-0302(26)03042-0 [Epub ahead of print].

Bovine intramammary infection (IMI) has a negative impact on the dairy industry through economic loss, increased antibiotic use, and reduced animal welfare. Major infectious agents include Staphylococcus aureus, non-aureus staphylococci and mammaliicocci (NASM), streptococci, and coliforms such as Escherichia coli, with the pathogen profile influenced by geography and production system. Strain variability, particularly for S. aureus, also affects the severity and epidemiology of IMI. To investigate the genetic variability of bovine S. aureus and determine if herd-level and individual cow milk quality parameters are associated with specific IMI pathogens or S. aureus genotypes, 20 Irish dairy herds were studied. These herds were classified as problem herds, i.e., high within-herd prevalence of sub-clinical mastitis based on cow level somatic cell counts (SCC) (n = 10), or control, i.e., low within-herd prevalence of sub-clinical mastitis (n = 10). Milk samples were collected from putatively infected primiparous and multiparous cows and subjected to microbiological, milk composition and milk processing analyses. The major pathogens isolated were NASM (37%), streptococci (17%) and S. aureus (13%) with no microorganism recovered from 29% of samples. The majority of S. aureus isolates belonged to Clonal Complex (CC) 97 or CC151 and S. aureus lineage was significantly associated with farm status; CC97 was 10 times more likely to be isolated from a problem than a control herd whereas CC151 was 5.6 times more likely to be recovered from a control than a problem herd. SCC was dependent on the nature of the infecting agent. For all bacterial species, positive samples had significantly higher SCC than samples that were bacteriologically negative, with milk containing S. aureus having significantly higher SCC than all other groups. Infectious agent was also associated with total solids, lactose percentage and rennet coagulation time, although there were no differences between S. aureus lineages in SCC, milk composition or processing parameters. Antimicrobial resistance was rare among the S. aureus isolates, although CC97 and CC8 isolates were resistant to penicillin and ampicillin. Comparative genomic analysis of the S. aureus isolates identified lineage-associated virulence genes. Pan-genome analysis identified a small number of CC97 and CC151 unique genes although no functional enrichment of the unique genes was identified. A number of the lineage-unique genes were associated with mobile genetic elements.

Additional Links: PMID-42379354

Citation:

@article {pmid42379354,
year = {2026},
author = {Caballero-Villalobos, J and Ryan, EG and Leonard, FC and Hanrahan, JP and Cormican, P and Garzón, A and Keane, OM},
title = {Staphylococcus aureus bovine-adapted lineage CC97 is associated with farms with a high within-herd prevalence of subclinical mastitis in Ireland.},
journal = {Journal of dairy science},
volume = {},
number = {},
pages = {},
doi = {10.3168/jds.2026-28353},
pmid = {42379354},
issn = {1525-3198},
abstract = {Bovine intramammary infection (IMI) has a negative impact on the dairy industry through economic loss, increased antibiotic use, and reduced animal welfare. Major infectious agents include Staphylococcus aureus, non-aureus staphylococci and mammaliicocci (NASM), streptococci, and coliforms such as Escherichia coli, with the pathogen profile influenced by geography and production system. Strain variability, particularly for S. aureus, also affects the severity and epidemiology of IMI. To investigate the genetic variability of bovine S. aureus and determine if herd-level and individual cow milk quality parameters are associated with specific IMI pathogens or S. aureus genotypes, 20 Irish dairy herds were studied. These herds were classified as problem herds, i.e., high within-herd prevalence of sub-clinical mastitis based on cow level somatic cell counts (SCC) (n = 10), or control, i.e., low within-herd prevalence of sub-clinical mastitis (n = 10). Milk samples were collected from putatively infected primiparous and multiparous cows and subjected to microbiological, milk composition and milk processing analyses. The major pathogens isolated were NASM (37%), streptococci (17%) and S. aureus (13%) with no microorganism recovered from 29% of samples. The majority of S. aureus isolates belonged to Clonal Complex (CC) 97 or CC151 and S. aureus lineage was significantly associated with farm status; CC97 was 10 times more likely to be isolated from a problem than a control herd whereas CC151 was 5.6 times more likely to be recovered from a control than a problem herd. SCC was dependent on the nature of the infecting agent. For all bacterial species, positive samples had significantly higher SCC than samples that were bacteriologically negative, with milk containing S. aureus having significantly higher SCC than all other groups. Infectious agent was also associated with total solids, lactose percentage and rennet coagulation time, although there were no differences between S. aureus lineages in SCC, milk composition or processing parameters. Antimicrobial resistance was rare among the S. aureus isolates, although CC97 and CC8 isolates were resistant to penicillin and ampicillin. Comparative genomic analysis of the S. aureus isolates identified lineage-associated virulence genes. Pan-genome analysis identified a small number of CC97 and CC151 unique genes although no functional enrichment of the unique genes was identified. A number of the lineage-unique genes were associated with mobile genetic elements.},
}

RevDate: 2026-06-29
CmpDate: 2026-06-29

Núñez-García LÁ, Feliciano-Guzmán JM, Ortíz-Álvarez J, et al (2026)

Genomic insights into the resistome, mobilome and functional adaptation of Achromobacter xylosoxidans across clinical and environmental contexts.

Microbial genomics, 12(6):.

Achromobacter xylosoxidans is an emerging opportunistic pathogen associated with a wide range of infections in humans. This species is widely distributed in the environment due to its high adaptability. Isolates of A. xylosoxidans have intrinsic resistance to several antibiotics and the potential to acquire genetic resistance determinants. Despite its growing frequency of isolation, little is known about the genomic characteristics of this pathogen. In this study, we conducted a comprehensive genomic analysis of assemblies from the NCBI RefSeq database, along with a newly sequenced respiratory isolate from a patient with cystic fibrosis. Through pangenome analysis, we identified genes and functions associated with specific isolation sources, suggesting niche-specific adaptation. Resistance-associated mutations in the AxyZ efflux pump regulator, along with blaAXC-1, were exclusively detected in genomes of clinical origin. Furthermore, while the resistome is limited, non-core antimicrobial resistance genes were detected to be primarily associated with the mobilome, underscoring the potential for horizontal gene transfer to further shape resistance in this species.

Additional Links: PMID-42371691

Citation:

@article {pmid42371691,
year = {2026},
author = {Núñez-García, LÁ and Feliciano-Guzmán, JM and Ortíz-Álvarez, J and Garza-González, E},
title = {Genomic insights into the resistome, mobilome and functional adaptation of Achromobacter xylosoxidans across clinical and environmental contexts.},
journal = {Microbial genomics},
volume = {12},
number = {6},
pages = {},
pmid = {42371691},
issn = {2057-5858},
mesh = {*Achromobacter denitrificans/genetics/drug effects/isolation & purification ; Humans ; Genome, Bacterial ; Anti-Bacterial Agents/pharmacology ; *Drug Resistance, Bacterial/genetics ; Genomics ; Gene Transfer, Horizontal ; Cystic Fibrosis/microbiology ; Phylogeny ; Adaptation, Physiological ; Gram-Negative Bacterial Infections/microbiology ; },
abstract = {Achromobacter xylosoxidans is an emerging opportunistic pathogen associated with a wide range of infections in humans. This species is widely distributed in the environment due to its high adaptability. Isolates of A. xylosoxidans have intrinsic resistance to several antibiotics and the potential to acquire genetic resistance determinants. Despite its growing frequency of isolation, little is known about the genomic characteristics of this pathogen. In this study, we conducted a comprehensive genomic analysis of assemblies from the NCBI RefSeq database, along with a newly sequenced respiratory isolate from a patient with cystic fibrosis. Through pangenome analysis, we identified genes and functions associated with specific isolation sources, suggesting niche-specific adaptation. Resistance-associated mutations in the AxyZ efflux pump regulator, along with bla AXC-1, were exclusively detected in genomes of clinical origin. Furthermore, while the resistome is limited, non-core antimicrobial resistance genes were detected to be primarily associated with the mobilome, underscoring the potential for horizontal gene transfer to further shape resistance in this species.},
}

MeSH Terms:

*Achromobacter denitrificans/genetics/drug effects/isolation & purification
Humans
Genome, Bacterial
Anti-Bacterial Agents/pharmacology
*Drug Resistance, Bacterial/genetics
Genomics
Gene Transfer, Horizontal
Cystic Fibrosis/microbiology
Phylogeny
Adaptation, Physiological
Gram-Negative Bacterial Infections/microbiology

RevDate: 2026-06-29

Taylor N, TJ Hearn (2026)

Somatic variant-calling beyond cancer: Repurposing algorithms to map low‑allele‑fraction variants across genomics.

Mutation research. Reviews in mutation research, 798:108602 pii:S1383-5742(26)00018-9 [Epub ahead of print].

BACKGROUND: Somatic variant callers were originally developed to identify tumour-specific mutations in mixed tumour-normal samples. Increasingly, disciplines such as developmental biology, reproductive medicine, virology and mitochondrial genetics require detection of low-frequency variants from high-depth sequencing. Many laboratories therefore reuse cancer callers without clear guidance on their statistical assumptions or validation in non-cancer contexts.

METHODS: We review widely used somatic callers and post-calling classifiers, summarising their underlying models and assumptions. We then synthesise peer-reviewed case studies (2020-2025) where these tools were applied to detect low-allele-fraction variants outside cancer. For each study, we extract sample type, sequencing depth, variant-allele-fraction thresholds, caller(s) used and validation approaches. We discuss technical challenges and propose practical adaptations.

RESULTS: Cancer callers generally assume diploid genomes and moderate allele fractions, use Beta-binomial or Poisson-based models and apply filters for tumour/normal comparisons. In non-cancer applications, allele fractions frequently fall below 5%, confounded by ploidy differences and sequencing artefacts. Published studies demonstrate that somatic callers can be repurposed for detecting post-zygotic variants at 1-3% allele fraction, sperm mosaicism, mitochondrial heteroplasmy as low as 0.4-0.5%, minority viral variants around 5% and cfDNA variants at ≥ 1-5% allele fraction. Unique molecular identifiers, panel-of-normal filtering and machine-learning post-processing markedly improve specificity. Emerging long-read and deep-learning callers (e.g. DeepSomatic) promise enhanced sensitivity.

CONCLUSIONS: Somatic callers offer versatile frameworks to interrogate low-allele-fraction variants across genomics; however, careful parameter tuning and validation are essential. Future work could integrate pangenome references and federated benchmarking datasets to make somatic variant detection more robust in diverse biological settings.

Additional Links: PMID-42372426

Citation:

@article {pmid42372426,
year = {2026},
author = {Taylor, N and Hearn, TJ},
title = {Somatic variant-calling beyond cancer: Repurposing algorithms to map low‑allele‑fraction variants across genomics.},
journal = {Mutation research. Reviews in mutation research},
volume = {798},
number = {},
pages = {108602},
doi = {10.1016/j.mrrev.2026.108602},
pmid = {42372426},
issn = {1388-2139},
abstract = {BACKGROUND: Somatic variant callers were originally developed to identify tumour-specific mutations in mixed tumour-normal samples. Increasingly, disciplines such as developmental biology, reproductive medicine, virology and mitochondrial genetics require detection of low-frequency variants from high-depth sequencing. Many laboratories therefore reuse cancer callers without clear guidance on their statistical assumptions or validation in non-cancer contexts.

METHODS: We review widely used somatic callers and post-calling classifiers, summarising their underlying models and assumptions. We then synthesise peer-reviewed case studies (2020-2025) where these tools were applied to detect low-allele-fraction variants outside cancer. For each study, we extract sample type, sequencing depth, variant-allele-fraction thresholds, caller(s) used and validation approaches. We discuss technical challenges and propose practical adaptations.

RESULTS: Cancer callers generally assume diploid genomes and moderate allele fractions, use Beta-binomial or Poisson-based models and apply filters for tumour/normal comparisons. In non-cancer applications, allele fractions frequently fall below 5%, confounded by ploidy differences and sequencing artefacts. Published studies demonstrate that somatic callers can be repurposed for detecting post-zygotic variants at 1-3% allele fraction, sperm mosaicism, mitochondrial heteroplasmy as low as 0.4-0.5%, minority viral variants around 5% and cfDNA variants at ≥ 1-5% allele fraction. Unique molecular identifiers, panel-of-normal filtering and machine-learning post-processing markedly improve specificity. Emerging long-read and deep-learning callers (e.g. DeepSomatic) promise enhanced sensitivity.

CONCLUSIONS: Somatic callers offer versatile frameworks to interrogate low-allele-fraction variants across genomics; however, careful parameter tuning and validation are essential. Future work could integrate pangenome references and federated benchmarking datasets to make somatic variant detection more robust in diverse biological settings.},
}

RevDate: 2026-06-29

Chettri D, AK Verma (2026)

Whole-genome sequencing and comparative genomic analysis of a novel Bacillus subtilis strain YE16 isolated from yak dung.

Scientific reports pii:10.1038/s41598-026-59359-0 [Epub ahead of print].

In this study, genomic and functional characterization of the previously identified lignocellulolytic Bacillus sp. strain YE16 from an underrepresented ecological niche (yak dung) in the Sikkim Himalayas was performed, making it the first report of such a study of a strain from this region. Whole-genome sequencing resolved the ambiguity regarding its taxonomic placement: the strain showed 98.63% ANI and 89.4% dDDH with the reference genome Bacillus subtilis subsp. subtilis 168, definitively assigning YE16 to B. subtilis. This was further confirmed by core-pangenome SNP-based phylogenetic analysis, where the YE16 strain forms a separate branch on the phylogenetic tree, with a core-genome SNP distance of 33,807 relative to strain 168, indicating substantial strain-level genomic differentiation. YE16 encodes 43 strain-specific genes, with nutrient transport, metabolism and defense mechanisms as the primary functional groups. The isolate possessed a total of 95 GHs, 83 GTs and 38 CBMs, consistent with the previously demonstrated lignocellulolytic phenotype. Eighteen biosynthetic gene clusters were identified with expanded lipopeptide-related NRPS clusters, of which 4 showed low similarity (< 50%). A single intact prophage related to Brevibacillus phage Osiris was also detected, augmenting its genomic variability. Preliminary biosafety assessment identified no major virulence-associated determinants or evidence of acquired multidrug resistance within the scope of the study. Based on its distinct phylogenetic position, strain-specific gene content, diverse CAZyme and Biosynthetic Gene Cluster (BGC) repertoires, and isolation from an underrepresented niche, YE16 represents a genomically distinct strain of Bacillus subtilis.

Additional Links: PMID-42373818

Citation:

@article {pmid42373818,
year = {2026},
author = {Chettri, D and Verma, AK},
title = {Whole-genome sequencing and comparative genomic analysis of a novel Bacillus subtilis strain YE16 isolated from yak dung.},
journal = {Scientific reports},
volume = {},
number = {},
pages = {},
doi = {10.1038/s41598-026-59359-0},
pmid = {42373818},
issn = {2045-2322},
support = {UGC-BSR Research Startup-grant 2020//University Grants Commission/ ; },
abstract = {In this study, genomic and functional characterization of the previously identified lignocellulolytic Bacillus sp. strain YE16 from an underrepresented ecological niche (yak dung) in the Sikkim Himalayas was performed, making it the first report of such a study of a strain from this region. Whole-genome sequencing resolved the ambiguity regarding its taxonomic placement: the strain showed 98.63% ANI and 89.4% dDDH with the reference genome Bacillus subtilis subsp. subtilis 168, definitively assigning YE16 to B. subtilis. This was further confirmed by core-pangenome SNP-based phylogenetic analysis, where the YE16 strain forms a separate branch on the phylogenetic tree, with a core-genome SNP distance of 33,807 relative to strain 168, indicating substantial strain-level genomic differentiation. YE16 encodes 43 strain-specific genes, with nutrient transport, metabolism and defense mechanisms as the primary functional groups. The isolate possessed a total of 95 GHs, 83 GTs and 38 CBMs, consistent with the previously demonstrated lignocellulolytic phenotype. Eighteen biosynthetic gene clusters were identified with expanded lipopeptide-related NRPS clusters, of which 4 showed low similarity (< 50%). A single intact prophage related to Brevibacillus phage Osiris was also detected, augmenting its genomic variability. Preliminary biosafety assessment identified no major virulence-associated determinants or evidence of acquired multidrug resistance within the scope of the study. Based on its distinct phylogenetic position, strain-specific gene content, diverse CAZyme and Biosynthetic Gene Cluster (BGC) repertoires, and isolation from an underrepresented niche, YE16 represents a genomically distinct strain of Bacillus subtilis.},
}

RevDate: 2026-06-27

Khan N, Faryal R, Khan MH, et al (2026)

Pseudomonas aeruginosa ST4936 with extensive antibacterial drug resistance and virulence determinants from ureteral stent biofilms: whole-genome insights.

Naunyn-Schmiedeberg's archives of pharmacology [Epub ahead of print].

Newly emerging Pseudomonas aeruginosa sequence types carrying diverse antibacterial resistance and virulence determinants pose serious clinical concerns. This study aimed to characterize the genomic features of P. aeruginosa isolates recovered from ureteral stent biofilms, focusing on antibacterial drug resistance genes, virulence determinants, and mobile genetic elements. P. aeruginosa isolates were confirmed phenotypically and underwent whole-genome sequencing. Multilocus sequence typing, phylogenomic and pangenomic analyses, and comprehensive bioinformatics tools were used to investigate the genomic determinants. All isolates belonged to ST4936 and clustered together phylogenetically, while exhibiting notable genomic variations. Resistome analysis identified acquired resistance genes, including blaIMP-34, blaOXA-10, aac(6')-Ib, aadA6, rmtF, sul1, dfrA15, and crpP, in addition to several intrinsic resistance genes. A total of 207 virulence-associated genes were detected, including type III secretion system (exoU, exoY, and exoT), alginate (algU, algW, mucA, mucD), flagellar genes (fliC, fliD, fleI, fleP), quorum sensing (lasI, lasR, rhlR), pyoverdine (pvdA, pvdE, pvdD, pvdS) and type IV pili (pilA, pilB). Four insertion sequences (ISPa6, ISPa7, ISPa32, and ISPa26) and multiple prophages were identified. Pangenomic analysis revealed extensive accessory gene content and genome rearrangements. To the best of our knowledge, this is the first genomic characterization of ST4936 P. aeruginosa isolated from ureteral stent biofilms in Pakistan. The combination of extensive antibacterial resistance determinants, numerous virulence factors, and mobile genetic elements suggests that ST4936 might have the potential to persist in ureteral stent-associated infections and limit antibacterial treatment options. However, multicenter surveillance studies are required to determine its epidemiological significance in Pakistan.

Additional Links: PMID-42365120

Citation:

@article {pmid42365120,
year = {2026},
author = {Khan, N and Faryal, R and Khan, MH and Ali, L},
title = {Pseudomonas aeruginosa ST4936 with extensive antibacterial drug resistance and virulence determinants from ureteral stent biofilms: whole-genome insights.},
journal = {Naunyn-Schmiedeberg's archives of pharmacology},
volume = {},
number = {},
pages = {},
pmid = {42365120},
issn = {1432-1912},
abstract = {Newly emerging Pseudomonas aeruginosa sequence types carrying diverse antibacterial resistance and virulence determinants pose serious clinical concerns. This study aimed to characterize the genomic features of P. aeruginosa isolates recovered from ureteral stent biofilms, focusing on antibacterial drug resistance genes, virulence determinants, and mobile genetic elements. P. aeruginosa isolates were confirmed phenotypically and underwent whole-genome sequencing. Multilocus sequence typing, phylogenomic and pangenomic analyses, and comprehensive bioinformatics tools were used to investigate the genomic determinants. All isolates belonged to ST4936 and clustered together phylogenetically, while exhibiting notable genomic variations. Resistome analysis identified acquired resistance genes, including blaIMP-34, blaOXA-10, aac(6')-Ib, aadA6, rmtF, sul1, dfrA15, and crpP, in addition to several intrinsic resistance genes. A total of 207 virulence-associated genes were detected, including type III secretion system (exoU, exoY, and exoT), alginate (algU, algW, mucA, mucD), flagellar genes (fliC, fliD, fleI, fleP), quorum sensing (lasI, lasR, rhlR), pyoverdine (pvdA, pvdE, pvdD, pvdS) and type IV pili (pilA, pilB). Four insertion sequences (ISPa6, ISPa7, ISPa32, and ISPa26) and multiple prophages were identified. Pangenomic analysis revealed extensive accessory gene content and genome rearrangements. To the best of our knowledge, this is the first genomic characterization of ST4936 P. aeruginosa isolated from ureteral stent biofilms in Pakistan. The combination of extensive antibacterial resistance determinants, numerous virulence factors, and mobile genetic elements suggests that ST4936 might have the potential to persist in ureteral stent-associated infections and limit antibacterial treatment options. However, multicenter surveillance studies are required to determine its epidemiological significance in Pakistan.},
}

RevDate: 2026-06-28

Farishta S, Hanif S, Faryal R, et al (2026)

Pangenome analysis of Salmonella Paratyphi A reveals genetic diversity, antimicrobial resistance determinants, and public health implications.

Scientific reports pii:10.1038/s41598-026-58971-4 [Epub ahead of print].

Salmonella Paratyphi A (SPA) causes paratyphoid fever, a significant health concern in South Asia, particularly in Pakistan. This research aimed to explore the antibiotic resistance patterns, genetic diversity, and evolutionary dynamics of SPA isolated from suspected paratyphoid patients in Pakistan. Whole-genome sequencing (WGS) of the isolates (n = 10) predicted predominantly serotype O-2, H1: a, H2:1,5. Sequence type ST85 was detected, alongside three STs (ST21eb, ST6d3b, ST95c4) and eight pathogenicity islands. The study reported extensively drug-resistant (XDR) isolates (SPA 2, 14, 27, 79) on the basis of the AMR genes detected in IncY and IncQ1 plasmids (blaTEM-1, blaCTX-M-15, sul1, sul2, dfrA7, catA1, qnrS1), along with multiple resistance-associated mutations in the gyrA (S83F, E133G), gyrB (T14M), ParC (T57S) and AcrB (L40P) genes. These genomic results correlated with the phenotypic resistance exhibited by the XDR Paratyphi A isolates against different classes of antibiotics. The Paratyphi A strains SPA 1, 2, 14, 27 and 79 harbored the highest number of unique genes, as determined by pangenome analysis. Interestingly, these strains were highly virulent and exhibited an XDR profile, which indicated significant transfer of resistance and virulence genes through horizontal gene transfer. The phylogenetic tree constructed by the maximum likelihood method showed that eight of the ten SPA isolates in the study belonged to genotype 2.3, as they formed a tight cluster with the reference strain (AKU_12601). The present study provides a well-characterized genomic profile of Salmonella Paratyphi A isolates from Pakistan. The detection of XDR isolates is alarming for the country, as no XDR has yet been reported in Paratyphi A. The unavailability of vaccines for Paratyphi A strains further limits treatment and prevention strategies and thus poses a serious public health threat. The findings emphasize the need for urgent action by public health authorities to mitigate the potentially emerging XDR Salmonella Paratyphi A and prevent future outbreaks in Pakistan.

Additional Links: PMID-42366200

Citation:

@article {pmid42366200,
year = {2026},
author = {Farishta, S and Hanif, S and Faryal, R and Ali, M and Uppal, R and Khan, AA and Ali, Z and Salman, M and Ahmed, M and Khokhar, F and Holmes, M and Ahmed, IE and Tasqeeruddin, S and Khan, A},
title = {Pangenome analysis of Salmonella Paratyphi A reveals genetic diversity, antimicrobial resistance determinants, and public health implications.},
journal = {Scientific reports},
volume = {},
number = {},
pages = {},
doi = {10.1038/s41598-026-58971-4},
pmid = {42366200},
issn = {2045-2322},
abstract = {Salmonella Paratyphi A (SPA) causes paratyphoid fever, a significant health concern in South Asia, particularly in Pakistan. This research aimed to explore the antibiotic resistance patterns, genetic diversity, and evolutionary dynamics of SPA isolated from suspected paratyphoid patients in Pakistan. Whole-genome sequencing (WGS) of the isolates (n = 10) predicted predominantly serotype O-2, H1: a, H2:1,5. Sequence type ST85 was detected, alongside three STs (ST21eb, ST6d3b, ST95c4) and eight pathogenicity islands. The study reported extensively drug-resistant (XDR) isolates (SPA 2, 14, 27, 79) on the basis of the AMR genes detected in IncY and IncQ1 plasmids (blaTEM-1, blaCTX-M-15, sul1, sul2, dfrA7, catA1, qnrS1), along with multiple resistance-associated mutations in the gyrA (S83F, E133G), gyrB (T14M), ParC (T57S) and AcrB (L40P) genes. These genomic results correlated with the phenotypic resistance exhibited by the XDR Paratyphi A isolates against different classes of antibiotics. The Paratyphi A strains SPA 1, 2, 14, 27 and 79 harbored the highest number of unique genes, as determined by pangenome analysis. Interestingly, these strains were highly virulent and exhibited an XDR profile, which indicated significant transfer of resistance and virulence genes through horizontal gene transfer. The phylogenetic tree constructed by the maximum likelihood method showed that eight of the ten SPA isolates in the study belonged to genotype 2.3, as they formed a tight cluster with the reference strain (AKU_12601). The present study provides a well-characterized genomic profile of Salmonella Paratyphi A isolates from Pakistan. The detection of XDR isolates is alarming for the country, as no XDR has yet been reported in Paratyphi A. The unavailability of vaccines for Paratyphi A strains further limits treatment and prevention strategies and thus poses a serious public health threat. The findings emphasize the need for urgent action by public health authorities to mitigate the potentially emerging XDR Salmonella Paratyphi A and prevent future outbreaks in Pakistan.},
}

RevDate: 2026-06-28
CmpDate: 2026-06-28

Hassan F, Türkyılmaz S, TS El-Mahdy (2026)

Genomic analysis identifies candidate SOS-associated and EPS/envelope-remodeling islands in a drinking-water Stenotrophomonas maltophilia complex isolate.

Scientific reports, 16(1):.

Drinking-water distribution systems (DWDS) impose ecological pressures shaped by oligotrophy, surface attachment, hydraulic fluctuation, and residual disinfectants. These conditions may favor stress-tolerant and biofilm-capable microorganisms, including members of the Stenotrophomonas maltophilia complex. Although this complex is clinically relevant because of intrinsic multidrug resistance and opportunistic pathogenic potential, the strain-level genomic features that may support persistence in disinfectant-managed water systems remain incompletely characterized. We applied an integrated genome-resolved framework to Stenotrophomonas sp. NG-SM01, a drinking-water isolate assigned to the S. maltophilia complex genomospecies Sgn4. Hybrid sequencing was used to generate a closed genome assembly, followed by phylogenomics and MLST for taxonomic placement, pangenome analysis for gene-content context, and genomic-island mapping to identify candidate accessory regions. Functional profiling included BacMet screening for metal/biocide tolerance-associated loci and genome-scale metabolic reconstruction using gapseq. Phenotypic assays assessed crystal-violet biofilm biomass and acute hydrogen peroxide (H2O2) survival as a general oxidative-stress proxy. NG-SM01 clustered within the Sgn4 genomospecies of the S. maltophilia complex and formed a strongly supported sister lineage to its closest available reference genome, GCF_025642255.1 (UFboot = 100). MLST confirmed a newly curated guaA allele (guaA-909), and the allelic profile was assigned ST1409. Disk diffusion showed limited inhibition by several antimicrobial agents; however, categorical interpretation using S. maltophilia-specific CLSI criteria was applied only to levofloxacin and trimethoprim-sulfamethoxazole, both of which were susceptible. 
Genomic-island analysis highlighted two candidate loci potentially relevant to stress adaptation: GI_6, carrying an EPS/envelope-remodeling cassette including algL and a GT26-family glycosyltransferase within a panel-restricted architecture, and GI_4, an IME-associated region located near SOS-response genes including recA, recX, and lexA. NG-SM01 formed reproducible moderate biofilm biomass under static microtiter conditions and showed plateau-like survival dynamics during acute H2O2 exposure. BacMet screening prioritized Tier-1 metal/biocide tolerance-associated signals, including CzcR-like and AdeL-like regulators and a YfeB-like transport component, while gapseq reconstruction predicted broad transport capacity, including 1,920 transporter entries, of which 23% were metal-related. This single-isolate study identifies genomic and phenotypic features potentially relevant to persistence in a drinking-water S. maltophilia complex Sgn4 isolate. The combined evidence supports a hypothesis-generating "Switch-Shield" framework, in which stress-response regulation and mobilome-associated plasticity may represent a candidate "switch," while EPS/envelope remodeling may represent a candidate protective "shield." However, direct causality between GI_4/GI_6 and the observed phenotypes was not demonstrated. Future validation using transcriptomics, mutant-based assays, and DWDS-mimetic free-chlorine or chloramine exposure models, including flow/pipe reactors with detachment and shedding measurements, will be required to test this model.

Additional Links: PMID-42366209

Citation:

@article {pmid42366209,
year = {2026},
author = {Hassan, F and Türkyılmaz, S and El-Mahdy, TS},
title = {Genomic analysis identifies candidate SOS-associated and EPS/envelope-remodeling islands in a drinking-water Stenotrophomonas maltophilia complex isolate.},
journal = {Scientific reports},
volume = {16},
number = {1},
pages = {},
pmid = {42366209},
issn = {2045-2322},
mesh = {*Stenotrophomonas maltophilia/genetics/isolation & purification/drug effects/classification ; *Drinking Water/microbiology ; *Genomic Islands ; Biofilms/growth & development/drug effects ; Genome, Bacterial ; *SOS Response, Genetics/genetics ; *Extracellular Polymeric Substance Matrix/genetics/metabolism ; Genomics/methods ; Phylogeny ; },
abstract = {Drinking-water distribution systems (DWDS) impose ecological pressures shaped by oligotrophy, surface attachment, hydraulic fluctuation, and residual disinfectants. These conditions may favor stress-tolerant and biofilm-capable microorganisms, including members of the Stenotrophomonas maltophilia complex. Although this complex is clinically relevant because of intrinsic multidrug resistance and opportunistic pathogenic potential, the strain-level genomic features that may support persistence in disinfectant-managed water systems remain incompletely characterized. We applied an integrated genome-resolved framework to Stenotrophomonas sp. NG-SM01, a drinking-water isolate assigned to the S. maltophilia complex genomospecies Sgn4. Hybrid sequencing was used to generate a closed genome assembly, followed by phylogenomics and MLST for taxonomic placement, pangenome analysis for gene-content context, and genomic-island mapping to identify candidate accessory regions. Functional profiling included BacMet screening for metal/biocide tolerance-associated loci and genome-scale metabolic reconstruction using gapseq. Phenotypic assays assessed crystal-violet biofilm biomass and acute hydrogen peroxide (H2O2) survival as a general oxidative-stress proxy. NG-SM01 clustered within the Sgn4 genomospecies of the S. maltophilia complex and formed a strongly supported sister lineage to its closest available reference genome, GCF_025642255.1 (UFboot = 100). MLST confirmed a newly curated guaA allele (guaA-909), and the allelic profile was assigned ST1409. Disk diffusion showed limited inhibition by several antimicrobial agents; however, categorical interpretation using S. maltophilia-specific CLSI criteria was applied only to levofloxacin and trimethoprim-sulfamethoxazole, both of which were susceptible. 
Genomic-island analysis highlighted two candidate loci potentially relevant to stress adaptation: GI_6, carrying an EPS/envelope-remodeling cassette including algL and a GT26-family glycosyltransferase within a panel-restricted architecture, and GI_4, an IME-associated region located near SOS-response genes including recA, recX, and lexA. NG-SM01 formed reproducible moderate biofilm biomass under static microtiter conditions and showed plateau-like survival dynamics during acute H2O2 exposure. BacMet screening prioritized Tier-1 metal/biocide tolerance-associated signals, including CzcR-like and AdeL-like regulators and a YfeB-like transport component, while gapseq reconstruction predicted broad transport capacity, including 1,920 transporter entries, of which 23% were metal-related. This single-isolate study identifies genomic and phenotypic features potentially relevant to persistence in a drinking-water S. maltophilia complex Sgn4 isolate. The combined evidence supports a hypothesis-generating "Switch-Shield" framework, in which stress-response regulation and mobilome-associated plasticity may represent a candidate "switch," while EPS/envelope remodeling may represent a candidate protective "shield." However, direct causality between GI_4/GI_6 and the observed phenotypes was not demonstrated. Future validation using transcriptomics, mutant-based assays, and DWDS-mimetic free-chlorine or chloramine exposure models, including flow/pipe reactors with detachment and shedding measurements, will be required to test this model.},
}

MeSH Terms:

*Stenotrophomonas maltophilia/genetics/isolation & purification/drug effects/classification
*Drinking Water/microbiology
*Genomic Islands
Biofilms/growth & development/drug effects
Genome, Bacterial
*SOS Response, Genetics/genetics
*Extracellular Polymeric Substance Matrix/genetics/metabolism
Genomics/methods
Phylogeny

RevDate: 2026-06-27
CmpDate: 2026-06-27

Li Z, Li Y, Huang S, et al (2026)

Global Diversity of Helicobacter pylori Prophages Reveals Genetic Drivers of Virulence and Associations With Gastric Cancer.

Helicobacter, 31(3):e70140.

BACKGROUND: Helicobacter pylori is a globally prevalent gastric pathogen, and chronic infection accounts for most gastric cancer (GC) cases worldwide. Major oncogenic determinants, including CagA, VacA, and the type IV secretion system, show marked geographic heterogeneity, yet the evolutionary forces shaping this uneven distribution remain unclear. Prophages can mediate horizontal gene transfer and modulate bacterial fitness and virulence, but their contribution to H. pylori carcinogenicity has not been systematically evaluated.

METHODS: We characterized prophage diversity, population structure, and virulence potential using 2379 H. pylori host genomes and 139 complete prophage genomes. Prophage population structure and intergenomic relatedness were inferred, and the prophage pangenome and protein-sharing network were reconstructed. Homology-based association analyses were performed to test enrichment of prophage orthologous groups (POGs) with major oncogenic virulence factors (CagA and/or VacA) across the 2379 host genomes.

RESULTS: Prophages segregated into geographically structured populations. The EastAsia and EastAsia2 prophage groups were tightly coupled to high-risk hspEAsia hosts and exhibited the largest and most diverse accessory repertoires. Virulence-associated genes were strongly population-specific and were detected only in the EastAsia/EastAsia2 prophage populations. Moreover, carriage of POG homologs from 1961P, HPy1R, and phiHP33 showed significant positive associations with CagA and/or VacA across the 2379 genomes, whereas no enrichment was observed for KHP30 or KHP40.

CONCLUSIONS: H. pylori prophages are not passive genomic remnants but population-structured reservoirs whose gene repertoires track high-risk virulence backgrounds and may contribute to the bacterium's carcinogenic potential.

Additional Links: PMID-42363415

Citation:

@article {pmid42363415,
year = {2026},
author = {Li, Z and Li, Y and Huang, S and Shang, Y and Li, Y and Zhang, T and Wei, X and Xie, X and Wu, Q and Zhao, X},
title = {Global Diversity of Helicobacter pylori Prophages Reveals Genetic Drivers of Virulence and Associations With Gastric Cancer.},
journal = {Helicobacter},
volume = {31},
number = {3},
pages = {e70140},
doi = {10.1111/hel.70140},
pmid = {42363415},
issn = {1523-5378},
support = {2022YFD2100703//National Key Research and Development Program of China/ ; 2025A1515012225//the Guangdong Basic and Applied Basic Research Foundation/ ; 2022GDASZH-2022010101//GDAS's Project of Science and Technology Development/ ; },
mesh = {*Helicobacter pylori/virology/genetics/pathogenicity ; *Prophages/genetics/classification ; *Stomach Neoplasms/microbiology ; Humans ; Virulence Factors/genetics ; *Helicobacter Infections/microbiology/complications ; *Genetic Variation ; Virulence ; Bacterial Proteins/genetics ; },
abstract = {BACKGROUND: Helicobacter pylori is a globally prevalent gastric pathogen, and chronic infection accounts for most gastric cancer (GC) cases worldwide. Major oncogenic determinants, including CagA, VacA, and the type IV secretion system, show marked geographic heterogeneity, yet the evolutionary forces shaping this uneven distribution remain unclear. Prophages can mediate horizontal gene transfer and modulate bacterial fitness and virulence, but their contribution to H. pylori carcinogenicity has not been systematically evaluated.

METHODS: We characterized prophage diversity, population structure, and virulence potential using 2379 H. pylori host genomes and 139 complete prophage genomes. Prophage population structure and intergenomic relatedness were inferred, and the prophage pangenome and protein-sharing network were reconstructed. Homology-based association analyses were performed to test enrichment of prophage orthologous groups (POGs) with major oncogenic virulence factors (CagA and/or VacA) across the 2379 host genomes.

RESULTS: Prophages segregated into geographically structured populations. The EastAsia and EastAsia2 prophage groups were tightly coupled to high-risk hspEAsia hosts and exhibited the largest and most diverse accessory repertoires. Virulence-associated genes were strongly population-specific and were detected only in the EastAsia/EastAsia2 prophage populations. Moreover, carriage of POG homologs from 1961P, HPy1R, and phiHP33 showed significant positive associations with CagA and/or VacA across the 2379 genomes, whereas no enrichment was observed for KHP30 or KHP40.

CONCLUSIONS: H. pylori prophages are not passive genomic remnants but population-structured reservoirs whose gene repertoires track high-risk virulence backgrounds and may contribute to the bacterium's carcinogenic potential.},
}

MeSH Terms:

*Helicobacter pylori/virology/genetics/pathogenicity
*Prophages/genetics/classification
*Stomach Neoplasms/microbiology
Humans
Virulence Factors/genetics
*Helicobacter Infections/microbiology/complications
*Genetic Variation
Virulence
Bacterial Proteins/genetics

RevDate: 2026-06-27

Faizan R, Naveed M, Estevez IB, et al (2026)

Structure-based immunopharmacological design of a multi-epitope vaccine candidate against Naegleria fowleri targeting TLR3: a pan-genomic and molecular dynamics approach.

Naunyn-Schmiedeberg's archives of pharmacology [Epub ahead of print].

Naegleria fowleri is a highly lethal free-living protozoan responsible for primary amoebic meningoencephalitis (PAM), a rapidly progressive central nervous system infection with a mortality rate exceeding 95%, for which no licensed vaccine or effective preventive therapy currently exists. The absence of prophylactic strategies demands development of novel immunopharmacological interventions. In this study, a comprehensive pan-genomic and structure-guided immunoinformatics approach was employed to design a multi-epitope vaccine candidate against N. fowleri. Four genomes of N. fowleri were subjected to an integrative pan-genomic-immunoinformatics pipeline involving orthologous clustering, subtractive genomics, and immunological relevance screening. From 1427 predicted membrane proteins, 885 antigenic, non-allergenic, and non-toxic candidates were identified. Epitope prediction yielded 7 B-cell epitopes, 26 MHC class I-restricted CTL epitopes, and 22 MHC class II-restricted HTL epitopes, which were assembled into a multi-epitope vaccine construct with immunostimulatory adjuvants. The resulting 320-amino acid construct demonstrated high antigenicity and global population coverage of 91.75%. Structural validation confirmed 98.9% of residues in favorable Ramachandran regions and an ERRAT quality score of 92.484. Molecular docking with Toll-like receptor 3 (TLR3) revealed a highly stable complex with a weighted energy score of -1475.4 kcal/mol, featuring 24 hydrogen bonds and 5 salt bridges. Molecular dynamics simulations over 100 ns demonstrated structural stability with low RMSD and RMSF values, indicating dynamic stability and coordinated residue motions. Immune simulations predicted strong humoral and cellular immune responses with sustained antibody levels and memory cell generation. Codon optimization achieved a CAI of 0.99 and 53.4% GC content, supporting heterologous expression in Escherichia coli. This multi-epitope vaccine candidate exhibits strong immunogenic potential, structural stability, and favorable interactions with innate immune receptors, positioning it as a promising candidate for experimental validation against N. fowleri infection.

Additional Links: PMID-42365117

Citation:

@article {pmid42365117,
year = {2026},
author = {Faizan, R and Naveed, M and Estevez, IB and Rehman, HM and Latif, A and Hammad, HM and Khan, AR and Alsulami, SO and Aljumaa, MA and Tombozara, N},
title = {Structure-based immunopharmacological design of a multi-epitope vaccine candidate against Naegleria fowleri targeting TLR3: a pan-genomic and molecular dynamics approach.},
journal = {Naunyn-Schmiedeberg's archives of pharmacology},
volume = {},
number = {},
pages = {},
pmid = {42365117},
issn = {1432-1912},
abstract = {Naegleria fowleri is a highly lethal free-living protozoan responsible for primary amoebic meningoencephalitis (PAM), a rapidly progressive central nervous system infection with a mortality rate exceeding 95%, for which no licensed vaccine or effective preventive therapy currently exists. The absence of prophylactic strategies demands development of novel immunopharmacological interventions. In this study, a comprehensive pan-genomic and structure-guided immunoinformatics approach was employed to design a multi-epitope vaccine candidate against N. fowleri. Four genomes of N. fowleri were subjected to an integrative pan-genomic-immunoinformatics pipeline involving orthologous clustering, subtractive genomics, and immunological relevance screening. From 1427 predicted membrane proteins, 885 antigenic, non-allergenic, and non-toxic candidates were identified. Epitope prediction yielded 7 B-cell epitopes, 26 MHC class I-restricted CTL epitopes, and 22 MHC class II-restricted HTL epitopes, which were assembled into a multi-epitope vaccine construct with immunostimulatory adjuvants. The resulting 320-amino acid construct demonstrated high antigenicity and global population coverage of 91.75%. Structural validation confirmed 98.9% of residues in favorable Ramachandran regions and an ERRAT quality score of 92.484. Molecular docking with Toll-like receptor 3 (TLR3) revealed a highly stable complex with a weighted energy score of -1475.4 kcal/mol, featuring 24 hydrogen bonds and 5 salt bridges. Molecular dynamics simulations over 100 ns demonstrated structural stability with low RMSD and RMSF values, indicating dynamic stability and coordinated residue motions. Immune simulations predicted strong humoral and cellular immune responses with sustained antibody levels and memory cell generation. Codon optimization achieved a CAI of 0.99 and 53.4% GC content, supporting heterologous expression in Escherichia coli. This multi-epitope vaccine candidate exhibits strong immunogenic potential, structural stability, and favorable interactions with innate immune receptors, positioning it as a promising candidate for experimental validation against N. fowleri infection.},
}

RevDate: 2026-06-26
CmpDate: 2026-06-26

Zhang Z, Hou Z, Zhu S, et al (2026)

Identification and analysis of the AP2/ERF gene family in Dendrobium officinale based on pan-genome and functional characterization of DofERF109_2.

Frontiers in plant science, 17:1834268.

INTRODUCTION: AP2/ERF transcription factors are key regulators of plant stress responses and developmental processes. Despite their functional significance, limited research has focused on this gene family in the medicinal orchid Dendrobium officinale.

METHODS: Based on the pangenome data of seven Dendrobium officinale individuals from different habitats, we performed a pangenome family analysis of AP2/ERF, including analyses of presence-absence variation (PAV), selection pressure, transposable elements, etc., and conducted functional validation of the screened key members.

RESULTS: A total of 101, 76, 113, 123, 113, 105 and 113 AP2/ERF genes were identified in the seven Dendrobium officinale individuals, respectively. PAV analysis classified the non-redundant members into core (29), softcore (28), dispensable (17) and private (3) genes. Compared with Arabidopsis thaliana, D. officinale AP2/ERFs exhibited a significant evolutionary contraction, although some genes underwent duplication. Most genes experienced negative selection, while a few showed positive selection. Cold and heat stress induced differential expression patterns; genes with stable expression were predominantly core or softcore members. The candidate DofERF109_2 localized to the nucleus. Its transient expression suppressed anthocyanin accumulation in tobacco and downregulated the key enzyme gene DofCHI in D. officinale.

DISCUSSION: Our pan-genome analysis demonstrates that the AP2/ERF family in D. officinale has undergone significant evolutionary contraction compared with Arabidopsis, suggesting lineage-specific gene loss or rapid divergence in orchids. Despite this overall contraction, lineage-specific duplications (e.g., ERF109 and RAP2.11) were observed, which may provide raw material for adaptive evolution. The classification into core, softcore, dispensable and private genes reveals a conserved set likely involved in essential functions, whereas variable genes may contribute to local adaptation. Ka/Ks analysis identified positive selection only in a few genes, often those with recent duplications, supporting neofunctionalization. The widespread presence of transposable elements (68.8% of members) suggests that TE insertion is a common, ongoing process that may generate regulatory diversity without being strongly counter-selected. Expression profiling under temperature stress further highlighted functional divergence: cold stress induced gradual upregulation, while heat stress caused downregulation of most genes. Notably, the nuclear-localized DofERF109_2 negatively regulated anthocyanin biosynthesis by suppressing DofCHI expression and reducing pigment accumulation. Together, our results provide a comprehensive pan-genome resource of AP2/ERFs in D. officinale and identify DofERF109_2 as a candidate negative regulator of anthocyanin synthesis.

Additional Links: PMID-42359407

Citation:

@article {pmid42359407,
year = {2026},
author = {Zhang, Z and Hou, Z and Zhu, S and Wei, S and Li, S and Xue, Q and Liu, W and Ding, X and Niu, Z},
title = {Identification and analysis of the AP2/ERF gene family in Dendrobium officinale based on pan-genome and functional characterization of DofERF109_2.},
journal = {Frontiers in plant science},
volume = {17},
number = {},
pages = {1834268},
pmid = {42359407},
issn = {1664-462X},
abstract = {INTRODUCTION: AP2/ERF transcription factors are key regulators of plant stress responses and developmental processes. Despite their functional significance, limited research has focused on this gene family in the medicinal orchid Dendrobium officinale.

METHODS: Based on the pangenome data of seven Dendrobium officinale individuals from different habitats, we performed a pangenome family analysis of AP2/ERF, including analyses of presence-absence variation (PAV), selection pressure, transposable elements, etc., and conducted functional validation of the screened key members.

RESULTS: A total of 101, 76, 113, 123, 113, 105 and 113 AP2/ERF genes were identified in the seven Dendrobium officinale individuals, respectively. PAV analysis classified the non-redundant members into core (29), softcore (28), dispensable (17) and private (3) genes. Compared with Arabidopsis thaliana, D. officinale AP2/ERFs exhibited a significant evolutionary contraction, although some genes underwent duplication. Most genes experienced negative selection, while a few showed positive selection. Cold and heat stress induced differential expression patterns; genes with stable expression were predominantly core or softcore members. The candidate DofERF109_2 localized to the nucleus. Its transient expression suppressed anthocyanin accumulation in tobacco and downregulated the key enzyme gene DofCHI in D. officinale.

DISCUSSION: Our pan-genome analysis demonstrates that the AP2/ERF family in D. officinale has undergone significant evolutionary contraction compared with Arabidopsis, suggesting lineage-specific gene loss or rapid divergence in orchids. Despite this overall contraction, lineage-specific duplications (e.g., ERF109 and RAP2.11) were observed, which may provide raw material for adaptive evolution. The classification into core, softcore, dispensable and private genes reveals a conserved set likely involved in essential functions, whereas variable genes may contribute to local adaptation. Ka/Ks analysis identified positive selection only in a few genes, often those with recent duplications, supporting neofunctionalization. The widespread presence of transposable elements (68.8% of members) suggests that TE insertion is a common, ongoing process that may generate regulatory diversity without being strongly counter-selected. Expression profiling under temperature stress further highlighted functional divergence: cold stress induced gradual upregulation, while heat stress caused downregulation of most genes. Notably, the nuclear-localized DofERF109_2 negatively regulated anthocyanin biosynthesis by suppressing DofCHI expression and reducing pigment accumulation. Together, our results provide a comprehensive pan-genome resource of AP2/ERFs in D. officinale and identify DofERF109_2 as a candidate negative regulator of anthocyanin synthesis.},
}

RevDate: 2026-06-26
CmpDate: 2026-06-26

Quan W, Zhang T, Liu CL, et al (2026)

Whole-genome and Comparative Genomic Analysis Reveal the Biocontrol and Plant Growth-Promoting Potential of Bacillus velezensis JN.Y2 Against Oat Anthracnose.

Functional & integrative genomics, 26(1):.

Oat anthracnose, primarily caused by Colletotrichum cereale, represents a significant threat to oat production, necessitating the development of sustainable biocontrol alternatives. In this study, we characterized an oat endophytic bacterium, Bacillus velezensis JN.Y2, isolated from healthy oat leaves in Inner Mongolia. In vitro assays demonstrated that B. velezensis JN.Y2 (designated as Bv. JN.Y2) possesses an inhibitory activity against C. cereale (74.49%) and exhibits a broad antifungal spectrum. Greenhouse and three-year multi-location field trials confirmed its beneficial biocontrol efficacy, which remained stable even under fluctuating climatic conditions and high disease pressure, consistently outperforming conventional chemical treatments. Beyond disease suppression, Bv. JN.Y2 significantly enhanced oat growth and grain yield, supported by its ability to produce IAA, solubilize nutrients, and secrete diverse hydrolytic enzymes. Complete genome sequencing revealed a 3.87 Mb circular chromosome containing 13 secondary metabolite BGCs and 4 AOIs for RiPPs, including Amylocyclicin and LCI. Comparative genomic analysis highlighted an "open" pangenome and identified 411 unique genes associated with specialized metabolism and environmental sensing. While Bv. JN.Y2 shares high sequence synteny with B. velezensis CBMB205, distinct variations in the sporulation kinase kinA and secondary metabolite pathways suggest a fine-tuned adaptation to the oat endosphere. Furthermore, biosafety evaluations confirmed a relatively high level of genetic stability and a lack of active antibiotic resistance. Collectively, these findings provide a beneficial molecular foundation for the application of Bv. JN.Y2 as a reliable and secure biocontrol agent in sustainable agriculture.

Additional Links: PMID-42360518

Citation:

@article {pmid42360518,
year = {2026},
author = {Quan, W and Zhang, T and Liu, CL and Shi, SX and Zhang, XX and Liu, KH and Wang, CY and Zhao, MM and Dong, BZ and Zhou, HY},
title = {Whole-genome and Comparative Genomic Analysis Reveal the Biocontrol and Plant Growth-Promoting Potential of Bacillus velezensis JN.Y2 Against Oat Anthracnose.},
journal = {Functional & integrative genomics},
volume = {26},
number = {1},
pages = {},
pmid = {42360518},
issn = {1438-7948},
support = {2025YFHH0165//Science and Technology Program of Inner Mongolia Autonomous Region/ ; 2022ZY0060//Central Government-Guided Local Science and Technology Development Fund/ ; BR251033//Basic Research Fund for Universities Directly Affiliated with Inner Mongolia Autonomous Region/ ; 2023YFD1600701-5//National Key Research and Development Program of China/ ; CARS-07-C-3//China Agriculture Research System for Oat and Buckwheat/ ; },
mesh = {*Bacillus/genetics/physiology/metabolism ; *Avena/microbiology/growth & development ; *Genome, Bacterial ; *Plant Diseases/microbiology/prevention & control ; *Colletotrichum/pathogenicity ; },
abstract = {Oat anthracnose, primarily caused by Colletotrichum cereale, represents a significant threat to oat production, necessitating the development of sustainable biocontrol alternatives. In this study, we characterized an oat endophytic bacterium, Bacillus velezensis JN.Y2, isolated from healthy oat leaves in Inner Mongolia. In vitro assays demonstrated that B. velezensis JN.Y2 (designated as Bv. JN.Y2) possesses an inhibitory activity against C. cereale (74.49%) and exhibits a broad antifungal spectrum. Greenhouse and three-year multi-location field trials confirmed its beneficial biocontrol efficacy, which remained stable even under fluctuating climatic conditions and high disease pressure, consistently outperforming conventional chemical treatments. Beyond disease suppression, Bv. JN.Y2 significantly enhanced oat growth and grain yield, supported by its ability to produce IAA, solubilize nutrients, and secrete diverse hydrolytic enzymes. Complete genome sequencing revealed a 3.87 Mb circular chromosome containing 13 secondary metabolite BGCs and 4 AOIs for RiPPs, including Amylocyclicin and LCI. Comparative genomic analysis highlighted an "open" pangenome and identified 411 unique genes associated with specialized metabolism and environmental sensing. While Bv. JN.Y2 shares high sequence synteny with B. velezensis CBMB205, distinct variations in the sporulation kinase kinA and secondary metabolite pathways suggest a fine-tuned adaptation to the oat endosphere. Furthermore, biosafety evaluations confirmed a relatively high level of genetic stability and a lack of active antibiotic resistance. Collectively, these findings provide a beneficial molecular foundation for the application of Bv. JN.Y2 as a reliable and secure biocontrol agent in sustainable agriculture.},
}

MeSH Terms:

*Bacillus/genetics/physiology/metabolism
*Avena/microbiology/growth & development
*Genome, Bacterial
*Plant Diseases/microbiology/prevention & control
*Colletotrichum/pathogenicity

RevDate: 2026-06-26

Sabbagh Q, Gilissen C, Yntema HG, et al (2026)

Near-perfect genome sequencing in medical genetics.

Nature genetics [Epub ahead of print].

Medical genetics currently operates through a fragmented diagnostic cascade built around short-read sequencing technologies that carry well-documented blind spots, including regions of high sequence homology, tandem repeats and segmental duplications, as well as large or complex structural variants, invisible base modifications and a lack of variant phasing. We propose that long-read genome sequencing should be considered as one pillar of a broader technological convergence encompassing diploid genome assembly, pangenome references and artificial intelligence-driven variant interpretation, termed near-perfect genome sequencing (NPGS). We further propose a Bayesian framework in which genomic completeness itself constitutes interpretive evidence for variant classification. This principle has direct implications for the interpretation of variants of uncertain significance in clinical practice. We highlight the potential of NPGS across postnatal, prenatal and oncological settings and outline a staged implementation roadmap toward the one-test paradigm. We also address real-world implementation challenges, including cost, computational demand, equity and ethical considerations.

Additional Links: PMID-42362790

Citation:

@article {pmid42362790,
year = {2026},
author = {Sabbagh, Q and Gilissen, C and Yntema, HG and Vissers, LELM and Hoischen, A},
title = {Near-perfect genome sequencing in medical genetics.},
journal = {Nature genetics},
volume = {},
number = {},
pages = {},
pmid = {42362790},
issn = {1546-1718},
abstract = {Medical genetics currently operates through a fragmented diagnostic cascade built around short-read sequencing technologies that carry well-documented blind spots, including regions of high sequence homology, tandem repeats and segmental duplications, as well as large or complex structural variants, invisible base modifications and a lack of variant phasing. We propose that long-read genome sequencing should be considered as one pillar of a broader technological convergence encompassing diploid genome assembly, pangenome references and artificial intelligence-driven variant interpretation, termed near-perfect genome sequencing (NPGS). We further propose a Bayesian framework in which genomic completeness itself constitutes interpretive evidence for variant classification. This principle has direct implications for the interpretation of variants of uncertain significance in clinical practice. We highlight the potential of NPGS across postnatal, prenatal and oncological settings and outline a staged implementation roadmap toward the one-test paradigm. We also address real-world implementation challenges, including cost, computational demand, equity and ethical considerations.},
}

RevDate: 2026-06-25
CmpDate: 2026-06-25

Sawai K, Ikai M, Shinohara M, et al (2026)

Integrated analysis of transposon insertion sequencing and pangenome reveals core and lineage-specific essential genes in Mycobacterium avium subsp. hominissuis.

Microbial genomics, 12(6):.

Pulmonary disease caused by non-tuberculous mycobacteria (NTM-PD) is an emerging global health concern. Among NTM, Mycobacterium avium subsp. hominissuis (MAH) is the major causative agent of NTM-PD. Similar to Mycobacterium tuberculosis, MAH exhibits lineage-specific geographical distributions and host adaptations. Here, we characterized three MAH strains from the residential bathrooms of Mycobacterium avium complex pulmonary disease patients in Japan. A genetic population clustering analysis revealed that the three strains belong to the East Asia (EA) lineages that are predominant in Japan and Korea. Pan-genome analysis using the publicly available complete genome sequences of MAH and the newly sequenced MAH strains identified 3,313 core genes that are conserved among distinct MAH lineages. Identification of essential genes in the three strains was conducted using transposon insertion sequencing, and their gene essentiality profiles were compared to those of a previously studied sequence cluster 3 (SC3) lineage strain, MAC109. Despite their genetic diversity, nearly all essential genes were derived from the core gene set. In addition, we identified a set of common essential genes for the EA and SC3 lineages, as well as lineage-specific essential genes. Our results highlight the evolutionary and clinical importance of lineage-specific adaptations in MAH.

Additional Links: PMID-42348439

Citation:

@article {pmid42348439,
year = {2026},
author = {Sawai, K and Ikai, M and Shinohara, M and Nishiuchi, Y and Fujiyoshi, S and Doi, Y and Iwamoto, T and Arikawa, K and Maruyama, F and Minato, Y},
title = {Integrated analysis of transposon insertion sequencing and pangenome reveals core and lineage-specific essential genes in Mycobacterium avium subsp. hominissuis.},
journal = {Microbial genomics},
volume = {12},
number = {6},
pages = {},
pmid = {42348439},
issn = {2057-5858},
mesh = {*DNA Transposable Elements ; *Genes, Essential ; Genome, Bacterial ; Phylogeny ; Humans ; Japan ; Mutagenesis, Insertional ; Genetic Variation ; *Mycobacterium/genetics/classification ; },
abstract = {Pulmonary disease caused by non-tuberculous mycobacteria (NTM-PD) is an emerging global health concern. Among NTM, Mycobacterium avium subsp. hominissuis (MAH) is the major causative agent of NTM-PD. Similar to Mycobacterium tuberculosis, MAH exhibits lineage-specific geographical distributions and host adaptations. Here, we characterized three MAH strains from the residential bathrooms of Mycobacterium avium complex pulmonary disease patients in Japan. A genetic population clustering analysis revealed that the three strains belong to the East Asia (EA) lineages that are predominant in Japan and Korea. Pan-genome analysis using the publicly available complete genome sequences of MAH and the newly sequenced MAH strains identified 3,313 core genes that are conserved among distinct MAH lineages. Identification of essential genes in the three strains was conducted using transposon insertion sequencing, and their gene essentiality profiles were compared to those of a previously studied sequence cluster 3 (SC3) lineage strain, MAC109. Despite their genetic diversity, nearly all essential genes were derived from the core gene set. In addition, we identified a set of common essential genes for the EA and SC3 lineages, as well as lineage-specific essential genes. Our results highlight the evolutionary and clinical importance of lineage-specific adaptations in MAH.},
}

MeSH Terms:

*DNA Transposable Elements
*Genes, Essential
Genome, Bacterial
Phylogeny
Humans
Japan
Mutagenesis, Insertional
Genetic Variation
*Mycobacterium/genetics/classification

RevDate: 2026-06-25

Schuler F, Grebe T, Schwierzeck V, et al (2026)

Genomic analysis reveals limited pathogen-specific factors for Staphylococcus aureus translocation from blood-to-urine in patients with bacteremia: A cross-sectional study.

International journal of infectious diseases : IJID : official publication of the International Society for Infectious Diseases pii:S1201-9712(26)00563-1 [Epub ahead of print].

OBJECTIVES: Up to 40% of patients with a Staphylococcus aureus bacteremia (SAB) have concomitant asymptomatic S. aureus bacteriuria (SABU). It remains unclear whether pathogen-related mechanisms contribute to SABU secondary to SAB. We compared genomic characteristics of S. aureus from SAB patients with and without SABU.

METHODS: In this cross-sectional study (2020-2023) SAB patients were screened for SABU. Genetic relatedness between blood and urine isolates was assessed by core genome multilocus sequence typing (cgMLST). Genomes were screened for virulence factors, and associations with SABU were analyzed. Pangenome-based and k-mer-based genome-wide association studies (GWAS) were performed.

RESULTS: Among 116 patients (median age: 66.8 years, 44 females), 32 (27.6%) had SABU. All paired blood and urine isolates were genetically identical or differed by ≤6 cgMLST alleles. The presence of set23 (encoding superantigen-like protein SSL8) was associated with SABU (univariate analysis), whereas sec independently predicted absence of SABU (multivariate analysis). Although pangenome-based GWAS identified 141 loci significantly associated with SABU (naïve p-value <0.05), none remained significant after correction for multiple testing. No significant associations were identified in k-mer-based GWAS.

CONCLUSIONS: Blood and urine isolates were genetically highly related, supporting hematogenous blood-to-urine translocation. Comprehensive genomic analyses provided minimal evidence for pathogen-specific drivers of SABU, suggesting host factors are more important.

Additional Links: PMID-42349778

Citation:

@article {pmid42349778,
year = {2026},
author = {Schuler, F and Grebe, T and Schwierzeck, V and Kolte, B and Schaumburg, F},
title = {Genomic analysis reveals limited pathogen-specific factors for Staphylococcus aureus translocation from blood-to-urine in patients with bacteremia: A cross-sectional study.},
journal = {International journal of infectious diseases : IJID : official publication of the International Society for Infectious Diseases},
volume = {},
number = {},
pages = {108928},
doi = {10.1016/j.ijid.2026.108928},
pmid = {42349778},
issn = {1878-3511},
abstract = {OBJECTIVES: Up to 40% of patients with a Staphylococcus aureus bacteremia (SAB) have concomitant asymptomatic S. aureus bacteriuria (SABU). It remains unclear whether pathogen-related mechanisms contribute to SABU secondary to SAB. We compared genomic characteristics of S. aureus from SAB patients with and without SABU.

METHODS: In this cross-sectional study (2020-2023) SAB patients were screened for SABU. Genetic relatedness between blood and urine isolates was assessed by core genome multilocus sequence typing (cgMLST). Genomes were screened for virulence factors, and associations with SABU were analyzed. Pangenome-based and k-mer-based genome-wide association studies (GWAS) were performed.

RESULTS: Among 116 patients (median age: 66.8 years, 44 females), 32 (27.6%) had SABU. All paired blood and urine isolates were genetically identical or differed by ≤6 cgMLST alleles. The presence of set23 (encoding superantigen-like protein SSL8) was associated with SABU (univariate analysis), whereas sec independently predicted absence of SABU (multivariate analysis). Although pangenome-based GWAS identified 141 loci significantly associated with SABU (naïve p-value <0.05), none remained significant after correction for multiple testing. No significant associations were identified in k-mer-based GWAS.

CONCLUSIONS: Blood and urine isolates were genetically highly related, supporting hematogenous blood-to-urine translocation. Comprehensive genomic analyses provided minimal evidence for pathogen-specific drivers of SABU, suggesting host factors are more important.},
}

RevDate: 2026-06-25

Aditama R, Siregar HA, Tanjung ZA, et al (2026)

Graph-based pan-genome reveals structural and functional diversity across oil palm domestication gradients.

BMC plant biology pii:10.1186/s12870-026-09320-0 [Epub ahead of print].

BACKGROUND: Oil palm (Elaeis guineensis Jacq.), the world's most land-efficient oil crop, underpins global vegetable oil supply yet faces mounting constraints from limited expansion, climate stress, and disease pressure. These challenges highlight the urgent need for genomic resources that capture species-wide diversity to support sustainable improvement. While recent reference assemblies have advanced trait discovery, single linear genomes fail to represent the full spectrum of structural and gene-content variation, limiting resolution of agronomic alleles.

RESULTS: Here, we constructed a graph-based pan-genome from 30 diverse oil palm assemblies representing wild, semi-domesticated, and commercial accessions. We characterized structural variants, gene presence-absence variation, and copy-number gains, with a focus on functional stratification and resistance gene dynamics. The graph-based pan-genome revealed extensive structural and gene-content variation, including a large conserved core, complemented by shell and unique fractions enriched or biased toward regulatory, stress-responsive, and defense-related functions. Structural variation and duplication-derived copy-number gains contributed substantially to gene-content diversity, with semi-domesticated accessions exhibiting the greatest variability. Resistance gene repertoires showed contrasting patterns: receptor-like kinases remained comparatively stable, whereas the CNL subclass of NLR genes contributed disproportionately to shell-genome variation and duplication-associated turnover.

CONCLUSIONS: This graph-based pan-genome provides a curated multi-assembly reference and comparative framework for oil palm genomics. By capturing structural variants, gene-content variations, copy-number gains, and resistance gene dynamics across domestication gradients, it establishes a foundation for future pan-GWAS analysis, functional genomics, and molecular breeding strategies aimed at improving resilience and productivity in this globally important crop.

Additional Links: PMID-42350953

Citation:

@article {pmid42350953,
year = {2026},
author = {Aditama, R and Siregar, HA and Tanjung, ZA and Dinarti, D and Ardie, SW and Suwarno, WB and Liwang, T and Suprianto, E and Romero, HM and Utomo, C and Sudarsono, S},
title = {Graph-based pan-genome reveals structural and functional diversity across oil palm domestication gradients.},
journal = {BMC plant biology},
volume = {},
number = {},
pages = {},
doi = {10.1186/s12870-026-09320-0},
pmid = {42350953},
issn = {1471-2229},
support = {GRS_20220227110350//Indonesian Plantation Fund Management Agency (BPDP)/ ; },
abstract = {BACKGROUND: Oil palm (Elaeis guineensis Jacq.), the world's most land-efficient oil crop, underpins global vegetable oil supply yet faces mounting constraints from limited expansion, climate stress, and disease pressure. These challenges highlight the urgent need for genomic resources that capture species-wide diversity to support sustainable improvement. While recent reference assemblies have advanced trait discovery, single linear genomes fail to represent the full spectrum of structural and gene-content variation, limiting resolution of agronomic alleles.

RESULTS: Here, we constructed a graph-based pan-genome from 30 diverse oil palm assemblies representing wild, semi-domesticated, and commercial accessions. We characterized structural variants, gene presence-absence variation, and copy-number gains, with a focus on functional stratification and resistance gene dynamics. The graph-based pan-genome revealed extensive structural and gene-content variation, including a large conserved core, complemented by shell and unique fractions enriched or biased toward regulatory, stress-responsive, and defense-related functions. Structural variation and duplication-derived copy-number gains contributed substantially to gene-content diversity, with semi-domesticated accessions exhibiting the greatest variability. Resistance gene repertoires showed contrasting patterns: receptor-like kinases remained comparatively stable, whereas the CNL subclass of NLR genes contributed disproportionately to shell-genome variation and duplication-associated turnover.

CONCLUSIONS: This graph-based pan-genome provides a curated multi-assembly reference and comparative framework for oil palm genomics. By capturing structural variants, gene-content variations, copy-number gains, and resistance gene dynamics across domestication gradients, it establishes a foundation for future pan-GWAS analysis, functional genomics, and molecular breeding strategies aimed at improving resilience and productivity in this globally important crop.},
}

RevDate: 2026-06-26
CmpDate: 2026-06-26

Alhashel AF, Almasrahi AA, Alsaleh MA, et al (2026)

Orthogroup-Based Comparative Analysis of Prophage Gene Content in Candidatus Liberibacter Asiaticus Supports a Predominantly Conserved Global Repertoire with Limited Accessory Variation.

International journal of molecular sciences, 27(12): pii:ijms27125638.

Huanglongbing, a destructive citrus disease of global importance that is also present in Saudi Arabia, is associated with Candidatus Liberibacter asiaticus (CLas) and remains a major threat to citrus production. Although previous studies have documented sequence variation and prophage polymorphism in CLas, broader comparisons of prophage-associated gene content remain limited. In particular, comparative orthogroup analysis of prophage gene-content conservation across geographically structured CLas populations has rarely been explored. In this study, we analyzed 42 CLas prophage genomes from Saudi Arabia and other geographic regions using a comparative orthogroup framework. OrthoFinder assigned 99.1% of predicted proteins (1825 of 1841) to 64 orthogroups, with only 16 genes remaining unassigned. A small number of rare orthogroups restricted to only a few genomes were identified, and no orthogroup was detected in all genomes. Presence-absence analyses supported a predominantly conserved prophage gene repertoire together with a small accessory component, while also indicating that apparent absences should be interpreted in light of mixed assembly status and prophage-region completeness. Saudi Arabian genomes were distributed within the broader global framework and exhibited generally similar gene-content profiles rather than a deeply separated lineage. Functional interpretation of representative orthogroups identified conserved prophage-associated genes related to replication, helicase activity, and phage packaging, whereas variable orthogroups were primarily associated with hypothetical or accessory prophage-related functions. Overall, these results are consistent with a model in which CLas prophage diversification is associated more with sequence-level variation and localized structural differences than with extensive gain or loss of prophage genes. 
These findings further refine current understanding of CLas genome evolution and highlight conserved prophage-associated targets that may support molecular diagnostics and epidemiological surveillance.

Additional Links: PMID-42353352

Citation:

@article {pmid42353352,
year = {2026},
author = {Alhashel, AF and Almasrahi, AA and Alsaleh, MA and Widyawan, A and El-Komy, MH and Ibrahim, YE},
title = {Orthogroup-Based Comparative Analysis of Prophage Gene Content in Candidatus Liberibacter Asiaticus Supports a Predominantly Conserved Global Repertoire with Limited Accessory Variation.},
journal = {International journal of molecular sciences},
volume = {27},
number = {12},
pages = {},
doi = {10.3390/ijms27125638},
pmid = {42353352},
issn = {1422-0067},
support = {ORF-2026-2043//King Saud University/ ; },
mesh = {*Prophages/genetics ; Phylogeny ; Citrus/microbiology ; *Rhizobiaceae/genetics/virology ; Genome, Bacterial ; Genetic Variation ; Plant Diseases/microbiology ; Saudi Arabia ; Liberibacter ; },
abstract = {Huanglongbing, a destructive citrus disease of global importance that is also present in Saudi Arabia, is associated with Candidatus Liberibacter asiaticus (CLas) and remains a major threat to citrus production. Although previous studies have documented sequence variation and prophage polymorphism in CLas, broader comparisons of prophage-associated gene content remain limited. In particular, comparative orthogroup analysis of prophage gene-content conservation across geographically structured CLas populations has rarely been explored. In this study, we analyzed 42 CLas prophage genomes from Saudi Arabia and other geographic regions using a comparative orthogroup framework. OrthoFinder assigned 99.1% of predicted proteins (1825 of 1841) to 64 orthogroups, with only 16 genes remaining unassigned. A small number of rare orthogroups restricted to only a few genomes were identified, and no orthogroup was detected in all genomes. Presence-absence analyses supported a predominantly conserved prophage gene repertoire together with a small accessory component, while also indicating that apparent absences should be interpreted in light of mixed assembly status and prophage-region completeness. Saudi Arabian genomes were distributed within the broader global framework and exhibited generally similar gene-content profiles rather than a deeply separated lineage. Functional interpretation of representative orthogroups identified conserved prophage-associated genes related to replication, helicase activity, and phage packaging, whereas variable orthogroups were primarily associated with hypothetical or accessory prophage-related functions. Overall, these results are consistent with a model in which CLas prophage diversification is associated more with sequence-level variation and localized structural differences than with extensive gain or loss of prophage genes. 
These findings further refine current understanding of CLas genome evolution and highlight conserved prophage-associated targets that may support molecular diagnostics and epidemiological surveillance.},
}

MeSH Terms:

*Prophages/genetics
Phylogeny
Citrus/microbiology
*Rhizobiaceae/genetics/virology
Genome, Bacterial
Genetic Variation
Plant Diseases/microbiology
Saudi Arabia
Liberibacter

RevDate: 2026-06-26
CmpDate: 2026-06-26

Zhu Y, Yang H, Tang R, et al (2026)

Comparative Genomics of Fermented Vegetable-Derived Leuconostoc mesenteroides from Biodiversity Hotspot Yunnan, China.

Microorganisms, 14(6): pii:microorganisms14061350.

Fermented vegetables in Yunnan Province, China, harbor abundant microbial diversity. However, the development of indigenous starter cultures remains under-utilized. Genomic information regarding Leuconostoc (L.) mesenteroides isolates from this region is particularly scarce. To assess the genomic characteristics of eight L. mesenteroides isolates from traditional Yunnan fermented vegetables, we performed whole-genome sequencing and conducted a comparative analysis with 21 publicly available vegetable-derived genomes. Comparative genomic analysis revealed marked variation in genome size and plasmid content, and pangenome analysis indicated an open configuration. Core-genome multilocus sequence typing (cgMLST) of the eight indigenous isolates showed high allelic diversity, indicating a genetically heterogeneous and non-clonal population. Phylogenomic analysis revealed that the evolutionary relationships among the 29 strains were not strictly correlated with their vegetable sources, suggesting an influence from other factors, such as geographic origin and region-specific processing methods. Similar to the profiles of the 21 publicly available genomes, inactive prophages, intrinsic vancomycin resistance genes, and genomic island fragments were detected in eight isolates, whereas no known virulence genes were identified. Bacteriocin gene clusters varied among strains, while stress tolerance and probiotic-related genes were conserved. Overall, these results provide genomic indications relevant to the safety, adaptability, and fermentation potential of indigenous L. mesenteroides from Yunnan. However, because these functional traits are inferred solely from genomic predictions, subsequent experimental validation is essential to confirm their phenotypic properties and technological efficacy.

Additional Links: PMID-42354974
