Bibliography on: Pangenome

The Electronic Scholarly Publishing Project: Providing world-wide, free access to classic scientific papers and other scholarly materials, since 1993.


ESP: PubMed Auto Bibliography. Created: 13 Oct 2024 at 01:32

Pangenome

Although the enforced stability of genomic content is ubiquitous among multicellular eukaryotes (MCEs), the opposite is proving to be the case among prokaryotes, which exhibit remarkable and adaptive plasticity of genomic content. Early bacterial whole-genome sequencing efforts discovered that whenever a particular "species" was re-sequenced, new genes were found that had not been detected earlier: entirely new genes, not merely new alleles. This led to the concepts of the bacterial core-genome, the set of genes found in all members of a particular "species", and the flex-genome, the set of genes found in some, but not all, members of the "species". Together these make up the species' pan-genome.
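The core/flex/pan distinction above can be illustrated with plain set operations. The following is a minimal sketch, assuming each genome has already been reduced to a set of gene-family names; the strain and gene names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (core, flex, pan) gene sets for a collection of genomes.

    core: genes present in every genome
    flex: genes present in some, but not all, genomes
    pan:  union of all genes
    """
    if not genomes:
        raise ValueError("need at least one genome")
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    flex = pan - core
    return core, flex, pan


# Hypothetical strains and gene names, for illustration only.
strains = {
    "strain_A": {"dnaA", "gyrB", "recA", "blaTEM"},
    "strain_B": {"dnaA", "gyrB", "recA", "tetM"},
    "strain_C": {"dnaA", "gyrB", "recA", "blaTEM", "ermB"},
}

core, flex, pan = partition_pangenome(strains)
print(sorted(core))  # ['dnaA', 'gyrB', 'recA']
print(sorted(flex))  # ['blaTEM', 'ermB', 'tetM']
print(len(pan))      # 6
```

Adding a further strain can only shrink the core set or leave it unchanged, while the pan set can only grow, which is the behavior the re-sequencing observation describes.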

Created with PubMed® Query: ( pangenome OR "pan-genome" OR "pan genome" ) NOT pmcbook NOT ispreviousversion

Citations: The Papers (from PubMed®)

RevDate: 2024-10-11

Naser-Khdour S, Scheuber F, Fields PD, et al (2024)

The Evolution of Extreme Genetic Variability in a Parasite-Resistance Complex.

Genome biology and evolution pii:7818197 [Epub ahead of print].

Genomic regions that play a role in parasite defense are often found to be highly variable, with the MHC serving as an iconic example. Single nucleotide polymorphisms may represent only a small portion of this variability, with indel polymorphisms and copy number variation further contributing. In extreme cases, haplotypes may no longer be recognized as orthologous. Understanding the evolution of such highly divergent regions is challenging because the most extreme variation is not visible using reference-assisted genomic approaches. Here we analyze the case of the Pasteuria Resistance Complex (PRC) in the crustacean Daphnia magna, a defense complex in the host against the common and virulent bacterium Pasteuria ramosa. Two haplotypes of this region have been previously described, with parts of it being non-homologous, and the region has been shown to be under balancing selection. We used pan-genome analysis and tree reconciliation methods to explore the evolution of the PRC and its characteristics within and between species of Daphnia and other cladoceran species. Our analysis revealed remarkable diversity in this region even among host species, with many non-homologous, hyper-divergent haplotypes. The PRC is characterized by extensive duplication and losses of fucosyltransferase (FuT) and galactosyltransferase (GalT) genes that are believed to play a role in parasite defense. The PRC region can be traced back to common ancestors over 250 million years. The unique combination of an ancient resistance complex and a dynamic, hyper-divergent genomic environment presents a fascinating opportunity to investigate the role of such regions in the evolution and long-term maintenance of resistance polymorphisms. Our findings offer valuable insights into the evolutionary forces shaping disease resistance and adaptation, not only in the genus Daphnia, but potentially across the entire Cladocera class.

RevDate: 2024-10-11

Ferro E, Oliva M, Gagie T, et al (2024)

Building a pangenome alignment index via recursive prefix-free parsing.

iScience, 27(10):110933.

Pangenomics alignment offers a solution to reduce bias in biomedical research. Traditionally, short-read aligners like Bowtie and BWA indexed a single reference genome to find approximate alignments. These methods, limited by linear-memory requirements, can only index a few genomes. Emerging pangenome aligners, such as VG, Giraffe, and Moni, address this by indexing more genomes. VG and Giraffe use a variation graph, while Moni indexes sequences accounting for repetition using prefix-free parsing to build a dictionary and parse. The main challenge is the parse's size, which becomes significantly larger than the dictionary. To scale Moni, we propose removing the parse from the construction of the run-length encoded BWT (RLBWT), suffix array, and Longest Common Prefix (LCP) by applying prefix-free parsing recursively. This approach improves construction time and memory requirements, enabling efficient construction of RLBWT, suffix array, and LCP for large pangenomes, such as those from the Human Pangenome Reference Consortium.
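To make the run-length encoded BWT (RLBWT) mentioned in this abstract concrete, here is a toy sketch. It builds the BWT naively by sorting rotations, which is not the recursive prefix-free parsing construction the paper proposes and would not scale to real pangenomes; the sequences are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def bwt(text: str, terminator: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    if terminator in text:
        raise ValueError("terminator must not occur in text")
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s: str) -> List[Tuple[str, int]]:
    """Collapse consecutive equal characters into (character, count) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


# Hypothetical toy collection: three copies of a sequence plus one variant.
genome = "ACGTTGCAAC"
variant = "ACGTTGCTAC"
collection = genome * 3 + variant

transformed = bwt(collection)
runs = run_length_encode(transformed)
print(len(transformed), "characters in", len(runs), "runs")
```

Because the BWT groups characters that precede similar contexts, highly repetitive collections tend to produce few runs relative to their length, which is why run-length encoding suits pangenome indexes.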

RevDate: 2024-10-11

Gabory E, Mwaniki MN, Pisanti N, et al (2024)

Pangenome comparison via ED strings.

Frontiers in bioinformatics, 4:1397036.

INTRODUCTION: An elastic-degenerate (ED) string is a sequence of sets of strings. It can also be seen as a directed acyclic graph whose edges are labeled by strings. The notion of ED strings was introduced as a simple alternative to variation and sequence graphs for representing a pangenome, that is, a collection of genomic sequences to be analyzed jointly or to be used as a reference.

METHODS: In this study, we define notions of matching statistics of two ED strings as similarity measures between pangenomes and consequently infer a corresponding distance measure. We then show that both measures can be computed efficiently, in both theory and practice, by employing the intersection graph of two ED strings.

RESULTS: We also implemented our methods as a software tool for pangenome comparison and evaluated their efficiency and effectiveness using both synthetic and real datasets.

DISCUSSION: As for efficiency, we compare the runtime of the intersection graph method against the classic product automaton construction showing that the intersection graph is faster by up to one order of magnitude. For showing effectiveness, we used real SARS-CoV-2 datasets and our matching statistics similarity measure to reproduce a well-established clade classification of SARS-CoV-2, thus demonstrating that the classification obtained by our method is in accordance with the existing one.
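The definition of an ED string as a sequence of sets of strings can be written down directly. The sketch below is only an illustration of that definition: it enumerates every string an ED string spells by brute force, which is exponential and is not the intersection-graph method the abstract describes. The example sequences are hypothetical.

```python
from itertools import product
from typing import List, Set

# An ED string is modeled here as a list of sets of strings.
# A set may contain the empty string, which models a deletion.
EDString = List[Set[str]]


def language(ed: EDString) -> Set[str]:
    """All strings obtained by picking one string from each set, in order."""
    return {"".join(choice) for choice in product(*ed)}


def occurs(pattern: str, ed: EDString) -> bool:
    """True if the pattern is a substring of at least one spelled string."""
    return any(pattern in spelled for spelled in language(ed))


# Hypothetical ED string with one variable segment.
ed = [{"AC"}, {"G", "GT", ""}, {"A"}]

print(sorted(language(ed)))  # ['ACA', 'ACGA', 'ACGTA']
print(occurs("GTA", ed))     # True
print(occurs("TT", ed))      # False
```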

RevDate: 2024-10-11

Udaondo Z, Ramos JL, K Abram (2024)

Unraveling the Genomic Diversity of the Pseudomonas putida Group: Exploring Taxonomy, Core Pangenome, and Antibiotic Resistance Mechanisms.

FEMS microbiology reviews pii:7818139 [Epub ahead of print].

The genus Pseudomonas is characterized by its rich genetic diversity, with over 300 species having been validly recognized. This reflects significant progress made through sequencing and computational methods. The Pseudomonas putida group comprises highly adaptable species that thrive in diverse environments and play various ecological roles, from promoting plant growth to being pathogenic in immunocompromised individuals. By leveraging the GRUMPS computational pipeline, we scrutinized 26363 genomes labeled as Pseudomonas in NCBI GenBank, categorizing all Pseudomonas spp. genomes into 435 distinct species-level clusters or cliques. We identified 224 strains deposited under the taxonomic identifier "Pseudomonas putida" distributed within 31 of these species-level clusters, challenging prior classifications. Nine of these 31 cliques contained at least six genomes labeled as "Pseudomonas putida" and were analyzed in depth, particularly clique_1 (P. alloputida) and clique_2 (P. putida). Pangenomic analysis of a set of 413 P. putida group strains revealed over 2.2 million proteins and more than 77000 distinct protein families. The core genome of these 413 strains includes 2226 protein families involved in essential biological processes. Intraspecific genetic homogeneity was observed within each clique, each possessing a distinct genomic identity. These cliques exhibit distinct core genes and diverse subgroups, reflecting adaptation to specific environments. Contrary to traditional views, nosocomial infections by P. alloputida, P. putida, and P. monteilii have been reported, with strains showing varied antibiotic resistance profiles due to diverse mechanisms. This review enhances the taxonomic understanding of key P. putida group species using advanced population genomics approaches and provides a comprehensive understanding of their genetic diversity, ecological roles, interactions, and potential applications.

RevDate: 2024-10-10

Vaduva P, J Bertherat (2024)

The molecular genetics of adrenal cushing.

Hormones (Athens, Greece) [Epub ahead of print].

Adrenal Cushing represents 20% of cases of endogenous hypercorticism. Unilateral cortisol-producing adenoma (CPA), a benign tumor, and adrenocortical carcinoma (ACC), a malignant tumor, are more frequent than bilateral adrenal nodular diseases (primary bilateral macronodular adrenal hyperplasia (PBMAH) and primary pigmented nodular adrenal disease (PPNAD)). In cortisol-producing adrenal tumors, the signaling pathways mainly altered are the protein kinase A and Wnt/β-catenin pathways. Studying components of these pathways and exploring syndromic and familial cases of these tumors has historically enabled identification of many of the predisposing genes. More recently, pangenomic sequencing revealed alterations in sporadic tumors. In ACC, mainly due to TP53 alterations causing Li-Fraumeni syndrome, germline predisposition is frequent in children, while it is rare in adults. Pathogenic variants in the DNA mismatch repair genes MLH1, MSH2, MSH6, and PMS2, which cause Lynch syndrome or alterations of IGF2 and CDKN1C (11p15 locus) in Beckwith-Wiedemann syndrome, can also cause ACC. Rarely, ACC is described in other hereditary tumor syndromes due to germline pathogenic variants in MEN1 or APC and, in very rare cases, NF1, SDH, PRKAR1A, or BRCA2. Concerning ACC somatic alterations, TP53 and genetic or epigenetic alterations at the 11p15 locus are also frequently described, as well as CTNNB1 and ZNRF3 pathogenic variants. CPAs mainly harbor somatic pathogenic variants in PRKACA and CTNNB1 and, less frequently, PRKAR1A, PRKACB, or GNAS1 pathogenic variants. Isolated PBMAH is due to ARMC5 inactivating pathogenic variants in 20 to 25% of cases and to KDM1A pathogenic variants in food-dependent Cushing. Syndromic PBMAH may be due to germline pathogenic variants in MEN1, APC, or FH, causing type 1 multiple endocrine neoplasia, familial adenomatous polyposis, or hereditary leiomyomatosis-kidney cancer syndrome, respectively. PRKAR1A germline pathogenic variants are the main alteration causing PPNAD (isolated or part of Carney complex).

RevDate: 2024-10-10

Li W (2024)

Personalizing pangenome graphs with k-mers.

Nature genetics pii:10.1038/s41588-024-01954-w [Epub ahead of print].

RevDate: 2024-10-10

Huang P, Charton F, Schmelzle JM, et al (2024)

Pangenome-Informed Language Models for Privacy-Preserving Synthetic Genome Sequence Generation.

bioRxiv : the preprint server for biology pii:2024.09.18.612131.

The public availability of genome datasets, such as The Human Genome Project (HGP), The 1000 Genomes Project, The Cancer Genome Atlas, and the International HapMap Project, has significantly advanced scientific research and medical understanding. Here our goal is to share such genomic information for downstream analysis while protecting the privacy of individuals through Differential Privacy (DP). We introduce synthetic DNA data generation based on pangenomes in combination with Pretrained-Language Models (PTLMs). We introduce two novel tokenization schemes based on pangenome graphs to enhance the modeling of DNA. We evaluated these tokenization methods, and compared them with classical single nucleotide and k-mer tokenizations. We find k-mer tokenization schemes, indicating that our tokenization schemes boost the model's performance consistency with long effective context length (covering longer sequences with the same number of tokens). Additionally, we propose a method to utilize the pangenome graph and make it comply with DP privacy standards. We assess the performance of DP training on the quality of generated sequences with discussion of the trade-offs between privacy and model accuracy. The source code for our work will be published under a free and open source license soon.
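The two classical baselines named in this abstract, single-nucleotide and k-mer tokenization, can be sketched in a few lines. This is an illustration only: it uses non-overlapping k-mers with a shorter final token, which is one of several possible conventions, and it does not implement the pangenome-graph tokenizers the authors propose. The sequence is hypothetical.

```python
from typing import List


def nucleotide_tokens(seq: str) -> List[str]:
    """One token per base."""
    return list(seq)


def kmer_tokens(seq: str, k: int) -> List[str]:
    """Non-overlapping k-mers; the last token may be shorter than k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [seq[i:i + k] for i in range(0, len(seq), k)]


# Hypothetical sequence, for illustration only.
seq = "ACGTACGTAC"

print(nucleotide_tokens(seq))  # ['A', 'C', 'G', 'T', 'A', 'C', 'G', 'T', 'A', 'C']
print(kmer_tokens(seq, 3))     # ['ACG', 'TAC', 'GTA', 'C']
```

With k = 3 the same ten bases need four tokens instead of ten, which is the sense in which longer tokens cover longer sequences with the same number of tokens.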

RevDate: 2024-10-09
CmpDate: 2024-10-09

Ali R, Ali K, Aurongzeb M, et al (2024)

Characterization of meningitis-causing bacteria, with focus on genomic and pangenomic study of multi-drug resistant Streptococcus pneumoniae from cerebrospinal fluid.

Antonie van Leeuwenhoek, 118(1):16.

Streptococcus pneumoniae is a major cause of meningitis in underdeveloped countries with low vaccination rates and high antibiotic resistance. This study aimed to analyze 83 suspected meningitis patients in Karachi for the detection of S. pneumoniae, followed by its whole-genome sequencing and pan-genome analysis. Out of the 83 samples collected, 33 samples with altered physical (turbidity), cytological (white blood cell count) and biochemical (total protein and total glucose concentrations) parameters indicated potential meningitis cases, while these parameters were within normal healthy ranges in the remaining 50 samples. Latex particle agglutination (LPA) was performed on the 33 samples, revealing 20 positive cases of bacterial meningitis. The PCR and culturing methods revealed 5 S. pneumoniae isolates. Antibiotic susceptibility tests showed that one S. pneumoniae strain was resistant to erythromycin, levofloxacin, and tetracycline. Whole-genome sequencing of this resistant strain was performed and S. pneumoniae was confirmed with MLST analysis; the strain had a genome of > 2.3 Mb and a single repUS43 plasmid. In CARD analysis, the strain had tet(M), ermB, RlmA(II), patB, pmrA, and patA ARGs, which could provide resistance against tetracycline, macrolide, fluoroquinolone, and glycopeptide antibiotics. Phylogenetic analysis revealed that the isolate was closely related to strains from Hungary and the USA. Pan-genome analysis with 144 genome assemblies from the NCBI database showed that 1101 non-redundant core genes were shared between all strains. This study provides valuable insight into the prevalence and characterization of meningitis-causing bacteria in Karachi, Pakistan, with a prime focus on multi-drug resistant S. pneumoniae.

RevDate: 2024-10-08

Chu N, Liu TT, Zhang HL, et al (2024)

Complete genome sequences of two Pantoea stewartii strains ATCC 8199 from maize and PSCN1 from sugarcane.

BMC genomic data, 25(1):86.

OBJECTIVES: Pantoea stewartii (Ps) is the causal agent of bacterial disease in corn and various graminaceous plants. Ps has two subspecies, Pantoea stewartii subsp. stewartii (Pss) and Pantoea stewartii subsp. indologenes (Psi). This study presents two complete genomes of Ps strains: ATCC 8199, isolated from maize, and PSCN1, which causes bacterial wilt in sugarcane. The information from these two bacterial genomes will be helpful for taxonomic analysis of the genus Pantoea at the whole-genome level and for accurately discriminating the two subspecies, Pss and Psi.

DATA DESCRIPTION: The reference strain ATCC 8199, isolated from maize, was purchased from Beijing Biobw Biotechnology Co., Ltd. (China), and strain PSCN1 was isolated from sugarcane cultivar YZ08-1095 in Zhanjiang, Guangdong province of China. Two complete genomes were sequenced using Illumina HiSeq (second-generation) and Oxford Nanopore (third-generation) platforms. The genome of strain ATCC 8199 comprised 4.78 Mb with an average GC content of 54.03%, along with five plasmids, encoding a total of 4,846 genes with an average gene length of 827 bp. The genome of PSCN1 comprised 5.03 Mb with an average GC content of 53.78%, along with two plasmids, encoding a total of 4,725 genes with an average gene length of 913 bp. The bacterial pan-genome analysis showed that strain ATCC 8199 clustered into a subgroup with the Pss strain CCUG 26359 from the USA, while strain PSCN1 clustered into another subgroup with the Ps strain NRRLB-133 from the USA. These findings will serve as a useful resource for further analyses of the evolution of Ps strains and corresponding disease epidemiology worldwide.

RevDate: 2024-10-08

Cortinovis G, Vincenzi L, Anderson R, et al (2024)

Author Correction: Adaptive gene loss in the common bean pan-genome during range expansion and domestication.

Nature communications, 15(1):8715 pii:10.1038/s41467-024-52864-8.

RevDate: 2024-10-08

Liu D, Luo C, Dai R, et al (2024)

AMIR: a multi-omics data platform for Asteraceae plants genetics and breeding research.

Nucleic acids research pii:7815640 [Epub ahead of print].

As the largest family of dicotyledons, the Asteraceae family comprises a variety of economically important crops, ornamental plants and numerous medicinal herbs. Advancements in genomics and transcriptomics have revolutionized research in Asteraceae species, generating extensive omics data that necessitate an efficient platform for data integration and analysis. However, existing databases face challenges in mining genes with specific functions and supporting cross-species studies. To address these gaps, we introduce the Asteraceae Multi-omics Information Resource (AMIR; https://yanglab.hzau.edu.cn/AMIR/), a multi-omics hub for the Asteraceae plant community. AMIR integrates diverse omics data from 74 species, encompassing 132 genomes, 4 408 432 genes annotated across seven different perspectives, 3897 transcriptome sequencing samples spanning 131 organs, tissues and stimuli, 42 765 290 unique variants and 15 662 metabolites genes. Leveraging these data, AMIR establishes the first pan-genome, comparative genomics and transcriptome system for the Asteraceae family. Furthermore, AMIR offers user-friendly tools designed to facilitate extensive customized bioinformatics analyses. Two case studies demonstrate AMIR's capability to provide rapid, reproducible and reliable analysis results. In summary, by integrating multi-omics data of Asteraceae species and developing powerful analytical tools, AMIR significantly advances functional genomics research and contributes to breeding practices of Asteraceae.

RevDate: 2024-10-08

Zhang X, Zhou Y, Fu L, et al (2024)

WGS Analysis of Staphylococcus warneri Outbreak in a Neonatal Intensive Care Unit.

Infection and drug resistance, 17:4279-4289.

PURPOSE: Staphylococcus warneri is an opportunistic pathogen responsible for hospital-acquired infections (HAIs). The aim of this study was to describe an outbreak caused by S. warneri infection in a neonatal intensive care unit (NICU) and provide investigation, prevention and control strategies for this outbreak.

METHODS: We conducted an epidemiological investigation of the NICU S. warneri outbreak, involving seven neonates, staff, and environmental screening, to identify the source of infection. WGS analyses were performed on S. warneri isolates, including species identification, core genome single-nucleotide polymorphism (cgSNP) analysis, pan-genome analysis, and genetic characterization assessment of the prevalence of specific antibiotic resistance and virulence genes.

RESULTS: Eight S. warneri strains were isolated from this outbreak, with seven from neonates and one from the environment. Six clinical cases within three days in 2021 were linked to one strain isolated from environmental samples; isolates varied by 0-69 SNPs and were confirmed to be from an outbreak through WGS. Multiple infection prevention measures were implemented, including comprehensive environmental disinfection and stringent protocols, and all affected neonates were transferred to the isolation wards. Following these interventions, no further cases of S. warneri infections were observed. Furthermore, pan-genome analysis results suggested that S. warneri in humans may exhibit host specificity.

CONCLUSION: The investigation revealed, through WGS, that the outbreak was linked to the milk preparation workbench. A stronger focus on environmental disinfection management is recommended in order to raise awareness and to improve the identification and prevention of healthcare-associated infections linked to the hospital environment.
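The pairwise SNP differences reported in this abstract (isolates varying by 0-69 SNPs) are, at their simplest, counts of mismatched positions between aligned sequences. The sketch below assumes sequences that are already aligned to equal length and skips gap and ambiguous positions; the isolate names and sequences are hypothetical, and real cgSNP pipelines involve variant calling and filtering steps not shown here.

```python
from itertools import combinations
from typing import Dict, Tuple


def snp_distance(a: str, b: str, ignore: str = "-N") -> int:
    """Count differing positions, skipping gaps and ambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a, b)
        if x != y and x not in ignore and y not in ignore
    )


def pairwise_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """SNP distance for every unordered pair of isolates."""
    return {
        (p, q): snp_distance(alignment[p], alignment[q])
        for p, q in combinations(sorted(alignment), 2)
    }


# Hypothetical aligned core-genome fragments, for illustration only.
alignment = {
    "neonate_1": "ACGTACGTAC",
    "neonate_2": "ACGTACGTAC",
    "workbench": "ACGAACGTNC",
}

for pair, distance in pairwise_distances(alignment).items():
    print(pair, distance)
```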

RevDate: 2024-10-08

Du Y, Qian C, Li X, et al (2024)

Unveiling intraspecific diversity and evolutionary dynamics of the foodborne pathogen Bacillus paranthracis through high-quality pan-genome analysis.

Current research in food science, 9:100867.

Understanding the evolutionary dynamics of foodborne pathogens throughout host-associated habitats is of utmost importance. Bacterial pan-genomes, as dynamic entities, are strongly influenced by ecological lifestyles. As a phenotypically diverse species in the Bacillus cereus group, Bacillus paranthracis is recognized as an emerging foodborne pathogen and a probiotic simultaneously. This poorly understood species is a suitable study model for adaptive pan-genome evolution. In this study, we determined the biogeographic distribution, abundance, genetic diversity, and genotypic profiles of key genetic elements of B. paranthracis. Metagenomic read recruitment analyses demonstrated that B. paranthracis members are globally distributed and abundant in host-associated habitats. A high-quality pan-genome of B. paranthracis was subsequently constructed to analyze the evolutionary dynamics involved in ecological adaptation comprehensively. The open pan-genome indicated a flexible gene repertoire with extensive genetic diversity. Significant divergences in the phylogenetic relationships, functional enrichment, and degree of selective pressure between the different components demonstrated different evolutionary dynamics between the core and accessory genomes driven by ecological forces. Purifying selection and gene loss are the main signatures of evolutionary dynamics in the B. paranthracis pan-genome. The plasticity of the accessory genome is characterized by horizontal gene transfer (HGT), massive gene losses, and weak purifying or positive selection, which might contribute to niche-specific adaptation. In contrast, although the core genome dominantly undergoes purifying selection, its association with HGT and positively selected mutations indicates its potential role in ecological diversification. Furthermore, host fitness-related dynamics are characterized by the loss of secondary metabolite biosynthesis gene clusters (BGCs) and CAZyme-encoding genes and the acquisition of antimicrobial resistance (AMR) and virulence genes via HGT. This study offers a case study of pan-genome evolution to investigate the ecological adaptations reflected by biogeographical characteristics, thereby advancing the understanding of intraspecific diversity and evolutionary dynamics of foodborne pathogens.

RevDate: 2024-10-07

Moens C, Bogaerts B, Lorente-Leal V, et al (2024)

Genomic comparison between Mycobacterium bovis and Mycobacterium microti and in silico analysis of peptide-based biomarkers for serodiagnosis.

Frontiers in veterinary science, 11:1446930.

In recent years, there has been an increase in the number of reported cases of Mycobacterium microti infection in various animals, which can interfere with the ante-mortem diagnosis of animal tuberculosis caused by Mycobacterium bovis. In this study, whole genome sequencing (WGS) was used to search for protein-coding genes to distinguish M. microti from M. bovis. In addition, the population structure of the available M. microti genomic WGS datasets is described, including three novel Belgian isolates from infections in alpacas. Candidate genes were identified by examining the presence of the regions of difference and by a pan-genome analysis of the available WGS data. A total of 80 genes showed presence-absence variation between the two species, including genes encoding Proline-Glutamate (PE), Proline-Proline-Glutamate (PPE), and Polymorphic GC-Rich Sequence (PE-PGRS) proteins involved in virulence and host interaction. Filtering based on predicted subcellular localization, sequence homology and predicted antigenicity resulted in 28 proteins out of 80 that were predicted to be potential antigens. As synthetic peptides are less costly and variable than recombinant proteins, an in silico approach was performed to identify linear and discontinuous B-cell epitopes in the selected proteins. From the 28 proteins, 157 B-cell epitope-based peptides were identified that discriminated between M. bovis and M. microti species. Although confirmation by in vitro testing is still required, these candidate synthetic peptides containing B-cell epitopes could potentially be used in serological tests to differentiate cases of M. bovis from M. microti infection, thus reducing misdiagnosis in animal tuberculosis surveillance.

RevDate: 2024-10-07

Ford MKB, Hari A, Zhou Q, et al (2024)

Biologically-informed Killer cell immunoglobulin-like receptor (KIR) gene annotation tool.

bioRxiv : the preprint server for biology pii:2024.08.13.607835.

Natural killer (NK) cells are essential components of the innate immune system, with their activity significantly regulated by Killer cell Immunoglobulin-like Receptors (KIRs). The diversity and structural complexity of KIR genes present significant challenges for accurate genotyping, essential for understanding NK cell functions and their implications in health and disease. Traditional genotyping methods struggle with the variable nature of KIR genes, leading to inaccuracies that can impede immunogenetic research. These challenges extend to high-quality phased assemblies, which have been recently popularized by the Human Pangenome Consortium. This paper introduces BAKIR (Biologically-informed Annotator for KIR locus), a tailored computational tool designed to overcome the challenges of KIR genotyping and annotation on high-quality, phased genome assemblies. BAKIR aims to enhance the accuracy of KIR gene annotations by structuring its annotation pipeline around identifying key functional mutations, thereby improving the identification and subsequent relevance of gene and allele calls. It uses a multi-stage mapping, alignment, and variant calling process to ensure high-precision gene and allele identification, while also maintaining high recall for sequences that are significantly mutated or truncated relative to the known allele database. BAKIR has been evaluated on a subset of the HPRC assemblies, where BAKIR was able to improve many of the associated annotations and call novel variants. BAKIR is freely available on GitHub, offering ease of access and use through multiple installation methods, including pip, conda, and singularity container, and is equipped with a user-friendly command-line interface, thereby promoting its adoption in the scientific community.

RevDate: 2024-10-07

Logsdon GA, Ebert P, Audano PA, et al (2024)

Complex genetic variation in nearly complete human genomes.

bioRxiv : the preprint server for biology pii:2024.09.24.614721.

Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation. Here, we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (130 Mbp median continuity), closing 92% of all previous assembly gaps and reaching telomere-to-telomere (T2T) status for 39% of the chromosomes. We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC), SMN1/SMN2, NBPF8, and AMY1/AMY2, and fully resolve 1,852 complex structural variants (SVs). In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite high-order repeat (HOR) array length and characterize the pattern of mobile element insertions into α-satellite HOR arrays. While most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres. Combining our data with the draft pangenome reference significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference to a median quality value (QV) of 45. Using this approach, 26,115 SVs per sample are detected, substantially increasing the number of SVs now amenable to downstream disease association studies.

RevDate: 2024-10-07

Karthik K, Anbazhagan S, Priyadharshini MLM, et al (2024)

Comparative genomics of zoonotic pathogen Clostridioides difficile of animal origin to understand its diversity.

3 Biotech, 14(11):257.

Clostridioides difficile is a zoonotic pathogen causing enteric diseases in different animals and humans. Comprehensive studies on the presence of toxin genes and antimicrobial resistance genes based on genome data of C. difficile in animals are scarce. In the present study, a total of 15 C. difficile isolates were recovered from dogs, and isolates with toxin genes (D1, CD15 and CD26) along with two other non-toxigenic strains (CD28, CD32) were used for whole genome sequencing and comparative genomics. Sequence type-based clustering was noted in the whole genome phylogeny with 4 known multi-locus sequence typing (MLST) clades, namely I, II, IV, and V, and a cryptic clade. ST11 and ST54 were reported for the second time worldwide in dogs. Out of 109 genomes used in the study, 29 genomes were predicted with all four toxin genes (toxA, toxB, cdtA, cdtB) while 22 did not have any of the toxin genes. ST11 of MLST clade V had the maximum number of 46 genomes predicted with at least one toxin gene. Among the genomes sequenced in this study, CD26 had a maximum of 5 AMR genes (aac(6')-aph(2″), ant(6)-Ia, catP, erm(B)_18, and tet(M)_11) and CD15 was predicted with 2 AMR genes (aac(6')-aph(2″), erm(B)_18). Tetracycline resistance genes were predicted most frequently in ST11 genomes. Of the 22 non-toxigenic strains, 9 genomes (ST48 = 5, ST3 = 2, ST109 = 1, ST15 = 1) were predicted with a minimum of one AMR gene. Pangenome analysis indicated that the Bpan value is 0.12, showing that C. difficile has an open pangenome structure. This indicates that the organism can evolve by the addition of new genes. This study reports the circulation of clinically important ST11 and multidrug-resistant non-toxigenic strains among animals.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s13205-024-04102-7.
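The abstract above reads an exponent (Bpan = 0.12) as evidence of an open pangenome. One common way such an exponent is obtained is by fitting a power law, size = a * n^b, to pan-genome size as genomes are added. The sketch below assumes that convention, under which an exponent between 0 and 1 is taken to indicate an open pangenome, and it fits by ordinary least squares on log-transformed values. The data are synthetic, generated from an exact power law purely for illustration, and do not come from the study.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(n_genomes: Sequence[int], pan_sizes: Sequence[float]) -> Tuple[float, float]:
    """Fit size = a * n**b by least squares on log(size) versus log(n)."""
    if len(n_genomes) != len(pan_sizes) or len(n_genomes) < 2:
        raise ValueError("need two or more paired observations")
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(s) for s in pan_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data: an exact power law with exponent 0.12 (illustration only).
counts = list(range(1, 11))
sizes = [3500 * n ** 0.12 for n in counts]

a, b = fit_power_law(counts, sizes)
print(round(b, 2))  # 0.12
print("open" if 0 < b < 1 else "closed")
```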

RevDate: 2024-10-04

Liu J, Shi Y, Mo D, et al (2024)

The goat pan-genome reveals patterns of gene loss during domestication.

Journal of animal science and biotechnology, 15(1):132.

BACKGROUND: Unveiling genetic diversity features and understanding the genetic mechanisms of diverse goat phenotypes are pivotal in facilitating the preservation and utilization of these genetic resources. However, the total genetic diversity within a species can't be captured by the reference genome of a single individual. The pan-genome is a collection of all the DNA sequences that occur in a species, and it is expected to capture the total genomic diversity of the specific species.

RESULTS: We constructed a goat pan-genome using map-to-pan assembly based on 813 individuals, including 723 domestic goats and 90 samples from their wild relatives, which presented a broad regional and global representation. In total, 146 Mb of sequence and 974 genes were identified as absent from the reference genome (ARS1.2; GCF_001704415.2). We identified 3,190 novel single nucleotide polymorphisms (SNPs) using the pan-genome analysis. These novel SNPs could properly reveal the population structure of domestic goats and their wild relatives. Presence/absence variation (PAV) analysis revealed gene loss and intense negative selection during domestication and improvement.

CONCLUSIONS: Our research highlights the importance of the goat pan-genome in capturing the missing genetic variations. It reveals the changes in genomic architecture during goat domestication and improvement, such as gene loss. This improves our understanding of the evolutionary and breeding history of goats.

RevDate: 2024-10-04
CmpDate: 2024-10-04

Mejía-Limones I, Andrade-Molina D, Morey-León G, et al (2024)

Whole-genome sequencing of Klebsiella pneumoniae MDR circulating in a pediatric hospital setting: a comprehensive genome analysis of isolates from Guayaquil, Ecuador.

BMC genomics, 25(1):928.

BACKGROUND: Klebsiella pneumoniae is the major cause of nosocomial infections worldwide and is related to a worsening increase in Multidrug-Resistant Bacteria (MDR) and virulence genes that seriously affect immunosuppressed patients, long-stay intensive care patients, elderly individuals, and children. Whole-Genome Sequencing (WGS) has become a useful strategy for characterizing the genomic components of clinically important bacteria, such as K. pneumoniae, enabling the monitoring of genetic changes and the understanding of transmission, and highlighting the risk of dissemination of resistance and virulence associated genes in hospitals. In this study, we report on WGS of 14 clinical isolates of K. pneumoniae from a pediatric hospital biobank of Guayaquil, Ecuador.

RESULTS: The main findings revealed pronounced genetic heterogeneity among the isolates. Multilocus sequence type ST45 was the predominant lineage among non-KPC isolates, whereas ST629 was found more frequently among KPC isolates. Phylogenetic analysis suggested local transmission dynamics. Comparative genomic analysis revealed a core set of 3511 conserved genes and an open pangenome in neonatal isolates. The diversity of MLSTs and capsular types, and the high genetic diversity among these isolates indicate high intraspecific variability. In terms of virulence factors, we identified genes associated with adherence, biofilm formation, immune evasion, secretion systems, multidrug efflux pump transporters, and a notably high number of genes related to iron uptake. A large number of these genes were detected in the ST45 isolate, whereas iron uptake yersiniabactin genes were found exclusively in the non-KPC isolates. We observed high resistance to commonly used antibiotics and determined that these isolates exhibited multidrug resistance including β-lactams, aminoglycosides, fluoroquinolones, quinolones, trimethoprim, fosfomycin and macrolides; additionally, resistance-associated point mutations and cross-resistance genes were identified in all the isolates. We also report the first K. pneumoniae KPC-3 gene producers in Ecuador.

CONCLUSIONS: Our WGS results for clinical isolates highlight the importance of MDR in neonatal K. pneumoniae infections and their genetic diversity. WGS will be an imperative strategy for the surveillance of K. pneumoniae in Ecuador, and will contribute to identifying effective treatment strategies for K. pneumoniae infections in critical units in patients at stratified risk.

RevDate: 2024-10-04
CmpDate: 2024-10-04

Nagy N, P Hodor (2024)

Chromosomal gene order defines several structural classes of Staphylococcus epidermidis genomes.

PloS one, 19(10):e0311520 pii:PONE-D-23-36569.

The original methodology for describing the pangenome of a prokaryotic species is based on modeling genomes as unordered sets of genes. More recent findings have underlined the importance of considering the ordering of genes along the genetic material as well, when making comparisons among genomes. To further investigate the benefits of gene order when describing genomes of a given species, we applied two distance metrics on a dataset of 84 genomes of Staphylococcus epidermidis. The first metric, GeLev, depends on the order of genes and is a derivative of the Levenshtein distance. The second, the Jaccard distance, depends on gene sets only. The application of these distances reveals information about the global structure of the genomes, and allows clustering of the genomes into classes. The main biological result is that, while genomes within the same class are structurally similar, genomes of different classes have an additional characteristic. Between genomes in different classes we can discover instances where a large segment of the first genome appears in reverse order in the second. This feature suggests that genome rearrangements in S. epidermidis happen on a large scale, while micro-rearrangements of single or a small number of genes are rare. Thus, this paper describes a straight-forward method to classify genomes into structural classes with the same order of genes and makes it possible to visualize reversed segments in pairs of genomes. The method can be readily applied to other species.
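
To illustrate the contrast drawn in the abstract above, here is a minimal sketch comparing a set-based distance with an order-aware one. It assumes genomes are given as ordered lists of gene identifiers and uses the plain Levenshtein distance; it is not the authors' GeLev metric, which is only described as a derivative of it. The gene names are made up.

```python
def jaccard_distance(genes_a, genes_b):
    """1 - |A ∩ B| / |A ∪ B| on gene sets; gene order is ignored."""
    set_a, set_b = set(genes_a), set(genes_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)

def levenshtein(seq_a, seq_b):
    """Classic edit distance (insert, delete, substitute) over gene sequences."""
    prev = list(range(len(seq_b) + 1))
    for i, gene_a in enumerate(seq_a, start=1):
        curr = [i]
        for j, gene_b in enumerate(seq_b, start=1):
            cost = 0 if gene_a == gene_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

# Hypothetical genomes: same gene content, but g2..g5 appear in reverse order.
genome_1 = ["g1", "g2", "g3", "g4", "g5", "g6"]
genome_2 = ["g1", "g5", "g4", "g3", "g2", "g6"]

print(jaccard_distance(genome_1, genome_2))  # 0.0: identical gene sets
print(levenshtein(genome_1, genome_2))       # 4: the reversed segment is detected
```

The set-based distance cannot see the reversed segment at all, which is the kind of structural difference the order-aware metric is meant to expose.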

RevDate: 2024-10-04

Neal M, Brakewood W, Betenbaugh M, et al (2024)

Pan-genome-scale metabolic modeling of Bacillus subtilis reveals functionally distinct groups.

mSystems [Epub ahead of print].

UNLABELLED: Bacillus subtilis is an important industrial and environmental microorganism known to occupy many niches and produce many compounds of interest. Although it is one of the best-studied organisms, much of this focus including the reconstruction of genome-scale metabolic models has been placed on a few key laboratory strains. Here, we substantially expand these prior models to pan-genome-scale, representing 481 genomes of B. subtilis with 2,315 orthologous gene clusters, 1,874 metabolites, and 2,239 reactions. Furthermore, we incorporate data from carbon utilization experiments for eight strains to refine and validate its metabolic predictions. This comprehensive pan-genome model enables the assessment of strain-to-strain differences related to nutrient utilization, fermentation outputs, robustness, and other metabolic aspects. Using the model and phenotypic predictions, we divide B. subtilis strains into five groups with distinct patterns of behavior that correlate across these features. The pan-genome model offers deep insights into B. subtilis' metabolism as it varies across environments and provides an understanding as to how different strains have adapted to dynamic habitats.

IMPORTANCE: As the volume of genomic data and computational power have increased, so has the number of genome-scale metabolic models. These models encapsulate the totality of metabolic functions for a given organism. Bacillus subtilis strain 168 is one of the first bacteria for which a metabolic network was reconstructed. Since then, several updated reconstructions have been generated for this model microorganism. Here, we expand the metabolic model for a single strain into a pan-genome-scale model, which consists of individual models for 481 B. subtilis strains. By evaluating differences between these strains, we identified five distinct groups of strains, allowing for the rapid classification of any particular strain. Furthermore, this classification into five groups aids the rapid identification of suitable strains for any application.

RevDate: 2024-10-04

Ajesh BR, Sariga R, Nakkeeran S, et al (2024)

Insights on mining the pangenome of Sphingobacterium thalpophilum NMS02 S296 from the resistant banana cultivar Pisang lilin confirms the antifungal action against Fusarium oxysporum f. sp. cubense.

Frontiers in microbiology, 15:1443195.

INTRODUCTION: Fusarium wilt, caused by Fusarium oxysporum f. sp. cubense (Foc), poses a significant global threat to banana cultivation. Conventional methods of disease management are increasingly challenged, thus making it necessary to explore alternative strategies. Bacterial endophytes, particularly from resistant genotypes, are gaining attention as potential biocontrol agents. Sphingobacterium thalpophilum, isolated from the resistant banana cultivar Pisang lilin (JALHSB010000001-JALHSB010000029), presents an intriguing prospect for combating Fusarium wilt. However, its underlying biocontrol mechanisms remain poorly understood. This study aimed to elucidate the antifungal efficacy of S. thalpophilum NMS02 S296 against Foc and explore its biocontrol mechanisms at the genomic level.

METHODS: Whole genome sequencing of S. thalpophilum NMS02 S296 was conducted using next-generation sequencing technologies and bioinformatics analyses were performed to identify genes associated with antifungal properties. In vitro assays were used to assess the inhibitory effects of the bacterial isolate on the mycelial growth of Foc. To explore the biomolecules responsible for the observed antagonistic activity, metabolites diffused into the agar at the zone of inhibition between Foc S16 and S. thalpophilum NMS02 S296 were extracted and identified.

RESULTS: Whole genome sequencing revealed an array of genes encoding antifungal enzymes and secondary metabolites in S. thalpophilum NMS02 S296. In vitro experiments demonstrated significant inhibition of Foc mycelial growth by the bacterial endophyte. Comparative genomic analysis highlighted unique genomic features in S. thalpophilum linked to its biocontrol potential, setting it apart from other bacterial species.

DISCUSSION: The study underscores the remarkable antifungal efficacy of S. thalpophilum NMS02 S296 against Fusarium wilt. The genetic basis for its biocontrol potential was elucidated through whole genome sequencing, shedding light on the mechanisms behind its antifungal activity. This study advanced our understanding of bacterial endophytes as biocontrol agents and offers a promising avenue for plant growth promotion towards sustainable strategies to mitigate Fusarium wilt in banana cultivation.

RevDate: 2024-10-03

Vogel NA, Rubin JD, Pedersen AG, et al (2024)

soibean: High-resolution Taxonomic Identification of Ancient Environmental DNA Using Mitochondrial Pangenome Graphs.

Molecular biology and evolution pii:7809583 [Epub ahead of print].

Ancient environmental DNA (aeDNA) is becoming a powerful tool to gain insights about past ecosystems, overcoming the limitations of conventional fossil records. However, several methodological challenges remain, particularly for classifying the DNA to species level and conducting phylogenetic analysis. Current methods, primarily tailored for modern datasets, fail to capture several idiosyncrasies of aeDNA, including species mixtures from closely related species and ancestral divergence. We introduce soibean, a novel tool that utilises mitochondrial pangenomic graphs for identifying species from aeDNA reads. It outperforms existing methods in accurately identifying species from multiple closely related sources within a sample, enhancing phylogenetic analysis for aeDNA. soibean employs a damage-aware likelihood model for precise identification at low coverage with a high damage rate. Additionally, we reconstructed ancestral sequences for soibean's database to handle aeDNA that is highly diverged from modern references. soibean demonstrates effectiveness through simulated data tests and empirical validation. Notably, our method uncovered new empirical results in published datasets, including using porpoise whales as food in a Mesolithic community in Sweden, demonstrating its potential to reveal previously unrecognised findings in aeDNA studies.

RevDate: 2024-10-01

Shoer S, Reicher L, Zhao C, et al (2024)

Pangenomes of human gut microbiota uncover links between genetic diversity and stress response.

Cell host & microbe pii:S1931-3128(24)00324-X [Epub ahead of print].

The genetic diversity of the gut microbiota has a central role in host health. Here, we created pangenomes for 728 human gut prokaryotic species, quadrupling the genes of strain-specific genomes. Each of these species has a core set of a thousand genes, differing even between closely related species, and an accessory set of genes unique to the different strains. Functional analysis shows high strain variability associates with sporulation, whereas low variability is linked with antibiotic resistance. We further map the antibiotic resistome across the human gut population and find 237 cases of extreme resistance even to last-resort antibiotics, with a predominance among Enterobacteriaceae. Lastly, the presence of specific genes in the microbiota relates to host age and sex. Our study underscores the genetic complexity of the human gut microbiota, emphasizing its significant implications for host health. The pangenomes and antibiotic resistance map constitute a valuable resource for further research.

RevDate: 2024-10-01

Li Q, Yang J, Wang M, et al (2024)

Global distribution and genomic characteristics analysis of avian-derived mcr-1-positive Escherichia coli.

Ecotoxicology and environmental safety, 285:117109 pii:S0147-6513(24)01185-0 [Epub ahead of print].

The prevalence of avian-derived Escherichia coli (E. coli) carrying mcr-1 poses a significant threat to the development of the poultry industry and public health safety. Despite ongoing in-depth epidemiological research worldwide, a comprehensive macroscopic study based on genomics is still lacking. In response, this study collected 1104 genomic sequences of avian-derived mcr-1-positive E. coli (MCRPEC) from the NCBI public database, covering 31 countries. The majority of sequences originated from China (48.82 %), followed by the Netherlands (10.41 %). In terms of avian hosts, chicken accounted for the largest proportion (44.11 %), followed by gallus (24.09 %). Avian-derived MCRPEC also serves as a reservoir for other antibiotic resistance genes (ARGs), with 179 ARGs coexisting with mcr-1 identified. A total of 206 virulence-associated genes were also identified, revealing the pathogenic risks of MCRPEC. Pan-genome analysis revealed that avian-derived MCRPEC from different hosts, countries of origin, and serotypes exhibit minor SNP differences, indicating a high risk of cross-regional and cross-host transmission. The ST types of MCRPEC are diverse, with ST10 being the most prevalent (n=70). Spearman analysis showed a significant correlation between the number of ARGs and the insertion sequences (ISs) as well as plasmid replicon in ST10 strains. Furthermore, ST10 strains share a similar genetic basis with human-derived MCRPEC, suggesting the possibility of clonal dissemination. Pan-genome-wide association studies (pan-GWAS) indicated that the differential genes of MCRPEC from different countries and host sources are significantly different, mainly related to genes encoding type IV secretion systems and mobile genetic elements (MGEs). Plasmid mapping showed that the prevalent plasmid types vary by country and host, with IncI2 and IncX4 being the main mcr-1-positive plasmids. Among the 12 identified mcr-1 genetic contexts with ISs, the Tn6330 transposon was the predominant carrier of mcr-1. In summary, the potential threat of avian-derived MCRPEC cannot be ignored, and long-term and comprehensive monitoring is essential.

RevDate: 2024-10-01

Ling X, Gu X, Shen Y, et al (2024)

Comparative genomic analysis of Acanthamoeba from different sources and horizontal transfer events of antimicrobial resistance genes.

mSphere [Epub ahead of print].

UNLABELLED: Acanthamoeba species are among the most common free-living amoeba and ubiquitous protozoa, mainly distributed in water and soil, and cause Acanthamoeba keratitis (AK) and severe visual impairment in patients. Although several studies have reported genomic characteristics of Acanthamoeba, limited sample sizes and sources have resulted in an incomplete understanding of the genetic diversity of Acanthamoeba from different sources. While endosymbionts exert a significant influence on the phenotypes of Acanthamoeba, including pathogenicity, virulence, and drug resistance, the species diversity and functional characterization remain largely unexplored. Herein, our study sequenced and analyzed the whole genomes of 19 Acanthamoeba pathogenic strains that cause AK, and by integrating publicly available genomes, we sampled 29 Acanthamoeba strains from ocular, environmental, and other sources. Combined pan-genomic and comparative functional analyses revealed genetic differences and evolutionary relationships among the different sources of Acanthamoeba, as well as classification into multiple functional groups, with ocular isolates in particular showing significant differences that may account for differences in pathogenicity. Phylogenetic and rhizome gene mosaic analyses of ocular Acanthamoeba strains suggested that genomic exchanges between Acanthamoeba and endosymbionts, particularly potential antimicrobial resistance genes trafficking including the adeF, amrA, and amrB genes exchange events, potentially contribute to Acanthamoeba drug resistance. In conclusion, this study elucidated the adaptation of Acanthamoeba to different ecological niches and the influence of gene exchange on the evolution of ocular Acanthamoeba genome, guiding the clinical diagnosis and treatment of AK and laying a theoretical groundwork for developing novel therapeutic approaches.

IMPORTANCE: Acanthamoeba causes a serious blinding keratopathy, Acanthamoeba keratitis, which is currently under-recognized by clinicians. In this study, we analyzed 48 strains of Acanthamoeba using a whole-genome approach, revealing differences in pathogenicity and function between strains of different origins. Horizontal transfer events of antimicrobial resistance genes can help provide guidance as potential biomarkers for the treatment of specific Acanthamoeba keratitis cases.

RevDate: 2024-10-02

Che J, Lai C, Lai G, et al (2024)

Complete genome sequence analysis and Pks genes identification of Brevibacillus brevis FJAT-0809-GLX with a broad inhibitory spectrum against phytopathogens.

World journal of microbiology & biotechnology, 40(11):332.

Brevibacillus brevis FJAT-0809-GLX has a broad spectrum of antimicrobial activity. Understanding the molecular basis of biocontrol ability of B. brevis will allow us to develop effective microbial agents for sustainable agriculture. In this study, we present the complete and annotated genome sequence of FJAT-0809-GLX. The complete genome size of B. brevis FJAT-0809-GLX was 6,137,019 bp, with 5688 predicted coding sequences (CDS). The average GC content was 47.38%, and there were 44 copies of the rRNA operon (16S, 23S and 5S RNA), and 127 tRNA genes. A total of 11,162 genes were functionally annotated with the COG, GO, and KEGG databases, and 123 genes belonged to CAZymes. Genomic secondary metabolite analysis indicated 13 clusters encoding potential new antimicrobials. FJAT-0809-GLX was designated as B. brevis according to average nucleotide identity (ANI) and phylogenetic analysis. The pangenome consisted of 7141 homologous genes, with 4469 homologous genes shared by B. brevis FJAT-0809-GLX, B. brevis NBRC100599, B. brevis DSM30, and B. brevis NCTC2611. The number of unique homologous genes in B. brevis FJAT-0809-GLX (419 genes) and B. brevis NBRC100599 (480 genes) was much higher than in B. brevis DSM30 (13 genes) and B. brevis NCTC2611 (6 genes). Nine gene clusters encoding secondary metabolite biosynthesis were compared between the genome of B. brevis FJAT-0809-GLX and those of B. brevis NBRC100599, B. brevis DSM30 and B. brevis NCTC2611, and the gene clusters encoding lantipeptide and transatpks-otherks existed only in the genome of B. brevis FJAT-0809-GLX. The 11 BbPks genes were included in the B. brevis FJAT-0809-GLX genome, which contained the conserved PS-DH domain. The relative expression of BbPksL, BbPksM2, BbPksM3, BbPksN3, BbPksN4 and BbPksN5 reached a maximum at 120 h and then decreased at 144 h. Our results provided detailed genomic and Pks genes information for the FJAT-0809-GLX strain, and laid a foundation for studying its biocontrol mechanisms.

RevDate: 2024-10-02

Tong Z, Huang Y, Zhu QH, et al (2024)

Retrospect and prospect of Nicotiana tabacum genome sequencing.

Frontiers in plant science, 15:1474658.

Investigating plant genomes offers crucial foundational resources for exploring various aspects of plant biology and applications, such as functional genomics and breeding practices. With the development of sequencing and assembly technology, several Nicotiana tabacum genomes have been published. In this paper, we reviewed the progress on N. tabacum genome assembly and quality, from the initial draft genomes to the recent high-quality chromosome-level assemblies. The application of long-read sequencing, optical mapping, and Hi-C technologies has significantly improved the contiguity and completeness of N. tabacum genome assemblies, with the latest assemblies having a contig N50 size over 50 Mb. Despite these advancements, further improvements are still required and possible, particularly in the development of pan-genome and telomere-to-telomere (T2T) genomes. These new genomes will capture the genomic diversity and variations among different N. tabacum cultivars and species, and provide a comprehensive view of the N. tabacum genome structure and gene content, so as to deepen our understanding of the N. tabacum genome and facilitate precise breeding and functional genomics.
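
The contig N50 quoted in the abstract above is a standard contiguity statistic: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the assembly size. A minimal sketch with made-up contig lengths (not taken from any N. tabacum assembly):

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("contig_lengths must not be empty")

# Hypothetical contig lengths in Mb; total is 300, half is 150.
example_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_contigs))  # 70, since 80 + 70 = 150 reaches half the total
```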

RevDate: 2024-09-30

Kalbfleisch TS, Smith ML, Ciosek JL, et al (2024)

Three decades of rat genomics: approaching the finish(ed) line.

Physiological genomics [Epub ahead of print].

The rat, Rattus norvegicus, has provided an important model for investigation of a range of characteristics of biomedical importance. Here we survey the origins of this species, its introduction into laboratory research and the emergence of genetic and genomic methods that utilize this model organism. Genomic studies have yielded important progress and provided new insight into several biologically important traits. However, some studies have been impeded by the lack of a complete and accurate reference genome for this species. New sequencing and genome assembly methods applied to the rat have resulted in a new reference genome assembly, GRCr8, which is a near telomere-to-telomere assembly of high base level accuracy that incorporates several elements not captured in prior assemblies. As genome assembly methods continue to advance and production costs become a less significant obstacle, genome assemblies for multiple inbred rat strains are emerging. These assemblies will allow a rat pangenome assembly to be constructed which captures all the genetic variation in strains selected for their utility in research and will overcome reference bias, a limitation associated with reliance on a single reference assembly. By this means, the full utility of this model organism to genomic studies will begin to be revealed.

RevDate: 2024-09-30

Mastoras M, Asri M, Brambrink L, et al (2024)

Highly accurate assembly polishing with DeepPolisher.

bioRxiv : the preprint server for biology pii:2024.09.17.613505.

Accurate genome assemblies are essential for biological research, but even the highest quality assemblies retain errors caused by the technologies used to construct them. Base-level errors are typically fixed with an additional polishing step that uses reads aligned to the draft assembly to identify necessary edits. However, current methods struggle to find a balance between over- and under-polishing. Here, we present an encoder-only transformer model for assembly polishing called DeepPolisher, which predicts corrections to the underlying sequence using Pacbio HiFi read alignments to a diploid assembly. Our pipeline introduces a method, PHARAOH (Phasing Reads in Areas Of Homozygosity), which uses ultra-long ONT data to ensure alignments are accurately phased and to correctly introduce heterozygous edits in falsely homozygous regions. We demonstrate that the DeepPolisher pipeline can reduce assembly errors by half, with a greater than 70% reduction in indel errors. We have applied our DeepPolisher-based pipeline to 180 assemblies from the next Human Pangenome Reference Consortium (HPRC) data release, producing an average predicted Quality Value (QV) improvement of 3.4 (54% error reduction) for the majority of the genome.
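
The link between the QV gain and the error reduction quoted in the abstract above follows from the usual Phred-scale definition, QV = -10 * log10(error rate), which is assumed here. A short sketch of the arithmetic; the function names are hypothetical.

```python
def error_rate_from_qv(qv):
    """Per-base error rate implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)

def error_reduction_from_qv_gain(delta_qv):
    """Fraction of errors removed when QV improves by delta_qv."""
    return 1.0 - 10 ** (-delta_qv / 10)

reduction = error_reduction_from_qv_gain(3.4)
print(f"QV +3.4 removes about {reduction:.1%} of errors")  # about 54.3%
```

Under the same definition, halving the error rate corresponds to a QV gain of roughly 3.0, which is consistent with the "reduce assembly errors by half" claim.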

RevDate: 2024-09-30

Xu A, Lu L, Zhang W, et al (2024)

Microevolution of Bartonella grahamii driven by geographic and host factors.

mSystems [Epub ahead of print].

UNLABELLED: Bartonella grahamii is one of the most prevalent Bartonella species in wild rodents and has been associated with human cases of neuroretinitis. The structure and distribution of genomic diversity in natural B. grahamii are largely unexplored. Here, we have applied a comprehensive population genomic and phylogenomic analysis to 172 strains of B. grahamii to unravel the genetic differences and influencing factors that shape its populations. The findings reveal a remarkable genomic diversity within the species, primarily in the form of single-nucleotide polymorphisms. The open pangenome of B. grahamii indicates a dynamic genomic evolution influenced by its ecological niche. Whole-genome data allowed us to decompose B. grahamii diversity into six phylogroups, each characterized by a unique "mosaic pattern" of hosts and biogeographic regions. This suggests a complex interplay between host specificity and biogeography. In addition, our study suggests a possible origin of European strains from Asian lineages, and host factors have a more significant impact on the genetic differentiation of B. grahamii than geographical factors. These insights contribute to understanding the evolutionary history of this pathogen and provide a foundation for future epidemiological research and public health strategies.

IMPORTANCE: Bartonella grahamii has been reported worldwide and shown to infect humans. Up to now, an effective transmission route of B. grahamii to humans has not been confirmed. The genetic evolution of B. grahamii and the relationship between B. grahamii and its host need to be further studied. The factors driving the genetic diversity of B. grahamii are still controversial. The results showed that the European isolates shared a common ancestor with the Chinese isolates. Host factors were shown to play an important role in driving the genetic diversity of B. grahamii. When host factors were fixed, geographic barriers drove B. grahamii microevolution. Our study emphasizes the importance of characterizing isolate genomes derived from hosts and geographical locations and provides a new reference for the origin of B. grahamii.

RevDate: 2024-09-29
CmpDate: 2024-09-29

Zhao Z, Zhu Z, Jiao Y, et al (2024)

Pan-genome analysis of GT64 gene family and expression response to Verticillium wilt in cotton.

BMC plant biology, 24(1):893.

BACKGROUND: The GT64 subfamily, belonging to the glycosyltransferase family, plays a critical function in plant adaptation to stress conditions and the modulation of plant growth, development, and organogenesis processes. However, a comprehensive identification and systematic analysis of GT64 in cotton are still lacking.

RESULTS: This study used bioinformatics techniques to conduct a detailed investigation on the GT64 gene family members of eight cotton species for the first time. A total of 39 GT64 genes were detected, which could be classified into five subfamilies according to the phylogenetic tree. Among them, six genes were found in upland cotton. Furthermore, we investigated the precise chromosomal positions of these genes and visually represented their gene structure details. Moreover, we predicted cis-regulatory elements in GhGT64s and ascertained the duplication type of the GT64 genes in the eight cotton species. Evaluation of the Ka/Ks ratio for similar gene pairs among the eight cotton species provided insights into the selective pressures acting on these homologous genes. Additionally, we analyzed the expression profiles of the GT64 gene family. Overexpressing GhGT64_4 in tobacco improved its disease resistance. Subsequently, VIGS experiments conducted in cotton demonstrated reduced disease resistance upon silencing of GhGT64_4, which may indicate its involvement in lignin and jasmonic acid biosynthesis pathways, thus impacting cotton resistance. Weighted Gene Co-expression Network Analysis (WGCNA) revealed an early immune response against Verticillium dahliae in G. barbadense compared to G. hirsutum. Quantitative Reverse Transcription Polymerase Chain Reaction (qRT-PCR) analysis indicated that some GT64 genes might play a role under various biotic and abiotic stress conditions.

CONCLUSIONS: These discoveries enhance our knowledge of GT64 family members and lay the groundwork for future investigations into the disease resistance mechanisms of this gene in cotton.

RevDate: 2024-09-28

Naqvi M, Utheim TP, C Charnock (2024)

Whole genome sequencing and characterization of Corynebacterium isolated from the healthy and dry eye ocular surface.

BMC microbiology, 24(1):368.

BACKGROUND: The purpose of this study was to characterize Corynebacterium isolated from the ocular surface of dry eye disease patients and healthy controls. We aimed to investigate the pathogenic potential of these isolates in relation to ocular surface health. To this end, we performed whole genome sequencing in combination with biochemical, enzymatic, and antibiotic susceptibility tests. In addition, we employed deferred growth inhibition assays to examine how Corynebacterium isolates may impact the growth of potentially competing microorganisms including the ocular pathogens Pseudomonas aeruginosa and Staphylococcus aureus, as well as other Corynebacterium present on the eye.

RESULTS: The 23 isolates were found to belong to 8 different species of Corynebacterium with genomes ranging from 2.12 mega base pairs in a novel Corynebacterium sp. to 2.65 mega base pairs in C. bovis. Whole genome sequencing revealed the presence of a range of antimicrobial targets present in all isolates. Pangenome analysis showed the presence of 516 core genes and that the pangenome is open. Phenotypic characterization showed variously urease, lipase, mucinase, protease and DNase activity in some isolates. Attention was particularly drawn to a potentially new or novel Corynebacterium species which had the smallest genome, and which produced a range of hydrolytic enzymes. Strikingly the isolate inhibited in vitro the growth of a range of possible pathogenic bacteria as well as other Corynebacterium isolates. The majority of Corynebacterium species included in this study did not seem to possess canonical pathogenic activity.

CONCLUSIONS: This study is the first reported genomic and biochemical characterization of ocular Corynebacterium. A number of potential virulence factors were identified which may have direct relevance for ocular health and contribute to the finding of our previous report on the ocular microbiome, where it was shown that DNA libraries were often dominated by members of this genus. Particularly interesting in this regard was the observation that some Corynebacterium, particularly new or novel Corynebacterium sp. can inhibit the growth of other ocular Corynebacterium as well as known pathogens of the eye.

RevDate: 2024-09-28
CmpDate: 2024-09-28

Heuberger M, Bernasconi Z, Said M, et al (2024)

Analysis of a global wheat panel reveals a highly diverse introgression landscape and provides evidence for inter-homoeologue chromosomal recombination.

TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik, 137(10):236.

This study highlights the agronomic potential of rare introgressions, as demonstrated by a major QTL for powdery mildew resistance on chromosome 7D. It further shows evidence for inter-homoeologue recombination in wheat. Agriculturally important genes are often introgressed into crops from closely related donor species or landraces. The gene pool of hexaploid bread wheat (Triticum aestivum) is known to contain numerous such "alien" introgressions. Recently established high-quality reference genome sequences allow prediction of the size, frequency and identity of introgressed chromosome regions. Here, we characterise chromosomal introgressions in bread wheat using exome capture data from the WHEALBI collection. We identified 24,981 putative introgression segments of at least 2 Mb across 434 wheat accessions. Detailed study of the most frequent introgressions identified T. timopheevii or its close relatives as a frequent donor species. Importantly, 118 introgressions of at least 10 Mb were exclusive to single wheat accessions, revealing that large populations need to be studied to assess the total diversity of the wheat pangenome. In one case, a 14 Mb introgression in chromosome 7D, exclusive to cultivar Pamukale, was shown by QTL mapping to harbour a recessive powdery mildew resistance gene. We identified multiple events where distal chromosomal segments of one subgenome were duplicated in the genome and replaced the homoeologous segment in another subgenome. We propose that these examples are the results of inter-homoeologue recombination. Our study produced an extensive catalogue of the wheat introgression landscape, providing a resource for wheat breeding. Of note, the finding that the wheat gene pool contains numerous rare, but potentially important introgressions and chromosomal rearrangements has implications for future breeding.

RevDate: 2024-09-28

da Silva MERJ, Breyer GM, da Costa MM, et al (2024)

Genomic Analyses of Methicillin-Susceptible and Methicillin-Resistant Staphylococcus pseudintermedius Strains Involved in Canine Infections: A Comprehensive Genotypic Characterization.

Pathogens (Basel, Switzerland), 13(9): pii:pathogens13090760.

Staphylococcus pseudintermedius is frequently associated with several bacterial infections in dogs, highlighting a One Health concern due to its zoonotic potential. Given the clinical significance of this pathogen, we performed comprehensive genomic analyses of 28 S. pseudintermedius strains isolated from canine infections through whole-genome sequencing using Illumina HiSeq, and compared the genetic features between methicillin-resistant (MRSP) and methicillin-susceptible (MSSP) S. pseudintermedius strains. Our analyses determined that MRSP genomes are larger than MSSP genomes, with significant changes in antimicrobial resistance genes and virulence markers, suggesting differences in the pathogenicity of MRSP and MSSP strains. In addition, the pangenome analysis of S. pseudintermedius from canine and human origins identified core and accessory genomes with 1847 and 3037 genes, respectively, which indicates that most of the S. pseudintermedius genome is highly variable. Furthermore, phylogenomic analysis clearly separated MRSP from MSSP strains, regardless of their infection sites, showing phylogenetic differences according to methicillin susceptibility. Altogether, our findings underscore the importance of studying the evolutionary dynamics of S. pseudintermedius, which is crucial for the development of effective prevention and control strategies for resistant S. pseudintermedius infections.
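The claim that most of the S. pseudintermedius gene content is variable can be checked with simple arithmetic on the two counts reported in the abstract. The short sketch below assumes that the pangenome is the sum of the reported core and accessory gene sets, with no overlap between them; the variable names are illustrative only.

```python
# Gene counts as reported in the abstract above.
core_genes = 1847
accessory_genes = 3037

# Assumption: pangenome = core + accessory, with no gene counted twice.
pangenome_genes = core_genes + accessory_genes
accessory_share = accessory_genes / pangenome_genes

print(f"pangenome size: {pangenome_genes} genes")
print(f"accessory share: {accessory_share:.1%}")
```

Under that assumption the accessory genes make up roughly 62% of the 4884 genes in the pangenome, which is consistent with the abstract's statement.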

RevDate: 2024-09-28

García-Rivera C, Molina-Pardines C, Haro-Moreno JM, et al (2024)

Genomic Analysis of Antimicrobial Resistance in Pseudomonas aeruginosa from a "One Health" Perspective.

Microorganisms, 12(9): pii:microorganisms12091770.

The "One Health" approach provides a comprehensive framework for understanding antimicrobial resistance. This perspective is of particular importance in the study of Pseudomonas aeruginosa, as it is not only a pathogen that affects humans but also persists in environmental reservoirs. To assess evolutionary selection for niche-specific traits, a genomic comparison of 749 P. aeruginosa strains from three environments (clinical, aquatic, and soil) was performed. The results showed that the environment does indeed exert selective pressure on specific traits. The high percentage of persistent genome, the lack of correlation between phylogeny and origin of the isolate, and the high intrinsic resistance indicate that the species has a high potential for pathogenicity and resistance, regardless of the reservoir. The flexible genome showed an enrichment of metal resistance genes, which could act as a co-selection of antibiotic resistance genes. In the plasmids, resistance genes were found in multigenic clusters, with the presence of a mobile integron being prominent. This integron was identified in several pathogenic strains belonging to distantly related taxa with a worldwide distribution, showing the risk of rapid evolution of resistance. These results provide a more complete understanding of the evolution of P. aeruginosa, which could help develop new prevention strategies.

RevDate: 2024-09-28

Hua L, Ye P, Li X, et al (2024)

Anti-Aflatoxigenic Burkholderia contaminans BC11-1 Exhibits Mycotoxin Detoxification, Phosphate Solubilization, and Cytokinin Production.

Microorganisms, 12(9): pii:microorganisms12091754.

The productivity and quality of agricultural crops worldwide are adversely affected by disease outbreaks and inadequate nutrient availability. Of particular concern is the potential increase in mycotoxin prevalence due to crop diseases, which poses a threat to food security. Microorganisms with multiple functions have been favored in sustainable agriculture to address such challenges. Aspergillus flavus is a prevalent aflatoxin B1 (AFB1)-producing fungus in China. Therefore, we wanted to obtain an anti-aflatoxigenic bacterium with potent mycotoxin detoxification ability and other beneficial properties. In the present study, we isolated an anti-aflatoxigenic strain, BC11-1, of Burkholderia contaminans, from a forest rhizosphere soil sample obtained in Luzhou, Sichuan Province, China. We found that it possesses several beneficial properties, as follows: (1) a broad spectrum of antifungal activity but compatibility with Trichoderma species, which are themselves used as biocontrol agents, making it possible to use in a biocontrol mixture or individually with other biocontrol agents in an integrated management approach; (2) a mycotoxin detoxification capacity with a degradation ratio of 90% for aflatoxin B1 and 78% for zearalenone, suggesting its potential for remedial application; and (3) a high ability to solubilize phosphorus and produce cytokinin, highlighting its potential as a biofertilizer. Overall, the diverse properties of BC11-1 render it a beneficial bacterium with excellent potential for use in plant disease protection and mycotoxin prevention and as a biofertilizer. Lastly, a pan-genomic analysis suggests that BC11-1 may possess other undiscovered biological properties, prompting further exploration of the properties of this unique strain of B. contaminans.
These findings highlight the potential of using the anti-aflatoxigenic strain BC11-1 to enhance disease protection and improve soil fertility, thus contributing to food security. Given its multiple beneficial properties, BC11-1 represents a valuable microbial resource as a biocontrol agent and biofertilizer.

RevDate: 2024-09-28
CmpDate: 2024-09-28

Cai K, Song X, Yue W, et al (2024)

Identification and Functional Characterization of Abiotic Stress Tolerance-Related PLATZ Transcription Factor Family in Barley (Hordeum vulgare L.).

International journal of molecular sciences, 25(18): pii:ijms251810191.

Plant AT-rich sequence and zinc-binding proteins (PLATZs) are a novel category of plant-specific transcription factors involved in growth, development, and abiotic stress responses. However, the PLATZ gene family has not been identified in barley. In this study, a total of 11 HvPLATZs were identified in barley, and they were unevenly distributed on five of the seven chromosomes. The phylogenetic tree, incorporating PLATZs from Arabidopsis, rice, maize, wheat, and barley, could be classified into six clusters, in which HvPLATZs are absent in Cluster VI. HvPLATZs exhibited conserved motif arrangements with a characteristic PLATZ domain. Two segmental duplication events were observed among HvPLATZs. All HvPLATZs were core genes present in 20 genotypes of the barley pan-genome. The HvPLATZ5 coding sequences were conserved among 20 barley genotypes, whereas HvPLATZ4/9/10 exhibited synonymous single nucleotide polymorphisms (SNPs); the remaining ones showed nonsynonymous variations. The expression of HvPLATZ2/3/8 was ubiquitous in various tissues, whereas HvPLATZ7 appeared transcriptionally silent; the remaining genes displayed tissue-specific expression. The expression of HvPLATZs was modulated by salt stress, potassium deficiency, and osmotic stress, with response patterns being time-, tissue-, and stress type-dependent. The heterologous expression of HvPLATZ3/5/6/8/9/10/11 in yeast enhanced tolerance to salt and osmotic stress, whereas the expression of HvPLATZ2 compromised tolerance. These results advance our comprehension and facilitate further functional characterization of HvPLATZs.

RevDate: 2024-09-28
CmpDate: 2024-09-28

Bouras N, Bakli M, Dif G, et al (2024)

The Phylogenomic Characterization of Planotetraspora Species and Their Cellulases for Biotechnological Applications.

Genes, 15(9): pii:genes15091202.

This study aims to evaluate the in silico genomic characteristics of five species of the genus Planotetraspora: P. kaengkrachanensis, P. mira, P. phitsanulokensis, P. silvatica, and P. thailandica, with a view to their application in therapeutic research. The 16S rRNA comparison indicated that these species were phylogenetically distinct. Pairwise comparisons of digital DNA-DNA hybridization (dDDH) and OrthoANI values between the studied type strains indicated that dDDH values were below 62.5%, while OrthoANI values were lower than 95.3%, suggesting that the five species represent distinct genomospecies. These results were consistent with the phylogenomic study based on core genes and the pangenome analysis of these five species within the genus Planotetraspora. However, the genome annotation showed some differences between these species, such as variations in the number of subsystem category distributions across whole genomes (ranging between 1979 and 2024). Additionally, the number of CAZyme (Carbohydrate-Active enZYme) genes ranged between 298 and 325, highlighting the potential of these bacteria for therapeutic research applications. The in silico physico-chemical characteristics of cellulases from Planotetraspora species were analyzed. Their 3D structure was modeled, refined, and validated. A molecular docking analysis of this cellulase protein structural model was conducted with cellobiose, cellotetraose, laminaribiose, carboxymethyl cellulose, glucose, and xylose ligands. Our study revealed significant interaction between the Planotetraspora cellulase and the cellotetraose substrate, evidenced by stable binding energies. This suggests that this bacterial enzyme holds great potential for utilizing cellotetraose as a substrate in various applications. This study enriches our understanding of the potential applications of Planotetraspora species in therapeutic research.

RevDate: 2024-09-27
CmpDate: 2024-09-28

Stocke K, Lamont G, Tan J, et al (2024)

Delineation of global, absolutely essential and conditionally essential pangenomes of Porphyromonas gingivalis.

Scientific reports, 14(1):22247.

Porphyromonas gingivalis is a Gram-negative, anaerobic oral pathobiont, an etiological agent of periodontitis and the most commonly studied periodontal bacterium. Multiple low passage clinical isolates were sequenced, and their genomes compared to several laboratory strains. Phylogenetic distances were mapped, a gene absence-presence matrix generated, and core (present in all genomes) and accessory (absent in one or more genomes) genes delineated. Subsequently, a second pangenome delineating the prevalence of inherently essential genes was generated. The prevalence of genes conditionally essential for surviving tobacco exposure, abscess formation and epithelial invasion was also determined, in addition to genes encoding key proteolytic enzymes containing putative signal peptides. While the absolutely essential pangenome was highly conserved, significant differences in the complete and conditionally essential pangenomes were apparent. Thus, genetic plasticity appears to lie primarily in gene sets facilitating adaptation to variant disease-related environments. Those genes that are highly pervasive in the P. gingivalis absolutely essential pangenome or are highly prevalent and essential for fitness in disease-relevant models, may represent particularly attractive therapeutic targets worthy of further investigation. As mutations in absolutely essential genes are expected to be lethal, the data provided herein should also facilitate improved planning for P. gingivalis gene mutation strategies.
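The core/accessory split described in the abstract above (core genes present in all genomes, accessory genes absent from one or more) can be illustrated with a small set-based sketch. This is not the authors' pipeline: the function name `split_core_accessory`, the genome labels, and the gene identifiers are all hypothetical, and real analyses first cluster genes into orthologous families before building the presence-absence matrix.

```python
from typing import Dict, Set, Tuple


def split_core_accessory(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split a pangenome into core and accessory gene sets.

    `presence` maps each genome name to the set of gene identifiers
    found in that genome (a sparse presence-absence matrix).
    """
    if not presence:
        return set(), set()
    gene_sets = list(presence.values())
    pangenome = set().union(*gene_sets)      # every gene seen in any genome
    core = set.intersection(*gene_sets)      # genes present in all genomes
    accessory = pangenome - core             # genes missing from at least one genome
    return core, accessory


# Hypothetical toy data: two clinical isolates and one laboratory strain.
toy_presence = {
    "isolate_A": {"g1", "g2", "g3"},
    "isolate_B": {"g1", "g2", "g4"},
    "lab_strain": {"g1", "g2", "g3", "g5"},
}
core, accessory = split_core_accessory(toy_presence)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
```

In this toy example g1 and g2 form the core, while g3, g4, and g5 are accessory. The same function could be applied to a restricted gene list (for example, only genes flagged as essential) to mimic the idea of a second, essential-gene pangenome.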

RevDate: 2024-09-27

Uzzal Hossain M, Khan Tanvir N, Naimur Rahman ABZ, et al (2024)

From sequence to Significance: A thorough investigation of the distinctive genome features Uncovered in C. Werkmanii strain NIB003.

Gene pii:S0378-1119(24)00846-1 [Epub ahead of print].

Citrobacter werkmanii (C. werkmanii), an opportunistic urinary bacterium that causes diarrhea, is poorly understood. Our research focuses on genetic features that are crucial to disease development, such as pathogenic interactions, antibiotic resistance, virulence genes and genetic variation. Following its morphological, biochemical, and molecular identification, the whole genome of C. werkmanii strain NIB003 was sequenced in Bangladesh for the first time. Despite having around 80% whole genome conservation, the research shows that the Bangladeshi strain forms a separate phylogenetic cluster. This emphasises the genetic variability within C. werkmanii, resulting in particular modifications at the strain level and changes in its ability to cause disease. The results of the genetic diversity analysis indicate that the Bangladeshi sequenced genome is more diverse than the other strains due to the existence of unique features, such as the presence of a tRNA-binding domain and N-6 adenine-specific DNA methylases.

RevDate: 2024-09-27

Nawrocki EM, Kudva IT, EG Dudley (2024)

Investigating the adherence factors of Escherichia coli at the bovine recto-anal junction.

Microbiology spectrum [Epub ahead of print].

UNLABELLED: Shiga toxin-producing Escherichia coli (STEC) are major foodborne pathogens that result in thousands of hospitalizations each year in the United States. Cattle, the natural reservoir, harbor STEC asymptomatically at the recto-anal junction (RAJ). The molecular mechanisms that allow STEC and non-STEC E. coli to adhere to the RAJ are not fully understood, in part because most adherence studies utilize human cell culture models. To identify a set of bovine-specific E. coli adherence factors, we used the primary RAJ squamous epithelial (RSE) cell-adherence assay to coculture RSE cells from healthy Holstein cattle with diverse E. coli strains from bovine and nonbovine sources. We hypothesized that a comparative genomic analysis of the strains would reveal factors associated with RSE adherence. After performing adherence assays with historical strains from the E. coli Reference Center (n = 62) and strains newly isolated from the RAJ (n = 15), we used the bioinformatic tool Roary to create a pangenome of this collection. We classified strains as either low or high adherence and using the Scoary program compiled a list of accessory genes correlated with the "high adherence" strains. While none of the correlations were statistically significant, several gene clusters were associated with the high-adherence phenotype, including two that encode uncharacterized proteins. We also demonstrated that non-STEC E. coli strains from the RAJ are more adherent than other isolates and can outcompete STEC in coculture with RSEs. Further analysis of adherence-associated gene clusters may lead to an improved understanding of the molecular mechanisms of RSE adherence and may help develop probiotics targeting STEC in cattle.

IMPORTANCE: E. coli strains that produce Shiga toxin cause foodborne illness in humans but colonize cattle asymptomatically. The molecular mechanisms that E. coli uses to adhere to cattle cells are largely unknown. Various strategies are used to control E. coli in livestock and limit the risk of outbreaks. These include vaccinating animals against common E. coli strains and supplementing their feed with probiotics to reduce the carriage of pathogens. No strategy is completely effective, and probiotics often fail to colonize the animals. We sought to clarify the genes required for E. coli adherence in cattle by quantifying the attachment to bovine cells in a diverse set of bacteria. We also isolated nonpathogenic E. coli from healthy cows and showed that a representative isolate could outcompete pathogenic strains in cocultures. We propose that the focused study of these strains and their adherence factors will better inform the design of probiotics and vaccines for livestock.
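The abstract above describes correlating accessory genes with a binary "high adherence" phenotype. One common way to score such a gene-trait association is Fisher's exact test on a 2x2 table of gene presence against phenotype. The sketch below is a minimal pure-Python version of that test and is not a reproduction of the Scoary program; the function name `fisher_exact_two_sided` and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table

                     high adherence   low adherence
    gene present           a                b
    gene absent            c                d
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x gene-positive strains in the high group.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    tol = 1 + 1e-9  # tolerance for floating-point ties
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * tol)


# Hypothetical counts: a gene found in 3 of 3 high-adherence strains
# and in 0 of 3 low-adherence strains.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
print(f"p = {p_value:.3f}")
```

With many accessory genes tested at once, raw p-values like this one would also need multiple-testing correction, which helps explain how gene clusters can look associated with a phenotype without reaching statistical significance.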

RevDate: 2024-09-27

Ong CT, Blackall PJ, Boe-Hansen GB, et al (2024)

Whole-genome comparison using complete genomes from Campylobacter fetus strains revealed single nucleotide polymorphisms on non-genomic islands for subspecies differentiation.

Frontiers in microbiology, 15:1452564.

INTRODUCTION: Bovine Genital Campylobacteriosis (BGC), caused by Campylobacter fetus subsp. venerealis, is a sexually transmitted bacterium that significantly impacts cattle reproductive performance. However, current detection methods lack consistency and reliability due to the close genetic similarity between C. fetus subsp. venerealis and C. fetus subsp. fetus. Therefore, this study aimed to utilize complete genome analysis to distinguish genetic features between C. fetus subsp. venerealis and other subspecies, thereby enhancing BGC detection for routine screening and epidemiological studies.

METHODS AND RESULTS: This study reported the complete genomes of four C. fetus subsp. fetus and five C. fetus subsp. venerealis, sequenced using long-read sequencing technologies. Comparative whole-genome analyses (n = 25) were conducted, incorporating an additional 16 complete C. fetus genomes from the NCBI database, to investigate the genomic differences between these two closely related C. fetus subspecies. Pan-genomic analyses revealed a core genome consisting of 1,561 genes and an accessory pangenome of 1,064 genes between the two C. fetus subspecies. However, no unique predicted genes were identified in either subspecies. Nonetheless, whole-genome single nucleotide polymorphism (SNP) analysis identified 289 SNPs unique to one or the other of the C. fetus subspecies. After the removal of SNPs located on putative genomic islands, recombination sites, and those causing synonymous amino acid changes, the remaining 184 SNPs were functionally annotated. Candidate SNPs that were annotated with the KEGG "Peptidoglycan Biosynthesis" pathway were recruited for further analysis due to their potential association with the glycine intolerance characteristic of C. fetus subsp. venerealis and its biovar variant. Verification with 58 annotated C. fetus genomes, both complete and incomplete, from RefSeq, successfully classified these seven SNPs into two groups, aligning with their phenotypic identification as CFF (Campylobacter fetus subsp. fetus) or CFV/CFVi (Campylobacter fetus subsp. venerealis and its biovar variant). Furthermore, we demonstrated the application of mraY SNPs for detecting C. fetus subspecies using a quantitative PCR assay.

DISCUSSION: Our results highlighted the high genetic stability of C. fetus subspecies. Nevertheless, Campylobacter fetus subsp. venerealis and its biovar variants encoded common SNPs in genes related to glycine intolerance, which differentiates them from C. fetus subsp. fetus. This discovery highlights the potential of employing a multiple-SNP assay for the precise differentiation of C. fetus subspecies.

RevDate: 2024-09-26

Guo M, Bi G, Wang H, et al (2024)

Genomes of autotetraploid wild and cultivated Ziziphus mauritiana reveal polyploid evolution and crop domestication.

Plant physiology pii:7777155 [Epub ahead of print].

Indian jujube (Ziziphus mauritiana) holds a prominent position in the global fruit and pharmaceutical markets. Here, we report the assemblies of haplotype-resolved, telomere-to-telomere genomes of autotetraploid wild and cultivated Indian jujube plants using a two-stage assembly strategy. The generation of these genomes permitted in-depth investigations into the divergence and evolutionary history of this important fruit crop. Using a graph-based pan-genome constructed from eight monoploid genomes, we identified structural variation (SV)-FST hotspots and SV hotspots. Gap-free genomes provide a means to obtain a global view of centromere structures. We identified presence-absence variation-related genes in four monoploid genomes (cI, cIII, wI, and wIII) and resequencing populations. We also present the population structure and domestication trajectory of the Indian jujube based on the resequencing of 73 wild and cultivated accessions. Metabolomic and transcriptomic analyses of mature fruits of wild and cultivated accessions unveiled the genetic basis underlying loss of fruit astringency during domestication of Indian jujube. This study reveals mechanisms underlying the divergence, evolution, and domestication of the autotetraploid Indian jujube and provides rich and reliable genetic resources for future research.

RevDate: 2024-09-25

Narechania A, Bobo D, Deitz K, et al (2024)

Rapid SARS-COV2 surveillance using clinical, pooled, or wastewater sequence as a sensor for population change.

Genome research pii:gr.278594.123 [Epub ahead of print].

The COVID-19 pandemic has highlighted the critical role of genomic surveillance for guiding policy and control. Timeliness is key, but sequence alignment and phylogeny slow most surveillance techniques. Millions of SARS-CoV-2 genomes have been assembled. Phylogenetic methods are ill-equipped to handle this sheer scale. We introduce a pangenomic measure that examines the information diversity of a k-mer library drawn from a country's complete set of clinical, pooled, or wastewater sequence. Quantifying diversity is central to ecology. Hill numbers, or the effective number of species in a sample, provide a simple metric for comparing species diversity across environments. The more diverse the sample, the higher the Hill number. We adopt this ecological approach and consider each k-mer an individual and each genome a transect in the pangenome of the species. Structured in this way, Hill numbers summarize the temporal trajectory of pandemic variants, collapsing each day's assemblies into genome equivalents. For pooled or wastewater sequence, we instead compare days using survey sequence divorced from individual infections. Across data from the UK, USA, and South Africa, we trace the ascendance of new variants of concern as they emerge in local populations well before these variants are named and added to phylogenetic databases. Using data from San Diego wastewater, we monitor these same population changes from raw, unassembled sequence. This history of emerging variants senses all available data as it is sequenced, intimating variant sweeps to dominance or declines to extinction at the leading edge of the COVID-19 pandemic.
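The Hill number mentioned in the abstract above has a standard closed form: for relative abundances p_i and order q, it is (sum of p_i^q)^(1/(1-q)), with the limit exp(-sum of p_i ln p_i) at q = 1. The sketch below applies that formula to pooled k-mer counts. It is a simplified illustration, not the authors' method: it ignores the per-genome "transect" structure they describe, and the function names `kmer_counts` and `hill_number`, the choice of k, and the toy sequences are all assumptions.

```python
import math
from collections import Counter
from typing import Iterable


def kmer_counts(sequences: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer across a collection of sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def hill_number(counts: Counter, q: float) -> float:
    """Hill number (effective number of types) of order q."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no k-mers to measure")
    p = [c / total for c in counts.values() if c > 0]
    if math.isclose(q, 1.0):
        # Limit as q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))


# Toy example: a homopolymer has one distinct 2-mer, a mixed sequence has more.
uniform = kmer_counts(["AAAAAA"], k=2)
mixed = kmer_counts(["ACGTAC"], k=2)
print(hill_number(uniform, q=1), hill_number(mixed, q=1))
```

A sample dominated by one k-mer yields a Hill number near 1, and a sample with many equally frequent k-mers yields a value near the number of distinct k-mers, which is the sense in which more diverse samples score higher.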

RevDate: 2024-09-25

Fornezza S, Delvecchio VS, Harvey WT, et al (2024)

AGAP duplicons associate with structural diversity at Chromosome 10q11.22.

Genome research pii:gr.279454.124 [Epub ahead of print].

The 10q11.22 chromosomal region is a duplication-rich interval of the human genome and one of the last to be fully assembled. It carries copy-number variable genes associated with intellectual disability, bipolar disorder, and obesity. In this study, we characterized the structural diversity at this locus by analyzing 64 haploid assemblies produced by the Human Pangenome Reference Consortium. We identified eleven alternative haplotypes that differ in the copy number and/or orientation of large genomic segments, ranging from hundreds of kilobase pairs (kbp) to over one megabase pair (Mbp). We uncovered a 2.4 Mbp size difference between the shortest and longest haplotypes. Breakpoint analysis revealed that genomic instability results from nonallelic homologous recombination between segmental duplication (SD) pairs with varying similarity (94.4-99.6%). Nonetheless, these pairs generally recombine at positions where their identity is higher (>99.6%). Recurrent inversions occur with varying breakpoints within the same inverted SD pair. Inversion polymorphisms shuffle the entire SD arrangement, creating new predispositions to copy-number variations. The SD architecture is associated with a catarrhine-specific subgroup of the AGAP gene family, which likely triggered the accumulation of SDs at this locus over the past 25 million years of human evolution. Our results reveal extensive structural diversity and genomic instability at the 10q11.22 locus and expand the general understanding of the mutational mechanisms behind SD-mediated rearrangements.

RevDate: 2024-09-25
CmpDate: 2024-09-25

Chen L, Zhang L, Li Y, et al (2024)

Screening of promising molecules against potential drug targets in Yersinia pestis by integrative pan and subtractive genomics, docking and simulation approach.

Archives of microbiology, 206(10):415.

This study focuses on Yersinia pestis, the bacterium responsible for plague, which has historically posed a severe threat to public health. Despite the availability of antibiotic treatment, the emergence of antibiotic resistance in this pathogen has made it harder to control infections and plague outbreaks. The development of new drug targets and therapies is urgently needed. This research aims to identify novel protein targets from 28 Y. pestis strains by an integrative pan-genomic and subtractive genomics approach. Additionally, it seeks to screen for potential safe and effective alternative therapies against these targets via high-throughput virtual screening. Targets should lack homology to human, gut microbiota, and known human 'anti-targets', while exhibiting essentiality for the pathogen's survival and virulence, druggability, antibiotic resistance, and broad spectrum across multiple pathogenic bacteria. We identified two promising targets: the aminotransferase class I/class II domain-containing protein and 3-oxoacyl-[acyl-carrier-protein] synthase 2. These proteins were modeled using AlphaFold2, validated through several structural analyses, and subjected to molecular docking and ADMET analysis. Molecular dynamics simulations determined the stability of the ligand-target complexes, providing potential therapeutic options against Y. pestis.

RevDate: 2024-09-25

Cunha F, Zhai Y, Casaro S, et al (2024)

Pangenomic and biochemical analyses of Helcococcus ovis reveal widespread tetracycline resistance and a novel bacterial species, Helcococcus bovis.

Frontiers in microbiology, 15:1456569.

Helcococcus ovis (H. ovis) is an opportunistic bacterial pathogen of a wide range of animal hosts including domestic ruminants, swine, avians, and humans. In this study, we sequenced the genomes of 35 Helcococcus sp. clinical isolates from the uterus of dairy cows and explored their antimicrobial resistance and biochemical phenotypes in vitro. Phylogenetic and average nucleotide identity analyses classified four Helcococcus isolates within a cryptic clade representing an undescribed species, for which we propose the name Helcococcus bovis sp. nov. By establishing this new species clade, we also resolve the longstanding question of the classification of the Tongji strain responsible for a confirmed human conjunctival infection. This strain did not neatly fit into H. ovis and is instead a member of H. bovis. We applied whole genome comparative analyses to explore the pangenome, resistome, virulome, and taxonomic diversity of the remaining 31 H. ovis isolates. An overwhelming 97% of H. ovis strains (30 out of 31) harbor mobile tetracycline resistance genes and displayed significantly increased minimum inhibitory concentrations of tetracyclines in vitro. The high prevalence of mobile tetracycline resistance genes makes H. ovis a significant antimicrobial resistance gene reservoir in our food chain. Finally, the phylogenetic distribution of co-occurring high-virulence determinant genes of H. ovis across unlinked and distant loci highlights an instance of convergent gene loss in the species. In summary, this study showed that mobile genetic element-mediated tetracycline resistance is widespread in H. ovis, and that there is evidence of co-occurring virulence factors across clades suggesting convergent gene loss in the species. Finally, we introduced a novel Helcococcus species closely related to H. ovis, called H. bovis sp. nov., which has been reported to cause infection in humans.

RevDate: 2024-09-24

Zheng B, Xu J, Zhang Y, et al (2024)

MBCN: A novel reference database for Efficient Metagenomic analysis of human gut microbiome.

Heliyon, 10(18):e37422.

Metagenomic shotgun sequencing data can identify microbes and their proportions. However, metagenomic shotgun profiling results obtained from multiple projects using different reference databases are difficult to compare or to combine in a meta-analysis. Our work aims to create a novel collection of human gut prokaryotic genomes, named Microbiome Collection Navigator (MBCN). 2379 human gut metagenomic samples are screened, and 16,785 metagenome-assembled genomes (MAGs) are assembled using a standardized pipeline. In addition, MAGs are combined with the representative genomes from public prokaryotic genome collections for clustering, and pan-genomes for each cluster's genomes are constructed to build Kraken2 and Bracken databases. The databases built by MBCN are more comprehensive and accurate for profiling metagenomic reads compared with other collections on simulated reads and virtual bio-projects. We profile 1082 human gut metagenomic samples with the MBCN database and organize profiles and metadata on the web program. Meanwhile, using MBCN as a reference database, we also develop a unified, standardized, and systematic metagenomic analysis pipeline and platform, named MicrobiotaCN (http://www.microbiota.cn), and common statistical and visualization tools for microbiome research are integrated into the web program. Taken together, MBCN and MicrobiotaCN can be a valuable resource and a powerful tool that allows researchers to perform metagenomic analysis efficiently with a unified pipeline.

RevDate: 2024-09-23

Liu JN, Yan L, Chai Z, et al (2024)

Pan-genome analyses of eleven Fraxinus species provide insights into salt adaptation in ash trees.

Plant communications pii:S2590-3462(24)00533-9 [Epub ahead of print].

Ash trees (Fraxinus) exhibit rich genetic diversity and wide adaptation to various ecological environments, and several species are highly salt-tolerant. Dissecting the genomic basis underlying ash tree salt adaptation is vital for its resistance breeding. Here, we presented eleven high-quality chromosome-level genome assemblies for Fraxinus species, revealing two unequal sub-genome compositions and two more recent whole-genome triplication events in evolutionary history. A Fraxinus structural variation-based pan-genome was constructed and revealed that presence-absence variations (PAVs) of transmembrane transport genes likely contribute to Fraxinus salt adaptation. Through whole-genome resequencing of an inter-species cross F1-population of F. velutina 'Lula 3' (salt-tolerant) × F. pennsylvanica 'Lula 5' (salt-sensitive), we performed a salt tolerance PAV-based quantitative trait loci (QTL) mapping and pinpointed two PAV-QTLs and candidate genes associated with Fraxinus salt tolerance. Mechanistically, FvbHLH85 enhanced salt tolerance by mediating reactive oxygen species and Na+/K+ homeostasis, while FvSWEET5 did so by mediating osmotic homeostasis. Collectively, these findings provide valuable genomic resources for Fraxinus salt resistance breeding and the research community.

RevDate: 2024-09-19
CmpDate: 2024-09-19

Sarwal V, Lee S, Yang J, et al (2024)

VISTA: an integrated framework for structural variant discovery.

Briefings in bioinformatics, 25(5):.

Structural variation (SV) refers to insertions, deletions, inversions, and duplications in human genomes. SVs are present in approximately 1.5% of the human genome. Still, this small subset of genetic variation has been implicated in the pathogenesis of psoriasis, Crohn's disease and other autoimmune disorders, autism spectrum and other neurodevelopmental disorders, and schizophrenia. Since identifying structural variants is an important problem in genetics, several specialized computational techniques have been developed to detect structural variants directly from sequencing data. With advances in whole-genome sequencing (WGS) technologies, a plethora of SV detection methods have been developed. However, dissecting SVs from WGS data remains a challenge, with the majority of SV detection methods prone to a high false-positive rate, and no existing method able to precisely detect a full range of SVs present in a sample. Previous studies have shown that none of the existing SV callers can maintain high accuracy across various SV lengths and genomic coverages. Here, we report an integrated structural variant calling framework, Variant Identification and Structural Variant Analysis (VISTA), that leverages the results of individual callers using a novel and robust filtering and merging algorithm. In contrast to existing consensus-based tools which ignore the length and coverage, VISTA overcomes this limitation by executing various combinations of top-performing callers based on variant length and genomic coverage to generate SV events with high accuracy. We evaluated the performance of VISTA on comprehensive gold-standard datasets across varying organisms and coverage. We benchmarked VISTA using the Genome-in-a-Bottle gold standard SV set, haplotype-resolved de novo assemblies from the Human Pangenome Reference Consortium, along with an in-house polymerase chain reaction (PCR)-validated mouse gold standard set. 
VISTA maintained the highest F1 score among top consensus-based tools measured using a comprehensive gold standard across both mouse and human genomes. VISTA also has an optimized mode, where the calls can be optimized for precision or recall. VISTA-optimized can attain 100% precision and the highest sensitivity among other variant callers. In conclusion, VISTA represents a significant advancement in structural variant calling, offering a robust and accurate framework that outperforms existing consensus-based tools and sets a new standard for SV detection in genomic research.
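The precision, recall, and F1 figures quoted in this abstract follow the standard definitions, which can be sketched as below. This is only an illustration of the metrics: the function name and the true-positive, false-positive, and false-negative counts are hypothetical and are not taken from the VISTA benchmarks.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for a set of SV calls.

    tp: calls that match the gold-standard set
    fp: calls absent from the gold-standard set
    fn: gold-standard events that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: a caller tuned for precision makes no false calls
# but misses 20 of 100 true events.
precision, recall, f1 = precision_recall_f1(tp=80, fp=0, fn=20)
```

With these made-up counts, precision is 100% while recall is 80%, which shows why a precision-optimized mode should be read together with its sensitivity.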

RevDate: 2024-09-21

Gaye A, Sene ARG, Gadji M, et al (2024)

Toward building a comprehensive human pan-genome: The SEN-GENOME project.

American journal of human genetics pii:S0002-9297(24)00303-3 [Epub ahead of print].

The human reference genome (GRCh38), primarily sourced from individuals of European descent, falls short in capturing the vast genetic diversity across global populations. Efforts to diversify the reference genome face challenges in accessibility and representation, exacerbating the scarcity of African genomic data crucial for studying diseases prevalent in these populations. Sherman et al. proposed constructing reference genomes tailored to distinct human sub-populations. Their African Pan-Genome initiative highlighted substantial genetic variation missing from the GRCh38 human reference genome, emphasizing the necessity for population-specific genomes. In response, local initiatives like the Senegalese Genome project (SEN-GENOME) have emerged to document the genomes of historically overlooked populations. SEN-GENOME embodies community-driven decentralized research. With meticulous recruitment criteria and ethical practices, it aims to sequence 1,000 genomes from 31 ethnolinguistic groups, in the fourteen administrative regions of Senegal, fostering local genomic research tailored to the region. The key to SEN-GENOME's success is its commitment to local governance of data, capacity building, and integration with broader pan-genome projects in Africa. Despite the complexities of data harmonization and sharing, our collaborative efforts are aligned with common goals, ensuring steady progress toward a comprehensive human pan-genome. We invite and welcome collaboration with other research entities to achieve this shared vision. In summary, local initiatives such as SEN-GENOME are pivotal in bridging genomic disparities, offering pathways to equitable and inclusive genomic research. Collaborative endeavors guided by a collective vision for human health will propel us toward a more encompassing understanding of the human genome and better health through genomic medicine.

RevDate: 2024-09-20
CmpDate: 2024-09-21

Silva MH, Batista LL, Malta SM, et al (2024)

Unveiling the Brazilian kefir microbiome: discovery of a novel Lactobacillus kefiranofaciens (LkefirU) genome and in silico prospection of bioactive peptides with potential anti-Alzheimer properties.

BMC genomics, 25(1):884.

BACKGROUND: Kefir is a complex microbial community that plays a critical role in the fermentation and production of bioactive peptides, and has health-improving properties. The composition of kefir can vary by geographic location and weather, and this paper focuses on a Brazilian sample and continues previous work that reported anti-Alzheimer properties. In this study, we employed shotgun metagenomics and peptidomics approaches to characterize Brazilian kefir further.

RESULTS: We successfully assembled the novel genome of Lactobacillus kefiranofaciens (LkefirU) and conducted a comprehensive pangenome analysis to compare it with other strains. Furthermore, we performed a peptidome analysis, revealing the presence of bioactive peptides encrypted by L. kefiranofaciens in the Brazilian kefir sample, and utilized in silico prospecting and molecular docking techniques to identify potential anti-Alzheimer peptides, targeting β-amyloid (fibril and plaque), BACE, and acetylcholinesterase. Through this analysis, we identified two peptides that show promise as compounds with anti-Alzheimer properties.

CONCLUSIONS: These findings not only provide insights into the genome of L. kefiranofaciens but also serve as a promising prototype for the development of novel anti-Alzheimer compounds derived from Brazilian kefir.

RevDate: 2024-09-20
CmpDate: 2024-09-20

Martineau M, Ambroset C, Lefebvre S, et al (2024)

Unravelling the main genomic features of Mycoplasma equirhinis.

BMC genomics, 25(1):886.

BACKGROUND: Mycoplasma spp. are wall-less bacteria with small genomes (usually 0.5-1.5 Mb). Many Mycoplasma (M.) species are known to colonize the respiratory tract of both humans and livestock animals, where they act as primary pathogens or opportunists. M. equirhinis was described for the first time in 1975 in horses but has been poorly studied since, despite regular reports of around 14% prevalence in equine respiratory disorders. We recently showed that M. equirhinis is not a primary pathogen but could play a role in co-infections of the respiratory tract. This study was set up to propose the first genomic characterization to better our understanding of the M. equirhinis species.

RESULTS: Four circularized genomes, two of which were generated here, were compared in terms of synteny, gene content, and specific features associated with virulence or genome plasticity. An additional 20 scaffold-level genomes were used to analyse intra-species diversity through a pangenome phylogenetic approach. The M. equirhinis species showed consistent genomic homogeneity, pointing to potential clonality of isolates despite their varied geographical origins (UK, Japan and various places in France). Three different classes of mobile genetic elements have been detected: insertion sequences related to the IS1634 family, a putative prophage related to M. arthritidis and integrative conjugative elements related to M. arginini. The core genome harbours the typical putative virulence-associated genes of mycoplasmas mainly involved in cytoadherence and immune escape.

CONCLUSION: M. equirhinis is a highly syntenic, homogeneous species with a limited repertoire of mobile genetic elements and putative virulence genes.

RevDate: 2024-09-20
CmpDate: 2024-09-20

Chandra T, Jaiswal S, Tomar RS, et al (2024)

Realizing visionary goals for the International Year of Millet (IYoM): accelerating interventions through advances in molecular breeding and multiomics resources.

Planta, 260(4):103.

Leveraging advanced breeding and multi-omics resources is vital to position millet as an essential "nutricereal resource," aligning with IYoM goals, alleviating strain on global cereal production, boosting resilience to climate change, and advancing sustainable crop improvement and biodiversity. The global challenges of food security, nutrition, climate change, and agrarian sustainability demand the adoption of climate-resilient, nutrient-rich crops to support a growing population amidst shifting environmental conditions. Millets, also referred to as "Shree Anna," emerge as a promising solution to address these issues by bolstering food production, improving nutrient security, and fostering biodiversity conservation. Their resilience to harsh environments, nutritional density, cultural significance, and potential to enhance dietary quality index made them valuable assets in global agriculture. Recognizing their pivotal role, the United Nations designated 2023 as the "International Year of Millets (IYoM 2023)," emphasizing their contribution to climate-resilient agriculture and nutritional enhancement. Scientific progress has invigorated efforts to enhance millet production through genetic and genomic interventions, yielding a wealth of advanced molecular breeding technologies and multi-omics resources. These advancements offer opportunities to tackle prevailing challenges in millet, such as anti-nutritional factors, sensory acceptability issues, toxin contamination, and ancillary crop improvements. This review provides a comprehensive overview of molecular breeding and multi-omics resources for nine major millet species, focusing on their potential impact within the framework of IYoM. 
These resources include whole and pan-genome, elucidating adaptive responses to abiotic stressors, organelle-based studies revealing evolutionary resilience, markers linked to desirable traits for efficient breeding, QTL analysis facilitating trait selection, functional gene discovery for biotechnological interventions, regulatory ncRNAs for trait modulation, web-based platforms for stakeholder communication, tissue culture techniques for genetic modification, and integrated omics approaches enabled by precise application of CRISPR/Cas9 technology. Aligning these resources with the seven thematic areas outlined by IYoM catalyzes transformative changes in millet production and utilization, thereby contributing to global food security, sustainable agriculture, and enhanced nutritional consequences.

RevDate: 2024-09-19

Hellewell J, Horsfield ST, von Wachsmann J, et al (2024)

CELEBRIMBOR: Core and accessory genes from metagenomes.

Bioinformatics (Oxford, England) pii:7762100 [Epub ahead of print].

MOTIVATION: Metagenome-Assembled Genomes (MAGs) or Single-cell Amplified Genomes (SAGs) are often incomplete, with sequences missing due to errors in assembly or low coverage. This presents a particular challenge for the identification of true gene frequencies within a microbial population, as core genes missing in only a few assemblies will be mischaracterized by current pangenome approaches.

RESULTS: Here, we present CELEBRIMBOR, a Snakemake pangenome analysis pipeline which uses a measure of genome completeness to automatically adjust the frequency threshold at which core genes are identified, enabling accurate core gene identification in MAGs and SAGs.

AVAILABILITY: CELEBRIMBOR is published under open source Apache 2.0 licence at https://github.com/bacpop/CELEBRIMBOR and is available as a Docker container from this repository. Supplementary material is available in the online version of the article.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
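The idea of lowering the core-gene frequency threshold when genomes are incomplete can be illustrated with a deliberately simplified sketch. It is not CELEBRIMBOR's actual model: the scaling rule (base threshold multiplied by mean completeness), the function names, and the presence/absence data are all assumptions made for illustration.

```python
from typing import Dict, List


def adjusted_core_threshold(base_threshold: float, mean_completeness: float) -> float:
    """Scale a core-gene frequency threshold by mean genome completeness.

    Assumption (not from the paper): a true core gene is expected to be
    observed in roughly `mean_completeness` of the incomplete assemblies.
    """
    if not 0.0 < base_threshold <= 1.0:
        raise ValueError("base_threshold must be in (0, 1]")
    if not 0.0 < mean_completeness <= 1.0:
        raise ValueError("mean_completeness must be in (0, 1]")
    return base_threshold * mean_completeness


def classify_genes(presence: Dict[str, List[bool]], threshold: float) -> Dict[str, str]:
    """Label each gene 'core' or 'accessory' from its observed frequency."""
    labels = {}
    for gene, calls in presence.items():
        frequency = sum(calls) / len(calls)
        labels[gene] = "core" if frequency >= threshold else "accessory"
    return labels


# Hypothetical presence/absence calls across 10 incomplete assemblies.
presence = {
    "geneA": [True] * 8 + [False] * 2,  # seen in 8 of 10 genomes
    "geneB": [True] * 3 + [False] * 7,  # seen in 3 of 10 genomes
}

naive = classify_genes(presence, threshold=0.95)
adjusted = classify_genes(presence, adjusted_core_threshold(0.95, 0.80))
```

In this toy case geneA is called accessory under the fixed 95% threshold but core once the threshold is scaled by an assumed 80% mean completeness, which is the kind of mischaracterization the abstract describes.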

RevDate: 2024-09-18
CmpDate: 2024-09-18

Vaibarova V, Kralova S, Palikova M, et al (2024)

Genetic and phenotypic diversity of Flavobacterium psychrophilum isolates from Czech salmonid fish farms.

BMC microbiology, 24(1):352.

BACKGROUND: The salmonid pathogen Flavobacterium psychrophilum poses a significant economic threat to global aquaculture, yet our understanding of its genetic and phenotypic diversity remains incomplete across much of its geographic range. In this study, we characterise the genetic and phenotypic diversity of 70 isolates collected from rainbow trout (Oncorhynchus mykiss) and brown trout (Salmo trutta m. fario) from fish farms in the Czech Republic between 2012 and 2019 to compare their genomic content with all draft or complete genomes present in the NCBI database (n = 187).

RESULTS: The Czech isolates underwent comprehensive evaluation, including multiplex PCR-based serotyping, genetic analysis, antimicrobial resistance testing, and assessment of selected virulence factors. Multiplex PCR serotyping revealed 43 isolates as Type 1, 23 as Type 2, with sporadic cases of Types 3 and 4. Multi-locus sequence typing unveiled 12 sequence types (ST), including seven newly described ones. Notably, 24 isolates were identified as ST329, a novel sequence type, while 22 were classified as the globally-distributed ST2. Phylogenetic analysis demonstrated clonal distribution of ST329 in the Czech Republic, with these isolates lacking a phage sequence in their genomes. Antimicrobial susceptibility testing revealed a high proportion of isolates classified as non-wild type with reduced susceptibility to oxolinic acid, oxytetracycline, flumequine, and enrofloxacin, while most isolates were classified as wild type for florfenicol, sulfamethoxazole-trimethoprim, and erythromycin. However, 31 isolates classified as wild type for florfenicol exhibited minimum inhibitory concentrations at the susceptibility breakpoint.

CONCLUSION: The prevalence of the Czech F. psychrophilum serotypes has evolved over time, likely influenced by the introduction of new isolates through international trade. Thus, it is crucial to monitor F. psychrophilum clones within and across countries using advanced methods such as MLST, serotyping, and genome sequencing. Given the open nature of the pan-genome, further sequencing of strains promises exciting discoveries in F. psychrophilum genomics.

RevDate: 2024-09-18

Góngora E, Lirette AO, Freyria NJ, et al (2024)

Metagenomic survey reveals hydrocarbon biodegradation potential of Canadian high Arctic beaches.

Environmental microbiome, 19(1):72.

BACKGROUND: Decreasing sea ice coverage across the Arctic Ocean due to climate change is expected to increase shipping activity through previously inaccessible shipping routes, including the Northwest Passage (NWP). Changing weather conditions typically encountered in the Arctic will still pose a risk for ships which could lead to an accident and the uncontrolled release of hydrocarbons onto NWP shorelines. We performed a metagenomic survey to characterize the microbial communities of various NWP shorelines and to determine whether there is a metabolic potential for hydrocarbon degradation in these microbiomes.

RESULTS: We observed taxonomic and functional gene evidence supporting the potential of NWP beach microbes to degrade various types of hydrocarbons. The metagenomic and metagenome-assembled genome (MAG) taxonomy showed that known hydrocarbon-degrading taxa are present in these beaches. Additionally, we detected the presence of biomarker genes of aerobic and anaerobic degradation pathways of alkane and aromatic hydrocarbons along with complete degradation pathways for aerobic alkane degradation. Alkane degradation genes were present in all samples and were also more abundant (33.8 ± 34.5 hits per million genes, HPM) than their aromatic hydrocarbon counterparts (11.7 ± 12.3 HPM). Due to the ubiquity of MAGs from the genus Rhodococcus (23.8% of the MAGs), we compared our MAGs with Rhodococcus genomes from NWP isolates obtained using hydrocarbons as the carbon source to corroborate our results and to develop a pangenome of Arctic Rhodococcus. Our analysis revealed that the biodegradation of alkanes is part of the core pangenome of this genus. We also detected nitrogen and sulfur pathways as additional energy sources and electron donors as well as carbon pathways providing alternative carbon sources. These pathways occur in the absence of hydrocarbons allowing microbes to survive in these nutrient-poor beaches.

CONCLUSIONS: Our metagenomic analyses detected the genetic potential for hydrocarbon biodegradation in these NWP shoreline microbiomes. Alkane metabolism was the most prevalent type of hydrocarbon degradation observed in these tidal beach ecosystems. Our results indicate that bioremediation could be used as a cleanup strategy, but the addition of adequate amounts of N and P fertilizers should be considered to help bacteria overcome the oligotrophic nature of NWP shorelines.

RevDate: 2024-09-17
CmpDate: 2024-09-17

Bouznada K, Saker R, Belaouni HA, et al (2024)

Phylogenomic Analysis Supports the Reclassification of Caldicoprobacter faecalis (Winter et al. 1988) Bouanane-Darenfed et al. (2015) as a Later Heterotypic Synonym of Caldicoprobacter oshimai Yokoyama et al. (2010).

Current microbiology, 81(11):363.

This study employs genome-based methodologies to explore the taxonomic relationship between Caldicoprobacter faecalis DSM 20678[T] and Caldicoprobacter oshimai DSM 21659[T]. The genome-based similarity indices, namely digital DNA-DNA Hybridization (dDDH), Average Amino Acid Identity (AAI), and Average Nucleotide Identity (ANI), calculated between the genomes of these two type strains were 91.2%, 98.9%, and 99.1%, respectively. These values were above the recommended thresholds of 70% (dDDH) and 95-96% (ANI and AAI) for bacterial species delineation, indicating a shared taxonomic position for C. faecalis and C. oshimai. Furthermore, analysis utilizing the 'Bacterial Pan Genome Analysis' (BPGA) pipeline and constructing a Maximum Likelihood core-genes tree using FastTree2 consistently demonstrated the close relationship between C. faecalis DSM 20678[T] and C. oshimai DSM 21659[T], evident from their clustering in the core-genes phylogenomic tree. Based on these comprehensive findings, we propose the reclassification of C. faecalis as a later heterotypic synonym of C. oshimai.
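The threshold comparison in this abstract can be written out as a small check. This is a minimal sketch: the index values and thresholds come from the abstract, but taking the lower bound (95%) of the quoted 95-96% range and requiring all three indices to pass are assumptions, and the function name is hypothetical.

```python
# Species-delineation thresholds quoted in the abstract, in percent.
# Assumption: the lower bound of the 95-96% range is used for ANI and AAI.
THRESHOLDS = {"dDDH": 70.0, "AAI": 95.0, "ANI": 95.0}


def same_species(indices: dict) -> bool:
    """Return True if every similarity index meets its threshold."""
    return all(indices[name] >= cutoff for name, cutoff in THRESHOLDS.items())


# Values reported for C. faecalis DSM 20678 vs. C. oshimai DSM 21659.
reported = {"dDDH": 91.2, "AAI": 98.9, "ANI": 99.1}
verdict = same_species(reported)
```

All three reported values clear their cutoffs, which is consistent with the proposed synonymy.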

RevDate: 2024-09-17

Bucher-Johannessen C, Senthakumaran T, Avershina E, et al (2024)

Species-level verification of Phascolarctobacterium association with colorectal cancer.

mSystems [Epub ahead of print].

We have previously demonstrated an association between increased abundance of Phascolarctobacterium and colorectal cancer (CRC) and adenomas in two independent Norwegian cohorts. Here we seek to verify our previous findings using new cohorts and methods. In addition, we characterize lifestyle and sex specificity, the functional potential of the Phascolarctobacterium species, and their interaction with other microbial species. We analyze Phascolarctobacterium with 16S rRNA sequencing, shotgun metagenome sequencing, and species-specific qPCR, using 2350 samples from three Norwegian cohorts (CRCAhus, NORCCAP, and CRCbiome) and a large publicly available data set, curatedMetagenomicData. Using metagenome-assembled genomes from the CRCbiome study, we explore the genomic characteristics and functional potential of the Phascolarctobacterium pangenome. Three species of Phascolarctobacterium associated with adenoma/CRC were consistently detected by qPCR and sequencing. Positive associations with adenomas/CRC were verified for Phascolarctobacterium succinatutens and negative associations were shown for Phascolarctobacterium faecium and adenoma in curatedMetagenomicData. Men show a higher prevalence of P. succinatutens across cohorts. Co-occurrence among Phascolarctobacterium species was low (<6%). Each of the three species shows distinct microbial composition and forms distinct correlation networks with other bacterial taxa, although Dialister invisus was negatively correlated to all investigated Phascolarctobacterium species. Pangenome analyses showed P. succinatutens to be enriched for genes related to porphyrin metabolism and degradation of complex carbohydrates, whereas glycoside hydrolase enzyme 3 was specific to P. faecium.

IMPORTANCE: Until now, Phascolarctobacterium has gone under the radar as a CRC-associated genus, despite having been noted as such, but overlooked, for over a decade. We found not just one, but two species of Phascolarctobacterium to be associated with CRC: Phascolarctobacterium succinatutens was more abundant in adenoma/CRC, while Phascolarctobacterium faecium was less abundant in adenoma. Each of them represents distinct communities, constituted by specific microbial partners and metabolic capacities, and they rarely occur together in the same patients. We have verified that P. succinatutens is increased in adenoma and CRC, and this species should be recognized among the most important CRC-associated bacteria.

RevDate: 2024-09-16
CmpDate: 2024-09-16

Geethanjali S, Kadirvel P, S Periyannan (2024)

Wheat improvement through advances in single nucleotide polymorphism (SNP) detection and genotyping with a special emphasis on rust resistance.

TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik, 137(10):224.

Single nucleotide polymorphism (SNP) markers in wheat and their prospects in breeding with special reference to rust resistance. Single nucleotide polymorphism (SNP)-based markers are increasingly gaining momentum for screening and utilizing vital agronomic traits in wheat. To date, more than 260 million SNPs have been detected in modern cultivars and landraces of wheat. This rapid SNP discovery was made possible through the release of near-complete reference and pan-genome assemblies of wheat and its wild relatives, coupled with whole genome sequencing (WGS) of thousands of wheat accessions. Further, genotyping of customized SNP sites was facilitated by a series of arrays (9K to 820K), a cost-effective substitute for WGS. Lately, germplasm-specific SNP arrays have been introduced to characterize novel traits and detect closely linked SNPs for marker-assisted breeding. Subsequently, the kompetitive allele-specific PCR (KASP) assay was introduced for rapid and large-scale screening of specific SNP markers. Moreover, with the advances and reduction in sequencing costs, ample opportunities arise for generating SNPs artificially through mutations and in combination with next-generation sequencing and comparative genomic analyses. In this review, we provide historical developments and prospects of SNP markers in wheat breeding with special reference to rust resistance where over 50 genetic loci have been characterized through SNP markers. Rust resistance is one of the most essential traits for wheat breeding as new strains of the Puccinia fungus, responsible for rust diseases, evolve frequently and globally.

RevDate: 2024-09-16

Olivos-Caicedo KY, Fernandez F, Daniel SL, et al (2024)

Pangenome analysis of Clostridium scindens: a collection of diverse bile acid and steroid metabolizing commensal gut bacterial strains.

bioRxiv : the preprint server for biology pii:2024.09.06.610859.

Clostridium scindens is a commensal gut bacterium capable of forming the secondary bile acids deoxycholic acid and lithocholic acid from the primary bile acids cholic acid and chenodeoxycholic acid, respectively, as well as converting glucocorticoids to androgens. Historically, only two strains, C. scindens ATCC 35704 and C. scindens VPI 12708, have been characterized in vitro and in vivo to any significant extent. The formation of secondary bile acids is important in maintaining normal gastrointestinal function, in regulating the structure of the gut microbiome, in the etiology of diseases such as cancers of the GI tract, and in the prevention of Clostridium difficile infection. We therefore wanted to determine the pangenome of 34 cultured strains of C. scindens and a set of 200 metagenome-assembled genomes (MAGs) to understand the variability among strains. The results indicate that the 34 strains of C. scindens have an open pangenome with 12,720 orthologous gene groups, and a core genome with 1,630 gene families, in addition to 7,051 and 4,039 gene families in the accessory and unique (i.e., strain-exclusive) genomes, respectively. The core genome contains 39% of the proteins with predicted metabolic function, and, in the unique genome, the function of storage and processing of information prevails, with 34% of the proteins being in that category. The pangenome profile including the MAGs also proved to be open. The presence of bile acid inducible (bai) and steroid-17,20-desmolase (des) genes was identified among groups of strains. The analysis reveals that C. scindens strains are distributed into two clades, indicating the possible onset of C. scindens separation into two species, confirmed by gene content, phylogenomic, and average nucleotide identity (ANI) analyses. This study provides insight into the structure and function of the C. scindens pangenome, offering a genetic foundation of significance for many aspects of research on the intestinal microbiota and bile acid metabolism.

RevDate: 2024-09-16

Littlefield C, Lazaro-Guevara JM, Stucki D, et al (2024)

A Draft Pacific Ancestry Pangenome Reference.

bioRxiv : the preprint server for biology pii:2024.08.07.606392.

Individuals of Pacific ancestry suffer some of the highest rates of health disparities yet remain vastly underrepresented in genomic research, including currently available linear and pangenome references. To begin addressing this, we developed the first Pacific ancestry pangenome reference using 23 individuals with diverse Pacific ancestry. We assembled 46 haploid genomes from these 23 individuals, resulting in highly accurate and contiguous genome assemblies with an average quality value of 55.0 and an average N50 of 40.7 Mb, marking the first de novo assembly of highly accurate Pacific ancestry genomes. We combined these assemblies to create a pangenome reference, which added 30.6 Mb of novel sequence missing from the Human Pangenome Reference Consortium (HPRC) reference. Mapping short reads to this pangenome reduced variant call errors and yielded more true-positive variants compared to the HPRC and T2T-CHM13 references. This Pacific ancestry pangenome reference serves as a resource to enhance genetic analyses for this underserved population.

RevDate: 2024-09-16

Feng Y, Weers T, RJ Peters (2024)

Double-barreled defense: dual ent-miltiradiene synthases in most rice cultivars.

aBIOTECH, 5(3):375-380 pii:167.

UNLABELLED: Rice (Oryza sativa) produces numerous diterpenoid phytoalexins that are important in defense against pathogens. Surprisingly, despite extensive previous investigations, a major group of such phytoalexins, the abietoryzins, were only recently reported. These aromatic abietanes are presumably derived from ent-miltiradiene, but such biosynthetic capacity has not yet been reported in O. sativa. While wild rice has been reported to contain such an enzyme, specifically ent-kaurene synthase-like 10 (KSL10), the only characterized ortholog from O. sativa (OsKSL10), specifically from the well-studied cultivar (cv.) Nipponbare, instead has been shown to make ent-sandaracopimaradiene, precursor to the oryzalexins. Notably, in many other cultivars, OsKSL10 is accompanied by a tandem duplicate, termed here OsKSL14. Biochemical characterization of OsKSL14 from cv. Kitaake demonstrates that it produces the expected abietoryzin precursor ent-miltiradiene. Strikingly, phylogenetic analysis of OsKSL10 across the rice pan-genome reveals that the allele from cv. Nipponbare is an outlier, whereas the alleles from most other cultivars group with those from wild rice, suggesting that these also might produce ent-miltiradiene. Indeed, OsKSL10 from cv. Kitaake exhibits such activity as well, consistent with its production of abietoryzins but not oryzalexins. Similarly consistent with these results is the lack of abietoryzin production by cv. Nipponbare. Although their equivalent product outcome might suggest redundancy, OsKSL10 and OsKSL14 were observed to exhibit distinct expression patterns, indicating such differences may underlie retention of these duplicated genes. Regardless, the results reported here clarify abietoryzin biosynthesis and provide insight into the evolution of rice diterpenoid phytoalexins.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s42994-024-00167-3.

RevDate: 2024-09-15
CmpDate: 2024-09-15

Hou Y, Gan J, Fan Z, et al (2024)

Haplotype-based pangenomes reveal genetic variations and climate adaptations in moso bamboo populations.

Nature communications, 15(1):8085.

Moso bamboo (Phyllostachys edulis), an ecologically and economically important forest species in East Asia, plays vital roles in carbon sequestration and climate change mitigation. However, intensifying climate change threatens moso bamboo survival. Here we generate high-quality haplotype-based pangenome assemblies for 16 representative moso bamboo accessions and integrate these assemblies with 427 previously resequenced accessions. Characterization of the haplotype-based pangenome reveals extensive genetic variation, predominantly between haplotypes rather than within accessions. Many genes with allele-specific expression patterns are implicated in climate responses. Integrating spatiotemporal climate data reveals more than 1050 variations associated with pivotal climate factors, including temperature and precipitation. Climate-associated variations enable the prediction of increased genetic risk across the northern and western regions of China under future emissions scenarios, underscoring the threats posed by rising temperatures. Our integrated haplotype-based pangenome elucidates moso bamboo's local climate adaptation mechanisms and provides critical genomic resources for addressing intensifying climate pressures on this essential bamboo. More broadly, this study demonstrates the power of long-read sequencing in dissecting adaptive traits in climate-sensitive species, advancing evolutionary knowledge to support conservation.

RevDate: 2024-09-14

Wu Y, Wang F, Lyu K, et al (2024)

Comparative Analysis of Transposable Elements in the Genomes of Citrus and Citrus-Related Genera.

Plants (Basel, Switzerland), 13(17): pii:plants13172462.

Transposable elements (TEs) significantly contribute to the evolution and diversity of plant genomes. In this study, we explored the roles of TEs in the genomes of Citrus and Citrus-related genera by constructing a pan-genome TE library from 20 published genomes of Citrus and Citrus-related accessions. Our results revealed an increase in TE content and the number of TE types compared to the original annotations, as well as a decrease in the content of unclassified TEs. The average length of TEs per assembly was approximately 194.23 Mb, representing 41.76% (Murraya paniculata) to 64.76% (Citrus gilletiana) of the genomes, with a mean value of 56.95%. A significant positive correlation was found between genome size and both the number of TE types and TE content. Consistent with the difference in mean whole-genome size (39.83 Mb) between Citrus and Citrus-related genera, Citrus genomes contained an average of 34.36 Mb more TE sequences than Citrus-related genomes. Analysis of the estimated insertion time and half-life of long terminal repeat retrotransposons (LTR-RTs) suggested that TE removal was not the primary factor contributing to the differences among genomes. These findings collectively indicate that TEs are the primary determinants of genome size and play a major role in shaping genome structures. Principal coordinate analysis (PCoA) of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) identifiers revealed that the fragmented TEs were predominantly derived from ancestral genomes, while intact TEs were crucial in the recent evolutionary diversification of Citrus. Moreover, the presence or absence of intact TEs near the AdhE superfamily was closely associated with the bitterness trait in the Citrus species. Overall, this study enhances TE annotation in Citrus and Citrus-related genomes and provides valuable data for future genetic breeding and agronomic trait research in Citrus.

RevDate: 2024-09-14
CmpDate: 2024-09-14

Song Y, Han S, Wang M, et al (2024)

Pangenome Identification and Analysis of Terpene Synthase Gene Family Members in Gossypium.

International journal of molecular sciences, 25(17): pii:ijms25179677.

Terpene synthases (TPSs), key gatekeepers in the biosynthesis of herbivore-induced terpenes, are pivotal in the diversity of terpene chemotypes across and within plant species. Here, we constructed a gene-based pangenome of the Gossypium genus by integrating the genomes of 17 diploid and 10 tetraploid species. Within this pangenome, 208 TPS syntelog groups (SGs) were identified, comprising 2 core SGs (TPS5 and TPS42) present in all 27 analyzed genomes, 6 softcore SGs (TPS11, TPS12, TPS13, TPS35, TPS37, and TPS47) found in 25 to 26 genomes, 131 dispensable SGs identified in 2 to 24 genomes, and 69 private SGs exclusive to a single genome. The mutational load analysis of these identified TPS genes across 216 cotton accessions revealed a great number of splicing variants and complex splicing patterns. The nonsynonymous/synonymous Ka/Ks value for all 52 analyzed TPS SGs was less than one, indicating that these genes were subject to purifying selection. Of 208 TPS SGs encompassing 1795 genes, 362 genes derived from 102 SGs were identified as atypical and truncated. The structural analysis of TPS genes revealed that gene truncation is a major mechanism contributing to the formation of atypical genes. An integrated analysis of three RNA-seq datasets from cotton plants subjected to herbivore infestation highlighted nine upregulated TPSs, which included six previously characterized TPSs in G. hirsutum (AD1_TPS10, AD1_TPS12, AD1_TPS40, AD1_TPS42, AD1_TPS89, and AD1_TPS104), two private TPSs (AD1_TPS100 and AD2_TPS125), and one atypical TPS (AD2_TPS41). Also, a TPS-associated coexpression module of eight genes involved in the terpenoid biosynthesis pathway was identified in the transcriptomic data of herbivore-infested G. hirsutum. These findings will help us understand the contributions of TPS family members to interspecific terpene chemotypes within Gossypium and offer valuable resources for breeding insect-resistant cotton cultivars.
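The core, softcore, dispensable, and private categories described in this abstract can be expressed as a simple rule on the number of genomes in which a syntelog group (SG) occurs. This is a minimal sketch: the cutoffs for 27 genomes are taken from the abstract, while the function name and the generalization of softcore to "missing from at most two genomes" are assumptions.

```python
def classify_sg(n_present: int, n_total: int = 27) -> str:
    """Classify a syntelog group by the number of genomes that carry it.

    Cutoffs follow the abstract for 27 genomes: core = 27, softcore = 25-26,
    dispensable = 2-24, private = 1. Expressing softcore as n_total - 2 is an
    assumption made so the rule works for other values of n_total.
    """
    if not 1 <= n_present <= n_total:
        raise ValueError("n_present must be between 1 and n_total")
    if n_present == n_total:
        return "core"
    if n_present >= n_total - 2:
        return "softcore"
    if n_present >= 2:
        return "dispensable"
    return "private"


# Category sizes reported in the abstract; they should sum to 208 SGs.
reported_counts = {"core": 2, "softcore": 6, "dispensable": 131, "private": 69}
total_sgs = sum(reported_counts.values())
```

The reported category sizes add up to the 208 TPS syntelog groups stated in the abstract.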

RevDate: 2024-09-13

Olson MA, Cullimore C, Hutchison WD, et al (2024)

Genes associated with fitness and disease severity in the pan-genome of mastitis-associated Escherichia coli.

Frontiers in microbiology, 15:1452007.

INTRODUCTION: Bovine mastitis caused by Escherichia coli compromises animal health and inflicts substantial product losses in dairy farming. It may manifest as anything from subclinical to severe acute disease and can be transient or persistent in nature. Little is known about bacterial factors that impact clinical outcomes or allow some strains to outcompete others in the mammary gland (MG) environment. Mastitis-associated E. coli (MAEC) may have distinctive characteristics which may contribute to the varied nature of the disease. Given their high levels of intraspecies genetic variability, virulence factors of commonly used MAEC model strains may not be relevant to all members of this group.

METHODS: In this study, we sequenced the genomes of 96 MAEC strains isolated from cattle with clinical mastitis (CM). We utilized clinical severity data to perform genome-wide association studies to identify accessory genes associated with strains isolated from mild or severe CM, or with high or low competitive fitness during in vivo competition assays. Genes associated with mastitis pathogens or commensal strains isolated from bovine sources were also identified.

RESULTS: A type-2 secretion system (T2SS) and a chitinase (ChiA) exported by this system were strongly associated with pathogenic isolates compared with commensal strains. Deletion of chiA from MAEC isolates decreased their adherence to cultured bovine mammary epithelial cells.

DISCUSSION: The increased fitness associated with strains possessing this gene may be due to better attachment in the MG. Overall, these results provide a much richer understanding of MAEC and suggest bacterial processes that may underlie the clinical diversity associated with mastitis and their adaptation to this unique environment.

RevDate: 2024-09-12

Magar S, Kolte V, Sharma G, et al (2024)

Exploring pangenomic diversity and CRISPR-Cas evasion potential in jumbo phages: a comparative genomics study.

Microbiology spectrum [Epub ahead of print].

UNLABELLED: Jumbo phages are characterized by their remarkably large genomes and unique life cycles. Jumbo phages belonging to the Chimalliviridae family protect the replicating phage DNA from host immune systems like CRISPR-Cas and restriction-modification systems through a phage nucleus structure. Several recent studies have provided new insights into jumbo phage infection biology, but the pan-genome diversity of jumbo phages and their relationship with CRISPR-Cas targeting beyond Chimalliviridae are not well understood. In this study, we used pan-genome analysis to identify orthologous gene families shared among 331 jumbo phages with complete genomes. We show that jumbo phages lack a universally conserved set of core genes but identify seven "soft-core genes" conserved in over 50% of these phages. These genes primarily govern DNA-related activities, such as replication, repair, or nucleotide synthesis. Jumbo phages exhibit a wide array of accessory and unique genes, underscoring their genetic diversity. Phylogenetic analyses of the soft-core genes revealed frequent horizontal gene transfer events between jumbo phages, non-jumbo phages, and occasionally even giant eukaryotic viruses, indicating a polyphyletic evolutionary nature. We categorized jumbo phages into 11 major viral clusters (VCs) spanning 130 sub-clusters, with the majority being multi-genus jumbo phage clusters. Moreover, through the analysis of hallmark genes related to CRISPR-Cas targeting, we predict that many jumbo phages can evade host immune systems using both known and yet-to-be-identified mechanisms. In summary, our study enhances our understanding of jumbo phages, shedding light on their pan-genome diversity and remarkable genome protection capabilities.

IMPORTANCE: Jumbo phages are large bacterial viruses that have been known for more than 50 years. However, only in recent years has a significant number of complete jumbo phage genome sequences become available. In this study, we employed comparative genomic approaches to investigate the genomic diversity and genome protection capabilities of 331 jumbo phages. Our findings revealed that jumbo phages exhibit high genetic diversity, with only a few genes being relatively conserved across jumbo phages. Interestingly, our data suggest that jumbo phages employ yet-to-be-identified strategies to protect their DNA from host immune systems, such as CRISPR-Cas.

RevDate: 2024-09-11

Sirén J, Eskandar P, Ungaro MT, et al (2024)

Personalized pangenome references.

Nature methods [Epub ahead of print].

Pangenomes reduce reference bias by representing genetic diversity better than a single reference sequence. Yet when comparing a sample to a pangenome, variants in the pangenome that are not part of the sample can be misleading, for example, causing false read mappings. These irrelevant variants are generally rarer in terms of allele frequency, and have previously been dealt with by filtering rare variants. However, this blunt heuristic both fails to remove some irrelevant variants and removes many relevant variants. We propose a new approach that imputes a personalized pangenome subgraph by sampling local haplotypes according to k-mer counts in the reads. We implement the approach in the vg toolkit (https://github.com/vgteam/vg) for the Giraffe short-read aligner and compare its accuracy to state-of-the-art methods using human pangenome graphs from the Human Pangenome Reference Consortium. This reduces small variant genotyping errors by four times relative to the Genome Analysis Toolkit and makes short-read structural variant genotyping of known variants competitive with long-read variant discovery methods.

RevDate: 2024-09-11

Thorgersen MP, Goff JL, Trotter VV, et al (2024)

Fitness factors impacting survival of a subsurface bacterium in contaminated groundwater.

The ISME journal pii:7755367 [Epub ahead of print].

Many factors contribute to the ability of a microbial species to persist when encountering complexly contaminated environments including time of exposure, the nature and concentration of contaminants, availability of nutritional resources, and possession of a combination of appropriate molecular mechanisms needed for survival. Herein we sought to identify genes that are most important for survival of Gram-negative Enterobacteriaceae in contaminated groundwater environments containing high concentrations of nitrate and metals using the metal-tolerant Oak Ridge Reservation (ORR) isolate, Pantoea sp. MT58 (MT58). Survival fitness experiments in which a randomly barcoded transposon insertion (RB-TnSeq) library of MT58 was exposed directly to contaminated ORR groundwater samples from across a nitrate and mixed metal contamination plume were used to identify genes important for survival with increasing exposure times and concentrations of contaminants, and availability of a carbon source. Genes involved in controlling and using carbon, encoding transcriptional regulators, and related to Gram-negative outer membrane processes were among those found to be important for survival in contaminated ORR groundwater. A comparative genomics analysis of 75 Pantoea genus strains allowed us to further separate the survival determinants into core and non-core genes in the Pantoea pangenome, revealing insights into the survival of subsurface microorganisms during contaminant plume intrusion.

RevDate: 2024-09-11

Liu Z, Yang F, Wan H, et al (2024)

Genome architecture of the allotetraploid wild grass Aegilops ventricosa reveals its evolutionary history and contributions to wheat improvement.

Plant communications pii:S2590-3462(24)00527-3 [Epub ahead of print].

The allotetraploid wild grass Aegilops ventricosa (2n=4X=28, genome D[v]D[v]N[v]N[v]) has been recognized as an important germplasm resource for wheat improvement due to its ability to tolerate biotic stresses. In particular, the 2N[v]S segment from Aegilops ventricosa, as a stable and effective resistance source, has greatly contributed to wheat improvement. The 2N[v]S/2AS translocation is a prevalent chromosomal translocation between common wheat and wild relatives, ranking just behind the 1B/1R translocation in importance for modern wheat breeding. Here, we assembled a high-quality chromosome-level reference genome of Ae. ventricosa RM271 with a total length of 8.67 Gb. Phylogenomic analyses revealed that the progenitor of the D[v] subgenome of Ae. ventricosa was Ae. tauschii ssp. tauschii (genome DD); in contrast, the progenitor of the D subgenome of bread wheat (Triticum aestivum L.) was Ae. tauschii ssp. strangulata (genome DD). The oldest polyploidization time of Ae. ventricosa occurred ∼0.7 million years ago. The D[v] subgenome of Ae. ventricosa was less conserved than the D subgenome of bread wheat. Construction of a graph-based pangenome of 2AS/6N[v]L (originally known as 2N[v]S) segments from Ae. ventricosa and other genomes in the Triticeae enables us to identify candidate resistance genes sourced from Ae. ventricosa. We identified 12 nonredundant introgressed segments from the D[v] and N[v] subgenomes using a large winter wheat collection representing the full diversity of the European wheat genetic pool, and 29.40% of European wheat varieties inherited at least one of these segments. The high-quality RM271 reference genome will provide a basis for cloning key genes, including the Yr17-Lr37-Sr38-Cre5 resistance gene cluster in Ae. ventricosa, and facilitate the full use of elite wild genetic resources to accelerate wheat improvement.

RevDate: 2024-09-10

Li X, Huo L, Li X, et al (2024)

Genomes of diverse Actinidia species provide insights into cis-regulatory motifs and genes associated with critical traits.

BMC biology, 22(1):200.

BACKGROUND: Kiwifruit, belonging to the genus Actinidia, represents a unique fruit crop characterized by its modern cultivars being genetically diverse and exhibiting remarkable variations in morphological traits and adaptability to harsh environments. However, the genetic mechanisms underlying such morphological diversity remain largely elusive.

RESULTS: We report the high-quality genomes of five Actinidia species, including Actinidia longicarpa, A. macrosperma, A. polygama, A. reticulata, and A. rufa. Through comparative genomics analyses, we identified three whole genome duplication events shared by the Actinidia genus and uncovered rapidly evolving gene families implicated in the development of characteristic kiwifruit traits, including vitamin C (VC) content and fruit hairiness. A range of structural variations were identified, potentially contributing to the phenotypic diversity in kiwifruit. Notably, phylogenomic analyses revealed 76 cis-regulatory elements within the Actinidia genus, predominantly associated with stress responses, metabolic processes, and development. Among these, five motifs did not exhibit similarity to known plant motifs, suggesting the presence of possible novel cis-regulatory elements in kiwifruit. Construction of a pan-genome encompassing the nine Actinidia species facilitated the identification of gene DTZ79_23g14810 specific to species exhibiting extraordinarily high VC content. Expression of DTZ79_23g14810 is significantly correlated with the dynamics of VC concentration, and its overexpression in the transgenic roots of kiwifruit plants resulted in increased VC content.

CONCLUSIONS: Collectively, the genomes and pan-genome of diverse Actinidia species not only enhance our understanding of fruit development but also provide a valuable genomic resource for facilitating the genome-based breeding of kiwifruit.

RevDate: 2024-09-10

Duan S, Yan L, Shen Z, et al (2024)

Genomic analyses of agronomic traits in tea plants and related Camellia species.

Frontiers in plant science, 15:1449006.

The genus Camellia contains three types of domesticates that meet various needs of ancient humans: the ornamental C. japonica, the edible oil-producing C. oleifera, and the beverage-purposed tea plant C. sinensis. The genomic drivers of the functional diversification of Camellia domesticates remain unknown. Here, we present the genomic variations of 625 Camellia accessions based on a new genome assembly of C. sinensis var. assamica ('YK10'), which consists of 15 pseudo-chromosomes with a total length of 3.35 Gb and a contig N50 of 816,948 bp. These accessions were mainly distributed in East Asia, South Asia, Southeast Asia, and Africa. We profiled the population and subpopulation structure in tea tree Camellia to find new evidence for the parallel domestication of C. sinensis var. assamica (CSA) and C. sinensis var. sinensis (CSS). We also identified candidate genes associated with traits differentiating CSA, CSS, oilseed Camellia, and ornamental Camellia cultivars. Our results provide a unique global view of the genetic diversification of Camellia domesticates and provide valuable resources for ongoing functional and molecular breeding research.

RevDate: 2024-09-10

Stanley S, Silva-Costa C, Gomes-Silva J, et al (2024)

CC180 clade dynamics does not universally explain Streptococcus pneumoniae serotype 3 persistence post-vaccine: a global comparative population genomics study.

medRxiv : the preprint server for health sciences pii:2024.08.29.24312665.

BACKGROUND: Clonal complex 180 (CC180) is currently the major clone of serotype 3 Streptococcus pneumoniae (Spn). The 13-valent pneumococcal conjugate vaccine (PCV13) does not have significant efficacy against serotype 3 despite polysaccharide inclusion in the vaccine. It was hypothesized that PCV13 may effectively control Clade I of CC180 but that Clades III and IV are resistant, provoking a population shift that enables serotype 3 persistence. This has been observed in the United States, England, and Wales but not Spain. We tested this hypothesis further utilizing a dataset from Portugal.

METHODS: We whole-genome sequenced (WGS) 501 serotype 3 strains from Portugal isolated from patients with pneumococcal infections between 1999-2020. The draft genomes underwent phylogenetic analyses, pangenome profiling, and a genome-wide association study (GWAS). We also completed antibiotic susceptibility testing and compiled over 2,600 serotype 3 multilocus sequence type 180 (MLST180) WGSs to perform global comparative genomics.

FINDINGS: CC180 Clades I, II, III, IV, and VI distributions were similar when comparing non-invasive pneumonia isolates and invasive disease isolates (Fisher's exact test, P=0.29), and adult and pediatric cases (Fisher's exact test, P=0.074). The serotype 3 CCs shifted post-PCV13 (Fisher's exact test, P<0.0001) and Clade I became dominant. Clade I is largely antibiotic-sensitive and carries the ΦOXC141 prophage but the pangenome is heterogeneous. Strains from Portugal and Spain, where Clade I remains dominant post-PCV13, have larger pangenomes and are associated with the presence of two genes encoding hypothetical proteins.

INTERPRETATION: Clade I became dominant in Portugal post-PCV13, despite the burden of the prophage and antibiotic sensitivity. The accessory genome content may mitigate these fitness costs. Regional differences in Clade I prevalence and pangenome heterogeneity suggest that clade dynamics is not a generalizable approach to understanding serotype 3 vaccine escape.

FUNDING: National Institute of Child Health and Human Development, Pfizer, and Merck Sharp & Dohme.

RESEARCH IN CONTEXT: Evidence before this study: We conducted this study because of the mounting interest surrounding the changing prevalence of serotype 3 Streptococcus pneumoniae (Spn) genetic lineages and the potential association with escape from 13-valent pneumococcal conjugate vaccine (PCV13) control. To inform our investigation, we searched the PubMed database using different combinations of the following keywords: "Streptococcus pneumoniae", "serotype 3", "CC180", "PCV13", "Clade Iα", "Clade Iβ", and "Clade II". The search included all English language primary research articles published before July 1st, 2024; this language limitation may bias the results of our assessment. Most ST3 isolates belong to clonal complex 180 (CC180), and one study identified three major lineages within CC180: Clade Iα, Clade Iβ, and Clade II. This study observed a global trend of increasing Clade II prevalence with a concomitant decrease in Clade I prevalence over time, which was associated with the introduction of PCV13 in the United States. A report from England and Wales made a similar observation. It was therefore hypothesized that PCV13 may be effective at controlling Clade Iα and that Clade II is driving vaccine escape. Later work refined the clade classification system as follows: Clade I (Clade Iα), Clades II and VI (Clade Iβ), Clades III and IV (Clade II), and Clade V. Clade I strains are marked by a significantly lower recombination rate partly due to the presence of a lineage-specific prophage interfering with competence development, which is a potential mechanism explaining the possible reduced fitness of Clade I. Clade I is also noted to be mostly antibiotic-susceptible. However, a recent study found that Clade I persists as a dominant serotype 3 lineage in Spain, so the generalizability and implications of clade dynamics remain unclear.
Added value of this study: Early work assessing the association between changes in serotype 3 clade prevalence and PCV13 was limited by small sample sizes. In addition, studies investigating differences in clade dynamics did not comprehensively consider patient age or disease manifestations such as non-invasive pneumonia and invasive infections. In this study, we evaluated 501 serotype 3 strains from Portugal to investigate clade dynamics. This must be explored in different geographic contexts for a more robust understanding of changing serotype 3 population genomics. We also sought to define genetic determinants linked to strains from regions in which Clade I remains dominant. This is an important step towards a more mechanistic understanding of the serotype 3 CC180 lineage fitness landscape.
Implications of all the available evidence: Unlike other serotypes covered by PCV13, serotype 3 has evaded vaccine control. It has been suggested that Clade I prevalence has decreased due to PCV13, which has created an expanded niche for strains from other clades and ultimately renders PCV13 less effective against serotype 3. This postulation has important implications for the future design of an improved vaccine, so this hypothesis must be thoroughly tested in diverse contexts. We find that Clade I remains the dominant lineage in Portugal even after the introduction of PCV13. We delineate Clade I pangenome heterogeneity and show that strains from Portugal and Spain share similar pangenome features in contrast to Clade I strains from regions where Clade I decreased in prevalence, which should motivate future studies to elucidate more generalizable population genomics trends that may better inform strategies for the design of an improved vaccine.

RevDate: 2024-09-09

Zorigt T, Furuta Y, Paudel A, et al (2024)

Pan-genome analysis reveals novel chromosomal markers for multiplex PCR-based specific detection of Bacillus anthracis.

BMC infectious diseases, 24(1):942.

BACKGROUND: Bacillus anthracis is a highly pathogenic bacterium that can cause lethal infection in animals and humans, making it a significant concern as a pathogen and biological agent. Consequently, accurate diagnosis of B. anthracis is critically important for public health. However, the identification of specific marker genes encoded in the B. anthracis chromosome is challenging due to the genetic similarity it shares with B. cereus and B. thuringiensis.

METHODS: The complete genomes of B. anthracis, B. cereus, B. thuringiensis, and B. weihenstephanensis were de novo annotated with Prokka, and these annotations were used by Roary to produce the pan-genome. B. anthracis exclusive genes were identified by Perl script, and their specificity was examined by nucleotide BLAST search. A local BLAST alignment was performed to confirm the presence of the identified genes across various B. anthracis strains. Multiplex polymerase chain reactions (PCR) were established based on the identified genes.

RESULT: The distribution of genes among 151 whole-genome sequences exhibited three distinct major patterns, depending on the bacterial species and strains. Further comparative analysis between the three groups uncovered thirty chromosome-encoded genes exclusively present in B. anthracis strains. Of these, twenty were found in known lambda prophage regions, and ten were in a previously undefined region of the chromosome. We established three distinct multiplex PCRs for the specific detection of B. anthracis by utilizing three of the identified genes, BA1698, BA5354, and BA5361.

CONCLUSION: The study identified thirty chromosome-encoded genes specific to B. anthracis, encompassing previously described genes in known lambda prophage regions and nine newly discovered genes from an undefined gene region to the best of our knowledge. Three multiplex PCR assays offer an accurate and reliable alternative method for detecting B. anthracis. Furthermore, these genetic markers have value in anthrax vaccine development, and understanding the pathogenicity of B. anthracis.
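The selection step described in the METHODS of this abstract (finding genes present in B. anthracis and absent from the other species in the pan-genome) amounts to a set filter over a gene presence/absence table. The sketch below is not the authors' Perl script; it is a minimal Python illustration that assumes the table has already been loaded into a dictionary mapping each gene to the set of genomes carrying it. The function name, gene names, and genome names are all hypothetical.

```python
def exclusive_genes(presence, target, others):
    """Return genes found in every target genome and in none of the others.

    presence: dict mapping gene name -> set of genome names carrying the gene
    target:   iterable of genome names that must all carry the gene
    others:   iterable of genome names that must not carry the gene
    """
    target = set(target)
    others = set(others)
    return sorted(
        gene
        for gene, genomes in presence.items()
        if target <= genomes and not (genomes & others)
    )


# Hypothetical toy table for illustration only (not data from the study).
presence = {
    "geneA": {"Ba1", "Ba2"},         # in all targets, absent elsewhere
    "geneB": {"Ba1", "Ba2", "Bc1"},  # also present in a non-target genome
    "geneC": {"Ba1"},                # missing from one target genome
    "geneD": {"Bc1", "Bt1"},         # absent from the targets
}
markers = exclusive_genes(presence, ["Ba1", "Ba2"], ["Bc1", "Bt1"])
print(markers)
```

Candidates selected this way would still need the specificity checks the abstract describes, such as BLAST searches against additional strains.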

RevDate: 2024-09-09

Ou S, Scheben A, Collins T, et al (2024)

Differences in activity and stability drive transposable element variation in tropical and temperate maize.

Genome research pii:gr.278131.123 [Epub ahead of print].

Much of the profound interspecific variation in genome content has been attributed to transposable elements (TEs). To explore the extent of TE variation within species, we developed an optimized open-source algorithm, panEDTA, to de novo annotate TEs in a pangenome context. We then generated a unified TE annotation for a maize pangenome derived from 26 reference-quality genomes, which reveals an excess of 35.1 Mb of TE sequences per genome in tropical maize relative to temperate maize. A small number (n = 216) of TE families, mainly LTR retrotransposons, drive these differences. Evidence from the methylome, transcriptome, LTR age distribution, and LTR insertional polymorphisms reveals that 64.7% of the variability is contributed by LTR families that are young, less methylated, and more expressed in tropical maize, whereas 18.5% is driven by LTR families with removal or loss in temperate maize. Additionally, we find enrichment for young LTR families adjacent to nucleotide-binding and leucine-rich repeat (NLR) clusters of varying copy number across lines, suggesting TE activity may be associated with disease resistance in maize.

RevDate: 2024-09-09

Hung TK, Liu WC, Lai SK, et al (2024)

Genetic complexity of killer-cell immunoglobulin-like receptor genes in human pangenome assemblies.

Genome research pii:gr.278358.123 [Epub ahead of print].

The killer-cell immunoglobulin-like receptor (KIR) gene complex, a highly polymorphic region of the human genome that encodes proteins involved in immune responses, poses strong challenges in genotyping owing to its remarkable genetic diversity and structural intricacy. Accurate analysis of KIR alleles, including their structural variations, is crucial for understanding their roles in various immune responses. Leveraging the high-quality genome assemblies from the Human Pangenome Reference Consortium (HPRC), we present a novel bioinformatic tool, the structural KIR annoTator (SKIRT), to investigate gene diversity and facilitate precise KIR allele analysis. In 47 HPRC-phased assemblies, SKIRT identifies a recurrent novel KIR2DS4/3DL1 fusion gene in the paternal haplotype of HG02630 and maternal haplotype of NA19240. Additionally, SKIRT accurately identifies eight structural variants and 15 novel nonsynonymous alleles, all of which are independently validated using short-read data or quantitative polymerase chain reaction. Our study has discovered a total of 570 novel alleles, among which eight haplotypes harbor at least one KIR gene duplication, six haplotypes have lost at least one framework gene, and 75 out of 94 haplotypes (79.8%) carry at least five novel alleles, thus confirming KIR genetic diversity. These findings are pivotal in providing insights into KIR gene diversity and serve as a solid foundation for understanding the functional consequences of KIR structural variations. High-resolution genome assemblies offer unprecedented opportunities to explore polymorphic regions that are challenging to investigate using short-read sequencing methods. The SKIRT pipeline emerges as a highly efficient tool, enabling the comprehensive detection of the complete spectrum of KIR alleles within human genome assemblies.

RevDate: 2024-09-08
CmpDate: 2024-09-08

Kenneally C, Murphy CP, Sleator RD, et al (2024)

Genotypic and phenotypic characterisation of asymptomatic bacteriuria (ABU) isolates displaying bacterial interference against multi-drug resistant uropathogenic E. coli.

Archives of microbiology, 206(10):394.

Escherichia coli can colonise the urogenital tract of individuals without causing symptoms of infection, in a condition referred to as asymptomatic bacteriuria (ABU). ABU isolates can protect the host against symptomatic urinary tract infections (UTIs) by bacterial interference against uropathogenic E. coli (UPEC). The aim of this study was to investigate the genotypic and phenotypic characteristics of five ABU isolates from midstream urine samples of adults. Comparative genomic and phenotypic analysis was conducted including an antibiotic resistance profile, pangenome analysis, and a putative virulence profile. Based on the genome analysis, the isolates consisted of one from phylogroup A, three from phylogroup B2, and one from phylogroup D. Two of the isolates, PUTS 58 and SK-106-1, were noted for their lack of antibiotic resistance and virulence genes compared to the prototypic ABU strain E. coli 83972. This study provides insights into the genotypic and phenotypic profiles of uncharacterised ABU isolates, and how relevant fitness and virulence traits can impact their potential suitability for therapeutic bacterial interference.

RevDate: 2024-09-07
CmpDate: 2024-09-07

Campbell AM, Gavilan RG, Abanto Marin M, et al (2024)

Evolutionary dynamics of the successful expansion of pandemic Vibrio parahaemolyticus ST3 in Latin America.

Nature communications, 15(1):7828.

The underlying evolutionary mechanisms driving global expansions of pathogen strains are poorly understood. Vibrio parahaemolyticus is one of only two marine pathogens where variants have emerged in distinct climates globally. The success of a Vibrio parahaemolyticus clone (VpST3) in Latin America, the first spread identified outside its endemic region of tropical Asia, provided an invaluable opportunity to investigate mechanisms of VpST3 expansion into a distinct marine climate. A global collection of VpST3 isolates and novel Latin American isolates were used for evolutionary population genomics and pangenome analysis, and combined with oceanic climate data. We found a VpST3 population (LatAm-VpST3) introduced in Latin America well before the emergence of this clone in India, previously considered the onset of the VpST3 epidemic. LatAm-VpST3 underwent successful adaptation to local conditions over its evolutionary divergence from Asian VpST3 isolates, to become dominant in Latin America. Selection signatures were found in genes providing resilience to the distinct marine climate. Core genome mutations and accessory gene presences that promoted survival over long dispersals or increased environmental fitness were associated with environmental conditions. These results provide novel insights into the global expansion of this successful V. parahaemolyticus clone into regions with different climate scenarios.

RevDate: 2024-09-06
CmpDate: 2024-09-07

Kim HS, Haley OC, Portwood II JL, et al (2024)

Fusarium Protein Toolkit: a web-based resource for structural and variant analysis of Fusarium species.

BMC microbiology, 24(1):326.

BACKGROUND: The genus Fusarium poses significant threats to food security and safety worldwide because numerous species of the fungus cause destructive diseases and/or mycotoxin contamination in crops. The adverse effects of climate change are exacerbating some existing threats and causing new problems. These challenges highlight the need for innovative solutions, including the development of advanced tools to identify targets for control strategies.

DESCRIPTION: In response to these challenges, we developed the Fusarium Protein Toolkit (FPT), a web-based tool that allows users to interrogate the structural and variant landscape within the Fusarium pan-genome. The tool displays both AlphaFold and ESMFold-generated protein structure models from six Fusarium species. The structures are accessible through a user-friendly web portal and facilitate comparative analysis, functional annotation inference, and identification of related protein structures. Using a protein language model, FPT predicts the impact of over 270 million coding variants in two of the most agriculturally important species, Fusarium graminearum and F. verticillioides. To facilitate the assessment of naturally occurring genetic variation, FPT provides variant effect scores for proteins in a Fusarium pan-genome based on 22 diverse species. The scores indicate potential functional consequences of amino acid substitutions and are displayed as intuitive heatmaps using the PanEffect framework.

CONCLUSION: FPT fills a knowledge gap by providing previously unavailable tools to assess structural and missense variation in proteins produced by Fusarium. FPT has the potential to deepen our understanding of pathogenic mechanisms in Fusarium, and aid the identification of genetic targets for control strategies that reduce crop diseases and mycotoxin contamination. Such targets are vital to solving the agricultural problems incited by Fusarium, particularly evolving threats resulting from climate change. Thus, FPT has the potential to contribute to improving food security and safety worldwide.

RevDate: 2024-09-06

Masignani V, Rappuoli R, Pizza M (2024)

Next generation of "magic bullets", solutions from the microbial pangenome.

EMBO molecular medicine [Epub ahead of print].

RevDate: 2024-09-06

Najjari A, Jabberi M, Chérif SF, et al (2024)

Genome and pan-genome analysis of a new exopolysaccharide-producing bacterium Psychrobacillus sp. isolated from iron ores deposit and insights into iron uptake.

Frontiers in microbiology, 15:1440081.

Bacterial exopolysaccharides (EPS) have emerged as one of the key players in the field of heavy metal-contaminated environmental bioremediation. This study aimed to characterize and evaluate the metal biosorption potential of EPS produced by a novel Psychrobacillus strain, NEAU-3TGS, isolated from an iron ore deposit at Tamra iron mine, northern Tunisia. Genomic and pan-genomic analysis of NEAU-3TGS bacterium with nine validated published Psychrobacillus species was also performed. The results showed that the NEAU-3TGS genome (4.48 Mb) had a mean GC content of 36%, 4,243 coding sequences and 14 RNA genes. Phylogenomic analysis and calculation of nucleotide identity (ANI) values (less than 95% for new species with all strains) confirmed that NEAU-3TGS represents a potential new species. Pangenomic analysis revealed that Psychrobacillus genomic diversity represents an "open" pangenome model with 33,091 homologous genes, including 65 core, 3,738 shell, and 29,288 cloud genes. Structural EPS characterization by attenuated total reflectance-Fourier transform infrared (ATR-FTIR) spectroscopy showed uronic acid and α-1,4-glycosidic bonds as dominant components of the EPS. X-ray diffraction (XRD) analysis revealed the presence of chitin, chitosan, and calcite CaCO3 and confirmed the amorphous nature of the EPS. Heavy metal bioabsorption assessment showed that iron and lead were more adsorbed than copper and cadmium. Notably, the optimum activity was observed at 37°C, pH=7 and after 3 h contact of EPS with each metal. Genomic insights on iron acquisition and metabolism in Psychrobacillus sp. NEAU-3TGS suggested that no genes involved in siderophore biosynthesis were found, and only the gene cluster FeuABCD and trilactone hydrolase genes involved in the uptake of siderophores, iron transporter and exporter are present. 
Molecular modelling and docking of FeuA (protein peptidoglycan siderophore-binding protein) with the siderophore ligands ferrienterobactine [Fe[+3] (ENT)][-3] and ferribacillibactine [Fe[+3] (BB)][-3] revealed that [Fe[+3] (ENT)][-3] binds to Phe122, Lys127, Ile100, Gln314, Arg215, Arg217, and Gln252. [Fe[+3] (BB)][-3] binds almost the same residues, with the addition of Cys222 and Tyr229 but without Ile100. To the best of our knowledge, this is the first report on the characterization of EPS and the adsorption of heavy metals by Psychrobacillus species. The heavy metal removal capabilities may be advantageous for using these organisms in metal remediation.
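
The core/shell/cloud split reported in the abstract above can be illustrated with a small sketch. It is not the authors' pipeline: the function name, the toy data, and the prevalence thresholds (core at 99% of genomes or more, shell at 15% or more, cloud below that) are assumptions chosen for illustration only. The last lines check that the three reported categories sum to the reported total.

```python
from collections import Counter


def classify_gene_families(presence, n_genomes, core_min=0.99, shell_min=0.15):
    """Label each gene family by the fraction of genomes that carry it.

    presence  : dict mapping gene-family id -> set of genome ids carrying it
    n_genomes : total number of genomes in the comparison
    core_min, shell_min : hypothetical prevalence cut-offs (assumptions)
    """
    labels = {}
    for family, genomes in presence.items():
        fraction = len(genomes) / n_genomes
        if fraction >= core_min:
            labels[family] = "core"
        elif fraction >= shell_min:
            labels[family] = "shell"
        else:
            labels[family] = "cloud"
    return labels


# Toy presence/absence data for four hypothetical genomes.
toy_presence = {
    "famA": {"g1", "g2", "g3", "g4"},  # present in all genomes
    "famB": {"g1", "g2"},              # present in half
    "famC": {"g3"},                    # present in one
}
toy_labels = classify_gene_families(toy_presence, n_genomes=4, shell_min=0.5)
toy_counts = Counter(toy_labels.values())

# Consistency check on the figures quoted in the abstract.
reported_total = 65 + 3738 + 29288  # core + shell + cloud
```

With real data, the presence mapping would typically come from the gene presence/absence table produced by a pangenome tool; the cut-offs should follow whatever definition that tool documents.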

RevDate: 2024-09-05
CmpDate: 2024-09-06

Cheng R, Zhao Z, Tang Y, et al (2024)

Genome-wide survey of KT/HAK/KUP genes in the genus Citrullus and analysis of their involvement in K[+]-deficiency and drought stress responses in between C. lanatus and C. amarus.

BMC genomics, 25(1):836.

BACKGROUND: The KT/HAK/KUP is the largest K[+] transporter family in plants, playing crucial roles in K[+] absorption, transport, and defense against environmental stress. Sweet watermelon is an economically significant horticultural crop belonging to the genus Citrullus, with a high demand for K[+] during its growth process. However, a comprehensive analysis of the KT/HAK/KUP gene family in watermelon has not been reported.

RESULTS: 14 KT/HAK/KUP genes were identified in the genomes of each of seven Citrullus species. These KT/HAK/KUPs in watermelon were unevenly distributed across seven chromosomes. Segmental duplication is the primary driving force behind the expansion of the KT/HAK/KUP family, subjected to purifying selection during domestication (Ka/Ks < 1), and all KT/HAK/KUPs exhibit conserved motifs and could be phylogenetically classified into four groups. The promoters of KT/HAK/KUPs contain numerous cis-regulatory elements related to plant growth and development, phytohormone response, and stress response. Under K[+] deficiency, the growth of watermelon seedlings was significantly inhibited, with cultivated watermelon experiencing greater impacts (canopy width, redox enzyme activity) compared to the wild type. All KT/HAK/KUPs in C. lanatus and C. amarus exhibit specific expression responses to K[+]-deficiency and drought stress by qRT-PCR. Notably, ClG42_07g0120700/CaPI482276_07g014010 were predominantly expressed in roots and were further induced by K[+]-deficiency and drought stress. Additionally, the K[+] transport capacity of ClG42_07g0120700 under low K[+] stress was confirmed by yeast functional complementation assay.

CONCLUSIONS: KT/HAK/KUP genes in watermelon were systematically identified and analyzed at the pangenome level and provide a foundation for understanding the classification and functions of the KT/HAK/KUPs in watermelon plants.

RevDate: 2024-09-05

de Oliva BHD, do Nascimento AB, de Oliveira JP, et al (2024)

Genomic insights into a Proteus mirabilis strain inducing avian cellulitis.

Brazilian journal of microbiology : [publication of the Brazilian Society for Microbiology] [Epub ahead of print].

Proteus mirabilis, a microorganism distributed in soil, water, and animals, is clinically known for causing urinary tract infections in humans. However, recent studies have linked it to skin infections in broiler chickens, termed avian cellulitis, which poses a threat to animal welfare. While Avian Pathogenic Escherichia coli (APEC) is the primary cause of avian cellulitis, few cases of P. mirabilis involvement are reported, raising questions about the factors facilitating such occurrences. This study employed a pan-genomic approach to investigate whether unique genes exist in P. mirabilis strains causing avian cellulitis. The genome of LBUEL-A33, a P. mirabilis strain known to cause this infection, was assembled, and compared with other P. mirabilis strains isolated from poultry and other sources. Additionally, in silico serogroup analysis was conducted. Results revealed numerous genes unique to the LBUEL-A33 strain. No function in cellulitis was identified for these genes, and in silico investigation of the virulence potential of LBUEL-A33's exclusive proteins proved inconclusive. These findings support that multiple factors are necessary for P. mirabilis to cause avian cellulitis. Furthermore, this species likely employs its own unique arsenal of virulence factors, as many identified mechanisms are analogous to those of E. coli. While antigenic gene clusters responsible for serogroups were identified, no clear trend was observed, and the gene cluster of LBUEL-A33 did not show homology with any sequenced Proteus serogroups. These results reinforce the understanding that this disease is multifactorial, necessitating further research to unravel the mechanisms and underpin the development of control and prevention strategies.

RevDate: 2024-09-05
CmpDate: 2024-09-05

Brandenburg JM, Stapleton GS, Kline KE, et al (2024)

Salmonella Hadar linked to two distinct transmission vehicles highlights challenges to enteric disease outbreak investigations.

Epidemiology and infection, 152:e86 pii:S0950268824000682.

In 2020, an outbreak of Salmonella Hadar illnesses was linked to contact with non-commercial, privately owned (backyard) poultry including live chickens, turkeys, and ducks, resulting in 848 illnesses. From late 2020 to 2021, this Salmonella Hadar strain caused an outbreak that was linked to ground turkey consumption. Core genome multilocus sequence typing (cgMLST) analysis determined that the Salmonella Hadar isolates detected during the outbreak linked to backyard poultry and the outbreak linked to ground turkey were closely related genetically (within 0-16 alleles). Epidemiological and traceback investigations were unable to determine how Salmonella Hadar detected in backyard poultry and ground turkey were linked, despite this genetic relatedness. Enhanced molecular characterization methods, such as analysis of the pangenome of Salmonella isolates, might be necessary to understand the relationship between these two outbreaks. Similarly, enhanced data collection during outbreak investigations and further research could potentially aid in determining whether these transmission vehicles are truly linked by a common source and what reservoirs exist across the poultry industries that allow Salmonella Hadar to persist. Further work combining epidemiological data collection, more detailed traceback information, and genomic analysis tools will be important for monitoring and investigating future enteric disease outbreaks.
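
The "within 0-16 alleles" relatedness measure mentioned above can be sketched as a pairwise allele distance between cgMLST profiles. This is a minimal illustration, not the method used in the investigation: the profile layout (locus name mapped to an allele number, `None` for a missing call), the locus names, and the choice to skip loci missing in either isolate are all assumptions.

```python
def allele_distance(profile_a, profile_b):
    """Count loci with different allele calls between two cgMLST profiles.

    Each profile is a dict mapping locus name -> allele number (None if the
    locus was not called). Loci missing or uncalled in either profile are
    skipped, which is one common but not universal convention.
    """
    distance = 0
    for locus, allele_a in profile_a.items():
        allele_b = profile_b.get(locus)
        if allele_a is None or allele_b is None:
            continue
        if allele_a != allele_b:
            distance += 1
    return distance


# Hypothetical profiles for two isolates over five loci.
isolate_1 = {"loc1": 3, "loc2": 7, "loc3": 1, "loc4": None, "loc5": 12}
isolate_2 = {"loc1": 3, "loc2": 8, "loc3": 1, "loc4": 5, "loc5": 14}

d = allele_distance(isolate_1, isolate_2)
# Upper bound of 16 taken from the range quoted in the abstract.
closely_related = d <= 16
```

Here `loc2` and `loc5` differ and `loc4` is skipped, so the two toy isolates are two alleles apart.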

RevDate: 2024-09-04
CmpDate: 2024-09-04

Rinker DC, Sauters TJC, Steffen K, et al (2024)

Strain heterogeneity in a non-pathogenic Aspergillus fungus highlights factors associated with virulence.

Communications biology, 7(1):1082.

Fungal pathogens exhibit extensive strain heterogeneity, including variation in virulence. Whether closely related non-pathogenic species also exhibit strain heterogeneity remains unknown. Here, we comprehensively characterized the pathogenic potentials (i.e., the ability to cause morbidity and mortality) of 16 diverse strains of Aspergillus fischeri, a non-pathogenic close relative of the major pathogen Aspergillus fumigatus. In vitro immune response assays and in vivo virulence assays using a mouse model of pulmonary aspergillosis showed that A. fischeri strains varied widely in their pathogenic potential. Furthermore, pangenome analyses suggest that A. fischeri genomic and phenotypic diversity is even greater. Genomic, transcriptomic, and metabolic profiling identified several pathways and secondary metabolites associated with variation in virulence. Notably, strain virulence was associated with the simultaneous presence of the secondary metabolites hexadehydroastechrome and gliotoxin. We submit that examining the pathogenic potentials of non-pathogenic close relatives is key for understanding the origins of fungal pathogenicity.

RevDate: 2024-09-04
CmpDate: 2024-09-04

Veseli I, DeMers MA, Cooper ZS, et al (2024)

Digital Microbe: a genome-informed data integration framework for team science on emerging model organisms.

Scientific data, 11(1):967.

The remarkable pace of genomic data generation is rapidly transforming our understanding of life at the micron scale. Yet this data stream also creates challenges for team science. A single microbe can have multiple versions of genome architecture, functional gene annotations, and gene identifiers; additionally, the lack of mechanisms for collating and preserving advances in this knowledge raises barriers to community coalescence around shared datasets. "Digital Microbes" are frameworks for interoperable and reproducible collaborative science through open source, community-curated data packages built on a (pan)genomic foundation. Housed within an integrative software environment, Digital Microbes ensure real-time alignment of research efforts for collaborative teams and facilitate novel scientific insights as new layers of data are added. Here we describe two Digital Microbes: 1) the heterotrophic marine bacterium Ruegeria pomeroyi DSS-3 with > 100 transcriptomic datasets from lab and field studies, and 2) the pangenome of the cosmopolitan marine heterotroph Alteromonas containing 339 genomes. Examples demonstrate how an integrated framework collating public (pan)genome-informed data can generate novel and reproducible findings.

RevDate: 2024-09-04

Bolognini D, Halgren A, Lou RN, et al (2024)

Recurrent evolution and selection shape structural diversity at the amylase locus.

Nature [Epub ahead of print].

The adoption of agriculture triggered a rapid shift towards starch-rich diets in human populations[1]. Amylase genes facilitate starch digestion, and increased amylase copy number has been observed in some modern human populations with high-starch intake[2], although evidence of recent selection is lacking[3,4]. Here, using 94 long-read haplotype-resolved assemblies and short-read data from approximately 5,600 contemporary and ancient humans, we resolve the diversity and evolutionary history of structural variation at the amylase locus. We find that amylase genes have higher copy numbers in agricultural populations than in fishing, hunting and pastoral populations. We identify 28 distinct amylase structural architectures and demonstrate that nearly identical structures have arisen recurrently on different haplotype backgrounds throughout recent human history. AMY1 and AMY2A genes each underwent multiple duplication/deletion events with mutation rates up to more than 10,000-fold the single-nucleotide polymorphism mutation rate, whereas AMY2B gene duplications share a single origin. Using a pangenome-based approach, we infer structural haplotypes across thousands of humans identifying extensively duplicated haplotypes at higher frequency in modern agricultural populations. Leveraging 533 ancient human genomes, we find that duplication-containing haplotypes (with more gene copies than the ancestral haplotype) have rapidly increased in frequency over the past 12,000 years in West Eurasians, suggestive of positive selection. Together, our study highlights the potential effects of the agricultural revolution on human genomes and the importance of structural variation in human adaptation.

RevDate: 2024-09-04

Fan X, Chen L, Chen M, et al (2024)

Pan-omics-based characterization and prediction of highly multidrug-adapted strains from an outbreak fungal species complex.

Innovation (Cambridge (Mass.)), 5(5):100681.

Strains from the Cryptococcus gattii species complex (CGSC) have caused the Pacific Northwest cryptococcosis outbreak, the largest cluster of life-threatening fungal infections in otherwise healthy human hosts known to date. In this study, we utilized a pan-phenome-based method to assess the fitness outcomes of CGSC strains under 31 stress conditions, providing a comprehensive overview of 2,821 phenotype-strain associations within this pathogenic clade. Phenotypic clustering analysis revealed a strong correlation between distinct types of stress phenotypes in a subset of CGSC strains, suggesting that shared determinants coordinate their adaptations to various stresses. Notably, a specific group of strains, including the outbreak isolates, exhibited a remarkable ability to adapt to all three of the most commonly used antifungal drugs for treating cryptococcosis (amphotericin B, 5-fluorocytosine, and fluconazole). By integrating pan-genomic and pan-transcriptomic analyses, we identified previously unrecognized genes that play crucial roles in conferring multidrug resistance in an outbreak strain with high multidrug adaptation. From these genes, we identified biomarkers that enable the accurate prediction of highly multidrug-adapted CGSC strains, achieving maximum accuracy and area under the curve (AUC) of 0.79 and 0.86, respectively, using machine learning algorithms. Overall, we developed a pan-omic approach to identify cryptococcal multidrug resistance determinants and predict highly multidrug-adapted CGSC strains that may pose significant clinical concern.

RevDate: 2024-09-04

Do VH, Nguyen VS, Nguyen SH, et al (2024)

PanKA: Leveraging population pangenome to predict antibiotic resistance.

iScience, 27(9):110623.

Machine learning has the potential to be a powerful tool in the fight against antimicrobial resistance (AMR), a critical global health issue. Machine learning can identify resistance mechanisms from DNA sequence data without prior knowledge. The first step in building a machine learning model is a feature extraction from sequencing data. Traditional methods like single nucleotide polymorphism (SNP) calling and k-mer counting yield numerous, often redundant features, complicating prediction and analysis. In this paper, we propose PanKA, a method using the pangenome to extract a concise set of relevant features for predicting AMR. PanKA not only enables fast model training and prediction but also improves accuracy. Applied to the Escherichia coli and Klebsiella pneumoniae bacterial species, our model is more accurate than conventional and state-of-the-art methods in predicting AMR.

RevDate: 2024-09-03

Bonnici V, D Chicco (2024)

Seven quick tips for gene-focused computational pangenomic analysis.

BioData mining, 17(1):28.

Pangenomics is a relatively new scientific field which investigates the union of all the genomes of a clade. The word pan means everything in ancient Greek; the term pangenomics originally regarded genomes of bacteria and was later intended to refer to human genomes as well. Modern bioinformatics offers several tools to analyze pangenomics data, paving the way to an emerging field that we can call computational pangenomics. Current computational power available for the bioinformatics community has made computational pangenomic analyses easy to perform, but this higher accessibility to pangenomics analysis also increases the chances to make mistakes and to produce misleading or inflated results, especially by beginners. To handle this problem, we present here a few quick tips for efficient and correct computational pangenomic analyses with a focus on bacterial pangenomics, by describing common mistakes to avoid and experienced best practices to follow in this field. We believe our recommendations can help the readers perform more robust and sound pangenomic analyses and to generate more reliable results.

RevDate: 2024-09-02
CmpDate: 2024-09-02

Trisakul K, Hinwan Y, Eisiri J, et al (2024)

Comparisons of genome assembly tools for characterization of Mycobacterium tuberculosis genomes using hybrid sequencing technologies.

PeerJ, 12:e17964.

BACKGROUND: Next-generation sequencing of Mycobacterium tuberculosis, the infectious agent causing tuberculosis, is improving the understanding of genomic diversity of circulating lineages and strain-types, and informing knowledge of drug resistance mutations. An increasingly popular approach to characterizing M. tuberculosis genomes (size: 4.4 Mbp) and variants (e.g., single nucleotide polymorphisms (SNPs)) involves the de novo assembly of sequence data.

METHODS: We compared the performance of genome assembly tools (Unicycler, RagOut, and RagTag) on sequence data from nine drug resistant M. tuberculosis isolates (multi-drug (MDR) n = 1; pre-extensively-drug (pre-XDR) n = 8) generated using Illumina HiSeq, Oxford Nanopore Technology (ONT) PromethION, and PacBio platforms.

RESULTS: Our investigation found that Unicycler-based assemblies had significantly higher genome completeness (~98.7%; p values = 0.01) compared to other assembler tools (RagOut = 98.6%, and RagTag = 98.6%). The genome assembly sizes (bp) across isolates and sequencers based on RagOut were significantly longer (p values < 0.001) (4,418,574 ± 8,824 bp) than Unicycler and RagTag assemblies (Unicycler = 4,377,642 ± 55,257 bp, and RagTag = 4,380,711 ± 51,164 bp). RagOut-based assemblies had the fewest contigs (~32) and the longest genome size (4,418,574 bp; vs. H37Rv reference size 4,411,532 bp) and therefore were chosen for downstream analysis. Pan-genome analysis of Illumina and PacBio hybrid assemblies revealed the greatest number of detected genes (4,639 genes; H37Rv reference contains 3,976 genes), while Illumina and ONT hybrid assemblies produced the highest number of SNPs. The number of genes from hybrid assemblies with ONT and PacBio long-reads (mean: 4,620 genes) was greater than short-read assembly alone (4,478 genes). All nine RagOut hybrid genome assemblies detected known mutations in genes associated with MDR-TB and pre-XDR-TB.

CONCLUSIONS: Unicycler software performed the best in terms of achieving contiguous genomes, whereas RagOut improved the quality of Unicycler's genome assemblies by providing a longer genome size. Overall, our approach has demonstrated that short-read and long-read hybrid assembly can provide a more complete genome assembly than short-read assembly alone by detecting pan-genomes and more genes, including IS6110, and SNPs.

RevDate: 2024-09-01
CmpDate: 2024-09-02

Mane RS, Prasad BD, Sahni S, et al (2024)

Biotechnological studies towards improvement of finger millet using multi-omics approaches.

Functional & integrative genomics, 24(5):148.

A plethora of studies have uncovered numerous important genes with agricultural significance in staple crops. However, when it comes to orphan crops like minor millet, genomic research lags significantly behind that of major crops. This situation has prompted a focus on exploring research opportunities in minor millets, particularly in finger millet, using cutting-edge methods. Finger millet, a coarse cereal known for its exceptional nutritional content and ability to withstand environmental stresses, represents a promising climate-smart and nutritional crop in the battle against escalating environmental challenges. The existing traditional improvement programs for finger millet are insufficient to address global hunger effectively. The lack of utilization of high-throughput platforms, genome editing, haplotype breeding, and advanced breeding approaches hinders systematic multi-omics studies on finger millet, which are essential for pinpointing crucial genes related to agronomically important traits and various stress responses. The growing environmental uncertainties have widened the gap between the anticipated and real progress in crop improvement. To overcome these challenges, a combination of cutting-edge multi-omics techniques such as high-throughput sequencing, speed breeding, mutational breeding, haplotype-based breeding, genomic selection, high-throughput phenotyping, pangenomics, genome editing, and more, along with integration of deep learning and artificial intelligence technologies, is essential to accelerate research efforts in finger millet. The scarcity of multi-omics approaches in finger millet leaves breeders with limited modern tools for crop enhancement. Therefore, leveraging datasets from previous studies could prove effective in implementing the necessary multi-omics interventions to enrich the genetic resource in finger millet.

RevDate: 2024-08-31

Gao J, Y Xu (2024)

DNA sequences alignment method using sparse index on pan-genome graph.

Journal of bioinformatics and computational biology [Epub ahead of print].

A sequence graph represents the genetic variation of a pan-genome more concisely and space-efficiently than multiple linear reference genomes. To accelerate the alignment of reads to the graph, an index of the graph-based reference genome is used to obtain candidate locations. However, the potential combinatorial explosion of nodes in the sequence graph considerably increases the index space and the maximum memory usage of the alignment process, especially for large-scale datasets. To address this, existing methods typically prune complex regions or extend the length of seeds, which sacrifices the recall of the alignment algorithm while reducing space usage only slightly. We present the Sparse-index of Graph (SIG) and the alignment algorithm SIG-Aligner, capable of indexing and aligning at a lower memory cost. SIG builds a non-overlapping minimizer index inside the nodes of the sequence graph, and SIG-Aligner filters out most false positive matches using a method based on the pigeonhole principle. Compared to Giraffe, computational experiments show that SIG reduces index memory space by 50% to 75% for human pan-genome graphs, while preserving superior or comparable alignment accuracy and a faster alignment time.
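
For readers unfamiliar with minimizer-based seeding, the following sketch shows the generic windowed-minimizer idea on a single linear sequence, such as the label of one graph node. It is not the SIG index itself: the function name, the lexicographic ordering of k-mers, and the parameters `k` and `w` are assumptions for illustration, and the non-overlapping selection and graph traversal described in the abstract are not reproduced.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs chosen as window minimizers.

    For every window of w consecutive k-mers, the lexicographically smallest
    k-mer is selected (leftmost on ties). Storing only these k-mers gives a
    sparser index than storing every k-mer of the sequence.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: (window[j], j))
        picked.add((start + offset, window[offset]))
    return sorted(picked)


# Toy node sequence; 6 k-mers exist, of which 3 are kept as seeds.
node_sequence = "ACGTACGT"
seeds = minimizers(node_sequence, k=3, w=3)

# A simple seed index: k-mer -> list of positions within the node.
seed_index = {}
for position, kmer in seeds:
    seed_index.setdefault(kmer, []).append(position)
```

Adjacent windows often share the same minimizer, which is why the number of stored seeds is smaller than the number of k-mers; larger `w` gives a sparser index at the cost of fewer candidate seed hits.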

RevDate: 2024-08-30
CmpDate: 2024-08-30

Andrews KR, Besser TE, Stalder T, et al (2024)

Comparative genomic analysis identifies potential adaptive variation in Mycoplasma ovipneumoniae.

Microbial genomics, 10(8):.

Mycoplasma ovipneumoniae is associated with respiratory disease in wild and domestic Caprinae globally, with wide variation in disease outcomes within and between host species. To gain insight into phylogenetic structure and mechanisms of pathogenicity for this bacterial species, we compared M. ovipneumoniae genomes for 99 samples from 6 countries (Australia, Bosnia and Herzegovina, Brazil, China, France and USA) and 4 host species (domestic sheep, domestic goats, bighorn sheep and caribou). Core genome sequences of M. ovipneumoniae assemblies from domestic sheep and goats fell into two well-supported phylogenetic clades that are divergent enough to be considered different bacterial species, consistent with each of these two clades having an evolutionary origin in separate host species. Genome assemblies from bighorn sheep and caribou also fell within these two clades, indicating multiple spillover events, most commonly from domestic sheep. Pangenome analysis indicated a high percentage (91.4 %) of accessory genes (i.e. genes found only in a subset of assemblies) compared to core genes (i.e. genes found in all assemblies), potentially indicating a propensity for this pathogen to adapt to within-host conditions. In addition, many genes related to carbon metabolism, which is a virulence factor for Mycoplasmas, showed evidence for homologous recombination, a potential signature of adaptation. The presence or absence of annotated genes was very similar between sheep and goat clades, with only two annotated genes significantly clade-associated. However, three M. ovipneumoniae genome assemblies from asymptomatic caribou in Alaska formed a highly divergent subclade within the sheep clade that lacked 23 annotated genes compared to other assemblies, and many of these genes had functions related to carbon metabolism. Overall, our results suggest that adaptation of M. ovipneumoniae has involved evolution of carbon metabolism pathways and virulence mechanisms related to those pathways. The genes involved in these pathways, along with other genes identified as potentially involved in virulence in this study, are potential targets for future investigation into a possible genomic basis for the high variation observed in disease outcomes within and between wild and domestic host species.

RevDate: 2024-08-30
CmpDate: 2024-08-30

Askenasy I, Swain JEV, Ho PM, et al (2024)

'Wild Type'.

Microbiology (Reading, England), 170(8):.

In this opinion piece, we consider the meaning of the term 'wild type' in the context of microbiology. This is especially pertinent in the post-genomic era, where we have a greater awareness of species diversity than ever before. Genomic heterogeneity, in vitro evolution/selection pressures, definition of 'the wild', the size and importance of the pan-genome, gene-gene interactions (epistasis), and the nature of the 'wild-type gene' are all discussed. We conclude that wild type is an outdated and even misleading phrase that should be gradually phased out.

RevDate: 2024-08-30
CmpDate: 2024-08-30

de Block T, De Baetselier I, Van den Bossche D, et al (2024)

Genomic oropharyngeal Neisseria surveillance detects MALDI-TOF MS species misidentifications and reveals a novel Neisseria cinerea clade.

Journal of medical microbiology, 73(8):.

Introduction. Commensal Neisseria spp. are highly prevalent in the oropharynx as part of the healthy microbiome. N. meningitidis can colonise the oropharynx too from where it can cause invasive meningococcal disease. To identify N. meningitidis, clinical microbiology laboratories often rely on Matrix Assisted Laser Desorption/Ionisation Time of Flight Mass Spectrometry (MALDI-TOF MS).

Hypothesis/Gap statement. N. meningitidis may be misidentified by MALDI-TOF MS.

Aim. To conduct genomic surveillance of oropharyngeal Neisseria spp. in order to: (i) verify MALDI-TOF MS species identification, and (ii) characterize commensal Neisseria spp. genomes.

Methodology. We analysed whole genome sequence (WGS) data from 119 Neisseria spp. isolates from a surveillance programme for oropharyngeal Neisseria spp. in Belgium. Different species identification methods were compared: (i) MALDI-TOF MS, (ii) Ribosomal Multilocus Sequence Typing (rMLST) and (iii) rplF gene species identification. WGS data were used to further characterize Neisseria species found with supplementary analyses of Neisseria cinerea genomes.

Results. Based on genomic species identification, isolates from the oropharyngeal Neisseria surveillance study were composed of the following species: N. meningitidis (n=23), N. subflava (n=61), N. mucosa (n=15), N. oralis (n=8), N. cinerea (n=5), N. elongata (n=3), N. lactamica (n=2), N. bacilliformis (n=1) and N. polysaccharea (n=1). Of these 119 isolates, four isolates identified as N. meningitidis (n=3) and N. subflava (n=1) by MALDI-TOF MS, were determined to be N. polysaccharea (n=1), N. cinerea (n=2) and N. mucosa (n=1) by rMLST. Phylogenetic analyses revealed that N. cinerea isolates from the general population (n=3, cluster one) were distinct from those obtained from men who have sex with men (MSM, n=2, cluster two). The latter contained genomes misidentified as N. meningitidis using MALDI-TOF MS. These two N. cinerea clusters persisted after the inclusion of published N. cinerea WGS (n=42). Both N. cinerea clusters were further defined through pangenome and Average Nucleotide Identity (ANI) analyses.

Conclusion. This study provides insights into the importance of genomic genus-wide Neisseria surveillance studies to improve the characterization and identification of the Neisseria genus.

RevDate: 2024-08-30

Hughes Lago C, Blackburn D, Kinder Pavlicek M, et al (2024)

Comparative Genomic Analysis of Campylobacter rectus and Closely Related Species.

bioRxiv : the preprint server for biology pii:2024.07.26.605372.

Campylobacter rectus is a gram-negative, anaerobic bacterium strongly associated with periodontitis. It also causes various extraoral infections and is linked to adverse pregnancy outcomes in humans and murine models. C. rectus and related oral Campylobacters have been termed "emerging Campylobacter species" because infections by these organisms are likely underreported. Previously, no comparative methods have been used to analyze more than single C. rectus strains and until recently, very few C. rectus genomes have been publicly available. More sequenced genomes and comparative analyses are needed to study the genomic features and pathogenicity of this species. We sequenced eight new C. rectus strains and used comparative methods to identify regions of interest. An emphasis was put on the type III flagellar secretion system (T3SS), type IV secretion system (T4SS), and type VI secretion system (T6SS) because these protein complexes are important for pathogenesis in other Campylobacter species. RAST, BV-BRC, and other bioinformatics tools were used to assemble, annotate, and compare these regions in the genomes. The pan-genome of C. rectus consists of 2670 genes with core and accessory genomes of 1429 and 1241 genes, respectively. All isolates analyzed in this study have T3SS and T6SS hallmark proteins, while five of the isolates are missing a T4SS system. Twenty-one prophage clusters were identified across the panel of isolates, including four that appear intact. Overall, significant genomic islands were found, suggesting regions in the genomes that underwent horizontal gene transfer. Additionally, the high frequency of CRISPR arrays and other repetitive elements has led to genome rearrangements across the strains, including in areas adjacent to secretion system gene clusters. This study describes the substantial diversity present among C. rectus isolates and highlights tools/assays that have been developed to permit functional genomic studies. 
Additionally, we have expanded the studies on C. showae T4SS since we have two new C. showae genomes to report. We also demonstrate that, unlike C. rectus, C. showae shows no evidence of an intact T6SS, except for strain CAM. The only sequenced strain of C. massiliensis has neither T4SS nor T6SS.

RevDate: 2024-08-29

Gheorghe-Barbu I, Dragomir RI, Gradisteanu Pircalabioru G, et al (2024)

Tracing Acinetobacter baumannii's Journey from Hospitals to Aquatic Ecosystems.

Microorganisms, 12(8): pii:microorganisms12081703.

BACKGROUND: This study provides a comprehensive analysis of Acinetobacter baumannii in aquatic environments and fish microbiota by integrating culture-dependent methods, 16S metagenomics, and antibiotic resistance profiling.

METHODS: A total of 83 A. baumannii isolates were recovered using culture-dependent methods from intra-hospital infections (IHI) and wastewater (WW) and surface water (SW) samples from two southern Romanian cities in August 2022. The antibiotic susceptibility was screened using disc diffusion, microdilution, PCR, and Whole Genome Sequencing assays.

RESULTS: The highest microbial load in the analyzed samples was found in Glina, Bucharest, for both WW and SW samples across all investigated phenotypes. For Bucharest isolates, the resistance levels corresponded to fluoroquinolones > aminoglycosides > β-lactam antibiotics. In contrast, A. baumannii from upstream SW samples in Târgoviște showed the highest resistance to aminoglycosides. The blaOXA-23 gene was frequently detected in IHI, WW, and SW isolates in Bucharest, but was absent in Târgoviște. Molecular phylogeny revealed the presence of ST10 in Târgoviște isolates and ST2 in Bucharest isolates, while other minor STs were not specifically correlated with a sampling point. Using 16S rRNA sequencing, significant differences in microbial populations between the two locations were identified. The low abundance of Alphaproteobacteria and Actinobacteria in both locations suggests environmental pressures or contamination events.

CONCLUSIONS: These findings indicate significant fecal contamination and potential public health risks, emphasizing the need for improved water quality monitoring and management.

ESP Quick Facts

ESP Origins

In the early 1990s, Robert Robbins was a faculty member at Johns Hopkins, where he directed the informatics core of GDB, the human gene-mapping database of the international human genome project. To share papers with colleagues around the world, he set up a small paper-sharing section on his personal web page. This small project evolved into The Electronic Scholarly Publishing Project.

ESP Support

In 1995, Robbins became the VP/IT of the Fred Hutchinson Cancer Research Center in Seattle, WA. Soon after arriving in Seattle, Robbins secured funding, through the ELSI component of the US Human Genome Project, to create the original ESP.ORG web site, with the formal goal of providing free, world-wide access to the literature of classical genetics.

ESP Rationale

Although the methods of molecular biology can seem almost magical to the uninitiated, the original techniques of classical genetics are readily appreciated by one and all: cross individuals that differ in some inherited trait, collect all of the progeny, score their attributes, and propose mechanisms to explain the patterns of inheritance observed.

ESP Goal

In reading the early works of classical genetics, one is drawn, almost inexorably, into ever more complex models, until molecular explanations begin to seem both necessary and natural. At that point, the tools for understanding genome research are at hand. Helping readers reach this point was the original goal of The Electronic Scholarly Publishing Project.

ESP Usage

Usage of the site grew rapidly and has remained high. Faculty began to use the site for their assigned readings. Other on-line publishers, ranging from The New York Times to Nature, referenced ESP materials in their own publications. Nobel laureates (e.g., Joshua Lederberg) regularly used the site and even wrote to suggest changes and improvements.

ESP Content

When the site began, no journals were making their early content available in digital format. As a result, ESP was obliged to digitize classic literature before it could be made available. For many important papers — such as Mendel's original paper or the first genetic map — ESP had to produce entirely new typeset versions of the works, if they were to be available in a high-quality format.

ESP Help

Early support from the DOE component of the Human Genome Project was critically important for getting the ESP project on a firm foundation. Since that funding ended (nearly 20 years ago), the project has been operated as a purely volunteer effort. Anyone wishing to assist in these efforts should send an email to Robbins.

ESP Plans

With the development of methods for adding typeset side notes to PDF files, the ESP project now plans to add annotated versions of some classical papers to its holdings. We also plan to add new reference and pedagogical material. We have already started providing regularly updated, comprehensive bibliographies to the ESP.ORG site.

Electronic Scholarly Publishing
961 Red Tail Lane
Bellingham, WA 98226

E-mail: RJR8222 @ gmail.com

Papers in Classical Genetics

The ESP began as an effort to share a handful of key papers from the early days of classical genetics. Now the collection has grown to include hundreds of papers, in full-text format.

Digital Books

Along with papers on classical genetics, ESP offers a collection of full-text digital books, including many works by Darwin and even a collection of poetry — Chicago Poems by Carl Sandburg.

Timelines

ESP now offers a large collection of user-selected side-by-side timelines (e.g., all science vs. all other categories, or arts and culture vs. world history), designed to provide a comparative context for appreciating world events.

Biographies

Biographical information about many key scientists (e.g., Walter Sutton).

Selected Bibliographies

Bibliographies on several topics of potential interest to the ESP community are automatically maintained and generated on the ESP site.

ESP Picks from Around the Web (updated 28 JUL 2024 )