
Bibliography on: Pangenome

The Electronic Scholarly Publishing Project: Providing world-wide, free access to classic scientific papers and other scholarly materials, since 1993.


ESP: PubMed Auto Bibliography. Created: 25 Oct 2021 at 01:32


Although the enforced stability of genomic content is ubiquitous among multicellular eukaryotes (MCEs), the opposite is proving to be the case among prokaryotes, which exhibit remarkable and adaptive plasticity of genomic content. Early bacterial whole-genome sequencing efforts discovered that whenever a particular "species" was re-sequenced, new genes were found that had not been detected earlier: entirely new genes, not merely new alleles. This led to the concepts of the bacterial core-genome, the set of genes found in all members of a particular "species", and the flex-genome, the set of genes found in some, but not all, members of the "species". Together these make up the species' pan-genome.
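The three definitions above map directly onto set operations, which the following minimal sketch illustrates. It assumes each genome has already been reduced to a set of gene-family identifiers; the function name, strain names, and gene names are hypothetical and chosen only for illustration. Real pan-genome tools must first cluster genes into orthologous families, a step this sketch does not attempt.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(
    genomes: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (core, flex, pan) gene sets for a collection of genomes.

    genomes maps a strain name to the set of gene families it carries
    (a hypothetical input format used only for this sketch).
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)        # genes found in any strain
    core = set.intersection(*gene_sets)  # genes found in every strain
    flex = pan - core                    # genes found in some, but not all
    return core, flex, pan


# Toy data: three invented strains sharing three genes.
strains = {
    "strain_A": {"dnaA", "gyrB", "recA", "toxX"},
    "strain_B": {"dnaA", "gyrB", "recA", "phageY"},
    "strain_C": {"dnaA", "gyrB", "recA", "toxX", "pumpZ"},
}

core, flex, pan = partition_pangenome(strains)
print("core:", sorted(core))
print("flex:", sorted(flex))
print("pan size:", len(pan))
```

In this toy example the core-genome holds the three shared genes, and the flex-genome holds the three genes carried by only some strains. Adding a fourth strain can only shrink the core and can only grow the pan-genome, which matches the re-sequencing observation described above.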

Created with PubMed® Query: pangenome or "pan-genome" or "pan genome" NOT pmcbook NOT ispreviousversion

Citations: The Papers (from PubMed®)


RevDate: 2021-10-22

Sandholt AKS, Neimanis A, Roos A, et al (2021)

Genomic signatures of host adaptation in group B Salmonella enterica ST416/ST417 from harbour porpoises.

Veterinary research, 52(1):134.

A type of monophasic group B Salmonella enterica with the antigenic formula 4,12:a:- ("Fulica-like") has been described as associated with harbour porpoises (Phocoena phocoena), most frequently recovered from lung samples. In the present study, lung tissue samples from 47 porpoises found along the Swedish coast or as bycatch in fishing nets were analysed, two of which were positive for S. enterica. Pneumonia due to the infection was considered the likely cause of death for one of the two animals. The recovered isolates were whole genome sequenced and found to belong to sequence type (ST) 416 and to be closely related to ST416/ST417 porpoise isolates from UK waters as determined by core-genome MLST. Serovars Bispebjerg, Fulica and Abortusequi were identified as distantly related to the porpoise isolates, but no close relatives from other host species were found. All ST416/417 isolates had extensive loss of function mutations in key Salmonella pathogenicity islands, but carried accessory genetic elements associated with extraintestinal infection such as iron uptake systems. Gene ontology and pathway analysis revealed reduced secondary metabolic capabilities and loss of function in terms of signalling and response to environmental cues, consistent with adaptation for the extraintestinal niche. A classification system based on machine learning identified ST416/417 as more invasive than classical gastrointestinal serovars. Genome analysis results are thus consistent with ST416/417 as a host-adapted and extraintestinal clonal population of S. enterica, which while found in porpoises without associated pathology can also cause severe opportunistic infections.

RevDate: 2021-10-22

Jiao D, Dong X, Yu Y, et al (2021)

Gene Presence/Absence Variation analysis of coronavirus family displays its pan-genomic diversity.

International journal of biological sciences, 17(14):3717-3727.

SARS-CoV-2 belongs to the coronavirus family. Comparing genomic features of viral genomes of the coronavirus family can improve our understanding of SARS-CoV-2. Here we present the first pan-genome analysis of 3,932 whole genomes of 101 species out of 4 genera from the coronavirus family. We found a total of 181 genes in the pan-genome of the coronavirus family, among which only 3 genes, the S gene, M gene and N gene, are highly conserved. We also constructed a pan-genome from 23,539 whole genomes of SARS-CoV-2. There are 13 genes in total in the SARS-CoV-2 pan-genome. All of the 13 genes are core genes for SARS-CoV-2. The pan-genome of coronaviruses shows a lower level of diversity than the pan-genomes of other RNA viruses, which contain no core gene. The three highly conserved genes in the coronavirus family, which are also core genes in the SARS-CoV-2 pan-genome, could be potential targets in developing nucleic acid diagnostic reagents with a decreased possibility of cross-reaction with other coronavirus species.

RevDate: 2021-10-23

Whitworth DE, Sydney N, EJ Radford (2021)

Myxobacterial Genomics and Post-Genomics: A Review of Genome Biology, Genome Sequences and Related 'Omics Studies.

Microorganisms, 9(10): pii:microorganisms9102143.

Myxobacteria are fascinating and complex microbes. They prey upon other members of the soil microbiome by secreting antimicrobial proteins and metabolites, and will undergo multicellular development if starved. The genome sequence of the model myxobacterium Myxococcus xanthus DK1622 was published in 2006 and 15 years later, 163 myxobacterial genome sequences have now been made public. This explosion in genomic data has enabled comparative genomics analyses to be performed across the taxon, providing important insights into myxobacterial gene conservation and evolution. The availability of myxobacterial genome sequences has allowed system-wide functional genomic investigations into entire classes of genes. It has also enabled post-genomic technologies to be applied to myxobacteria, including transcriptome analyses (microarrays and RNA-seq), proteome studies (gel-based and gel-free), investigations into protein-DNA interactions (ChIP-seq) and metabolism. Here, we review myxobacterial genome sequencing, and summarise the insights into myxobacterial biology that have emerged as a result. We also outline the application of functional genomics and post-genomic approaches in myxobacterial research, highlighting important findings to emerge from seminal studies. The review also provides a comprehensive guide to the genomic datasets available in mid-2021 for myxobacteria (including 24 genomes that we have sequenced and which are described here for the first time).

RevDate: 2021-10-23

Kang IJ, Kim KS, Beattie GA, et al (2021)

Pan-Genome Analysis of Effectors in Korean Strains of the Soybean Pathogen Xanthomonas citri pv. glycines.

Microorganisms, 9(10): pii:microorganisms9102065.

Xanthomonas citri pv. glycines is a major pathogen of soybean in Korea. Here, we analyzed pathogenicity genes based on a comparative genome analysis of five Korean strains and one strain from the United States, 8ra. Whereas all six strains had nearly identical profiles of carbohydrate-active enzymes, they varied in diversity and number of candidate type III secretion system effector (T3SE) genes. The five Korean strains were similar in their effectors, but differed from the 8ra strain. Across the six strains, transcription activator-like effectors (TALEs) showed diverse repeat sizes and at least six forms of the repeat variable di-residue (RVD) sequences, with differences not correlated with the origin of the strains. However, a phylogenetic tree based on the alignment of RVD sequences showed two distinct clusters with 17.5 repeats, suggesting that two distinct 17.5 RVD clusters have evolved, potentially to adapt Xcg to growth on distinct soybean cultivars. The predicted effector binding elements of the TALEs fell into six groups and were strongly overlapping in sequence, suggesting evolving target specificity of the binding domains in soybean cultivars. Our findings reveal the variability and adaptability of T3SEs in the Xcg strains and enhance our understanding of Xcg pathogenicity in soybean.

RevDate: 2021-10-23

Lu W, Pei Z, Zang M, et al (2021)

Comparative Genomic Analysis of Bifidobacterium bifidum Strains Isolated from Different Niches.

Genes, 12(10): pii:genes12101504.

The potential probiotic benefits of Bifidobacterium bifidum have received increasing attention recently. We used comparative genomic analysis to explore the differences in the genome and the physiological characteristics of B. bifidum isolated from the fecal samples of Chinese adults and infants. The relationships between genotypes and phenotypes were analyzed to assess the effects of isolation sources on the genetic variation of B. bifidum. The phylogenetic tree results indicated that the phylogeny of B. bifidum may be related to the geographical features of its isolation source. B. bifidum was found to have an open pan-genome and a conserved core genome. The genetic diversity of B. bifidum is mainly reflected in carbohydrate metabolism- and immune/competition-related factors, such as the glycoside hydrolase gene family, bacteriocin operons, antibiotic resistance genes, and clustered regularly interspaced short palindromic repeats (CRISPR)-Cas. Additionally, the type III A CRISPR-Cas system was discovered in B. bifidum for the first time. B. bifidum strains exhibited niche-specific characteristics, and the results of this study provide an improved understanding of the genetics of this species.

RevDate: 2021-10-20

Rodrigues RAL, Queiroz VF, Ghosh J, et al (2021)

Functional genomic analyses reveal an open pan-genome for the chloroviruses and a potential for genetic innovation in new isolates.

Journal of virology [Epub ahead of print].

Chloroviruses (family Phycodnaviridae) are large dsDNA viruses that infect unicellular green algae present in inland waters. These viruses have been isolated using three main chlorella-like green algal host cells, traditionally called NC64A, SAG and Pbi, revealing extensive genetic diversity. In this study, we performed a functional genomic analysis on 36 chloroviruses that infected the three different hosts. Phylogenetic reconstruction based on the DNA polymerase B family gene clustered the chloroviruses into three distinct clades. The viral pan-genome consists of 1,345 clusters of orthologous groups of genes (COGs), with 126 COGs conserved in all viruses. 368, 268 and 265 COGs are found exclusively in viruses that infect NC64A, SAG, and Pbi algal hosts, respectively. Two-thirds of the COGs have no known function, constituting the "dark pan-genome" of chloroviruses, and further studies focusing on these genes may identify important novelties. The proportion of functionally characterized COGs composing the pan- and the core-genome are similar, but those related to transcription and RNA processing, protein metabolism, and virion morphogenesis are at least 4-fold more represented in the core-genome. Bipartite network construction evidencing the COG-sharing among host-specific viruses identified 270 COGs shared by at least one virus from each of the different host groups. Finally, our results reveal an open pan-genome for chloroviruses and a well-established core-genome, indicating that the isolation of new chloroviruses can be a valuable source of genetic discovery.

IMPORTANCE: Chloroviruses are large dsDNA viruses that infect unicellular green algae distributed worldwide in freshwater environments. They comprise a genetically diverse group of viruses; however, a comprehensive investigation of the genomic evolution of these viruses is still missing. Here we performed a functional pan-genome analysis comprising 36 chloroviruses associated with three different algal hosts in the family Chlorellaceae, referred to as zoochlorellae because of their endosymbiotic lifestyle. We identified a set of 126 highly conserved genes, most of which are related to essential functions in the viral replicative cycle. Several genes are unique to distinct isolates, resulting in an open pan-genome for chloroviruses. This profile is associated with generalist organisms, and new insights into the evolution and ecology of chloroviruses are presented. Ultimately, our results highlight the potential for genetic diversity in new isolates.

RevDate: 2021-10-20

González-Castillo A, Carballo JL, E Bautista-Guerrero (2021)

Genomics and phylogeny of the proposed phylum 'Candidatus Poribacteria' associated with the excavating sponge Thoosa mismalolli.

Antonie van Leeuwenhoek [Epub ahead of print].

Members of the proposed phylum 'Candidatus Poribacteria' are among the most abundant microorganisms in the highly diverse microbiome of the sponge mesohyl. Genomic and phylogenetic characteristics of this proposed phylum are barely known. In this study, we analyzed metagenome-assembled genomes (MAGs) obtained from the coral reef excavating sponge Thoosa mismalolli from the Mexican Pacific Ocean. Two MAGs were extracted and analyzed together with 32 MAGs and single-amplified genomes (SAGs) obtained from NCBI. The phylogenetic tree based on the sequences of 139 single-copy genes (SCG) showed two clades. Clade A (23 genomes) represented 67.7% of the total of the genomes, while clade B (11 genomes) comprised 32.3% of the genomes. The Average Nucleotide Identity (ANI) showed values between 66 and 99% for the genomes of the proposed phylum, and the pangenome of genomes revealed a total of 37,234 genes that included 1,722 core genes. The number of genes used in the phylogenetic analysis increased from 28 (previous studies) to 139 (this study), which allowed a better resolution of the phylogeny of the proposed phylum. The results supported the two previously described classes, 'Candidatus Entoporibacteria' and 'Candidatus Pelagiporibacteria', and the genomes SB0101 and SB0202 obtained in this study belong to two new species of the class 'Candidatus Entoporibacteria'. This is the first comparative study that includes MAGs from a non-sponge host (Porites lutea) to elucidate the taxonomy of the poorly known Candidatus phylum in a polyphasic approach. Finally, our study also contributes to the sponge microbiome project by reporting the first MAGs of the proposed phylum 'Candidatus Poribacteria' isolated from the excavating sponge T. mismalolli.

RevDate: 2021-10-19

Douglas GM, BJ Shapiro (2021)

Genic selection within prokaryotic pangenomes.

Genome biology and evolution pii:6402011 [Epub ahead of print].

Understanding the evolutionary forces shaping prokaryotic pangenome structure is a major goal of microbial evolution research. Recent work has highlighted that a substantial proportion of accessory genes appear to confer niche-specific adaptations. This work has primarily focused on selection acting at the level of individual cells. Herein, we discuss a lower level of selection that also contributes to pangenome variation: genic selection. This refers to cases where genetic elements, rather than individual cells, are the entities under selection. The clearest examples of this form of selection are selfish mobile genetic elements, which are those that have either a neutral or deleterious effect on host fitness. We review the major classes of these and other mobile elements and discuss the characteristic features of such elements that could be under genic selection. We also discuss how genetic elements that are beneficial to hosts can also be under genic selection, a scenario that may be more prevalent but not widely appreciated, because disentangling the effects of selection at different levels (i.e., organisms vs genes) is challenging. Nonetheless, an appreciation for the potential action and implications of genic selection provides a useful conceptual lens with which to study the evolution of prokaryotic pangenomes.

RevDate: 2021-10-18

Liu H, Prajapati V, Prajapati S, et al (2021)

Comparative Genome Analysis of Bacillus amyloliquefaciens Focusing on Phylogenomics, Functional Traits, and Prevalence of Antimicrobial and Virulence Genes.

Frontiers in genetics, 12:724217.

Bacillus amyloliquefaciens is a gram-positive, nonpathogenic, endospore-forming member of a group of free-living soil bacteria with a variety of traits including plant growth promotion, production of antifungal and antibacterial metabolites, and production of industrially important enzymes. We have attempted to reconstruct the biogeographical structure according to functional traits and the evolutionary lineage of B. amyloliquefaciens using comparative genomics analysis. All the available 96 genomes of B. amyloliquefaciens strains were curated from the NCBI genome database, having a variety of important functionalities in all sectors keeping a high focus on agricultural aspects. In-depth analysis was carried out to deduce the orthologous gene groups and whole-genome similarity. Pan genome analysis revealed that shell genes, soft core genes, core genes, and cloud genes comprise 17.09, 5.48, 8.96, and 68.47%, respectively, which demonstrates that genomes are very different in the gene content. It also indicates that the strains may have flexible environmental adaptability or versatile functions. Phylogenetic analysis showed that B. amyloliquefaciens is divided into two clades, and clade 2 is further divided into two different clusters. This reflects the difference in the sequence similarity and diversification that happened in the B. amyloliquefaciens genome. The majority of plant-associated strains of B. amyloliquefaciens were grouped in clade 2 (73 strains), while food-associated strains were in clade 1 (23 strains). Genome mining has been adopted to deduce antimicrobial resistance and virulence genes and their prevalence among all strains. The genes tmrB and yuaB, which code for tunicamycin resistance protein and hydrophobic coat forming protein, exist only in clade 2, while clpP, which codes for serine proteases, is only in clade 1. Genome plasticity of all strains of B. amyloliquefaciens reflects their adaptation to different niches.

RevDate: 2021-10-18

Thomas P, Abdel-Glil MY, Eichhorn I, et al (2021)

Genome Sequence Analysis of Clostridium chauvoei Strains of European Origin and Evaluation of Typing Options for Outbreak Investigations.

Frontiers in microbiology, 12:732106.

Black quarter caused by Clostridium (C.) chauvoei is an important bacterial disease that affects cattle and sheep with high mortality. A comparative genomics analysis of 64 C. chauvoei strains, most of European origin and a few of non-European and unknown origin, was performed. The pangenome analysis showed limited new gene acquisition for the species. The accessory genome involved prophages and genomic islands, with variations in gene composition observed in a few strains. This limited accessory genome may indicate that the species replicates only in the host or that an active CRISPR/Cas system provides immunity to foreign genetic elements. All strains contained a CRISPR type I-B system and it was confirmed that the unique spacer sequences therein can be used to differentiate strains. Homologous recombination events, which may have contributed to the evolution of this pathogen, were less frequent compared to other related species from the genus. Pangenome single nucleotide polymorphism (SNP) based phylogeny and clustering indicate diverse clusters related to geographical origin. Interestingly the identified SNPs were mostly non-synonymous. The study demonstrates the possibility of the existence of polymorphic populations in one host, based on strain variability observed for strains from the same animal and strains from different animals of one outbreak. The study also demonstrates that new outbreak strains are mostly related to earlier outbreak strains from the same farm/region. This indicates the last common ancestor strain from one farm can be crucial to understand the genetic changes and epidemiology occurring at farm level. Known virulence factors for the species were highly conserved among the strains. Genetic elements involved in Nicotinamide adenine dinucleotide (NAD) precursor synthesis (via nadA, nadB, and nadC metabolic pathway) which are known as potential anti-virulence loci are completely absent in C. chauvoei compared to the partial inactivation in C. septicum. A novel core-genome MLST based typing method was compared to sequence typing based on CRISPR spacers to evaluate the usefulness of the methods for outbreak investigations.

RevDate: 2021-10-18

Caicedo-Montoya C, Manzo-Ruiz M, R Ríos-Estepa (2021)

Pan-Genome of the Genus Streptomyces and Prioritization of Biosynthetic Gene Clusters With Potential to Produce Antibiotic Compounds.

Frontiers in microbiology, 12:677558.

Species of the genus Streptomyces are known for their ability to produce multiple secondary metabolites; their genomes have been extensively explored to discover new bioactive compounds. The richness of genomic data currently available allows filtering for high quality genomes, which in turn permits reliable comparative genomics studies and an improved prediction of biosynthetic gene clusters (BGCs) through genome mining approaches. In this work, we used 121 genome sequences of the genus Streptomyces in a comparative genomics study with the aim of estimating the genomic diversity by protein domains content, sequence similarity of proteins and conservation of Intergenic Regions (IGRs). We also searched for BGCs but prioritizing those with potential antibiotic activity. Our analysis revealed that the pan-genome of the genus Streptomyces is clearly open, with a high quantity of unique gene families across the different species and that the IGRs are rarely conserved. We also described the phylogenetic relationships of the analyzed genomes using multiple markers, obtaining a trustworthy tree whose relationships were further validated by Average Nucleotide Identity (ANI) calculations. Finally, 33 biosynthetic gene clusters were detected to have potential antibiotic activity and a predicted mode of action, which might serve up as a guide to formulation of related experimental studies.

RevDate: 2021-10-19
CmpDate: 2021-10-19

Liang Q, S Lonardi (2021)

Reference-agnostic representation and visualization of pan-genomes.

BMC bioinformatics, 22(1):502.

BACKGROUND: The pan-genome of a species is the union of the genes and non-coding sequences present in all individuals (cultivar, accessions, or strains) within that species.

RESULTS: Here we introduce PGV, a reference-agnostic representation of the pan-genome of a species based on the notion of consensus ordering. Our experimental results demonstrate that PGV enables an intuitive, effective and interactive visualization of a pan-genome by providing a genome browser that can elucidate complex structural genomic variations.

CONCLUSIONS: The PGV software can be installed via conda or downloaded from . The companion PGV browser at can be tested using example bed tracks available from the GitHub page.

RevDate: 2021-10-16

Yuan S, Wang Y, Zhao F, et al (2021)

Complete Genome Sequence of Weissella confusa LM1 and Comparative Genomic Analysis.

Frontiers in microbiology, 12:749218.

The genus Weissella is attracting an increasing amount of attention because of its multiple functions and probiotic potential. In particular, the species Weissella confusa is known to have great potential in industrial applications and exhibits numerous biological functions. However, this bacterium has not been investigated in insects. Here, we isolated and identified W. confusa as the dominant lactic acid bacteria in the gut of the migratory locust. We named this strain W. confusa LM1, which is the first genome of an insect-derived W. confusa strain with one complete chromosome and one complete plasmid. Among all W. confusa strains, W. confusa LM1 had the largest genome. Its genome was the closest to that of W. confusa 1001271B_151109_G12, a strain from human feces. Our results provided accurate evolutionary relationships of known Weissella species and W. confusa strains. Based on genomic analysis, the pan-genome of W. confusa is in an open state. Most strains of W. confusa had unique genes, indicating that these strains can adapt to different ecological niches and organisms. However, the variation of strain-specific genes did represent significant correlations with their hosts and ecological niches. These strains were predicted to have low potential to produce secondary metabolites. Furthermore, no antibiotic resistance genes were identified. At the same time, virulence factors associated with toxin production and secretion system were not found, indicating that W. confusa strains were not sufficient to perform virulence. Our study facilitated the discovery of the functions of W. confusa LM1 in locust biology and their potential application to locust management.

RevDate: 2021-10-14

Zeng Q, Xie J, Li Y, et al (2021)

Comprehensive Genomic Analysis of the Endophytic Bacillus altitudinis Strain GLB197, a Potential Biocontrol Agent of Grape Downy Mildew.

Frontiers in genetics, 12:729603.

Bacillus has been extensively studied for agricultural application as a biocontrol agent. B. altitudinis GLB197, an endophytic bacterium isolated from grape leaves, exhibits distinctive inhibition to grape downy mildew based on unknown mechanisms. To determine the genetic traits involved in the mechanism of biocontrol and host-interaction traits, the genome sequence of GLB197 was obtained and further analyzed. The genome of B. altitudinis GLB197 consisted of one plasmid and a 3,733,835-bp circular chromosome with 41.56% G + C content, containing 3,770 protein-coding genes. Phylogenetic analysis of 17 Bacillus strains using the concatenated 1,226 single-copy core genes divided into different clusters was conducted. In addition, average nucleotide identity (ANI) values indicate that the current taxonomy of some B. pumilus group strains is incorrect. Comparative analysis of B. altitudinis GLB197 proteins with other B. altitudinis strains identified 3,157 core genes. Furthermore, we found that the pan-genome of B. altitudinis is open. The genome of B. altitudinis GLB197 contains one nonribosomal peptide synthetase (NRPS) gene cluster which was annotated as lichenysin. Interestingly, the cluster in B. altitudinis has two more genes than other Bacillus strains (lgrD and lgrB). The two genes were probably obtained via horizontal gene transfer (HGT) during the evolutionary process from Brevibacillus. Taken together, these observations enable the future application of B. altitudinis GLB197 as a biocontrol agent for control of grape downy mildew and promote our understanding of the beneficial interactions between B. altitudinis GLB197 and plants.

RevDate: 2021-10-14

Lisotto P, Couto N, Rosema S, et al (2021)

Molecular Characterisation of Vancomycin-Resistant Enterococcus faecium Isolates Belonging to the Lineage ST117/CT24 Causing Hospital Outbreaks.

Frontiers in microbiology, 12:728356.

Background: Vancomycin-resistant Enterococcus faecium (VREfm) is a successful nosocomial pathogen. The current molecular method recommended in the Netherlands for VREfm typing is based on core genome Multilocus sequence typing (cgMLST), however, the rapid emergence of specific VREfm lineages challenges distinguishing outbreak isolates solely based on their core genome. Here, we explored if a detailed molecular characterisation of mobile genetic elements (MGEs) and accessory genes could support and expand the current molecular typing of VREfm isolates sharing the same genetic background, enhancing the discriminatory power of the analysis. Materials/Methods: The genomes of 39 VREfm and three vancomycin-susceptible E. faecium (VSEfm) isolates belonging to ST117/CT24, as assessed by cgMLST, were retrospectively analysed. The isolates were collected from patients and environmental samples from 2011 to 2017, and their genomes were analysed using short-read sequencing. Pangenome analysis was performed on de novo assemblies, which were also screened for known predicted virulence factors, antimicrobial resistance genes, bacteriocins, and prophages. Two representative isolates were also sequenced using long-read sequencing, which allowed a detailed analysis of their plasmid content. Results: The cgMLST analysis showed that the isolates were closely related, with a minimal allelic difference of 10 between each cluster's closest related isolates. The vanB-carrying transposon Tn1549 was present in all VREfm isolates. However, in our data, we observed independent acquisitions of this transposon. The pangenome analysis revealed differences in the accessory genes related to prophages and bacteriocins content, whilst a similar profile was observed for known predicted virulence and resistance genes. 
Conclusion: In the case of closely related isolates sharing a similar genetic background, a detailed analysis of MGEs and the integration point of the vanB-carrying transposon increases the discriminatory power compared to the use of cgMLST alone, thus enabling the identification of epidemiological links amongst hospitalised patients.

RevDate: 2021-10-13

Chmielowska C, Korsak D, Chapkauskaitse E, et al (2021)

Plasmidome of Listeria spp.-The repA-Family Business.

International journal of molecular sciences, 22(19): pii:ijms221910320.

Bacteria of the genus Listeria (phylum Firmicutes) include both human and animal pathogens, as well as saprophytic strains. A common component of Listeria spp. genomes are plasmids, i.e., extrachromosomal replicons that contribute to gene flux in bacteria. This study provides an in-depth insight into the structure, diversity and evolution of plasmids occurring in Listeria strains inhabiting various environments under different anthropogenic pressures. Apart from the components of the conserved plasmid backbone (providing replication, stable maintenance and conjugational transfer functions), these replicons contain numerous adaptive genes possibly involved in: (i) resistance to antibiotics, heavy metals, metalloids and sanitizers, and (ii) responses to heat, oxidative, acid and high salinity stressors. Their genomes are also enriched by numerous transposable elements, which have influenced the plasmid architecture. The plasmidome of Listeria is dominated by a group of related replicons encoding the RepA replication initiation protein. Detailed comparative analyses provide valuable data on the level of conservation of these replicons and their role in shaping the structure of the Listeria pangenome, as well as their relationship to plasmids of other genera of Firmicutes, which demonstrates the range and direction of flow of genetic information in this important group of bacteria.

RevDate: 2021-10-11

Schildkraut JA, Coolen JPM, Burbaud S, et al (2021)

RNA-sequencing elucidates drug-specific mechanisms of antibiotic tolerance and resistance in M. abscessus.

Antimicrobial agents and chemotherapy [Epub ahead of print].

Mycobacterium abscessus is an opportunistic pathogen notorious for its resistance to most classes of antibiotics and low cure rates. M. abscessus carries an array of mostly unexplored defence mechanisms. A deeper understanding of antibiotic resistance and tolerance mechanisms is pivotal in development of targeted therapeutic regimens. We provide the first description of all major transcriptional mechanisms of tolerance to all antibiotics recommended in current guidelines, using RNA sequencing-guided experiments. M. abscessus ATCC 19977 bacteria were subjected to sub-inhibitory concentrations of clarithromycin, amikacin, tigecycline, cefoxitin and clofazimine for 4- and 24-hours, followed by RNA sequencing. To confirm key mechanisms of tolerance suggested by transcriptomic responses, we performed time-kill kinetic analysis using bacteria after pre-exposure to clarithromycin, amikacin or tigecycline for 24-hours and we constructed isogenic knockout and knockdown strains. To assess strain specificity, pan-genome analysis of 35 strains from all three subspecies was performed. Mycobacterium abscessus shows both drug-specific and common transcriptomic responses to antibiotic exposure. Ribosome-targeting antibiotics clarithromycin, amikacin and tigecycline elicit a common response characterized by upregulation of ribosome structural genes, the WhiB7 regulon and transferases, accompanied by downregulation of respiration through NuoA-N. Exposure to any of these drugs decreases susceptibility to ribosome-targeting drugs from multiple classes. The cytochrome bd-type quinol oxidase contributes to clofazimine tolerance in M. abscessus and the sigma factor sigH but not anti-sigma factor MAB_3542c is involved in tigecycline resistance. The observed transcriptomic responses are not strain-specific, as all genes involved in tolerance, except erm(41), are found in all included strains.

RevDate: 2021-10-11

Ferrés I, G Iraola (2021)

Protocol for post-processing of bacterial pangenome data using Pagoo pipeline.

STAR protocols, 2(4):100802 pii:S2666-1667(21)00508-6.

Multiple downstream analyses are necessary to interpret the output of bacterial pangenome reconstruction software. This requires integrating diverse kinds of genetic and phenotypic data, which to date are left to each user's criterion. To fill this gap, we created Pagoo, a pangenome post-processing tool that leverages a standardized but flexible and extensible framework for data integration, analysis, and storage. Here, we provide the protocol for running Pagoo and performing from simple to more complex comparative analyses on bacterial pangenome data. For complete details on the use and execution of this protocol, please refer to Ferrés and Iraola (2021).

RevDate: 2021-10-08

Fernández-de-Bobadilla MD, Talavera-Rodríguez A, Chacón L, et al (2021)

PATO: Pangenome Analysis Toolkit.

Bioinformatics (Oxford, England) pii:6384566 [Epub ahead of print].

MOTIVATION: We present the Pangenome Analysis Toolkit (PATO) designed to simultaneously analyze thousands of genomes using a desktop computer. The tool performs common tasks of pangenome analysis such as core-genome definition and accessory genome properties and includes new features that help characterize population structure, annotate pathogenic features and create gene sharedness networks. PATO has been developed in R to integrate with the large set of tools available for genetic, phylogenetic and statistical analysis in this environment.

RESULTS: PATO can perform the most demanding bioinformatic analyses in minutes with an accuracy comparable to state-of-the-art software but 20-30 times faster. PATO also integrates all the necessary functions for the complete analysis of the most common objectives in microbiology studies. Lastly, PATO includes the necessary tools for visualizing the results and can be integrated with other analytical packages available in R.

AVAILABILITY: The source code for PATO is freely available under the GPLv3 license.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

RevDate: 2021-10-08

Yang SM, Kim E, Kim D, et al (2021)

Rapid Real-Time Polymerase Chain Reaction for Salmonella Serotyping Based on Novel Unique Gene Markers by Pangenome Analysis.

Frontiers in microbiology, 12:750379.

An accurate diagnostic method for Salmonella serovars is fundamental to preventing the spread of associated diseases. A diagnostic polymerase chain reaction (PCR)-based method has proven to be an effective tool for detecting pathogenic bacteria. However, the gene markers currently used in real-time PCR to detect Salmonella serovars have low specificity and are developed for only a few serovars. Therefore, in this study, we explored the novel unique gene markers for 60 serovars that share similar antigenic formulas and show high prevalence using pangenome analysis and developed a real-time PCR to detect them. Before exploring gene markers, the 535 Salmonella genomes were evaluated, and some genomes had serovars different from the designated serovar information. Based on these analyses, serovar-specific gene markers were explored. These markers were identified as genes present in all strains of target serovar genomes but absent in strains of other serovar genomes. Serovar-specific primer pairs were designed from the gene markers, and a real-time PCR method that can distinguish between 60 of the most common Salmonella serovars in a single 96-well plate assay was developed. As a result, real-time PCR showed 100% specificity for 199 Salmonella and 29 non-Salmonella strains. Subsequently, the method developed was applied successfully to both strains with identified serovars and an unknown strain, demonstrating that real-time PCR can accurately detect serovars of strains compared with traditional serotyping methods, such as antisera agglutination. Therefore, our method enables rapid and economical Salmonella serotyping compared with the traditional serotyping method.
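The marker definition in this abstract (genes present in all strains of the target serovar but absent from strains of other serovars) can be sketched as a set operation on gene presence/absence data. This is a minimal illustration only, not the authors' pipeline: the function name, the input layout (one set of gene identifiers per genome) and the toy data are all assumptions.

```python
from typing import Dict, List, Set


def serovar_specific_markers(
    gene_content: Dict[str, Set[str]],
    serovar_of: Dict[str, str],
    target: str,
) -> List[str]:
    """Return genes found in every genome of `target` and in no other genome.

    gene_content maps a genome ID to the set of gene (cluster) IDs it carries.
    serovar_of maps a genome ID to its serovar label.
    Both structures are hypothetical stand-ins for real pangenome output.
    """
    target_genomes = [g for g, s in serovar_of.items() if s == target]
    other_genomes = [g for g, s in serovar_of.items() if s != target]
    if not target_genomes:
        return []
    # Genes shared by all genomes of the target serovar.
    shared = set.intersection(*(gene_content[g] for g in target_genomes))
    # Genes seen in at least one genome of any other serovar.
    seen_elsewhere = set().union(*(gene_content[g] for g in other_genomes))
    return sorted(shared - seen_elsewhere)


# Toy example with made-up gene and genome names.
toy_content = {
    "g1": {"a", "b", "m1"},
    "g2": {"a", "c", "m1"},
    "g3": {"a", "b", "c"},
}
toy_serovars = {"g1": "Typhimurium", "g2": "Typhimurium", "g3": "Enteritidis"}
print(serovar_specific_markers(toy_content, toy_serovars, "Typhimurium"))  # ['m1']
```

A candidate returned this way would still need laboratory validation, as the abstract describes for the primer pairs.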

RevDate: 2021-10-05

Bansal K, Kumar S, Kaur A, et al (2021)

Deep phylo-taxono genomics reveals Xylella as a variant lineage of plant associated Xanthomonas and supports their taxonomic reunification along with Stenotrophomonas and Pseudoxanthomonas.

Genomics pii:S0888-7543(21)00363-3 [Epub ahead of print].

Genus Xanthomonas is a group of phytopathogens that is phylogenetically related to Xylella, Stenotrophomonas, and Pseudoxanthomonas, having diverse lifestyles. Xylella is a lethal plant pathogen with a highly reduced genome, atypical GC content and is taxonomically related to these three genera. Deep phylo-taxono genomics reveals that Xylella is a variant Xanthomonas lineage that is sandwiched between Xanthomonas clades. Comparative studies suggest the role of unique pigment and exopolysaccharide gene clusters in the emergence of Xanthomonas and Xylella clades. Pan-genome analysis identified a set of unique genes associated with sub-lineages representing plant-associated Xanthomonas clade and nosocomial origin Stenotrophomonas clade. Overall, our study reveals the importance of reconciling classical phenotypic data and genomic findings in reconstituting the taxonomic status of these four genera. SIGNIFICANCE STATEMENT: Xylella fastidiosa is a devastating pathogen of perennial dicots such as grapes, citrus, coffee, and olives. An insect vector transmits the pathogen to its specific host wherein the infection leads to complete wilting of the plants. The genome of X. fastidiosa is significantly reduced both in terms of size (2 Mb) and GC content (50%) when compared with its relatives such as Xanthomonas, Stenotrophomonas, and Pseudoxanthomonas that have higher GC content (65%) and larger genomes (5 Mb). In this study, using systematic and in-depth genome-based taxonomic and phylogenetic criteria and comparative studies, we assert the need to unify Xanthomonas with its relatives (Xylella, Stenotrophomonas and Pseudoxanthomonas). Interestingly, Xylella revealed itself as a minor variant lineage embedded within two major Xanthomonas lineages comprising member species of different hosts.

RevDate: 2021-10-04

Pidcock SE, Skvortsov T, Santos FG, et al (2021)

Phylogenetic systematics of Butyrivibrio and Pseudobutyrivibrio genomes illustrate vast taxonomic diversity, open genomes and an abundance of carbohydrate-active enzyme family isoforms.

Microbial genomics, 7(10):.

RevDate: 2021-10-04

Colombi E, Perry BJ, Sullivan JT, et al (2021)

Comparative analysis of integrative and conjugative mobile genetic elements in the genus Mesorhizobium.

Microbial genomics, 7(10):.

Members of the Mesorhizobium genus are soil bacteria that often form nitrogen-fixing symbioses with legumes. Most characterised Mesorhizobium spp. genomes are ~8 Mb in size and harbour extensive pangenomes including large integrative and conjugative elements (ICEs) carrying genes required for symbiosis (ICESyms). Here, we document and compare the conjugative mobilome of 41 complete Mesorhizobium genomes. We delineated 56 ICEs and 24 integrative and mobilizable elements (IMEs) collectively occupying 16 distinct integration sites, along with 24 plasmids. We also demonstrated horizontal transfer of the largest (853,775 bp) documented ICE, the tripartite ICEMspSymAA22. The conjugation systems of all identified ICEs and several plasmids were related to those of the paradigm ICESym ICEMlSymR7A, with each carrying conserved genes for conjugative pilus formation (trb), excision (rdfS), DNA transfer (rlxS) and regulation (fseA). ICESyms have likely evolved from a common ancestor, despite occupying a variety of distinct integration sites and specifying symbiosis with diverse legumes. We found extensive evidence for recombination between ICEs and particularly ICESyms, which all uniquely lack the conjugation entry-exclusion factor gene trbK. Frequent duplication, replacement and pseudogenization of genes for quorum-sensing-mediated activation and antiactivation of ICE transfer suggests ICE transfer regulation is constantly evolving. Pangenome-wide association analysis of the ICE identified genes potentially involved in symbiosis, rhizosphere colonisation and/or adaptation to distinct legume hosts. In summary, the Mesorhizobium genus has accumulated a large and dynamic pangenome that evolves through ongoing horizontal gene transfer of large conjugative elements related to ICEMlSymR7A.

RevDate: 2021-10-04

Cho ES, Cha IT, Roh SW, et al (2021)

Haloferax litoreum sp. nov., Haloferax marinisediminis sp. nov., and Haloferax marinum sp. nov., low salt-tolerant haloarchaea isolated from seawater and sediment.

Antonie van Leeuwenhoek [Epub ahead of print].

Three novel halophilic archaea were isolated from seawater and sediment near Yeoungheungdo Island, Republic of Korea. The genome size and G + C content of the isolates MBLA0076T, MBLA0077T, and MBLA0078T were 3.56, 3.48, and 3.48 Mb and 61.7, 60.8, and 61.1 mol%, respectively. The three strains shared 98.5-99.5 % sequence similarity of the 16S rRNA gene, whereas their sequence similarity to the 16S rRNA gene of type strains was below 98.5 %. Phylogenetic analysis based on sequences of the 16S rRNA and RNA polymerase subunit beta genes indicated that the isolates belonged to the genus Haloferax. The orthologous average nucleotide identity, average amino-acid identity, and in silico DNA-DNA hybridization values were below species delineation thresholds. Pan-genomic analysis indicated that the three novel strains and 11 reference strains had 8981 pan-orthologous groups in total. Fourteen Haloferax strains shared 1766 core pan-genome orthologous groups, which were mainly related to amino acid transport and metabolism. Cells of the three isolates were gram-negative, motile, red-pink pigmented, and pleomorphic. The strains grew optimally at 30 °C (MBLA0076T) and 40 °C (MBLA0077T, MBLA0078T) in the presence of 1.28 M (MBLA0077T) and 1.7 M (MBLA0076T, MBLA0078T) NaCl and 0.1 M (MBLA0077T), 0.2 M (MBLA0076T), and 0.3 M (MBLA0078T) MgCl2·6H2O at pH 7.0-8.0. Cells of all isolates lysed in distilled water; the minimum NaCl concentration necessary to prevent lysis was 0.43 M. The major polar lipids of the three strains were phosphatidylglycerol, phosphatidylglycerol phosphate methyl ester, and sulphated diglycosyl archaeol-1. Based on their phenotypic and genotypic properties, MBLA0076T, MBLA0077T, and MBLA0078T were described as novel species of Haloferax, for which we propose the names Haloferax litoreum sp. nov., Haloferax marinisediminis sp. nov., and Haloferax marinum sp. nov., respectively. The respective type strains of these species are MBLA0076T (= KCTC 4288T = JCM 34169T), MBLA0077T (= KCTC 4289T = JCM 34170T), and MBLA0078T (= KCTC 4290T = JCM 34171T).

RevDate: 2021-10-02

Jia J, Liu M, Feng L, et al (2021)

Comparative genomic analysis reveals the evolution and environmental adaptation of Acinetobacter johnsonii.

Gene pii:S0378-1119(21)00580-1 [Epub ahead of print].

Genome plasticity is a key reason that Acinetobacter johnsonii is widely distributed in natural and clinical environments. However, little attention has been paid to the changes in the genome during the evolution of A. johnsonii. Here, a comparative genomic analysis of A. johnsonii isolated from clinical and environmental sources was conducted. We found that A. johnsonii has an open pan-genome and great adaptability to different environments. Based on the phylogenetic tree, ANI values and the distribution of accessory genes, strains from the same habitat had a high degree of similarity. Although genes associated with fundamental processes were mostly conserved in evolution, clinical isolates accumulated more genes associated with translational modification, β-lactamase and defense mechanisms, whereas environmental isolates were enriched in genes related to substance degradation. In addition, clinical strains harbored some "strong" virulence islands and resistance islands. This study highlights the evolutionary relationship of A. johnsonii isolates from clinical and environmental sources.

RevDate: 2021-10-01

Awan F, Ali MM, Hamid M, et al (2021)

Epi-Gene: An R-Package for Easy Pan-Genome Analysis.

BioMed research international, 2021:5585586.

The main aim of this study was to develop a set of functions that can analyze the genomic data with less time consumption and memory. Epi-gene is presented as a solution to large sequence file handling and computational time problems. It uses less time and less programming skills in order to work with a large number of genomes. In the current study, some features of the Epi-gene R-package were described and illustrated by using a dataset of the 14 Aeromonas hydrophila genomes. The joining, relabeling, and conversion functions were also included in this package to handle the FASTA formatted sequences. To calculate the subsets of core genes, accessory genes, and unique genes, various Epi-gene functions have been used. Heat maps and phylogenetic genome trees were also constructed. This whole procedure was completed in less than 30 minutes. This package can only work on Windows operating systems. Different functions from other packages such as dplyr and ggtree were also used that were available in R computing environment.
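The split into core, accessory and unique genes mentioned in this abstract can be illustrated with a short sketch. It is written in Python rather than R and does not use or reproduce any Epi-gene function; the function name, the strict 100% presence rule for "core" and the toy input are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Set


def partition_pangenome(gene_content: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Split a pangenome into core, accessory and unique genes.

    core:      present in every genome (strict 100% rule, an assumption)
    accessory: present in more than one genome but not in all
    unique:    present in exactly one genome
    """
    n_genomes = len(gene_content)
    # Count in how many genomes each gene occurs.
    counts = Counter(gene for genes in gene_content.values() for gene in set(genes))
    core = sorted(g for g, c in counts.items() if c == n_genomes)
    accessory = sorted(g for g, c in counts.items() if 1 < c < n_genomes)
    # With a single genome every gene is counted as core, not unique.
    unique = sorted(g for g, c in counts.items() if c == 1 and n_genomes > 1)
    return {"core": core, "accessory": accessory, "unique": unique}


# Toy example with made-up genome and gene names.
toy = {"A": {"x", "y", "z"}, "B": {"x", "y"}, "C": {"x", "w"}}
print(partition_pangenome(toy))
# {'core': ['x'], 'accessory': ['y'], 'unique': ['w', 'z']}
```

Published tools often relax the core definition (for example, presence in 95% or 99% of genomes) to tolerate incomplete assemblies; the strict rule here is only the simplest choice.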

RevDate: 2021-09-30

Wambui J, Cernela N, Stevens MJA, et al (2021)

Whole Genome Sequence-Based Identification of Clostridium estertheticum Complex Strains Supports the Need for Taxonomic Reclassification Within the Species Clostridium estertheticum.

Frontiers in microbiology, 12:727022.

Isolates within the Clostridium estertheticum complex (CEC) have routinely been identified through the 16S rRNA sequence, but the high interspecies sequence similarity reduces the resolution necessary for species level identification and often results in ambiguous taxonomic classification. The current study identified CEC isolates from meat juice (MJS) and bovine fecal samples (BFS) and determined the phylogeny of species within the CEC through whole genome sequence (WGS)-based analyses. About 1,054 MJS were screened for CEC using quantitative real-time PCR (qPCR). Strains were isolated from 33 MJS and 34 BFS qPCR-positive samples, respectively. Pan- and core-genome phylogenomics were used to determine the species identity of the isolates. Average nucleotide identity (ANI) and digital DNA-DNA hybridization (dDDH) were used to validate the species identity. The phylogeny of species within the CEC was determined through a combination of these methods. Twenty-eight clostridia strains were isolated from MJS and BFS samples out of which 13 belonged to CEC. At 95% ANI and 70% dDDH thresholds for speciation, six CEC isolates were identified as genomospecies2 (n=3), Clostridium tagluense (n=2) and genomospecies3 (n=1). Lower thresholds of 94% ANI and 58% dDDH were required for the classification of seven CEC isolates into species C. estertheticum and prevent an overlap between species C. estertheticum and Clostridium frigoriphilum. Combination of the two species and abolishment of current subspecies classification within the species C. estertheticum are proposed. These data demonstrate the suitability of phylogenomics to identify CEC isolates and determine the phylogeny within CEC.
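The two pairs of thresholds quoted in this abstract (95% ANI with 70% dDDH, and the relaxed 94% ANI with 58% dDDH) can be expressed as a small decision rule. This sketch assumes that both criteria must be met together, which the abstract does not state explicitly; the function name and the example values are hypothetical.

```python
def same_species(
    ani_percent: float,
    dddh_percent: float,
    ani_cutoff: float = 95.0,
    dddh_cutoff: float = 70.0,
) -> bool:
    """Return True when a genome pair meets both species-level cutoffs.

    Requiring both cutoffs jointly is an assumption of this sketch.
    """
    return ani_percent >= ani_cutoff and dddh_percent >= dddh_cutoff


# Made-up pairwise values, not data from the study.
print(same_species(96.1, 72.0))              # True under the 95%/70% cutoffs
print(same_species(94.5, 60.0))              # False under the 95%/70% cutoffs
print(same_species(94.5, 60.0, 94.0, 58.0))  # True under the relaxed 94%/58% cutoffs
```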

RevDate: 2021-09-30

Kim JM, Fukushima Y, Yoshida H, et al (2021)

Comparative genomic features of Streptococcus canis based on the pan-genome orthologous group analysis according to sequence type.

Japanese journal of infectious diseases [Epub ahead of print].

Using a bacterial pan-genome obtained through whole genome sequencing (WGS), coding DNA sequences (CDSs) can be clustered into pan-genome orthologous groups (POGs). We aimed to investigate comparative genomic features of Streptococcus canis based on POG analysis and to determine CDSs specific to prevalent sequence type (ST) 9. Twenty WGS datasets on S. canis strains from invasive and non-invasive specimens were retrieved from the National Center for Biotechnology Information Assembly database. Based on the WGS data, we performed comparative genome hybridization (CGH), pan- and core-genome prediction, Venn diagram test with five ST9 strains, and phylogenetic analysis, with ST determination. We compared the CDSs between seven ST9 and 13 non-ST9 strains. We observed genomic diversity based on CGH and Venn diagram. The predicted pan- and core-genomes contained 4,772 and 1,403 genes, respectively. We found five clades consisting of different STs (ST1, ST44/2, ST13/14, ST21/15/41, and ST9) based on the phylogenetic tree. There were differences in four pathways (DNA restriction-modification system, DNA-mediated transposition, extracellular region, and response to oxidative stress) regulated by CDSs specific to ST9. Our findings describe genomic diversity in CGH and Venn diagram, pan- and core-genomes, five clades of genomes consisting of different STs, and unique CDS features associated with ST9.

RevDate: 2021-09-28

Huang Z, Zhou X, Stanton C, et al (2021)

Comparative Genomics and Specific Functional Characteristics Analysis of Lactobacillus acidophilus.

Microorganisms, 9(9): pii:microorganisms9091992.

Lactobacillus acidophilus is a common lactic acid bacterium usually found in the human gastrointestinal tract, oral cavity, vagina, and various fermented foods. At present, many studies have focused on the probiotic function and industrial application of L. acidophilus. Additionally, dozens of L. acidophilus strains have been genome sequenced, but there has been no research to compare them at the genomic level. In this study, comparative analyses were performed on 46 strains of L. acidophilus to explore their genetic diversity. The results showed that the L. acidophilus strains were divided into two clusters based on ANI values, phylogenetic analysis and whole genome comparison, mainly due to differences in their predicted gene composition of bacteriocin operon, CRISPR-Cas systems and prophages. Additionally, L. acidophilus had an open pan-genome, with differences in carbohydrate utilization, antibiotic resistance, EPS operon, surface layer protein operon and other functional gene composition. This work provides a better understanding of L. acidophilus from a genetic perspective, and offers a framework for exploring the biotechnological potential of this species.

RevDate: 2021-09-28

Maguvu TE, CC Bezuidenhout (2021)

Whole Genome Sequencing Based Taxonomic Classification, and Comparative Genomic Analysis of Potentially Human Pathogenic Enterobacter spp. Isolated from Chlorinated Wastewater in the North West Province, South Africa.

Microorganisms, 9(9): pii:microorganisms9091928.

Comparative genomics, in particular, pan-genome analysis, provides an in-depth understanding of the genetic variability and dynamics of a bacterial species. Coupled with whole-genome-based taxonomic analysis, these approaches can help to provide comprehensive, detailed insights into a bacterial species. Here, we report whole-genome-based taxonomic classification and comparative genomic analysis of potential human pathogenic Enterobacter hormaechei subsp. hoffmannii isolated from chlorinated wastewater. Genome Blast Distance Phylogeny (GBDP), digital DNA-DNA hybridization (dDDH), and average nucleotide identity (ANI) confirmed the identity of the isolates. The algorithm PathogenFinder predicted the isolates to be human pathogens with a probability of greater than 0.78. The potential pathogenic nature of the isolates was supported by the presence of biosynthetic gene clusters (BGCs), aerobactin, and aryl polyenes (APEs), which are known to be associated with pathogenic/virulent strains. Moreover, analysis of the genome sequences of the isolates reflected the presence of an arsenal of virulence factors and antibiotic resistance genes that augment the predictions of the algorithm PathogenFinder. The study comprehensively elucidated the genomic features of pathogenic Enterobacter isolates from wastewaters, highlighting the role of wastewaters in the dissemination of pathogenic microbes, and the need for monitoring the effectiveness of the wastewater treatment process.

RevDate: 2021-09-28

Díaz R, Torres-Miranda A, Orellana G, et al (2021)

Comparative Genomic Analysis of Novel Bifidobacterium longum subsp. longum Strains Reveals Functional Divergence in the Human Gut Microbiota.

Microorganisms, 9(9): pii:microorganisms9091906.

Bifidobacterium longum subsp. longum is a prevalent group in the human gut microbiome. Its persistence in the intestinal microbial community suggests a close host-microbe relationship according to age. The subspecies adaptations are related to metabolic capabilities and genomic and functional diversity. In this study, 154 genomes from public databases and four new Chilean isolates were genomically compared through an in silico approach to identify genomic divergence in genes associated with carbohydrate consumption and their possible adaptations to different human intestinal niches. The pangenome of the subspecies was open, which correlates with its remarkable ability to colonize several niches. The new genomes homogenously clustered within subspecies longum, as observed in phylogenetic analysis. B. longum SC664 was different at the sequence level but not in its functions. COG analysis revealed that carbohydrate use is variable among longum subspecies. Glycosyl hydrolases participating in human milk oligosaccharide use were found in certain infant and adult genomes. Predictive genomic analysis revealed that B. longum M12 contained an HMO cluster associated with the use of fucosylated HMOs but only endowed with a GH95, being able to grow in 2-fucosyllactose as the sole carbon source. This study identifies novel genomes with distinct adaptations to HMOs and highlights the plasticity of B. longum subsp. longum to colonize the human gut microbiota.

RevDate: 2021-09-28

Kim E, Yang SM, HY Kim (2021)

Differentiation of Lacticaseibacillus zeae Using Pan-Genome Analysis and Real-Time PCR Method Targeting a Unique Gene.

Foods (Basel, Switzerland), 10(9): pii:foods10092112.

Lacticaseibacillus zeae strains, isolated from raw milk and fermented dairy products, are closely related to the Lacticaseibacillus species that has beneficial probiotic properties. However, it is difficult to distinguish them using conventional methods. In this study, a unique gene was revealed to differentiate L. zeae from other strains of the Lacticaseibacillus species and other species by pan-genome analysis, and a real-time PCR method was developed to rapidly and accurately detect the unique gene. The genome analysis of 141 genomes yielded a pan-genome of 17,978 genes. Among them, 18 accessory genes were specifically present in five genomes of L. zeae. The glycosyltransferase family 8 was identified as a unique gene present only in L. zeae and not in 136 other genomes. A primer designed from the unique gene accurately distinguished L. zeae in pure and mixed DNA and successfully constructed the criterion for the quantified standard curve in real-time PCR. The real-time PCR method was applied to 61 strains containing other Lacticaseibacillus species and distinguished L. zeae with 100% accuracy. Also, the real-time PCR method was proven to be superior to the 16S rRNA gene method in the identification of L. zeae.

RevDate: 2021-09-28

Zhao Y, Wang Y, Xia C, et al (2021)

Whole-Genome Sequencing of Corallococcus sp. Strain EGB Reveals the Genetic Determinants Linking Taxonomy and Predatory Behavior.

Genes, 12(9): pii:genes12091421.

Corallococcus sp. strain EGB is a Gram-negative myxobacterium isolated from saline soil, and has considerable potential for the biocontrol of phytopathogenic fungi. However, the detailed mechanisms related to development and predatory behavior are unclear. To obtain a comprehensive overview of genetic features, the genome of strain EGB was sequenced, annotated, and compared with 10 other Corallococcus species. The strain EGB genome was assembled as a single circular chromosome of 9.4 Mb with 7916 coding genes. Phylogenomics analysis showed that strain EGB was most closely related to Corallococcus interemptor AB047A, and it was inferred to be a novel species within the Corallococcus genus. Comparative genomic analysis revealed that the pan-genome of Corallococcus genus was large and open. Only a small proportion of genes were specific to strain EGB, and most of them were annotated as hypothetical proteins. Subsequent analyses showed that strain EGB produced abundant extracellular enzymes such as chitinases and β-(1,3)-glucanases, and proteases to degrade the cell-wall components of phytopathogenic fungi. In addition, 35 biosynthetic gene clusters potentially coding for antimicrobial compounds were identified in the strain EGB, and the majority of them were present in the dispensable pan-genome with unexplored metabolites. Other genes related to secretion and regulation were also explored for strain EGB. This study opens new perspectives in the greater understanding of the predatory behavior of strain EGB, and facilitates a potential application in the biocontrol of fungal plant diseases in the future.

RevDate: 2021-09-27

Tognon M, Bonnici V, Garrison E, et al (2021)

GRAFIMO: Variant and haplotype aware motif scanning on pangenome graphs.

PLoS computational biology, 17(9):e1009444 pii:PCOMPBIOL-D-21-00041 [Epub ahead of print].

Transcription factors (TFs) are proteins that promote or reduce the expression of genes by binding short genomic DNA sequences known as transcription factor binding sites (TFBS). While several tools have been developed to scan for potential occurrences of TFBS in linear DNA sequences or reference genomes, no tool exists to find them in pangenome variation graphs (VGs). VGs are sequence-labelled graphs that can efficiently encode collections of genomes and their variants in a single, compact data structure. Because VGs can losslessly compress large pangenomes, TFBS scanning in VGs can efficiently capture how genomic variation affects the potential binding landscape of TFs in a population of individuals. Here we present GRAFIMO (GRAph-based Finding of Individual Motif Occurrences), a command-line tool for the scanning of known TF DNA motifs represented as Position Weight Matrices (PWMs) in VGs. GRAFIMO extends the standard PWM scanning procedure by considering variations and alternative haplotypes encoded in a VG. Using GRAFIMO on a VG based on individuals from the 1000 Genomes project we recover several potential binding sites that are enhanced, weakened or missed when scanning only the reference genome, and which could constitute individual-specific binding events. GRAFIMO is available as an open-source tool under the MIT license.
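For orientation, the "standard PWM scanning procedure" that the abstract says GRAFIMO extends can be sketched for a plain linear sequence. This is not GRAFIMO code and does not touch variation graphs; it assumes a uniform background, a pseudocount of 1, forward-strand scanning only, and hypothetical function names and toy counts.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def build_pwm(
    counts: List[Dict[str, int]], background: float = 0.25, pseudocount: float = 1.0
) -> List[Dict[str, float]]:
    """Turn per-position base counts into log2 odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append(
            {
                b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
                for b in BASES
            }
        )
    return pwm


def scan(
    sequence: str, pwm: List[Dict[str, float]], threshold: float
) -> List[Tuple[int, str, float]]:
    """Slide the PWM along the forward strand and report windows scoring >= threshold."""
    seq = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Toy motif "TAT" built from made-up counts.
toy_pwm = build_pwm([{"T": 10}, {"A": 10}, {"T": 10}])
for start, window, score in scan("GGTATCC", toy_pwm, threshold=4.0):
    print(start, window, round(score, 2))  # the only hit starts at position 2, window "TAT"
```

A graph-aware scanner would, in addition, enumerate the alternative paths through each variant site instead of a single linear window.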

RevDate: 2021-09-27

Kim MS, Lee T, Baek J, et al (2021)

Genome assembly of the popular Korean soybean cultivar Hwangkeum.

G3 (Bethesda, Md.), 11(10):.

Massive resequencing efforts have been undertaken to catalog allelic variants in major crop species including soybean, but the scope of the information for genetic variation often depends on short sequence reads mapped to the extant reference genome. Additional de novo assembled genome sequences provide a unique opportunity to explore a dispensable genome fraction in the pan-genome of a species. Here, we report the de novo assembly and annotation of Hwangkeum, a popular soybean cultivar in Korea. The assembly was constructed using PromethION nanopore sequencing data and two genetic maps and was then error-corrected using Illumina short-reads and PacBio SMRT reads. The 933.12 Mb assembly was annotated as containing 79,870 transcripts for 58,550 genes using RNA-Seq data and the public soybean annotation set. Comparison of the Hwangkeum assembly with the Williams 82 soybean reference genome sequence (Wm82.a2.v1) revealed 1.8 million single-nucleotide polymorphisms, 0.5 million indels, and 25 thousand putative structural variants. However, there was no natural megabase-scale chromosomal rearrangement. Incidentally, by adding two novel subfamilies, we found that soybean contains four clearly separated subfamilies of centromeric satellite repeats. Analyses of satellite repeats and gene content suggested that the Hwangkeum assembly is a high-quality assembly. This was further supported by comparison of the marker arrangement of anthocyanin biosynthesis genes and of gene arrangement at the Rsv3 locus. Therefore, the results indicate that the de novo assembly of Hwangkeum is a valuable additional reference genome resource for characterizing traits for the improvement of this important crop species.

RevDate: 2021-09-27

Sato K, Mascher M, Himmelbach A, et al (2021)

Chromosome-scale assembly of wild barley accession "OUH602".

G3 (Bethesda, Md.), 11(10):.

Barley (Hordeum vulgare) was domesticated from its wild ancestral form ca. 10,000 years ago in the Fertile Crescent and is widely cultivated throughout the world, except for in tropical areas. The genome size of both cultivated barley and its conspecific wild ancestor is approximately 5 Gb. High-quality chromosome-level assemblies of 19 cultivated and one wild barley genotype were recently established by pan-genome analysis. Here, we release another equivalent short-read assembly of the wild barley accession "OUH602." A series of genetic and genomic resources were developed for this genotype in prior studies. Our assembly contains more than 4.4 Gb of sequence, with a scaffold N50 value of over 10 Mb. The haplotype shows high collinearity with the most recently updated barley reference genome, "Morex" V3, with some inversions. Gene projections based on "Morex" gene models revealed 46,807 protein-coding sequences and 43,375 protein-coding genes. Alignments to publicly available sequences of bacterial artificial chromosome (BAC) clones of "OUH602" confirm the high accuracy of the assembly. Since more loci of interest have been identified in "OUH602," the release of this assembly, with detailed genomic information, should accelerate gene identification and the utilization of this key wild barley accession.
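The scaffold N50 value quoted in this abstract is a standard assembly statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of the calculation follows; the function name and the toy lengths are illustrative and are not taken from the OUH602 assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths


# Toy lengths: total is 20, and the two longest (6 + 5) already cover half.
print(n50([2, 3, 4, 5, 6]))  # 5
```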

RevDate: 2021-09-27

Mahtha SK, Purama RK, G Yadav (2021)

StAR-Related Lipid Transfer (START) Domains Across the Rice Pangenome Reveal How Ontogeny Recapitulated Selection Pressures During Rice Domestication.

Frontiers in genetics, 12:737194.

The StAR-related lipid transfer (START) domain containing proteins or START proteins, encoded by a plant amplified family of evolutionary conserved genes, play important roles in lipid binding, transport, signaling, and modulation of transcriptional activity in the plant kingdom, but there is limited information on their evolution, duplication, and associated sub- or neo-functionalization. Here we perform a comprehensive investigation of this family across the rice pangenome, using 10 wild and cultivated varieties. Conservation of START domains across all 10 rice genomes suggests low dispensability and critical functional roles for this family, further supported by chromosomal mapping, duplication and domain structure patterns. Analysis of synteny highlights a preponderance of segmental and dispersed duplication among STARTs, while transcriptomic investigation of the main cultivated variety Oryza sativa var. japonica reveals sub-functionalization amongst gene family members in terms of preferential expression across various developmental stages and anatomical parts, such as flowering. Ka/Ks ratios confirmed strong negative/purifying selection on START family evolution, implying that ontogeny recapitulated selection pressures during rice domestication. Our findings provide evidence for high conservation of START genes across rice varieties in numbers, as well as in their stringent regulation of Ka/Ks ratio, and showed strong functional dependency of plants on START proteins for their growth and reproductive development. We believe that our findings advance the limited knowledge about plant START domain diversity and evolution, and pave the way for more detailed assessment of individual structural classes of START proteins among plants and their domain specific substrate preferences, to complement existing studies in animals and yeast.

RevDate: 2021-09-27

Jaakkola K, Virtanen K, Lahti P, et al (2021)

Comparative Genome Analysis and Spore Heat Resistance Assay Reveal a New Component to Population Structure and Genome Epidemiology Within Clostridium perfringens Enterotoxin-Carrying Isolates.

Frontiers in microbiology, 12:717176.

Clostridium perfringens causes a variety of human and animal enteric diseases including food poisoning, antibiotic-associated diarrhea, and necrotic enteritis. Yet, the reservoirs of enteropathogenic enterotoxin-producing strains remain unknown. We conducted a genomic comparison of 290 strains and a heat resistance phenotyping of 30 C. perfringens strains to elucidate the population structure and ecology of this pathogen. C. perfringens genomes shared a conserved genetic backbone with more than half of the genes of an average genome conserved in >95% of strains. The cpe-carrying isolates were found to share genetic context: the cpe-carrying plasmids had different distribution patterns within the genetic lineages and the estimated pan genome of cpe-carrying isolates had a larger core genome and a smaller accessory genome compared to that of 290 strains. We characterize cpe-negative strains related to chromosomal cpe-carrying strains elucidating the origin of these strains and disclose two distinct groups of chromosomal cpe-carrying strains with different virulence characteristics, spore heat resistance properties, and, presumably, ecological niche. Finally, an antibiotic-associated diarrhea isolate carrying two copies of the enterotoxin cpe gene and the associated genetic lineage with the potential for the emergence of similar strains are outlined. With C. perfringens as an example, implications of input genome quality for pan genome analysis are discussed. Our study furthers the understanding of genome epidemiology and population structure of enteropathogenic C. perfringens and brings new insight into this important pathogen and its reservoirs.

RevDate: 2021-09-27

Wekesa CS, Furch ACU, R Oelmüller (2021)

Isolation and Characterization of High-Efficiency Rhizobia From Western Kenya Nodulating With Common Bean.

Frontiers in microbiology, 12:697567.

Common bean is one of the primary protein sources in third-world countries. They form nodules with nitrogen-fixing rhizobia, which have to be adapted to the local soils. Commercial rhizobial strains such as Rhizobium tropici CIAT899 are often used in agriculture. However, this strain failed to significantly increase the common bean yield in many places, including Kenya, due to the local soils' low pH. We isolated two indigenous rhizobial strains from the nodules of common bean from two fields in Western Kenya that have never been exposed to commercial inocula. We then determined their ability to fix nitrogen in common beans, solubilize phosphorus, and produce indole acetic acid. In greenhouse experiments, common bean plants inoculated with two isolates, B3 and S2 in sterile vermiculite, performed better than those inoculated with CIAT899 or plants grown with nitrogen fertilizer alone. In contrast to CIAT899, both isolates grew in the media with pH 4.8. Furthermore, isolate B3 had higher phosphate solubilization ability and produced more indole acetic acid than the other two rhizobia. Genome analyses revealed that B3 and S2 are different strains of Rhizobium phaseoli. We recommend fieldwork studies in Kenyan soils to test the efficacy of the two isolates in the natural environment in an effort to produce inoculants specific for these soils.

RevDate: 2021-09-26

Vela Gurovic MS, Díaz ML, Gallo CA, et al (2021)

Phylogenomics, CAZyome and core secondary metabolome of Streptomyces albus species.

Molecular genetics and genomics : MGG [Epub ahead of print].

A phylogenomic study conducted with different bioinformatic tools such as TYGS, REALPHY and AAI comparisons revealed a high rate of misidentified Streptomyces albus genomes in GenBank. Only 9 of the 18 annotated genomes available in the public database were correctly identified as S. albus species. The pangenome of the nine in silico confirmed S. albus genomes was almost closed. Lignocellulosic agroresidues were a common niche among strains of the S. albus clade while carbohydrate active enzymes (CAZymes) were highly conserved. Relevant enzymes for cellulose degradation such as beta glucosidases belonging to the GH1 family, a GH6 cellulase and a monooxygenase AA10-CBM2 were encoded by all S. albus genomes. Among them, one GH1 glycosidase would be regulated by CebR. However, this regulatory mechanism was not confirmed for other genes related to cellulose degradation. Based on AntiSMASH predictions, the core secondary metabolome of S. albus encompassed a total of 23 biosynthetic gene clusters (BGCs), where 4 were related to common metabolites within Streptomyces genus. Species specific BGCs included those related to pseudouridimycin and xantholipin. Additionally, four BGCs encoded putative derivatives of ibomycin, the lasso peptide SSV-2086, the lanthipeptide SapB and the terpene isorenieratene. Known metabolites could not be assigned to ten BGCs and three clusters did not match with any previously described BGC. The core genome of S. albus retrieved from nine closely related genomes revealed a high potential for the discovery of novel bioactive metabolites and underexplored regulatory genomic elements related to lignocellulose deconstruction.

RevDate: 2021-09-25

Contreras-Moreira B, Filippi CV, Naamati G, et al (2021)

K-mer counting and curated libraries drive efficient annotation of repeats in plant genomes.

The plant genome [Epub ahead of print].

The annotation of repetitive sequences within plant genomes can help in the interpretation of observed phenotypes. Moreover, repeat masking is required for tasks such as whole-genome alignment, promoter analysis, or pangenome exploration. Although homology-based annotation methods are computationally expensive, k-mer strategies for masking are orders of magnitude faster. Here, we benchmarked a two-step approach, where repeats were first called by k-mer counting and then annotated by comparison to curated libraries. This hybrid protocol was tested on 20 plant genomes from Ensembl, with the k-mer-based Repeat Detector (Red) and two repeat libraries (REdat, last updated in 2013, and nrTEplants, curated for this work). Custom libraries produced by RepeatModeler were also tested. We obtained repeated genome fractions that matched those reported in the literature but with shorter repeated elements than those produced directly by sequence homology. Inspection of the masked regions that overlapped genes revealed no preference for specific protein domains. Most Red-masked sequences could be successfully classified by sequence similarity, with the complete protocol taking less than 2 h on a desktop Linux box. A guide to curating your own repeat libraries and the scripts for masking and annotating plant genomes can be obtained at
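The first step of the two-step approach in this abstract, calling repeats by k-mer counting, can be illustrated with a minimal sketch. This is not the Red algorithm or the authors' scripts; it is a simplified stand-in that soft-masks (lowercases) any base covered by a k-mer seen at least `min_count` times. The function name `mask_repeats`, the parameters `k` and `min_count`, and the toy sequence are all hypothetical.

```python
from collections import Counter


def mask_repeats(seq: str, k: int = 4, min_count: int = 3) -> str:
    """Soft-mask every base covered by a k-mer that occurs >= min_count times.

    Simplified illustration only (assumption): real tools such as Red use
    more elaborate scoring than a fixed count threshold.
    """
    seq = seq.upper()
    if k <= 0 or len(seq) < k:
        return seq

    # Count every overlapping k-mer in the sequence.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    # Flag all positions covered by a frequent k-mer.
    masked = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] >= min_count:
            for j in range(i, i + k):
                masked[j] = True

    # Lowercase flagged bases (soft-masking), keep the rest uppercase.
    return "".join(b.lower() if m else b for b, m in zip(seq, masked))


# Hypothetical toy sequence: "ACGT" occurs three times, then a unique tail.
print(mask_repeats("ACGTACGTACGTTTGCA", k=4, min_count=3))
```

In the hybrid protocol described above, regions flagged this way would then be annotated by comparison to a curated repeat library; that second step is not sketched here.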

RevDate: 2021-09-24

Horesh G, Taylor-Brown A, McGimpsey S, et al (2021)

Different evolutionary trends form the twilight zone of the bacterial pan-genome.

Microbial genomics, 7(9):.

The pan-genome is defined as the combined set of all genes in the gene pool of a species. Pan-genome analyses have been very useful in helping to understand different evolutionary dynamics of bacterial species: an open pan-genome often indicates a free-living lifestyle with metabolic versatility, while closed pan-genomes are linked to host-restricted, ecologically specialized bacteria. A detailed understanding of the species pan-genome has also been instrumental in tracking the phylodynamics of emerging drug resistance mechanisms and drug-resistant pathogens. However, current approaches to analyse a species' pan-genome do not take the species population structure into account, nor do they account for the uneven sampling of different lineages, as is commonplace due to over-sampling of clinically relevant representatives. Here we present the application of a population structure-aware approach for classifying genes in a pan-genome based on within-species distribution. We demonstrate our approach on a collection of 7500 Escherichia coli genomes, one of the most-studied bacterial species and used as a model for an open pan-genome. We reveal clearly distinct groups of genes, clustered by different underlying evolutionary dynamics, and provide a more biologically informed and accurate description of the species' pan-genome.

RevDate: 2021-09-21

Flores Ramos S, Brugger SD, Escapa IF, et al (2021)

Genomic Stability and Genetic Defense Systems in Dolosigranulum pigrum, a Candidate Beneficial Bacterium from the Human Microbiome.

mSystems [Epub ahead of print].

Dolosigranulum pigrum is positively associated with indicators of health in multiple epidemiological studies of human nasal microbiota. Knowledge of the basic biology of D. pigrum is a prerequisite for evaluating its potential for future therapeutic use; however, such data are very limited. To gain insight into D. pigrum's chromosomal structure, pangenome, and genomic stability, we compared the genomes of 28 D. pigrum strains that were collected across 20 years. Phylogenomic analysis showed closely related strains circulating over this period and closure of 19 genomes revealed highly conserved chromosomal synteny. Gene clusters involved in the mobilome and in defense against mobile genetic elements (MGEs) were enriched in the accessory genome versus the core genome. A systematic analysis for MGEs identified the first candidate D. pigrum prophage and insertion sequence. A systematic analysis for genetic elements that limit the spread of MGEs, including restriction modification (RM), CRISPR-Cas, and deity-named defense systems, revealed strain-level diversity in host defense systems that localized to specific genomic sites, including one RM system hot spot. Analysis of CRISPR spacers pointed to a wealth of MGEs against which D. pigrum defends itself. These results reveal a role for horizontal gene transfer and mobile genetic elements in strain diversification while highlighting that in D. pigrum this occurs within the context of a highly stable chromosomal organization protected by a variety of defense mechanisms. IMPORTANCE Dolosigranulum pigrum is a candidate beneficial bacterium with potential for future therapeutic use. This is based on its positive associations with characteristics of health in multiple studies of human nasal microbiota across the span of human life. For example, high levels of D. pigrum nasal colonization in adults predicts the absence of Staphylococcus aureus nasal colonization. Also, D. pigrum nasal colonization in young children is associated with healthy control groups in studies of middle ear infections. Our analysis of 28 genomes revealed a remarkable stability of D. pigrum strains colonizing people in the United States across a 20-year span. We subsequently identified factors that can influence this stability, including genomic stability, phage predators, the role of MGEs in strain-level variation, and defenses against MGEs. Finally, these D. pigrum strains also lacked predicted virulence factors. Overall, these findings add additional support to the potential for D. pigrum as a therapeutic bacterium.

RevDate: 2021-09-20

Sonnenberg CB, P Haugen (2021)

The Pseudoalteromonas multipartite genome: distribution and expression of pangene categories, and a hypothesis for the origin and evolution of the chromid.

G3 (Bethesda, Md.), 11(9):.

Bacterial genomes typically consist of one large chromosome, but can also include secondary replicons. These so-called multipartite genomes are scattered on the bacterial tree of life with the majority of cases belonging to Proteobacteria. Within the class gamma-proteobacteria, multipartite genomes are restricted to the two families Vibrionaceae and Pseudoalteromonadaceae. Whereas the genome of vibrios is well studied, information on the Pseudoalteromonadaceae genome is much scarcer. We have studied Pseudoalteromonadaceae with respect to the origin of the chromid, how pangene categories are distributed, how genes are expressed relative to their genomic location, and identified chromid hallmark genes. We calculated the Pseudoalteromonadaceae pangenome based on 25 complete genomes and found that core/softcore are significantly overrepresented in late replicating sectors of the chromid, regardless of how the chromid is replicated. On the chromosome, core/softcore and shell/cloud genes are only weakly overrepresented at the chromosomal replication origin and termination sequences, respectively. Gene expression is trending downwards with increasing distance from the chromosomal oriC, whereas the chromidal expression pattern is more complex. Moreover, we identified 78 chromid hallmark genes, and BLASTp searches suggest that the majority of them were acquired from the ancestral gene pool of Alteromonadales. Finally, our data strongly suggest that the chromid originates from a plasmid that was acquired in a relatively recent event. In summary, this study extends our knowledge on multipartite genomes, and helps us understand how and why secondary replicons are acquired, why they are maintained, and how they are shaped by evolution.

RevDate: 2021-09-20

Zou W, Ye G, Liu C, et al (2021)

Comparative genome analysis of Clostridium beijerinckii strains isolated from pit mud of Chinese strong flavor baijiu ecosystem.

G3 (Bethesda, Md.) pii:6364901 [Epub ahead of print].

Clostridium beijerinckii is a well-known anaerobic solventogenic bacterium which inhabits a wide range of different niches. Previously, we isolated five butyrate-producing C. beijerinckii strains from pit mud (PM) of strong-flavor baijiu (SFB) ecosystems. Genome annotation of the five strains showed that they could assimilate various carbon sources as well as ammonium to produce acetate, butyrate, lactate, hydrogen, and esters but did not produce the undesirable flavors isopropanol and acetone, making them useful for further exploration in SFB production. Our analysis of the genomes of an additional 233 C. beijerinckii strains revealed an open pangenome based on current sampling and will likely change with additional genomes. The core genome, accessory genome, and strain-specific genes comprised 1567, 8851, and 2154 genes, respectively. A total of 298 genes were found only in the five C. beijerinckii strains from PM, among which only 77 genes were assigned to Clusters of Orthologous Genes categories. In addition, 15 transposase and 12 phage integrase families were found in all five C. beijerinckii strains from PM. Between 18 and 21 genome islands were predicted for the five C. beijerinckii genomes. The existence of a large number of mobile genetic elements indicated that the genomes of the five C. beijerinckii strains evolved with the loss or insertion of DNA fragments in the PM of SFB ecosystems. This study presents a genomic framework of C. beijerinckii strains from PM that could be used for genetic diversification studies and further exploration of these strains.

RevDate: 2021-09-17

Bansal K, Kaur A, Midha S, et al (2021)

Xanthomonas sontii sp. nov., a non-pathogenic bacterium isolated from healthy basmati rice (Oryza sativa) seeds from India.

Antonie van Leeuwenhoek [Epub ahead of print].

We report three yellow-pigmented, Gram-negative, aerobic, rod-shaped, motile bacterial isolates designated as PPL1T, PPL2, and PPL3 from healthy basmati rice seeds. Phenotypic and 16S rRNA gene sequence analysis assigned these isolates to the genus Xanthomonas. The 16S rRNA showed a 99.59% similarity with X. sacchari CFBP 4641T, a sugarcane pathogen. Further, biochemical and fatty acid analysis revealed it to be closer to X. sacchari. Still, it differed from other species in general and known rice associated species such as X. oryzae (pathogenic) and X. maliensis (non-pathogenic) in particular. Interestingly, the isolates in this study were isolated from healthy rice plants but are closely related to a species that is pathogenic and isolated from diseased sugarcane. Accordingly, in planta studies revealed that PPL1T, PPL2, and PPL3 are non-pathogenic to rice plants upon leaf inoculation. Taxonogenomic studies based on orthologous average nucleotide identity (OrthoANI) and digital DNA-DNA hybridization (dDDH) values with type strains of Xanthomonas species were below the recommended threshold values for species delineation. Whole genome-based phylogenomic analysis revealed that these isolates formed a distinct monophyletic clade with X. sacchari CFBP 4641T as their closest neighbour. Further, pangenome analysis revealed PPL1T, PPL2, and PPL3 isolates to comprise NRPS cluster along with a large number of unique genes associated with the novel species. Based on polyphasic and genomic approaches, a novel lineage and species associated with healthy rice seeds is described, for which the name Xanthomonas sontii sp. nov. is proposed. The type strain for the X. sontii sp. nov. is PPL1T (JCM 33631T = CFBP 8688T = ICMP 23426T = MTCC 12491T), with PPL2 (JCM 33632 = CFBP 8689 = ICMP 23427 = MTCC 12492) and PPL3 (JCM 33633 = CFBP 8690 = ICMP 23428 = MTCC 12493) as other strains of the species.

RevDate: 2021-09-20

Zhang M, Zhang Y, Han X, et al (2021)

Whole genome sequencing of Enterobacter mori, an emerging pathogen of kiwifruit and the potential genetic adaptation to pathogenic lifestyle.

AMB Express, 11(1):129.

Members of the Enterobacter genus are gram-negative bacteria, which are used as plant growth-promoting bacteria, and increasingly recovered from economic plants as emerging pathogens. A new Enterobacter mori strain, designated CX01, was isolated as an emerging bacterial pathogen of a recent outbreak of kiwifruit canker-like disease in China. The main symptoms associated with this syndrome are bleeding cankers on the trunk and branch, and brown leaf spots. The genome sequence of E. mori CX01 was determined as a single chromosome of 4,966,908 bp with 4640 predicted open reading frames (ORFs). To better understand the features of the genus and its potential pathogenic mechanisms, five available Enterobacter genomes were compared and a pan-genome of 4870 COGs with 3158 core COGs was revealed. An important feature of the E. mori CX01 genome is that it lacks a type III secretion system often found in pathogenic bacteria; instead, it is equipped with type I, II, and VI secretory systems. In addition, it includes genes encoding putative virulence effectors, two-component systems, nutrient acquisition systems, and proteins involved in phytohormone synthesis, which may contribute to virulence and adaptation to the host plant niches. The genome sequence of E. mori CX01 has high similarity with that of E. mori LMG 25706, though rearrangements occur throughout the two genomes. Further pathogenicity assay showed that both strains can invade either kiwifruit or mulberry, indicating they may have similar host range. Comparison with a closely related isolate enabled us to understand its pathogenesis and ecology.

RevDate: 2021-09-17

Yocca AE, PP Edger (2021)

Machine learning approaches to identify core and dispensable genes in pangenomes.

The plant genome [Epub ahead of print].

A gene in a given taxonomic group is either present in every individual (core) or absent in at least a single individual (dispensable). Previous pangenomic studies have identified certain functional differences between core and dispensable genes. However, identifying if a gene belongs to the core or dispensable portion of the genome requires the construction of a pangenome, which involves sequencing the genomes of many individuals. Here we aim to leverage the previously characterized core and dispensable gene content for two grass species [Brachypodium distachyon (L.) P. Beauv. and Oryza sativa L.] to construct a machine learning model capable of accurately classifying genes as core or dispensable using only a single annotated reference genome. Such a model may mitigate the need for pangenome construction, an expensive hurdle especially in orphan crops, which often lack the adequate genomic resources.

RevDate: 2021-09-16

Olanrewaju OS, Ayilara MS, Ayangbenro AS, et al (2021)

Genome Mining of Three Plant Growth-Promoting Bacillus Species from Maize Rhizosphere.

Applied biochemistry and biotechnology [Epub ahead of print].

Bacillus species genomes are rich in plant growth-promoting genetic elements. Bacillus subtilis and Bacillus velezensis are important plant growth promoters; hence, to further improve their abilities, the genetic elements responsible for these traits were characterized and reported. Genetic elements reported include those of auxin, nitrogen fixation, siderophore production, iron acquisition, volatile organic compounds, and antibiotics. Furthermore, the presence of phages and antibiotic resistance genes in the genomes is reported. Pan-genome analysis was conducted using ten Bacillus species. From the analysis, the pan-genomes of Bacillus subtilis and Bacillus velezensis are still open. Ultimately, this study brings an insight into the genetic components of the plant growth-promoting abilities of these strains and shows their potential biotechnological applications in agriculture and other relevant sectors.

RevDate: 2021-09-18

Colquhoun RM, Hall MB, Lima L, et al (2021)

Pandora: nucleotide-resolution bacterial pan-genomics with reference graphs.

Genome biology, 22(1):267.

We present pandora, a novel pan-genome graph structure and algorithms for identifying variants across the full bacterial pan-genome. As much bacterial adaptability hinges on the accessory genome, methods which analyze SNPs in just the core genome have unsatisfactory limitations. Pandora approximates a sequenced genome as a recombinant of references, detects novel variation and pan-genotypes multiple samples. Using a reference graph of 578 Escherichia coli genomes, we compare 20 diverse isolates. Pandora recovers more rare SNPs than single-reference-based tools, is significantly better than picking the closest RefSeq reference, and provides a stable framework for analyzing diverse samples without reference bias.

RevDate: 2021-09-14

Da Silva K, Pons N, Berland M, et al (2021)

StrainFLAIR: strain-level profiling of metagenomic samples using variation graphs.

PeerJ, 9:e11884.

Current studies are shifting from the use of single linear references to representation of multiple genomes organised in pangenome graphs or variation graphs. Meanwhile, in metagenomic samples, resolving strain-level abundances is a major step in microbiome studies, as associations between strain variants and phenotype are of great interest for diagnostic and therapeutic purposes. We developed StrainFLAIR with the aim of showing the feasibility of using variation graphs for indexing highly similar genomic sequences up to the strain level, and for characterizing a set of unknown sequenced genomes by querying this graph. On simulated data composed of mixtures of strains from the same bacterial species Escherichia coli, results show that StrainFLAIR was able to distinguish and estimate the abundances of close strains, as well as to highlight the presence of a new strain close to a referenced one and to estimate its abundance. On a real dataset composed of a mix of several bacterial species and several strains for the same species, results show that in a more complex configuration StrainFLAIR correctly estimates the abundance of each strain. Hence, results demonstrated how graph representation of multiple close genomes can be used as a reference to characterize a sample at the strain level.

RevDate: 2021-09-10

Hall RJ, Whelan FJ, Cummins EA, et al (2021)

Gene-gene relationships in an Escherichia coli accessory genome are linked to function and mobility.

Microbial genomics, 7(9):.

The pangenome contains all genes encoded by a species, with the core genome present in all strains and the accessory genome in only a subset. Coincident gene relationships are expected within the accessory genome, where the presence or absence of one gene is influenced by the presence or absence of another. Here, we analysed the accessory genome of an Escherichia coli pangenome consisting of 400 genomes from 20 sequence types to identify genes that display significant co-occurrence or avoidance patterns with one another. We present a complex network of genes that are either found together or that avoid one another more often than would be expected by chance, and show that these relationships vary by lineage. We demonstrate that genes co-occur by function, and that several highly connected gene relationships are linked to mobile genetic elements. We find that genes are more likely to co-occur with, rather than avoid, another gene in the accessory genome. This work furthers our understanding of the dynamic nature of prokaryote pangenomes and implicates both function and mobility as drivers of gene relationships.
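The core/accessory definitions that open this abstract can be illustrated with a minimal sketch, assuming each strain is represented simply as a set of gene identifiers. The helper `split_pangenome`, the strain names, and the gene names are hypothetical and are not taken from the paper; real analyses first cluster genes into orthologous families and often use a softer threshold than strict presence in every strain.

```python
from typing import Dict, Set, Tuple


def split_pangenome(
    strains: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pangenome, core, accessory) for a strain -> gene-set mapping.

    pangenome: genes found in at least one strain
    core:      genes found in every strain
    accessory: genes found in only a subset of strains
    """
    if not strains:
        return set(), set(), set()
    gene_sets = list(strains.values())
    pangenome = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    accessory = pangenome - core
    return pangenome, core, accessory


# Hypothetical toy data: three strains, four genes.
toy = {
    "strain_A": {"geneA", "geneB", "geneC"},
    "strain_B": {"geneA", "geneB", "geneD"},
    "strain_C": {"geneA", "geneC", "geneD"},
}
pan, core, accessory = split_pangenome(toy)
print(sorted(core), sorted(accessory))
```

Co-occurrence or avoidance analyses of the kind the abstract describes would operate on the accessory set only, since core genes are present everywhere and carry no presence/absence signal.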

RevDate: 2021-09-20

Mazzuoli MV, Daunesse M, Varet H, et al (2021)

The CovR regulatory network drives the evolution of Group B Streptococcus virulence.

PLoS genetics, 17(9):e1009761.

Virulence of the neonatal pathogen Group B Streptococcus is under the control of the master regulator CovR. Inactivation of CovR is associated with large-scale transcriptome remodeling and impairs almost every step of the interaction between the pathogen and the host. However, transcriptome analyses suggested a plasticity of the CovR signaling pathway in clinical isolates leading to phenotypic heterogeneity in the bacterial population. In this study, we characterized the CovR regulatory network in a strain representative of the CC-17 hypervirulent lineage responsible for the majority of neonatal meningitis. Transcriptome and genome-wide binding analysis reveal the architecture of the CovR network characterized by the direct repression of a large array of virulence-associated genes and the extent of co-regulation at specific loci. Comparative functional analysis of the signaling network links strain-specificities to the regulation of the pan-genome, including the two specific hypervirulent adhesins and horizontally acquired genes, to mutations in CovR-regulated promoters, and to variability in CovR activation by phosphorylation. This regulatory adaptation occurs at the level of genes, promoters, and of CovR itself, and allows a global reshaping of the expression of virulence genes. Overall, our results reveal the direct, coordinated, and strain-specific regulation of virulence genes by the master regulator CovR and suggest that the intra-species evolution of the signaling network is as important as the expression of specific virulence factors in the emergence of clones associated with specific diseases.

RevDate: 2021-09-08

Li G, Jiang T, Li J, et al (2021)

PanSVR: Pan-Genome Augmented Short Read Realignment for Sensitive Detection of Structural Variations.

Frontiers in genetics, 12:731515.

The comprehensive discovery of structural variations (SVs) is fundamental to many genomics studies, and high-throughput sequencing has become a common approach to this task. However, due to the limited read length, it is still non-trivial for state-of-the-art tools to accurately align short reads and produce high-quality SV callsets. The pan-genome provides a novel and promising framework for short read-based SV calling, since it makes it possible to comprehensively integrate known variants, reducing the incompleteness and bias of a single reference, breaking through the bottlenecks of short read alignment, and providing new evidence for the detection of SVs. However, developing effective computational approaches that take full advantage of pan-genomes is still an open problem. Herein, we propose Pan-genome augmented Structure Variation calling tool with read Re-alignment (PanSVR), a novel pan-genome-based SV calling approach. PanSVR uses several tailored methods to implement precise re-alignment of SV-spanning reads against a well-organized pan-genome reference with plenty of known SVs. PanSVR greatly improves the quality of short read alignments and produces clear and homogeneous SV signatures, which facilitate SV calling. Benchmark results on real sequencing data suggest that PanSVR substantially improves the sensitivity of SV calling over state-of-the-art SV callers, especially for SVs from repeat-rich regions and/or novel insertions, which are difficult for existing tools.

RevDate: 2021-09-08

Karthik K, Anbazhagan S, Thomas P, et al (2021)

Genome Sequencing and Comparative Genomics of Indian Isolates of Brucella melitensis.

Frontiers in microbiology, 12:698069.

Brucella melitensis causes small ruminant brucellosis and is a zoonotic pathogen prevalent worldwide. Whole genome phylogeny of all available B. melitensis genomes (n = 355) revealed that all Indian isolates (n = 16) clustered in the East Mediterranean lineage except the ADMAS-GI strain. Pangenome analysis indicated the presence of limited accessory genomes with few clades showing specific gene presence/absence pattern. A total of 43 virulence genes were predicted in all the Indian strains of B. melitensis except 2007BM-1 (ricA and wbkA are absent). Multilocus sequence typing (MLST) analysis indicated all except one Indian strain (ADMAS-GI) falling into sequence type (ST 8). In comparison with MLST, core genome phylogeny indicated two major clusters (>70% bootstrap support values) among Indian strains. Clusters with <70% bootstrap support values represent strains with diverse evolutionary origins present among animal and human hosts. Genetic relatedness among animal (sheep and goats) and human strains with 100% bootstrap values shows its zoonotic transfer potentiality. SNP-based analysis indicated similar clustering to that of core genome phylogeny. Among the Indian strains, the highest number of unique SNPs (112 SNPs) were shared by a node that involved three strains from Tamil Nadu. The node SNPs involved several peptidase genes like U32, M16 inactive domain protein, clp protease family protein, and M23 family protein and mostly represented non-synonymous (NS) substitutions. Vaccination has been followed in several parts of the world to prevent small ruminant brucellosis but not in India. Comparison of Indian strains with vaccine strains showed that M5 is genetically closer to most of the Indian strains than Rev.1 strain. The presence of most of the virulence genes among all Indian strains and conserved core genome compositions suggest the use of any circulating strain/genotypes for the development of a vaccine candidate for small ruminant brucellosis in India.

RevDate: 2021-09-08

Agarwal G, Choudhary D, Stice SP, et al (2021)

Pan-Genome-Wide Analysis of Pantoea ananatis Identified Genes Linked to Pathogenicity in Onion.

Frontiers in microbiology, 12:684756.

Pantoea ananatis, a Gram-negative, facultatively anaerobic bacterium, is a member of a Pantoea spp. complex that causes center rot of onion, which significantly affects onion yield and quality. This pathogen does not have typical virulence factors like type II or type III secretion systems but appears to require a biosynthetic gene-cluster, HiVir/PASVIL (located chromosomally and comprising 14 genes), for a phosphonate secondary metabolite, and the 'alt' gene cluster (located on a plasmid and comprising 11 genes) that aids in bacterial colonization in onion bulbs by imparting tolerance to thiosulfinates. We conducted a deep pan-genome-wide association study (pan-GWAS) to predict additional genes associated with pathogenicity in P. ananatis using a panel of diverse strains (n = 81). We utilized a red-onion scale necrosis assay as an indicator of pathogenicity. Based on this assay, we differentiated pathogenic (n = 51) from non-pathogenic (n = 30) strains phenotypically. Pan-genome analysis revealed a large core genome of 3,153 genes and a flexible accessory genome. Pan-GWAS using the presence and absence variants (PAVs) predicted 42 genes, including 14 from the previously identified HiVir/PASVIL cluster associated with pathogenicity, and 28 novel genes that were not previously associated with pathogenicity in onion. Of the 28 novel genes identified, eight have annotated functions of site-specific tyrosine kinase, N-acetylmuramoyl-L-alanine amidase, conjugal transfer, and HTH-type transcriptional regulator. The remaining 20 genes are currently hypothetical. Further, a core-genome SNPs-based phylogeny and horizontal gene transfer (HGT) studies were also conducted to assess the extent of lateral gene transfer among diverse P. ananatis strains. Phylogenetic analysis based on PAVs and whole genome multi locus sequence typing (wgMLST), rather than core-genome SNPs, distinguished red-scale necrosis inducing (pathogenic) strains from non-scale necrosis inducing (non-pathogenic) strains of P. ananatis. A total of 1182 HGT events including the HiVir/PASVIL and alt cluster genes were identified. These events could be regarded as a major contributing factor to the diversification, niche-adaptation and potential acquisition of pathogenicity/virulence genes in P. ananatis.
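
As an illustration of the kind of test that underlies a presence/absence pan-GWAS, the sketch below applies a two-sided Fisher's exact test to a single gene. It is a generic example, not the method used in the study (the abstract does not name the software or test). The group sizes 51 and 30 come from the abstract, but the per-gene counts are hypothetical, as is the function name `fisher_exact_two_sided`.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: pathogenic strains carrying the gene
    b: pathogenic strains lacking the gene
    c: non-pathogenic strains carrying the gene
    d: non-pathogenic strains lacking the gene
    """
    row1, row2 = a + b, c + d          # group sizes
    col1 = a + c                       # strains carrying the gene
    total = comb(row1 + row2, col1)    # number of equally likely tables

    def prob(x):
        # Hypergeometric probability of x carriers in the first group.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as observed.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical gene: present in 45 of 51 pathogenic strains
# and in 2 of 30 non-pathogenic strains.
p_value = fisher_exact_two_sided(45, 6, 2, 28)
print(f"p = {p_value:.3g}")
```

In a real pan-GWAS this test would be repeated for every accessory gene, so the resulting p-values would need a multiple-testing correction, and population structure would have to be accounted for separately.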

RevDate: 2021-09-14

Letcher B, Hunt M, Z Iqbal (2021)

Gramtools enables multiscale variation analysis with genome graphs.

Genome biology, 22(1):259.

Genome graphs allow very general representations of genetic variation; depending on the model and implementation, variation at different length-scales (single nucleotide polymorphisms (SNPs), structural variants) and on different sequence backgrounds can be incorporated with different levels of transparency. We implement a model which handles this multiscale variation and develop a JSON extension of VCF (jVCF) allowing for variant calls on multiple references, both implemented in our software gramtools. We find gramtools outperforms existing methods for genotyping SNPs overlapping large deletions in M. tuberculosis and is able to genotype on multiple alternate backgrounds in P. falciparum, revealing previously hidden recombination.

RevDate: 2021-09-06

Gupta PK (2021)

GWAS for genetics of complex quantitative traits: Genome to pangenome and SNPs to SVs and k-mers.

BioEssays : news and reviews in molecular, cellular and developmental biology [Epub ahead of print].

The development of improved methods for genome-wide association studies (GWAS) for genetics of quantitative traits has been an active area of research during the last 25 years. This activity initially started with the use of mixed linear model (MLM), which was variously modified. During the last decade, however, with the availability of high throughput next generation sequencing (NGS) technology, development and use of pangenomes and novel markers including structural variations (SVs) and k-mers for GWAS has taken over as a new thrust area of research. Pangenomes and SVs are now available in humans, livestock, and a number of plant species, so that these resources along with k-mers are being used in GWAS for exploring additional genetic variation that was hitherto not available for analysis. These developments have resulted in significant improvement in GWAS methodology for detection of marker-trait associations (MTAs) that are relevant to human healthcare and crop improvement.

RevDate: 2021-09-07

Mann A, Malik S, Rana JS, et al (2021)

Whole genome sequencing data of Klebsiella aerogenes isolated from agricultural soil of Haryana, India.

Data in brief, 38:107311.

Klebsiella aerogenes is a Gram-negative bacterium that was previously known as Enterobacter aerogenes. It is present in all environments such as water, soil, air and hospitals, and is an opportunistic pathogen that causes several types of infections. Compared with other clinically important pathogens included in the ESKAPE category (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species), the pangenome and population structure of Klebsiella aerogenes is still poorly understood. For the present study, the bacterial sample was isolated from agricultural soils of Haryana, India. With the aim of identifying multi-drug resistance genes in the agricultural field soil isolate, whole genome sequencing (WGS) of the bacterium was performed, and the antibiotic resistance genes, the genes responsible for other major functions of the cell, and the single nucleotide polymorphisms (SNPs) and insertions and deletions (InDels) were identified. The data presented in this manuscript can be reused by researchers as a reference for determining the antibiotic resistance genes that could be present in different bacterial isolates, and would also help in determining the functions of various other genes present in other genomes of Klebsiella species.

RevDate: 2021-09-07

Rai A, Jagadeeshwari U, Deepshikha G, et al (2021)

Phylotaxogenomics for the Reappraisal of the Genus Roseomonas With the Creation of Six New Genera.

Frontiers in microbiology, 12:677842.

The genus Roseomonas is a significant group of bacteria which is invariably of great clinical and ecological importance. Previous studies have shown that the genus Roseomonas is polyphyletic in nature. Our present study focused on generating a lucid understanding of the phylogenetic framework for the re-evaluation and reclassification of the genus Roseomonas. Phylogenetic studies based on the 16S rRNA gene and 92 concatenated genes suggested that the genus is heterogeneous, forming seven major groups. Existing Roseomonas species were subjected to an array of genomic, phenotypic, and chemotaxonomic analyses in order to resolve the heterogeneity. Genomic similarity indices (dDDH and ANI) indicated that the members were well-defined at the species level. The Percentage of Conserved Proteins (POCP) and the average Amino Acid Identity (AAI) values between the groups of the genus Roseomonas and other interspersing members of the family Acetobacteraceae were below 65 and 70%, respectively. The pan-genome evaluation depicted that the pan-genome was an open type and the members shared 958 core genes. This claim of reclassification was equally supported by the phenotypic and chemotaxonomic differences between the groups. Thus, in this study, we propose to re-evaluate and reclassify the genus Roseomonas and propose six novel genera as Pararoseomonas gen. nov., Falsiroseomonas gen. nov., Paeniroseomonas gen. nov., Plastoroseomonas gen. nov., Neoroseomonas gen. nov., and Pseudoroseomonas gen. nov.

RevDate: 2021-09-04

Vandamme P, Peeters C, Seth-Smith HMB, et al (2021)

Gulosibacter hominis sp. nov.: a novel human microbiome bacterium that may cause opportunistic infections.

Antonie van Leeuwenhoek [Epub ahead of print].

We present genomic, phylogenomic, and phenotypic taxonomic data to demonstrate that three human ear isolates represent a novel species within the genus Gulosibacter. These isolates could not be identified reliably using MALDI-TOF mass spectrometry during routine diagnostic work, but partial 16S rRNA gene sequence analysis revealed that they belonged to the genus Gulosibacter. Overall genomic relatedness indices between the draft genome sequences of the three isolates and of the type strains of established Gulosibacter species confirmed that the three isolates represented a single novel Gulosibacter species. A biochemical characterisation yielded differential tests between the novel and established Gulosibacter species, which could also be differentiated using MALDI-TOF mass spectrometry. We propose to formally classify these three isolates into Gulosibacter hominis sp. nov., with 401352-2018 T (= LMG 31778 T, CCUG 74795 T) as the type strain. The whole-genome sequence of strain 401352-2018 T has a size of 2,340,181 bp and a G+C content of 62.04 mol%. A Gulosibacter pangenome analysis revealed 467 gene clusters that were exclusively present in G. hominis genomes. While these G. hominis specific gene clusters were enriched in several COG functional categories, this analysis did not reveal functions that suggested a role in the human microbiome, nor did it explain the occurrence of G. hominis in ear infections. The absence of acquired antimicrobial resistance determinants and virulence factors in the G. hominis genomes, and an analysis of publicly available 16S rRNA gene sequences and 16S rRNA amplicon sequencing data sets suggested that G. hominis is a member of the human skin microbiota that may occasionally be involved in opportunistic infections.

RevDate: 2021-09-23

Li Q, Tian S, Yan B, et al (2021)

Building a Chinese pan-genome of 486 individuals.

Communications biology, 4(1):1016.

Pan-genome sequence analysis of human population ancestry is critical for expanding and better defining human genome sequence diversity. However, the amount of genetic variation still missing from current human reference sequences is still unknown. Here, we used 486 deep-sequenced Han Chinese genomes to identify 276 Mbp of DNA sequences that, to our knowledge, are absent in the current human reference. We classified these sequences into individual-specific and common sequences, and propose that the common sequence size is uncapped with a growing population. The 46.646 Mbp common sequences obtained from the 486 individuals improved the accuracy of variant calling and mapping rate when added to the reference genome. We also analyzed the genomic positions of these common sequences and found that they came from genomic regions characterized by high mutation rate and low pathogenicity. Our study authenticates the Chinese pan-genome as representative of DNA sequences specific to the Han Chinese population missing from the GRCh38 reference genome and establishes the newly defined common sequences as candidates to supplement the current human reference.

RevDate: 2021-09-24

Peters S, Pascoe B, Wu Z, et al (2021)

Campylobacter jejuni genotypes are associated with post-infection irritable bowel syndrome in humans.

Communications biology, 4(1):1015.

Campylobacter enterocolitis may lead to post-infection irritable bowel syndrome (PI-IBS) and while some C. jejuni strains are more likely than others to cause human disease, genomic and virulence characteristics promoting PI-IBS development remain uncharacterized. We combined pangenome-wide association studies and phenotypic assays to compare C. jejuni isolates from patients who developed PI-IBS with those who did not. We show that variation in bacterial stress response (Cj0145_phoX), adhesion protein (Cj0628_CapA), and core biosynthetic pathway genes (biotin: Cj0308_bioD; purine: Cj0514_purQ; isoprenoid: Cj0894c_ispH) were associated with PI-IBS development. In vitro assays demonstrated greater adhesion, invasion, IL-8 and TNFα secretion on colonocytes with PI-IBS compared to PI-no-IBS strains. A risk-score for PI-IBS development was generated using 22 genomic markers, four of which were from Cj1631c, a putative heme oxidase gene linked to virulence. Our finding that specific Campylobacter genotypes confer greater in vitro virulence and increased risk of PI-IBS has potential to improve understanding of the complex host-pathogen interactions underlying this condition.

RevDate: 2021-09-03

Porcellato D, Smistad M, Skeie SB, et al (2021)

Whole genome sequencing reveals possible host species adaptation of Streptococcus dysgalactiae.

Scientific reports, 11(1):17350.

Streptococcus dysgalactiae (SD) is an emerging pathogen in human and veterinary medicine, and is associated with several host species, disease phenotypes and virulence mechanisms. SD has traditionally been divided into the subspecies dysgalactiae (SDSD) and subsp. equisimilis (SDSE), but recent molecular studies have indicated that the phylogenetic relationships are more complex. Moreover, the genetic basis for the niche versatility of SD has not been extensively investigated. To expand the knowledge about virulence factors, phylogenetic relationships and host-adaptation strategies of SD, we analyzed 78 SDSD genomes from cows and sheep, and 78 SDSE genomes from other host species. Sixty SDSD and 40 SDSE genomes were newly sequenced in this study. Phylogenetic analysis supported SDSD as a distinct taxonomic entity, presenting a mean value of the average nucleotide identity of 99%. Bovine and ovine associated SDSD isolates clustered separately on pangenome analysis, but no single gene or genetic region was uniquely associated with host species. In contrast, SDSE isolates were more heterogenous and could be delineated in accordance with host. Although phylogenetic clustering suggestive of cross species transmission was observed, we predominantly detected a host restricted distribution of the SD-lineages. Furthermore, lineage specific virulence factors were detected, several of them located in proximity to hotspots for integration of mobile genetic elements. Our study indicates that SD has evolved to adapt to several different host species and infers a potential role of horizontal genetic transfer in niche specialization.

RevDate: 2021-08-31

Bachert BA, Richardson JB, Mlynek KD, et al (2021)

Development, Phenotypic Characterization and Genomic Analysis of a Francisella tularensis Panel for Tularemia Vaccine Testing.

Frontiers in microbiology, 12:725776.

Francisella tularensis is one of several biothreat agents for which a licensed vaccine is needed. To aid in the development of a vaccine protective against pneumonic tularemia, we generated and characterized a panel of F. tularensis isolates that can be used as challenge strains to assess vaccine efficacy. Our panel consists of both historical and contemporary isolates derived from clinical and environmental sources, including human, tick, and rabbit isolates. Whole genome sequencing was performed to assess the genetic diversity in comparison to the reference genome F. tularensis Schu S4. Average nucleotide identity analysis showed >99% genomic similarity across the strains in our panel, and pan-genome analysis revealed a core genome of 1,707 genes, and an accessory genome of 233 genes. Three of the strains in our panel, FRAN254 (tick-derived), FRAN255 (a type B strain), and FRAN256 (a human isolate) exhibited variation from the other strains. Moreover, we identified several unique mutations within the Francisella Pathogenicity Island across multiple strains in our panel, revealing unexpected diversity in this region. Notably, FRAN031 (Scherm) completely lacked the second pathogenicity island but retained virulence in mice. In contrast, FRAN037 (Coll) was attenuated in a murine pneumonic tularemia model and had mutations in pdpB and iglA which likely led to attenuation. All of the strains, except FRAN037, retained full virulence, indicating their effectiveness as challenge strains for future vaccine testing. Overall, we provide a well-characterized panel of virulent F. tularensis strains that can be utilized in ongoing efforts to develop an effective vaccine against pneumonic tularemia to ensure protection is achieved across a range of F. tularensis strains.

RevDate: 2021-08-31

Outten J, A Warren (2021)

Methods and Developments in Graphical Pangenomics.

Journal of the Indian Institute of Science [Epub ahead of print].

Pangenomes are organized collections of the genomic information from related individuals or groups. Graphical pangenomics is the study of these pangenomes using graphical methods to identify and analyze genes, regions, and mutations of interest to an array of biological questions. This field has seen significant progress in recent years including the development of graph based models that better resolve biological phenomena, and an explosion of new tools for mapping reads, creating graphical genomes, and performing pangenome analysis. In this review, we discuss recent developments in models, algorithms associated with graphical genomes, and comparisons between similar tools. In addition we briefly discuss what these developments may mean for the future of genomics.

RevDate: 2021-08-31

Mashima I, Liao YC, Lin CH, et al (2021)

Comparative Pan-Genome Analysis of Oral Veillonella Species.

Microorganisms, 9(8):.

The genus Veillonella is a common and abundant member of the oral microbiome. It includes eight species, V. atypica, V. denticariosi, V. dispar, V. infantium, V. nakazawae, V. parvula, V. rogosae and V. tobetusensis. They possess important metabolic pathways that utilize lactate as an energy source. However, the overall metabolome of these species has not been studied. To further understand the metabolic framework of Veillonella in the human oral microbiome, we conducted a comparative pan-genome analysis of the eight species of oral Veillonella. Analysis of the oral Veillonella pan-genome revealed features based on KEGG pathway information to adapt to the oral environment. We found that the fructose metabolic pathway was conserved in all oral Veillonella species, and oral Veillonella have conserved pathways that utilize carbohydrates other than lactate as an energy source. This discovery may help to better understand the metabolic network among oral microbiomes and will provide guidance for the design of future in silico and in vitro studies.

RevDate: 2021-08-31

Agarwal G, Gitaitis RD, B Dutta (2021)

Pan-Genome of Novel Pantoea stewartii subsp. indologenes Reveals Genes Involved in Onion Pathogenicity and Evidence of Lateral Gene Transfer.

Microorganisms, 9(8):.

Pantoea stewartii subsp. indologenes (Psi) is a causative agent of leafspot on foxtail millet and pearl millet; however, novel strains were recently identified that are pathogenic on onions. Our recent host range evaluation study identified two pathovars; P. stewartii subsp. indologenes pv. cepacicola pv. nov. and P. stewartii subsp. indologenes pv. setariae pv. nov. that are pathogenic on onions and millets or on millets only, respectively. In the current study, we developed a pan-genome using the whole genome sequencing of newly identified/classified Psi strains from both pathovars [pv. cepacicola (n = 4) and pv. setariae (n = 13)]. The full spectrum of the pan-genome contained 7030 genes. Among these, 3546 (present in genomes of all 17 strains) were the core genes that were a subset of 3682 soft-core genes (present in ≥16 strains). The accessory genome included 1308 shell genes and 2040 cloud genes (present in ≤2 strains). The pan-genome showed a clear linear progression with >6000 genes, suggesting that the pan-genome of Psi is open. Comparative phylogenetic analysis showed differences in phylogenetic clustering of Pantoea spp. using PAVs/wgMLST approach in comparison with core genome SNPs-based phylogeny. Further, we conducted a horizontal gene transfer (HGT) study using Psi strains from both pathovars along with strains from other Pantoea species, namely, P. stewartii subsp. stewartii LMG 2715T, P. ananatis LMG 2665T, P. agglomerans LMG L15, and P. allii LMG 24248T. A total of 317 HGT events among four Pantoea species were identified with most gene transfer events occurring between Psi pv. cepacicola and Psi pv. setariae. Pan-GWAS analysis predicted a total of 154 genes, including seven gene-clusters, which were associated with the pathogenicity phenotype (necrosis on seedling) on onions. One of the gene-clusters contained 11 genes with known functions and was found to be chromosomally located.
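
The gene categories in the abstract above can be derived from a gene presence/absence table by counting how many strains carry each gene. The following is a minimal sketch under stated assumptions: the function name `classify_genes` and the toy data are hypothetical, and the thresholds simply mirror the definitions quoted in the abstract (core = all strains, soft-core = at least 16 of 17, cloud = at most 2, shell = everything in between). It is not the authors' pipeline.

```python
def classify_genes(presence, n_strains, soft_core_min, cloud_max):
    """Count genes per pan-genome category.

    presence: dict mapping gene name -> set of strains carrying the gene.
    Core genes are counted as a subset of soft-core genes, so the
    pan-genome size is soft_core + shell + cloud.
    """
    counts = {"core": 0, "soft_core": 0, "shell": 0, "cloud": 0}
    for strains in presence.values():
        k = len(strains)
        if k == n_strains:
            counts["core"] += 1
        if k >= soft_core_min:
            counts["soft_core"] += 1
        elif k <= cloud_max:
            counts["cloud"] += 1
        else:
            counts["shell"] += 1
    return counts


# Toy example with 5 hypothetical strains (s1..s5) and 5 hypothetical genes.
toy = {
    "geneA": {"s1", "s2", "s3", "s4", "s5"},  # core (and soft-core)
    "geneB": {"s1", "s2", "s3", "s4"},        # soft-core only
    "geneC": {"s1", "s2", "s3"},              # shell
    "geneD": {"s1"},                          # cloud
    "geneE": {"s2", "s5"},                    # cloud
}
counts = classify_genes(toy, n_strains=5, soft_core_min=4, cloud_max=2)
print(counts)

# Consistency check on the figures reported in the abstract:
# soft-core + shell + cloud should equal the pan-genome size.
assert 3682 + 1308 + 2040 == 7030
```

The final assertion confirms that the reported soft-core, shell, and cloud counts add up to the 7030 genes given for the full pan-genome.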

RevDate: 2021-08-31

Lee JY, Lee DH, DH Kim (2021)

Characterization of Martelella soudanensis sp. nov., Isolated from a Mine Sediment.

Microorganisms, 9(8):.

Gram-stain-negative, strictly aerobic, non-spore-forming, non-motile, and rod-shaped bacterial strains, designated NC18T and NC20, were isolated from the sediment near-vertical borehole effluent originating 714 m below the subsurface located in the Soudan Iron Mine in Minnesota, USA. The 16S rRNA gene sequence showed that strains NC18T and NC20 grouped with members of the genus Martelella, including M. mediterranea DSM 17316T and M. limonii YC7034T. The genome sizes and G + C content of both NC18T and NC20 were 6.1 Mb and 61.8 mol%, respectively. Average nucleotide identity (ANI), average amino acid identity (AAI), and digital DNA-DNA hybridization (dDDH) values were below the species delineation threshold. Pan-genomic analysis showed that NC18T, NC20, M. mediterranea DSM 17316T, M. endophytica YC6887T, and M. lutilitoris GH2-6T had 8470 pan-genome orthologous groups (POGs) in total. The five Martelella strains shared 2258 core POGs, which were mainly associated with amino acid transport and metabolism, general function prediction only, carbohydrate transport and metabolism, translation, ribosomal structure and biogenesis, and transcription. The two novel strains had major fatty acids (>5%) including summed feature 8 (C18:1 ω7c and/or C18:1 ω6c), C19:0 cyclo ω8c, C16:0, C18:1 ω7c 11-methyl, C18:0, and summed feature 2 (C12:0 aldehyde and/or iso-C16:1 I and/or C14:0 3-OH). The sole respiratory quinone was ubiquinone-10 (Q-10). On the basis of polyphasic taxonomic analyses, strains NC18T and NC20 represent a novel species of the genus Martelella, for which the name Martelella soudanensis sp. nov. is proposed. The type strain is NC18T (=KTCT 82174T = NBRC 114661T).

RevDate: 2021-08-31

Castillo D, Donati VL, Jørgensen J, et al (2021)

Comparative Genomic Analyses of Flavobacterium psychrophilum Isolates Reveals New Putative Genetic Determinants of Virulence Traits.

Microorganisms, 9(8):.

The fish pathogen Flavobacterium psychrophilum is currently one of the main pathogenic bacteria hampering the productivity of salmonid farming worldwide. Although putative virulence determinants have been identified, the genetic basis for variation in virulence of F. psychrophilum is not fully understood. In this study, we analyzed whole-genome sequences of a collection of 25 F. psychrophilum isolates from Baltic Sea countries and compared genomic information with a previous determination of their virulence in juvenile rainbow trout. The results revealed a conserved population of F. psychrophilum that were consistently present across the Baltic Sea countries, with no clear association between genomic repertoire, phylogenomic, or gene distribution and virulence traits. However, analysis of the entire genome of four F. psychrophilum isolates by hybrid assembly provided an unprecedented resolution for discriminating even highly related isolates. The results showed that isolates with different virulence phenotypes harbored genetic variances on a number of consecutive leucine-rich repeat (LRR) proteins, repetitive motifs in gliding motility-associated protein, and the insertion of transposable elements into intergenic and genic regions. Thus, these findings provide novel insights into the genetic variation of these elements and their putative role in the modulation of F. psychrophilum virulence.

RevDate: 2021-08-30

Lin N, Tao Y, Gao P, et al (2021)

Comparative Genomics Revealing Insights into Niche Separation of the Genus Methylophilus.

Microorganisms, 9(8):.

The genus Methylophilus, which uses methanol as a carbon and energy source, is widely distributed in terrestrial, freshwater and marine ecosystems. Here, three strains (13, 14 and QUAN) related to the genus Methylophilus were newly isolated from Lake Fuxian sediments. The draft genomes of strains 13, 14 and QUAN were 3.11 Mb, 3.02 Mb and 3.15 Mb, with G+C contents of 51.13, 50.48 and 50.33%, respectively. ANI values between strains 13 and 14, 13 and QUAN, and 14 and QUAN were 81.09, 81.06 and 91.46%, respectively. The pan-genome and core-genome included 3994 and 1559 genes across 18 Methylophilus genomes, respectively. Phylogenetic analysis based on 1035 single-copy genes and 16S rRNA genes revealed two clades, one containing strains isolated from aquatic habitats and the other strains from the leaf surface. Twenty-three aquatic-specific genes, such as 2OG/Fe(II) oxygenase and diguanylate cyclase, reflected the strategy to survive in oxygen-limited water and sediment. Similarly, 159 genes specific to leaf association were identified. Besides niche separation, Methylophilus could utilize the combination of ANRA and DNRA to convert nitrate to ammonia and reduce sulfate to sulfur according to the complete sulfur metabolic pathway. Genes encoding the cytochrome c protein and riboflavin were detected in Methylophilus genomes, which directly or indirectly participate in electron transfer.

RevDate: 2021-09-15

Xu S, Li Z, Huang Y, et al (2021)

Whole genome sequencing reveals the genomic diversity, taxonomic classification, and evolutionary relationships of the genus Nocardia.

PLoS neglected tropical diseases, 15(8):e0009665.

Nocardia is a complex and diverse genus of aerobic actinomycetes that cause complex clinical presentations, which are difficult to diagnose due to being misunderstood. To date, the genetic diversity, evolution, and taxonomic structure of the genus Nocardia are still unclear. In this study, we investigated the pan-genome of 86 Nocardia type strains to clarify their genetic diversity. Our study revealed an open pan-genome for Nocardia containing 265,836 gene families, with about 99.7% of the pan-genome being variable. Horizontal gene transfer appears to have been an important evolutionary driver of genetic diversity shaping the Nocardia genome and may have caused historical taxonomic confusion from other taxa (primarily Rhodococcus, Skermania, Aldersonia, and Mycobacterium). Based on single-copy gene families, we established a high-accuracy phylogenomic approach for Nocardia using 229 genome sequences. Furthermore, we found 28 potentially new species and reclassified 16 strains. Finally, by comparing the topology between a phylogenomic tree and 384 phylogenetic trees (from 384 single-copy genes from the core genome), we identified a novel locus for inferring the phylogeny of this genus. The dapb1 gene, which encodes dipeptidyl aminopeptidase BI, was far superior to commonly used markers for Nocardia and yielded a topology almost identical to that of genome-based phylogeny. In conclusion, the present study provides insights into the genetic diversity, contributes a robust framework for the taxonomic classification, and elucidates the evolutionary relationships of Nocardia. This framework should facilitate the development of rapid tests for the species identification of highly variable species and has given new insight into the behavior of this genus.

RevDate: 2021-08-27

Shapiro JW, C Putonti (2021)

Rephine.r: a pipeline for correcting gene calls and clusters to improve phage pangenomes and phylogenies.

PeerJ, 9:e11950.

Background: A pangenome is the collection of all genes found in a set of related genomes. For microbes, these genomes are often different strains of the same species, and the pangenome offers a means to compare gene content variation with differences in phenotypes, ecology, and phylogenetic relatedness. Though most frequently applied to bacteria, there is growing interest in adapting pangenome analysis to bacteriophages. However, working with phage genomes presents new challenges. First, most phage families are under-sampled, and homologous genes in related viruses can be difficult to identify. Second, homing endonucleases and intron-like sequences may be present, resulting in fragmented gene calls. Each of these issues can reduce the accuracy of standard pangenome analysis tools.

Methods: We developed an R pipeline called Rephine.r that takes as input the gene clusters produced by an initial pangenomics workflow. Rephine.r then proceeds in two primary steps. First, it identifies three common causes of fragmented gene calls: (1) indels creating early stop codons and new start codons; (2) interruption by a selfish genetic element; and (3) splitting at the ends of the reported genome. Fragmented genes are then fused to create new sequence alignments. In tandem, Rephine.r searches for distant homologs separated into different gene families using Hidden Markov Models. Significant hits are used to merge families into larger clusters. A final round of fragment identification is then run, and results may be used to infer single-copy core genomes and phylogenetic trees.

Results: We applied Rephine.r to three well-studied phage groups: the Tevenvirinae (e.g., T4), the Studiervirinae (e.g., T7), and the Pbunaviruses (e.g., PB1). In each case, Rephine.r recovered additional members of the single-copy core genome and increased the overall bootstrap support of the phylogeny. The Rephine.r pipeline is provided through GitHub as a single script for automated analysis and with utility functions to assist in building single-copy core genomes and predicting the sources of fragmented genes.
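
The family-merging step described in the Methods paragraph above (significant homology hits are used to merge gene families into larger clusters) can be pictured as computing connected components over the hits. The sketch below uses a union-find structure in Python for that purpose. It is an illustration only and not the Rephine.r implementation, which is an R pipeline; the function name `merge_families`, the family labels, and the hit list are hypothetical, and the hits are assumed to have already passed a significance filter upstream.

```python
def merge_families(families, hits):
    """Merge gene families linked by significant homology hits.

    families: iterable of family identifiers.
    hits: iterable of (family_a, family_b) pairs judged significant.
    Returns a sorted list of merged clusters, each a sorted list of ids.
    """
    parent = {f: f for f in families}

    def find(x):
        # Follow parent links to the cluster root, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two clusters

    clusters = {}
    for f in parent:
        clusters.setdefault(find(f), []).append(f)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical input: four families, two significant hits.
families = ["fam1", "fam2", "fam3", "fam4"]
hits = [("fam1", "fam2"), ("fam2", "fam3")]
print(merge_families(families, hits))
```

Because merging is transitive, fam1 and fam3 end up in the same cluster even though no hit links them directly, which is the behaviour one would want when distant homologs are connected through an intermediate family.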

RevDate: 2021-09-10

Clermont O, Condamine B, Dion S, et al (2021)

The E phylogroup of Escherichia coli is highly diverse and mimics the whole E. coli species population structure.

Environmental microbiology [Epub ahead of print].

To get a global picture of the population structure of the Escherichia coli phylogroup E, encompassing the O157:H7 EHEC lineage, we analysed the whole genome of 144 strains isolated from various continents, hosts and lifestyles and representative of the phylogroup diversity. The strains possess 4331 to 5440 genes with a core genome of 2771 genes and a pangenome of 33 722 genes. The distribution of these genes among the strains shows an asymmetric U-shaped distribution. E phylogenetic strains have the largest genomes of the species, partly explained by the presence of mobile genetic elements. Sixty-eight lineages were delineated, some of them exhibiting extra-intestinal virulence genes and being virulent in the mouse sepsis model. Except for the EHEC lineages and the reference EPEC, EIEC and ETEC strains, very few strains possess intestinal virulence genes. Most of the strains were devoid of acquired resistance genes, but eight strains possessed extended-spectrum beta-lactamase genes. Human strains belong to specific lineages, some of them being virulent and antibiotic-resistant [sequence type complexes (STcs) 350 and 2064]. The E phylogroup mimics all the features of the species as a whole, a phenomenon already observed at the STc level, arguing for a fractal population structure of E. coli.

RevDate: 2021-08-24

Lee AHY, Porto WF, de Faria C, et al (2021)

Genomic insights into the diversity, virulence and resistance of Klebsiella pneumoniae extensively drug resistant clinical isolates.

Microbial genomics, 7(8):.

Klebsiella pneumoniae has been implicated in wide-ranging nosocomial outbreaks, causing severe infections without effective treatments due to antibiotic resistance. Here, we performed genome sequencing of 70 extensively drug resistant clinical isolates, collected from Brasília's hospitals (Brazil) between 2010 and 2014. The majority of strains (60 out of 70) belonged to a single clonal complex (CC), CC258, which has become distributed worldwide in the last two decades. Of these CC258 strains, 44 strains were classified as sequence type 11 (ST11) and fell into two distinct clades, but no ST258 strains were found. These 70 strains had a pan-genome size of 10 366 genes, with a core-genome size of ~4476 genes found in 95 % of isolates. Analysis of sequences revealed diverse mechanisms of resistance, including production of multidrug efflux pumps, enzymes with the same target function but with reduced or no affinity to the drug, and proteins that protected the drug target or inactivated the drug. β-Lactamase production provided the most notable mechanism associated with K. pneumoniae. Each strain presented two or three different β-lactamase enzymes, including class A (SHV, CTX-M and KPC), class B and class C AmpC enzymes, although no class D β-lactamase was identified. Strains carrying the NDM enzyme involved three different ST types, suggesting that there was no common genetic origin.

RevDate: 2021-09-01
CmpDate: 2021-09-01

Woodhouse MR, Cannon EK, Portwood JL, et al (2021)

A pan-genomic approach to genome databases using maize as a model system.

BMC plant biology, 21(1):385.

Research in the past decade has demonstrated that a single reference genome is not representative of a species' diversity. MaizeGDB introduces a pan-genomic approach to hosting genomic data, leveraging the large number of diverse maize genomes and their associated datasets to quickly and efficiently connect genomes, gene models, expression, epigenome, sequence variation, structural variation, transposable elements, and diversity data across genomes so that researchers can easily track the structural and functional differences of a locus and its orthologs across maize. We believe our framework is unique and provides a template for any genomic database poised to host large-scale pan-genomic data.

RevDate: 2021-08-21

Hudec C, Biessy A, Novinscak A, et al (2021)

Comparative Genomics of Potato Common Scab-Causing Streptomyces spp. Displaying Varying Virulence.

Frontiers in microbiology, 12:716522.

Common scab of potato causes important economic losses worldwide following the development of necrotic lesions on tubers. In this study, the genomes of 14 prevalent scab-causing Streptomyces spp. isolated from Prince Edward Island, one of the most important Canadian potato production areas, were sequenced and annotated. Their phylogenomic affiliation was determined, their pan-genome was characterized, and pathogenic determinants involved in their virulence, ranging from weak to aggressive, were compared. 13 out of 14 strains clustered with Streptomyces scabiei, while the last strain clustered with Streptomyces acidiscabies. The toxicogenic and colonization genomic regions were compared, and while some atypical gene organizations were observed, no clear correlation with virulence was observed. The production of the phytotoxin thaxtomin A was also quantified and again, contrary to previous reports in the literature, no clear correlation was found between the amount of thaxtomin A secreted, and the virulence observed. Although no significant differences were observed when comparing the presence/absence of the main virulence factors among the strains of S. scabiei, a distinct profile was observed for S. acidiscabies. Several mutations predicted to affect the functionality of some virulence factors were identified, including one in the bldA gene that correlates with the absence of thaxtomin A production despite the presence of the corresponding biosynthetic gene cluster in S. scabiei LBUM 1485. These novel findings obtained using a large number of scab-causing Streptomyces strains are challenging some assumptions made so far on Streptomyces' virulence and suggest that other factors, yet to be characterized, are also key contributors.

RevDate: 2021-08-22

Vaid RK, Thakur Z, Anand T, et al (2021)

Comparative genome analysis of Salmonella enterica serovar Gallinarum biovars Pullorum and Gallinarum decodes strain specific genes.

PloS one, 16(8):e0255612.

Salmonella enterica serovar Gallinarum biovar Pullorum (bvP) and biovar Gallinarum (bvG) are the etiological agents of pullorum disease (PD) and fowl typhoid (FT) respectively, which cause huge economic losses to the poultry industry, especially in developing countries including India. Vaccination and biosecurity measures are currently being employed to control and reduce S. Gallinarum infections. High endemicity, poor implementation of hygiene and lack of effective vaccines pose challenges in prevention and control of disease in intensively maintained poultry flocks. Comparative genome analysis unravels similarities and dissimilarities, thus facilitating identification of genomic features that aid in pathogenesis, niche adaptation and tracing of evolutionary history. The present investigation was carried out to assess the genotypic differences amongst S. enterica serovar Gallinarum strains including Indian strain S. Gallinarum Sal40 VTCCBAA614. The comparative genome analysis revealed an open pan-genome consisting of 5091 coding sequences (CDS), with 3270 CDS belonging to the core-genome, 1254 CDS to the dispensable genome, and strain-specific genes (singletons) ranging from 3 to 102 amongst the analyzed strains. Moreover, the investigated strains exhibited diversity in genomic features such as virulence factors, genomic islands, prophage regions, toxin-antitoxin cassettes, and acquired antimicrobial resistance genes. The core genome identified in the study can give important leads in the direction of design of rapid and reliable diagnostics, and vaccine design for effective infection control as well as eradication. Additionally, the identified genetic differences among the S. enterica serovar Gallinarum strains could be used for bacterial typing and structure-based inhibitor development in future experimental investigations on the data generated.

RevDate: 2021-08-19

Simonsen AK (2021)

Environmental stress leads to genome streamlining in a widely distributed species of soil bacteria.

The ISME journal [Epub ahead of print].

Bacteria have highly flexible pangenomes, which are thought to facilitate evolutionary responses to environmental change, but the impacts of environmental stress on pangenome evolution remain unclear. Using a landscape pangenomics approach, I demonstrate that environmental stress leads to consistent, continuous reduction in genome content along four environmental stress gradients (acidity, aridity, heat, salinity) in naturally occurring populations of Bradyrhizobium diazoefficiens (widespread soil-dwelling plant mutualists). Using gene-level network and duplication functional traits to predict accessory gene distributions across environments, genes predicted to be superfluous are more likely lost in high stress, while genes with multi-functional roles are more likely retained. Genes with higher probabilities of being lost with stress contain significantly higher proportions of codons under strong purifying and positive selection. Gene loss is widespread across the entire genome, with high gene-retention hotspots in close spatial proximity to core genes, suggesting Bradyrhizobium has evolved to cluster essential-function genes (accessory genes with multifunctional roles and core genes) in discrete genomic regions, which may stabilise viability during genomic decay. In conclusion, pangenome evolution through genome streamlining is an important evolutionary response to environmental change. This raises questions about impacts of genome streamlining on the adaptive capacity of bacterial populations facing rapid environmental change.

RevDate: 2021-09-03
CmpDate: 2021-08-19

Belloso Daza MV, Cortimiglia C, Bassi D, et al (2021)

Genome-based studies indicate that the Enterococcus faecium Clade B strains belong to Enterococcus lactis species and lack of the hospital infection associated markers.

International journal of systematic and evolutionary microbiology, 71(8):.

Enterococcus lactis and the heterotypic synonym Enterococcus xinjiangensis from dairy origin have recently been identified as a novel species based on 16S rRNA gene sequence analysis. Enterococcus faecium type strain NCTC 7171T was used as the reference genome for determining E. lactis and E. faecium to be separate species. However, this taxonomic classification did not consider the diverse lineages of E. faecium, and the double nature of hospital-associated (clade A) and community-associated (clade B) isolates. Here, we investigated the taxonomic relationship among isolates of E. faecium of different origins and E. lactis, using a genome-based approach. Additional to 16S rRNA gene sequence analysis, we estimated the relatedness among strains and species using phylogenomics based on the core pangenome, multilocus sequence typing, the average nucleotide identity and digital DNA-DNA hybridization. Moreover, following the available safety assessment schemes, we evaluated the virulence profile and the ampicillin resistance of E. lactis and E. faecium clade B strains. Our results confirmed the genetic and evolutionary differences between clade A and the intertwined clade B and E. lactis group. We also confirmed the absence in these strains of virulence gene markers IS16, hylEfm and esp and the lack of the PBP5 allelic profile associated with ampicillin resistance. Taken together, our findings support the reassignment of the strains of E. faecium clade B as E. lactis.

RevDate: 2021-09-26

Matteoli FP, Pedrosa-Silva F, Dutra-Silva L, et al (2021)

The global population structure and beta-lactamase repertoire of the opportunistic pathogen Serratia marcescens.

Genomics, 113(6):3523-3532 pii:S0888-7543(21)00316-5 [Epub ahead of print].

Serratia marcescens is a globally spread nosocomial pathogen. This rod-shaped bacterium displays a broad host range and worldwide geographical distribution. Here we analyze an international collection of this multidrug-resistant, opportunistic pathogen from 35 countries to infer its population structure. We show that S. marcescens comprises 12 lineages; Sm1, Sm4, and Sm10 harbor 78.3% of the known environmental strains. Sm5, Sm6, and Sm7 comprise only human-associated strains which harbor smallest pangenomes, genomic fluidity and lowest levels of core recombination, indicating niche specialization. Sm7 and Sm9 lineages exhibit the most concerning resistome; the blaKPC-2 plasmid is widespread in Sm7, whereas Sm9, also an anthropogenic-exclusive lineage, presents the highest plasmid/lineage size ratio and plasmid-diversity encoding metallo-beta-lactamases comprising blaNDM-1. The heterogeneity of resistance patterns of S. marcescens lineages elucidated herein highlights the relevance of surveillance programs, using whole-genome sequencing, to provide insights into the molecular epidemiology of carbapenemase producing strains of this species.

RevDate: 2021-09-10

Orsi WD, Magritsch T, Vargas S, et al (2021)

Genome Evolution in Bacteria Isolated from Million-Year-Old Subseafloor Sediment.

mBio, 12(4):e0115021.

Beneath the seafloor, microbial life subsists in isolation from the surface world under persistent energy limitation. The nature and extent of genomic evolution in subseafloor microbes have been unknown. Here, we show that the genomes of Thalassospira bacterial populations cultured from million-year-old subseafloor sediments evolve in clonal populations by point mutation, with a relatively low rate of homologous recombination and elevated numbers of pseudogenes. Ratios of nonsynonymous to synonymous substitutions correlate with the accumulation of pseudogenes, consistent with a role for genetic drift in the subseafloor strains but not in type strains of Thalassospira isolated from the surface world. Consistent with this, pangenome analysis reveals that the subseafloor bacterial genomes have a significantly lower number of singleton genes than the type strains, indicating a reduction in recent gene acquisitions. Numerous insertion-deletion events and pseudogenes were present in a flagellar operon of the subseafloor bacteria, indicating that motility is nonessential in these million-year-old subseafloor sediments. This genomic evolution in subseafloor clonal populations coincided with a phenotypic difference: all subseafloor isolates have a lower rate of growth under laboratory conditions than the Thalassospira xiamenensis type strain. Our findings demonstrate that the long-term physical isolation of Thalassospira, in the absence of recombination, has resulted in clonal populations whereby reduced access to novel genetic material from neighbors has resulted in the fixation of new mutations that accumulate in genomes over millions of years. IMPORTANCE The nature and extent of genomic evolution in subseafloor microbial populations subsisting for millions of years below the seafloor are unknown. Subseafloor populations have ultralow metabolic rates that are hypothesized to restrict reproduction and, consequently, the spread of new traits. Our findings demonstrate that genomes of cultivated bacterial strains from the genus Thalassospira isolated from million-year-old abyssal sediment exhibit greatly reduced levels of homologous recombination, elevated numbers of pseudogenes, and genome-wide evidence of relaxed purifying selection. These substitutions and pseudogenes are fixed into the population, suggesting that the genome evolution of these bacteria has been dominated by genetic drift. Thus, reduced recombination, stemming from long-term physical isolation, resulted in small clonal populations of Thalassospira that have accumulated mutations in their genomes over millions of years.

RevDate: 2021-09-21
CmpDate: 2021-09-21

Huang RR, Yang SR, Zhen C, et al (2021)

Genomic molecular signatures determined characterization of Mycolicibacterium gossypii sp. nov., a fast-growing mycobacterial species isolated from cotton field soil.

Antonie van Leeuwenhoek, 114(10):1735-1744.

A Gram-positive, acid-fast and rapidly growing rod, designated S2-37T, that could form yellowish colonies was isolated from a soil sample collected from a cotton cropping field located in the Xinjiang region of China. Genomic analyses indicated that strain S2-37T harbored a type VII secretion system (T7SS) and was very likely able to produce mycolic acid, which are typical features of pathogenic mycobacterial species. 16S rRNA-directed phylogenetic analysis indicated that strain S2-37T was closely related to bacterial species belonging to the genus Mycolicibacterium, which was further confirmed by pan-genome phylogenetic analysis. Digital DNA-DNA hybridization and average nucleotide identity analyses showed that strain S2-37T displayed the highest values, 39.1% (35.7-42.6%) and 81.28% respectively, with M. litorale CGMCC 4.5724T. Characterization of conserved molecular signatures further supported the taxonomic position of strain S2-37T as belonging to the genus Mycolicibacterium. The main fatty acids were identified as C16:0, C18:0, C20:3ω3 and C22:6ω3. In addition, the polar lipid profile was mainly composed of diphosphatidylglycerol, phosphatidylethanolamine and phosphatidylinositol. Phylogenetic analyses and distinct fatty acid and antimicrobial resistance profiles indicated that strain S2-37T was genetically and phenotypically distinct from its closest phylogenetic neighbour, M. litorale CGMCC 4.5724T. Here, we propose a novel species of the genus Mycolicibacterium: Mycolicibacterium gossypii sp. nov. with the type strain S2-37T (= JCM 34327T = CGMCC 1.18817T).

RevDate: 2021-08-14

Saco A, Rey-Campos M, Rosani U, et al (2021)

The Evolution and Diversity of Interleukin-17 Highlight an Expansion in Marine Invertebrates and Its Conserved Role in Mucosal Immunity.

Frontiers in immunology, 12:692997.

The interleukin-17 (IL-17) family consists of proinflammatory cytokines conserved during evolution. A comparative genomics approach was applied to examine IL-17 throughout evolution from poriferans to higher vertebrates. Cnidaria was highlighted as the most ancient diverged phylum, and several evolutionary patterns were revealed. Large expansions of the IL-17 repertoire were observed in marine molluscs and echinoderm species. We further studied this expansion in filter-fed Mytilus galloprovincialis, which is a bivalve with a highly effective innate immune system supported by a variable pangenome. We recovered 379 unique IL-17 sequences and 96 receptors from individual genomes that were classified into 23 and 6 isoforms after phylogenetic analyses. Mussel IL-17 isoforms were conserved among individuals and shared between closely related Mytilidae species. Certain isoforms were specifically implicated in the response to a waterborne infection with Vibrio splendidus in mussel gills. The involvement of IL-17 in mucosal immune responses could be conserved in higher vertebrates from these ancestral lineages.

RevDate: 2021-09-01

Zhang X, Liu T, Wang J, et al (2021)

Pan-genome of Raphanus highlights genetic variation and introgression among domesticated, wild, and weedy radishes.

Molecular plant pii:S1674-2052(21)00318-X [Epub ahead of print].

Post-polyploid diploidization associated with descending dysploidy and interspecific introgression drives plant genome evolution by unclear mechanisms. Raphanus is an economically and ecologically important Brassiceae genus and model system for studying post-polyploidization genome evolution and introgression. Here, we report the de novo sequence assemblies for 11 genomes covering most of the typical sub-species and varieties of domesticated, wild and weedy radishes from East Asia, South Asia, Europe, and America. Divergence among the species, sub-species, and South/East Asian types coincided with Quaternary glaciations. A genus-level pan-genome was constructed with family-based, locus-based, and graph-based methods, and whole-genome comparisons revealed genetic variations ranging from single-nucleotide polymorphisms (SNPs) to inversions and translocations of whole ancestral karyotype (AK) blocks. Extensive gene flow occurred between wild, weedy, and domesticated radishes. High frequencies of genome reshuffling, biased retention, and large-fragment translocation have shaped the genomic diversity. Most variety-specific gene-rich blocks showed large structural variations. Extensive translocation and tandem duplication of dispensable genes were revealed in two large rearrangement-rich islands. Disease resistance genes mostly resided on specific and dispensable loci. Variations causing the loss of function of enzymes modulating gibberellin deactivation were identified and could play an important role in phenotype divergence and adaptive evolution. This study provides new insights into the genomic evolution underlying post-polyploid diploidization and lays the foundation for genetic improvement of radish crops, biological control of weeds, and protection of wild species' germplasms.

RevDate: 2021-09-10

Baker JL (2021)

Complete Genomes of Clade G6 Saccharibacteria Suggest a Divergent Ecological Niche and Lifestyle.

mSphere, 6(4):e0053021.

Saccharibacteria (formerly TM7) have reduced genomes and a small cell size and appear to have a parasitic lifestyle dependent on a bacterial host. Although there are at least 6 major clades of Saccharibacteria inhabiting the human oral cavity, complete genomes of oral Saccharibacteria were previously limited to the G1 clade. In this study, nanopore sequencing was used to obtain three complete genome sequences from clade G6. Phylogenetic analysis suggested the presence of at least 3 to 5 distinct species within G6, with two discrete taxa represented by the 3 complete genomes. G6 Saccharibacteria were highly divergent from the more-well-studied clade G1 and had the smallest genomes and lowest GC content of all Saccharibacteria. Pangenome analysis showed that although 97% of shared pan-Saccharibacteria core genes and 89% of G1-specific core genes had putative functions, only 50% of the 244 G6-specific core genes had putative functions, highlighting the novelty of this group. Compared to G1, G6 harbored divergent metabolic pathways. G6 genomes lacked an F1Fo ATPase, the pentose phosphate pathway, and several genes involved in nucleotide metabolism, which were all core genes for G1. G6 genomes were also unique compared to that of G1 in that they encoded d-lactate dehydrogenase, adenylate cyclase, limited glycerolipid metabolism, a homolog to a lipoarabinomannan biosynthesis enzyme, and the means to degrade starch. These differences at key metabolic steps suggest a distinct lifestyle and ecological niche for clade G6, possibly with alternative hosts and/or host dependencies, which would have significant ecological, evolutionary, and likely pathogenic implications. IMPORTANCE Saccharibacteria are ultrasmall parasitic bacteria that are common members of the oral microbiota and have been increasingly linked to disease and inflammation. However, the lifestyle and impact on human health of Saccharibacteria remain poorly understood, especially for the clades with no complete genomes (G2 to G6) or cultured isolates (G2 and G4 to G6). Obtaining complete genomes is of particular importance for Saccharibacteria, because they lack many of the "essential" core genes used for determining draft genome completeness, and few references exist outside clade G1. In this study, complete genomes of 3 G6 strains, representing two candidate species, were obtained and analyzed. The G6 genomes were highly divergent from that of G1 and enigmatic, with 50% of the G6 core genes having no putative functions. The significant difference in encoded functional pathways is suggestive of a distinct lifestyle and ecological niche, probably with alternative hosts and/or host dependencies, which would have major implications in ecology, evolution, and pathogenesis.

RevDate: 2021-09-10

Gómez-Sanz E, Haro-Moreno JM, Jensen SO, et al (2021)

The Resistome and Mobilome of Multidrug-Resistant Staphylococcus sciuri C2865 Unveil a Transferable Trimethoprim Resistance Gene, Designated dfrE, Spread Unnoticed.

mSystems, 6(4):e0051121.

Methicillin-resistant Staphylococcus sciuri (MRSS) strain C2865 from a stranded dog in Nigeria was trimethoprim (TMP) resistant but lacked formerly described staphylococcal TMP-resistant dihydrofolate reductase genes (dfr). Whole-genome sequencing, comparative genomics, and pan-genome analyses were pursued to unveil the molecular bases for TMP resistance via resistome and mobilome profiling. MRSS C2865 comprised a species subcluster and positioned just above the intraspecies boundary. Lack of species host tropism was observed. S. sciuri exhibited an open pan-genome, while MRSS C2865 harbored the highest number of unique genes (75% associated with mobilome). Within this fraction, we discovered a transferable TMP resistance gene, named dfrE, which confers high-level TMP resistance in Staphylococcus aureus and Escherichia coli. dfrE was located in a novel multidrug resistance mosaic plasmid (pUR2865-34) encompassing adaptive, mobilization, and segregational stability traits. dfrE was formerly denoted as dfr_like in Exiguobacterium spp. from fish farm sediment in China but escaped identification in one macrococcal and diverse staphylococcal genomes in different Asian countries. dfrE shares the highest identity with dfr of soil-related Paenibacillus anaericanus (68%). Data analysis discloses that dfrE has emerged from a single ancestor and places S. sciuri as a plausible donor. C2865 unique fraction additionally enclosed novel chromosomal mobile islands, including a multidrug-resistant pseudo-SCCmec cassette, three apparently functional prophages (Siphoviridae), and an SaPI4-related staphylococcal pathogenicity island. Since dfrE seems not yet common in staphylococcal clinical specimens, our data promote early surveillance and enable molecular diagnosis. We evidence the genome plasticity of S. sciuri and highlight its role as a resourceful reservoir for adaptive traits. 
IMPORTANCE The discovery and surveillance of antimicrobial resistance genes (AMRG) and their mobilization platforms are critical to understand the evolution of bacterial resistance and to restrain further expansion. Limited genomic data are available on Staphylococcus sciuri; regardless, it is considered a reservoir for critical AMRG and mobile elements. We uncover a transferable staphylococcal TMP resistance gene, named dfrE, in a novel mosaic plasmid harboring additional resistance, adaptive, and self-stabilization features. dfrE is present but evaded detection in diverse species from varied sources geographically distant. Our analyses evidence that the dfrE-carrying element has emerged from a single ancestor and position S. sciuri as the donor species for dfrE spread. We also identify novel mobilizable chromosomal islands encompassing AMRG and three unrelated prophages. We prove high intraspecies heterogenicity and genome plasticity for S. sciuri. This work highlights the importance of genome-wide ecological studies to facilitate identification, characterization, and evolution routes of bacteria adaptive features.

RevDate: 2021-09-13
CmpDate: 2021-09-13

Hily JM, Poulicard N, Kubina J, et al (2021)

Metagenomic analysis of nepoviruses: diversity, evolution and identification of a genome region in members of subgroup A that appears to be important for host range.

Archives of virology, 166(10):2789-2801.

Data mining and metagenomic analysis of 277 open reading frame sequences of bipartite RNA viruses of the genus Nepovirus, family Secoviridae, were performed, documenting how challenging it can be to unequivocally assign a virus to a particular species, especially those in subgroups A and C, based on some of the currently adopted taxonomic demarcation criteria. This work suggests a possible need for their amendment to accommodate pangenome information. In addition, we revealed a host-dependent structure of arabis mosaic virus (ArMV) populations at a cladistic level and confirmed a phylogeographic structure of grapevine fanleaf virus (GFLV) populations. We also identified new putative recombination events in members of subgroups A, B and C. The evolutionary specificity of some capsid regions of ArMV and GFLV that were described previously and biologically validated as determinants of nematode transmission was circumscribed in silico. Furthermore, a C-terminal segment of the RNA-dependent RNA polymerase of members of subgroup A was predicted to be a putative host range determinant based on statistically supported higher π (substitutions per site) values for GFLV and ArMV isolates infecting Vitis spp. compared with non-Vitis-infecting ArMV isolates. This study illustrates how sequence information obtained via high-throughput sequencing can increase our understanding of mechanisms that modulate virus diversity and evolution and create new opportunities for advancing studies on the biology of economically important plant viruses.

RevDate: 2021-08-09

Iqbal S, Vollmers J, Janjua HA (2021)

Genome Mining and Comparative Genome Analysis Revealed Niche-Specific Genome Expansion in Antibacterial Bacillus pumilus Strain SF-4.

Genes, 12(7):.

The present study reports the isolation of Bacillus pumilus (B. pumilus) SF-4, a strain exhibiting antibacterial activity, from a soil field. The genome of strain SF-4 was sequenced and analyzed to acquire in-depth genomic level insight related to functional diversity, evolutionary history, and biosynthetic potential. The genome of strain SF-4 harbors 12 Biosynthetic Gene Clusters (BGCs) including four Non-ribosomal peptide synthetases (NRPSs), two terpenes, and one each of Type III polyketide synthases (PKSs), hybrid (NRPS/PKS), lipopeptide, β-lactone, and bacteriocin clusters. Plant growth-promoting genes associated with de-nitrification, iron acquisition, phosphate solubilization, and nitrogen metabolism were also observed in the genome. Furthermore, all the available complete genomes of B. pumilus strains were used to highlight species boundaries and diverse niche adaptation strategies. Phylogenetic analyses revealed local diversification and indicated that strain SF-4 is a sister group to SAFR-032 and 150a. Pan-genome analyses of 12 targeted strains showed regions of genome plasticity which regulate function of these strains and suggested direct strain adaptations to specific habitats. The unique genome pool carries genes mostly associated with the "biosynthesis of secondary metabolites, transport, and catabolism" (Q), "replication, recombination and repair" (L), and "unknown function" (S) clusters of orthologous groups (COG) categories. Moreover, a total of 952 unique genes and 168 exclusively absent genes were prioritized across the 12 genomes, while the newly sequenced B. pumilus SF-4 genome consists of 520 accessory, 59 unique, and seven exclusively absent genes. The current study demonstrates genomic differences among 12 B. pumilus strains and offers comprehensive knowledge of the respective genome architecture, which may assist in the agronomic application of this strain in future.

RevDate: 2021-08-27

Surachat K, Deachamag P, Kantachote D, et al (2021)

In silico comparative genomics analysis of Lactiplantibacillus plantarum DW12, a potential gamma-aminobutyric acid (GABA)-producing strain.

Microbiological research, 251:126833.

Gamma-aminobutyric acid (GABA) is an amino acid that plays a major role as a neurotransmitter. It is commonly produced by lactic acid bacteria (LAB) naturally found in fermented food and fruit. Lactiplantibacillus plantarum DW12 is a high potential GABA-producing strain isolated from a fermented beverage. In this study, to highlight its ability to produce GABA, we sequenced the genome of L. plantarum DW12 and then performed comprehensive bioinformatics and meta-analysis to compare the genomic data of previously published genomes. Also, the evolutionary analysis among L. plantarum species was demonstrated using pan-genome analysis against 576 genomes from the database. As a result, the DW12 genome comprises one circular chromosome of 3,217,574 bp. It contains several genes that encode for the production of antimicrobial compounds including plantaricin A, E, F, J, K, and N. The glutamic acid decarboxylase (GAD) operon was found in the DW12 genome, suggesting a high potential of producing GABA in this strain. Therefore, L. plantarum DW12 could be a good candidate as a starter culture in the beverage and food industries due to its safety aspects and ability to produce GABA.

RevDate: 2021-09-17

Hufnagel B, Soriano A, Taylor J, et al (2021)

Pangenome of white lupin provides insights into the diversity of the species.

Plant biotechnology journal [Epub ahead of print].

White lupin is an old crop with renewed interest due to the high protein content of its seed and its high nutritional value. Despite a long domestication history in the Mediterranean basin, modern breeding efforts have been fairly scarce. Recent sequencing of its genome has provided tools for further description of genetic resources but detailed characterization of genomic diversity is still missing. Here, we report the genome sequencing of 39 accessions that were used to establish a white lupin pangenome. We defined 32 068 core genes that are present in all individuals and 14 822 that are absent in some and may represent a gene pool for breeding for improved productivity, grain quality, and stress adaptation. We used this new pangenome resource to identify candidate genes for alkaloid synthesis, a key grain quality trait. The white lupin pangenome provides a novel genetic resource to better understand how domestication has shaped the genomic variability within this crop. Thus, this pangenome resource is an important step towards the effective and efficient genetic improvement of white lupin to help meet the rapidly growing demand for plant protein sources for human and animal consumption.

RevDate: 2021-08-06

Maarala AI, Arasalo O, Valenzuela D, et al (2021)

Distributed hybrid-indexing of compressed pan-genomes for scalable and fast sequence alignment.

PloS one, 16(8):e0255260.

Computational pan-genomics utilizes information from multiple individual genomes in large-scale comparative analysis. Genetic variation between case-controls, ethnic groups, or species can be discovered thoroughly using pan-genomes of such subpopulations. Whole-genome sequencing (WGS) data volumes are growing rapidly, making genomic data compression and indexing methods very important. Despite current space-efficient repetitive sequence compression and indexing methods, the deployed compression methods are often sequential, computationally time-consuming, and do not provide efficient sequence alignment performance on vast collections of genomes such as pan-genomes. For performing rapid analytics with the ever-growing genomics data, data compression and indexing methods have to exploit distributed and parallel computing more efficiently. Instead of strict genome data compression methods, we will focus on the efficient construction of a compressed index for pan-genomes. Compressed hybrid-index enables fast sequence alignments to several genomes at once while shrinking the index size significantly compared to traditional indexes. We propose a scalable distributed compressed hybrid-indexing method for large genomic data sets enabling pan-genome-based sequence search and read alignment capabilities. We show the scalability of our tool, DHPGIndex, by executing experiments in a distributed Apache Spark-based computing cluster comprising 448 cores distributed over 26 nodes. The experiments have been performed both with human and bacterial genomes. DHPGIndex built a BLAST index for n = 250 human pan-genome with an 870:1 compression ratio (CR) in 342 minutes and a Bowtie2 index with 157:1 CR in 397 minutes. For n = 1,000 human pan-genome, the BLAST index was built in 1520 minutes with 532:1 CR and the Bowtie2 index in 1938 minutes with 76:1 CR. Bowtie2 aligned 14.6 GB of paired-end reads to the compressed (n = 1,000) index in 31.7 minutes on a single node. 
Compressing the GenBank database (n = 13,375,031; 488 GB) to a BLAST index resulted in a CR of 62:1 in 575 minutes. BLASTing 189,864 CRISPR-Cas9 gRNA target sequences (23 MB in total) against the compressed index of the human pan-genome (n = 1,000) finished in 45 minutes on a single node. Mixed bacterial sequences totaling 30 MB (n = 599) were BLASTed against the compressed index of the 488 GB GenBank database (n = 13,375,031) in 26 minutes on 25 nodes. Mixed sequences totaling 78 MB (n = 4,167) were BLASTed against the compressed index of the 18 GB E. coli sequence database (n = 745,409) in 5.4 minutes on a single node.
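
Note: the compression ratios and timings quoted above can be turned into approximate index sizes and throughputs with simple arithmetic. The sketch below is only illustrative. It assumes that CR means uncompressed size divided by compressed index size, which the abstract does not state explicitly, and the function names are hypothetical, not part of DHPGIndex.

```python
def compressed_size_gb(original_gb: float, ratio: float) -> float:
    """Approximate compressed index size, assuming CR = original / compressed."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return original_gb / ratio


def throughput_gb_per_min(data_gb: float, minutes: float) -> float:
    """Average amount of data processed per minute."""
    if minutes <= 0:
        raise ValueError("duration must be positive")
    return data_gb / minutes


# Figures taken from the abstract: 488 GB GenBank database at 62:1 CR,
# and 14.6 GB of paired-end reads aligned in 31.7 minutes.
genbank_index_gb = compressed_size_gb(488, 62)
alignment_rate = throughput_gb_per_min(14.6, 31.7)

print(f"GenBank BLAST index: about {genbank_index_gb:.2f} GB")
print(f"Bowtie2 alignment: about {alignment_rate:.2f} GB of reads per minute")
```

Under that assumption, the 488 GB GenBank database would shrink to an index of roughly 7.9 GB.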

RevDate: 2021-08-03

Awan F, Ali MM, Dong Y, et al (2021)

In Silico Analysis of Potential Outer Membrane Beta-Barrel Proteins in Aeromonas hydrophila Pangenome.

International journal of peptide research and therapeutics [Epub ahead of print].

Outer membrane proteins (OMPs) of Aeromonas hydrophila have a variety of functional roles in virulence and pathogenesis and represent promising targets for vaccine development. The main objective of this study was to develop an in-silico model of beta-barrel OMPs present among the valid A. hydrophila pangenomes (n = 22). With a program named the β-barrel Outer Membrane Protein Predictor (BOMP), total beta-barrel OMPs (n = 3127) were predicted across 22 genomes, with an estimated median of 64 per genome. In pangenome analysis, only 32 OMPs were found to be conserved. These beta-barrel OMPs also varied by source of isolation and by COG and KEGG class. Among the 32 conserved OMPs, a highly antigenic protein was identified using Vaxijen. With B-cell epitope predictions, two amino acid sequence fragments bearing B-cell binding sites were selected: GLTLGAQFTGNNDPQNADRSN (21 mer) and FKPSLAYLRTDVKDNARGIDDTATEY (26 mer). Further, an epitope (12 amino acids: GLTLGAQFTGNN) that complexes with the maximum number of MHC alleles and has higher antigenicity was determined. The analysis of evolutionary forces on the identified OMP sequence and epitope indicated that none of the basic amino acid sites showed significantly different substitution ratios. This conserved protein and epitope will be helpful in developing a vaccine that may be effective against all A. hydrophila strains. This study also provides a theoretical basis for vaccine design against other pathogenic bacteria.

Supplementary Information: The online version contains supplementary material available at 10.1007/s10989-021-10259-z.

RevDate: 2021-07-30

Wang K, Hu H, Tian Y, et al (2021)

The chicken pan-genome reveals gene content variation and a promoter region deletion in IGF2BP1 affecting body size.

Molecular biology and evolution pii:6332014 [Epub ahead of print].

Domestication and breeding have reshaped the genomic architecture of chicken, but the retention and loss of genomic elements during these evolutionary processes remain unclear. We present the first chicken pan-genome, constructed using 664 individuals, which identified an additional ∼66.5 Mb of sequence absent from the reference genome (GRCg6a). The constructed pan-genome encoded 20,491 predicted protein-coding genes, with conserved genes showing higher expression levels than dispensable genes. Presence/absence variation (PAV) analyses demonstrated that gene PAV in chicken was shaped by selection, genetic drift, and hybridization. PAV-based GWAS identified numerous candidate mutations related to growth, carcass composition, meat quality, or physiological traits. Among them, a deletion in the promoter region of IGF2BP1 affecting chicken body size is reported, which is supported by functional studies and extra samples. This is the first report of the causal variant underlying the repeatedly reported chicken body size QTL on chromosome 27. Therefore, the chicken pan-genome is a useful resource for biological discovery and breeding. It improves our understanding of chicken genome diversity and provides materials to unveil the evolutionary history of chicken domestication.
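
Note: the distinction between conserved and dispensable genes used in the abstract above can be illustrated with a toy presence/absence table. This is a minimal sketch, not the authors' pipeline. The individuals, gene names, and the function `split_core_dispensable` are hypothetical, and it assumes the simplest definition (a gene is conserved only if present in every individual), whereas real studies often use a frequency threshold instead.

```python
from typing import Dict, Set, Tuple


def split_core_dispensable(pav: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split a pan-genome into conserved (core) and dispensable genes.

    `pav` maps each individual to the set of genes present in it.
    """
    if not pav:
        return set(), set()
    gene_sets = list(pav.values())
    pan_genome = set().union(*gene_sets)     # genes seen in at least one individual
    core = set.intersection(*gene_sets)      # genes seen in every individual
    return core, pan_genome - core


# Hypothetical toy data: three individuals, four genes.
toy_pav = {
    "ind1": {"geneA", "geneB", "geneC"},
    "ind2": {"geneA", "geneC"},
    "ind3": {"geneA", "geneC", "geneD"},
}

core_genes, dispensable_genes = split_core_dispensable(toy_pav)
print("conserved:", sorted(core_genes))
print("dispensable:", sorted(dispensable_genes))
```

In this toy table, geneA and geneC are conserved, while geneB and geneD are dispensable.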

RevDate: 2021-08-19

Hu H, Scheben A, Verpaalen B, et al (2021)

Amborella gene presence/absence variation is associated with abiotic stress responses that may contribute to environmental adaptation.

RevDate: 2021-08-01

Davidson RM, Benoit JB, Kammlade SM, et al (2021)

Genomic characterization of sporadic isolates of the dominant clone of Mycobacterium abscessus subspecies massiliense.

Scientific reports, 11(1):15336.

Recent studies have characterized a dominant clone (Clone 1) of Mycobacterium abscessus subspecies massiliense (M. massiliense) associated with high prevalence in cystic fibrosis (CF) patients, pulmonary outbreaks in the United States (US) and United Kingdom (UK), and a Brazilian epidemic of skin infections. The prevalence of Clone 1 in non-CF patients in the US and the relationship of sporadic US isolates to outbreak clones are not known. We surveyed a reference US Mycobacteria Laboratory and a US biorepository of CF-associated Mycobacteria isolates for Clone 1. We then compared genomic variation and antimicrobial resistance (AMR) mutations between sporadic non-CF, CF, and outbreak Clone 1 isolates. Among reference lab samples, 57/147 (39%) of patients with M. massiliense had Clone 1, including pulmonary and extrapulmonary infections, compared to 11/64 (17%) in the CF isolate biorepository. Core and pan genome analyses revealed that outbreak isolates had similar numbers of single nucleotide polymorphisms (SNPs) and accessory genes as sporadic US Clone 1 isolates. However, pulmonary outbreak isolates were more likely to have AMR mutations compared to sporadic isolates. Clone 1 isolates are present among non-CF and CF patients across the US, but additional studies will be needed to resolve potential routes of transmission and spread.

RevDate: 2021-08-25

Bayer PE, Scheben A, Golicz AA, et al (2021)

Modelling of gene loss propensity in the pangenomes of three Brassica species suggests different mechanisms between polyploids and diploids.

Plant biotechnology journal [Epub ahead of print].

Plant genomes demonstrate significant presence/absence variation (PAV) within a species; however, the factors that lead to this variation have not been studied systematically in Brassica across diploids and polyploids. Here, we developed pangenomes of polyploid Brassica napus and its two diploid progenitor genomes B. rapa and B. oleracea to infer how PAV may differ between diploids and polyploids. Modelling of gene loss suggests that loss propensity is primarily associated with transposable elements in the diploids while in B. napus, gene loss propensity is associated with homoeologous recombination. We use these results to gain insights into the different causes of gene loss, both in diploids and following polyploidization, and pave the way for the application of machine learning methods to understanding the underlying biological and physical causes of gene presence/absence.

RevDate: 2021-09-15

Hernández-Juárez LE, Camorlinga M, Méndez-Tenorio A, et al (2021)

Analyses of publicly available Hungatella hathewayi genomes revealed genetic distances indicating they belong to more than one species.

Virulence, 12(1):1950-1964.

Hungatella hathewayi has been observed to be a member of the gut microbiome. Little is known about this organism despite its association with human fatalities, so it is important to understand its virulence mechanisms and its epidemiological potential to cause disease. In this study, a patient with chronic neurologic symptoms presented to the clinic, and a strain with phenotypic characteristics suggestive of Clostridium difficile was subsequently isolated. However, whole-genome sequencing found the organism to be H. hathewayi. Analysis including publicly available Hungatella genomes found substantial genomic differences compared to the type strain, indicating this isolate was not C. difficile. We examined the whole genomes of Hungatella species and related genera, using comparative genomics to fully examine species identification and toxin production. Orthogonal phylogenetic analyses using the 16S rRNA gene and the entire genome, including genome distance analyses with the Genome-to-Genome Distance Calculator (GGDC) and Average Nucleotide Identity (ANI), together with a pan-genome analysis including available public genomes, assigned the isolate to Hungatella. Two clearly differentiated groups were identified, one including a reference H. hathewayi genome (strain DSM-13479) and a second group that was determined to be H. effluvii, which included our clinical isolate. Also, some genomes reported as H. hathewayi were found to belong to other genera, including Clostridium and Faecalicatena. We show that the Hungatella species have an open pan-genome reflecting high genomic diversity. This study highlights the importance of correctly assigning taxonomic identification, particularly in disease-associated strains, to better understand virulence and therapeutic options.

RevDate: 2021-07-23

Liu Z, Zhao Y, Sossah FL, et al (2021)

Characterization, Pathogenicity, Phylogeny, and Comparative Genomic Analysis of Pseudomonas tolaasii Strains Isolated from Various Mushrooms in China.

Phytopathology [Epub ahead of print].

Since 2016, devastating bacterial blotch affecting the fruiting bodies of Agaricus bisporus, Cordyceps militaris, Flammulina filiformis, and Pleurotus ostreatus in China has caused severe economic losses. We isolated 102 bacterial strains and characterized them polyphasically. We identified the causal agent as Pseudomonas tolaasii and confirmed the pathogenicity of the strains. A host range test further confirmed the pathogen's ability to infect multiple hosts. This is the first report in China of bacterial blotch in C. militaris caused by P. tolaasii. Whole-genome sequences were generated for three strains: Pt11 (6.48 Mb), Pt51 (6.63 Mb), and Pt53 (6.80 Mb), and pangenome analysis was performed with 13 other publicly accessible P. tolaasii genomes to determine their genetic diversity, virulence, antibiotic resistance, and mobile genetic elements. The pangenome of P. tolaasii is open, and many more gene families are likely to emerge with further genome sequencing. Multilocus sequence analysis using the sequences of four common housekeeping genes (glnS, gyrB, rpoB, and rpoD) showed high genetic variability among the P. tolaasii strains, with 115 strains clustered into a monophyletic group. The P. tolaasii strains possess various genes for secretion systems, virulence factors, carbohydrate-active enzymes, toxins, secondary metabolites, and antimicrobial resistance that are associated with pathogenesis and adaptation to different environments. The myriad of insertion sequences, integrons, prophages, and genomic islands encoded in the strains may contribute to genome plasticity, virulence, and antibiotic resistance. These findings advance understanding of the determinants of virulence, which can be targeted for the effective control of bacterial blotch disease.
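
Note: an "open" pangenome, as described in the abstract above, is one in which the number of distinct gene families keeps growing as more genomes are added. The sketch below shows the idea on hypothetical toy data. The function `pan_accumulation` and the family names are assumptions for illustration only. Published analyses typically average such curves over many random genome orderings and fit a growth model, which this toy example does not attempt.

```python
from typing import Iterable, List, Set


def pan_accumulation(genomes: Iterable[Set[str]]) -> List[int]:
    """Cumulative count of distinct gene families as genomes are added in order."""
    seen: Set[str] = set()
    sizes: List[int] = []
    for families in genomes:
        seen |= set(families)      # add any families not observed so far
        sizes.append(len(seen))
    return sizes


# Hypothetical toy genomes, each given as a set of gene family identifiers.
toy_genomes = [
    {"fam1", "fam2", "fam3"},
    {"fam1", "fam2", "fam4"},
    {"fam1", "fam5", "fam6"},
]

curve = pan_accumulation(toy_genomes)
print(curve)
```

For the toy input the curve is 3, 4, 6: every added genome contributes new families, which is the qualitative pattern expected for an open pangenome.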

RevDate: 2021-07-21

Bayer PE, Petereit J, Danilevicz MF, et al (2021)

The application of pangenomics and machine learning in genomic selection in plants.

The plant genome [Epub ahead of print].

Genomic selection approaches have increased the speed of plant breeding, leading to growing crop yields over the last decade. However, climate change is impacting current and future yields, resulting in the need to further accelerate breeding efforts to cope with these changing conditions. Here we present approaches to accelerate plant breeding and incorporate nonadditive effects in genomic selection by applying state-of-the-art machine learning approaches. These approaches are made more powerful by the inclusion of pangenomes, which represent the entire genome content of a species. Understanding the strengths and limitations of machine learning methods, compared with more traditional genomic selection efforts, is paramount to the successful application of these methods in crop breeding. We describe examples of genomic selection and pangenome-based approaches in crop breeding, discuss machine learning-specific challenges, and highlight the potential for the application of machine learning in genomic selection. We believe that careful implementation of machine learning approaches will support crop improvement to help counter the adverse outcomes of climate change on crop production.


ESP Quick Facts

ESP Origins

In the early 1990s, Robert Robbins was a faculty member at Johns Hopkins, where he directed the informatics core of GDB, the human gene-mapping database of the international human genome project. To share papers with colleagues around the world, he set up a small paper-sharing section on his personal web page. This small project evolved into The Electronic Scholarly Publishing Project.

ESP Support

In 1995, Robbins became the VP/IT of the Fred Hutchinson Cancer Research Center in Seattle, WA. Soon after arriving in Seattle, Robbins secured funding, through the ELSI component of the US Human Genome Project, to create the original ESP.ORG web site, with the formal goal of providing free, world-wide access to the literature of classical genetics.

ESP Rationale

Although the methods of molecular biology can seem almost magical to the uninitiated, the original techniques of classical genetics are readily appreciated by one and all: cross individuals that differ in some inherited trait, collect all of the progeny, score their attributes, and propose mechanisms to explain the patterns of inheritance observed.

ESP Goal

In reading the early works of classical genetics, one is drawn, almost inexorably, into ever more complex models, until molecular explanations begin to seem both necessary and natural. At that point, the tools for understanding genome research are at hand. Helping readers reach this point was the original goal of The Electronic Scholarly Publishing Project.

ESP Usage

Usage of the site grew rapidly and has remained high. Faculty began to use the site for their assigned readings. Other on-line publishers, ranging from The New York Times to Nature, referenced ESP materials in their own publications. Nobel laureates (e.g., Joshua Lederberg) regularly used the site and even wrote to suggest changes and improvements.

ESP Content

When the site began, no journals were making their early content available in digital format. As a result, ESP was obliged to digitize classic literature before it could be made available. For many important papers — such as Mendel's original paper or the first genetic map — ESP had to produce entirely new typeset versions of the works, if they were to be available in a high-quality format.

ESP Help

Early support from the DOE component of the Human Genome Project was critically important for getting the ESP project on a firm foundation. Since that funding ended (nearly 20 years ago), the project has been operated as a purely volunteer effort. Anyone wishing to assist in these efforts should send an email to Robbins.

ESP Plans

With the development of methods for adding typeset side notes to PDF files, the ESP project now plans to add annotated versions of some classical papers to its holdings. We also plan to add new reference and pedagogical material. We have already started providing regularly updated, comprehensive bibliographies to the ESP.ORG site.

Electronic Scholarly Publishing
961 Red Tail Lane
Bellingham, WA 98226

E-mail: RJR8222 @

Papers in Classical Genetics

The ESP began as an effort to share a handful of key papers from the early days of classical genetics. Now the collection has grown to include hundreds of papers, in full-text format.

Digital Books

Along with papers on classical genetics, ESP offers a collection of full-text digital books, including many works by Darwin (and even a collection of poetry — Chicago Poems by Carl Sandburg).


ESP now offers a much improved and expanded collection of timelines, designed to give the user choice over subject matter and dates.


Biographical information about many key scientists.

Selected Bibliographies

Bibliographies on several topics of potential interest to the ESP community are now being automatically maintained and generated on the ESP site.
