Bibliography on: Pangenome

The Electronic Scholarly Publishing Project: Providing world-wide, free access to classic scientific papers and other scholarly materials, since 1993.


ESP: PubMed Auto Bibliography. Created: 27 Nov 2020 at 01:32


Although the enforced stability of genomic content is ubiquitous among MCEs, the opposite is proving to be the case among prokaryotes, which exhibit remarkable and adaptive plasticity of genomic content. Early bacterial whole-genome sequencing efforts discovered that whenever a particular "species" was re-sequenced, new genes were found that had not been detected earlier — entirely new genes, not merely new alleles. This led to the concepts of the bacterial core-genome, the set of genes found in all members of a particular "species", and the flex-genome, the set of genes found in some, but not all members of the "species". Together these make up the species' pan-genome.
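
The definitions above reduce to simple set operations, which the following minimal sketch illustrates. It assumes each genome has already been reduced to a set of gene-family names; the strain names, gene names, and the function `partition_pangenome` are hypothetical and chosen only for illustration. Real analyses first have to cluster genes into families, a step not shown here.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(
    genomes: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split gene content into pan-, core- and flex-genome sets.

    genomes maps a strain name to the set of gene families it carries.
    """
    if not genomes:
        return set(), set(), set()
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)        # genes found in at least one strain
    core = set.intersection(*gene_sets)  # genes found in every strain
    flex = pan - core                    # genes found in some, but not all, strains
    return pan, core, flex


# Toy input (hypothetical strains and gene content, not real data).
strains = {
    "strain_A": {"dnaA", "gyrB", "recA", "blaTEM"},
    "strain_B": {"dnaA", "gyrB", "recA", "ccdA"},
    "strain_C": {"dnaA", "gyrB", "recA", "ccdA", "intI1"},
}

pan, core, flex = partition_pangenome(strains)
print("pan-genome size:", len(pan))
print("core-genome:", sorted(core))
print("flex-genome:", sorted(flex))
```

Because every newly added strain can only enlarge the union and shrink the intersection, re-running this on a growing collection shows how the pan-genome grows while the core-genome contracts as more genomes are sequenced.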

Created with PubMed® Query: pangenome or "pan-genome" or "pan genome" NOT pmcbook NOT ispreviousversion
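
As a hedged illustration of how a query like the one above might be submitted programmatically, the sketch below builds a request URL for the NCBI E-utilities `esearch` endpoint. It only constructs the URL and sends nothing. The helper name `build_esearch_url` and the choice of `retmax=100` and JSON output are assumptions made for this example, not a description of how this bibliography was actually generated.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The query string exactly as quoted on this page.
QUERY = 'pangenome or "pan-genome" or "pan genome" NOT pmcbook NOT ispreviousversion'


def build_esearch_url(term: str, retmax: int = 100) -> str:
    """Return an esearch URL for a PubMed query (hypothetical helper)."""
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # the query text, URL-encoded by urlencode
        "retmax": retmax,    # maximum number of PMIDs to return (assumed value)
        "retmode": "json",   # ask for a JSON response (assumed preference)
    }
    return f"{ESEARCH}?{urlencode(params)}"


url = build_esearch_url(QUERY)
print(url)
```

Fetching the URL and paging through results is left out deliberately; anyone doing so should follow NCBI's usage guidelines on request rates.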

Citations: The Papers (from PubMed®)


RevDate: 2020-11-26

Hudson LK, Constantine-Renna L, Thomas L, et al (2020)

Genomic characterization and phylogenetic analysis of Salmonella enterica serovar Javiana.

PeerJ, 8:e10256 pii:10256.

Salmonella enterica serovar Javiana is the fourth most reported serovar of laboratory-confirmed human Salmonella infections in the U.S. and in Tennessee (TN). Although Salmonella ser. Javiana is a common cause of human infection, the majority of cases are sporadic in nature rather than outbreak-associated. To better understand Salmonella ser. Javiana microbial population structure in TN, we completed a phylogenetic analysis of 111 Salmonella ser. Javiana clinical isolates from TN collected from Jan. 2017 to Oct. 2018. We identified mobile genetic elements and genes known to confer antibiotic resistance present in the isolates, and performed a pan-genome-wide association study (pan-GWAS) to compare gene content between clades identified in this study. The population structure of TN Salmonella ser. Javiana clinical isolates consisted of three genetic clades: TN clade I (n = 54), TN clade II (n = 4), and TN clade III (n = 48). Using a 5, 10, and 25 hqSNP distance threshold for cluster identification, nine, 12, and 10 potential epidemiologically-relevant clusters were identified, respectively. The majority of genes that were found to be over-represented in specific clades were located in mobile genetic element (MGE) regions, including genes encoding integrases and phage structures (91.5%). Additionally, a large portion of the over-represented genes from TN clade II (44.9%) were located on an 87.5 kb plasmid containing genes encoding a toxin/antitoxin system (ccdAB). Additionally, we completed phylogenetic analyses of global Salmonella ser. Javiana datasets to gain a broader insight into the population structure of this serovar. We found that the global phylogeny consisted of three major clades (one of which all of the TN isolates belonged to) and two cgMLST eBurstGroups (ceBGs) and that the branch length between the two Salmonella ser. Javiana ceBGs (1,423 allelic differences) was comparable to those from other serovars that have been reported as polyphyletic (929-2,850 allelic differences). This study demonstrates the population structure of TN and global Salmonella ser. Javiana isolates, a clinically important Salmonella serovar and can provide guidance for phylogenetic cluster analyses for public health surveillance and response.

RevDate: 2020-11-26

Su F, Tian R, Yang Y, et al (2020)

Comparative Genome Analysis Reveals the Molecular Basis of Niche Adaptation of Staphylococcus epidermidis Strains.

Frontiers in genetics, 11:566080.

Staphylococcus epidermidis is one of the most commonly isolated species from human skin and the second leading cause of bloodstream infections. Here, we performed a large-scale comparative study without any pre-assigned reference to identify genomic determinants associated with the diversity and adaptation of S. epidermidis strains to various environments. Pan-genome of S. epidermidis was open with 435 core proteins and had a pan-genome size of 8,034 proteins. Genome-wide phylogenetic tree showed high heterogeneity and suggested that routine whole genome sequencing was a powerful tool for analyzing the complex evolution of S. epidermidis and for investigating the infection sources. Comparative genome analyses demonstrated a range of antimicrobial resistance (AMR) genes, especially those within mobile genetic elements. The complicated host-bacterium and bacterium-bacterium relationships help S. epidermidis to play a vital role in balancing the epithelial microflora. The highly variable and dynamic nature of the S. epidermidis genome may contribute to its success in adapting to broad habitats. Genes related to biofilm formation and cell toxicity were significantly enriched in the blood and skin, demonstrating their potentials in identifying risk genotypes. This study gave a general landscape of S. epidermidis pan-genome and provided valuable insights into mechanisms for genome evolution and lifestyle adaptation of this ecologically flexible species.

RevDate: 2020-11-26

Jayakodi M, Padmarasu S, Haberer G, et al (2020)

The barley pan-genome reveals the hidden legacy of mutation breeding.

Nature pii:10.1038/s41586-020-2947-8 [Epub ahead of print].

Genetic diversity is key to crop improvement. Owing to pervasive genomic structural variation, a single reference genome assembly cannot capture the full complement of sequence diversity of a crop species (known as the 'pan-genome'). Multiple high-quality sequence assemblies are an indispensable component of a pan-genome infrastructure. Barley (Hordeum vulgare L.) is an important cereal crop with a long history of cultivation that is adapted to a wide range of agro-climatic conditions. Here we report the construction of chromosome-scale sequence assemblies for the genotypes of 20 varieties of barley, comprising landraces, cultivars and a wild barley, that were selected as representatives of global barley diversity. We catalogued genomic presence/absence variants and explored the use of structural variants for quantitative genetic analysis through whole-genome shotgun sequencing of 300 gene bank accessions. We discovered abundant large inversion polymorphisms and analysed in detail two inversions that are frequently found in current elite barley germplasm; one is probably the product of mutation breeding and the other is tightly linked to a locus that is involved in the expansion of geographical range. This first-generation barley pan-genome makes previously hidden genetic variation accessible to genetic studies and breeding.

RevDate: 2020-11-25

Akob DM, Hallenbeck M, Beulig F, et al (2020)

Mixotrophic Iron-Oxidizing Thiomonas Isolates from an Acid Mine Drainage-Affected Creek.

Applied and environmental microbiology, 86(24):.

Natural attenuation of heavy metals occurs via coupled microbial iron cycling and metal precipitation in creeks impacted by acid mine drainage (AMD). Here, we describe the isolation, characterization, and genomic sequencing of two iron-oxidizing bacteria (FeOB) species: Thiomonas ferrovorans FB-6 and Thiomonas metallidurans FB-Cd, isolated from slightly acidic (pH 6.3), Fe-rich, AMD-impacted creek sediments. These strains precipitated amorphous iron oxides, lepidocrocite, goethite, and magnetite or maghemite and grew at a pH optimum of 5.5. While Thiomonas spp. are known as mixotrophic sulfur oxidizers and As oxidizers, the FB strains oxidized Fe, which suggests they can efficiently remove Fe and other metals via coprecipitation. Previous evidence for Thiomonas sp. Fe oxidation is largely ambiguous, possibly because of difficulty demonstrating Fe oxidation in heterotrophic/mixotrophic organisms. Therefore, we also conducted a genomic analysis to identify genetic mechanisms of Fe oxidation, other metal transformations, and additional adaptations, comparing the two FB strain genomes with 12 other Thiomonas genomes. The FB strains fall within a relatively novel group of Thiomonas strains that includes another strain (b6) with solid evidence of Fe oxidation. Most Thiomonas isolates, including the FB strains, have the putative iron oxidation gene cyc2, but only the two FB strains possess the putative Fe oxidase genes mtoAB. The two FB strain genomes contain the highest numbers of strain-specific gene clusters, greatly increasing the known Thiomonas genetic potential. Our results revealed that the FB strains are two distinct novel species of Thiomonas with the genetic potential for bioremediation of AMD via iron oxidation.

IMPORTANCE: As AMD moves through the environment, it impacts aquatic ecosystems, but at the same time, these ecosystems can naturally attenuate contaminated waters via acid neutralization and catalyzing metal precipitation. This is the case in the former Ronneburg uranium-mining district, where AMD impacts creek sediments. We isolated and characterized two iron-oxidizing Thiomonas species that are mildly acidophilic to neutrophilic and that have two genetic pathways for iron oxidation. These Thiomonas species are well positioned to naturally attenuate AMD as it discharges across the landscape.

RevDate: 2020-11-21

Khan S, Vancuren SJ, Hill JE (2020)

A Generalist Lifestyle Allows Rare Gardnerella spp. to Persist at Low Levels in the Vaginal Microbiome.

Microbial ecology pii:10.1007/s00248-020-01643-1 [Epub ahead of print].

Gardnerella spp. are considered a hallmark of bacterial vaginosis, a dysbiosis of the vaginal microbiome. There are four cpn60 sequence-based subgroups within the genus (A, B, C and D), and thirteen genome species have been defined recently. Gardnerella spp. co-occur in the vaginal microbiome with varying abundance, and these patterns are shaped by a resource-dependent, exploitative competition, which affects the growth rate of subgroups A, B and C negatively. The growth rate of rarely abundant subgroup D, however, increases with the increasing number of competitors, negatively affecting the growth rate of others. We hypothesized that a nutritional generalist lifestyle and minimal niche overlap with the other more abundant Gardnerella spp. facilitate the maintenance of subgroup D in the vaginal microbiome through negative frequency-dependent selection. Using 40 whole-genome sequences from isolates representing all four subgroups, we found that they could be distinguished based on the content of their predicted proteomes. Proteins associated with carbohydrate and amino acid uptake and metabolism were significant contributors to the separation of subgroups. Subgroup D isolates had significantly more of their proteins assigned to amino acid metabolism than the other subgroups. Subgroup D isolates were also significantly different from others in terms of number and type of carbon sources utilized in a phenotypic assay, while the other three could not be distinguished. Overall, the results suggest that a generalist lifestyle and lack of niche overlap with other Gardnerella spp. leads to subgroup D being favoured by negative frequency-dependent selection in the vaginal microbiome.

RevDate: 2020-11-20

Tahir Ul Qamar M, Zhu X, Khan MS, et al (2020)

Pan-genome: A promising resource for noncoding RNA discovery in plants.

The plant genome, 13(3):e20046.

Plant genomes contain both protein-coding and noncoding sequences including transposable elements (TEs) and noncoding RNAs (ncRNAs). The ncRNAs are recognized as important elements that play fundamental roles in the structural organization and function of plant genomes. Despite various hypotheses, TEs are believed to be a major precursor of ncRNAs. Transposable elements are also prime factors that cause genomic variation among members of a species. Hence, TEs pose a major challenge in the discovery and analysis of ncRNAs. With the increase in the number of sequenced plant genomes, it is now accepted that a single reference genome is insufficient to represent the complete genomic diversity and contents of a species, and exploring the pan-genome of a species is critical. In this review, we summarize the recent progress in the field of plant pan-genomes. We also discuss TEs and their roles in ncRNA biogenesis and present our perspectives on the application of pan-genomes for the discovery of ncRNAs to fully explore and exploit their biological roles in plants.

RevDate: 2020-11-20

Dar HA, Zaheer T, Ullah N, et al (2020)

Pangenome Analysis of Mycobacterium tuberculosis Reveals Core-Drug Targets and Screening of Promising Lead Compounds for Drug Discovery.

Antibiotics (Basel, Switzerland), 9(11): pii:antibiotics9110819.

Tuberculosis, caused by Mycobacterium tuberculosis (M. tuberculosis), is one of the leading causes of human deaths globally according to the WHO TB 2019 report. The continuous rise in multi- and extensive-drug resistance in M. tuberculosis broadens the challenges to control tuberculosis. The availability of a large number of completely sequenced genomes of M. tuberculosis has provided an opportunity to explore the pangenome of the species along with the pan-phylogeny and to identify potential novel drug targets leading to drug discovery. We attempted to calculate the pangenome of M. tuberculosis from a total of 150 complete genomes and performed the phylo-genomic classification and analysis. Further, the conserved core genome (1251 proteins) was subjected to various sequential filters (non-human homology, essentiality, virulence, physicochemical parameters, and pathway analysis), resulting in the identification of eight putative broad-spectrum drug targets. Molecular docking analyses of these targets with ligands available in the DrugBank database shortlisted a total of five promising ligands with projected inhibitory potential; namely, 2'deoxy-thymidine-5'-diphospho-alpha-d-glucose, uridine diphosphate glucose, 2'-deoxy-thymidine-beta-l-rhamnose, thymidine-5'-triphosphate, and citicoline. We are confident that with further lead optimization and experimental validation, these lead compounds may provide a sound basis to develop safe and effective drugs against tuberculosis disease in humans.

RevDate: 2020-11-19

Bandoy DD (2019)

Large scale enterohemorrhagic E coli population genomic analysis using whole genome typing reveals recombination clusters and potential drug target.

F1000Research, 8:33.

Enterohemorrhagic Escherichia coli continues to be a significant public health risk. With the onset of next generation sequencing, whole genome sequences require a new paradigm of analysis relevant for epidemiology and drug discovery. A large-scale bacterial population genomic analysis was applied to 702 isolates of serotypes associated with EHEC resulting in five pangenome clusters. Serotype incongruence with pangenome types suggests recombination clusters. Core genome analysis was performed to determine the population wide distribution of sdiA as potential drug target. Protein modelling revealed nonsynonymous variants are notably absent in the ligand binding site for quorum sensing, indicating that population wide conservation of the sdiA ligand site can be targeted for potential prophylactic purposes. Applying pathotype-wide pangenomics as a guide for determining evolution of pharmacophore sites is a potential approach in drug discovery.

RevDate: 2020-11-19

Korzhenkov AA, Toshchakov SV, Podosokorskaya OA, et al (2020)

Data on draft genome sequence of Caldanaerobacter sp. strain 1523vc, a thermophilic bacterium, isolated from a hot spring of Uzon Caldera, (Kamchatka, Russia).

Data in brief, 33:106336.

The draft genome sequence of Caldanaerobacter sp. strain 1523vc, a thermophilic bacterium, isolated from a hot spring of Uzon Caldera, (Kamchatka, Russia) is presented. The complete genome assembly was of 2 713 207 bp with predicted completeness of 99.38%. Genome structural annotation revealed 2674 protein-coding genes, 127 pseudogenes and 77 RNA genes. Pangenome analysis of 7 currently available high quality Caldanaerobacter spp. genomes including 1523vc revealed 4673 gene clusters. Of them, 1130 clusters formed a core genome of genus Caldanaerobacter. Of the remaining 3543 Caldanaerobacter pangenome genes, 385 were exclusively represented in the 1523vc genome. 101 of 2801 Caldanaerobacter CDS were found to be encoding carbohydrate-active enzymes (CAZymes). The majority of CAZymes were predicted to be involved in degradation of beta-linked polysaccharides such as chitin, cellulose and hemicelluloses, reflecting the metabolism of strain 1523vc, isolated on cellulose. 5 of 101 CAZyme genes were found to be unique for the strain 1523vc and belonged to GH23, GT56, GH15 and two CE9 family proteins. The draft genome of strain 1523vc was deposited at DDBJ/EMBL/GenBank under the accessions JABEQB000000000, PRJNA629090 and SAMN14766777 for Genome, Bioproject and Biosample, respectively.

RevDate: 2020-11-18

Kim J, Sung J, Han K, et al (2020)

A High Quality Asian Genome Assembly Identifies Features of Common Missing Regions.

Genes, 11(11): pii:genes11111350.

The current human reference genome (GRCh38), with its superior quality, has contributed significantly to genome analysis. However, GRCh38 may still underrepresent the ethnic genome, specifically for Asians, though exactly what we are missing is still elusive. Here, we juxtaposed GRCh38 with a high-contiguity genome assembly of one Korean (AK1) to show that a part of AK1 genome is missing in GRCh38 and that the missing regions harbored ~1390 putative coding elements. Furthermore, we found that multiple populations shared some certain parts in the missing genome when we analyzed the "unmapped" (to GRCh38) reads of fourteen individuals (five East-Asians, four Europeans, and five Africans), amounting to ~5.3 Mb (~0.2% of AK1) of the total genomic regions. The recovered AK1 regions from the "unmapped reads", which were the estimated missing regions that did not exist in GRCh38, harbored candidate coding elements. We verified that most of the common (shared by ≥7 individuals) missing regions exist in human and chimpanzee DNA. Moreover, we further identified the occurrence mechanism and ethnic heterogeneity as well as the presence of the common missing regions. This study illuminates a potential advantage of using a pangenome reference and brings up the need for further investigations on the various features of regions globally missed in GRCh38.

RevDate: 2020-11-16

Li X, Lin J, Hu Y, et al (2020)

PARMAP: A Pan-Genome-Based Computational Framework for Predicting Antimicrobial Resistance.

Frontiers in microbiology, 11:578795.

Antimicrobial resistance (AMR) has emerged as one of the most urgent global threats to public health. Accurate detection of AMR phenotypes is critical for reducing the spread of AMR strains. Here, we developed PARMAP (Prediction of Antimicrobial Resistance by MAPping genetic alterations in pan-genome) to predict AMR phenotypes and to identify AMR-associated genetic alterations based on the pan-genome of bacteria by utilizing machine learning algorithms. When we applied PARMAP to 1,597 Neisseria gonorrhoeae strains, it successfully predicted their AMR phenotypes based on a pan-genome analysis. Furthermore, it identified 328 genetic alterations in 23 known AMR genes and discovered many new AMR-associated genetic alterations in ciprofloxacin-resistant N. gonorrhoeae, and it clearly indicated the genetic heterogeneity of AMR genes in different subtypes of resistant N. gonorrhoeae. Additionally, PARMAP performed well in predicting the AMR phenotypes of Mycobacterium tuberculosis and Escherichia coli, indicating the robustness of the PARMAP framework. In conclusion, PARMAP not only precisely predicts the AMR of a population of strains of a given species but also uses whole-genome sequencing data to prioritize candidate AMR-associated genetic alterations based on their likelihood of contributing to AMR. Thus, we believe that PARMAP will accelerate investigations into AMR mechanisms in other human pathogens.

RevDate: 2020-11-16

Yuan C, Wei Y, Zhang S, et al (2020)

Comparative Genomic Analysis Reveals Genetic Mechanisms of the Variety of Pathogenicity, Antibiotic Resistance, and Environmental Adaptation of Providencia Genus.

Frontiers in microbiology, 11:572642.

The bacterial genus Providencia comprises Gram-negative opportunistic pathogens, which have been isolated from a variety of environments and organisms, ranging from humans to animals. Providencia alcalifaciens, Providencia rettgeri, and Providencia stuartii are the most common clinical isolates; however, these three species differ in their pathogenicity, antibiotic resistance and environmental adaptation. Genomes of 91 isolates of the genus Providencia were investigated to clarify their genetic diversity, focusing on virulence factors, antibiotic resistance genes, and environmental adaptation genes. Our study revealed an open pan-genome for the genus Providencia containing 14,720 gene families. Species of the genus Providencia exhibited different functional constraints, with the core genes, accessory genes, and unique genes. A maximum-likelihood phylogeny reconstructed with concatenated single-copy core genes classified all Providencia isolates into 11 distant groups. Comprehensive and systematic comparative genomic analyses revealed that specific distributions of virulence genes, which were highly homologous to virulence genes of the genus Proteus, contributed to diversity in pathogenicity of Providencia alcalifaciens, Providencia rettgeri, and Providencia stuartii. Furthermore, multidrug resistance (MDR) phenotypes of isolates of Providencia rettgeri and Providencia stuartii were predominantly due to resistance genes from class 1 and 2 integrons. In addition, Providencia rettgeri and Providencia stuartii harbored more genes related to material transport and energy metabolism, which conferred a stronger ability to adapt to diverse environments. Overall, our study provided valuable insights into the genetic diversity and functional features of the genus Providencia, and revealed genetic mechanisms underlying diversity in pathogenicity, antibiotic resistance and environmental adaptation of members of this genus.

RevDate: 2020-11-13

Gao L, Koo DH, Juliana P, et al (2020)

The Aegilops ventricosa 2NvS segment in bread wheat: cytology, genomics and breeding.

TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik pii:10.1007/s00122-020-03712-y [Epub ahead of print].

KEY MESSAGE: The first cytological characterization of the 2NvS segment in hexaploid wheat; complete de novo assembly and annotation of 2NvS segment; 2NvS frequency is increasing and 2NvS is associated with higher yield. The Aegilops ventricosa 2NvS translocation segment has been utilized in breeding disease-resistant wheat crops since the early 1990s. This segment is known to possess several important resistance genes against multiple wheat diseases including root knot nematode, stripe rust, leaf rust and stem rust. More recently, this segment has been associated with resistance to wheat blast, an emerging and devastating wheat disease in South America and Asia. To date, full characterization of the segment including its size, gene content and its association with grain yield is lacking. Here, we present a complete cytological and physical characterization of this agronomically important translocation in bread wheat. We de novo assembled the 2NvS segment in two wheat varieties, 'Jagger' and 'CDC Stanley,' and delineated the segment to be approximately 33 Mb. A total of 535 high-confidence genes were annotated within the 2NvS region, with > 10% belonging to the nucleotide-binding leucine-rich repeat (NLR) gene families. Identification of groups of NLR genes that are potentially N genome-specific and expressed in specific tissues can fast-track testing of candidate genes playing roles in various disease resistances. We also show the increasing frequency of 2NvS among spring and winter wheat breeding programs over two and a half decades, and the positive impact of 2NvS on wheat grain yield based on historical datasets. The significance of the 2NvS segment in wheat breeding due to resistance to multiple diseases and a positive impact on yield highlights the importance of understanding and characterizing the wheat pan-genome for better insights into molecular breeding for wheat improvement.

RevDate: 2020-11-13

Piza-Buitrago A, Rincón V, Donato J, et al (2020)

Genome-based characterization of two Colombian clinical Providencia rettgeri isolates co-harboring NDM-1, VIM-2, and other β-lactamases.

BMC microbiology, 20(1):345 pii:10.1186/s12866-020-02030-z.

BACKGROUND: Providencia rettgeri is a nosocomial pathogen associated with urinary tract infections and related to Healthcare-Associated Infection (HAI). In recent years isolates producing New Delhi Metallo-β-lactamase (NDM) and other β-lactamases have been reported that reduce the efficiency of clinical antimicrobial treatments. In this study, we analyzed antibiotic resistance, the presence of resistance genes and the clonal relationship of two P. rettgeri isolates obtained from male patients admitted to the same hospital in Bogotá - Colombia, 2015.

RESULTS: Antibiotic susceptibility profile evaluated by the Kirby-Bauer method revealed that both isolates were resistant to third-generation carbapenems and cephalosporins. Whole-genome sequencing (Illumina HiSeq) followed by SPAdes assembling, Prokka annotation in combination with an in-house Python program and resistance gene detection by ResFinder identified the same six β-lactamase genes in both isolates: blaNDM-1, blaVIM-2, blaCTX-M-15, blaOXA-10, blaCMY-2 and blaTEM-1. Additionally, various resistance genes associated with antibiotic target alteration (arnA, PmrE, PmrF, LpxA, LpxC, gyrB, folP, murA, rpoB, rpsL, tet34) were found and four efflux pumps (RosAB, EmrD, mdtH and cmlA). The additional resistance to gentamicin in one of the two isolates could be explained by a detected SNP in CpxA (Cys191Arg) which is involved in the stress response of the bacterial envelope. Genome BLAST comparison using CGView, the ANI value (99.99%) and the pangenome (using Roary) phylogenetic tree (same clade, small distance) showed high similarity between the isolates. The rMLST analysis indicated that both isolates were typed as rST-61,696, same as the RB151 isolate previously isolated in Bucaramanga, Colombia, 2013, and the FDAARGOS_330 isolate isolated in the USA, 2015.

CONCLUSIONS: We report the coexistence of the carbapenemase genes blaNDM-1, and blaVIM-2, together with the β-lactamase genes blaCTX-M-15, blaOXA-10, blaCMY-2 and blaTEM-1, in P. rettgeri isolates from two patients in Colombia. Whole-genome sequence analysis indicated a circulation of P. rettgeri rST-61,696 strains in America that needs to be investigated further.

RevDate: 2020-11-11

Pandey A, Humbert MV, Jackson A, et al (2020)

Evidence of homologous recombination as a driver of diversity in Brachyspira pilosicoli.

Microbial genomics [Epub ahead of print].

The enteric, pathogenic spirochaete Brachyspira pilosicoli colonizes and infects a variety of birds and mammals, including humans. However, there is a paucity of genomic data available for this organism. This study introduces 12 newly sequenced draft genome assemblies, boosting the cohort of examined isolates by fourfold and cataloguing the intraspecific genomic diversity of the organism more comprehensively. We used several in silico techniques to define a core genome of 1751 genes and qualitatively and quantitatively examined the intraspecific species boundary using phylogenetic analysis and average nucleotide identity, before contextualizing this diversity against other members of the genus Brachyspira. Our study revealed that an additional isolate that was unable to be species typed against any other Brachyspira lacked putative virulence factors present in all other isolates. Finally, we quantified that homologous recombination has as great an effect on the evolution of the core genome of the B. pilosicoli as random mutation (r/m=1.02). Comparative genomics has informed Brachyspira diversity, population structure, host specificity and virulence. The data presented here can be used to contribute to developing advanced screening methods, diagnostic assays and prophylactic vaccines against this zoonotic pathogen.

RevDate: 2020-11-11

Lau BT, Pavlichin D, Hooker AC, et al (2020)

Profiling SARS-CoV-2 mutation fingerprints that range from the viral pangenome to individual infection quasispecies.

medRxiv : the preprint server for health sciences.

Background: The genome of SARS-CoV-2 is susceptible to mutations during viral replication due to the errors generated by RNA-dependent RNA polymerases. These mutations enable the SARS-CoV-2 to evolve into new strains. Viral quasispecies emerge from de novo mutations that occur in individual patients. In combination, these sets of viral mutations provide distinct genetic fingerprints that reveal the patterns of transmission and have utility in contact tracing.

Methods: Leveraging thousands of sequenced SARS-CoV-2 genomes, we performed a viral pangenome analysis to identify conserved genomic sequences. We used a rapid and highly efficient computational approach that relies on k-mers, short tracts of sequence, instead of conventional sequence alignment. Using this method, we annotated viral mutation signatures that were associated with specific strains. Based on these highly conserved viral sequences, we developed a rapid and highly scalable targeted sequencing assay to identify mutations, detect quasispecies and identify mutation signatures from patients. These results were compared to the pangenome genetic fingerprints.

Results: We built a k-mer index for thousands of SARS-CoV-2 genomes and identified conserved genomic regions and the landscape of mutations across thousands of virus genomes. We delineated mutation profiles spanning common genetic fingerprints (the combination of mutations in a viral assembly) and rare ones that occur in only a small fraction of patients. We developed a targeted sequencing assay by selecting primers from the conserved viral genome regions to flank frequent mutations. Using a cohort of SARS-CoV-2 clinical samples, we identified genetic fingerprints consisting of strain-specific mutations seen across populations and de novo quasispecies mutations localized to individual infections. We compared the mutation profiles of viral samples undergoing analysis with the features of the pangenome.

Conclusions: We conducted an analysis for viral mutation profiles that provide the basis of genetic fingerprints. Our study linked pangenome analysis with targeted deep sequenced SARS-CoV-2 clinical samples. We identified quasispecies mutations occurring within individual patients, mutations demarcating dominant species and the prevalence of mutation signatures, of which a significant number were relatively unique. Analysis of these genetic fingerprints may provide a way of conducting molecular contact tracing.

RevDate: 2020-11-11

Drijver EPMD, Stohr JJJM, Verweij JJ, et al (2020)

Limited Genetic Diversity of blaCMY-2-Containing IncI1-pST12 Plasmids from Enterobacteriaceae of Human and Broiler Chicken Origin in The Netherlands.

Microorganisms, 8(11): pii:microorganisms8111755.

Distinguishing epidemiologically related and unrelated plasmids is essential to confirm plasmid transmission. We compared IncI1-pST12 plasmids from both human and livestock origin and explored the degree of sequence similarity between plasmids from Enterobacteriaceae with different epidemiological links. Short-read sequence data of Enterobacteriaceae cultured from humans and broilers were screened for the presence of both a blaCMY-2 gene and an IncI1-pST12 replicon. Isolates were long-read sequenced on a MinION sequencer (Oxford Nanopore Technologies). After plasmid reconstruction using hybrid assembly, pairwise single nucleotide polymorphisms (SNPs) were determined. The plasmids were annotated, and a pan-genome was constructed to compare genes variably present between the different plasmids. Nine Escherichia coli sequences of broiler origin, four Escherichia coli sequences, and one Salmonella enterica sequence of human origin were selected for the current analysis. A circular contig with the IncI1-pST12 replicon and blaCMY-2 gene was extracted from the assembly graph of all fourteen isolates. Analysis of the IncI1-pST12 plasmids revealed a low number of SNP differences (range of 0-9 SNPs). The range of SNP differences overlapped in isolates with different epidemiological links. One-hundred and twelve from a total of 113 genes of the pan-genome were present in all plasmid constructs. Next generation sequencing analysis of blaCMY-2-containing IncI1-pST12 plasmids isolated from Enterobacteriaceae with different epidemiological links shows a high degree of sequence similarity in terms of SNP differences and the number of shared genes. Therefore, statements on the horizontal transfer of these plasmids based on genetic identity should be made with caution.

RevDate: 2020-11-10

Gerdol M, Moreira R, Cruz F, et al (2020)

Massive gene presence-absence variation shapes an open pan-genome in the Mediterranean mussel.

Genome biology, 21(1):275 pii:10.1186/s13059-020-02180-3.

BACKGROUND: The Mediterranean mussel Mytilus galloprovincialis is an ecologically and economically relevant edible marine bivalve, highly invasive and resilient to biotic and abiotic stressors causing recurrent massive mortalities in other bivalves. Although these traits have been recently linked with the maintenance of a high genetic variation within natural populations, the factors underlying the evolutionary success of this species remain unclear.

RESULTS: Here, after the assembly of a 1.28-Gb reference genome and the resequencing of 14 individuals from two independent populations, we reveal a complex pan-genomic architecture in M. galloprovincialis, with a core set of 45,000 genes plus a strikingly high number of dispensable genes (20,000) subject to presence-absence variation, which may be entirely missing in several individuals. We show that dispensable genes are associated with hemizygous genomic regions affected by structural variants, which overall account for nearly 580 Mb of DNA sequence not included in the reference genome assembly. As such, this is the first study to report the widespread occurrence of gene presence-absence variation at a whole-genome scale in the animal kingdom.

CONCLUSIONS: Dispensable genes usually belong to young and recently expanded gene families enriched in survival functions, which might be the key to explain the resilience and invasiveness of this species. This unique pan-genome architecture is characterized by dispensable genes in accessory genomic regions that exceed by orders of magnitude those observed in other metazoans, including humans, and closely mirror the open pan-genomes found in prokaryotes and in a few non-metazoan eukaryotes.

RevDate: 2020-11-10

Vasilyev IY, Nikolaeva IV, Siniagina MN, et al (2020)

Multidrug-Resistant Hypervirulent Klebsiella pneumoniae Found Persisting Silently in Infant Gut Microbiota.

International journal of microbiology, 2020:4054393.

Since the spread of multidrug-resistant Klebsiella pneumoniae (MDRKP) strains is considered a challenge for patients with weakened or suppressed immunity, the emergence of isolates that additionally carry determinants of hypervirulent phenotypes may become a serious problem even for healthy individuals. The aim of this study is an investigation of the nonoutbreak K. pneumoniae emergence that occurred in early 2017 at a maternity hospital of Kazan, Russia. Ten bacterial isolates demonstrating multiple drug resistance phenotypes were collected from eight healthy full-term breastfed neonates observed at the maternity hospital. All the infants and their mothers were discharged without symptoms or complaints, in a satisfactory condition. Whole-genome shotgun (WGS) sequencing was performed to track down the possible source(s) of spread and to obtain detailed information about resistance determinants and the pathogenic potential of the collected isolates. Microdilution tests confirmed production of extended-spectrum β-lactamases (ESBL) and resistance of the isolates to aminoglycoside, β-lactam, fluoroquinolone, sulfonamide, nitrofurantoin, trimethoprim, and fosfomycin antibiotics and Klebsiella phage. The WGS analysis revealed genes conferring resistance to aminoglycosides, fluoroquinolones, macrolides, sulfonamides, chloramphenicols, tetracyclines, and trimethoprim, as well as ESBL determinants. The pangenome analysis split the isolates into two phylogenetic clades. The first group, a more heterogeneous clade, was represented by 5 isolates with 4 different in silico multilocus sequence types (MLSTs). The second group contained 5 isolates from infants born vaginally with the single MLST ST23, positive for genes corresponding to hypervirulent phenotypes: yersiniabactin, aerobactin, salmochelin, colibactin, hypermucoid determinants, and specific alleles of K- and O-antigens. The source of the MDRKP spread was not defined. Infected infants have shown no developed disease symptoms.

RevDate: 2020-11-06

Lugli GA, Tarracchini C, Alessandri G, et al (2020)

Decoding the Genomic Variability among Members of the Bifidobacterium dentium Species.

Microorganisms, 8(11): pii:microorganisms8111720.

Members of the Bifidobacterium dentium species are usually identified in the oral cavity of humans and associated with the development of plaque and dental caries. Nevertheless, they have also been detected from fecal samples, highlighting a widespread distribution among mammals. To explore the genetic variability of this species, we isolated and sequenced the genomes of 18 different B. dentium strains collected from fecal samples of several primate species and an Ursus arctos. Thus, we investigated the genomic variability and metabolic abilities of the new B. dentium isolates together with 20 public genome sequences. Comparative genomic analyses provided insights into the vast metabolic repertoire of the species, highlighting 19 glycosyl hydrolases families shared between each analyzed strain. Phylogenetic analysis of the B. dentium taxon, involving 1140 conserved genes, revealed a very close phylogenetic relatedness among members of this species. Furthermore, low genomic variability between strains was also confirmed by an average nucleotide identity analysis showing values higher than 98.2%. Investigating the genetic features of each strain, few putative functional mobile elements were identified. Besides, a consistent occurrence of defense mechanisms such as CRISPR-Cas and restriction-modification systems may be responsible for the high genome synteny identified among members of this taxon.

RevDate: 2020-11-06

Dahlhausen KE, Jospin G, Coil DA, et al (2020)

Isolation and sequence-based characterization of a koala symbiont: Lonepinella koalarum.

PeerJ, 8:e10177.

Koalas (Phascolarctos cinereus) are highly specialized herbivorous marsupials that feed almost exclusively on Eucalyptus leaves, which are known to contain varying concentrations of many different toxic chemical compounds. The literature suggests that Lonepinella koalarum, a bacterium in the Pasteurellaceae family, can break down some of these toxic chemical compounds. Furthermore, in a previous study, we identified L. koalarum as the most predictive taxon of koala survival during antibiotic treatment. Therefore, we believe that this bacterium may be important for koala health. Here, we isolated a strain of L. koalarum from a healthy koala female and sequenced its genome using a combination of short-read and long-read sequencing. We placed the genome assembly into a phylogenetic tree based on 120 genome markers using the Genome Taxonomy Database (GTDB), which currently does not include any L. koalarum assemblies. Our genome assembly fell in the middle of a group of Haemophilus, Pasteurella and Basfia species. According to average nucleotide identity and a 16S rRNA gene tree, the closest relative of our isolate is L. koalarum strain Y17189. Then, we annotated the gene sequences and compared them to 55 closely related, publicly available genomes. Several genes that are known to be involved in carbohydrate metabolism could exclusively be found in L. koalarum relative to the other taxa in the pangenome, including glycoside hydrolase families GH2, GH31, GH32, GH43 and GH77. Among the predicted genes of L. koalarum were 79 candidates putatively involved in the degradation of plant secondary metabolites. Additionally, several genes coding for amino acid variants were found that had been shown to confer antibiotic resistance in other bacterial species against pulvomycin, beta-lactam antibiotics and the antibiotic efflux pump KpnH. In summary, this genetic characterization allows us to build hypotheses to explore the potentially beneficial role that L. koalarum might play in the koala intestinal microbiome. Characterizing and understanding beneficial symbionts at the whole genome level is important for the development of anti- and probiotic treatments for koalas, a highly threatened species due to habitat loss, wildfires, and high prevalence of Chlamydia infections.

RevDate: 2020-11-04

Kim E, Cho EJ, Yang SM, et al (2020)

Identification and monitoring of Lactobacillus delbrueckii subspecies using pangenomic-based novel genetic markers.

Journal of microbiology and biotechnology pii:jmb.2009.09034 [Epub ahead of print].

Genetic markers currently used for the discrimination of Lactobacillus delbrueckii subspecies have low efficiency for identification at subspecies level. Therefore, the objective of this study was to select novel genetic markers for accurate identification and discrimination of six L. delbrueckii subspecies based on pangenome analysis. This study evaluated L. delbrueckii genomes to avoid making incorrect conclusions in the process of selecting genetic markers due to mislabeled genome. Genome analysis showed that two genomes of L. delbrueckii subspecies deposited in NCBI were misidentified. Based on these results, subspecies-specific genetic markers were selected by comparing pan and core-genome. Genetic markers were confirmed to be specific for 59,196,562 genome sequences via in silico analysis. They were found in all strains of the same subspecies, but not in other subspecies or bacterial strains. These genetic markers also could be used to accurately identify genomes at the subspecies level for genomes known at the species level. A real-time PCR method for the detection of three main subspecies (L. delbrueckii subsp. delbrueckii, lactis, and bulgaricus) was developed to cost-effectively identify them using genetic markers. Results showed 100% specificity for each subspecies. These genetic markers could differentiate each subspecies from 44 other lactic acid bacteria. This real-time PCR method was then applied to monitor 26 probiotics and dairy products. It was also used to identify 64 unknown strains isolated from raw milk samples and dairy products. Results confirmed that unknown isolates and subspecies contained in the product could be accurately identified using this real-time PCR method.

RevDate: 2020-11-01

Rogalski E, Ehrmann MA, RF Vogel (2020)

Intraspecies diversity and genome-phenotype-associations in Fructilactobacillus sanfranciscensis.

Microbiological research pii:S0944-5013(20)30493-6 [Epub ahead of print].

In this study the intraspecies diversity of Fructilactobacillus (F.) sanfranciscensis (formerly Lactobacillus sanfranciscensis) was characterized by comparative genomics supported by physiological data. Twenty-four strains of F. sanfranciscensis were analyzed and sorted into six different genomic clusters. The core genome comprised only 43.14% of the pan genome, i.e. 0.87 Mbp of 2.04 Mbp. The main annotated genomic differences reside in maltose, fructose and sucrose as well as nucleotide metabolism, use of electron acceptors, and exopolysaccharide formation. Furthermore, all strains are well equipped to cope with oxidative stress via NADH oxidase and a distinct thiol metabolism. Only ten of 24 genomes contain two maltose phosphorylase genes (mapA and mapB). In F. sanfranciscensis TMW 1.897 only mapA was found. All strains except those from genomic cluster 2 contained the mannitol dehydrogenase and should therefore be able to use fructose as external electron acceptor. Moreover, six strains were able to grow on fructose as sole carbon source, as they contained a functional fructokinase gene. No growth was observed on pentoses, i.e. xylose, arabinose or ribose, as sole carbon source. This can be attributed to the absence of ribose pyranase rbsD in all genomes, and absence of or mutations in numerous other genes, which are essential for arabinose and xylose metabolism. Seven strains were able to produce exopolysaccharides (EPS) from sucrose. In addition, the strains containing levS were able to grow on sucrose as sole carbon source. Strains of one cluster exhibit auxotrophies for purine nucleotides. The physiological and genomic analyses suggest that the biodiversity of F. sanfranciscensis is larger than anticipated. Consequently, "original" habitats and lifestyles of F. sanfranciscensis may vary but can generally be attributed to an adaptation to sugary (maltose/sucrose/fructose-rich) and aerobic environments as found in plants and insects. It can dominate sourdoughs as a result of reductive evolution and cooperation with fructose-delivering, acetate-tolerant yeasts.

RevDate: 2020-10-31

Huang WC, Hu Y, Zhang G, et al (2020)

Comparative genomic analysis reveals metabolic diversity of different Paenibacillus groups.

Applied microbiology and biotechnology pii:10.1007/s00253-020-10984-3 [Epub ahead of print].

The genus Paenibacillus was originally recognized based on the 16S rRNA gene phylogeny. Recently, a standardized bacterial taxonomy approach based on a genome phylogeny has substantially revised the classification of Paenibacillus, dividing it into 23 genera. However, the metabolic differences among these groups remain undescribed. Here, genomes of 41 Paenibacillus strains comprising 25 species were sequenced, and a comparative genomic analysis was performed considering these and 187 publicly available Paenibacillus genomes to understand their phylogeny and metabolic differences. Phylogenetic analysis indicated that Paenibacillus clustered into 10 subgroups. Core genome and pan-genome analyses revealed similar functional categories among the different Paenibacillus subgroups; however, each group tended to harbor specific gene families. A large proportion of genes in the subgroups A, E, and G are related to carbohydrate metabolism. Among them, genes related to the glycoside hydrolase family were most abundant. Metabolic reconstruction of the newly sequenced genomes showed that the Embden-Meyerhof-Parnas pathway, pentose phosphate pathway, and citric acid cycle are central pathways of carbohydrate metabolism in Paenibacillus. Further, the genomes of the subgroups A and G lack genes involved in glyoxylate cycle and D-galacturonate degradation, respectively. The current study revealed the metabolic diversity of Paenibacillus subgroups assigned based on a genomic phylogeny and could inform the taxonomy of Paenibacillus. KEY POINTS: • Paenibacillus clustered into 10 subgroups. • Genomic content variation and metabolic diversity in the subgroup A, E, and G were described. • Carbohydrate transport and metabolism is important for Paenibacillus survival.

RevDate: 2020-11-07

Zukancic A, Khan MA, Gurmen SJ, et al (2020)

Staphylococcal Protein A (spa) Locus Is a Hot Spot for Recombination and Horizontal Gene Transfer in Staphylococcus pseudintermedius.

mSphere, 5(5):.

Staphylococcus pseudintermedius is a major canine pathogen but also occasionally colonizes and infects humans. Multidrug-resistant methicillin-resistant S. pseudintermedius (MDR MRSP) strains have emerged globally, making treatment and control of this pathogen challenging. Sequence type 71 (ST71), ST68, and ST45 are the most widespread and successful MDR MRSP clones. The potential genetic factors underlying the clonal success of these and other predominant clones remain unknown. Characterization of the pangenome, lineage-associated accessory genes, and genes acquired through horizontal gene transfer from other bacteria is important for identifying such factors. Here, we analyzed genome sequence data from 622 S. pseudintermedius isolates to investigate the evolution of pathogenicity across lineages. We show that the predominant clones carry one or more lineage-associated virulence genes. The gene encoding staphylococcal protein A (SpA), a key virulence factor involved in immune evasion and a potential vaccine antigen, is deleted in 62% of isolates. Most importantly, we have discovered that the spa locus is a hot spot for recombination and horizontal gene transfer in S. pseudintermedius, where genes related to restriction modification, prophage immunity, mercury resistance, and nucleotide and carbohydrate metabolism have been acquired in different lineages. Our study also establishes that ST45 is composed of two distinct sublineages that differ in their accessory gene content and virulence potential. Collectively, this study reports several previously undetected lineage-associated genetic factors that may have a role in the clonal success of the major MDR MRSP clones. These data provide a framework for future experimental studies on S. pseudintermedius pathogenesis and for developing novel therapeutics against this pathogen.

IMPORTANCE: Staphylococcus pseudintermedius is a major canine pathogen but can also occasionally infect humans. Identification of genetic factors contributing to the virulence and clonal success of multidrug-resistant S. pseudintermedius clones is critical for the development of therapeutics against this pathogen. Here, we characterized the genome sequences of a global collection of 622 S. pseudintermedius isolates. We show that all major clones, besides carrying core virulence genes, which are present in all strains, carry one or more lineage-specific genes. Many of these genes have been acquired from other bacterial species through a horizontal gene transfer mechanism. Importantly, we have discovered that the staphylococcal protein A gene (spa), a widely used marker for molecular typing of S. pseudintermedius strains and a potential vaccine candidate antigen, is deleted in 62% of strains. Furthermore, the spa locus in S. pseudintermedius acts as a reservoir to accumulate lineage-associated genes with adaptive functions.

RevDate: 2020-11-10

Ding Y, Weckwerth PR, Poretsky E, et al (2020)

Genetic elucidation of interconnected antibiotic pathways mediating maize innate immunity.

Nature plants, 6(11):1375-1388.

Specialized metabolites constitute key layers of immunity that underlie disease resistance in crops; however, challenges in resolving pathways limit our understanding of the functions and applications of these metabolites. In maize (Zea mays), the inducible accumulation of acidic terpenoids is increasingly considered to be a defence mechanism that contributes to disease resistance. Here, to understand maize antibiotic biosynthesis, we integrated association mapping, pan-genome multi-omic correlations, enzyme structure-function studies and targeted mutagenesis. We define ten genes in three zealexin (Zx) gene clusters that encode four sesquiterpene synthases and six cytochrome P450 proteins that collectively drive the production of diverse antibiotic cocktails. Quadruple mutants in which the ability to produce zealexins (ZXs) is blocked exhibit a broad-spectrum loss of disease resistance. Genetic redundancies ensuring pathway resiliency to single null mutations are combined with enzyme substrate promiscuity, creating a biosynthetic hourglass pathway that uses diverse substrates and in vivo combinatorial chemistry to yield complex antibiotic blends. The elucidated genetic basis of biochemical phenotypes that underlie disease resistance demonstrates a predominant maize defence pathway and informs innovative strategies for transferring chemical immunity between crops.

RevDate: 2020-10-27

Slizen MV, OV Galzitskaya (2020)

Comparative Analysis of Proteomes of a Number of Nosocomial Pathogens by KEGG Modules and KEGG Pathways.

International journal of molecular sciences, 21(21): pii:ijms21217839.

Nosocomial (hospital-acquired) infections remain a serious challenge for health systems. The reason for this lies not only in the local imperfection of medical practices and protocols. The frequency of infection with antibiotic-resistant strains of bacteria is growing every year, both in developed and developing countries. In this work, a pangenome and comparative analysis of 201 genomes of Staphylococcus aureus, Enterobacter spp., Pseudomonas aeruginosa, and Mycoplasma spp. was performed on the basis of high-level functional annotations-KEGG pathways and KEGG modules. The first three organisms are serious nosocomial pathogens, often exhibiting multidrug resistance. Analysis of KEGG modules revealed methicillin resistance in 25% of S. aureus strains and resistance to carbapenems in 21% of Enterobacter spp. strains. P. aeruginosa has a wide range of unique efflux systems. One hundred percent of the analyzed strains have at least two drug resistance systems, and 75% of the strains have seven. Each of the organisms has a characteristic set of metabolic features, whose impact on drug resistance can be considered in future studies. Comparing the genomes of nosocomial pathogens with each other and with Mycoplasma genomes can expand our understanding of the versatility of certain metabolic features and mechanisms of drug resistance.

RevDate: 2020-10-26

Zou W, Ye G, Zhang K, et al (2020)

Analysis of the core genome and pangenome of Clostridium butyricum.

Genome [Epub ahead of print].

Clostridium butyricum is an anaerobic bacterium that inhabits broad niches. Clostridium butyricum is known for its production of butyrate, 1,3-propanediol, and hydrogen. This study aimed to present a comparative pan-genome analysis of 24 strains isolated from different niches. We sequenced and annotated the genome of C. butyricum 3-3 isolated from the Chinese baijiu ecosystem. The pan-genome of C. butyricum was open. The core genome, accessory genome, and strain-specific genes comprised 1,011, 4,543, and 1,473 genes, respectively. In the core genome, carbohydrate metabolism was the largest category, and genes in the biosynthetic pathway of butyrate and glycerol metabolism were conserved (in the core or soft-core genome). Furthermore, the 1,3-propanediol operon existed in 20 strains. In the accessory genome, numerous mobile genetic elements belonging to the replication, recombination, and repair (L) category were identified. In addition, genome islands were identified in all 24 strains, ranging from 2 (strain KNU-L09) to 53 (strain SU1), and phage sequences were found in 17 of the 24 strains. This study provides an important genomic framework that could pave the way for the exploration of C. butyricum and future studies on the genetic diversification of C. butyricum.
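The split into core genome, accessory genome, and strain-specific genes reported in this abstract can be sketched from gene presence/absence data. The following is a minimal illustration, assuming genes have already been clustered into gene families and each strain is represented as a set of family identifiers; the function name, the strict "present in all strains" core threshold, and the toy data are assumptions and are not taken from the study (which also mentions a soft-core category not modeled here).

```python
from typing import Dict, Set, Tuple


def partition_pangenome(
    gene_sets: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split a pan-genome into (core, accessory, strain-specific) gene sets.

    core:            genes present in every strain
    strain-specific: genes present in exactly one strain
    accessory:       every other gene (present in 2 .. n-1 strains)
    """
    if not gene_sets:
        return set(), set(), set()

    n_strains = len(gene_sets)
    # Count in how many strains each gene occurs.
    counts: Dict[str, int] = {}
    for genes in gene_sets.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1

    core = {g for g, c in counts.items() if c == n_strains}
    # With a single strain every gene is core, so nothing is strain-specific.
    unique = {g for g, c in counts.items() if c == 1 and n_strains > 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Toy presence/absence data (hypothetical strains and gene families).
toy = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2", "g4"},
    "strainC": {"g1", "g3", "g5"},
}
core, accessory, unique = partition_pangenome(toy)
pan_size = len(core) + len(accessory) + len(unique)
```

The three sets are disjoint and together make up the pan-genome, so their sizes can be summed as in the abstract's reported counts. Published pipelines usually relax the core threshold (for example to 95% or 99% of strains) to tolerate assembly gaps.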

RevDate: 2020-10-17

Song JM, Liu DX, Xie WZ, et al (2020)

BnPIR: Brassica napus Pan-genome Information Resource for 1,689 accessions.

Plant biotechnology journal [Epub ahead of print].

Brassica napus (B. napus) was originally formed ~7,500 years ago by interspecific hybridization between B. rapa and B. oleracea (Chalhoub et al., 2014) and supplies approximately 13%-16% of the vegetable oil globally. B. napus serves as an excellent model for polyploid genomics and evolutionary research in plants. The Brassica database (BRAD), which provides a genome browser and syntenic relationships for multiple Brassicaceae genomes, has long been used for rapeseed genomic research (Wang et al., 2015).

RevDate: 2020-10-23

Li H, Feng X, C Chu (2020)

The design and construction of reference pangenome graphs with minigraph.

Genome biology, 21(1):265.

The recent advances in sequencing technologies enable the assembly of individual genomes to the quality of the reference genome. How to integrate multiple genomes from the same species and make the integrated representation accessible to biologists remains an open challenge. Here, we propose a graph-based data model and associated formats to represent multiple genomes while preserving the coordinate of the linear reference genome. We implement our ideas in the minigraph toolkit and demonstrate that we can efficiently construct a pangenome graph and compactly encode tens of thousands of structural variants missing from the current reference genome.

RevDate: 2020-10-16

De Filippis F, Pasolli E, D Ercolini (2020)

Newly Explored Faecalibacterium Diversity Is Connected to Age, Lifestyle, Geography, and Disease.

Current biology : CB pii:S0960-9822(20)31433-0 [Epub ahead of print].

Faecalibacterium is prevalent in the human gut and a promising microbe for the development of next-generation probiotics (NGPs) or biotherapeutics. Analyzing reference Faecalibacterium genomes and almost 3,000 Faecalibacterium-like metagenome-assembled genomes (MAGs) reconstructed from 7,907 human and 203 non-human primate gut metagenomes, we identified the presence of 22 different Faecalibacterium-like species-level genome bins (SGBs), some further divided in different strains according to the subject geographical origin. Twelve SGBs are globally spread in the human gut and show different genomic potential in the utilization of complex polysaccharides, suggesting that higher SGB diversity may be related with increased utilization of plant-based foods. Moreover, up to 11 different species may co-occur in the same subject, with lower diversity in Western populations, as well as intestinal inflammatory states and obesity. The newly explored Faecalibacterium diversity will be able to support the choice of strains suitable as NGPs, guided by the consideration of the differences existing in their functional potential.

RevDate: 2020-11-03

Zhou Z, Charlesworth J, M Achtman (2020)

Accurate reconstruction of bacterial pan- and core genomes with PEPPAN.

Genome research, 30(11):1667-1679.

Bacterial genomes can contain traces of a complex evolutionary history, including extensive homologous recombination, gene loss, gene duplications, and horizontal gene transfer. To reconstruct the phylogenetic and population history of a set of multiple bacteria, it is necessary to examine their pangenome, the composite of all the genes in the set. Here we introduce PEPPAN, a novel pipeline that can reliably construct pangenomes from thousands of genetically diverse bacterial genomes that represent the diversity of an entire genus. PEPPAN outperforms existing pangenome methods by providing consistent gene and pseudogene annotations extended by similarity-based gene predictions, and identifying and excluding paralogs by combining tree- and synteny-based approaches. The PEPPAN package additionally includes PEPPAN_parser, which implements additional downstream analyses, including the calculation of trees based on accessory gene content or allelic differences between core genes. To test the accuracy of PEPPAN, we implemented SimPan, a novel pipeline for simulating the evolution of bacterial pangenomes. We compared the accuracy and speed of PEPPAN with four state-of-the-art pangenome pipelines using both empirical and simulated data sets. PEPPAN was more accurate and more specific than any of the other pipelines and was almost as fast as any of them. As a case study, we used PEPPAN to construct a pangenome of approximately 40,000 genes from 3052 representative genomes spanning at least 80 species of Streptococcus. The resulting gene and allelic trees provide an unprecedented overview of the genomic diversity of the entire Streptococcus genus.

RevDate: 2020-11-10

Kumar R, Register K, Christopher-Hennings J, et al (2020)

Population Genomic Analysis of Mycoplasma bovis Elucidates Geographical Variations and Genes associated with Host-Types.

Microorganisms, 8(10):.

Among more than twenty species belonging to the class Mollicutes, Mycoplasma bovis is the most common cause of bovine mycoplasmosis in North America and Europe. Bovine mycoplasmosis causes significant economic loss in the cattle industry. The number of M. bovis positive herds recently has increased in North America and Europe. Since antibiotic treatment is ineffective and no efficient vaccine is available, M. bovis induced mycoplasmosis is primarily controlled by herd management measures such as the restriction of moving infected animals out of the herds and culling of infected animals or shedders of M. bovis. To better understand the population structure and genomic factors that may contribute to its transmission, we sequenced 147 M. bovis strains isolated from four different countries viz. USA (n = 121), Canada (n = 22), Israel (n = 3) and Lithuania (n = 1). All except two of the isolates (KRB1 and KRB8) were isolated from two host types i.e., bovine (n = 75) and bison (n = 70). We performed a large-scale comparative analysis of M. bovis genomes by integrating 103 publicly available genomes and our dataset (250 total genomes). Whole genome single nucleotide polymorphism (SNP) based phylogeny using M. agalactiae as an outgroup revealed that M. bovis population structure is composed of five different clades. USA isolates showed a high degree of genomic divergence in comparison to the Australian isolates. Based on host of origin, all the isolates in clade IV were of bovine origin, whereas the majority of the isolates in clades III and V were of bison origin. Our comparative genome analysis also revealed that M. bovis has an open pangenome with a large breadth of unexplored diversity of genes. The function based analysis of autogenous vaccine candidates (n = 10) included in this study revealed that their functional diversity does not span the genomic diversity observed in all five clades identified in this study. Our study also found that the M. bovis genome harbors a large number of IS elements and their number increases significantly (p = 7.8 × 10⁻⁶) as the genome size increases. Collectively, the genome data and the whole genome-based population analysis in this study may help to develop better understanding of M. bovis induced mycoplasmosis in cattle.

RevDate: 2020-10-11

Eizenga JM, Novak AM, Kobayashi E, et al (2020)

Efficient dynamic variation graphs.

Bioinformatics (Oxford, England) pii:5872523 [Epub ahead of print].

MOTIVATION: Pangenomics is a growing field within computational genomics. Many pangenomic analyses use bidirected sequence graphs as their core data model. However, implementing and correctly using this data model can be difficult, and the scale of pangenomic datasets can be challenging to work at. These challenges have impeded progress in this field.

RESULTS: Here, we present a stack of two C++ libraries, libbdsg and libhandlegraph, which use a simple, field-proven interface, designed to expose elementary features of these graphs while preventing common graph manipulation mistakes. The libraries also provide a Python binding. Using a diverse collection of pangenome graphs, we demonstrate that these tools allow for efficient construction and manipulation of large genome graphs with dense variation. For instance, the speed and memory usage are up to an order of magnitude better than the prior graph implementation in the VG toolkit, which has now transitioned to using libbdsg's implementations.

libhandlegraph and libbdsg are available under an MIT License from and
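The bidirected sequence graph data model mentioned in this abstract can be illustrated with a small toy structure. The sketch below is written in plain Python and does not use, or reproduce the API of, libbdsg or libhandlegraph; the class name `ToyBidirectedGraph`, its methods, and the example sequences are all hypothetical. It only shows the core idea: each node can be traversed in forward or reverse-complement orientation, and every edge is valid in both reading directions.

```python
from typing import Dict, List, Set, Tuple

# A "handle" here is a node id plus an orientation flag (True = reverse strand).
Handle = Tuple[int, bool]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


class ToyBidirectedGraph:
    """Minimal bidirected sequence graph (illustrative only)."""

    def __init__(self) -> None:
        self.seqs: Dict[int, str] = {}
        self.edges: Set[Tuple[Handle, Handle]] = set()

    def add_node(self, node_id: int, seq: str) -> Handle:
        """Store a node's forward-strand sequence and return its forward handle."""
        self.seqs[node_id] = seq
        return (node_id, False)

    @staticmethod
    def flip(handle: Handle) -> Handle:
        """Return the same node in the opposite orientation."""
        return (handle[0], not handle[1])

    def add_edge(self, a: Handle, b: Handle) -> None:
        """Add edge a -> b; reading it backwards gives flip(b) -> flip(a)."""
        self.edges.add((a, b))
        self.edges.add((self.flip(b), self.flip(a)))

    def sequence(self, handle: Handle) -> str:
        """Sequence of a node as read in the handle's orientation."""
        seq = self.seqs[handle[0]]
        return revcomp(seq) if handle[1] else seq

    def spell(self, walk: List[Handle]) -> str:
        """Concatenate sequences along a walk, checking that each step is an edge."""
        for a, b in zip(walk, walk[1:]):
            if (a, b) not in self.edges:
                raise ValueError(f"no edge between {a} and {b}")
        return "".join(self.sequence(h) for h in walk)


graph = ToyBidirectedGraph()
n1 = graph.add_node(1, "GAT")
n2 = graph.add_node(2, "TA")
n3 = graph.add_node(3, "CA")
graph.add_edge(n1, n2)
graph.add_edge(n2, n3)

forward = graph.spell([n1, n2, n3])
backward = graph.spell([graph.flip(n3), graph.flip(n2), graph.flip(n1)])
```

Walking the same path in the opposite direction spells the reverse complement of the forward walk, which is the property that lets one graph represent both DNA strands. Production libraries add compact storage, path indexing, and safe mutation on top of this basic model.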

RevDate: 2020-10-10

Kumar J, D Sen Gupta (2020)

Prospects of next generation sequencing in lentil breeding.

Molecular biology reports pii:10.1007/s11033-020-05891-9 [Epub ahead of print].

Lentil is an important food legume crop that has a large and complex genome. In recent years, considerable attention has been given to the use of next generation sequencing for enriching genomic resources, including identification of SSR and SNP markers, development of unigenes and transcripts, identification of candidate genes for biotic and abiotic stresses, analysis of genetic diversity, and identification of genes/QTLs for agronomically important traits. However, in other crops including pulses, next generation sequencing has revolutionized genomic research and enabled rapid and cost-effective genomics-assisted breeding. The present review discusses the current status and future prospects of the use of NGS-based breeding in lentil.

RevDate: 2020-10-29

Muthuirulandi Sethuvel DP, Mutreja A, Pragasam AK, et al (2020)

Phylogenetic and Evolutionary Analysis Reveals the Recent Dominance of Ciprofloxacin-Resistant Shigella sonnei and Local Persistence of S. flexneri Clones in India.

mSphere, 5(5):.

Shigella is the second leading cause of bacterial diarrhea worldwide. Recently, Shigella sonnei seems to be replacing Shigella flexneri in low- and middle-income countries undergoing economic development. Despite this, studies focusing on these species at the genomic level remain largely unexplored. Here, we compared the genome sequences of S. flexneri and S. sonnei isolates from India with the publicly available genomes of global strains. Our analysis provides evidence for the long-term persistence of all phylogenetic groups (PGs) of S. flexneri and the recent dominance of the ciprofloxacin-resistant S. sonnei lineage in India. Within S. flexneri PGs, the majority of the study isolates belonged to PG3, with a predominance of serotype 2. For S. sonnei, the current pandemic involves globally distributed multidrug-resistant (MDR) clones that belong to Central Asia lineage III. The presence of such epidemiologically dominant lineages in association with stable antimicrobial resistance (AMR) determinants results in successful survival in the community.

IMPORTANCE: Shigella is the second leading cause of bacterial diarrhea worldwide. This has been categorized as a priority pathogen among enteric bacteria by the Global Antimicrobial Resistance Surveillance System (GLASS) of the World Health Organization (WHO). Recently, S. sonnei seems to be replacing S. flexneri in low- and middle-income countries undergoing economic development. Antimicrobial resistance in S. flexneri and S. sonnei is a growing international concern, specifically with the international dominance of the multidrug-resistant (MDR) lineage. Genomic studies focusing on S. flexneri and S. sonnei in India remain largely unexplored. This study provides information on the introduction and expansion of drug-resistant Shigella strains in India for the first time by comparing the genome sequences of S. flexneri and S. sonnei isolates from India with the publicly available genomes of global strains. The study discusses the key differences between the two dominant species of Shigella at the genomic level to understand the evolutionary trends and genome dynamics of emerging and existing resistance clones. The present work demonstrates evidence for the long-term persistence of all PGs of S. flexneri and the recent dominance of a ciprofloxacin-resistant S. sonnei lineage in India.

RevDate: 2020-10-07

Khilyas IV, Sorokina AV, Markelova MI, et al (2020)

Genomic and phenotypic analysis of siderophore-producing Rhodococcus qingshengii strain S10 isolated from an arid weathered serpentine rock environment.

Archives of microbiology pii:10.1007/s00203-020-02057-w [Epub ahead of print].

The success of members of the genus Rhodococcus in colonizing arid rocky environments is owed in part to desiccation tolerance and an ability to extract iron through the secretion and uptake of siderophores. Here, we report a comprehensive genomic and taxonomic analysis of Rhodococcus qingshengii strain S10 isolated from weathered serpentine rock at the arid Khalilovsky massif, Russia. Sequence comparisons of whole genomes and of selected marker genes clearly showed strain S10 to belong to the R. qingshengii species. Four prophage sequences within the R. qingshengii S10 genome were identified, one of which encodes a putative siderophore-interacting protein. Among the ten non-ribosomal peptide synthase (NRPS) clusters identified in the strain S10 genome, two show high homology to those responsible for siderophore synthesis. Phenotypic analyses demonstrated that R. qingshengii S10 secretes siderophores and possesses adaptive features (tolerance of up to 8% NaCl and pH 9) that should enable survival in its native habitat within dry serpentine rock.

RevDate: 2020-10-09

Sonnenberg CB, Kahlke T, P Haugen (2020)

Vibrionaceae core, shell and cloud genes are non-randomly distributed on Chr 1: An hypothesis that links the genomic location of genes with their intracellular placement.

BMC genomics, 21(1):695.

BACKGROUND: The genome of Vibrionaceae bacteria, which consists of two circular chromosomes, is replicated in a highly ordered fashion. In fast-growing bacteria, multifork replication results in higher gene copy numbers and increased expression of genes located close to the origin of replication of Chr 1 (ori1). This is believed to be a growth optimization strategy to satisfy the high demand for essential growth factors during fast growth. The relationship between ori1-proximate growth-related genes and gene expression during fast growth has been investigated by many researchers. However, it remains unclear which other gene categories are present close to ori1, and whether expression of all ori1-proximate genes is increased during fast growth or is selectively elevated for certain gene categories.

RESULTS: We calculated the pangenome of all complete genomes from the Vibrionaceae family and mapped the four pangene categories, core, softcore, shell and cloud, to their chromosomal positions. This revealed that core and softcore genes were found heavily biased towards ori1, while shell genes were overrepresented at the opposite part of Chr 1 (i.e., close to ter1). RNA-seq of Aliivibrio salmonicida and Vibrio natriegens showed global gene expression patterns that consistently correlated with chromosomal distance to ori1. Despite a biased gene distribution pattern, all pangene categories contributed to a skewed expression pattern at fast-growing conditions, whereas at slow-growing conditions, softcore, shell and cloud genes were responsible for elevated expression.

CONCLUSION: The pangene categories were non-randomly organized on Chr 1, with an overrepresentation of core and softcore genes around ori1, and overrepresentation of shell and cloud genes around ter1. Furthermore, we mapped our gene distribution data on to the intracellular positioning of chromatin described for V. cholerae, and found that core/softcore and shell/cloud genes appear enriched at two spatially separated intracellular regions. Based on these observations, we hypothesize that there is a link between the genomic location of genes and their cellular placement.
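
The four pangene categories named above are usually assigned from how many genomes carry each gene cluster. The sketch below shows one way this could be done. The abstract does not state its cut-offs, so the thresholds here (core = present in every genome, softcore = at least 95% of genomes, cloud = at most 2 genomes, shell = everything in between) are assumptions borrowed from common practice, and the function names are hypothetical.

```python
# Assign pangene categories from presence counts.
# Assumption: thresholds below are illustrative defaults, not taken from the paper.


def classify_pangene(n_present, n_genomes, softcore_frac=0.95, cloud_max=2):
    """Return 'core', 'softcore', 'shell' or 'cloud' for one gene cluster."""
    if not 0 < n_present <= n_genomes:
        raise ValueError("n_present must be between 1 and n_genomes")
    if n_present == n_genomes:
        return "core"
    if n_present >= softcore_frac * n_genomes:
        return "softcore"
    if n_present <= cloud_max:
        return "cloud"
    return "shell"


def categorize_pangenome(presence, n_genomes):
    """Map each gene cluster to its category.

    presence: dict of gene cluster name -> set of genome names carrying it.
    """
    return {
        gene: classify_pangene(len(genomes), n_genomes)
        for gene, genomes in presence.items()
    }
```

Here the categories are made mutually exclusive so that each gene gets exactly one label; some tools instead define softcore as a superset of core.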

RevDate: 2020-11-03

Malik A, Kim YR, SB Kim (2020)

Genome Mining of the Genus Streptacidiphilus for Biosynthetic and Biodegradation Potential.

Genes, 11(10):.

The genus Streptacidiphilus represents a group of acidophilic actinobacteria within the family Streptomycetaceae, and currently encompasses 15 validly named species, which include five recent additions within the last two years. Considering the potential of the related genera within the family, namely Streptomyces and Kitasatospora, these relatively new members of the family can also be a promising source for novel secondary metabolites. At present, 15 genome data for 11 species from this genus are available, which can provide valuable information on their biology including the potential for metabolite production as well as enzymatic activities in comparison to the neighboring taxa. In this study, the genome sequences of 11 Streptacidiphilus species were subjected to the comparative analysis together with selected Streptomyces and Kitasatospora genomes. This study represents the first comprehensive comparative genomic analysis of the genus Streptacidiphilus. The results indicate that the genomes of Streptacidiphilus contained various secondary metabolite (SM) producing biosynthetic gene clusters (BGCs), some of them exclusively identified in Streptacidiphilus only. Several of these clusters may potentially code for SMs that may have a broad range of bioactivities, such as antibacterial, antifungal, antimalarial and antitumor activities. The biodegradation capabilities of Streptacidiphilus were also explored by investigating the hydrolytic enzymes for complex carbohydrates. Although all genomes were enriched with carbohydrate-active enzymes (CAZymes), their numbers in the genomes of some strains such as Streptacidiphilus carbonis NBRC 100919T were higher as compared to well-known carbohydrate degrading organisms. These distinctive features of each Streptacidiphilus species make them interesting candidates for future studies with respect to their potential for SM production and enzymatic activities.

RevDate: 2020-10-06

Chambers J, Sparks N, Sydney N, et al (2020)

Comparative genomics and pan-genomics of the Myxococcaceae, including a description of five novel species: Myxococcus eversor sp. nov., Myxococcus llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogochensis sp. nov., Myxococcus vastator sp. nov., Pyxidicoccus caerfyrddinensis sp. nov. and Pyxidicoccus trucidator sp. nov.

Genome biology and evolution pii:5918458 [Epub ahead of print].

Members of the predatory Myxococcales (myxobacteria) possess large genomes, undergo multicellular development and produce diverse secondary metabolites, which are being actively prospected for novel drug discovery. To direct such efforts, it is important to understand the relationships between myxobacterial ecology, evolution, taxonomy and genomic variation. This study investigated the genomes and pan-genomes of organisms within the Myxococcaceae, including the genera Myxococcus and Corallococcus, the most abundant myxobacteria isolated from soils. Previously, ten species of Corallococcus were known, while six species of Myxococcus phylogenetically surrounded a third genus (Pyxidicoccus) composed of a single species. Here, we describe draft genome sequences of five novel species within the Myxococcaceae (Myxococcus eversor, Myxococcus llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogochensis, Myxococcus vastator, Pyxidicoccus caerfyrddinensis and Pyxidicoccus trucidator), and for the Pyxidicoccus type species strain, Pyxidicoccus fallax DSM 14698T. Genomic and physiological comparisons demonstrated clear differences between the five novel species and every other Myxococcus or Pyxidicoccus spp. type strain. Subsequent analyses of type strain genomes showed that both the Corallococcus pan-genome and the combined Myxococcus and Pyxidicoccus (Myxococcus/Pyxidicoccus) pan-genome are large and open, but with clear differences. Genomes of Corallococcus spp. are generally smaller than those of Myxococcus/Pyxidicoccus spp., but have core genomes three times larger. Myxococcus/Pyxidicoccus spp. genomes are more variable in size, with larger and more unique sets of accessory genes than those of Corallococcus species. In both genera, biosynthetic gene clusters are relatively enriched in the shell pan-genomes, implying they grant a greater evolutionary benefit than other shell genes, presumably by conferring selective advantages during predation.

RevDate: 2020-10-09
CmpDate: 2020-10-09

Jensen SE, Charles JR, Muleta K, et al (2020)

A sorghum practical haplotype graph facilitates genome-wide imputation and cost-effective genomic prediction.

The plant genome, 13(1):e20009.

Successful management and utilization of increasingly large genomic datasets is essential for breeding programs to accelerate cultivar development. To help with this, we developed a Sorghum bicolor Practical Haplotype Graph (PHG) pangenome database that stores haplotypes and variant information. We developed two PHGs in sorghum that were used to identify genome-wide variants for 24 founders of the Chibas sorghum breeding program from 0.01x sequence coverage. The PHG called single nucleotide polymorphisms (SNPs) with 5.9% error at 0.01x coverage, only 3% higher than PHG error when calling SNPs from 8x coverage sequence. Additionally, 207 progenies from the Chibas genomic selection (GS) training population were sequenced and processed through the PHG. Missing genotypes were imputed from PHG parental haplotypes and used for genomic prediction. Mean prediction accuracies with PHG SNP calls range from 0.57 to 0.73 and are similar to prediction accuracies obtained with genotyping-by-sequencing or targeted amplicon sequencing (rhAmpSeq) markers. This study demonstrates the use of a sorghum PHG to impute SNPs from low-coverage sequence data and shows that the PHG can unify genotype calls across multiple sequencing platforms. By reducing input sequence requirements, the PHG can decrease the cost of genotyping, make GS more feasible, and facilitate larger breeding populations. Our results demonstrate that the PHG is a useful research and breeding tool that maintains variant information from a diverse group of taxa, stores sequence data in a condensed but readily accessible format, unifies genotypes across genotyping platforms, and provides a cost-effective option for genomic selection.

RevDate: 2020-10-06

Roe C, Williamson CHD, Vazquez AJ, et al (2020)

Bacterial Genome Wide Association Studies (bGWAS) and Transcriptomics Identifies Cryptic Antimicrobial Resistance Mechanisms in Acinetobacter baumannii.

Frontiers in public health, 8:451.

Antimicrobial resistance (AMR) in the nosocomial pathogen, Acinetobacter baumannii, is becoming a serious public health threat. While some mechanisms of AMR have been reported, understanding novel mechanisms of resistance is critical for identifying emerging resistance. One of the first steps in identifying novel AMR mechanisms is performing genotype/phenotype association studies; however, performing these studies is complicated by the plastic nature of the A. baumannii pan-genome. In this study, we compared the antibiograms of 12 antimicrobials associated with multiple drug families for 84 A. baumannii isolates, many isolated in Arizona, USA. In silico screening of these genomes for known AMR mechanisms failed to identify clear correlations for most drugs. We then performed a bacterial genome wide association study (bGWAS) looking for associations between all possible 21-mers; this approach generally failed to identify mechanisms that explained the resistance phenotype. In order to decrease the genomic noise associated with population stratification, we compared four phylogenetically-related pairs of isolates with differing susceptibility profiles. RNA-Sequencing (RNA-Seq) was performed on paired isolates and differentially-expressed genes were identified. In these isolate pairs, five different potential mechanisms were identified, highlighting the difficulty of broad AMR surveillance in this species. To verify and validate differential expression, amplicon sequencing was performed. These results suggest that a diagnostic platform based on gene expression rather than genomics alone may be beneficial in certain surveillance efforts. The implementation of such advanced diagnostics coupled with increased AMR surveillance will potentially improve A. baumannii infection treatment and patient outcomes.

RevDate: 2020-10-06

Yang Y, Zhang Y, Cápiro NL, et al (2020)

Genomic Characteristics Distinguish Geographically Distributed Dehalococcoidia.

Frontiers in microbiology, 11:546063.

Dehalococcoidia (Dia) class microorganisms are frequently found in various pristine and contaminated environments. Metagenome-assembled genomes (MAGs) and single-cell amplified genomes (SAGs) studies have substantially improved the understanding of Dia microbial ecology and evolution; however, an updated thorough investigation on the genomic and evolutionary characteristics of Dia microorganisms distributed in geographically distinct environments has not been implemented. In this study, we analyzed available genomic data to unravel Dia evolutionary and metabolic traits. Based on the phylogeny of 16S rRNA genes retrieved from sixty-seven genomes, Dia microorganisms can be categorized into three groups, the terrestrial cluster that contains all Dehalococcoides and Dehalogenimonas strains, the marine cluster I, and the marine cluster II. These results reveal that a higher ratio of horizontally transferred genetic materials was found in the Dia marine clusters compared to that of the Dia terrestrial cluster. Pangenome analysis further suggests that Dia microorganisms have evolved cluster-specific enzymes (e.g., dehalogenase in terrestrial Dia, sulfite reductase in marine Dia) and biosynthesis capabilities (e.g., siroheme biosynthesis in marine Dia). Marine Dia microorganisms are likely adapted to versatile metabolisms for energy conservation besides organohalide respiration. The genomic differences between marine and terrestrial Dia may suggest distinct functions and roles in element cycling (e.g., carbon, sulfur, chlorine), which require interdisciplinary approaches to unravel the physiology and evolution of Dia in various environments.

RevDate: 2020-10-06

Kim HB, Kim E, Yang SM, et al (2020)

Development of Real-Time PCR Assay to Specifically Detect 22 Bifidobacterium Species and Subspecies Using Comparative Genomics.

Frontiers in microbiology, 11:2087.

Bifidobacterium species are used as probiotics to provide beneficial effects to humans. These effects are specific to some species or subspecies of Bifidobacterium. However, some Bifidobacterium species or subspecies are not distinguished because similarity of 16S rRNA and housekeeping gene sequences within Bifidobacterium species is very high. In this study, we developed a real-time polymerase chain reaction (PCR) assay to rapidly and accurately detect 22 Bifidobacterium species by selecting genetic markers using comparative genomic analysis. A total of 210 Bifidobacterium genome sequences were compared to select species- or subspecies-specific genetic markers. A phylogenetic tree based on pan-genomes generated clusters according to Bifidobacterium species or subspecies except that two strains were not grouped with their subspecies. Based on pan-genomes constructed, species- or subspecies-specific genetic markers were selected. The specificity of these markers was confirmed by aligning these genes against 210 genome sequences. Real-time PCR could detect 22 Bifidobacterium specifically. We constructed the criterion for quantification by standard curves. To further test the developed assay for commercial food products, we monitored 26 probiotic products and 7 dairy products. Real-time PCR results and labeling data were then compared. Most of these products (21/33, 63.6%) were consistent with their label claims. Some products labeled at species level only can be detected up to subspecies level through our developed assay.

RevDate: 2020-11-03

Harris LG, Bodger O, Post V, et al (2020)

Temporal Changes in Patient-Matched Staphylococcus epidermidis Isolates from Infections: towards Defining a 'True' Persistent Infection.

Microorganisms, 8(10):.

Staphylococcus epidermidis is found naturally on the skin but is a common cause of persistent orthopaedic device-related infections (ODRIs). This study used a pan-genome and gene-by-gene approach to analyse the clonality of whole genome sequences (WGS) of 115 S. epidermidis isolates from 55 patients with persistent ODRIs. Analysis of the 522 gene core genome revealed that the isolates clustered into three clades, and MLST analysis showed that 83% of the isolates belonged to clonal complex 2 (CC2). Analysis also found 13 isolate pairs had different MLST types and less than 70% similarity within the genes; hence, these were defined as re-infection by a different S. epidermidis strain. Comparison of allelic diversity in the remaining 102 isolates (49 patients) revealed that 6 patients had microevolved infections (>7 allele differences), and only 37 patients (77 isolates) had a 'true' persistent infection. Analysis of the core genomes of isolate pairs from 37 patients found 110/841 genes had variations; mainly in metabolism associated genes. The accessory genome consisted of 2936 genes; with an average size of 1515 genes. To conclude, this study demonstrates the advantage of using WGS for identifying the accuracy of a persistent infection diagnosis. Hence, persistent infections can be defined as 'true' persistent infections if the core genome of paired isolates has ≤7 allele differences; microevolved persistent infection if the paired isolates have >7 allele differences but same MLST type; and polyclonal if they are the same species but a different MLST type.
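
The decision rule stated at the end of this abstract can be written as a small function. This is a sketch under stated assumptions, not the authors' pipeline: the function name and return labels are hypothetical, and the MLST check is applied first on the assumption that a pair with different MLST types is classed as polyclonal regardless of its allele-difference count (the abstract does not address that corner case).

```python
# Classify a patient-matched isolate pair using the rule quoted in the abstract.
# Assumption: a differing MLST type takes precedence over the allele count.


def classify_isolate_pair(allele_differences, same_mlst_type, threshold=7):
    """Return the infection category for one pair of isolates.

    allele_differences: number of core-genome alleles that differ in the pair.
    same_mlst_type: True if both isolates share an MLST type.
    threshold: maximum allele differences for a 'true' persistent infection.
    """
    if allele_differences < 0:
        raise ValueError("allele_differences cannot be negative")
    if not same_mlst_type:
        return "polyclonal"
    if allele_differences <= threshold:
        return "true persistent"
    return "microevolved persistent"
```

For example, a pair with 3 allele differences and a shared MLST type would be labelled a 'true' persistent infection, while 12 differences with a shared MLST type would be labelled microevolved.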

RevDate: 2020-10-12

Srivastava AK, Srivastava R, Sharma A, et al (2020)

Pan-genome analysis of Exiguobacterium reveals species delineation and genomic similarity with Exiguobacterium profundum PHM 11.

Environmental microbiology reports [Epub ahead of print].

The stint of the bacterial species is convoluting, but the new algorithms to calculate genome-to-genome distance (GGD) and DNA-DNA hybridization (DDH) for comparative genome analysis have rejuvenated the exploration of species and sub-species characterization. The present study reports the first whole genome sequence of Exiguobacterium profundum PHM11. The PHM11 genome consists of ~2.92 Mb in 48 contigs, with a G + C content of 47.93%. Functional annotations revealed a total of 3033 protein coding genes and 33 non-protein coding genes. Out of these, only 2316 could be characterized and the others were reported as hypothetical proteins. The comparative analysis of the predicted proteome of PHM11 with five other Exiguobacterium sp. identified 3806 clusters, out of which PHM11 shared a total of 2723 clusters having 1664 common clusters, 131 singletons and 928 distributed between five species. The pan-genome analysis of 70 different genomic sequences of Exiguobacterium strains devoid of a species taxon was done on the basis of GGD and DDH, which identified eight genomes analogous to PHM11 at species level that may be characterized as E. profundum. The ANI value and phylogenetic tree analysis also support the same. The results regarding pan-genome analysis provide a convincing insight for delineation of these eight strains to species.

RevDate: 2020-10-01

Patel M, Patel HM, Vohra N, et al (2020)

Complete genome sequencing and comparative genome characterization of the lignocellulosic biomass degrading bacterium Pseudomonas stutzeri MP4687 from cattle rumen.

Biotechnology reports (Amsterdam, Netherlands), 28:e00530.

We report the complete genome sequencing of novel Pseudomonas stutzeri strain MP4687 isolated from cattle rumen. Various strains of P. stutzeri have been reported from different environmental samples including oil-contaminated sites, crop roots, air, and human clinical samples, but not from rumen samples, which is being reported here for the first time. The genome of P. stutzeri MP4687 has a single replicon, 4.75 Mb chromosome and a G + C content of 63.45%. The genome encodes for 4,790 protein coding genes including 164 CAZymes and 345 carbohydrate processing genes. The isolate MP4687 harbors LCB hydrolyzing potential through endoglucanase (4.5 U/mL), xylanase (3.1 U/mL), β-glucosidase (3.3 U/mL) and β-xylosidase (1.9 U/mL) activities. The pangenome analysis further revealed that MP4687 has a very high number of unique genes (>2100) compared to other P. stutzeri genomes, which might have an important role in rumen functioning.

RevDate: 2020-10-01

Verma DK, Vasudeva G, Sidhu C, et al (2020)

Biochemical and Taxonomic Characterization of Novel Haloarchaeal Strains and Purification of the Recombinant Halotolerant α-Amylase Discovered in the Isolate.

Frontiers in microbiology, 11:2082.

Haloarchaea are salt-loving archaea and a potential source of industrially relevant halotolerant enzymes. In the present study, three reddish-pink, extremely halophilic archaeal strains, namely wsp1 (wsp-water sample Pondicherry), wsp3, and wsp4, were isolated from the Indian Solar saltern. The phylogenetic analysis based on 16S rRNA gene sequences suggests that both wsp3 and wsp4 strains belong to Halogeometricum borinquense while wsp1 is closely related to Haloferax volcanii species. The comparative genomics revealed an open pangenome for both genera investigated here. Whole-genome sequence analysis revealed that these isolates have multiple copies of industrially/biotechnologically important unique genes and enzymes. Among these unique enzymes, for recombinant expression and purification, we selected four putative α-amylases identified in these three isolates. We successfully purified functional halotolerant recombinant Amy2 from wsp1 using a pelB signal sequence-based secretion strategy with Escherichia coli as an expression host. This method may prove useful to produce functional haloarchaeal secretory recombinant proteins suitable for commercial or research applications. Biochemical analysis of Amy2 suggests the halotolerant nature of the enzyme, with maximum enzymatic activity observed at 1 M NaCl. We also report the isolation and characterization of carotenoids purified from these isolates. This study highlights the presence of several industrially important enzymes in the haloarchaeal strains which may potentially have improved features like stability and salt tolerance suitable for industrial applications.

RevDate: 2020-10-27

Chen Y, Song W, Xie X, et al (2020)

A Collinearity-Incorporating Homology Inference Strategy for Connecting Emerging Assemblies in the Triticeae Tribe as a Pilot Practice in the Plant Pangenomic Era.

Molecular plant pii:S1674-2052(20)30314-2 [Epub ahead of print].

Plant genome sequencing has dramatically increased, and some species even have multiple high-quality reference versions. Demands for clade-specific homology inference and analysis have increased in the pangenomic era. Here we present a novel method, GeneTribe, for homology inference among genetically similar genomes that incorporates gene collinearity and shows better performance than traditional sequence-similarity-based methods in terms of accuracy and scalability. The Triticeae tribe is a typical allopolyploid-rich clade with complex species relationships that includes many important crops, such as wheat, barley, and rye. We built Triticeae-GeneTribe, a homology database, by integrating 12 Triticeae genomes and 3 outgroup model genomes and implemented versatile analysis and visualization functions. With macrocollinearity analysis, we were able to construct a refined model illustrating the structural rearrangements of the 4A-5A-7B chromosomes in wheat as two major translocation events. With collinearity analysis at both the macro- and microscale, we illustrated the complex evolutionary history of homologs of the wheat vernalization gene Vrn2, which evolved as a combined result of genome translocation, duplication, and polyploidization and gene loss events. Our work provides a useful practice for connecting emerging genome assemblies, with awareness of the extensive polyploidy in plants, and will help researchers efficiently exploit genome sequence resources.

RevDate: 2020-11-10

McCubbin T, Gonzalez-Garcia RA, Palfreyman RW, et al (2020)

A Pan-Genome Guided Metabolic Network Reconstruction of Five Propionibacterium Species Reveals Extensive Metabolic Diversity.

Genes, 11(10):.

Propionibacteria have been studied extensively since the early 1930s due to their relevance to industry and importance as human pathogens. Still, their unique metabolism is far from fully understood. This is partly due to their signature high GC content, which has previously hampered the acquisition of quality sequence data, the accurate annotation of the available genomes, and the functional characterization of genes. The recent completion of the genome sequences for several species has led researchers to reassess the taxonomical classification of the genus Propionibacterium, which has been divided into several new genera. Such data also enable a comparative genomic approach to annotation and provide a new opportunity to revisit our understanding of their metabolism. Using pan-genome analysis combined with the reconstruction of the first high-quality Propionibacterium genome-scale metabolic model and a pan-metabolic model of current and former members of the genus Propionibacterium, we demonstrate that despite sharing unique metabolic traits, these organisms have an unexpected diversity in central carbon metabolism and a hidden layer of metabolic complexity. This combined approach gave us new insights into the evolution of Propionibacterium metabolism and led us to propose a novel, putative ferredoxin-linked energy conservation strategy. The pan-genomic approach highlighted key differences in Propionibacterium metabolism that reflect adaptation to their environment. Results were mathematically captured in genome-scale metabolic reconstructions that can be used to further explore metabolism using metabolic modeling techniques. Overall, the data provide a platform to explore Propionibacterium metabolism and a tool for the rational design of strains.

RevDate: 2020-10-23

Feng Y, Fan X, Zhu L, et al (2020)

Phylogenetic and genomic analysis reveals high genomic openness and genetic diversity of Clostridium perfringens.

Microbial genomics, 6(10):.

Clostridium perfringens is associated with a variety of diseases in both humans and animals. Recent advances in genomic sequencing make it timely to re-visit this important pathogen. Although the genome sequence of C. perfringens was first determined in 2002, large-scale comparative genomics with isolates of different origins is still lacking. In this study, we used whole-genome sequencing of 45 C. perfringens isolates with isolation time spanning an 80-year period and performed comparative analysis of 173 genomes from worldwide strains. We also conducted phylogenetic lineage analysis and introduced an openness index (OI) to evaluate the openness of bacterial genomes. We classified all these genomes into five lineages and hypothesized that the origin of C. perfringens dates back to ~80 000 years ago. We showed that the pangenome of the 173 C. perfringens strains contained a total of 26 954 genes, while the core genome comprised 1020 genes, accounting for about a third of the genome of each isolate. We demonstrated that C. perfringens had the highest OI compared with 51 other bacterial species. Intact prophage sequences were found in nearly 70.0 % of C. perfringens genomes, while CRISPR sequences were found only in ~40.0 %. Plasmids were prevalent in C. perfringens isolates, and half of the virulence genes and antibiotic resistance genes (ARGs) identified in all the isolates could be found in plasmids. ARG-sharing network analysis showed that C. perfringens shared its 11 ARGs with 55 different bacterial species, and a high frequency of ARG transfer may have occurred between C. perfringens and species in the genera Streptococcus and Staphylococcus. Correlation analysis showed that the ARG number in C. perfringens strains increased with time, while the virulence gene number was relatively stable. Our results, taken together with previous studies, revealed the high genome openness and genetic diversity of C. perfringens and provide a comprehensive view of the phylogeny, genomic features, virulence gene and ARG profiles of worldwide strains.

RevDate: 2020-10-23

Rautiainen M, T Marschall (2020)

GraphAligner: rapid and versatile sequence-to-graph alignment.

Genome biology, 21(1):253.

Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to the state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant tools. Availability: Package manager: and source code:

RevDate: 2020-09-24

Sánchez-Osuna M, Cortés P, Llagostera M, et al (2020)

Exploration into the origins and mobilization of di-hydrofolate reductase genes and the emergence of clinical resistance to trimethoprim.

Microbial genomics [Epub ahead of print].

Trimethoprim is a synthetic antibacterial agent that targets folate biosynthesis by competitively binding to the di-hydrofolate reductase enzyme (DHFR). Trimethoprim is often administered synergistically with sulfonamide, another chemotherapeutic agent targeting the di-hydropteroate synthase (DHPS) enzyme in the same pathway. Clinical resistance to both drugs is widespread and mediated by enzyme variants capable of performing their biological function without binding to these drugs. These mutant enzymes were assumed to have arisen after the discovery of these synthetic drugs, but recent work has shown that genes conferring resistance to sulfonamide were present in the bacterial pangenome millions of years ago. Here, we apply phylogenetics and comparative genomics methods to study the largest family of mobile trimethoprim-resistance genes (dfrA). We show that most of the dfrA genes identified to date map to two large clades that likely arose from independent mobilization events. In contrast to sulfonamide resistance (sul) genes, we find evidence of recurrent mobilization in dfrA genes. Phylogenetic evidence allows us to identify novel dfrA genes in the emerging pathogen Acinetobacter baumannii, and we confirm their resistance phenotype in vitro. We also identify a cluster of dfrA homologues in cryptic plasmid and phage genomes, but we show that these enzymes do not confer resistance to trimethoprim. Our methods also allow us to pinpoint the chromosomal origin of previously reported dfrA genes, and we show that many of these ancient chromosomal genes also confer resistance to trimethoprim. Our work reveals that trimethoprim resistance predated the clinical use of this chemotherapeutic agent, but that novel mutations have likely also arisen and become mobilized following its widespread use within and outside the clinic. 
Hence, this work confirms that resistance to novel drugs may already be present in the bacterial pangenome, and stresses the importance of rapid mobilization as a fundamental element in the emergence and global spread of resistance determinants.

RevDate: 2020-10-29

Jin L, Chen Y, Yang W, et al (2020)

Complete genome sequence of fish-pathogenic Aeromonas hydrophila HX-3 and a comparative analysis: insights into virulence factors and quorum sensing.

Scientific reports, 10(1):15479.

The gram-negative, aerobic, rod-shaped bacterium Aeromonas hydrophila, the causative agent of motile aeromonad septicaemia, has attracted increasing attention due to its high pathogenicity. Here, we constructed the complete genome sequence of a virulent strain, A. hydrophila HX-3, isolated from Pseudosciaena crocea, and performed comparative genomics to investigate its virulence factors and quorum sensing features in comparison with those of other Aeromonas isolates. HX-3 has a circular chromosome of 4,941,513 bp with a 61.0% G + C content encoding 4483 genes, including 4318 protein-coding genes, and 31 rRNA, 127 tRNA and 7 ncRNA operons. Seventy interspersed repeat and 153 tandem repeat sequences, 7 transposons, 8 clustered regularly interspaced short palindromic repeats, and 39 genomic islands were predicted in the A. hydrophila HX-3 genome. Phylogeny and pan-genome were also analyzed herein to confirm the evolutionary relationships on the basis of comparisons with other fully sequenced Aeromonas genomes. In addition, the assembled HX-3 genome was successfully annotated against the Cluster of Orthologous Groups of proteins database (76.03%), Gene Ontology database (18.13%), and Kyoto Encyclopedia of Genes and Genomes pathway database (59.68%). Two-component regulatory systems in the HX-3 genome and virulence factor profiles through comparative analysis were predicted, providing insights into pathogenicity. A large number of genes related to the AHL-type 1 (ahyI, ahyR), LuxS-type 2 (luxS, pfs, metEHK, litR, luxOQU) and QseBC-type 3 (qseB, qseC) autoinducer systems were also identified. As a result of the expression of the ahyI gene in Escherichia coli BL21 (DE3), combined UPLC-MS/MS profiling led to the identification of several new N-acyl-homoserine lactone compounds synthesized by AhyI. This genomic analysis determined the comprehensive QS systems of A. hydrophila, which might provide novel information regarding the mechanisms of virulence signatures correlated with QS.

RevDate: 2020-09-24

Fang X, Lloyd CJ, BO Palsson (2020)

Reconstructing organisms in silico: genome-scale models and their emerging applications.

Nature reviews. Microbiology pii:10.1038/s41579-020-00440-4 [Epub ahead of print].

Escherichia coli is considered to be the best-known microorganism given the large number of published studies detailing its genes, its genome and the biochemical functions of its molecular components. This vast literature has been systematically assembled into a reconstruction of the biochemical reaction networks that underlie E. coli's functions, a process which is now being applied to an increasing number of microorganisms. Genome-scale reconstructed networks are organized and systematized knowledge bases that have multiple uses, including conversion into computational models that interpret and predict phenotypic states and the consequences of environmental and genetic perturbations. These genome-scale models (GEMs) now enable us to develop pan-genome analyses that provide mechanistic insights, detail the selection pressures on proteome allocation and address stress phenotypes. In this Review, we first discuss the overall development of GEMs and their applications. Next, we review the evolution of the most complete GEM that has been developed to date: the E. coli GEM. Finally, we explore three emerging areas in genome-scale modelling of microbial phenotypes: collections of strain-specific models, metabolic and macromolecular expression models, and simulation of stress responses.

RevDate: 2020-10-30

Phanse Y, Wu CW, Venturino AJ, et al (2020)

A Protective Vaccine against Johne's Disease in Cattle.

Microorganisms, 8(9):.

Johne's disease (JD) caused by Mycobacterium avium subsp. paratuberculosis (M. paratuberculosis) is a chronic infection characterized by the development of granulomatous enteritis in wild and domesticated ruminants. It is one of the most significant livestock diseases not only in the USA but also globally, accounting for USD 200-500 million losses annually for the USA alone, with a potential link to cases of Crohn's disease in humans. Developing safe and protective vaccines is of paramount importance for JD control in dairy cows. The current study evaluated the safety, immunity and protective efficacy of a novel live attenuated vaccine (LAV) candidate with and without an adjuvant in comparison to an inactivated vaccine. Results indicated that the LAV, irrespective of the adjuvant presence, induced robust T cell immune responses indicated by proinflammatory cytokine production such as IFN-γ, IFN-α, TNF-α and IL-17 as well as strong response to intradermal skin test against M. paratuberculosis antigens. Furthermore, the LAV was safe with minimal tissue pathology. Finally, calves vaccinated with adjuvanted LAV did not shed M. paratuberculosis post-challenge, a much-desired characteristic of an effective vaccine against JD. Together, these data suggest a strong potential of testing LAV in field trials to curb JD in dairy herds.

RevDate: 2020-10-01

Zhong C, Wang L, K Ning (2020)

Pan-genome study of Thermococcales reveals extensive genetic diversity and genetic evidence of thermophilic adaption.

Environmental microbiology [Epub ahead of print].

Thermococcales has a strong adaptability to extreme environments, which is of profound interest in explaining how complex life forms emerge on earth. However, their gene composition, thermal stability and evolution in hyperthermal environments are still little known. Here, we characterized the pan-genome architecture of 30 Thermococcales species to gain insight into their genetic properties, evolutionary patterns and specific metabolisms adapted to niches. We revealed an open pan-genome of Thermococcales comprising 6070 gene families that tend to increase with the availability of additional genomes. The genome contents of Thermococcales were flexible, with a series of genes that experienced gene duplication, progressive divergence, or gene gain and loss events exhibiting distinct functional features. These archaea had concise types of heat shock proteins, such as HSP20, HSP60 and prefoldin, which were constrained by strong purifying selection that governed their conservative evolution. Furthermore, purifying selection forced genes involved in enzyme, motility, secretion system, defence system and chaperones to differ in functional constraints, and their disparity in the rate of evolution may be related to adaptation to specific niches. These results deepened our understanding of genetic diversity and adaptation patterns of Thermococcales, and provided valuable research models for studying the metabolic traits of early life forms.
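An "open" pan-genome, as described above, is one whose gene-family count keeps growing as genomes are added. A minimal sketch of the underlying bookkeeping follows, assuming each genome has already been reduced to a set of gene-family identifiers; the function name `accumulation_curves` and the toy data are hypothetical and are not taken from the study.

```python
def accumulation_curves(genomes):
    """Track pan- and core-genome sizes as genomes are added in order.

    genomes: list of sets of gene-family IDs (hypothetical input format).
    Returns (pan_sizes, core_sizes), one entry per genome added.
    """
    pan, core = set(), None
    pan_sizes, core_sizes = [], []
    for families in genomes:
        pan |= families                      # union grows or stays flat
        core = set(families) if core is None else core & families
        pan_sizes.append(len(pan))
        core_sizes.append(len(core))
    return pan_sizes, core_sizes


# Toy data: every new genome contributes one previously unseen family.
toy = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "e"}, {"a", "f"}]
pan_sizes, core_sizes = accumulation_curves(toy)
print(pan_sizes)   # [3, 4, 5, 6]
print(core_sizes)  # [3, 2, 1, 1]
```

A pan-genome curve that has not flattened by the last genome is the usual informal sign of openness. Because the result depends on the order of addition, such curves are commonly averaged over many random orderings.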

RevDate: 2020-10-30

Khan M, Stapleton F, Summers S, et al (2020)

Antibiotic Resistance Characteristics of Pseudomonas aeruginosa Isolated from Keratitis in Australia and India.

Antibiotics (Basel, Switzerland), 9(9):.

This study investigated genomic differences in Australian and Indian Pseudomonas aeruginosa isolates from keratitis (infection of the cornea). Overall, the Indian isolates were resistant to more antibiotics, with some of those isolates being multi-drug resistant. Acquired genes were related to resistance to fluoroquinolones, aminoglycosides, beta-lactams, macrolides, sulphonamides, and tetracycline and were more frequent in Indian (96%) than in Australian (35%) isolates (p = 0.02). Indian isolates had large numbers of gene variations (median 50,006, IQR = 26,967-50,600) compared to Australian isolates (median 26,317, IQR = 25,681-33,780). There were a larger number of mutations in the mutL and uvrD genes associated with the mismatch repair (MMR) system in Indian isolates, which may result in strains losing their efficacy for DNA repair. The number of gene variations was greater in isolates carrying MMR system genes or exoU. In the phylogenetic division, the number of core genes was similar in both groups, but Indian isolates had larger numbers of pan genes (median 6518, IQR = 6040-6935). Clones related to three different sequence types-ST308, ST316, and ST491-were found among Indian isolates. Only one clone, ST233, containing two strains was present in Australian isolates. The most striking differences between Australian and Indian isolates were carriage of exoU (that encodes a cytolytic phospholipase) in Indian isolates and exoS (that encodes for GTPase activator activity) in Australian isolates, a large number of acquired resistance genes, greater changes to MMR genes, and a larger pan genome as well as increased overall genetic variation in the Indian isolates.

RevDate: 2020-10-02

Yin Z, Zhang S, Wei Y, et al (2020)

Horizontal Gene Transfer Clarifies Taxonomic Confusion and Promotes the Genetic Diversity and Pathogenicity of Plesiomonas shigelloides.

mSystems, 5(5):.

Plesiomonas shigelloides is an emerging pathogen that has been shown to be involved in gastrointestinal diseases and extraintestinal infections in humans. However, the taxonomic position, evolutionary dynamics, and pathogenesis of P. shigelloides remain unclear. We reported the draft genome sequences of 12 P. shigelloides strains representing different serogroups. We were able to determine a clear distinction between P. shigelloides and other members of Enterobacterales via core genome phylogeny, Neighbor-Net network, and average genome identity analysis. The pan-genome analysis of P. shigelloides revealed extensive genetic diversity and presented large flexible gene repertoires, while the core genome phylogeny exhibited a low level of clonality. The discordance between the core genome phylogeny and the pan-genome phylogeny indicated that flexible accessory genomes account for an important proportion of the evolution of P. shigelloides, which was subsequently characterized by determinations of hundreds of horizontally transferred genes (horizontal genes), massive gene expansions and contractions, and diverse mobile genetic elements (MGEs). The apparently high levels of horizontal gene transfer (HGT) in P. shigelloides were conferred from bacteria with novel properties from other taxa (mainly Vibrionaceae and Aeromonadaceae), which caused the historical taxonomic confusion and shaped the virulence gene pools. Furthermore, P. shigelloides genomes contain many macromolecular secretion system genes, virulence factor genes, and resistance genes, indicating its potential to cause intestinal and invasive infections. Collectively, our work provides insights into the phylogenetic position, evolutionary dynamic, and pathogenesis of P. shigelloides at the genomic level, which could facilitate the observation and research of this important pathogen.

IMPORTANCE: The taxonomic position of P. shigelloides has been the subject of debate for a long time, and until now, the evolutionary dynamics and pathogenesis of P. shigelloides were unclear. In this study, pan-genome analysis indicated extensive genetic diversity and the presence of large and variable gene repertoires. Our results revealed that horizontal gene transfer was the focal driving force for the genetic diversity of the P. shigelloides pan-genome and might have contributed to the emergence of novel properties. Vibrionaceae and Aeromonadaceae were found to be the predominant donor taxa for horizontal genes, which might have caused the taxonomic confusion historically. Comparative genomic analysis revealed the potential of P. shigelloides to cause intestinal and invasive diseases. Our results could advance the understanding of the evolution and pathogenesis of P. shigelloides, particularly in elucidating the role of horizontal gene transfer and investigating virulence-related elements.

RevDate: 2020-10-02

Ross DE, Marshall CW, Gulliver D, et al (2020)

Defining Genomic and Predicted Metabolic Features of the Acetobacterium Genus.

mSystems, 5(5):.

Acetogens are anaerobic bacteria capable of fixing CO2 or CO to produce acetyl coenzyme A (acetyl-CoA) and ultimately acetate using the Wood-Ljungdahl pathway (WLP). Acetobacterium woodii is the type strain of the Acetobacterium genus and has been critical for understanding the biochemistry and energy conservation in acetogens. Members of the Acetobacterium genus have been isolated from a variety of environments or have had genomes recovered from metagenome data, but no systematic investigation has been done on the unique and various metabolisms of the genus. To gain a better appreciation for the metabolic breadth of the genus, we sequenced the genomes of 4 isolates (A. fimetarium, A. malicum, A. paludosum, and A. tundrae) and conducted a comparative genome analysis (pan-genome) of 11 different Acetobacterium genomes. A unifying feature of the Acetobacterium genus is the carbon-fixing WLP. The methyl (cluster II) and carbonyl (cluster III) branches of the Wood-Ljungdahl pathway are highly conserved across all sequenced Acetobacterium genomes, but cluster I encoding the formate dehydrogenase is not. In contrast to A. woodii, all but four strains encode two distinct Rnf clusters, Rnf being the primary respiratory enzyme complex. Metabolism of fructose, lactate, and H2:CO2 was conserved across the genus, but metabolism of ethanol, methanol, caffeate, and 2,3-butanediol varied. Additionally, clade-specific metabolic potential was observed, such as amino acid transport and metabolism in the psychrophilic species, and biofilm formation in the A. wieringae clade, which may afford these groups an advantage in low-temperature growth or attachment to solid surfaces, respectively.

IMPORTANCE: Acetogens are anaerobic bacteria capable of fixing CO2 or CO to produce acetyl-CoA and ultimately acetate using the Wood-Ljungdahl pathway (WLP). This autotrophic metabolism plays a major role in the global carbon cycle and, if harnessed, can help reduce greenhouse gas emissions. Overall, the data presented here provide a framework for examining the ecology and evolution of the Acetobacterium genus and highlight the potential of these species as a source for production of fuels and chemicals from CO2 feedstocks.

RevDate: 2020-10-01

Chen Z, Erickson DL, J Meng (2020)

Benchmarking hybrid assembly approaches for genomic analyses of bacterial pathogens using Illumina and Oxford Nanopore sequencing.

BMC genomics, 21(1):631.

BACKGROUND: We benchmarked the hybrid assembly approaches of MaSuRCA, SPAdes, and Unicycler for bacterial pathogens using Illumina and Oxford Nanopore sequencing by determining genome completeness and accuracy, antimicrobial resistance (AMR), virulence potential, multilocus sequence typing (MLST), phylogeny, and pan genome. Ten bacterial species (10 strains) were tested for simulated reads of both mediocre- and low-quality, whereas 11 bacterial species (12 strains) were tested for real reads.

RESULTS: Unicycler performed the best for achieving contiguous genomes, closely followed by MaSuRCA, while all SPAdes assemblies were incomplete. MaSuRCA was less tolerant of low-quality long reads than SPAdes and Unicycler. The hybrid assemblies of five antimicrobial-resistant strains with simulated reads provided consistent AMR genotypes with the reference genomes. The MaSuRCA assembly of Staphylococcus aureus with real reads contained msr(A) and tet(K), while the reference genome and SPAdes and Unicycler assemblies harbored blaZ. The AMR genotypes of the reference genomes and hybrid assemblies were consistent for the other five antimicrobial-resistant strains with real reads. The numbers of virulence genes in all hybrid assemblies were similar to those of the reference genomes, irrespective of simulated or real reads. Only one exception existed that the reference genome and hybrid assemblies of Pseudomonas aeruginosa with mediocre-quality long reads carried 241 virulence genes, whereas 184 virulence genes were identified in the hybrid assemblies of low-quality long reads. The MaSuRCA assemblies of Escherichia coli O157:H7 and Salmonella Typhimurium with mediocre-quality long reads contained 126 and 118 virulence genes, respectively, while 110 and 107 virulence genes were detected in their MaSuRCA assemblies of low-quality long reads, respectively. All approaches performed well in our MLST and phylogenetic analyses. The pan genomes of the hybrid assemblies of S. Typhimurium with mediocre-quality long reads were similar to that of the reference genome, while SPAdes and Unicycler were more tolerant of low-quality long reads than MaSuRCA for the pan-genome analysis. All approaches functioned well in the pan-genome analysis of Campylobacter jejuni with real reads.

CONCLUSIONS: Our research demonstrates the hybrid assembly pipeline of Unicycler as a superior approach for genomic analyses of bacterial pathogens using Illumina and Oxford Nanopore sequencing.

RevDate: 2020-09-14

Psomopoulos FE, van Helden J, Médigue C, et al (2020)

Ancestral state reconstruction of metabolic pathways across pangenome ensembles.

Microbial genomics [Epub ahead of print].

As genome sequencing efforts are unveiling the genetic diversity of the biosphere with an unprecedented speed, there is a need to accurately describe the structural and functional properties of groups of extant species whose genomes have been sequenced, as well as their inferred ancestors, at any given taxonomic level of their phylogeny. Elaborate approaches for the reconstruction of ancestral states at the sequence level have been developed, subsequently augmented by methods based on gene content. While these approaches of sequence or gene-content reconstruction have been successfully deployed, there has been less progress on the explicit inference of functional properties of ancestral genomes, in terms of metabolic pathways and other cellular processes. Herein, we describe PathTrace, an efficient algorithm for parsimony-based reconstructions of the evolutionary history of individual metabolic pathways, pivotal representations of key functional modules of cellular function. The algorithm is implemented as a five-step process through which pathways are represented as fuzzy vectors, where each enzyme is associated with a taxonomic conservation value derived from the phylogenetic profile of its protein sequence. The method is evaluated with a selected benchmark set of pathways against collections of genome sequences from key data resources. By deploying a pangenome-driven approach for pathway sets, we demonstrate that the inferred patterns are largely insensitive to noise, as opposed to gene-content reconstruction methods. In addition, the resulting reconstructions are closely correlated with the evolutionary distance of the taxa under study, suggesting that a diligent selection of target pangenomes is essential for maintaining cohesiveness of the method and consistency of the inference, serving as an internal control for an arbitrary selection of queries. 
The PathTrace method is a first step towards the large-scale analysis of metabolic pathway evolution and our deeper understanding of functional relationships reflected in emerging pangenome collections.
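As background for the phrase "parsimony-based reconstruction", here is a minimal sketch of Fitch's classic small-parsimony pass applied to the presence or absence of a single enzyme across a rooted binary tree. It is a textbook illustration only and is not the PathTrace algorithm, whose five steps and fuzzy-vector representation are not detailed in the abstract; the tree encoding, the function name `fitch`, and the taxon names are assumptions.

```python
def fitch(tree, leaf_states):
    """Bottom-up Fitch pass on a rooted binary tree.

    tree: nested 2-tuples with leaf names as strings, e.g. (("A", "B"), "C").
    leaf_states: dict leaf name -> state (here 1 = present, 0 = absent).
    Returns (candidate states at the root, minimum number of state changes).
    """
    if isinstance(tree, str):                 # leaf: its state is observed
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, leaf_states)
    right_set, right_cost = fitch(right, leaf_states)
    common = left_set & right_set
    if common:                                # children agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical four-taxon tree; the enzyme is missing only in taxon D.
tree = (("A", "B"), ("C", "D"))
presence = {"A": 1, "B": 1, "C": 1, "D": 0}
root_states, changes = fitch(tree, presence)
print(root_states, changes)  # {1} 1
```

Here the most parsimonious history has the enzyme present in the common ancestor with a single loss on the branch leading to D.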

RevDate: 2020-09-28

Gardon H, Biderre-Petit C, Jouan-Dufournel I, et al (2020)

A drift-barrier model drives the genomic landscape of a structured bacterial population.

Molecular ecology [Epub ahead of print].

Bacterial populations differentiate over time and space to form distinct genetic units. The mechanisms governing this diversification are presumed to result from the ecological context of living units to adapt to specific niches. Recently, a model assuming the acquisition of advantageous genes among populations rather than whole genome sweeps has emerged to explain population differentiation. However, the characteristics of these exchanged, or flexible, genes and whether their evolution is driven by adaptive or neutral processes remain controversial. By analysing the flexible genome of single-amplified genomes of co-occurring populations of the marine Prochlorococcus HLII ecotype, we highlight that genomic compartments - rather than population units - are characterized by different evolutionary trajectories. The dynamics of gene fluxes vary across genomic compartments and therefore the effectiveness of selection depends on the fluctuation of the effective population size along the genome. Taken together, these results support the drift-barrier model of bacterial evolution.

RevDate: 2020-09-28

Christian RW, Hewitt SL, Nelson G, et al (2020)

Plastid transit peptides-where do they come from and where do they all belong? Multi-genome and pan-genomic assessment of chloroplast transit peptide evolution.

PeerJ, 8:e9772.

Subcellular relocalization of proteins determines an organism's metabolic repertoire and thereby its survival in unique evolutionary niches. In plants, the plastid and its various morphotypes import a large and varied number of nuclear-encoded proteins to orchestrate vital biochemical reactions in a spatiotemporal context. Recent comparative genomics analysis and high-throughput shotgun proteomics data indicate that there are a large number of plastid-targeted proteins that are either semi-conserved or non-conserved across different lineages. This implies that homologs are differentially targeted across different species, which is feasible only if proteins have gained or lost plastid targeting peptides during evolution. In this study, a broad, multi-genome analysis of 15 phylogenetically diverse genera and in-depth analyses of pangenomes from Arabidopsis and Brachypodium were performed to address the question of how proteins acquire or lose plastid targeting peptides. The analysis revealed that random insertions or deletions were the dominant mechanism by which novel transit peptides are gained by proteins. While gene duplication was not a strict requirement for the acquisition of novel subcellular targeting, 40% of novel plastid-targeted genes were found to be most closely related to a sequence within the same genome, and of these, 30.5% resulted from alternative transcription or translation initiation sites. Interestingly, analysis of the distribution of amino acids in the transit peptides of known and predicted chloroplast-targeted proteins revealed monocot and eudicot-specific preferences in residue distribution.

RevDate: 2020-09-28

Zhang X, Li F, Cui S, et al (2020)

Prevalence and Distribution Characteristics of blaKPC-2 and blaNDM-1 Genes in Klebsiella pneumoniae.

Infection and drug resistance, 13:2901-2910.

Background: Carbapenem-resistant Klebsiella pneumoniae infections have caused major concern and posed a global threat to public health. As blaKPC-2 and blaNDM-1 genes are the most widely reported carbapenem resistance genes in K. pneumoniae, it is crucial to study the prevalence and geographical distribution of these two genes for further understanding of their transmission mode and mechanism.

Purpose: Here, we investigated the prevalence and distribution of blaKPC-2 and blaNDM-1 genes in carbapenem-resistant K. pneumoniae strains from a tertiary hospital and from 1579 genomes available in the NCBI database, and further analyzed the possible core structure of blaKPC-2 or blaNDM-1 genes among global genome data.

Materials and Methods: K. pneumoniae strains from a tertiary hospital in China during 2013-2018 were collected and their antimicrobial susceptibility testing for 28 antibiotics was determined. Whole-genome sequencing of carbapenem-resistant K. pneumoniae strains was used to investigate the genetic characterization. The phylogenetic relationships of these strains were investigated through pan-genome analysis. The epidemiology and distribution of blaKPC-2 and blaNDM-1 genes in K. pneumoniae based on 1579 global genomes and carbapenem-resistant K. pneumoniae strains from hospital were analyzed using bioinformatics. The possible core structure carrying blaKPC-2 or blaNDM-1 genes was investigated among global data.

Results: A total of 19 carbapenem-resistant K. pneumoniae were isolated in a tertiary hospital. All isolates had a multi-resistant pattern and eight kinds of resistance genes. The phylogenetic analysis showed all isolates in the hospital were dominated by two lineages composed of ST11 and ST25, respectively. ST11 and ST25 were the major ST types carrying blaKPC-2 and blaNDM-1 genes, respectively. Among 1579 global genomes data, 147 known ST types (1195 genomes) have been identified, while ST258 (23.6%) and ST11 (22.1%) were the globally prevalent clones among the known ST types. Genetic environment analysis showed that the ISKpn7-dnaA/ISKpn27-blaKPC-2-ISkpn6 and blaNDM-1-ble-trpf-nagA may be the core structure in the horizontal transfer of blaKPC-2 and blaNDM-1, respectively. In addition, DNA transferase (hin) may be involved in the horizontal transfer or the expression of blaNDM-1.

Conclusion: There was clonal transmission of carbapenem-resistant K. pneumoniae in the tertiary hospital in China. The prevalence and distribution of blaKPC-2 and blaNDM-1 varied by countries and were driven by different transposons carrying the core structure. This study shed light on the genetic environment of blaKPC-2 and blaNDM-1 and offered basic information about the mechanism of carbapenem-resistant K. pneumoniae dissemination.

RevDate: 2020-09-09

Liu Y, Z Tian (2020)

From one linear genome to a graph-based pan-genome: a new era for genomics.

Science China. Life sciences pii:10.1007/s11427-020-1808-0 [Epub ahead of print].

RevDate: 2020-09-09

González-Dominici LI, Saati-Santamaría Z, P García-Fraile (2020)

Genome Analysis and Genomic Comparison of the Novel Species Arthrobacter ipsi Reveal Its Potential Protective Role in Its Bark Beetle Host.

Microbial ecology pii:10.1007/s00248-020-01593-8 [Epub ahead of print].

The pine engraver beetle, Ips acuminatus Gyll, is a bark beetle that causes significant damage in Scots pine (Pinus sylvestris) forests and plantations. Like almost all higher organisms, Ips acuminatus harbours a microbiome, although the role of most members of its microbiome is not well understood. As part of a work in which we analysed the bacterial diversity associated with Ips acuminatus, we isolated the strain Arthrobacter sp. IA7. In order to study its potential role within the bark beetle holobiont, we sequenced and explored its genome and performed a pan-genome analysis of the genus Arthrobacter, showing specific genes of strain IA7 that might be related to its particular role in its niche. Based on these investigations, we suggest several potential roles of the bacterium within the beetle. Analysis of genes related to secondary metabolism indicated potential antifungal capability, confirmed by the inhibition of several entomopathogenic fungal strains (Metarhizium anisopliae CCF0966, Lecanicillium muscarium CCF6041, L. muscarium CCF3297, Isaria fumosorosea CCF4401, I. farinosa CCF4808, Beauveria bassiana CCF4422 and B. brongniartii CCF1547). Phylogenetic analyses of the 16S rRNA gene, six concatenated housekeeping genes (tuf-secY-rpoB-recA-fusA-atpD) and genome sequences indicated that strain IA7 is closely related to A. globiformis NBRC 12137T but forms a new species within the genus Arthrobacter; this was confirmed by digital DNA-DNA hybridization (37.10%) and average nucleotide identity (ANIb) (88.9%). Based on phenotypic and genotypic features, we propose strain IA7T as the novel species Arthrobacter ipsi sp. nov. (type strain IA7T = CECT 30100T = LMG 31782T) and suggest its protective role for its host.

RevDate: 2020-10-28
CmpDate: 2020-10-28

Boisen N, Østerlund MT, Joensen KG, et al (2020)

Redefining enteroaggregative Escherichia coli (EAEC): Genomic characterization of epidemiological EAEC strains.

PLoS neglected tropical diseases, 14(9):e0008613.

Although enteroaggregative E. coli (EAEC) has been implicated as a common cause of diarrhea in multiple settings, neither its essential genomic nature nor its role as an enteric pathogen are fully understood. The current definition of this pathotype requires demonstration of cellular adherence; a working molecular definition encompasses E. coli which do not harbor the heat-stable or heat-labile toxins of enterotoxigenic E. coli (ETEC) and harbor the genes aaiC, aggR, and/or aatA. In an effort to improve the definition of this pathotype, we report the most definitive characterization of the pan-genome of EAEC to date, applying comparative genomics and functional characterization on a collection of 97 EAEC strains isolated in the course of a multicenter case-control diarrhea study (Global Enteric Multi-Center Study, GEMS). Genomic analysis revealed that the EAEC strains mapped to all phylogenomic groups of E. coli. Circa 70% of strains harbored one of the five described AAF variants; there were no additional AAF variants identified, and strains that lacked an identifiable AAF generally did not have an otherwise complete AggR regulon. An exception was strains that harbored an ETEC colonization factor (CF) CS22, like AAF a member of the chaperone-usher family of adhesins, but not phylogenetically related to the AAF family. Of all genes scored, sepA yielded the strongest association with diarrhea (P = 0.002) followed by the increased serum survival gene, iss (p = 0.026), and the outer membrane protease gene ompT (p = 0.046). Notably, the EAEC genomes harbored several genes characteristically associated with other E. coli pathotypes. Our data suggest that a molecular definition of EAEC could comprise E. coli strains harboring AggR and a complete AAF(I-V) or CS22 gene cluster. Further, it is possible that strains meeting this definition could be both enteric bacteria and urinary/systemic pathogens.

RevDate: 2020-09-07

Bonnici V, Maresi E, R Giugno (2020)

Challenges in gene-oriented approaches for pangenome content discovery.

Briefings in bioinformatics pii:5901976 [Epub ahead of print].

Given a group of genomes, represented as the sets of genes that belong to them, the discovery of the pangenomic content is based on the search for genetic homology among the genes in order to cluster them into families. Thus, pangenomic analyses investigate the membership of the families to the given genomes. This approach is referred to as the gene-oriented approach, in contrast to other definitions of the problem that take into account different genomic features. In the past years, several tools have been developed to discover and analyse pangenomic contents. Because of the hardness of the problem, each tool applies a different strategy for discovering the pangenomic content. This results in a differentiation of the performance of each tool that depends on the composition of the input genomes. This review reports the main analysis instruments provided by the current state-of-the-art tools for the discovery of pangenomic contents. Moreover, unlike previous works, the presented study compares pangenomic tools from a methodological perspective, analysing the causes that lead a given methodology to outperform other tools. The analysis is performed by taking into account different bacterial populations, which are synthetically generated by changing evolutionary parameters. The benchmarks used to compare the pangenomic tools, in addition to the computational pipeline developed for this purpose, are available online. Contact: V. Bonnici, R. Giugno. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
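As an illustration of the gene-oriented representation described in this abstract, the following minimal Python sketch treats each genome as a set of gene identifiers and, given an assignment of genes to homology families, derives which genomes each family belongs to. The genome names, gene identifiers, and the `gene_to_family` mapping are hypothetical toy data. The homology clustering itself, which is the hard step the review benchmarks, is assumed to have been done by an external tool and is not implemented here.

```python
from collections import defaultdict

# Hypothetical toy input: each genome is represented as a set of gene identifiers.
genomes = {
    "genomeA": {"a1", "a2", "a3"},
    "genomeB": {"b1", "b2"},
    "genomeC": {"c1", "c2", "c3"},
}

# Hypothetical output of a homology clustering step (assumed, not computed here):
# every gene is assigned to exactly one family.
gene_to_family = {
    "a1": "fam1", "b1": "fam1", "c1": "fam1",
    "a2": "fam2", "c2": "fam2", "c3": "fam2",  # c2 and c3 are paralogs in genomeC
    "a3": "fam3",
    "b2": "fam4",
}


def family_membership(genomes, gene_to_family):
    """Return {family: set of genomes containing at least one gene of that family}."""
    membership = defaultdict(set)
    for genome, genes in genomes.items():
        for gene in genes:
            membership[gene_to_family[gene]].add(genome)
    return dict(membership)


membership = family_membership(genomes, gene_to_family)
for family in sorted(membership):
    print(family, sorted(membership[family]))
```

Storing membership as sets of genomes deliberately collapses paralogs (two genes of the same family in one genome count once), which is one possible design choice; tools that track copy number would keep counts instead.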

RevDate: 2020-09-03

Zhu Z, Wang L, Qian H, et al (2020)

Comparative genome analysis of 12 Shigella sonnei strains: virulence, resistance, and their interactions.

International microbiology : the official journal of the Spanish Society for Microbiology pii:10.1007/s10123-020-00145-x [Epub ahead of print].

Shigellosis is a highly infectious disease caused by bacteria of the genus Shigella and transmitted mainly via the fecal-oral route. Four species have been identified in the genus Shigella, among which Shigella flexneri used to be the most prevalent species globally and is commonly isolated in developing countries. However, it is being replaced by Shigella sonnei, which is currently the main causative agent of the dysentery pandemic in many emerging industrialized countries, for example in Asia and the Middle East. For a better understanding of S. sonnei virulence and antibiotic resistance, we sequenced 12 clinical S. sonnei strains with varied antibiotic-resistance profiles collected from four cities in Jiangsu Province, China. Phylogenomic analysis clustered antibiotic-sensitive and resistant S. sonnei into two distinct groups, while pan-genome analysis revealed the presence and absence of unique genes in each group. Screening of 31 classes of virulence factors found that the type 2 secretion system is doubled in resistant strains. Further principal component analysis based on the interactions between virulence and resistance indicated that abundant virulence factors are associated with higher levels of antibiotic resistance. The results presented here are based on statistical analysis of a small sample and serve mainly as guidance for further experimental and theoretical studies.

RevDate: 2020-11-08

Muñoz-Ramirez ZY, Pascoe B, Mendez-Tenorio A, et al (2020)

A 500-year tale of co-evolution, adaptation, and virulence: Helicobacter pylori in the Americas.

The ISME journal pii:10.1038/s41396-020-00758-0 [Epub ahead of print].

Helicobacter pylori is a common component of the human stomach microbiota, possibly dating back to the speciation of Homo sapiens. A history of pathogen evolution in allopatry has led to the development of genetically distinct H. pylori subpopulations, associated with different human populations, and more recent admixture among H. pylori subpopulations can provide information about human migrations. However, little is known about the degree to which some H. pylori genes are conserved in the face of admixture, potentially indicating host adaptation, or how virulence genes spread among different populations. We analyzed H. pylori genomes from 14 countries in the Americas, strains from the Iberian Peninsula, and public genomes from Europe, Africa, and Asia, to investigate how admixture varies across different regions and gene families. Whole-genome analyses of 723 H. pylori strains from around the world showed evidence of frequent admixture in the American strains with a complex mosaic of contributions from H. pylori populations originating in the Americas as well as other continents. Despite the complex admixture, distinctive genomic fingerprints were identified for each region, revealing novel American H. pylori subpopulations. A pan-genome Fst analysis showed that variation in virulence genes had the strongest fixation in America, compared with non-American populations, and that much of the variation constituted non-synonymous substitutions in functional domains. Network analyses suggest that these virulence genes have followed unique evolutionary paths in the American populations, spreading into different genetic backgrounds, potentially contributing to the high risk of gastric cancer in the region.

RevDate: 2020-11-10

Carroll LM, Huisman JS, M Wiedmann (2020)

Twentieth-century emergence of antimicrobial resistant human- and bovine-associated Salmonella enterica serotype Typhimurium lineages in New York State.

Scientific reports, 10(1):14428.

Salmonella enterica serotype Typhimurium (S. Typhimurium) boasts a broad host range and can be transmitted between livestock and humans. While members of this serotype can acquire resistance to antimicrobials, the temporal dynamics of this acquisition is not well understood. Using New York State (NYS) and its dairy cattle farms as a model system, 87 S. Typhimurium strains isolated from 1999 to 2016 from either human clinical or bovine-associated sources in NYS were characterized using whole-genome sequencing. More than 91% of isolates were classified into one of four major lineages, two of which were largely susceptible to antimicrobials but showed sporadic antimicrobial resistance (AMR) gene acquisition, and two that were largely multidrug-resistant (MDR). All four lineages clustered by presence and absence of elements in the pan-genome. The two MDR lineages, one of which resembled S. Typhimurium DT104, were predicted to have emerged circa 1960 and 1972. The two largely susceptible lineages emerged earlier, but showcased sporadic AMR determinant acquisition largely after 1960, including acquisition of cephalosporin resistance-conferring genes after 1985. These results confine the majority of AMR acquisition events in NYS S. Typhimurium to the twentieth century, largely within the era of antibiotic usage.

RevDate: 2020-09-29
CmpDate: 2020-09-29

Bellas CM, Schroeder DC, Edwards A, et al (2020)

Flexible genes establish widespread bacteriophage pan-genomes in cryoconite hole ecosystems.

Nature communications, 11(1):4403.

Bacteriophage genomes rapidly evolve via mutation and horizontal gene transfer to counter evolving bacterial host defenses; such arms race dynamics should lead to divergence between phages from similar, geographically isolated ecosystems. However, near-identical phage genomes can reoccur over large geographical distances and several years apart, conversely suggesting many are stably maintained. Here, we show that phages with near-identical core genomes in distant, discrete aquatic ecosystems maintain diversity by possession of numerous flexible gene modules, where homologous genes present in the pan-genome interchange to create new phage variants. By repeatedly reconstructing the core and flexible regions of phage genomes from different metagenomes, we show that a pool of homologous gene variants co-exists for each module in each location; however, the dominant variant shuffles independently in each module. These results suggest that in a natural community, recombination is the largest contributor to phage diversity, allowing a variety of host recognition receptors and genes to counter bacterial defenses to co-exist for each phage.

RevDate: 2020-09-03
CmpDate: 2020-09-03

Alam I, Kamau AA, Kulmanov M, et al (2020)

Functional Pangenome Analysis Shows Key Features of E Protein Are Preserved in SARS and SARS-CoV-2.

Frontiers in cellular and infection microbiology, 10:405.

The spread of the novel coronavirus (SARS-CoV-2) has triggered a global emergency, that demands urgent solutions for detection and therapy to prevent escalating health, social, and economic impacts. The spike protein (S) of this virus enables binding to the human receptor ACE2, and hence presents a prime target for vaccines preventing viral entry into host cells. The S proteins from SARS and SARS-CoV-2 are similar, but structural differences in the receptor binding domain (RBD) preclude the use of SARS-specific neutralizing antibodies to inhibit SARS-CoV-2. Here we used comparative pangenomic analysis of all sequenced reference Betacoronaviruses, complemented with functional and structural analyses. This analysis reveals that, among all core gene clusters present in these viruses, the envelope protein E shows a variant cluster shared by SARS and SARS-CoV-2 with two completely-conserved key functional features, namely an ion-channel, and a PDZ-binding motif (PBM). These features play a key role in the activation of the inflammasome causing the acute respiratory distress syndrome, the leading cause of death in SARS and SARS-CoV-2 infections. Together with functional pangenomic analysis, mutation tracking, and previous evidence, on E protein as a determinant of pathogenicity in SARS, we suggest E protein as an alternative therapeutic target to be considered for further studies to reduce complications of SARS-CoV-2 infections in COVID-19.

RevDate: 2020-09-28

Kumar R, Bröms JE, A Sjöstedt (2020)

Exploring the Diversity Within the Genus Francisella - An Integrated Pan-Genome and Genome-Mining Approach.

Frontiers in microbiology, 11:1928.

Pan-genome analysis is a powerful method to explore genomic heterogeneity and diversity of bacterial species. Here we present a pan-genome analysis of the genus Francisella, comprising a dataset of 63 genomes and encompassing clinical as well as environmental isolates from distinct geographic locations. To determine the evolutionary relationship within the genus, we performed phylogenetic whole-genome studies utilizing the average nucleotide identity, average amino acid identity, core genes and non-recombinant loci markers. Based on the analyses, the phylogenetic trees obtained identified two distinct clades, A and B and a diverse cluster designated C. The sizes of the pan-, core-, cloud-, and shell-genomes of Francisella were estimated and compared to those of two other facultative intracellular pathogens, Legionella and Piscirickettsia. Francisella had the smallest core-genome, 692 genes, compared to 886 and 1,732 genes for Legionella and Piscirickettsia respectively, while the pan-genome of Legionella was more than twice the size of that of the other two genera. Also, the composition of the Francisella Type VI secretion system (T6SS) was analyzed. Distinct differences in the gene content of the T6SS were identified. In silico approaches performed to identify putative substrates of these systems revealed potential effectors targeting the cell wall, inner membrane, cellular nucleic acids as well as proteins, thus constituting attractive targets for site-directed mutagenesis. The comparative analysis performed here provides a comprehensive basis for the assessment of the phylogenomic relationship of members of the genus Francisella and for the identification of putative T6SS virulence traits.

RevDate: 2020-09-28

Bannantine JP, Conde C, Bayles DO, et al (2020)

Genetic Diversity Among Mycobacterium avium Subspecies Revealed by Analysis of Complete Genome Sequences.

Frontiers in microbiology, 11:1701.

Mycobacterium avium comprises four subspecies that contain both human and veterinary pathogens. At the inception of this study, twenty-eight M. avium genomes had been annotated as RefSeq genomes, facilitating direct comparisons. These genomes represent strains from around the world and provided a unique opportunity to examine genome dynamics in this species. Each genome was confirmed to be classified correctly based on SNP genotyping, nucleotide identity and presence/absence of repetitive elements or other typing methods. The Mycobacterium avium subspecies paratuberculosis (Map) genome size and organization was remarkably consistent, averaging 4.8 Mb with a variance of only 29.6 kb among the 13 strains. Comparing recombination events along with the larger genome size and variance observed among Mycobacterium avium subspecies avium (Maa) and Mycobacterium avium subspecies hominissuis (Mah) strains (collectively termed non-Map) suggests horizontal gene transfer occurs in non-Map, but not in Map strains. Overall, M. avium subspecies could be divided into two major sub-divisions, with the Map type II (bovine strains) clustering tightly on one end of a phylogenetic spectrum and Mah strains clustering more loosely together on the other end. The most evolutionarily distinct Map strain was an ovine strain, designated Telford, which had >1,000 SNPs and showed large rearrangements compared to the bovine type II strains. The Telford strain clustered with Maa strains as an intermediate between Map type II and Mah. SNP analysis and genome organization analyses repeatedly demonstrated the conserved nature of Map versus the mosaic nature of non-Map M. avium strains. Finally, core and pangenomes were developed for Map and non-Map strains. A total of 80% of Map genes belonged to the Map core genome, while only 40% of non-Map genes belonged to the non-Map core genome. These genomes provide a more complete and detailed comparison of these subspecies strains as well as a blueprint for how genetic diversity originated.

RevDate: 2020-09-28

Costa SS, Guimarães LC, Silva A, et al (2020)

First Steps in the Analysis of Prokaryotic Pan-Genomes.

Bioinformatics and biology insights, 14:1177932220938064.

The pan-genome is defined as the set of orthologous and unique genes of a specific group of organisms. It is composed of the core genome, the accessory genome, and species- or strain-specific genes. The pan-genome is considered open or closed based on the alpha value of Heaps' law. In an open pan-genome, the number of gene families will continuously increase with the addition of new genomes to the analysis, while in a closed pan-genome, the number of gene families will not increase considerably. The first step of a pan-genome analysis is the homogenization of genome annotation: the same software, such as GeneMark or RAST, should be used to annotate all genomes. Subsequently, one of several software packages, such as BPGA, GET_HOMOLOGUES, or PGAP, is used to calculate the pan-genome. This review presents all these initial steps for those who want to perform a pan-genome analysis, explaining key concepts of the area. Furthermore, we present the pan-genomic analysis of 9 bacterial species. These are the species with the highest number of genomes deposited in GenBank. We also show the influence of the identity and coverage parameters on the prediction of orthologous and paralogous genes. Finally, we discuss perspectives in several research areas where pan-genome analysis can be used to address important questions.
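The composition described in this abstract (core genome, accessory genome, strain-specific genes) can be sketched from a gene-family presence table. The example below is a minimal illustration under stated assumptions: the family names and genome names are hypothetical, and the strict thresholds (core = present in every genome, strain-specific = present in exactly one) are one simple convention, whereas many tools apply relaxed cutoffs.

```python
def partition_pangenome(membership, n_genomes):
    """Split gene families into (core, accessory, strain_specific) sets.

    membership: {family: set of genomes in which the family is present}
    n_genomes:  total number of genomes in the analysis
    """
    core, accessory, strain_specific = set(), set(), set()
    for family, present_in in membership.items():
        k = len(present_in)
        if k == n_genomes:
            core.add(family)             # found in every genome
        elif k == 1:
            strain_specific.add(family)  # found in a single genome only
        else:
            accessory.add(family)        # found in some, but not all, genomes
    return core, accessory, strain_specific


# Hypothetical toy presence table for three genomes.
membership = {
    "fam1": {"g1", "g2", "g3"},
    "fam2": {"g1", "g3"},
    "fam3": {"g1"},
    "fam4": {"g2"},
}

core, accessory, strain_specific = partition_pangenome(membership, n_genomes=3)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
print("strain-specific:", sorted(strain_specific))
```

The pan-genome in this sketch is simply the union of the three sets, so its size is the number of keys in `membership`.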

RevDate: 2020-10-12

Zhou L, Zhang T, Tang S, et al (2020)

Pan-genome analysis of Paenibacillus polymyxa strains reveals the mechanism of plant growth promotion and biocontrol.

Antonie van Leeuwenhoek, 113(11):1539-1558.

Rapid development of gene sequencing technologies has led to an exponential increase in microbial sequencing data. Genome research of a single organism does not capture the changes in the characteristics of genetic information within a species. Pan-genome analysis gives us a broader perspective to study the complete genetic information of a species. Paenibacillus polymyxa is a Gram-positive bacterium and an important plant growth-promoting rhizobacterium with the ability to produce multiple antibiotics, such as fusaricidin, lantibiotic, paenilan, and polymyxin. Our study explores the pan-genome of 14 representative P. polymyxa strains isolated from around the world. Heaps' law model and curve fitting confirmed an open pan-genome of P. polymyxa. The phylogenetic and collinearity analyses showed that the evolutionary classification of P. polymyxa strains is not associated with geographical area or ecological niche. Few genes related to phytohormone synthesis and phosphate solubilization were conserved; however, the nif gene cluster associated with nitrogen fixation exists only in some strains. This finding indicates that nitrogen-fixing ability is not stable in P. polymyxa. Analysis of antibiotic gene clusters in P. polymyxa revealed the presence of these genes in both core and accessory genomes. This observation indicates that differences in living environment led to loss of the ability to synthesize antibiotics in some strains. The current pan-genomic analysis of P. polymyxa will help us understand the mechanisms of biological control and plant growth promotion. It will also promote the use of P. polymyxa in agriculture.
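One common formulation of Heaps' law in pan-genome studies models the number of new genes n found when the N-th genome is added as n = kappa * N^(-alpha), with alpha <= 1 usually read as an open pan-genome and alpha > 1 as a closed one. The sketch below fits alpha by a simple least-squares line in log-log space. It is an illustration only: the function name `fit_heaps_alpha` is hypothetical, the input counts are synthetic rather than taken from the study, and the log-log regression is a simplification that is not claimed to be the curve-fitting procedure the authors used.

```python
import math


def fit_heaps_alpha(n_genomes, new_genes):
    """Fit new_genes = kappa * n_genomes**(-alpha) by least squares on log-log data.

    Returns (kappa, alpha).
    """
    log_x = [math.log(x) for x in n_genomes]
    log_y = [math.log(y) for y in new_genes]
    count = len(log_x)
    mean_x = sum(log_x) / count
    mean_y = sum(log_y) / count
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    slope = sxy / sxx                       # slope of the log-log line equals -alpha
    kappa = math.exp(mean_y - slope * mean_x)
    return kappa, -slope


# Synthetic data only: new-gene counts generated with kappa = 300 and alpha = 0.6
# for the 2nd through 14th genome added (14 genomes, as in a small strain panel).
genomes_added = list(range(2, 15))
new_gene_counts = [300 * n ** -0.6 for n in genomes_added]

kappa, alpha = fit_heaps_alpha(genomes_added, new_gene_counts)
is_open = alpha <= 1.0  # assumed convention: alpha <= 1 means an open pan-genome
print(f"kappa={kappa:.1f}, alpha={alpha:.2f}, open={is_open}")
```

With real data the new-gene counts are usually averaged over many random genome orderings before fitting, since a single ordering is noisy.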

RevDate: 2020-11-03
CmpDate: 2020-11-03

Ouyabe M, Tanaka N, Shiwa Y, et al (2020)

Rhizobium dioscoreae sp. nov., a plant growth-promoting bacterium isolated from yam (Dioscorea species).

International journal of systematic and evolutionary microbiology, 70(9):5054-5062.

This study investigated endophytic nitrogen-fixing bacteria isolated from two species of yam (water yam, Dioscorea alata L.; lesser yam, Dioscorea esculenta L.) grown in nutrient-poor alkaline soil conditions on Miyako Island, Okinawa, Japan. Two bacterial strains of the genus Rhizobium, S-93T and S-62, were isolated. The phylogenetic tree, based on the almost-complete 16S rRNA gene sequences (1476 bp for each strain), placed them in a distinct clade, with Rhizobium miluonense CCBAU 41251T, Rhizobium hainanense I66T, Rhizobium multihospitium HAMBI 2975T, Rhizobium freirei PRF 81T and Rhizobium tropici CIAT 899T being their closest species. Their bacterial fatty acid profile, with major components of C19 : 0 cyclo ω8c and summed feature 8, as well as other phenotypic characteristics and DNA G+C content (59.65 mol%) indicated that the novel strains belong to the genus Rhizobium. Pairwise average nucleotide identity analyses separated the novel strains from their most closely related species with similarity values of 90.5, 88.9, 88.5, 84.5 and 84.4 % for R. multihospitium HAMBI 2975T, R. tropici CIAT 899T, R. hainanense CCBAU 57015T, R. miluonense HAMBI 2971T and R. freirei PRF 81T, respectively; digital DNA-DNA hybridization values were in the range of 26-42 %. Considering the phenotypic characteristics as well as the genomic data, it is suggested that strains S-93T and S-62 represent a new species, for which the name Rhizobium dioscoreae is proposed. The type strain is S-93T (=NRIC 0988T=NBRC 114257T=DSM 110498T).

RevDate: 2020-08-18

Clawson ML, Schuller G, Dickey AM, et al (2020)

Differences between predicted outer membrane proteins of genotype 1 and 2 Mannheimia haemolytica.

BMC microbiology, 20(1):250.

BACKGROUND: Mannheimia haemolytica strains isolated from North American cattle have been classified into two genotypes (1 and 2). Although members of both genotypes have been isolated from the upper and lower respiratory tracts of cattle with or without bovine respiratory disease (BRD), genotype 2 strains are much more frequently isolated from diseased lungs than genotype 1 strains. The mechanisms behind the increased association of genotype 2 M. haemolytica with BRD are not fully understood. To address that, and to search for interventions against genotype 2 M. haemolytica, complete, closed chromosome assemblies for 35 genotype 1 and 34 genotype 2 strains were generated and compared. Searches were conducted for the pan genome, core genes shared between the genotypes, and for genes specific to either genotype. Additionally, genes encoding outer membrane proteins (OMPs) specific to genotype 2 M. haemolytica were identified, and the diversity of their protein isoforms was characterized with predominantly unassembled, short-read genomic sequences for up to 1075 additional strains.

RESULTS: The pan genome of the 69 sequenced M. haemolytica strains consisted of 3111 genes, of which 1880 comprised a shared core between the genotypes. A core of 112 and 179 genes or gene variants were specific to genotype 1 and 2, respectively. Seven genes encoding predicted OMPs; a peptidase S6, a ligand-gated channel, an autotransporter outer membrane beta-barrel domain-containing protein (AOMB-BD-CP), a porin, and three different trimeric autotransporter adhesins were specific to genotype 2 as their genotype 1 homologs were either pseudogenes, or not detected. The AOMB-BD-CP gene, however, appeared to be truncated across all examined genotype 2 strains and to likely encode dysfunctional protein. Homologous gene sequences from additional M. haemolytica strains confirmed the specificity of the remaining six genotype 2 OMP genes and revealed they encoded low isoform diversity at the population level.

CONCLUSION: Genotype 2 M. haemolytica possess genes encoding conserved OMPs not found intact in more commensally prone genotype 1 strains. Some of the genotype 2 specific genes identified in this study are likely to have important biological roles in the pathogenicity of genotype 2 M. haemolytica, which is the primary bacterial cause of BRD.

RevDate: 2020-09-28

Xu S, Cheng J, Meng X, et al (2020)

Complete Genome and Comparative Genome Analysis of Lactobacillus reuteri YSJL-12, a Potential Probiotics Strain Isolated From Healthy Sow Fresh Feces.

Evolutionary bioinformatics online, 16:1176934320942192.

Lactobacillus reuteri YSJL-12 was isolated from the fresh feces of a healthy sow and has previously been used as a probiotic additive. To investigate the genetic basis of its probiotic potential and identify the genes in the strain, the complete genome of YSJL-12 was sequenced. Then comparative genome analysis of 9 strains of Lactobacillus reuteri was performed. The genome of YSJL-12 consisted of a circular 2,084,748 bp chromosome and 2 circular plasmids (51,906 and 15,134 bp). Among the 2065 protein-coding sequences (CDSs), genes associated with resistance to environmental stress were identified. The function of COG (Clusters of Orthologous Group) protein genes was predicted, and the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways were analyzed. The comparative genome analysis indicated that the pan-genome contained a core genome of 1257 orthologous gene clusters, an accessory genome of 1064 orthologous gene clusters, and 1148 strain-specific genes, and that the antibacterial mechanism among Lactobacillus reuteri strains might be different. The phylogenetic analysis and genomic collinearity revealed that the phylogenetic relationship among the 9 strains of Lactobacillus reuteri was connected with host species and showed host specificity. The research could help us to better predict gene function and understand the genetic basis of adaptation to the host gut in Lactobacillus reuteri YSJL-12.

RevDate: 2020-08-11

Bernardes JS, Eberle RJ, Vieira FRJ, et al (2020)

A comparative pan-genomic analysis of 53 C. pseudotuberculosis strains based on functional domains.

Journal of biomolecular structure & dynamics [Epub ahead of print].

Corynebacterium pseudotuberculosis is a pathogenic bacterium with great veterinary and economic importance. It is classified into two biovars: ovis, nitrate-negative, which causes lymphadenitis in small ruminants, and equi, nitrate-positive, which causes ulcerative lymphangitis in equines. With the explosive growth of available genomes of several strains, pan-genome analysis has opened new opportunities for understanding the dynamics and evolution of C. pseudotuberculosis. However, few pan-genomic studies have compared biovars equi and ovis. Such studies have considered a reduced number of strains and compared entire genomes. Here we conducted an original pan-genome analysis based on protein sequences and their functional domains. We considered 53 C. pseudotuberculosis strains from both biovars isolated from different hosts and countries. We have analysed conserved domains, common domains more frequently found in each biovar and biovar-specific (unique) domains. Our results demonstrated that biovar equi is more variable; there is a significant difference in the number of proteins per strain, probably indicating the occurrence of more gene loss/gain events. Moreover, strains of biovar equi presented a higher number of biovar-specific domains (77 against only eight in biovar ovis), most of which are associated with virulence mechanisms. With this domain analysis, we have identified functional differences among strains of biovars ovis and equi that could be related to niche adaptation and may help to better understand mechanisms of virulence and pathogenesis. The distribution patterns of functional domains identified in this work might have impacts on bacterial physiology and lifestyle, encouraging the development of new diagnostics, vaccines, and treatments for C. pseudotuberculosis diseases. Communicated by Ramaswamy H. Sarma.

RevDate: 2020-09-21

Pan Y, Awan F, Zhenbao M, et al (2020)

Preliminary view of the global distribution and spread of the tet(X) family of tigecycline resistance genes.

The Journal of antimicrobial chemotherapy, 75(10):2797-2803.

BACKGROUND: The emergence of plasmid-mediated tet(X3)/tet(X4) genes is threatening the role of tigecycline as a last-resort antibiotic to treat clinical infections caused by XDR bacteria. Considering the possible public health threat posed by tet(X) and its variants [which we collectively call 'tet(X) genes' in this study], global monitoring and surveillance are urgently required.

OBJECTIVES: Here we conducted a worldwide survey of the global distribution and spread of tet(X) genes.

METHODS: We analysed a comprehensive dataset of bacterial genomes in conjunction with surveillance data from our laboratory and the NCBI database, as well as sufficient metadata to characterize the results.

RESULTS: The global distribution features of tet(X) genes were revealed. We clustered three types of genetic backbones of tet(X) genes embedded or transferred in bacterial genomes. Our pan-genome analyses revealed a large genetic pool composed of tet(X)-carrying sequences. Moreover, phylogenetic trees of tet(X) genes and tet(X)-like proteins were built.

CONCLUSIONS: To the best of our knowledge, our results provide the first view of the global distribution of tet(X) genes, demonstrate the features of tet(X)-carrying fragments and highlight the possible evolution of tigecycline-inactivation enzymes in diverse bacterial species and habitats.

RevDate: 2020-08-08

Santos DDS, Calaça PRA, Porto ALF, et al (2020)

What Differentiates Probiotic from Pathogenic Bacteria? The Genetic Mobility of Enterococcus faecium Offers New Molecular Insights.

Omics : a journal of integrative biology [Epub ahead of print].

Enterococcus faecium is a lactic acid bacterium with applications in food engineering and nutrigenomics, including as starter cultures in fermented foods. To differentiate the E. faecium probiotic from pathogenic bacteria, physiological analyses are often used but they do not guarantee that a bacterial strain is not pathogenic. We report here new findings and an approach based on comparison of the genetic mobility of (1) probiotic, (2) pathogenic, and (3) nonpathogenic and non-probiotic strains, so as to differentiate probiotics, and inform their safe use. The region of the 16S ribosomal DNA (rDNA) genes of different E. faecium strains native to Pernambuco-Brazil was used with the GenBank query sequence. Complete genomes were selected and divided into three groups as noted above to identify the mobile genetic elements (MGEs) (transposase, integrase, conjugative transposon protein and phage) and antibiotic resistance genes (ARGs), and to undertake pan-genome analysis and multiple genome alignment. Differences in the number of MGEs were found in ARGs, in the presence and absence of the genes that differentiate E. faecium probiotics and pathogenic bacteria genetically. Our data suggest that genetic mobility appears to be informative in differentiating between probiotic and pathogenic strains. While the present findings are not necessarily applicable to all probiotics, they offer novel molecular insights to guide future research in nutrigenomics, clinical medicine, and food engineering on new ways to differentiate pathogenic from probiotic bacteria.

RevDate: 2020-11-03

Son S, Oh JD, Lee SH, et al (2020)

Comparative genomics of canine Lactobacillus reuteri reveals adaptation to a shared environment with humans.

Genes & genomics, 42(9):1107-1116.

BACKGROUND: Lactobacillus reuteri is a gram-positive, non-motile bacterial species that has been used as a representative microorganism model to describe the ecology and evolution of vertebrate gut symbionts.

OBJECTIVE: Because the genetic features and evolutionary strategies of L. reuteri from the gastrointestinal tract of canines remain unknown, we sought to construct a draft genome of canine L. reuteri and to investigate modified, acquired, or lost genetic features that have facilitated the evolution and adaptation of strains to specific environmental niches.

METHODS: To examine canine L. reuteri, we sequenced an L. reuteri strain isolated from a dog in Korea. A comparative genomic approach was used to assess genetic diversity and gain insight into the distinguishing features related to different hosts based on 27 published genomic sequences.

RESULTS: The pan-genome of 28 L. reuteri strains contained 7,369 gene families, and the core genome contained 1,070 gene families. In the ANI tree based on the core genes, the canine L. reuteri strain (C1) was very close to three strains (IRT, DSM20016, JCM1112) from humans. Evolutionarily, these four strains formed one clade, which we regarded as C1-clade in this study. We examined a total of 32,050 amino acid substitutions among the 28 L. reuteri strain genomes. In this comparison, 283 amino acid substitutions were specific to strain C1, and the four strains in C1-clade shared most of these 283 C1-strain-specific amino acid substitutions, strongly suggesting similar selective pressure. Among accessory genes, we identified 127 C1-clade host-specific genes and found that several genes were closely related to replication, recombination, and repair.

CONCLUSION: This study provides new insights into the adaptation of L. reuteri to the canine intestinal habitat, and suggests that the genome of L. reuteri from canines is closely associated with their living and shared environment with humans.

RevDate: 2020-09-04

Botelho J, Grosso F, L Peixe (2020)

ICEs Are the Main Reservoirs of the Ciprofloxacin-Modifying crpP Gene in Pseudomonas aeruginosa.

Genes, 11(8):.

The ciprofloxacin-modifying crpP gene was recently identified in a plasmid isolated from a Pseudomonas aeruginosa clinical isolate. Homologues of this gene were also identified in Escherichia coli, Klebsiella pneumoniae and Acinetobacter baumannii. We set out to explore the mobile elements involved in the acquisition and spread of this gene in publicly available and complete genomes of Pseudomonas spp. All Pseudomonas complete genomes were downloaded from NCBI's Refseq library and were inspected for the presence of the crpP gene. The mobile elements carrying this gene were further characterized. The crpP gene was identified only in P. aeruginosa, in more than half of the complete chromosomes (61.9%, n = 133/215) belonging to 52 sequence types, of which the high-risk clone ST111 was the most frequent. We identified 136 crpP-harboring integrative and conjugative elements (ICEs), with 93.4% belonging to the mating-pair formation G (MPFG) family. The ICEs were integrated at the end of a tRNALys gene and were all flanked by highly conserved 45-bp direct repeats. The crpP-carrying ICEs contain 26 core genes (2.2% of all 1193 genes found in all the ICEs together), which are present in 99% or more of the crpP-harboring ICEs. The most frequently encoded traits on these ICEs include replication, transcription, intracellular trafficking and cell motility. Our work suggests that ICEs are the main vectors promoting the dissemination of the ciprofloxacin-modifying crpP gene in P. aeruginosa.
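The percentages reported in this abstract can be re-derived from the counts it gives. The short sketch below does that arithmetic and also works out how many of the 136 crpP-harboring ICEs a gene must occur in to meet the "99% or more" core criterion. The helper names are hypothetical, and reading the 99% cutoff as an integer ceiling over 136 ICEs is an assumption, since the abstract does not spell out how the threshold was applied.

```python
def percent(part, whole, digits=1):
    """Return part/whole as a percentage rounded to the given number of digits."""
    return round(100 * part / whole, digits)


def min_count_for_percent(total, percent_threshold):
    """Smallest integer count that is at least percent_threshold % of total."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-percent_threshold * total // 100)


# Counts taken from the abstract above.
chromosomes_with_crpP = percent(133, 215)   # chromosomes carrying crpP
core_gene_share = percent(26, 1193)         # core genes among all ICE genes

# Assumed interpretation of "present in 99% or more" of the 136 ICEs.
min_ices_for_core = min_count_for_percent(136, 99)

print(chromosomes_with_crpP, core_gene_share, min_ices_for_core)
# prints: 61.9 2.2 135
```

Under this reading, a core gene could be missing from at most one of the 136 ICEs.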

RevDate: 2020-09-28

Petit RA, TD Read (2020)

Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes.

mSystems, 5(4):.

Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a data set setup step (Bactopia Data Sets [BaDs]), which creates a series of customizable data sets for the species of interest, the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly, and several other functions based on the available data sets and outputs the processed data to a structured directory format, and a series of Bactopia Tools (BaTs) that perform specific postprocessing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes, and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on Lactobacillus crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to ones including thousands of genomes and that allows for great flexibility in choosing comparison data sets and options for downstream analysis. Bactopia code can be accessed at

It is now relatively easy to obtain a high-quality draft genome sequence of a bacterium, but bioinformatic analysis requires organization and optimization of multiple open source software tools. We present Bactopia, a pipeline for bacterial genome analysis, as an option for processing bacterial genome data. Bactopia also automates downloading of data from multiple public sources and species-specific customization. Because the pipeline is written in the Nextflow language, analyses can be scaled from individual genomes on a local computer to thousands of genomes using cloud resources. As a usage example, we processed 1,664 Lactobacillus genomes from public sources and used comparative analysis workflows (Bactopia Tools) to identify and analyze members of the L. crispatus species.

RevDate: 2020-11-02
CmpDate: 2020-11-02

Tao Y, Jordan DR, ES Mace (2020)

A Graph-Based Pan-Genome Guides Biological Discovery.

Molecular plant, 13(9):1247-1249.

RevDate: 2020-09-09

Correia K, R Mahadevan (2020)

Pan-Genome-Scale Network Reconstruction: Harnessing Phylogenomics Increases the Quantity and Quality of Metabolic Models.

Biotechnology journal [Epub ahead of print].

A genome-scale network reconstruction (GENRE) is a knowledgebase for an organism and has various applications. Available genome sequences have risen in recent years, but the number of curated GENREs has not kept pace. Existing yeast GENREs contain significant commission and omission errors. Current practices limit the quantity and quality of GENREs. An open and transparent phylogenomic-driven framework is outlined to address these issues. The method is demonstrated with 33 yeasts and fungi in Dikarya. A pan-fungal metabolic network called FYRMENT (Fungal and Yeast Metabolic Network) is created and annotated with ortholog groups from AYbRAH. Metabolic models for lower-level taxons are compiled. The fungal pan-GENRE contains 1553 orthologs, 2759 reactions, and 2251 metabolites. The GENREs have higher genomic and metabolic coverage than existing yeast and fungal GENREs created with other methods. Metabolic simulations show that the maximum amino acid yields from glucose differ between yeast lineages, indicating that metabolic networks have evolved. Curating genomes and reactions at higher taxonomic levels yields more and higher-quality GENREs than conventional approaches. This approach can scale to other branches in the tree of life.

RevDate: 2020-09-28

Parlikar A, Kalia K, Sinha S, et al (2020)

Understanding genomic diversity, pan-genome, and evolution of SARS-CoV-2.

PeerJ, 8:e9576.

Coronavirus disease 2019 (COVID-19) infection, which originated from Wuhan, China, has seized the whole world in its grasp and created a huge pandemic situation before humanity. Since December 2019, genomes of numerous isolates have been sequenced and analyzed for testing confirmation, epidemiology, and evolutionary studies. In the first half of this article, we provide a detailed review of the history and origin of COVID-19, followed by the taxonomy, nomenclature and genome organization of its causative agent Severe Acute Respiratory Syndrome-related Coronavirus-2 (SARS-CoV-2). In the latter half, we analyze subgenus Sarbecovirus (167 SARS-CoV-2, 312 SARS-CoV, and 5 Pangolin CoV) genomes to understand their diversity, origin, and evolution, along with pan-genome analysis of genus Betacoronavirus members. Whole-genome sequence-based phylogeny of subgenus Sarbecovirus genomes reasserted the fact that SARS-CoV-2 strains evolved from their common ancestors putatively residing in bat or pangolin hosts. We predicted a few country-specific patterns of relatedness and identified mutational hotspots with high, medium and low probability based on genome alignment of 167 SARS-CoV-2 strains. A total of 100-nucleotide segment-based homology studies revealed that the majority of the SARS-CoV-2 genome segments are close to Bat CoV, followed by some to Pangolin CoV, and some are unique ones. Open pan-genome of genus Betacoronavirus members indicates the diversity contributed by the novel viruses emerging in this group. Overall, the exploration of the diversity of these isolates, mutational hotspots and pan-genome will shed light on the evolution and pathogenicity of SARS-CoV-2 and help in developing putative methods of diagnosis and treatment.

RevDate: 2020-11-06

Söderlund R, Formenti N, Caló S, et al (2020)

Comparative genome analysis of Erysipelothrix rhusiopathiae isolated from domestic pigs and wild boars suggests host adaptation and selective pressure from the use of antibiotics.

Microbial genomics, 6(8):.

The disease erysipelas caused by Erysipelothrix rhusiopathiae (ER) is a major concern in pig production. In the present study the genomes of ER from pigs (n=87), wild boars (n=71) and other sources (n=85) were compared in terms of whole-genome SNP variation, accessory genome content and the presence of genetic antibiotic resistance determinants. The aim was to investigate if genetic features among ER were associated with isolate origin in order to better estimate the risk of transmission of porcine-adapted strains from wild boars to free-range pigs and to increase our understanding of the evolution of ER. Pigs and wild boars carried isolates representing all ER clades, but clade one only occurred in healthy wild boars and healthy pigs. Several accessory genes or gene variants were found to be significantly associated with the pig and wild boar hosts, with genes predicted to encode cell wall-associated or extracellular proteins overrepresented. Gene variants associated with serovar determination and capsule production in serovars known to be pathogenic for pigs were found to be significantly associated with pigs as hosts. In total, 30 % of investigated pig isolates but only 6 % of wild boar isolates carried resistance genes, most commonly tetM (tetracycline) and lsa(E) together with lnu(B) (lincosamides, pleuromutilin and streptogramin A). The incidence of variably present genes including resistance determinants was weakly linked to phylogeny, indicating that host adaptation in ER has evolved multiple times in diverse lineages mediated by recombination and the acquisition of mobile genetic elements. The presented results support the occurrence of host-adapted ER strains, but they do not indicate frequent transmission between wild boars and domestic pigs. This article contains data hosted by Microreact.

RevDate: 2020-08-05

Derakhshani H, Bernier SP, Marko VA, et al (2020)

Completion of draft bacterial genomes by long-read sequencing of synthetic genomic pools.

BMC genomics, 21(1):519.

BACKGROUND: Illumina technology currently dominates bacterial genomics due to its high read accuracy and low sequencing cost. However, the incompleteness of draft genomes generated by Illumina reads limits their application in comprehensive genomics analyses. Alternatively, hybrid assembly using both Illumina short reads and long reads generated by single molecule sequencing technologies can enable assembly of complete bacterial genomes, yet the high per-genome cost of long-read sequencing limits the widespread use of this approach in bacterial genomics. Here we developed a protocol for hybrid assembly of complete bacterial genomes using miniaturized multiplexed Illumina sequencing and non-barcoded PacBio sequencing of a synthetic genomic pool (SGP), thus significantly decreasing the overall per-genome cost of sequencing.

RESULTS: We evaluated the performance of SGP hybrid assembly on the genomes of 20 bacterial isolates with different genome sizes, a wide range of GC contents, and varying levels of phylogenetic relatedness. By improving the contiguity of Illumina assemblies, SGP hybrid assembly generated 17 complete and 3 nearly complete bacterial genomes. Increased contiguity of SGP hybrid assemblies resulted in considerable improvement in gene prediction and annotation. In addition, SGP hybrid assembly was able to resolve repeat elements and identify intragenomic heterogeneities, e.g. different copies of 16S rRNA genes, that would otherwise go undetected by short-read-only assembly. Comprehensive comparison of SGP hybrid assemblies with those generated using multiplexed PacBio long reads (long-read-only assembly) also revealed the relative advantage of SGP hybrid assembly in terms of assembly quality. In particular, we observed that SGP hybrid assemblies were completely devoid of both small (i.e. single base substitutions) and large assembly errors. Finally, we show the ability of SGP hybrid assembly to differentiate genomes of closely related bacterial isolates, suggesting its potential application in comparative genomics and pangenome analysis.

CONCLUSION: Our results indicate the superiority of SGP hybrid assembly over both short-read and long-read assemblies with respect to completeness, contiguity, accuracy, and recovery of small replicons. By lowering the per-genome cost of sequencing, our parallel sequencing and hybrid assembly pipeline could serve as a cost effective and high throughput approach for completing high-quality bacterial genomes.

RevDate: 2020-10-29
CmpDate: 2020-10-29

Haberer G, Kamal N, Bauer E, et al (2020)

European maize genomes highlight intraspecies variation in repeat and gene content.

Nature genetics, 52(9):950-957.

The diversity of maize (Zea mays) is the backbone of modern heterotic patterns and hybrid breeding. Historically, US farmers exploited this variability to establish today's highly productive Corn Belt inbred lines from blends of dent and flint germplasm pools. Here, we report de novo genome sequences of four European flint lines assembled to pseudomolecules with scaffold N50 ranging from 6.1 to 10.4 Mb. Comparative analyses with two US Corn Belt lines explain the pronounced differences between both germplasms. While overall syntenic order and consolidated gene annotations reveal only moderate pangenomic differences, whole-genome alignments delineating the core and dispensable genome, and the analysis of heterochromatic knobs and orthologous long terminal repeat retrotransposons unveil the dynamics of the maize genome. The high-quality genome sequences of the flint pool complement the maize pangenome and provide an important tool to study maize improvement at a genome scale and to enhance modern hybrid breeding.

RevDate: 2020-10-23

Muqaddasi QH, Brassac J, Ebmeyer E, et al (2020)

Prospects of GWAS and predictive breeding for European winter wheat's grain protein content, grain starch content, and grain hardness.

Scientific reports, 10(1):12541.

Grain quality traits determine the classification of registered wheat (Triticum aestivum L.) varieties. Although environmental factors and crop management practices exert a considerable influence on wheat quality traits, a significant proportion of the variance is attributed to the genetic factors. To identify the underlying genetic factors of wheat quality parameters viz., grain protein content (GPC), grain starch content (GSC), and grain hardness (GH), we evaluated 372 diverse European wheat varieties in replicated field trials in up to eight environments. We observed that all of the investigated traits hold a wide and significant genetic variation, and a significant negative correlation exists between GPC and GSC plus grain yield. Our association analyses based on 26,694 high-quality single nucleotide polymorphic markers revealed a strong quantitative genetic nature of GPC and GSC with associations on groups 2, 3, and 6 chromosomes. The identification of known Puroindoline-b gene for GH provided a positive analytic proof for our studies. We report that a locus QGpc.ipk-6A controls both GPC and GSC with opposite allelic effects. Based on wheat's reference and pan-genome sequences, the physical characterization of two loci viz., QGpc.ipk-2B and QGpc.ipk-6A facilitated the identification of the candidate genes for GPC. Furthermore, by exploiting additive and epistatic interactions of loci, we evaluated the prospects of predictive breeding for the investigated traits that suggested its efficient use in the breeding programs.

RevDate: 2020-10-23

Flament-Simon SC, de Toro M, Chuprikova L, et al (2020)

High diversity and variability of pipolins among a wide range of pathogenic Escherichia coli strains.

Scientific reports, 10(1):12452.

Self-synthesizing transposons are integrative mobile genetic elements (MGEs) that encode their own B-family DNA polymerase (PolB). Discovered a few years ago, they are proposed as key players in the evolution of several groups of DNA viruses and virus-host interaction machinery. Pipolins are the most recent addition to the group, are integrated in the genomes of bacteria from diverse phyla and also present as circular plasmids in mitochondria. Remarkably, pipolins-encoded PolBs are proficient DNA polymerases endowed with DNA priming capacity, hence the name, primer-independent PolB (piPolB). We have now surveyed the presence of pipolins in a collection of 2,238 human and animal pathogenic Escherichia coli strains and found that, although detected in only 25 positive isolates (1.1%), they are present in E. coli strains from a wide variety of pathotypes, serotypes, phylogenetic groups and sequence types. Overall, the pangenome of strains carrying pipolins is highly diverse, despite the fact that a considerable number of strains belong to only three clonal complexes (CC10, CC23 and CC32). Comparative analysis with a set of 67 additional pipolin-harboring genomes from GenBank database spanning strains from diverse origin, further confirmed these results. The genetic structure of pipolins shows great flexibility and variability, with the piPolB gene and the attachment sites being the only common features. Most pipolins contain one or more recombinases that would be involved in excision/integration of the element in the same conserved tRNA gene. This mobilization mechanism might explain the apparent incompatibility of pipolins with other integrative MGEs such as integrons. In addition, analysis of cophylogeny between pipolins and pipolin-harboring strains showed a lack of congruence between several pipolins and their host strains, in agreement with horizontal transfer between hosts. 
Overall, these results indicate that pipolins can serve as a vehicle for genetic transfer among circulating E. coli and possibly also among other pathogenic bacteria.

RevDate: 2020-10-23

Crysnanto D, H Pausch (2020)

Bovine breed-specific augmented reference graphs facilitate accurate sequence read mapping and unbiased variant discovery.

Genome biology, 21(1):184.

BACKGROUND: The current bovine genomic reference sequence was assembled from a Hereford cow. The resulting linear assembly lacks diversity because it does not contain allelic variation, a drawback of linear references that causes reference allele bias. High nucleotide diversity and the separation of individuals by hundreds of breeds make cattle ideally suited to investigate the optimal composition of variation-aware references.

RESULTS: We augment the bovine linear reference sequence (ARS-UCD1.2) with variants filtered for allele frequency in dairy (Brown Swiss, Holstein) and dual-purpose (Fleckvieh, Original Braunvieh) cattle breeds to construct either breed-specific or pan-genome reference graphs using the vg toolkit. We find that read mapping is more accurate to variation-aware than linear references if pre-selected variants are used to construct the genome graphs. Graphs that contain random variants do not improve read mapping over the linear reference sequence. Breed-specific augmented and pan-genome graphs enable almost similar mapping accuracy improvements over the linear reference. We construct a whole-genome graph that contains the Hereford-based reference sequence and 14 million alleles that have alternate allele frequency greater than 0.03 in the Brown Swiss cattle breed. Our novel variation-aware reference facilitates accurate read mapping and unbiased sequence variant genotyping for SNPs and Indels.

CONCLUSIONS: We develop the first variation-aware reference graph for an agricultural animal. Our novel reference structure improves sequence read mapping and variant genotyping over the linear reference. Our work is a first step towards the transition from linear to variation-aware reference structures in species with high genetic diversity and many sub-populations.
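
The variant pre-selection step described in this abstract (keeping alleles with alternate allele frequency greater than 0.03 before building the graph) can be sketched as a simple filter. This is an illustration only: the `Variant` record, the function name `select_for_graph` and the toy variants are hypothetical, and the sketch does not call the vg toolkit, which the authors used for the actual graph construction.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Minimal variant record (hypothetical layout for illustration)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_freq: float  # alternate allele frequency in the breed of interest


def select_for_graph(variants: Iterable[Variant], min_alt_freq: float = 0.03) -> List[Variant]:
    """Keep variants whose alternate allele frequency is strictly above the threshold."""
    return [v for v in variants if v.alt_freq > min_alt_freq]


# Toy input (made-up positions and frequencies).
variants = [
    Variant("1", 100, "A", "G", 0.25),
    Variant("1", 200, "C", "T", 0.03),   # exactly at the threshold: dropped
    Variant("1", 300, "G", "GA", 0.031),
    Variant("1", 400, "T", "C", 0.001),
]
kept = select_for_graph(variants)
kept_positions = [v.pos for v in kept]
```

The strict comparison mirrors the abstract's wording ("greater than 0.03"); whether the boundary value is included in the authors' pipeline is not stated in the abstract.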

RevDate: 2020-08-25

Yin Z, Liu J, Du B, et al (2020)

Whole-Genome-Based Survey for Polyphyletic Serovars of Salmonella enterica subsp. enterica Provides New Insights into Public Health Surveillance.

International journal of molecular sciences, 21(15):.

Serotyping has traditionally been considered the basis for surveillance of Salmonella, but it cannot distinguish distinct lineages sharing the same serovar that vary in host range, pathogenicity and epidemiology. However, polyphyletic serovars have not been extensively investigated. Public health microbiology is currently being transformed by whole-genome sequencing (WGS) data, which promote the lineage determination using a more powerful and accurate technique than serotyping. The focus in this study is to survey and analyze putative polyphyletic serovars. The multi-locus sequence typing (MLST) phylogenetic analysis identified four putative polyphyletic serovars, namely, Montevideo, Bareilly, Saintpaul, and Muenchen. Whole-genome-based phylogeny and population structure highlighted the polyphyletic nature of Bareilly and Saintpaul and the multi-lineage nature of Montevideo and Muenchen. The population of these serovars was defined by extensive genetic diversity, the open pan genome and the small core genome. Source niche metadata revealed putative existence of lineage-specific niche adaptation (host-preference and environmental-preference), exhibited by lineage-specific genomic contents associated with metabolism and transport. Meanwhile, differences in genetic profiles relating to virulence and antimicrobial resistance within each lineage may contribute to pathogenicity and epidemiology. The results also showed that recombination events occurring at the H1-antigen loci may be an important reason for polyphyly. The results presented here provide the genomic basis of simple, rapid, and accurate identification of phylogenetic lineages of these serovars, which could have important implications for public health.

RevDate: 2020-08-27

Fang H, Xu JB, Nie Y, et al (2020)

Pan-genomic analysis reveals that the evolution of Dietzia species depends on their living habitats.

Environmental microbiology [Epub ahead of print].

The bacterial genus Dietzia is widely distributed in various environments. The genomes of 26 diverse strains of Dietzia, including almost all the type strains, were analysed in this study. This analysis revealed a lipid metabolism gene richness, which could explain the ability of Dietzia to live in oil related environments. The pan-genome consists of 83,976 genes assigned into 10,327 gene families, 792 of which are shared by all the genomes of Dietzia. Mathematical extrapolation of the data suggests that the Dietzia pan-genome is open. Both gene duplication and gene loss contributed to the open pan-genome, while horizontal gene transfer was limited. Dietzia strains primarily gained their diverse metabolic capacity through more ancient gene duplications. Phylogenetic analysis of Dietzia isolated from aquatic and terrestrial environments showed two distinct clades from the same ancestor. The genome sizes of Dietzia strains from aquatic environments were significantly larger than those from terrestrial environments, which was mainly due to the occurrence of more gene loss events during the evolutionary progress of the strains from terrestrial environments. The evolutionary history of Dietzia was tightly coupled to environmental conditions, and iron concentrations should be one of the key factors shaping the genomes of the Dietzia lineages.
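
The abstract does not say which model was used for the "mathematical extrapolation". One widely used convention, assumed here purely for illustration, fits a power law n(N) = kappa * N^(-alpha) to the number of new genes found when the N-th genome is added, and calls the pan-genome open when alpha <= 1 (the cumulative gene count then grows without bound) and closed when alpha > 1. The function name `fit_new_gene_decay` and the synthetic data below are hypothetical and are not taken from the Dietzia study.

```python
import math
from typing import Sequence, Tuple


def fit_new_gene_decay(n_genomes: Sequence[int], new_genes: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of new_genes = kappa * N**(-alpha) in log-log space.

    Returns (kappa, alpha). All inputs must be positive.
    """
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(g) for g in new_genes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                       # slope of log(new genes) vs log(N)
    kappa = math.exp(mean_y - slope * mean_x)
    return kappa, -slope                    # alpha is the negated slope


# Synthetic example: new genes decay as 500 * N**-0.6 over genomes 2..26.
genome_counts = list(range(2, 27))
synthetic_new = [500.0 * n ** -0.6 for n in genome_counts]
kappa, alpha = fit_new_gene_decay(genome_counts, synthetic_new)
is_open = alpha <= 1.0  # open under the convention assumed above
```

In practice the new-gene counts would be averaged over many random orderings of the genomes before fitting, since a single ordering is noisy.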

RevDate: 2020-09-28

Moreno-Pérez A, Pintado A, Murillo J, et al (2020)

Host Range Determinants of Pseudomonas savastanoi Pathovars of Woody Hosts Revealed by Comparative Genomics and Cross-Pathogenicity Tests.

Frontiers in plant science, 11:973.

The study of host range determinants within the Pseudomonas syringae complex is gaining renewed attention due to its widespread distribution in non-agricultural environments, evidence of large variability in intra-pathovar host range, and the emergence of new epidemic diseases. This requires the establishment of appropriate model pathosystems facilitating integration of phenotypic, genomic and evolutionary data. Pseudomonas savastanoi pv. savastanoi is a model pathogen of the olive tree, and here we report a closed genome of strain NCPPB 3335, plus draft genome sequences of three strains isolated from oleander (pv. nerii), ash (pv. fraxini) and broom plants (pv. retacarpa). We then conducted a comparative genomic analysis of these four new genomes plus 16 publicly available genomes, representing 20 strains of these four P. savastanoi pathovars of woody hosts. Despite overlapping host ranges, cross-pathogenicity tests using four plant hosts clearly separated these pathovars and led to pathovar reassignment of two strains. Critically, these functional assays were pivotal to reconcile phylogeny with host range and to define pathovar-specific gene repertoires. We report a pan-genome of 7,953 ortholog gene families and a total of 45 type III secretion system effector genes, including 24 core genes, four genes exclusive to pv. retacarpa and several genes encoding pathovar-specific truncations. Notably, the four pathovars corresponded with well-defined genetic lineages, with core genome phylogeny and hierarchical clustering of effector genes closely correlating with pathogenic specialization. Knot-inducing pathovars encode genes absent in the canker-inducing pv. fraxini, such as those related to indole acetic acid, cytokinins, rhizobitoxine, and a bacteriophytochrome. Other pathovar-exclusive genes encode type I, type II, type IV, and type VI secretion system proteins, the phytotoxin phevamine A, a siderophore, c-di-GMP-related proteins, methyl chemotaxis proteins, and a broad collection of transcriptional regulators and transporters of eight different superfamilies. Our combination of pathogenicity analyses and genomics tools allowed us to correctly assign strains to pathovars and to propose a repertoire of host range-related genes in the P. syringae complex.

RevDate: 2020-11-06

Kc R, Leong KWC, Harkness NM, et al (2020)

Whole-genome analyses reveal gene content differences between nontypeable Haemophilus influenzae isolates from chronic obstructive pulmonary disease compared to other clinical phenotypes.

Microbial genomics, 6(8):.

Nontypeable Haemophilus influenzae (NTHi) colonizes human upper respiratory airways and plays a key role in the course and pathogenesis of acute exacerbations of chronic obstructive pulmonary disease (COPD). Currently, it is not possible to distinguish COPD isolates of NTHi from other clinical isolates of NTHi using conventional genotyping methods. Here, we analysed the core and accessory genome of 568 NTHi isolates, including 40 newly sequenced isolates, to look for genetic distinctions between NTHi isolates from COPD with respect to other illnesses, including otitis media, meningitis and pneumonia. Phylogenies based on polymorphic sites in the core-genome did not show discrimination between NTHi strains collected from different clinical phenotypes. However, pan-genome-wide association studies identified 79 unique NTHi accessory genes that were significantly associated with COPD. Furthermore, many of the COPD-related NTHi genes have known or predicted roles in virulence, transmembrane transport of metal ions and nutrients, cellular respiration and maintenance of redox homeostasis. This indicates that specific genes may be required by NTHi for its survival or virulence in the COPD lung. These results advance our understanding of the pathogenesis of NTHi infection in COPD lungs.

