Bibliography on: Pangenome

Although the enforced stability of genomic content is ubiquitous among multicellular eukaryotes (MCEs), the opposite is proving to be the case among prokaryotes, which exhibit remarkable and adaptive plasticity of genomic content. Early bacterial whole-genome sequencing efforts discovered that whenever a particular "species" was re-sequenced, new genes were found that had not been detected earlier: entirely new genes, not merely new alleles. This led to the concepts of the bacterial core-genome, the set of genes found in all members of a particular "species", and the flex-genome, the set of genes found in some, but not all, members of the "species". Together these make up the species' pan-genome.
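The three definitions above map directly onto set operations. The following is a minimal sketch, assuming each strain has already been reduced to a set of gene-family names; the function name, strain labels and gene assignments are hypothetical toy data, not taken from any study listed here.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(
    genomes: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, core, flex) gene sets for a collection of strains."""
    if not genomes:
        raise ValueError("at least one genome is required")
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)        # genes found in at least one strain
    core = set.intersection(*gene_sets)  # genes found in every strain
    flex = pan - core                    # genes found in some, but not all, strains
    return pan, core, flex


# Hypothetical toy input: three strains with invented gene content.
strains = {
    "strain_A": {"dnaA", "gyrB", "recA", "arsC"},
    "strain_B": {"dnaA", "gyrB", "recA", "phzA"},
    "strain_C": {"dnaA", "gyrB", "recA", "arsC", "cysK"},
}

pan, core, flex = partition_pangenome(strains)
print("core:", sorted(core))
print("flex:", sorted(flex))
print("pan size:", len(pan))
```

In practice the hard part is upstream of this step: deciding which genes in different strains count as the same gene family (orthologous clustering). The sketch takes that assignment as given.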

Monat C, Schreiber M, Stein N, et al (2018)

Prospects of pan-genomics in barley.

TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik pii:10.1007/s00122-018-3234-z [Epub ahead of print].

The concept of a pan-genome refers to intraspecific diversity in genome content and structure, encompassing both genes and intergenic space. Pan-genomic studies employ a combination of de novo sequence assembly and reference-based alignment to discover and genotype structural variants. The large size and complex structure of Triticeae genomes were for a long time an obstacle for genomic research in barley and its relatives. Now that a reference genome is available, computational pipelines for high-quality sequence assembly are in place, and sequence costs continue to drop, investigations into the structural diversity of the barley genome seem within reach. Here, we review the recent progress on pan-genomics in the model grass Brachypodium distachyon, and the cereal crops rice and maize, and devise a multi-tiered strategy for a pan-genome project in barley. Our design involves: (1) the construction of high-quality de novo sequence assemblies for a small core set of representative genotypes, (2) short-read sequencing of a large diversity panel of genebank accessions to medium coverage and (3) the use of complementary methods such as chromosome-conformation capture sequencing and k-mer-based association genetics. The in silico representation of the barley pan-genome may inform about the mechanisms of structural genome evolution in the Triticeae and supplement quantitative genetics models of crop performance for better accuracy and predictive ability.

Mohapatra B, Kazy SK, P Sar (2018)

Comparative genome analysis of arsenic reducing, hydrocarbon metabolizing groundwater bacterium Achromobacter sp. KAs 3-5T explains its competitive edge for survival in aquifer environment.

Genomics pii:S0888-7543(18)30421-X [Epub ahead of print].

The whole genome sequence of the arsenic reducing, hydrocarbon metabolizing groundwater bacterium Achromobacter sp. KAs 3-5T was explored to understand the genomic basis of its As-ecophysiology and niche adaptation in the aquifer environment. The genome (5.6 Mbp, 65.5 G + C %) encodes 4840 proteins, 1138 enzymes, 53 tRNAs, 11 rRNAs, 608 signal peptides, and 1.13% horizontally transferred genes. The presence of genes encoding cytosolic As5+-reduction (arsCBH, ACR3), aromatics utilization (bph/naph, catABC, boxABCD, genACB), Fe-transformation (tonB, achromobactin, FUR, FeR), and denitrification (nar, nap) processes was observed and validated through proteomics. Phylogenomic analysis (< 90% ANI, < 50% DDH) confirmed strain KAs 3-5T to be a novel representative of the genus Achromobacter. An asymptotic open pan-genome (20,855 genes) and high correlation between genomic and ecological diversity suggested niche preference ability of this genus. An assemblage of species-specific genes affiliated to transcription-regulation, membrane transport, and redox-transformation explained the strain's competitive survival strategies in As-rich oligotrophic groundwater.

Fontana A, Zacconi C, L Morelli (2018)

Genetic Signatures of Dairy Lactobacillus casei Group.

Frontiers in microbiology, 9:2611.

The Lactobacillus casei/Lactobacillus paracasei group of species contains strains adapted to a wide range of environments, from dairy products to the intestinal tract of animals and fermented vegetables. Understanding the gene acquisitions and losses that induced such different adaptations requires a comparison between complete genomes, since evolutionary differences are spread over the whole sequence. This study compared 12 complete genomes of L. casei/paracasei dairy-niche isolates and 7 genomes of L. casei/paracasei isolated from other habitats (i.e., corn silage, human intestine, sauerkraut, beef, congee). Phylogenetic tree construction and average nucleotide identity (ANI) metric showed a clustering of the two dairy L. casei strains ATCC393 and LC5, indicating a lower genetic relatedness in comparison to the other strains. Genomic analysis revealed a core of 313 genes shared by dairy and non-dairy lactic acid bacteria (LAB), within a pan-genome of 9,462 genes. Functional category analyses highlighted the evolutionary gene decay of dairy isolates, particularly in carbohydrate and amino acid metabolism. Specifically, dairy L. casei/paracasei strains lost the ability to metabolize myo-inositol and taurine (i.e., iol and tau gene clusters). However, gene acquisitions by dairy strains were also highlighted, mostly related to defense mechanisms and host-pathogen interactions (i.e., yueB, esaA, and sle1). This study aimed to be a preliminary investigation on dairy and non-dairy marker genes that could be further characterized for probiotics or food applications.

Hiller NL, R Sá-Leão (2018)

Puzzling Over the Pneumococcal Pangenome.

Frontiers in microbiology, 9:2580.

The Gram positive bacterium Streptococcus pneumoniae (pneumococcus) is a major human pathogen. It is a common colonizer of the human host, and in the nasopharynx, sinus, and middle ear it survives as a biofilm. This mode of growth is optimal for multi-strain colonization and genetic exchange. Over the last decades, the far-reaching use of antibiotics and the widespread implementation of pneumococcal multivalent conjugate vaccines have posed considerable selective pressure on pneumococci. This scenario provides an exceptional opportunity to study the evolution of the pangenome of a clinically important bacterium, and has the potential to serve as a case study for other species. The goal of this review is to highlight key findings in the studies of pneumococcal genomic diversity and plasticity.

Biessy A, Novinscak A, Blom J, et al (2018)

Diversity of phytobeneficial traits revealed by whole-genome analysis of worldwide-isolated phenazine-producing Pseudomonas spp.

Environmental microbiology [Epub ahead of print].

Plant-beneficial Pseudomonas spp. competitively colonize the rhizosphere and display plant-growth promotion and/or disease-suppression activities. Some strains within the P. fluorescens species complex produce phenazine derivatives, such as phenazine-1-carboxylic acid. These antimicrobial compounds are broadly inhibitory to numerous soil-dwelling plant pathogens and play a role in the ecological competence of phenazine-producing Pseudomonas spp. We assembled a collection encompassing 63 strains representative of the worldwide diversity of plant-beneficial phenazine-producing Pseudomonas spp. In this study, we report the sequencing of 58 complete genomes using PacBio RS II sequencing technology. Distributed among four subgroups within the P. fluorescens species complex, the diversity of our collection is reflected by the large pangenome which accounts for 25,413 protein-coding genes. We identified genes and clusters encoding for numerous phytobeneficial traits, including antibiotics, siderophores and cyclic lipopeptides biosynthesis, some of which were previously unknown in these microorganisms. Finally, we gained insight into the evolutionary history of the phenazine biosynthetic operon. Given its diverse genomic context, it is likely that this operon was relocated several times during Pseudomonas evolution. Our findings acknowledge the tremendous diversity of plant-beneficial phenazine-producing Pseudomonas spp., paving the way for comparative analyses to identify new genetic determinants involved in biocontrol, plant-growth promotion and rhizosphere competence.

Nanayakkara BS, O'Brien CL, DM Gordon (2018)

Diversity and distribution of Klebsiella capsules in E. coli.

Environmental microbiology reports [Epub ahead of print].

E. coli strains responsible for elevated counts in freshwater reservoirs in Australia carry a capsule originating from Klebsiella. The occurrence of Klebsiella capsules in E. coli was about 7% overall and 23 different capsule types were detected. Capsules were observed in strains from phylogroups A, B1, and C, but were absent from phylogroup B2, D, E, and F strains. In general, few A, B1, or C lineages were capsule-positive, but when a lineage was encapsulated multiple different capsule types were present. All Klebsiella capsule-positive strains were of serogroups O8, O9, and O89. Regardless of the phylogroup, O9 strains were more likely to be capsule-positive than O8 strains. Given the sequence similarity, it appears that both the capsule region and the O-antigen gene region are transferred to E. coli from Klebsiella as a single block via horizontal gene transfer events. Pan genome analysis indicated that there were only modest differences between encapsulated and non-encapsulated strains belonging to phylogroup A. The possession of a Klebsiella capsule, but not the type of capsule, is likely a key determinant of the bloom status.

Al-Bassam MM, Haist J, Neumann SA, et al (2018)

Expression Patterns, Genomic Conservation and Input Into Developmental Regulation of the GGDEF/EAL/HD-GYP Domain Proteins in Streptomyces.

Frontiers in microbiology, 9:2524.

To proliferate, antibiotic-producing Streptomyces undergo a complex developmental transition from vegetative growth to the production of aerial hyphae and spores. This morphological switch is controlled by the signaling molecule cyclic bis-(3',5') di-guanosine-mono-phosphate (c-di-GMP) that binds to the master developmental regulator, BldD, leading to repression of key sporulation genes during vegetative growth. However, a systematic analysis of all the GGDEF/EAL/HD-GYP proteins that control c-di-GMP levels in Streptomyces is still lacking. Here, we have FLAG-tagged all 10 c-di-GMP turnover proteins in Streptomyces venezuelae and characterized their expression patterns throughout the life cycle, revealing that the diguanylate cyclase (DGC) CdgB and the phosphodiesterase (PDE) RmdB are the most abundant GGDEF/EAL proteins. Moreover, we have deleted all the genes coding for c-di-GMP turnover enzymes individually and analyzed morphogenesis of the mutants in macrocolonies. We show that the composite GGDEF-EAL protein CdgC is an active DGC and that deletion of the DGCs cdgB and cdgC enhances sporulation whereas deletion of the PDEs rmdA and rmdB delays development in S. venezuelae. By comparing the pan genome of 93 fully sequenced Streptomyces species we show that the DGCs CdgA, CdgB, and CdgC, and the PDE RmdB represent the most conserved c-di-GMP-signaling proteins in the genus Streptomyces.

Pinto M, González-Díaz A, Machado MP, et al (2018)

Insights into the population structure and pan-genome of Haemophilus influenzae.

Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases pii:S1567-1348(18)30484-2 [Epub ahead of print].

The human-restricted bacterium Haemophilus influenzae is responsible for respiratory infections in both children and adults. While colonization begins in the upper airways, it can spread throughout the respiratory tract potentially leading to invasive infections. Although the spread of H. influenzae serotype b (Hib) has been prevented by vaccination, the emergence of infections by other serotypes as well as by non-typeable isolates (NTHi) has been observed, prompting the need for novel prevention strategies. Here, we aimed to study the population structure of H. influenzae and to get some insights into its pan-genome. We studied 305 H. influenzae strains, enrolling 217 publicly available genomes, as well as 88 newly sequenced H. influenzae invasive strains isolated in Portugal, spanning a 24-year period. NTHi isolates presented a core-SNP-based genetic diversity about 10-fold higher than the one observed for Hib. The analysis of key factors involved in pathogenesis, such as lipooligosaccharides, hemagglutinating pili and High Molecular Weight-adhesins, suggests that NTHi shapes its virulence repertoire, either by acquisition and loss of genes or by SNP-based diversification, likely towards host immune evasion and persistence. Discrete NTHi subpopulation structures are proposed based on the core-genome, supported by 17 candidate genetic markers identified in the accessory genome. Additionally, this study provides two bioinformatics tools for in silico rapid identification of H. influenzae serotypes and previously proposed NTHi clades, obviating demanding laboratory-based procedures. The present study constitutes an important genomic framework that could pave the way for future studies on the genetic determinants underlying invasiveness and disease and on the population structure of H. influenzae.

Wüthrich D, Irmler S, Berthoud H, et al (2018)

Conversion of Methionine to Cysteine in Lactobacillus paracasei Depends on the Highly Mobile cysK-ctl-cysE Gene Cluster.

Frontiers in microbiology, 9:2415.

Milk and dairy products are rich in nutrients and are therefore habitats for various microbiomes. However, the composition of nutrients can be quite diverse, in particular among the sulfur-containing amino acids. In milk, methionine is present in a 25-fold higher abundance than cysteine. Interestingly, a fraction of strains of the species L. paracasei - a flavor-enhancing adjunct culture species - can grow in medium with methionine as the sole sulfur source. In this study, we focus on genomic and evolutionary aspects of sulfur dependence in L. paracasei strains. Of 24 selected L. paracasei strains, 16 can grow in medium with methionine as the sole sulfur source. We sequenced these strains to perform gene-trait matching. We found that one gene cluster - consisting of a cysteine synthase, a cystathionine lyase, and a serine acetyltransferase - is present in all strains that grow in medium with methionine as the sole sulfur source. In contrast, strains that depend on other sulfur sources do not have this gene cluster. We expanded the study and searched for this gene cluster in other species and detected it in the genomes of many bacterial species used in food production. The comparison to these species showed that two different versions of the gene cluster exist in L. paracasei which were likely gained in two distinct events of horizontal gene transfer. Additionally, the comparison of 62 L. paracasei genomes and the two versions of the gene cluster revealed that this gene cluster is mobile within the species.

Fleshman A, Mullins K, Sahl J, et al (2018)

Corrigendum: Comparative pan-genomic analyses of Orientia tsutsugamushi reveal an exceptional model of bacterial evolution driving genomic diversity.

Microbial genomics, 4(10):.

Franz E, Rotariu O, Lopes BS, et al (2018)

Phylogeographic analysis reveals multiple international transmission events have driven the global emergence of Escherichia coli O157:H7.

Clinical infectious diseases : an official publication of the Infectious Diseases Society of America pii:5146342 [Epub ahead of print].

Background: Shiga toxin-producing Escherichia coli O157:H7 is a zoonotic pathogen which causes numerous food and waterborne disease outbreaks. It is globally distributed but its origin and the temporal sequence of its geographical spread are unknown.

Methods: We analysed Whole Genome Sequencing data of 757 isolates from 4 continents and performed a pan genome analysis to identify the core genome and from this extracted single nucleotide polymorphisms. Timed phylogeographic analysis was performed on a subset of the isolates to investigate its worldwide spread.

Results: The common ancestor of this set of isolates occurred around 1890 (1845-1925) and originated from the Netherlands. Phylogeographic analysis identified 34 major transmission events. The earliest were predominantly intercontinental from Europe to Australia around 1937 (1909-1958), to USA in 1941 (1921-1962), to Canada in 1960 (1943-1979), and from Australia to New Zealand in 1966 (1943-1982). This pre-dates the first reported human case of E. coli O157:H7 in 1975 from the USA.

Conclusions: Inter- and intracontinental transmission events have resulted in the current international distribution of E. coli O157:H7 and it is likely that these events were facilitated by animal movements (e.g. Holstein Friesian cattle). These findings will inform policy on action that is crucial to reduce further spread of E. coli O157:H7 and other (emerging) STEC strains globally.

De Filippis F, La Storia A, Villani F, et al (2018)

Strain-level diversity analysis of Pseudomonas fragi after in situ pangenome reconstruction shows distinctive spoilage-associated metabolic traits clearly selected by different storage conditions.

Applied and environmental microbiology pii:AEM.02212-18 [Epub ahead of print].

Microbial spoilage of raw meat causes huge economic losses every year. Understanding the microbial ecology associated to the spoilage and its dynamics during refrigerated storage of meat can help in preventing and delaying the spoilage-related activities. Raw meat microbiota is usually complex but only few members will develop during storage and cause spoilage, upon the pressure of several external factors, such as temperature and oxygen availability. We characterized the metagenome of beef packed aerobically or under-vacuum during refrigerated storage to explore how different packaging conditions may influence microbial composition and potential spoilage-associated activities. Different population dynamics and spoilage-associated genomic repertoires occurred in beef stored in air or vacuum-packaging. Moreover, pangenomics of Pseudomonas fragi strains extracted from metagenomes was carried out. We demonstrated the presence of specific, storage-driven strain-level profiles of Pseudomonas fragi, characterized by a different gene repertoire, thus potentially able to act differently during meat spoilage. The results provide new knowledge on strain-level microbial ecology associated to meat spoilage and can be of value for future strategies of spoilage prevention and food waste reduction.

IMPORTANCE: This work provides insights on the mechanisms involved in raw beef spoilage during refrigerated storage and on the selective pressure exerted by the packaging conditions. We highlighted the presence of different microbial metagenomes during spoilage of beef packaged aerobically or under-vacuum. The packaging condition was able to select specific Pseudomonas fragi strains, with a distinctive genomic repertoire. This study may help in deciphering the behaviour of different biomes directly in-situ in food and in understanding the specific contribution of different strains to food spoilage.

Hii SYF, Ahmad N, Hashim R, et al (2018)

A SNP-based phylogenetic analysis of Corynebacterium diphtheriae in Malaysia.

BMC research notes, 11(1):760 pii:10.1186/s13104-018-3868-6.

OBJECTIVE: There is a lack of studies on Corynebacterium diphtheriae isolates in Malaysia. The alarming surge of cases in 2016 led us to evaluate the local clinical C. diphtheriae strains in Malaysia. We conducted single nucleotide polymorphism phylogenetic analysis on the core and pan-genome as well as the toxin and diphtheria toxin repressor (DtxR) genes of Malaysian C. diphtheriae isolates from 1986 to 2016.

RESULTS: The comparison between core and pan-genomic analyses showed variation in the distribution of C. diphtheriae. The local isolates portrayed a heterogeneous trait, and a close relationship between Malaysia's strains and Belarus's, Africa's and India's strains was observed. A toxigenic C. diphtheriae clone was noted to be circulating in the Malaysian population for nearly 30 years and, in our study, the non-toxigenic and toxigenic C. diphtheriae strains can be differentiated significantly into two large clusters, A and B respectively. Analysis against the vaccine strain PW8 showed that the amino acid composition of toxin and DtxR in Malaysia's local strains is well conserved and no functional defect was noted. Hence, a change in efficacy of the currently used toxoid vaccine is unlikely to occur.

Subedi D, Vijay AK, Kohli GS, et al (2018)

Comparative genomics of clinical strains of Pseudomonas aeruginosa strains isolated from different geographic sites.

Scientific reports, 8(1):15668 pii:10.1038/s41598-018-34020-7.

The large and complex genome of Pseudomonas aeruginosa, which consists of significant portions (up to 20%) of transferable genetic elements, contributes to the rapid development of antibiotic resistance. The whole genome sequences of 22 strains isolated from eye and cystic fibrosis patients in Australia and India between 1992 and 2007 were used to compare genomic divergence and phylogenetic relationships as well as genes for antibiotic resistance and virulence factors. Analysis of the pangenome indicated a large variation in the size of the accessory genome amongst the 22 strains, and the size of the accessory genome correlated with the number of genomic islands, insertion sequences and prophages. The strains were diverse in terms of sequence type and dissimilar to that of global epidemic P. aeruginosa clones. Of the eye isolates, 62% clustered together within a single lineage. Indian eye isolates possessed genes associated with resistance to aminoglycoside, beta-lactams, sulphonamide, quaternary ammonium compounds, tetracycline, trimethoprims and chloramphenicols. These genes were, however, absent in Australian isolates regardless of source. Overall, our results provide valuable information for understanding the genomic diversity of P. aeruginosa isolated from two different infection types and countries.

Chaudhari NM, Gautam A, Gupta VK, et al (2018)

PanGFR-HM: A Dynamic Web Resource for Pan-Genomic and Functional Profiling of Human Microbiome With Comparative Features.

Frontiers in microbiology, 9:2322.

The conglomerate of microorganisms inhabiting various body-sites of human, known as the human microbiome, is one of the key determinants of human health and disease. Comprehensive pan-genomic and functional analysis approach for human microbiome components can enrich our understanding about impact of microbiome on human health. By utilizing this approach we developed PanGFR-HM, a novel dynamic web-resource that integrates genomic and functional characteristics of 1293 complete microbial genomes available from Human Microbiome Project. The resource allows users to explore genomic/functional diversity and genome-based phylogenetic relationships between human associated microbial genomes, not provided by any other resource. The key features implemented here include pan-genome and functional analysis of organisms based on taxonomy or body-site, and comparative analysis between groups of organisms. The first feature can also identify probable gene-loss events and significantly over/under represented KEGG/COG categories within pan-genome. The unique second feature can perform comparative genomic, functional and pathways analysis between 4 groups of microbes. The dynamic nature of this resource enables users to define parameters for orthologous clustering and to select any set of organisms for analysis. As an application for comparative feature of PanGFR-HM, we performed a comparative analysis with 67 Lactobacillus genomes isolated from human gut, oral cavity and urogenital tract, and therefore characterized the body-site specific genes, enzymes and pathways. Altogether, PanGFR-HM, being unique in its content and functionality, is expected to provide a platform for microbiome-based comparative functional and evolutionary genomics.

Johnson TJ, Elnekave E, Miller EA, et al (2018)

Phylogenomic analysis of extraintestinal pathogenic Escherichia coli ST1193, an emerging multidrug-resistant clonal group.

Antimicrobial agents and chemotherapy pii:AAC.01913-18 [Epub ahead of print].

The fluoroquinolone-resistant ST1193 clonal group of Escherichia coli, from the ST14 clonal complex (STc14) within phylogenetic group B2, has appeared recently as an important cause of extraintestinal disease in humans. Although this emerging lineage has been characterized to some extent using conventional methods, it has not been studied extensively at the genomic level. Here, we used whole genome sequence analysis to compare 355 ST1193 isolates with 72 isolates from other STs within STc14. Using core genome phylogeny, the ST1193 isolates formed a tightly clustered clade with many genotypic similarities, as compared to ST14 isolates. All ST1193 isolates possessed the same set of three chromosomal mutations conferring fluoroquinolone resistance, carried the fimH64 allele, and were lactose non-fermenting. Analysis revealed an evolutionary progression from K1 to K5 capsular types and acquisition of an F-type virulence plasmid followed by changes in plasmid structure congruent with genome phylogeny. In contrast, the numerous identified antimicrobial resistance genes were distributed incongruently with the underlying phylogeny, suggesting frequent gain or loss of the corresponding resistance gene cassettes despite retention of the presumed carrier plasmids. Pangenome analysis revealed gains and losses of genetic loci occurring during the transition from ST14 to ST1193, and from the K1 to K5 capsular types. Using time-scaled phylogenetic analysis, we estimated that current ST1193 clades first emerged approximately 25 years ago. Overall, ST1193 appears to be a recently emerged clone in which both stepwise and mosaic evolution likely have contributed to epidemiologic success.

Bettgenhaeuser J, SG Krattinger (2018)

Rapid gene cloning in cereals.

TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik pii:10.1007/s00122-018-3210-7 [Epub ahead of print].

KEY MESSAGE: The large and complex genomes of many cereals hindered cloning efforts in the past. Advances in genomics now allow the rapid cloning of genes from humanity's most valuable crops. The past two decades were characterized by a genomics revolution that entailed profound changes to crop research, plant breeding, and agriculture. Today, high-quality reference sequences are available for all major cereal crop species. Large resequencing and pan-genome projects start to reveal a more comprehensive picture of the genetic makeup and the diversity among domesticated cereals and their wild relatives. These technological advancements will have a dramatic effect on dissecting genotype-phenotype associations and on gene cloning. In this review, we will highlight the status of the genomic resources available for various cereal crops and we will discuss their implications for gene cloning. A particular focus will be given to the cereal species barley and wheat, which are characterized by very large and complex genomes that have been inaccessible to rapid gene cloning until recently. With the advancements in genomics and the development of several rapid gene-cloning methods, it has now become feasible to tackle the cloning of most agriculturally important genes, even in wheat and barley.

Fraunhofer ME, Geißler AJ, Behr J, et al (2018)

Comparative Genomics of Lactobacillus brevis Reveals a Significant Plasmidome Overlap of Brewery and Insect Isolates.

Current microbiology pii:10.1007/s00284-018-1581-2 [Epub ahead of print].

Lactobacillus (L.) brevis represents a versatile, ubiquitous species of lactic acid bacteria, occurring in various foods, as well as plants and intestinal tracts. The ability to deal with considerably differing environmental conditions in the respective ecological niches implies a genomic adaptation to the particular requirements to use it as a habitat beyond a transient state. Given the isolation source, 24 L. brevis genomes were analyzed via comparative genomics to get a broad view of the genomic complexity and ecological versatility of this species. This analysis showed L. brevis to be a genetically diverse species possessing a remarkably large pan genome. As anticipated, it proved difficult to draw a correlation between chromosomal settings and isolation source. However, at the plasmidome level, brewery- and insect-derived strains grouped into distinct clusters, referable to a noteworthy gene sharing between both groups. The brewery-specific plasmidome is characterized by several genes which support life in the harsh environment of beer, but 40% of the brewery plasmidome was found in insect-derived strains as well. This suggests a close interaction between these habitats. Further analysis revealed the presence of a truncated horC cluster version in brewery- and insect-associated strains. This disproves horC, the major contributor to survival in beer, as brewery isolate specific. We conclude that L. brevis does not perform rigorous chromosomal changes to live in different habitats. Rather it appears that the species retains a certain genetic diversity in the plasmidome and meets the requirements of a particular ecological niche with the acquisition of appropriate plasmids.

Mercante JW, Caravas JA, Ishaq MK, et al (2018)

Genomic heterogeneity differentiates clinical and environmental subgroups of Legionella pneumophila sequence type 1.

PloS one, 13(10):e0206110 pii:PONE-D-18-17595.

Legionella spp. are the cause of a severe bacterial pneumonia known as Legionnaires' disease (LD). In some cases, current genetic subtyping methods cannot resolve LD outbreaks caused by common, potentially endemic L. pneumophila (Lp) sequence types (ST), which complicates laboratory investigations and environmental source attribution. In the United States (US), ST1 is the most prevalent clinical and environmental Lp sequence type. In order to characterize the ST1 population, we sequenced 289 outbreak and non-outbreak associated clinical and environmental ST1 and ST1-variant Lp strains from the US and, together with international isolate sequences, explored their genetic and geographic diversity. The ST1 population was highly conserved at the nucleotide level; 98% of core nucleotide positions were invariant and environmental isolates unassociated with human disease (n = 99) contained ~65% more nucleotide diversity compared to clinical-sporadic (n = 139) or outbreak-associated (n = 28) ST1 subgroups. The accessory pangenome of environmental isolates was also ~30-60% larger than other subgroups and was enriched for transposition and conjugative transfer-associated elements. Up to ~10% of US ST1 genetic variation could be explained by geographic origin, but considerable genetic conservation existed among strains isolated from geographically distant states and from different decades. These findings provide new insight into the ST1 population structure and establish a foundation for interpreting genetic relationships among ST1 strains; these data may also inform future analyses for improved outbreak investigations.

Kavvas ES, Catoiu E, Mih N, et al (2018)

Machine learning and structural analysis of Mycobacterium tuberculosis pan-genome identifies genetic signatures of antibiotic resistance.

Nature communications, 9(1):4306 pii:10.1038/s41467-018-06634-y.

Mycobacterium tuberculosis is a serious human pathogen threat exhibiting complex evolution of antimicrobial resistance (AMR). Accordingly, the many publicly available datasets describing its AMR characteristics demand disparate data-type analyses. Here, we develop a reference strain-agnostic computational platform that uses machine learning approaches, complemented by both genetic interaction analysis and 3D structural mutation-mapping, to identify signatures of AMR evolution to 13 antibiotics. This platform is applied to 1595 sequenced strains to yield four key results. First, a pan-genome analysis shows that M. tuberculosis is highly conserved with sequenced variation concentrated in PE/PPE/PGRS genes. Second, the platform corroborates 33 genes known to confer resistance and identifies 24 new genetic signatures of AMR. Third, 97 epistatic interactions across 10 resistance classes are revealed. Fourth, detailed structural analysis of these genes yields mechanistic bases for their selection. The platform can be used to study other human pathogens.

Zoledowska S, Motyka-Pomagruk A, Sledz W, et al (2018)

High genomic variability in the plant pathogenic bacterium Pectobacterium parmentieri deciphered from de novo assembled complete genomes.

BMC genomics, 19(1):751 pii:10.1186/s12864-018-5140-9.

BACKGROUND: Pectobacterium parmentieri is a newly established species within the plant pathogenic family Pectobacteriaceae. Bacteria belonging to this species are causative agents of diseases in economically important crops (e.g. potato) in a wide range of different environmental conditions, encountered in Europe, North America, Africa, and New Zealand. Severe disease symptoms result from the activity of P. parmentieri virulence factors, such as plant cell wall degrading enzymes. Interestingly, we observe significant phenotypic differences among P. parmentieri isolates regarding virulence factors production and the abilities to macerate plants. To establish the possible genomic basis of these differences, we sequenced 12 genomes of P. parmentieri strains (10 isolated in Poland, 2 in Belgium) with the combined use of Illumina and PacBio approaches. De novo genome assembly was performed with the use of SPAdes software, while annotation was conducted by NCBI Prokaryotic Genome Annotation Pipeline.

RESULTS: The pan-genome study was performed on 15 genomes (12 de novo assembled and three reference strains: P. parmentieri CFBP 8475T, P. parmentieri SCC3193, P. parmentieri WPP163). The pan-genome includes 3706 core genes, a high number of accessory (1468) genes, and numerous unique (1847) genes. We identified the presence of well-known genes encoding virulence factors in the core genome fraction, but some of them were located in the dispensable genome. A significant fraction of horizontally transferred genes, virulence-related gene duplications, as well as different CRISPR arrays were found, which can explain the observed phenotypic differences. Finally, we also found, for the first time, a plasmid in one of the tested P. parmentieri strains isolated in Poland.

CONCLUSIONS: We can hypothesize that a large number of the genes in the dispensable genome and significant genomic variation among P. parmentieri strains could be the basis of the potential wide host range and widespread diffusion of P. parmentieri. The obtained data on the structure and gene content of P. parmentieri strains enabled us to speculate on the importance of high genomic plasticity for P. parmentieri adaptation to different environments.
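The core/accessory/unique split reported in the RESULTS above can be illustrated with a minimal sketch. It assumes a gene-family presence table is already available; the function name `partition_pangenome`, the toy gene names and the strain labels are hypothetical, and the rule used here (all genomes = core, exactly one genome = unique, anything else = accessory) is a common convention rather than the authors' documented pipeline.

```python
from typing import Dict, Set


def partition_pangenome(presence: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split gene families into core, accessory and unique fractions.

    presence maps a gene-family ID to the set of genomes carrying it.
    """
    # Every genome that carries at least one gene family.
    genomes: Set[str] = set().union(*presence.values())
    n_genomes = len(genomes)

    parts: Dict[str, Set[str]] = {"core": set(), "accessory": set(), "unique": set()}
    for gene, carriers in presence.items():
        if len(carriers) == n_genomes:
            parts["core"].add(gene)        # present in every genome
        elif len(carriers) == 1:
            parts["unique"].add(gene)      # present in exactly one genome
        else:
            parts["accessory"].add(gene)   # present in some, but not all
    return parts


# Toy input (hypothetical): three strains, three gene families.
toy = {
    "geneA": {"s1", "s2", "s3"},
    "geneB": {"s1", "s2"},
    "geneC": {"s3"},
}
parts = partition_pangenome(toy)
```

On real data the presence table would come from an orthology clustering step, and the counts of each fraction would correspond to figures such as the 3706 / 1468 / 1847 reported above.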

Yu J, Golicz AA, Lu K, et al (2018)

Insight into the evolution and functional characteristics of the pan-genome assembly from sesame landraces and modern cultivars.

Plant biotechnology journal [Epub ahead of print].

Sesame (Sesamum indicum L.) is an important oil crop renowned for its high oil content and quality. Recently, genome assemblies for five sesame varieties, including two landraces (S. indicum cv. Baizhima and Mishuozhima) and three modern cultivars (S. indicum var. Zhongzhi13, Yuzhi11 and Swetha), have become available, providing a rich resource for comparative genomic analyses and gene discovery. Here, we employed a reference-assisted assembly approach to improve the draft assemblies of four of the sesame varieties. We then constructed a sesame pan-genome of 554.05 Mb. The pan-genome contained 26,472 orthologous gene clusters; 15,409 (58.21%) of them were core (present across all five sesame genomes), whereas the remaining 41.79% (11,063) clusters and the 15,890 variety-specific genes were dispensable. Comparisons between varieties suggest that modern cultivars from China and India display significant genomic variation. The gene families unique to the sesame modern cultivars contain genes mainly related to yield and quality, while those unique to the landraces contain genes involved in environmental adaptation. Comparative evolutionary analysis indicates that several genes involved in plant-pathogen interaction and lipid metabolism are under positive selection, which may be associated with sesame environmental adaptation and selection for high seed oil content. This study of the sesame pan-genome provides insights into the evolution and genomic characteristics of this important oilseed and constitutes a resource for further sesame crop improvement.

Bobay LM, H Ochman (2018)

Factors driving effective population size and pan-genome evolution in bacteria.

BMC evolutionary biology, 18(1):153 pii:10.1186/s12862-018-1272-4.

BACKGROUND: Knowledge of population-level processes is essential to understanding the efficacy of selection operating within a species. However, attempts at estimating effective population sizes (Ne) are particularly challenging in bacteria due to their extremely large census population sizes, varying rates of recombination and arbitrary species boundaries.

RESULTS: In this study, we estimated Ne for 153 species (152 bacteria and one archaeon) defined under a common framework and found that ecological lifestyle and growth rate were major predictors of Ne; and that contrary to theoretical expectations, Ne was unaffected by recombination rate. Additionally, we found that Ne shapes the evolution and diversity of total gene repertoires of prokaryotic species.

CONCLUSION: Together, these results point to a new model of genome architecture evolution in prokaryotes, in which pan-genome sizes, not individual genome sizes, are governed by drift-barrier evolution.

Chun BH, Kim KH, Jeong SE, et al (2019)

Genomic and metabolic features of the Bacillus amyloliquefaciens group- B. amyloliquefaciens, B. velezensis, and B. siamensis- revealed by pan-genome analysis.

Food microbiology, 77:146-157.

The genomic and metabolic features of the Bacillus amyloliquefaciens group comprising B. amyloliquefaciens, B. velezensis, and B. siamensis were investigated through a pan-genome analysis combined with an experimental verification of some of the functions identified. All B. amyloliquefaciens group genomes were retrieved from GenBank and their phylogenetic relatedness was subsequently investigated. Genome comparisons of B. amyloliquefaciens, B. siamensis, and B. velezensis showed that their genomic and metabolic features were similar; however species-specific features were also identified. Energy metabolism-related genes are more enriched in B. amyloliquefaciens, whereas secondary metabolite biosynthesis-related genes are enriched in B. velezensis. Compared to B. amyloliquefaciens and B. siamensis, B. velezensis harbors more genes in its core-genome which are involved in the biosynthesis of antimicrobial compounds, as well as genes involved in d-galacturonate and d-fructuronate metabolism. B. amyloliquefaciens, B. siamensis, and B. velezensis all harbor a xanthine oxidase gene cluster (xoABCDE) in their core-genomes that is involved in metabolizing xanthine and uric acid to glycine and oxalureate. A reconstruction of B. amyloliquefaciens group metabolic pathways using their individual pan-genomes revealed that the B. amyloliquefaciens group strains have the ability to metabolize diverse carbon sources aerobically, or anaerobically, and can produce various metabolites such as lactate, ethanol, acetate, CO2, xylitol, diacetyl, acetoin, and 2,3-butanediol. This study therefore provides insights into the genomic and metabolic features of the B. amyloliquefaciens group.

Wright ES, DA Baum (2018)

Exclusivity offers a sound yet practical species criterion for bacteria despite abundant gene flow.

BMC genomics, 19(1):724 pii:10.1186/s12864-018-5099-6.

BACKGROUND: The question of whether bacterial species objectively exist has long divided microbiologists. A major source of contention stems from the fact that bacteria regularly engage in horizontal gene transfer (HGT), making it difficult to ascertain relatedness and draw boundaries between taxa. A natural way to define taxa is based on exclusivity of relatedness, which applies when members of a taxon are more closely related to each other than they are to any outsider. It is largely unknown whether exclusive bacterial taxa exist when averaging over the genome or are rare due to rampant hybridization.

RESULTS: Here, we analyze a collection of 701 genomes representing a wide variety of environmental isolates from the family Streptomycetaceae, whose members are competent at HGT. We find that the presence/absence of auxiliary genes in the pan-genome displays a hierarchical (tree-like) structure that correlates significantly with the genealogy of the core-genome. Moreover, we identified the existence of many exclusive taxa, although individual genes often contradict these taxa. These conclusions were supported by repeating the analysis on 1,586 genomes belonging to the genus Bacillus. However, despite confirming the existence of exclusive groups (taxa), we were unable to identify an objective threshold at which to assign the rank of species.

CONCLUSIONS: The existence of bacterial taxa is justified by considering average relatedness across the entire genome, as captured by exclusivity, but is rejected if one requires unanimous agreement of all parts of the genome. We propose using exclusivity to delimit taxa and conventional genome similarity thresholds to assign bacterial taxa to the species rank. This approach recognizes species that are phylogenetically meaningful, while also establishing some degree of comparability across species-ranked taxa in different bacterial clades.
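The exclusivity criterion described in the BACKGROUND above (members of a taxon are more closely related to each other than to any outsider) can be sketched as a simple check on a pairwise distance table. This is only one possible reading: it uses a hypothetical genome-wide distance as a stand-in for relatedness, and the names `symmetric` and `is_exclusive` and the toy distances are assumptions for illustration, not the authors' implementation.

```python
from typing import Dict, Iterable, Tuple


def symmetric(pairs: Dict[Tuple[str, str], float]) -> Dict[str, Dict[str, float]]:
    """Expand {(a, b): d} into a symmetric nested distance table."""
    dist: Dict[str, Dict[str, float]] = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


def is_exclusive(group: Iterable[str], dist: Dict[str, Dict[str, float]]) -> bool:
    """Return True if every member is closer to all other members
    than it is to any genome outside the group."""
    members = set(group)
    outsiders = set(dist) - members
    for m in members:
        inside = [dist[m][x] for x in members if x != m]
        outside = [dist[m][o] for o in outsiders]
        # A member violates exclusivity if some outsider is at least as
        # close as its most distant fellow member.
        if inside and outside and max(inside) >= min(outside):
            return False
    return True


# Toy distances (hypothetical): a and b are close, c and d are close.
dist = symmetric({
    ("a", "b"): 1, ("a", "c"): 4, ("a", "d"): 5,
    ("b", "c"): 4, ("b", "d"): 5, ("c", "d"): 2,
})
```

With these toy values, `is_exclusive(["a", "b"], dist)` and `is_exclusive(["c", "d"], dist)` hold, while `is_exclusive(["a", "c"], dist)` does not, because `b` is closer to `a` than `c` is.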

Peng Y, Tang S, Wang D, et al (2018)

MetaPGN: a pipeline for construction and graphical visualization of annotated pangenome networks.

GigaScience pii:5114262 [Epub ahead of print].

Pangenome analyses facilitate the interpretation of genetic diversity and evolutionary history of a taxon. However, there is an urgent and unmet need to develop new tools for advanced pangenome construction and visualization, especially for metagenomic data. Here we present an integrated pipeline, named MetaPGN, for construction and graphical visualization of pangenome network from either microbial genomes or metagenomes. Given either isolated genomes or metagenomic assemblies coupled with a reference genome of the targeted taxon, MetaPGN generates a pangenome in a topological network, consisting of genes (nodes) and gene-gene genomic adjacencies (edges) of which biological information can be easily updated and retrieved. MetaPGN also includes a self-developed Cytoscape plugin for layout of and interaction with the resulting pangenome network, providing an intuitive and interactive interface for full exploration of genetic diversity. We demonstrate the utility of MetaPGN by constructing Escherichia coli (E. coli) pangenome networks from five E. coli pathogenic strains and 760 human gut microbiomes respectively, revealing extensive genetic diversity of E. coli within both isolates and gut microbial populations. With the ability to extract and visualize gene contents and gene-gene physical adjacencies of a specific taxon from large-scale metagenomic data, MetaPGN provides advantages in expanding pangenome analysis to uncultured microbial taxa. MetaPGN is available at
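The network representation described above, with genes as nodes and gene-gene genomic adjacencies as edges, can be sketched in a few lines. This is a simplified illustration and not MetaPGN's code: it assumes each genome is given as a single ordered list of gene-family IDs, and the function name `build_network` and the toy input are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple


def build_network(
    gene_orders: Dict[str, List[str]],
) -> Tuple[Dict[str, Set[str]], Dict[FrozenSet[str], Set[str]]]:
    """Build a pangenome network from per-genome gene orders.

    Returns (nodes, edges): nodes maps each gene to the genomes carrying it;
    edges maps each unordered pair of adjacent genes to the genomes in which
    that adjacency is observed.
    """
    nodes: Dict[str, Set[str]] = defaultdict(set)
    edges: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for genome, order in gene_orders.items():
        for gene in order:
            nodes[gene].add(genome)
        # Consecutive genes along the contig are physically adjacent.
        for left, right in zip(order, order[1:]):
            edges[frozenset((left, right))].add(genome)
    return dict(nodes), dict(edges)


# Toy input (hypothetical): strain s2 lacks gene "b".
nodes, edges = build_network({"s1": ["a", "b", "c"], "s2": ["a", "c"]})
```

In the toy example the adjacency between `a` and `c` exists only in `s2`, which is how a gene presence/absence difference shows up as an alternative path in the network.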

Sharma V, Mobeen F, T Prakash (2018)

Exploration of Survival Traits, Probiotic Determinants, Host Interactions, and Functional Evolution of Bifidobacterial Genomes Using Comparative Genomics.

Genes, 9(10): pii:genes9100477.

Members of the genus Bifidobacterium are found in a wide range of habitats and are used as important probiotics. Thus, exploration of their functional traits at the genus level is of utmost significance. Besides, this genus has been demonstrated to exhibit an open pan-genome based on the limited number of genomes used in earlier studies. However, the number of genomes is a crucial factor for pan-genome calculations. We have analyzed the pan-genome of a comparatively larger dataset of 215 members of the genus Bifidobacterium belonging to different habitats, which revealed an open nature. The pan-genome for the 56 probiotic and human-gut strains of this genus was also found to be open. The accessory and unique components of this pan-genome were found to be under the operation of Darwinian selection pressure. Further, their genome-size variation was predicted to be attributed to the abundance of certain functions carried by genomic islands, which are facilitated by insertion elements and prophages. In silico functional and host-microbe interaction analyses of their core-genome revealed significant genomic factors for niche-specific adaptations and probiotic traits. The core survival traits include stress tolerance, biofilm formation, nutrient transport, and Sec-secretion system, whereas the core probiotic traits are imparted by the factors involved in carbohydrate and protein metabolism and host immunomodulation.

Awan F, Dong Y, Liu J, et al (2018)

Comparative genome analysis provides deep insights into Aeromonas hydrophila taxonomy and virulence-related factors.

BMC genomics, 19(1):712 pii:10.1186/s12864-018-5100-4.

BACKGROUND: Aeromonas hydrophila is a potential zoonotic pathogen and primary fish pathogen. With overlapping characteristics, multiple isolates are often mislabelled and misclassified. Moreover, the potential pathogenic factors among the publicly available genomes in A. hydrophila strains of different origins have not yet been investigated.

RESULTS: To identify the valid strains of A. hydrophila and their pathogenic factors, we performed a pan-genomic study. It revealed that there were 13 mislabelled strains and 49 valid strains that were further verified by average nucleotide identity (ANI), digital DNA-DNA hybridization (dDDH) and in silico multilocus sequence typing (MLST). Multiple phages were detected among the strains, and among them Aeromonas phi 018 was frequently present. The diversity in type III secretion system (T3SS) and conservation of type II and type VI secretion systems (T2SS and T6SS, respectively) among all the strains are important to study for designing future strategies. The most prevalent antibiotic resistances were found to be beta-lactamase, polymyxin and colistin resistances. The comparative analyses of sequence type (ST) 251 and other ST groups revealed that there were higher numbers of virulence factors in ST-251 than in other ST groups.

CONCLUSION: Publicly available genomes have 13 mislabelled organisms, and there are only 49 valid A. hydrophila strains. This valid pan-genome identifies multiple prophages that can be further utilized. Different A. hydrophila strains harbour multiple virulence factors and antibiotic resistance genes. Identification of such factors is important for designing future treatment regimes.

Sheikhizadeh Anari S, de Ridder D, Schranz ME, et al (2018)

Efficient inference of homologs in large eukaryotic pan-proteomes.

BMC bioinformatics, 19(1):340 pii:10.1186/s12859-018-2362-4.

BACKGROUND: Identification of homologous genes is fundamental to comparative genomics, functional genomics and phylogenomics. Extensive public homology databases are of great value for investigating homology but need to be continually updated to incorporate new sequences. As new sequences are rapidly being generated, there is a need for efficient standalone tools to detect homologs in novel data.

RESULTS: To address this, we present a fast method for detecting homology groups across a large number of individuals and/or species. We adopted a k-mer based approach which considerably reduces the number of pairwise protein alignments without sacrificing sensitivity. We demonstrate accuracy, scalability, efficiency and applicability of the presented method for detecting homology in large proteomes of bacteria, fungi, plants and Metazoa.

CONCLUSIONS: We clearly observed the trade-off between recall and precision in our homology inference. Favoring recall or precision strongly depends on the application. The clustering behavior of our program can be optimized for particular applications by altering a few key parameters. The program is available for public use at as an extension to our pan-genomic analysis tool, PanTools.
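The idea in the RESULTS above, using shared k-mers to cut down the number of pairwise protein alignments, can be illustrated with a small prefilter. This is a generic sketch and not the PanTools algorithm: the function names, the default values `k=3` and `min_shared=2`, and the toy sequences are all assumptions. Only pairs that share enough k-mers would be passed on to a full alignment step.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_pairs(
    proteins: Dict[str, str], k: int = 3, min_shared: int = 2
) -> Set[Tuple[str, str]]:
    """Return protein pairs sharing at least min_shared distinct k-mers."""
    # Inverted index: k-mer -> proteins that contain it.
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in proteins.items():
        for km in kmers(seq, k):
            index[km].add(name)

    # Count shared k-mers only for pairs that co-occur in some index bucket,
    # so unrelated pairs are never compared at all.
    shared: Dict[Tuple[str, str], int] = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1
    return {pair for pair, n in shared.items() if n >= min_shared}


# Toy proteins (hypothetical): p1 and p2 differ at one residue, p3 is unrelated.
pairs = candidate_pairs({"p1": "MKTAYIAK", "p2": "MKTAYLAK", "p3": "GGSWPLRQ"})
```

Raising `min_shared` trades recall for precision, which mirrors the trade-off noted in the CONCLUSIONS, although the parameters of the actual tool are not described in this abstract.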

Wang LYR, Jokinen CC, Laing CR, et al (2018)

Multi-Year Persistence of Verotoxigenic Escherichia coli (VTEC) in a Closed Canadian Beef Herd: A Cohort Study.

Frontiers in microbiology, 9:2040.

In this study, fecal samples were collected from a closed beef herd in Alberta, Canada from 2012 to 2015. To limit serotype bias, which was observed in enrichment broth cultures, Verotoxigenic Escherichia coli (VTEC) were isolated directly from samples using a hydrophobic grid-membrane filter verotoxin immunoblot assay. Overall VTEC isolation rates were similar for three different cohorts of yearling heifers on both an annual (68.5 to 71.8%) and seasonal basis (67.3 to 76.0%). Across all three cohorts, O139:H19 (37.1% of VTEC-positive samples), O22:H8 (15.8%) and O?(O108):H8 (15.4%) were among the most prevalent serotypes. However, isolation rates for serotypes O139:H19, O130:H38, O6:H34, O91:H21, and O113:H21 differed significantly between cohort-years, as did isolation rates for some serotypes within a single heifer cohort. There was a high level of VTEC serotype diversity with an average of 4.3 serotypes isolated per heifer and 65.8% of the heifers classified as "persistent shedders" of VTEC based on the criteria of >50% of samples positive and ≥4 consecutive samples positive. Only 26.8% (90/336) of the VTEC isolates from yearling heifers belonged to the human disease-associated seropathotypes A (O157:H7), B (O26:H11, O111:NM), and C (O22:H8, O91:H21, O113:H21, O137:H41, O2:H6). Conversely, seropathotypes B (O26:NM, O111:NM) and C (O91:H21, O2:H29) strains were dominant (76.0%, 19/25) among VTEC isolates from month-old calves from this herd. Among VTEC from heifers, carriage rates of vt1, vt2, vt1+vt2, eae, and hlyA were 10.7, 20.8, 68.5, 3.9, and 88.7%, respectively. The adhesin gene saa was present in 82.7% of heifer strains but absent from all of 13 eae+ve strains (from serotypes/intimin types O157:H7/γ1, O26:H11/β1, O111:NM/θ, O84:H2/ζ, and O182:H25/ζ). Phylogenetic relationships inferred from wgMLST and pan genome-derived core SNP analysis showed that strains clustered by phylotype and serotype. Further, VTEC strains of the same serotype usually shared the same suite of antibiotic resistance and virulence genes, suggesting the circulation of dominant clones within this distinct herd. This study provides insight into the diverse and dynamic nature of VTEC populations within groups of cattle and points to a broad spectrum of human health risks associated with these E. coli strains.
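The "persistent shedder" criteria quoted in this abstract (>50% of samples positive and ≥4 consecutive samples positive) can be written as a short check. The sketch assumes that both conditions must hold together and that samples are supplied in chronological order; the function name `is_persistent_shedder` and the example sampling histories are hypothetical.

```python
from typing import Sequence


def is_persistent_shedder(samples: Sequence[bool]) -> bool:
    """Apply the two criteria to one animal's sampling history.

    samples is a chronological sequence of booleans (True = VTEC-positive).
    Assumption: both criteria must be satisfied.
    """
    if not samples:
        return False

    fraction_positive = sum(samples) / len(samples)

    # Length of the longest run of consecutive positive samples.
    longest = run = 0
    for positive in samples:
        run = run + 1 if positive else 0
        longest = max(longest, run)

    return fraction_positive > 0.5 and longest >= 4


# Hypothetical histories for three heifers.
heifer_a = [True, True, True, True, False, False, True]   # 5/7 positive, run of 4
heifer_b = [True, False, True, False, True, True, True]   # 5/7 positive, run of 3
heifer_c = [True] * 4 + [False] * 4                       # exactly 50% positive
```

Only `heifer_a` meets both criteria; `heifer_b` fails on the consecutive-run requirement and `heifer_c` fails because exactly 50% is not more than 50%.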

Golanowska M, Potrykus M, Motyka-Pomagruk A, et al (2018)

Comparison of Highly and Weakly Virulent Dickeya solani Strains, With a View on the Pangenome and Panregulon of This Species.

Frontiers in microbiology, 9:1940.

Bacteria belonging to the genera Dickeya and Pectobacterium are responsible for significant economic losses in a wide variety of crops and ornamentals. In recent years, increasing losses in potato production have been attributed to the appearance of Dickeya solani. The D. solani strains investigated so far share genetic homogeneity, although different virulence levels were observed among strains of various origins. The purpose of this study was to investigate the genetic traits possibly related to the diverse virulence levels by means of comparative genomics. First, we developed a new genome assembly pipeline which allowed us to complete the D. solani genomes. Four de novo sequenced and ten publicly available genomes were used to identify the structure of the D. solani pangenome, in which 74.8 and 25.2% of genes were grouped into the core and dispensable genome, respectively. For D. solani panregulon analysis, we performed a binding site prediction for four transcription factors, namely CRP, KdgR, PecS and Fur, to detect the regulons of these virulence regulators. Most of the D. solani potential virulence factors were predicted to belong to the accessory regulons of CRP, KdgR, and PecS. Thus, some differences in gene expression could exist between D. solani strains. The comparison between a highly virulent and a weakly virulent strain, IFB0099 and IFB0223, respectively, disclosed only small differences between their genomes but significant differences in the production of virulence factors like pectinases, cellulases and proteases, and in their mobility. The D. solani strains also diverge in the number and size of prophages present in their genomes. Another relevant difference is the disruption of the adhesin gene fhaB2 in the highly virulent strain. Strain IFB0223, which has a complete adhesin gene, is less mobile and less aggressive than IFB0099. This suggests that in this case, mobility rather than adherence is needed in order to trigger disease symptoms. This study highlights the utility of comparative genomics in predicting D. solani traits involved in the aggressiveness of this emerging plant pathogen.

Bayer PE, Golicz AA, Tirnaz S, et al (2018)

Variation in abundance of predicted resistance genes in the Brassica oleracea pangenome.

Plant biotechnology journal [Epub ahead of print].

Brassica oleracea is an important agricultural species encompassing many vegetable crops including cabbage, cauliflower, broccoli and kale; however, it can be susceptible to a variety of fungal diseases such as clubroot, blackleg, leaf spot, and downy mildew. Resistance to these diseases is mediated by specific disease resistance gene analogs (RGAs), which are differently distributed across B. oleracea lines. The sequenced reference cultivar does not contain all B. oleracea genes due to gene presence/absence variation between individuals, which makes it necessary to search for RGA candidates in the B. oleracea pangenome. Here we present a comparative analysis of RGA candidates in the pangenome of B. oleracea. We show that the presence of RGA candidates differs between lines and suggest that in B. oleracea, SNPs and presence/absence variation drive RGA diversity using separate mechanisms. We identified 32 RGA candidates linked to Sclerotinia, clubroot, and Fusarium wilt resistance QTL, and these findings have implications for crop breeding in B. oleracea, which may also be applicable in other crop species.

Checcucci A, diCenzo G, Ghini V, et al (2018)

Creation and characterization of a genomically hybrid strain in the nitrogen-fixing symbiotic bacterium Sinorhizobium meliloti.

ACS synthetic biology [Epub ahead of print].

Many bacteria, often associated with eukaryotic hosts and of relevance for biotechnological applications, harbour a multipartite genome composed of more than one replicon. Biotechnologically relevant phenotypes are often encoded by genes residing on the secondary replicons. A synthetic biology approach to developing enhanced strains for biotechnological purposes could therefore involve merging pieces or entire replicons from multiple strains into a single genome. Here we report the creation of a genomic hybrid strain in a model multipartite genome species, the plant-symbiotic bacterium Sinorhizobium meliloti. We term this strain as cis-hybrid, since it is produced by genomic material coming from the same species' pangenome. In particular, we moved the secondary replicon pSymA (accounting for nearly 20% of total genome content) from a donor S. meliloti strain to an acceptor strain. The cis-hybrid strain was screened for a panel of complex phenotypes (carbon/nitrogen utilization phenotypes, intra- and extra-cellular metabolomes, symbiosis, and various microbiological tests). Additionally, metabolic network reconstruction and constraint-based modelling were employed for in silico prediction of metabolic flux reorganization. Phenotypes of the cis-hybrid strain were in good agreement with those of both parental strains. Interestingly, the symbiotic phenotype showed a marked cultivar-specific improvement with the cis-hybrid strains compared to both parental strains. These results provide a proof-of-principle for the feasibility of genome-wide replicon-based remodelling of bacterial strains for improved biotechnological applications in precision agriculture.

Le KK, Whiteside MD, Hopkins JE, et al (2018)

Spfy: an integrated graph database for real-time prediction of bacterial phenotypes and downstream comparative analyses.

Database : the journal of biological databases and curation, 2018:1-10 pii:5096058.

Public health laboratories are currently moving to whole-genome sequence (WGS)-based analyses, and require the rapid prediction of standard reference laboratory methods based solely on genomic data. Currently, these predictive genomics tasks rely on workflows that chain together multiple programs for the requisite analyses. While useful, these systems do not store the analyses in a genome-centric way, meaning the same analyses are often re-computed for the same genomes. To solve this problem, we created Spfy, a platform that rapidly performs the common reference laboratory tests, uses a graph database to store and retrieve the results from the computational workflows and links data to individual genomes using standardized ontologies. The Spfy platform facilitates rapid phenotype identification, as well as the efficient storage and downstream comparative analysis of tens of thousands of genome sequences. Though generally applicable to bacterial genome sequences, Spfy currently contains 10 243 Escherichia coli genomes, for which in-silico serotype and Shiga-toxin subtype, as well as the presence of known virulence factors and antimicrobial resistance determinants have been computed. Additionally, the presence/absence of the entire E. coli pan-genome was computed and linked to each genome. Owing to its database of diverse pre-computed results, and the ability to easily incorporate user data, Spfy facilitates hypothesis testing in fields ranging from population genomics to epidemiology, while mitigating the re-computation of analyses. The graph approach of Spfy is flexible, and can accommodate new analysis software modules as they are developed, easily linking new results to those already stored. Spfy provides a database and analyses approach for E. coli that is able to match the rapid accumulation of WGS data in public databases.

Kavya VNS, Tayal K, Srinivasan R, et al (2018)

Sequence Alignment on Directed Graphs.

Journal of computational biology : a journal of computational molecular cell biology [Epub ahead of print].

Genomic variations in a reference collection are naturally represented as genome variation graphs. Such graphs encode common subsequences as vertices and the variations are captured using additional vertices and directed edges. The resulting graphs are directed graphs possibly with cycles. Existing algorithms for aligning sequences on such graphs make use of partial order alignment (POA) techniques that work on directed acyclic graphs (DAGs). To achieve this, acyclic extensions of the input graphs are first constructed through expensive loop unrolling steps (DAGification). Furthermore, such graph extensions could have considerable blowup in their size and in the worst case the blow-up factor is proportional to the input sequence length. We provide a novel alignment algorithm V-ALIGN that aligns the input sequence directly on the input graph while avoiding such expensive DAGification steps. V-ALIGN is based on a novel dynamic programming (DP) formulation that allows gapped alignment directly on the input graph. It supports affine and linear gaps. We also propose refinements to V-ALIGN for better performance in practice. With the proposed refinements, the time to fill the DP table has linear dependence on the sizes of the sequence, the graph, and its feedback vertex set. We conducted experiments to compare the proposed algorithm against the existing POA-based techniques. We also performed alignment experiments on the genome variation graphs constructed from the 1000 Genomes data. For aligning short sequences, standard approaches restrict the expensive gapped alignment to small filtered subgraphs having high similarity to the input sequence. In such cases, the performance of V-ALIGN for gapped alignment on the filtered subgraph depends on the subgraph sizes.
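For orientation, the DAG-based baseline that the abstract contrasts with V-ALIGN can be sketched as a dynamic program over a topologically ordered graph. This is not V-ALIGN: it handles only acyclic graphs, uses unit edit costs rather than affine gaps, and assumes one character per node. The function name `align_to_dag` and the toy graph are hypothetical.

```python
from typing import List


def align_to_dag(seq: str, labels: List[str], preds: List[List[int]]) -> int:
    """Minimum edit distance between seq and any source-to-sink path.

    labels[v] is the single character on node v; preds[v] lists the
    predecessors of v. Nodes must be in topological order (every
    predecessor index is smaller than v) and the graph must be non-empty.
    """
    n = len(seq)
    start = list(range(n + 1))          # empty path vs. seq[:j] costs j
    rows: List[List[int]] = []
    has_successor = [False] * len(labels)

    for v, label in enumerate(labels):
        for p in preds[v]:
            has_successor[p] = True
        # Source nodes extend the virtual start row.
        prev_rows = [rows[p] for p in preds[v]] or [start]
        best_prev = [min(r[j] for r in prev_rows) for j in range(n + 1)]

        row = [best_prev[0] + 1]        # node character left unaligned
        for j in range(1, n + 1):
            substitute = best_prev[j - 1] + (0 if label == seq[j - 1] else 1)
            skip_node = best_prev[j] + 1    # gap in the sequence
            skip_char = row[j - 1] + 1      # gap in the graph path
            row.append(min(substitute, skip_node, skip_char))
        rows.append(row)

    # A complete path must end at a sink (a node with no successor).
    return min(rows[v][n] for v in range(len(labels)) if not has_successor[v])


# Toy variation graph (hypothetical): A -> (C | G) -> T, spelling ACT or AGT.
labels = ["A", "C", "G", "T"]
preds = [[], [0], [0], [1, 2]]
```

Both `align_to_dag("ACT", labels, preds)` and `align_to_dag("AGT", labels, preds)` return 0, since each string is spelled by a path, while `"AT"` costs one edit. A graph with cycles would first have to be unrolled into a DAG before this recurrence applies, which is the step the abstract describes as expensive.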

Chen X, Zhang Y, Zhang Z, et al (2018)

PGAweb: A Web Server for Bacterial Pan-Genome Analysis.

Frontiers in microbiology, 9:1910.

An astronomical increase in microbial genome data in recent years has led to strong demand for bioinformatic tools for pan-genome analysis within and across species. Here, we present PGAweb, a user-friendly, web-based tool for bacterial pan-genome analysis, which is composed of two main pan-genome analysis modules, PGAP and PGAP-X. PGAweb provides key interactive and customizable functions that include orthologous clustering, pan-genome profiling, sequence variation and evolution analysis, and functional classification. PGAweb presents features of genomic structural dynamics and sequence diversity with different visualization methods that are helpful for intuitively understanding the dynamics and evolution of bacterial genomes. PGAweb has an intuitive interface with one-click setting of parameters and is freely available at

Syme RA, Tan KC, Rybak K, et al (2018)

Pan-Parastagonospora Comparative Genome Analysis - effector prediction and genome evolution.

Genome biology and evolution pii:5090454 [Epub ahead of print].

We report a fungal pan-genome study involving Parastagonospora spp., including 21 isolates of the wheat (Triticum aestivum) pathogen P. nodorum, 10 of the grass-infecting P. avenae and 2 of a closely-related undefined sister species. We observed substantial variation in the distribution of polymorphisms across the pan-genome, including repeat-induced point mutations (RIP), diversifying selection and gene gains and losses. We also discovered chromosome-scale inter- and intra-specific presence/absence variation of some sequences, suggesting the occurrence of one or more accessory chromosomes or regions that may play a role in host-pathogen interactions. The presence of known pathogenicity effector loci SnToxA, SnTox1 and SnTox3 varied substantially among isolates. Three P. nodorum isolates lacked functional versions for all three loci whilst three P. avenae isolates carried one or both of the SnTox1 and SnTox3 genes, indicating previously unrecognized potential for discovering additional effectors in the P. nodorum-wheat pathosystem. We utilized the pan-genomic comparative analysis to improve the prediction of pathogenicity effector candidates, recovering the three confirmed effectors among our top-ranked candidates. We propose applying this pan-genomic approach to identify the effector repertoire involved in other host-microbe interactions involving necrotrophic pathogens in the Pezizomycotina.

Yang T, Zhong J, Zhang J, et al (2018)

Pan-Genomic Study of Mycobacterium tuberculosis Reflecting the Primary/Secondary Genes, Generality/Individuality, and the Interconversion Through Copy Number Variations.

Frontiers in microbiology, 9:1886.

Tuberculosis (TB) has surpassed HIV as the leading infectious disease killer worldwide since 2014. The main pathogen, Mycobacterium tuberculosis (Mtb), contains ~4,000 genes that account for ~90% of the genome. However, it is still unclear which of these genes are primary/secondary, which are responsible for generality/individuality, and which interconvert during evolution. Here we utilized a pan-genomic analysis of 36 Mtb genomes to address these questions. We identified 3,679 Mtb core (i.e., primary) genes, determining their phenotypic generality (e.g., virulence, slow growth, dormancy). We also observed 1,122 dispensable and 964 strain-specific secondary genes, reflecting partially shared and lineage-/strain-specific individualities. Among these, five L2 lineage-specific genes might be related to the increased virulence of the L2 lineage. Notably, we discovered 28 Mtb "Super Core Genes" (SCGs: more than one copy in at least 90% of strains), which might be of increased importance, and reflected the "super phenotype generality." Most SCGs encode PE/PPE, virulence factors, antigens, and transposases, and have been verified as playing crucial roles in Mtb pathogenicity. Further investigation of the 28 SCGs demonstrated the interconversion among SCGs, single-copy core, dispensable, and strain-specific genes through copy number variations (CNVs) during evolution; different mutations on different copies highlight the delicate adaptive-evolution regulation amongst Mtb lineages. This reflects that the importance of genes varied through CNVs, which might be driven by selective pressure from environment/host-adaptation. In addition, compared with Mycobacterium bovis (Mbo), Mtb possesses 48 specific single core genes that partially reflect the differences between Mtb and Mbo individuality.
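The "Super Core Gene" definition given in this abstract, a gene present in more than one copy in at least 90% of strains, translates directly into a filter over a copy-number table. The sketch below assumes such a table is available as per-strain copy counts; the function name `super_core_genes`, the parameter `min_fraction` and the toy counts are hypothetical.

```python
from typing import Dict, List, Set


def super_core_genes(
    copy_numbers: Dict[str, List[int]], min_fraction: float = 0.9
) -> Set[str]:
    """Return genes with more than one copy in at least min_fraction of strains.

    copy_numbers maps a gene to its copy count in each strain
    (0 = absent, 1 = single copy, 2 or more = multi-copy).
    """
    result: Set[str] = set()
    for gene, counts in copy_numbers.items():
        if not counts:
            continue
        multi_copy_strains = sum(1 for c in counts if c > 1)
        if multi_copy_strains / len(counts) >= min_fraction:
            result.add(gene)
    return result


# Toy table (hypothetical) for ten strains.
toy_counts = {
    "geneX": [2] * 9 + [1],       # multi-copy in 9 of 10 strains
    "geneY": [1] * 10,            # single-copy core gene
    "geneZ": [3] * 8 + [0, 1],    # multi-copy in only 8 of 10 strains
}
scgs = super_core_genes(toy_counts)
```

Only `geneX` qualifies in the toy table. A gene drifting between these categories as copies are gained or lost is the kind of interconversion through CNVs that the abstract describes.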

Asaf S, Khan AL, Khan MA, et al (2018)

Complete genome sequencing and analysis of endophytic Sphingomonas sp. LK11 and its potential in plant growth.

3 Biotech, 8(9):389.

Our study aimed to elucidate the plant growth-promoting characteristics and the structure and composition of the Sphingomonas sp. LK11 genome using the single molecule real-time (SMRT) sequencing technology of Pacific Biosciences. The results revealed that LK11 produces different types of gibberellins (GAs) in pure culture and significantly improves soybean plant growth by influencing endogenous GAs compared with non-inoculated control plants. Detailed genomic analyses revealed that the Sphingomonas sp. LK11 genome consists of a circular chromosome (3.78 Mbp; 66.2% G+C content) and two circular plasmids (122,975 bps and 34,160 bps; 63 and 65% G+C content, respectively). Annotation showed that the LK11 genome consists of 3656 protein-coding genes, 59 tRNAs, and 4 complete rRNA operons. Functional analyses predicted that LK11 encodes genes for phosphate solubilization and nitrate/nitrite ammonification, which are beneficial for promoting plant growth. Genes for production of catalases, superoxide dismutase, and peroxidases that confer resistance to oxidative stress in plants were also identified in LK11. Moreover, genes for trehalose and glycine betaine biosynthesis were also found in the LK11 genome. Similarly, analysis of Sphingomonas spp. revealed an open pan-genome: a total of 8507 genes were identified in the Sphingomonas spp. pan-genome, and about 1356 orthologous genes were found to comprise the core genome. However, the number of genomes analyzed was not enough to describe complete gene sets. Our findings indicated that the genetic makeup of Sphingomonas sp. LK11 can be utilized as an eco-friendly bioresource for cleaning contaminated sites and promoting growth of plants confronted with environmental perturbations.

Kiu R, LJ Hall (2018)

Response: Commentary: Probing Genomic Aspects of the Multi-Host Pathogen Clostridium perfringens Reveals Significant Pangenome Diversity, and a Diverse Array of Virulence Factors.

Frontiers in microbiology, 9:1857.

RevDate: 2018-08-30

Large-Scale Comparative Analysis of Microbial Pan-genomes using PanOCT.

Bioinformatics (Oxford, England) pii:5079328 [Epub ahead of print].

Summary: The JCVI Pan-Genome Pipeline is a collection of programs to run PanOCT and tools that support and extend the capabilities of PanOCT. PanOCT (Pan-genome Ortholog Clustering Tool) is a tool for pan-genome analysis of closely related prokaryotic species or strains. The JCVI Pan-Genome Pipeline wrapper invokes command-line utilities that prepare input genomes, invoke third-party tools such as NCBI Blast+, run PanOCT, generate a consensus pan-genome, annotate features of the pan-genome, detect sets of genes of interest such as antimicrobial resistance (AMR) genes, and generate figures, tables, and html pages to visualize the results. The pipeline can run in a hierarchical mode, lowering the RAM and compute resources used.

Availability: Source code, demo data, and detailed documentation are freely available at

Mehdizadeh Gohari I, JF Prescott (2018)

Commentary: Probing Genomic Aspects of the Multi-Host Pathogen Clostridium perfringens Reveals Significant Pangenome Diversity, and a Diverse Array of Virulence Factors.

Frontiers in microbiology, 9:1856.

RevDate: 2018-08-28

The Genome Biology of Effector Gene Evolution in Filamentous Plant Pathogens.

Annual review of phytopathology, 56:21-40.

Filamentous pathogens, including fungi and oomycetes, pose major threats to global food security. Crop pathogens cause damage by secreting effectors that manipulate the host to the pathogen's advantage. Genes encoding such effectors are among the most rapidly evolving genes in pathogen genomes. Here, we review how the major characteristics of the emergence, function, and regulation of effector genes are tightly linked to the genomic compartments where these genes are located in pathogen genomes. The presence of repetitive elements in these compartments is associated with elevated rates of point mutations and sequence rearrangements with a major impact on effector diversification. The expression of many effectors converges on an epigenetic control mediated by the presence of repetitive elements. Population genomics analyses showed that rapidly evolving pathogens show high rates of turnover at effector loci and display a mosaic in effector presence-absence polymorphism among strains. We conclude that effective pathogen containment strategies require a thorough understanding of the effector genome biology and the pathogen's potential for rapid adaptation.

Ou L, Li D, Lv J, et al (2018)

Pan-genome of cultivated pepper (Capsicum) and its use in gene presence-absence variation analyses.

Argemi X, Matelska D, Ginalski K, et al (2018)

Comparative genomic analysis of Staphylococcus lugdunensis shows a closed pan-genome and multiple barriers to horizontal gene transfer.

BMC genomics, 19(1):621 pii:10.1186/s12864-018-4978-1.

RESULTS: We demonstrate that S. lugdunensis possesses a closed pan-genome with a very limited number of new genes, in contrast to other staphylococci that have an open pan-genome. Whole-genome nucleotide and amino acid identity levels are also higher than in other staphylococci. We identified numerous genetic barriers to horizontal gene transfer that might explain this result. The S. lugdunensis genome has multiple operons encoding for restriction-modification, CRISPR/Cas and toxin/antitoxin systems. We also identified a new PIN-like domain-associated protein that might belong to a larger operon, comprising a metalloprotease, that could function as a new toxin/antitoxin or detoxification system.

CONCLUSION: We show that S. lugdunensis has a unique genome profile within staphylococci, with a closed pan-genome and several systems to prevent horizontal gene transfer. Its virulence in clinical settings does not rely on its ability to acquire and exchange antibiotic resistance genes or other virulence factors as shown for other staphylococci.
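
The distinction between a closed and an open pan-genome can be illustrated by counting how many previously unseen genes each additional genome contributes. The Python sketch below is a simplified illustration, not the method used in the paper: the function name `new_genes_per_genome` and the toy gene sets are hypothetical, and a single fixed genome order is assumed, whereas published analyses typically average over many orderings and fit a curve to the result.

```python
from typing import List, Set


def new_genes_per_genome(genomes: List[Set[str]]) -> List[int]:
    """Count genes first seen when each genome is added, in the given order."""
    seen: Set[str] = set()
    new_counts: List[int] = []
    for genes in genomes:
        new = genes - seen          # genes not present in any earlier genome
        new_counts.append(len(new))
        seen |= new
    return new_counts


# Toy example of a "closed" pattern: new genes quickly drop to about zero.
closed_like = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "c", "d"}, {"a", "b", "c"}]
# Toy example of an "open" pattern: every genome keeps adding new genes.
open_like = [{"a", "b"}, {"a", "c"}, {"a", "d"}]

closed_counts = new_genes_per_genome(closed_like)
open_counts = new_genes_per_genome(open_like)
```

A count that stays well above zero as genomes are added is consistent with an open pan-genome, while a count that falls to near zero is consistent with a closed one.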

Pena-Gonzalez A, Rodriguez-R LM, Marston CK, et al (2018)

Genomic Characterization and Copy Number Variation of Bacillus anthracis Plasmids pXO1 and pXO2 in a Historical Collection of 412 Strains.

mSystems, 3(4): pii:mSystems00065-18.

Bacillus anthracis plasmids pXO1 and pXO2 carry the main virulence factors responsible for anthrax. However, the extent of copy number variation within the species and how the plasmids are related to pXO1/pXO2-like plasmids in other species of the Bacillus cereus sensu lato group remain unclear. To gain new insights into these issues, we sequenced 412 B. anthracis strains representing the total phylogenetic and ecological diversity of the species. Our results revealed that B. anthracis genomes carried, on average, 3.86 and 2.29 copies of pXO1 and pXO2, respectively, and also revealed a positive linear correlation between the copy numbers of pXO1 and pXO2. No correlation between the plasmid copy number and the phylogenetic relatedness of the strains was observed. However, genomes of strains isolated from animal tissues generally maintained a higher plasmid copy number than genomes of strains from environmental sources (P < 0.05 [Welch two-sample t test]). Comparisons against B. cereus genomes carrying complete or partial pXO1-like and pXO2-like plasmids showed that the plasmid-based phylogeny recapitulated that of the main chromosome, indicating limited plasmid horizontal transfer between or within these species. Comparisons of gene content revealed a closed pXO1 and pXO2 pangenome; e.g., plasmids encode <8 unique genes, on average, and a single large fragment deletion of pXO1 in one B. anthracis strain (2000031682) was detected. Collectively, our results provide a more complete view of the genomic diversity of B. anthracis plasmids, their copy number variation, and the virulence potential of other Bacillus species carrying pXO1/pXO2-like plasmids.

IMPORTANCE: Bacillus anthracis microorganisms are of historical and epidemiological importance and are among the most homogenous bacterial groups known, even though the B. anthracis genome is rich in mobile elements. Mobile elements can trigger the diversification of lineages; therefore, characterizing the extent of genomic variation in a large collection of strains is critical for a complete understanding of the diversity and evolution of the species. Here, we sequenced a large collection of B. anthracis strains (>400) that were recovered from human, animal, and environmental sources around the world. Our results confirmed the remarkable stability of gene content and synteny of the anthrax plasmids and revealed no signal of plasmid exchange between B. anthracis and pathogenic B. cereus isolates but rather predominantly vertical descent. These findings advance our understanding of the biology and pathogenomic evolution of B. anthracis and its plasmids.
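
The abstract does not state how plasmid copy numbers were estimated. One common approach, assumed here purely for illustration, is to take the ratio of mean sequencing depth on the plasmid to mean depth on the chromosome. The Python sketch below shows that ratio together with the Welch two-sample t statistic mentioned in the abstract; the function names and all numbers are hypothetical, and no P value is computed.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def copy_number(plasmid_depth: Sequence[float],
                chromosome_depth: Sequence[float]) -> float:
    """Estimate plasmid copies per chromosome as a depth ratio (assumed method)."""
    return mean(plasmid_depth) / mean(chromosome_depth)


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch t statistic for two samples with possibly unequal variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Invented depths: plasmid covered at 40x, chromosome at 10x.
estimate = copy_number([40, 40], [10, 10])

# Invented copy-number estimates for two groups of strains.
t_stat = welch_t([4, 5, 6], [1, 2, 3])
```

Turning the t statistic into a P value additionally requires the Welch-Satterthwaite degrees of freedom and a t distribution, which a statistics library would normally supply.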

Thind AK, Wicker T, Müller T, et al (2018)

Chromosome-scale comparative sequence analysis unravels molecular mechanisms of genome dynamics between two wheat cultivars.

Genome biology, 19(1):104 pii:10.1186/s13059-018-1477-2.

BACKGROUND: Recent improvements in DNA sequencing and genome scaffolding have paved the way to generate high-quality de novo assemblies of pseudomolecules representing complete chromosomes of wheat and its wild relatives. These assemblies form the basis to compare the dynamics of wheat genomes on a megabase scale.

RESULTS: Here, we provide a comparative sequence analysis of the 700-megabase chromosome 2D between two bread wheat genotypes, the old landrace Chinese Spring and the elite Swiss spring wheat line 'CH Campala Lr22a'. Both chromosomes were assembled into megabase-sized scaffolds. There is a high degree of sequence conservation between the two chromosomes. Analysis of large structural variations reveals four large indels of more than 100 kb. Based on the molecular signatures at the breakpoints, unequal crossing over and double-strand break repair were identified as the molecular mechanisms that caused these indels. Three of the large indels affect copy number of NLRs, a gene family involved in plant immunity. Analysis of SNP density reveals four haploblocks of 4, 8, 9 and 48 Mb with a 35-fold increased SNP density compared to the rest of the chromosome. Gene content across the two chromosomes was highly conserved. Ninety-nine percent of the genic sequences were present in both genotypes and the fraction of unique genes ranged from 0.4 to 0.7%.

CONCLUSIONS: This comparative analysis of two high-quality chromosome assemblies enabled a comprehensive assessment of large structural variations and gene content. The insight obtained from this analysis will form the basis of future wheat pan-genome studies.

Cleary A, Ramaraj T, Kahanda I, et al (2018)

Exploring Frequented Regions in Pan-Genomic Graphs.

IEEE/ACM transactions on computational biology and bioinformatics [Epub ahead of print].

We consider the problem of identifying regions within a pan-genome De Bruijn graph that are traversed by many sequence paths. We define such regions and the subpaths that traverse them as frequented regions (FRs). In this work, we formalize the FR problem and describe an efficient algorithm for finding FRs. Subsequently, we propose some applications of FRs based on machine-learning and pan-genome graph simplification. We demonstrate the effectiveness of these applications using data sets for the organisms Staphylococcus aureus (bacterium) and Saccharomyces cerevisiae (yeast). We corroborate the biological relevance of FRs such as identifying introgressions in yeast that aid in alcohol tolerance, and show that FRs are useful for classification of yeast strains by industrial use and visualizing pan-genomic space.
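
The abstract does not describe the FR algorithm itself. As a rough illustration of the underlying idea only, the Python sketch below counts, for each k-mer node of a de Bruijn-style graph, how many input sequences (paths) pass through it and keeps the nodes supported by at least a chosen number of paths. The function names, the threshold parameter, and the toy sequences are hypothetical, and this node-level count is much simpler than the region-level definition formalized in the paper.

```python
from collections import Counter
from typing import List, Set


def kmer_support(seqs: List[str], k: int) -> Counter:
    """Count how many sequences contain each k-mer (each sequence counted once)."""
    support: Counter = Counter()
    for seq in seqs:
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    return support


def frequented_nodes(seqs: List[str], k: int, min_paths: int) -> Set[str]:
    """Return k-mer nodes traversed by at least min_paths sequences."""
    support = kmer_support(seqs, k)
    return {kmer for kmer, n in support.items() if n >= min_paths}


# Toy sequences sharing the middle segment "CGTA" (invented for illustration).
toy_seqs = ["ACGTAC", "TCGTAG", "GCGTAA"]
shared = frequented_nodes(toy_seqs, k=3, min_paths=3)
```

Here the two k-mers covering the shared segment are the only nodes visited by all three sequences; grouping adjacent nodes of this kind into regions is the part the paper's algorithm addresses.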

Das S, Pettersson BMF, Behra PRK, et al (2018)

Extensive genomic diversity among Mycobacterium marinum strains revealed by whole genome sequencing.

Scientific reports, 8(1):12040 pii:10.1038/s41598-018-30152-y.

Mycobacterium marinum is the causative agent for the tuberculosis-like disease mycobacteriosis in fish and skin lesions in humans. Ubiquitous in its geographical distribution, M. marinum is known to occupy diverse fish as hosts. However, information about its genomic diversity is limited. Here, we provide the genome sequences for 15 M. marinum strains isolated from infected humans and fish. Comparative genomic analysis of these and four available genomes of the M. marinum strains M, E11, MB2 and Europe reveals high genomic diversity among the strains, leading to the conclusion that M. marinum should be divided into two different clusters, the "M"- and the "Aronson"-type. We suggest that these two clusters should be considered to represent two M. marinum subspecies. Our data also show that the M. marinum pan-genome for both groups is open and expanding and we provide data showing a high number of mutational hotspots in M. marinum relative to other mycobacteria such as Mycobacterium tuberculosis. This high genomic diversity might be related to the ability of M. marinum to occupy different ecological niches.

Pluta R, M Espinosa (2018)

Antisense and yet sensitive: Copy number control of rolling circle-replicating plasmids by small RNAs.

Wiley interdisciplinary reviews. RNA [Epub ahead of print].

Bacterial plasmids constitute a wealth of shared DNA amounting to about 20% of the total prokaryotic pangenome. Plasmids replicate autonomously and control their replication by maintaining a fairly constant number of copies within a given host. Plasmids should acquire a good fitness to their hosts so that they do not constitute a genetic load. Here we review some basic concepts in plasmid biology, pertaining to the control of replication and distribution of plasmid copies among daughter cells. A particular class of plasmids is constituted by those that replicate by the rolling circle mode (rolling circle-replicating [RCR]-plasmids). They are small double-stranded DNA molecules, with a rather high number of copies in the original host. RCR-plasmids control their replication by means of a small short-lived antisense RNA, alone or in combination with a plasmid-encoded transcriptional repressor protein. Two plasmid prototypes have been studied in depth, namely the staphylococcal plasmid pT181 and the streptococcal plasmid pMV158, each corresponding to the two types of replication control circuits, respectively. We further discuss possible applications of the plasmid-encoded antisense RNAs and address some future directions that, in our opinion, should be pursued in the study of these small molecules. This article is categorized under: Regulatory RNAs/RNAi/Riboswitches > Regulatory RNAs RNA Structure and Dynamics > Influence of RNA Structure in Biological Systems.

González-Torres P, T Gabaldón (2018)

Genome Variation in the Model Halophilic Bacterium Salinibacter ruber.

Frontiers in microbiology, 9:1499.

The halophilic bacterium Salinibacter ruber is an abundant and ecologically important member of halophilic communities worldwide. Given its broad distribution and high intraspecific genetic diversity, S. ruber is considered one of the main models for ecological and evolutionary studies of bacterial adaptation to hypersaline environments. However, current insight into the genomic diversity of this species is limited to the comparison of the genomes of two co-isolated strains. Here, we present a comparative genomic analysis of eight S. ruber strains isolated at two different time points in each of two different Mediterranean solar salterns. Our results show an open pangenome with contrasting evolutionary patterns in the core and accessory genomes. We found that the core genome is shaped by extensive homologous recombination (HR), which results in limited sequence variation within population clusters. In contrast, the accessory genome is modulated by horizontal gene transfer (HGT), with genomic islands and plasmids acting as gateways to the rest of the genome. In addition, both types of genetic exchange are modulated by restriction and modification (RM) or CRISPR-Cas systems. Finally, genes differentially impacted by such processes reveal functional processes potentially relevant for environmental interactions and adaptation to extremophilic conditions. Altogether, our results support scenarios that conciliate "Neutral" and "Constant Diversity" models of bacterial evolution.

Springer NM, Anderson SN, Andorf CM, et al (2018)

The maize W22 genome provides a foundation for functional genomics and transposon biology.

Nature genetics pii:10.1038/s41588-018-0158-0 [Epub ahead of print].

The maize W22 inbred has served as a platform for maize genetics since the mid twentieth century. To streamline maize genome analyses, we have sequenced and de novo assembled a W22 reference genome using short-read sequencing technologies. We show that significant structural heterogeneity exists in comparison to the B73 reference genome at multiple scales, from transposon composition and copy number variation to single-nucleotide polymorphisms. The generation of this reference genome enables accurate placement of thousands of Mutator (Mu) and Dissociation (Ds) transposable element insertions for reverse and forward genetics studies. Annotation of the genome has been achieved using RNA-seq analysis, differential nuclease sensitivity profiling and bisulfite sequencing to map open reading frames, open chromatin sites and DNA methylation profiles, respectively. Collectively, the resources developed here integrate W22 as a community reference genome for functional genomics and provide a foundation for the maize pan-genome.

Wolf IR, Paschoal AR, Quiroga C, et al (2018)

Functional annotation and distribution overview of RNA families in 27 Streptococcus agalactiae genomes.

BMC genomics, 19(1):556 pii:10.1186/s12864-018-4951-z.

BACKGROUND: Streptococcus agalactiae, also known as Group B Streptococcus (GBS), is a Gram-positive bacterium that colonizes the gastrointestinal and genitourinary tract of humans. This bacterium has also been isolated from various animals, such as fish and cattle. Non-coding RNAs (ncRNAs) can act as regulators of gene expression in bacteria, such as Streptococcus pneumoniae and Streptococcus pyogenes. However, little is known about the genomic distribution of ncRNAs and RNA families in S. agalactiae.

RESULTS: Comparative genome analysis of 27 S. agalactiae strains identified more than 5 thousand genomic regions, classified as Core, Exclusive, and Shared genome sequences. We identified 27 to 89 RNA families per genome distributed over these regions; of these, 25 were in Core regions, while Shared and Exclusive regions showed variations amongst strains. We propose that the amount and type of ncRNA present in each genome can provide a pattern that contributes to the identification of the clonal types.

CONCLUSIONS: The identification of RNA families provides insight into the function of ncRNAs, sRNAs and ribozymes, which can be further explored as targets for antibiotic development or studied in gene regulation of cellular processes. RNA families could be considered as markers to determine infection capabilities of different strains. Lastly, pan-genome analysis of GBS including the full range of functional transcripts provides a broader approach to understanding this pathogen.

RevDate: 2018-07-27

Luo Y, Cheng Y, Yi J, et al (2018)

Complete Genome Sequence of Industrial Biocontrol Strain Paenibacillus polymyxa HY96-2 and Further Analysis of Its Biocontrol Mechanism.

Frontiers in microbiology, 9:1520.

Paenibacillus polymyxa (formerly known as Bacillus polymyxa) has been extensively studied for agricultural applications as a plant-growth-promoting rhizobacterium and is also an important biocontrol agent. Our team has developed the P. polymyxa strain HY96-2 from the tomato rhizosphere as the first microbial biopesticide based on P. polymyxa for controlling plant diseases around the world, leading to the commercialization of this microbial biopesticide in China. However, further research is essential for understanding its precise biocontrol mechanisms. In this paper, we report the complete genome sequence of HY96-2 and the results of a comparative genomic analysis between different P. polymyxa strains. The complete genome size of HY96-2 was found to be 5.75 Mb and 5207 coding sequences were predicted. HY96-2 was compared with seven other P. polymyxa strains for which complete genome sequences have been published, using phylogenetic tree, pan-genome, and nucleic acid co-linearity analysis. In addition, the genes and gene clusters involved in biofilm formation, antibiotic synthesis, and systemic resistance inducer production were compared between strain HY96-2 and two other strains, namely, SC2 and E681. The results revealed that all three of the P. polymyxa strains have the ability to control plant diseases via the mechanisms of colonization (biofilm formation), antagonism (antibiotic production), and induced resistance (systemic resistance inducer production). However, the variation of the corresponding genes or gene clusters between the three strains may lead to different antimicrobial spectra and biocontrol efficacies. Two possible pathways of biofilm formation in P. polymyxa were reported for the first time after searching the KEGG database. This study provides a scientific basis for the further optimization of the field applications and quality standards of industrial microbial biopesticides based on HY96-2. It may also serve as a reference for studying the differences in antimicrobial spectra and biocontrol capability between different biocontrol agents.

RevDate: 2018-07-25

Aherfi S, Andreani J, Baptiste E, et al (2018)

A Large Open Pangenome and a Small Core Genome for Giant Pandoraviruses.

Frontiers in microbiology, 9:1486.

Giant viruses of amoebae are distinct from classical viruses by the giant size of their virions and genomes. Pandoraviruses are the record holders in size of genomes and number of predicted genes. Three strains, P. salinus, P. dulcis, and P. inopinatum, have been described to date. We isolated three new ones, namely P. massiliensis, P. braziliensis, and P. pampulha, from environmental samples collected in Brazil. We describe here their genomes, the transcriptome and proteome of P. massiliensis, and the pangenome of the group encompassing the six pandoravirus isolates. Genome sequencing was performed with an Illumina MiSeq instrument. Genome annotation was performed using the GeneMarkS and Prodigal software and comparative genomic analyses. The core genome and pangenome were determined using, notably, the ProteinOrtho and CD-HIT programs. Transcriptomics was performed for P. massiliensis with the Illumina MiSeq instrument; proteomics was also performed for this virus using 1D/2D gel electrophoresis and mass spectrometry on a Synapt G2Si Q-TOF traveling wave mobility spectrometer. The genomes of the three new pandoraviruses range in size between 1.6 and 1.8 Mbp. The genomes of P. massiliensis, P. pampulha, and P. braziliensis were predicted to harbor 1,414, 2,368, and 2,696 genes, respectively. These genes comprise up to 67% of ORFans. Phylogenomic analyses showed that P. massiliensis and P. braziliensis were more closely related to each other than to the other pandoraviruses. The core genome of pandoraviruses comprises 352 clusters of genes, and the ratio core genome/pangenome is less than 0.05. The extinction curve shows clearly that the pangenome is still open. A quarter of the gene content of P. massiliensis was detected by transcriptomics. In addition, products of a total of 162 open reading frames were found by proteomic analysis of P. massiliensis virions, including notably the products of 28 ORFans, 99 hypothetical proteins, and 90 core genes. Further analyses should provide better knowledge and understanding of the evolution and origin of these giant pandoraviruses, and of their relationships with viruses and cellular microorganisms.

RevDate: 2018-07-23

Fleshman A, Mullins K, Sahl J, et al (2018)

Comparative pan-genomic analyses of Orientia tsutsugamushi reveal an exceptional model of bacterial evolution driving genomic diversity.

Microbial genomics [Epub ahead of print].

Orientia tsutsugamushi, formerly Rickettsia tsutsugamushi, is an obligate intracellular pathogen that causes scrub typhus, an underdiagnosed acute febrile disease with high morbidity. Scrub typhus is transmitted by the larval stage (chigger) of Leptotrombidium mites and is irregularly distributed across endemic regions of Asia, Australia and islands of the western Pacific Ocean. Previous work to understand population genetics in O. tsutsugamushi has been based on sub-genomic sampling methods and whole-genome characterization of two genomes. In this study, we compared 40 genomes from geographically dispersed areas and confirmed patterns of extensive homologous recombination likely driven by transposons, conjugative elements and repetitive sequences. High rates of lateral gene transfer (LGT) among O. tsutsugamushi genomes appear to have effectively eliminated a detectable clonal frame, but not our ability to infer evolutionary relationships and phylogeographical clustering. Pan-genomic comparisons using 31 082 high-quality bacterial genomes from 253 species suggest that genomic duplication in O. tsutsugamushi is almost unparalleled. Unlike other highly recombinant species where the uptake of exogenous DNA largely drives genomic diversity, the pan-genome of O. tsutsugamushi is driven by duplication and divergence. Extensive gene innovation by duplication is most commonly attributed to plants and animals and, in contrast with LGT, is thought to be only a minor evolutionary mechanism for bacteria. The near unprecedented evolutionary characteristics of O. tsutsugamushi, coupled with extensive intra-specific LGT, expand our present understanding of rapid bacterial evolutionary adaptive mechanisms.

RevDate: 2018-07-23

Zhou Z, Lundstrøm I, Tran-Dien A, et al (2018)

Pan-genome Analysis of Ancient and Modern Salmonella enterica Demonstrates Genomic Stability of the Invasive Para C Lineage for Millennia.

Current biology : CB pii:S0960-9822(18)30694-8 [Epub ahead of print].

Salmonella enterica serovar Paratyphi C causes enteric (paratyphoid) fever in humans. Its presentation can range from asymptomatic infections of the blood stream to gastrointestinal or urinary tract infection or even a fatal septicemia [1]. Paratyphi C is very rare in Europe and North America except for occasional travelers from South and East Asia or Africa, where the disease is more common [2, 3]. However, early 20th-century observations in Eastern Europe [3, 4] suggest that Paratyphi C enteric fever may once have had a wide-ranging impact on human societies. Here, we describe a draft Paratyphi C genome (Ragna) recovered from the 800-year-old skeleton (SK152) of a young woman in Trondheim, Norway. Paratyphi C sequences were recovered from her teeth and bones, suggesting that she died of enteric fever and demonstrating that these bacteria have long caused invasive salmonellosis in Europeans. Comparative analyses against modern Salmonella genome sequences revealed that Paratyphi C is a clade within the Para C lineage, which also includes serovars Choleraesuis, Typhisuis, and Lomita. Although Paratyphi C only infects humans, Choleraesuis causes septicemia in pigs and boar [5] (and occasionally humans), and Typhisuis causes epidemic swine salmonellosis (chronic paratyphoid) in domestic pigs [2, 3]. These different host specificities likely evolved in Europe over the last ∼4,000 years since the time of their most recent common ancestor (tMRCA) and are possibly associated with the differential acquisitions of two genomic islands, SPI-6 and SPI-7. The tMRCAs of these bacterial clades coincide with the timing of pig domestication in Europe [6].

RevDate: 2018-07-20

Zhong C, Han M, Yu S, et al (2018)

Pan-genome analyses of 24 Shewanella strains re-emphasize the diversification of their functions yet evolutionary dynamics of metal-reducing pathway.

Biotechnology for biofuels, 11:193 pii:1201.

Background: Shewanella strains are important dissimilatory metal-reducing bacteria which are widely distributed in diverse habitats. Despite efforts to genomically characterize Shewanella, knowledge of the molecular components, functional information and evolutionary patterns remains lacking, especially for their compatibility in the metal-reducing pathway. The increasing number of genome sequences of Shewanella strains offers a basis for pan-genome studies.

Results: A comparative pan-genome analysis was conducted to study genomic diversity and evolutionary relationships among 24 Shewanella strains. Results revealed an open pan-genome of 13,406 non-redundant genes and a core-genome of 1878 non-redundant genes. Selective pressure acted on the invariant members of the core genome, in which purifying selection drove evolution in the housekeeping mechanisms. Shewanella strains exhibited extensive genome variability, with high levels of gene gain and loss during evolution, which affected variable gene sets and facilitated rapid evolution. Additionally, genes related to metal reduction were diversely distributed in Shewanella strains and evolved under purifying selection, which highlighted the basic conserved functionality and specificity of respiratory systems.

Conclusions: The diversity of genes present in the accessory and specific genomes of Shewanella strains indicates that each strain uses different strategies to adapt to diverse environments. Horizontal gene transfer is an important evolutionary force in shaping Shewanella genomes. Purifying selection plays an important role in the stability of the core-genome and also drives evolution in the mtr-omc cluster of different Shewanella strains.

RevDate: 2018-07-20

Collins FWJ, Mesa-Pereira B, O'Connor PM, et al (2018)

Reincarnation of Bacteriocins From the Lactobacillus Pangenomic Graveyard.

Frontiers in microbiology, 9:1298.

Bacteria commonly produce narrow spectrum bacteriocins as a means of inhibiting closely related species competing for similar resources in an environment. The increasing availability of genomic data means that it is becoming easier to identify bacteriocins encoded within genomes. However, the presence of bacteriocin genes in a strain does not always translate into biological antimicrobial activity. For example, when analysing the Lactobacillus pangenome we identified strains encoding ten pediocin-like bacteriocin structural genes which failed to display inhibitory activity. Nine of these bacteriocins were novel whilst one was identified as the previously characterized bacteriocin "penocin A." The composition of these bacteriocin operons varied between strains, often with key components missing which are required for bacteriocin production, such as dedicated bacteriocin transporters and accessory proteins. In an effort to functionally express these bacteriocins, the structural genes for the ten pediocin homologs were cloned alongside the dedicated pediocin PA-1 transporter in both Escherichia coli and Lactobacillus paracasei heterologous hosts. Each bacteriocin was cloned with its native leader sequence and as a fusion protein with the pediocin PA-1 leader sequence. Several of these bacteriocins displayed a broader spectrum of inhibition than the original pediocin PA-1. We show how potentially valuable bacteriocins can easily be "reincarnated" from in silico data and produced in vitro despite often lacking the necessary accompanying machinery. Moreover, the study demonstrates how genomic datasets such as the Lactobacillus pangenome harbor a potential "arsenal" of antimicrobial activity with the possibility of being activated when expressed in more genetically amenable hosts.

RevDate: 2018-07-16

Holley G, Wittler R, Stoye J, et al (2018)

Dynamic Alignment-Free and Reference-Free Read Compression.

Journal of computational biology : a journal of computational molecular cell biology, 25(7):825-836.

The advent of high throughput sequencing (HTS) technologies raises a major concern about storage and transmission of data produced by these technologies. In particular, large-scale sequencing projects generate an unprecedented volume of genomic sequences ranging from tens to several thousands of genomes per species. These collections contain highly similar and redundant sequences, also known as pangenomes. The ideal way to represent and transfer pangenomes is through compression. A number of HTS-specific compression tools have been developed to reduce the storage and communication costs of HTS data, yet none of them is designed to process a pangenome. In this article, we present dynamic alignment-free and reference-free read compression (DARRC), a new alignment-free and reference-free compression method. It addresses the problem of pangenome compression by encoding the sequences of a pangenome as a guided de Bruijn graph. The novelty of this method is its ability to incrementally update DARRC archives with new genome sequences without full decompression of the archive. DARRC can compress both single-end and paired-end read sequences of any length using all symbols of the IUPAC nucleotide code. On a large Pseudomonas aeruginosa data set, our method outperforms all other tested tools. It provides a 30% compression ratio improvement in single-end mode compared with the best performing state-of-the-art HTS-specific compression method in our experiments.

RevDate: 2018-07-14

Driscoll CB, Meyer KA, Šulčius S, et al (2018)

A closely-related clade of globally distributed bloom-forming cyanobacteria within the Nostocales.

Harmful algae, 77:93-107.

In order to better understand the relationships among current Nostocales cyanobacterial blooms, eight genomes were sequenced from cultured isolates or from environmental metagenomes of recent planktonic Nostocales blooms. Phylogenomic analysis of publicly available sequences placed the new genomes among a group of 15 genomes from four continents in a distinct ADA clade (Anabaena/Dolichospermum/Aphanizomenon) within the Nostocales. This clade contains four species-level groups, two of which include members with both Anabaena-like and Aphanizomenon flos-aquae-like morphology. The genomes contain many repetitive genetic elements and a sizable pangenome, in which ABC-type transporters are highly represented. Alongside common core genes for photosynthesis, the differentiation of N2-fixing heterocysts, and the uptake and incorporation of the major nutrients P, N and S, we identified several gene pathways in the pangenome that may contribute to niche partitioning. Genes for problematic secondary metabolites (cyanotoxins and taste-and-odor compounds) were sporadically present, as were other polyketide synthase (PKS) and nonribosomal peptide synthetase (NRPS) gene clusters. By contrast, genes predicted to encode the ribosomally generated bacteriocin peptides were found in all genomes.

RevDate: 2018-07-11

Rizzi R, Cairo M, Makinen V, et al (2018)

Hardness of Covering Alignment: Phase Transition in Post-Sequence Genomics.

IEEE/ACM transactions on computational biology and bioinformatics [Epub ahead of print].

Covering alignment problems arise from recent developments in genomics; so-called pan-genome graphs are replacing reference genomes, and advances in haplotyping enable the full content of diploid genomes to be used as the basis of sequence analysis. In this paper, we show that the computational complexity will change for natural extensions of alignments to pan-genome representations and to diploid genomes. More broadly, our approach can also be seen as a minimal extension of sequence alignment to labelled directed acyclic graphs (labelled DAGs). Namely, we show that finding a covering alignment of two labelled DAGs is NP-hard even on binary alphabets. A covering alignment asks for two paths $R_1$ (red) and $G_1$ (green) in DAG $D_1$ and two paths $R_2$ (red) and $G_2$ (green) in DAG $D_2$ that cover the nodes of the graphs and maximize the sum of the global alignment scores: $\mathrm{as}(\mathrm{sp}(R_1), \mathrm{sp}(R_2)) + \mathrm{as}(\mathrm{sp}(G_1), \mathrm{sp}(G_2))$, where $\mathrm{sp}(P)$ is the concatenation of labels on the path $P$. Pair-wise alignment of haplotype sequences forming a diploid chromosome can be converted to a two-path coverable labelled DAG, and then the covering alignment models the similarity of two diploids over arbitrary recombinations. A reduction in the other direction shows that this problem is NP-hard on alphabets of size 3.

RevDate: 2018-07-06

Tetz G, V Tetz (2018)

Tetz's theory and law of longevity.

Theory in biosciences = Theorie in den Biowissenschaften pii:10.1007/s12064-018-0267-4 [Epub ahead of print].

Here, we present a new theory and law of longevity intended to evaluate fundamental factors that control lifespan. This theory is based on the fact that genes affecting host organism longevity are represented by subpopulations: genes of host eukaryotic cells, commensal microbiota, and non-living genetic elements. Based on Tetz's theory of longevity, we propose that lifespan and aging are defined by the accumulation of alterations over all genes of the macroorganism and microbiome and the non-living genetic elements associated with them. Tetz's law of longevity states that longevity is limited by the accumulation of alterations up to a limiting value that is not compatible with life. Based on the theory and law, we also propose a novel model to calculate several parameters, including the rate of aging and the remaining lifespan of individuals. We suggest that this theory and model have explanatory and predictive potential for eukaryotic organisms, allowing the influence of diseases, medication, and medical procedures to be re-examined in relation to longevity. Such estimates also provide a framework to evaluate new fundamental aspects that control aging and lifespan.

RevDate: 2018-07-05

Choi S, Jin GD, Park J, et al (2018)

Pangenomics of Lactobacillus plantarum revealed Group-specific genomic profiles without habitat association.

Journal of microbiology and biotechnology pii:10.4014/jmb.1803.03029 [Epub ahead of print].

Lactobacillus plantarum is a lactic acid bacterium that promotes animal intestinal health as a probiotic and is found in a wide variety of habitats. Here, we investigated the genomic features of different clusters of L. plantarum strains via pan-genomic analysis. We compared the genomes of 108 L. plantarum strains that were available from the NCBI GenBank database. These genomes were 2.9-3.7 Mbp in size and 44-45% in G+C content. A total of 8,847 orthologs were collected, and 1,709 genes were identified to be shared as core genes by all the strains analyzed. On the basis of SNPs from the core genes, the 108 strains were clustered into five major groups (G1-G5) that are different from previous reports and are not clearly associated with habitats. Analysis of group-specific enriched or depleted genes revealed that G1 and G2 were rich in genes for carbohydrate utilization (L-arabinose, L-rhamnose, and fructo-oligosaccharides) and that G3, G4, and G5 possessed more genes for restriction-modification systems and the MazEF toxin-antitoxin. These results indicate that there are critical differences in gene content and survival strategies among genetically clustered L. plantarum strains, regardless of habitats.
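
The core/pan-genome counts reported in abstracts such as the one above boil down to set operations over per-strain gene-family lists. The following is a minimal sketch of that bookkeeping, not the authors' pipeline: the function name `summarize_pangenome`, the strain labels, and the toy gene sets are all hypothetical, and a real analysis would first need ortholog clustering to produce the gene-family identifiers.

```python
def summarize_pangenome(gene_content):
    """Split gene families into core, accessory and pan sets.

    gene_content maps a strain name to the set of gene-family
    identifiers found in that strain (hypothetical input format).
    """
    if not gene_content:
        raise ValueError("need at least one genome")
    families = [set(genes) for genes in gene_content.values()]
    pan = set().union(*families)          # present in at least one strain
    core = set.intersection(*families)    # present in every strain
    accessory = pan - core                # present in some, but not all
    return core, accessory, pan


# Toy data: three made-up strains with illustrative gene labels.
toy = {
    "strain_A": {"dnaA", "gyrB", "araA", "mazF"},
    "strain_B": {"dnaA", "gyrB", "araA"},
    "strain_C": {"dnaA", "gyrB", "hsdR"},
}
core, accessory, pan = summarize_pangenome(toy)
print(len(core), len(accessory), len(pan))  # prints: 2 3 5
```

In this toy case the two families shared by every strain form the core, and the remaining three make up the accessory (flexible) genome.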

RevDate: 2018-07-16

Lemos Junior WJF, da Silva Duarte V, Treu L, et al (2018)

Whole genome comparison of two Starmerella bacillaris strains with other wine yeasts uncovers genes involved in modulating important winemaking traits.

FEMS yeast research, 18(7):.

Starmerella bacillaris is an osmotolerant yeast with interesting winemaking traits such as low-ethanol and high-glycerol production, previously considered as wine spoilage and recently proposed to improve the sensory quality of wine. This is the first work performing a whole-genome analysis of the variants identified by comparing two S. bacillaris strains (PAS13 and FRI751). Additionally, an extensive search for orthologous genes against Saccharomyces and non-Saccharomyces yeasts produced a detailed reconstruction of the pan-genome for yeast species used in winemaking. Starmerella bacillaris PAS13 was able to produce 36% more glycerol than S. bacillaris FRI751 without increasing ethanol level over 5% (v/v). Orthologous genes revealed new insights in the response to osmotic stress determined by the mitogen-activated protein kinase (MAPK) from S. bacillaris strains. The comparison between the two S. bacillaris genomes revealed 33 771 high-quality variants that were ranked considering their predicted impact on gene functions. Furthermore, analysis of structural variations in the genome revealed five translocations. The absence of some transcriptional factors involved in the regulation of GPD (glycerol-3-phosphate dehydrogenase), like the protein kinases YpK1p and YpK2p, and the identification of a tandem duplication increasing the GPP1 (glycerol-3-phosphate phosphatase) gene copy number suggest a remarkably different regulation of the glycerol pathway for S. bacillaris in comparison to S. cerevisiae.

RevDate: 2018-07-11

Her HL, YW Wu (2018)

A pan-genome-based machine learning approach for predicting antimicrobial resistance activities of the Escherichia coli strains.

Bioinformatics (Oxford, England), 34(13):i89-i95.

Motivation: Antimicrobial resistance (AMR) is becoming a huge problem in both developed and developing countries, and identifying strains resistant or susceptible to certain antibiotics is essential in fighting against antibiotic-resistant pathogens. Whole-genome sequences have been collected for different microbial strains in order to identify crucial characteristics that allow certain strains to become resistant to antibiotics; however, a global inspection of the gene content responsible for AMR activities remains to be done.

Results: We propose a pan-genome-based approach to characterize antibiotic-resistant microbial strains and test this approach on the bacterial model organism Escherichia coli. By identifying core and accessory gene clusters and predicting AMR genes for the E. coli pan-genome, we not only showed that certain classes of genes are unevenly distributed between the core and accessory parts of the pan-genome but also demonstrated that only a portion of the identified AMR genes belong to the accessory genome. Application of machine learning algorithms to predict whether specific strains were resistant to antibiotic drugs yielded the best prediction accuracy for the set of AMR genes within the accessory part of the pan-genome, suggesting that these gene clusters were most crucial to AMR activities in E. coli. Selecting subsets of AMR genes for different antibiotic drugs based on a genetic algorithm (GA) achieved better prediction performances than the gene sets established in the literature, hinting that the gene sets selected by the GA may warrant further analysis in investigating more details about how E. coli fight against antibiotics.

Supplementary information: Supplementary data are available at Bioinformatics online.

RevDate: 2018-06-29

Gemmell MR, Berry S, Mukhopadhya I, et al (2018)

Comparative genomics of Campylobacter concisus: Analysis of clinical strains reveals genome diversity and pathogenic potential.

Emerging microbes & infections, 7(1):116 pii:10.1038/s41426-018-0118-x.

In recent years, an increasing number of Campylobacter species have been associated with human gastrointestinal (GI) diseases including gastroenteritis, inflammatory bowel disease, and colorectal cancer. Campylobacter concisus, an oral commensal historically linked to gingivitis and periodontitis, has been increasingly detected in the lower GI tract. In the present study, we generated robust genome sequence data from C. concisus strains and undertook a comprehensive pangenome assessment to identify C. concisus virulence properties and to explain potential adaptations acquired while residing in specific ecological niche(s) of the GI tract. Genomes of 53 new C. concisus strains were sequenced, assembled, and annotated including 36 strains from gastroenteritis patients, 13 strains from Crohn's disease patients and four strains from colitis patients (three collagenous colitis and one lymphocytic colitis). When compared with previously published sequences, strains clustered into two main groups/genomospecies (GS) with phylogenetic clustering explained neither by disease phenotype nor sample location. Paired oral/faecal isolates, from the same patient, indicated that there are few genetic differences between oral and gut isolates which suggests that gut isolates most likely reflect oral strain relocation. Type IV and VI secretion systems genes, genes known to be important for pathogenicity in the Campylobacter genus, were present in the genome assemblies, with 82% containing Type VI secretion system genes. Our findings indicate that C. concisus strains are genetically diverse, and the variability in bacterial secretion system content may play an important role in their virulence potential.

RevDate: 2018-07-08

Clarke TH, Brinkac LM, Inman JM, et al (2018)

PanACEA: a bioinformatics tool for the exploration and visualization of bacterial pan-chromosomes.

BMC bioinformatics, 19(1):246 pii:10.1186/s12859-018-2250-y.

BACKGROUND: Bacterial pan-genomes, comprised of conserved and variable genes across multiple sequenced bacterial genomes, allow for identification of genomic regions that are phylogenetically discriminating or functionally important. Pan-genomes consist of large amounts of data, which can restrict researchers' ability to locate and analyze these regions. Multiple software packages are available to visualize pan-genomes, but currently their ability to address these concerns is limited by using only pre-computed data sets, prioritizing core over variable gene clusters, or by not accounting for pan-chromosome positioning in the viewer.

RESULTS: We introduce PanACEA (Pan-genome Atlas with Chromosome Explorer and Analyzer), which utilizes locally-computed interactive web-pages to view ordered pan-genome data. It consists of multi-tiered, hierarchical display pages that extend from pan-chromosomes to both core and variable regions to single genes. Regions and genes are functionally annotated to allow for rapid searching and visual identification of regions of interest with the option that user-supplied genomic phylogenies and metadata can be incorporated. PanACEA's memory and time requirements are within the capacities of standard laptops. The capability of PanACEA as a research tool is demonstrated by highlighting a variable region important in differentiating strains of Enterobacter hormaechei.

CONCLUSIONS: PanACEA can rapidly translate the results of pan-chromosome programs into an intuitive and interactive visual representation. It will empower researchers to visually explore and identify regions of the pan-chromosome that are most biologically interesting, and to obtain publication quality images of these regions.

RevDate: 2018-07-08

Matey-Hernandez ML, Danish Pan Genome Consortium, Brunak S, et al (2018)

Benchmarking the HLA typing performance of Polysolver and Optitype in 50 Danish parental trios.

BMC bioinformatics, 19(1):239 pii:10.1186/s12859-018-2239-6.

BACKGROUND: The adaptive immune response intrinsically depends on hypervariable human leukocyte antigen (HLA) genes. Concomitantly, correct HLA phenotyping is crucial for successful donor-patient matching in organ transplantation. The cost and technical limitations of current laboratory techniques, together with advances in next-generation sequencing (NGS) methodologies, have increased the need for precise computational typing methods.

RESULTS: We tested two widespread HLA typing methods using high quality full genome sequencing data from 150 individuals in 50 family trios from the Genome Denmark project. First, we computed descendant accuracies assessing the agreement in the inheritance of alleles from parents to offspring. Second, we compared the locus-specific homozygosity rates as well as the allele frequencies; and we compared those to the observed values in related populations. We provide guidelines for testing the accuracy of HLA typing methods by comparing family information, which is independent of the availability of curated alleles.

CONCLUSIONS: Although current computational methods for HLA typing generally provide satisfactory results, our benchmark, using data with ultra-high sequencing depth, demonstrates the incompleteness of current reference databases and highlights the importance of providing genomic databases addressing current sequencing standards, a problem yet to be resolved before benefiting fully from personalised medicine approaches for which HLA phenotyping is essential.
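
The "descendant accuracy" idea in the entry above, checking whether a child's typed alleles can be explained by inheritance from the parents, can be illustrated with a simple Mendelian consistency check. This is only a sketch under stated assumptions and not the benchmark's actual metric: the function names, the input layout (one pair of allele names per person at a single locus), and the example allele calls are all illustrative.

```python
def mendelian_consistent(child, mother, father):
    """True if the child's two alleles can be split so that one comes
    from the mother and the other from the father (single locus)."""
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)


def descendant_accuracy(trios):
    """Fraction of trios whose child genotype is Mendelian-consistent.

    trios is a list of (child, mother, father) tuples, each member being
    a pair of allele names as reported by a typing tool.
    """
    if not trios:
        raise ValueError("need at least one trio")
    consistent = sum(mendelian_consistent(c, m, f) for c, m, f in trios)
    return consistent / len(trios)


# Illustrative calls at one locus; the allele names are examples only.
trios = [
    (("A*01:01", "A*02:01"), ("A*01:01", "A*03:01"), ("A*02:01", "A*24:02")),
    (("A*01:01", "A*01:01"), ("A*01:01", "A*03:01"), ("A*02:01", "A*24:02")),
]
print(descendant_accuracy(trios))  # prints: 0.5
```

The second trio fails because the father carries no copy of the allele the child is called homozygous for, which is the kind of disagreement that points to a typing error in at least one family member.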

RevDate: 2018-06-25

Cislak A, Grabowski S, J Holub (2018)

SOPanG: online text searching over a pan-genome.

Bioinformatics (Oxford, England) pii:5043008 [Epub ahead of print].

Motivation: The many thousands of high-quality genomes available nowadays imply a shift from single genome to pan-genomic analyses. A basic algorithmic building brick for such a scenario is online search over a collection of similar texts, a problem with surprisingly few solutions presented so far.

Results: We present SOPanG, a simple tool for exact pattern matching over an elastic-degenerate string, a recently proposed simplified model for the pan-genome. Thanks to bit-parallelism, it achieves pattern matching speeds above 400MB/s, more than an order of magnitude higher than that of other software.

Availability: SOPanG is available for free from:

Supplementary information: Supplementary data are available at Bioinformatics online.
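
To make the elastic-degenerate string model in the SOPanG entry above concrete, the sketch below performs exact pattern matching over a sequence of segments, where each segment is a list of alternative variants (possibly including the empty string). It is a naive set-based scan written for illustration; it is not SOPanG's bit-parallel algorithm, and the function name `ed_find` and the toy input are assumptions.

```python
def ed_find(segments, pattern):
    """Return indices of segments in which an occurrence of pattern ends.

    segments is a list of lists of strings: one variant is chosen per
    segment, and an occurrence may span several consecutive segments.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    active = set()  # prefix lengths matched up to the previous boundary
    hits = []
    for idx, variants in enumerate(segments):
        nxt = set()
        found = False
        for v in variants:
            # Extend partial matches carried over from earlier segments.
            for j in active:
                rest = m - j
                if len(v) >= rest:
                    if v[:rest] == pattern[j:]:
                        found = True
                elif pattern[j:j + len(v)] == v:
                    nxt.add(j + len(v))  # an empty variant keeps j as-is
            # Start fresh matches at every offset inside this variant.
            for s in range(len(v)):
                tail = v[s:]
                if len(tail) >= m:
                    if tail[:m] == pattern:
                        found = True
                elif pattern.startswith(tail):
                    nxt.add(len(tail))
        if found:
            hits.append(idx)
        active = nxt
    return hits


# Toy elastic-degenerate string: AC, then one of G / T / empty, then TA.
toy = [["AC"], ["G", "T", ""], ["TA"]]
print(ed_find(toy, "CGT"))  # prints: [2]
print(ed_find(toy, "CT"))   # prints: [1, 2]
```

The set of active prefix lengths plays the role that a bit vector plays in a bit-parallel implementation; replacing it with machine-word operations is where the speed of such tools would come from.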

RevDate: 2018-06-27

Zhang X, Liu Z, Wei G, et al (2018)

In Silico Genome-Wide Analysis Reveals the Potential Links Between Core Genome of Acidithiobacillus thiooxidans and Its Autotrophic Lifestyle.

Frontiers in microbiology, 9:1255.

The term "pan-genome" was introduced in 2005 to describe the entire gene repertoire of any given species. The core genome consists of genes shared by all bacterial strains studied and is considered to encode essential functions associated with the species' basic biology and phenotypes, yet its relationship to the bacterial lifestyle of the species remains elusive. We performed a pan-genome analysis of the sulfur-oxidizing acidophile Acidithiobacillus thiooxidans as a case study to highlight the species' core genome and its relevance to the autotrophic lifestyle of the bacterial species. Mathematical modeling based on the genomes of A. thiooxidans, including a novel strain ZBY isolated from a Zambian copper mine plus eight other recognized strains, was used to extrapolate the expansion of its pan-genome, suggesting that the A. thiooxidans pan-genome is closed. Further investigation revealed a common set of genes, many of which were assigned to metabolic profiles, notably with respect to energy metabolism, amino acid metabolism, and carbohydrate metabolism. The predicted metabolic profiles of A. thiooxidans were characterized by the fixation of inorganic carbon, assimilation of nitrogen compounds, and aerobic oxidation of various sulfur species. Notably, several hydrogenase (H2ase)-like genes dispersed in the core genome might represent novel classes due to potential functional disparities, despite being closely related homologous genes that code for H2ase. Overall, the findings shed light on the distinguishing features of A. thiooxidans genomes on a global scale, and extend the understanding of its conserved core genome pertaining to autotrophic lifestyle.

RevDate: 2018-07-06

Tschitschko B, Erdmann S, DeMaere MZ, et al (2018)

Genomic variation and biogeography of Antarctic haloarchaea.

Microbiome, 6(1):113 pii:10.1186/s40168-018-0495-3.

BACKGROUND: The genomes of halophilic archaea (haloarchaea) often comprise multiple replicons. Genomic variation in haloarchaea has been linked to viral infection pressure and, in the case of Antarctic communities, can be caused by intergenera gene exchange. To expand understanding of genome variation and biogeography of Antarctic haloarchaea, here we assessed genomic variation between two strains of Halorubrum lacusprofundi that were isolated from Antarctic hypersaline lakes from different regions (Vestfold Hills and Rauer Islands). To assess variation in haloarchaeal populations, including the presence of genomic islands, metagenomes from six hypersaline Antarctic lakes were characterised.

RESULTS: The sequence of the largest replicon of each Hrr. lacusprofundi strain (primary replicon) was highly conserved, while each of the strains' two smaller replicons (secondary replicons) were highly variable. Intergenera gene exchange was identified, including the sharing of a type I-B CRISPR system. Evaluation of infectivity of an Antarctic halovirus provided experimental evidence for the differential susceptibility of the strains, bolstering inferences that strain variation is important for modulating interactions with viruses. A relationship was found between genomic structuring and the location of variation within replicons and genomic islands, demonstrating that the way in which haloarchaea accommodate genomic variability relates to replicon structuring. Metagenome read and contig mapping and clustering and scaling analyses demonstrated biogeographical patterning of variation consistent with environment and distance effects. The metagenome data also demonstrated that specific haloarchaeal species dominated the hypersaline systems indicating they are endemic to Antarctica.

CONCLUSION: The study describes how genomic variation manifests in Antarctic-lake haloarchaeal communities and provides the basis for future assessments of Antarctic regional and global biogeography of haloarchaea.

RevDate: 2018-06-21

Yu J, Zhao J, Song Y, et al (2018)

Comparative Genomics of the Herbivore Gut Symbiont Lactobacillus reuteri Reveals Genetic Diversity and Lifestyle Adaptation.

Frontiers in microbiology, 9:1151.

Lactobacillus reuteri is a catalase-negative, Gram-positive, non-motile, obligately heterofermentative bacterial species that has been used as a model to describe the ecology and evolution of vertebrate gut symbionts. However, the genetic features and evolutionary strategies of L. reuteri from the gastrointestinal tract of herbivores remain unknown. Therefore, 16 L. reuteri strains isolated from goat, sheep, cow, and horse in Inner Mongolia, China were sequenced in this study. A comparative genomic approach was used to assess genetic diversity and gain insight into the distinguishing features related to the different hosts based on 21 published genomic sequences. Genome size, G + C content, and average nucleotide identity values of the L. reuteri strains from different hosts indicated that the strains have broad genetic diversity. The pan-genome of 37 L. reuteri strains contained 8,680 gene families, and the core genome contained 726 gene families. A total of 92,270 nucleotide mutation sites were discovered among 37 L. reuteri strains, and all core genes displayed a Ka/Ks ratio much lower than 1, suggesting strong purifying selective pressure (negative selection). A highly robust maximum likelihood tree based on the core genes showed that the herbivore isolates were divided into three clades; clades A and B contained most of the herbivore isolates and were more closely related to human isolates and vastly distinct from clade C. Some functional genes may be attributable to the host specificity of the herbivore, omnivore, and sourdough groups. Moreover, the numbers of genes encoding cell surface proteins and active carbohydrate enzymes were host-specific. This study provides new insight into the adaptation of L. reuteri to the intestinal habitat of herbivores, suggesting that the genomic diversity of L. reuteri from different ecological origins is closely associated with their living environment.

RevDate: 2018-06-19

Sibbesen JA, Maretty L, Danish Pan-Genome Consortium, et al (2018)

Accurate genotyping across variant classes and lengths using variant graphs.

Nature genetics pii:10.1038/s41588-018-0145-5 [Epub ahead of print].

Genotype estimates from short-read sequencing data are typically based on the alignment of reads to a linear reference, but reads originating from more complex variants (for example, structural variants) often align poorly, resulting in biased genotype estimates. This bias can be mitigated by first collecting a set of candidate variants across discovery methods, individuals and databases, and then realigning the reads to the variants and reference simultaneously. However, this realignment problem has proved computationally difficult. Here, we present a new method (BayesTyper) that uses exact alignment of read k-mers to a graph representation of the reference and variants to efficiently perform unbiased, probabilistic genotyping across the variation spectrum. We demonstrate that BayesTyper generally provides superior variant sensitivity and genotyping accuracy relative to existing methods when used to integrate variants across discovery approaches and individuals. Finally, we demonstrate that including a 'variation-prior' database containing already known variants significantly improves sensitivity.

RevDate: 2018-06-19

Kawasaki M, Delamare-Deboutteville J, Bowater RO, et al (2018)

Microevolution of aquatic Streptococcus agalactiae ST-261 from Australia indicates dissemination via imported tilapia and ongoing adaptation to marine hosts or environment.

Applied and environmental microbiology pii:AEM.00859-18 [Epub ahead of print].

Streptococcus agalactiae (GBS) causes disease in a wide range of animals. The serotype Ib lineage is highly adapted to aquatic hosts, exhibiting substantial genome reduction compared with terrestrial conspecifics. Here we sequence genomes from 40 GBS isolates including 25 from wild fish and captive stingrays in Australia, six local veterinary or human clinical isolates, and nine isolates from farmed tilapia in Honduras and compare with 42 genomes from public databases. Phylogenetic analysis based on non-recombinant core genome SNPs indicated that aquatic serotype Ib isolates from Queensland were distantly related to local veterinary and human clinical isolates. In contrast, Australian aquatic isolates are most closely related to a tilapia isolate from Israel, differing by only 63 core-genome SNPs. A consensus minimum spanning tree based on core genome SNPs indicates dissemination of ST-261 from an ancestral tilapia strain, which is congruent with several introductions of tilapia into Australia from Israel during the 1970s and 1980s. Pan-genome analysis identified 1,440 genes as core with the majority being dispensable or strain-specific with non-protein-coding intergenic regions (IGRs) divided amongst core and strain-specific genes. Aquatic serotype Ib strains have lost many virulence factors during adaptation, but six adhesins were well conserved across the aquatic isolates and might be critical for virulence in fish and targets for vaccine development. The close relationship amongst recent ST-261 isolates from Ghana, USA and China with the Israeli tilapia isolate from 1988 implicates the global trade in tilapia seed for aquaculture in the widespread dissemination of serotype Ib fish-adapted GBS.

IMPORTANCE: Streptococcus agalactiae (GBS) is a significant pathogen of humans and animals. Some lineages have become adapted to particular hosts and serotype Ib is highly specialized to fish. Here we show that this lineage is likely to have been distributed widely by the global trade in tilapia for aquaculture, with probable introduction into Australia in the 1970s and subsequent dissemination in wild fish populations. We report variability in the polysaccharide capsule amongst this lineage, but identify a cohort of common surface proteins that may be a focus of future vaccine development to reduce the biosecurity risk in international fish trade.

RevDate: 2018-07-11

Rodriguez-R LM, Gunturu S, Harvey WT, et al (2018)

The Microbial Genomes Atlas (MiGA) webserver: taxonomic and gene diversity analysis of Archaea and Bacteria at the whole genome level.

Nucleic acids research, 46(W1):W282-W288.

The small subunit ribosomal RNA gene (16S rRNA) has been successfully used to catalogue and study the diversity of prokaryotic species and communities but it offers limited resolution at the species and finer levels, and cannot represent the whole-genome diversity and fluidity. To overcome these limitations, we introduced the Microbial Genomes Atlas (MiGA), a webserver that allows the classification of an unknown query genomic sequence, complete or partial, against all taxonomically classified taxa with available genome sequences, as well as comparisons to other related genomes including uncultivated ones, based on the genome-aggregate Average Nucleotide and Amino Acid Identity (ANI/AAI) concepts. MiGA integrates best practices in sequence quality trimming and assembly and allows input to be raw reads or assemblies from isolate genomes, single-cell sequences, and metagenome-assembled genomes (MAGs). Further, MiGA can take as input hundreds of closely related genomes of the same or closely related species (a so-called 'Clade Project') to assess their gene content diversity and evolutionary relationships, and calculate important clade properties such as the pangenome and core gene sets. Therefore, MiGA is expected to facilitate a range of genome-based taxonomic and diversity studies, and quality assessment across environmental and clinical settings. MiGA is available at

RevDate: 2018-06-22

Mahfouz N, Caucci S, Achatz E, et al (2018)

High genomic diversity of multi-drug resistant wastewater Escherichia coli.

Scientific reports, 8(1):8928 pii:10.1038/s41598-018-27292-6.

Wastewater treatment plants play an important role in the emergence of antibiotic resistance. They provide a hot spot for exchange of resistance within and between species. Here, we analyse and quantify the genomic diversity of the indicator Escherichia coli in a German wastewater treatment plant and we relate it to isolates' antibiotic resistance. Our results show a surprisingly large pan-genome, which mirrors how rich an environment a treatment plant is. We link the genomic analysis to a phenotypic resistance screen and pinpoint genomic hot spots, which correlate with a resistance phenotype. Besides well-known resistance genes, this forward genomics approach identifies many novel genes that correlate with resistance, some of which are completely unknown. A surprising overall finding of our analyses is that we do not see any difference in resistance and pan-genome size between isolates taken from the inflow of the treatment plant and from the outflow. This means that while treatment plants reduce the amount of bacteria released into the environment, they do not reduce the potential for antibiotic resistance of these bacteria.

RevDate: 2018-06-15

Legendre M, Fabre E, Poirot O, et al (2018)

Diversity and evolution of the emerging Pandoraviridae family.

Nature communications, 9(1):2285 pii:10.1038/s41467-018-04698-4.

With DNA genomes reaching 2.5 Mb packed in particles of bacterium-like shape and dimension, the first two Acanthamoeba-infecting pandoraviruses remained up to now the most complex viruses since their discovery in 2013. Our isolation of three new strains from distant locations and environments is now used to perform the first comparative genomics analysis of the emerging worldwide-distributed Pandoraviridae family. Thorough annotation of the genomes combining transcriptomic, proteomic, and bioinformatic analyses reveals many non-coding transcripts and significantly reduces the former set of predicted protein-coding genes. Here we show that the pandoraviruses exhibit an open pan-genome, the enormous size of which is not adequately explained by gene duplications or horizontal transfers. As most of the strain-specific genes have no extant homolog and exhibit statistical features comparable to intergenic regions, we suggest that de novo gene creation could contribute to the evolution of the giant pandoravirus genomes.

RevDate: 2018-06-27

Fang X, Monk JM, Mih N, et al (2018)

Escherichia coli B2 strains prevalent in inflammatory bowel disease patients have distinct metabolic capabilities that enable colonization of intestinal mucosa.

BMC systems biology, 12(1):66 pii:10.1186/s12918-018-0587-5.

BACKGROUND: Escherichia coli is considered a leading bacterial trigger of inflammatory bowel disease (IBD). E. coli isolates from IBD patients primarily belong to phylogroup B2. Previous studies have focused on broad comparative genomic analysis of E. coli B2 isolates, and identified virulence factors that allow B2 strains to reside within human intestinal mucosa. Metabolic capabilities of E. coli strains have been shown to be related to their colonization site, but remain unexplored in IBD-associated strains.

RESULTS: In this study, we utilized pan-genome analysis and genome-scale models (GEMs) of metabolism to study metabolic capabilities of IBD-associated E. coli B2 strains. The study yielded three results: i) Pan-genome analysis of 110 E. coli strains (including 53 isolates from IBD studies) revealed discriminating metabolic genes between B2 strains and other strains; ii) Both comparative genomic analysis and GEMs suggested that B2 strains have an advantage in degrading and utilizing sugars derived from mucus glycan, and iii) GEMs revealed distinct metabolic features in B2 strains that potentially allow them to utilize energy more efficiently. For example, B2 strains lack the enzymes to degrade amadori products, but instead rely on neighboring bacteria to convert these substrates into a more readily usable and potentially less sought after product.

CONCLUSIONS: Taken together, these results suggest that the metabolic capabilities of B2 strains vary significantly from those of other strains, enabling B2 strains to colonize intestinal mucosa. The results from this study motivate a broad experimental assessment of the nutritional effects on E. coli B2 pathophysiology in IBD patients.

RevDate: 2018-06-08

de Moraes MH, Soto EB, Salas González I, et al (2018)

Genome-Wide Comparative Functional Analyses Reveal Adaptations of Salmonella sv. Newport to a Plant Colonization Lifestyle.

Frontiers in microbiology, 9:877.

Outbreaks of salmonellosis linked to the consumption of vegetables have been disproportionately associated with strains of serovar Newport. We tested the hypothesis that strains of sv. Newport have evolved unique adaptations to persistence in plants that are not shared by strains of other Salmonella serovars. We used a genome-wide mutant screen to compare growth in tomato fruit of a sv. Newport strain from an outbreak traced to tomatoes, and a sv. Typhimurium strain from animals. Most genes in the sv. Newport strain that were selected during persistence in tomatoes were shared with, and similarly selected in, the sv. Typhimurium strain. Many of their functions are linked to central metabolism, including amino acid biosynthetic pathways, iron acquisition, and maintenance of cell structure. One exception was a greater need for the core genes involved in purine metabolism in sv. Typhimurium than in sv. Newport. We discovered a gene, papA, that was unique to sv. Newport and contributed to the strain's fitness in tomatoes. The papA gene was present in about 25% of sv. Newport Group III genomes and generally absent from other Salmonella genomes. Homologs of papA were detected in the genomes of Pantoea, Dickeya, and Pectobacterium, members of the Enterobacteriaceae family that can colonize both plants and animals.

RevDate: 2018-06-08

Adamek M, Alanjary M, Sales-Ortells H, et al (2018)

Comparative genomics reveals phylogenetic distribution patterns of secondary metabolites in Amycolatopsis species.

BMC genomics, 19(1):426 pii:10.1186/s12864-018-4809-4.

BACKGROUND: Genome mining tools have enabled us to predict biosynthetic gene clusters that might encode compounds with valuable functions for industrial and medical applications. With the continuously increasing number of genomes sequenced, we are confronted with an overwhelming number of predicted clusters. In order to guide the effective prioritization of biosynthetic gene clusters towards finding the most promising compounds, knowledge about diversity, phylogenetic relationships and distribution patterns of biosynthetic gene clusters is necessary.

RESULTS: Here, we provide a comprehensive analysis of the model actinobacterial genus Amycolatopsis and its potential for the production of secondary metabolites. A phylogenetic characterization, together with a pan-genome analysis showed that within this highly diverse genus, four major lineages could be distinguished which differed in their potential to produce secondary metabolites. Furthermore, we were able to distinguish gene cluster families whose distribution correlated with phylogeny, indicating that vertical gene transfer plays a major role in the evolution of secondary metabolite gene clusters. Still, the vast majority of the diverse biosynthetic gene clusters were derived from clusters unique to the genus, and also unique in comparison to a database of known compounds. Our study on the locations of biosynthetic gene clusters in the genomes of Amycolatopsis' strains showed that clusters acquired by horizontal gene transfer tend to be incorporated into non-conserved regions of the genome thereby allowing us to distinguish core and hypervariable regions in Amycolatopsis genomes.

CONCLUSIONS: Using a comparative genomics approach, it was possible to determine the potential of the genus Amycolatopsis to produce a huge diversity of secondary metabolites. Furthermore, the analysis demonstrates that horizontal and vertical gene transfer play an important role in the acquisition and maintenance of valuable secondary metabolites. Our results cast light on the interconnections between secondary metabolite gene clusters and provide a way to prioritize biosynthetic pathways in the search and discovery of novel compounds.

RevDate: 2018-06-02

Zhao Q, Feng Q, Lu H, et al (2018)

Publisher Correction: Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice.

When published, this article did not initially appear open access. This error has been corrected, and the open access status of the paper is noted in all versions of the paper.

RevDate: 2018-07-17

Angermeyer A, Das MM, Singh DV, et al (2018)

Analysis of 19 Highly Conserved Vibrio cholerae Bacteriophages Isolated from Environmental and Patient Sources Over a Twelve-Year Period.

Viruses, 10(6): pii:v10060299.

The Vibrio cholerae biotype "El Tor" is responsible for all of the current epidemic and endemic cholera outbreaks worldwide. These outbreaks are clonal, and it is hypothesized that they originate from the coastal areas near the Bay of Bengal, where the lytic bacteriophage ICP1 (International Centre for Diarrhoeal Disease Research, Bangladesh cholera phage 1) specifically preys upon these pathogenic outbreak strains. ICP1 has also been the dominant bacteriophage found in cholera patient stools since 2001. However, little is known about the genomic differences between the ICP1 strains that have been collected over time. Here, we elucidate the pan-genome and the phylogeny of the ICP1 strains by aligning, annotating, and analyzing the genomes of 19 distinct isolates that were collected between 2001 and 2012. Our results reveal that the ICP1 isolates are highly conserved and possess a large core-genome as well as a smaller, somewhat flexible accessory-genome. Despite its overall conservation, ICP1 strains have managed to acquire a number of unknown genes, as well as a CRISPR-Cas system which is known to be critical for its ongoing struggle for co-evolutionary dominance over its host. This study describes a foundation on which to construct future molecular and bioinformatic studies of these V. cholerae-associated bacteriophages.

RevDate: 2018-06-07

Bulagonda EP, Manivannan B, Mahalingam N, et al (2018)

Comparative genomic analysis of a naturally competent Elizabethkingia anophelis isolated from an eye infection.

Scientific reports, 8(1):8447 pii:10.1038/s41598-018-26874-8.

Elizabethkingia anophelis has now emerged as an opportunistic human pathogen. However, its mechanisms of transmission remain unexplained. Comparative genomic (CG) analysis of an E. anophelis endophthalmitis strain, surprisingly isolated from a patient with an eye infection, against twenty-five other E. anophelis genomes revealed its potential to participate in horizontal gene transfer. CG analysis revealed that the study isolate has an open pan-genome and has undergone extensive gene rearrangements. We demonstrate that the strain is naturally competent, a trait not hitherto reported in any member of Elizabethkingia. The presence of competence-related genes, mobile genetic elements, Type IV and VI secretion systems and a unique virulence factor, arylsulfatase, suggests a different lineage of the strain. Deciphering the genome of E. anophelis, which carries a reservoir of antibiotic resistance genes and virulence factors associated with diverse human infections, may open up avenues to deal with the myriad of its human infections and devise strategies to combat the pathogen.

RevDate: 2018-06-03

Oyedara OO, Segura-Cabrera A, Guo X, et al (2018)

Whole-Genome Sequencing and Comparative Genome Analysis Provided Insight into the Predatory Features and Genetic Diversity of Two Bdellovibrio Species Isolated from Soil.

International journal of genomics, 2018:9402073.

Bdellovibrio spp. are predatory bacteria with great potential as antimicrobial agents. Studies have shown that members of the genus Bdellovibrio exhibit peculiar characteristics that influence their ecological adaptations. In this study, whole genomes of two different Bdellovibrio spp. designated SKB1291214 and SSB218315 isolated from soil were sequenced. The core genes shared by all the Bdellovibrio spp. considered for the pangenome analysis, including the epibiotic B. exovorus, numbered 795. The number of unique genes identified in Bdellovibrio spp. SKB1291214, SSB218315, W, and B. exovorus JJS was 1343, 113, 857, and 1572, respectively. These unique genes encode hydrolytic, chemotaxis, and transporter proteins which might be useful for predation in the Bdellovibrio strains. Furthermore, the two Bdellovibrio strains exhibited differences based on the % GC content, amino acid identity, and 16S rRNA gene sequence. The 16S rRNA gene sequence of Bdellovibrio sp. SKB1291214 shared 99% identity with that of an uncultured Bdellovibrio sp. clone 12L 106 (a pairwise distance of 0.008) and 95-97% identity (a pairwise distance of 0.043) with that of other culturable terrestrial Bdellovibrio spp., including strain SSB218315. In Bdellovibrio sp. SKB1291214, a 174 bp sequence was inserted at the host interaction (hit) locus region usually attributed to prey attachment, invasion, and development of host-independent Bdellovibrio phenotypes. Also, a gene equivalent to Bd0108 in B. bacteriovorus HD100 was not conserved in Bdellovibrio sp. SKB1291214. The results of this study provided information on the genetic characteristics and diversity of the genus Bdellovibrio that can contribute to their successful applications as a biocontrol agent.
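
The core and unique gene counts reported above come from comparing gene sets across strains. As a rough illustration only, assuming each strain has already been reduced to a set of orthologous-group identifiers (the strain names and identifiers below are placeholders, not data from the study), the two quantities can be computed with plain set operations:

```python
def core_and_unique(strains):
    """Split a {strain: set_of_gene_ids} mapping into core and strain-unique genes.

    Core genes occur in every strain; unique genes occur in exactly one strain.
    """
    core = set.intersection(*(set(g) for g in strains.values()))
    unique = {}
    for name, genes in strains.items():
        # Union of all genes found in the other strains.
        others = set().union(*(g for n, g in strains.items() if n != name))
        unique[name] = set(genes) - others
    return core, unique


# Toy input with placeholder identifiers.
toy = {
    "s1": {"g1", "g2", "g3"},
    "s2": {"g1", "g2", "g4"},
    "s3": {"g1", "g5"},
}
core, unique = core_and_unique(toy)
print(sorted(core))                               # ['g1']
print({k: sorted(v) for k, v in unique.items()})  # {'s1': ['g3'], 's2': ['g4'], 's3': ['g5']}
```

Genes shared by some but not all strains (here `g2`) fall into neither category; they make up the remainder of the accessory genome.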

RevDate: 2018-05-28

Gias E, Brosnahan CL, Orr D, et al (2018)

In vivo growth and genomic characterization of rickettsia-like organisms isolated from farmed Chinook salmon (Oncorhynchus tshawytscha) in New Zealand.

Journal of fish diseases [Epub ahead of print].

A rickettsia-like organism, designated NZ-RLO2, was isolated from Chinook salmon (Oncorhynchus tshawytscha) farmed in the South Island, New Zealand. In vivo growth showed NZ-RLO2 was able to grow in CHSE-214, EPC, BHK-21, C6/36 and Sf21 cell lines, while Piscirickettsia salmonis LF-89T grew in all but BHK-21 and Sf21. NZ-RLO2 grew optimally in EPC at 15°C, and in CHSE-214 and EPC at 18°C. The growth of LF-89T was optimal at 15°C, 18°C and 22°C in CHSE-214, but appeared less efficient in EPC cells at all temperatures. Pan-genome comparison of predicted proteomes shows that available Chilean strains of P. salmonis grouped into two clusters (p-value = 94%). NZ-RLO2 was genetically different from previously described NZ-RLO1, and both strains grouped separately from the Chilean strains in one of the two clusters (p-value = 88%), but were closely related to each other. TaqMan and Sybr Green real-time PCR targeting RNA polymerase (rpoB) and DNA primase (dnaG), respectively, were developed to detect NZ-RLO2. This study indicates that the New Zealand strains showed a closer genetic relationship to one of the Chilean P. salmonis clusters; however, more Piscirickettsia genomes from wider geographical regions and diverse hosts are needed to better understand the classification within this genus.

RevDate: 2018-06-22
CmpDate: 2018-06-22

Hurtado R, Carhuaricra D, Soares S, et al (2018)

Pan-genomic approach shows insight of genetic divergence and pathogenic-adaptation of Pasteurella multocida.

Gene, 670:193-206.

Pasteurella multocida is a gram-negative, non-motile bacterial pathogen, which is associated with chronic and acute infections such as snuffles, pneumonia, atrophic rhinitis, fowl cholera and hemorrhagic septicemia. These diseases affect a wide range of domestic animals, leading to significant morbidity and mortality and causing significant economic losses worldwide. Given the interest in deciphering the genetic diversity and adaptive processes among P. multocida strains, this work aimed to perform a pan-genome analysis to find evidence of horizontal gene transfer and positive selection among 23 P. multocida strains isolated from distinct diseases and hosts. The results revealed an open pan-genome containing 3585 genes and an accessory genome presenting 1200 genes. The phylogenomic analysis based on the presence/absence of genes and islands exhibits high levels of plasticity, which reflects a high intraspecific diversity and a possible adaptive mechanism responsible for the specific disease manifestation between the established groups (pneumonia, fowl cholera, hemorrhagic septicemia and snuffles). Additionally, we identified differences in accessory genes among groups, which are involved in sugar metabolism and transport systems, virulence-related genes and a high concentration of hypothetical proteins. However, there was no specific indispensable functional mechanism to decisively correlate the presence of genes and their adaptation to a specific host/disease. Also, positive selection was found only for two genes from sub-group hemorrhagic septicemia, serotype B. This comprehensive comparative genome analysis will provide new insights into the horizontal gene transfers that play an essential role in the diversification of P. multocida and its adaptation to specific diseases.

RevDate: 2018-05-25

Abreu VAC, Popin RV, Alvarenga DO, et al (2018)

Corrigendum: Genomic and Genotypic Characterization of Cylindrospermopsis raciborskii: Toward an Intraspecific Phylogenetic Evaluation by Comparative Genomics.

Frontiers in microbiology, 9:979.

[This corrects the article on p. 306 in vol. 9, PMID: 29535689.].

RevDate: 2018-07-05
CmpDate: 2018-07-05

Jiao J, Ni M, Zhang B, et al (2018)

Coordinated regulation of core and accessory genes in the multipartite genome of Sinorhizobium fredii.

PLoS genetics, 14(5):e1007428 pii:PGENETICS-D-18-00237.

Prokaryotes benefit from having accessory genes, but it is unclear how accessory genes can be linked with the core regulatory network when developing adaptations to new niches. Here we determined hierarchical core/accessory subsets in the multipartite pangenome (composed of genes from the chromosome, chromid and plasmids) of the soybean microsymbiont Sinorhizobium fredii by comparing twelve Sinorhizobium genomes. Transcriptomes of two S. fredii strains at mid-log and stationary growth phases and in symbiotic conditions were obtained. The average level of gene expression, variation of expression between different conditions, and gene connectivity within the co-expression network were positively correlated with the gene conservation level from strain-specific accessory genes to genus core. Condition-dependent transcriptomes exhibited adaptive transcriptional changes in pangenome subsets shared by the two strains, while strain-dependent transcriptomes were enriched with accessory genes on the chromid. Proportionally more chromid genes than plasmid genes were co-expressed with chromosomal genes, while plasmid genes had a higher within-replicon connectivity in expression than chromid ones. However, key nitrogen fixation genes on the symbiosis plasmid were characterized by high connectivity in both within- and between-replicon analyses. Among those genes with host-specific upregulation patterns, chromosomal znu and mdt operons, encoding a conserved high-affinity zinc transporter and an accessory multi-drug efflux system, respectively, were experimentally demonstrated to be involved in host-specific symbiotic adaptation. These findings highlight the importance of integrative regulation of hierarchical core/accessory components in the multipartite genome of bacteria during niche adaptation and in shaping the prokaryotic pangenome in the long run.

RevDate: 2018-06-27

Satti M, Tanizawa Y, Endo A, et al (2018)

Comparative analysis of probiotic bacteria based on a new definition of core genome.

Journal of bioinformatics and computational biology, 16(3):1840012.

The commensal genus Bifidobacterium has probiotic properties. We prepared a public library of the gene functions of the genus Bifidobacterium for its online annotation. Orthologous gene cluster analysis showed that the pan genomes of Bifidobacterium and Lactobacillus exhibit striking similarities when mapped to the Clusters of Orthologous Group (COG) database of proteins. When the core genes in each genus were selected based on our statistical definition of "core genome", core genes were present in at least 92% of 52 Bifidobacterium and in 97% of 178 Lactobacillus genomes. Functional comparison of the core genes of the two genera revealed a significant difference in the categories "amino acid transport and metabolism" representing their difference in niche specificity. Over-represented Bifidobacterium protein families were primarily involved in host interactions, the complex compound metabolism, and in stress responses. These findings coincide with the published information and validate our bias-resilient definition of the core genome.
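
The abstract above reports core genes as those present in at least a given fraction of genomes (92% and 97% in the two genera). The sketch below shows only that final thresholding step. The authors' statistical procedure for choosing the threshold is not described here, so `min_fraction` is simply a caller-supplied parameter, and the function name and toy data are hypothetical.

```python
from collections import Counter


def threshold_core(genomes, min_fraction):
    """Genes present in at least `min_fraction` of the given genomes.

    genomes: iterable of gene-id collections, one per genome.
    min_fraction: value in (0, 1]; 1.0 gives the strict core genome.
    """
    genomes = [set(g) for g in genomes]
    counts = Counter(gene for genes in genomes for gene in genes)
    needed = min_fraction * len(genomes)
    return {gene for gene, n in counts.items() if n >= needed}


# Toy data: "a" is in 4/4 genomes, "b" in 3/4, "c" in 2/4, "d" in 1/4.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "d"}]
print(sorted(threshold_core(toy, 0.75)))  # ['a', 'b']
print(sorted(threshold_core(toy, 1.0)))   # ['a']
```

Relaxing the threshold below 100% makes the core set less sensitive to a few incomplete or poorly annotated genomes, which is one common motivation for such definitions.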

RevDate: 2018-05-25

Lacey JA, Allnutt TR, Vezina B, et al (2018)

Whole genome analysis reveals the diversity and evolutionary relationships between necrotic enteritis-causing strains of Clostridium perfringens.

BMC genomics, 19(1):379 pii:10.1186/s12864-018-4771-1.

BACKGROUND: Clostridium perfringens causes a range of diseases in animals and humans including necrotic enteritis in chickens and food poisoning and gas gangrene in humans. Necrotic enteritis is of concern in commercial chicken production due to the cost of the implementation of infection control measures and to productivity losses. This study has focused on the genomic analysis of a range of chicken-derived C. perfringens isolates, from around the world and from different years. The genomes were sequenced and compared with 20 genomes available from public databases, which were from a diverse collection of isolates from chickens, other animals, and humans. We used a distance based phylogeny that was constructed based on gene content rather than sequence identity. Similarity between strains was defined as the number of genes that they have in common divided by their total number of genes. In this type of phylogenetic analysis, evolutionary distance can be interpreted in terms of evolutionary events such as acquisition and loss of genes, whereas the underlying properties (the gene content) can be interpreted in terms of function. We also compared these methods to the sequence-based phylogeny of the core genome.

RESULTS: Distinct pathogenic clades of necrotic enteritis-causing C. perfringens were identified. They were characterised by variable regions encoded on the chromosome, with predicted roles in capsule production, adhesion, inhibition of related strains, phage integration, and metabolism. Some strains have almost identical genomes, even though they were isolated from different geographic regions at various times, while other highly distant genomes appear to result in similar outcomes with regard to virulence and pathogenesis.

CONCLUSIONS: The high level of diversity in chicken isolates suggests there is no reliable factor that defines a chicken strain of C. perfringens, however, disease-causing strains can be defined by the presence of netB-encoding plasmids. This study reveals that horizontal gene transfer appears to play a significant role in genetic variation of the C. perfringens chromosome as well as the plasmid content within strains.
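
The gene-content similarity defined in the BACKGROUND above (genes in common divided by total genes) can be sketched as follows. The wording is ambiguous about the denominator; this example assumes it means the union of the two strains' genes, which makes the measure a Jaccard index. That reading, the function names, and the toy identifiers are assumptions rather than the authors' stated implementation.

```python
def gene_content_similarity(genes_a, genes_b):
    """Shared genes divided by the union of both gene sets (Jaccard index)."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 1.0  # two empty gene sets are treated as identical
    return len(a & b) / len(union)


def gene_content_distance(genes_a, genes_b):
    """Distance suitable for a distance-based tree: 1 - similarity."""
    return 1.0 - gene_content_similarity(genes_a, genes_b)


# Toy strains with placeholder gene identifiers: 2 shared genes, 4 in total.
strain_x = {"g1", "g2", "g3"}
strain_y = {"g2", "g3", "g4"}
print(gene_content_similarity(strain_x, strain_y))  # 0.5
print(gene_content_distance(strain_x, strain_y))    # 0.5
```

A pairwise matrix of such distances could then be passed to any standard distance-based tree-building method.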

RevDate: 2018-05-22

Kulsum U, Kapil A, Singh H, et al (2018)

NGSPanPipe: A Pipeline for Pan-genome Identification in Microbial Strains from Experimental Reads.

Advances in experimental medicine and biology, 1052:39-49.

Recent advancements in sequencing technologies have decreased both time span and cost for sequencing the whole bacterial genome. High-throughput Next-Generation Sequencing (NGS) technology has led to the generation of enormous data concerning microbial populations publicly available across various repositories. As a consequence, it has become possible to study and compare the genomes of different bacterial strains within a species or genus in terms of evolution, ecology and diversity. Studying the pan-genome provides insights into deciphering microevolution, global composition and diversity in virulence and pathogenesis of a species. It can also assist in identifying drug targets and proposing vaccine candidates. The effective analysis of these large genome datasets necessitates the development of robust tools. Current methods to develop pan-genome do not support direct input of raw reads from the sequencer machine but require preprocessing of reads as an assembled protein/gene sequence file or the binary matrix of orthologous genes/proteins. We have designed an easy-to-use integrated pipeline, NGSPanPipe, which can directly identify the pan-genome from short reads. The output from the pipeline is compatible with other pan-genome analysis tools. We evaluated our pipeline with other methods for developing pan-genome, i.e. reference-based assembly and de novo assembly using simulated reads of Mycobacterium tuberculosis. The single script pipeline is applicable for all bacterial strains. It integrates multiple in-house Perl scripts and is freely accessible.

RevDate: 2018-05-25

Kim YB, Kim JY, Song HS, et al (2018)

Novel haloarchaeon Natrinema thermophila having the highest growth temperature among haloarchaea with a large genome size.

Scientific reports, 8(1):7777 pii:10.1038/s41598-018-25887-7.

Environmental temperature is one of the most important factors for the growth and survival of microorganisms. Here we describe a novel extremely halophilic archaeon (haloarchaea) designated as strain CBA1119T isolated from solar salt. Strain CBA1119T had the highest maximum and optimal growth temperatures (66 °C and 55 °C, respectively) and one of the largest genome sizes among haloarchaea (5.1 Mb). It also had the largest number of strain-specific pan-genome orthologous groups and unique pathways among members of the genus Natrinema in the class Halobacteria. A dendrogram based on the presence/absence of genes and a phylogenetic tree constructed based on OrthoANI values highlighted the particularities of strain CBA1119T as compared to other Natrinema species and other haloarchaea members. The large genome of strain CBA1119T may provide information on genes that confer tolerance to extreme environmental conditions, which may lead to the discovery of other thermophilic strains with potential applications in industrial biotechnology.

RevDate: 2018-05-18

Vinuesa P, Ochoa-Sánchez LE, B Contreras-Moreira (2018)

GET_PHYLOMARKERS, a Software Package to Select Optimal Orthologous Clusters for Phylogenomics and Inferring Pan-Genome Phylogenies, Used for a Critical Geno-Taxonomic Revision of the Genus Stenotrophomonas.

Frontiers in microbiology, 9:771.

The massive accumulation of genome-sequences in public databases promoted the proliferation of genome-level phylogenetic analyses in many areas of biological research. However, due to diverse evolutionary and genetic processes, many loci have undesirable properties for phylogenetic reconstruction. These, if undetected, can result in erroneous or biased estimates, particularly when estimating species trees from concatenated datasets. To deal with these problems, we developed GET_PHYLOMARKERS, a pipeline designed to identify high-quality markers to estimate robust genome phylogenies from the orthologous clusters, or the pan-genome matrix (PGM), computed by GET_HOMOLOGUES. In the first context, a set of sequential filters are applied to exclude recombinant alignments and those producing anomalous or poorly resolved trees. Multiple sequence alignments and maximum likelihood (ML) phylogenies are computed in parallel on multi-core computers. A ML species tree is estimated from the concatenated set of top-ranking alignments at the DNA or protein levels, using either FastTree or IQ-TREE (IQT). The latter is used by default due to its superior performance revealed in an extensive benchmark analysis. In addition, parsimony and ML phylogenies can be estimated from the PGM. We demonstrate the practical utility of the software by analyzing 170 Stenotrophomonas genome sequences available in RefSeq and 10 new complete genomes of Mexican environmental S. maltophilia complex (Smc) isolates reported herein. A combination of core-genome and PGM analyses was used to revise the molecular systematics of the genus. An unsupervised learning approach that uses a goodness of clustering statistic identified 20 groups within the Smc at a core-genome average nucleotide identity (cgANIb) of 95.9% that are perfectly consistent with strongly supported clades on the core- and pan-genome trees. In addition, we identified 16 misclassified RefSeq genome sequences, 14 of them labeled as S. maltophilia, demonstrating the broad utility of the software for phylogenomics and geno-taxonomic studies. The code, a detailed manual and tutorials are freely available for Linux/UNIX servers under the GNU GPLv3 license. A docker image bundling GET_PHYLOMARKERS with GET_HOMOLOGUES is also available and can be easily run on any platform.

RevDate: 2018-05-22

Valenzuela D, Norri T, Välimäki N, et al (2018)

Towards pan-genome read alignment to improve variation calling.

BMC genomics, 19(Suppl 2):87 pii:10.1186/s12864-018-4465-8.

BACKGROUND: A typical human genome differs from the reference genome at 4-5 million sites. This diversity is increasingly catalogued in repositories such as ExAC/gnomAD, consisting of >15,000 whole-genomes and >126,000 exome sequences from different individuals. Despite this enormous diversity, resequencing data workflows are still based on a single human reference genome. Identification and genotyping of genetic variants is typically carried out on short-read data aligned to a single reference, disregarding the underlying variation.

RESULTS: We propose a new unified framework for variant calling with short-read data utilizing a representation of human genetic variation - a pan-genomic reference. We provide a modular pipeline that can be seamlessly incorporated into existing sequencing data analysis workflows. Our tool is open source and available online.

CONCLUSIONS: Our experiments show that by replacing a standard human reference with a pan-genomic one we achieve an improvement in single-nucleotide variant calling accuracy and in short indel calling accuracy over the widely adopted Genome Analysis Toolkit (GATK) in difficult genomic regions.

RevDate: 2018-05-17

Howat AM, Vollmers J, Taubert M, et al (2018)

Comparative Genomics and Mutational Analysis Reveals a Novel XoxF-Utilizing Methylotroph in the Roseobacter Group Isolated From the Marine Environment.

Frontiers in microbiology, 9:766.

The Roseobacter group comprises a significant group of marine bacteria which are involved in global carbon and sulfur cycles. Some members are methylotrophs, using one-carbon compounds as a carbon and energy source. It has recently been shown that methylotrophs generally require a rare earth element when using the methanol dehydrogenase enzyme XoxF for growth on methanol. Addition of lanthanum to methanol enrichments of coastal seawater facilitated the isolation of a novel methylotroph in the Roseobacter group: Marinibacterium anthonyi strain La 6. Mutation of xoxF5 revealed the essential nature of this gene during growth on methanol and ethanol. Physiological characterization demonstrated the metabolic versatility of this strain. Genome sequencing revealed that strain La 6 has the largest genome of all Roseobacter group members sequenced to date, at 7.18 Mbp. Multilocus sequence analysis (MLSA) showed that whilst it displays the highest core gene sequence similarity with subgroup 1 of the Roseobacter group, it shares very little of its pangenome, suggesting unique genetic adaptations. This research revealed that the addition of lanthanides to isolation procedures was key to cultivating novel XoxF-utilizing methylotrophs from the marine environment, whilst genome sequencing and MLSA provided insights into their potential genetic adaptations and relationship to the wider community.

RevDate: 2018-05-15

Zolfo M, Asnicar F, Manghi P, et al (2018)

Profiling microbial strains in urban environments using metagenomic sequencing data.

Biology direct, 13(1):9 pii:10.1186/s13062-018-0211-z.

BACKGROUND: The microbial communities populating human and natural environments have been extensively characterized with shotgun metagenomics, which provides an in-depth representation of the microbial diversity within a sample. Microbes thriving in urban environments may be crucially important for human health, but have received less attention than those of other environments. Ongoing efforts started to target urban microbiomes at a large scale, but the most recent computational methods to profile these metagenomes have never been applied in this context. It is thus currently unclear whether such methods, that have proven successful at distinguishing even closely related strains in human microbiomes, are also effective in urban settings for tasks such as cultivation-free pathogen detection and microbial surveillance. Here, we aimed at a) testing the currently available metagenomic profiling tools on urban metagenomics; b) characterizing the organisms in urban environment at the resolution of single strain and c) discussing the biological insights that can be inferred from such methods.

RESULTS: We applied three complementary methods on the 1614 metagenomes of the CAMDA 2017 challenge. With MetaMLST we identified 121 known sequence-types from 15 species of clinical relevance. For instance, we identified several Acinetobacter strains that were close to the nosocomial opportunistic pathogen A. nosocomialis. With StrainPhlAn, a generalized version of the MetaMLST approach, we inferred the phylogenetic structure of Pseudomonas stutzeri strains and suggested that the strain-level heterogeneity in environmental samples is higher than in the human microbiome. Finally, we also probed the functional potential of the different strains with PanPhlAn. We further showed that SNV-based and pangenome-based profiling provide complementary information that can be combined to investigate the evolutionary trajectories of microbes and to identify specific genetic determinants of virulence and antibiotic resistances within closely related strains.

CONCLUSION: We show that strain-level methods developed primarily for the analysis of human microbiomes can be effective for city-associated microbiomes. In fact, (opportunistic) pathogens can be tracked and monitored across many hundreds of urban metagenomes. Although more effort is needed to profile strains of currently uncharacterized species, this work lays the basis for high-resolution analyses of microbiomes sampled in city and mass transportation environments.

REVIEWERS: This article was reviewed by Alexandra Bettina Graf, Daniel Huson and Trevor Cickovski.

RevDate: 2018-05-07

Kumar R, Acharya V, Singh D, et al (2018)

Strategies for high-altitude adaptation revealed from high-quality draft genome of non-violacein producing Janthinobacterium lividum ERGS5:01.

Standards in genomic sciences, 13:11 pii:313.

A light pink coloured bacterial strain, ERGS5:01, isolated from glacial stream water of Sikkim Himalaya, was affiliated to Janthinobacterium lividum based on 16S rRNA gene sequence identity and phylogenetic clustering. Whole genome sequencing was performed for the strain to confirm its taxonomy, as it lacked the typical violet pigmentation of the genus, and also to decipher its survival strategy in the high-elevation aquatic ecosystem. PacBio RSII sequencing generated a genome of 5,168,928 bp with 4575 protein-coding genes and 118 RNA genes. Whole genome-based multilocus sequence analysis clustering, an in silico DDH similarity value of 95.1% and an ANI value of 99.25% established the identity of the strain ERGS5:01 (MCC 2953) as a non-violacein producing J. lividum. Genome comparisons across the genus Janthinobacterium revealed an open pan-genome, with scope for the addition of new orthologous clusters to complete the genomic inventory. The genomic insight provided the genetic basis of tolerance to freezing and frequent freeze-thaw cycles, and of industrially important enzymes. Extended insight into the genome provided clues to crucial genes associated with adaptation to the harsh aquatic ecosystem of high altitude.

RevDate: 2018-05-11

Oliver A, Kay M, Cooper KK (2018)

Comparative genomics of cocci-shaped Sporosarcina strains with diverse spatial isolation.

BMC genomics, 19(1):310 pii:10.1186/s12864-018-4635-8.

BACKGROUND: Cocci-shaped Sporosarcina strains are currently among the few known cocci-shaped spore-forming bacteria, yet we know very little about their genomics. The goal of this study is to utilize comparative genomics to investigate the diversity of cocci-shaped Sporosarcina strains that differ in their geographical isolation and show different nutritional requirements.

RESULTS: For this study, we sequenced 28 genomes of cocci-shaped Sporosarcina strains isolated from 13 different locations around the world. We generated the first six complete genomes and methylomes utilizing PacBio sequencing, and an additional 22 draft genomes using Illumina sequencing. Genomic analysis revealed that cocci-shaped Sporosarcina strains contained an average genome of 3.3 Mb comprising 3222 CDS, 54 tRNAs and 6 rRNAs, while only two strains contained plasmids. The cocci-shaped Sporosarcina genome on average contained 2.3 prophages and 15.6 IS elements, while methylome analysis supported the diversity of these strains as only one of 31 methylation motifs was shared under identical growth conditions. Analysis with a 90% identity cut-off revealed 221 core genes or ~ 7% of the genome, while a 30% identity cut-off generated a pan-genome of 8610 genes. The phylogenetic relationship of the cocci-shaped Sporosarcina strains based on either core genes, accessory genes or spore-related genes consistently resulted in the 29 strains being divided into eight clades.

CONCLUSIONS: This study begins to unravel the phylogenetic relationship of cocci-shaped Sporosarcina strains, and the comparative genomics of these strains supports identification of several new species.
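
The core, accessory and pan-genome counts reported in an abstract like the one above follow from simple set operations once genes have been grouped into clusters. The following is a minimal sketch, not the authors' pipeline: it assumes the clustering at some identity cut-off has already been done by an external tool, and the function name, strain names and cluster IDs are hypothetical toy values.

```python
from typing import Dict, Set, Tuple


def summarize_pangenome(
    clusters_by_strain: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (core, accessory, pan) gene-cluster sets.

    clusters_by_strain maps a strain name to the set of gene-cluster IDs
    found in that strain (hypothetical input; clustering is assumed done).
    """
    genomes = list(clusters_by_strain.values())
    if not genomes:
        return set(), set(), set()
    pan = set().union(*genomes)          # clusters seen in at least one strain
    core = set.intersection(*genomes)    # clusters seen in every strain
    accessory = pan - core               # clusters seen in some but not all
    return core, accessory, pan


# Toy data for illustration only; not taken from any study.
toy = {
    "strain_A": {"c1", "c2", "c3"},
    "strain_B": {"c1", "c2", "c4"},
    "strain_C": {"c1", "c5"},
}
core, accessory, pan = summarize_pangenome(toy)
print(len(core), len(accessory), len(pan))
```

A stricter identity cut-off splits genes into more, smaller clusters, so fewer clusters are shared by every strain; this is one reason core and pan-genome sizes are usually reported together with the cut-off used.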

RevDate: 2018-05-03

Wang W, Mauleon R, Hu Z, et al (2018)

Genomic variation in 3,010 diverse accessions of Asian cultivated rice.

Nature, 557(7703):43-49.

Here we analyse genetic variation, population structure and diversity among 3,010 diverse Asian cultivated rice (Oryza sativa L.) genomes from the 3,000 Rice Genomes Project. Our results are consistent with the five major groups previously recognized, but also suggest several unreported subpopulations that correlate with geographic location. We identified 29 million single nucleotide polymorphisms, 2.4 million small indels and over 90,000 structural variations that contribute to within- and between-population variation. Using pan-genome analyses, we identified more than 10,000 novel full-length protein-coding genes and a high number of presence-absence variations. The complex patterns of introgression observed in domestication genes are consistent with multiple independent rice domestication events. The public availability of data from the 3,000 Rice Genomes Project provides a resource for rice genomics research and breeding.


