
Bibliography on: Pangenome

The Electronic Scholarly Publishing Project: Providing world-wide, free access to classic scientific papers and other scholarly materials, since 1993.


ESP: PubMed Auto Bibliography. Created: 20 Mar 2019 at 01:32


Although the enforced stability of genomic content is ubiquitous among multicellular eukaryotes (MCEs), the opposite is proving to be the case among prokaryotes, which exhibit remarkable and adaptive plasticity of genomic content. Early bacterial whole-genome sequencing efforts discovered that whenever a particular "species" was re-sequenced, new genes were found that had not been detected earlier: entirely new genes, not merely new alleles. This led to the concepts of the bacterial core-genome, the set of genes found in all members of a particular "species", and the flex-genome, the set of genes found in some, but not all, members of the "species". Together these make up the species' pan-genome.
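
The definitions above can be illustrated with a minimal sketch. It assumes each genome is already reduced to a set of gene-family names; the function name partition_pangenome and the toy strain data are hypothetical and are not taken from any of the papers listed here. Real pipelines must first cluster genes into families, which this sketch does not attempt.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(
    genomes: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split gene families into core, flex and pan sets.

    core: families present in every genome
    flex: families present in some, but not all, genomes
    pan:  union of core and flex
    """
    if not genomes:
        raise ValueError("need at least one genome")
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)          # every family seen anywhere
    core = set.intersection(*gene_sets)    # families shared by all genomes
    flex = pan - core                      # everything that is not core
    return core, flex, pan


# Hypothetical toy data: three strains of one "species".
strains = {
    "strain_A": {"dnaA", "gyrB", "recA", "blaTEM"},
    "strain_B": {"dnaA", "gyrB", "recA", "tetA"},
    "strain_C": {"dnaA", "gyrB", "recA", "blaTEM", "vanA"},
}

core, flex, pan = partition_pangenome(strains)
print("core:", sorted(core))
print("flex:", sorted(flex))
print("pan size:", len(pan))
```

Adding a fourth strain to the dictionary can only shrink or preserve the core set and can only grow or preserve the pan set, which is the behavior the re-sequencing observation describes.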

Created with PubMed® Query: pangenome or "pan-genome" or "pan genome" NOT pmcbook NOT ispreviousversion

Citations: The Papers (from PubMed®)

RevDate: 2019-03-15

Naz K, Naz A, Ashraf ST, et al (2019)

PanRV: Pangenome-reverse vaccinology approach for identifications of potential vaccine candidates in microbial pangenome.

BMC bioinformatics, 20(1):123 pii:10.1186/s12859-019-2713-9.

BACKGROUND: A revolutionary shift from classical vaccinology to the reverse vaccinology approach has been observed in the last decade. Ever-increasing genomic and proteomic data have greatly facilitated the vaccine design and development process. Reverse vaccinology is considered a cost-effective and proficient approach to screening the entire pathogen genome. To look for broad-spectrum immunogenic targets and to analyze closely related bacterial species, integrating the pangenome concept into the reverse vaccinology approach is essential. The categories of a species pangenome, such as the core, accessory, and unique gene sets, can be analyzed to identify vaccine candidates through reverse vaccinology.

RESULTS: We have designed an integrative computational pipeline, termed "PanRV", that employs both the pangenome and reverse vaccinology approaches. PanRV comprises four functional modules: (i) Pangenome Estimation Module (PGM), (ii) Reverse Vaccinology Module (RVM), (iii) Functional Annotation Module (FAM), and (iv) Antibiotic Resistance Association Module (ARM). The pipeline was tested using genomic data from 301 genomes of Staphylococcus aureus, and the results were verified against experimentally known antigenic data.

CONCLUSION: The proposed pipeline has proved to be the first comprehensive automated pipeline that can precisely identify putative vaccine candidates by exploiting the microbial pangenome. PanRV is a Linux-based package developed in Java. An executable installer is provided for ease of installation along with a user manual at .

RevDate: 2019-03-13

Li H, Ding X, Chen C, et al (2019)

Enrichment of phosphate solubilizing bacteria during late developmental stages of eggplant (Solanum melongena L.).

FEMS microbiology ecology, 95(3):.

Understanding the ecology of phosphate solubilizing bacteria (PSBs) is critical for developing better strategies to increase crop productivity. In this study, the diversity of PSBs and of the total bacteria in the rhizosphere of eggplant (Solanum melongena L.) cultivated in organic, integrated and conventional farming systems was compared at four developmental stages of its lifecycle. Both selective culture and high-throughput sequencing analysis of 16S rRNA amplicons indicated that Enterobacter with strong or very strong in vivo phosphate solubilization activities was enriched in the rhizosphere during the fruiting stage. The high-throughput sequencing analysis results demonstrated that farming systems explained 23% of total bacterial community variation. Plant development and farming systems synergistically shaped the rhizospheric bacterial community, in which the degree of variation influenced by farming systems decreased over the plant development phase from 56% to 26.3% to 16.3%, and finally to no significant effect as the plant reached the fruiting stage. Pangenome analysis indicated that two-component and transporter systems varied between the rhizosphere and soil PSBs. This study elucidated the complex interactions among farming systems, plant development and rhizosphere microbiomes.

RevDate: 2019-03-12

Burgueño-Roman A, Castañeda-Ruelas GM, Pacheco-Arjona R, et al (2019)

Pathogenic potential of non-typhoidal Salmonella serovars isolated from aquatic environments in Mexico.

Genes & genomics pii:10.1007/s13258-019-00798-7 [Epub ahead of print].

BACKGROUND: River water has been implicated as a source of non-typhoidal Salmonella (NTS) serovars in Mexico.

OBJECTIVE: To dissect the molecular pathogenesis and defense strategies of seven NTS strains isolated from river water in Mexico.

METHODS: The genomes of Salmonella serovars Give, Pomona, Kedougou, Stanley, Oranienburg, Sandiego, and Muenchen were sequenced using the whole-genome shotgun methodology on the Illumina MiSeq platform. The genome annotation and evolutionary analyses were conducted in the RAST and FigTree servers, respectively. MLST was performed using the SRST2 tool, and the comparisons between strains were clustered and visualized using the GView server. An experimental virulence assay was included to evaluate the pathogenic potential of the strains.

RESULTS: We report seven high-quality draft genomes, ranging from ~ 4.61 to ~ 5.12 Mb, with a median G + C value, coding DNA sequence, and protein values of 52.1%, 4697 bp, and 4,589 bp, respectively. The NTS serovars presented with an open pan-genome, offering novel genetic content. Each NTS serovar had an indistinguishable virulotype with a core genome (352 virulence genes) closely associated with Salmonella pathogenicity; 13 genes were characterized as serotype specific, which could explain differences in pathogenicity. All strains maintained highly conserved genetic content regarding the Salmonella pathogenicity islands (1-5) (86.9-100%), fimbriae (84.6%), and hypermutation (100%) genes. Adherence and invasion capacity were confirmed among NTS strains in Caco-2 cells.

CONCLUSION: Our results demonstrated the arsenal of virulence and defense molecular factors harbored on NTS serovars and highlight that environmental NTS strains are waterborne pathogens worthy of attention.

RevDate: 2019-03-12

van Tonder AJ, Bray JE, Jolley KA, et al (2019)

Genomic Analyses of >3,100 Nasopharyngeal Pneumococci Revealed Significant Differences Between Pneumococci Recovered in Four Different Geographical Regions.

Frontiers in microbiology, 10:317.

Understanding the structure of a bacterial population is essential in order to understand bacterial evolution. Estimating the core genome (those genes common to all, or nearly all, strains of a species) is a key component of such analyses. The size and composition of the core genome varies by dataset, but we hypothesized that the variation between different collections of the same bacterial species would be minimal. To investigate this, we analyzed the genome sequences of 3,118 pneumococci recovered from healthy individuals in Reykjavik (Iceland), Southampton (United Kingdom), Boston (United States), and Maela (Thailand). The analyses revealed a "supercore" genome (genes shared by all 3,118 pneumococci) of 558 genes, although an additional 354 core genes were shared by pneumococci from Reykjavik, Southampton, and Boston. Overall, the size and composition of the core and pan-genomes among pneumococci recovered in Reykjavik, Southampton, and Boston were similar. Maela pneumococci were distinctly different in that they had a smaller core genome and larger pan-genome. The pan-genome of Maela pneumococci contained several >25 Kb sequence regions (flanked by pneumococcal genes) that were homologous to genomic regions found in other bacterial species. Overall, our work revealed that some subsets of the global pneumococcal population are highly heterogeneous, and our hypothesis was rejected. This is an important finding in terms of understanding genetic variation among pneumococci and is also an essential point of consideration before generalizing the findings from a single dataset to the wider pneumococcal population.

RevDate: 2019-03-12

Obolski U, Gori A, Lourenço J, et al (2019)

Identifying genes associated with invasive disease in S. pneumoniae by applying a machine learning approach to whole genome sequence typing data.

Scientific reports, 9(1):4049 pii:10.1038/s41598-019-40346-7.

Streptococcus pneumoniae, a normal commensal of the upper respiratory tract, is a major public health concern, responsible for substantial global morbidity and mortality due to pneumonia, meningitis and sepsis. Why some pneumococci invade the bloodstream or CSF (so-called invasive pneumococcal disease; IPD) is uncertain. In this study we identify genes associated with IPD. We transform whole genome sequence (WGS) data into a sequence typing scheme, while avoiding the caveat of using an arbitrary genome as a reference by substituting it with a constructed pangenome. We then employ a random forest machine-learning algorithm on the transformed data, and find 43 genes consistently associated with IPD across three geographically distinct WGS data sets of pneumococcal carriage isolates. Of the genes we identified as associated with IPD, we find 23 genes previously shown to be directly relevant to IPD, as well as 18 uncharacterized genes. We suggest that these uncharacterized genes identified by us are also likely to be relevant for IPD.

RevDate: 2019-03-11

Khaleque HN, González C, Shafique R, et al (2019)

Uncovering the Mechanisms of Halotolerance in the Extremely Acidophilic Members of the Acidihalobacter Genus Through Comparative Genome Analysis.

Frontiers in microbiology, 10:155.

There are few naturally occurring environments where both acid and salinity stress exist together; consequently, there has been little evolutionary pressure for microorganisms to develop systems that enable them to deal with both stresses simultaneously. Members of the genus Acidihalobacter are iron- and sulfur-oxidizing, halotolerant acidophiles that have developed the ability to tolerate acid and saline stress and, therefore, have the potential to bioleach ores with brackish or saline process waters under acidic conditions. The genus consists of four members, A. prosperus DSM 5130T, A. prosperus DSM 14174, A. prosperus F5 and "A. ferrooxidans" DSM 14175. An in-depth genome comparison was undertaken in order to provide a more comprehensive description of the mechanisms of halotolerance used by the different members of this genus. Pangenome analysis identified 29, 3 and 9 protein families related to halotolerance in the core, dispensable and unique genomes, respectively. The genes for halotolerance showed Ka/Ks ratios between 0 and 0.2, confirming that they are conserved and stabilized. All the Acidihalobacter genomes contained similar genes for the synthesis and transport of ectoine, which was recently found to be the dominant osmoprotectant in A. prosperus DSM 14174 and A. prosperus DSM 5130T. Similarities also existed in genes encoding low affinity potassium pumps; however, A. prosperus DSM 14174 was also found to contain genes encoding high affinity potassium pumps. Furthermore, only A. prosperus DSM 5130T and "A. ferrooxidans" DSM 14175 contained genes allowing the uptake of taurine as an osmoprotectant. Variations were also seen in genes encoding proteins involved in the synthesis and/or transport of periplasmic glucans, sucrose, proline, and glycine betaine. This suggests that versatility exists in the Acidihalobacter genus in terms of the mechanisms they can use for halotolerance. This information is useful for developing hypotheses for the search for life on exoplanets and moons.

RevDate: 2019-03-09

Tahir Ul Qamar M, Zhu X, Xing F, et al (2019)

ppsPCP: A Plant Presence/absence Variants Scanner and Pan-genome Construction Pipeline.

Bioinformatics (Oxford, England) pii:5372683 [Epub ahead of print].

SUMMARY: Since the idea of pan-genomics emerged, several tools and pipelines have been introduced for prokaryotic pan-genomics. However, no single comprehensive pipeline has been reported that overcomes the multiple challenges associated with eukaryotic pan-genomics. To aid eukaryotic pan-genomic studies, here we present the ppsPCP pipeline, which is designed for eukaryotes, especially plants. It is capable of scanning presence/absence variants (PAVs) and constructing a fully annotated pan-genome. We believe that, with these unique features of PAV scanning and building a pan-genome together with its annotation, ppsPCP will be useful for plant pan-genomic studies and will help researchers study genetic/phenotypic variations and genomic diversity.

The ppsPCP is freely available at github DOI: and webpage

RevDate: 2019-03-09

Rautiainen M, Mäkinen V, Marschall T (2019)

Bit-parallel sequence-to-graph alignment.

Bioinformatics (Oxford, England) pii:5372677 [Epub ahead of print].

MOTIVATION: Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs to represent genome assemblies, such as string graphs and de Bruijn graphs, and graphs to represent a pan-genome and hence the genetic variation present in a population. Being able to align sequencing reads to such graphs is a key step for many analyses and its applications include genome assembly, read error correction, and variant calling with respect to a variation graph.

RESULTS: We generalize two linear sequence-to-sequence algorithms to graphs: the Shift-And algorithm for exact matching and Myers' bitvector algorithm for semi-global alignment. These linear algorithms are both based on processing w sequence characters with a constant number of operations, where w is the word size of the machine (commonly 64), and achieve a speedup of up to w over naive algorithms. For a graph with |V| nodes and |E| edges and a sequence of length m, our bitvector-based graph alignment algorithm reaches a worst case runtime of O(|V| + ⌈m/w⌉ |E| log w) for acyclic graphs and O(|V| + m |E| log w) for arbitrary cyclic graphs. We apply it to five different types of graphs and observe a speedup between 3-fold and 20-fold compared to a previous (asymptotically optimal) alignment algorithm by Navarro (2000).


SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
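
As background for the abstract above, the classical linear Shift-And algorithm that the authors start from can be sketched as follows. This is only the textbook sequence-to-sequence version, not the graph generalization described in the paper; the function name shift_and_search is hypothetical. Python integers are arbitrary precision, so the sketch is not limited to patterns of w characters the way a single machine word would be.

```python
from typing import Dict, List


def shift_and_search(pattern: str, text: str) -> List[int]:
    """Return the start index of every exact occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []

    # masks[c] has bit i set exactly when pattern[i] == c.
    masks: Dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0              # bit i set: pattern[0..i] matches a suffix of the text read so far
    hits: List[int] = []

    for j, ch in enumerate(text):
        # Extend every partial match by one character and start a new one,
        # then keep only the extensions consistent with the current character.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits


print(shift_and_search("ACA", "GACACAT"))
```

Each text character is handled with one shift, one OR and one AND on the whole state, which is the "constant number of operations per w characters" idea the abstract refers to.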

RevDate: 2019-03-08

Velsko IM, Perez MS, Richards VP (2019)

Resolving phylogenetic relationships for Streptococcus mitis and Streptococcus oralis through core and pan genome analyses.

Genome biology and evolution pii:5371073 [Epub ahead of print].

Taxonomic and phylogenetic relationships of Streptococcus mitis and Streptococcus oralis have been difficult to establish biochemically and genetically. We used core genome analyses of S. mitis and S. oralis, as well as the closely-related species S. pneumoniae and S. parasanguinis, to clarify the phylogenetic relationships between S. mitis and S. oralis, as well as within subclades of S. oralis. All S. mitis (n = 67), S. oralis (n = 89), S. parasanguinis (n = 27), and 27 S. pneumoniae genome assemblies were downloaded from NCBI and re-annotated. All genes were delineated into homologous clusters and maximum-likelihood phylogenies built from putatively non-recombinant core gene sets. Population structure was determined using Bayesian genome clustering, and patristic distance calculated between populations. Population-specific gene content was assessed using a phylogenetic-based genome-wide association approach. Streptococcus mitis and S. oralis formed distinct clades, but species mixing suggests taxonomic mis-assignment. Patristic distance between populations suggests that S. oralis subsp. dentisani is a distinct species, that S. oralis subsp. tigurinus and subsp. oralis are supported as subspecies, and that S. mitis comprises two subspecies. None of the genes within the pan-genomes of S. mitis and S. oralis could be statistically correlated with either, and the dispensable genomes showed extensive variation among isolates. These are likely important factors contributing to established overlap in biochemical characteristics for these taxa. Based on core genome analysis, the substructure of S. oralis and S. mitis should be re-defined, and species assignments within S. oralis and S. mitis should be made based on whole genome analysis to be robust to mis-assignment.

RevDate: 2019-03-03

Eisenbach L, Geissler AJ, Ehrmann MA, et al (2019)

Comparative genomics of Lactobacillus sakei supports the development of starter strain combinations.

Microbiological research, 221:1-9.

Strains of Lactobacillus sakei can be isolated from a variety of sources including meat, fermented sausages, sake, sourdough, sauerkraut or kimchi. Selected strains are widely used as starter cultures for sausage fermentation. Recently we have demonstrated that control of the lactic microbiota in fermenting sausages is achieved by pairs or sets of strains rather than by single strains. In this work we characterized the pan genome of L. sakei to enable exploitation of the genomic diversity of L. sakei for the establishment of assertive starter strain sets. We have established the full genome sequences of nine L. sakei strains from different sources of isolation and included in the analysis the genome of L. sakei 23K. Comparative genomics revealed an accessory genome comprising about 50% of the pan genome and different lineages of strains with no relation to their source of isolation. Group- and strain-specific differences were found, notably in agmatine and citrate metabolism. The presence of genes encoding metabolic pathways for fructose, sucrose and trehalose as well as gluconate in all strains suggests a general adaptation to plant/sugary environments and a life in communities with other genera. Analysis of the plasmidome did not reveal any specific mechanisms of adaptation to a habitat. The predicted differences in metabolic settings enable prediction of partner strains, which can occupy the meat environment to a large extent and establish competitive exclusion of autochthonous microbiota. This may assist the development of a new generation of meat starter cultures containing L. sakei strains.

RevDate: 2019-02-28

Raphael BH, Huynh T, Brown E, et al (2019)

Culture of Clinical Specimens Reveals Extensive Diversity of Legionella pneumophila Strains in Arizona.

mSphere, 4(1): pii:4/1/e00649-18.

Between 2000 and 2017, a total of 236 Legionella species isolates from Arizona were submitted to the CDC for reference testing. Most of these isolates were recovered from bronchoalveolar lavage specimens. Although the incidence of legionellosis in Arizona is less than the overall U.S. incidence, Arizona submits the largest number of isolates to the CDC for testing compared to those from other states. In addition to a higher proportion of culture confirmation of legionellosis cases in Arizona than in other states, all Legionella pneumophila isolates are forwarded to the CDC for confirmatory testing. Compared to that from other states, a higher proportion of isolates from Arizona were identified as belonging to L. pneumophila serogroups 6 (28.2%) and 8 (8.9%). Genome sequencing was conducted on 113 L. pneumophila clinical isolates not known to be associated with outbreaks in order to understand the genomic diversity of strains causing legionellosis in Arizona. Whole-genome multilocus sequence typing (wgMLST) revealed 17 clusters of isolates sharing at least 99% identical allele content. Only two of these clusters contained isolates from more than one individual with exposure at the same facility. Additionally, wgMLST analysis revealed a group of 31 isolates predominantly belonging to serogroup 6 and containing isolates from three separate clusters. Single nucleotide polymorphism (SNP) and pangenome analysis were used to further resolve genome sequences belonging to a subset of isolates. This study demonstrates that culture of clinical specimens for Legionella spp. reveals a highly diverse population of strains causing legionellosis in Arizona which could be underappreciated using other diagnostic approaches.

IMPORTANCE: Culture of clinical specimens from patients with Legionnaires' disease is rarely performed, restricting our understanding of the diversity and ecology of Legionella. Culture of Legionella from patient specimens in Arizona revealed a greater proportion of non-serogroup 1 Legionella pneumophila isolates than in other U.S. isolates examined. Disease caused by such isolates may go undetected using other diagnostic methods. Moreover, genome sequence analysis revealed that these isolates were genetically diverse, and understanding these populations may help in future environmental source attribution studies.

RevDate: 2019-02-27

Gabbett MT, Laporte J, Sekar R, et al (2019)

Molecular Support for Heterogonesis Resulting in Sesquizygotic Twinning.

The New England journal of medicine, 380(9):842-849.

Sesquizygotic multiple pregnancy is an exceptional intermediate between monozygotic and dizygotic twinning. We report a monochorionic twin pregnancy with fetal sex discordance. Genotyping of amniotic fluid from each sac showed that the twins were maternally identical but chimerically shared 78% of their paternal genome, which makes them genetically in between monozygotic and dizygotic; they are sesquizygotic. We observed no evidence of sesquizygosis in 968 dizygotic twin pairs whom we screened by means of pangenome single-nucleotide polymorphism genotyping. Data from published repositories also show that sesquizygosis is a rare event. Detailed genotyping implicates chimerism arising at the juncture of zygotic division, termed heterogonesis, as the likely initial step in the causation of sesquizygosis.

RevDate: 2019-02-27

Caputo A, Fournier PE, Raoult D (2019)

Genome and pan-genome analysis to classify emerging bacteria.

Biology direct, 14(1):5 pii:10.1186/s13062-019-0234-0.

BACKGROUND: In recent years, genomic and pan-genomic studies have become increasingly important. Culturomics allows the study of the human microbiota through the use of different culture conditions, coupled with a method of rapid identification by MALDI-TOF or 16S rRNA. Bacterial taxonomy is undergoing many changes as a consequence. With the help of pan-genomic analyses, species can be redefined, and new species definitions generated.

RESULTS: Genomics, coupled with culturomics, has led to the discovery of many novel bacterial species or genera, including Akkermansia muciniphila and Microvirga massiliensis. Using the genome to define species has been applied within the genus Klebsiella. A discontinuity or an abrupt break in the core/pan-genome ratio can uncover novel species.

CONCLUSIONS: Applying genomic and pan-genomic analyses to the reclassification of other bacterial species or genera will be important in the future of medical microbiology. The pan-genome is one of many innovative tools in bacterial taxonomy.

REVIEWERS: This article was reviewed by William Martin, Eric Bapteste and James McInerney.

OPEN PEER REVIEW: Reviewed by William Martin, Eric Bapteste and James McInerney.

RevDate: 2019-02-25

Entwistle S, Li X, Yin Y (2019)

Orphan Genes Shared by Pathogenic Genomes Are More Associated with Bacterial Pathogenicity.

mSystems, 4(1): pii:mSystems00290-18.

Orphan genes (also known as ORFans [i.e., orphan open reading frames]) are new genes that enable an organism to adapt to its specific living environment. Our focus in this study is to compare ORFans between pathogens (P) and nonpathogens (NP) of the same genus. Using the pangenome idea, we have identified 130,169 ORFans in nine bacterial genera (505 genomes) and classified these ORFans into four groups: (i) SS-ORFans (P), which are only found in a single pathogenic genome; (ii) SS-ORFans (NP), which are only found in a single nonpathogenic genome; (iii) PS-ORFans (P), which are found in multiple pathogenic genomes; and (iv) NS-ORFans (NP), which are found in multiple nonpathogenic genomes. Within the same genus, pathogens do not always have more genes, more ORFans, or more pathogenicity-related genes (PRGs), including prophages, pathogenicity islands (PAIs), virulence factors (VFs), and horizontal gene transfers (HGTs), than nonpathogens. Interestingly, in pathogens of the nine genera, the percentages of PS-ORFans are consistently higher than those of SS-ORFans, which is not true in nonpathogens. Similarly, in pathogens of the nine genera, the percentages of PS-ORFans matching the four types of PRGs are also always higher than those of SS-ORFans, but this is not true in nonpathogens. All of these findings suggest the greater importance of PS-ORFans for bacterial pathogenicity.

IMPORTANCE: Recent pangenome analyses of numerous bacterial species have suggested that each genome of a single species may have a significant fraction of its gene content unique or shared by very few genomes (i.e., ORFans). We selected nine bacterial genera, each containing at least five pathogenic and five nonpathogenic genomes, to compare their ORFans in relation to pathogenicity-related genes. Pathogens in these genera are known to cause a number of common and devastating human diseases such as pneumonia, diphtheria, melioidosis, and tuberculosis. Thus, they are worthy of in-depth systems microbiology investigations, including the comparative study of ORFans between pathogens and nonpathogens. We provide direct evidence to suggest that ORFans shared by more pathogens are more associated with pathogenicity-related genes and thus are more important targets for development of new diagnostic markers or therapeutic drugs for bacterial infectious diseases.

RevDate: 2019-02-20

Lin JN, Lai CH, Yang CH, et al (2019)

Genomic Features, Comparative Genomics, and Antimicrobial Susceptibility Patterns of Elizabethkingia bruuniana.

Scientific reports, 9(1):2267 pii:10.1038/s41598-019-38998-6.

Elizabethkingia bruuniana is a novel species of the Elizabethkingia genus. There is scant information on this microorganism. Here, we report the whole-genome features and antimicrobial susceptibility patterns of E. bruuniana strain EM798-26. Elizabethkingia strain EM798-26 was initially identified as E. miricola. This isolate contained a circular genome of 4,393,011 bp. The whole-genome sequence-based phylogeny revealed that Elizabethkingia strain EM798-26 was in the same group as the type strain E. bruuniana G0146T. Both in silico DNA-DNA hybridization and average nucleotide identity analysis clearly demonstrated that Elizabethkingia strain EM798-26 belonged to the species E. bruuniana. The pan-genome analysis identified 2,875 gene families in the core genome and 5,199 gene families in the pan genome of eight publicly available E. bruuniana genome sequences. The unique genes accounted for 0.2-12.1% of the pan genome in each E. bruuniana genome. A total of 59 potential virulence factor homologs were predicted in the whole genome of E. bruuniana strain EM798-26. This isolate was nonsusceptible to multiple antibiotics, but susceptible to aminoglycosides, minocycline, and levofloxacin. The whole-genome sequence analysis of E. bruuniana EM798-26 revealed 29 homologs of antibiotic resistance-related genes. This study presents the genomic features of E. bruuniana. Knowledge of the genomic characteristics provides valuable insights into a novel species.

RevDate: 2019-02-20

López-Pérez M, Jayakumar JM, Haro-Moreno JM, et al (2019)

Evolutionary Model of Cluster Divergence of the Emergent Marine Pathogen Vibrio vulnificus: From Genotype to Ecotype.

mBio, 10(1): pii:mBio.02852-18.

Vibrio vulnificus, an opportunistic pathogen, is the causative agent of a life-threatening septicemia and a rising problem for aquaculture worldwide. The genetic factors that differentiate its clinical and environmental strains remain enigmatic. Furthermore, clinical strains have emerged from every clade of V. vulnificus. In this work, we investigated the underlying genomic properties and population dynamics of the V. vulnificus species from an evolutionary and ecological point of view. Genome comparisons and bioinformatic analyses of 113 V. vulnificus isolates indicate that the population of V. vulnificus is made up of four different clusters. We found evidence that recombination and gene flow between the two largest clusters (cluster 1 [C1] and C2) have drastically decreased to the point where they are diverging independently. Pangenome and phenotypic analyses showed two markedly different lifestyles for these two clusters, indicating commensal (C2) and bloomer (C1) ecotypes, with differences in carbohydrate utilization, defense systems, and chemotaxis, among other characteristics. Nonetheless, we identified frequent intra- and interspecies exchange of mobile genetic elements (e.g., antibiotic resistance plasmids, novel "chromids," or two different and concurrent type VI secretion systems) that provide high levels of genetic diversity in the population. Surprisingly, we identified strains from both clusters in the mucosa of aquaculture species, indicating that manmade niches are bringing strains from the two clusters together. We propose an evolutionary model of V. vulnificus that could be broadly applicable to other pathogenic vibrios and facultative bacterial pathogens to pursue strategies to prevent their infections and emergence.

IMPORTANCE: Vibrio vulnificus is an emergent marine pathogen and is the cause of a deadly septicemia. However, the genetic factors that differentiate its clinical and environmental strains and its several biotypes remain mostly enigmatic. In this work, we investigated the underlying genomic properties and population dynamics of the V. vulnificus species to elucidate the traits that make these strains emerge as a human pathogen. The acquisition of different ecological determinants could have allowed the development of highly divergent clusters with different lifestyles within the same environment. However, we identified strains from both clusters in the mucosa of aquaculture species, indicating that manmade niches are bringing strains from the two clusters together, posing a potential risk of recombination and of emergence of novel variants. We propose a new evolutionary model that provides a perspective that could be broadly applicable to other pathogenic vibrios and facultative bacterial pathogens to pursue strategies to prevent their infections.

RevDate: 2019-02-20

Issa E, Salloum T, Panossian B, et al (2019)

Genome Mining and Comparative Analysis of Streptococcus intermedius Causing Brain Abscess in a Child.

Pathogens (Basel, Switzerland), 8(1): pii:pathogens8010022.

Streptococcus intermedius (SI) is associated with prolonged hospitalization and low survival rates. The genetic mechanisms involved in brain abscess development and genome evolution in comparison to other members of the Streptococcus anginosus group are understudied. We performed a whole-genome comparative analysis of an SI isolate, LAU_SINT, associated with brain abscess following sinusitis, with all SI genomes in addition to S. constellatus (SC) and S. anginosus. Selective pressure on virulence factors, phages, pan-genome evolution and single-nucleotide polymorphism analysis were assessed. The structural details of the type seven secretion system (T7SS) were elucidated and compared with different organisms. ily and nanA were both abundant and conserved. Nisin resistance determinants were found in 47% of the isolates. Pan-genome and SNP-based analyses did not reveal significant geo-patterns. Our results showed that two SC isolates were misidentified as SI. We propose the presence of four T7SS modules (I-IV) located on various genomic islands (GIs). We detected a variety of factors linked to metal ion binding on the GIs carrying T7SS. This is the first detailed report characterizing the T7SS and its link to nisin resistance and metal ion binding in SI. These and yet uncharacterized T7SS transmembrane proteins merit further studies and could represent potential therapeutic targets.

RevDate: 2019-02-19

Grytten I, Rand KD, Nederbragt AJ, et al (2019)

Graph Peak Caller: Calling ChIP-seq peaks on graph-based reference genomes.

PLoS computational biology, 15(2):e1006731 pii:PCOMPBIOL-D-18-00533 [Epub ahead of print].

Graph-based representations are considered to be the future for reference genomes, as they allow integrated representation of the steadily increasing data on individual variation. Currently available tools allow de novo assembly of graph-based reference genomes, alignment of new read sets to the graph representation as well as certain analyses like variant calling and haplotyping. We here present a first method for calling ChIP-Seq peaks on read data aligned to a graph-based reference genome. The method is a graph generalization of the peak caller MACS2, and is implemented in an open source tool, Graph Peak Caller. By using the existing tool vg to build a pan-genome of Arabidopsis thaliana, we validate our approach by showing that Graph Peak Caller with a pan-genome reference graph can trace variants within peaks that are not part of the linear reference genome, and find peaks that in general are more motif-enriched than those found by MACS2.

RevDate: 2019-02-18

Lu QF, Cao DM, Su LL, et al (2019)

Genus-Wide Comparative Genomics Analysis of Neisseria to Identify New Genes Associated with Pathogenicity and Niche Adaptation of Neisseria Pathogens.

International journal of genomics, 2019:6015730.

N. gonorrhoeae and N. meningitidis, the only two human pathogens of Neisseria, are closely related species, but the niches they inhabit and their pathogenic characteristics are distinctly different. The genetic basis of these differences has not yet been fully elucidated. In this study, comparative genomics analysis was performed based on 15 N. gonorrhoeae, 75 N. meningitidis, and 7 nonpathogenic Neisseria genomes. Core-pangenome analysis found 1111 conserved gene families among them, and each of these species groups had an open pangenome. We found that 452, 78, and 319 gene families were unique in N. gonorrhoeae, N. meningitidis, and both of them, respectively. Those unique gene families were regarded as candidates related to their pathogenicity and niche adaptation. The relationships among them have been partly verified by functional annotation analysis, but at least one-third of the genes in each gene set lack definite functional information. Simple sequence repeats (SSRs), the basis of gene phase variation, were found to be abundant in the membrane or related genes of each unique gene set, which may facilitate their adaptation to variable host environments. Protein-protein interaction (PPI) analysis found at least five distinct PPI clusters in N. gonorrhoeae and four in N. meningitidis, and 167 and 52 proteins with unknown function were contained within them, respectively.

RevDate: 2019-02-14

Stevens MJA, Tasara T, Klumpp J, et al (2019)

Whole-genome-based phylogeny of Bacillus cytotoxicus reveals different clades within the species and provides clues on ecology and evolution.

Scientific reports, 9(1):1984 pii:10.1038/s41598-018-36254-x.

Bacillus cytotoxicus is a member of the Bacillus cereus group linked to fatal cases of diarrheal disease. Information on B. cytotoxicus is very limited; in particular comprehensive genomic data is lacking. Thus, we applied a genomic approach to characterize B. cytotoxicus and decipher its population structure. To this end, complete genomes of ten B. cytotoxicus were sequenced and compared to the four publicly available full B. cytotoxicus genomes and genomes of other B. cereus group members. Average nucleotide identity, core genome, and pan genome clustering resulted in clear distinction of B. cytotoxicus strains from other strains of the B. cereus group. Genomic content analyses showed that a hydroxyphenylalanine operon is present in B. cytotoxicus, but absent in all other members of the B. cereus group. It enables degradation of aromatic compounds to succinate and pyruvate and was likely acquired from another Bacillus species. It allows for utilization of tyrosine and might have given a B. cytotoxicus ancestor an evolutionary advantage resulting in species differentiation. Plasmid content showed that B. cytotoxicus is flexible in exchanging genes, allowing for quick adaptation to the environment. Genome-based phylogenetic analyses divided the B. cytotoxicus strains into four clades that also differed in virulence gene content.

RevDate: 2019-02-12

Lye ZN, MD Purugganan (2019)

Copy Number Variation in Domestication.

Trends in plant science pii:S1360-1385(19)30015-9 [Epub ahead of print].

Domesticated plants have long served as excellent models for studying evolution. Many genes and mutations underlying important domestication traits have been identified, and most causal mutations appear to be SNPs. Copy number variation (CNV) is an important source of genetic variation that has been largely neglected in studies of domestication. Ongoing work demonstrates the importance of CNVs as a source of genetic variation during domestication, and during the diversification of domesticated taxa. Here, we review how CNVs contribute to evolutionary processes underlying domestication, and review examples of domestication traits caused by CNVs. We draw from examples in plant species, but also highlight cases in animal systems that could illuminate the roles of CNVs in the domestication process.

RevDate: 2019-02-05

Arora S, Steuernagel B, Gaurav K, et al (2019)

Resistance gene cloning from a wild crop relative by sequence capture and association genetics.

Nature biotechnology, 37(2):139-143.

Disease resistance (R) genes from wild relatives could be used to engineer broad-spectrum resistance in domesticated crops. We combined association genetics with R gene enrichment sequencing (AgRenSeq) to exploit pan-genome variation in wild diploid wheat and rapidly clone four stem rust resistance genes. AgRenSeq enables R gene cloning in any crop that has a diverse germplasm panel.

RevDate: 2019-02-05

Zou Y, Xue W, Luo G, et al (2019)

1,520 reference genomes from cultivated human gut bacteria enable functional microbiome analyses.

Nature biotechnology, 37(2):179-185.

Reference genomes are essential for metagenomic analyses and functional characterization of the human gut microbiota. We present the Culturable Genome Reference (CGR), a collection of 1,520 nonredundant, high-quality draft genomes generated from >6,000 bacteria cultivated from fecal samples of healthy humans. Of the 1,520 genomes, which were chosen to cover all major bacterial phyla and genera in the human gut, 264 are not represented in existing reference genome catalogs. We show that this increase in the number of reference bacterial genomes improves the rate of mapping metagenomic sequencing reads from 50% to >70%, enabling higher-resolution descriptions of the human gut microbiome. We use the CGR genomes to annotate functions of 338 bacterial species, showing the utility of this resource for functional studies. We also carry out a pan-genome analysis of 38 important human gut species, which reveals the diversity and specificity of functional enrichment between their core and dispensable genomes.

RevDate: 2019-02-04

McCarthy CGP, DA Fitzpatrick (2019)

Pan-genome analyses of model fungal species.

Microbial genomics [Epub ahead of print].

The concept of the species 'pan-genome', the union of 'core' conserved genes and all 'accessory' non-conserved genes across all strains of a species, was first proposed in prokaryotes to account for intraspecific variability. Species pan-genomes have been extensively studied in prokaryotes, but evidence of species pan-genomes has also been demonstrated in eukaryotes such as plants and fungi. Using a previously published methodology based on sequence homology and conserved microsynteny, in addition to bespoke pipelines, we have investigated the pan-genomes of four model fungal species: Saccharomyces cerevisiae, Candida albicans, Cryptococcus neoformans var. grubii and Aspergillus fumigatus. Between 80 and 90 % of gene models per strain in each of these species are core genes that are highly conserved across all strains of that species, many of which are involved in housekeeping and conserved survival processes. In many of these species, the remaining 'accessory' gene models are clustered within subterminal regions and may be involved in pathogenesis and antimicrobial resistance. Analysis of the ancestry of species core and accessory genomes suggests that fungal pan-genomes evolve by strain-level innovations such as gene duplication as opposed to wide-scale horizontal gene transfer. Our findings lend further supporting evidence to the existence of species pan-genomes in eukaryote taxa.

RevDate: 2019-02-02

Lugli GA, Mancino W, Milani C, et al (2019)

Dissecting the evolutionary development of the Bifidobacterium animalis species through comparative genomics analyses.

Applied and environmental microbiology pii:AEM.02806-18 [Epub ahead of print].

Bifidobacteria are members of the gut microbiota of animals, including mammals, birds and social insects. In this study, we analyzed and determined the pan-genome of Bifidobacterium animalis species, encompassing Bifidobacterium animalis subsp. animalis and the Bifidobacterium animalis subsp. lactis taxon, which is one of the most intensely exploited probiotic bifidobacterial species. In order to reveal differences within the B. animalis species, detailed comparative genomics and phylogenomics analyses were performed, indicating that these two subspecies recently arose through divergent evolutionary events. A subspecies-specific core genome was identified for both B. animalis subspecies, revealing the existence of subspecies-defining genes involved in carbohydrate metabolism. Notably, these in silico analyses coupled with carbohydrate profiling assays suggest genetic adaptations toward a distinct glycan milieu for each member of the B. animalis subspecies, resulting in a divergent evolutionary development of the two subspecies.
IMPORTANCE The majority of characterized B. animalis strains have been isolated from human fecal samples. In order to explore genome variability within this species, we isolated 15 novel strains from the GIT of different animals, including mammals and birds. The current study allowed us to reconstruct the pan-genome of this taxon, including the genome contents of 56 B. animalis strains. Through careful assessment of subspecies-specific core-genes of the B. animalis subsp. animalis/lactis taxon, we identified genes encoding enzymes involved in carbohydrate transport and metabolism, while unveiling specific gene-acquisition and -loss events that caused the evolutionary emergence of these two subspecies.

RevDate: 2019-02-01

Nono AD, Chen K, X Liu (2019)

Comparison of different functional prediction scores using a gene-based permutation model for identifying cancer driver genes.

BMC medical genomics, 12(Suppl 1):22 pii:10.1186/s12920-018-0452-9.

BACKGROUND: Identifying cancer driver genes (CDG) is a crucial step in cancer genomics toward the advancement of precision medicine. However, driver gene discovery is a very challenging task because we are not only dealing with huge amounts of data, but are also faced with the complexity of the disease, including the heterogeneity of the background somatic mutation rate in each cancer patient. It is generally accepted that CDG harbor variants that confer a growth advantage on the malignant cell, are positively selected, and are critical to cancer development, whereas non-driver genes harbor random mutations with no functional consequence on cancer. Based on this fact, function prediction based approaches for identifying CDG have been proposed to interrogate the distribution of functional predictions among mutations in cancer genomes (eLS 1-16, 2016). Assuming most of the observed mutations are passenger mutations and given the quantitative predictions for the functional impact of the mutations, genes enriched for functional or deleterious mutations are more likely to be drivers. These methods have been continually refined and can therefore be applied to increase accuracy in detecting new candidate CDGs. However, current function prediction based approaches only focus on coding mutations and lack a systematic way to pick the best mutation deleteriousness prediction algorithms for usage.

RESULTS: In this study, we propose a new function prediction based approach to discover CDGs through a gene-based permutation approach. Our method not only covers both coding and non-coding regions of the genes, but also accounts for the heterogeneous mutational context in a cohort of cancer patients. The permutation model was implemented independently using seven popular deleteriousness prediction scores covering splicing regions (SPIDEX), coding regions (MetaLR and VEST3) and pan-genome (CADD, DANN, Fathmm-MKL coding and Fathmm-MKL noncoding). We applied this new approach to somatic single nucleotide variants (SNVs) from whole-genome sequences of 119 breast and 24 lung cancer patients and compared the seven deleteriousness prediction scores for their performance in this study.

CONCLUSION: The new function prediction based approach not only predicted known cancer genes listed in the Cancer Gene Census (CGC), but also new candidate CDGs that are worth further investigation. The results showed the advantage of utilizing pan-genome deleteriousness prediction scores in function prediction based methods. Although VEST3 score, a deleteriousness prediction score for missense mutations, has the best performance in breast cancer, it was topped by CADD and Fathmm-MKL coding, two pan-genome deleteriousness prediction scores, in lung cancer.
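The idea of a gene-based permutation test described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `permutation_p_value`, the choice of the mean score as the test statistic, and the uniform resampling from a per-gene background pool are all assumptions made for illustration, and the sketch ignores the mutational-context modelling the abstract mentions.

```python
import random


def permutation_p_value(observed_scores, background_scores, n_perm=10000, seed=0):
    """Empirical p-value for enrichment of high deleteriousness scores in one gene.

    observed_scores:   scores of the mutations actually seen in the gene.
    background_scores: scores of all mutations that could occur in the gene
                       (hypothetical pool, assumed to be supplied by the caller).
    """
    if not observed_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")

    rng = random.Random(seed)
    k = len(observed_scores)
    observed_mean = sum(observed_scores) / k

    hits = 0
    for _ in range(n_perm):
        # Draw the same number of mutations at random from the background pool.
        sample = [rng.choice(background_scores) for _ in range(k)]
        if sum(sample) / k >= observed_mean:
            hits += 1

    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy scores, not data from the study.
background = [0.1] * 99 + [0.9]
p_driver_like = permutation_p_value([0.9, 0.9, 0.9], background, n_perm=2000)
p_passenger_like = permutation_p_value([0.1], background, n_perm=2000)
```

A gene whose observed mutations score consistently higher than random draws from its own background receives a small p-value, which is the sense in which such a gene is "more likely to be a driver" under this simplified model.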

RevDate: 2019-02-03

Leviatan S, E Segal (2019)

A Significant Expansion of Our Understanding of the Composition of the Human Microbiome.

mSystems, 4(1): pii:mSystems00010-19.

Shotgun sequencing of samples taken from the human microbiome often reveals only partial mapping of the sequenced metagenomic reads to existing reference genomes. Such partial mappability indicates that many genomes are missing in our reference genome set. This is particularly true for non-Western populations and for samples that do not originate from the gut. Pasolli et al. (E. Pasolli, F. Asnicar, S. Manara, M. Zolfo, et al., Cell, 2019) perform a grand effort to expand the reference set, and to better classify its members, revealing a wider pangenome of existing species as well as identifying new species of previously unknown taxonomic branches.

RevDate: 2019-01-30

Sánchez-Osuna M, Cortés P, Barbé J, et al (2018)

Origin of the Mobile Di-Hydro-Pteroate Synthase Gene Determining Sulfonamide Resistance in Clinical Isolates.

Frontiers in microbiology, 9:3332.

Sulfonamides are synthetic chemotherapeutic agents that work as competitive inhibitors of the di-hydro-pteroate synthase (DHPS) enzyme, encoded by the folP gene. Resistance to sulfonamides is widespread in the clinical setting and predominantly mediated by plasmid- and integron-borne sul1-3 genes encoding mutant DHPS enzymes that do not bind sulfonamides. In spite of their clinical importance, the genetic origin of sul1-3 genes remains unknown. Here we analyze sul genes and their genetic neighborhoods to uncover sul signature elements that enable the elucidation of their genetic origin. We identify a protein sequence Sul motif associated with sul-encoded proteins, as well as consistent association of a phosphoglucosamine mutase gene (glmM) with the sul2 gene. We identify chromosomal folP genes bearing these genetic markers in two bacterial families: the Rhodobiaceae and the Leptospiraceae. Bayesian phylogenetic inference of FolP/Sul and GlmM protein sequences clearly establishes that sul1-2 and sul3 genes originated as a mobilization of folP genes present in, respectively, the Rhodobiaceae and the Leptospiraceae, and indicate that the Rhodobiaceae folP gene was transferred from the Leptospiraceae. Analysis of %GC content in folP/sul gene sequences supports the phylogenetic inference results and indicates that the emergence of the Sul motif in chromosomally encoded FolP proteins is ancient and considerably predates the clinical introduction of sulfonamides. In vitro assays reveal that both the Rhodobiaceae and the Leptospiraceae, but not other related chromosomally encoded FolP proteins confer resistance in a sulfonamide-sensitive Escherichia coli background, indicating that the Sul motif is associated with sulfonamide resistance. 
Given the absence of any known natural sulfonamides targeting DHPS, these results provide a novel perspective on the emergence of resistance to synthetic chemotherapeutic agents, whereby preexisting resistant variants in the vast bacterial pangenome may be rapidly selected for and disseminated upon the clinical introduction of novel chemotherapeuticals.

RevDate: 2019-01-30

Slama HB, Cherif-Silini H, Chenari Bouket A, et al (2018)

Screening for Fusarium Antagonistic Bacteria From Contrasting Niches Designated the Endophyte Bacillus halotolerans as Plant Warden Against Fusarium.

Frontiers in microbiology, 9:3236.

Date palm (Phoenix dactylifera L.) plantations in North Africa are nowadays threatened with the spread of the Bayoud disease caused by Fusarium oxysporum f. sp. albedinis, already responsible for destroying date production in other infected areas, mainly in Morocco. Biological control holds great promise for sustainable and environmental-friendly management of the disease. In this study, the additional benefits to agricultural ecosystems of using plant growth promoting rhizobacteria (PGPR) or endophytes are addressed. First, PGPR or endophytes can offer an interesting bio-fertilization, meaning that it can add another layer to the sustainability of the approach. Additionally, screening of contrasting niches can yield bacterial actors that could represent wardens against whole genera or groups of plant pathogenic agents thriving in semi-arid to arid ecosystems. Using this strategy, we recovered four bacterial isolates, designated BFOA1, BFOA2, BFOA3 and BFOA4, that proved very active against F. oxysporum f. sp. albedinis. BFOA1-BFOA4 proved also active against 16 Fusarium isolates belonging to four species: F. oxysporum (with strains phytopathogenic of Olea europaea and tomato), F. solani (with different strains attacking O. europaea and potato), F. acuminatum (pathogenic on O. europaea) and F. chlamydosporum (phytopathogenic of O. europaea). BFOA1-BFOA4 bacterial isolates exhibited strong activities against another four major phytopathogens: Botrytis cinerea, Alternaria alternata, Phytophthora infestans, and Rhizoctonia bataticola. Isolates BFOA1-BFOA4 had the ability to grow at temperatures up to 35°C, pH range of 5-10, and tolerate high concentrations of NaCl and up to 30% PEG. The isolates also showed relevant direct and indirect PGP features, including growth on nitrogen-free medium, phosphate solubilization and auxin biosynthesis, as well as resistance to metal and xenobiotic stress. 
Phylogenomic analysis of BFOA1-BFOA4 isolates indicated that they all belong to Bacillus halotolerans, which could therefore be considered a warden against Fusarium infection in plants. Comparative genomics allowed us to functionally describe the open pan genome of B. halotolerans, and LC-HRMS and GCMS analyses enabled the description of diverse secondary metabolites including pulegone, 2-undecanone, and germacrene D, with important antimicrobial and insecticidal properties. In conclusion, B. halotolerans could be used as an efficient bio-fertilizer and bio-control agent in semi-arid and arid ecosystems.

RevDate: 2019-01-30

Arabaghian H, Salloum T, Alousi S, et al (2019)

Molecular Characterization of Carbapenem Resistant Klebsiella pneumoniae and Klebsiella quasipneumoniae Isolated from Lebanon.

Scientific reports, 9(1):531 pii:10.1038/s41598-018-36554-2.

Klebsiella pneumoniae is a Gram-negative organism and a major public health threat. In this study, we used whole-genome sequences to characterize 32 carbapenem-resistant K. pneumoniae (CRKP) and two carbapenem-resistant K. quasipneumoniae (CRKQ). Antimicrobial resistance was assessed using disk diffusion and E-test, while virulence was assessed in silico. The capsule type was determined by sequencing the wzi gene. The plasmid diversity was assessed by PCR-based replicon typing to detect the plasmid incompatibility (Inc) groups. The genetic relatedness was determined by multilocus sequence typing, pan-genome, and recombination analysis. All of the isolates were resistant to ertapenem together with imipenem and/or meropenem. Phenotypic resistance was due to blaOXA-48, blaNDM-1, blaNDM-7, or the coupling of ESBLs and outer membrane porin modifications. This is the first comprehensive study reporting on the WGS of CRKP and the first detection of CRKQ in the region. The presence and dissemination of CRKP and CRKQ, with some additionally having characteristics of hypervirulent clones such as the hypermucoviscous phenotype and the capsular type K2, are particularly concerning. Additionally, mining the completely sequenced K. pneumoniae genomes revealed the key roles of mobile genetic elements in the spread of antibiotic resistance and in understanding the epidemiology of these clinically significant pathogens.

RevDate: 2019-01-24

Zhu D, He J, Yang Z, et al (2019)

Comparative analysis reveals the Genomic Islands in Pasteurella multocida population genetics: on Symbiosis and adaptability.

BMC genomics, 20(1):63 pii:10.1186/s12864-018-5366-6.

BACKGROUND: Pasteurella multocida (P. multocida) is a widespread opportunistic pathogen that infects humans and various animals. Genomic Islands (GIs) are one of the most important mobile components that quickly help bacteria acquire large fragments of foreign genes. However, the effects of GIs on the evolution of P. multocida populations are unknown.

RESULTS: Ten avian-sourced P. multocida genomes obtained through high-throughput sequencing, together with 104 publicly available P. multocida genomes, were used to analyse their population genetics and to construct a pan-genome containing 3948 protein-coding genes. Through the pan-genome, the open evolutionary pattern of P. multocida was revealed, and the functional components of 944 core genes, 2439 accessory genes and 565 unique genes were analysed. In addition, a total of 280 GIs were predicted in all strains. Combined with the pan-genome of P. multocida, the GIs accounted for 5.8% of the core genes in the pan-genome, mainly related to functional metabolic activities; the accessory genes accounted for 42.3%, mainly for the enrichment of adaptive genes; and the unique genes accounted for 35.4%, containing some defence mechanism-related genes.

CONCLUSIONS: The effects of GIs on the population genetics of P. multocida evolution and adaptation to the environment are reflected by the proportion and function of the pan-genome acquired from GIs, and the large quantities of GI data will aid in additional population genetics studies.
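The core / accessory / unique split reported in this abstract (944 + 2439 + 565 = 3948 genes) follows from counting how many genomes carry each gene family. A minimal sketch of that partition is given below; the function name `partition_pangenome`, the strain names and the toy gene identifiers are hypothetical, and real pipelines first cluster genes into families by sequence similarity, a step this sketch assumes has already been done.

```python
from collections import Counter


def partition_pangenome(gene_sets):
    """Split gene families into core, accessory and unique sets.

    gene_sets: dict mapping a genome name to the set of gene families it carries.
    Core families occur in every genome, unique families in exactly one genome,
    and accessory families in more than one but not all genomes.
    """
    n_genomes = len(gene_sets)
    # Count the number of genomes in which each gene family occurs.
    counts = Counter(g for genes in gene_sets.values() for g in genes)

    core = {g for g, c in counts.items() if c == n_genomes}
    unique = {g for g, c in counts.items() if c == 1 and n_genomes > 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Hypothetical toy input, not data from the study.
toy = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2", "g4"},
    "strainC": {"g1", "g3", "g5"},
}
core, accessory, unique = partition_pangenome(toy)
pan_size = len(core) + len(accessory) + len(unique)
```

The three sets are disjoint by construction, so their sizes always sum to the pan-genome size, as they do in the figures quoted in the abstract.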

RevDate: 2019-02-06

Wilson K, B Ely (2019)

Analyses of four new Caulobacter Phicbkviruses indicate independent lineages.

The Journal of general virology, 100(2):321-331.

Bacteriophages with genomes larger than 200 kbp are considered giant phages, and the giant Phicbkviruses are the most frequently isolated Caulobacter crescentus phages. In this study, we compare six bacteriophage genomes that differ from the genomes of the majority of Phicbkviruses. Four of these genomes are much larger than those of the rest of the Phicbkviruses, with genome sizes that are more than 250 kbp. A comparison of 16 Phicbkvirus genomes identified a 'core genome' of 69 genes that is present in all of these Phicbkvirus genomes, as well as shared accessory genes and genes that are unique for each phage. Most of the core genes are clustered into the regions coding for structural proteins or those involved in DNA replication. A phylogenetic analysis indicated that these 16 Caulobacter Phicbkvirus genomes are related, but they represent four distinct branches of the Phicbkvirus genomic tree with distantly related branches sharing little nucleotide homology. In contrast, pairwise comparisons within each branch of the phylogenetic tree showed that more than 80 % of the entire genome is shared among phages within a group. This conservation of the genomes within each branch indicates that horizontal gene transfer events between the groups are rare. Therefore, the Phicbkvirus genus consists of at least four different phylogenetic branches that are evolving independently from one another. One of these branches contains a 27-gene inversion relative to the other three branches. Also, an analysis of the tRNA genes showed that they are relatively mobile within the Phicbkvirus genus.

RevDate: 2019-02-01

Sherman RM, Forman J, Antonescu V, et al (2019)

Author Correction: Assembly of a pan-genome from deep sequencing of 910 humans of African descent.

Nature genetics, 51(2):364.

In the version of this article initially published, the statement "there are no pan-genomes for any other animal or plant species" was incorrect. The statement has been corrected to "there are no reported pan-genomes for any other animal species, to our knowledge." We thank David Edwards for bringing this error to our attention. The error has been corrected in the HTML and PDF versions of the article.

RevDate: 2019-01-18

Blaustein RA, McFarland AG, Ben Maamar S, et al (2019)

Pangenomic Approach To Understanding Microbial Adaptations within a Model Built Environment, the International Space Station, Relative to Human Hosts and Soil.

mSystems, 4(1): pii:mSystems00281-18.

Understanding underlying mechanisms involved in microbial persistence in the built environment (BE) is essential for strategically mitigating potential health risks. To test the hypothesis that BEs impose selective pressures resulting in characteristic adaptive responses, we performed a pangenomics meta-analysis leveraging 189 genomes (accessed from GenBank) of two epidemiologically important taxa, Bacillus cereus and Staphylococcus aureus, isolated from various origins: the International Space Station (ISS; a model BE), Earth-based BEs, soil, and humans. Our objectives were to (i) identify differences in the pangenomic composition of generalist and host-associated organisms, (ii) characterize genes and functions involved in BE-associated selection, and (iii) identify genomic signatures of ISS-derived strains of potential relevance for astronaut health. The pangenome of B. cereus was more expansive than that of S. aureus, which had a dominant core component. Genomic contents of both taxa significantly correlated with isolate origin, demonstrating an importance for biogeography and potential niche adaptations. ISS/BE-enriched functions were often involved in biosynthesis, catabolism, materials transport, metabolism, and stress response. Multiple origin-enriched functions also overlapped across taxa, suggesting conserved adaptive processes. We further characterized two mobile genetic elements with local neighborhood genes encoding biosynthesis and stress response functions that distinctively associated with B. cereus from the ISS. Although antibiotic resistance genes were present in ISS/BE isolates, they were also common in counterparts elsewhere. Overall, despite differences in microbial lifestyle, some functions appear common to remaining viable in the BE, and those functions are not typically associated with direct impacts on human health. 
IMPORTANCE The built environment contains a variety of microorganisms, some of which pose critical human health risks (e.g., hospital-acquired infection, antibiotic resistance dissemination). We uncovered a combination of complex biological functions that may play a role in bacterial survival under the presumed selective pressures in a model built environment-the International Space Station-by using an approach to compare pangenomes of bacterial strains from two clinically relevant species (B. cereus and S. aureus) isolated from both built environments and humans. Our findings suggest that the most crucial bacterial functions involved in this potential adaptive response are specific to bacterial lifestyle and do not appear to have direct impacts on human health.

RevDate: 2019-01-15

Abreo E, N Altier (2019)

Pangenome of Serratia marcescens strains from nosocomial and environmental origins reveals different populations and the links between them.

Scientific reports, 9(1):46 pii:10.1038/s41598-018-37118-0.

Serratia marcescens is a Gram-negative bacterial species that can be found in a wide range of environments like soil, water and plant surfaces, while it is also known as an opportunistic human pathogen in hospitals and as a plant growth promoting bacteria (PGPR) in crops. We have used a pangenome-based approach, based on publicly available genomes, to apply whole genome multilocus sequence type schemes to assess whether there is an association between source and genotype, aiming at differentiating between isolates from nosocomial sources and the environment, and between strains reported as PGPR from other environmental strains. Most genomes from a nosocomial setting and environmental origin could be assigned to the proposed nosocomial or environmental MLSTs, which is indicative of an association between source and genotype. The fact that a few genomes from a nosocomial source showed an environmental MLST suggests that a minority of nosocomial strains have recently derived from the environment. PGPR strains were assigned to different environmental types and clades but only one clade comprised strains accumulating a low number of known virulence and antibiotic resistance determinants and was exclusively from environmental sources. This clade is envisaged as a group of promissory MLSTs for selecting prospective PGPR strains.

RevDate: 2019-01-11

Hisham Y, Y Ashhab (2018)

Identification of Cross-Protective Potential Antigens against Pathogenic Brucella spp. through Combining Pan-Genome Analysis with Reverse Vaccinology.

Journal of immunology research, 2018:1474517.

Brucellosis is a zoonotic infectious disease caused by bacteria of the genus Brucella. Brucella melitensis, Brucella abortus, and Brucella suis are the most pathogenic species of this genus causing the majority of human and domestic animal brucellosis. There is a need to develop a safe and potent subunit vaccine to overcome the serious drawbacks of the live attenuated Brucella vaccines. The aim of this work was to discover antigen candidates conserved among the three pathogenic species. In this study, we employed a reverse vaccinology strategy to compute the core proteome of 90 completed genomes: 55 B. melitensis, 17 B. abortus, and 18 B. suis. The core proteome was analyzed by a metasubcellular localization prediction pipeline to identify surface-associated proteins. The identified proteins were thoroughly analyzed using various in silico tools to obtain the most potential protective antigens. The number of core proteins obtained from analyzing the 90 proteomes was 1939 proteins. The surface-associated proteins were 177. The number of potential antigens was 87; those with adhesion score ≥ 0.5 were considered antigen with "high potential," while those with a score of 0.4-0.5 were considered antigens with "intermediate potential." According to a cumulative score derived from protein antigenicity, density of MHC-I and MHC-II epitopes, MHC allele coverage, and B-cell epitope density scores, a final list of 34 potential antigens was obtained. Remarkably, most of the 34 proteins are associated with bacterial adhesion, invasion, evasion, and adaptation to the hostile intracellular environment of macrophages which is adjusted to deprive Brucella of required nutrients. Our results provide a manageable list of potential protective antigens for developing a potent vaccine against brucellosis. Moreover, our elaborated analysis can provide further insights into novel Brucella virulence factors. 
Our next step is to test some of these antigens using an appropriate antigen delivery system.
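
The adhesion-score tiers quoted in this abstract can be illustrated with a minimal Python sketch. The thresholds (≥ 0.5 and 0.4-0.5) come from the abstract; the function name, the protein identifiers, the scores, and the handling of scores below 0.4 are assumptions made for illustration and are not taken from the paper.

```python
def classify_adhesion(score: float) -> str:
    """Assign a tier from an adhesion score, using the thresholds in the abstract."""
    if score >= 0.5:
        return "high potential"
    if score >= 0.4:
        return "intermediate potential"
    # Assumption: the abstract does not say how lower scores were treated.
    return "below threshold"


# Hypothetical proteins and scores, for illustration only.
example_scores = {"protA": 0.62, "protB": 0.45, "protC": 0.31}
tiers = {name: classify_adhesion(s) for name, s in example_scores.items()}

for name, tier in tiers.items():
    print(f"{name}: {tier}")
```

In the study itself this tiering is only one of several filters; the final list of 34 antigens was derived from a cumulative score that this sketch does not attempt to reproduce.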

RevDate: 2019-01-10

Livingstone PG, Morphew RM, DE Whitworth (2018)

Genome Sequencing and Pan-Genome Analysis of 23 Corallococcus spp. Strains Reveal Unexpected Diversity, With Particular Plasticity of Predatory Gene Sets.

Frontiers in microbiology, 9:3187.

Corallococcus is an abundant genus of predatory soil myxobacteria, containing two species, C. coralloides (for which a genome sequence is available) and C. exiguus. To investigate the genomic basis of predation, we genome-sequenced 23 Corallococcus strains. Genomic similarity metrics grouped the sequenced strains into at least nine distinct genomospecies, divided between two major sub-divisions of the genus, encompassing previously described diversity. The Corallococcus pan-genome was found to be open, with strains exhibiting highly individual gene sets. On average, only 30.5% of each strain's gene set belonged to the core pan-genome, while more than 75% of the accessory pan-genome genes were present in less than four of the 24 genomes. The Corallococcus accessory pan-proteome was enriched for the COG functional category "Secondary metabolism," with each genome containing on average 55 biosynthetic gene clusters (BGCs), of which only 20 belonged to the core pan-genome. Predatory activity was assayed against ten prey microbes and found to be mostly incongruent with phylogeny or BGC complement. Thus, predation seems multifactorial, depending partially on BGC complement, but also on the accessory pan-genome - genes most likely acquired horizontally. These observations encourage further exploration of Corallococcus as a source for novel bioactive secondary metabolites and predatory proteins.
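
As an illustration of the core/accessory split described in this abstract, the following Python sketch partitions a pan-genome from per-strain gene sets and reports what fraction of each strain's genes is core. The strain and gene names are hypothetical toy data, and the strict definition of "core" used here (present in every genome) is an assumption; published pipelines often use softer thresholds and orthologue clustering that this sketch omits.

```python
from collections import Counter


def partition_pangenome(strain_genes):
    """Split the pan-genome into core (in all strains) and accessory genes."""
    counts = Counter(g for genes in strain_genes.values() for g in set(genes))
    n_strains = len(strain_genes)
    core = {g for g, c in counts.items() if c == n_strains}
    accessory = set(counts) - core
    return core, accessory


def core_fraction(strain_genes, core):
    """Fraction of each strain's gene set that belongs to the core."""
    return {s: len(core & set(g)) / len(set(g)) for s, g in strain_genes.items()}


# Hypothetical toy data: three strains, five gene families.
toy = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2", "g4"},
    "strainC": {"g1", "g5"},
}
core, accessory = partition_pangenome(toy)
fractions = core_fraction(toy, core)
print(sorted(core), sorted(accessory), fractions)
```

Averaging the per-strain fractions over all genomes gives a figure comparable in kind to the 30.5% quoted in the abstract.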

RevDate: 2019-01-10

Wu Y, Zaiden N, B Cao (2018)

The Core- and Pan-Genomic Analyses of the Genus Comamonas: From Environmental Adaptation to Potential Virulence.

Frontiers in microbiology, 9:3096.

Comamonas is often reported to be one of the major members of microbial communities in various natural and engineered environments. Versatile catabolic capabilities of Comamonas have been studied extensively in the last decade. In contrast, little is known about the ecological roles and adaptation of Comamonas to different environments as well as the virulence of potentially pathogenic Comamonas strains. In this study, we provide genomic insights into the potential ecological roles and virulence of Comamonas by analysing the entire gene set (pangenome) and the genes present in all genomes (core genome) using 34 genomes of 11 different Comamonas species. The analyses revealed that the metabolic pathways enabling Comamonas to acquire energy from various nutrient sources are well conserved. Genes for denitrification and ammonification are abundant in Comamonas, suggesting that Comamonas plays an important role in the nitrogen biogeochemical cycle. They also encode sophisticated redox sensory systems and diverse c-di-GMP controlling systems, allowing them to be able to effectively adjust their biofilm lifestyle to changing environments. The virulence factors in Comamonas were found to be highly species-specific. The conserved strategies used by potentially pathogenic Comamonas for surface adherence, motility control, nutrient acquisition and stress tolerance were also revealed.

RevDate: 2019-01-10

Kayani MR, Zheng YC, Xie FC, et al (2018)

Genome Sequences and Comparative Analysis of Two Extended-Spectrum Extensively-Drug Resistant Mycobacterium tuberculosis Strains.

Frontiers in pharmacology, 9:1492.

RevDate: 2019-01-04

diCenzo GC, Mengoni A, E Perrin (2019)

Chromids aid genome expansion and functional diversification in the family Burkholderiaceae.

Molecular biology and evolution pii:5273485 [Epub ahead of print].

Multipartite genomes, containing at least two large replicons, are found in diverse bacteria; however, the advantage of this genome structure remains incompletely understood. Here, we perform comparative genomics of hundreds of finished β-proteobacterial genomes to gain insights into the role and emergence of multipartite genomes. Nearly all essential secondary replicons (chromids) of the β-proteobacteria are found in the family Burkholderiaceae. These replicons arose from just two plasmid acquisition events, and they were likely stabilized early in their evolution by the presence of core genes. On average, Burkholderiaceae genera with multipartite genomes had a larger total genome size, but smaller chromosome, than genera without secondary replicons. Pangenome-level functional enrichment analyses suggested that inter-replicon functional biases are partially driven by the enrichment of secondary replicons in the accessory pangenome fraction. Nevertheless, the small overlap in orthologous groups present in each replicon's pangenome indicates a clear functional separation of the replicons. Chromids appeared biased to environmental adaptation, as the functional categories enriched on chromids were also over-represented on the chromosomes of the environmental genera (Paraburkholderia, Cupriavidus) compared to the pathogenic genera (Burkholderia, Ralstonia). Using ancestral state reconstruction, it was predicted that the rate of accumulation of modern-day genes by chromids was more rapid than the rate of gene accumulation by the chromosomes. Overall, the data are consistent with a model where the primary advantage of secondary replicons is in facilitating increased rates of gene acquisition through horizontal gene transfer, consequently resulting in a replicon enriched in genes associated with adaptation to novel environments.

RevDate: 2019-01-15

Dillon MM, Thakur S, Almeida RND, et al (2019)

Recombination of ecologically and evolutionarily significant loci maintains genetic cohesion in the Pseudomonas syringae species complex.

Genome biology, 20(1):3 pii:10.1186/s13059-018-1606-y.

BACKGROUND: Pseudomonas syringae is a highly diverse bacterial species complex capable of causing a wide range of serious diseases on numerous agronomically important crops. We examine the evolutionary relationships of 391 agricultural and environmental strains using whole-genome sequencing and evolutionary genomic analyses.

RESULTS: We describe the phylogenetic distribution of all 77,728 orthologous gene families in the pan-genome, reconstruct the core genome phylogeny using the 2410 core genes, hierarchically cluster the accessory genome, identify the diversity and distribution of type III secretion systems and their effectors, predict ecologically and evolutionarily relevant loci, and establish the molecular evolutionary processes operating on gene families. Phylogenetic and recombination analyses reveal that the species complex is subdivided into primary and secondary phylogroups, with the former primarily comprised of agricultural isolates, including all of the well-studied P. syringae strains. In contrast, the secondary phylogroups include numerous environmental isolates. These phylogroups also have levels of genetic diversity typically found among distinct species. An analysis of rates of recombination within and between phylogroups revealed a higher rate of recombination within primary phylogroups than between primary and secondary phylogroups. We also find that "ecologically significant" virulence-associated loci and "evolutionarily significant" loci under positive selection are over-represented among loci that undergo inter-phylogroup genetic exchange.

CONCLUSIONS: While inter-phylogroup recombination occurs relatively rarely, it is an important force maintaining the genetic cohesion of the species complex, particularly among primary phylogroup strains. This level of genetic cohesion, and the shared plant-associated niche, argues for considering the primary phylogroups as a single biological species.

RevDate: 2019-01-10

Hübner S, Bercovich N, Todesco M, et al (2019)

Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance.

Nature plants, 5(1):54-62.

Domesticated plants and animals often display dramatic responses to selection, but the origins of the genetic diversity underlying these responses remain poorly understood. Despite domestication and improvement bottlenecks, the cultivated sunflower remains highly variable genetically, possibly due to hybridization with wild relatives. To characterize genetic diversity in the sunflower and to quantify contributions from wild relatives, we sequenced 287 cultivated lines, 17 Native American landraces and 189 wild accessions representing 11 compatible wild species. Cultivar sequences failing to map to the sunflower reference were assembled de novo for each genotype to determine the gene repertoire, or 'pan-genome', of the cultivated sunflower. Assembled genes were then compared to the wild species to estimate origins. Results indicate that the cultivated sunflower pan-genome comprises 61,205 genes, of which 27% vary across genotypes. Approximately 10% of the cultivated sunflower pan-genome is derived through introgression from wild sunflower species, and 1.5% of genes originated solely through introgression. Gene ontology functional analyses further indicate that genes associated with biotic resistance are over-represented among introgressed regions, an observation consistent with breeding records. Analyses of allelic variation associated with downy mildew resistance provide an example in which such introgressions have contributed to resistance to a globally challenging disease.

RevDate: 2019-02-05

Tao Y, Zhao X, Mace E, et al (2019)

Exploring and Exploiting Pan-genomics for Crop Improvement.

Molecular plant, 12(2):156-169.

Genetic variation ranging from single-nucleotide polymorphisms to large structural variants (SVs) can cause variation of gene content among individuals within the same species. There is an increasing appreciation that a single reference genome is insufficient to capture the full landscape of genetic diversity of a species. Pan-genome analysis offers a platform to evaluate the genetic diversity of a species via investigation of its entire genome repertoire. Although a recent wave of pan-genomic studies has shed new light on crop diversity and improvement using advanced sequencing technology, the potential applications of crop pan-genomics in crop improvement are yet to be fully exploited. In this review, we highlight the progress achieved in understanding crop pan-genomics, discuss biological activities that cause SVs, review important agronomical traits affected by SVs, and present our perspective on the application of pan-genomics in crop improvement.

RevDate: 2019-01-08

Brankovics B, Kulik T, Sawicki J, et al (2018)

First steps towards mitochondrial pan-genomics: detailed analysis of Fusarium graminearum mitogenomes.

PeerJ, 6:e5963 pii:5963.

There is a gradual shift from representing a species' genome by a single reference genome sequence to a pan-genome representation. Pan-genomes are the abstract representations of the genomes of all the strains that are present in the population or species. In this study, we employed a pan-genomic approach to analyze the intraspecific mitochondrial genome diversity of Fusarium graminearum. We present an improved reference mitochondrial genome for F. graminearum with an intron-exon annotation that was verified using RNA-seq data. Each of the 24 studied isolates had a distinct mitochondrial sequence. Length variation in the F. graminearum mitogenome was found to be largely due to variation of intron regions (99.98%). The "intronless" mitogenome length was found to be quite stable and could be informative when comparing species. The coding regions showed high conservation, while the variability of intergenic regions was highest. However, the most important variable parts are the intron regions, because they contain approximately half of the variable sites, make up more than half of the mitogenome, and show presence/absence variation. Furthermore, our analyses show that the mitogenome of F. graminearum is recombining, as was previously shown in F. oxysporum, indicating that mitogenome recombination is a common phenomenon in Fusarium. The majority of mitochondrial introns in F. graminearum belongs to group I introns, which are associated with homing endonuclease genes (HEGs). Mitochondrial introns containing HE genes may spread within populations through homing, where the endonuclease recognizes and cleaves the recognition site in the target gene. After cleavage of the "host" gene, it is replaced by the gene copy containing the intron with HEG. We propose to use introns unique to a population for tracking the spread of the given population, because introns can spread through vertical inheritance, recombination as well as via horizontal transfer. 
We demonstrate how pooled sequencing of strains can be used for mining mitogenome data. The usage of pooled sequencing offers a scalable solution for population analysis and for species-level comparison studies. This study may serve as a basis for future mitochondrial genome variability studies and representations.

RevDate: 2019-01-08

Bochkareva OO, Moroz EV, Davydov II, et al (2018)

Genome rearrangements and selection in multi-chromosome bacteria Burkholderia spp.

BMC genomics, 19(1):965 pii:10.1186/s12864-018-5245-1.

BACKGROUND: The genus Burkholderia consists of species that occupy remarkably diverse ecological niches. Its best known members are important pathogens, B. mallei and B. pseudomallei, which cause glanders and melioidosis, respectively. Burkholderia genomes are unusual due to their multichromosomal organization, generally comprised of 2-3 chromosomes.

RESULTS: We performed integrated genomic analysis of 127 Burkholderia strains. The pan-genome is open with the saturation to be reached between 86,000 and 88,000 genes. The reconstructed rearrangements indicate a strong avoidance of intra-replichore inversions that is likely caused by selection against the transfer of large groups of genes between the leading and the lagging strands. Translocated genes also tend to retain their position in the leading or the lagging strand, and this selection is stronger for large syntenies. Integrated reconstruction of chromosome rearrangements in the context of strains phylogeny reveals parallel rearrangements that may indicate inversion-based phase variation and integration of new genomic islands. In particular, we detected parallel inversions in the second chromosomes of B. pseudomallei with breakpoints formed by genes encoding membrane components of multidrug resistance complex, that may be linked to a phase variation mechanism. Two genomic islands, spreading horizontally between chromosomes, were detected in the B. cepacia group.

CONCLUSIONS: This study demonstrates the power of integrated analysis of pan-genomes, chromosome rearrangements, and selection regimes. Non-random inversion patterns indicate selective pressure, inversions are particularly frequent in a recent pathogen B. mallei, and, together with periods of positive selection at other branches, may indicate adaptation to new niches. One such adaptation could be a possible phase variation mechanism in B. pseudomallei.
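
Whether a pan-genome is "open" is usually judged from how many new genes each additional genome contributes. The Python sketch below computes such an accumulation curve for one fixed genome order. It is an illustration only: the strain and gene names are hypothetical, and it is not the saturation estimate used in the paper. In practice the curve is typically averaged over many random orderings before a model is fitted to it.

```python
def accumulation_curve(genomes):
    """Return (name, new_genes, cumulative_pan_genome_size) per added genome."""
    seen = set()
    curve = []
    for name, genes in genomes:
        new = set(genes) - seen      # genes not observed in earlier genomes
        seen |= new
        curve.append((name, len(new), len(seen)))
    return curve


# Hypothetical toy data; the order of the list is the order of addition.
toy_genomes = [
    ("s1", {"a", "b", "c"}),
    ("s2", {"a", "b", "d"}),
    ("s3", {"a", "e", "f"}),
    ("s4", {"a", "b"}),
]
curve = accumulation_curve(toy_genomes)
for name, new, total in curve:
    print(f"{name}: +{new} new, pan-genome size {total}")
```

A curve that keeps rising as genomes are added suggests an open pan-genome, while one that flattens suggests saturation.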

RevDate: 2019-01-08

Tyakht AV, Manolov AI, Kanygina AV, et al (2018)

Genetic diversity of Escherichia coli in gut microbiota of patients with Crohn's disease discovered using metagenomic and genomic analyses.

BMC genomics, 19(1):968 pii:10.1186/s12864-018-5306-5.

BACKGROUND: Crohn's disease is associated with gut dysbiosis. Independent studies have shown an increase in the abundance of certain bacterial species, particularly Escherichia coli with the adherent-invasive pathotype, in the gut. The role of these species in this disease needs to be elucidated.

METHODS: We performed a metagenomic study investigating the gut microbiota of patients with Crohn's disease. A metagenomic reconstruction of the consensus genome content of the species was used to assess the genetic variability.

RESULTS: The abnormal shifts in the microbial community structures in Crohn's disease were heterogeneous among the patients. The metagenomic data suggested the existence of multiple E. coli strains within individual patients. We discovered that the genetic diversity of the species was high and that only a few samples manifested similarity to the adherent-invasive varieties. The other species demonstrated genetic diversity comparable to that observed in the healthy subjects. Our results were supported by a comparison of the sequenced genomes of isolates from the same microbiota samples and a meta-analysis of published gut metagenomes.

CONCLUSIONS: The genomic diversity of Crohn's disease-associated E. coli within and among the patients paves the way towards an understanding of the microbial mechanisms underlying the onset and progression of Crohn's disease and the development of new strategies for the prevention and treatment of this disease.

RevDate: 2019-01-09

Cheleuitte-Nieves C, Gulvik CA, McQuiston JR, et al (2018)

Genotypic differences between strains of the opportunistic pathogen Corynebacterium bovis isolated from humans, cows, and rodents.

PloS one, 13(12):e0209231 pii:PONE-D-18-24681.

Corynebacterium bovis is an opportunistic bacterial pathogen shown to cause eye and prosthetic joint infections as well as abscesses in humans, mastitis in dairy cattle, and skin disease in laboratory mice and rats. Little is known about the genetic characteristics and genomic diversity of C. bovis because only a single draft genome is available for the species. The overall aim of this study was to sequence and compare the genome of C. bovis isolates obtained from different species, locations, and time points. Whole-genome sequencing was conducted on 20 C. bovis isolates (six human, four bovine, nine mouse and one rat) using the Illumina MiSeq platform and submitted to various comparative analysis tools. Sequencing generated high-quality contigs (over 2.53 Mbp) that were comparable to the only reported assembly using C. bovis DSM 20582T (97.8 ± 0.36% completeness). The number of protein-coding DNA sequences (2,174 ± 12.4) was similar among all isolates. A Corynebacterium genus neighbor-joining tree was created, which revealed Corynebacterium falsenii as the nearest neighbor to C. bovis (95.87% similarity), although the reciprocal comparison shows Corynebacterium jeikeium as closest neighbor to C. falsenii. Interestingly, the average nucleotide identity demonstrated that the C. bovis isolates clustered by host, with human and bovine isolates clustering together, and the mouse and rat isolates forming a separate group. The average number of genomic islands and putative virulence factors were significantly higher (p<0.001) in the mouse and rat isolates as compared to human/bovine isolates. Corynebacterium bovis' pan-genome contained a total of 3,067 genes of which 1,354 represented core genes. The known core genes of all isolates were primarily related to "metabolism" and "information storage/processing." However, most genes were classified as "function unknown" or "unclassified".
Surprisingly, no intact prophages were found in any isolate; however, almost all isolates had at least one complete CRISPR-Cas system.

RevDate: 2018-12-25

Velsko IM, Chakraborty B, Nascimento MM, et al (2018)

Species Designations Belie Phenotypic and Genotypic Heterogeneity in Oral Streptococci.

mSystems, 3(6): pii:mSystems00158-18.

Health-associated oral Streptococcus species are promising probiotic candidates to protect against dental caries. Ammonia production through the arginine deiminase system (ADS), which can increase the pH of oral biofilms, and direct antagonism of caries-associated bacterial species are desirable properties for oral probiotic strains. ADS and antagonistic activities can vary dramatically among individuals, but the genetic basis for these differences is unknown. We sequenced whole genomes of a diverse set of clinical oral Streptococcus isolates and examined the genetic basis of variability in ADS and antagonistic activities. A total of 113 isolates were included and represented 10 species: Streptococcus australis, A12-like, S. cristatus, S. gordonii, S. intermedius, S. mitis, S. oralis including S. oralis subsp. dentisani, S. parasanguinis, S. salivarius, and S. sanguinis. Mean ADS activity and antagonism on Streptococcus mutans UA159 were measured for each isolate, and each isolate was whole genome shotgun sequenced on an Illumina MiSeq. Phylogenies were built of genes known to be involved in ADS activity and antagonism. Several approaches to correlate the pan-genome with phenotypes were performed. Phylogenies of genes previously identified in ADS activity and antagonism grouped isolates by species, but not by phenotype. A genome-wide association study (GWAS) identified additional genes potentially involved in ADS activity or antagonism across all the isolates we sequenced as well as within several species. Phenotypic heterogeneity in oral streptococci is not necessarily reflected by genotype and is not species specific. Probiotic strains must be carefully selected based on characterization of each strain and not based on inclusion within a certain species.

IMPORTANCE: Representative type strains are commonly used to characterize bacterial species, yet species are phenotypically and genotypically heterogeneous. Conclusions about strain physiology and activity based on a single strain therefore may be inappropriate and misleading. When selecting strains for probiotic use, the assumption that all strains within a species share the same desired probiotic characteristics may result in selection of a strain that lacks the desired traits, and therefore makes a minimally effective or ineffective probiotic. Health-associated oral streptococci are promising candidates for anticaries probiotics, but strains need to be carefully selected based on observed phenotypes. We characterized the genotypes and anticaries phenotypes of strains from 10 species of oral streptococci and demonstrate poor correlation between genotype and phenotype across all species.

RevDate: 2019-01-08

Potter RF, Lainhart W, Twentyman J, et al (2018)

Population Structure, Antibiotic Resistance, and Uropathogenicity of Klebsiella variicola.

mBio, 9(6): pii:mBio.02481-18.

Klebsiella variicola is a member of the Klebsiella genus and often misidentified as Klebsiella pneumoniae or Klebsiella quasipneumoniae. The importance of K. pneumoniae human infections has been known; however, a dearth of relative knowledge exists for K. variicola. Despite its growing clinical importance, comprehensive analyses of K. variicola population structure and mechanistic investigations of virulence factors and antibiotic resistance genes have not yet been performed. To address this, we utilized in silico, in vitro, and in vivo methods to study a cohort of K. variicola isolates and genomes. We found that the K. variicola population structure has two distant lineages composed of two and 143 genomes, respectively. Ten of 145 K. variicola genomes harbored carbapenem resistance genes, and 6/145 contained complete virulence operons. While the β-lactam blaLEN and quinolone oqxAB antibiotic resistance genes were generally conserved within our institutional cohort, unexpectedly 11 isolates were nonresistant to the β-lactam ampicillin and only one isolate was nonsusceptible to the quinolone ciprofloxacin. K. variicola isolates have variation in ability to cause urinary tract infections in a newly developed murine model, but importantly a strain had statistically significant higher bladder CFU than the model uropathogenic K. pneumoniae strain TOP52. Type 1 pilus and genomic identification of altered fim operon structure were associated with differences in bladder CFU for the tested strains. Nine newly reported types of pilus genes were discovered in the K. variicola pan-genome, including the first identified P-pilus in Klebsiella spp.

IMPORTANCE: Infections caused by antibiotic-resistant bacterial pathogens are a growing public health threat. Understanding of pathogen relatedness and biology is imperative for tracking outbreaks and developing therapeutics. Here, we detail the phylogenetic structure of 145 K. variicola genomes from different continents. Our results have important clinical ramifications as high-risk antibiotic resistance genes are present in K. variicola genomes from a variety of geographic locations and as we demonstrate that K. variicola clinical isolates can establish higher bladder titers than K. pneumoniae. Differential presence of these pilus genes in K. variicola isolates may indicate adaption for specific environmental niches. Therefore, due to the potential of multidrug resistance and pathogenic efficacy, identification of K. variicola and K. pneumoniae to a species level should be performed to optimally improve patient outcomes during infection. This work provides a foundation for our improved understanding of K. variicola biology and pathogenesis.

RevDate: 2019-01-10

Pang TY, MJ Lercher (2019)

Each of 3,323 metabolic innovations in the evolution of E. coli arose through the horizontal transfer of a single DNA segment.

Proceedings of the National Academy of Sciences of the United States of America, 116(1):187-192.

Even closely related prokaryotes often show an astounding diversity in their ability to grow in different nutritional environments. It has been hypothesized that complex metabolic adaptations-those requiring the independent acquisition of multiple new genes-can evolve via selectively neutral intermediates. However, it is unclear whether this neutral exploration of phenotype space occurs in nature, or what fraction of metabolic adaptations is indeed complex. Here, we reconstruct metabolic models for the ancestors of a phylogeny of 53 Escherichia coli strains, linking genotypes to phenotypes on a genome-wide, macroevolutionary scale. Based on the ancestral and extant metabolic models, we identify 3,323 phenotypic innovations in the history of the E. coli clade that arose through changes in accessory genome content. Of these innovations, 1,998 allow growth in previously inaccessible environments, while 1,325 increase biomass yield. Strikingly, every observed innovation arose through the horizontal acquisition of a single DNA segment less than 30 kb long. Although we found no evidence for the contribution of selectively neutral processes, 10.6% of metabolic innovations were facilitated by horizontal gene transfers on earlier phylogenetic branches, consistent with a stepwise adaptation to successive environments. Ninety-eight percent of metabolic phenotypes accessible to the combined E. coli pangenome can be bestowed on any individual strain by transferring a single DNA segment from one of the extant strains. These results demonstrate an amazing ability of the E. coli lineage to adapt to novel environments through single horizontal gene transfers (followed by regulatory adaptations), an ability likely mirrored in other clades of generalist bacteria.

RevDate: 2019-02-06

Jatuponwiphat T, Chumnanpuen P, Othman S, et al (2019)

Iron-associated protein interaction networks reveal the key functional modules related to survival and virulence of Pasteurella multocida.

Microbial pathogenesis, 127:257-266.

Pasteurella multocida causes respiratory infectious diseases in a multitude of birds and mammals. A number of virulence-associated genes were reported across different strains of P. multocida, including those involved in the iron transport and metabolism. Comparative iron-associated genes of P. multocida among different animal hosts towards their interaction networks have not been fully revealed. Therefore, this study aimed to identify the iron-associated genes from core- and pan-genomes of fourteen P. multocida strains and to construct iron-associated protein interaction networks using genome-scale network analysis which might be associated with the virulence. Results showed that these fourteen strains had 1587 genes in the core-genome and 3400 genes constituting their pan-genome. Out of these, 2651 genes associated with iron transport and metabolism were selected to construct the protein interaction networks and 361 genes were incorporated into the iron-associated protein interaction network (iPIN) consisting of nine different iron-associated functional modules. After comparing with the virulence factor database (VFDB), 21 virulence-associated proteins were determined and 11 of these belonged to the heme biosynthesis module. From this study, the core heme biosynthesis module and the core outer membrane hemoglobin receptor HgbA were proposed as candidate targets to design novel antibiotics and vaccines for preventing pasteurellosis across the serotypes or animal hosts for enhanced precision agriculture to ensure sustainability in food security.

RevDate: 2019-02-05

Moradigaravand D, Palm M, Farewell A, et al (2018)

Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data.

PLoS computational biology, 14(12):e1006258 pii:PCOMPBIOL-D-18-00901.

The emergence of microbial antibiotic resistance is a global health threat. In clinical settings, the key to controlling spread of resistant strains is accurate and rapid detection. As traditional culture-based methods are time consuming, genetic approaches have recently been developed for this task. The detection of antibiotic resistance is typically made by measuring a few known determinants previously identified from genome sequencing, and thus requires the prior knowledge of its biological mechanisms. To overcome this limitation, we employed machine learning models to predict resistance to 11 compounds across four classes of antibiotics from existing and novel whole genome sequences of 1936 E. coli strains. We considered a range of methods, and examined population structure, isolation year, gene content, and polymorphism information as predictors. Gradient boosted decision trees consistently outperformed alternative models with an average accuracy of 0.91 on held-out data (range 0.81-0.97). While the best models most frequently employed gene content, an average accuracy score of 0.79 could be obtained using population structure information alone. Single nucleotide variation data were less useful, and significantly improved prediction only for two antibiotics, including ciprofloxacin. These results demonstrate that antibiotic resistance in E. coli can be accurately predicted from whole genome sequences without a priori knowledge of mechanisms, and that both genomic and epidemiological data can be informative. This paves the way to integrating machine learning approaches into diagnostic tools in the clinic.

RevDate: 2018-12-17

Colson P, Levasseur A, La Scola B, et al (2018)

Ancestrality and Mosaicism of Giant Viruses Supporting the Definition of the Fourth TRUC of Microbes.

Frontiers in microbiology, 9:2668.

Giant viruses of amoebae were discovered in 2003. Since then, their diversity has greatly expanded. They were suggested to form a fourth branch of life, collectively named 'TRUC' (for "Things Resisting Uncompleted Classifications") alongside Bacteria, Archaea, and Eukarya. Their origin and ancestrality remain controversial. Here, we specify the evolution and definition of giant viruses. Phylogenetic and phenetic analyses of informational gene repertoires of giant viruses and selected bacteria, archaea and eukaryota were performed, including structural phylogenomics based on protein structural domains grouped into 289 universal fold superfamilies (FSFs). Hierarchical clustering analysis was performed based on a binary presence/absence matrix constructed using 727 informational COGs from cellular organisms. The presence/absence of 'universal' FSF domains was used to generate an unrooted maximum parsimony phylogenomic tree. Comparison of the gene content of a giant virus with those of a bacterium, an archaeon, and a eukaryote with small genomes was also performed. Overall, both cladistic analyses based on gene sequences of very central and ancient proteins and on highly conserved protein fold structures as well as phenetic analyses were congruent regarding the delineation of a fourth branch of microbes comprised by giant viruses. Giant viruses appeared as a basal group in the tree of all proteomes. A pangenome and core genome determined for Rickettsia bellii (bacteria), Methanomassiliicoccus luminyensis (archaeon), Encephalitozoon intestinalis (eukaryote), and Tupanvirus (giant virus) showed a substantial proportion of Tupanvirus genes that overlap with those of the cellular microbes. In addition, a substantial genome mosaicism was observed, with 51, 11, 8, and 0.2% of Tupanvirus genes best matching with viruses, eukaryota, bacteria, and archaea, respectively. Finally, we found that genes themselves may be subject to lateral sequence transfers. 
In summary, our data highlight the quantum leap between classical and giant viruses. Phylogenetic and phyletic analyses and the study of protein fold superfamilies confirm previous evidence of the existence of a fourth TRUC of life that includes giant viruses, and highlight its ancestrality and mosaicism. They also point out that best evolutionary representations for giant viruses and cellular microorganisms are rhizomes, and that sequence transfers rather than gene transfers have to be considered.

RevDate: 2018-12-17

Lees JA, Galardini M, Bentley SD, et al (2018)

pyseer: a comprehensive tool for microbial pangenome-wide association studies.

Bioinformatics (Oxford, England), 34(24):4310-4312.

Summary: Genome-wide association studies (GWAS) in microbes have different challenges to GWAS in eukaryotes. These have been addressed by a number of different methods. pyseer brings these techniques together in one package tailored to microbial GWAS, allows greater flexibility of the input data used, and adds new methods to interpret the association results.

pyseer is written in python and is freely available at, or can be installed through pip. Documentation and a tutorial are available at

Supplementary information: Supplementary data are available at Bioinformatics online.

RevDate: 2019-01-22

Fritsch L, Felten A, Palma F, et al (2019)

Insights from genome-wide approaches to identify variants associated to phenotypes at pan-genome scale: Application to L. monocytogenes' ability to grow in cold conditions.

International journal of food microbiology, 291:181-188.

Intraspecific variability of the behavior of most foodborne pathogens is well described and taken into account in Quantitative Microbial Risk Assessment (QMRA), but factors (strain origin, serotype, …) explaining these differences are scarce or contradictory between studies. Nowadays, Whole Genome Sequencing (WGS) offers new opportunities to explain intraspecific variability of food pathogens, based on various recently published bioinformatics tools. The objective of this study is to get a better insight into different existing bioinformatics approaches to associate bacterial phenotype(s) and genotype(s). Therefore, a dataset of 51 L. monocytogenes strains, isolated from multiple sources (i.e. different food matrices and environments) and belonging to 17 clonal complexes (CC), were selected to represent large population diversity. Furthermore, the phenotypic variability of growth at low temperature was determined (i.e. qualitative phenotype), and the whole genomes of selected strains were sequenced. The almost exhaustive gene content, as well as the core genome SNPs based phylogenetic reconstruction, were derived from the whole sequenced genomes. A Bayesian inference method was applied to identify the branches on which the phenotype distribution evolves within sub-lineages. Two different Genome Wide Association Studies (i.e. gene- and SNP-based GWAS) were independently performed in order to link genetic mutations to the phenotype of interest. The genomic analyses presented in this study were successfully applied on the selected dataset. The Bayesian phylogenetic approach emphasized an association with "slow" growth ability at 2 °C of the lineage I, as well as CC9 of the lineage II. Moreover, both gene- and SNP-GWAS approaches displayed significant statistical associations with the tested phenotype. A list of 114 significantly associated genes, including genes already known to be involved in the cold adaption mechanism of L. monocytogenes and genes associated to mobile genetic elements (MGE), resulted from the gene-GWAS. On the other hand, a group of 184 highly associated SNPs were highlighted by SNP-GWAS, including SNPs detected in genes which were already likely involved in cold adaption; hypothetical proteins; and intergenic regions where for example promoters and regulators can be located. The successful application of combined bioinformatics approaches associating WGS-genotypes and specific phenotypes, could contribute to improve prediction of microbial behaviors in food. The implementation of this information in hazard identification and exposure assessment processes will open new possibilities to feed QMRA-models.
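
As a rough illustration of what a gene-based association test does at its simplest, the sketch below runs a two-sided Fisher's exact test on a 2x2 table of gene presence/absence against a qualitative phenotype. This is not the method or software used in the study above: the function name and table layout are hypothetical, and the test ignores population structure, which microbial GWAS tools are designed to handle.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table

                     phenotype+   phenotype-
        gene present     a            b
        gene absent      c            d

    (hypothetical layout chosen for this illustration).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell
        # when the row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
```

In practice one such test would be run per gene family, followed by a multiple-testing correction; both steps are omitted here.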

RevDate: 2018-12-17

Timms VJ, Nguyen T, Crighton T, et al (2018)

Genome-wide comparison of Corynebacterium diphtheriae isolates from Australia identifies differences in the Pan-genomes between respiratory and cutaneous strains.

BMC genomics, 19(1):869.

BACKGROUND: Corynebacterium diphtheriae is the main etiological agent of diphtheria, a global disease causing life-threatening infections, particularly in infants and children. Vaccination with diphtheria toxoid protects against infection with potent toxin producing strains. However a growing number of apparently non-toxigenic but potentially invasive C. diphtheriae strains are identified in countries with low prevalence of diphtheria, raising key questions about genomic structures and population dynamics of the species. This study examined genomic diversity among 48 C. diphtheriae isolates collected in Australia over a 12-year period using whole genome sequencing. Phylogeny was determined using SNP-based mapping and genome wide analysis.

RESULTS: C. diphtheriae sequence type (ST) 32, a non-toxigenic clone with evidence of enhanced virulence that has been also circulating in Europe, appears to be endemic in Australia. Isolates from temporospatially related patients displayed the same ST and similarity in their core genomes. The genome-wide analysis highlighted a role of pilins, adhesion factors and iron utilization in infections caused by non-toxigenic strains.

CONCLUSIONS: The genomic diversity of toxigenic and non-toxigenic strains of C. diphtheriae in Australia suggests multiple sources of infection and colonisation. Genomic surveillance of co-circulating toxigenic and non-toxigenic C. diphtheriae offers new insights into the evolution and virulence of pathogenic clones and can inform targeted public health actions and policy. The genomes presented in this investigation will contribute to the global surveillance of C. diphtheriae both for the monitoring of antibiotic resistance genes and virulent strains such as those belonging to ST32.

RevDate: 2019-01-07
CmpDate: 2019-01-07

Bonnici V, Giugno R, V Manca (2018)

PanDelos: a dictionary-based method for pan-genome content discovery.

BMC bioinformatics, 19(Suppl 15):437.

BACKGROUND: Pan-genome approaches afford the discovery of homology relations in a set of genomes, by determining how some gene families are distributed among a given set of genomes. The retrieval of a complete gene distribution among a class of genomes is an NP-hard problem because computational costs increase with the number of analyzed genomes, in fact, all-against-all gene comparisons are required to completely solve the problem. In presence of phylogenetically distant genomes, due to the variability introduced in gene duplication and transmission, the task of recognizing homologous genes becomes even more difficult. A challenge on this field is that of designing fast and adaptive similarity measures in order to find a suitable pan-genome structure of homology relations.

RESULTS: We present PanDelos, a stand alone tool for the discovery of pan-genome contents among phylogenetic distant genomes. The methodology is based on information theory and network analysis. It is parameter-free because thresholds are automatically deduced from the context. PanDelos avoids sequence alignment by introducing a measure based on k-mer multiplicity. The k-mer length is defined according to general arguments rather than empirical considerations. Homology candidate relations are integrated into a global network and groups of homologous genes are extracted by applying a community detection algorithm.

CONCLUSIONS: PanDelos outperforms existing approaches, Roary and EDGAR, in terms of running times and quality content discovery. Tests were run on collections of real genomes, previously used in analogous studies, and in synthetic benchmarks that represent fully trusted golden truth. The software is available at .
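
To make the idea of an alignment-free, k-mer-multiplicity-based comparison concrete, here is a minimal sketch that scores two sequences by the generalized (multiset) Jaccard index of their k-mer counts. This is an assumed, simplified stand-in and not PanDelos's actual measure; the function names are hypothetical, and k is passed in by hand rather than derived from general arguments as the abstract describes.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (its k-mer multiplicities)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def multiset_jaccard(a: str, b: str, k: int) -> float:
    """Generalized Jaccard similarity between the k-mer multisets of a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    keys = set(ca) | set(cb)
    shared = sum(min(ca[x], cb[x]) for x in keys)  # multiset intersection size
    total = sum(max(ca[x], cb[x]) for x in keys)   # multiset union size
    return shared / total if total else 0.0
```

Pairs scoring above some threshold would become candidate homology edges in a network, from which gene families could then be extracted by community detection.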

RevDate: 2019-01-18

Freschi L, Vincent AT, Jeukens J, et al (2019)

The Pseudomonas aeruginosa Pan-Genome Provides New Insights on Its Population Structure, Horizontal Gene Transfer, and Pathogenicity.

Genome biology and evolution, 11(1):109-120 pii:5215156.

The huge increase in the availability of bacterial genomes led us to a point in which we can investigate and query pan-genomes, for example, the full set of genes of a given bacterial species or clade. Here, we used a data set of 1,311 high-quality genomes from the human pathogen Pseudomonas aeruginosa, 619 of which were newly sequenced, to show that a pan-genomic approach can greatly refine the population structure of bacterial species, provide new insights to define species boundaries, and generate hypotheses on the evolution of pathogenicity. The 665-gene P. aeruginosa core genome presented here, which constitutes only 1% of the entire pan-genome, is the first to be in the same order of magnitude as the minimal bacterial genome and represents a conservative estimate of the actual core genome. Moreover, the phylogeny based on this core genome provides strong evidence for a five-group population structure that includes two previously undescribed groups of isolates. Comparative genomics focusing on antimicrobial resistance and virulence genes showed that variation among isolates was partly linked to this population structure. Finally, we hypothesized that horizontal gene transfer had an important role in this respect, and found a total of 3,010 putative complete and fragmented plasmids, 5% and 12% of which contained resistance or virulence genes, respectively. This work provides data and strategies to study the evolutionary trajectories of resistance and virulence in P. aeruginosa.

RevDate: 2019-01-02
CmpDate: 2019-01-02

Méric G, Mageiros L, Pensar J, et al (2018)

Disease-associated genotypes of the commensal skin bacterium Staphylococcus epidermidis.

Nature communications, 9(1):5034.

Some of the most common infectious diseases are caused by bacteria that naturally colonise humans asymptomatically. Combating these opportunistic pathogens requires an understanding of the traits that differentiate infecting strains from harmless relatives. Staphylococcus epidermidis is carried asymptomatically on the skin and mucous membranes of virtually all humans but is a major cause of nosocomial infection associated with invasive procedures. Here we address the underlying evolutionary mechanisms of opportunistic pathogenicity by combining pangenome-wide association studies and laboratory microbiology to compare S. epidermidis from bloodstream and wound infections and asymptomatic carriage. We identify 61 genes containing infection-associated genetic elements (k-mers) that correlate with in vitro variation in known pathogenicity traits (biofilm formation, cell toxicity, interleukin-8 production, methicillin resistance). Horizontal gene transfer spreads these elements, allowing divergent clones to cause infection. Finally, Random Forest model prediction of disease status (carriage vs. infection) identifies pathogenicity elements in 415 S. epidermidis isolates with 80% accuracy, demonstrating the potential for identifying risk genotypes pre-operatively.

RevDate: 2018-12-07

Wang R, Li L, Huang T, et al (2018)

Capsular Switching and ICE Transformation Occurred in Human Streptococcus agalactiae ST19 With High Pathogenicity to Fish.

Frontiers in veterinary science, 5:281.

Although Streptococcus agalactiae (GBS) cross-infection between human and fish has been confirmed in experimental and clinical studies, the mechanisms underlying GBS cross-species infection remain largely unclear. We have found different human GBS ST19 strains exhibiting strong or weak pathogenicity to fish (sGBS and wGBS). In this study, our objective was to identify the genetic elements responsible for GBS cross-species infection based on genome sequence data and comparative genomics. The genomes of 11 sGBS strains and 11 wGBS strains were sequenced, and the genomic analysis was performed based on pan-genome, CRISPRs, phylogenetic reconstruction and genome comparison. The results from the pan-genome, CRISPRs analysis and phylogenetic reconstruction indicated that the genomes of sGBS were more conserved than those of wGBS. The genomic differences between sGBS and wGBS were primarily in the Cps region (about 111 kb) and its adjacent ICE region (about 106 kb). The Cps region included the entire cps operon, and all sGBS were capsular polysaccharide (CPS) type V, while all wGBS were CPS type III. The ICE region of sGBS contained integrative and conjugative elements (ICE) with IQ element and erm(TR), and was very conserved, whereas the ICE region of wGBS contained ICE with mega elements and the variation was large. The capsular switching (III-V) and transformation of ICE adjacent to the Cps region occurred in human GBS ST19 with different pathogenicity to fish, which may be related to the capability of GBS cross-infection.

RevDate: 2018-12-07

Gallo G, Presta L, Perrin E, et al (2018)

Genomic traits of Klebsiella oxytoca DSM 29614, an uncommon metal-nanoparticle producer strain isolated from acid mine drainages.

BMC microbiology, 18(1):198.

BACKGROUND: Klebsiella oxytoca DSM 29614 - isolated from acid mine drainages - grows anaerobically using Fe(III)-citrate as sole carbon and energy source, unlike other enterobacteria and K. oxytoca clinical isolates. The DSM 29614 strain is multi metal resistant and produces metal nanoparticles that are embedded in its very peculiar capsular exopolysaccharide. These metal nanoparticles were effective as antimicrobial and anticancer compounds, chemical catalysts and nano-fertilizers.

RESULTS: The DSM 29614 strain genome was sequenced and analysed by a combination of in silico procedures. Comparative genomics, performed between 85 K. oxytoca representatives and K. oxytoca DSM 29614, revealed that this bacterial group has an open pangenome, characterized by a very small core genome (1009 genes, about 2%), a high fraction of unique (43,808 genes, about 87%) and accessory genes (5559 genes, about 11%). Proteins belonging to COG categories "Carbohydrate transport and metabolism" (G), "Amino acid transport and metabolism" (E), "Coenzyme transport and metabolism" (H), "Inorganic ion transport and metabolism" (P), and "membrane biogenesis-related proteins" (M) are particularly abundant in the predicted proteome of DSM 29614 strain. The results of a protein functional enrichment analysis - based on a previous proteomic analysis - revealed metabolic optimization during Fe(III)-citrate anaerobic utilization. In this growth condition, the observed high levels of Fe(II) may be due to different flavin metal reductases and siderophores as inferred from genome analysis. The presence of genes responsible for the synthesis of exopolysaccharide and for the tolerance to heavy metals was highlighted too. The inferred genomic insights were confirmed by a set of phenotypic tests showing specific metabolic capability in terms of i) Fe2+ and exopolysaccharide production and ii) phosphatase activity involved in precipitation of metal ion-phosphate salts.

CONCLUSION: The K. oxytoca DSM 29614 unique capabilities of using Fe(III)-citrate as sole carbon and energy source in anaerobiosis and tolerating diverse metals coincides with the presence at the genomic level of specific genes that can support i) energy metabolism optimization, ii) cell protection by the biosynthesis of a peculiar exopolysaccharide armour entrapping metal ions and iii) general and metal-specific detoxifying activities by different proteins and metabolites.
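
The core/accessory/unique split reported in the RESULTS above can be illustrated with a small sketch. It assumes gene families have already been clustered, so each genome is simply a set of family identifiers; the function names and input layout are hypothetical and do not reflect the pipeline used in the study.

```python
def partition_pangenome(presence: dict[str, set[str]]) -> dict[str, set[str]]:
    """Split gene families into core, accessory and unique sets.

    presence maps a genome name to the set of gene-family ids it carries
    (an assumed input format for this illustration).
    """
    n = len(presence)
    counts: dict[str, int] = {}
    for genes in presence.values():
        for g in genes:
            counts[g] = counts.get(g, 0) + 1
    core = {g for g, c in counts.items() if c == n}                # in every genome
    unique = {g for g, c in counts.items() if c == 1 and n > 1}    # in exactly one genome
    accessory = set(counts) - core - unique                        # in some, but not all
    return {"core": core, "accessory": accessory, "unique": unique}


def shares(sizes: dict[str, int]) -> dict[str, int]:
    """Express category sizes as rounded percentages of the pangenome."""
    total = sum(sizes.values())
    return {name: round(100 * size / total) for name, size in sizes.items()}
```

Applied to the counts quoted in the abstract, `shares({"core": 1009, "unique": 43808, "accessory": 5559})` gives 2, 87 and 11 percent of a 50,376-gene pangenome, matching the reported proportions.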

RevDate: 2018-11-22

Abudahab K, Prada JM, Yang Z, et al (2018)

PANINI: Pangenome Neighbour Identification for Bacterial Populations.

Microbial genomics [Epub ahead of print].

The standard workhorse for genomic analysis of the evolution of bacterial populations is phylogenetic modelling of mutations in the core genome. However, a notable amount of information about evolutionary and transmission processes in diverse populations can be lost unless the accessory genome is also taken into consideration. Here, we introduce panini (Pangenome Neighbour Identification for Bacterial Populations), a computationally scalable method for identifying the neighbours for each isolate in a data set using unsupervised machine learning with stochastic neighbour embedding based on the t-SNE (t-distributed stochastic neighbour embedding) algorithm. panini is browser-based and integrates with the Microreact platform for rapid online visualization and exploration of both core and accessory genome evolutionary signals, together with relevant epidemiological, geographical, temporal and other metadata. Several case studies with single- and multi-clone pneumococcal populations are presented to demonstrate the ability to identify biologically important signals from gene content data. panini is available at and code at

RevDate: 2018-12-07

Wang D, Li J, L Wang (2018)

Comprehensive study of instable regions in Pseudomonas aeruginosa and Mycobacterium tuberculosis.

Biomedical engineering online, 17(Suppl 1):133.

BACKGROUND: Pseudomonas aeruginosa is a common bacterium which is recognized for its association with hospital-acquired infections and its advanced antibiotic resistance mechanisms. Tuberculosis, one of the major causes of mortality, is initiated by the deposition of Mycobacterium tuberculosis. Accessory sequences shared by a subset of strains of a species play an important role in a species' evolution, antibiotic resistance and infectious potential.

RESULTS: Here, with a multiple sequence aligner, we segmented 25 P. aeruginosa genomes and 28 M. tuberculosis genomes into core blocks (which include sequences shared by all the input genomes) and dispensable blocks (which include sequences shared by a subset of the input genomes), respectively. For each input genome, we then constructed a scaffold consisting of its core and dispensable blocks sorted by the blocks' locations on the chromosomes. Consecutive dispensable blocks on these scaffolds formed instable regions. After a comprehensive study of these instable regions, three characteristics of instable regions are summarized: instable regions were short, site specific and varied in different strains. Three DNA elements (directed repeats (DRs), transposons and integrons) were then studied to see whether these DNA elements are associated with the variation of instable regions. A pipeline was developed to search for DR pairs on the flank of every instable sequence. 27 DR pairs in P. aeruginosa strains and 6 pairs in M. tuberculosis strains were found to exist in the instable regions. On average, 14% and 12% of instable regions in P. aeruginosa strains covered transposase genes and integrase genes, respectively. In M. tuberculosis strains, an average of 43% and 8% of instable regions contain transposase genes and integrase genes, respectively.

CONCLUSIONS: Instable regions were short, site specific and varied in different strains for both P. aeruginosa and M. tuberculosis. Our experimental results showed that DRs, transposons and integrons may be associated with variation of instable regions.
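
The scaffold step described in the RESULTS above, where consecutive dispensable blocks are merged into instable regions, might look roughly like the following sketch. The tuple layout and function name are hypothetical, and treating a lone dispensable block as a region of length one is an assumption the abstract does not settle.

```python
from itertools import groupby


def instable_regions(scaffold: list[tuple[str, str]]) -> list[list[str]]:
    """Group runs of consecutive dispensable blocks into instable regions.

    scaffold is a list of (block_id, kind) pairs in chromosome order, where
    kind is "core" or "dispensable" (an assumed representation).
    """
    regions = []
    # groupby only merges adjacent items, so each maximal run of
    # dispensable blocks becomes exactly one region.
    for kind, run in groupby(scaffold, key=lambda block: block[1]):
        if kind == "dispensable":
            regions.append([block_id for block_id, _ in run])
    return regions
```

For a scaffold ordered core, dispensable, dispensable, core, dispensable, this yields two regions: one holding the first two dispensable blocks and one holding the last.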

RevDate: 2019-01-24

Sherman RM, Forman J, Antonescu V, et al (2019)

Assembly of a pan-genome from deep sequencing of 910 humans of African descent.

Nature genetics, 51(1):30-35.

We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). We then compared all contigs to one another to identify a set of unique sequences representing regions of the African pan-genome missing from the reference genome. Our analysis revealed 296,485,284 bp in 125,715 distinct contigs present in the populations of African descent, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome. Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes, and the rest appear to be intergenic.

RevDate: 2018-11-26

Kayansamruaj P, Soontara C, Unajak S, et al (2018)

Comparative genomics inferred two distinct populations of piscine pathogenic Streptococcus agalactiae, serotype Ia ST7 and serotype III ST283, in Thailand and Vietnam.

Genomics pii:S0888-7543(18)30485-3 [Epub ahead of print].

The genomes of Streptococcus agalactiae (group B streptococcus; GBS) collected from diseased fish in Thailand and Vietnam over a nine-year period (2008-2016) were sequenced and compared (n = 21). Based on capsular serotype and multilocus sequence typing (MLST), GBS isolates are divided into 2 groups comprised of i) serotype Ia; sequence type (ST)7 and ii) serotype III; ST283. Population structure inferred by core genome (cg)MLST and Bayesian clustering analysis also strongly indicated distribution of two GBS populations in both Thailand and Vietnam. Deep phylogenetic analysis implied by CRISPR array's spacer diversity was able to cluster GBS isolates according to their temporal and geographic origins, though ST7 has varying CRISPR1-spacer profiles when compared to ST283 strains. Based on overall genotypic features, Thai ST283 strains were closely related to the Singaporean ST283 strain causing foodborne illness in humans in 2015, thus, signifying zoonotic potential of this GBS population in the country.

RevDate: 2018-12-18

Monat C, Schreiber M, Stein N, et al (2018)

Prospects of pan-genomics in barley.

TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik pii:10.1007/s00122-018-3234-z [Epub ahead of print].

The concept of a pan-genome refers to intraspecific diversity in genome content and structure, encompassing both genes and intergenic space. Pan-genomic studies employ a combination of de novo sequence assembly and reference-based alignment to discover and genotype structural variants. The large size and complex structure of Triticeae genomes were for a long time an obstacle for genomic research in barley and its relatives. Now that a reference genome is available, computational pipelines for high-quality sequence assembly are in place, and sequence costs continue to drop, investigations into the structural diversity of the barley genome seem within reach. Here, we review the recent progress on pan-genomics in the model grass Brachypodium distachyon, and the cereal crops rice and maize, and devise a multi-tiered strategy for a pan-genome project in barley. Our design involves: (1) the construction of high-quality de novo sequence assemblies for a small core set of representative genotypes, (2) short-read sequencing of a large diversity panel of genebank accessions to medium coverage and (3) the use of complementary methods such as chromosome-conformation capture sequencing and k-mer-based association genetics. The in silico representation of the barley pan-genome may inform about the mechanisms of structural genome evolution in the Triticeae and supplement quantitative genetics models of crop performance for better accuracy and predictive ability.

RevDate: 2018-11-23

Mohapatra B, Kazy SK, P Sar (2018)

Comparative genome analysis of arsenic reducing, hydrocarbon metabolizing groundwater bacterium Achromobacter sp. KAs 3-5T explains its competitive edge for survival in aquifer environment.

Genomics pii:S0888-7543(18)30421-X [Epub ahead of print].

Whole genome sequence of arsenic (As) reducing, hydrocarbon metabolizing groundwater bacterium Achromobacter sp. KAs 3-5T was explored to understand the genomic basis of its As-ecophysiology and niche adaptation in aquifer environment. The genome (5.6 Mbp, 65.5 G + C mol %) encodes 4840 proteins, 1138 enzymes, 53 tRNAs, 11 rRNAs, 608 signal peptides, and 1.13% horizontally transferred genes. Presence of genes encoding cytosolic As5+-reduction (arsRCBH, ACR3), aromatics utilization (bph, naph, catABC, boxABCD, genACB), Fe-transformation (tonB, achromobactin, FUR, FeR), and denitrification (nar, nap) processes were observed and validated through proteomics. Phylogenomic analysis (< 90% ANI, < 50% DDH) confirmed strain KAs 3-5T to be a novel representative of the genus Achromobacter. An asymptotic open pan-genome (20,855 genes) and high correlation between genomic and ecological diversity suggested niche preference ability of this genus. Assemblage of species specific genes affiliated to transcription-regulation, membrane transport, and redox-transformation explained the strain's competitive survival strategies in As-rich oligotrophic groundwater.

RevDate: 2018-12-07

Fontana A, Zacconi C, L Morelli (2018)

Genetic Signatures of Dairy Lactobacillus casei Group.

Frontiers in microbiology, 9:2611.

Lactobacillus casei/Lactobacillus paracasei group of species contains strains adapted to a wide range of environments, from dairy products to intestinal tract of animals and fermented vegetables. Understanding the gene acquisitions and losses that induced such different adaptations, implies a comparison between complete genomes, since evolutionary differences spread on the whole sequence. This study compared 12 complete genomes of L. casei/paracasei dairy-niche isolates and 7 genomes of L. casei/paracasei isolated from other habitats (i.e., corn silage, human intestine, sauerkraut, beef, congee). Phylogenetic tree construction and average nucleotide identity (ANI) metric showed a clustering of the two dairy L. casei strains ATCC393 and LC5, indicating a lower genetic relatedness in comparison to the other strains. Genomic analysis revealed a core of 313 genes shared by dairy and non-dairy Lactic Acid bacteria (LAB), within a pan-genome of 9,462 genes. Functional category analyses highlighted the evolutionary genes decay of dairy isolates, particularly considering carbohydrates and amino acids metabolisms. Specifically, dairy L. casei/paracasei strains lost the ability to metabolize myo-inositol and taurine (i.e., iol and tau gene clusters). However, gene acquisitions by dairy strains were also highlighted, mostly related to defense mechanisms and host-pathogen interactions (i.e., yueB, esaA, and sle1). This study aimed to be a preliminary investigation on dairy and non-dairy marker genes that could be further characterized for probiotics or food applications.

RevDate: 2018-12-07

Hiller NL, R Sá-Leão (2018)

Puzzling Over the Pneumococcal Pangenome.

Frontiers in microbiology, 9:2580.

The Gram positive bacterium Streptococcus pneumoniae (pneumococcus) is a major human pathogen. It is a common colonizer of the human host, and in the nasopharynx, sinus, and middle ear it survives as a biofilm. This mode of growth is optimal for multi-strain colonization and genetic exchange. Over the last decades, the far-reaching use of antibiotics and the widespread implementation of pneumococcal multivalent conjugate vaccines have posed considerable selective pressure on pneumococci. This scenario provides an exceptional opportunity to study the evolution of the pangenome of a clinically important bacterium, and has the potential to serve as a case study for other species. The goal of this review is to highlight key findings in the studies of pneumococcal genomic diversity and plasticity.

RevDate: 2019-01-16

Biessy A, Novinscak A, Blom J, et al (2019)

Diversity of phytobeneficial traits revealed by whole-genome analysis of worldwide-isolated phenazine-producing Pseudomonas spp.

Environmental microbiology, 21(1):437-455.

Plant-beneficial Pseudomonas spp. competitively colonize the rhizosphere and display plant-growth promotion and/or disease-suppression activities. Some strains within the P. fluorescens species complex produce phenazine derivatives, such as phenazine-1-carboxylic acid. These antimicrobial compounds are broadly inhibitory to numerous soil-dwelling plant pathogens and play a role in the ecological competence of phenazine-producing Pseudomonas spp. We assembled a collection encompassing 63 strains representative of the worldwide diversity of plant-beneficial phenazine-producing Pseudomonas spp. In this study, we report the sequencing of 58 complete genomes using PacBio RS II sequencing technology. Distributed among four subgroups within the P. fluorescens species complex, the diversity of our collection is reflected by the large pangenome which accounts for 25 413 protein-coding genes. We identified genes and clusters encoding for numerous phytobeneficial traits, including antibiotics, siderophores and cyclic lipopeptides biosynthesis, some of which were previously unknown in these microorganisms. Finally, we gained insight into the evolutionary history of the phenazine biosynthetic operon. Given its diverse genomic context, it is likely that this operon was relocated several times during Pseudomonas evolution. Our findings acknowledge the tremendous diversity of plant-beneficial phenazine-producing Pseudomonas spp., paving the way for comparative analyses to identify new genetic determinants involved in biocontrol, plant-growth promotion and rhizosphere competence.

RevDate: 2018-11-23

Nanayakkara BS, O'Brien CL, DM Gordon (2018)

Diversity and distribution of Klebsiella capsules in Escherichia coli.

Environmental microbiology reports [Epub ahead of print].

E. coli strains responsible for elevated counts (blooms) in freshwater reservoirs in Australia carry a capsule originating from Klebsiella. The occurrence of Klebsiella capsules in E. coli was about 7% overall and 23 different capsule types were detected. Capsules were observed in strains from phylogroups A, B1 and C, but were absent from phylogroup B2, D, E and F strains. In general, few A, B1 or C lineages were capsule-positive, but when a lineage was encapsulated multiple different capsule types were present. All Klebsiella capsule-positive strains were of serogroups O8, O9 and O89. Regardless of the phylogroup, O9 strains were more likely to be capsule-positive than O8 strains. Given the sequence similarity, it appears that both the capsule region and the O-antigen gene region are transferred to E. coli from Klebsiella as a single block via horizontal gene transfer events. Pan genome analysis indicated that there were only modest differences between encapsulated and non-encapsulated strains belonging to phylogroup A. The possession of a Klebsiella capsule, but not the type of capsule, is likely a key determinant of the bloom status of a strain.

RevDate: 2018-11-14

Al-Bassam MM, Haist J, Neumann SA, et al (2018)

Expression Patterns, Genomic Conservation and Input Into Developmental Regulation of the GGDEF/EAL/HD-GYP Domain Proteins in Streptomyces.

Frontiers in microbiology, 9:2524.

To proliferate, antibiotic-producing Streptomyces undergo a complex developmental transition from vegetative growth to the production of aerial hyphae and spores. This morphological switch is controlled by the signaling molecule cyclic bis-(3',5') di-guanosine-mono-phosphate (c-di-GMP) that binds to the master developmental regulator, BldD, leading to repression of key sporulation genes during vegetative growth. However, a systematical analysis of all the GGDEF/EAL/HD-GYP proteins that control c-di-GMP levels in Streptomyces is still lacking. Here, we have FLAG-tagged all 10 c-di-GMP turnover proteins in Streptomyces venezuelae and characterized their expression patterns throughout the life cycle, revealing that the diguanylate cyclase (DGC) CdgB and the phosphodiesterase (PDE) RmdB are the most abundant GGDEF/EAL proteins. Moreover, we have deleted all the genes coding for c-di-GMP turnover enzymes individually and analyzed morphogenesis of the mutants in macrocolonies. We show that the composite GGDEF-EAL protein CdgC is an active DGC and that deletion of the DGCs cdgB and cdgC enhance sporulation whereas deletion of the PDEs rmdA and rmdB delay development in S. venezuelae. By comparing the pan genome of 93 fully sequenced Streptomyces species we show that the DGCs CdgA, CdgB, and CdgC, and the PDE RmdB represent the most conserved c-di-GMP-signaling proteins in the genus Streptomyces.

RevDate: 2018-12-28

Pinto M, González-Díaz A, Machado MP, et al (2019)

Insights into the population structure and pan-genome of Haemophilus influenzae.

Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases, 67:126-135.

The human-restricted bacterium Haemophilus influenzae is responsible for respiratory infections in both children and adults. While colonization begins in the upper airways, it can spread throughout the respiratory tract potentially leading to invasive infections. Although the spread of H. influenzae serotype b (Hib) has been prevented by vaccination, the emergence of infections by other serotypes as well as by non-typeable isolates (NTHi) has been observed, prompting the need for novel prevention strategies. Here, we aimed to study the population structure of H. influenzae and to get some insights into its pan-genome. We studied 305 H. influenzae strains, enrolling 217 publicly available genomes, as well as 88 newly sequenced H. influenzae invasive strains isolated in Portugal, spanning a 24-year period. NTHi isolates presented a core-SNP-based genetic diversity about 10-fold higher than the one observed for Hib. The analysis of key factors involved in pathogenesis, such as lipooligosaccharides, hemagglutinating pili and High Molecular Weight-adhesins, suggests that NTHi shape its virulence repertoire, either by acquisition and loss of genes or by SNP-based diversification, likely towards host immune evasion and persistence. Discrete NTHi subpopulation structures are proposed based on core-genome supported with 17 candidate genetic markers identified in the accessory genome. Additionally, this study provides two bioinformatics tools for in silico rapid identification of H. influenzae serotypes and NTHi clades previously proposed, obviating laboratory-based demanding procedures. The present study constitutes an important genomic framework that could pave the way for future studies on the genetic determinants underlying invasiveness and disease and population structure of H. influenzae.

RevDate: 2018-11-14

Wüthrich D, Irmler S, Berthoud H, et al (2018)

Conversion of Methionine to Cysteine in Lactobacillus paracasei Depends on the Highly Mobile cysK-ctl-cysE Gene Cluster.

Frontiers in microbiology, 9:2415.

Milk and dairy products are rich in nutrients and are therefore habitats for various microbiomes. However, the composition of nutrients can be quite diverse, in particular among the sulfur containing amino acids. In milk, methionine is present in a 25-fold higher abundance than cysteine. Interestingly, a fraction of strains of the species L. paracasei - a flavor-enhancing adjunct culture species - can grow in medium with methionine as the sole sulfur source. In this study, we focus on genomic and evolutionary aspects of sulfur dependence in L. paracasei strains. From 24 selected L. paracasei strains, 16 strains can grow in medium with methionine as sole sulfur source. We sequenced these strains to perform gene-trait matching. We found that one gene cluster - consisting of a cysteine synthase, a cystathionine lyase, and a serine acetyltransferase - is present in all strains that grow in medium with methionine as sole sulfur source. In contrast, strains that depend on other sulfur sources do not have this gene cluster. We expanded the study and searched for this gene cluster in other species and detected it in the genomes of many bacteria species used in the food production. The comparison to these species showed that two different versions of the gene cluster exist in L. paracasei which were likely gained in two distinct events of horizontal gene transfer. Additionally, the comparison of 62 L. paracasei genomes and the two versions of the gene cluster revealed that this gene cluster is mobile within the species.

RevDate: 2018-11-27

Fleshman A, Mullins K, Sahl J, et al (2018)

Corrigendum: Comparative pan-genomic analyses of Orientia tsutsugamushi reveal an exceptional model of bacterial evolution driving genomic diversity.

Microbial genomics, 4(10):.

RevDate: 2018-10-29

Franz E, Rotariu O, Lopes BS, et al (2018)

Phylogeographic analysis reveals multiple international transmission events have driven the global emergence of Escherichia coli O157:H7.

Clinical infectious diseases : an official publication of the Infectious Diseases Society of America pii:5146342 [Epub ahead of print].

Background: Shiga toxin-producing Escherichia coli O157:H7 is a zoonotic pathogen which causes numerous food and waterborne disease outbreaks. It is globally distributed but its origin and temporal sequence of geographical spread is unknown.

Methods: We analysed Whole Genome Sequencing data of 757 isolates from 4 continents and performed a pan genome analysis to identify the core genome and from this extracted single nucleotide polymorphisms. Timed phylogeographic analysis was performed on a subset of the isolates to investigate its worldwide spread.

Results: The common ancestor of this set of isolates occurred around 1890 (1845-1925) and originated from the Netherlands. Phylogeographic analysis identified 34 major transmission events. The earliest were predominantly intercontinental from Europe to Australia around 1937 (1909-1958), to USA in 1941 (1921-1962), to Canada in 1960 (1943-1979), and from Australia to New Zealand in 1966 (1943-1982). This pre-dates the first reported human case of E. coli O157:H7 in 1975 from the USA.

Conclusions: Inter- and intra- continental transmission events have resulted in the current international distribution of E. coli O157:H7 and it is likely that these events were facilitated by animal movements (e.g. Holstein Friesian cattle). These findings will inform policy on action that is crucial to reduce further spread of E. coli O157:H7 and other (emerging) STEC strains globally.

RevDate: 2019-01-08

De Filippis F, La Storia A, Villani F, et al (2019)

Strain-Level Diversity Analysis of Pseudomonas fragi after In Situ Pangenome Reconstruction Shows Distinctive Spoilage-Associated Metabolic Traits Clearly Selected by Different Storage Conditions.

Applied and environmental microbiology, 85(1): pii:AEM.02212-18.

Microbial spoilage of raw meat causes huge economic losses every year. An understanding of the microbial ecology associated with the spoilage and its dynamics during the refrigerated storage of meat can help in preventing and delaying the spoilage-related activities. The raw meat microbiota is usually complex, but only a few members will develop during storage and cause spoilage upon the pressure from several external factors, such as temperature and oxygen availability. We characterized the metagenome of beef packed aerobically or under vacuum during refrigerated storage to explore how different packaging conditions may influence the microbial composition and potential spoilage-associated activities. Different population dynamics and spoilage-associated genomic repertoires occurred in beef stored aerobically or in vacuum packaging. Moreover, the pangenomes of Pseudomonas fragi strains were extracted from metagenomes. We demonstrated the presence of specific, storage-driven strain-level profiles of Pseudomonas fragi, characterized by different gene repertoires and thus potentially able to act differently during meat spoilage. The results provide new knowledge on strain-level microbial ecology associated with meat spoilage and may be of value for future strategies of spoilage prevention and food waste reduction.

IMPORTANCE: This work provides insights on the mechanisms involved in raw beef spoilage during refrigerated storage and on the selective pressure exerted by the packaging conditions. We highlighted the presence of different microbial metagenomes during the spoilage of beef packaged aerobically or under vacuum. The packaging condition was able to select specific Pseudomonas fragi strains with distinctive genomic repertoires. This study may help in deciphering the behavior of different biomes directly in situ in food and in understanding the specific contribution of different strains to food spoilage.

RevDate: 2019-02-01
CmpDate: 2019-02-01

Hii SYF, Ahmad N, Hashim R, et al (2018)

A SNP-based phylogenetic analysis of Corynebacterium diphtheriae in Malaysia.

BMC research notes, 11(1):760.

OBJECTIVE: There is a lack of study in Corynebacterium diphtheriae isolates in Malaysia. The alarming surge of cases in year 2016 led us to evaluate the local clinical C. diphtheriae strains in Malaysia. We conducted single nucleotide polymorphism phylogenetic analysis on the core and pan-genome as well as toxin and diphtheria toxin repressor (DtxR) genes of Malaysian C. diphtheriae isolates from the year 1986-2016.

RESULTS: The comparison between core and pan-genomic analyses showed variation in the distribution of C. diphtheriae. The local isolates portrayed a heterogenous trait and a close relationship between Malaysia's and Belarus's, Africa's and India's strains was observed. A toxigenic C. diphtheriae clone was noted to be circulating in the Malaysian population for nearly 30 years and from our study, the non-toxigenic and toxigenic C. diphtheriae strains can be differentiated significantly into two large clusters, A and B respectively. Analysis against the vaccine strain PW8 showed that the amino acid composition of toxin and DtxR in Malaysia's local strains is well-conserved and there was no functional defect noted. Hence, the change in efficacy of the currently used toxoid vaccine is unlikely to occur.

RevDate: 2018-11-14

Subedi D, Vijay AK, Kohli GS, et al (2018)

Comparative genomics of clinical strains of Pseudomonas aeruginosa strains isolated from different geographic sites.

Scientific reports, 8(1):15668.

The large and complex genome of Pseudomonas aeruginosa, which consists of significant portions (up to 20%) of transferable genetic elements, contributes to the rapid development of antibiotic resistance. The whole genome sequences of 22 strains isolated from eye and cystic fibrosis patients in Australia and India between 1992 and 2007 were used to compare genomic divergence and phylogenetic relationships as well as genes for antibiotic resistance and virulence factors. Analysis of the pangenome indicated a large variation in the size of accessory genome amongst 22 strains and the size of the accessory genome correlated with number of genomic islands, insertion sequences and prophages. The strains were diverse in terms of sequence type and dissimilar to that of global epidemic P. aeruginosa clones. Of the eye isolates, 62% clustered together within a single lineage. Indian eye isolates possessed genes associated with resistance to aminoglycoside, beta-lactams, sulphonamide, quaternary ammonium compounds, tetracycline, trimethoprims and chloramphenicols. These genes were, however, absent in Australian isolates regardless of source. Overall, our results provide valuable information for understanding the genomic diversity of P. aeruginosa isolated from two different infection types and countries.

RevDate: 2018-11-14

Chaudhari NM, Gautam A, Gupta VK, et al (2018)

PanGFR-HM: A Dynamic Web Resource for Pan-Genomic and Functional Profiling of Human Microbiome With Comparative Features.

Frontiers in microbiology, 9:2322.

The conglomerate of microorganisms inhabiting various body-sites of human, known as the human microbiome, is one of the key determinants of human health and disease. Comprehensive pan-genomic and functional analysis approach for human microbiome components can enrich our understanding about impact of microbiome on human health. By utilizing this approach we developed PanGFR-HM ( - a novel dynamic web-resource that integrates genomic and functional characteristics of 1293 complete microbial genomes available from Human Microbiome Project. The resource allows users to explore genomic/functional diversity and genome-based phylogenetic relationships between human associated microbial genomes, not provided by any other resource. The key features implemented here include pan-genome and functional analysis of organisms based on taxonomy or body-site, and comparative analysis between groups of organisms. The first feature can also identify probable gene-loss events and significantly over/under represented KEGG/COG categories within pan-genome. The unique second feature can perform comparative genomic, functional and pathways analysis between 4 groups of microbes. The dynamic nature of this resource enables users to define parameters for orthologous clustering and to select any set of organisms for analysis. As an application for comparative feature of PanGFR-HM, we performed a comparative analysis with 67 Lactobacillus genomes isolated from human gut, oral cavity and urogenital tract, and therefore characterized the body-site specific genes, enzymes and pathways. Altogether, PanGFR-HM, being unique in its content and functionality, is expected to provide a platform for microbiome-based comparative functional and evolutionary genomics.

RevDate: 2019-02-03

Johnson TJ, Elnekave E, Miller EA, et al (2019)

Phylogenomic Analysis of Extraintestinal Pathogenic Escherichia coli Sequence Type 1193, an Emerging Multidrug-Resistant Clonal Group.

Antimicrobial agents and chemotherapy, 63(1): pii:AAC.01913-18.

The fluoroquinolone-resistant sequence type 1193 (ST1193) of Escherichia coli, from the ST14 clonal complex (STc14) within phylogenetic group B2, has appeared recently as an important cause of extraintestinal disease in humans. Although this emerging lineage has been characterized to some extent using conventional methods, it has not been studied extensively at the genomic level. Here, we used whole-genome sequence analysis to compare 355 ST1193 isolates with 72 isolates from other STs within STc14. Using core genome phylogeny, the ST1193 isolates formed a tightly clustered clade with many genotypic similarities, unlike ST14 isolates. All ST1193 isolates possessed the same set of three chromosomal mutations conferring fluoroquinolone resistance, carried the fimH64 allele, and were lactose non-fermenting. Analysis revealed an evolutionary progression from K1 to K5 capsular types and acquisition of an F-type virulence plasmid, followed by changes in plasmid structure congruent with genome phylogeny. In contrast, the numerous identified antimicrobial resistance genes were distributed incongruently with the underlying phylogeny, suggesting frequent gain or loss of the corresponding resistance gene cassettes despite retention of the presumed carrier plasmids. Pangenome analysis revealed gains and losses of genetic loci occurring during the transition from ST14 to ST1193 and from the K1 to K5 capsular types. Using time-scaled phylogenetic analysis, we estimated that current ST1193 clades first emerged approximately 25 years ago. Overall, ST1193 appears to be a recently emerged clone in which both stepwise and mosaic evolution have contributed to epidemiologic success.

RevDate: 2018-10-20

Bettgenhaeuser J, SG Krattinger (2018)

Rapid gene cloning in cereals.

TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik pii:10.1007/s00122-018-3210-7 [Epub ahead of print].

KEY MESSAGE: The large and complex genomes of many cereals hindered cloning efforts in the past. Advances in genomics now allow the rapid cloning of genes from humanity's most valuable crops. The past two decades were characterized by a genomics revolution that entailed profound changes to crop research, plant breeding, and agriculture. Today, high-quality reference sequences are available for all major cereal crop species. Large resequencing and pan-genome projects start to reveal a more comprehensive picture of the genetic makeup and the diversity among domesticated cereals and their wild relatives. These technological advancements will have a dramatic effect on dissecting genotype-phenotype associations and on gene cloning. In this review, we will highlight the status of the genomic resources available for various cereal crops and we will discuss their implications for gene cloning. A particular focus will be given to the cereal species barley and wheat, which are characterized by very large and complex genomes that have been inaccessible to rapid gene cloning until recently. With the advancements in genomics and the development of several rapid gene-cloning methods, it has now become feasible to tackle the cloning of most agriculturally important genes, even in wheat and barley.

RevDate: 2019-02-05

Fraunhofer ME, Geißler AJ, Behr J, et al (2019)

Comparative Genomics of Lactobacillus brevis Reveals a Significant Plasmidome Overlap of Brewery and Insect Isolates.

Current microbiology, 76(1):37-47.

Lactobacillus (L.) brevis represents a versatile, ubiquitistic species of lactic acid bacteria, occurring in various foods, as well as plants and intestinal tracts. The ability to deal with considerably differing environmental conditions in the respective ecological niches implies a genomic adaptation to the particular requirements to use it as a habitat beyond a transient state. Given the isolation source, 24 L. brevis genomes were analyzed via comparative genomics to get a broad view of the genomic complexity and ecological versatility of this species. This analysis showed L. brevis being a genetically diverse species possessing a remarkably large pan genome. As anticipated, it proved difficult to draw a correlation between chromosomal settings and isolation source. However, on plasmidome level, brewery- and insect-derived strains grouped into distinct clusters, referable to a noteworthy gene sharing between both groups. The brewery-specific plasmidome is characterized by several genes, which support a life in the harsh environment beer, but 40% of the brewery plasmidome were found in insect-derived strains as well. This suggests a close interaction between these habitats. Further analysis revealed the presence of a truncated horC cluster version in brewery- and insect-associated strains. This disproves horC, the major contributor to survival in beer, as brewery isolate specific. We conclude that L. brevis does not perform rigorous chromosomal changes to live in different habitats. Rather it appears that the species retains a certain genetic diversity in the plasmidome and meets the requirements of a particular ecological niche with the acquisition of appropriate plasmids.

RevDate: 2018-11-14

Mercante JW, Caravas JA, Ishaq MK, et al (2018)

Genomic heterogeneity differentiates clinical and environmental subgroups of Legionella pneumophila sequence type 1.

PloS one, 13(10):e0206110.

Legionella spp. are the cause of a severe bacterial pneumonia known as Legionnaires' disease (LD). In some cases, current genetic subtyping methods cannot resolve LD outbreaks caused by common, potentially endemic L. pneumophila (Lp) sequence types (ST), which complicates laboratory investigations and environmental source attribution. In the United States (US), ST1 is the most prevalent clinical and environmental Lp sequence type. In order to characterize the ST1 population, we sequenced 289 outbreak and non-outbreak associated clinical and environmental ST1 and ST1-variant Lp strains from the US and, together with international isolate sequences, explored their genetic and geographic diversity. The ST1 population was highly conserved at the nucleotide level; 98% of core nucleotide positions were invariant and environmental isolates unassociated with human disease (n = 99) contained ~65% more nucleotide diversity compared to clinical-sporadic (n = 139) or outbreak-associated (n = 28) ST1 subgroups. The accessory pangenome of environmental isolates was also ~30-60% larger than other subgroups and was enriched for transposition and conjugative transfer-associated elements. Up to ~10% of US ST1 genetic variation could be explained by geographic origin, but considerable genetic conservation existed among strains isolated from geographically distant states and from different decades. These findings provide new insight into the ST1 population structure and establish a foundation for interpreting genetic relationships among ST1 strains; these data may also inform future analyses for improved outbreak investigations.

RevDate: 2019-01-09
CmpDate: 2019-01-09

Kavvas ES, Catoiu E, Mih N, et al (2018)

Machine learning and structural analysis of Mycobacterium tuberculosis pan-genome identifies genetic signatures of antibiotic resistance.

Nature communications, 9(1):4306.

Mycobacterium tuberculosis is a serious human pathogen threat exhibiting complex evolution of antimicrobial resistance (AMR). Accordingly, the many publicly available datasets describing its AMR characteristics demand disparate data-type analyses. Here, we develop a reference strain-agnostic computational platform that uses machine learning approaches, complemented by both genetic interaction analysis and 3D structural mutation-mapping, to identify signatures of AMR evolution to 13 antibiotics. This platform is applied to 1595 sequenced strains to yield four key results. First, a pan-genome analysis shows that M. tuberculosis is highly conserved with sequenced variation concentrated in PE/PPE/PGRS genes. Second, the platform corroborates 33 genes known to confer resistance and identifies 24 new genetic signatures of AMR. Third, 97 epistatic interactions across 10 resistance classes are revealed. Fourth, detailed structural analysis of these genes yields mechanistic bases for their selection. The platform can be used to study other human pathogens.

RevDate: 2018-12-14
CmpDate: 2018-12-14

Zoledowska S, Motyka-Pomagruk A, Sledz W, et al (2018)

High genomic variability in the plant pathogenic bacterium Pectobacterium parmentieri deciphered from de novo assembled complete genomes.

BMC genomics, 19(1):751.

BACKGROUND: Pectobacterium parmentieri is a newly established species within the plant pathogenic family Pectobacteriaceae. Bacteria belonging to this species are causative agents of diseases in economically important crops (e.g. potato) in a wide range of different environmental conditions, encountered in Europe, North America, Africa, and New Zealand. Severe disease symptoms result from the activity of P. parmentieri virulence factors, such as plant cell wall degrading enzymes. Interestingly, we observe significant phenotypic differences among P. parmentieri isolates regarding virulence factors production and the abilities to macerate plants. To establish the possible genomic basis of these differences, we sequenced 12 genomes of P. parmentieri strains (10 isolated in Poland, 2 in Belgium) with the combined use of Illumina and PacBio approaches. De novo genome assembly was performed with the use of SPAdes software, while annotation was conducted by NCBI Prokaryotic Genome Annotation Pipeline.

RESULTS: The pan-genome study was performed on 15 genomes (12 de novo assembled and three reference strains: P. parmentieri CFBP 8475T, P. parmentieri SCC3193, P. parmentieri WPP163). The pan-genome includes 3706 core genes, a high number of accessory (1468) genes, and numerous unique (1847) genes. We identified the presence of well-known genes encoding virulence factors in the core genome fraction, but some of them were located in the dispensable genome. A significant fraction of horizontally transferred genes, virulence-related gene duplications, as well as different CRISPR arrays were found, which can explain the observed phenotypic differences. Finally, we found also, for the first time, the presence of a plasmid in one of the tested P. parmentieri strains isolated in Poland.

CONCLUSIONS: We can hypothesize that a large number of the genes in the dispensable genome and significant genomic variation among P. parmentieri strains could be the basis of the potential wide host range and widespread diffusion of P. parmentieri. The obtained data on the structure and gene content of P. parmentieri strains enabled us to speculate on the importance of high genomic plasticity for P. parmentieri adaptation to different environments.
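
The core / accessory / unique split reported in the abstract above can be illustrated with a minimal sketch. It assumes a simple presence/absence table (gene family mapped to the set of genomes carrying it) and the common convention that core means present in every genome, unique means present in exactly one, and accessory means everything in between. The function name, the toy data and these thresholds are illustrative assumptions, not the pipeline used by the authors.

```python
from typing import Dict, Set


def partition_pangenome(presence: Dict[str, Set[str]], n_genomes: int) -> Dict[str, Set[str]]:
    """Split gene families into core, accessory and unique sets.

    presence maps a gene-family name to the set of genome IDs carrying it.
    The thresholds are an assumed convention, not taken from the paper.
    """
    parts: Dict[str, Set[str]] = {"core": set(), "accessory": set(), "unique": set()}
    for family, genomes in presence.items():
        count = len(genomes)
        if count == n_genomes:
            parts["core"].add(family)       # found in every genome
        elif count == 1:
            parts["unique"].add(family)     # found in exactly one genome
        elif count > 1:
            parts["accessory"].add(family)  # found in some, but not all
    return parts


# Hypothetical toy data: four gene families across three genomes.
toy = {
    "famA": {"g1", "g2", "g3"},
    "famB": {"g1", "g2"},
    "famC": {"g3"},
    "famD": {"g1", "g2", "g3"},
}
result = partition_pangenome(toy, n_genomes=3)
print(result)
```

Real analyses often relax the core threshold (for example, present in at least 95% or 99% of genomes) to tolerate draft assemblies; that choice is not specified in the abstract.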

RevDate: 2018-12-11

Yu J, Golicz AA, Lu K, et al (2018)

Insight into the evolution and functional characteristics of the pan-genome assembly from sesame landraces and modern cultivars.

Plant biotechnology journal [Epub ahead of print].

Sesame (Sesamum indicum L.) is an important oil crop renowned for its high oil content and quality. Recently, genome assemblies for five sesame varieties including two landraces (S. indicum cv. Baizhima and Mishuozhima) and three modern cultivars (S. indicum var. Zhongzhi13, Yuzhi11 and Swetha), have become available providing a rich resource for comparative genomic analyses and gene discovery. Here, we employed a reference-assisted assembly approach to improve the draft assemblies of four of the sesame varieties. We then constructed a sesame pan-genome of 554.05 Mb. The pan-genome contained 26 472 orthologous gene clusters; 15 409 (58.21%) of them were core (present across all five sesame genomes), whereas the remaining 41.79% (11 063) clusters and the 15 890 variety-specific genes were dispensable. Comparisons between varieties suggest that modern cultivars from China and India display significant genomic variation. The gene families unique to the sesame modern cultivars contain genes mainly related to yield and quality, while those unique to the landraces contain genes involved in environmental adaptation. Comparative evolutionary analysis indicates that several genes involved in plant-pathogen interaction and lipid metabolism are under positive selection, which may be associated with sesame environmental adaption and selection for high seed oil content. This study of the sesame pan-genome provides insights into the evolution and genomic characteristics of this important oilseed and constitutes a resource for further sesame crop improvement.
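
As a quick arithmetic check of the cluster counts quoted in the abstract above, the short sketch below recomputes the core and dispensable percentages from the three figures given (26 472 clusters in total, 15 409 core, 11 063 dispensable). The variable names are illustrative only.

```python
# Cluster counts as quoted in the abstract.
total_clusters = 26472
core_clusters = 15409
dispensable_clusters = 11063

# Percentages rounded to two decimals, matching the abstract's precision.
core_pct = round(100 * core_clusters / total_clusters, 2)
dispensable_pct = round(100 * dispensable_clusters / total_clusters, 2)

print(core_clusters + dispensable_clusters == total_clusters)
print(core_pct, dispensable_pct)
```

The two counts sum to the stated total, and the recomputed shares agree with the 58.21% and 41.79% reported.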

RevDate: 2018-12-11
CmpDate: 2018-12-11

Bobay LM, H Ochman (2018)

Factors driving effective population size and pan-genome evolution in bacteria.

BMC evolutionary biology, 18(1):153.

BACKGROUND: Knowledge of population-level processes is essential to understanding the efficacy of selection operating within a species. However, attempts at estimating effective population sizes (Ne) are particularly challenging in bacteria due to their extremely large census population sizes, varying rates of recombination and arbitrary species boundaries.

RESULTS: In this study, we estimated Ne for 153 species (152 bacteria and one archaeon) defined under a common framework and found that ecological lifestyle and growth rate were major predictors of Ne; and that contrary to theoretical expectations, Ne was unaffected by recombination rate. Additionally, we found that Ne shapes the evolution and diversity of total gene repertoires of prokaryotic species.

CONCLUSION: Together, these results point to a new model of genome architecture evolution in prokaryotes, in which pan-genome sizes, not individual genome sizes, are governed by drift-barrier evolution.

RevDate: 2019-01-16
CmpDate: 2019-01-16

Chun BH, Kim KH, Jeong SE, et al (2019)

Genomic and metabolic features of the Bacillus amyloliquefaciens group- B. amyloliquefaciens, B. velezensis, and B. siamensis- revealed by pan-genome analysis.

Food microbiology, 77:146-157.

The genomic and metabolic features of the Bacillus amyloliquefaciens group comprising B. amyloliquefaciens, B. velezensis, and B. siamensis were investigated through a pan-genome analysis combined with an experimental verification of some of the functions identified. All B. amyloliquefaciens group genomes were retrieved from GenBank and their phylogenetic relatedness was subsequently investigated. Genome comparisons of B. amyloliquefaciens, B. siamensis, and B. velezensis showed that their genomic and metabolic features were similar; however species-specific features were also identified. Energy metabolism-related genes are more enriched in B. amyloliquefaciens, whereas secondary metabolite biosynthesis-related genes are enriched in B. velezensis. Compared to B. amyloliquefaciens and B. siamensis, B. velezensis harbors more genes in its core-genome which are involved in the biosynthesis of antimicrobial compounds, as well as genes involved in d-galacturonate and d-fructuronate metabolism. B. amyloliquefaciens, B. siamensis, and B. velezensis all harbor a xanthine oxidase gene cluster (xoABCDE) in their core-genomes that is involved in metabolizing xanthine and uric acid to glycine and oxalureate. A reconstruction of B. amyloliquefaciens group metabolic pathways using their individual pan-genomes revealed that the B. amyloliquefaciens group strains have the ability to metabolize diverse carbon sources aerobically, or anaerobically, and can produce various metabolites such as lactate, ethanol, acetate, CO2, xylitol, diacetyl, acetoin, and 2,3-butanediol. This study therefore provides insights into the genomic and metabolic features of the B. amyloliquefaciens group.

RevDate: 2018-12-14
CmpDate: 2018-12-14

Wright ES, DA Baum (2018)

Exclusivity offers a sound yet practical species criterion for bacteria despite abundant gene flow.

BMC genomics, 19(1):724.

BACKGROUND: The question of whether bacterial species objectively exist has long divided microbiologists. A major source of contention stems from the fact that bacteria regularly engage in horizontal gene transfer (HGT), making it difficult to ascertain relatedness and draw boundaries between taxa. A natural way to define taxa is based on exclusivity of relatedness, which applies when members of a taxon are more closely related to each other than they are to any outsider. It is largely unknown whether exclusive bacterial taxa exist when averaging over the genome or are rare due to rampant hybridization.

RESULTS: Here, we analyze a collection of 701 genomes representing a wide variety of environmental isolates from the family Streptomycetaceae, whose members are competent at HGT. We find that the presence/absence of auxiliary genes in the pan-genome displays a hierarchical (tree-like) structure that correlates significantly with the genealogy of the core-genome. Moreover, we identified the existence of many exclusive taxa, although individual genes often contradict these taxa. These conclusions were supported by repeating the analysis on 1,586 genomes belonging to the genus Bacillus. However, despite confirming the existence of exclusive groups (taxa), we were unable to identify an objective threshold at which to assign the rank of species.

CONCLUSIONS: The existence of bacterial taxa is justified by considering average relatedness across the entire genome, as captured by exclusivity, but is rejected if one requires unanimous agreement of all parts of the genome. We propose using exclusivity to delimit taxa and conventional genome similarity thresholds to assign bacterial taxa to the species rank. This approach recognizes species that are phylogenetically meaningful, while also establishing some degree of comparability across species-ranked taxa in different bacterial clades.
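
The exclusivity criterion described in the abstract above (members of a taxon are more closely related to each other than to any outsider) can be sketched as a simple check on pairwise distances. This is only an illustration: it assumes a precomputed symmetric distance table in which a smaller value means closer relatedness, and the function name and toy distances are hypothetical. The paper itself works with relatedness averaged over the genome, which this sketch does not reproduce.

```python
from typing import Dict, FrozenSet, Iterable, Set


def is_exclusive(group: Set[str], all_ids: Iterable[str],
                 dist: Dict[FrozenSet[str], float]) -> bool:
    """Return True if every member of group is closer to all other members
    than to any genome outside the group (assumed distance-based reading)."""
    outsiders = set(all_ids) - group

    def d(a: str, b: str) -> float:
        return dist[frozenset((a, b))]

    for member in group:
        inside = [d(member, other) for other in group if other != member]
        outside = [d(member, other) for other in outsiders]
        # Groups of one, or groups with no outsiders, are trivially exclusive.
        if inside and outside and max(inside) >= min(outside):
            return False
    return True


# Hypothetical pairwise distances among four genomes.
ids = ["A", "B", "C", "D"]
toy_dist = {
    frozenset(("A", "B")): 0.1,
    frozenset(("A", "C")): 0.5,
    frozenset(("A", "D")): 0.7,
    frozenset(("B", "C")): 0.6,
    frozenset(("B", "D")): 0.8,
    frozenset(("C", "D")): 0.2,
}
print(is_exclusive({"A", "B"}, ids, toy_dist))
print(is_exclusive({"A", "C"}, ids, toy_dist))
```

In the toy data, A and B form an exclusive pair, whereas A and C do not, because A is closer to the outsider B than to C.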

RevDate: 2018-12-11

Peng Y, Tang S, Wang D, et al (2018)

MetaPGN: a pipeline for construction and graphical visualization of annotated pangenome networks.

GigaScience, 7(11):.

Pangenome analyses facilitate the interpretation of genetic diversity and evolutionary history of a taxon. However, there is an urgent and unmet need to develop new tools for advanced pangenome construction and visualization, especially for metagenomic data. Here, we present an integrated pipeline, named MetaPGN, for construction and graphical visualization of pangenome networks from either microbial genomes or metagenomes. Given either isolated genomes or metagenomic assemblies coupled with a reference genome of the targeted taxon, MetaPGN generates a pangenome in a topological network, consisting of genes (nodes) and gene-gene genomic adjacencies (edges) of which biological information can be easily updated and retrieved. MetaPGN also includes a self-developed Cytoscape plugin for layout of and interaction with the resulting pangenome network, providing an intuitive and interactive interface for full exploration of genetic diversity. We demonstrate the utility of MetaPGN by constructing Escherichia coli pangenome networks from five E. coli pathogenic strains and 760 human gut microbiomes, revealing extensive genetic diversity of E. coli within both isolates and gut microbial populations. With the ability to extract and visualize gene contents and gene-gene physical adjacencies of a specific taxon from large-scale metagenomic data, MetaPGN provides advantages in expanding pangenome analysis to uncultured microbial taxa.
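
The network representation described in the abstract above (genes as nodes, gene-gene genomic adjacencies as edges) can be illustrated with a small sketch. It does not use or reproduce MetaPGN; the function name, the input format (an ordered gene list per genome) and the toy strains are assumptions made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_pangenome_network(
    gene_orders: Dict[str, List[str]],
) -> Tuple[Dict[str, Set[str]], Dict[Tuple[str, str], Set[str]]]:
    """Build nodes (gene -> genomes carrying it) and undirected edges
    (adjacent gene pair -> genomes in which the two genes are neighbours)."""
    nodes: Dict[str, Set[str]] = defaultdict(set)
    edges: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for genome, genes in gene_orders.items():
        for gene in genes:
            nodes[gene].add(genome)
        # Consecutive genes along the genome define an adjacency edge.
        for left, right in zip(genes, genes[1:]):
            edges[tuple(sorted((left, right)))].add(genome)
    return dict(nodes), dict(edges)


# Hypothetical gene orders for two strains.
toy_orders = {
    "strain1": ["a", "b", "c"],
    "strain2": ["a", "c", "d"],
}
nodes, edges = build_pangenome_network(toy_orders)
print(nodes)
print(edges)
```

Storing the supporting genomes on each node and edge keeps the shared backbone (here, genes a and c) distinguishable from strain-specific branches, which is the kind of information a graphical layout would then display.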

RevDate: 2018-11-14

Sharma V, Mobeen F, T Prakash (2018)

Exploration of Survival Traits, Probiotic Determinants, Host Interactions, and Functional Evolution of Bifidobacterial Genomes Using Comparative Genomics.

Genes, 9(10):.

Members of the genus Bifidobacterium are found in a wide range of habitats and are used as important probiotics. Thus, exploration of their functional traits at the genus level is of utmost significance. Besides, this genus has been demonstrated to exhibit an open pan-genome based on the limited number of genomes used in earlier studies. However, the number of genomes is a crucial factor for pan-genome calculations. We have analyzed the pan-genome of a comparatively larger dataset of 215 members of the genus Bifidobacterium belonging to different habitats, which revealed an open nature. The pan-genome for the 56 probiotic and human-gut strains of this genus was also found to be open. The accessory and unique components of this pan-genome were found to be under the operation of Darwinian selection pressure. Further, their genome-size variation was predicted to be attributable to the abundance of certain functions carried by genomic islands, which are facilitated by insertion elements and prophages. In silico functional and host-microbe interaction analyses of their core-genome revealed significant genomic factors for niche-specific adaptations and probiotic traits. The core survival traits include stress tolerance, biofilm formation, nutrient transport, and Sec-secretion system, whereas the core probiotic traits are imparted by the factors involved in carbohydrate- and protein-metabolism and host-immunomodulations.
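The core, accessory and unique components mentioned in the abstract above can be illustrated with a small Python sketch. It assumes gene families have already been clustered, so each genome is just a set of family IDs; the function name partition_pangenome and the toy data are hypothetical and do not come from the paper.

```python
def partition_pangenome(presence):
    """Split gene families into core, accessory and unique sets.

    presence: dict mapping genome name -> set of gene-family IDs
    (assumed to come from a prior clustering step).
    core      = families present in every genome
    unique    = families present in exactly one genome
    accessory = everything in between
    """
    n = len(presence)
    counts = {}
    for genes in presence.values():
        for g in genes:
            counts[g] = counts.get(g, 0) + 1
    core = {g for g, c in counts.items() if c == n}
    # With a single genome every family is core, so nothing is "unique".
    unique = {g for g, c in counts.items() if c == 1 and n > 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


presence = {
    "A": {"a", "b", "c"},
    "B": {"a", "b", "d"},
    "C": {"a", "e"},
}
core, accessory, unique = partition_pangenome(presence)
print(sorted(core), sorted(accessory), sorted(unique))
```

The union of the three sets is the pan-genome of the toy dataset. Judging whether a pan-genome is open additionally requires tracking how this union grows as genomes are added, which this sketch does not attempt.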

RevDate: 2018-12-11
CmpDate: 2018-12-11

Awan F, Dong Y, Liu J, et al (2018)

Comparative genome analysis provides deep insights into Aeromonas hydrophila taxonomy and virulence-related factors.

BMC genomics, 19(1):712.

BACKGROUND: Aeromonas hydrophila is a potential zoonotic pathogen and primary fish pathogen. With overlapping characteristics, multiple isolates are often mislabelled and misclassified. Moreover, the potential pathogenic factors among the publicly available genomes in A. hydrophila strains of different origins have not yet been investigated.

RESULTS: To identify the valid strains of A. hydrophila and their pathogenic factors, we performed a pan-genomic study. It revealed that there were 13 mislabelled strains and 49 valid strains that were further verified by average nucleotide identity (ANI), digital DNA-DNA hybridization (dDDH) and in silico multiple locus strain typing (MLST). Multiple phages were detected among the strains, and among them Aeromonas phi 018 was frequently present. The diversity in type III secretion system (T3SS) and conservation of type II and type VI secretion systems (T2SS and T6SS, respectively) among all the strains are important to study for designing future strategies. The most prevalent antibiotic resistances were found to be beta-lactamase, polymyxin and colistin resistances. The comparative analyses of sequence type (ST) 251 and other ST groups revealed that there were higher numbers of virulence factors in ST-251 than in other ST groups.

CONCLUSION: Publicly available genomes have 13 mislabelled organisms, and there are only 49 valid A. hydrophila strains. This valid pan-genome identifies multiple prophages that can be further utilized. Different A. hydrophila strains harbour multiple virulence factors and antibiotic resistance genes. Identification of such factors is important for designing future treatment regimes.

RevDate: 2018-11-14
CmpDate: 2018-10-29

Sheikhizadeh Anari S, de Ridder D, Schranz ME, et al (2018)

Efficient inference of homologs in large eukaryotic pan-proteomes.

BMC bioinformatics, 19(1):340.

BACKGROUND: Identification of homologous genes is fundamental to comparative genomics, functional genomics and phylogenomics. Extensive public homology databases are of great value for investigating homology but need to be continually updated to incorporate new sequences. As new sequences are rapidly being generated, there is a need for efficient standalone tools to detect homologs in novel data.

RESULTS: To address this, we present a fast method for detecting homology groups across a large number of individuals and/or species. We adopted a k-mer based approach which considerably reduces the number of pairwise protein alignments without sacrificing sensitivity. We demonstrate accuracy, scalability, efficiency and applicability of the presented method for detecting homology in large proteomes of bacteria, fungi, plants and Metazoa.

CONCLUSIONS: We clearly observed the trade-off between recall and precision in our homology inference. Favoring recall or precision strongly depends on the application. The clustering behavior of our program can be optimized for particular applications by altering a few key parameters. The program is available for public use at as an extension to our pan-genomic analysis tool, PanTools.
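The general idea of using shared k-mers to avoid most pairwise protein alignments can be sketched in Python as follows. This is not the PanTools algorithm: the function names, the default k and the min_shared threshold are illustrative assumptions. Only the pairs returned here would go on to a full alignment step.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_pairs(proteins, k=3, min_shared=2):
    """Shortlist protein pairs that share at least min_shared k-mers.

    proteins: dict mapping protein name -> amino-acid string.
    k and min_shared are hypothetical defaults chosen for the toy data.
    """
    # Inverted index: k-mer -> names of proteins containing it.
    index = defaultdict(set)
    for name, seq in proteins.items():
        for km in kmers(seq, k):
            index[km].add(name)
    # Count shared k-mers only for proteins that co-occur in the index.
    shared = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1
    return {pair for pair, n in shared.items() if n >= min_shared}


proteins = {
    "p1": "MKVLAAGIT",
    "p2": "MKVLSAGIT",  # one substitution relative to p1
    "p3": "WWPQRRHHC",  # unrelated toy sequence
}
pairs = candidate_pairs(proteins)
print(pairs)
```

Raising min_shared makes the filter stricter (fewer alignments, possibly lower recall), while lowering it does the opposite, which mirrors the recall and precision trade-off the abstract describes.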

RevDate: 2018-11-14

Wang LYR, Jokinen CC, Laing CR, et al (2018)

Multi-Year Persistence of Verotoxigenic Escherichia coli (VTEC) in a Closed Canadian Beef Herd: A Cohort Study.

Frontiers in microbiology, 9:2040.

In this study, fecal samples were collected from a closed beef herd in Alberta, Canada from 2012 to 2015. To limit serotype bias, which was observed in enrichment broth cultures, Verotoxigenic Escherichia coli (VTEC) were isolated directly from samples using a hydrophobic grid-membrane filter verotoxin immunoblot assay. Overall VTEC isolation rates were similar for three different cohorts of yearling heifers on both an annual (68.5 to 71.8%) and seasonal basis (67.3 to 76.0%). Across all three cohorts, O139:H19 (37.1% of VTEC-positive samples), O22:H8 (15.8%) and O?(O108):H8 (15.4%) were among the most prevalent serotypes. However, isolation rates for serotypes O139:H19, O130:H38, O6:H34, O91:H21, and O113:H21 differed significantly between cohort-years, as did isolation rates for some serotypes within a single heifer cohort. There was a high level of VTEC serotype diversity with an average of 4.3 serotypes isolated per heifer and 65.8% of the heifers classified as "persistent shedders" of VTEC based on the criteria of >50% of samples positive and ≥4 consecutive samples positive. Only 26.8% (90/336) of the VTEC isolates from yearling heifers belonged to the human disease-associated seropathotypes A (O157:H7), B (O26:H11, O111:NM), and C (O22:H8, O91:H21, O113:H21, O137:H41, O2:H6). Conversely, seropathotypes B (O26:NM, O111:NM) and C (O91:H21, O2:H29) strains were dominant (76.0%, 19/25) among VTEC isolates from month-old calves from this herd. Among VTEC from heifers, carriage rates of vt1, vt2, vt1+vt2, eae, and hlyA were 10.7, 20.8, 68.5, 3.9, and 88.7%, respectively. The adhesin gene saa was present in 82.7% of heifer strains but absent from all of 13 eae+ve strains (from serotypes/intimin types O157:H7/γ1, O26:H11/β1, O111:NM/θ, O84:H2/ζ, and O182:H25/ζ). Phylogenetic relationships inferred from wgMLST and pan genome-derived core SNP analysis showed that strains clustered by phylotype and serotype. Further, VTEC strains of the same serotype usually shared the same suite of antibiotic resistance and virulence genes, suggesting the circulation of dominant clones within this distinct herd. This study provides insight into the diverse and dynamic nature of VTEC populations within groups of cattle and points to a broad spectrum of human health risks associated with these E. coli strains.

RevDate: 2018-11-14

Golanowska M, Potrykus M, Motyka-Pomagruk A, et al (2018)

Comparison of Highly and Weakly Virulent Dickeya solani Strains, With a View on the Pangenome and Panregulon of This Species.

Frontiers in microbiology, 9:1940.

Bacteria belonging to the genera Dickeya and Pectobacterium are responsible for significant economic losses in a wide variety of crops and ornamentals. In recent years, increasing losses in potato production have been attributed to the appearance of Dickeya solani. The D. solani strains investigated so far share genetic homogeneity, although different virulence levels were observed among strains of various origins. The purpose of this study was to investigate the genetic traits possibly related to the diverse virulence levels by means of comparative genomics. First, we developed a new genome assembly pipeline which allowed us to complete the D. solani genomes. Four de novo sequenced and ten publicly available genomes were used to identify the structure of the D. solani pangenome, in which 74.8 and 25.2% of genes were grouped into the core and dispensable genome, respectively. For D. solani panregulon analysis, we performed a binding site prediction for four transcription factors, namely CRP, KdgR, PecS and Fur, to detect the regulons of these virulence regulators. Most of the D. solani potential virulence factors were predicted to belong to the accessory regulons of CRP, KdgR, and PecS. Thus, some differences in gene expression could exist between D. solani strains. The comparison between a highly virulent and a weakly virulent strain, IFB0099 and IFB0223, respectively, disclosed only small differences between their genomes but significant differences in the production of virulence factors like pectinases, cellulases and proteases, and in their mobility. The D. solani strains also diverge in the number and size of prophages present in their genomes. Another relevant difference is the disruption of the adhesin gene fhaB2 in the highly virulent strain. Strain IFB0223, which has a complete adhesin gene, is less mobile and less aggressive than IFB0099. This suggests that in this case, mobility rather than adherence is needed in order to trigger disease symptoms. This study highlights the utility of comparative genomics in predicting D. solani traits involved in the aggressiveness of this emerging plant pathogen.

RevDate: 2018-10-16

Bayer PE, Golicz AA, Tirnaz S, et al (2018)

Variation in abundance of predicted resistance genes in the Brassica oleracea pangenome.

Plant biotechnology journal [Epub ahead of print].

Brassica oleracea is an important agricultural species encompassing many vegetable crops including cabbage, cauliflower, broccoli and kale; however, it can be susceptible to a variety of fungal diseases such as clubroot, blackleg, leaf spot and downy mildew. Resistance to these diseases is mediated by specific disease resistance gene analogs (RGAs) which are differently distributed across B. oleracea lines. The sequenced reference cultivar does not contain all B. oleracea genes due to gene presence/absence variation between individuals, which makes it necessary to search for RGA candidates in the B. oleracea pangenome. Here we present a comparative analysis of RGA candidates in the pangenome of B. oleracea. We show that the presence of RGA candidates differs between lines and suggest that in B. oleracea, SNPs and presence/absence variation drive RGA diversity using separate mechanisms. We identified 59 RGA candidates linked to Sclerotinia, clubroot, and Fusarium wilt resistance QTL, and these findings have implications for crop breeding in B. oleracea, which may also be applicable in other crop species.

RevDate: 2018-10-19

Checcucci A, diCenzo GC, Ghini V, et al (2018)

Creation and Characterization of a Genomically Hybrid Strain in the Nitrogen-Fixing Symbiotic Bacterium Sinorhizobium meliloti.

ACS synthetic biology, 7(10):2365-2378.

Many bacteria, often associated with eukaryotic hosts and of relevance for biotechnological applications, harbor a multipartite genome composed of more than one replicon. Biotechnologically relevant phenotypes are often encoded by genes residing on the secondary replicons. A synthetic biology approach to developing enhanced strains for biotechnological purposes could therefore involve merging pieces or entire replicons from multiple strains into a single genome. Here we report the creation of a genomic hybrid strain in a model multipartite genome species, the plant-symbiotic bacterium Sinorhizobium meliloti. We term this strain a cis-hybrid, since it is produced by genomic material coming from the same species' pangenome. In particular, we moved the secondary replicon pSymA (accounting for nearly 20% of total genome content) from a donor S. meliloti strain to an acceptor strain. The cis-hybrid strain was screened for a panel of complex phenotypes (carbon/nitrogen utilization phenotypes, intra- and extracellular metabolomes, symbiosis, and various microbiological tests). Additionally, metabolic network reconstruction and constraint-based modeling were employed for in silico prediction of metabolic flux reorganization. Phenotypes of the cis-hybrid strain were in good agreement with those of both parental strains. Interestingly, the symbiotic phenotype showed a marked cultivar-specific improvement with the cis-hybrid strains compared to both parental strains. These results provide a proof-of-principle for the feasibility of genome-wide replicon-based remodelling of bacterial strains for improved biotechnological applications in precision agriculture.

RevDate: 2018-11-26
CmpDate: 2018-11-26

Le KK, Whiteside MD, Hopkins JE, et al (2018)

Spfy: an integrated graph database for real-time prediction of bacterial phenotypes and downstream comparative analyses.

Database : the journal of biological databases and curation, 2018:1-10.

Public health laboratories are currently moving to whole-genome sequence (WGS)-based analyses, and require the rapid prediction of standard reference laboratory methods based solely on genomic data. Currently, these predictive genomics tasks rely on workflows that chain together multiple programs for the requisite analyses. While useful, these systems do not store the analyses in a genome-centric way, meaning the same analyses are often re-computed for the same genomes. To solve this problem, we created Spfy, a platform that rapidly performs the common reference laboratory tests, uses a graph database to store and retrieve the results from the computational workflows and links data to individual genomes using standardized ontologies. The Spfy platform facilitates rapid phenotype identification, as well as the efficient storage and downstream comparative analysis of tens of thousands of genome sequences. Though generally applicable to bacterial genome sequences, Spfy currently contains 10 243 Escherichia coli genomes, for which in-silico serotype and Shiga-toxin subtype, as well as the presence of known virulence factors and antimicrobial resistance determinants, have been computed. Additionally, the presence/absence of the entire E. coli pan-genome was computed and linked to each genome. Owing to its database of diverse pre-computed results, and the ability to easily incorporate user data, Spfy facilitates hypothesis testing in fields ranging from population genomics to epidemiology, while mitigating the re-computation of analyses. The graph approach of Spfy is flexible, and can accommodate new analysis software modules as they are developed, easily linking new results to those already stored. Spfy provides a database and analysis approach for E. coli that is able to match the rapid accumulation of WGS data in public databases.

RevDate: 2019-01-07

Kavya VNS, Tayal K, Srinivasan R, et al (2019)

Sequence Alignment on Directed Graphs.

Journal of computational biology : a journal of computational molecular cell biology, 26(1):53-67.

Genomic variations in a reference collection are naturally represented as genome variation graphs. Such graphs encode common subsequences as vertices and the variations are captured using additional vertices and directed edges. The resulting graphs are directed graphs possibly with cycles. Existing algorithms for aligning sequences on such graphs make use of partial order alignment (POA) techniques that work on directed acyclic graphs (DAGs). To achieve this, acyclic extensions of the input graphs are first constructed through expensive loop unrolling steps (DAGification). Furthermore, such graph extensions could have considerable blowup in their size and in the worst case the blow-up factor is proportional to the input sequence length. We provide a novel alignment algorithm V-ALIGN that aligns the input sequence directly on the input graph while avoiding such expensive DAGification steps. V-ALIGN is based on a novel dynamic programming (DP) formulation that allows gapped alignment directly on the input graph. It supports affine and linear gaps. We also propose refinements to V-ALIGN for better performance in practice. With the proposed refinements, the time to fill the DP table has linear dependence on the sizes of the sequence, the graph, and its feedback vertex set. We conducted experiments to compare the proposed algorithm against the existing POA-based techniques. We also performed alignment experiments on the genome variation graphs constructed from the 1000 Genomes data. For aligning short sequences, standard approaches restrict the expensive gapped alignment to small filtered subgraphs having high similarity to the input sequence. In such cases, the performance of V-ALIGN for gapped alignment on the filtered subgraph depends on the subgraph sizes.
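For readers unfamiliar with the DAG-based baseline that the abstract above contrasts with V-ALIGN, the following Python sketch computes the minimum edit distance between a query and any path in a character-labelled directed acyclic graph. It is not V-ALIGN and does not handle cycles. The graph encoding (labels listed in topological order, with predecessor lists), the unit linear gap and mismatch costs, and the choice to let the path start and end at any node are all assumptions made for illustration.

```python
def align_to_dag(query, labels, preds):
    """Minimum edit distance between query and any path in a DAG.

    labels: single-character node labels, listed in topological order.
    preds:  preds[v] is the list of predecessor indices of node v.
    Costs (assumed): mismatch 1, gap 1 (linear), match 0.
    The path may start and end at any node.
    """
    m = len(query)
    # Virtual start row: j query characters inserted before the path begins.
    start = list(range(m + 1))
    rows = []
    for v, c in enumerate(labels):
        # A path reaching v comes from a predecessor or starts at v itself.
        sources = [rows[p] for p in preds[v]] + [start]
        row = [min(src[0] for src in sources) + 1]  # node v left unmatched
        for j in range(1, m + 1):
            sub = 0 if query[j - 1] == c else 1
            best = min(min(src[j - 1] + sub,   # align query[j-1] with c
                           src[j] + 1)         # skip node v (gap in query)
                       for src in sources)
            row.append(min(best, row[j - 1] + 1))  # gap in graph
        rows.append(row)
    return min(row[m] for row in rows)


# Toy graph spelling A-C-(G|T)-A: node 1 branches to nodes 2 and 3.
labels = ["A", "C", "G", "T", "A"]
preds = [[], [0], [1], [1], [2, 3]]
print(align_to_dag("ACTA", labels, preds))  # exact path A-C-T-A
print(align_to_dag("ACCA", labels, preds))  # one mismatch at the branch
```

Each node's row depends only on the rows of its predecessors, which is why a topological order suffices for DAGs and why cyclic graphs need either unrolling or a different formulation such as the one the paper proposes.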


ESP Quick Facts

ESP Origins

In the early 1990s, Robert Robbins was a faculty member at Johns Hopkins, where he directed the informatics core of GDB, the human gene-mapping database of the international human genome project. To share papers with colleagues around the world, he set up a small paper-sharing section on his personal web page. This small project evolved into The Electronic Scholarly Publishing Project.

ESP Support

In 1995, Robbins became the VP/IT of the Fred Hutchinson Cancer Research Center in Seattle, WA. Soon after arriving in Seattle, Robbins secured funding, through the ELSI component of the US Human Genome Project, to create the original ESP.ORG web site, with the formal goal of providing free, world-wide access to the literature of classical genetics.

ESP Rationale

Although the methods of molecular biology can seem almost magical to the uninitiated, the original techniques of classical genetics are readily appreciated by one and all: cross individuals that differ in some inherited trait, collect all of the progeny, score their attributes, and propose mechanisms to explain the patterns of inheritance observed.

ESP Goal

In reading the early works of classical genetics, one is drawn, almost inexorably, into ever more complex models, until molecular explanations begin to seem both necessary and natural. At that point, the tools for understanding genome research are at hand. Helping readers reach this point was the original goal of The Electronic Scholarly Publishing Project.

ESP Usage

Usage of the site grew rapidly and has remained high. Faculty began to use the site for their assigned readings. Other on-line publishers, ranging from The New York Times to Nature, referenced ESP materials in their own publications. Nobel laureates (e.g., Joshua Lederberg) regularly used the site and even wrote to suggest changes and improvements.

ESP Content

When the site began, no journals were making their early content available in digital format. As a result, ESP was obliged to digitize classic literature before it could be made available. For many important papers — such as Mendel's original paper or the first genetic map — ESP had to produce entirely new typeset versions of the works, if they were to be available in a high-quality format.

ESP Help

Early support from the DOE component of the Human Genome Project was critically important for getting the ESP project on a firm foundation. Since that funding ended (nearly 20 years ago), the project has been operated as a purely volunteer effort. Anyone wishing to assist in these efforts should send an email to Robbins.

ESP Plans

With the development of methods for adding typeset side notes to PDF files, the ESP project now plans to add annotated versions of some classical papers to its holdings. We also plan to add new reference and pedagogical material. We have already started providing regularly updated, comprehensive bibliographies to the ESP.ORG site.

Electronic Scholarly Publishing
21454 NE 143rd Street
Woodinville, WA 98077

E-mail: RJR8222 @

Papers in Classical Genetics

The ESP began as an effort to share a handful of key papers from the early days of classical genetics. Now the collection has grown to include hundreds of papers, in full-text format.

Digital Books

Along with papers on classical genetics, ESP offers a collection of full-text digital books, including many works by Darwin (and even a collection of poetry — Chicago Poems by Carl Sandburg).


ESP now offers a much improved and expanded collection of timelines, designed to give the user choice over subject matter and dates.


Biographical information about many key scientists.

Selected Bibliographies

Bibliographies on several topics of potential interest to the ESP community are now being automatically maintained and generated on the ESP site.

ESP Picks from Around the Web (updated 07 JUL 2018)