Global genomic population structure of Clostridioides difficile [Preprint]
Clostridioides difficile is the primary infectious cause of antibiotic-associated diarrhea. Local transmissions and international outbreaks of this pathogen have previously been elucidated by bacterial whole-genome sequencing, but comparative genomic analyses at the global scale have been hampered by the lack of specific bioinformatic tools. Here we introduce EnteroBase, a publicly accessible database (http://enterobase.warwick.ac.uk) that automatically retrieves and assembles C. difficile short reads from the public domain and calls alleles for core-genome multilocus sequence typing (cgMLST). We demonstrate that the identification of highly related genomes is 89% consistent between cgMLST and single-nucleotide polymorphism (SNP) analyses. EnteroBase currently contains 13,515 quality-controlled genomes that have been assigned to hierarchical sets of single-linkage clusters based on cgMLST distances. Hierarchical clustering can be used to identify populations of C. difficile at all epidemiological levels, from recent transmission chains through to pandemic and endemic strains, and is largely compatible with prior ribotyping. Hierarchical clustering thus enables comparisons with earlier surveillance data and will facilitate communication among researchers, clinicians and public-health officials who are combatting disease caused by C. difficile.
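To illustrate how single-linkage clusters can be nested across cgMLST distance thresholds, the following is a minimal sketch. It is not EnteroBase's implementation. The allele profiles, the genome names (`g1` to `g5`), the thresholds (0, 1, 2 and 5 allelic differences) and the convention of skipping loci without an allele call are all hypothetical choices made for illustration. The sketch uses plain single-linkage clustering via union-find: two genomes share a cluster at threshold `t` whenever they are connected by a chain of pairwise distances of at most `t`.

```python
from itertools import combinations

# Hypothetical cgMLST allele profiles: one allele ID per core locus.
# None marks a locus with no allele call (missing data).
PROFILES = {
    "g1": [1, 1, 1, 1, 1, 1],
    "g2": [1, 1, 1, 1, 1, 2],
    "g3": [1, 1, 1, 2, 2, 2],
    "g4": [3, 3, 3, 3, 3, 3],
    "g5": [3, 3, 3, 3, None, 4],
}


def allelic_distance(a, b):
    """Count loci at which both genomes have an allele call and the calls differ.

    Loci missing in either genome are skipped (an assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same set of loci")
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def single_linkage_clusters(profiles, threshold):
    """Return sorted clusters of genomes linked by chains of distances <= threshold."""
    parent = {name: name for name in profiles}

    def find(name):
        # Walk to the root, halving the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if allelic_distance(profiles[a], profiles[b]) <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


def hierarchical_clusters(profiles, thresholds):
    """Cluster the same genomes at each threshold; higher thresholds give coarser clusters."""
    return {t: single_linkage_clusters(profiles, t) for t in sorted(thresholds)}


if __name__ == "__main__":
    for t, clusters in hierarchical_clusters(PROFILES, [0, 1, 2, 5]).items():
        print(t, clusters)
```

With these toy profiles, `g1` and `g3` differ at three loci yet fall into the same cluster at threshold 2, because each is within two alleles of `g2`. This chaining is characteristic of single linkage, and it is why clusters at a lower threshold always nest inside clusters at a higher one.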