Using affinity propagation clustering for identifying bacterial clades and subclades with whole-genome sequences of Francisella tularensis
By combining a reference-independent SNP analysis and average nucleotide identity (ANI) with affinity propagation clustering (APC), we developed a significantly improved methodology that resolves phylogenetic relationships on the basis of objective criteria. These bioinformatics tools can be used as a general ruler to determine phylogenetic relationships and clustering of bacteria, as demonstrated here with Francisella (F.) tularensis. The molecular epidemiology of F. tularensis is currently assessed mostly by laboratory methods and molecular analysis. Its high evolutionary stability and clonal nature make Francisella ideal for subtyping with single nucleotide polymorphisms (SNPs). Sequencing and real-time PCR can be used to validate the SNP analysis. We investigated whole-genome sequences of 155 F. tularensis subsp. holarctica isolates. Phylogenetic testing was based on SNPs and ANI as reference-independent, alignment-free methods that take both small-scale and large-scale differences within the genomes into account. In particular, the whole-genome SNP analysis with kSNP3.0 allowed us to decipher subtle signals of systematic differences in molecular variation. APC resulted in three clusters corresponding to the known clades B.4, B.6, and B.12. These data correlated with the results of real-time PCR assays targeting canonical SNP (canSNP) loci. Additionally, we detected two subtle sub-clusters. SplitsTree was run with standard settings on the aligned SNPs from Parsnp. Together, APC, HierBAPS, and SplitsTree enabled us to generate hypotheses about epidemiologic relationships between bacterial clusters and to describe the distribution of isolates. Our data indicate that the choice of typing technique can increase our understanding of the pathogenesis and transmission of diseases and may ultimately support their prevention. This approach opens perspectives for application to other bacterial species.
The data provide evidence that Germany might be the collision zone where clade B.12, also known as the East European clade, overlaps with clade B.6, also known as the Iberian clade. The described methods provide a new, more detailed perspective on the phylogeny of F. tularensis subsp. holarctica. These results may encourage the use of the same approach to determine phylogenetic relationships and clustering in other bacteria.
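To make the clustering step more concrete, the following is a minimal sketch of affinity propagation applied to a SNP-based similarity matrix. It is an illustration only and not the pipeline used in this study: the nine SNP profiles are synthetic, the function names (`hamming`, `affinity_propagation`) are hypothetical, and the choices of negative Hamming distance as similarity, the median similarity as preference, a damping factor of 0.5, and 200 iterations are assumptions made for the example.

```python
from statistics import median

# Synthetic SNP profiles (NOT real data): three groups of three isolates.
# Each group shares six group-defining SNPs; two members per group carry
# one additional private SNP in the last six positions.
g1 = "GGGGGG" + "A" * 12
g2 = "A" * 6 + "CCCCCC" + "A" * 6
g3 = "A" * 12 + "TTTTTT"
profiles = [
    g1 + "AAAAAA", g1 + "GAAAAA", g1 + "AGAAAA",
    g2 + "AAAAAA", g2 + "AAGAAA", g2 + "AAAGAA",
    g3 + "AAAAAA", g3 + "AAAAGA", g3 + "AAAAAG",
]

def hamming(a, b):
    """Number of SNP positions at which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))

def affinity_propagation(S, damping=0.5, iterations=200):
    """Plain message-passing affinity propagation on a similarity matrix S.

    S[k][k] holds the preference of point k to become an exemplar.
    Returns (exemplars, labels); labels[i] is the exemplar index of point i.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # Responsibility: how well suited k is as exemplar for i,
        # relative to the best competing candidate.
        for i in range(n):
            for k in range(n):
                best_other = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Availability: accumulated support for k from the other points.
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            support = sum(pos) - pos[k]  # positive responsibilities from j != k
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # no exemplar emerged; leave all points unassigned
        return [], [-1] * n
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels

# Similarity = negative Hamming distance; preference = median similarity.
n = len(profiles)
S = [[-float(hamming(profiles[i], profiles[j])) for j in range(n)] for i in range(n)]
preference = median(S[i][j] for i in range(n) for j in range(n) if i != j)
for k in range(n):
    S[k][k] = preference

exemplars, labels = affinity_propagation(S)
print("exemplars:", exemplars)
print("labels:", labels)
```

On this toy input the profile without a private SNP in each group is expected to emerge as the exemplar, so the nine profiles fall into three clusters. For real data, the similarity matrix would instead be derived from the whole-genome SNP or ANI results, and the preference value controls how many clusters emerge.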