A theoretical and generalized approach for the assessment of the sample-specific limit of detection for clinical metagenomics
Metagenomics is a powerful tool to identify novel or unexpected pathogens, since it is generic and relatively unbiased. The limit of detection (LOD) is a critical parameter for the routine application of methods in the clinical diagnostic context. Although attempts for the determination of LODs for metagenomics next-generation sequencing (mNGS) have been made previously, these were only applicable for specific target species in defined samples matrices. Therefore, we developed and validated a generalized probability-based model to assess the sample-specific LOD of mNGS experiments (LODmNGS). Initial rarefaction analyses with datasets of Borna disease virus 1 human encephalitis cases revealed a stochastic behavior of virus read detection. Based on this, we transformed the Bernoulli formula to predict the minimal necessary dataset size to detect one virus read with a probability of 99%. We validated the formula with 30 datasets from diseased individuals, resulting in an accuracy of 99.1% and an average of 4.5±0.4 viral reads found in the calculated minimal dataset size. We demonstrated by modeling the virus genome size, virus-, and total RNA-concentration that the main determinant of mNGS sensitivity is the virus- sample background ratio. The predicted LODmNGS for the respective pathogenic virus in the datasets were congruent with the virus-concentration determined by RT-qPCR. Theoretical assumptions were further confirmed by correlation analysis of mNGS and RT-qPCR data from the samples of the analyzed datasets. This approach should guide standardization of mNGS application, due to the generalized concept of LODmNGS.