Strategies for the identification of disease-related patterns of volatile organic compounds: prediction of paratuberculosis in an animal model using random forests
Modern statistical methods developed for automated pattern recognition are increasingly being used to analyse data from studies on emissions of volatile organic compounds (VOCs). Detecting disease-related VOC profiles could lead to novel non-invasive diagnostic tools for clinical applications. However, not all statistical methods are suited to the investigation of VOC profiles. In particular, univariate methods cannot discover VOC patterns because they consider each compound separately. The present study demonstrates this in practice. Using VOC samples from a controlled animal study on paratuberculosis, random forests were applied for pattern recognition and disease prediction. This strategy was compared with a prediction approach based on single compounds, and both methods were embedded in a cross-validation procedure. On these VOC data, random forests achieved higher sensitivities and specificities than predictions based on single compounds. Therefore, further investigation of VOC patterns, rather than of single biomarkers, will most likely be more fruitful for paratuberculosis. All methods are explained in detail to facilitate their transfer to other data analyses.
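To illustrate why a univariate rule can miss a VOC pattern, the following minimal sketch compares a single-compound classifier with a random forest inside a 5-fold cross-validation. It is not the study's code. It assumes Python with NumPy and scikit-learn and uses simulated data: the function `simulate_voc_data` and all of its parameters are hypothetical. It also substitutes a depth-1 decision tree (a threshold on one compound, re-selected within each training fold) for the single-compound prediction approach, which may differ from the procedure used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import confusion_matrix


def simulate_voc_data(n_per_class=150, n_noise=5, shift=0.3, seed=0):
    """Hypothetical VOC intensities (simulated, not study data).

    Compounds 0 and 1 share a large common variation and are shifted in
    opposite directions by disease, so the signal lies mainly in their
    combination rather than in either compound alone.
    """
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_per_class)            # 0 = control, 1 = infected
    sign = np.where(y == 1, 1.0, -1.0)
    z = rng.normal(size=y.size)                   # shared variation between samples
    x0 = z + 0.3 * rng.normal(size=y.size) + shift * sign
    x1 = z + 0.3 * rng.normal(size=y.size) - shift * sign
    noise = rng.normal(size=(y.size, n_noise))    # uninformative compounds
    X = np.column_stack([x0, x1, noise])
    return X, y


def sens_spec(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn), tn / (tn + fp)


X, y = simulate_voc_data()
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Stand-in for single-compound prediction: a depth-1 tree thresholds
# exactly one compound, chosen anew within each training fold.
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
# Multivariate pattern recognition: a random forest over all compounds.
forest = RandomForestClassifier(n_estimators=300, random_state=0)

results = {}
for name, model in [("single compound", stump), ("random forest", forest)]:
    # Each sample is predicted by a model fitted without that sample.
    y_pred = cross_val_predict(model, X, y, cv=cv)
    results[name] = sens_spec(y, y_pred)
    sens, spec = results[name]
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this simulation the marginal distributions of the two informative compounds overlap heavily between groups, while their difference separates the groups well. A one-compound threshold can exploit only the small marginal shift, whereas the forest can combine both compounds. Sensitivity and specificity are computed from out-of-fold predictions only, so neither model is evaluated on samples it was trained on.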