Evaluation of Time-of-Flight Secondary Ion Mass Spectrometry Spectra of Peptides by Random Forest with Amino Acid Labels: Results from a Versailles Project on Advanced Materials and Standards Interlaboratory Study
We report the results of a VAMAS (Versailles Project on Advanced Materials and Standards) interlaboratory study on identifying peptide samples from their time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra by machine learning. More than 1000 TOF-SIMS spectra of six peptide model samples, one of which served as the test sample, were collected using 27 TOF-SIMS instruments at 25 institutes in six countries: the U.S., the U.K., Germany, China, South Korea, and Japan. Peptides were selected as model samples because their chemical structures are systematic and simple. Peak intensities were extracted from every TOF-SIMS spectrum using the same peak list and normalized to the total ion count. The spectra of the test peptide sample were predicted by Random Forest with 20 amino acid labels, and the prediction accuracy for the test spectra was 0.88. Although the prediction for an unknown peptide was not perfect, the results show that all of the amino acids in an unknown peptide can be determined from its TOF-SIMS spectra by Random Forest prediction. Moreover, the prediction for peptides included in the training spectra was almost perfect. Among its important features, Random Forest also suggested specific fragment ions from the amino acid residue Q, for which fragment ions detected by TOF-SIMS have not previously been reported. This study indicates that analysis using Random Forest, which allows mathematical relationships to be translated into chemical relationships, combined with multiple labels representing monomer chemical structures, is useful for predicting the TOF-SIMS spectra of an unknown peptide.
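The two data-preparation steps named above, normalization to the total ion count and labeling with 20 amino acid labels, can be illustrated with a minimal Python sketch. It is not the study's code. The function names, the example intensities, and the example sequence are hypothetical, and encoding each label as presence or absence of a residue is an assumption, since the abstract does not state how the labels were encoded.

```python
# Minimal sketch; all names and example values are hypothetical.

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def normalize_to_total_ion_count(intensities):
    """Divide each peak intensity by the summed intensity of the spectrum."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total ion count must be positive")
    return [value / total for value in intensities]


def amino_acid_labels(sequence):
    """Return a 20-element 0/1 vector marking which residues occur.

    Assumption: a label is 1 if the residue appears at least once in the
    peptide, regardless of how often or where it appears.
    """
    present = set(sequence.upper())
    unknown = present - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unknown residue codes: {sorted(unknown)}")
    return [1 if residue in present else 0 for residue in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical intensities for three peaks taken from a shared peak list.
    print(normalize_to_total_ion_count([120.0, 30.0, 50.0]))
    # Hypothetical peptide sequence, not one of the study samples.
    print(amino_acid_labels("GGYR"))
```

With labels of this form, a multi-label classifier such as Random Forest would predict one yes/no output per amino acid for each spectrum, which is one way the amino acids of a peptide absent from the training set could still be predicted.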
Use and reproduction:
All rights reserved