Quantification of soil variables in a heterogeneous soil region with VIS-NIR-SWIR data using different statistical sampling and modeling strategies
Estimation accuracies obtained for soil properties from spectroradiometer data depend markedly on the individual sample set. The choice of the statistical method used to sample a calibration set, and the extension of the multivariate modeling approach with bagging and/or spectral variable selection, may optimize predictions. We studied this with a set of 172 arable topsoils from a region near Trier (Germany) that, as is often typical for medium- to large-scale applications of soil spectroscopy, covered a wide range of different soil situations. Yet the variation in the target variables, namely organic carbon (OC), nitrogen (N), microbial biomass (Cmic) and thermostable carbon (Cinert), was small. Based on a split of calibration and validation data with the Kennard–Stone algorithm, we found only moderate improvements over partial least squares regression (PLSR) when combining PLSR with bagging and, for spectral variable selection, with “competitive adaptive reweighted sampling” (CARS). In the validation, R² improved for OC (from 0.75 to 0.79), N (from 0.72 to 0.77) and Cinert (from 0.66 to 0.68). Additionally, we used individual calibration sets for each validation sample. In this “local” approach, we clustered the calibration samples in the spectral feature space and, for each validation sample, selected the most similar sample from each cluster. Combining bagging-CARS-PLSR with this local approach improved R² markedly to 0.76 for Cinert, and slightly to 0.82 for OC and to 0.76 (previously 0.73) for Cmic. The local approach had two effects: it removed unsuitable samples from the calibration set and balanced the skewness of the data distribution.
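The Kennard–Stone split mentioned above can be illustrated with a short sketch. It assumes plain Euclidean distances computed directly on the rows of a spectra matrix; the abstract does not state which preprocessing or distance the study used (many applications compute distances on PCA scores instead), and the function name `kennard_stone` is hypothetical.

```python
import numpy as np


def kennard_stone(X: np.ndarray, n_cal: int) -> tuple[np.ndarray, np.ndarray]:
    """Split rows of X into calibration and validation indices (Kennard-Stone).

    Starts from the two mutually most distant samples, then repeatedly adds the
    sample whose distance to its nearest already-selected sample is largest.
    Assumption: Euclidean distance on the raw rows of X.
    """
    n = X.shape[0]
    if not 2 <= n_cal <= n:
        raise ValueError("n_cal must be between 2 and the number of samples")
    # Pairwise Euclidean distances in the spectral feature space
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Seed the calibration set with the two most distant samples
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    # Distance from every sample to its nearest selected sample
    min_dist = np.minimum(dist[i], dist[j])
    min_dist[selected] = -1.0  # mark selected samples so they are never re-picked
    while len(selected) < n_cal:
        k = int(np.argmax(min_dist))  # farthest remaining sample
        selected.append(k)
        min_dist = np.minimum(min_dist, dist[k])
        min_dist[selected] = -1.0
    cal = np.array(selected)
    val = np.setdiff1d(np.arange(n), cal)
    return cal, val
```

Because each new calibration sample is the one farthest from everything already chosen, the calibration set spreads over the edges of the spectral space, and the validation set is left with samples lying inside that envelope.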
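The “local” approach, which clusters the calibration samples and takes the most similar sample from each cluster for every validation sample, might look roughly like the following. This is a minimal sketch under stated assumptions: plain k-means (Lloyd's algorithm) as the clustering method and Euclidean distance as the similarity measure, neither of which is specified in the abstract. The function names `kmeans_labels` and `local_calibration_set` are hypothetical.

```python
import numpy as np


def kmeans_labels(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means (assumed clustering method): one label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance of every sample to every center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its members (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def local_calibration_set(x_val: np.ndarray, X_cal: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Indices of the calibration sample closest to x_val within each cluster."""
    d = np.sqrt(((X_cal - x_val) ** 2).sum(axis=1))  # assumed Euclidean similarity
    chosen = []
    for c in np.unique(labels):  # empty clusters simply do not appear here
        members = np.flatnonzero(labels == c)
        chosen.append(int(members[np.argmin(d[members])]))
    return np.array(chosen)
```

In this sketch the number of clusters fixes the size of every local calibration set, and each cluster contributes exactly one sample. Drawing one sample per cluster, rather than the overall nearest neighbours, is one way the selection can even out a skewed distribution, since densely populated regions of the spectral space cannot dominate the calibration set.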
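Combining PLSR with bagging, as described above, generally means fitting PLSR models on bootstrap resamples of the calibration set and averaging their predictions. The sketch below shows that generic idea with a NIPALS PLS1 implementation in NumPy for a single target variable. It does not include the CARS variable selection step, and the number of components, the number of bootstrap models and the mean as aggregation rule are assumptions rather than settings reported in the abstract. The names `pls1_fit`, `pls1_predict` and `bagged_pls_predict` are hypothetical.

```python
import numpy as np


def pls1_fit(X: np.ndarray, y: np.ndarray, n_comp: int):
    """NIPALS PLS1 for one response; returns (x_mean, y_mean, regression coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean  # mean-centred residual matrices
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)  # weight vector
        t = Xr @ w              # score vector
        tt = t @ t
        p = Xr.T @ t / tt       # X loading
        c = (yr @ t) / tt       # y loading
        Xr = Xr - np.outer(t, p)  # deflate X
        yr = yr - c * t           # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # B = W (P'W)^-1 q
    return x_mean, y_mean, coef


def pls1_predict(model, X: np.ndarray) -> np.ndarray:
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef


def bagged_pls_predict(X_cal, y_cal, X_new, n_comp, n_bags=50, seed=0):
    """Bagging: fit PLS1 on bootstrap resamples and average the predictions."""
    rng = np.random.default_rng(seed)
    n = len(X_cal)
    preds = []
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)  # draw calibration rows with replacement
        model = pls1_fit(X_cal[idx], y_cal[idx], n_comp)
        preds.append(pls1_predict(model, X_new))
    return np.mean(preds, axis=0)
```

Averaging over bootstrap models mainly reduces the variance of the predictions; in the local approach, a call like this would be repeated for each validation sample with its own calibration set.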
Usage and reproduction:
All rights reserved