Effect of data spatial scale on the performance of fish habitat models
Habitat models are widely used to explore past shifts and predict future shifts in fish distribution. Our literature review reveals a widespread practice of training fish habitat models with in situ data or with data of the highest available resolution. Using six fish species at two life stages in the North Sea as examples, we demonstrate that the choice of data resolution is crucial to a model's performance. We matched fish abundance data from a 51-year scientific survey at three spatial scales with environmental parameters at seven spatial scales, obtaining a total of 240 data sets. We varied the resolution used for model training and for model prediction, and evaluated model performance with various metrics on both training and cross-validation data. Contrary to common belief, training the model with low-resolution data generally improved the performance metrics compared with models built on in situ or high-resolution data. The optimal resolution for fish and environmental data was roughly twice the average distance between observations. Training the model with higher-resolution data often yielded unrealistic multidecadal shifts in fish distribution. Conversely, the best predictions were obtained by applying the model to data of higher resolution than the training data. We explain these results by scale-dependent ecological responses, subscale noise in the raw data, the inability of interpolation to create information, and failure to comply with the Nyquist–Shannon sampling theorem. This study shows that choosing an appropriate spatial scale is crucial for correctly predicting shifts in fish distribution under climate change.
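The rule of thumb that the optimal resolution is roughly twice the average distance between observations can be made concrete with a small sketch. It assumes that survey hauls have planar (projected) coordinates in kilometres, that "average distance between observations" means the mean nearest-neighbour distance, and that aggregation is a simple mean of abundance per square grid cell. None of these choices is specified in the abstract. The function names `mean_nearest_neighbour_distance` and `aggregate_to_grid` and the example hauls are hypothetical. One way to read the factor of two is through the Nyquist–Shannon theorem: samples spaced a distance d apart can only resolve spatial structure with a wavelength of at least 2d.

```python
import math
from collections import defaultdict


def mean_nearest_neighbour_distance(points):
    """Average distance from each point to its closest other point.

    Brute force O(n^2); assumes planar coordinates (e.g. km) and at least two points.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        total += nearest
    return total / len(points)


def aggregate_to_grid(observations, cell_size):
    """Mean abundance of all (x, y, abundance) observations in each square cell.

    Cells are keyed by integer (column, row) indices; the mean is an assumed
    aggregation rule, not necessarily the one used in the study.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, abundance in observations:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        sums[cell] += abundance
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Hypothetical hauls: (x_km, y_km, abundance).
hauls = [
    (0.0, 0.0, 12.0),
    (10.0, 0.0, 4.0),
    (0.0, 10.0, 8.0),
    (10.0, 10.0, 0.0),
    (25.0, 5.0, 6.0),
]

# Mean nearest-neighbour spacing of the hauls, then a cell size of twice that spacing.
d = mean_nearest_neighbour_distance([(x, y) for x, y, _ in hauls])
cell_size = 2 * d
grid = aggregate_to_grid(hauls, cell_size)
print(f"mean spacing {d:.2f} km, cell size {cell_size:.2f} km, grid {grid}")
```

Here the four closely spaced hauls fall into one cell and the outlying haul into another. For large surveys, the brute-force neighbour search would normally be replaced by a spatial index.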