Abstract: Nitrogen is a key factor affecting crop growth, and accurate determination of soil nitrogen content is the basis for implementing agricultural water and fertilizer management technologies. Visible-near-infrared spectroscopy can detect soil nitrogen content quickly, but its application to soil nitrogen testing is limited by the accuracy and generalizability of prediction models. To improve both, a soil nitrogen content prediction model based on sparse self-attention and visible-near-infrared spectroscopy, called VNIRSformer, was proposed. The model consisted of an input layer, an embedding layer, an encoder, a decoder, a prediction layer, and an output layer. The land use/cover area frame statistical survey (LUCAS) dataset was used to train the model to improve its generalization ability. The performance of VNIRSformer was tested at 15 different spectral wavelength intervals. As the wavelength interval increased, prediction accuracy first increased and then decreased, while model size decreased. Prediction accuracy was lowest at the 1 nm wavelength interval, where the RMSE was 0.47 g/kg and R² was 0.78, and highest at the 5 nm wavelength interval, where the RMSE was 0.35 g/kg and R² was 0.89. The greatest reduction in model size, 72%, occurred when the wavelength interval was increased from 0.5 nm to 1 nm; from 1 nm to 5 nm, the model size decreased at a uniform rate of 5%. Considering both model size and performance, 5 nm was chosen as the optimal wavelength interval.
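As an illustration of what changing the wavelength interval means for the model input, the following is a minimal sketch that coarsens a spectrum from a 0.5 nm sampling interval to a wider one. It assumes block averaging of adjacent bands and a 4200-point input; the abstract does not state which resampling method was used, and the function name `resample_spectrum` is hypothetical.

```python
import numpy as np

def resample_spectrum(spectrum, base_interval_nm=0.5, target_interval_nm=5.0):
    """Coarsen a spectrum by averaging adjacent bands (assumed method, not the paper's).

    spectrum: array whose last axis holds reflectance/absorbance values sampled
              every `base_interval_nm`.
    Returns an array whose last axis is sampled every `target_interval_nm`.
    """
    factor = int(round(target_interval_nm / base_interval_nm))
    if factor < 1 or abs(factor * base_interval_nm - target_interval_nm) > 1e-9:
        raise ValueError("target interval must be a whole multiple of the base interval")

    spectrum = np.asarray(spectrum, dtype=float)
    # Drop trailing bands that do not fill a complete block.
    n_keep = (spectrum.shape[-1] // factor) * factor
    trimmed = spectrum[..., :n_keep]
    # Group bands into blocks of `factor` points and average each block.
    blocks = trimmed.reshape(*spectrum.shape[:-1], n_keep // factor, factor)
    return blocks.mean(axis=-1)

# Example: a 4200-point spectrum at 0.5 nm becomes 420 points at 5 nm.
coarse = resample_spectrum(np.random.rand(4200), target_interval_nm=5.0)
print(coarse.shape)
```

A shorter input sequence reduces the size of the layers that depend on sequence length, which matches the reported trend of model size decreasing as the interval widens.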
Compared with six other prediction models (two convolutional neural networks, a traditional self-attention model, partial least squares regression, support vector machine regression, and K-nearest neighbor regression), VNIRSformer performed best, with an RMSE of 0.35 g/kg, an R² of 0.89, and an RPD of 2.95. When tested on soil nitrogen content at different grades, VNIRSformer showed high prediction accuracy for soil nitrogen content below 5 g/kg. To verify its generalization ability, VNIRSformer was applied directly to self-collected datasets; R² decreased by 0.17, indicating that the model had a certain degree of generalization ability. These results indicated that VNIRSformer achieved its best prediction performance at a moderate model size when spectral data with a wavelength interval of 5 nm were used as input, and that the sparse attention mechanism improved prediction accuracy and reduced training time. The results could support the practical application of field soil nitrogen content prediction based on visible-near-infrared spectroscopy.
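The three evaluation metrics reported above (RMSE, R², and RPD) can be computed as in the following minimal sketch. It assumes the common definitions, with RPD taken as the sample standard deviation of the observed values divided by the RMSE; the abstract does not give the exact formulas used, and the function name `regression_metrics` is hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return RMSE, R^2, and RPD for observed vs. predicted values.

    Assumed definitions (the paper's exact formulas are not stated here):
      RMSE = sqrt(mean((y_true - y_pred)^2))
      R^2  = 1 - SS_res / SS_tot
      RPD  = sample standard deviation of y_true / RMSE
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    rmse = np.sqrt(np.mean(residuals ** 2))
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rpd = np.std(y_true, ddof=1) / rmse  # ddof=1 is an assumption

    return {"rmse": rmse, "r2": r2, "rpd": rpd}

# Example with made-up values (g/kg), not data from the study.
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

Under these definitions RMSE carries the unit of the target (here g/kg), while R² and RPD are dimensionless, so they are easier to compare across datasets.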