Abstract: Accurate acquisition of maize yield information is critical for formulating agricultural policies and supporting national economic development. Long short-term memory (LSTM) networks and Transformer models are both used for crop yield estimation because of their respective strengths in processing remote sensing time series data. To combine the LSTM's capacity for capturing local temporal dependencies with the Transformer's efficiency in modeling global relationships, multi-source remote sensing parameters and maize yield data were used to construct three yield estimation models: Transformer Encoder-LSTM (TFEL), Transformer-LSTM (TFL), and a pure Transformer. A Bayesian optimization algorithm was applied to determine the optimal combination of hyperparameters, including hidden layer size and learning rate. The models were then used to estimate county-scale maize yields in the Taiyuan and Shangdang Basins of Shanxi Province, and the Shapley additive explanations (SHAP) method was used to quantify the contribution of each remote sensing feature within the hybrid models. The TFEL model achieved higher estimation accuracy (R² = 0.72, P < 0.01, RMSE = 756.43 kg/hm², MAPE = 10.58%, NRMSE = 11.86%) than both the TFL model (R² = 0.62, P < 0.01, RMSE = 974.14 kg/hm², MAPE = 13.50%, NRMSE = 15.14%) and the Transformer model (R² = 0.53, P < 0.01, RMSE = 1028.76 kg/hm², MAPE = 19.13%, NRMSE = 16.16%). The spatial distribution of estimated yields showed higher values in the northern and southern regions and lower values in the eastern and western regions. A strong linear relationship was observed between estimated and statistical yields, supporting the TFEL model's generalization capability. The SHAP analysis showed that the two-band enhanced vegetation index (EVI2) and the green chlorophyll vegetation index (GCVI) contributed most to maize yield estimation in both the TFEL and TFL models.
Compared with the TFL model, the TFEL model focused more effectively on the key remote sensing parameters and identified and quantified their contributions to yield estimation more consistently, thereby achieving higher estimation accuracy. In summary, the hybrid model based on the Transformer and LSTM showed promising potential for maize yield estimation and can provide a theoretical and methodological reference for regional crop yield assessment.
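The four accuracy measures reported above (R², RMSE, MAPE, NRMSE) can be illustrated with a minimal Python sketch. The abstract does not state the exact formulas, so two choices here are assumptions: R² is computed as the coefficient of determination, and NRMSE is RMSE divided by the mean observed yield. The function name `yield_metrics` and the sample values in the usage note are hypothetical and are not data from the study.

```python
import math


def yield_metrics(observed, estimated):
    """Return R2, RMSE, MAPE (%) and NRMSE (%) for paired yield values.

    Assumptions (not confirmed by the source):
      - R2 is the coefficient of determination, 1 - SS_res / SS_tot.
      - NRMSE is RMSE normalized by the mean observed yield.
    """
    n = len(observed)
    if n == 0 or n != len(estimated):
        raise ValueError("observed and estimated must be non-empty and equal in length")

    mean_obs = sum(observed) / n
    # Residual and total sums of squares
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")

    rmse = math.sqrt(ss_res / n)
    # MAPE requires non-zero observed yields
    mape = 100.0 / n * sum(abs((o - e) / o) for o, e in zip(observed, estimated))
    nrmse = 100.0 * rmse / mean_obs
    r2 = 1.0 - ss_res / ss_tot

    return {"R2": r2, "RMSE": rmse, "MAPE": mape, "NRMSE": nrmse}
```

For example, `yield_metrics([6000, 7000, 8000], [6300, 6800, 8100])` evaluates three hypothetical county yields in kg/hm². RMSE is returned in the same unit as the inputs, while MAPE and NRMSE are percentages.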