Abstract: This study aimed to use spectral, canopy structure, and texture features obtained by unmanned aerial vehicle (UAV) remote sensing for cotton yield estimation, and to systematically analyze the contribution of these features to the estimate. Machine learning models for cotton yield estimation were built from multi-source UAV data, the optimal growth stage for yield estimation was identified, the effectiveness of data from different sensors was compared, and the contribution of each input feature was quantified. Data were collected from three types of sensors: RGB (red, green, blue), multi-spectral (MS), and light detection and ranging (LiDAR). The optimal growth stage for cotton yield estimation was determined by correlation analysis between spectral vegetation indices and yield. Yield estimation models were then developed using three machine learning algorithms: partial least squares regression (PLSR), random forest regression (RFR), and extreme gradient boosting (XGBoost). The performance of models based on the two most commonly used sensors (RGB and MS cameras) was also evaluated. The results confirmed that the flowering stage was the optimal growth stage for cotton yield estimation. Using UAV data from the flowering stage, the XGBoost model achieved the highest yield estimation accuracy (R² = 0.70, RMSE = 611.31 kg/hm², rRMSE = 10.60%). Of the features extracted from RGB and MS image data, those from the MS camera gave better modeling results. When features from both the RGB and MS cameras were used as inputs, model performance exceeded that obtained with either sensor alone. The Shapley additive explanations (SHAP) algorithm was used to analyze the contribution of each input feature to the machine learning yield estimation models.
The three types of features derived from the three sensors were all significant for yield estimation, with texture and canopy structure features showing considerable potential. These results can provide theoretical and technical support for high-throughput cotton yield estimation in smart cotton management.
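The abstract reports three accuracy metrics without defining them. The following is a minimal sketch of how they are commonly computed, not the authors' evaluation code. It assumes that R² is the coefficient of determination and that rRMSE is the RMSE divided by the mean observed yield, expressed as a percentage. The function name `regression_metrics` and the sample yields are hypothetical and serve only as illustration.

```python
import math


def regression_metrics(observed, predicted):
    """Return (R2, RMSE, rRMSE in percent) for paired yield values.

    Assumed definitions (not stated in the abstract):
      R2    = 1 - SS_res / SS_tot
      RMSE  = sqrt(mean squared error)
      rRMSE = 100 * RMSE / mean(observed)
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(observed)
    mean_obs = sum(observed) / n

    # Residual and total sums of squares
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance, so R2 is undefined")

    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    rrmse = 100.0 * rmse / mean_obs
    return r2, rmse, rrmse


# Hypothetical plot-level yields in kg/hm², for illustration only
observed = [5200.0, 6100.0, 5800.0, 6400.0]
predicted = [5400.0, 5900.0, 6000.0, 6200.0]

r2, rmse, rrmse = regression_metrics(observed, predicted)
print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f} kg/hm², rRMSE = {rrmse:.2f}%")
```

If rRMSE is indeed normalized by the mean observed yield, the reported RMSE of 611.31 kg/hm² and rRMSE of 10.60% would correspond to a mean observed yield of roughly 611.31 / 0.1060 ≈ 5767 kg/hm². This is an inference from the assumed definition, not a figure given in the abstract.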