Abstract: China is the largest rice-growing country in the world, and accurate recognition of the different growth stages of rice is essential for intelligent rice field management. To classify four rice growth stages (initial heading, full heading, milk ripening, and yellow ripening), a recognition model based on multi-feature fusion was proposed. The methodology comprised three parts. First, in the multi-feature extraction module, high-resolution remote sensing images acquired by drones served as the data source, and multi-scale features and deep semantic features were extracted from the images with the Swin Transformer and ResNet18 models. Second, an improved feature fusion module was designed, consisting of an adaptive feature fusion (AFF) module and a global context modeling (GC) module. The AFF module fused the different features adaptively by learning the weight relationships between them. The GC module incorporated global context information to strengthen the model's perception of global features, thereby improving its adaptability in complex scenarios. Third, for model optimization, the Lion optimizer and a dynamic learning rate strategy were used to accelerate convergence, and a Dropout layer was introduced to prevent overfitting and improve generalization. Comparative experiments showed that the proposed model achieved an accuracy of 97.14% in recognizing rice at the heading and maturity stages, which was 3.29, 2.53, and 1.09 percentage points higher than that of the mainstream models MobileNetV2, EfficientNet, and Swin Transformer, respectively. Notably, while maintaining high accuracy, the model had relatively few parameters and a low computational cost, giving it lightweight characteristics suitable for practical applications.
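The adaptive fusion idea can be illustrated with a minimal sketch. The abstract does not specify how the AFF module computes its weights, so the formulation below is an assumption: one scalar logit per feature branch, normalized with a softmax and used to form a weighted sum of same-length feature vectors. The names `softmax`, `adaptive_fuse`, `swin_feat`, and `resnet_feat` are hypothetical, and the toy vectors merely stand in for the Swin Transformer and ResNet18 outputs. In a trained model the logits would be learned parameters rather than fixed values.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(
    features: Sequence[Sequence[float]], logits: Sequence[float]
) -> List[float]:
    """Fuse same-length feature vectors with softmax-normalized weights.

    Assumption: `logits` stands in for learnable parameters, one per branch.
    This is an illustrative formulation, not the paper's AFF module.
    """
    if len(features) != len(logits):
        raise ValueError("need one logit per feature branch")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must share one dimension")
    weights = softmax(logits)
    # Weighted sum across branches, computed element by element.
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


# Toy stand-ins for the two branch outputs (hypothetical values).
swin_feat = [0.2, 0.8, 0.5]
resnet_feat = [0.6, 0.4, 0.1]

# Equal logits give equal weights, so the result is the element-wise mean.
fused = adaptive_fuse([swin_feat, resnet_feat], logits=[0.0, 0.0])
```

Raising one logit relative to the other shifts the fused vector toward that branch, which is the sense in which such a scheme adapts the contribution of each feature source.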