Abstract: Manual apple disease detection is inefficient, costly, and inaccurate; this study therefore proposes a more effective solution that improves detection accuracy while reducing time and cost. The Swin Transformer was used as the base model, and the DenseNet framework was integrated into its core modules to strengthen feature propagation and improve gradient flow. An Outlook Attention module was added to capture fine-grained image details, enhancing the model's ability to extract intricate features. To further optimize performance, depthwise separable and dilated convolutions were introduced, enabling the capture of multi-scale features while reducing the parameter count. Finally, a Non-Local module was integrated to incorporate global context information, further improving overall performance. Together, these improvements gave the model superior performance and robustness across multiple tasks. Experimental results showed a classification accuracy of 95.8% on apple leaf diseases, with precision, recall, and F1 score of 95.80%, 95.74%, and 95.76%, respectively, all surpassing the baseline model. The proposed Swin Transformer-based model, optimized for apple leaf disease classification, efficiently identified both the type and severity of apple leaf diseases, providing a theoretical foundation and critical support for large-scale crop disease monitoring and for precise disease prevention and control in sustainable agriculture. Moreover, compared with existing deep learning models such as ResNet and the standard Swin Transformer, the proposed model exhibited superior accuracy and computational efficiency.
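As a rough illustration of the parameter saving that depthwise separable convolution provides, a standard k x k convolution can be factored into a depthwise step (one k x k filter per input channel, which may be dilated to enlarge the receptive field at no extra parameter cost) followed by a 1 x 1 pointwise step that mixes channels. The channel widths below are hypothetical and not taken from the paper:

```python
# Hypothetical channel widths for illustration only (not from the paper).
c_in, c_out, k = 96, 96, 3

# Standard k x k convolution: every output channel sees every input channel.
standard_params = k * k * c_in * c_out

# Depthwise separable variant: one (possibly dilated) k x k filter per
# input channel, then a 1 x 1 pointwise convolution to mix channels.
# Dilation enlarges the receptive field without adding any parameters.
depthwise_params = k * k * c_in
pointwise_params = c_in * c_out
separable_params = depthwise_params + pointwise_params

print(standard_params)   # 82944
print(separable_params)  # 10080
```

At these widths the factored form uses roughly an eighth of the parameters of the standard convolution, which is the kind of reduction the abstract alludes to.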
Future research would focus on further optimizing the model architecture to address more complex agricultural scenarios, such as classifying co-occurring diseases, and integrating drone-based image acquisition technologies for real-time disease detection and prediction.
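The Non-Local module referred to in the abstract is, at its core, a self-attention operation over all spatial positions of a feature map, so every position can aggregate information from every other position. A minimal NumPy sketch of that idea, with positions flattened into rows and without the learned 1x1-convolution projections the full block would add, might look like:

```python
import numpy as np

def non_local_sketch(x):
    """Simplified non-local block: x has shape (N, C), one row per
    spatial position. Every position attends to every other position
    via a row-wise softmax over pairwise similarities, injecting
    global context; a residual connection preserves the original
    features. Learned projections are omitted in this sketch."""
    scores = x @ x.T / np.sqrt(x.shape[1])        # pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # row-wise softmax
    return x + weights @ x                        # residual + aggregation

feats = np.random.default_rng(0).standard_normal((16, 8))
out = non_local_sketch(feats)
print(out.shape)  # (16, 8)
```

Because the attention weights span all positions, the output at each location reflects global context rather than only a local neighborhood, which is what distinguishes this block from an ordinary convolution.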