Abstract: Wheat is one of the world's essential food crops, and timely diagnosis and control of its diseases and pests can significantly reduce yield losses, which is crucial for global food security. However, deep learning methods typically rely on large amounts of training data and high-performance computing resources, which limits their use in scenarios with small sample sizes and constrained resources. To address these issues, a knowledge-enhanced, lightweight, multi-modal wheat disease and pest identification model, the multi-modal blend convolutional neural network (Blend-CNN), was proposed. The model was built on a dual-branch convolutional neural network framework: an EfficientNet backbone extracted image features of diseases and pests, while a multi-branch TextCNN backbone extracted textual features from disease and pest descriptions, providing richer feature information and improving identification accuracy. Additionally, an innovative convolutional network-based multi-modal fusion method was introduced, allowing the model to integrate information from both modalities globally. Furthermore, to mitigate the accuracy loss common in traditional multi-modal methods, a gradient blend loss function was developed. Finally, to verify the model's effectiveness, comparative experiments were conducted on a constructed dataset containing 880 samples. The results demonstrated that the proposed model achieved the highest identification accuracy on this dataset, 96.95%; compared with other models, it had fewer parameters and lower complexity, making it lightweight enough for deployment on edge devices and offering theoretical support for wheat disease and pest identification in scenarios with limited samples and computing resources.
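As a minimal illustrative sketch (not the paper's implementation), a gradient-blend-style loss typically combines per-branch losses with weights that favor branches whose validation loss is still improving and penalize branches that overfit. The function names, the weighting formula, and all numbers below are assumptions for illustration only:

```python
import numpy as np

def gradient_blend_weights(val_drop, overfit_gap):
    """Hypothetical gradient-blend-style weights for the image, text,
    and fused branches: reward a branch whose validation loss still
    drops (generalization), penalize a large val-train loss gap
    (overfitting), then normalize the weights to sum to one."""
    ratio = np.asarray(val_drop, dtype=float) / np.square(
        np.asarray(overfit_gap, dtype=float))
    return ratio / ratio.sum()

def blended_loss(branch_losses, weights):
    """Total training loss as the weighted sum of per-branch losses."""
    return float(np.dot(weights, branch_losses))

# Toy numbers (illustrative only), ordered: image, text, fused branch.
w = gradient_blend_weights(val_drop=[0.30, 0.10, 0.40],
                           overfit_gap=[0.20, 0.10, 0.15])
total = blended_loss([0.8, 1.2, 0.6], w)
```

Under this scheme the fused branch, which generalizes best in the toy numbers, receives the largest weight, so its loss dominates the blended training signal.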