Abstract: To improve the accuracy of pear target detection in complex environments, a TC-ICSA-YOLO v8 pear target detection model based on deep fusion of Transformer and CNN features was proposed. The model combined the strength of convolutional neural networks (CNNs) in extracting local (high-frequency) image features with the ability of Transformers to capture global (low-frequency) features. To further strengthen feature extraction, the model incorporated the Inception dilated convolution module and the adaptive detail fusion module (ADI), and introduced attention mechanisms such as squeeze-enhanced axial attention (SeaAttention) and coordinate attention (CA). Data augmentation was performed with the Fourier transform (FT) method, and a frequency ramp structure was employed to better balance the contributions of the local (high-frequency) and global (low-frequency) feature components during feature extraction. Experimental results showed that the TC-ICSA-YOLO v8 model achieved a mean average precision (mAP) of 97.01%, a precision of 97.33%, and a recall of 95.69% on the validation set, with a detection speed of 81.21 frames per second. Under the same conditions, the TC-ICSA-YOLO v8 model detected targets in night-time images more accurately than the YOLO v8s model. Its mAP was 14.65, 3.34, 0.52, 0.20, 6.68, and 5.54 percentage points higher than that of the Faster R-CNN, YOLO v3, YOLO v7s, YOLO v8s, SwinTransformer, and RT-DETR models, respectively, and its parameter size was 74.60, 34.16, 9.48, 16.81, 20.84, and 13.64 MB smaller than that of these models, respectively. The improved model combined high detection accuracy with fewer parameters, which favors deployment on mobile devices.
The proposed model detected pears effectively; it can serve as a reference for object detection in complex environments and provide technical support for automated pear picking.
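
To illustrate what the local (high-frequency) and global (low-frequency) components mentioned above refer to, the following minimal sketch separates a grayscale image into the two components with a Fourier transform. It is not the FT augmentation or the frequency ramp structure of the TC-ICSA-YOLO v8 model, whose details are not given here; the function name `split_frequencies`, the circular mask, and the `radius` parameter are assumptions made only for this illustration.

```python
import numpy as np


def split_frequencies(image, radius):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    Illustrative only: a hard circular mask in the Fourier domain,
    not the frequency ramp structure described in the abstract.
    """
    rows, cols = image.shape
    # Move the zero-frequency (DC) term to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Distance of every frequency bin from the centre of the spectrum.
    y, x = np.ogrid[:rows, :cols]
    dist = np.hypot(y - rows // 2, x - cols // 2)
    low_mask = dist <= radius  # True for low-frequency bins

    # Invert each masked spectrum back to the spatial domain.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64))  # stand-in for a grayscale pear image
    low, high = split_frequencies(img, radius=8)
    # The two components sum back to the original image.
    print(np.allclose(low + high, img))
```

In this sketch a small `radius` keeps only coarse, global structure in the low-frequency part and leaves edges and fine texture in the high-frequency part; a smooth weighting between the two bands, rather than a hard mask, would be closer in spirit to a ramp.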