Pear Object Detection in Complex Environment Based on Transformer-CNN Feature Deep Fusion
Fund Project: National Natural Science Foundation of China (51466014)




    Abstract:

    In order to improve the accuracy of pear target detection in complex environments, a TC-ICSA-YOLO v8 pear target detection model based on Transformer-CNN feature deep fusion was proposed. The model effectively combined the strengths of convolutional neural networks (CNNs) in extracting local (high-frequency) image features with the ability of Transformers to capture global (low-frequency) features. An Inception dilated convolution module and an adaptive detail fusion (ADI) module were designed, and the squeeze-enhanced axial attention (SeaAttention) and coordinate attention (CA) mechanisms were introduced to strengthen feature extraction. Data augmentation was performed with the Fourier transform (FT), and a frequency ramp structure was employed to better balance the local (high-frequency) and global (low-frequency) feature components, optimizing the network's performance in feature extraction. Experimental results showed that the TC-ICSA-YOLO v8 model achieved a mean average precision (mAP) of 97.01%, a precision of 97.33%, and a recall of 95.69% on the validation set, with a detection speed of 81.21 frames per second. The model also outperformed the YOLO v8s model under the same conditions in detection precision on night-time images. Its mAP was 14.65, 3.34, 0.52, 0.20, 6.68, and 5.45 percentage points higher than that of the Faster R-CNN, YOLO v3, YOLO v7s, YOLO v8s, Swin Transformer, and RT-DETR models, respectively, and its memory footprint was 74.60, 34.16, 9.48, 16.81, 20.84, and 13.64 MB smaller. With higher detection accuracy and fewer parameters, the improved model is better suited to deployment on mobile devices. The proposed model achieved good detection results for pears, providing a reference for object detection in complex environments and technical support for automated pear picking.
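The abstract mentions Fourier-transform (FT) data augmentation and a frequency-ramp balance of high- and low-frequency components, but gives no implementation details. One common FT-based augmentation in detection pipelines mixes the low-frequency amplitude spectrum of two images (in the spirit of Fourier Domain Adaptation) while keeping the source phase; a minimal sketch under that assumption follows. The function name `fourier_mix` and the band parameter `beta` are illustrative, not the authors' code.

```python
import numpy as np

def fourier_mix(src: np.ndarray, ref: np.ndarray, beta: float = 0.1) -> np.ndarray:
    """Swap the low-frequency amplitude of `src` with that of `ref`.

    src, ref: float arrays of shape (H, W) with matching sizes.
    beta: half-width of the centered low-frequency band, as a
          fraction of the image size (illustrative parameter).
    """
    # 2-D FFT, shifted so low frequencies sit at the center.
    fft_src = np.fft.fftshift(np.fft.fft2(src))
    fft_ref = np.fft.fftshift(np.fft.fft2(ref))

    amp_src, pha_src = np.abs(fft_src), np.angle(fft_src)
    amp_ref = np.abs(fft_ref)

    # Replace the central (low-frequency) band of the source's
    # amplitude spectrum with the reference's.
    h, w = src.shape
    bh, bw = max(1, int(h * beta)), max(1, int(w * beta))
    ch, cw = h // 2, w // 2
    amp_src[ch - bh:ch + bh, cw - bw:cw + bw] = \
        amp_ref[ch - bh:ch + bh, cw - bw:cw + bw]

    # Recombine the mixed amplitude with the source's phase and invert.
    mixed = amp_src * np.exp(1j * pha_src)
    return np.fft.ifft2(np.fft.ifftshift(mixed)).real
```

Because the source phase (edges and fine structure, i.e. high-frequency content) is preserved while the reference contributes only global illumination and style (low-frequency content), this kind of augmentation can help a detector generalize across lighting conditions such as the day/night scenes evaluated above.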

Cite this article

YANG Ying, TAN Zhong, ZHENG Wenxuan. Pear Object Detection in Complex Environment Based on Transformer-CNN Feature Deep Fusion[J]. Transactions of the Chinese Society for Agricultural Machinery, 2026, 57(2): 161-170.

History
  • Received: 2024-10-02
  • Published online: 2026-01-15