Abstract: To address the challenges of detecting and counting Camellia oleifera fruits under multiple types of occlusion in complex environments, a detection model based on a dual-backbone network and a consecutive attention feature fusion (CAFF) module was proposed. The dual-backbone network combined the advantages of two different backbone networks to extract different features efficiently. In addition, a dual-input, single-output CAFF module was designed to replace the traditional concat operation and optimize the fusion strategy for multi-scale feature information. To balance model precision and size, the ghost convolution (Ghostconv) module was adopted and the spatial pyramid pooling fast (SPPF) layer was removed, which shortened training time and reduced the number of parameters. The improved model, YOLO dual-backbone & consecutive attention feature fusion & lightweight (YOLO-DCL), performed well on all occlusion detection tasks, achieving a mean average precision (mAP) of 92.7%, a precision of 90.7%, and a recall of 84.9% with a model size of only 5.7 MB. Compared with the YOLO v8n model, mAP, precision, and recall increased by 4.0, 8.6, and 2.3 percentage points, respectively, while the model size decreased by 9.5%. The model can also automatically count Camellia oleifera fruits by occlusion category, which can reduce labor costs and improve the accuracy of yield estimation. These properties make the model well suited for deployment in complex environments.
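The per-category counting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Detection` type, the `count_by_category` helper, the occlusion class names, and the 0.5 confidence threshold are all hypothetical, chosen only to show how detector outputs might be tallied by occlusion category.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Detection(NamedTuple):
    """One detected fruit (hypothetical structure, not from the paper)."""
    label: str         # occlusion category predicted by the detector
    confidence: float  # detector confidence score in [0, 1]


def count_by_category(detections: Iterable[Detection],
                      min_confidence: float = 0.5) -> Dict[str, int]:
    """Tally detections per occlusion category, skipping low-confidence ones.

    The threshold value is an assumption for illustration only.
    """
    counts = Counter(d.label for d in detections
                     if d.confidence >= min_confidence)
    return dict(counts)


# Illustrative detections for one image; the class names are assumed.
detections = [
    Detection("no_occlusion", 0.91),
    Detection("leaf_occlusion", 0.78),
    Detection("leaf_occlusion", 0.42),   # below threshold, not counted
    Detection("fruit_occlusion", 0.66),
    Detection("no_occlusion", 0.85),
]

counts = count_by_category(detections, min_confidence=0.5)
total_fruits = sum(counts.values())
```

Summing the per-category counts gives the total fruit count for an image, which is the quantity a yield estimate would build on.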