Abstract: Harvesting accuracy for dragon fruit in complex orchard environments is low because fruit maturity is difficult to recognize and fruit pose is difficult to estimate. To address this, a lightweight and efficient real-time detection model, OWD-YOLO, was proposed based on YOLO 11n-pose. First, reparameterized convolution was introduced into the base model and combined with the C3K2 module to strengthen the extraction of multi-scale features and fine-grained pose details of the dragon fruit. Second, wavelet pooling and large-kernel convolution attention mechanisms were incorporated into the SPPF module to reduce interference from environmental factors such as lighting variation and background occlusion, thereby improving detection accuracy. Third, a DGECA attention mechanism was introduced into the backbone network to improve recognition of key features such as fruit skin color and texture, and thus the accuracy of maturity classification. Finally, a six-degree-of-freedom robotic arm harvesting platform based on the OWD-YOLO model was deployed in a complex orchard environment, with the three-dimensional pose of the dragon fruit estimated using a depth camera. In field experiments, OWD-YOLO achieved an object detection precision of 88.0%, a mean average precision of 92.7%, and a keypoint mean average precision of 93.3%, which are absolute improvements of 5.0, 2.4, and 2.0 percentage points over the baseline, respectively. The average frame rate was 58.7 f/s, the single-fruit harvesting success rate was 86.0%, and the average harvesting time was 29.4 s. These results meet the requirements for precise mechanized harvesting in complex orchard environments.
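Because the reported gains are absolute percentage points rather than relative percentages, the baseline values they imply can be recovered by simple subtraction. The short sketch below does only that arithmetic, using the figures stated in the abstract; the dictionary keys and variable names are illustrative labels, not identifiers from the study, and the per-frame time is just the reciprocal of the reported average frame rate.

```python
# Figures reported in the abstract, in percent. Key names are illustrative only.
owd_yolo = {"precision": 88.0, "mAP": 92.7, "keypoint_mAP": 93.3}
gain_pp = {"precision": 5.0, "mAP": 2.4, "keypoint_mAP": 2.0}

# An absolute (percentage-point) gain means: baseline = reported value - gain.
baseline = {k: round(owd_yolo[k] - gain_pp[k], 1) for k in owd_yolo}

# The same gain expressed relative to the implied baseline, for comparison.
relative_pct = {k: round(100 * gain_pp[k] / baseline[k], 1) for k in owd_yolo}

# Mean time per frame implied by the reported average frame rate (f/s).
fps = 58.7
frame_ms = round(1000 / fps, 1)

print("implied baseline (%):", baseline)
print("relative gain (%):", relative_pct)
print("mean frame time (ms):", frame_ms)
```

The relative figures differ from the percentage-point gains, which is why the abstract's wording "absolute improvements" matters when the results are compared with other work.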