Abstract: During lotus seedpod harvesting, efficient and precise detection and localization are essential for improving harvesting efficiency and minimizing the risk of mispicking. However, existing lotus seedpod recognition methods predominantly rely on computationally intensive and structurally complex deep learning models, rendering them impractical for real-time field applications. To address this limitation, a lightweight object detection and localization method optimized for lotus seedpod harvesting scenarios was proposed. The method was based on the lightweight lotus segmentation network (LLSegNet), a semantic segmentation model that employed MobileNetV2 as the backbone within the DeepLabv3+ framework. To cope with challenges such as multi-scale variation, difficulty in capturing fine details, and background interference during harvesting, key enhancement strategies were incorporated, including dense atrous spatial pyramid pooling, strip pooling, a convolutional block attention module, and an efficient channel attention network. These improvements enhanced multi-scale feature extraction and representation while maintaining the lightweight nature of the overall model. Experiments were carried out on an Ubuntu 20.04 platform utilizing the PyTorch 2.3.1 framework for model training and evaluation. The results demonstrated that the LLSegNet model attained a mean intersection over union (mIoU) of 86.1% and a mean pixel accuracy (mPA) of 92.5%, with a memory footprint of 15.9 MB and a frame rate of 73.4 FPS, outperforming mainstream semantic segmentation models on all of these metrics. Furthermore, leveraging the high-quality semantic segmentation results produced by the LLSegNet model, a harvesting point localization method that integrated image processing and skeleton analysis was proposed. 
This method accomplished precise localization of harvesting points through a combination of image preprocessing, skeleton extraction, geometric analysis, and normal vector expansion mapping, achieving a success rate of 88.5%. The findings demonstrated that the proposed method not only improved detection and localization accuracy but also remained computationally efficient, showing strong potential for deployment in resource-constrained agricultural environments.
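The geometric-analysis and normal-vector steps of the localization pipeline can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the function name, the fixed `offset` parameter, and the use of PCA on raw stalk pixels (in place of a true morphological skeleton) are all simplifying assumptions made for illustration.

```python
import numpy as np

def locate_harvest_point(stalk_pixels, offset=5.0):
    """Illustrative sketch: estimate a harvesting point from stalk pixels.

    stalk_pixels: (N, 2) array of (row, col) coordinates of the segmented
    stalk region, standing in for skeleton pixels (real skeleton extraction,
    e.g. morphological thinning, is omitted here).
    Returns the lowest stalk point shifted along the local normal by
    `offset` pixels, mimicking a normal-vector expansion mapping.
    """
    pts = np.asarray(stalk_pixels, dtype=float)
    # Geometric analysis: the principal axis of the stalk pixels,
    # found via SVD (PCA), approximates the local tangent direction.
    centered = pts - pts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    tangent = vt[0]
    # Normal vector: rotate the tangent by 90 degrees in the image plane.
    normal = np.array([-tangent[1], tangent[0]])
    # Candidate cut point: the stalk pixel farthest down the image.
    base = pts[np.argmax(pts[:, 0])]
    # Expansion mapping: shift the cut point along the normal.
    return base + offset * normal
```

For a vertical stalk (a column of pixels), the tangent is vertical, so the returned point sits `offset` pixels to the side of the lowest stalk pixel; the sign of the normal depends on the SVD's sign convention and would need disambiguation in a real system.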