Abstract: To address the challenges of tomato recognition in natural environments, such as interference from complex backgrounds and the difficulty of distinguishing adjacent fruits of similar ripeness, a lightweight YOLO v5s-MCA model for tomato ripeness detection was proposed. The model classified tomato ripeness into four stages: mature, turning mature, color transition, and immature. First, MobileNetV3 was adopted as the backbone network, significantly reducing the model’s parameter count and computational requirements. Second, the coordinate attention (CA) mechanism was integrated into the backbone and neck networks to enhance the model’s ability to represent tomato features. Third, the neck network was replaced with a weighted bidirectional feature pyramid network (BiFPN) to strengthen feature fusion and improve recognition accuracy. Finally, the standard convolution modules in the neck network were replaced with GSConv convolution to reduce model complexity and improve the capture of target information. In experimental evaluations, the YOLO v5s-MCA model had only 2.33×10⁶ parameters, a computational cost of 4.1×10⁹, and a memory footprint of 4.83 MB. It achieved a precision of 92.8% and a mean average precision (mAP) of 95.1%, improvements of 3.4 and 4.4 percentage points, respectively, over the baseline YOLO v5s model. To further validate its effectiveness, the YOLO v5s-MCA model was compared with six other models: YOLO v3s, YOLO v5s, YOLO v5n, YOLO v7, YOLO v8n, and YOLO v10n. The YOLO v5s-MCA model outperformed these counterparts in both lightweight design and detection performance.
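
The weighted feature fusion that BiFPN contributes can be illustrated with a minimal sketch. The abstract does not state which fusion rule the YOLO v5s-MCA neck uses, so the code below assumes the "fast normalized fusion" from the original BiFPN formulation, in which each input feature receives a learnable weight that is clipped to be non-negative and then normalized by the sum of all weights. The function name `fast_normalized_fusion`, the flat-list feature representation, and the `eps` default are illustrative assumptions, not details from the paper.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse equally sized feature vectors with normalized, non-negative weights.

    Hypothetical helper: real feature maps would be tensors, and the weights
    would be learned during training rather than passed in by hand.
    """
    if not features:
        raise ValueError("at least one feature vector is required")
    if len(features) != len(weights):
        raise ValueError("one weight is required per feature vector")

    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all feature vectors must have the same length")

    # Clip negative weights to zero (the ReLU step), then normalize.
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped) + eps  # eps guards against division by zero
    normalized = [w / total for w in clipped]

    # Element-wise weighted sum of the input features.
    return [
        sum(n * f[i] for n, f in zip(normalized, features))
        for i in range(length)
    ]


if __name__ == "__main__":
    shallow = [1.0, 2.0]  # stand-in for a high-resolution feature
    deep = [3.0, 4.0]     # stand-in for an upsampled low-resolution feature
    print(fast_normalized_fusion([shallow, deep], [1.0, 1.0]))
```

With equal weights the result is close to the element-wise mean of the two inputs; a weight that training drives to a negative value is clipped to zero, so the corresponding feature no longer contributes to the fused output.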