Abstract: The optimal fertilization time for rice is currently determined largely by traditional experience and manual field inspection, which cannot meet the demands of modern intelligent agriculture. To address this, a method for discriminating rice fertilization periods based on a multi-modal knowledge graph is proposed, integrating textual experiential knowledge with visual cues. First, a single-modal knowledge graph for rice fertilization was constructed. On this basis, cross-modal feature phrases corresponding to the four fertilization periods (re-greening, tillering, heading, and grain-filling) were extracted using dependency syntax analysis. These phrases were then combined with the Chinese CLIP model to determine how well they matched images and to assign each phrase a weight for its fertilization period, forming new triplets with cross-modal nodes and yielding a multi-modal rice fertilization knowledge graph. The multi-modal knowledge graph was then used to calculate the comprehensive matching degree of the input information, and field-collected images were used for cross-validation. This process evaluated the accuracy and stability of the discrimination method and determined the decision threshold for each fertilization period. The method's accuracy was tested on 600 images captured on the day of each fertilization period and five days before and after. The overall accuracy of the multi-modal knowledge graph-based discrimination method was 86.2%, with the highest accuracy, 90.1%, obtained for the grain-filling period. By using both textual and visual modalities, the method improved information utilization and demonstrated discriminative capability in real-world scenarios, offering a reference for the automated determination of rice fertilization periods.
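The scoring and thresholding step described above can be illustrated with a minimal sketch. It assumes that the comprehensive matching degree is a weight-normalized sum of phrase-image similarities, which the abstract does not specify. The feature phrases, weights, thresholds, and function names are all hypothetical, and the similarity scores stand in for values that a model such as Chinese CLIP would produce.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cross-modal nodes: period -> [(feature phrase, weight)].
# Phrases and weights are illustrative only, not taken from the paper.
PERIOD_PHRASES: Dict[str, List[Tuple[str, float]]] = {
    "re-greening": [("new white roots appear", 0.6), ("leaves turn green", 0.4)],
    "tillering": [("tillers emerge from the stem base", 0.7), ("leaf count increases", 0.3)],
}

# Hypothetical per-period decision thresholds (the paper derives its own
# thresholds by cross-validation on field-collected images).
THRESHOLDS: Dict[str, float] = {"re-greening": 0.5, "tillering": 0.5}


def matching_degree(phrases: List[Tuple[str, float]],
                    similarity: Dict[str, float]) -> float:
    """Weight-normalized sum of phrase-image similarities (assumed formula)."""
    total_weight = sum(weight for _, weight in phrases)
    if total_weight == 0:
        return 0.0
    weighted = sum(weight * similarity.get(phrase, 0.0) for phrase, weight in phrases)
    return weighted / total_weight


def discriminate(similarity: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return the best-scoring period whose score reaches its threshold, else None."""
    best: Optional[Tuple[str, float]] = None
    for period, phrases in PERIOD_PHRASES.items():
        score = matching_degree(phrases, similarity)
        if score >= THRESHOLDS[period] and (best is None or score > best[1]):
            best = (period, score)
    return best


# Placeholder phrase-image similarities for one input image.
example_similarity = {
    "new white roots appear": 0.8,
    "leaves turn green": 0.5,
    "tillers emerge from the stem base": 0.2,
    "leaf count increases": 0.1,
}
result = discriminate(example_similarity)
```

Returning `None` when no period reaches its threshold reflects one possible design choice: an image taken outside any fertilization window should not be forced into a period.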