Abstract: The rapid development of China’s protected horticulture industry has led to a surge in demand for intelligent knowledge services. However, existing knowledge systems are fragmented and loosely connected, and current knowledge service methods are imprecise and inefficient, which makes it difficult to guide production. Moreover, practitioners’ descriptions of problems are often incomplete, which further complicates their resolution. To address these issues, a multi-source knowledge-enhanced question-answering model that integrates a knowledge graph (KG) with a large language model (LLM) was proposed. First, a knowledge dataset for protected horticulture was constructed, covering more than 60 commonly cultivated categories and containing nearly 1.5 million words. Semantic segmentation of the dataset yielded 26,349 text blocks, which were stored in a vector database. Textual knowledge on production techniques was also extracted from the dataset to construct a knowledge graph, and a semantic information enhancement model based on KG entity matching was proposed. A retrieval-augmented generation method was then designed, in which the KG information and related text were inserted into a prompt template to improve the LLM’s problem-analysis capability. To further adapt the LLM to the protected horticulture domain, it was fine-tuned on relevant question-answering corpora using the low-rank adaptation (LoRA) method. On this basis, a multi-source knowledge-enhanced LLM, named PengKGPT, was developed to reason about and answer questions in protected horticulture production. Case studies showed that PengKGPT attained a score rate of 91.2% and an accuracy rate of 82.10%, improvements of 36.6 and 32.53 percentage points over the base model, which substantially strengthened the LLM’s ability to analyze questions in a vertical domain. Compared with the commercial models ERNIE 4.0 Turbo and GPT-4o, PengKGPT achieved score rates 10.2 and 14 percentage points higher and accuracy rates 10.4 and 12.69 percentage points higher, respectively. These results indicate that PengKGPT is more professional and reliable in addressing problems in protected horticulture production and can provide auxiliary support for it.
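
The retrieval-augmented step described in the abstract (KG entity matching, retrieval of related text blocks, and insertion of both into a prompt template) can be illustrated with a minimal Python sketch. It is not the implementation behind PengKGPT: the triples, text blocks, template wording, and every function name below are hypothetical, and a bag-of-words cosine similarity stands in for the vector database, whose embedding model the abstract does not specify.

```python
import math
import re
from collections import Counter

# Toy stand-ins invented for illustration; they are not taken from the paper's dataset.
TRIPLES = [
    ("tomato", "susceptible_to", "gray mold"),
    ("gray mold", "favored_by", "high humidity"),
    ("cucumber", "susceptible_to", "downy mildew"),
]
BLOCKS = [
    "Gray mold on tomato spreads quickly when greenhouse humidity stays high.",
    "Downy mildew on cucumber appears as yellow angular leaf spots.",
    "Ventilate the greenhouse in the morning to lower humidity.",
]

# Hypothetical prompt template; the paper's actual template is not given in the abstract.
PROMPT_TEMPLATE = (
    "Known facts:\n{facts}\n\n"
    "Reference text:\n{context}\n\n"
    "Question: {question}\nAnswer:"
)


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def match_entities(question, triples):
    """Return the KG entities whose names occur verbatim in the question."""
    q = question.lower()
    entities = {e for head, _, tail in triples for e in (head, tail)}
    return sorted(e for e in entities if e in q)


def related_triples(entities, triples):
    """Keep the triples whose head or tail is one of the matched entities."""
    return [t for t in triples if t[0] in entities or t[2] in entities]


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, blocks, k=2):
    """Rank text blocks by similarity to the question and return the top k."""
    q = Counter(tokenize(question))
    ranked = sorted(
        blocks, key=lambda b: cosine(q, Counter(tokenize(b))), reverse=True
    )
    return ranked[:k]


def build_prompt(question, triples=TRIPLES, blocks=BLOCKS, k=2):
    """Fill the template with matched KG facts and retrieved text blocks."""
    entities = match_entities(question, triples)
    facts = "\n".join(
        f"({h}, {r}, {t})" for h, r, t in related_triples(entities, triples)
    )
    context = "\n".join(retrieve(question, blocks, k))
    return PROMPT_TEMPLATE.format(facts=facts, context=context, question=question)


if __name__ == "__main__":
    print(build_prompt("Why do my tomato leaves have gray mold?"))
```

In this sketch the matched KG facts supply relations that an incomplete question leaves unstated, while the retrieved text blocks supply supporting passages. The filled prompt would then be passed to the fine-tuned LLM.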