Crop Named Entity Normalization Based on mBART
Authors: HU Yuxue, HUANG Zhongqiang, WANG Tongguan, SU Dongyu, SHEN Yufeng, SHA Ying
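Funding:

National Natural Science Foundation of China (62272188), Fundamental Research Funds for the Central Universities (2662021JC008), and Inner Mongolia Autonomous Region Science and Technology Major Project (2021ZD0046)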


    Abstract:

    Because of regional and cultural differences, entity names in agricultural texts are inconsistent, which complicates automatic information identification and extraction and holds back agricultural informatization. To improve the efficiency of agricultural information extraction, this paper proposes mJoint, an agricultural named entity normalization method based on mBART. First, drawing on the knowledge and experience of agricultural domain experts, a crop-centered agricultural text dataset was constructed, covering three major crop categories (legumes, cereals, and oil crops) and containing 22440 high-quality annotated records. Second, because agricultural entity normalization involves two subproblems, detecting non-normalized entities and recognizing them, a unified generative framework based on mBART is proposed to detect and recognize non-normalized agricultural entities jointly and thereby complete the normalization task directly. To further improve normalization performance, two auxiliary tasks, non-normalized entity detection and non-normalized entity recognition, are added to the model. Finally, extensive experiments on the constructed crop dataset show that mJoint achieves precision (P), recall (R), and F1 all above 0.99 on the agricultural named entity normalization task, the best result on every metric among the compared methods. The proposed method also shows a significant advantage over large language models.
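The abstract reports P, R, and F1 but does not state the matching criterion used. As a minimal sketch, assuming micro-averaged scoring over exact-match (mention, normalized name) pairs per sentence, the three metrics could be computed as follows. The function name `prf1` and the sample alias pairs are hypothetical illustrations, not taken from the paper or its dataset.

```python
from typing import Iterable, Set, Tuple

# One normalization decision: (non-normalized mention, normalized name).
Pair = Tuple[str, str]


def prf1(gold: Iterable[Set[Pair]], pred: Iterable[Set[Pair]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over exact-match pairs.

    Assumption: a prediction counts as correct only if both the detected
    mention and its normalized name match the gold annotation exactly.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # pairs predicted and present in gold
        fp += len(p - g)   # pairs predicted but not in gold
        fn += len(g - p)   # gold pairs the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical examples: regional aliases mapped to standard crop names.
gold = [
    {("包谷", "玉米")},                        # maize alias (cereal)
    {("落花生", "花生"), ("黄豆", "大豆")},      # peanut (oil crop), soybean (legume)
]
pred = [
    {("包谷", "玉米")},
    {("落花生", "花生"), ("黄豆", "黄豆")},      # mention found, but left unnormalized
]

p, r, f = prf1(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Under this strict criterion, a mention that is detected but mapped to the wrong standard name counts as both a false positive and a false negative, so detection errors and recognition errors are penalized together.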

Cite this article

HU Yuxue, HUANG Zhongqiang, WANG Tongguan, SU Dongyu, SHEN Yufeng, SHA Ying. Crop Named Entity Normalization Based on mBART[J]. Transactions of the Chinese Society for Agricultural Machinery, 2025, 56(7): 558-566.

History
  • Received: 2024-04-14
  • Published online: 2025-07-10