Named Entity Recognition of Agricultural Disease Based on Continuous Prompts Injection and Pointer Network

Fund Project:

National Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0113604); Ministry of Finance and Ministry of Agriculture and Rural Affairs: China Agriculture Research System (CARS-23-D07); Natural Science Foundation of Hebei Province (F2022204004)

    Abstract:

    Named entity recognition for agricultural diseases suffers from three problems: pretrained language models are underused, injected external knowledge is poorly exploited, and nested named entities are recognized at a low rate. To address them, this paper proposes CP-MRC (continuous prompts for machine reading comprehension), a named entity recognition model based on continuous prompt injection and a pointer network. The model builds on the pretrained BERT (bidirectional encoder representations from transformers) model and freezes BERT's original parameters, which preserves the text representation ability acquired during pretraining. To adapt the model to domain data, continuous trainable prompt vectors are inserted into every Transformer layer. To improve the accuracy of nested named entity recognition, a pointer network is used to extract entity sequences. Comparative experiments were conducted on a self-built agricultural disease dataset containing 2,933 text samples, 8 entity types, and 10,414 entities in total. CP-MRC reached a precision of 83.55%, a recall of 81.4%, and an F1 score of 82.4%, outperforming the other models compared. For the two nested entity types, pathogen and crop, its F1 score was 3 and 13 percentage points higher, respectively, than that of the other models, a clear improvement in nested entity recognition. The model retains good recognition performance while training only a small number of parameters, which offers an approach for applying larger-scale pretrained models to information extraction tasks.
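    The abstract does not spell out how the pointer network turns its outputs into entities, so the following is only a minimal sketch of one common MRC-style scheme, not the authors' implementation. It assumes that, for each entity type, the model emits a start probability and an end probability per token, and that each start is paired with the nearest end at or after it. The names `decode_spans`, `decode_all`, `prf`, the threshold of 0.5, the `max_len` limit, and the toy scores are all hypothetical. Because every entity type is decoded independently, overlapping (nested) spans such as a crop name inside a disease name can both be recovered.

```python
from typing import Dict, List, Set, Tuple

Entity = Tuple[str, int, int]  # (entity_type, start_index, end_index), end inclusive


def decode_spans(start_probs: List[float], end_probs: List[float],
                 threshold: float = 0.5, max_len: int = 10) -> List[Tuple[int, int]]:
    """Pair each start pointer with the nearest end pointer at or after it.

    Assumption: a token is a start/end if its probability >= threshold.
    """
    spans = []
    for i, p_start in enumerate(start_probs):
        if p_start < threshold:
            continue
        # Look ahead at most max_len tokens for the closest end pointer.
        for j in range(i, min(i + max_len, len(end_probs))):
            if end_probs[j] >= threshold:
                spans.append((i, j))
                break
    return spans


def decode_all(pointer_scores: Dict[str, Tuple[List[float], List[float]]],
               threshold: float = 0.5) -> Set[Entity]:
    """Decode every entity type separately, so nested spans can coexist."""
    entities: Set[Entity] = set()
    for etype, (start_probs, end_probs) in pointer_scores.items():
        for i, j in decode_spans(start_probs, end_probs, threshold):
            entities.add((etype, i, j))
    return entities


def prf(pred: Set[Entity], gold: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over entity sets."""
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example (hypothetical scores) for the tokens ["tomato", "early", "blight"]:
# the disease covers tokens 0..2 and the crop "tomato" is nested inside it.
pointer_scores = {
    "disease": ([0.9, 0.1, 0.1], [0.1, 0.2, 0.8]),
    "crop": ([0.8, 0.1, 0.1], [0.7, 0.1, 0.1]),
}
pred = decode_all(pointer_scores)
gold = {("disease", 0, 2), ("crop", 0, 0)}
print(sorted(pred))
print(prf(pred, gold))
```

    In this sketch the nested crop span and the enclosing disease span are both kept because they come from different per-type pointer pairs; a single flat tag sequence could assign each token only one label.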

Cite this article

WANG Chunshan, ZHANG Chenshuo, WU Huarui, ZHU Huaji, MIAO Yisheng, ZHANG Lijie. Named Entity Recognition of Agricultural Disease Based on Continuous Prompts Injection and Pointer Network[J]. Transactions of the Chinese Society for Agricultural Machinery, 2024, 55(6): 254-261.

History
  • Received: 2023-10-31
  • Published online: 2024-06-10