Named Entity Recognition for Rice Pest and Disease Based on Manhattan Attention Mechanism

Fund Project:

National Natural Science Foundation of China (U21A2019, 62476081), Heilongjiang Provincial Natural Science Foundation Joint Guidance Project (LH2024F048), and Basic Scientific Research Fund for Heilongjiang Provincial Universities (ZRCQC202403)

    Abstract:

    Rice pest and disease information mostly originates from unstructured text characterized by densely nested entities, lengthy sentences, and complex grammatical structures, so existing named entity recognition (NER) methods struggle to identify the relevant entities fully. To address this problem, the AgriRoBERTa-BiLSTM-Man-CRF model was proposed. Firstly, a pre-training corpus for the agricultural domain and an annotated dataset for rice pest and disease NER were constructed to provide high-quality data for model training. Secondly, RoBERTa was further pre-trained on the agricultural corpus with a whole-word masking strategy, so that the model attended to the overall semantics of Chinese words and learned the language patterns of rice pest and disease texts, which strengthened its ability to capture Chinese domain terminology and semantics. Finally, a Manhattan attention mechanism was introduced, which used the L1 distance to capture sparse features in high-dimensional space and to quantify feature differences, focusing precisely on key contextual information and further improving the accuracy of entity boundary recognition. Experimental results showed that the proposed model achieved an F1 score of 90.69% for entity recognition, with a precision of 87.87% and a recall of 93.71%. Its F1 score was 7.8, 9.99, 1.8, and 15.9 percentage points higher than those of four conventional models, BiLSTM-CRF, BiLSTM-Attention-CRF, BERT-BiLSTM-CRF, and IDCNN-CRF, respectively, indicating that the model recognizes the various entity types in rice pest and disease texts more effectively.
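The abstract states only that the Manhattan attention mechanism scores context with the L1 distance; it does not give the formula. The following is therefore a minimal sketch of one common way to build distance-based attention, not the authors' implementation. It assumes that the attention score is the negative L1 distance between a query and a key, that scores are divided by the square root of the feature dimension (borrowed by analogy from scaled dot-product attention), and that weights come from a softmax. The names `l1_distance`, `softmax`, and `manhattan_attention` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[Vector]


def l1_distance(a: Vector, b: Vector) -> float:
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def manhattan_attention(
    queries: Matrix,
    keys: Matrix,
    values: Matrix,
    temperature: Optional[float] = None,
) -> Tuple[Matrix, Matrix]:
    """Return (outputs, weights) for attention scored by negative L1 distance."""
    dim = len(keys[0])
    # Assumption: scale by sqrt(dim), as in scaled dot-product attention.
    tau = temperature if temperature is not None else math.sqrt(dim)
    outputs: Matrix = []
    all_weights: Matrix = []
    for q in queries:
        # A smaller L1 distance means a larger (less negative) score.
        scores = [-l1_distance(q, k) / tau for k in keys]
        weights = softmax(scores)
        # Each output is the attention-weighted sum of the value vectors.
        out = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy self-attention over three made-up 2-dimensional token states.
hidden = [[1.0, 0.0], [0.9, 0.1], [-1.0, 2.0]]
context, attn = manhattan_attention(hidden, hidden, hidden)
for row in attn:
    print([round(w, 3) for w in row])
```

In this toy run each token attends most strongly to itself and to its nearest neighbor under the L1 metric, which is the behavior the abstract describes as quantifying feature differences. In a model such as the one named in the abstract, a layer of this kind would presumably sit between the BiLSTM outputs and the CRF, but that placement is inferred from the model name rather than stated.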

Cite this article

LU Yang, GAO Pengfei, LI Haoyan, SHI Yexin, WANG Peng. Named Entity Recognition for Rice Pest and Disease Based on Manhattan Attention Mechanism[J]. Transactions of the Chinese Society for Agricultural Machinery, 2026, 57(8): 289-298, 307.

History
  • Received: 2025-07-15
  • Published online: 2026-04-15