Distributed Access Method for Multimodal Crop Phenotypic Data

Fund Project:

National Key Research and Development Program of China (2022YFD2002303-03) and Beijing Rural Revitalization Project (NY2401040425)

    Abstract:

    The rapid development of high-throughput crop phenotyping equipment has given breeding and cultivation research modern means of data collection, and has also produced massive volumes of multimodal, unstructured phenotypic data. Traditional structured storage models can no longer provide efficient access to such data. A hybrid access framework based on distributed technology is therefore proposed. It uses HBase and HDFS to build a storage engine that fuses structured and unstructured data, and integrates a client-side cache with a Redis cache to provide an efficient retrieval mechanism. Two core problems are optimized. First, to address the inherent shortcomings of native HDFS in storing phenotypic data, a modality-aggregation-based MCH storage framework is designed: phenotypic data are classified by modality and merged for storage, and a local index is built with two-level hashing, which reduces NameNode memory pressure while improving single-modality access efficiency and storage space utilization. Second, for high-concurrency read scenarios, a two-level cache mechanism based on data hotness is constructed: hierarchical metadata caching speeds up reads of hot data, and a data hotness evaluation model that combines access frequency and temporal characteristics improves the cache hit rate. Experimental results show that, with 1.0×10^5 data items, the proposed distributed access method reduces NameNode memory usage by 31.2% compared with the best native scheme (SequenceFile) and reduces retrieval time by 25.4% compared with the best native scheme (MapFile), providing technical support for the storage and retrieval of massive multimodal phenotypic data.
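
The abstract does not give the formula of the hotness model or the cache replacement rules, so the following is only a minimal sketch of the general idea of a two-level cache driven by a score that combines access frequency and recency. Everything in it is an assumption made for illustration: the class name `HotnessCache`, the score `count * exp(-decay * age)`, the `decay` parameter, and the use of two in-memory dictionaries to stand in for the client-side cache and the Redis cache. It is not the authors' implementation.

```python
import math


class HotnessCache:
    """Two-level cache sketch: a small hot tier in front of a larger warm tier.

    Hypothetical stand-ins: `hot` plays the role of the client-side cache and
    `warm` the role of the Redis cache; both are plain dicts here.
    """

    def __init__(self, hot_capacity, warm_capacity, decay=0.1):
        self.hot_capacity = hot_capacity
        self.warm_capacity = warm_capacity
        self.decay = decay   # assumed exponential decay rate per time unit
        self.hot = {}
        self.warm = {}
        self.stats = {}      # key -> (access_count, last_access_time)

    def heat(self, key, now):
        """Assumed hotness score: access count damped by time since last access."""
        count, last = self.stats.get(key, (0, now))
        return count * math.exp(-self.decay * (now - last))

    def _evict_coldest(self, tier, now):
        # Remove and return the entry with the lowest hotness score in a tier.
        coldest = min(tier, key=lambda k: self.heat(k, now))
        return coldest, tier.pop(coldest)

    def _admit(self, key, value, now):
        # Place the entry in the hot tier, then demote the coldest entry if full.
        self.hot[key] = value
        if len(self.hot) > self.hot_capacity:
            demoted_key, demoted_value = self._evict_coldest(self.hot, now)
            self.warm[demoted_key] = demoted_value
            if len(self.warm) > self.warm_capacity:
                dropped_key, _ = self._evict_coldest(self.warm, now)
                # Design choice of this sketch: forget statistics of dropped keys.
                self.stats.pop(dropped_key, None)

    def get(self, key, load, now):
        """Return (value, source), where source is 'hot', 'warm', or 'miss'.

        `load` is a caller-supplied function that fetches the value from
        backing storage on a miss; `now` is the current time in any unit.
        """
        count, _ = self.stats.get(key, (0, now))
        self.stats[key] = (count + 1, now)
        if key in self.hot:
            return self.hot[key], "hot"
        if key in self.warm:
            value = self.warm.pop(key)
            self._admit(key, value, now)   # try to promote on a warm hit
            return value, "warm"
        value = load(key)
        self._admit(key, value, now)
        return value, "miss"
```

In this sketch a newly loaded entry is not guaranteed a place in the hot tier: if its score is the lowest, it is demoted to the warm tier immediately, so a frequently and recently read entry is not displaced by a single cold read. Passing `now` explicitly keeps the behavior deterministic; a real deployment would more likely read a clock and tune `decay` to the workload.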

Cite this article

HAO Zichao, ZHAO Xiangyu, PAN Shouhui, LIU Dongming, WANG Kaiyi. Distributed Access Method for Multimodal Crop Phenotypic Data[J]. Transactions of the Chinese Society for Agricultural Machinery, 2026, 57(1): 51-61.

History
  • Received: 2025-10-16
  • Published online: 2026-01-01