Multimodal Recipe Retrieval Based on Consistency Loss

CLC number: TP391.4 Document code: A Article ID: 2096-4706(2025)09-0074-06

Abstract: Multimodal recipe retrieval matches food images with their corresponding recipe texts. However, semantic inconsistencies between the image and text modalities limit retrieval accuracy and efficiency. This paper proposes the Consistent Multimodal Hierarchical Transformers (CMHT) model, which strengthens semantic consistency between food images and recipe texts in the embedding space through cross-modal and intra-modal contrastive learning. Experiments on a recipe dataset show that CMHT improves retrieval accuracy, indicating the method's potential for multimodal data processing in the food domain.

Keywords: intelligent agriculture; food computing; multimodal recipe retrieval; cross-modal contrastive learning; semantic consistency
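
As a rough illustration of the cross-modal contrastive idea named in the abstract, the following minimal sketch computes a symmetric InfoNCE-style loss over paired image and recipe embeddings. It is not the CMHT loss itself, which this excerpt does not define: the function names, the temperature value, and the `intra_weight` parameter are all assumptions made for illustration, and the intra-modal term is simply assumed to reuse the same loss on two embeddings from one modality.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss; queries[i] is assumed to pair with keys[i]."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Similarity of query i to every key, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of the matching key.
        total += log_denom - logits[i]
    return total / len(q)


def cross_modal_loss(image_emb, recipe_emb, temperature=0.1):
    """Symmetric loss: image-to-recipe and recipe-to-image directions averaged."""
    return 0.5 * (
        info_nce(image_emb, recipe_emb, temperature)
        + info_nce(recipe_emb, image_emb, temperature)
    )


def consistency_loss(image_emb, recipe_emb, view_a, view_b, intra_weight=1.0):
    """Hypothetical combination of a cross-modal term and an intra-modal term.

    view_a and view_b are assumed to be two embeddings of the same items
    within one modality; intra_weight is an illustrative hyperparameter.
    """
    return cross_modal_loss(image_emb, recipe_emb) + intra_weight * info_nce(
        view_a, view_b
    )
```

With correctly paired embeddings the loss approaches zero, while mismatched pairs are penalized, which is the sense in which such a loss pulls matching images and recipes together in the embedding space.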

0 Introduction

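In recent years, with growth in computing power and data scale and breakthroughs in Deep Neural Networks (DNNs) algorithms, Deep Learning has developed rapidly and become one of the core technologies of artificial intelligence.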

现代信息科技 (Modern Information Technology), 2025, Issue 09