Weakly supervised image semantic segmentation network driven by spatio-temporal contrastive learning


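Key words: computer vision; semantic segmentation; weakly supervised learning; class activation map; vision transformer; contrastive learning
CLC number: TP391  Document code: A
doi: 10.37188/OPE.20263401.0150  CSTR: 32169.14.OPE.20263401.0150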

Weakly supervised image semantic segmentation network driven by spatio-temporal contrastive learning

LIANG Zhen1,2, HU Yanzhu1,2*, YANG Yang1,2

(1. School of Intelligent Engineering and Automation, Beijing University of Posts and Telecommunications, Beijing 100876, China; 2. Key Laboratory of IoT Monitoring and Early Warning, Ministry of Emergency Management, Beijing University of Posts and Telecommunications, Beijing 100876, China) * Corresponding author, E-mail: bupt_automation_safety_yzhu@bupt.edu.cn

Abstract: Existing image-level weakly supervised semantic segmentation methods based on Vision Transformer (ViT) primarily rely on self-attention to extract limited semantic information and often fail to fully exploit multi-dimensional feature relationships, resulting in coarse target region identification. To address this limitation, a Spatio-temporal Contrastive Learning network (STCL) is proposed to improve segmentation accuracy by mining supervisory signals from both spatial and temporal perspectives. Specifically, a spatial feature contrastive learning module is introduced based on ViT token representations, integrating patch-level and class-level token contrastive strategies to capture implicit semantic relationships in image space. In addition, a temporal context contrastive learning module is developed, in which a memory bank is leveraged to incorporate prior knowledge from historical images to guide current segmentation, together with a memory bank update strategy and an adaptive memory contrastive loss to enhance discrimination of fine-grained regions. STCL achieves a mean Intersection over Union (mIoU) of 72.7% on PASCAL VOC and 43.6% on MS COCO, demonstrating superior performance.

Key words: computer vision; semantic segmentation; weakly supervised learning; class activation map; vision transformer; contrastive learning
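The abstract names a memory bank, a memory bank update strategy, and a memory contrastive loss, but does not specify any of them. The sketch below is therefore not the STCL formulation: it swaps in a generic momentum-averaged class prototype bank and an InfoNCE-style loss to illustrate how features from earlier images can supervise the current one. The names `MemoryBank` and `memory_contrastive_loss`, and the `momentum` and `temperature` values, are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class MemoryBank:
    """Hypothetical bank holding one unit-length prototype per class."""

    def __init__(self, momentum: float = 0.9) -> None:
        self.momentum = momentum  # assumed value, not taken from the paper
        self.prototypes: Dict[int, List[float]] = {}

    def update(self, label: int, feature: Sequence[float]) -> None:
        """Blend a new feature into the stored prototype (momentum average)."""
        feature = l2_normalize(feature)
        old = self.prototypes.get(label)
        if old is None:
            self.prototypes[label] = feature
            return
        m = self.momentum
        mixed = [m * o + (1.0 - m) * f for o, f in zip(old, feature)]
        self.prototypes[label] = l2_normalize(mixed)


def memory_contrastive_loss(
    feature: Sequence[float],
    label: int,
    bank: MemoryBank,
    temperature: float = 0.1,  # assumed value
) -> float:
    """InfoNCE-style loss: pull the feature toward its own class prototype
    and push it away from the other prototypes stored in the bank.
    The label must already have a prototype in the bank."""
    feature = l2_normalize(feature)
    logits = {c: dot(feature, p) / temperature for c, p in bank.prototypes.items()}
    max_logit = max(logits.values())
    # Numerically stable log-sum-exp over all prototypes.
    log_denominator = max_logit + math.log(
        sum(math.exp(v - max_logit) for v in logits.values())
    )
    return log_denominator - logits[label]
```

In this sketch the loss is small when a feature aligns with the prototype of its own class and large when it aligns with another class, which is the basic behavior any memory-based contrastive objective relies on.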

1 Introduction

Semantic segmentation is the study of pixel-level object recognition in computer vision.
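Pixel-level recognition is usually scored with the mean Intersection over Union (mIoU) quoted in the abstract. The following is a minimal sketch of that metric for a single pair of label maps, assuming integer class indices; the function name `mean_iou` is illustrative. Benchmark protocols normally accumulate the counts over the whole dataset before averaging, which this per-image version does not do.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int,
) -> float:
    """Mean IoU over the classes that occur in the prediction or the target."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Correct pixel: counts once in both intersection and union.
                inter[p] += 1
                union[p] += 1
            else:
                # Wrong pixel: enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with `pred = [[0, 0], [1, 1]]` and `target = [[0, 1], [1, 1]]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.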
