Real-time image semantic segmentation based on three-branch network

CLC number: TP391.41  Document code: A  doi: 10.37188/OPE.20263401.0167  CSTR: 32169.14.OPE.20263401.0167
REN Fenglei 1,2, GAO Ziyang 1,2, ZHANG Yan 1,2, ZHOU Haibo 1,2*, YANG Lu 1,2, QIN Zhichang 1,2
(1. Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, Tianjin University of Technology, Tianjin 300384, China; 2. National Demonstration Center for Experimental Mechanical and Electrical Engineering Education, Tianjin University of Technology, Tianjin 300384, China) * Corresponding author, E-mail: haibo_zhou@163.com
Abstract: To meet the stringent accuracy and real-time requirements of applications such as autonomous driving, a real-time image semantic segmentation algorithm based on a triple-branch network is proposed to balance segmentation accuracy and inference speed. Inspired by PIDNet, a triple-branch architecture is designed to extract fine-grained detail information, semantic contextual information, and edge cues from the input image, respectively. An efficient pyramid pooling module is integrated into the semantic context branch to capture multi-scale contextual information and enlarge the network's receptive field. In addition, a lightweight multi-scale channel interaction attention mechanism is introduced into both the detail and edge branches to enhance feature representations. Features from the three branches are subsequently fused, and a semantic segmentation head produces the final result. The proposed network achieves 79.2% mIoU at 88.5 frame/s on the Cityscapes dataset and 80.5% mIoU at 140.1 frame/s on the CamVid dataset. Experimental results demonstrate that the proposed method performs semantic segmentation efficiently, providing an effective trade-off between real-time performance and accuracy while significantly outperforming existing baseline methods.
Key words: semantic segmentation; deep learning; real time; attention mechanism; multi-scale features
1 Introduction
As a key task in computer vision, semantic segmentation aims to classify every pixel of an image accurately and assign it a corresponding label[1].
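Because segmentation assigns one class label to each pixel, accuracy figures such as the mIoU values quoted in the abstract come from comparing a predicted label map with a ground-truth map, class by class. The following is a minimal pure-Python sketch of that comparison, not the authors' evaluation code. The function names, the toy label maps, and the choice to skip classes absent from both maps are illustrative assumptions. Benchmark protocols typically accumulate counts over a whole dataset and may ignore certain labels, which this sketch does not do.

```python
def confusion_matrix(pred, target, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            matrix[t][p] += 1
    return matrix


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    matrix = confusion_matrix(pred, target, num_classes)
    ious = []
    for c in range(num_classes):
        intersection = matrix[c][c]
        gt_pixels = sum(matrix[c])                   # pixels labelled c in the ground truth
        pred_pixels = sum(row[c] for row in matrix)  # pixels predicted as c
        union = gt_pixels + pred_pixels - intersection
        if union > 0:  # assumption: skip classes absent from both maps
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 label maps with three hypothetical classes (0, 1, 2).
target = [[0, 0, 1],
          [1, 2, 2]]
pred = [[0, 1, 1],
        [1, 2, 0]]
print(mean_iou(pred, target, num_classes=3))
```

In practice the predicted map would be the per-pixel argmax of the network's class scores. The nested lists here stand in for image-sized arrays.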