A complex scene depth estimation network based on monocular camera


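Keywords: monocular depth estimation; U-shaped encoder-decoder; layer-by-layer dilated convolution; feature interaction module; symmetric Transformer decoder
CLC number: TP183  Document code: A  Article ID: 1008-0562(2025)04-0505-08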

A complex scene depth estimation network based on monocular camera
CHEN Zhanguo¹, CHEN Zhenjun¹, XUE Chenxia¹, WANG Guoliang¹, LI Jinyi¹, LI Yuting¹, YU Baocai²*
(1. National Energy Group Baorixile Mineral Resources, Hulunbuir 021000, China; 2. Ordos Research Institute, Liaoning Technical University, Ordos 017000, China)

Abstract: To improve depth estimation accuracy in complex and changeable scenes, a new monocular depth estimation network based on a U-shaped encoder-decoder is proposed. The Swin Transformer architecture is adopted as the core of the encoder to extract fine-grained features from the input at multiple levels and scales. Multi-scale local features are extracted with layer-by-layer dilated convolution. A feature interaction module lets the local and global features interact, giving a more comprehensive understanding of complex scenes. A symmetric Transformer decoder is combined with an image patch expansion layer, which reshapes the feature maps of adjacent dimensions into feature maps of higher resolution, and the network finally outputs a pixel-level depth prediction. Quantitative experiments are conducted on the NYU Depth v2 and KITTI datasets. The results show that the network is efficient and practical in complex and changeable scenes. The work overcomes limitations of traditional methods in such scenes and provides new perspectives and methodologies for theoretical research on depth estimation.

Keywords: monocular depth estimation; U-shaped encoder-decoder; layer-by-layer dilated convolution; feature interaction module; symmetric Transformer decoder
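The abstract describes an image patch expansion layer that trades channel depth for spatial resolution in the decoder. The sketch below illustrates only that rearrangement idea in plain Python, under assumptions that are not taken from the paper: the function name `patch_expand`, the `(H, W, C)` nested-list layout, the factor-of-2 upscaling, and the channel ordering are all hypothetical, and the learned linear projection that usually precedes such a rearrangement is omitted.

```python
def patch_expand(feature_map, scale=2):
    """Rearrange an (H, W, C) feature map into (H*scale, W*scale, C // scale**2).

    Hypothetical illustration: each pixel's channel vector is split into
    scale*scale equal groups, and each group becomes one pixel of the
    corresponding scale x scale block in the higher-resolution output.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    groups = scale * scale
    if channels % groups != 0:
        raise ValueError("channel count must be divisible by scale**2")
    out_channels = channels // groups

    # Allocate the enlarged grid; every cell is filled in the loop below.
    out = [[None] * (width * scale) for _ in range(height * scale)]
    for i in range(height):
        for j in range(width):
            vec = feature_map[i][j]
            for di in range(scale):
                for dj in range(scale):
                    start = (di * scale + dj) * out_channels
                    out[i * scale + di][j * scale + dj] = vec[start:start + out_channels]
    return out


# A 1 x 2 map with 4 channels becomes a 2 x 4 map with 1 channel.
coarse = [[[1, 2, 3, 4], [5, 6, 7, 8]]]
fine = patch_expand(coarse)
```

Here each coarse pixel is unfolded into a 2 x 2 block, so the spatial size doubles in both directions while the channel count drops by a factor of four. A trained decoder would typically apply a learned projection before this step so that the expanded features are not just a reshuffling of the input.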

0 Introduction

Monocular depth estimation is a technique that infers the depth of the objects in a scene from an image captured by a single camera.
