[1]张喻铭,周武杰,叶绿.基于自蒸馏和双模态的室内场景解析算法[J].浙江科技学院学报,2024,(03):218-227.[doi:10.3969/j.issn.1671-8798.2024.03.004]
 ZHANG Yuming, ZHOU Wujie, YE Lü. Indoor scene parsing method based on self-distillation and dual-mode[J]. Journal of Zhejiang University of Science and Technology, 2024, (03): 218-227. [doi:10.3969/j.issn.1671-8798.2024.03.004]

Indoor scene parsing method based on self-distillation and dual-mode

《浙江科技学院学报》(Journal of Zhejiang University of Science and Technology) [ISSN: 1671-8798]

Volume:
Issue:
2024, No. 03
Pages:
218-227
Column:
Publication Date:
2024-06-28

文章信息/Info

Title:
Indoor scene parsing method based on self-distillation and dual-mode
Article ID:
1671-8798(2024)03-0218-10
Authors (Chinese):
张喻铭, 周武杰, 叶绿
(School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023)
Author(s):
ZHANG Yuming ZHOU Wujie YE Lü
(School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, Zhejiang, China)
Keywords:
indoor scene parsing; self-distillation; multi-stage cascade; dual-modality
CLC Number:
TP389.1
DOI:
10.3969/j.issn.1671-8798.2024.03.004
Document Code:
A
Abstract:
[Objective] To enable indoor robots to accurately recognize objects of different categories indoors and thus choose safer, more feasible routes, a self-distillation multi-stage cascaded network (SMCNet) based on self-distillation and dual modalities is proposed for indoor scene parsing. [Method] First, a segmentation transformer (SegFormer) is used as the backbone to extract feature information from the red-green-blue (RGB) image and the depth map in a two-stream manner, yielding four groups of feature outputs. Second, a feature enhancement module (FEM) is designed to enhance these four groups of features and then fuse them in groups, so that the useful information in the dual-modal features is fully extracted and blended. Finally, a self-distillation supervision module (SSM) is designed to transfer valuable information from high-level features into low-level features via self-distillation, and a multi-stage cascaded supervision module (MCSM) is designed for cross-layer supervision, producing the final prediction map. [Result] On the dual-modal indoor scene datasets New York University Depth version 2 (NYUDv2) and scene understanding RGB-depth (SUN RGB-D), the proposed model surpasses existing methods under identical conditions, reaching a mean intersection over union (MIoU) of 57.3% on NYUDv2 and 53.1% on SUN RGB-D. [Conclusion] SMCNet can parse objects of different categories in indoor scenes fairly accurately and can provide technical support for indoor robots acquiring indoor visual information.
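The self-distillation supervision described in the abstract — transferring soft targets from high-level features to low-level features — is commonly implemented as a temperature-scaled KL-divergence loss in the style of Hinton et al. [20]. The sketch below is an illustrative reconstruction under that assumption, not the paper's exact formulation; the function names and the temperature value are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_distillation_loss(low_logits, high_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened predictions.

    The high-level (deep) branch acts as the teacher and the low-level
    (shallow) branch as the student; in training, gradients would flow
    only through the student, the teacher being a fixed soft target.
    """
    t = softmax(np.asarray(high_logits) / temperature)  # teacher soft targets
    s = softmax(np.asarray(low_logits) / temperature)   # student soft predictions
    kl = (t * (np.log(t + 1e-12) - np.log(s + 1e-12))).sum(axis=-1)
    # T^2 scaling keeps gradient magnitudes comparable across temperatures.
    return float((temperature ** 2) * kl.mean())
```

When the student matches the teacher exactly, the loss is zero; any divergence between the two softened distributions yields a positive penalty.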

参考文献/References:

[1] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,39(4):640.
[2] HOPFIELD J J. Neural networks and physical systems with emergent collective computational abilities[J]. Proceedings of the National Academy of Sciences,1982,79:2554.
[3] JI Zhuangwei. Research on image recognition based on deep fully convolutional neural networks[J]. Journal of Shanxi Datong University (Natural Science Edition),2022,38(2):27. (in Chinese)
[4] HAZIRBAS C, MA L N, DOMOKOS C, et al. FuseNet:incorporating depth into semantic segmentation via fusion-based CNN architecture[C]//Asian Conference on Computer Vision. Taipei:Springer,2017:213.
[5] YANG E, ZHOU W J, QIAN X H, et al. MGCNet:multilevel gated collaborative network for RGB-D semantic segmentation of indoor scene[J]. IEEE Signal Processing Letters,2022,29:2567.
[6] WU P, GUO R Z, TONG X Z, et al. Link-RGBD:cross-guided feature fusion network for RGBD semantic segmentation[J]. IEEE Sensors Journal,2022,22(24):24161.
[7] JIANG J D, ZHENG L N, LUO F, et al. RedNet:residual encoder-decoder network for indoor RGB-D semantic segmentation[EB/OL].(2018-08-06)[2023-10-25]. https://arxiv.org/abs/1806.01054.
[8] CHEN L Z, LIN Z, WANG Z Q, et al. Spatial information guided convolution for real-time RGBD semantic segmentation[J]. IEEE Transactions on Image Processing, 2021,30:2313.
[9] CAO J M, LENG H C, LISCHINSKI D, et al. ShapeConv:shape-aware convolutional layer for indoor RGB-D semantic segmentation[C]//Proceedings of the IEEE International Conference on Computer Vision. Montreal:IEEE, 2021:7088.
[10] ZHOU W J, YANG E Q, LEI J S, et al. PGDENet:progressive guided fusion and depth enhancement network for RGB-D indoor scene parsing[J]. IEEE Transactions on Multimedia,2022,25:3483.
[11] CHEN X K, LIN K Y, WANG J B, et al. Bi-directional cross-modality feature propagation with separation-and-aggregation gate for RGB-D semantic segmentation[C]//European Conference on Computer Vision. Glasgow:Springer,2020:561.
[12] ZHOU W, YUE Y, FANG M, et al. BCINet:bilateral cross-modal interaction network for indoor scene understanding in RGB-D images[J]. Information Fusion,2023,94:32.
[13] XU Gao, ZHOU Wujie, YE Lü. Boundary-graph convolution based scene parsing of road obstacles for robot driving[J]. Journal of Zhejiang University of Science and Technology,2023,35(5):402. (in Chinese)
[14] LI Chenghao, ZHANG Jing, HU Li, et al. Small object detection algorithm based on multi-scale receptive field fusion[J]. Computer Engineering and Applications,2022,58(12):177. (in Chinese)
[15] ZHANG L F, BAO C L, MA K. Self-distillation:towards efficient and compact neural networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2022,44(8):4388.
[16] ZHENG Yunfei, WANG Xiaobing, ZHANG Xiongwei, et al. Self-distillation HRNet object segmentation method based on pyramid knowledge[J]. Acta Electronica Sinica,2023,51(3):746. (in Chinese)
[17] AN S, LIAO Q M, LU Z Q, et al. Efficient semantic segmentation via self-attention and self-distillation[J]. IEEE Transactions on Intelligent Transportation Systems,2022,23(9):15256.
[18] XIE E Z, WANG W H, YU Z D, et al. SegFormer:simple and efficient design for semantic segmentation with transformers[J]. Advances in Neural Information Processing Systems,2021,34:12077.
[19] WANG Y K, HUANG W B, SUN F C, et al. Deep multimodal fusion by channel exchanging[C]//Conference on Neural Information Processing Systems. Vancouver:NeurIPS,2020:4835.
[20] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[EB/OL].(2015-03-09)[2023-10-25]. https://arxiv.org/abs/1503.02531.
[21] MILLETARI F, NAVAB N, AHMADI S A. V-Net:fully convolutional neural networks for volumetric medical image segmentation[C]//2016 Fourth International Conference on 3D Vision. Stanford:IEEE,2016:565.
[22] SILBERMAN N, HOIEM D, KOHLI P, et al. Indoor segmentation and support inference from RGBD images[C]//European Conference on Computer Vision. Florence:Springer,2012:746.
[23] XIAO J X, OWENS A, TORRALBA A. Sun3D:a database of big spaces reconstructed using SfM and object labels[C]//International Conference on Computer Vision. Sydney:IEEE,2013:1625.

备注/Memo

Received: 2023-10-28
Funding: National Key Research and Development Program of China (2022YFEO196000); National Natural Science Foundation of China (62371422)
Corresponding author: ZHOU Wujie (1983— ), male, born in Linhai, Zhejiang; associate professor, Ph.D.; research interests: artificial intelligence and visual big data. E-mail: wujiezhou@163.com.
更新日期/Last Update: 2024-06-28