Journal of Shandong University (Engineering Science) ›› 2025, Vol. 55 ›› Issue (2): 71-77. doi: 10.6040/j.issn.1672-3961.0.2024.164

• Machine Learning and Data Mining •

Single image 3D model retrieval based on instance discrimination and feature enhancement

DIAO Zhenyu1,2, HAN Xiaofan1,2, ZHANG Chengyu1,2, NIE Huijia1,2, ZHAO Xiuyang1,2, NIU Dongmei1,2*

  1. Shandong Provincial Key Laboratory of Ubiquitous Intelligent Computing (under preparation), Jinan 250022, Shandong, China;
  2. School of Information Science and Engineering, University of Jinan, Jinan 250022, Shandong, China
  • Published: 2025-04-15
  • About the authors: DIAO Zhenyu (born 1998), male, from Zaozhuang, Shandong; master's student; research interests: 3D model representation and 3D model retrieval. E-mail: dzy10242023@163.com. *Corresponding author: NIU Dongmei (born 1988), female, from Tai'an, Shandong; associate professor, master's supervisor, Ph.D.; research interests: 3D model processing. E-mail: ise_niudm@ujn.edu.cn
  • Funding:
    National Natural Science Foundation of China (62102163); Youth Innovation Team Development Program of Shandong Higher Education Institutions; Shandong Province Science and Technology SME Innovation Capability Improvement Project (2023TSGCO244)

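Abstract: To reduce the modality gap between the image domain and the model domain in image-based 3D model retrieval, a neural network model composed of four modules is proposed. The data exchange module swaps image and 3D model data with a certain probability, so that the image-domain network learns model-domain features and the model-domain network learns image-domain features, which gives an initial reduction of the modality gap. The feature alignment module comprises an instance discrimination loss and an image-model pairing loss, which further align the image domain and the model domain. The instance discrimination loss treats each instance as a class of its own and classifies it, so that the features of an image and a 3D model of the same instance become similar. The image-model pairing module pulls together images and 3D models of the same instance and pushes apart images and 3D models of different instances. A feature enhancement module based on contrastive learning is added in the image domain to improve the discriminability of features within that domain. Experimental results show that the proposed algorithm performs well on three common datasets, Pix3D, CompCars, and StanfordCars, with retrieval accuracy 4.5% higher than existing classical methods. The method aligns the image domain and the 3D model domain, reduces the modality gap, and improves the accuracy of image-based 3D model retrieval.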

Keywords: 3D model retrieval, metric learning, contrastive learning, multimodal, cross-modal retrieval

CLC Number: 

  • TP183
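
The abstract describes the data exchange step and the image-model pairing objective only in words. The following is a minimal sketch of how those two ideas could look, not the authors' implementation. The function names (`exchange_batches`, `pairing_loss`), the InfoNCE-style form of the loss, the temperature value, and the treatment of inputs as opaque items are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


def exchange_batches(images, models, p, rng):
    """With probability p, swap each image with its paired 3D model input.

    Assumption: both inputs are in a form either network can consume
    (for example, rendered views of the 3D model).
    """
    to_image_net, to_model_net = [], []
    for img, mdl in zip(images, models):
        if rng.random() < p:
            img, mdl = mdl, img
        to_image_net.append(img)
        to_model_net.append(mdl)
    return to_image_net, to_model_net


def normalize(v):
    """Scale a feature vector to unit length (zero vectors are left unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def pairing_loss(image_feats, model_feats, temperature=0.1):
    """InfoNCE-style pairing loss (an assumed form).

    Image i and model i belong to the same instance and form the positive
    pair; every other model in the batch acts as a negative.
    """
    imgs = [normalize(f) for f in image_feats]
    mdls = [normalize(f) for f in model_feats]
    total = 0.0
    for i, f in enumerate(imgs):
        logits = [sum(a * b for a, b in zip(f, m)) / temperature for m in mdls]
        peak = max(logits)  # subtract the max for numerical stability
        log_denom = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denom - logits[i]
    return total / len(imgs)
```

In this sketch the loss is small when each image feature is closest to the feature of its own 3D model and large when it is closest to another instance's model, which is the "pull together, push apart" behavior the abstract describes. The instance discrimination loss could be sketched in a similar way as a cross-entropy over instance identifiers, again as an assumption rather than the published formulation.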