
Journal of Shandong University (Engineering Science) ›› 2025, Vol. 55 ›› Issue (2): 71-77. doi: 10.6040/j.issn.1672-3961.0.2024.164

• Machine Learning and Data Mining •

Single image 3D model retrieval based on instance discrimination and feature enhancement

DIAO Zhenyu1,2, HAN Xiaofan1,2, ZHANG Chengyu1,2, NIE Huijia1,2, ZHAO Xiuyang1,2, NIU Dongmei1,2*

  1. Shandong Provincial Key Laboratory of Ubiquitous Intelligent Computing (under preparation), Jinan 250022, Shandong, China;
  2. School of Information Science and Engineering, University of Jinan, Jinan 250022, Shandong, China
  • Published: 2025-04-15
  • About the authors: DIAO Zhenyu (born 1998), male, from Zaozhuang, Shandong, master's student; main research interests: 3D model representation and 3D model retrieval. E-mail: dzy10242023@163.com. *Corresponding author: NIU Dongmei (born 1988), female, from Tai'an, Shandong, associate professor, master's supervisor, Ph.D.; main research interest: 3D model processing. E-mail: ise_niudm@ujn.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62102163); Youth Innovation Team Development Program of Shandong Provincial Higher Education Institutions; Shandong Province Science and Technology SME Innovation Capability Improvement Project (2023TSGCO244)

Abstract: To reduce the modality gap between the image domain and the model domain in image-based 3D model retrieval, a neural network model consisting of four modules was proposed. The data exchange module swapped image and 3D model data with a certain probability, so that the image-domain network learned model-domain features and the model-domain network learned image-domain features, which initially narrowed the modality gap. The feature alignment module comprised an instance discrimination loss function and an image-model pairing loss function, which further aligned the image domain and the model domain. The instance discrimination loss function treated each instance as an independent class and classified it, making the features of images and 3D models of the same instance similar. The image-model pairing module pulled together the images and 3D models of the same instance and pushed apart those of different instances. Based on contrastive learning, a feature enhancement module was added to the image domain to improve feature discriminability within that domain. Experimental results showed that the proposed algorithm performed well on three common datasets, Pix3D, CompCars, and StanfordCars, improving retrieval accuracy by up to 4.5% over existing classical methods. The method aligned the image domain and the 3D model domain, reduced the modality gap, and improved the accuracy of image-based 3D model retrieval.

Key words: 3D model retrieval, metric learning, contrastive learning, multimodal, cross-modal retrieval

CLC Number:

  • TP183
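
The abstract describes the modules only in words. As a rough illustration, and not the authors' implementation, the following pure-Python sketch shows one way the three ideas could look: swapping image and model inputs with some probability, classifying each instance as its own class, and pulling matching image-model pairs together. Every name here (`exchange_batches`, `instance_discrimination_loss`, `pairing_loss`), the swap probability `p`, the `temperature`, and the choice of a symmetric InfoNCE-style formulation for the pairing loss are assumptions made for illustration. The sketch also assumes that both inputs are already in a form either branch can consume.

```python
import math
import random


def exchange_batches(images, models, p=0.5, rng=None):
    """Swap each (image, model) pair between branches with probability p.

    Hypothetical helper: assumes either branch can consume either input.
    """
    rng = rng or random.Random()
    to_image_branch, to_model_branch = [], []
    for img, mdl in zip(images, models):
        if rng.random() < p:
            img, mdl = mdl, img  # each branch now sees the other domain
        to_image_branch.append(img)
        to_model_branch.append(mdl)
    return to_image_branch, to_model_branch


def _normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _cross_entropy_row(logits, target):
    """Numerically stable softmax cross-entropy for one row of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def instance_discrimination_loss(logits, instance_ids):
    """Mean cross-entropy where every instance is treated as its own class."""
    rows = [_cross_entropy_row(row, t) for row, t in zip(logits, instance_ids)]
    return sum(rows) / len(rows)


def pairing_loss(image_feats, model_feats, temperature=0.1):
    """Symmetric InfoNCE-style loss (an assumed formulation).

    Row i of image_feats and row i of model_feats belong to the same
    instance; all other rows in the batch act as negatives.
    """
    img = [_normalize(v) for v in image_feats]
    mdl = [_normalize(v) for v in model_feats]
    n = len(img)
    # Cosine similarity between every image and every model, scaled.
    sim = [[sum(a * b for a, b in zip(img[i], mdl[j])) / temperature
            for j in range(n)] for i in range(n)]
    image_to_model = sum(_cross_entropy_row(sim[i], i) for i in range(n)) / n
    model_to_image = sum(
        _cross_entropy_row([sim[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return 0.5 * (image_to_model + model_to_image)
```

In this sketch, the pairing loss is small when the image and model features of the same instance point in the same direction and large when they are matched to the wrong instance, which mirrors the "pull together, push apart" behaviour stated in the abstract. In practice the features and logits would come from trained networks and an automatic-differentiation framework rather than plain lists.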