Journal of Shandong University (Engineering Science), 2025, Vol. 55, Issue (2): 71-77. doi: 10.6040/j.issn.1672-3961.0.2024.164

• Machine Learning & Data Mining •

Single image 3D model retrieval based on instance discrimination and feature enhancement

DIAO Zhenyu (1,2), HAN Xiaofan (1,2), ZHANG Chengyu (1,2), NIE Huijia (1,2), ZHAO Xiuyang (1,2), NIU Dongmei (1,2)*

  1. Shandong Provincial Key Laboratory of Ubiquitous Intelligent Computing, Jinan 250022, Shandong, China;
  2. School of Information Science and Engineering, University of Jinan, Jinan 250022, Shandong, China
  Published: 2025-04-15

Abstract: To reduce the modal gap between the image domain and the 3D model domain in single-image 3D model retrieval, a neural network consisting of four modules was proposed. The data exchange module swapped image and 3D model inputs with a given probability, so that the image-domain network learned model-domain features and the model-domain network learned image-domain features, giving an initial reduction of the modal gap. The feature alignment module combined an instance discrimination loss and an image-model pairing loss to further align the two domains. The instance discrimination loss treated each instance as an independent class and classified it, which made the features of the images and the 3D model of the same instance similar. The image-model pairing loss pulled together the images and 3D models of the same instance and pushed apart those of different instances. In addition, a feature enhancement module based on contrastive learning was added to the image domain to improve the discriminability of image features. Experiments on three public datasets, Pix3D, CompCars, and StanfordCars, showed that the proposed algorithm improved retrieval accuracy by up to 4.5% over existing classical methods. These results indicated that the algorithm aligned the image domain with the 3D model domain, reduced the modal gap, and improved the accuracy of single-image 3D model retrieval.
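The abstract describes the training signals (data exchange, feature alignment, feature enhancement) only qualitatively. The pure-Python code below is a minimal sketch of them under stated assumptions, not the authors' implementation. 3D models are assumed to be represented as rendered images, so the data exchange step can swap inputs between the two branches. The exchange probability `p`, the shared instance classifier `weights`, the `margin`, and the temperature `tau` are hypothetical. The pairing loss is written as a generic margin-based contrastive loss on cosine similarity, and the feature enhancement step uses the supervised contrastive (SupCon) loss as a stand-in for the paper's contrastive objective. All function names are illustrative.

```python
# Illustrative sketch only: names, hyperparameters and loss forms are
# assumptions, not the paper's code.
import math
import random
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(dot(v, v)) or 1.0  # guard against zero vectors
    return [x / n for x in v]


def data_exchange(images: list, models: list, p: float,
                  rng: random.Random) -> Tuple[list, list]:
    """With probability p, swap the two inputs of an image/model pair so each
    branch also sees the other domain (assumes both share one input format,
    e.g. 3D models rendered as images)."""
    new_images, new_models = [], []
    for img, mdl in zip(images, models):
        if rng.random() < p:
            img, mdl = mdl, img
        new_images.append(img)
        new_models.append(mdl)
    return new_images, new_models


def cross_entropy(logits: Sequence[float], target: int) -> float:
    m = max(logits)  # log-sum-exp trick for numerical stability
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[target]


def instance_discrimination_loss(img_feats, mdl_feats, ids, weights) -> float:
    """Each instance is its own class: one shared linear classifier (one
    weight row per instance) scores both image and 3D model features."""
    feats = list(img_feats) + list(mdl_feats)
    labels = list(ids) + list(ids)
    losses = [cross_entropy([dot(w, f) for w in weights], y)
              for f, y in zip(feats, labels)]
    return sum(losses) / len(losses)


def pairing_loss(img_feats, mdl_feats, ids, margin: float = 0.2) -> float:
    """Margin-based contrastive loss on cosine similarity: pull same-instance
    image/model pairs together, push other pairs below `margin`."""
    imgs = [normalize(f) for f in img_feats]
    mdls = [normalize(f) for f in mdl_feats]
    total = 0.0
    for a, ya in zip(imgs, ids):
        for b, yb in zip(mdls, ids):
            s = dot(a, b)
            total += (1.0 - s) if ya == yb else max(0.0, s - margin)
    return total / (len(imgs) * len(mdls))


def supcon_loss(feats, labels, tau: float = 0.1) -> float:
    """Supervised contrastive (SupCon) loss within the image domain: other
    images of the same instance are positives, all remaining ones negatives."""
    z = [normalize(f) for f in feats]
    anchor_losses = []
    for i, (zi, yi) in enumerate(zip(z, labels)):
        others = [j for j in range(len(z)) if j != i]
        positives = [j for j in others if labels[j] == yi]
        if not positives:
            continue  # anchors with no positive are skipped
        logits = [dot(zi, z[j]) / tau for j in others]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits))
        log_probs = [dot(zi, z[p]) / tau - log_denom for p in positives]
        anchor_losses.append(-sum(log_probs) / len(positives))
    return sum(anchor_losses) / len(anchor_losses) if anchor_losses else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    print(data_exchange(["img0", "img1"], ["mdl0", "mdl1"], p=0.5, rng=rng))
    img_f = [[1.0, 0.0], [0.0, 1.0]]  # toy image features, instances 0 and 1
    mdl_f = [[0.9, 0.1], [0.2, 0.8]]  # toy 3D model features
    ids = [0, 1]
    w = [[1.0, 0.0], [0.0, 1.0]]      # hypothetical instance classifier
    print(instance_discrimination_loss(img_f, mdl_f, ids, w))
    print(pairing_loss(img_f, mdl_f, ids))
    print(supcon_loss(img_f + img_f, ids + ids))  # two "views" per instance
```

In a real model these losses would be computed on batched tensors in a deep learning framework and summed with loss weights, which the abstract does not specify.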

Key words: 3D model retrieval, metric learning, contrastive learning, multimodal, cross-modal retrieval

CLC Number: TP183