Journal of Shandong University (Engineering Science), 2025, Vol. 55, Issue (4): 48-55. doi: 10.6040/j.issn.1672-3961.0.2024.165

• Special Issue for Deep Learning with Vision •

Single image 3D model retrieval based on attention and view information

HAN Xiaofan1,2, DIAO Zhenyu1,2, ZHANG Chengyu1,2, NIE Huijia1,2, ZHAO Xiuyang1,2, NIU Dongmei1,2*   

  1. Shandong Provincial Key Laboratory of Ubiquitous Intelligent Computing, Jinan 250022, Shandong, China;
  2. School of Information Science and Engineering, University of Jinan, Jinan 250022, Shandong, China
  • Published: 2025-08-31

Abstract: To extract effective feature descriptors and narrow the large gap between 2D images and 3D models, a single-image 3D model retrieval method based on attention and view information is proposed. A spatial attention mechanism is introduced into the feature extraction module to strengthen the model's feature descriptors. The 2D views of 3D models are incorporated into the learning of query image features to reduce the domain gap between the image domain and the model domain. Experiments on three representative benchmark datasets, Pix3D, CompCars, and Stanford Cars, show that the best retrieval accuracy improves by 5%. The proposed method effectively retrieves similar 3D models from a single image and improves retrieval accuracy.
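
The abstract does not specify the exact form of the spatial attention module, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a CBAM-style design in which the feature map is pooled across channels (average and maximum) at each spatial location, and it replaces the learned convolution of such modules with a hypothetical fixed weighted sum (`w_avg`, `w_max`, `bias`) so that the example stays self-contained.

```python
import math


def spatial_attention(feature_map, w_avg=1.0, w_max=1.0, bias=0.0):
    """Reweight a C x H x W feature map (nested lists) with a spatial mask.

    Assumption: a weighted sum of the channel-wise average and maximum
    stands in for the learned convolution of a CBAM-style module.
    Returns (attended_feature_map, mask), where mask is H x W.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])

    # Pool across channels at every spatial location, then squash to (0, 1).
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [feature_map[c][y][x] for c in range(channels)]
            avg_pooled = sum(values) / channels
            max_pooled = max(values)
            logit = w_avg * avg_pooled + w_max * max_pooled + bias
            row.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid
        mask.append(row)

    # Apply the same spatial mask to every channel.
    attended = [
        [[feature_map[c][y][x] * mask[y][x] for x in range(width)]
         for y in range(height)]
        for c in range(channels)
    ]
    return attended, mask


# Toy input: 2 channels, 1 row, 2 columns (hypothetical values).
features = [[[0.0, 2.0]], [[0.0, 4.0]]]
attended, mask = spatial_attention(features)
```

In this toy input the second location has larger activations in both channels, so it receives a larger mask value and is suppressed less than the first location. In a trained network the pooling weights would be learned rather than fixed.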

Key words: 3D model, single image, spatial attention, 3D model retrieval based on single image, domain gap

CLC Number: TP183