
Journal of Shandong University (Engineering Science), 2025, Vol. 55, Issue 4: 48-55. doi: 10.6040/j.issn.1672-3961.0.2024.165

• Special Topic on Deep Learning and Vision •

Single image 3D model retrieval based on attention and view information

HAN Xiaofan1,2, DIAO Zhenyu1,2, ZHANG Chengyu1,2, NIE Huijia1,2, ZHAO Xiuyang1,2, NIU Dongmei1,2*

  1. Shandong Provincial Key Laboratory of Ubiquitous Intelligent Computing (preparatory), Jinan 250022, Shandong, China;
    2. School of Information Science and Engineering, University of Jinan, Jinan 250022, Shandong, China
  • Published: 2025-08-31
  • About the authors: HAN Xiaofan (born 2000), female, from Binzhou, Shandong, is a master's student whose research focuses on 3D model representation and 3D model retrieval. E-mail: 202221200983@stu.ujn.edu.cn. *Corresponding author: NIU Dongmei (born 1988), female, from Tai'an, Shandong, PhD, is an associate professor and master's supervisor whose research focuses on 3D model processing. E-mail: ise_niudm@ujn.edu.cn
  • Funding:
    National Natural Science Foundation of China (62102163); Youth Innovation Team Development Program of Shandong Higher Education Institutions; Shandong Province Innovation Capability Enhancement Project for Technology-based Small and Medium-sized Enterprises (2023TSGC0244)

Abstract: To extract effective feature descriptors and narrow the large gap between 2D images and 3D models, a method based on attention and view information was proposed. The method introduced a spatial attention mechanism into the model feature extraction module to make the model feature descriptors more effective, and it incorporated 2D views of the 3D models into the learning of query image features to narrow the domain gap between the image domain and the model domain. Experiments were conducted on three representative benchmark datasets: Pix3D, CompCars, and Stanford Cars. The results showed that retrieval accuracy improved by about 5% over existing classical methods. The proposed method enabled a single image to retrieve similar 3D models effectively and improved retrieval accuracy.

Key words: 3D model, single image, spatial attention, single-image-based 3D model retrieval, domain gap
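The abstract does not specify how the spatial attention is computed. As a hedged illustration only, the sketch below implements CBAM-style spatial attention (Woo et al., ECCV 2018), one widely used design. The feature map is average- and max-pooled across channels, the two resulting maps are convolved into one attention logit per location, a sigmoid turns each logit into a weight in (0, 1), and every channel is rescaled by that weight. It is an assumption that the authors use exactly this form. The function names are hypothetical, the kernels are fixed stand-ins for learned weights, and a 3 × 3 kernel replaces CBAM's 7 × 7 to keep the example small. Pure Python is used so the sketch runs without a deep-learning framework.

```python
import math
from typing import List, Tuple

# Assumption: CBAM-style spatial attention; the abstract does not state the exact design.
Matrix = List[List[float]]      # one H x W plane
FeatureMap = List[Matrix]       # C planes, each H x W


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def conv2d_same(planes: List[Matrix], kernels: List[Matrix], bias: float = 0.0) -> Matrix:
    """Multi-input, single-output k x k cross-correlation with zero padding (keeps H x W)."""
    h, w = len(planes[0]), len(planes[0][0])
    k = len(kernels[0])
    pad = k // 2
    out = [[bias] * w for _ in range(h)]
    for plane, kern in zip(planes, kernels):
        for i in range(h):
            for j in range(w):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:
                            out[i][j] += kern[di][dj] * plane[y][x]
    return out


def spatial_attention(feat: FeatureMap, kernels: List[Matrix],
                      bias: float = 0.0) -> Tuple[FeatureMap, Matrix]:
    """Channel-wise avg/max pooling -> conv -> sigmoid -> reweight every channel."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    # Pool across channels at each location, giving two H x W descriptors.
    avg_map = [[sum(feat[ch][i][j] for ch in range(c)) / c for j in range(w)] for i in range(h)]
    max_map = [[max(feat[ch][i][j] for ch in range(c)) for j in range(w)] for i in range(h)]
    # One convolution over the stacked [avg, max] maps gives one attention logit per location.
    logits = conv2d_same([avg_map, max_map], kernels, bias)
    attn = [[sigmoid(v) for v in row] for row in logits]
    # The same spatial weight multiplies every channel at that location.
    refined = [[[feat[ch][i][j] * attn[i][j] for j in range(w)] for i in range(h)]
               for ch in range(c)]
    return refined, attn


# Toy 2-channel, 3 x 3 feature map with a response peak at the center.
feat = [
    [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0]],
]
# Hypothetical fixed kernels standing in for learned weights: each passes its map through unchanged.
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
refined, attn = spatial_attention(feat, [identity, identity])
print(round(attn[1][1], 4), attn[0][0])  # -> 0.9991 0.5
```

With these center-tap kernels, the location with the strongest response receives a weight close to 1, while empty background stays at 0.5. In a trained network the kernel weights are learned, so the emphasis follows whichever spatial cues help retrieval.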

CLC number: TP183
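Retrieval itself can be pictured as follows. Once a query image and every candidate 3D model are embedded in a shared feature space, retrieval ranks the models by their similarity to the query. Accuracy is often reported as the fraction of queries whose ground-truth model appears among the top k results. The sketch assumes cosine similarity and top-k accuracy, neither of which the abstract names. Its embeddings are made-up numbers rather than outputs of the paper's network, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embeddings (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_models(query: Vector, model_embeddings: List[Vector]) -> List[int]:
    """Return model indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query, m) for m in model_embeddings]
    return sorted(range(len(model_embeddings)), key=lambda i: scores[i], reverse=True)


def top_k_accuracy(queries: List[Vector], gt_model: List[int],
                   model_embeddings: List[Vector], k: int = 1) -> float:
    """Fraction of queries whose ground-truth model index is among the top-k ranked models."""
    hits = sum(
        1 for q, gt in zip(queries, gt_model)
        if gt in rank_models(q, model_embeddings)[:k]
    )
    return hits / len(queries)


# Hypothetical embeddings: three candidate 3D models and three query images.
models = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.2]]
gt = [0, 1, 2]  # index of the correct model for each query

print(rank_models(queries[2], models))             # -> [0, 2, 1]
print(top_k_accuracy(queries, gt, models, k=1))    # -> 0.6666666666666666
print(top_k_accuracy(queries, gt, models, k=2))    # -> 1.0
```

The third query lies closer to model 0 than to its own model 2, so it misses at k = 1 but hits at k = 2. Narrowing the gap between the image domain and the model domain, as the abstract describes, is meant to reduce this kind of mismatch.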