Single image 3D model retrieval based on attention and view information

doi:10.6040/j.issn.1672-3961.0.2024.165

Abstract

Abstract: To extract effective feature descriptors and reduce the large gap between 2D images and 3D models, a method based on attention and view information was proposed. The method introduced a spatial attention mechanism into the 3D model feature extraction module to make the model feature descriptors more effective. The 2D views of 3D models were incorporated into the learning of query image features to narrow the domain gap between the image domain and the model domain. Experiments were conducted on three representative benchmark datasets: Pix3D, CompCars, and Stanford Cars. The results showed that retrieval accuracy improved by about 5% over existing classical methods. The proposed method enabled a single image to retrieve similar 3D models effectively and improved retrieval accuracy.

Key words: 3D model, single image, spatial attention, 3D model retrieval based on single image, domain gap
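
The abstract does not specify the form of the spatial attention module, so the following is only a minimal sketch of one common variant (CBAM-style spatial attention), not the authors' implementation. It assumes a feature map stored as nested lists of shape C x H x W; the function name `spatial_attention`, the `kernel` weights, and the `bias` parameter are hypothetical names introduced for illustration. The feature map is pooled across channels (average and max), the two pooled maps are convolved with a small kernel, and the sigmoid of the result reweights every spatial location.

```python
import math

def spatial_attention(feat, kernel, bias=0.0):
    """CBAM-style spatial attention (illustrative sketch, not the paper's code).

    feat:   C x H x W feature map as nested lists of floats.
    kernel: 2 x k x k convolution weights (k odd); index 0 applies to the
            channel-average map, index 1 to the channel-max map.
    Returns (reweighted feature map, H x W attention map).
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    k = len(kernel[0])
    pad = k // 2
    # Pool across channels: two H x W maps describing each spatial location.
    avg = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)]
           for i in range(H)]
    mx = [[max(feat[c][i][j] for c in range(C)) for j in range(W)]
          for i in range(H)]
    pooled = [avg, mx]
    attn = [[0.0] * W for _ in range(H)]
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            s = bias
            # k x k convolution over the two pooled maps with zero padding.
            for m in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:
                            s += kernel[m][di][dj] * pooled[m][y][x]
            attn[i][j] = 1.0 / (1.0 + math.exp(-s))  # sigmoid gate in (0, 1)
            # Every channel at this location is scaled by the same weight.
            for c in range(C):
                out[c][i][j] = feat[c][i][j] * attn[i][j]
    return out, attn
```

In a trained network the kernel weights would be learned jointly with the feature extractor; here they are plain inputs so the computation can be followed by hand.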

CLC number: TP183

HAN Xiaofan, DIAO Zhenyu, ZHANG Chengyu, NIE Huijia, ZHAO Xiuyang, NIU Dongmei. Single image 3D model retrieval based on attention and view information[J]. Journal of Shandong University (Engineering Science), 2025, 55(4): 48-55.

