Journal of Shandong University (Engineering Science), 2025, Vol. 55, Issue (4): 48-55. doi: 10.6040/j.issn.1672-3961.0.2024.165
Special Topic on Deep Learning and Vision
韩小凡1,2,刁振宇1,2,张承宇1,2,聂慧佳1,2,赵秀阳1,2,牛冬梅1,2*
HAN Xiaofan1,2, DIAO Zhenyu1,2, ZHANG Chengyu1,2, NIE Huijia1,2, ZHAO Xiuyang1,2, NIU Dongmei1,2*
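Abstract: To extract effective feature descriptors and reduce the large gap between images and 3D models, a method based on attention and view information is proposed. The method introduces a spatial attention mechanism into the model feature extraction module to improve the effectiveness of the model feature descriptors, and it brings 2D views of the 3D models into the feature learning process for the query image to narrow the gap between the image domain and the model domain. Experiments on three representative benchmark datasets (Pix3D, CompCars, and Stanford Cars) show that retrieval accuracy improves by about 5% over existing classical methods. The proposed method allows a single image to retrieve similar 3D models effectively and improves retrieval accuracy.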
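The abstract does not say which form of spatial attention is used, so the following is only a minimal sketch under an assumed CBAM-style formulation: the feature map is pooled across channels (average and max), the two pooled maps are passed through a small convolution, and a sigmoid yields a per-location weight that rescales every channel. The function name `spatial_attention`, the nested-list tensor layout, and the kernel values are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def spatial_attention(feat, kernel, bias=0.0):
    """Assumed CBAM-style spatial attention on a C x H x W nested list.

    kernel holds 2 x k x k weights: kernel[0] for the channel-average map,
    kernel[1] for the channel-max map (k odd, zero padding).
    Returns (reweighted features, H x W attention map).
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    k = len(kernel[0])
    pad = k // 2

    # Pool across channels to obtain two H x W spatial descriptors.
    avg = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)]
           for i in range(H)]
    mx = [[max(feat[c][i][j] for c in range(C)) for j in range(W)]
          for i in range(H)]
    pooled = [avg, mx]

    attn = [[0.0] * W for _ in range(H)]
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            s = bias
            for m in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:  # zero padding
                            s += kernel[m][di][dj] * pooled[m][y][x]
            attn[i][j] = 1.0 / (1.0 + math.exp(-s))  # sigmoid
            for c in range(C):
                # Every channel is scaled by the same spatial weight.
                out[c][i][j] = feat[c][i][j] * attn[i][j]
    return out, attn


# Toy 2-channel, 2 x 2 feature map with an all-zero 3 x 3 kernel:
# every attention weight is sigmoid(0) = 0.5, so features are halved.
feat = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 2.0], [6.0, 4.0]]]
kernel = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]
refined, attn = spatial_attention(feat, kernel)
```

In a trained network the kernel would be learned, so locations that help distinguish a 3D model's rendered views receive weights near 1 and uninformative locations are suppressed.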