Journal of Shandong University (Engineering Science) ›› 2025, Vol. 55 ›› Issue (3): 80-87. doi: 10.6040/j.issn.1672-3961.0.2024.018
• Machine Learning & Data Mining •
LI Feng, WEN Yimin*
References:
[1] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]//Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE, 2009: 248-255.
[2] JI J, LUO Y, SUN X, et al. Improving image captioning by leveraging intra- and inter-layer global representation in Transformer network[C]//Proceedings of the AAAI Conference on Artificial Intelligence. [S.l.]: AAAI, 2021: 1655-1663.
[3] JIANG H, MISRA I, ROHRBACH M, et al. In defense of grid features for visual question answering[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020: 10267-10276.
[4] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[5] ANDERSON P, HE X, BUEHLER C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 6077-6086.
[6] YANG X, TANG K, ZHANG H, et al. Auto-encoding scene graphs for image captioning[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019: 10685-10694.
[7] HERDADE S, KAPPELER A, BOAKYE K, et al. Image captioning: transforming objects into words[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: ACM, 2019: 11137-11147.
[8] DONG X, LONG C, XU W, et al. Dual graph convolutional networks with Transformer and curriculum learning for image captioning[C]//Proceedings of the 29th ACM International Conference on Multimedia. New York, USA: ACM, 2021: 2615-2624.
[9] KUO C W, KIRA Z. Beyond a pre-trained object detector: cross-modal textual and visual context for image captioning[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022: 17969-17979.
[10] BERNARDI R, CAKICI R, ELLIOTT D, et al. Automatic description generation from images: a survey of models, datasets, and evaluation measures[J]. Journal of Artificial Intelligence Research, 2016, 55: 409-442.
[11] SOCHER R, KARPATHY A, LE Q V, et al. Grounded compositional semantics for finding and describing images with sentences[J]. Transactions of the Association for Computational Linguistics, 2014, 2: 207-218.
[12] KULKARNI G, PREMRAJ V, ORDONEZ V, et al. Babytalk: understanding and generating simple image descriptions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(12): 2891-2903.
[13] YAO T, PAN Y, LI Y, et al. Exploring visual relationship for image captioning[C]//Proceedings of the European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018: 684-699.
[14] HUANG L, WANG W, CHEN J, et al. Attention on attention for image captioning[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea: IEEE, 2019: 4634-4643.
[15] PAN Y, YAO T, LI Y, et al. X-linear attention networks for image captioning[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020: 10971-10980.
[16] CORNIA M, STEFANINI M, BARALDI L, et al. Meshed-memory Transformer for image captioning[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020: 10578-10587.
[17] LUO Y, JI J, SUN X, et al. Dual-level collaborative Transformer for image captioning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. [S.l.]: AAAI, 2021: 2286-2293.
[18] WEI J, LI Z, ZHU J, et al. Enhance understanding and reasoning ability for image captioning[J]. Applied Intelligence, 2023, 53(3): 2706-2722.
[19] KARPATHY A, LI F F. Deep visual-semantic alignments for generating image descriptions[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 3128-3137.
[20] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, USA: ACL, 2002: 311-318.
[21] BANERJEE S, LAVIE A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments[C]//Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Ann Arbor, USA: ACL, 2005: 65-72.
[22] LIN C Y. ROUGE: a package for automatic evaluation of summaries[C]//Proceedings of the Workshop on Text Summarization Branches Out. Barcelona, Spain: ACL, 2004: 74-81.
[23] VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr: consensus-based image description evaluation[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 4566-4575.
[24] ANDERSON P, FERNANDO B, JOHNSON M, et al. SPICE: semantic propositional image caption evaluation[C]//Proceedings of the European Conference on Computer Vision (ECCV). Amsterdam, Netherlands: Springer, 2016: 382-398.
[25] LI X, YIN X, LI C, et al. Oscar: object-semantics aligned pre-training for vision-language tasks[C]//Proceedings of the European Conference on Computer Vision (ECCV). Glasgow, UK: Springer, 2020: 121-137.
[26] SCHUSTER S, KRISHNA R, CHANG A, et al. Generating semantically precise scene graphs from textual descriptions for improved image retrieval[C]//Proceedings of the Fourth Workshop on Vision and Language. Lisbon, Portugal: ACL, 2015: 70-80.
Related Articles:
[1] WANG Yuou, YUAN Yingchun, HE Zhenxue, WANG Kejian. A relation extraction method based on improved RoBERTa, multiple-instance learning and dual attention mechanism[J]. Journal of Shandong University (Engineering Science), 2025, 55(2): 78-87.
[2] LI Jiachun, LI Bowen, CHANG Jianbo. An efficient and lightweight RGB frame-level face anti-spoofing model[J]. Journal of Shandong University (Engineering Science), 2023, 53(6): 1-7.
[3] WU Xinzhang, LIANG Xiangyu, ZHU Hongyu, ZHANG Dongdong. Short-term wind power prediction based on CEEMDAN-GRA-PCC-ATCN[J]. Journal of Shandong University (Engineering Science), 2022, 52(6): 146-156.
[4] LIANG Ye, MA Nan, LIU Hongzhe. Image-dependent fusion method for saliency maps[J]. Journal of Shandong University (Engineering Science), 2021, 51(4): 1-7.
[5] ZHANG Junsan, CHENG Qiaoqiao, WAN Yao, ZHU Jie, ZHANG Shidong. MIRGAN: a medical image report generation model based on GAN[J]. Journal of Shandong University (Engineering Science), 2021, 51(2): 9-18.
[6] ZHANG Qinyang, LI Xu, YAO Chunlong, LI Changwu. Aspect-level sentiment classification combined with syntactic dependency information[J]. Journal of Shandong University (Engineering Science), 2021, 51(2): 83-89.
[7] ZHANG Yuefang, DENG Hongxia, HU Chunxiang, QIAN Guanyu, LI Haifang. Hippocampal segmentation combining residual attention mechanism and generative adversarial networks[J]. Journal of Shandong University (Engineering Science), 2020, 50(6): 76-81.
[8] LIAO Nanxing, ZHOU Shibin, ZHANG Guopeng, CHENG Deqiang. Image caption generation method based on class activation mapping and attention mechanism[J]. Journal of Shandong University (Engineering Science), 2020, 50(4): 28-34.
[9] CAI Guoyong, HE Xinhao, CHU Yangyang. Visual sentiment analysis based on spatial attention mechanism and convolutional neural network[J]. Journal of Shandong University (Engineering Science), 2020, 50(4): 8-13.
[10] CHANG Zhifu, ZHOU Fengyu, WANG Yugang, SHEN Dongdong, ZHAO Yang. A survey of image captioning methods based on deep learning[J]. Journal of Shandong University (Engineering Science), 2019, 49(6): 25-35.