Journal of Shandong University (Engineering Science), 2020, Vol. 50, Issue (3): 31-37. doi: 10.6040/j.issn.1672-3961.0.2019.364
Feng TIAN, Xin LI, Fang LIU*, Chuang LI, Xiaoqiang SUN, Ruishan DU
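Abstract:
Building on existing methods that model tag correlation in the visual space and in the textual space, this paper proposes a semantic tag generation method based on multimodal subspace learning. A visual-feature similarity graph is built to reconstruct the "image-tag" relevance in a nonlinear manner. The visual-modality representation of images and the textual-modality representation of tags are then unified in a multimodal subspace, with structure preserved before and after the space transformation. In this subspace, the textual modality of the tags and the visual-content modality of the images complement each other, and semantically related images and tags are mapped to nearby points, so that semantic tag generation becomes a nearest-neighbor tag search for an image within the subspace. Results show that the method reaches a performance of 36.88% on the FLICKR-25K dataset and 44.17% on the NUS-WIDE dataset, indicating that multimodal subspace learning can substantially improve the accuracy of tag generation.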
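The final step described in the abstract, searching for the tags nearest to an image inside the shared subspace, can be illustrated with a minimal sketch. It assumes the subspace embeddings have already been learned; the function names `cosine` and `nearest_tags`, the choice of cosine similarity, and the three-dimensional vectors are all hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)

def nearest_tags(image_vec, tag_vecs, k=2):
    """Return the k tags whose subspace embeddings are most similar to image_vec."""
    ranked = sorted(tag_vecs,
                    key=lambda tag: cosine(image_vec, tag_vecs[tag]),
                    reverse=True)
    return ranked[:k]

# Made-up embeddings in an assumed 3-dimensional multimodal subspace.
tag_vecs = {
    "beach": [0.9, 0.1, 0.0],
    "sea":   [0.8, 0.3, 0.1],
    "car":   [0.0, 0.2, 0.9],
}
image_vec = [1.0, 0.2, 0.0]  # assumed projection of one image into the subspace

print(nearest_tags(image_vec, tag_vecs, k=2))
```

The sketch covers only the search step. How the subspace is learned, including the similarity graph and the structure-preserving constraint, is not shown here, and the similarity measure used in the paper may differ.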