
Journal of Shandong University (Engineering Science) ›› 2020, Vol. 50 ›› Issue (3): 31-37. doi: 10.6040/j.issn.1672-3961.0.2019.364

• Machine Learning and Data Mining •

A semantic tag generation method based on multimodal subspace learning

Feng TIAN, Xin LI, Fang LIU*, Chuang LI, Xiaoqiang SUN, Ruishan DU

  1. School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang, China
  • Received: 2019-05-14  Online: 2020-06-20  Published: 2020-06-16
  • Contact: Fang LIU  E-mail: tianfeng1980@163.com; lfliufang1983@126.com
  • About the author: Feng TIAN (born 1980), male, from Daqing, Heilongjiang; professor, Ph.D.; main research interest: computer vision. E-mail: tianfeng1980@163.com
  • Supported by:
    National Natural Science Foundation of China (61502094); Northeast Petroleum University Outstanding Young and Middle-aged Research and Innovation Team Project (KYCXTD201903); Heilongjiang Province Higher Education Teaching Reform Research Project (SJGY20180079); Heilongjiang Province Higher Education Teaching Reform Research Project (SJGY20190098); Heilongjiang Province Philosophy and Social Science Research Planning Project (19SHE280); Daqing Philosophy and Social Science Planning Research Project (DSGB2019042)

Abstract:

Building on existing methods that model tag correlation separately in the visual space and in the text space, a semantic tag generation method based on multimodal subspace learning is proposed. The method builds a visual feature similarity graph and uses it to reconstruct the "image-tag" correlation in a non-linear manner. It thereby unifies the visual-modality representation of images and the text-modality representation of tags in a single multimodal subspace, while preserving structure before and after the space transformation. In this subspace, the textual information of the tags and the visual content of the images complement each other, and semantically related images and tags are mapped to nearby points. Semantic tag generation is thus converted into a nearest-neighbor tag search for an image within the subspace. The results show that the method reaches a performance of 36.88% on the FLICKR-25K dataset and 44.17% on the NUS-WIDE dataset, indicating that multimodal subspace learning can substantially improve the accuracy of tag generation.

Key words: image tag generation, multimodal learning, subspace learning, space transformation, structure preservation
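
The final step described in the abstract, searching for the tags nearest to an image in the shared subspace, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embeddings are made-up 2-D coordinates rather than learned ones, cosine similarity is an assumed choice of distance, and the function name `nearest_tags` is hypothetical.

```python
import numpy as np

def nearest_tags(image_vec, tag_vecs, tag_names, k=3):
    """Return the k tags whose subspace embeddings are closest to the image."""
    # Cosine similarity is an assumption; the abstract only says "nearest neighbor".
    img = image_vec / np.linalg.norm(image_vec)
    tags = tag_vecs / np.linalg.norm(tag_vecs, axis=1, keepdims=True)
    scores = tags @ img                  # one similarity score per tag
    order = np.argsort(-scores)[:k]      # indices of the k highest scores
    return [tag_names[i] for i in order]

# Toy 2-D subspace with hand-picked coordinates (not learned embeddings).
tag_names = ["sky", "beach", "car", "dog"]
tag_vecs = np.array([[1.0, 0.1], [0.9, 0.3], [-0.2, 1.0], [-1.0, 0.2]])
image_vec = np.array([1.0, 0.2])

top = nearest_tags(image_vec, tag_vecs, tag_names, k=2)
print(top)
```

Once images and tags live in the same space, tag generation needs no per-tag classifier: ranking tags by distance to the image embedding is enough.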

CLC Number: TP391

Fig. 1  Multimodal subspace learning process
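
The learning process starts from a visual feature similarity graph. The page does not say how that graph is built, so the sketch below uses a common construction as an assumption: a symmetric k-nearest-neighbor graph with Gaussian edge weights. The function name `knn_similarity_graph` and the parameters `k` and `sigma` are hypothetical.

```python
import numpy as np

def knn_similarity_graph(features, k=2, sigma=1.0):
    """Symmetric k-nearest-neighbor graph with Gaussian edge weights."""
    n = features.shape[0]
    # Pairwise squared Euclidean distances between feature vectors.
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)       # no self-loops
    graph = np.zeros_like(weights)
    for i in range(n):
        nbrs = np.argsort(-weights[i])[:k]   # k most similar images
        graph[i, nbrs] = weights[i, nbrs]
    # Keep an edge if either endpoint selected the other.
    return np.maximum(graph, graph.T)

# Four toy "images" forming two tight pairs in feature space.
features = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
graph = knn_similarity_graph(features, k=1)
print(np.round(graph, 3))
```

Restricting each image to its k strongest neighbors keeps the graph sparse, which is one usual way to encode the local structure that a subspace mapping is then asked to preserve.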

Table 1  Dataset statistics

Dataset             Images     Tags     Images    Avg. tags per image
Flickr-25K (2008)   17,512     457      8,756     6.71
NUS-WIDE (2009)     55,615     2,892    27,808    7.83
MS-COCO (2014)      123,287    80       82,783    2.95

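Fig. 2  Variation of the F1 score

Fig. 3  Average image precision on the NUS-WIDE dataset

Fig. 4  Average image recall on the NUS-WIDE dataset

Fig. 5  Average image precision on the FLICKR dataset

Fig. 6  Average image recall on the FLICKR dataset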

Table 2  Annotation performance on the MIR-FLICKR dataset (%)

Method   API     ARI     APL     ARL     F1
lres     19.12   16.45   38.00   26.75   31.40
mpmf     16.58   15.83   38.53   26.01   31.06
twtv     21.05   17.03   38.37   26.61   31.43
cmusl    23.07   18.27   55.56   27.60   36.88
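
The F1 column appears to be the harmonic mean of the APL and ARL columns. The short check below recomputes it from the values in the MIR-FLICKR table. The helper name `f1_score` is illustrative, and the interpretation of the columns is an inference from the numbers, not a statement from the page.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# (APL, ARL, reported F1) copied from the MIR-FLICKR table.
rows = {
    "lres": (38.00, 26.75, 31.40),
    "mpmf": (38.53, 26.01, 31.06),
    "twtv": (38.37, 26.61, 31.43),
    "cmusl": (55.56, 27.60, 36.88),
}

recomputed = {name: f1_score(apl, arl) for name, (apl, arl, _) in rows.items()}
for name, (_, _, reported) in rows.items():
    print(name, round(recomputed[name], 2), reported)
```

Each recomputed value agrees with the reported F1 to within rounding, which supports reading F1 here as a per-label measure.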

Table 3  Annotation performance on the NUS-WIDE dataset (%)

Method   API     ARI     APL     ARL     F1
lres     18.25   14.33   18.56   6.76    9.91
mpmf     17.02   13.41   13.30   8.24    10.18
twtv     19.47   15.48   17.24   7.64    10.59
cmusl    21.14   16.44   22.27   14.36   17.46

Table 4  Annotation performance on the MSCOCO dataset (%)

Method   API     ARI     APL     ARL     F1
lres     33.17   32.45   37.52   32.35   34.74
mpmf     33.21   32.39   38.11   32.24   34.93
twtv     34.34   33.26   40.50   32.01   35.76
cmusl    35.28   34.79   46.87   41.76   44.17
lres     35.83   33.96   47.16   42.53   44.73
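
The page does not expand the column abbreviations. Assuming API and ARI denote average precision and recall per image, and APL and ARL the same quantities per label, they could be computed from binary prediction and ground-truth matrices roughly as below. The function name, the toy matrices, and the convention of scoring empty rows or columns as 0 are all assumptions made for illustration.

```python
import numpy as np

def mean_precision_recall(pred, truth, axis):
    """Average precision and recall over images (axis=1) or labels (axis=0)."""
    hits = np.sum(pred * truth, axis=axis).astype(float)
    n_pred = np.sum(pred, axis=axis)
    n_true = np.sum(truth, axis=axis)
    # Assumed convention: nothing predicted (or no ground truth) scores 0.
    precision = np.divide(hits, n_pred, out=np.zeros_like(hits), where=n_pred > 0)
    recall = np.divide(hits, n_true, out=np.zeros_like(hits), where=n_true > 0)
    return precision.mean(), recall.mean()

# Rows are images, columns are tags; 1 means the tag is assigned.
truth = np.array([[1, 1, 0],
                  [0, 1, 1]])
pred = np.array([[1, 0, 1],
                 [0, 1, 1]])

api, ari = mean_precision_recall(pred, truth, axis=1)  # per image
apl, arl = mean_precision_recall(pred, truth, axis=0)  # per label
print(api, ari, apl, arl)
```

Per-image averages reward getting each image's tag list right, while per-label averages weight rare and frequent tags equally, which is why the two pairs of columns can differ widely on datasets with many infrequent tags.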

Fig. 7  Annotation accuracy for selected tags on the NUS-WIDE dataset

Fig. 8  Examples of annotation results
