
Journal of Shandong University (Engineering Science) ›› 2018, Vol. 48 ›› Issue (5): 38-46. doi: 10.6040/j.issn.1672-3961.0.2017.552

• Machine Learning and Data Mining •

Cross-media retrieval model based on choosing key canonical correlated vectors

Guangli LI1, Bin LIU1, Tao ZHU1, Yi YIN2, Hongbin ZHANG2,3

  1. School of Information Engineering, East China Jiaotong University, Nanchang 330013, Jiangxi, China
  2. Software School, East China Jiaotong University, Nanchang 330013, Jiangxi, China
  3. Computer School, Wuhan University, Wuhan 430072, Hubei, China
  • Received: 2017-10-25 Online: 2018-10-01 Published: 2017-10-25
  • About the author: LI Guangli (born 1978), female, from Bobai, Guangxi; associate professor, Master of Engineering. Main research interests: cross-media retrieval, recommender systems, machine learning. E-mail: 642908415@qq.com
  • Supported by:
    National Natural Science Foundation of China (61762038, 61741108, 61463017); Natural Science Foundation of Jiangxi Province (20171BAB202023); Key Research and Development Program of the Jiangxi Provincial Department of Science and Technology (20171BBG70093); Humanities and Social Sciences Research Project of the Ministry of Education (16YJAZH029, 17YJAZH117); Jiangxi Provincial Social Science Planning Project (16TQ02); Visiting Scholar Special Fund of the Development Program for Young and Middle-aged Teachers in Regular Undergraduate Universities of Jiangxi Province (Gan Jiao Ban Han [2016] No. 109); Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ160509, GJJ160531, GJJ160497); Humanities and Social Sciences Fund for Universities of Jiangxi Province (TQ1503, XW1502)

Abstract:

In cross-media retrieval, how accurately the semantic correlations between heterogeneous media are exploited is one of the key factors that limit retrieval performance. To improve retrieval performance, a modified kernel canonical correlation analysis (MKCCA) model is presented. First, scale invariant feature transform (SIFT) features and GIST features (a spatial envelope descriptor of gray-level texture) are extracted from images, and term frequency (TF) features are extracted from texts. Second, a mapping kernel is selected and the image and text features are mapped into a high-dimensional separable space, which yields two kernel matrices. Third, the non-linear semantic correlations between the image and text kernel matrices are mined with canonical correlation analysis (CCA). Finally, a semantic correlation threshold is set to suppress semantic noise and to select the core canonical correlation vectors, so that the semantic correlations between images and texts are characterized more accurately and robustly. Experimental results show that the SIFT-TF feature combination gives the best overall retrieval performance, and that the MKCCA model combined with the Gaussian kernel (gauss kernel) achieves the highest cross-media retrieval performance. Compared with the best competitor, the mean average precision (MAP) of the "images retrieve texts (I_R_T)" task improves by about 3.06%, and the MAP of the "texts retrieve images (T_R_I)" task improves by about 1.18%.

Key words: canonical correlated vectors, cross-media retrieval, kernel canonical correlation analysis, semantic correlation threshold, gauss kernel
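The pipeline summarized in the abstract (kernel mapping, CCA on the kernel matrices, threshold-based selection of canonical components) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a standard regularized kernel CCA solved through an SVD, and it assumes the semantic correlation threshold simply keeps components whose canonical correlation is at least the threshold. The function names, `sigma`, `reg`, and the synthetic data are all hypothetical.

```python
import numpy as np


def gaussian_kernel(X, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def center_kernel(K):
    """Center the Gram matrix in feature space: H K H, H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kcca(Kx, Ky, reg=0.1):
    """Regularized kernel CCA (one common formulation, assumed here).

    Returns regularized canonical correlations in descending order and
    the dual weight matrices alpha (images) and beta (texts).
    """
    n = Kx.shape[0]
    Rx = Kx + reg * np.eye(n)
    Ry = Ky + reg * np.eye(n)
    # T = Rx^{-1} Kx Ky Ry^{-1}; its singular values play the role of
    # the canonical correlations of the regularized problem.
    T = np.linalg.solve(Rx, Kx @ Ky) @ np.linalg.inv(Ry)
    U, rho, Vt = np.linalg.svd(T)
    alpha = np.linalg.solve(Rx, U)
    beta = np.linalg.solve(Ry, Vt.T)
    return rho, alpha, beta


def select_components(rho, alpha, beta, threshold):
    """Assumed thresholding rule: keep components with rho >= threshold."""
    keep = rho >= threshold
    return rho[keep], alpha[:, keep], beta[:, keep]


if __name__ == "__main__":
    # Synthetic paired "image" and "text" features sharing a latent factor.
    rng = np.random.default_rng(0)
    z = rng.normal(size=(60, 2))
    X_img = z @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(60, 8))
    X_txt = z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(60, 5))

    Kx = center_kernel(gaussian_kernel(X_img, sigma=3.0))
    Ky = center_kernel(gaussian_kernel(X_txt, sigma=3.0))
    rho, alpha, beta = kcca(Kx, Ky, reg=0.1)
    rho_k, alpha_k, beta_k = select_components(rho, alpha, beta, 0.5)

    # Project both media into the shared space spanned by the kept components.
    img_proj = Kx @ alpha_k
    txt_proj = Ky @ beta_k
    print("components kept:", rho_k.size, "of", rho.size)
    print("projection shapes:", img_proj.shape, txt_proj.shape)
```

In a sketch like this, retrieval would rank candidates of the other medium by similarity (for example cosine similarity) between `img_proj` and `txt_proj` rows; the similarity measure is likewise an assumption, since the page does not state it.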

CLC number: TP391

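Fig. 1 Basic principle of the MKCCA model

Fig. 2 Technical workflow of the MKCCA model

Fig. 3 MAP values of the cross-media retrieval model after tuning the semantic correlation threshold Yu and the kernel function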
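For readers unfamiliar with the metric reported in Fig. 3 and in the tables, mean average precision can be sketched as below. This follows the usual textbook definition; how the paper decides that a retrieved item is relevant is not stated on this page, so the binary relevance lists here are purely illustrative, and the function names are hypothetical.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    `ranked_relevance` is the ranked result list as 0/1 relevance flags.
    Precision is accumulated at every rank that holds a relevant item and
    divided by the number of relevant items found in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_ranked_relevance):
    """Mean of the per-query average precision values."""
    aps = [average_precision(r) for r in all_ranked_relevance]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Two illustrative queries with made-up relevance judgments.
    queries = [[1, 0, 1, 0], [0, 1, 0, 0]]
    print(mean_average_precision(queries))
```

Normalizing by the relevant items found in the list equals normalizing by all relevant items only when the full candidate set is ranked; that is assumed here.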

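Table 1 MAP, feature average accuracy, and model average accuracy of each model (%)

I_R_T: images retrieve texts; T_R_I: texts retrieve images.

| Model | GIST-TF MAP (I_R_T) | GIST-TF MAP (T_R_I) | GIST-TF feature average accuracy | SIFT-TF MAP (I_R_T) | SIFT-TF MAP (T_R_I) | SIFT-TF feature average accuracy | Model average accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CCA | 33.93 | 17.34 | 25.64 | 20.61 | 37.06 | 28.84 | 27.24 |
| OKCCA+linear kernel | 28.76 | 32.19 | 30.48 | 28.64 | 25.22 | 26.93 | 28.71 |
| OKCCA+gauss kernel | 18.27 | 29.34 | 23.81 | 35.46 | 31.59 | 33.53 | 28.67 |
| OKCCA+poly kernel | 18.99 | 18.01 | 18.50 | 31.45 | 22.98 | 27.22 | 22.86 |
| MKCCA+linear kernel | 30.50 | 33.37 | 31.94 | 27.78 | 26.39 | 27.09 | 29.52 |
| MKCCA+gauss kernel | 20.29 | 29.09 | 24.69 | 38.95 | 30.08 | 34.52 | 29.61 |
| MKCCA+poly kernel | 18.98 | 17.89 | 18.44 | 30.23 | 21.47 | 25.85 | 22.15 |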
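The two derived columns in Table 1 appear to be plain means: the feature average accuracy looks like the mean of the two MAP values for a feature combination, and the model average accuracy like the mean of the two feature averages. That reading is inferred from the numbers, not stated on this page. The short check below recomputes them with a tolerance for rounding; the helper name and tolerance are hypothetical.

```python
# Rows of Table 1 as (GIST-TF I_R_T, GIST-TF T_R_I, GIST-TF feature avg,
#                     SIFT-TF I_R_T, SIFT-TF T_R_I, SIFT-TF feature avg,
#                     model avg), all in percent.
TABLE1 = {
    "CCA": (33.93, 17.34, 25.64, 20.61, 37.06, 28.84, 27.24),
    "OKCCA+linear kernel": (28.76, 32.19, 30.48, 28.64, 25.22, 26.93, 28.71),
    "OKCCA+gauss kernel": (18.27, 29.34, 23.81, 35.46, 31.59, 33.53, 28.67),
    "OKCCA+poly kernel": (18.99, 18.01, 18.50, 31.45, 22.98, 27.22, 22.86),
    "MKCCA+linear kernel": (30.50, 33.37, 31.94, 27.78, 26.39, 27.09, 29.52),
    "MKCCA+gauss kernel": (20.29, 29.09, 24.69, 38.95, 30.08, 34.52, 29.61),
    "MKCCA+poly kernel": (18.98, 17.89, 18.44, 30.23, 21.47, 25.85, 22.15),
}


def row_is_consistent(values, tol=0.006):
    """Check the assumed averaging rule for one row, allowing for rounding."""
    g_irt, g_tri, g_avg, s_irt, s_tri, s_avg, model_avg = values
    return (
        abs((g_irt + g_tri) / 2 - g_avg) <= tol
        and abs((s_irt + s_tri) / 2 - s_avg) <= tol
        and abs((g_avg + s_avg) / 2 - model_avg) <= tol
    )


if __name__ == "__main__":
    for name, values in TABLE1.items():
        print(name, row_is_consistent(values))
```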

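Table 2 MAP, feature average accuracy, and semantic distance average accuracy of the SCM model (%)

I_R_T: images retrieve texts; T_R_I: texts retrieve images.

| Semantic distance | GIST-TF (SCM) MAP (I_R_T) | GIST-TF (SCM) MAP (T_R_I) | GIST-TF (SCM) feature average accuracy | SIFT-TF (SCM) MAP (I_R_T) | SIFT-TF (SCM) MAP (T_R_I) | SIFT-TF (SCM) feature average accuracy | Semantic distance average accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| KL | 24.36 | 21.46 | 22.91 | 31.01 | 24.33 | 27.67 | 25.29 |
| JS | 25.96 | 23.45 | 24.71 | 34.77 | 27.38 | 31.08 | 27.89 |
| L1 | 26.24 | 23.62 | 24.93 | 34.69 | 27.55 | 31.12 | 28.03 |
| L2 | 27.22 | 23.67 | 25.45 | 35.89 | 28.19 | 32.04 | 28.74 |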

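Table 3 MAP and feature average accuracy of each model (%)

| Feature combination | CCA (MAP) | OKCCA (MAP) | MKCCA (MAP) | SCM (MAP) | Feature average accuracy |
| --- | --- | --- | --- | --- | --- |
| T_R_I with SIFT-TF | 37.06 | 31.59 | 30.08 | 28.19 | 31.73 |
| T_R_I with GIST-TF | 17.34 | 32.19 | 33.37 | 23.67 | 26.64 |
| I_R_T with SIFT-TF | 20.61 | 35.46 | 38.95 | 35.89 | 32.73 |
| I_R_T with GIST-TF | 33.93 | 28.76 | 30.50 | 27.22 | 30.10 |
| Average | 27.24 | 32.00 | 33.23 | 28.74 | |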
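One plausible reading of the gains quoted in the abstract (about 3.06% for I_R_T and about 1.18% for T_R_I) is that they are absolute MAP differences between MKCCA and the next-best model within a row of Table 3. This is an inference from the numbers rather than something the page states. The sketch below recomputes those differences; the dictionary layout and helper name are hypothetical.

```python
# MAP values (percent) from Table 3, keyed by task/feature combination.
TABLE3 = {
    "T_R_I with SIFT-TF": {"CCA": 37.06, "OKCCA": 31.59, "MKCCA": 30.08, "SCM": 28.19},
    "T_R_I with GIST-TF": {"CCA": 17.34, "OKCCA": 32.19, "MKCCA": 33.37, "SCM": 23.67},
    "I_R_T with SIFT-TF": {"CCA": 20.61, "OKCCA": 35.46, "MKCCA": 38.95, "SCM": 35.89},
    "I_R_T with GIST-TF": {"CCA": 33.93, "OKCCA": 28.76, "MKCCA": 30.50, "SCM": 27.22},
}


def gain_over_best_other(scores, model="MKCCA"):
    """Absolute MAP difference between `model` and the best other model."""
    best_other = max(v for name, v in scores.items() if name != model)
    return round(scores[model] - best_other, 2)


if __name__ == "__main__":
    for row, scores in TABLE3.items():
        print(row, gain_over_best_other(scores))
```

Under this reading, MKCCA leads in two of the four rows (I_R_T with SIFT-TF and T_R_I with GIST-TF) and trails CCA in the other two, which the negative differences make visible.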