Journal of Shandong University (Engineering Science) ›› 2015, Vol. 45 ›› Issue (2): 37-42. doi: 10.6040/j.issn.1672-3961.2.2014.125
钱肃驰, 彭甫镕, 陆建峰
QIAN Suchi, PENG Furong, LU Jianfeng
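Abstract: To address missing and incorrect tags in social media, a tag optimization method based on content similarity and semantic similarity is proposed. First, TF-IDF (term frequency-inverse document frequency) is used to compute the similarity between texts. Next, an objective function is defined from the consistency between text similarity and tag similarity. Finally, a correction term is added to limit how far the optimized tags deviate from the tags originally supplied by users. The objective function was applied to Douban movie tags, and the optimized tags were compared with the original ones. The optimized tags were more accurate than the original tags. The experimental results indicate that the method optimizes tags effectively and mitigates the problems of missing and incorrect tags.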
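The first step named in the abstract, TF-IDF similarity between texts, can be illustrated with a minimal sketch. It is not the authors' implementation: the smoothed IDF formula, the cosine measure, the function names `tfidf_vectors` and `cosine_similarity`, and the toy token lists are all assumptions chosen for illustration, and real Douban text would first need Chinese word segmentation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            # Assumed smoothed IDF: ln((1 + N) / (1 + df)) + 1.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already-tokenized movie descriptions.
docs = [
    ["space", "war", "robot"],
    ["space", "war", "alien"],
    ["love", "paris", "romance"],
]
vectors = tfidf_vectors(docs)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # share two terms
sim_02 = cosine_similarity(vectors[0], vectors[2])  # share no terms
print(sim_01 > sim_02)
```

In a setting like the one the abstract describes, a matrix of such pairwise text similarities would be the quantity that tag similarities are asked to agree with. How that agreement and the correction term are weighted is defined in the paper itself and is not reproduced here.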