
Journal of Shandong University (Engineering Science) ›› 2018, Vol. 48 ›› Issue (3): 17-24. doi: 10.6040/j.issn.1672-3961.0.2017.411


Item embedding classification method for E-commerce

LONG Bai1, ZENG Xianyu1, LI Zhi1,2, LIU Qi1*

  1. Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China, Hefei 230000, Anhui, China;
  2. School of Software Engineering, University of Science and Technology of China, Hefei 230000, Anhui, China
  • Received: 2017-05-17  Online: 2018-06-20  Published: 2017-05-17
  • Corresponding author: LIU Qi (1986— ), male, born in Linyi, Shandong; associate professor, PhD. His main research interests include data mining and knowledge discovery, and machine learning methods and their applications. E-mail: qiliuql@ustc.edu.cn
  • About the author: LONG Bai (1980— ), male, born in Tongcheng, Anhui; senior engineer, PhD. His main research interests include high-performance computing, and data mining and its applications. E-mail: blong@ustc.edu.cn
  • Supported by the National Natural Science Foundation of China (Grant Nos. 61403358, 61672483, U1605251) and the Special Fund for Members of the Youth Innovation Promotion Association, Chinese Academy of Sciences (Member No. 2014299)


Abstract: Inspired by word2vec, a word embedding model that has proved highly successful in natural language processing in recent years, two item embedding models, item2vec and w-item2vec, were proposed. By modeling users' comparison and selection behavior among items at each purchase, both models projected items into a low-dimensional vector space, where the learned vectors captured the properties of items and could be used to measure the relations between them. On the basis of this property, items could be categorized effectively and efficiently. Experiments were conducted on a real-world dataset; the results showed that w-item2vec achieved an accuracy of nearly 50% for item categorization by using only 10% of the items for training, and that both proposed models significantly outperformed the other methods.
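To make the pipeline concrete, the following Python sketch illustrates the shared item2vec/w-item2vec idea under stated assumptions: each user's behavior sequence is treated as a "sentence" of item IDs, gensim's off-the-shelf skip-gram word2vec stands in for the authors' own embedding models (w-item2vec's weighting scheme is not reproduced), the sessions and category labels are invented toy data, and logistic regression serves as the downstream classifier.

# A minimal, illustrative sketch, not the authors' exact models.
import numpy as np
from gensim.models import Word2Vec                    # pip install gensim
from sklearn.linear_model import LogisticRegression   # pip install scikit-learn
from sklearn.model_selection import train_test_split

# Hypothetical behavior sequences: items a user compared before purchasing.
sessions = [
    ["item_1", "item_7", "item_3"],
    ["item_7", "item_3", "item_9", "item_1"],
    ["item_2", "item_5", "item_2"],
    ["item_5", "item_2", "item_8"],
]

# Skip-gram (sg=1) with negative sampling embeds co-occurring items near
# each other in a low-dimensional vector space.
model = Word2Vec(sentences=sessions, vector_size=32, window=5,
                 min_count=1, sg=1, negative=5, epochs=200, seed=42)

# Hypothetical category labels for a labeled subset of the catalog.
labels = {"item_1": 0, "item_3": 0, "item_7": 0, "item_9": 0,
          "item_2": 1, "item_5": 1, "item_8": 1}
items = list(labels)
X = np.stack([model.wv[i] for i in items])
y = np.array([labels[i] for i in items])

# The paper trains on only 10% of the items; this toy set is too small for
# that, so a 50/50 stratified split stands in for the idea.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.5,
                                          stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("categorization accuracy:", clf.score(X_te, y_te))

With real session logs, the classifier would be trained on the vectors of the small labeled fraction of items and used to predict categories for the rest.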

Key words: item categorization, item embedding, behavior modeling, word embedding, E-commerce

CLC number: TP391

References:
[1] Ministry of Commerce of the People's Republic of China. China e-commerce report (2016)[EB/OL]. (2017-06-14)[2017-06-28]. http://images.mofcom.gov.cn/dzsws/201706/20170621110205702.pdf
[2] SHEN D, RUVINI J D, SARWAR B. Large-scale item categorization for e-commerce[C] //Proceedings of the 21st ACM International Conference on Information and Knowledge Management. Hawaii, USA: ACM, 2012: 595-604.
[3] CHEN J, WARREN D. Cost-sensitive learning for large-scale hierarchical classification[C] //Proceedings of the 22nd ACM International Conference on Information and Knowledge Management. San Francisco, USA: ACM, 2013: 1351-1360.
[4] DEKEL O, KESHET J, SINGER Y. Large margin hierarchical classification[C] //Proceedings of the 21st International Conference on Machine Learning. Banff, Canada: ACM, 2004: 27.
[5] DAS P, XIA Y, LEVINE A, et al. Large-scale taxonomy categorization for noisy product listings[C] //Proceedings of IEEE International Conference on Big Data. Honolulu, USA: IEEE, 2017:3885-3894.
[6] DIMITROVSKI I, KOCEV D, KITANOVSKI I, et al. Improved medical image modality classification using a combination of visual and textual features[J]. Computerized Medical Imaging and Graphics, 2015, 39(1): 14-26.
[7] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
[8] LIU Q, ZENG X, ZHU H, et al. Mining indecisiveness in customer behaviors[C] // Proceedings of IEEE International Conference on Data Mining. Barcelona, Spain: IEEE, 2016:281-290.
[9] HINTON G E, MCCLELLAND J L, RUMELHART D E. Distributed representations[M]//Encyclopedia of Cognitive Science. New York, USA: John Wiley & Sons, 2006: 77-109.
[10] MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[J]. Advances in Neural Information Processing Systems, 2013, 26(1):3111-3119.
[11] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space [C] // Proceedings of International Conference on Learning Representations. Scottsdale, USA: ICLR, 2013:1-12.
[12] PEROZZI B, AL-RFOU R, SKIENA S. DeepWalk: online learning of social representations[C] //Proceedings of the 20th ACM International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM, 2014: 701-710.
[13] GROVER A, LESKOVEC J. node2vec: Scalable feature learning for networks[C] //Proceedings of the 22nd ACM International Conference on Knowledge Discovery and Data Mining. San Francisco, USA: ACM, 2016: 855-864.
[14] GRBOVIC M, DJURIC N, RADOSAVLJEVIC V, et al. Context-and content-aware embeddings for query rewriting in sponsored search[C] //Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval. Santiago, Chile: ACM, 2015: 383-392.
[15] PRESS S J, WILSON S. Choosing between logistic regression and discriminant analysis[J]. Journal of the American Statistical Association, 1978, 73(364): 699-705.
[16] SUYKENS J A K, VANDEWALLE J. Least squares support vector machine classifiers[J]. Neural Processing Letters, 1999, 9(3): 293-300.
[17] JANSEN B J, SPINK A, BLAKELY C, et al. Defining a session on web search engines[J]. Journal of the American Society for Information Science and Technology, 2007, 58(6): 862-871.
[18] GUTHRIE D, ALLISON B, LIU W, et al. A closer look at skip-gram modelling[C] //Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC-2006). Genoa, Italy: ELRA, 2006: 1-4.
[19] GUTMANN M U, HYVÄRINEN A. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics[J]. Journal of Machine Learning Research, 2012, 13(2): 307-361.
[20] MNIH A, TEH Y W. A fast and simple algorithm for training neural probabilistic language models[C] //Proceedings of the 29th International Conference on Machine Learning. Edinburgh, Scotland: Omnipress, 2012: 419-426.
[21] MNIH A, KAVUKCUOGLU K. Learning word embeddings efficiently with noise-contrastive estimation [C] // Proceedings of Advances in Neural Information Processing Systems. Lake Tahoe, USA: NIPS, 2013: 2265-2273.
[22] SARWAR B, KARYPIS G, KONSTAN J, et al. Item-based collaborative filtering recommendation algorithms[C] //Proceedings of the 10th International Conference on World Wide Web. Hong Kong, China: ACM, 2001: 285-295.
[23] MNIH A, SALAKHUTDINOV R R. Probabilistic matrix factorization[C] // Proceedings of Advances in Neural Information Processing Systems. Whistler, Canada: NIPS, 2008: 1257-1264.
Related articles in this journal:
[1] CHEN Dawei, YAN Zhao*, LIU Haoyan. The overfitting phenomenon of SVD-family algorithms in rating prediction[J]. Journal of Shandong University (Engineering Science), 2014, 44(3): 15-21.