
Journal of Shandong University (Engineering Science) ›› 2018, Vol. 48 ›› Issue (3): 17-24. doi: 10.6040/j.issn.1672-3961.0.2017.411


Item embedding classification method for E-commerce

LONG Bai1, ZENG Xianyu1, LI Zhi1,2, LIU Qi1*

  1. Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China, Hefei 230000, Anhui, China;
  2. School of Software Engineering, University of Science and Technology of China, Hefei 230000, Anhui, China
  • Received: 2017-05-17  Online: 2018-06-20  Published: 2017-05-17
  • Corresponding author: LIU Qi (1986— ), male, born in Linyi, Shandong; associate professor, PhD. His main research interests include data mining and knowledge discovery, and machine learning methods and their applications. E-mail: qiliuql@ustc.edu.cn
  • About the author: LONG Bai (1980— ), male, born in Tongcheng, Anhui; senior engineer, PhD. His main research interests include high-performance computing, and data mining and its applications. E-mail: blong@ustc.edu.cn
  • Supported by the National Natural Science Foundation of China (Grant Nos. 61403358, 61672483, U1605251) and the Special Fund for Members of the Youth Innovation Promotion Association, Chinese Academy of Sciences (Member No. 2014299)


Abstract: Inspired by word2vec, a word embedding model that has proved highly successful in natural language processing in recent years, two item embedding models, item2vec and w-item2vec, were proposed. By modeling users' comparison and selection behavior among items at each purchase, both models projected items into a low-dimensional vector space, where the learned vectors captured the properties of items and could be used to measure the relations between them. On the basis of this property, items could be categorized effectively and efficiently. Experiments were conducted on a real-world dataset; the results showed that w-item2vec achieved an accuracy of nearly 50% for item categorization by using only 10% of the items for training, and that both proposed models significantly outperformed the other methods.
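To make the pipeline concrete, the following Python sketch illustrates the shared item2vec/w-item2vec idea under stated assumptions: each user's behavior sequence is treated as a "sentence" of item IDs, gensim's off-the-shelf skip-gram word2vec stands in for the authors' own embedding models (w-item2vec's weighting scheme is not reproduced), the sessions and category labels are invented toy data, and logistic regression serves as the downstream classifier.

# A minimal, illustrative sketch, not the authors' exact models.
import numpy as np
from gensim.models import Word2Vec                    # pip install gensim
from sklearn.linear_model import LogisticRegression   # pip install scikit-learn
from sklearn.model_selection import train_test_split

# Hypothetical behavior sequences: items a user compared before purchasing.
sessions = [
    ["item_1", "item_7", "item_3"],
    ["item_7", "item_3", "item_9", "item_1"],
    ["item_2", "item_5", "item_2"],
    ["item_5", "item_2", "item_8"],
]

# Skip-gram (sg=1) with negative sampling embeds co-occurring items near
# each other in a low-dimensional vector space.
model = Word2Vec(sentences=sessions, vector_size=32, window=5,
                 min_count=1, sg=1, negative=5, epochs=200, seed=42)

# Hypothetical category labels for a labeled subset of the catalog.
labels = {"item_1": 0, "item_3": 0, "item_7": 0, "item_9": 0,
          "item_2": 1, "item_5": 1, "item_8": 1}
items = list(labels)
X = np.stack([model.wv[i] for i in items])
y = np.array([labels[i] for i in items])

# The paper trains on only 10% of the items; this toy set is too small for
# that, so a 50/50 stratified split stands in for the idea.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.5,
                                          stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("categorization accuracy:", clf.score(X_te, y_te))

With real session logs, the classifier would be trained on the vectors of the small labeled fraction of items and used to predict categories for the rest.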

Key words: item categorization, item embedding, behavior modeling, word embedding, E-commerce

CLC number: TP391

References:
[1] Ministry of Commerce of the People's Republic of China. China e-commerce report (2016)[EB/OL]. (2017-06-14)[2017-06-28]. http://images.mofcom.gov.cn/dzsws/201706/20170621110205702.pdf
[2] SHEN D, RUVINI J D, SARWAR B. Large-scale item categorization for e-commerce[C] //Proceedings of the 21st ACM International Conference on Information and Knowledge Management. Hawaii, USA: ACM, 2012: 595-604.
[3] CHEN J, WARREN D. Cost-sensitive learning for large-scale hierarchical classification[C] //Proceedings of the 22nd ACM International Conference on Information and Knowledge Management. San Francisco, USA: ACM, 2013: 1351-1360.
[4] DEKEL O, KESHET J, SINGER Y. Large margin hierarchical classification[C] //Proceedings of the 21st International Conference on Machine Learning. Banff, Canada: ACM, 2004: 27.
[5] DAS P, XIA Y, LEVINE A, et al. Large-scale taxonomy categorization for noisy product listings[C] //Proceedings of IEEE International Conference on Big Data. Honolulu, USA: IEEE, 2017:3885-3894.
[6] DIMITROVSKI I, KOCEV D, KITANOVSKI I, et al. Improved medical image modality classification using a combination of visual and textual features[J]. Computerized Medical Imaging and Graphics, 2015, 39(1): 14-26.
[7] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
[8] LIU Q, ZENG X, ZHU H, et al. Mining indecisiveness in customer behaviors[C] // Proceedings of IEEE International Conference on Data Mining. Barcelona, Spain: IEEE, 2016:281-290.
[9] HINTON G E, MCCLELLAND J L, RUMELHART D E. Distributed representations[M]//Encyclopedia of Cognitive Science. New York, USA: John Wiley & Sons, 2006: 77-109.
[10] MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[J]. Advances in Neural Information Processing Systems, 2013, 26(1):3111-3119.
[11] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space [C] // Proceedings of International Conference on Learning Representations. Scottsdale, USA: ICLR, 2013:1-12.
[12] PEROZZI B, AL-RFOU R, SKIENA S. DeepWalk: online learning of social representations[C] //Proceedings of the 20th ACM International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM, 2014: 701-710.
[13] GROVER A, LESKOVEC J. node2vec: Scalable feature learning for networks[C] //Proceedings of the 22nd ACM International Conference on Knowledge Discovery and Data Mining. San Francisco, USA: ACM, 2016: 855-864.
[14] GRBOVIC M, DJURIC N, RADOSAVLJEVIC V, et al. Context-and content-aware embeddings for query rewriting in sponsored search[C] //Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval. Santiago, Chile: ACM, 2015: 383-392.
[15] PRESS S J, WILSON S. Choosing between logistic regression and discriminant analysis[J]. Journal of the American Statistical Association, 1978, 73(364): 699-705.
[16] SUYKENS J A K, VANDEWALLE J. Least squares support vector machine classifiers[J]. Neural Processing Letters, 1999, 9(3): 293-300.
[17] JANSEN B J, SPINK A, BLAKELY C, et al. Defining a session on web search engines[J]. Journal of the American Society for Information Science and Technology, 2007, 58(6): 862-871.
[18] GUTHRIE D, ALLISON B, LIU W, et al. A closer look at skip-gram modelling[C] //Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC-2006). Genoa, Italy: ELRA, 2006: 1-4.
[19] GUTMANN M U, HYVÄRINEN A. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics[J]. Journal of Machine Learning Research, 2012, 13(2): 307-361.
[20] MNIH A, TEH Y W. A fast and simple algorithm for training neural probabilistic language models[C] //Proceedings of the 29th International Conference on Machine Learning. Edinburgh, Scotland: Omnipress, 2012: 419-426.
[21] MNIH A, KAVUKCUOGLU K. Learning word embeddings efficiently with noise-contrastive estimation [C] // Proceedings of Advances in Neural Information Processing Systems. Lake Tahoe, USA: NIPS, 2013: 2265-2273.
[22] SARWAR B, KARYPIS G, KONSTAN J, et al. Item-based collaborative filtering recommendation algorithms[C] //Proceedings of the 10th International Conference on World Wide Web. Hong Kong, China: ACM, 2001: 285-295.
[23] MNIH A, SALAKHUTDINOV R R. Probabilistic matrix factorization[C] // Proceedings of Advances in Neural Information Processing Systems. Whistler, Canada: NIPS, 2008: 1257-1264.
Related articles in this journal:
[1] CHEN Dawei, YAN Zhao*, LIU Haoyan. The overfitting phenomenon of SVD-family algorithms in rating prediction[J]. Journal of Shandong University (Engineering Science), 2014, 44(3): 15-21.