
Journal of Shandong University (Engineering Science) ›› 2018, Vol. 48 ›› Issue (3): 17-24. doi: 10.6040/j.issn.1672-3961.0.2017.411


Item embedding classification method for E-commerce

LONG Bai1, ZENG Xianyu1, LI Zhi1,2, LIU Qi1*   

  1. Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China, Hefei 230000, Anhui, China;
    2. School of Software Engineering, University of Science and Technology of China, Hefei 230000, Anhui, China
  • Received: 2017-05-17 Online: 2018-06-20 Published: 2017-05-17
  • Corresponding author: LIU Qi (1986— ), male, from Linyi, Shandong; associate professor, PhD. His research interests include data mining and knowledge discovery, and machine learning methods and their applications. E-mail: qiliuql@ustc.edu.cn
  • About the first author: LONG Bai (1980— ), male, from Tongcheng, Anhui; senior engineer, PhD. His research interests include high-performance computing and data mining and its applications. E-mail: blong@ustc.edu.cn
  • Supported by:
    the National Natural Science Foundation of China (61403358, 61672483, U1605251) and the Youth Innovation Promotion Association, Chinese Academy of Sciences (Member No. 2014299)


Abstract: Inspired by word2vec, a word embedding model that has proved highly successful in natural language processing in recent years, two item embedding models, item2vec and w-item2vec, were proposed. By modeling the comparison and choice behavior of users over items at each purchase, both models represented each item as a vector in a low-dimensional space; these vectors could effectively measure the relations and properties of different items. Using this property, the vectors learned by item2vec and w-item2vec were applied to item categorization. Experimental results on a real-world dataset showed that w-item2vec achieved an item categorization accuracy of nearly 50% with only 10% of the data used for training, and that both models significantly outperformed the other methods.

Key words: E-commerce, item categorization, word embedding, behavior modeling, item embedding


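The page describes the models only at a high level. The general technique behind item2vec-style approaches, skip-gram with negative sampling trained on user behavior sequences instead of sentences, can be sketched as follows. This is a minimal illustration on toy data under assumed hyperparameters, not the authors' implementation; all names, parameters, and the session data are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_item_embeddings(sessions, n_items, dim=16, window=2,
                          negatives=3, lr=0.05, epochs=50, seed=0):
    """Skip-gram with negative sampling over item sessions (item2vec-style)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.1, (n_items, dim))   # item ("input") vectors
    C = rng.normal(0.0, 0.1, (n_items, dim))   # context ("output") vectors
    for _ in range(epochs):
        for s in sessions:
            for i, center in enumerate(s):
                lo, hi = max(0, i - window), min(len(s), i + window + 1)
                for j in range(lo, hi):
                    if j == i:
                        continue
                    ctx = s[j]
                    # positive pair: pull center and observed context together
                    g = sigmoid(W[center] @ C[ctx]) - 1.0
                    gw, gc = g * C[ctx], g * W[center]
                    W[center] -= lr * gw
                    C[ctx] -= lr * gc
                    # negative samples: push center away from random items
                    for neg in rng.integers(0, n_items, negatives):
                        if neg == ctx:
                            continue
                        g = sigmoid(W[center] @ C[neg])
                        gw, gc = g * C[neg], g * W[center]
                        W[center] -= lr * gw
                        C[neg] -= lr * gc
    return W

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy sessions: items 0-2 behave like one category, items 3-5 like another.
rng = np.random.default_rng(1)
sessions = [list(rng.permutation([0, 1, 2] if rng.random() < 0.5 else [3, 4, 5]))
            for _ in range(100)]
emb = train_item_embeddings(sessions, n_items=6)
```

Items that co-occur in sessions end up close in the embedding space (e.g. `cosine(emb[0], emb[1])` should exceed `cosine(emb[0], emb[4])`); a downstream classifier trained on such vectors corresponds to the categorization step evaluated in the paper.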

CLC number: TP391
[1] Ministry of Commerce of the People's Republic of China. China e-commerce report (2016)[EB/OL].(2017-06-14)[2017-06-28]. http://images.mofcom.gov.cn/dzsws/201706/20170621110205702.pdf
[2] SHEN D, RUVINI J D, SARWAR B. Large-scale item categorization for e-commerce[C] //Proceedings of the 21st ACM International Conference on Information and Knowledge Management. Hawaii, USA: ACM, 2012: 595-604.
[3] CHEN J, WARREN D. Cost-sensitive learning for large-scale hierarchical classification[C] //Proceedings of the 22nd ACM International Conference on Information and Knowledge Management. San Francisco, USA: ACM, 2013: 1351-1360.
[4] DEKEL O, KESHET J, SINGER Y. Large margin hierarchical classification[C] //Proceedings of the 21st International Conference on Machine Learning. Banff, Canada: ACM, 2004: 27.
[5] DAS P, XIA Y, LEVINE A, et al. Large-scale taxonomy categorization for noisy product listings[C] //Proceedings of IEEE International Conference on Big Data. Honolulu, USA: IEEE, 2017:3885-3894.
[6] DIMITROVSKI I, KOCEV D, KITANOVSKI I, et al. Improved medical image modality classification using a combination of visual and textual features[J]. Computerized Medical Imaging and Graphics, 2015, 39(1): 14-26.
[7] RUSSAKOVSKY O, DENG J, SU H, et al. Imagenet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
[8] LIU Q, ZENG X, ZHU H, et al. Mining indecisiveness in customer behaviors[C] // Proceedings of IEEE International Conference on Data Mining. Barcelona, Spain: IEEE, 2016:281-290.
[9] HINTON G E, MCCLELLAND J L, RUMELHART D E. Distributed representations[M]//Encyclopedia of Cognitive Science. New York, USA: John Wiley & Sons, Ltd, 2006: 77-109.
[10] MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[C] //Proceedings of Advances in Neural Information Processing Systems. Lake Tahoe, USA: NIPS, 2013: 3111-3119.
[11] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space [C] // Proceedings of International Conference on Learning Representations. Scottsdale, USA: ICLR, 2013:1-12.
[12] PEROZZI B, AL-RFOU R, SKIENA S. DeepWalk: online learning of social representations[C] //Proceedings of the 20th ACM International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM, 2014: 701-710.
[13] GROVER A, LESKOVEC J. node2vec: Scalable feature learning for networks[C] //Proceedings of the 22nd ACM International Conference on Knowledge Discovery and Data Mining. San Francisco, USA: ACM, 2016: 855-864.
[14] GRBOVIC M, DJURIC N, RADOSAVLJEVIC V, et al. Context-and content-aware embeddings for query rewriting in sponsored search[C] //Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval. Santiago, Chile: ACM, 2015: 383-392.
[15] PRESS S J, WILSON S. Choosing between logistic regression and discriminant analysis[J]. Journal of the American Statistical Association, 1978, 73(364): 699-705.
[16] SUYKENS J A K, VANDEWALLE J. Least squares support vector machine classifiers[J]. Neural Processing Letters, 1999, 9(3): 293-300.
[17] JANSEN B J, SPINK A, BLAKELY C, et al. Defining a session on Web search engines[J]. Journal of the American Society for Information Science and Technology, 2007, 58(6): 862-871.
[18] GUTHRIE D, ALLISON B, LIU W, et al. A closer look at skip-gram modelling[C] //Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006). Genoa, Italy: ELRA, 2006: 1-4.
[19] GUTMANN M U, HYVÄRINEN A. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics[J]. Journal of Machine Learning Research, 2012, 13(2): 307-361.
[20] MNIH A, TEH Y W. A fast and simple algorithm for training neural probabilistic language models[C] //Proceedings of the 29th International Conference on Machine Learning. Edinburgh, Scotland: Omnipress, 2012: 419-426.
[21] MNIH A, KAVUKCUOGLU K. Learning word embeddings efficiently with noise-contrastive estimation [C] // Proceedings of Advances in Neural Information Processing Systems. Lake Tahoe, USA: NIPS, 2013: 2265-2273.
[22] SARWAR B, KARYPIS G, KONSTAN J, et al. Item-based collaborative filtering recommendation algorithms[C] //Proceedings of the 10th International Conference on World Wide Web. Hong Kong, China: ACM, 2001: 285-295.
[23] MNIH A, SALAKHUTDINOV R R. Probabilistic matrix factorization[C] // Proceedings of Advances in Neural Information Processing Systems. Whistler, Canada: NIPS, 2008: 1257-1264.
[1] LI Ganglong, LIN Peiguang, LI Jinyu, WANG Qian. Design of a smart-contract-based community question-answering service platform for E-commerce[J]. Journal of Shandong University (Engineering Science), 2024, 54(6): 57-71.
[2] SUN Zhiwei, SONG Mingyang, PAN Zehua, JING Liping. Context-aware discriminative topic model[J]. Journal of Shandong University (Engineering Science), 2022, 52(4): 131-138.
[3] CHEN Dawei, YAN Zhao, LIU Haoyan. Overfitting of SVD-based algorithms in rating prediction[J]. Journal of Shandong University (Engineering Science), 2014, 44(3): 15-21.