JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2017, Vol. 47, Issue (4): 1-6. doi: 10.6040/j.issn.1672-3961.0.2016.339

Optimization algorithm for big data mining based on parameter server framework

LIU Yang1, LIU Bo2, WANG Feng1   

  1. Institute of Cloud Computing and Big Data, Henan University of Economics and Law, Zhengzhou 450046, Henan, China;
  2. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
  • Received: 2016-09-03 Online: 2017-08-20 Published: 2016-09-03

Abstract: Traditional machine learning algorithms designed for small data sets were not well suited to mining big data. An optimization algorithm for machine learning and big data mining was proposed. The iterative computation of a machine learning algorithm was divided into two phases according to the change in the model vector. Based on the observation that most samples contributed little to the model update during iteration, the computation load could be reduced by reusing the iterative computing results of these samples. Experimental results showed that the proposed method could reduce the computation load by 35% with little effect on the prediction accuracy of the trained model.
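The abstract does not give the algorithm's details, so the following is only a minimal single-machine sketch of the general idea of reusing per-sample results, not the authors' method. It assumes logistic regression trained by full-batch gradient descent, switches to a second phase once the model vector changes by less than a hypothetical `switch_tol`, and from then on reuses the cached gradient of any sample whose last gradient was smaller than a hypothetical `grad_tol`. All function names, parameters, and thresholds are illustrative assumptions, and the parameter server distribution is omitted entirely.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_gradient(w, x, y):
    # Gradient of the logistic loss for one sample (label y in {0, 1}).
    err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
    return [err * xi for xi in x]


def make_data(n, seed=0):
    # Synthetic two-class data: a bias feature plus one noisy feature.
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        X.append([1.0, rng.gauss(2 * y - 1, 1.0)])
        Y.append(y)
    return X, Y


def train(X, Y, epochs=50, lr=0.5, switch_tol=0.05, grad_tol=0.05, reuse=True):
    """Return (weights, number of per-sample gradient evaluations)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    cache = [None] * n      # last gradient computed for each sample
    evaluations = 0
    phase_two = False       # assumed phase switch: model vector has settled
    for _ in range(epochs):
        total = [0.0] * d
        for i in range(n):
            g = cache[i]
            # In phase two, a sample whose cached gradient is small is
            # treated as contributing little; its cached value is reused.
            # Simplification: such a sample is never refreshed afterwards.
            skip = (reuse and phase_two and g is not None
                    and max(abs(v) for v in g) < grad_tol)
            if not skip:
                g = sample_gradient(w, X[i], Y[i])
                cache[i] = g
                evaluations += 1
            for j in range(d):
                total[j] += g[j]
        step = [lr * t / n for t in total]
        w = [wj - sj for wj, sj in zip(w, step)]
        if max(abs(s) for s in step) < switch_tol:
            phase_two = True
    return w, evaluations
```

Calling `train(X, Y, reuse=False)` and `train(X, Y, reuse=True)` on the same data and comparing the two evaluation counts gives a rough feel for the trade-off; any saving in this toy setting depends on the assumed thresholds and says nothing about the 35% figure reported above.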

Key words: big data, sample diversity, machine learning, distributed system, optimization

CLC Number: TU457