Journal of Shandong University(Engineering Science) ›› 2015, Vol. 45 ›› Issue (5): 22-28.doi: 10.6040/j.issn.1672-3961.0.2014.367

Previous Articles    

Parallel implementing probabilistic spreading algorithm using MapReduce programming mode

HE Dongzhi, ZHANG Jifeng, ZHAO Pengfei   

  1. School of Software Engineering, Beijing University of Technology, Beijing 100124, China
  • Published:2020-05-26

Abstract: In order to overcome the limitations of the serial probabilistic spreading algorithm in dealing with large-scale dataset, a parallelization of the algorithm was put forth by using MapReduce. The complex computing tasks were decomposed into a series of MapReduce job flow for distributed parallel processing on Hadoop. The input and output data of every step were stored in the Hadoop distributed file system. Hit ratio was used to compare the parallelizable probabilistic spreading algorithm versus the global ranking method performance. Speedups of the parallelizable algorithm were compared while the amount of data and the number of nodes was different. Experiment results showed that the probabilistic spreading algorithm based on MapReduce had good parallelism and had higher hit ratio than the global ranking method. Data scale that can be handled by the serial algorithm was expanded, and the operation speed of the algorithm was raised.

Key words: MapReduce, cloud computing paltform, bipartite network, probabilistic spreading algorithm, distributed

CLC Number: 

  • TP311
[1] 王国霞,刘贺平.个性化推荐系统综述[J].计算机工程与应用, 2012, 48(7):66-76. WANG Guoxia, LIU Heping. Survey of personalized recommendation system[J]. Computer Engineering and Applications, 2012, 48(7):66-76.
[2] L(¨overU)Linyuan, MEDO Matús, CHI Hoyeung, et al. Recommender systems[J]. Physics Reports, 2012, 519(1):1-49.
[3] 朱郁筱, 吕琳媛. 推荐系统评价指标综述[J]. 电子科技大学学报, 2012, 41(2):163-175. ZHU Yuxiao, L(¨overU)Linyuan. Evaluation metrics for recommender systems[J]. Journal of University of Electronic Science and Technology of China, 2012, 41(2):163-175.
[4] ZHOU Tao, REN Jie, MEDO Matús, et al. Bipartite network projection and personal recommendation[J]. Physical Review E, 2007, 76(4):046115.
[5] LATAPY Matthieu, MAGNIEN Clémence, DELVECCHIO Nathalie. Basic notions for the analysis of large two-mode networks[J]. Social Networks, 2008, 30(1):31-48.
[6] 吴亚晶, 张鹏, 狄增如, 等. 二分网络研究[J]. 复杂系统与复杂性科学, 2010, 7(1):1-12. WU Yajing, ZHANG Peng, DI Zengru, et al. Study on bipartite networks[J]. Complex Systems and Complexity Science, 2010, 7(1):1-12.
[7] 刘亮亮, 曹菡, 韩亚楠. 基于群体动力学的协同过滤算法及应用[J]. 计算机应用研究, 2014, 31(12):3603-3605, 3612. LIU Liangliang, CAO Han, HAN Yanan. Algorithm and application of collaborative filtering based on group dynamics[J]. Application Research of Computers, 2014, 31(12):3603-3605, 3612.
[8] KOREN Yehuda, BELL Robert. Recommender systems handbook[M]. New York: Springer, 2011:145-186.
[9] ADOMAVICIUS G, TUZHILIN A. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions[J]. Knowledge and Data Engineering, 2005, 17(6):734-749.
[10] 陈吉荣, 乐嘉锦. 基于 Hadoop 生态系统的大数据解决方案综述[J]. 计算机工程与科学, 2013, 35(10):25-35. CHEN Jirong, LE Jiajin. Reviewing the big data solution based on hadoop ecosystem[J]. Computer Engineering & Science, 2013, 35(10):25-35.
[11] 冯汉超, 周凯东. 分布式系统下大数据存储结构优化研究[J]. 河北工程大学学报:自然科学版, 2014, 31(4):69-73. FENG Hanchao, ZHOU Kaidong. Research on optimizing big data storage structure in distributed system[J]. Journal of Hebei University of Engineering:Natural Science Edition, 2014, 31(4):69-73.
[12] 刘飞. 基于云计算的分布式存储系统的研究和应用[D]. 西安:西安工业大学, 2012. LIU Fei. Research and application of distributed storage system based on cloud computing[D]. Xi'an:Xi'an Technological University, 2012.
[13] BORTHAKUR D. HDFS architecture guide[J]. Hadoop Apache Project, 2008:53.
[14] 黄文依, 王劲松, 林胜. HDFS可视化操作研究与实现[J]. 天津理工大学学报, 2012, 28(1):31-34. HANG Wenyi, WANG Jinsong, LIN Sheng. Research and implementation of HDFS visualized operation[J]. Journal of Tianjin University of Technology, 2012, 28(1):31-34.
[15] KAVULYA S, TAN J, GANDHI R, et al. An analysis of traces from a production mapreduce cluster[C] //Processdings of Cluster, Cloud and Grid Computing(CCGrid). Melbourne:IEEE, 2010:94-103.
[16] LIN Wenhui, LEI Zhenming, LIU Jun, et al. MapReduce optimization algorithm based on machine learning in heterogeneous cloud environment[J]. The Journal of China Universities of Posts and Telecommunications, 2013, 20(6):77-121.
[17] 应毅,刘亚军. MapReduce并行计算技术发展综述[J]. 计算机系统应用, 2014,23(4):1-6,11. YING Yi, LIU Yajun. Survey of developments of MapReduce parallel computing technology[J]. Computer Systems & Applications, 2014, 23(4):1-6,11.
[18] DITTRICH J, QUIANÉRUIZ J A. Efficient big data processing in hadoop MapReduce[J]. Proceedings of the VLDB Endowment, 2012, 5(12):2014-2015.
[19] 赵卫中,马慧芳,傅燕翔,等.基于云计算平台Hadoop的并行k-means聚类算法设计研究[J].计算机科学, 2011(10):166-168. ZHAO Weizhong, MA Huifang, FU Yanxiang, et al. Research on parallel k-means algorithm design based on hadoop platform[J]. Computer Science, 2011(10):166-168.
[20] HASSANZADEH O, KEMENTSIETSIDIS A, KIMEL-FELD B, et al. Next generation data analytics at IBM research[J]. Proceedings of the VLDB Endowment, 2013, 6(11):1174-1175.
[21] BORTHAKUR D, GRAY J, SARMA J S, et al. Apache hadoop goes realtime at facebook[C] //Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. New York:ACM, 2011:1071-1080.
[22] LEE H, KWON J. Similar user clustering based on movielens data set[J]. Advanced Science and Technology Letters, 2014, 51:32-35.
[1] Fangyun HAN,Liang QIAO,Bincheng ZHAO,Li ZHANG. Weighted value of solar tariff based on time-of-use electricity price [J]. Journal of Shandong University(Engineering Science), 2019, 49(6): 93-97,106.
[2] Fei WANG,Chunyi WANG,Chuanyong WANG,Guangfeng ZHAO,Mu LI,Xiaohan SHI. Restrictions and countermeasures of distributed PV integration based on secondary battery energy storage system [J]. Journal of Shandong University(Engineering Science), 2018, 48(6): 109-115, 121.
[3] Qingtao QU,Qicheng LIU,Chunxiao MU. A parallel adaptive news topic tracking algorithm based on N-Gram language model [J]. Journal of Shandong University(Engineering Science), 2018, 48(6): 37-43.
[4] WANG Qi, SUN Zhumei, LIU Shaohong, BAI Jianyun. Integration transform of dust removal system based on fieldbus compatible technology [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2018, 48(4): 37-41.
[5] QIN Liguo, HE Xiao, ZHOU Donghua. A new distributed formation for multi-agent systems with constant time delays [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2017, 47(5): 79-88.
[6] LIU Yang, LIU Bo, WANG Feng. Optimization algorithm for big data mining based on parameter server framework [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2017, 47(4): 1-6.
[7] LI Qingdong, LUO Weichao, YE Yingxin, ZHANG Chengrui, HU Tianliang. Sharing mechanism of machine tool resource based on ontology [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2017, 47(3): 130-138.
[8] SUN Yibing, FU Minyue, WANG Bingchang, ZHANG Huanshui. Distributed state estimation algorithm for large-scale dynamic systems [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2016, 46(6): 62-68.
[9] FAN Debin, DENG Changshou, YUAN Sihao, TAN Xujie, DONG Xiaogang. Distributed particle swarm optimization algorithm based on mapreduce [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2016, 46(6): 23-30.
[10] CHEN Jiming, SUN Mingyu, YOU Jujuan, KANG Zhongjian. Research of reactive power optimization based on subspace bacterial colony chemotaxis algorithm to in distribution networks with distributed generation#br# [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2014, 44(2): 49-54.
[11] CHEN Wen-qiang1, LIN Chen1,2, CHEN Ke3, CHEN Jin-xiu1, ZOU Quan1,2*. Distributed affinity propagation clustering algorithm based on GraphLab [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2013, 43(5): 13-18.
[12] XU Ming-xia1,2, JIANG Xin-liang1, SUN Ming-ting3. Simplified calculation method under uniform distributed load for composite wall panels [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2012, 42(6): 80-85.
[13] SHI Hai-dong, HU Dong-mei, WANG Xiao-dong. Statistical analysis of delay spread in distributed antenna systems [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2012, 42(1): 133-142.
[14] SUN Jia-bing1,2, ZHANG Cheng-jin1*. Optimal fusion filtering for systems with stochastic parametric
uncertainties and packet dropouts
[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2011, 41(6): 59-65.
[15] GAO Hou-Lei, TIAN Jia, DU Jiang, WU Zhi-Gang, LIU Chu-Min. Distributed generation—new technology in energy development [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(5): 106-110.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!