您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(工学版)》

山东大学学报 (工学版) ›› 2015, Vol. 45 ›› Issue (5): 22-28.doi: 10.6040/j.issn.1672-3961.0.2014.367

• • 上一篇    下一篇

不确定性传播算法的MapReduce并行化实现

何东之,张吉沣,赵鹏飞   

  1. 北京工业大学软件学院, 北京 100124
  • 发布日期:2020-05-26
  • 作者简介:何东之(1970- ),男,山东临沂人,副教授,博士,主要研究方向为嵌入式系统和新型人机交互.E-mail:victor@bjut.edu.cn
  • 基金资助:
    北京市教委基金资助项目(PXM2011_014204_09_000232)

Parallel implementing probabilistic spreading algorithm using MapReduce programming mode

HE Dongzhi, ZHANG Jifeng, ZHAO Pengfei   

  1. School of Software Engineering, Beijing University of Technology, Beijing 100124, China
  • Published:2020-05-26

摘要: 为了克服单机串行不确定性传播算法处理大规模数据集的局限,采用MapReduce编程模型对算法进行并行化实现。将单机算法按照算法流程进行拆分,每一步对应一个MapReduce程序。每一步的输入及输出数据都存储在Hadoop分布式文件系统上。用命中率对比并行化的不确定性传播算法与全局排名算法的性能。对比不同数据量、不同节点数时并行化的不确定性传播算法的加速比。试验结果表明,不确定性传播算法MapReduce并行化后部署在Hadoop集群上运行,命中率显著高于全局排名算法,且有着较好的并行性,扩大了单机算法所能处理的数据规模且提高了算法的运算速度。

关键词: MapReduce, 云计算平台, 二分网络, 不确定性传播算法, 分布式

Abstract: In order to overcome the limitations of the serial probabilistic spreading algorithm in dealing with large-scale dataset, a parallelization of the algorithm was put forth by using MapReduce. The complex computing tasks were decomposed into a series of MapReduce job flow for distributed parallel processing on Hadoop. The input and output data of every step were stored in the Hadoop distributed file system. Hit ratio was used to compare the parallelizable probabilistic spreading algorithm versus the global ranking method performance. Speedups of the parallelizable algorithm were compared while the amount of data and the number of nodes was different. Experiment results showed that the probabilistic spreading algorithm based on MapReduce had good parallelism and had higher hit ratio than the global ranking method. Data scale that can be handled by the serial algorithm was expanded, and the operation speed of the algorithm was raised.

Key words: MapReduce, cloud computing paltform, bipartite network, probabilistic spreading algorithm, distributed

中图分类号: 

  • TP311
[1] 王国霞,刘贺平.个性化推荐系统综述[J].计算机工程与应用, 2012, 48(7):66-76. WANG Guoxia, LIU Heping. Survey of personalized recommendation system[J]. Computer Engineering and Applications, 2012, 48(7):66-76.
[2] L(¨overU)Linyuan, MEDO Matús, CHI Hoyeung, et al. Recommender systems[J]. Physics Reports, 2012, 519(1):1-49.
[3] 朱郁筱, 吕琳媛. 推荐系统评价指标综述[J]. 电子科技大学学报, 2012, 41(2):163-175. ZHU Yuxiao, L(¨overU)Linyuan. Evaluation metrics for recommender systems[J]. Journal of University of Electronic Science and Technology of China, 2012, 41(2):163-175.
[4] ZHOU Tao, REN Jie, MEDO Matús, et al. Bipartite network projection and personal recommendation[J]. Physical Review E, 2007, 76(4):046115.
[5] LATAPY Matthieu, MAGNIEN Clémence, DELVECCHIO Nathalie. Basic notions for the analysis of large two-mode networks[J]. Social Networks, 2008, 30(1):31-48.
[6] 吴亚晶, 张鹏, 狄增如, 等. 二分网络研究[J]. 复杂系统与复杂性科学, 2010, 7(1):1-12. WU Yajing, ZHANG Peng, DI Zengru, et al. Study on bipartite networks[J]. Complex Systems and Complexity Science, 2010, 7(1):1-12.
[7] 刘亮亮, 曹菡, 韩亚楠. 基于群体动力学的协同过滤算法及应用[J]. 计算机应用研究, 2014, 31(12):3603-3605, 3612. LIU Liangliang, CAO Han, HAN Yanan. Algorithm and application of collaborative filtering based on group dynamics[J]. Application Research of Computers, 2014, 31(12):3603-3605, 3612.
[8] KOREN Yehuda, BELL Robert. Recommender systems handbook[M]. New York: Springer, 2011:145-186.
[9] ADOMAVICIUS G, TUZHILIN A. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions[J]. Knowledge and Data Engineering, 2005, 17(6):734-749.
[10] 陈吉荣, 乐嘉锦. 基于 Hadoop 生态系统的大数据解决方案综述[J]. 计算机工程与科学, 2013, 35(10):25-35. CHEN Jirong, LE Jiajin. Reviewing the big data solution based on hadoop ecosystem[J]. Computer Engineering & Science, 2013, 35(10):25-35.
[11] 冯汉超, 周凯东. 分布式系统下大数据存储结构优化研究[J]. 河北工程大学学报:自然科学版, 2014, 31(4):69-73. FENG Hanchao, ZHOU Kaidong. Research on optimizing big data storage structure in distributed system[J]. Journal of Hebei University of Engineering:Natural Science Edition, 2014, 31(4):69-73.
[12] 刘飞. 基于云计算的分布式存储系统的研究和应用[D]. 西安:西安工业大学, 2012. LIU Fei. Research and application of distributed storage system based on cloud computing[D]. Xi'an:Xi'an Technological University, 2012.
[13] BORTHAKUR D. HDFS architecture guide[J]. Hadoop Apache Project, 2008:53.
[14] 黄文依, 王劲松, 林胜. HDFS可视化操作研究与实现[J]. 天津理工大学学报, 2012, 28(1):31-34. HANG Wenyi, WANG Jinsong, LIN Sheng. Research and implementation of HDFS visualized operation[J]. Journal of Tianjin University of Technology, 2012, 28(1):31-34.
[15] KAVULYA S, TAN J, GANDHI R, et al. An analysis of traces from a production mapreduce cluster[C] //Processdings of Cluster, Cloud and Grid Computing(CCGrid). Melbourne:IEEE, 2010:94-103.
[16] LIN Wenhui, LEI Zhenming, LIU Jun, et al. MapReduce optimization algorithm based on machine learning in heterogeneous cloud environment[J]. The Journal of China Universities of Posts and Telecommunications, 2013, 20(6):77-121.
[17] 应毅,刘亚军. MapReduce并行计算技术发展综述[J]. 计算机系统应用, 2014,23(4):1-6,11. YING Yi, LIU Yajun. Survey of developments of MapReduce parallel computing technology[J]. Computer Systems & Applications, 2014, 23(4):1-6,11.
[18] DITTRICH J, QUIANÉRUIZ J A. Efficient big data processing in hadoop MapReduce[J]. Proceedings of the VLDB Endowment, 2012, 5(12):2014-2015.
[19] 赵卫中,马慧芳,傅燕翔,等.基于云计算平台Hadoop的并行k-means聚类算法设计研究[J].计算机科学, 2011(10):166-168. ZHAO Weizhong, MA Huifang, FU Yanxiang, et al. Research on parallel k-means algorithm design based on hadoop platform[J]. Computer Science, 2011(10):166-168.
[20] HASSANZADEH O, KEMENTSIETSIDIS A, KIMEL-FELD B, et al. Next generation data analytics at IBM research[J]. Proceedings of the VLDB Endowment, 2013, 6(11):1174-1175.
[21] BORTHAKUR D, GRAY J, SARMA J S, et al. Apache hadoop goes realtime at facebook[C] //Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. New York:ACM, 2011:1071-1080.
[22] LEE H, KWON J. Similar user clustering based on movielens data set[J]. Advanced Science and Technology Letters, 2014, 51:32-35.
[1] 孙东磊,孙毅,刘蕊,孙鹏凯,张玉敏. 基于岭回归的配电网分布式光伏消纳能力预测方法[J]. 山东大学学报 (工学版), 2025, 55(3): 149-157.
[2] 刘新,刘冬兰,付婷,王勇,常英贤,姚洪磊,罗昕,王睿,张昊. 基于联邦学习的时间序列预测算法[J]. 山东大学学报 (工学版), 2024, 54(3): 55-63.
[3] 范海雯,郝旭东,赵康,邢法财,蒋哲,李常刚. 基于卷积神经网络的含分布式光伏配电网静态等值[J]. 山东大学学报 (工学版), 2023, 53(4): 140-148.
[4] 刘振,孙媛媛,李亚辉,许庆燊,于涛,庞延庆. 基于用户行为预测的分布式光伏智能社区需求响应策略[J]. 山东大学学报 (工学版), 2022, 52(5): 24-34.
[5] 杨思,王艳,赵斌成,韩学山,刘冬,孙东磊. 含分布式电源的配电网三阶段协同优化调度[J]. 山东大学学报 (工学版), 2022, 52(5): 55-69.
[6] 刘新锋, 张旖旎,徐惠三,宋玲,陈梦雅. 基于随机森林和专家系统的分布式光伏电站阴影遮挡诊断[J]. 山东大学学报 (工学版), 2021, 51(2): 98-104.
[7] 宋士瞻,陈浩宇,张健,王坤,郝庆水. 考虑路灯充电桩接入的城市配电网电压控制方法[J]. 山东大学学报 (工学版), 2020, 50(3): 104-110.
[8] 韩方运, 乔梁, 赵斌成, 张利. 基于分时电价的加权太阳能价值电价[J]. 山东大学学报 (工学版), 2019, 49(6): 93-97.
[9] 王飞, 王春义, 王传勇, 赵光锋, 李沐, 施啸寒. 基于梯次利用电池储能系统的分布式光伏接入受限对策[J]. 山东大学学报 (工学版), 2018, 48(6): 109-115.
[10] 屈庆涛,刘其成,牟春晓. 基于N-Gram语言模型的并行自适应新闻话题追踪算法[J]. 山东大学学报 (工学版), 2018, 48(6): 37-43.
[11] 秦利国,何潇,周东华. 一种时延多智能体系统的分布式编队[J]. 山东大学学报(工学版), 2017, 47(5): 79-88.
[12] 刘洋,刘博,王峰. 基于Parameter Server框架的大数据挖掘优化算法[J]. 山东大学学报(工学版), 2017, 47(4): 1-6.
[13] 李庆冬,骆伟超,叶瑛歆,张承瑞,胡天亮. 基于本体的机床设备资源共享机制[J]. 山东大学学报(工学版), 2017, 47(3): 130-138.
[14] 孙一冰,付敏跃,王炳昌,张焕水. 大规模动态系统的分布式状态估计算法[J]. 山东大学学报(工学版), 2016, 46(6): 62-68.
[15] 范德斌,邓长寿,袁斯昊,谭旭杰,董小刚. 基于MapReduce模型的分布式粒子群算法[J]. 山东大学学报(工学版), 2016, 46(6): 23-30.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 张永花,王安玲,刘福平 . 低频非均匀电磁波在导电界面的反射相角[J]. 山东大学学报(工学版), 2006, 36(2): 22 -25 .
[2] 孔祥臻,刘延俊,王勇,赵秀华 . 气动比例阀的死区补偿与仿真[J]. 山东大学学报(工学版), 2006, 36(1): 99 -102 .
[3] 来翔 . 用胞映射方法讨论一类MKdV方程[J]. 山东大学学报(工学版), 2006, 36(1): 87 -92 .
[4] 余嘉元1 , 田金亭1 , 朱强忠2 . 计算智能在心理学中的应用[J]. 山东大学学报(工学版), 2009, 39(1): 1 -5 .
[5] 李可,刘常春,李同磊 . 一种改进的最大互信息医学图像配准算法[J]. 山东大学学报(工学版), 2006, 36(2): 107 -110 .
[6] 季涛,高旭,孙同景,薛永端,徐丙垠 . 铁路10 kV自闭/贯通线路故障行波特征分析[J]. 山东大学学报(工学版), 2006, 36(2): 111 -116 .
[7] 秦通,孙丰荣*,王丽梅,王庆浩,李新彩. 基于极大圆盘引导的形状插值实现三维表面重建[J]. 山东大学学报(工学版), 2010, 40(3): 1 -5 .
[8] 孙殿柱,朱昌志,李延瑞 . 散乱点云边界特征快速提取算法[J]. 山东大学学报(工学版), 2009, 39(1): 84 -86 .
[9] 岳远征. 远离平衡态玻璃的弛豫[J]. 山东大学学报(工学版), 2009, 39(5): 1 -20 .
[10] 张爱娟. 模拟体液中类骨羟基磷灰石的合成[J]. 山东大学学报(工学版), 2010, 40(3): 86 -90 .