山东大学学报(工学版) ›› 2016, Vol. 46 ›› Issue (1): 15-21.doi: 10.6040/j.issn.1672-3961.1.2015.172
王会青,孙宏伟,张建辉
WANG Huiqing, SUN Hongwei, ZHANG Jianhui
摘要: 将并行计算的策略引入到时间序列处理中,提出基于Map/Reduce的时间序列相似性搜索算法,充分利用云计算可进行大规模计算和数据处理的特点,有效降低了时间序列相似性搜索中运算量,简化了计算过程。该算法在心电图数据集上进行相似性搜索,分别进行PAA下界过滤和DTW距离的计算,验证运算时间和并行加速比随节点变化的情况,与传统的单机运算相比,有效地提高了时间序列挖掘效率。
中图分类号:
[1] BHADURI K, ZHU Q, OZA N C, et al. Fast and flexible multivariate time series subsequence search[C] //Proc of the 2010 IEEE Int l Conf on Data Mining. Sydney, Australia: IEEE, 2010:48-57. [2] BERNDT D J, CLIFFORD J. Using dynamic time warping to find patterns in time series[C] //KDD Workshop. Seattle, USA:Springer US, 1994:359-370. [3] KEOGH E, RATANAMAHATANA C A. Exact indexing of dynamic time warping[J]. Knowledge and Information Systems, 2005(7):358-386. [4] ZHANG Y, GLASS J. A piecewise aggregate approximation lower-bound estimate for posteriorgram-based dynamic time warping[C] //Interspeech. Florence,Italy:IEEE, 2011:1909-1912. [5] 丁静,杨善林,罗贺,等. 云计算环境下的数据挖掘服务模式[J]. 计算机科学, 2012, 39(6):217-219. DING Jing, YANG Shanlin, LUO He, et al. Data mining service model in cloud computing environment[J]. Computer Science, 2012, 39(6):217-219. [6] 杨来,史忠植,梁帆,等. 基于Hadoop云平台的并行数据挖掘方法[J].系统仿真学报,2013,25(5):936-944. YANG Lai, SHI Zhongzhi, LIANG Fan, et al. Parallel approach in data mining based on hadoop cloud platform[J]. Journal of System Simulation, 2013, 25(5):936-944. [7] 程苗. 基于云计算的Web数据挖掘[J]. 计算机科学,2011,38(10):146-149. CHENG Miao. Web data mining based on cloud computing[J]. Computer Science, 2011, 38(10):146-149. [8] 陈光鹏,杨育彬,高阳,等. 一种基于Map/Reduce的频繁闭项集挖掘算法[J]. 模式识别与人工智能,2012,25(2):220-224. CHEN Guangpeng, YANG Yubin, GAO Yang, et al. Closed frequent itemset mining based on mapreduce[J]. PR&AI, 2015, 25(2):220-224. [9] 田国会,许亚雄. 云机器人:概念、架构与关键技术研究综述[J]. 山东大学学报(工学版),2014,44(6):47-54. TIAN Guohui, XU Yaxiong. Cloud robotics: concept, architectures and key technologies[J]. Journal of Shandong University(Engineering Science), 2014, 44(6):47-54. [10] ZHU Q, WANG X Y, KEOGH E, et al. An efficient and effective similarity measure to enable data mining of petroglyphs[J]. Data Mining and Knowledge Discovery, 2011, 23(1):91-127. [11] PAPAPETROU P, ATHITSOS V, POTAMIAS M, et al. Embedding-based subsequence matching in time-series databases[J]. ACM Transactions on Database Systems, 2011, 36(3):17. [12] 周大镯,吴晓丽,闫红灿. 一种高效的多变量时间序列相似查询算[J]. 计算机应用,2008,28(10):2541-2543. ZHOU Dazhuo, WU Xiaoli, YAN Hongcan. An efficient similarity search for multivariate time series[J]. Computer Applications, 2008, 28(10):2541-2543. [13] FU AWC, KEOGH E, LAU LYH, et al. Scaling and time warping in time series querying[J]. The VLDB Journal, 2008, 17:899-921. [14] MU B, YAN J L. Efficient time series lower bounding technique[J]. Computer Engineering and Applications, 2009, 45(11):168-171. [15] BEGUM N,KEOGH E. Rare time series motif discovery from unbounded streams[J]. Proceedings of the VLDB Endowment, 2014, 8(2):149-160. [16] ZHOU D Z, JIANG W B, LI M Q. Efficient clustering algorithm for multivariate time series[J]. Computer Engineering and Applications, 2010, 46(1):137-139. [17] THANAWIN Rakthanmanon, BILSON Campana, ABDULLAH Mueen, et al. Searching and mining trillions of time series subsequences under dynamic time warping[C] //KDD'12-18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA:IEEE, 2012:262-270. [18] WANG X, MUEEN A, DING H, et al. Experimental comparison of representation methods and distance measures foe time series data[J]. Data Mining and Knowledge Discovery, 2013, 26(2):275-309. [19] SHOKOOHI-YEKTA M, WANG J, KEOGH E. On the non-trivial generalization of dynamic time warping to the multi-dimensional case[C] //SDM Vancouver,Canada:Society for Industrial and Applied Mathematics, 2015:289-297. [20] SAKURAI Y, YOSHIKAWA M, FALOUTSOS C. FTW: fast similarity search under the time warping distance[C] //PODS. Baltimore, Maryland:Sprinter US, 2005:326-337. [21] BANKÓ Z, ABONYI J. Correlation based dynamic time warping of multivariate time series[J]. Expert Systems with Applications, 2012, 39(17):12814-12823. [22] 李正欣,张凤鸣,李克武,等.一种支持DTW距离的多元时间序列索引结构[J]. 软件学报,2014,25(3):560-575. LI Zhengxin, ZHANG Fengming, LI Kewu, et al. Index structure for multivariate time series under DTW distance metric[J]. Journal of Software, 2014, 25(3):560-575. [23] HYUNJIN Yoon, KIYOUNG Yang, CYRUS Shahabi. Feature subset selection and feature ranking for multivariate time series[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(9):1186-1198. [24] 李伟卫,赵航,张阳,等. 基于MapReduce的海量数据挖掘技术研究[J].计算机工程与应用,2013,49(20):112-117. LI Weiwei, ZHAO Hang, ZHANG Yang, et al. Research on massive data mining based on mapreduce[J]. Computer Engineering and Applications, 2013, 49(20):112-117. [25] 何东之,张吉沣,赵鹏飞. 不确定传播算法的MapReduce并行化实现[J]. 山东大学学报(工学版),2015,44(5):22-28. HE Dongzhi, ZHANG Jifeng, ZHAO Pengfei. Parallel implementing probabilistic spreading algorithm using MapReduce programming mode[J]. Journal of Shandong University(Engineering Science), 2015, 44(5):22-28. [26] AGRAWAL R, FALOUTSOS C, SWAMI A. Efficient similarity search in sequence databases[C] //Proc of the 4th Int'l Conf on Foundations of Data Organization and Algorithms. Chicoga, USA:Springer Berlin Heidelberg, 1993:69-84. |
[1] | 任立波, 尚立宝, 闫日雄, 何海澜, 赵红霞, 韩吉田. 脉冲鼓泡床内鼓泡和颗粒混合特性的CFD-DEM数值模拟[J]. 山东大学学报(工学版), 2015, 45(2): 62-66. |
|