基于Map/Reduce的时间序列相似性搜索算法

doi:10.6040/j.issn.1672-3961.1.2015.172

山东大学学报(工学版) ›› 2016, Vol. 46 ›› Issue (1): 15-21.doi: 10.6040/j.issn.1672-3961.1.2015.172

基于Map/Reduce的时间序列相似性搜索算法

王会青,孙宏伟,张建辉

太原理工大学计算机科学与技术学院, 山西太原 030024

收稿日期:2015-05-12 出版日期:2016-02-20 发布日期:2015-05-12
作者简介:王会青(1978- ),女,山西太原人,副教授,硕士研究生导师,主要研究方向为数据库和智能信息处理,数据挖掘与机器学习.E-mail:1013208257@qq.com
基金资助:
国家自然科学基金青年科学基金资助项目(61402318);高等学校博士学科点专项科研基金资助项目(20131402120009);山西省科技攻关资助项目(20130313012-2);太原理工大学校青年团队资助项目(2013T049)

Time series similarity searching algorithm based on Map/Reduce

WANG Huiqing, SUN Hongwei, ZHANG Jianhui

College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China

Received:2015-05-12 Online:2016-02-20 Published:2015-05-12

摘要/Abstract

摘要： 将并行计算的策略引入到时间序列处理中,提出基于Map/Reduce的时间序列相似性搜索算法,充分利用云计算可进行大规模计算和数据处理的特点,有效降低了时间序列相似性搜索中运算量,简化了计算过程。该算法在心电图数据集上进行相似性搜索,分别进行PAA下界过滤和DTW距离的计算,验证运算时间和并行加速比随节点变化的情况,与传统的单机运算相比,有效地提高了时间序列挖掘效率。

关键词: 时间序列挖掘, 并行计算, 动态时间弯曲距离, 相似性搜索, 下界算法

Abstract: The strategy of parallel computing was introduced into time series processing, and time series similarity searching algorithm based on Map/Reduce was proposed. The proposed algorithm could make use of the features of cloud computing to take large-scale computing and data processing, and could efficiently reduce the large calculation and simplify the computing process of time series similarity searching. The proposed algorithm was adopted on electrocardiograph dataset to complete similarity searching with piecewise aggregate approximation lower bound and dynamic time warping distance, which verified the effect of nodes changing on operation time and parallel speed up. Compared with the traditional one running on single PC, the proposed algorithm improved the efficiency of time series mining effectively.

Key words: parallel computing, similarity searching, time series mining, lower bound algorithm, dynamic time warping distance

中图分类号:

TP311

王会青,孙宏伟,张建辉. 基于Map/Reduce的时间序列相似性搜索算法[J]. 山东大学学报(工学版), 2016, 46(1): 15-21.

WANG Huiqing, SUN Hongwei, ZHANG Jianhui. Time series similarity searching algorithm based on Map/Reduce[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2016, 46(1): 15-21.

参考文献

[1] BHADURI K, ZHU Q, OZA N C, et al. Fast and flexible multivariate time series subsequence search[C] //Proc of the 2010 IEEE Int l Conf on Data Mining. Sydney, Australia: IEEE, 2010:48-57.
[2] BERNDT D J, CLIFFORD J. Using dynamic time warping to find patterns in time series[C] //KDD Workshop. Seattle, USA:Springer US, 1994:359-370.
[3] KEOGH E, RATANAMAHATANA C A. Exact indexing of dynamic time warping[J]. Knowledge and Information Systems, 2005(7):358-386.
[4] ZHANG Y, GLASS J. A piecewise aggregate approximation lower-bound estimate for posteriorgram-based dynamic time warping[C] //Interspeech. Florence,Italy:IEEE, 2011:1909-1912.
[5] 丁静,杨善林,罗贺,等. 云计算环境下的数据挖掘服务模式[J]. 计算机科学, 2012, 39(6):217-219. DING Jing, YANG Shanlin, LUO He, et al. Data mining service model in cloud computing environment[J]. Computer Science, 2012, 39(6):217-219.
[6] 杨来,史忠植,梁帆,等. 基于Hadoop云平台的并行数据挖掘方法[J].系统仿真学报,2013,25(5):936-944. YANG Lai, SHI Zhongzhi, LIANG Fan, et al. Parallel approach in data mining based on hadoop cloud platform[J]. Journal of System Simulation, 2013, 25(5):936-944.
[7] 程苗. 基于云计算的Web数据挖掘[J]. 计算机科学,2011,38(10):146-149. CHENG Miao. Web data mining based on cloud computing[J]. Computer Science, 2011, 38(10):146-149.
[8] 陈光鹏,杨育彬,高阳,等. 一种基于Map/Reduce的频繁闭项集挖掘算法[J]. 模式识别与人工智能,2012,25(2):220-224. CHEN Guangpeng, YANG Yubin, GAO Yang, et al. Closed frequent itemset mining based on mapreduce[J]. PR&AI, 2015, 25(2):220-224.
[9] 田国会,许亚雄. 云机器人:概念、架构与关键技术研究综述[J]. 山东大学学报(工学版),2014,44(6):47-54. TIAN Guohui, XU Yaxiong. Cloud robotics: concept, architectures and key technologies[J]. Journal of Shandong University(Engineering Science), 2014, 44(6):47-54.
[10] ZHU Q, WANG X Y, KEOGH E, et al. An efficient and effective similarity measure to enable data mining of petroglyphs[J]. Data Mining and Knowledge Discovery, 2011, 23(1):91-127.
[11] PAPAPETROU P, ATHITSOS V, POTAMIAS M, et al. Embedding-based subsequence matching in time-series databases[J]. ACM Transactions on Database Systems, 2011, 36(3):17.
[12] 周大镯,吴晓丽,闫红灿. 一种高效的多变量时间序列相似查询算[J]. 计算机应用,2008,28(10):2541-2543. ZHOU Dazhuo, WU Xiaoli, YAN Hongcan. An efficient similarity search for multivariate time series[J]. Computer Applications, 2008, 28(10):2541-2543.
[13] FU AWC, KEOGH E, LAU LYH, et al. Scaling and time warping in time series querying[J]. The VLDB Journal, 2008, 17:899-921.
[14] MU B, YAN J L. Efficient time series lower bounding technique[J]. Computer Engineering and Applications, 2009, 45(11):168-171.
[15] BEGUM N,KEOGH E. Rare time series motif discovery from unbounded streams[J]. Proceedings of the VLDB Endowment, 2014, 8(2):149-160.
[16] ZHOU D Z, JIANG W B, LI M Q. Efficient clustering algorithm for multivariate time series[J]. Computer Engineering and Applications, 2010, 46(1):137-139.
[17] THANAWIN Rakthanmanon, BILSON Campana, ABDULLAH Mueen, et al. Searching and mining trillions of time series subsequences under dynamic time warping[C] //KDD'12-18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA:IEEE, 2012:262-270.
[18] WANG X, MUEEN A, DING H, et al. Experimental comparison of representation methods and distance measures foe time series data[J]. Data Mining and Knowledge Discovery, 2013, 26(2):275-309.
[19] SHOKOOHI-YEKTA M, WANG J, KEOGH E. On the non-trivial generalization of dynamic time warping to the multi-dimensional case[C] //SDM Vancouver,Canada:Society for Industrial and Applied Mathematics, 2015:289-297.
[20] SAKURAI Y, YOSHIKAWA M, FALOUTSOS C. FTW: fast similarity search under the time warping distance[C] //PODS. Baltimore, Maryland:Sprinter US, 2005:326-337.
[21] BANKÓ Z, ABONYI J. Correlation based dynamic time warping of multivariate time series[J]. Expert Systems with Applications, 2012, 39(17):12814-12823.
[22] 李正欣,张凤鸣,李克武,等.一种支持DTW距离的多元时间序列索引结构[J]. 软件学报,2014,25(3):560-575. LI Zhengxin, ZHANG Fengming, LI Kewu, et al. Index structure for multivariate time series under DTW distance metric[J]. Journal of Software, 2014, 25(3):560-575.
[23] HYUNJIN Yoon, KIYOUNG Yang, CYRUS Shahabi. Feature subset selection and feature ranking for multivariate time series[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(9):1186-1198.
[24] 李伟卫,赵航,张阳,等. 基于MapReduce的海量数据挖掘技术研究[J].计算机工程与应用,2013,49(20):112-117. LI Weiwei, ZHAO Hang, ZHANG Yang, et al. Research on massive data mining based on mapreduce[J]. Computer Engineering and Applications, 2013, 49(20):112-117.
[25] 何东之,张吉沣,赵鹏飞. 不确定传播算法的MapReduce并行化实现[J]. 山东大学学报(工学版),2015,44(5):22-28. HE Dongzhi, ZHANG Jifeng, ZHAO Pengfei. Parallel implementing probabilistic spreading algorithm using MapReduce programming mode[J]. Journal of Shandong University(Engineering Science), 2015, 44(5):22-28.
[26] AGRAWAL R, FALOUTSOS C, SWAMI A. Efficient similarity search in sequence databases[C] //Proc of the 4th Int'l Conf on Foundations of Data Organization and Algorithms. Chicoga, USA:Springer Berlin Heidelberg, 1993:69-84.

多维度评价

Viewed

Full text

377

HTML			PDF

Just accepted	Online first	Issue	Just accepted	Online first	Issue
0	0	0	0	0	377

From	Others	local

Times	81	296
Rate	21%	79%

Abstract

1368

Just accepted	Online first	Issue

0	0	1368

From	Others	local

Times	1366	2
Rate	100%	0%

Cited

Web of Science	Crossref	ScienceDirect	Search for Citations in Google Scholar >>


This page requires you have already subscribed to WoS.

Shared

Discussed

基于Map/Reduce的时间序列相似性搜索算法

Time series similarity searching algorithm based on Map/Reduce

PDF (PC)

摘要/Abstract

引用本文

使用本文

参考文献

相关文章 1

多维度评价

本文评价

推荐阅读 10