JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE) ›› 2016, Vol. 46 ›› Issue (1): 15-21.doi: 10.6040/j.issn.1672-3961.1.2015.172

Previous Articles     Next Articles

Time series similarity searching algorithm based on Map/Reduce

WANG Huiqing, SUN Hongwei, ZHANG Jianhui   

  1. College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China
  • Received:2015-05-12 Online:2016-02-20 Published:2015-05-12

Abstract: The strategy of parallel computing was introduced into time series processing, and time series similarity searching algorithm based on Map/Reduce was proposed. The proposed algorithm could make use of the features of cloud computing to take large-scale computing and data processing, and could efficiently reduce the large calculation and simplify the computing process of time series similarity searching. The proposed algorithm was adopted on electrocardiograph dataset to complete similarity searching with piecewise aggregate approximation lower bound and dynamic time warping distance, which verified the effect of nodes changing on operation time and parallel speed up. Compared with the traditional one running on single PC, the proposed algorithm improved the efficiency of time series mining effectively.

Key words: parallel computing, similarity searching, time series mining, lower bound algorithm, dynamic time warping distance

CLC Number: 

  • TP311
[1] BHADURI K, ZHU Q, OZA N C, et al. Fast and flexible multivariate time series subsequence search[C] //Proc of the 2010 IEEE Int l Conf on Data Mining. Sydney, Australia: IEEE, 2010:48-57.
[2] BERNDT D J, CLIFFORD J. Using dynamic time warping to find patterns in time series[C] //KDD Workshop. Seattle, USA:Springer US, 1994:359-370.
[3] KEOGH E, RATANAMAHATANA C A. Exact indexing of dynamic time warping[J]. Knowledge and Information Systems, 2005(7):358-386.
[4] ZHANG Y, GLASS J. A piecewise aggregate approximation lower-bound estimate for posteriorgram-based dynamic time warping[C] //Interspeech. Florence,Italy:IEEE, 2011:1909-1912.
[5] 丁静,杨善林,罗贺,等. 云计算环境下的数据挖掘服务模式[J]. 计算机科学, 2012, 39(6):217-219. DING Jing, YANG Shanlin, LUO He, et al. Data mining service model in cloud computing environment[J]. Computer Science, 2012, 39(6):217-219.
[6] 杨来,史忠植,梁帆,等. 基于Hadoop云平台的并行数据挖掘方法[J].系统仿真学报,2013,25(5):936-944. YANG Lai, SHI Zhongzhi, LIANG Fan, et al. Parallel approach in data mining based on hadoop cloud platform[J]. Journal of System Simulation, 2013, 25(5):936-944.
[7] 程苗. 基于云计算的Web数据挖掘[J]. 计算机科学,2011,38(10):146-149. CHENG Miao. Web data mining based on cloud computing[J]. Computer Science, 2011, 38(10):146-149.
[8] 陈光鹏,杨育彬,高阳,等. 一种基于Map/Reduce的频繁闭项集挖掘算法[J]. 模式识别与人工智能,2012,25(2):220-224. CHEN Guangpeng, YANG Yubin, GAO Yang, et al. Closed frequent itemset mining based on mapreduce[J]. PR&AI, 2015, 25(2):220-224.
[9] 田国会,许亚雄. 云机器人:概念、架构与关键技术研究综述[J]. 山东大学学报(工学版),2014,44(6):47-54. TIAN Guohui, XU Yaxiong. Cloud robotics: concept, architectures and key technologies[J]. Journal of Shandong University(Engineering Science), 2014, 44(6):47-54.
[10] ZHU Q, WANG X Y, KEOGH E, et al. An efficient and effective similarity measure to enable data mining of petroglyphs[J]. Data Mining and Knowledge Discovery, 2011, 23(1):91-127.
[11] PAPAPETROU P, ATHITSOS V, POTAMIAS M, et al. Embedding-based subsequence matching in time-series databases[J]. ACM Transactions on Database Systems, 2011, 36(3):17.
[12] 周大镯,吴晓丽,闫红灿. 一种高效的多变量时间序列相似查询算[J]. 计算机应用,2008,28(10):2541-2543. ZHOU Dazhuo, WU Xiaoli, YAN Hongcan. An efficient similarity search for multivariate time series[J]. Computer Applications, 2008, 28(10):2541-2543.
[13] FU AWC, KEOGH E, LAU LYH, et al. Scaling and time warping in time series querying[J]. The VLDB Journal, 2008, 17:899-921.
[14] MU B, YAN J L. Efficient time series lower bounding technique[J]. Computer Engineering and Applications, 2009, 45(11):168-171.
[15] BEGUM N,KEOGH E. Rare time series motif discovery from unbounded streams[J]. Proceedings of the VLDB Endowment, 2014, 8(2):149-160.
[16] ZHOU D Z, JIANG W B, LI M Q. Efficient clustering algorithm for multivariate time series[J]. Computer Engineering and Applications, 2010, 46(1):137-139.
[17] THANAWIN Rakthanmanon, BILSON Campana, ABDULLAH Mueen, et al. Searching and mining trillions of time series subsequences under dynamic time warping[C] //KDD'12-18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA:IEEE, 2012:262-270.
[18] WANG X, MUEEN A, DING H, et al. Experimental comparison of representation methods and distance measures foe time series data[J]. Data Mining and Knowledge Discovery, 2013, 26(2):275-309.
[19] SHOKOOHI-YEKTA M, WANG J, KEOGH E. On the non-trivial generalization of dynamic time warping to the multi-dimensional case[C] //SDM Vancouver,Canada:Society for Industrial and Applied Mathematics, 2015:289-297.
[20] SAKURAI Y, YOSHIKAWA M, FALOUTSOS C. FTW: fast similarity search under the time warping distance[C] //PODS. Baltimore, Maryland:Sprinter US, 2005:326-337.
[21] BANKÓ Z, ABONYI J. Correlation based dynamic time warping of multivariate time series[J]. Expert Systems with Applications, 2012, 39(17):12814-12823.
[22] 李正欣,张凤鸣,李克武,等.一种支持DTW距离的多元时间序列索引结构[J]. 软件学报,2014,25(3):560-575. LI Zhengxin, ZHANG Fengming, LI Kewu, et al. Index structure for multivariate time series under DTW distance metric[J]. Journal of Software, 2014, 25(3):560-575.
[23] HYUNJIN Yoon, KIYOUNG Yang, CYRUS Shahabi. Feature subset selection and feature ranking for multivariate time series[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(9):1186-1198.
[24] 李伟卫,赵航,张阳,等. 基于MapReduce的海量数据挖掘技术研究[J].计算机工程与应用,2013,49(20):112-117. LI Weiwei, ZHAO Hang, ZHANG Yang, et al. Research on massive data mining based on mapreduce[J]. Computer Engineering and Applications, 2013, 49(20):112-117.
[25] 何东之,张吉沣,赵鹏飞. 不确定传播算法的MapReduce并行化实现[J]. 山东大学学报(工学版),2015,44(5):22-28. HE Dongzhi, ZHANG Jifeng, ZHAO Pengfei. Parallel implementing probabilistic spreading algorithm using MapReduce programming mode[J]. Journal of Shandong University(Engineering Science), 2015, 44(5):22-28.
[26] AGRAWAL R, FALOUTSOS C, SWAMI A. Efficient similarity search in sequence databases[C] //Proc of the 4th Int'l Conf on Foundations of Data Organization and Algorithms. Chicoga, USA:Springer Berlin Heidelberg, 1993:69-84.
[1] REN Libo, SHANG Libao, YAN Rixiong, HE Hailan, ZHAO Hongxia, HAN Jitian. CFD-DEM simulation of bubbling and particle mixing properties in pulsed jet fluidized bed [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2015, 45(2): 62-66.
Full text



[1] CHENG Daizhan, LI Zhiqiang. A survey on linearization of nonlinear systems[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(2): 26 -36 .
[2] WANG Yong, XIE Yudong. Gas control technology of largeflow pipe[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(2): 70 -74 .
[3] LIU Xin 1, SONG Sili 1, WANG Xinhong 2. [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(2): 98 -100 .
[5] CHEN Huaxin, CHEN Shuanfa, WANG Binggang. The aging behavior and mechanism of base asphalts[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(2): 125 -130 .
[7] LI Shijin, WANG Shengte, HUANG Leping. Change detection with remote sensing images based on forward-backward heterogenicity[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2018, 48(3): 1 -9 .
[8] ZHAO Ke-Jun, WANG Xin-Jun, LIU Xiang, CHOU Yi-Hong. Algorithms of continuous top-k join query over structured overlay networks[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(5): 32 -37 .
[9] ZHAO Zhi-guang,WANG Deng-jie,TIAN Yun-fei . Roadbed settlement based on the gray theory[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2007, 37(3): 86 -88 .
[10] YAO Zhan-yong,SHANG Qing-sen,ZHAO Zhi-zhong,JIA Zhao-xia . The influence analysis of the semirigid asphalt pavement configuration stress and distortion by interface conditions[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2007, 37(3): 93 -99 .