您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(工学版)》

山东大学学报 (工学版) ›› 2021, Vol. 51 ›› Issue (3): 7-14.doi: 10.6040/j.issn.1672-3961.0.2019.508

• • 上一篇    下一篇

OLAP中基于GPU的中位数计算算法

吴振鹏1,张健2,范星奇3,李翠平4   

  1. 1. 中国人民武装警察部队山东省总队, 山东 济南 250116;2. 中国人民武装警察部队辽宁省总队, 辽宁 沈阳 110000;3. 上海交通大学机械与动力工程学院, 上海 200240;4. 中国人民大学信息学院, 北京 100872
  • 出版日期:2021-06-20 发布日期:2021-06-24
  • 作者简介:吴振鹏(1988— ),男,山东潍坊人,工学硕士,助理工程师,主要研究方向为数据仓库与基于GPU的高性能计算. E-mail:wzpcapf0502@ruc.edu.cn

Median calculation algorithms based on GPU in OLAP

WU Zhenpeng1, ZHANG Jian2, FAN Xingqi3, LI Cuiping4   

  1. 1. Shandong Provincial Corps, Chinese People's Armed Police Force, Jinan 250116, Shandong, China;
    2. Liaoning Provincial Corps, Chinese People's Armed Police Force, Shenyang 110000, Liaoning, China;
    3. School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China;
    4. School of Information, Renmin University of China, Beijing 100872, China
  • Online:2021-06-20 Published:2021-06-24

摘要: 针对联机分析处理(online analytical processing, OLAP)中的整体型聚集函数中位数,提出基于图形处理单元(graphics processing unit, GPU)的GPU-Median算法,通过对数据进行划分,分段排序,不断裁剪全局中位数之前的数据,对未裁剪的数据进行合并,得到最终的中位数,避免了全局的排序时间。提出GPU-Median+算法,对GPU-Median算法进行优化和扩展,使用CPU与GPU协同作业实现聚集操作,利用GPU处理每个队列的数据,CPU处理全局数据。试验和分析证明,相比CPU算法,GPU-Median+算法将中位数计算的时间复杂度从O(n2)降低到了O(n);相比GPU上的基数排序算法,GPU-Median+算法的计算时间减少了三分之一。该算法的应用使得GPU计算OLAP中的整体型聚集函数时,发挥出更加优良的并行计算能力,为提升OLAP的查询性能提供了新的思路。

关键词: 联机分析处理, 图形处理单元, 整体型聚集函数, 中位数

Abstract: An algorithm for one of the holistic aggregate operations in online analytical processing(OLAP)called Median was proposed based on graphics processing unit(GPU), which was named GPU-Median algorithm. This algorithm obtained the median of a series of data by segmenting the data, sorting the data by segments,cutting the data preceding the global median, and finally merging the uncut data. Through the algorithm above much time spent on global sorting was saved. Then an algorithm called GPU-Median+was presented in order to optimize and extend the GPU-Median algorithm. This algorithm implemented the aggregate operations through the collaboration of CPU and GPU, which used GPU to deal with segments of data and CPU to deal with global data. Experiments and analysis proved that the GPU-Median + algorithm reduced the time complexity of the median calculation from O(n2)to O(n)compared to the CPU algorithm,and that the GPU-Median + algorithm reduced a third of the calculation time compared to the radix sort algorithm on the GPU. The application of this algorithm enabled the GPU to improve its ability of parallel calculations when calculating the holistic aggregate function in OLAP, thus providing a new idea for improving the query performance of OLAP.

Key words: OLAP, GPU, holistic aggregate functions, median

中图分类号: 

  • TP392
[1] CODD E F, CODD S B, SALLEY C T. Providing OLAP(on-line analytical processing)to user-analysts: an IT mandate[J]. White Paper of Arbor Software Corporation, 1993: 125-129.
[2] CODD E F, CODD S B. Beyond decision support[J]. Computer World, 1993, 27(30): 87-89.
[3] CUZZOCREA A, BELLATRECHE L, SONG I, et al. Data warehousing and OLAP over big data: current challenges and future research directions[C] // Proceedings of the Sixteenth International Workshop on Data Warehousing and OLAP. New York, USA: Association for Computing Machinery, 2013: 67-70.
[4] CUZZOCREA A. Data warehousing and OLAP over big data: a survey of the state-of-the-art, open problems and future challenges[J]. International Journal of Business Process Integration and Management, 2015, 7(4): 372-377.
[5] 孙婷婷, 黄皓, 王嘉伦, 等. 面向 CPU-GPU 异构系统的数据分析负载均衡策略[J]. 计算机工程与科学, 2019(3): 417-423. SUN Tingting, HUANG Hao, WANG Jialun, et al. A load balancing strategy on heterogeneous CPU-GPU data analytic systems[J]. Computer Engineering and Science, 2019(3): 417-423.
[6] NVIDIAC. CUDA C++programming guide PG-02829-001_v11.0[EB/OL].(2020-06-01)[2020-06-08]. https://docs.nvidia.com/pdf/CUDA_C_Programming_Guide.pdf.
[7] HARRIS M. Optimizing CUDA. SC07: high performance computing with CUDA[EB/OL].(2009-01-21)[2020-06-08]. http://www.enseignement.polytechnique.fr/profs/informatique/Eric.Goubault/Cours09/CUDA/SC07_CUDA_5_Optimization_Harris.pdf.
[8] YUAN Y, LEE R, ZHANG X. The Yin and Yang of processing data warehousing queries on GPU devices[J]. Proceedings of the VLDB Endowment, 2013, 6(10): 817-828.
[9] SITARIDI E A, ROSS K A. A meliorating memory contention of OLAP operators on GPU processors[C] //Proceedings of the Eighth International Workshop on Data Management on New Hardware. New York, USA: Association for Computing Machinery, 2012: 39-47.
[10] LAUER T, DATTA A, KHADIKOV Z, et al. Exploring graphics processing units as parallel coprocessors for online aggregation[C] //Proceedings of the ACM 13th international workshop on Data warehousing and OLAP. New York, USA: Association for Computing Machinery, 2010: 77-84.
[11] WU H, DIAMOS G, WANG J, et al. Optimizing data warehousing applications for GPUs using kernel fusion/fission[C] //2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum. Piscataway: IEEE, 2012: 2433-2442.
[12] ZHOU G, CHEN H. Parallel cube computation on modern CPUs and GPUs[J]. The Journal of Supercomputing, 2012, 61(3): 394-417.
[13] DORAISWAMY H, VO H T, SILVA C T, et al. A GPU-based index to support interactive spatio-temporal queries over historical data[C] //2016 IEEE 32nd International Conference on Data Engineering(ICDE). Piscataway: IEEE, 2016: 1086-1097.
[14] PAPADIAS D, KALNIS P, ZHANG J, et al. Efficient OLAP operations in spatial data warehouses[C] //International Symposium on Spatial and Temporal Databases. Berlin, Germany: Springer, 2001: 443-459.
[15] ZHANG Y, WANG S, HUANG W. ParaCube: a scalable OLAP model based on distributed aggregate computing with sibling cubes[C] //2010 12th International Asia-Pacific Web Conference. Piscataway, USA: IEEE, 2010: 323-329.
[16] 焦敏,张延松,王珊,等.内存OLAP多核并行查询优化技术研究[J].计算机学报,2014,37(9):1895-1910. JIAO Min, ZHANG Yansong, WANG Shan, et al. Research on multicore parallel query processing techniques for main-memory OLAP[J]. Chinese Journal of Computers, 2014, 37(9): 1895-1910.
[17] 张延松, 肖艳芹,王珊, 等. 主存 OLAP 系统中 what-if 查询处理策略[J]. 软件学报, 2010(10): 2494-2512. ZHANG Yansong, XIAO Yanqin, WANG Shan, et al. What-if query processing policy of main-memory OLAP system[J]. Journal of Software, 2010(10): 2494-2512.
[18] BRESS S, FUNKE H, TEUBNER J. Robust query processing in co-processor-accelerated databases[C] //Proceedings of the 2016 International Conference on Management of Data. New York, USA: Association for Computing Machinery, 2016: 1891-1906.
[19] ALABI T, BLANCHARD J D, GORDON B, et al. Fast k-selection algorithms for graphics processing units[J]. ACM Journal of Experimental Algorithms, 2012, 17(4): 1-29.
[20] SHANBHAG A, PIRK H, MADDEN S. Efficient top-k query processing on massively parallel hardware[C] //Proceedings of the 2018 International Conference on Management of Data. New York, USA: Association for Computing Machinery, 2018: 1557-1570.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 张永花,王安玲,刘福平 . 低频非均匀电磁波在导电界面的反射相角[J]. 山东大学学报(工学版), 2006, 36(2): 22 -25 .
[2] 施来顺,万忠义 . 新型甜菜碱型沥青乳化剂的合成与性能测试[J]. 山东大学学报(工学版), 2008, 38(4): 112 -115 .
[3] 李梁,罗奇鸣,陈恩红. 对象级搜索中基于图的对象排序模型(英文)[J]. 山东大学学报(工学版), 2009, 39(1): 15 -21 .
[4] 陈瑞,李红伟,田靖. 磁极数对径向磁轴承承载力的影响[J]. 山东大学学报(工学版), 2018, 48(2): 81 -85 .
[5] 李可,刘常春,李同磊 . 一种改进的最大互信息医学图像配准算法[J]. 山东大学学报(工学版), 2006, 36(2): 107 -110 .
[6] 秦通,孙丰荣*,王丽梅,王庆浩,李新彩. 基于极大圆盘引导的形状插值实现三维表面重建[J]. 山东大学学报(工学版), 2010, 40(3): 1 -5 .
[7] 张英,郎咏梅,赵玉晓,张鉴达,乔鹏,李善评 . 由EGSB厌氧颗粒污泥培养好氧颗粒污泥的工艺探讨[J]. 山东大学学报(工学版), 2006, 36(4): 56 -59 .
[8] 王丽君,黄奇成,王兆旭 . 敏感性问题中的均方误差与模型比较[J]. 山东大学学报(工学版), 2006, 36(6): 51 -56 .
[9] 刘忠国,张晓静,刘伯强,刘常春 . 视觉刺激间隔对大脑诱发电位的影响[J]. 山东大学学报(工学版), 2006, 36(3): 34 -38 .
[10] 杨发展1 ,艾兴1 ,赵军1 ,侯建锋2 . ZrO2含量对WC基复合材料的力学性能和微观结构的影响[J]. 山东大学学报(工学版), 2009, 39(1): 92 -95 .