
Journal of Shandong University (Engineering Science) ›› 2019, Vol. 49 ›› Issue (3): 8-14. doi: 10.6040/j.issn.1672-3961.0.2017.417

• Machine Learning & Data Mining •

Support vector regression algorithm based on kernel similarity reduced strategy

Yingda LI, Zongxia XIE

  1. School of Computer Software, Tianjin University, Tianjin 300350, China
  • Received: 2017-08-24; Online: 2019-06-20; Published: 2019-06-27
  • About the authors: LI Yingda (1993— ), male, from Harbin, Heilongjiang, China; master's student; main research interests: machine learning and data mining. E-mail: 18222289139@163.com
  • Supported by:
    the National Natural Science Foundation of China project "Granular Computing Theory and Methods for Big Data" (61432011)



Abstract:

As the number of iterations grows, kernelized stochastic gradient descent (SGD) becomes susceptible to the curse of kernelization: the set of support vectors (SVs) it must store keeps expanding. A kernel-similarity-based SVs-reduced strategy (SRS) was proposed to remove redundant SVs and thereby improve the efficiency of SGD for large-scale non-linear support vector regression (SVR). At each SGD iteration, if the new sample qualified as an SV, its kernel similarities to the recorded SVs were computed; if any similarity exceeded a preset threshold, the new SV was not saved. Experimental results on UCI, LIBSVM, and wind-speed datasets demonstrated that SRS solved large-scale non-linear SVR efficiently compared with several state-of-the-art algorithms.

Key words: support vector regression, efficient algorithms, stochastic gradient descent, kernel similarity
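
The abstract describes the SRS procedure operationally; the following minimal Python sketch illustrates the idea under stated assumptions: an RBF kernel as the similarity measure, a Pegasos-style step size, and the ε-insensitive loss. The names `svr_sgd_srs`, `k_reduce`, and `predict` are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # Gaussian (RBF) kernel; also serves as the similarity measure between samples
    return np.exp(-gamma * np.sum((x - z) ** 2))

def svr_sgd_srs(X, y, epsilon=0.1, lam=1e-4, gamma=1.0, k_reduce=0.99, epochs=1):
    """Kernelized SGD for epsilon-insensitive SVR with similarity-based SV pruning (SRS sketch)."""
    sv_x, sv_alpha = [], []  # retained support vectors and their coefficients
    t = 0
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos-style step size (assumption)
            # prediction of the current model from the retained SV set
            f = sum(a * rbf_kernel(sx, x_t, gamma)
                    for sx, a in zip(sv_x, sv_alpha))
            # regularization shrinkage of existing coefficients
            sv_alpha = [(1.0 - eta * lam) * a for a in sv_alpha]
            err = y_t - f
            if abs(err) > epsilon:  # x_t would enter the SV set
                # SRS step: if x_t is too similar to a stored SV, do not save it
                if any(rbf_kernel(sx, x_t, gamma) > k_reduce for sx in sv_x):
                    continue
                sv_x.append(x_t)
                sv_alpha.append(eta * np.sign(err))
    return sv_x, sv_alpha

def predict(x, sv_x, sv_alpha, gamma=1.0):
    # SVR decision function built from the retained SVs
    return sum(a * rbf_kernel(sx, x, gamma) for sx, a in zip(sv_x, sv_alpha))
```

Since an RBF kernel gives k(x, x) = 1, a threshold close to 1 (compare the Kreduce column of Table 2, 0.97 to 0.99) discards only new SVs that are near-duplicates of an already stored one.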

CLC number:

  • TP181

Table 1

Characteristics of the datasets

No. | Dataset      | Instances | Training samples | Test samples | Attributes
1   | Cadata       | 20 640    | 180 000          | 2 640        | 8
2   | CASP         | 45 730    | 40 000           | 5 730        | 9
3   | Cpusmall     | 8 192     | 600 000          | 2 192        | 12
4   | Space_ga     | 3 107     | 250 000          | 607          | 6
5   | TFIDF-2006   | 16 087    | 16 087           | 3 308        | 150 360
6   | Wind-ningxia | 51 840    | 45 000           | 6 840        | 24
7   | Wind-farm    | 246 360   | 246 360          | 26 360       | 6

Table 2

MSE and number of SVs of the algorithms

Dataset      | Kreduce | SVR_SRS MSE / #SVs | SVR_BSGD MSE / #SVs | SVR_SGD MSE / #SVs | LIBSVM MSE / #SVs
Cadata       | 0.99    | 0.0144 / 6 237     | 0.0140 / 500        | 0.0138 / 61 692    | 0.0140 / 51 871
CASP         | 0.98    | 0.0442 / 19 873    | 0.0491 / 500        | 0.0462 / 25 658    | 0.0444 / 21 997
Cpusmall     | 0.99    | 0.0021 / 836       | 0.0022 / 500        | 0.0022 / 5 938     | 0.0023 / 4 314
Space_ga     | 0.99    | 0.0022 / 149       | 0.0023 / 500        | 0.0023 / 4 749     | 0.0018 / 3 685
TFIDF-2006   | 0.98    | 0.1464 / 765       | 0.1532 / 500        | 0.1403 / 1 878     | 0.1403 / 11 410
Wind-ningxia | 0.97    | 0.4901 / 10 741    | 0.6146 / 500        | 0.5102 / 23 154    | 0.4704 / 35 469
Wind-farm    | 0.99    | 4.1189 / 4 249     | 4.2485 / 500        | 4.2212 / 183 413   | 3.9364 / 232 965

Table 3

Training time comparison of the algorithms

Dataset      | SVR_SRS  | SVR_BSGD | SVR_SGD  | LIBSVM
Cadata       | 41.87    | 146.15   | 417.11   | 817.01
CASP         | 29.67    | 62.53    | 47.77    | 101.24
Cpusmall     | 35.53    | 43.58    | 109.11   | 334.76
Space_ga     | 5.67     | 17.31    | 24.88    | 33.28
TFIDF-2006   | 1 021.26 | 1 411.99 | 1 658.66 | 1 611.85
Wind-ningxia | 44.95    | 54.42    | 167.73   | 1 258.58
Wind-farm    | 44.81    | 540.45   | 1 722.33 | 2 452.69

Figure 1  Comparison curves of MSE and training time on the Cadata dataset

Figure 2  Comparison curves of MSE and training time on the CASP dataset

Figure 3  Comparison curves of MSE and training time on the Wind-ningxia dataset

1 BOSER B E, GUYON I M, VAPNIK V N. A training algorithm for optimal margin classifiers[C]//Proceedings of the Workshop on Computational Learning Theory. Pittsburgh, PA, USA: [s.n.], 1992: 144-152.
2 BROWN J D, SUMMERS M F, JOHNSON B A. Prediction of hydrogen and carbon chemical shifts from RNA using database mining and support vector regression[J]. Journal of Biomolecular NMR, 2015, 63(1): 1-14.
doi: 10.1007/s10858-015-9981-0
3 CHEN J, XUE X, HA M, et al. Support vector regression method for wind speed prediction incorporating probability prior knowledge[J]. Mathematical Problems in Engineering, 2014, 2014: 1-10.
4 OSUNA E, FREUND R, GIROSI F. Training support vector machines: an application to face detection[C]//Proceedings of CVPR'97. New York, USA: IEEE, 1997: 130-136.
5 CHEN Rong, LIANG Changyong, XIE Fuwei. Review of applications of SVR-based nonlinear time series prediction methods[J]. Journal of Hefei University of Technology (Natural Science), 2013, 36(3): 369-374.
doi: 10.3969/j.issn.1003-5060.2013.03.025
6 HO C H, LIN C J. Large-scale linear support vector regression[J]. Journal of Machine Learning Research, 2012, 13(1): 3323-3348.
7 LIN C J, WENG R C, KEERTHI S S. Trust region Newton methods for large-scale logistic regression[C]//Proceedings of the 24th International Conference on Machine Learning (ICML 2007). Corvallis, Oregon, USA: ACM, 2007: 561-568.
8 HSIEH C J, CHANG K W, LIN C J, et al. A dual coordinate descent method for large-scale linear SVM[C]//Proceedings of the 25th International Conference on Machine Learning (ICML 2008). Helsinki, Finland: ACM, 2008: 408-415.
9 XIE X, CHEN C, CHEN Z. Mini-batch quasi-Newton optimization for large scale linear support vector regression[C]//Proceedings of the International Conference on Mechatronics, Materials, Chemistry and Computer Engineering. Chengdu, China: [s.n.], 2015.
10 WANG Y, OU G, PANG W, et al. e-Distance weighted support vector regression[C]//Proceedings of the 29th Conference on Neural Information Processing Systems (NIPS 2016). Barcelona, Spain: [s.n.], 2016.
11 KIVINEN J, SMOLA A J, WILLIAMSON R C. Online learning with kernels[J]. IEEE Transactions on Signal Processing, 2004, 52(8): 2165-2176.
12 SHALEV-SHWARTZ S, SINGER Y, SREBRO N. Pegasos: primal estimated sub-gradient solver for SVM[C]//Proceedings of the 24th International Conference on Machine Learning (ICML 2007). Corvallis, Oregon, USA: ACM, 2007: 807-814.
13 ZHANG T. Solving large scale linear prediction problems using stochastic gradient descent algorithms[C]//Proceedings of the 21st International Conference on Machine Learning (ICML 2004). Banff, Canada: Omnipress, 2004: 116.
14 BORDES A, BOTTOU L, GALLINARI P. SGD-QN: careful quasi-Newton stochastic gradient descent[J]. Journal of Machine Learning Research, 2009, 10: 1737-1754.
15 WANG Z, CRAMMER K, VUCETIC S. Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale SVM training[J]. Journal of Machine Learning Research, 2012, 13(1): 3103-3131.
16 SMOLA A J, SCHÖLKOPF B. A tutorial on support vector regression[J]. Statistics and Computing, 2004, 14(3): 199-222.
doi: 10.1023/B:STCO.0000035301.49549.88
17 CRISTIANINI N, SCHÖLKOPF B. Support vector machines and kernel methods: the new generation of learning machines[J]. AI Magazine, 2002, 23(3): 31-41.
18 GRAF A, BORER S. Normalization in support vector machines[M]//Pattern Recognition. Berlin, Heidelberg: Springer, 2001: 277-282.
19 CHANG C C, LIN C J. LIBSVM: a library for support vector machines[J]. ACM Transactions on Intelligent Systems and Technology, 2011, 2(3): 27.