
Journal of Shandong University (Engineering Science) ›› 2019, Vol. 49 ›› Issue (3): 8-14. doi: 10.6040/j.issn.1672-3961.0.2017.417

• Machine Learning & Data Mining •

Support vector regression algorithm based on kernel similarity reduced strategy

Yingda LI, Zongxia XIE

  1. School of Computer Software, Tianjin University, Tianjin 300350, China
  • Received: 2017-08-24; Online: 2019-06-20; Published: 2019-06-27
  • About the authors: LI Yingda (1993— ), male, from Harbin, Heilongjiang, China; master's student; main research interests: machine learning and data mining. E-mail: 18222289139@163.com
  • Supported by:
    the National Natural Science Foundation of China project "Granular Computing Theory and Methods for Big Data" (61432011)



Abstract:

As the number of iterations grows, kernelized stochastic gradient descent (SGD) becomes susceptible to the curse of kernelization: the set of support vectors (SVs) it must store keeps expanding. A kernel-similarity-based SVs-reduced strategy (SRS) was proposed to remove redundant SVs and thereby improve the efficiency of SGD for large-scale non-linear support vector regression (SVR). At each SGD iteration, if the new sample qualified as an SV, its kernel similarities to the recorded SVs were computed; if any similarity exceeded a preset threshold, the new SV was not saved. Experimental results on UCI, LIBSVM, and wind-speed datasets demonstrated that SRS solved large-scale non-linear SVR efficiently compared with several state-of-the-art algorithms.

Key words: support vector regression, efficient algorithms, stochastic gradient descent, kernel similarity
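
The abstract describes the SRS procedure operationally; the following minimal Python sketch illustrates the idea under stated assumptions: an RBF kernel as the similarity measure, a Pegasos-style step size, and the ε-insensitive loss. The names `svr_sgd_srs`, `k_reduce`, and `predict` are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # Gaussian (RBF) kernel; also serves as the similarity measure between samples
    return np.exp(-gamma * np.sum((x - z) ** 2))

def svr_sgd_srs(X, y, epsilon=0.1, lam=1e-4, gamma=1.0, k_reduce=0.99, epochs=1):
    """Kernelized SGD for epsilon-insensitive SVR with similarity-based SV pruning (SRS sketch)."""
    sv_x, sv_alpha = [], []  # retained support vectors and their coefficients
    t = 0
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos-style step size (assumption)
            # prediction of the current model from the retained SV set
            f = sum(a * rbf_kernel(sx, x_t, gamma)
                    for sx, a in zip(sv_x, sv_alpha))
            # regularization shrinkage of existing coefficients
            sv_alpha = [(1.0 - eta * lam) * a for a in sv_alpha]
            err = y_t - f
            if abs(err) > epsilon:  # x_t would enter the SV set
                # SRS step: if x_t is too similar to a stored SV, do not save it
                if any(rbf_kernel(sx, x_t, gamma) > k_reduce for sx in sv_x):
                    continue
                sv_x.append(x_t)
                sv_alpha.append(eta * np.sign(err))
    return sv_x, sv_alpha

def predict(x, sv_x, sv_alpha, gamma=1.0):
    # SVR decision function built from the retained SVs
    return sum(a * rbf_kernel(sx, x, gamma) for sx, a in zip(sv_x, sv_alpha))
```

Since an RBF kernel gives k(x, x) = 1, a threshold close to 1 (compare the Kreduce column of Table 2, 0.97 to 0.99) discards only new SVs that are near-duplicates of an already stored one.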

CLC number:

  • TP181

Table 1

Characteristics of the datasets

No. | Dataset      | Instances | Training samples | Test samples | Attributes
1   | Cadata       | 20 640    | 180 000          | 2 640        | 8
2   | CASP         | 45 730    | 40 000           | 5 730        | 9
3   | Cpusmall     | 8 192     | 600 000          | 2 192        | 12
4   | Space_ga     | 3 107     | 250 000          | 607          | 6
5   | TFIDF-2006   | 16 087    | 16 087           | 3 308        | 150 360
6   | Wind-ningxia | 51 840    | 45 000           | 6 840        | 24
7   | Wind-farm    | 246 360   | 246 360          | 26 360       | 6

Table 2

MSE and number of SVs of the algorithms

Dataset      | Kreduce | SVR_SRS MSE / #SVs | SVR_BSGD MSE / #SVs | SVR_SGD MSE / #SVs | LIBSVM MSE / #SVs
Cadata       | 0.99    | 0.0144 / 6 237     | 0.0140 / 500        | 0.0138 / 61 692    | 0.0140 / 51 871
CASP         | 0.98    | 0.0442 / 19 873    | 0.0491 / 500        | 0.0462 / 25 658    | 0.0444 / 21 997
Cpusmall     | 0.99    | 0.0021 / 836       | 0.0022 / 500        | 0.0022 / 5 938     | 0.0023 / 4 314
Space_ga     | 0.99    | 0.0022 / 149       | 0.0023 / 500        | 0.0023 / 4 749     | 0.0018 / 3 685
TFIDF-2006   | 0.98    | 0.1464 / 765       | 0.1532 / 500        | 0.1403 / 1 878     | 0.1403 / 11 410
Wind-ningxia | 0.97    | 0.4901 / 10 741    | 0.6146 / 500        | 0.5102 / 23 154    | 0.4704 / 35 469
Wind-farm    | 0.99    | 4.1189 / 4 249     | 4.2485 / 500        | 4.2212 / 183 413   | 3.9364 / 232 965

Table 3

Training time comparison of the algorithms

Dataset      | SVR_SRS  | SVR_BSGD | SVR_SGD  | LIBSVM
Cadata       | 41.87    | 146.15   | 417.11   | 817.01
CASP         | 29.67    | 62.53    | 47.77    | 101.24
Cpusmall     | 35.53    | 43.58    | 109.11   | 334.76
Space_ga     | 5.67     | 17.31    | 24.88    | 33.28
TFIDF-2006   | 1 021.26 | 1 411.99 | 1 658.66 | 1 611.85
Wind-ningxia | 44.95    | 54.42    | 167.73   | 1 258.58
Wind-farm    | 44.81    | 540.45   | 1 722.33 | 2 452.69

Figure 1  Comparison curves of MSE and training time on the Cadata dataset

Figure 2  Comparison curves of MSE and training time on the CASP dataset

Figure 3  Comparison curves of MSE and training time on the Wind-ningxia dataset

1 BOSER B E, GUYON I M, VAPNIK V N. A training algorithm for optimal margin classifiers[C]//Proceedings of the Workshop on Computational Learning Theory. Pittsburgh, PA, USA: [s.n.], 1992: 144-152.
2 BROWN J D, SUMMERS M F, JOHNSON B A. Prediction of hydrogen and carbon chemical shifts from RNA using database mining and support vector regression[J]. Journal of Biomolecular NMR, 2015, 63(1): 1-14.
doi: 10.1007/s10858-015-9981-0
3 CHEN J, XUE X, HA M, et al. Support vector regression method for wind speed prediction incorporating probability prior knowledge[J]. Mathematical Problems in Engineering, 2014, 2014: 1-10.
4 OSUNA E, FREUND R, GIROSI F. Training support vector machines: an application to face detection[C]//Proceedings of CVPR'97. New York, USA: IEEE, 1997: 130-136.
5 CHEN Rong, LIANG Changyong, XIE Fuwei. Review of applications of SVR-based nonlinear time series prediction methods[J]. Journal of Hefei University of Technology (Natural Science), 2013, 36(3): 369-374.
doi: 10.3969/j.issn.1003-5060.2013.03.025
6 HO C H, LIN C J. Large-scale linear support vector regression[J]. Journal of Machine Learning Research, 2012, 13(1): 3323-3348.
7 LIN C J, WENG R C, KEERTHI S S. Trust region Newton methods for large-scale logistic regression[C]//Proceedings of the 24th International Conference on Machine Learning (ICML 2007). Corvallis, Oregon, USA: ACM, 2007: 561-568.
8 HSIEH C J, CHANG K W, LIN C J, et al. A dual coordinate descent method for large-scale linear SVM[C]//Proceedings of the 25th International Conference on Machine Learning (ICML 2008). Helsinki, Finland: ACM, 2008: 408-415.
9 XIE X, CHEN C, CHEN Z. Mini-batch quasi-Newton optimization for large scale linear support vector regression[C]//Proceedings of the International Conference on Mechatronics, Materials, Chemistry and Computer Engineering. Chengdu, China: [s.n.], 2015.
10 WANG Y, OU G, PANG W, et al. e-Distance weighted support vector regression[C]//Proceedings of the 29th Conference on Neural Information Processing Systems (NIPS 2016). Barcelona, Spain: [s.n.], 2016.
11 KIVINEN J, SMOLA A J, WILLIAMSON R C. Online learning with kernels[J]. IEEE Transactions on Signal Processing, 2004, 52(8): 2165-2176.
12 SHALEV-SHWARTZ S, SINGER Y, SREBRO N. Pegasos: primal estimated sub-gradient solver for SVM[C]//Proceedings of the 24th International Conference on Machine Learning (ICML 2007). Corvallis, Oregon, USA: ACM, 2007: 807-814.
13 ZHANG T. Solving large scale linear prediction problems using stochastic gradient descent algorithms[C]//Proceedings of the 21st International Conference on Machine Learning (ICML 2004). Banff, Canada: Omnipress, 2004: 116.
14 BORDES A, BOTTOU L, GALLINARI P. SGD-QN: careful quasi-Newton stochastic gradient descent[J]. Journal of Machine Learning Research, 2009, 10: 1737-1754.
15 WANG Z, CRAMMER K, VUCETIC S. Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale SVM training[J]. Journal of Machine Learning Research, 2012, 13(1): 3103-3131.
16 SMOLA A J, SCHÖLKOPF B. A tutorial on support vector regression[J]. Statistics and Computing, 2004, 14(3): 199-222.
doi: 10.1023/B:STCO.0000035301.49549.88
17 CRISTIANINI N, SCHÖLKOPF B. Support vector machines and kernel methods: the new generation of learning machines[J]. AI Magazine, 2002, 23(3): 31-41.
18 GRAF A, BORER S. Normalization in support vector machines[M]//Pattern Recognition. Berlin, Heidelberg: Springer, 2001: 277-282.
19 CHANG C C, LIN C J. LIBSVM: a library for support vector machines[J]. ACM Transactions on Intelligent Systems and Technology, 2011, 2(3): 27.