A sparse online learning algorithm for feature selection

doi:10.6040/j.issn.1672-3961.1.2016.060

Journal of Shandong University (Engineering Science), 2017, Vol. 47, Issue (1): 22-27.


WEI Bo¹,², ZHANG Wensheng¹, LI Yuanxiang³, XIA Xuewen², LYU Jingqin²

1. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;
2. School of Software, East China Jiaotong University, Nanchang 330013, Jiangxi, China;
3. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, Hubei, China

Received: 2016-03-01  Online: 2017-02-20  Published: 2016-03-01
About the author: WEI Bo (1983- ), male, from Tianmen, Hubei; lecturer, Ph.D.; main research interests are intelligent computing and machine learning. E-mail: weibo@whu.edu.cn
Supported by: Key Program of the National Natural Science Foundation of China (61432008); National Natural Science Foundation of China (61463017); Natural Science Foundation of Jiangxi Province (20151BAB207022); Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ150539)

Abstract

Abstract: To handle massive, high-dimensional, sparse big data effectively and to improve the efficiency of data classification, a sparse online learning algorithm for feature selection (SFSOL), based on the sparsity principle of the L₁ norm, was proposed. Within an online machine learning framework, the features of high-dimensional streaming data were processed with a novel “rounding” operation, which increased the sparsity of the data features while strengthening the values of some features within the threshold range, and greatly improved the classification of sparse data. The performance of SFSOL was analyzed on public data sets and compared with that of three other sparse online learning algorithms. The experimental results showed that SFSOL classified high-dimensional sparse data more accurately.

Key words: L₁ norm, big data, machine learning, online learning, sparsity
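
The abstract does not give the SFSOL update rule, so the following is only a minimal sketch of the general idea it builds on: an online linear classifier whose weights are pushed to exactly zero by an L₁-style shrinkage step after each example. It is a generic soft-threshold learner, not the authors' algorithm, and it does not reproduce the “rounding” step. The function names, the hinge-loss update, the dictionary encoding of sparse features, and the default parameters are all assumptions made for illustration.

```python
# Illustrative sketch only: a generic L1-regularized online learner with
# soft-threshold shrinkage. This is NOT the SFSOL update rule, which the
# abstract does not specify. All names and defaults are hypothetical.


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything inside the band becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def train_sparse_online(stream, n_features, learning_rate=0.1, l1_strength=0.01):
    """Make one pass over (features, label) pairs with labels in {-1, +1}.

    Each features value is a dict {index: value}, a common sparse encoding.
    """
    weights = [0.0] * n_features
    shrink = learning_rate * l1_strength
    for features, label in stream:
        margin = label * sum(weights[i] * v for i, v in features.items())
        if margin < 1.0:
            # Hinge loss is active: take a subgradient step on the seen features.
            for i, v in features.items():
                weights[i] += learning_rate * label * v
        # L1 shrinkage: small weights are set to exactly zero (sparsity).
        weights = [soft_threshold(w, shrink) for w in weights]
    return weights


def predict(weights, features):
    """Return +1 or -1 from the sign of the linear score."""
    score = sum(weights[i] * v for i, v in features.items())
    return 1 if score >= 0 else -1


# Toy stream: feature 0 signals the positive class, feature 1 the negative
# class, feature 3 appears in both, and features 2 and 4 never appear.
stream = [
    ({0: 1.0, 3: 0.5}, 1),
    ({1: 1.0, 3: 0.5}, -1),
] * 20
weights = train_sparse_online(stream, n_features=5)
```

In this toy run the weights of features that never appear stay at exactly zero, while the two discriminative features keep weights of opposite sign. Applying the shrinkage after every example, rather than once per batch, is what keeps the model sparse throughout a stream.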

CLC number: TP391

Cite this article: WEI Bo, ZHANG Wensheng, LI Yuanxiang, XIA Xuewen, LYU Jingqin. A sparse online learning algorithm for feature selection[J]. Journal of Shandong University (Engineering Science), 2017, 47(1): 22-27.

