JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE) ›› 2017, Vol. 47 ›› Issue (1): 22-27.doi: 10.6040/j.issn.1672-3961.1.2016.060


A sparse online learning algorithm for feature selection

WEI Bo1,2, ZHANG Wensheng1, LI Yuanxiang3, XIA Xuewen2, LYU Jingqin2   

  1. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;
    2. School of Software, East China Jiaotong University, Nanchang 330013, Jiangxi, China;
    3. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, Hubei, China
  • Received:2016-03-01 Online:2017-02-20 Published:2016-03-01

Abstract: To handle massive, high-dimensional, sparse big data effectively and to improve the efficiency of data classification, an online learning algorithm based on the sparsity principle of the L1 norm was proposed. Within an online machine learning framework, the features of high-dimensional streaming data underwent a novel "integer" processing step. This step increased the sparsity of the data features while enhancing the feature values that fell within the threshold range, which greatly improved the classification of sparse data. The performance of the proposed algorithm, SFSOL, was analyzed on public data sets and compared with that of three other sparse online learning algorithms. The experimental results showed that SFSOL was better suited to accurately classifying high-dimensional sparse data.

Key words: L1 norm, big data, machine learning, online learning, sparsity

CLC Number: TP391
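
The abstract does not state the SFSOL update rule, so the sketch below is not the authors' method. It illustrates the general idea the abstract builds on (L1-style sparsification of weights inside an online learner, with a threshold that limits which weights are touched) using the well-known truncated-gradient scheme with a logistic loss. The function names and the parameters `eta`, `gravity`, and `theta` are assumptions chosen for illustration.

```python
import math


def truncate(v, alpha, theta):
    """Shrink v toward zero by alpha, but only when |v| <= theta."""
    if 0.0 <= v <= theta:
        return max(0.0, v - alpha)
    if -theta <= v < 0.0:
        return min(0.0, v + alpha)
    return v  # weights outside the threshold range are left untouched


def train_truncated_gradient(stream, dim, eta=0.1, gravity=0.01, theta=0.5):
    """One pass of online logistic regression with per-step truncation.

    stream yields (x, y), where x is a sparse dict {index: value}
    and y is a label in {-1, +1}. All parameter names are illustrative.
    """
    w = [0.0] * dim
    alpha = eta * gravity  # amount of shrinkage applied at every step
    for x, y in stream:
        margin = y * sum(w[i] * v for i, v in x.items())
        margin = max(-30.0, min(30.0, margin))  # keep exp() well-behaved
        # Derivative of log(1 + exp(-margin)) with respect to the margin.
        scale = -1.0 / (1.0 + math.exp(margin))
        # Gradient step on the features present in this example only.
        for i, v in x.items():
            w[i] -= eta * scale * y * v
        # Truncation step: pushes small weights to exactly zero.
        # (Applied to every weight for clarity; this costs O(dim) per step.)
        w = [truncate(wi, alpha, theta) for wi in w]
    return w


if __name__ == "__main__":
    # Feature 0 predicts the label; feature 1 appears once as noise.
    demo = [({0: 1.0, 1: 1.0}, 1)] + [({0: 1.0}, 1), ({0: -1.0}, -1)] * 50
    weights = train_truncated_gradient(demo, dim=3)
    print("weights:", weights)
    print("non-zero features:", [i for i, wi in enumerate(weights) if wi != 0.0])
```

In this sketch a feature that stops receiving gradient updates is shrunk by `eta * gravity` at every step until it reaches exactly zero, while a weight that has grown beyond `theta` is no longer shrunk. That behaviour is what makes the learned model sparse without penalizing strongly informative features.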