Journal of Shandong University (Engineering Science), 2017, Vol. 47, Issue (5): 195-202. doi: 10.6040/j.issn.1672-3961.0.2017.180

Weighted hyper-ellipsoidal support vector data description with negative samples for outlier detection

YAO Yu, FENG Jian*, ZHANG Huaguang, HAN Kezhen

  1. College of Information Science and Engineering, Northeastern University, Shenyang 110819, Liaoning, China
  • Received: 2017-02-10 Online: 2017-10-20 Published: 2017-02-10
  • Corresponding author: FENG Jian (b. 1971), male, from Shenyang, Liaoning; professor, doctoral supervisor, Ph.D. His main research interests are optimal control and fault diagnosis of complex industrial systems, intelligent information processing, and applications of intelligent systems in industry. E-mail: fjneu@163.com
  • About the author: YAO Yu (b. 1992), male, from Qingxian, Hebei; master's student. His main research interest is fault diagnosis in the process industry. E-mail: y-yaoyu@163.com

Abstract: To address the imbalance between positive and negative samples in the training set, a method named weighted hyper-ellipsoidal support vector data description with negative samples (WNESVDD) was proposed. The Mahalanobis distance was introduced so that the distribution of the samples was fully taken into account. Both positive and negative samples were used for modeling, and cost-sensitive learning was incorporated to assign different weights to different classes. The results showed that the proposed method effectively reduced the empty area enclosed by the decision boundary, refined the decision boundary, and clearly improved the utilization of the data set. Experiments were conducted on University of California at Irvine (UCI) data sets and on data from a semiconductor manufacturing process. The experimental results showed that the proposed method had strong anomaly detection ability and that, compared with similar methods, false negatives and false positives were markedly reduced.

Key words: sample imbalance, Mahalanobis distance, hyper-ellipsoidal support vector data description, geometric center of boundary, outlier detection, empty area
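
The role of the Mahalanobis distance in the abstract can be illustrated with a small sketch. It is not the WNESVDD method itself, which solves a weighted optimization problem over positive and negative samples; it only shows why a Mahalanobis-based boundary is an ellipsoid that follows the sample distribution. The sample points, the test points, and the threshold of 3.0 are all hypothetical values chosen for illustration.

```python
import math

def mean_and_cov_2d(points):
    """Return the sample mean and the 2x2 sample covariance of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of x from mean under a 2x2 covariance matrix."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean), using the explicit 2x2 inverse.
    q = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(q)

def euclidean_2d(x, mean):
    """Plain Euclidean distance, for comparison with a spherical boundary."""
    return math.hypot(x[0] - mean[0], x[1] - mean[1])

# Hypothetical positive-class samples, elongated along the diagonal.
normal = [(-2.0, -2.1), (-1.0, -0.9), (0.0, 0.1), (1.0, 0.9),
          (2.0, 2.0), (0.5, 0.4), (-0.5, -0.4)]
mean, cov = mean_and_cov_2d(normal)

on_axis = (1.5, 1.5)    # lies along the direction in which the data spread
off_axis = (1.5, -1.5)  # same Euclidean distance, but across the spread

THRESHOLD = 3.0  # hypothetical boundary radius in Mahalanobis units
for name, point in (("on_axis", on_axis), ("off_axis", off_axis)):
    d_m = mahalanobis_2d(point, mean, cov)
    d_e = euclidean_2d(point, mean)
    label = "outlier" if d_m > THRESHOLD else "normal"
    print(f"{name}: euclidean={d_e:.3f} mahalanobis={d_m:.3f} -> {label}")
```

Both test points are equally far from the mean in Euclidean terms, so a spherical boundary would treat them alike and would have to enclose empty space across the spread in order to cover the data along it. The Mahalanobis distance separates the two points, which is the intuition behind replacing the hypersphere with a hyper-ellipsoid.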

CLC number:

  • TP181