JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2017, Vol. 47, Issue (5): 195-202. doi: 10.6040/j.issn.1672-3961.0.2017.180

Weighted hyper-ellipsoidal support vector data description with negative samples for outlier detection

YAO Yu, FENG Jian*, ZHANG Huaguang, HAN Kezhen   

  1. College of Information Science and Engineering, Northeastern University, Shenyang 110819, Liaoning, China
  • Received: 2017-02-10 Online: 2017-10-20 Published: 2017-02-10

Abstract: To reduce the influence of the imbalance between positive and negative samples in the training set, a method named weighted hyper-ellipsoidal support vector data description with negative samples (WNESVDD) was proposed. The Mahalanobis distance was introduced so that the information on the sample distribution was fully considered. Both normal and negative samples were used for modeling. Cost-sensitive learning was introduced to assign different weights to different classes. The results showed that the proposed method effectively reduced the empty areas enclosed by the decision boundary and refined the boundary, and that the data utilization rate was clearly improved. Experiments were conducted on University of California at Irvine (UCI) data sets and on a data set from a semiconductor manufacturing process. The experimental results showed that the proposed method had a strong anomaly detection ability and that, compared with similar methods, it markedly reduced false positives and false negatives.

Key words: Mahalanobis distance, geometric center of boundary, empty area, outlier detection, sample imbalance, hyper-ellipsoidal support vector data description

CLC Number: TP181
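
The abstract names three ingredients: a Mahalanobis (hyper-ellipsoidal) distance, the use of negative samples, and class-dependent weights. The sketch below illustrates only how these ideas fit together and is not the WNESVDD optimization itself, which the abstract does not specify. It assumes the ellipsoid center and covariance are plain sample estimates from the normal class and that the radius is chosen by a simple search minimizing a weighted error count; the function names and the weights `w_pos` and `w_neg` are hypothetical.

```python
import numpy as np

def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of each row of X from center."""
    diff = np.atleast_2d(X) - center
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

def fit_weighted_ellipsoid(X_pos, X_neg, w_pos=1.0, w_neg=1.0):
    """Illustrative stand-in, NOT the paper's solver.

    Center and covariance come from the normal (positive) samples; the
    squared radius r2 is the candidate that minimizes
    w_pos * (#normals rejected) + w_neg * (#negatives accepted).
    """
    center = X_pos.mean(axis=0)
    cov = np.cov(X_pos, rowvar=False)
    d_pos = mahalanobis_sq(X_pos, center, cov)
    d_neg = mahalanobis_sq(X_neg, center, cov)
    candidates = np.sort(np.concatenate([d_pos, d_neg]))
    costs = [w_pos * np.sum(d_pos > r2) + w_neg * np.sum(d_neg <= r2)
             for r2 in candidates]
    r2 = candidates[int(np.argmin(costs))]
    return center, cov, r2

def is_outlier(X, center, cov, r2):
    """Flag points that fall outside the fitted hyper-ellipsoid."""
    return mahalanobis_sq(X, center, cov) > r2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Elongated normal class: wide along the first axis, narrow along the second.
    X_pos = rng.normal(size=(200, 2)) * np.array([2.0, 0.5])
    # A few known negative samples lying off the narrow axis.
    X_neg = np.array([[0.0, 3.0], [0.0, -3.0]])
    center, cov, r2 = fit_weighted_ellipsoid(X_pos, X_neg, w_pos=1.0, w_neg=1000.0)
    print(is_outlier(np.array([[3.0, 0.0], [0.0, 3.0]]), center, cov, r2))
```

The two query points are equally far from the origin in Euclidean terms, but only the one off the narrow axis lies far away in Mahalanobis terms. This is the sense in which an ellipsoidal boundary can enclose less empty area than a spherical one. Raising `w_neg` relative to `w_pos` makes accepting a negative sample costlier than rejecting a normal one, which mirrors the cost-sensitive weighting idea.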