JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2015, Vol. 45, Issue (3): 7-14. doi: 10.6040/j.issn.1672-3961.1.2014.182


An outlier detection algorithm based on density difference

XIN Liling, HE Wei, YU Jian, JIA Caiyan   

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
  Received: 2014-03-26  Revised: 2015-03-17  Online: 2015-06-20  Published: 2014-03-26

Abstract: An improved algorithm, IMMOD, was proposed on the basis of the MMOD algorithm. It took into account the differences among attributes and thereby improved detection accuracy. The algorithm used information entropy to determine the significance of each attribute, and the attribute weights derived from the entropy were used to calculate a weighted distance between objects. In addition, identifying and removing secondary (less significant) attributes helped keep the computational complexity low and improved precision on high-dimensional datasets. Both theoretical analysis and experiments showed that IMMOD performed well on high-dimensional datasets, required only a few parameters, and achieved higher accuracy than the compared algorithms.

Key words: information entropy, outlier detection, attribute reduction, weighted distance, MMOD, IMMOD
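The abstract describes the weighting pipeline only at a high level. The following Python sketch shows one plausible way to realise it, assuming the standard entropy weight method (column proportions, Shannon entropy normalised by ln n, weight proportional to 1 - e_j); the paper's exact formulas are not given here. The reduction rule (`reduce_attributes` with its `min_ratio` threshold) and the scoring function (`density_difference_scores`) are hypothetical, and the score is a simple stand-in for a density-difference criterion, not the MMOD or IMMOD definition.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
AttrWeights = List[Tuple[int, float]]  # (attribute index, weight) pairs


def minmax_normalize(data: Sequence[Sequence[float]]) -> Matrix:
    """Scale each attribute (column) into [0, 1]; a constant column becomes all zeros."""
    scaled_cols = []
    for col in zip(*data):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_cols.append([(v - lo) / span if span > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def entropy_weights(data: Matrix) -> List[float]:
    """Entropy weight method (assumed): attributes whose values are less evenly
    spread (lower Shannon entropy) get larger weights. Assumes len(data) >= 2."""
    n = len(data)
    diversity = []
    for col in zip(*data):
        total = sum(col)
        if total == 0:  # constant attribute carries no information
            diversity.append(0.0)
            continue
        probs = [v / total for v in col]
        e = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        diversity.append(max(0.0, 1.0 - e))  # clamp tiny negative rounding error
    s, m = sum(diversity), len(diversity)
    return [d / s for d in diversity] if s > 0 else [1.0 / m] * m


def reduce_attributes(weights: List[float], min_ratio: float = 0.5) -> AttrWeights:
    """Hypothetical rule: drop attributes weighted below min_ratio * mean weight,
    then renormalise the surviving weights so they sum to 1."""
    mean_w = sum(weights) / len(weights)
    kept = [(j, w) for j, w in enumerate(weights) if w >= min_ratio * mean_w]
    total = sum(w for _, w in kept)
    return [(j, w / total) for j, w in kept]


def weighted_distance(x: Sequence[float], y: Sequence[float], attrs: AttrWeights) -> float:
    """Weighted Euclidean distance over the retained attributes only."""
    return math.sqrt(sum(w * (x[j] - y[j]) ** 2 for j, w in attrs))


def density_difference_scores(data: Matrix, attrs: AttrWeights, k: int = 3) -> List[float]:
    """Hypothetical stand-in score: 1 - own kNN density / mean kNN density of the
    neighbours. Larger values mean sparser than the local neighbourhood."""
    n = len(data)
    neighbours, density = [], []
    for i in range(n):  # brute-force k-nearest-neighbour search, O(n^2) overall
        dists = sorted(
            (weighted_distance(data[i], data[j], attrs), j) for j in range(n) if j != i
        )[:k]
        neighbours.append([j for _, j in dists])
        density.append(1.0 / (sum(d for d, _ in dists) / k + 1e-12))
    return [
        1.0 - density[i] / (sum(density[j] for j in neighbours[i]) / k)
        for i in range(n)
    ]


def detect(raw: Sequence[Sequence[float]], k: int = 3, min_ratio: float = 0.5):
    """Normalise, weight by entropy, reduce attributes, then score every object."""
    data = minmax_normalize(raw)
    weights = entropy_weights(data)
    attrs = reduce_attributes(weights, min_ratio)
    return weights, attrs, density_difference_scores(data, attrs, k)


raw = [
    [1.0, 1.0, 5.0], [1.1, 0.9, 5.0], [0.9, 1.1, 5.0], [1.0, 1.2, 5.0],
    [1.2, 1.0, 5.0], [0.8, 0.9, 5.0], [5.0, 5.0, 5.0],  # last row is the outlier
]
weights, attrs, scores = detect(raw)
print([j for j, _ in attrs])                            # → [0, 1]
print(max(range(len(scores)), key=scores.__getitem__))  # → 6
```

In this toy data the third attribute is constant, so its entropy weight is zero and the reduction step discards it before any distance is computed; the isolated last object then receives the highest score. The neighbour search is brute force, which keeps the sketch short but would be too slow for large datasets.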

CLC Number: TP391