山东大学学报(工学版) ›› 2015, Vol. 45 ›› Issue (3): 7-14.doi: 10.6040/j.issn.1672-3961.1.2014.182
辛丽玲, 何威, 于剑, 贾彩燕
XIN Liling, HE Wei, YU Jian, JIA Caiyan
摘要: 在MMOD算法的基础上提出一种改进算法IMMOD,该算法考虑各属性的差异对离群点检测的影响,通过引入信息熵来确定属性的重要程度以量化权重向量,进而采用加权距离计算各数据点相异性。此外,在处理高维数据时,确定次要属性后采用属性约简方法,在保证时间效率的同时提高检测精度。理论分析和试验结果表明IMMOD算法参数少、检测准确性高,能很好地适用于高维数据,整体性能优于同类算法。
中图分类号:
[1] HAWKINS D M.Identification of outliers[M].London:Chapman and Hall, 1980:1-1. [2] KLEINBAUM D, KUPPER L, MULLER K, et al. Applied regression analysis and other multivariable methods[M]. 3 ed. California:Duxbury Press, 1998:228-237. [3] CHEN Yixin, DANG Xin.Outlier detection with the kernelized spatial depth function[J].IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(2):288-305. [4] KNORR E M, NG R T.Algorithms for mining distance-based outliers in large datasets[C]//Proceedings of the 24rd International Conference on Very Large Data Bases.San Francisco, USA:Morgan Kaufmann, 1998:392-403. [5] ANGIULLI F, BASTA S, PIZZUTI C. Distance-based detection and prediction of outliers[J].IEEE Transactions on Knowledge and Data Engineering, 2006, 18(2):145-160. [6] BREITENBACH M,GRUDIC G Z.Clustering through ranking on manifolds[C]//In Proceedings of the 22nd International Conference on Machine Learning. Bonn, Germany:ACM, 2005:73-80. [7] HE Zengyou, XU Xiaofei, DENG Shengchun.Discovering cluster based local outliers[J]. Pattern Recognition Letters, 2003, 24:1641-1650. [8] GUPTA K K, NATH B, KOTAGIRI R. Layered approach using conditional random fields for intrusion detection[J].IEEE Transactions on Dependable and Secure Computing, 2010, 7(1):35-49. [9] BREUNIG M M, KRIEGEL H P, NG R T, et al.Identifying density-based local outliers[C]// In Proceedings of the 2000 ACM SIGMOD international conference on Management of data.New York, USA:ACM, 2000:93-104. [10] LATECKI L, LAZAREVIC A, POKRAJAC D. Outlier detection with kernel density functions[C]//Proceedings of the 5th International Conference on Machine Learning and Data Mining in Pattern Recognition.Leipzig, Germany:Springer, 2007:61-75. [11] 何威.基于数据密度估计的聚类与离群点检测研究[D].北京:北京交通大学,2011. HE Wei. Data density based clustering and outlier detection[D].Beijing:Beijing Jiaotong University, 2011. [12] MARKUS Goldstein. An expectation maximization based local outlier detection algorithm[C]//Proceedings of the 21st International Conference on Pattern Recognition. Tsukuba, Japan:IEEE, 2012:2282-2285. [13] RALF O.A fuzzy vector valued KNN-algorithm for automatic outlier detection[J].Applied Soft Computing, 2009, 9:1263-1272. [14] PARZEN E.On the estimate of a probability density function and mode[J].Annals Mathematical Statistics, 1962, 33(3):1065-1076. [15] TERRELL G R, SCOTT D W.Variable kernel density estimation[J].The Annals of Statistics, 1992, 20(3):1236-1265. [16] KRIEGEL H P, KRGER P, SCHUBERT E, et al.Local outlier probabilities[C]//Proceeding of the 18th ACM Conference on Information and Knowledge Management.Hong Kong, China:ACM, 2009:1649-1652. [17] YAGER R R,FILEV D P.Approximate clustering via the mountain method[J].IEEE Transactions on Systems,Man and Cybernetics, 1994, 24(8):1279-1284. [18] CHIU S L.Fuzzy model identification based on cluster estimation[J].Journal of Intelligent Fuzzy Systems, 1994, 2:267-278. [19] YANG M S, WU K L.A similarity-based robust clustering method[J].IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(4):434-448. [20] MANOR L Z, PERONA P. Self-tuning spectral clustering[C]// Proceedings of the 18th Annual Conference on Neural Information Processing Systems. Cambridge, UK:MIT Press, 2004:1601-1608. [21] SHANNON C E.The mathematical theory of communication[J].Bell System Technical Journal, 1948, 27(3-4):373-423. [22] MLLER E, SCHIFFER M, SEIDL T.Statistical selection of relevant subspace projections for outlier ranking[C]// Proceedings of the 27th International Conference on Data Engineering. Hannover, Germany:IEEE, 2011:434-445. [23] 胡彩平,秦小麟.一种基于密度的局部离群点检测算法DLOF[J].计算机研究与发展,2010,47(12):2110-2116. HU Caiping, QIN Xiaolin. A density-based local outlier detecting algorithm[J].Journal of Computer Research and Development, 2010, 47(12):2110-2116. [24] LAZAREVIC A,KUMAR V.Feature bagging for outlier detection[C]//Proceedings of the eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. New York, USA:ACM, 2005:157-166. [25] VAIDYA P M.An optimal algorithm for the all-nearest-neighbors problem[C]//27th Annual Symposium Foundations of Computer Science. Toronto,Canada:IEEE,1986:117-122. |
[1] | 吴建萍,姜斌,刘剑慰. 基于小波包信息熵和小波神经网络的异步电机故障诊断[J]. 山东大学学报(工学版), 2017, 47(5): 223-228. |
[2] | 林耀进,张佳,林梦雷,王娟. 一种基于模糊信息熵的协同过滤推荐方法[J]. 山东大学学报(工学版), 2016, 46(5): 13-20. |
[3] | 张佳,林耀进,林梦雷,刘景华,李慧宗. 基于信息熵的协同过滤算法[J]. 山东大学学报(工学版), 2016, 46(2): 43-50. |
[4] | 景运革,李天瑞. 基于知识粒度的增量约简算法[J]. 山东大学学报(工学版), 2016, 46(1): 1-9. |
[5] | 潘盼1,王熙照2,翟俊海2. 基于有序决策树的改进归纳算法[J]. 山东大学学报(工学版), 2014, 44(1): 41-44. |
[6] | 施珺,朱敏. 一种基于灰色系统和支持向量机的预测优化模型[J]. 山东大学学报(工学版), 2012, 42(5): 7-11. |
[7] | 李慧1,2,胡云1,3,李存华1. 基于粗糙集理论的瓦斯灾害信息特征提取技术[J]. 山东大学学报(工学版), 2012, 42(5): 91-95. |
[8] | 胡云1,2,李慧1,施珺1,蔡虹1. 基于属性约简和相对熵的离群点检测算法[J]. 山东大学学报(工学版), 2011, 41(6): 31-36. |
[9] | 郭剑毅1,2,雷春雅1,余正涛1,2,苏磊1,2,赵君1,田维1. 基于信息熵的半监督领域实体关系抽取研究[J]. 山东大学学报(工学版), 2011, 41(4): 7-12. |
[10] | 黄添强1,2,陈智文1. 基于双向运动矢量的数字视频篡改鉴定[J]. 山东大学学报(工学版), 2011, 41(4): 13-19. |
[11] | 翟俊海,高原原,王熙照,陈俊芬. 基于划分子集的属性约简算法[J]. 山东大学学报(工学版), 2011, 41(4): 24-28. |
[12] | 雷小锋1,庄伟1,程宇1,丁世飞1,谢昆青2. OPHCLUS:基于序关系保持的层次聚类算法[J]. 山东大学学报(工学版), 2010, 40(5): 48-55. |
[13] | 罗玉盘 商琳. 基于多粒度周期模式的时序离群点检测算法[J]. 山东大学学报(工学版), 2009, 39(3): 11-15. |
[14] | 管延勇,胡海清,王洪凯 . α-粗糙集模型中的不可分辨关系[J]. 山东大学学报(工学版), 2006, 36(1): 75-80 . |
|