Journal of Shandong University (Engineering Science), 2025, Vol. 55, Issue (6): 21-34. doi: 10.6040/j.issn.1672-3961.0.2024.288
• Machine Learning and Data Mining •
唐杰烽,张佳*,龙锦益
TANG Jiefeng, ZHANG Jia*, LONG Jinyi
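Abstract: To address the curse of dimensionality in multi-label learning and the tendency of filter-based feature selection methods to become trapped in local optima, a fast multi-label feature selection method based on global redundancy minimization (GRM) is proposed. First, K-means clustering and mutual information computation are used to screen a candidate label subset and a candidate feature subset from the original label space and feature space. Second, global redundancy minimization is applied to overcome the local-optimum problem, yielding feature weights with minimal feature redundancy so that the output feature subset is the best one. Finally, an ensemble learning strategy is adopted to improve the stability of feature selection. Experimental results on 14 multi-label datasets show that the proposed method outperforms the compared methods on all classification metrics.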
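The mutual-information screening step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mutual_information` and `screen_candidate_features`, the summed-relevance ranking rule, and the toy data are all assumptions made for illustration, and the K-means label clustering, the GRM optimization, and the ensemble step are deliberately left out.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    # sum over observed pairs of p(x, y) * log2(p(x, y) / (p(x) * p(y)))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def screen_candidate_features(features, labels, k):
    """Return indices of the k features with the largest summed MI to the labels.

    Hypothetical helper: ranking by summed relevance is an assumption,
    not a rule taken from the paper.
    features: list of feature columns (each a list of discrete values)
    labels:   list of label columns (each a list of 0/1 values)
    """
    relevance = [
        sum(mutual_information(f, y) for y in labels)
        for f in features
    ]
    ranked = sorted(range(len(features)), key=lambda j: relevance[j], reverse=True)
    return ranked[:k]


# Toy data (made up): 3 discrete features, 2 labels, 4 samples.
features = [
    [0, 0, 1, 1],  # identical to label 0
    [0, 1, 0, 1],  # identical to label 1
    [0, 0, 0, 0],  # constant, carries no information
]
labels = [
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
selected = screen_candidate_features(features, labels, k=2)
```

On this toy data the constant third feature has zero mutual information with both labels, so it is the one screened out. Continuous features would need discretization before this estimator applies.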