Journal of Shandong University (Engineering Science) ›› 2025, Vol. 55 ›› Issue (1): 51-57. doi: 10.6040/j.issn.1672-3961.0.2023.288
• Machine Learning & Data Mining •
吴正健1,2,吾尔尼沙·买买提1,杨耀威1,阿力木江·艾沙1,库尔班·吾布力1*
WU Zhengjian1,2, Hornisa Mamat1, YANG Yaowei1, Alimjan Aysa1, Kurban Ubul1*
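Abstract: To address the similarity between scripts caused by visually similar structures, a discriminant and robust co-occurrence of adjacent local ternary patterns (DRCoALTP) method, built on the co-occurrence of adjacent local ternary patterns (CoALTP), is proposed for extracting image texture. The adjacent sparse local ternary patterns (ASLTP) of the document image are computed with the number of sampling points set to 8 to capture detailed local texture, and a semi-adaptive threshold method inspired by adaptive median filtering is designed to extract the code values of the diagonal neighborhood pixels around the center pixel of the grayscale image. ASLTP stores sparse local ternary pattern (LTP) values at the neighborhood pixel positions; a gray-level co-occurrence matrix (GLCM) is then extracted to count, in 4 directions, the frequency relationships between pixels of the ASLTP-coded grayscale image. The algorithm is validated on a self-built dataset of printed document images covering 11 scripts: Arabic, Russian, Simplified Chinese, Kazakh, Tibetan, Mongolian, Turkish, Uyghur, English, Kyrgyz, and Tajik. Experimental results show that, compared with baseline and state-of-the-art texture methods, the improved method is more discriminative, with an average identification accuracy of 99.14%. The semi-adaptive threshold method is proposed because CoALTP may produce inefficient classification features; it effectively raises the recognition rate and suppresses noise. In addition, because the algorithm produces high-dimensional features, a feature selection method based on mean square deviation is adopted. After feature selection, with a support vector machine (SVM) classifier, recognition speed increases by 284% and the average identification accuracy over the 11 scripts reaches 99.44%.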
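The coding-then-co-occurrence pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' DRCoALTP implementation: it assumes a fixed threshold `t` in place of the semi-adaptive threshold (whose details the abstract does not give), samples only the 4 diagonal neighbours, and keeps only the "upper" half of the ternary pattern. All function and constant names (`ltp_upper_codes`, `cooccurrence`, `describe`, `DIAGONALS`, `DIRECTIONS`) are hypothetical.

```python
# Illustrative sketch only: fixed threshold, 4 diagonal neighbours, upper LTP pattern.
# Every name below is hypothetical and not taken from the paper.

DIAGONALS = [(-1, -1), (-1, 1), (1, 1), (1, -1)]   # (row, col) offsets of sampled neighbours
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]  # 0, 45, 90, 135 degree displacements


def ltp_upper_codes(img, t):
    """Return a grid of 4-bit codes for interior pixels (upper LTP pattern)."""
    rows, cols = len(img), len(img[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(DIAGONALS):
                # The ternary value is +1 when neighbour >= center + t; the
                # upper pattern keeps only those +1 positions as binary 1s.
                if img[r + dr][c + dc] >= center + t:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def cooccurrence(codes, direction, n_codes=16):
    """Count how often code a at (r, c) co-occurs with code b at (r+dr, c+dc)."""
    dr, dc = direction
    rows, cols = len(codes), len(codes[0])
    matrix = [[0] * n_codes for _ in range(n_codes)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[codes[r][c]][codes[r2][c2]] += 1
    return matrix


def describe(img, t=5):
    """Concatenate the four flattened co-occurrence matrices into one feature vector."""
    codes = ltp_upper_codes(img, t)
    feature = []
    for direction in DIRECTIONS:
        for row in cooccurrence(codes, direction):
            feature.extend(row)
    return feature
```

With 4 sampled neighbours there are 16 possible codes, so each direction contributes a 16 × 16 matrix and the sketch yields a 1024-dimensional vector per image. A full LTP descriptor would also encode the lower pattern (neighbour <= center - t), which lengthens the vector further and is one reason a feature selection step, as mentioned in the abstract, becomes useful.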