
Journal of Shandong University (Engineering Science) ›› 2018, Vol. 48 ›› Issue (6): 89-94. doi: 10.6040/j.issn.1672-3961.0.2018.199

• Machine Learning and Data Mining •

Segmentation of connected characters based on improved drop-fall algorithm

Qiyue SONG, Xuewen MU, Huan CHENG

  1. School of Mathematics and Statistics, Xidian University, Xi'an 710071, Shaanxi, China
  • Received: 2018-05-31 Online: 2018-12-20 Published: 2018-12-26
  • About the author: Qiyue SONG (b. 1993), female, from Yulin, Shaanxi; master's student; research interests include machine learning and computer vision. E-mail: asdrttmn@163.com
  • Supported by:
    Natural Science Foundation of Shaanxi Province (2015JM1031); Fundamental Research Funds for the Central Universities (JB150713)

Abstract:

Traditional character-image segmentation methods perform poorly on connected characters whose strokes overlap. To segment connected characters that share strokes, a method based on an improved drop-fall algorithm is proposed. The method has two steps. First, the Zhang-Suen parallel thinning algorithm and clustering with self-organizing maps (SOM) are used to determine the starting point of the drop-fall algorithm. Second, a new drop path is defined: the water drop leaves the starting point and falls along the skeleton of the overlapping stroke; when it reaches the end of the skeleton, it keeps falling along the slant direction of the skeleton until it meets the boundary of the connected region. The trajectory of the drop is taken as the segmentation path. This avoids the broken strokes that the traditional drop-fall algorithm produces when its starting point is located inaccurately. In experiments against the traditional drop-fall algorithm and vertical segmentation, the improved algorithm gave the best results of the three on connected characters with overlapping strokes.

Key words: connected character, character segmentation, drop-fall algorithm, Zhang-Suen parallel thinning algorithm, SOM-based clustering
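The first step named in the abstract relies on Zhang-Suen parallel thinning. The paper's own implementation is not shown on this page, so the following is only a minimal sketch of the textbook algorithm, assuming a binary image in which 1 marks ink and 0 marks background; the function name `zhang_suen_thinning` is hypothetical.

```python
import numpy as np

def zhang_suen_thinning(image):
    """Thin a binary image (1 = ink, 0 = background) to a thin skeleton."""
    # Zero border so that every ink pixel has all eight neighbours.
    img = np.pad(np.asarray(image, dtype=np.uint8), 1)
    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # the two sub-iterations of each pass
            to_delete = []
            for r, c in zip(*np.nonzero(img)):
                # Neighbours P2..P9, clockwise, starting from the pixel above P1.
                nbrs = [int(v) for v in (
                    img[r - 1, c], img[r - 1, c + 1], img[r, c + 1], img[r + 1, c + 1],
                    img[r + 1, c], img[r + 1, c - 1], img[r, c - 1], img[r - 1, c - 1])]
                p2, p3, p4, p5, p6, p7, p8, p9 = nbrs
                b = sum(nbrs)  # number of ink neighbours
                # Number of 0 -> 1 transitions in the cyclic sequence P2, ..., P9, P2.
                a = sum(nbrs[i] == 0 and nbrs[(i + 1) % 8] == 1 for i in range(8))
                if step == 0:
                    cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                else:
                    cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                if 2 <= b <= 6 and a == 1 and cond:
                    to_delete.append((r, c))
            # Deletions are applied together after the scan (parallel update).
            for r, c in to_delete:
                img[r, c] = 0
            changed = changed or bool(to_delete)
    return img[1:-1, 1:-1]

# Usage: thin a 3-pixel-thick horizontal bar.
bar = np.ones((3, 7), dtype=np.uint8)
skeleton = zhang_suen_thinning(bar)
```

Pixels are collected first and deleted after each scan so that every decision in a sub-iteration sees the same image, which is what makes the algorithm parallel.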

CLC Number:

  • TP391

Fig. 1  Neighboring pixels of the water drop

Fig. 2  Drop-fall rules
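The figure with the rules is not reproduced on this page. As a rough illustration of what such rules usually look like, the sketch below traces a classic top-down drop-fall path. The priority order (down, down-left, down-right, right, left), the cut-through step when the drop is blocked, and the guard against left-right oscillation are assumptions of this sketch and may differ from the rules in the paper; `drop_fall_path` is a hypothetical name.

```python
def drop_fall_path(image, start_col):
    """Trace a classic drop-fall path on a binary image (1 = ink, 0 = background).

    Returns the (row, col) positions visited from the top row to the bottom row.
    """
    height, width = len(image), len(image[0])
    row, col = 0, start_col
    path = [(row, col)]
    prev_col = None  # column left behind by a horizontal move, to avoid bouncing back

    def is_free(r, c):
        return 0 <= c < width and image[r][c] == 0

    while row < height - 1:
        if is_free(row + 1, col):
            nxt = (row + 1, col)          # fall straight down
        elif is_free(row + 1, col - 1):
            nxt = (row + 1, col - 1)      # slide down-left
        elif is_free(row + 1, col + 1):
            nxt = (row + 1, col + 1)      # slide down-right
        elif is_free(row, col + 1) and prev_col != col + 1:
            nxt = (row, col + 1)          # roll right along a flat stroke
        elif is_free(row, col - 1) and prev_col != col - 1:
            nxt = (row, col - 1)          # roll left along a flat stroke
        else:
            nxt = (row + 1, col)          # blocked: cut down through the stroke
        prev_col = col if nxt[0] == row else None
        row, col = nxt
        path.append(nxt)
    return path

# Usage: the drop starts above a small blob and slides around its right edge.
blob = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
path = drop_fall_path(blob, start_col=2)
```

Because the path depends entirely on where the drop starts, a poorly chosen starting column can send the cut through a stroke, which is the weakness the abstract attributes to the traditional algorithm.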

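Fig. 3  Improved drop-fall algorithm

Table 1  Binarization of CAPTCHA images

Type               JD.com    China Construction Bank
CAPTCHA image
Binarized image

Fig. 4  Projection histogram of connected components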
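A projection histogram of this kind can be computed by counting ink pixels per column. The sketch below is a generic illustration, not the paper's code; it assumes a binary image with 1 for ink, and the helper names `vertical_projection` and `split_columns` are hypothetical.

```python
import numpy as np

def vertical_projection(binary):
    """Column-wise ink counts of a binary image (1 = ink, 0 = background)."""
    return np.asarray(binary).sum(axis=0)

def split_columns(binary):
    """Return (start, end) column ranges, end exclusive, separated by empty columns."""
    hist = vertical_projection(binary)
    segments, start = [], None
    for col, count in enumerate(hist):
        if count > 0 and start is None:
            start = col                      # a run of ink columns begins
        elif count == 0 and start is not None:
            segments.append((start, col))    # an empty column closes the run
            start = None
    if start is not None:
        segments.append((start, len(hist)))  # run reaches the right edge
    return segments

# Usage: two glyphs separated by one empty column.
img = [
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
]
segments = split_columns(img)
```

Splitting on empty columns only works when characters do not touch; connected characters leave no empty column between them, which is why a path-based method such as drop-fall is needed.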

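Fig. 5  8-neighborhood of P1

Fig. 6  Thinning of a connected-character image

Fig. 7  Determining candidate segmentation points

Fig. 8  Topology of the SOM neural network

Fig. 9  Topological structure of connected characters

Fig. 10  Starting point and drop path of the improved drop-fall algorithm

Fig. 11  Comparison of segmentation results

Table 2  CAPTCHA samples

Website    Projection connection    Single-point connection    Overlapping connection    Multi-point connection    Complex connection    Other
JD.com
Other

Table 3  Character segmentation experimental results

CAPTCHA                    Samples    Vertical segmentation    Traditional drop-fall    Improved drop-fall
JD.com                     200        28%                      45%                      70%
China Construction Bank    200        44%                      61%                      73%
Comparison test            200        13%                      75%                      87%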
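
To make the comparison in Table 3 easier to read, the short sketch below computes the gain of the improved algorithm over each baseline in percentage points. The numbers are copied from the table; the dictionary keys and the helper `gains` are hypothetical names introduced only for this illustration.

```python
# Percentages from Table 3, one entry per CAPTCHA source (200 samples each).
results = {
    "JD.com": {"vertical": 28, "drop_fall": 45, "improved": 70},
    "China Construction Bank": {"vertical": 44, "drop_fall": 61, "improved": 73},
    "Comparison test": {"vertical": 13, "drop_fall": 75, "improved": 87},
}

def gains(rates):
    """Gain of the improved algorithm over each baseline, in percentage points."""
    return {
        "vs_vertical": rates["improved"] - rates["vertical"],
        "vs_drop_fall": rates["improved"] - rates["drop_fall"],
    }

table3_gains = {name: gains(rates) for name, rates in results.items()}
for name, g in table3_gains.items():
    print(f"{name}: +{g['vs_vertical']} pts vs vertical, +{g['vs_drop_fall']} pts vs drop-fall")
```

On these figures the gain over the traditional drop-fall algorithm ranges from 12 to 25 percentage points, with the largest gain on the JD.com samples.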