Journal of Shandong University (Engineering Science), 2018, Vol. 48, Issue 6: 89-94, 108. doi: 10.6040/j.issn.1672-3961.0.2018.199

Machine Learning & Data Mining

Segmentation of connected characters based on improved drop-fall algorithm

Qiyue SONG, Xuewen MU, Huan CHENG

  1. School of Mathematics and Statistics, Xidian University, Xi'an 710071, Shaanxi, China
  • Received: 2018-05-31  Online: 2018-12-20  Published: 2018-12-26
  • Supported by: Natural Science Foundation of Shaanxi Province (2015JM1031); Fundamental Research Funds for the Central Universities (JB150713)

Abstract:

Because traditional segmentation methods cannot correctly segment connected characters, a segmentation algorithm based on an improved drop-fall algorithm was proposed. The algorithm consists of two steps. First, Zhang-Suen's thinning algorithm and clustering of the connected region with self-organizing maps (SOM) were used to find the starting drop point of the drop-fall algorithm. Second, a new drop path was defined to improve the drop-fall algorithm: the water drop falls from the starting drop point along the skeleton of the overlapping stroke and, at the end of that skeleton, continues falling in the skeleton's slant direction until it meets the boundary of the connected region. This drop path is taken as the segmentation path of the connected characters. The method solves the problem of stroke fracture caused by the traditional drop-fall algorithm. In experiments comparing it with the traditional drop-fall algorithm and vertical projection segmentation, the proposed method achieved the highest segmentation rate on all three 200-sample test sets (70%, 73% and 87%, versus 45%, 61% and 75% for the traditional drop-fall algorithm), showing that it is an effective method for segmenting connected characters.

Key words: connected character, character segmentation, drop-fall algorithm, Zhang-Suen's thinning algorithm, SOM-based clustering

CLC Number: TP391

Fig. 1 Pixels adjacent to the drop point

Fig. 2 Movement rules of the drop-fall algorithm
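For reference, the movement rules of the traditional drop-fall algorithm are commonly summarized as: fall straight down through background; if blocked, slide diagonally down; if still blocked, move sideways; if enclosed, cut through the stroke. The Python sketch below encodes those generic rules under stated assumptions (a binary image stored as a list of rows with 1 = character pixel, the drop entering at the top row, left preferred over right on diagonals, right preferred over left sideways). The exact neighbor priorities in Fig. 1 and Fig. 2 may differ, and `drop_fall_path` is a hypothetical name.

```python
from typing import List, Tuple

Grid = List[List[int]]  # binary image: 1 = character pixel, 0 = background


def drop_fall_path(img: Grid, start_col: int) -> List[Tuple[int, int]]:
    """Trace a generic (hypothetical) drop-fall path from row 0 to the last row.

    Preference order at each step:
      1. straight down, if the pixel below is background;
      2. down-left, then down-right, if that pixel is background;
      3. right, then left, if that pixel is background and not yet visited;
      4. otherwise straight down, cutting through the stroke.
    """
    h, w = len(img), len(img[0])
    r, c = 0, start_col
    path = [(r, c)]
    visited = {(r, c)}

    def free(rr: int, cc: int) -> bool:
        # A move target is usable if it is inside the image and is background.
        return 0 <= cc < w and img[rr][cc] == 0

    while r < h - 1:
        if free(r + 1, c):
            r += 1
        elif free(r + 1, c - 1):
            r, c = r + 1, c - 1
        elif free(r + 1, c + 1):
            r, c = r + 1, c + 1
        elif free(r, c + 1) and (r, c + 1) not in visited:
            c += 1
        elif free(r, c - 1) and (r, c - 1) not in visited:
            c -= 1
        else:
            r += 1  # enclosed: cut through the character stroke
        path.append((r, c))
        visited.add((r, c))
    return path
```

The returned list of (row, col) pixels is the segmentation path; pixels on the path that are character pixels are the points where the stroke is cut. The visited set keeps sideways moves from oscillating between two cells.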

Fig. 3 Improved drop-fall algorithm

Table 1 CAPTCHA image binarization

Type             JD.com    China Construction Bank
CAPTCHA image
Binarized image
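Table 1 shows the binarized CAPTCHAs, but the thresholding rule is not stated in this excerpt. Assuming a global threshold chosen by Otsu's method (a standard choice for this step), binarization can be sketched as follows: the threshold is the gray level that maximizes the between-class variance of the gray-level histogram, and dark text pixels become 1. The input format (rows of 0-255 integers), the dark-text-on-light-background convention, and the function names are assumptions.

```python
from typing import List

Gray = List[List[int]]  # grayscale values in 0..255


def otsu_threshold(img: Gray) -> int:
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * hist[i] for i in range(256))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to sigma_B^2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(img: Gray) -> List[List[int]]:
    """Map dark (text) pixels to 1 and light (background) pixels to 0."""
    t = otsu_threshold(img)
    return [[1 if v <= t else 0 for v in row] for row in img]
```

Because the count-weighted form w0 * w1 * (mu0 - mu1)^2 differs from the normalized between-class variance only by the constant factor 1 / N^2, maximizing it selects the same threshold.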

Fig. 4 Vertical projection histogram of the connected region
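A vertical projection histogram counts the character pixels in each column. Assuming the simplest variant of projection segmentation, which cuts wherever that count drops to zero, the vertical-segmentation baseline of Table 3 can be sketched as below (function names hypothetical). The sketch also shows the failure case: touching characters leave no empty column, so they come out as a single segment.

```python
from typing import List, Tuple

Grid = List[List[int]]  # binary image, 1 = character pixel


def vertical_projection(img: Grid) -> List[int]:
    """Number of character pixels in each column (cf. Fig. 4)."""
    return [sum(col) for col in zip(*img)]


def split_by_projection(img: Grid) -> List[Tuple[int, int]]:
    """Return [start, end) column ranges of runs with a non-zero projection."""
    proj = vertical_projection(img)
    segments: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(proj):
        if count > 0 and start is None:
            start = x                    # entering a character
        elif count == 0 and start is not None:
            segments.append((start, x))  # leaving a character
            start = None
    if start is not None:                # a character touches the right edge
        segments.append((start, len(proj)))
    return segments
```

For two separate glyphs the function returns two column ranges; for two glyphs joined by even one pixel-wide bridge it returns one.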

Fig. 5 8-neighborhood of P1
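The P1 neighborhood follows the labeling of Zhang and Suen's thinning algorithm, in which P2 to P9 are the eight neighbors of P1 taken clockwise from the pixel directly above. The sketch below implements the standard two-subiteration Zhang-Suen algorithm on a binary image (1 = character pixel) whose border row and column are background; it follows the published deletion conditions and does not include any modification the paper may apply. Function names are illustrative.

```python
from typing import List

Grid = List[List[int]]  # 1 = character pixel, 0 = background


def neighbours(img: Grid, r: int, c: int) -> List[int]:
    """Return [P2, ..., P9]: the 8 neighbours of P1 = (r, c), clockwise from north."""
    return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
            img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]


def zhang_suen_thin(img: Grid) -> Grid:
    """Zhang-Suen parallel thinning; border pixels are never examined."""
    out = [row[:] for row in img]
    h, w = len(out), len(out[0])
    changed = True
    while changed:
        changed = False
        for sub in (1, 2):  # the two subiterations
            to_delete = []
            for r in range(1, h - 1):
                for c in range(1, w - 1):
                    if out[r][c] != 1:
                        continue
                    p = neighbours(out, r, c)
                    p2, _, p4, _, p6, _, p8, _ = p
                    b = sum(p)  # B(P1): number of character neighbours
                    # A(P1): number of 0 -> 1 patterns in P2, P3, ..., P9, P2
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)
                    if not (2 <= b <= 6 and a == 1):
                        continue
                    if sub == 1:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        to_delete.append((r, c))
            for r, c in to_delete:  # parallel update: delete after the full scan
                out[r][c] = 0
            changed = changed or bool(to_delete)
    return out
```

On a 3-pixel-thick, 5-pixel-long bar this sketch keeps only a short central run of pixels, while a line that is already one pixel wide passes through unchanged.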

Fig. 6 Thinning of a connected-character image

Fig. 7 Determining candidate segmentation points
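The criterion used to pick candidate segmentation points is not given in this excerpt. One common heuristic, assumed here purely for illustration, marks skeleton pixels where three or more branches meet, using the crossing number (the count of 0 to 1 transitions around the P2 to P9 ring), since junctions are where strokes of touching characters tend to meet. The crossing number is used rather than a plain neighbor count because a plain count also flags pixels beside a junction. Function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]  # one-pixel-wide skeleton, 1 = skeleton pixel

# Offsets of P2..P9: clockwise around the pixel, starting from north.
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def crossing_number(skel: Grid, r: int, c: int) -> int:
    """Count 0 -> 1 transitions around the 8-neighbourhood of (r, c)."""
    h, w = len(skel), len(skel[0])
    ring = [skel[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
            for dr, dc in OFFSETS]
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)


def junction_points(skel: Grid) -> List[Tuple[int, int]]:
    """Skeleton pixels where three or more branches meet (crossing number >= 3)."""
    return [(r, c)
            for r in range(len(skel)) for c in range(len(skel[0]))
            if skel[r][c] == 1 and crossing_number(skel, r, c) >= 3]
```

A crossing number of 1 marks a stroke end and 2 a pixel in the middle of a stroke, so only genuine forks and crossings are returned.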

Fig. 8 Topological structure of the self-organizing map
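The paper clusters the connected region with a self-organizing map (SOM). Its network size, topology and training schedule are not given in this excerpt, so the following is only a generic illustration of SOM clustering: a small one-dimensional chain of nodes is trained on 2-D points (for example, foreground-pixel coordinates), and each point is then assigned to its best-matching unit. The node count, the linear learning-rate and neighborhood decay, the Gaussian neighborhood, and the function names are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def train_som(points: Sequence[Point], n_nodes: int = 2, epochs: int = 50,
              lr0: float = 0.5, sigma0: float = 1.0, seed: int = 0) -> List[List[float]]:
    """Train a 1-D chain of SOM nodes on 2-D points and return the node weights."""
    rng = random.Random(seed)
    # Initialise node weights from randomly chosen data points.
    nodes = [list(p) for p in rng.sample(list(points), n_nodes)]
    total = epochs * len(points)
    step = 0
    for _ in range(epochs):
        order = list(points)
        rng.shuffle(order)
        for x, y in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 1e-3)  # shrinking neighbourhood radius
            # Best-matching unit: the node closest to the sample.
            bmu = min(range(n_nodes),
                      key=lambda k: (nodes[k][0] - x) ** 2 + (nodes[k][1] - y) ** 2)
            for k in range(n_nodes):
                # Gaussian neighbourhood measured along the 1-D node chain.
                h = math.exp(-((k - bmu) ** 2) / (2 * sigma ** 2))
                nodes[k][0] += lr * h * (x - nodes[k][0])
                nodes[k][1] += lr * h * (y - nodes[k][1])
            step += 1
    return nodes


def assign_clusters(points: Sequence[Point], nodes: List[List[float]]) -> List[int]:
    """Label each point with the index of its best-matching node."""
    return [min(range(len(nodes)),
                key=lambda k: (nodes[k][0] - x) ** 2 + (nodes[k][1] - y) ** 2)
            for x, y in points]
```

Early in training the wide neighborhood drags neighboring nodes together, which preserves the chain's topology; as the radius shrinks, each node settles on the points it wins, so the final labels behave like a topology-aware clustering.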

Fig. 9 Topological structure of the connected characters

Fig. 10 Starting point and drop path of the improved drop-fall algorithm
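Following the description in the abstract, the improved drop path first follows the skeleton of the overlapping stroke from the starting drop point and, at the end of that skeleton, keeps falling along the skeleton's slant direction until it reaches the boundary of the connected region. The sketch below assumes the overlapping-stroke skeleton is already available as an ordered list of (row, col) pixels running downward, estimates the slant from its first and last pixels, and stops at the first background pixel; these choices and the function name `improved_drop_path` are assumptions, not details taken from the paper.

```python
import math
from typing import List, Tuple

Grid = List[List[int]]   # binary image, 1 = character pixel
Pixel = Tuple[int, int]  # (row, col)


def improved_drop_path(img: Grid, stroke: List[Pixel]) -> List[Pixel]:
    """Hypothetical sketch of the improved drop path.

    `stroke` is the skeleton of the overlapping stroke, ordered from the
    starting drop point downwards. The drop follows it, then keeps falling
    along the stroke's slant until it reaches background (the region boundary).
    """
    path = list(stroke)
    (r0, c0), (r1, c1) = stroke[0], stroke[-1]
    if r1 <= r0:
        raise ValueError("stroke must descend: last row must be below first row")
    slope = (c1 - c0) / (r1 - r0)  # columns moved per row = slant of the stroke
    h, w = len(img), len(img[0])
    r = r1
    while r + 1 < h:
        r += 1
        c = math.floor(c1 + slope * (r - r1) + 0.5)  # nearest column, halves round up
        if not 0 <= c < w or img[r][c] == 0:
            break  # left the connected region: boundary reached
        path.append((r, c))
    return path
```

Because the drop keeps to the stroke's own direction instead of the fixed down/diagonal/sideways rules of the traditional algorithm, the cut runs along the shared stroke rather than across it.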

Fig. 11 Comparison of two segmentation methods

Table 2 Samples of text-based CAPTCHAs

Website  Projection-connected  Single-point connected  Overlap-connected  Multi-point connected  Complex connected  Other
JD.com
Other

Table 3 Segmentation results

CAPTCHA source           Samples  Vertical segmentation  Traditional drop-fall  Improved drop-fall
JD.com                   200      28%                    45%                    70%
China Construction Bank  200      44%                    61%                    73%
Comparison test          200      13%                    75%                    87%