Journal of Shandong University (Engineering Science) ›› 2018, Vol. 48 ›› Issue (6): 89-94. doi: 10.6040/j.issn.1672-3961.0.2018.199

• Machine Learning & Data Mining •

Segmentation of connected characters based on improved drop-fall algorithm

Qiyue SONG, Xuewen MU, Huan CHENG

  1. School of Mathematics and Statistics, Xidian University, Xi′an 710071, Shaanxi, China
  • Received: 2018-05-31 Online: 2018-12-20 Published: 2018-12-26
  • Supported by:
    Natural Science Foundation of Shaanxi Province (2015JM1031); Fundamental Research Funds for the Central Universities (JB150713)

Abstract:

Because traditional segmentation methods could not segment connected characters correctly, a segmentation algorithm based on an improved drop-fall algorithm was proposed. The algorithm consisted of two steps. First, the Zhang-Suen thinning algorithm and clustering of the connected region with a self-organizing map (SOM) were used to find the starting point of the drop-fall algorithm. Second, a new drop path was defined to improve the drop-fall algorithm: the water drop started from the starting point and moved along the skeleton of the overlapping stroke; at the end of that skeleton it continued along the slant direction of the skeleton until it met the boundary of the connected part of the characters. The path of the water drop was taken as the segmentation path for the connected characters. This method avoided the stroke fracture caused by the traditional drop-fall algorithm. In the experiments, the proposed method outperformed both the traditional drop-fall algorithm and the vertical projection segmentation algorithm on all three CAPTCHA sample sets (Table 3).
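
The thinning step named in the abstract can be illustrated with a minimal Python sketch of the standard Zhang-Suen two-subiteration scheme. This is not the authors' code: the function name `zhang_suen_thinning`, the list-of-lists image representation (1 = stroke, 0 = background) and the assumption that the image has a one-pixel background border are choices made only for this illustration. The SOM clustering and the improved drop path are not covered here.

```python
def zhang_suen_thinning(image):
    """Thin a binary image given as a list of rows (1 = stroke, 0 = background).

    Assumption for this sketch: the outermost rows and columns are background,
    so border pixels are never examined.
    """
    img = [row[:] for row in image]  # work on a copy, leave the input untouched
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9, clockwise, starting with the pixel directly above P1
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    def transitions(n):
        # number of 0 -> 1 changes in the cyclic sequence P2, P3, ..., P9, P2
        return sum(a == 0 and b == 1 for a, b in zip(n, n[1:] + n[:1]))

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []  # collect first, delete afterwards (parallel update)
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(r, c)
                    p2, _, p4, _, p6, _, p8, _ = n
                    if not (2 <= sum(n) <= 6 and transitions(n) == 1):
                        continue
                    if step == 0:
                        removable = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        removable = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if removable:
                        to_delete.append((r, c))
            for r, c in to_delete:
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return img


# Usage: thin a horizontal bar that is three pixels thick.
bar = [[1 if 2 <= r <= 4 and 1 <= c <= 7 else 0 for c in range(9)]
       for r in range(7)]
skeleton = zhang_suen_thinning(bar)
for row in skeleton:
    print("".join("#" if v else "." for v in row))
```

Candidate deletions are collected before any pixel is cleared, so each sub-iteration decides on a consistent snapshot of the image, which is what makes the scheme parallel in the original formulation.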

Key words: connected character, character segmentation, drop-fall algorithm, Zhang-Suen thinning algorithm, SOM-based clustering

CLC Number: 

  • TP391

Fig.1 Drop point adjacent pixels

Fig.2 Movement rules for drop-fall algorithm

Fig.3 Improved drop-fall algorithm

Table 1 CAPTCHA image binarization

Type              JD.com    China Construction Bank
CAPTCHA image
Binarized image

Fig.4 Vertical projection histogram of the connected domain

Fig.5 8-neighborhood of P1

Fig.6 Connected character image thinning

Fig.7 Determining candidate segmentation points

Fig.8 Topological structure of the self-organizing map

Fig.9 Topological structure of connected characters

Fig.10 Starting point and drop path of the improved drop-fall algorithm

Fig.11 Comparison of two segmentation methods

Table 2 Samples of text-based CAPTCHA

Website    Projection adhesion    Single-point adhesion    Overlapping adhesion    Multi-point adhesion    Complex adhesion    Other
JD.com
Other

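Table 3 Segmentation experimental results

CAPTCHA                    Samples    Vertical segmentation    Traditional drop-fall algorithm    Improved drop-fall algorithm
JD.com                     200        28%                      45%                                70%
China Construction Bank    200        44%                      61%                                73%
Comparison test            200        13%                      75%                                87%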