Combining convolutional neural network and self-adaptive algorithm to defeat synthetic multi-digit text-based CAPTCHA

Title: Combining convolutional neural network and self-adaptive algorithm to defeat synthetic multi-digit text-based CAPTCHA
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Wang, Y., Huang, Y., Zheng, W., Zhou, Z., Liu, D., Lu, M.
Conference Name: 2017 IEEE International Conference on Industrial Technology (ICIT)
Keywords: add-on recognition part, Business, CAPTCHA, CAPTCHA segmentations, captchas, character recognition, character segmentation, China, clustering, Clustering algorithms, composability, convolution, convolutional neural network, data entry, Human Behavior, human beings, human factors, image segmentation, neural nets, Neural networks, Optical character recognition software, pubcrawl, reverse Turing test, Segmentation, self-adaptive algorithm, synthetic multidigit text-based CAPTCHA, text analysis, text-based scheme
Abstract: CAPTCHA (Completely Automated Public Turing test to Tell Computers and Humans Apart) is commonly used to prevent automated bots from performing data entry. Although there are various kinds of CAPTCHAs, the text-based scheme is still the most widely applied, because it is one of the most convenient and user-friendly options for everyday users [1]. Segmentation differs from one type of CAPTCHA to another, which makes segmentation one of the bottlenecks in solving CAPTCHAs. Once the characters can be split accurately, the problem becomes much easier to solve. Unfortunately, the best way to divide them is still decided case by case; there is no universal method. In this paper, we present a novel algorithm that achieves state-of-the-art performance. In addition, we construct a new convolutional neural network as an add-on recognition part to stabilize the performance of the whole CAPTCHA system. The CAPTCHA dataset we use is from the State Administration for Industry & Commerce of the People's Republic of China. The dataset contains 33 CAPTCHA entrances in total. In our experiments, we assume that each entrance is known. Results are provided showing that our algorithms work well on these CAPTCHAs.
Citation Key: wang_combining_2017