Biblio

Filters: Keyword is CAPTCHA
2021-03-18
Bi, X., Liu, X..  2020.  Chinese Character Captcha Sequential Selection System Based on Convolutional Neural Network. 2020 International Conference on Computer Vision, Image and Deep Learning (CVIDL). :554–559.

To ensure security, the Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is widely used in people's online lives. This paper presents a Chinese character captcha sequential selection system based on a convolutional neural network (CNN). Captchas composed of English letters and digits can already be identified with extremely high accuracy, but Chinese character captcha recognition is still challenging. The task is to identify Chinese characters in different colors and fonts that are rotated, affine-transformed, and not arranged on a straight line, in pictures with complex backgrounds, and then to restore the word order of the identified characters. We divide the task into several sub-processes: Chinese character detection based on Faster R-CNN, Chinese character recognition, and word order recovery based on N-Gram. Our main contribution is in the Chinese character recognition sub-process: we constructed a single Chinese character data set and built a 10-layer convolutional neural network. Eventually we achieved an accuracy of 98.43% and completed the task.
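The word order recovery step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `restore_order`, the add-one smoothing, the brute-force search over permutations, and the bigram counts are all assumptions made for illustration.

```python
import math
from itertools import permutations

def restore_order(chars, bigram_counts):
    """Return the ordering of chars that scores highest under a bigram model."""
    def score(seq):
        # Add-one smoothing keeps unseen bigrams at a finite (zero) log score.
        return sum(math.log(bigram_counts.get(pair, 0) + 1)
                   for pair in zip(seq, seq[1:]))
    # Brute force is acceptable here because a captcha holds only a few characters.
    return "".join(max(permutations(chars), key=score))

# Hypothetical bigram counts; a real table would be estimated from a text corpus.
counts = {("北", "京"): 50, ("京", "市"): 20, ("市", "北"): 1}
print(restore_order(["京", "市", "北"], counts))
```

The search is factorial in the number of characters, so a longer sequence would need a beam search or a similar pruning strategy.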

Kalaichelvi, T., Apuroop, P..  2020.  Image Steganography Method to Achieve Confidentiality Using CAPTCHA for Authentication. 2020 5th International Conference on Communication and Electronics Systems (ICCES). :495–499.

Steganography is a data hiding technique that is generally used to hide data within a file to avoid detection. It is used in police work, detective investigation, medicine, and many other fields. Various techniques have been proposed over the years for image steganography, and attackers have developed many decoding tools to break these techniques and retrieve the data. In this paper, CAPTCHA codes are used to ensure that the receiver is the intended receiver and not a machine. A randomized CAPTCHA code is created to provide additional security when communicating with the authenticated user, and image steganography is used to achieve confidentiality. To achieve secret and reliable communication, an encryption and decryption mechanism is applied, so a machine cannot decode the message using any predefined algorithm. Once a secure connection has been established with the intended receiver, the original message is transmitted using the LSB algorithm, which uses the RGB color spectrum to hide the image data, ensuring additional encryption.
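For readers unfamiliar with LSB embedding, the following is a minimal sketch of the general technique, not the paper's implementation. It assumes the cover image is a plain list of (R, G, B) tuples and that the receiver knows the message length; the names `embed_lsb` and `extract_lsb` are hypothetical.

```python
def embed_lsb(pixels, message: bytes):
    """Hide message bits in the least significant bit of each RGB channel value."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    flat = [channel for pixel in pixels for channel in pixel]
    if len(bits) > len(flat):
        raise ValueError("cover image too small for message")
    for i, bit in enumerate(bits):
        flat[i] = (flat[i] & ~1) | bit  # clear the lowest bit, then set it to the message bit
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]

def extract_lsb(pixels, n_bytes):
    """Read n_bytes back out of the channel LSBs, most significant bit first."""
    flat = [channel for pixel in pixels for channel in pixel]
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for k in range(8):
            value = (value << 1) | (flat[b * 8 + k] & 1)
        out.append(value)
    return bytes(out)

cover = [(10, 20, 30)] * 8            # 24 channel values, room for 3 bytes
stego = embed_lsb(cover, b"Hi")
print(extract_lsb(stego, 2))
```

Each channel value changes by at most 1, which is why the stego image is visually close to the cover image.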

2020-09-11
A., Jesudoss, M., Mercy Theresa.  2019.  Hardware-Independent Authentication Scheme Using Intelligent Captcha Technique. 2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT). :1–7.

This paper provides a hardware-independent authentication scheme, named the Intelligent Authentication Scheme, which rectifies design weaknesses that may be exploited by various security attacks. The Intelligent Authentication Scheme protects against various types of security attacks such as password-guessing attacks, replay attacks, streaming bot attacks (denial of service), keyloggers, screenloggers, and phishing attacks. Besides reducing the overall cost, it also balances security and usability. It is a unique authentication scheme.

Azakami, Tomoka, Shibata, Chihiro, Uda, Ryuya, Kinoshita, Toshiyuki.  2019.  Creation of Adversarial Examples with Keeping High Visual Performance. 2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT). :52–56.
The accuracy of image classification by convolutional neural networks is exceeding human ability and contributes to various fields. However, the improvement of image recognition technology deals a great blow to security systems that rely on images, such as CAPTCHA. In particular, since character string CAPTCHAs already add distortion and noise so that computers cannot read them, lowered human readability becomes a problem. Adversarial examples are a technique for producing an image that intentionally causes a machine learning image classifier to be wrong. The best feature of this technique is that when human beings compare the original image with the adversarial example, they cannot see the difference in appearance. However, adversarial examples created with conventional FGSM cannot completely fool strongly nonlinear networks like CNNs. Osadchy et al. applied adversarial examples to CAPTCHA and attempted to make a CNN misclassify them. However, they could not make the CNN misclassify character images. In this research, we propose a method to apply FGSM to character string CAPTCHAs and make a CNN misclassify them.
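As background, FGSM (the fast gradient sign method) perturbs an input by a small step `eps` in the direction of the sign of the loss gradient with respect to that input. The sketch below shows the update on a tiny logistic-regression classifier, used here as a stand-in for the CNN in the paper; the weights, the input, and the function name `fgsm_logistic` are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm_logistic(x, y, w, b, eps):
    """One FGSM step against a logistic-regression model with cross-entropy loss."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # For this model, the gradient of the loss with respect to x is (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    # Step by eps along the gradient sign, then clip to the valid pixel range [0, 1].
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]

w, b = [2.0, -3.0, 0.0], 0.0          # hypothetical trained weights
x, y = [0.5, 0.5, 0.5], 1             # input "pixels" and true label
x_adv = fgsm_logistic(x, y, w, b, eps=0.1)

p_clean = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
p_adv = sigmoid(sum(wi * xi for wi, xi in zip(w, x_adv)) + b)
print(x_adv, p_clean, p_adv)
```

No pixel moves by more than `eps`, yet the model's confidence in the true label drops; with a CNN the gradient would come from backpropagation rather than a closed form.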
Sain, Mangal, Kim, Ki-Hwan, Kang, Young-Jin, Lee, Hoon Jae.  2019.  An Improved Two Factor User Authentication Framework Based on CAPTCHA and Visual Secret Sharing. 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC). :171–175.

To prevent unauthorized access by adversaries, a strong authentication scheme is a vital security requirement in client-server inter-networking systems. These schemes must verify the legitimacy of such users in real-time environments and establish a dynamic session key for subsequent communication. Recently, T. H. Chen and J. C. Huang proposed a two-factor authentication framework, claiming that the scheme is secure against most existing attacks. However, we have shown that the Chen and Huang scheme has many critical weaknesses in real-time environments. The scheme is prone to man-in-the-middle attacks and information leakage attacks. Furthermore, the scheme does not provide two essential security services, namely user anonymity and session key establishment. In this paper, we present an enhanced user-participating authentication scheme which overcomes all the weaknesses of Chen et al.'s scheme and provides most of the essential security features.

Ababtain, Eman, Engels, Daniel.  2019.  Security of Gestures Based CAPTCHAs. 2019 International Conference on Computational Science and Computational Intelligence (CSCI). :120–126.
We present a security analysis of several gesture CAPTCHA challenges designed to operate on mobile devices. Mobile gesture CAPTCHA challenges use the accelerometer and gyroscope inputs from a mobile device to allow a human to solve a simple test by physically manipulating the device. We have evaluated the security of gesture CAPTCHAs on mobile devices and found them resistant to a range of common automated attacks. Our study has shown that using accelerometer and gyroscope readings as input to solve the CAPTCHA is difficult for malware but easy for a real user. Gesture CAPTCHA is effective in differentiating between humans and machines.
Kim, Donghoon, Sample, Luke.  2019.  Search Prevention with Captcha Against Web Indexing: A Proof of Concept. 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC). :219–224.
A website appears in search results based on web indexing conducted by a search engine bot (e.g., a web crawler). Some webpages should not be easy to find because they include sensitive information. There are several methods to prevent web crawlers from indexing pages in a search engine database. However, such webpages can still be indexed by malicious web crawlers. In this study, we explore a paradoxical new use of captchas for search prevention: captchas prevent web crawlers from indexing by converting sensitive words to captchas. We have implemented a web-based captcha conversion tool based on our search prevention algorithm. We also describe our proof of concept, a web-based chat application modified to use our algorithm. We conducted an experiment to evaluate our idea on the Google search engine with two versions of webpages, one containing plain text and another containing sensitive words converted to captchas. The experimental results show that the sensitive words on the captcha version of the webpages cannot be found by Google's search engine, while those on the plain text version can.
Shekhar, Heemany, Moh, Melody, Moh, Teng-Sheng.  2019.  Exploring Adversaries to Defend Audio CAPTCHA. 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). :1155–1161.
CAPTCHA is a web-based authentication method used by websites to distinguish between humans (valid users) and bots (attackers). Audio captcha is an accessible captcha meant for visually impaired users, such as color-blind, blind, and near-sighted users. First, this paper analyzes how secure current audio captchas are against attacks using machine learning (ML) and deep learning (DL) models. Each audio captcha is made up of five, seven, or ten random digits (0-9) spoken one after the other, with varying background noise throughout the length of the audio. If the ML or DL model correctly identifies all spoken digits in the correct order of occurrence in a single audio captcha, we consider that captcha broken and the attack successful. Throughout the paper, accuracy refers to the attack model's success at breaking audio captchas. The higher the attack accuracy, the less secure the audio captchas are. In our baseline experiments, we found that attack models could break audio captchas that had no background noise or medium background noise, with any number of spoken digits, with nearly 99% to 100% accuracy, whereas audio captchas with high background noise were relatively more secure, with an attack accuracy of 85%. Second, we propose that the concepts of adversarial example algorithms can be used to create a new kind of audio captcha that is more resilient to attacks. We found that even after retraining the models on the new adversarial audio data, the attack accuracy remained as low as 25% to 36%. Lastly, we explore the benefits of creating adversarial audio captchas through different algorithms such as the Basic Iterative Method (BIM) and DeepFool. We found that as long as the attacker has fewer than 45% of the samples from each kind of adversarial audio dataset, the defense will succeed in preventing attacks.
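The success criterion in this abstract (a captcha counts as broken only when every digit is recognized in the right order) can be written down as a short sketch. The function name `attack_accuracy` and the sample transcriptions are hypothetical and only illustrate the metric.

```python
def attack_accuracy(predictions, ground_truth):
    """Fraction of captchas broken: every digit correct and in the correct order."""
    if len(predictions) != len(ground_truth):
        raise ValueError("need one prediction per captcha")
    # Exact string equality enforces both the digit values and their order.
    broken = sum(1 for pred, truth in zip(predictions, ground_truth) if pred == truth)
    return broken / len(ground_truth)

# Hypothetical model outputs for three audio captchas (five, five, and seven digits).
predicted = ["12345", "54321", "1234567"]
actual = ["12345", "54312", "1234567"]
print(attack_accuracy(predicted, actual))
```

Under this definition a transcription with the right digits in the wrong order, such as the second sample, still counts as a failed attack.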
Zhang, Yang, Gao, Haichang, Pei, Ge, Luo, Sainan, Chang, Guoqin, Cheng, Nuo.  2019.  A Survey of Research on CAPTCHA Designing and Breaking Techniques. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :75–84.
The Internet plays an increasingly important role in people's lives, but it also brings security problems. CAPTCHA, which stands for Completely Automated Public Turing Test to Tell Computers and Humans Apart, has been widely used as a security mechanism. This paper outlines the scientific and technological progress in both the design and attack of CAPTCHAs related to these three CAPTCHA categories. It first presents a comprehensive survey of recent developments for each CAPTCHA type in terms of usability, robustness and their weaknesses and strengths. Second, it summarizes the attack methods for each category. In addition, the differences between the three CAPTCHA categories and the attack methods will also be discussed. Lastly, this paper provides suggestions for future research and proposes some problems worthy of further study.
Ababtain, Eman, Engels, Daniel.  2019.  Gestures Based CAPTCHAs the Use of Sensor Readings to Solve CAPTCHA Challenge on Smartphones. 2019 International Conference on Computational Science and Computational Intelligence (CSCI). :113–119.
We present novel CAPTCHA challenges based on user gestures designed for mobile devices. A gesture CAPTCHA challenge is a security mechanism to prevent malware from gaining access to network resources from a mobile device. Mobile devices contain a number of sensors that record the physical movement of the device. We used accelerometer and gyroscope data as inputs to our novel CAPTCHAs to capture the physical manipulation of the device. We conducted an experimental study on a group of people. We found that younger people are able to solve this type of CAPTCHA challenge successfully in a short amount of time, and that using accelerometer readings produces issues for some older people.
Shu, Yujin, Xu, Yongjin.  2019.  End-to-End Captcha Recognition Using Deep CNN-RNN Network. 2019 IEEE 3rd Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). :54–58.
With the development of the Internet, captcha technology has been widely used. Captcha technology (Completely Automated Public Turing test to tell Computers and Humans Apart) is used to distinguish between humans and machines. In this paper, an end-to-end deep CNN-RNN network model is constructed for captcha recognition, which recognizes 4-character text captchas. The CNN-RNN model first constructs a deep residual convolutional neural network based on the residual network structure to accurately extract features from the input captcha picture. Then a variant RNN, namely a two-layer GRU network, extracts the deep internal features of the captcha, and finally the output sequence is the 4-character captcha. The experimental results show that the end-to-end deep CNN-RNN network model performs well on different captcha datasets, achieving 99% accuracy. An experiment on a few-sample dataset with only 4000 training samples also shows an accuracy of 72.9% and a certain generalization ability.
Kansuwan, Thivanon, Chomsiri, Thawatchai.  2019.  Authentication Model using the Bundled CAPTCHA OTP Instead of Traditional Password. 2019 Joint International Conference on Digital Arts, Media and Technology with ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (ECTI DAMT-NCON). :5–8.
In this research, we present identity verification using the “Bundled CAPTCHA OTP” instead of the traditional password. It combines CAPTCHA and One Time Password (OTP) to reduce processing steps. Moreover, a user does not have to remember any password. The Bundled CAPTCHA OTP, which is a unique random parameter for each login, is used instead of a traditional password. We use e-mail to deliver the Bundled CAPTCHA OTP to the client because it is easier to apply than mobile phones, since mobile phones may crash, be lost, be changed frequently, and be more exposed to forcible access than e-mail. In this paper, we present a processing model of the proposed system and discuss the advantages and disadvantages of the model.
2019-04-01
Ye, Guixin, Tang, Zhanyong, Fang, Dingyi, Zhu, Zhanxing, Feng, Yansong, Xu, Pengfei, Chen, Xiaojiang, Wang, Zheng.  2018.  Yet Another Text Captcha Solver: A Generative Adversarial Network Based Approach. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. :332–348.
Although several attacks have been proposed, text-based CAPTCHAs are still widely used as a security mechanism. One of the reasons for the pervasive use of text captchas is that many of the prior attacks are scheme-specific and require a labor-intensive and time-consuming process to construct. This means that a change in the captcha security features, like a noisier background, can simply invalidate an earlier attack. This paper presents a generic, yet effective text captcha solver based on the generative adversarial network. Unlike prior machine-learning-based approaches that need a large volume of manually-labeled real captchas to learn an effective solver, our approach requires significantly fewer real captchas but yields much better performance. This is achieved by first learning a captcha synthesizer to automatically generate synthetic captchas to learn a base solver, and then fine-tuning the base solver on a small set of real captchas using transfer learning. We evaluate our approach by applying it to 33 captcha schemes, including 11 schemes that are currently being used by 32 of the top-50 popular websites including Microsoft, Wikipedia, eBay and Google. Our approach is the most capable attack on text captchas seen to date. It outperforms four state-of-the-art text-captcha solvers by not only delivering significantly higher accuracy on all testing schemes, but also successfully attacking schemes where others have zero chance. We show that our approach is highly efficient as it can solve a captcha within 0.05 second using a desktop GPU. We demonstrate that our attack is generally applicable because it can bypass the advanced security features employed by most modern text captcha schemes. We hope the results of our work can encourage the community to revisit the design and practical use of text captchas.
Wang, M., Yang, Y., Zhu, M., Liu, J..  2018.  CAPTCHA Identification Based on Convolution Neural Network. 2018 2nd IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). :364–368.
The CAPTCHA is an effective method commonly used in live interactive proofs on the Internet. The widely used CAPTCHAs are text-based schemes. In this paper, we document how we have broken such a text-based scheme used by a website CAPTCHA. We use a sliding window to segment 1001 CAPTCHAs into 5900 images containing single-character useful information, in a total of 25 categories. To let the convolutional neural network learn more image features, we augmented the data set to 129924 pictures. The data set is trained and tested in AlexNet and GoogLeNet, reaching accuracies of 87.45% and 98.92%, respectively. The experiment shows that optimized network parameters can raise the accuracy to 92.7% in AlexNet and 98.96% in GoogLeNet.
Stein, G., Peng, Q..  2018.  Low-Cost Breaking of a Unique Chinese Language CAPTCHA Using Curriculum Learning and Clustering. 2018 IEEE International Conference on Electro/Information Technology (EIT). :0595–0600.

Text-based CAPTCHAs are still commonly used to attempt to prevent automated access to web services. By displaying an image of distorted text, they attempt to create a challenge image that OCR software cannot interpret correctly, but to which a human user can easily determine the correct response. This work focuses on a CAPTCHA used by a popular Chinese language question-and-answer website and how resilient it is to modern machine learning methods. While the majority of text-based CAPTCHAs focus on transcription tasks, the CAPTCHA solved in this work is based on localization of inverted symbols in a distorted image. A convolutional neural network (CNN) was created to evaluate the likelihood of a region in the image belonging to an inverted character. It is used with a feature map and clustering to identify potential locations of inverted characters. Training of the CNN was performed using curriculum learning and compared to other potential training methods. The proposed method was able to determine the correct response in 95.2% of cases of a simulated CAPTCHA and 67.6% on a set of real CAPTCHAs. Potential methods to increase difficulty of the CAPTCHA and the success rate of the automated solver are considered.

Li, Z., Liao, Q..  2018.  CAPTCHA: Machine or Human Solvers? A Game-Theoretical Analysis. 2018 5th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/2018 4th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom). :18–23.
CAPTCHAs have become a ubiquitous defense used to protect open web resources from being exploited at scale. Traditionally, attackers have developed automatic programs known as CAPTCHA solvers to bypass the mechanism. With the presence of cheap labor in developing countries, hackers now have the option of using human solvers. In this research, we develop a game-theoretical framework to model the interactions between the defender and the attacker regarding the design and countermeasures of a CAPTCHA system. With the result of equilibrium analysis, both parties can determine the optimal allocation of software-based or human-based CAPTCHA solvers. Counterintuitively, instead of following the traditional wisdom of making CAPTCHAs harder and harder, it may be in the best interest of the defender to make CAPTCHAs easier. We further suggest a welfare-improving CAPTCHA business model involving decentralized cryptocurrency computation.
Liu, F., Li, Z., Li, X., Lv, T..  2018.  A Text-Based CAPTCHA Cracking System with Generative Adversarial Networks. 2018 IEEE International Symposium on Multimedia (ISM). :192–193.
As a multimedia security mechanism, CAPTCHAs are completely automated public Turing tests to tell computers and humans apart. Although cracking CAPTCHAs has been explored for many years, it is still a challenging problem in real practice. In this demo, we present a text-based CAPTCHA cracking system using convolutional neural networks (CNN). To solve the small sample problem, we propose to combine conditional deep convolutional generative adversarial networks (cDCGAN) and CNN, which greatly improves accuracy. In addition, we select multiple models with low Pearson correlation coefficients for a majority voting ensemble, which further improves the accuracy. The experimental results show that the system has great advantages and provides a new means of cracking CAPTCHAs.
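The ensemble step mentioned above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code: it assumes each model emits a full captcha string, and that model similarity is measured by the Pearson correlation of per-sample correctness vectors (1 for correct, 0 for wrong). The names `pearson` and `majority_vote` are hypothetical.

```python
import math
from collections import Counter

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom else 0.0  # constant vectors carry no signal

def majority_vote(predictions):
    """Return the most common prediction; ties go to the earliest-seen answer."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical correctness vectors of two models on four validation captchas.
model_a = [1, 0, 1, 0]
model_b = [0, 1, 0, 1]
print(pearson(model_a, model_b))              # strongly negative: errors do not overlap
print(majority_vote(["ab12", "ab12", "ab1z"]))
```

Models whose errors are weakly correlated tend to disagree on different samples, which is what lets a majority vote correct individual mistakes.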
Hu, Y., Chen, L., Cheng, J..  2018.  A CAPTCHA recognition technology based on deep learning. 2018 13th IEEE Conference on Industrial Electronics and Applications (ICIEA). :617–620.
Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA) is an important human-machine distinction technology that websites use to prevent attacks by automatic malicious programs. CAPTCHA recognition studies can find security breaches in CAPTCHAs and improve CAPTCHA technology; they can also advance license plate recognition and handwriting recognition technologies. This paper proposes a method based on a Convolutional Neural Network (CNN) model to identify CAPTCHAs while avoiding traditional image processing steps such as location and segmentation. An adaptive learning rate is introduced to accelerate the convergence of the model, and the problems of over-fitting and local optimal solutions are solved. A multi-task joint training model is used to improve the accuracy and generalization ability of model recognition. The experimental results show that the model has a good recognition effect on CAPTCHAs with background noise and character adhesion distortion.
Usuzaki, S., Aburada, K., Yamaba, H., Katayama, T., Mukunoki, M., Park, M., Okazaki, N..  2018.  Interactive Video CAPTCHA for Better Resistance to Automated Attack. 2018 Eleventh International Conference on Mobile Computing and Ubiquitous Network (ICMU). :1–2.
A “Completely Automated Public Turing Test to Tell Computers and Humans Apart” (CAPTCHA) is widely used by online services to prevent bots from automatically obtaining a large number of accounts. Interactive video CAPTCHAs that attempt to detect relay attacks by using the delay time due to communication relays have been proposed. However, these approaches remain insufficiently resistant to bots. We propose a CAPTCHA that combines resistance to automated and relay attacks. In our CAPTCHA, the user recognizes a moving object (the target object) from among a number of randomly appearing decoy objects and tracks the target with the mouse cursor. The user passes the test when they are able to track the target for a certain time. Since the target object moves quickly, the delay makes it difficult for a remote solver to break the CAPTCHA during a relay attack. It is also difficult for a bot to track the target using image processing because the target looks the same as the decoys. We evaluated our CAPTCHA's resistance to relay and automated attacks. Our results show that, if our CAPTCHA's parameters are set to suitable values, a relay attack cannot be mounted economically and the false acceptance rate for bots can be reduced to 0.01% without affecting the human success rate.
Rathour, N., Kaur, K., Bansal, S., Bhargava, C..  2018.  A Cross Correlation Approach for Breaking of Text CAPTCHA. 2018 International Conference on Intelligent Circuits and Systems (ICICS). :6–10.
Online web service providers generally protect themselves through CAPTCHA. A CAPTCHA is a type of challenge-response test used in computing as an attempt to ensure that the response is generated by a person. CAPTCHAs are mainly implemented as distorted text which the user must correctly transcribe. Numerous schemes have been proposed to date to prevent attacks by bots. This paper presents a cross-correlation-based approach to breaking the text CAPTCHAs of two well-known service providers: PayPal.com and IRCTC.co.in, India's most visited website. The procedure can be broken down into 3 tightly coupled tasks: pre-processing, segmentation, and classification. Pre-processing of the image is performed to remove all the background noise of the image. The noise in the CAPTCHA consists of unwanted “on” pixels in the background. Segmentation is performed by scanning the image for “on” pixels. Classification is performed by using the correlation values of the inputs and templates. Two types of templates have been used for classification. One is the standard templates, which give a 30% success rate, and the other is noisy templates made from the captcha images, with which the success rate achieved is 100%.
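The classification step, matching a segmented glyph against templates by correlation, can be illustrated with a minimal sketch. It assumes binary glyphs of equal size stored as nested lists and uses normalized cross-correlation as the score; the paper's exact correlation measure is not given in the abstract, and the templates and the names `ncc` and `classify` are hypothetical.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equally sized binary images."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    mean_a, mean_b = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in fa) *
                    sum((y - mean_b) ** 2 for y in fb))
    return num / den if den else 0.0

def classify(glyph, templates):
    """Return the label of the template that correlates best with the glyph."""
    return max(templates, key=lambda label: ncc(glyph, templates[label]))

# Hypothetical 3x3 templates; 1 marks an "on" pixel.
templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
noisy_i = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]   # an "I" with one stray "on" pixel
print(classify(noisy_i, templates))
```

Templates cut from real captcha images already contain the scheme's typical noise, which is one plausible reason the abstract reports a much higher success rate for them than for clean standard templates.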
Zhang, T., Zheng, H., Zhang, L..  2018.  Verification CAPTCHA Based on Deep Learning. 2018 37th Chinese Control Conference (CCC). :9056–9060.
At present, captchas are widely used on the Internet. This paper introduces a method of captcha recognition using convolutional neural networks. For captchas that are easy to segment, a simply trained convolutional neural network model is applied, with a network structure imitating the VGGNet model, and the accuracy exceeds 90%. For captchas that are more difficult to segment, an end-to-end approach trains on the captcha as a whole; in this way, the recognition rate for these captchas reaches about 85%.
2017-12-20
Azakami, T., Shibata, C., Uda, R..  2017.  Challenge to Impede Deep Learning against CAPTCHA with Ergonomic Design. 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC). 1:637–642.

We once tried to propose an unbreakable CAPTCHA, and we found that a time limit is effective in preventing computers from recognizing characters accurately, while computers can eventually recognize all text-based CAPTCHAs given unlimited time. One of the usual existing ways to prevent computers from recognizing characters is distortion, and adding noise is also effective. However, these kinds of prevention also make recognition of characters difficult for human beings. As a solution to these problems, our team proposed an effective text-based CAPTCHA algorithm with amodal completion. Our CAPTCHA imposes a large calculation cost on computers, while amodal completion helps human beings recognize characters instantly. Our CAPTCHA has evolved with aftereffects and combinations of complementary colors. We evaluated our CAPTCHA with deep learning, which is attracting the most attention since it is faster and more accurate than existing methods for recognition by computers. In this paper, we add jagged lines to the edges of characters, since edges are one of the most important parts for recognition in deep learning. We also evaluate how much the jagged lines reduce recognition by human beings and how much they prevent recognition by computers. We confirm the effect of our method on deep learning.

An, G., Yu, W..  2017.  CAPTCHA Recognition Algorithm Based on the Relative Shape Context and Point Pattern Matching. 2017 9th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA). :168–172.
Shape context descriptors use uneven grouping of distances and give a more extensive description of shape features, so the descriptor is invariant to deformation of the target contour point set. However, verification codes with twisted, adhering characters have more outliers and more serious noise, and under these conditions the invariance of the shape context degrades badly. To overcome this limitation, this article proposes a new algorithm based on the relative shape context and point pattern matching to recognize verification codes. Experiments on the CSDN site's verification code show that the recognition rate is higher than with the traditional shape context and that the response time is shorter.
Wang, Y., Huang, Y., Zheng, W., Zhou, Z., Liu, D., Lu, M..  2017.  Combining convolutional neural network and self-adaptive algorithm to defeat synthetic multi-digit text-based CAPTCHA. 2017 IEEE International Conference on Industrial Technology (ICIT). :980–985.
We always use CAPTCHA (Completely Automated Public Turing test to Tell Computers and Humans Apart) to prevent automated bots from entering data. Although there are various kinds of CAPTCHAs, the text-based scheme is still the most widely applied, because it is one of the most convenient and user-friendly ways for daily users [1]. Segmentation differs across types of CAPTCHAs, which means that segmentation is one of the bottlenecks in breaking CAPTCHAs. Once the characters can be split accurately, the problem becomes much easier to solve. Unfortunately, the best way to divide them is still case by case; there is no universal way to achieve it. In this paper, we present a novel algorithm that achieves state-of-the-art performance; moreover, we construct a new convolutional neural network as an add-on recognition part to stabilize the state-of-the-art performance of the whole CAPTCHA system. The CAPTCHA dataset we use is from the State Administration for Industry & Commerce of the People's Republic of China. This dataset contains 33 CAPTCHA entrances in total. In these experiments, we assume that each entrance is known. Results are provided showing how well our algorithms work on these CAPTCHAs.
Le, T. A., Baydin, A. G., Zinkov, R., Wood, F..  2017.  Using synthetic data to train neural networks is model-based reasoning. 2017 International Joint Conference on Neural Networks (IJCNN). :3514–3521.
We draw a formal connection between using synthetic training data to optimize neural network parameters and approximate, Bayesian, model-based reasoning. In particular, training a neural network using synthetic data can be viewed as learning a proposal distribution generator for approximate inference in the synthetic-data generative model. We demonstrate this connection in a recognition task where we develop a novel Captcha-breaking architecture and train it using synthetic data, demonstrating both state-of-the-art performance and a way of computing task-specific posterior uncertainty. Using a neural network trained this way, we also demonstrate successful breaking of real-world Captchas currently used by Facebook and Wikipedia. Reasoning from these empirical results and drawing connections with Bayesian modeling, we discuss the robustness of synthetic data results and suggest important considerations for ensuring good neural network generalization when training with synthetic data.