Position-encoding convolutional network to solving connected text CAPTCHA

Qing, Ke; Zhang, Rong

doi:10.2478/jaiscr-2022-0008

Abstract

Text-based CAPTCHA is a convenient and effective security mechanism that has been widely deployed across websites. Efficient end-to-end scene text recognition models, which combine a CNN with an attention-based RNN, show limited performance in solving text-based CAPTCHAs. Unlike the text in street view images and documents, the character sequence in a CAPTCHA is non-semantic. The RNN therefore loses its ability to learn semantic context and only implicitly encodes the relative position of the extracted features. Meanwhile, the security features, which hinder character segmentation and recognition, greatly increase the complexity of CAPTCHAs. As a result, the performance of such models is sensitive to the CAPTCHA scheme. In this paper, we analyze the properties of text-based CAPTCHAs and accordingly treat solving them as a highly position-relative character sequence recognition task. We propose a network named PosConv that leverages the position information in the character sequence without an RNN. PosConv uses a novel padding strategy and a modified convolution to explicitly encode the relative position into the local features of characters. This mechanism makes the features extracted from CAPTCHAs more informative and robust. We validate PosConv on six text-based CAPTCHA schemes, where it achieves state-of-the-art or competitive recognition accuracy with significantly fewer parameters and faster convergence.

Keywords

deep neural network; position encoding; CNN; text-based CAPTCHA recognition; character recognition

Publisher

University of Social Sciences

Journal

Journal of Artificial Intelligence and Soft Computing Research

Year

2022

Volume

Vol. 12, No. 2

Pages

121–133

Physical description

Bibliography: 23 items, figures.

Contributors

author

Qing Ke

zrong@ustc.edu.cn

Department of Electronic Engineering and Information Science, University of Science and Technology of China, No. 443 Huangshan Rd, Hefei, Anhui Province, 230027, P. R. China

author

Zhang Rong

Department of Electronic Engineering and Information Science, University of Science and Technology of China, No. 443 Huangshan Rd, Hefei, Anhui Province, 230027, P. R. China

Bibliography

[1] Darko Brodic, Alessia Amelio, Nadeem Ahmad, and Syed Khuram Shahzad. Usability analysis of the image and interactive captcha via prediction of the response time. In International Workshop on Multi-disciplinary Trends in Artificial Intelligence, pages 252–265. Springer, 2017.
[2] Elie Bursztein, Jonathan Aigrain, Angelika Moscicki, and John C Mitchell. The end is nigh: Generic solving of text-based captchas. In 8th USENIX Workshop on Offensive Technologies (WOOT 14), 2014.
[3] Elie Bursztein, Matthieu Martin, and John Mitchell. Text-based captcha strengths and weaknesses. In Proceedings of the 18th ACM conference on Computer and communications security, pages 125–138, 2011.
[4] Kumar Chellapilla, Kevin Larson, Patrice Y Simard, and Mary Czerwinski. Computers beat humans at single character recognition in reading based human interaction proofs (hips). In Conference on Email and Anti-Spam (CEAS), pages 1–8, 2005.
[5] Chen Duan, Rong Zhang, and Ke Qing. Feature refine network for text-based captcha recognition. In International Conference on Image and Graphics, pages 64–73. Springer, 2019.
[6] Ian J. Goodfellow, Yaroslav Bulatov, Julian Ibarz, Sacha Arnoud, and Vinay Shet. Multi-digit number recognition from street view imagery using deep convolutional neural networks. arXiv:1312.6082, 2014.
[7] Ahmad Salah El Ahmad, Jeff Yan, and Lindsay Marshall. The robustness of a new captcha. In Proceedings of the Third European Workshop on System Security, pages 36–41, 2010.
[8] Haichang Gao, Mengyun Tang, Yi Liu, Ping Zhang, and Xiyang Liu. Research on the security of microsoft’s two-layer captcha. IEEE Transactions on Information Forensics and Security, 12(7):1671–1685, 2017.
[9] Haichang Gao, Jeff Yan, Fang Cao, Zhengya Zhang, Lei Lei, Mengyun Tang, Ping Zhang, Xin Zhou, Xuqin Wang, and Jiawei Li. A simple generic attack on text captchas. In The Network and Distributed System Security Symposium (NDSS), pages 1–14, 2016.
[10] Md Amirul Islam, Sen Jia, and Neil D. B. Bruce. How much position information do convolutional neural networks encode?, 2020.
[11] Rosanne Liu, Joel Lehman, Piero Molino, Felipe Petroski Such, Eric Frank, Alex Sergeev, and Jason Yosinski. An intriguing failing of convolutional neural networks and the coordconv solution, 2018.
[12] Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, and Xiang Bai. Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In Proceedings of the European Conference on Computer Vision (ECCV), pages 67–83, 2018.
[13] Rabih Al Nachar, Elie Inaty, Patrick J Bonnin, and Yasser Alayli. Breaking down captcha using edge corners and fuzzy logic segmentation/recognition technique. Security and Communication Networks, 8(18):3995–4012, 2015.
[14] Liang Qiao, Ying Chen, Zhanzhan Cheng, Yunlu Xu, Yi Niu, Shiliang Pu, and Fei Wu. Mango: A mask attention guided one-stage scene text spotter, 2020.
[15] Sara Sabour, Nicholas Frosst, and Geoffrey E Hinton. Dynamic routing between capsules, 2017.
[16] Mengyun Tang, Haichang Gao, Yang Zhang, Yi Liu, Ping Zhang, and Ping Wang. Research on deep learning techniques in breaking text-based captchas and designing image-based captcha. IEEE Transactions on Information Forensics and Security, 13(10):2522–2537, 2018.
[17] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008, 2017.
[18] Luis Von Ahn, Manuel Blum, and John Langford. Telling humans and computers apart automatically. Communications of the ACM, 47(2):56–60, 2004.
[19] Zbigniew Wojna, Alexander N Gorban, Dar-Shyang Lee, Kevin Murphy, Qian Yu, Yeqing Li, and Julian Ibarz. Attention-based extraction of structured information from street view imagery. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), volume 1, pages 844–850. IEEE, 2017.
[20] Jeff Yan and Ahmad Salah El Ahmad. A low-cost attack on a microsoft captcha. In Proceedings of the 15th ACM conference on Computer and communications security, pages 543–554, 2008.
[21] Guixin Ye, Zhanyong Tang, Dingyi Fang, Zhanxing Zhu, Yansong Feng, Pengfei Xu, Xiaojiang Chen, Jungong Han, and Zheng Wang. Using generative adversarial networks to break and protect text captchas. ACM Transactions on Privacy and Security (TOPS), 23(2):1–29, 2020.
[22] Guixin Ye, Zhanyong Tang, Dingyi Fang, Zhanxing Zhu, Yansong Feng, Pengfei Xu, Xiaojiang Chen, and Zheng Wang. Yet another text captcha solver: A generative adversarial network based approach. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 332–348, 2018.
[23] Yang Zi, Haichang Gao, Zhouhang Cheng, and Yi Liu. An end-to-end attack on text captchas. IEEE Transactions on Information Forensics and Security, 15:753–766, 2019.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-bc49af8e-1c73-4d18-a574-c05fc4e0d42c