Abstract
In the modern world, fast and efficient processing of non-digital (handwritten or typed) texts is a task of extreme importance. Like many other fields, optical character recognition (OCR) benefits from machine learning (ML), which makes it possible to develop effective and accurate methods. To perform well, a machine learning algorithm requires a large amount of data. A large database of handwritten characters prepared by the National Institute of Standards and Technology (NIST), USA, is currently available for training ML models. However, handwriting styles in the US and Poland differ significantly. This, along with the absence of Polish diacritical marks, makes the NIST database less useful for developing an OCR model for the Polish language. To the best of the authors' knowledge, no database with samples of Polish handwriting exists. The present research aims to fill this gap by gathering and preparing an extensive database of Polish handwritten characters. The paper presents the very first database of Polish handwriting samples. The database is far larger than any of the datasets used in previous attempts to implement OCR for Polish handwriting. It is also the first fully publicly accessible database of Polish handwriting of this scale. The same method and the tools developed can be used to build handwritten character databases for other languages.
Pages
30–38
Physical description
Bibliography: 22 items, figures.
Authors
author
- Department of Computer Science, Lublin University of Technology, ul. Nadbystrzycka 36B, 20-618 Lublin
author
- Department of Computer Science, Lublin University of Technology, ul. Nadbystrzycka 36B, 20-618 Lublin
author
- Department of Computer Science, Lublin University of Technology, ul. Nadbystrzycka 36B, 20-618 Lublin
Notes
Record prepared with funding from MNiSW, agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) - module: Popularisation of science and promotion of sport (2020).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-b97bdeca-28ef-4915-a601-b36a819d9ab3