Article title
Title variants
Review of methods used for the text line extraction in complex hand-written documents
Publication languages
The paper presents computer-based methods for detecting text lines in handwritten documents. It defines the problem of automatically identifying the author of a text from the features of his or her handwriting. Because this problem is complex, the general methodology of text processing is described, operating on the digital image of a document that has been scanned or photographed. The main groups of text line detection algorithms are discussed, together with their general idea, advantages and drawbacks. The authors' own algorithm, based on a modified Hough transform, is also presented; its effectiveness in analysing difficult medieval Latin manuscripts is higher than that of the other approaches. Its accuracy is demonstrated in an experiment with selected archival documents.
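The record does not describe the modified Hough transform itself, so the following is only an orientation sketch of the classic Hough voting idea applied to text lines, not the authors' algorithm. It assumes each piece of ink has already been reduced to a point (for example, the centroid of a connected component, an assumed preprocessing step). Every point votes for all near-horizontal lines through it, parametrised as rho = x*cos(theta) + y*sin(theta); the strongest (theta, rho) cell is accepted as a text line, its points are removed, and voting repeats. The function name `hough_text_lines` and all parameters (angle range, angle step, rho bin width, minimum votes) are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def hough_text_lines(points, angle_range_deg=5.0, angle_step_deg=1.0,
                     rho_step=5.0, min_votes=3):
    """Hypothetical sketch: group 2-D points (x, y) into near-horizontal lines.

    A line is parametrised by its normal angle theta and offset
    rho = x*cos(theta) + y*sin(theta). Only angles within
    90 +/- angle_range_deg degrees are tried, i.e. lines tilted at most
    angle_range_deg from horizontal. Returns a list of
    (theta_deg, rho, sorted point indices), strongest line first.
    """
    n = int(round(2 * angle_range_deg / angle_step_deg))
    angles_deg = [90.0 - angle_range_deg + k * angle_step_deg for k in range(n + 1)]
    center = n // 2  # index of the middle angle (90 degrees with the defaults)
    remaining = set(range(len(points)))
    lines = []
    while remaining:
        # Each remaining point votes for every (angle, rho-bin) cell it lies on;
        # a cell stores the set of points that voted for it.
        acc = defaultdict(set)
        for i in remaining:
            x, y = points[i]
            for ti, deg in enumerate(angles_deg):
                th = math.radians(deg)
                rho = x * math.cos(th) + y * math.sin(th)
                acc[(ti, round(rho / rho_step))].add(i)
        # Strongest cell wins; ties go to the angle closest to the middle one.
        (ti, rbin), members = max(
            acc.items(),
            key=lambda kv: (len(kv[1]), -abs(kv[0][0] - center)))
        if len(members) < min_votes:
            break  # no remaining cell is supported by enough points
        lines.append((angles_deg[ti], rbin * rho_step, sorted(members)))
        remaining -= members  # each point belongs to at most one line
    return lines


pts = ([(x, 20) for x in range(0, 121, 30)]      # text line near y = 20
       + [(x, 60) for x in range(0, 121, 30)]    # text line near y = 60
       + [(50, 200)])                            # isolated mark (noise)
for theta, rho, members in hough_text_lines(pts):
    print(f"theta={theta:.0f} deg, rho={rho:.0f}, points={members}")
# theta=90 deg, rho=20, points=[0, 1, 2, 3, 4]
# theta=90 deg, rho=60, points=[5, 6, 7, 8, 9]
```

Removing the points of each accepted line before voting again keeps one text line from being reported twice, and the isolated mark is left unassigned because it never reaches `min_votes`. Real manuscripts also need handling of ascenders, descenders, touching lines and lines split across rho bins, which this sketch ignores.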
Physical description
Bibliography: 16 items, illustrations, figures
- Politechnika Warszawska (Warsaw University of Technology), Instytut Radioelektroniki (Institute of Radioelectronics)
- [1] Evangelistarium. Available: http://polona.pl/item/14637590/14/ (accessed 10.05.2015).
- [2] Miscellanea theologica. Available: http://polona.pl/item/12909419/56/ (accessed 10.05.2015).
- [3] J. Szymański, Nauki pomocnicze historii. Wydawnictwo Naukowe PWN, 2001.
- [4] E. Bruzzone and M. C. Coffetti, “An algorithm for extracting cursive text lines,” in Proc. Fifth International Conference on Document Analysis and Recognition (ICDAR '99), 1999, pp. 749–752.
- [5] M. Arivazhagan, H. Srinivasan and S. Srihari, “A statistical approach to line segmentation in handwritten documents,” in Electronic Imaging 2007, 2007, pp. 65000T–65000T-11.
- [6] L. Likforman-Sulem, A. Hanimyan and C. Faure, “A Hough based algorithm for extracting text lines in handwritten documents,” in Proc. Third International Conference on Document Analysis and Recognition, 1995, vol. 2, pp. 774–777.
- [7] Y. Pu and Z. Shi, “A natural learning algorithm based on Hough transform for text lines extraction in handwritten documents,” Series in Machine Perception and Artificial Intelligence, vol. 34, pp. 141–152, 2000.
- [8] G. Louloudis, B. Gatos, I. Pratikakis and K. Halatsis, “A block-based Hough transform mapping for text line detection in handwritten documents,” presented at the Tenth International Workshop on Frontiers in Handwriting Recognition, 2006.
- [9] J. L. Pach and P. Bilski, “A robust text line detection in complex handwritten documents,” in Proc. 8th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, Warsaw, Poland, 2015, pp. 271–275.
- [10] Z. Shi and V. Govindaraju, “Line separation for complex document images using fuzzy runlength,” 2004, p. 306.
- [11] B. Gatos, N. Stamatopoulos and G. Louloudis, “ICDAR2009 handwriting segmentation contest,” International Journal on Document Analysis and Recognition (IJDAR), vol. 14, pp. 25–33, 2011.
- [12] C. Weliwitage, A. Harvey and A. Jennings, “Handwritten document offline text line segmentation,” in Proc. Digital Image Computing: Techniques and Applications (DICTA '05), 2005, p. 27.
- [13] H. I. Koo and N. I. Cho, “Text-line extraction in handwritten Chinese documents based on an energy minimization framework,” IEEE Transactions on Image Processing, vol. 21, pp. 1169–1175, 2012.
- [14] Y. Tang, X. Wu and W. Bu, “Text line segmentation based on matched filtering and top-down grouping for handwritten documents,” in Proc. 11th IAPR International Workshop on Document Analysis Systems (DAS), 2014, pp. 365–369.
- [15] A. Alaei, P. Nagabhushan and U. Pal, “A new text-line alignment approach based on piece-wise painting algorithm for handwritten documents,” in Proc. International Conference on Document Analysis and Recognition (ICDAR), 2011, pp. 324–328.
- [16] H. I. Koo and N. I. Cho, “State estimation in a document image and its application in text block identification and text line extraction,” in Computer Vision – ECCV 2010, Springer, 2010, pp. 421–434.