Article title

Multistage semi-automatic text image segmentation for training set acquisition in handwriting recognition

Authors
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
The paper proposes a complete method for segmenting a text image into images of individual characters. The ultimate aim of the segmentation is to prepare a set of correctly labeled character samples for training the character classifier used as a component of a handwritten word recognizer. The proposed method consists of two stages. In the first stage, the text image is divided into lines, and the lines are then segmented into words. This stage uses the known spelling of the text in the image, so that the number of segments obtained equals the number of words in the text. Information about the expected widths of the known words is also used. In the second stage, the images of known words are segmented into individual characters by a multiphase procedure. The procedure first segments each word independently, using estimates of character widths obtained by analyzing the complete text corpus. It then elaborates a global segmentation of the text that maximizes the similarity measures of the samples extracted for all alphabet characters; a genetic algorithm is applied in this phase. Finally, the segmentation variants represented by chromosomes in the terminal population of the genetic algorithm are locally refined, and the most dissimilar samples in the sets corresponding to individual alphabet characters are rejected. The experiments showed that handwriting recognizers trained on the training set obtained with the proposed method achieve accuracy close to that achievable with a training set prepared by a human expert.
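The word-level step of the first stage, which forces the number of segments to equal the known number of words, can be illustrated with a minimal sketch. It is not the author's algorithm: it assumes the line is given as a vertical projection profile (ink pixels per column) and simply keeps the widest blank gaps as word boundaries. It ignores the expected word widths that the abstract says are also used. The function name `split_line_into_words` and its parameters are hypothetical.

```python
from typing import List, Tuple


def split_line_into_words(column_ink: List[int], n_words: int) -> List[Tuple[int, int]]:
    """Split one text line into at most n_words column ranges (end exclusive).

    column_ink[x] is the number of ink pixels in column x of the line image.
    Assumption of this sketch: word boundaries are the widest blank gaps.
    """
    if n_words < 1:
        raise ValueError("n_words must be at least 1")

    inked = [x for x, ink in enumerate(column_ink) if ink > 0]
    if not inked:
        return []  # blank line: nothing to segment
    left, right = inked[0], inked[-1] + 1

    # Collect runs of blank columns lying strictly inside the inked area.
    gaps = []
    x = left
    while x < right:
        if column_ink[x] == 0:
            start = x
            while x < right and column_ink[x] == 0:
                x += 1
            gaps.append((start, x))
        else:
            x += 1

    # Keep the n_words - 1 widest gaps, then restore left-to-right order.
    widest = sorted(gaps, key=lambda g: g[1] - g[0], reverse=True)[: n_words - 1]
    widest.sort()

    # Cut the line at the selected gaps.
    words = []
    start = left
    for gap_start, gap_end in widest:
        words.append((start, gap_start))
        start = gap_end
    words.append((start, right))
    return words


profile = [0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0]
print(split_line_into_words(profile, 3))
```

If the line contains fewer gaps than `n_words - 1`, this sketch returns fewer segments than requested; a fuller treatment would have to split inside connected ink, which is where width estimates for the known words would come in.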
Journal
Year
Pages
107--126
Physical description
Bibliography: 19 items, charts.
Contributors
author
  • Wrocław University of Technology, Institute of Applied Informatics, Wyb. Wyspiańskiego 27, 50-370 Wrocław, Poland, jerzy.sas@pwr.wroc.pl
Bibliography
  • [1] Schomaker L.R.B., Teulings H., Unsupervised learning of prototype allographs in cursive script using invariant handwriting features, [in:] J.C. Simon, S. Impedovo, (eds.), From Pixels to Features III, Amsterdam, North-Holland, 1992, pp. 61-73.
  • [2] Maćkowiak J.L.A., Schomaker L.R.B., Vuurpijl L.G., Semi-automatic determination of allograph duration and position in on-line handwriting words based on the expected number of strokes, [in:] A.C. Downton, S. Impedovo, (eds.), Progress in Handwriting Recognition, London: World Scientific, 1997, pp. 69-74.
  • [3] Gilloux M., Sabourin R., Suen C.Y., El-Yacoubi A., An HMM-based approach for off-line unconstrained handwritten word modeling and recognition, IEEE Trans. on PAMI, Vol. 21, No. 8, 1999, pp. 752-760.
  • [4] Kavallieratou E., Balcan D.C., Popa M.F., Fakotakis N., Handwritten text localization in skewed documents, Proc. Int. Conf. Image Processing, Thessaloniki, Greece, 2001, pp. 1102-1105.
  • [5] Ching Y., Kim J.H., Kim K.K., Suen C.Y., An HMM-MLP hybrid model for cursive script recognition, Pattern Analysis & Applications, No. 3, 2000, pp. 314-324.
  • [6] Kołcz A., Alspector J., Augusteijn M., Carlson R., Viorel Popescu G., A line-oriented approach to word spotting in handwritten documents, Pattern Analysis & Applications, No. 3, 2000, pp. 153-168.
  • [7] Arica N., Yarman-Vural F.T., An overview of character recognition focused on off-line handwriting, IEEE Trans. on SMC, Vol. 31, No. 2, 2001, pp. 216-233.
  • [8] Marti U.V., Bunke H., Using a statistical language model to improve the performance of an HMM-based cursive handwriting recognition system, International Journal of Pattern Recognition and Artificial Intelligence, Vol. 15, 2001, pp. 65-90.
  • [9] Arica N., Yarman-Vural F.T., Optical character recognition for cursive handwriting, IEEE Trans. on PAMI, Vol. 24, No. 6, 2002, pp. 801-813.
  • [10] Park Y., An adaptive approach to offline handwritten word recognition, IEEE Trans. on PAMI, Vol. 24, No. 7, 2002, pp. 920-931.
  • [11] Connell S.D., Jain A.K., Writer adaptation for online handwriting recognition, IEEE Trans. on PAMI, Vol. 24, No. 3, 2002, pp. 329-346.
  • [12] Vinciarelli A., Bengio S., Writer adaptation techniques in HMM-based offline cursive script recognition, Pattern Recognition Letters, No. 23, 2002, pp. 905-916.
  • [13] Koerich A.L., Sabourin R., Suen C.Y., Lexicon-driven HMM decoding for large vocabulary handwriting recognition with multiple character models, International Journal on Document Analysis and Recognition, No. 6, 2003, pp. 126-144.
  • [14] Liu C., Nakashima K., Sako H., Fujisawa H., Handwritten digit recognition: benchmarking of state-of-the-art techniques, Pattern Recognition, Vol. 36, 2003, pp. 2271-2285.
  • [15] Vinciarelli A., Bengio S., Bunke H., Offline recognition of unconstrained handwritten texts using HMMs and statistical language models, IEEE Trans. on PAMI, Vol. 26, No. 6, 2004, pp. 709-720.
  • [16] Sas J., Handwriting recognition accuracy improvement by author identification, [in:] L. Rutkowski, R. Tadeusiewicz, L.A. Zadeh, (eds.), Artificial Intelligence and Soft Computing - ICAISC 2007, Springer (LNAI), 2006, pp. 682-691.
  • [17] Sas J., Luzyna M., Combining character classifiers using member classifier assessment, [in:] H. Kwaśnicka, M. Paprzycki, (eds.), Proc. 5th Int. Conf. Intelligent Systems Design and Application, IEEE Comp. Society, 2005, pp. 400-405.
  • [18] Sas J., Combined approach to semi-automatic handwritten word segmentation based on character width approximation, [in:] A. Grzech, (ed.), Proc. 16th Int. Conf. Systems Science, Wrocław, Poland, Vol. 2, 2007, pp. 438-447.
  • [19] Sadri J., Suen C.Y., Bui T.D., A genetic framework using contextual knowledge for segmentation and recognition of handwritten numeral strings, Pattern Recognition, Vol. 40, No. 3, 2007, pp. 898-919.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BAT5-0033-0055