Handwritten text recognition using incomplete probabilistic lexicon and character language model
This paper presents a novel two-level handwritten word recognizer that uses language models at the word and character levels. The word-level unigram language model (here called the probabilistic lexicon) contains the words that appear most frequently in the domain of the texts being recognized, together with their prior probabilities. The probabilistic lexicon need not be complete: the recognizer is also expected to recognize words that do not belong to it. The proposed recognizer combines two simpler soft word classifiers. The first uses the incomplete probabilistic lexicon and consequently recognizes only words from the lexicon. The second applies a character-level language model to support recognition, so it is not constrained by the lexicon and thereby compensates for the lexicon's incompleteness. The character-level language model contains conditional probabilities of character succession and precedence. It is used to create two Hidden Markov Models (HMMs): the first analyses the word from left to right, and the second analyses it in reverse order. The Viterbi procedure, which finds the set of most probable character sequences in an HMM, serves as the soft word recognition algorithm. The soft recognition results provided by all component classifiers are combined to yield the final word recognition. The method was examined experimentally in an application to the recognition of handwritten medical texts. The experimental results are presented and discussed.
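The combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the three-letter alphabet, a word already segmented into characters with one soft support vector per position, the hypothetical names `viterbi`, `path_score` and `recognize`, and the combination rule (a plain average of normalized scores), which the abstract does not specify. Only the single best Viterbi path is kept, whereas the abstract refers to a set of most probable sequences.

```python
from math import prod

ALPHABET = "abc"  # toy alphabet, an assumption for this sketch


def path_score(word, supports, start, trans):
    """Score of a character sequence under a first-order HMM.

    supports[i][c] is the character classifier's support for c at position i,
    start[c] is the probability of c at the first analysed position, and
    trans[(prev, cur)] is the conditional probability of cur given prev.
    """
    score = start[word[0]] * supports[0][word[0]]
    for i in range(1, len(word)):
        score *= trans[(word[i - 1], word[i])] * supports[i][word[i]]
    return score


def viterbi(supports, start, trans):
    """Return the single most probable character sequence (top-1 only)."""
    best = {c: (start[c] * supports[0][c], c) for c in ALPHABET}
    for sup in supports[1:]:
        step = {}
        for cur in ALPHABET:
            score, path = max(
                (best[p][0] * trans[(p, cur)], best[p][1]) for p in ALPHABET
            )
            step[cur] = (score * sup[cur], path + cur)
        best = step
    return max(best.values())[1]


def normalize(scores):
    """Rescale scores so that they sum to 1 over the candidate set."""
    total = sum(scores.values())
    return {w: (s / total if total > 0 else 0.0) for w, s in scores.items()}


def recognize(supports, lexicon, forward, backward):
    """Combine a lexicon classifier with forward and backward HMM classifiers.

    forward and backward are (start, trans) pairs; the backward model is
    applied to the reversed word, so its trans holds P(char | following char).
    """
    n = len(supports)
    reversed_supports = supports[::-1]

    # Candidates: lexicon words of matching length plus both Viterbi results.
    candidates = {w for w in lexicon if len(w) == n}
    candidates.add(viterbi(supports, *forward))
    candidates.add(viterbi(reversed_supports, *backward)[::-1])

    # Lexicon classifier: prior times character supports; 0 outside the lexicon.
    lex = normalize({
        w: lexicon.get(w, 0.0) * prod(s[ch] for s, ch in zip(supports, w))
        for w in candidates
    })
    fwd = normalize({w: path_score(w, supports, *forward) for w in candidates})
    bwd = normalize({
        w: path_score(w[::-1], reversed_supports, *backward) for w in candidates
    })

    # Assumed combination rule: plain average of the three normalized scores.
    combined = {w: (lex[w] + fwd[w] + bwd[w]) / 3 for w in candidates}
    return max(combined, key=combined.get), combined


def supports_for(word, hit=0.8, miss=0.1):
    """Toy character supports that favour the letters of `word`."""
    return [{c: (hit if c == ch else miss) for c in ALPHABET} for ch in word]


# Toy models: uniform character statistics and a two-word incomplete lexicon.
uniform = (
    {c: 1 / 3 for c in ALPHABET},
    {(p, c): 1 / 3 for p in ALPHABET for c in ALPHABET},
)
lexicon = {"cab": 0.6, "abc": 0.4}

best_word, scores = recognize(supports_for("bcc"), lexicon, uniform, uniform)
print(best_word)
```

In this toy setup the character supports strongly favour "bcc", which is absent from the lexicon. The lexicon classifier gives it zero support, but the two HMM-based classifiers outweigh that in the average, so the out-of-lexicon word is still returned. Plain products are used for readability; a practical version would presumably work with log-probabilities to avoid underflow on longer words.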
Bibliography: 13 items, figures.
-  EL-YACOUBI A., GILLOUX M., KOERICH A.L., SABOURIN R., SUEN C.Y., An HMM-based Approach for Off-line Unconstrained Handwritten Word Modeling and Recognition, IEEE Trans. on PAMI, Vol. 21, 1999, 752-760.
-  XIAO X., LEEDHAM G., Knowledge-based English Cursive Script Segmentation, Pattern Recognition Letters, No. 21, 2000, 945-954.
-  KUNCHEVA L., Combining Classifiers: Soft Computing Solutions, [in:] Pattern Recognition: from Classical to Modern Approaches, S. Pal, A. Pal [eds.], World Scientific, 2001, 421-451.
-  MARTI U.V., BUNKE H., Using a Statistical Language Model to Improve the Performance of an HMM-Based Cursive Handwriting Recognition System, Int. J. of Pattern Recognition and Artificial Intelligence, Vol. 15, 2001, 65-90.
-  PARK Y., GOVINDARAJU V., Use of Adaptive Segmentation in Handwritten Phrase Recognition, Pattern Recognition, Vol. 35, 2002, 245-252.
-  LU Y., TAN C., Combination of Multiple Classifiers Using Probabilistic Dictionary and its Application to Postcode Generation, Pattern Recognition, Vol. 35, 2002, 2823-2832.
-  LIU C., NAKASHIMA K., SAKO H., FUJISAWA H., Handwritten Digit Recognition: Benchmarking of State-of-the-Art Techniques, Pattern Recognition, Vol. 36, 2003, 2271-2285.
-  KOERICH A.L., SABOURIN R., SUEN C.Y., Fast Two-Level HMM Decoding Algorithm for Large Vocabulary Handwriting Recognition, Proc. of 9th Workshop on Frontiers in Handwriting Recognition, Hitachi Central Research Laboratory, Tokyo 2004.
-  VINCIARELLI A., BENGIO S., BUNKE H., Offline Recognition of Unconstrained Handwritten Text Using HMMs and Statistical Language Models, IEEE Trans. on PAMI, Vol. 26, 2004, 709-720.
-  SAS J., LUZYNA M., Combining Character Classifier Using Member Classifiers Assessment, Proc. of 5th Int. Conf. on Intelligent Systems Design and Applications, ISDA 2005, IEEE Press, 2005, 400-405.
-  KOERICH A.L., SABOURIN R., SUEN C.Y., Recognition and Verification of Handwritten Words, IEEE Trans. on PAMI, Vol. 27, 2005, 1509-1522.
-  KURZYŃSKI M., SAS J., Application of Three-level Handprinted Document Recognition in Medical Information systems, Proc. of 6th Symposium ISBMDA 2006, LNBI, Springer-Verlag, 2005, 1-12.
-  PIASECKI M., SAS J., Application of Syntactic Properties to Three-level Recognition of Polish Handwritten Medical Texts, Proc. of ACM Symposium on Document Engineering, 10-13 October 2006, ACM Press, 2006.