Building compact language models for medical speech recognition in mobile devices with limited amount of memory

Sas, J.

Abstract

The article presents a method for building a compact language model for speech recognition on devices with a limited amount of memory. The most widely used word-based bigram language models allow for highly accurate speech recognition but require a large amount of memory, mainly because of the large number of word bigrams. The proposed method ranks bigrams according to their importance for speech recognition and replaces the explicitly estimated probabilities of the less important bigrams with probabilities derived from a class-based model. The class-based model is created by assigning the words appearing in the corpus to classes that correspond to their syntactic properties. The classes represent various combinations of part of speech and inflectional features such as number, case, tense, and person. To further reduce the memory needed to store the class-based model, the number of part-of-speech classes is reduced by merging classes that appear in stochastically similar contexts in the corpus. Experiments carried out on selected domains of medical speech show that the method reduces the model size by 75% without a significant loss of speech recognition accuracy.
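The idea of keeping only the top-ranked word bigrams and deriving the remaining probabilities from a class-based model can be illustrated with a minimal Python sketch. This is not the author's implementation: the abstract does not state the ranking criterion, so the sketch assumes ranking by raw bigram count; the function name `build_hybrid_bigram_model`, the `keep_fraction` parameter, the toy corpus, and the word-to-class mapping are all hypothetical. Unsmoothed maximum-likelihood estimates are used, and a per-history backoff weight keeps each conditional distribution normalized.

```python
from collections import Counter


def build_hybrid_bigram_model(sentences, word_class, keep_fraction):
    """Keep the top-ranked word bigrams explicitly; back off to a
    class-based bigram estimate for all other word pairs."""
    word_count = Counter()          # occurrences of each word
    class_count = Counter()         # occurrences of each class
    hist_count = Counter()          # occurrences of a word as bigram history
    bigram_count = Counter()        # word bigram counts
    class_hist_count = Counter()    # occurrences of a class as bigram history
    class_bigram_count = Counter()  # class bigram counts

    for sent in sentences:
        for w in sent:
            word_count[w] += 1
            class_count[word_class[w]] += 1
        for w1, w2 in zip(sent, sent[1:]):
            c1, c2 = word_class[w1], word_class[w2]
            hist_count[w1] += 1
            bigram_count[(w1, w2)] += 1
            class_hist_count[c1] += 1
            class_bigram_count[(c1, c2)] += 1

    # ASSUMPTION: importance is approximated by raw bigram count
    # (ties broken alphabetically); the paper's criterion may differ.
    ranked = sorted(bigram_count, key=lambda b: (-bigram_count[b], b))
    kept = set(ranked[:max(1, int(len(ranked) * keep_fraction))])

    def p_word(w1, w2):
        # Explicit maximum-likelihood word bigram probability.
        return bigram_count[(w1, w2)] / hist_count[w1]

    def p_class(w1, w2):
        # Class-based estimate: P(c2 | c1) * P(w2 | c2).
        c1, c2 = word_class[w1], word_class[w2]
        if class_hist_count[c1] == 0 or class_count[c2] == 0:
            return 0.0
        return (class_bigram_count[(c1, c2)] / class_hist_count[c1]
                * word_count[w2] / class_count[c2])

    def backoff_weight(w1):
        # Scale the class-based mass so that P(. | w1) sums to one.
        followers = [w2 for (h, w2) in kept if h == w1]
        left_word = max(0.0, 1.0 - sum(p_word(w1, w2) for w2 in followers))
        left_class = 1.0 - sum(p_class(w1, w2) for w2 in followers)
        return left_word / left_class if left_class > 1e-12 else 0.0

    def prob(w1, w2):
        if (w1, w2) in kept:
            return p_word(w1, w2)
        return backoff_weight(w1) * p_class(w1, w2)

    return prob, kept


# Toy corpus and hand-made classes, for illustration only.
corpus = [
    ["patient", "has", "fever"],
    ["patient", "has", "cough"],
    ["patient", "has", "fever"],
    ["child", "has", "cough"],
]
classes = {"patient": "N", "child": "N", "has": "V", "fever": "S", "cough": "S"}

prob, kept = build_hybrid_bigram_model(corpus, classes, keep_fraction=0.5)
print(sorted(kept))          # bigrams stored explicitly
print(prob("child", "has"))  # pruned bigram, estimated through its classes
```

In this sketch only the retained bigrams and the much smaller class-level tables need to be stored, which is the source of the memory saving. The value `keep_fraction=0.5` is arbitrary and is not related to the 75% reduction reported in the abstract.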

Keywords

automatic speech recognition, medical information systems, language modeling

Publisher

University of Silesia, Institute of Informatics, Computer Systems Department

Journal

Journal of Medical Informatics & Technologies

Year

2012

Volume

Vol. 20

Pages

111-119

Physical description

Bibliography: 19 items, figures, tables.

Contributors

author

Sas J.

jerzy.sas@pwr.wroc.pl

Institute of Informatics, Wroclaw University of Technology, 50-370 Wroclaw, ul. Wyb. Wyspianskiego 27

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-PWA4-0027-0013