

Article title

Acoustic model training, using Kaldi, for automatic whispery speech recognition

Conference
Federated Conference on Computer Science and Information Systems (9–12 September 2018, Poznań, Poland)
Publication languages
EN
Abstracts
EN
The article presents research on automatic whispery speech recognition. The main task was to find the dependencies between the number of triphone classes (the number of leaves in the decision tree) and the total number of Gaussian distributions, and thereby to determine the optimal values for which the quality of speech recognition is best. Moreover, it was determined how these dependencies differ between normal and whispery speech, which had not been done before; this is the innovative part of this work. Based on the performed experiments and the obtained results, one can conclude that the number of triphone classes (the number of leaves) for whispered speech should be significantly lower than for normal speech.
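The sweep described in the abstract can be sketched with Kaldi's standard triphone training script, `steps/train_deltas.sh`, whose first two positional arguments are the number of decision-tree leaves (triphone classes) and the total number of Gaussians. This is a minimal sketch only: the grid values and the data/alignment paths (`data/train`, `data/lang`, `exp/mono_ali`) are illustrative assumptions, not the paper's actual experimental settings.

```shell
#!/bin/sh
# Hypothetical grid sweep over the two tuned quantities from the abstract:
# the number of decision-tree leaves (triphone classes) and the total
# number of Gaussians. The grid values and paths below are illustrative.
for leaves in 500 1000 2000 4000; do
  for gauss in 10000 20000 40000; do
    # A real experiment would execute this Kaldi command and then decode
    # each resulting model; here the command is only printed.
    echo "steps/train_deltas.sh $leaves $gauss" \
         "data/train data/lang exp/mono_ali exp/tri_${leaves}_${gauss}"
  done
done
```

Each trained model would then be decoded on both a normal-speech and a whispered-speech test set, and the word error rates compared across the grid to locate the optimum for each speech mode.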
Pages
109–114
Physical description
Bibliography: 30 items; tables, formulas, charts.
Authors
author
  • Poznan University of Technology, Piotrowo street 3a, 60-965 Poznan, Poland
author
  • Faculty of Electrical Engineering, Institute of Control, Robotics and Information Engineering, Division of Control and Robotics
author
  • Faculty of Computing, Institute of Automation and Robotics, Division of Signal Processing and Electronic Systems
  • Faculty of Electrical Engineering, Institute of Control, Robotics and Information Engineering, Division of Control and Robotics
author
  • Faculty of Electrical Engineering, Institute of Control, Robotics and Information Engineering, Division of Control and Robotics
Bibliography
  • [1] H. R. Sharifzadeh, I. V. McLoughlin, and F. Ahmadi, “Reconstruction of normal sounding speech for laryngectomy patients through a modified CELP codec,” Biomedical Engineering, IEEE Transactions on, vol. 57, no. 10, pp. 2448–2458, 2010.
  • [2] H. F. Nijdam, A. A. Annyas, H. K. Schutte, and H. Leever, “A new prosthesis for voice rehabilitation after laryngectomy,” Archives of Otorhinolaryngology, vol. 237, no. 1, pp. 27–33, 1982.
  • [3] X. Huang, A. Acero, F. Alleva, M. Y. Hwang, L. Jiang, and M. Mahajan, “Microsoft Windows highly intelligent speech recognizer: Whisper,” in Acoustics, Speech, and Signal Processing, 1995 International Conference on (ICASSP-95), vol. 1, pp. 93–96.
  • [4] T. J. Raitio, M. J. Hunt, H. B. Richards, and M. Chinthakunta, “Digital assistant providing whispered speech,” U.S. Patent 15/266,932, December 14, 2017.
  • [5] D. T. Williamson, M. H. Draper, G. L. Calhoun, and T. P. Barry, “Commercial speech recognition technology in the military domain: Results of two recent research efforts,” International Journal of Speech Technology, vol. 8, no. 1, pp. 9–16, 2005.
  • [6] S. Pigeon, C. Swail, E. Geoffrois, G. Bruckner, D. Van Leeuwen, C. Teixeira, et al., Use of speech and language technology in military environments, Montreal, Canada, North Atlantic Treaty Organization, 2005.
  • [7] S. C. S. Jou, T. Schultz, and A. Waibel, “Whispery speech recognition using adapted articulatory features,” in ICASSP, March 2005, pp. 1009–1012.
  • [8] Q. Jin, S. C. S. Jou, and T. Schultz, “Whispering speaker identification,” in Multimedia and Expo, 2007 IEEE International Conference on, pp. 1027–1030.
  • [9] M. Akamine, and J. Ajmera, “Decision tree-based acoustic models for speech recognition,” EURASIP Journal on Audio, Speech, and Music Processing, vol. 2012, art. no. 10, p. 8, 2012.
  • [10] D. Povey, A. Ghoshal, G. Boulianne, L. Burget, et al., “The Kaldi speech recognition toolkit,” in IEEE 2011 workshop on automatic speech recognition and understanding, No. EPFL-CONF-192584, 2011.
  • [11] C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, and M. Mohri, “OpenFst: A general and efficient weighted finite-state transducer library,” in Implementation and Application of Automata, J. Holub and J. Ždárek, Eds. Berlin, Heidelberg: Springer, 2007, pp. 11–23.
  • [12] O. Platek, “Speech recognition using KALDI,” M.S. thesis, Inst. Form. Appl. Ling., Charles Univ., Prague, Czech Republic, 2014.
  • [13] A. Stolcke, “SRILM-an extensible language modeling toolkit,” in Proc. Intl. Conf. Spoken Language Processing (INTERSPEECH), Denver, Colorado, September 2002, pp. 901–904.
  • [14] M. Bisani, and H. Ney, “Joint-sequence models for grapheme-to-phoneme conversion,” Speech Communication, vol. 50, no. 5, pp. 434–451, 2008.
  • [15] I. H. Witten, and T. C. Bell, “The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression,” IEEE Transactions on Information Theory, vol. 37, no. 4, pp. 1085–1094, 1991.
  • [16] G. Demenko, M. Wypych, and E. Baranowska, “Implementation of grapheme-to-phoneme rules and extended SAMPA alphabet in Polish text-to-speech synthesis,” Speech and Language Technology, vol. 7, pp. 79–97, 2003.
  • [17] M. Wypych, E. Baranowska, and G. Demenko, “A grapheme-to-phoneme transcription algorithm based on the SAMPA alphabet extension for the Polish language,” in Phonetic Sciences, 15th International Congress of (ICPhS), Barcelona, August 2003, pp. 2601–2604.
  • [18] P. Kłosowski, “Improving speech processing based on phonetics and phonology of Polish language,” Przeglad Elektrotechniczny, vol. 89, no. 8, pp. 303–307, 2013.
  • [19] A. Karpov, K. Markov, I. Kipyatkova, D. Vazhenina, and A. Ronzhin, “Large vocabulary Russian speech recognition using syntacticostatistical language modeling,” Speech Communication, vol. 56, pp. 213–228, 2014.
  • [20] P. Kozierski, T. Sadalla, S. Drgas, and A. Dąbrowski, “Allophones in automatic whispery speech recognition,” in Methods and Models in Automation and Robotics (MMAR), 21st International Conference on, 2016, pp. 811–815. DOI: 10.1109/MMAR.2016.7575241
  • [21] F. Portet, M. Vacher, C. Golanski, C. Roux, and B. Meillon, “Design and evaluation of a smart home voice interface for the elderly: Acceptability and objection aspects,” Personal and Ubiquitous Computing, vol. 17, no. 1, pp. 127–144, 2013.
  • [22] K. Szostek, “Optimization of HMM models and their usage in speech recognition (in Polish),” Elektrotechnika i Elektronika, vol. 24, no. 2, pp. 172–182, 2005.
  • [23] B. Lewandowska-Tomaszczyk, M. Bańko, R. L. Górski, P. Pęzik, and A. Przepiórkowski, National corpus of Polish language (in Polish), Warszawa: Wydawnictwo Naukowe PWN, 2012.
  • [24] F. Cummins, M. Grimaldi, T. Leonard, and J. Simko, “The Chains corpus: Characterizing individual speakers,” in Proc. of SPECOM, vol. 6, 2006, pp. 431–435.
  • [25] T. Tran, S. Mariooryad, and C. Busso, “Audiovisual corpus to analyze whisper speech,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, May 2013, pp. 8101–8105.
  • [26] T. Ito, K. Takeda, and F. Itakura, “Analysis and recognition of whispered speech,” Speech Communication, vol. 45, no. 2, pp. 139–152, 2005.
  • [27] C. Huang, E. Chang, J. Zhou, and K. F. Lee, “Accent modeling based on pronunciation dictionary adaptation for large vocabulary Mandarin speech recognition,” in INTERSPEECH, October 2000, pp. 818–821.
  • [28] P. Kozierski, T. Sadalla, S. Drgas, A. Dąbrowski, and J. Zietkiewicz, “The impact of vocabulary size and language model order on the Polish whispery speech recognition,” in Methods and Models in Automation and Robotics (MMAR), 22nd International Conference on, 2017, pp. 616–621. DOI: 10.1109/MMAR.2017.8046899
  • [29] L. Besacier, E. Barnard, A. Karpov, and T. Schultz, “Automatic speech recognition for under-resourced languages: A survey,” Speech Communication, vol. 56, pp. 85–100, 2014.
  • [30] P. Kozierski, T. Sadalla, S. Drgas, A. Dąbrowski, and D. Horla, “Kaldi toolkit in Polish whispery speech recognition,” Przeglad Elektrotechniczny, vol. 92, no. 11, pp. 301–304, 2016. DOI: 10.15199/48.2016.11.70
Notes
1. Track 4: Software Systems Development & Applications
2. Technical Session: 5th Doctoral Symposium on Recent Advances in Information Technology
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-2daec44b-0005-4b72-938e-953eb6ab3af2