

Article title

Acoustic model training, using Kaldi, for automatic whispery speech recognition

Conference
Federated Conference on Computer Science and Information Systems (9–12 September 2018, Poznań, Poland)
Publication languages
EN
Abstracts
EN
The article presents research on automatic whispery speech recognition. The main task was to find the dependencies between the number of triphone classes (the number of leaves in the decision tree) and the total number of Gaussian distributions, and thereby to determine the optimal values for which the quality of speech recognition is best. Moreover, it was determined how these dependencies differ between normal and whispery speech, which had not been done before; this is the innovative part of this work. Based on the performed experiments and the obtained results, one can conclude that the number of triphone classes (the number of leaves) for whispered speech should be significantly lower than for normal speech.
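The sweep described in the abstract can be sketched with Kaldi's standard triphone training script, `steps/train_deltas.sh`, whose first two positional arguments are the number of decision-tree leaves (triphone classes) and the total number of Gaussians. This is a minimal sketch only: the grid values and the data/alignment paths (`data/train`, `data/lang`, `exp/mono_ali`) are illustrative assumptions, not the paper's actual experimental settings.

```shell
#!/bin/sh
# Hypothetical grid sweep over the two tuned quantities from the abstract:
# the number of decision-tree leaves (triphone classes) and the total
# number of Gaussians. The grid values and paths below are illustrative.
for leaves in 500 1000 2000 4000; do
  for gauss in 10000 20000 40000; do
    # A real experiment would execute this Kaldi command and then decode
    # each resulting model; here the command is only printed.
    echo "steps/train_deltas.sh $leaves $gauss" \
         "data/train data/lang exp/mono_ali exp/tri_${leaves}_${gauss}"
  done
done
```

Each trained model would then be decoded on both a normal-speech and a whispered-speech test set, and the word error rates compared across the grid to locate the optimum for each speech mode.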
Pages
109–114
Physical description
Bibliography: 30 items; tables, formulas, charts.
Authors
author
  • Poznan University of Technology, Piotrowo street 3a, 60-965 Poznan, Poland
author
  • Faculty of Electrical Engineering, Institute of Control, Robotics and Information Engineering, Division of Control and Robotics
author
  • Faculty of Computing, Institute of Automation and Robotics, Division of Signal Processing and Electronic Systems
  • Faculty of Electrical Engineering, Institute of Control, Robotics and Information Engineering, Division of Control and Robotics
author
  • Faculty of Electrical Engineering, Institute of Control, Robotics and Information Engineering, Division of Control and Robotics
Bibliography
  • [1] H. R. Sharifzadeh, I. V. McLoughlin, and F. Ahmadi, “Reconstruction of normal sounding speech for laryngectomy patients through a modified CELP codec,” Biomedical Engineering, IEEE Transactions on, vol. 57, no. 10, pp. 2448–2458, 2010.
  • [2] H. F. Nijdam, A. A. Annyas, H. K. Schutte, and H. Leever, “A new prosthesis for voice rehabilitation after laryngectomy,” Archives of Otorhinolaryngology, vol. 237, no. 1, pp. 27–33, 1982.
  • [3] X. Huang, A. Acero, F. Alleva, M. Y. Hwang, L. Jiang, and M. Mahajan, “Microsoft Windows highly intelligent speech recognizer: Whisper,” in Acoustics, Speech, and Signal Processing, 1995 International Conference on (ICASSP-95), vol. 1, pp. 93–96.
  • [4] T. J. Raitio, M. J. Hunt, H. B. Richards, and M. Chinthakunta, “Digital assistant providing whispered speech,” U.S. Patent 15/266,932, December 14, 2017.
  • [5] D. T. Williamson, M. H. Draper, G. L. Calhoun, and T. P. Barry, “Commercial speech recognition technology in the military domain: Results of two recent research efforts,” International Journal of Speech Technology, vol. 8, no. 1, pp. 9–16, 2005.
  • [6] S. Pigeon, C. Swail, E. Geoffrois, G. Bruckner, D. Van Leeuwen, C. Teixeira, et al., Use of speech and language technology in military environments, Montreal, Canada, North Atlantic Treaty Organization, 2005.
  • [7] S. C. S. Jou, T. Schultz, and A. Waibel, “Whispery speech recognition using adapted articulatory features,” in ICASSP, March 2005, pp. 1009–1012.
  • [8] Q. Jin, S. C. S. Jou, and T. Schultz, “Whispering speaker identification,” in Multimedia and Expo, 2007 IEEE International Conference on, pp. 1027–1030.
  • [9] M. Akamine, and J. Ajmera, “Decision tree-based acoustic models for speech recognition,” EURASIP Journal on Audio, Speech, and Music Processing, vol. 2012, art. no. 10, p. 8, 2012.
  • [10] D. Povey, A. Ghoshal, G. Boulianne, L. Burget, et al., “The Kaldi speech recognition toolkit,” in IEEE 2011 workshop on automatic speech recognition and understanding, No. EPFL-CONF-192584, 2011.
  • [11] C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, and M. Mohri, “OpenFst: A general and efficient weighted finite-state transducer library,” in Implementation and Application of Automata, J. Holub and J. Ždárek, Eds. Berlin, Heidelberg: Springer, 2007, pp. 11–23.
  • [12] O. Platek, “Speech recognition using KALDI,” M.S. thesis, Inst. Form. Appl. Ling., Charles Univ., Prague, Czech Republic, 2014.
  • [13] A. Stolcke, “SRILM-an extensible language modeling toolkit,” in Proc. Intl. Conf. Spoken Language Processing (INTERSPEECH), Denver, Colorado, September 2002, pp. 901–904.
  • [14] M. Bisani, and H. Ney, “Joint-sequence models for grapheme-to-phoneme conversion,” Speech Communication, vol. 50, no. 5, pp. 434–451, 2008.
  • [15] I. H. Witten, and T. C. Bell, “The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression,” IEEE Transactions on Information Theory, vol. 37, no. 4, pp. 1085–1094, 1991.
  • [16] G. Demenko, M. Wypych, and E. Baranowska, “Implementation of grapheme-to-phoneme rules and extended SAMPA alphabet in Polish text-to-speech synthesis,” Speech and Language Technology, vol. 7, pp. 79–97, 2003.
  • [17] M. Wypych, E. Baranowska, and G. Demenko, “A grapheme-to-phoneme transcription algorithm based on the SAMPA alphabet extension for the Polish language,” in Phonetic Sciences, 15th International Congress of (ICPhS), Barcelona, August 2003, pp. 2601–2604.
  • [18] P. Kłosowski, “Improving speech processing based on phonetics and phonology of Polish language,” Przeglad Elektrotechniczny, vol. 89, no. 8, pp. 303–307, 2013.
  • [19] A. Karpov, K. Markov, I. Kipyatkova, D. Vazhenina, and A. Ronzhin, “Large vocabulary Russian speech recognition using syntacticostatistical language modeling,” Speech Communication, vol. 56, pp. 213–228, 2014.
  • [20] P. Kozierski, T. Sadalla, S. Drgas, and A. Dąbrowski, “Allophones in automatic whispery speech recognition,” in Methods and Models in Automation and Robotics (MMAR), 21st International Conference on, 2016, pp. 811–815. DOI: 10.1109/MMAR.2016.7575241
  • [21] F. Portet, M. Vacher, C. Golanski, C. Roux, and B. Meillon, “Design and evaluation of a smart home voice interface for the elderly: Acceptability and objection aspects,” Personal and Ubiquitous Computing, vol. 17, no. 1, pp. 127–144, 2013.
  • [22] K. Szostek, “Optimization of HMM models and their usage in speech recognition (in Polish),” Elektrotechnika i Elektronika, vol. 24, no. 2, pp. 172–182, 2005.
  • [23] B. Lewandowska-Tomaszczyk, M. Bańko, R. L. Górski, P. Pęzik, and A. Przepiórkowski, National corpus of Polish language (in Polish), Warszawa: Wydawnictwo Naukowe PWN, 2012.
  • [24] F. Cummins, M. Grimaldi, T. Leonard, and J. Simko, “The Chains corpus: Characterizing individual speakers,” in Proc. of SPECOM, vol. 6, 2006, pp. 431–435.
  • [25] T. Tran, S. Mariooryad, and C. Busso, “Audiovisual corpus to analyze whisper speech,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, May 2013, pp. 8101–8105.
  • [26] T. Ito, K. Takeda, and F. Itakura, “Analysis and recognition of whispered speech,” Speech Communication, vol. 45, no. 2, pp. 139–152, 2005.
  • [27] C. Huang, E. Chang, J. Zhou, and K. F. Lee, “Accent modeling based on pronunciation dictionary adaptation for large vocabulary Mandarin speech recognition,” in INTERSPEECH, October 2000, pp. 818–821.
  • [28] P. Kozierski, T. Sadalla, S. Drgas, A. Dąbrowski, and J. Zietkiewicz, “The impact of vocabulary size and language model order on the Polish whispery speech recognition,” in Methods and Models in Automation and Robotics (MMAR), 22nd International Conference on, 2017, pp. 616–621. DOI: 10.1109/MMAR.2017.8046899
  • [29] L. Besacier, E. Barnard, A. Karpov, and T. Schultz, “Automatic speech recognition for under-resourced languages: A survey,” Speech Communication, vol. 56, pp. 85–100, 2014.
  • [30] P. Kozierski, T. Sadalla, S. Drgas, A. Dąbrowski, and D. Horla, “Kaldi toolkit in Polish whispery speech recognition,” Przeglad Elektrotechniczny, vol. 92, no. 11, pp. 301–304, 2016. DOI: 10.15199/48.2016.11.70
Notes
1. Track 4: Software Systems Development & Applications
2. Technical Session: 5th Doctoral Symposium on Recent Advances in Information Technology
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-2daec44b-0005-4b72-938e-953eb6ab3af2