

Article title

Preliminary Evaluation of Convolutional Neural Network Acoustic Model for Iban Language Using NVIDIA NeMo

Publication languages
EN
Abstracts
EN
For the past few years, artificial neural networks (ANNs) have been among the most common solutions for developing acoustic models for automatic speech recognition (ASR). There are several variants of ANNs, such as deep neural networks (DNNs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs). CNNs are widely used to improve image processing performance. In recent years, CNNs have also been applied to ASR, and this paper reports preliminary results for an end-to-end CNN-based ASR system built with NVIDIA NeMo on a corpus of Iban, an under-resourced language. Studies have shown that CNN acoustic models can achieve excellent word error rates (WER) on speech data. By contrast, results and studies for under-resourced languages remain unsatisfactory. Hence, this paper evaluates the viability and potential of this alternative approach using NVIDIA NeMo, a new ASR engine developed by NVIDIA. Two experiments were conducted: one varied the amount of training data, and the other varied an internal parameter of the engine, namely the number of training epochs. The results of these experiments are then analyzed and compared with results reported in existing papers.
Year
Volume
Pages
43-53
Physical description
Bibliography: 22 items, figures, tables.
Authors
  • Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia
  • Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia
author
  • Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia
Bibliography
  • [1] V. Passricha and R. Aggarwal, "Convolutional neural networks for raw speech recognition", IntechOpen, vol. 32, pp. 137-144, 2013 (DOI: 10.5772/intechopen.80026).
  • [2] E. Chuangsuwanich, "Multilingual techniques for low resource automatic speech recognition", Ph.D. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016 [Online]. Available: http://hdl.handle.net/1721.1/105571
  • [3] B. Pulugundla et al., "BUT system for low resource Indian language ASR", in Proc. 19th Ann. Conf. of the Int. Speech Commun. Assoc. Interspeech 2018, Hyderabad, India, 2018, pp. 3182-3186 (ISSN: 1990-9772).
  • [4] O. Mamyrbayev et al., "Voice identification using classification algorithms", in Intelligent System and Computing, Yang Yi, Ed. IntechOpen, 2020 (DOI: 10.5772/intechopen.88239).
  • [5] J. Li et al., "Jasper: An end-to-end convolutional neural acoustic model", in Proc. of the Ann. Conf. of the Int. Speech Commun. Assoc., Interspeech 2019, Graz, Austria, 2019, pp. 71-75 (DOI: 10.21437/Interspeech.2019-1819).
  • [6] W. Han et al., "ContextNet: Improving convolutional neural networks for automatic speech recognition with global context", in Proc. of the Ann. Conf. of the Int. Speech Commun. Assoc., Interspeech 2020, vol. 2020-Octob, pp. 3610-3614, 2020 (DOI: 10.21437/interspeech.2020-2059) [Online]. Available: https://arxiv.org/pdf/2005.03191.pdf
  • [7] A. Biswas, F. de Wet, E. van der Westhuizen, E. Yilmaz, and T. Niesler, "Multilingual neural network acoustic modelling for ASR of under-resourced English-isiZulu code-switched speech", in Proc. of the Ann. Conf. of the Int. Speech Commun. Assoc., Interspeech 2018, Hyderabad, India, 2018, pp. 2603-2607 (DOI: 10.21437/Interspeech.2018-1711).
  • [8] D. He, B. P. Lim, X. Yang, M. Hasegawa-Johnson, and D. Chen, "Improved ASR for under-resourced languages through multi-task learning with acoustic landmarks", in Proc. of the Ann. Conf. of the Int. Speech Commun. Assoc., Interspeech 2018, Hyderabad, India, 2018, pp. 2618-2622 (DOI: 10.21437/Interspeech.2018-1124).
  • [9] D. Palaz, R. Collobert, and M. Magimai-Doss, "Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks", in Proc. of the Ann. Conf. of the Int. Speech Commun. Assoc., Interspeech 2013, Lyon, France, 2013, pp. 1766-1770 [Online]. Available: https://arxiv.org/pdf/1304.1018
  • [10] D. Palaz, M. Magimai-Doss, and R. Collobert, "Convolutional neural networks-based continuous speech recognition using raw speech signal", in Proc. IEEE Int. Conf. on Acoust., Speech and Sig. Process. ICASSP 2015, South Brisbane, QLD, Australia, 2015, pp. 4295-4299 (DOI: 10.1109/ICASSP.2015.7178781).
  • [11] F. Reyes, A. Fajardo, and A. Hernandez, "Convolutional neural network for automatic speech recognition of Filipino language", Int. J. of Adv. Trends in Comp. Sci. and Engin., vol. 9, no. 1.1, pp. 34-40, 2020 (DOI: 10.30534/ijatcse/2020/0791.12020).
  • [12] B. Thai, R. Jimerson, R. Ptucha, and E. Prud'hommeaux, "Fully convolutional ASR for less-resourced endangered languages", in Proc. of the 1st Joint Worksh. on Spok. Language Technol. for Under-res. Lang. (SLTU) and Collab. and Comput. for Under-Resourced Lang. (CCURL), Marseille, France, 2020, pp. 126-130 [Online]. Available: https://aclanthology.org/2020.sltu-1.17.pdf
  • [13] A. N. Mon, "Myanmar language continuous speech recognition using convolutional neural network (CNN)", Ph.D. thesis, University of Computer Studies, Yangon, 2019, pp. 87-88 [Online]. Available: https://meral.edu.mm/record/4316/files/AyeNyeinMonThesisBook.pdf
  • [14] K. R. Lekshmi and E. Sherly, "An acoustic model and linguistic analysis for Malayalam disyllabic words: a low resource language", Int. J. of Speech Technol., vol. 24, pp. 483-495, 2021 (DOI: 10.1007/s10772-021-09807-1).
  • [15] R. Collobert, C. Puhrsch, and G. Synnaeve, "Wav2Letter: an End-to-End ConvNet-based Speech Recognition System", arXiv:1609.03193v2, 2016.
  • [16] L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition. Upper Saddle River, NJ: Prentice-Hall, 1993 (ISBN: 9780130151575).
  • [17] S. S. Juan, "Exploiting resources from closely-related languages for automatic speech recognition in low-resource languages from Malaysia", Ph.D. thesis, Université Grenoble Alpes, France, 2015, pp. 115-118 [Online]. Available: https://tel.archives-ouvertes.fr/tel-1314120/document
  • [18] S. Saha, "A Comprehensive Guide to Convolutional Neural Networks - the ELI5 way", Towards Data Science, 2018 [Online]. Available: https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
  • [19] U. Karn, "An intuitive explanation of convolutional neural networks", The Data Science Blog, 2016 [Online]. Available: https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets
  • [20] J. Brownlee, "A gentle introduction to the rectified linear unit (ReLU)", Machine Learning Mastery, 2010 [Online]. Available: https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/, 2020.
  • [21] "NVIDIA Deep Learning NeMo Documentation", Nvidia website, 2021 [Online]. Available: https://docs.nvidia.com/deeplearning/nemo/index.html
  • [22] S. S. Juan, L. Besacier, B. Lecouteux, and M. Dyab, "Using resources from a closely-related language to develop ASR for a very under-resourced language: A case study for Iban", in Proc. of the Ann. Conf. of the Int. Speech Commun. Assoc., Interspeech 2015, Dresden, Germany, 2015 (DOI: 10.21437/Interspeech.2015-318).
Notes
Record prepared with funds from MNiSW, agreement no. SONP/SP/546092/2022, under the "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) programme, module: Popularisation of science and promotion of sport (2024).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-895415ac-ac9d-412d-a072-a91d8d9a3dc0