Article title

A robust ensemble model for spoken language recognition

Publication languages
EN
Abstracts
EN
The identification of a spoken language has been tackled over the years with statistical models built on audio samples. A drawback of these approaches is that phonetically transcribed data are not available for all languages. This work proposes an image-classification approach that operates on image representations of audio samples. Our model uses neural networks and deep learning algorithms to analyse and classify three languages. The input to the network is a spectrogram, which is processed through the network to extract local visual and temporal features for language prediction. The model achieved 95.56% accuracy on test samples from the three languages.
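The pipeline the abstract describes, converting audio into a spectrogram "image" that a neural network then classifies, can be sketched minimally. The code below is an illustration of the spectrogram step only, not the authors' implementation: the frame length, hop size, windowing choice, and sample rate are all assumed values.

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=128):
    """Compute a log-magnitude spectrogram: the 2-D 'image' a CNN would consume."""
    # Slice the waveform into overlapping, Hann-windowed frames.
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Magnitude of the one-sided FFT of each frame gives a (time, frequency) grid.
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # Log compression tames the dynamic range before classification.
    return np.log1p(mag)

# Example: one second of a synthetic 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
spec = log_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (time_frames, frame_len // 2 + 1)
```

In a full system, each such spectrogram would be saved or fed directly as a single-channel image to the classifier, with one class label per language.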
Pages
56–68
Physical description
Bibliography: 19 items, figures, tables.
Authors
author
  • University of Ibadan, Faculty of Science, Department of Computer Science, Oyo State Ibadan, Nigeria
  • University of Ibadan, Faculty of Science, Department of Computer Science, Oyo State Ibadan, Nigeria
Bibliography
  • [1] Abdel-Hamid, O., Mohamed, A. R., Jiang, H., Deng, L., Penn, G., & Yu, D. (2014). Convolutional neural networks for speech recognition. IEEE Transactions on Audio, Speech and Language Processing, 22(10), 1533–1545. https://doi.org/10.1109/taslp.2014.2339736
  • [2] Adami, A., & Hermansky, H. (2003). Segmentation of speech for speaker and language recognition. EUROSPEECH-2003 (pp. 841–844). Geneva. Retrieved from https://www.academia.edu/32317887/Segmentation_of_speech_for_speaker_and_language_recognition
  • [3] Amodei, D., Anubhai, R., Battenberg, E., Case, C., Casper, J., Catanzaro, B., ... Narang, S. (2015). Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. CoRR, abs/1512.02595. Retrieved from https://arxiv.org/abs/1512.02595v1
  • [4] Ashby, M., & Maidment, J. (2005). Introducing phonetic science. Cambridge University Press.
  • [5] Bartz, C., Herold, T., Yang, H., & Meinel, C. (2017). Language Identification Using Deep Convolutional Recurrent Neural Networks. In D. Liu, S. Xie, Y. Li, D. Zhao, & E. El-Alfy (Eds.), Neural Information Processing ICONIP 2017. Lecture Notes in Computer Science (vol. 10639). Springer. https://doi.org/10.1007/978-3-319-70136-3_93
  • [6] Boussard, J., Deveau, A., & Pyron, J. (2017). Methods for Spoken Language Identification. Retrieved from http://cs229.stanford.edu/proj2017/final-reports/5239784.pdf
  • [7] Eberhard, D. M., Simons, G. F., & Fennig, C. D. (Eds.). (2020). Ethnologue: Languages of the World. Retrieved from http://www.ethnologue.com
  • [8] Kirchhoff, K. (2006). Language characteristics. In T. Schultz, & K. Kirchhoff (Eds.), Multilingual Speech Processing (pp. 5–33). Elsevier.
  • [9] Li, H., Ma, B., & Lee, K. A. (2013). Spoken Language Recognition: From Fundamentals to Practice. Proceedings of the IEEE, 101(5), 1136–1159. https://doi.org/10.1109/JPROC.2012.2237151
  • [10] Muthusamy, Y. K., Cole, R., & Oshika, B. (1992). The OGI multi-language telephone speech corpus. Int. Conf. Spoken Lang. Process, 895–898. Retrieved from https://pdfs.semanticscholar.org/aad7/274fdd57191e89f9df2880a50ec14581d671.pdf
  • [11] Navratil, J. (2001). Spoken language recognition: A step toward multilinguality in speech processing. IEEE Trans. Speech Audio Process, 9(6), 678–685. https://doi.org/10.1109/89.943345
  • [12] Park, D. S., Chan, W., Zhang, Y., Chiu, C.-C., Zoph, B., Cubuk, E. D., & Le, Q. V. (2019). SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. Proc. Interspeech 2019 (pp. 2613–2617). https://doi.org/10.21437/interspeech.2019-2680
  • [13] Ramus, F., & Mehler, J. (1999). Language identification with suprasegmental cues: A study based on speech re-synthesis. Journal of Acoustical Society of America, 105(1), 512–521. https://doi.org/10.1121/1.424522
  • [14] Safitri, N. E., Zahra, A., & Adriani, M. (2016). Spoken Language Identification with Phonotactics Methods on Minangkabau, Sundanese, and Javanese Languages. Procedia Computer Science 81 (pp. 182–187). Elsevier. https://doi.org/10.1016/j.procs.2016.04.047
  • [15] Sugiyama, M. (1991). Automatic language recognition using acoustic features. International Conference on Acoustics, Speech, and Signal Processing (pp. 813–816). Toronto. https://doi.org/10.1109/icassp.1991.150461
  • [16] Torres-Carrasquillo, P., Singer, E., Kohler, M., Greene, R., Reynolds, D., & Deller, J. (2002). Approaches to language identification using Gaussian mixture models and shifted delta cepstral features. In ICSLP-2002 (pp. 89–92). Denver. https://doi.org/10.1109/icassp.2002.5743828
  • [17] Zhao, J., Shu, H., Zhang, L., Wang, X., Gong, Q., & Li, P. (2008). Cortical competition during language discrimination. NeuroImage, 43(3), 624–633. https://doi.org/10.1016/j.neuroimage.2008.07.025
  • [18] Zissman, M. (1996). Comparison of four approaches to automatic language identification of telephone speech. IEEE Transactions on Speech and Audio Processing, 4(1), 31–44. https://doi.org/10.1109/tsa.1996.481450
  • [19] Zissman, M. A. (1993). Automatic language identification using Gaussian mixture and hidden Markov models. IEEE International Conference on Acoustics, Speech and Signal Processing (Vol. 2, pp. 399–402). IEEE. https://doi.org/10.1109/icassp.1993.319323
Notes
Record created with funds from the Ministry of Science and Higher Education (MNiSW), agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularisation of science and promotion of sport (2020).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-e89024f1-a3a8-414f-b191-bf0066ee60a4