Effect of Time-domain Windowing on Isolated Speech Recognition System Performance

Ananthakrishna, Thalengala; Anitha, H.; Girisha, T.

doi:10.24425/ijet.2022.139856

Article - details

Identifiers

DOI

10.24425/ijet.2022.139856

Abstract

A speech recognition system extracts textual data from the speech signal. Research in the speech recognition domain is challenging because of the large variability of the speech signal. A variety of signal processing and machine learning techniques have been explored to achieve better recognition accuracy. Speech is highly non-stationary, so analysis is carried out over a short time-domain window, or frame. In speech recognition, cepstral features (Mel frequency cepstral coefficients, MFCC) are commonly used and are extracted for each short time frame. The effectiveness of the features depends on the duration of the chosen time window. The present study investigates the optimal time-window duration for extracting cepstral features in the context of speech recognition. A speaker-independent speech recognition system for the Kannada language is considered for the analysis. Speech utterances from a Kannada news corpus, recorded from different speakers, are used to create the speech database. The Hidden Markov Model Toolkit (HTK) is used to implement the speech recognition system. The MFCCs, along with their first and second derivative coefficients, form the feature vectors. The pronunciation dictionary required for the study was built manually for a mono-phone system. Experiments were carried out and the results analyzed for different time-window lengths, using an overlapping Hamming window. The best average word recognition accuracy of 61.58% was obtained for a window length of 110 ms. This accuracy is comparable with that of similar work in the literature. The experiments show that the best word recognition performance can be achieved by tuning the window length to its optimum value.
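The framing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' HTK setup: the 16 kHz sample rate, the 10 ms frame shift, the 25 ms and 50 ms comparison lengths, the synthetic test tone, and the function names `hamming` and `frame_signal` are all assumptions made for illustration. Only the overlapping Hamming window and the 110 ms length come from the abstract. The MFCC computation itself is left out.

```python
import math


def hamming(n_samples):
    """Return a Hamming window: w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def frame_signal(signal, sample_rate, window_ms, shift_ms):
    """Split a signal into overlapping, Hamming-weighted frames.

    Frames overlap whenever shift_ms is smaller than window_ms.
    Trailing samples that do not fill a whole frame are dropped.
    """
    win_len = int(round(sample_rate * window_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    if win_len <= 0 or shift <= 0:
        raise ValueError("window and shift must be at least one sample")
    window = hamming(win_len)
    frames = []
    start = 0
    while start + win_len <= len(signal):
        segment = signal[start:start + win_len]
        frames.append([s * w for s, w in zip(segment, window)])
        start += shift
    return frames


# Assumed test input: one second of a 200 Hz tone sampled at 16 kHz.
sample_rate = 16000
signal = [math.sin(2.0 * math.pi * 200.0 * t / sample_rate)
          for t in range(sample_rate)]

# Compare a few window lengths; only 110 ms is taken from the abstract.
frame_counts = {}
for window_ms in (25, 50, 110):
    frames = frame_signal(signal, sample_rate, window_ms, shift_ms=10)
    frame_counts[window_ms] = len(frames)
    print(window_ms, "ms:", len(frames), "frames of", len(frames[0]), "samples")
```

With a fixed shift, a longer window gives each frame more samples, and therefore finer frequency resolution, at the cost of averaging over a longer and less stationary stretch of speech. That trade-off is what a window-length sweep of this kind explores.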

Keywords

hidden Markov model (HMM); isolated speech recognition system (ISR); Kannada language; mono-phone model; Mel frequency cepstral coefficients (MFCC)

Publisher

Polish Academy of Sciences, Committee of Electronics and Telecommunication

Journal

International Journal of Electronics and Telecommunications

Year

2022

Volume

Vol. 68, No. 1

Pages

161-166

Physical description

Bibliography: 24 items; diagrams, tables, charts.

Contributors

author

Ananthakrishna Thalengala

anantha.kt@manipal.edu

Department of Electronics and Communication Engineering, Manipal Institute of Technology (MIT), Manipal Academy of Higher Education (MAHE), Manipal, Karnataka State, India

author

Anitha H.

anitha.h@manipal.edu

Department of Electronics and Communication Engineering, Manipal Institute of Technology (MIT), Manipal Academy of Higher Education (MAHE), Manipal, Karnataka State, India

author

Girisha T.

grsh.246@gmail.com

Department of Electronics and Communication Engineering, Manipal Institute of Technology (MIT), Manipal Academy of Higher Education (MAHE), Manipal, Karnataka State, India

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-9b399b5a-4022-446c-b061-586250aa1070