Article title

Genetic Algorithm for Combined Speaker and Speech Recognition using Deep Neural Networks

Full text / Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
The field of speech and speaker recognition has grown rapidly owing to the many artificial intelligence algorithms applied to it. Speech conveys messages through the language spoken, as well as the speaker's emotions, gender and identity. Many real-world healthcare applications rely on speech and speaker recognition, e.g. a voice-controlled wheelchair. In this paper, we use a genetic algorithm (GA) for combined speaker and speech recognition, relying on optimized Mel frequency cepstral coefficient (MFCC) speech features, with classification performed by a deep neural network (DNN). In the first phase, features are extracted using MFCC and then optimized with the GA. In the second phase, training is conducted using the DNN. The proposed model is evaluated and validated in a real environment, and its efficiency is assessed in terms of accuracy, precision, recall, sensitivity and specificity. The paper also evaluates feature extraction methods used for combined speaker and speech recognition systems, namely linear predictive coding coefficients (LPCC), perceptual linear prediction (PLP), Mel frequency cepstral coefficients (MFCC) and relative spectral filtering (RASTA), and compares the different methods with existing techniques in both clean and noisy environments.
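The abstract describes a two-phase pipeline: MFCC feature extraction followed by GA-based feature optimization, with a DNN performing classification. The snippet below is a minimal illustrative sketch of such a pipeline, not the authors' implementation: it uses librosa for MFCC extraction, a toy genetic algorithm that evolves a binary mask over the MFCC coefficients, and a scikit-learn MLP standing in for the DNN. All library choices, hyper-parameters and the synthetic placeholder data are assumptions made purely for illustration.

    # Minimal sketch: MFCC extraction -> GA feature selection -> neural-network classifier.
    # NOT the paper's implementation; hyper-parameters and placeholder data are illustrative.
    import numpy as np
    import librosa
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import cross_val_score

    def extract_mfcc(signal, sr, n_mfcc=13):
        """Return one mean-pooled MFCC vector per utterance."""
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
        return mfcc.mean(axis=1)

    def ga_select(X, y, n_gen=10, pop_size=12, p_mut=0.1, seed=0):
        """Toy GA evolving a binary mask over MFCC coefficients,
        scored by 3-fold cross-validated accuracy of a small MLP."""
        rng = np.random.default_rng(seed)
        n_feat = X.shape[1]
        pop = rng.integers(0, 2, size=(pop_size, n_feat))

        def fitness(mask):
            if mask.sum() == 0:
                return 0.0
            clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=300, random_state=0)
            return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

        for _ in range(n_gen):
            scores = np.array([fitness(ind) for ind in pop])
            # Elitist selection: keep the better half, refill by crossover + mutation.
            parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]
            children = []
            while len(children) < pop_size - len(parents):
                a, b = parents[rng.integers(len(parents), size=2)]
                cut = rng.integers(1, n_feat)                                  # one-point crossover
                child = np.concatenate([a[:cut], b[cut:]])
                child ^= (rng.random(n_feat) < p_mut).astype(child.dtype)      # bit-flip mutation
                children.append(child)
            pop = np.vstack([parents] + children)
        scores = np.array([fitness(ind) for ind in pop])
        return pop[scores.argmax()].astype(bool)

    if __name__ == "__main__":
        # Random signals stand in for recorded utterances of three classes.
        sr = 16000
        X = np.vstack([extract_mfcc(np.random.randn(sr), sr) for _ in range(60)])
        y = np.repeat([0, 1, 2], 20)
        mask = ga_select(X, y)
        dnn = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
        print("selected coefficients:", np.flatnonzero(mask))
        print("cv accuracy:", cross_val_score(dnn, X[:, mask], y, cv=3).mean())

In the paper's setting, the GA fitness would instead be computed on the recorded speech corpus used for evaluation, and the final classifier would be the trained DNN rather than this placeholder MLP.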
Year
Volume
Pages
23–31
Physical description
Bibliography: 36 items, figures, tables
Authors
author
  • I. K. Gujral Punjab Technical University, Kapurthala, Jalandhar, India
  • University Institute of Engineering & Technology, Panjab University, Chandigarh, India
  • Chandigarh Engineering College, Landran, Mohali, Punjab, India
author
  • Central Scientific Instruments Organisation, Chandigarh, India
Bibliography
  • [1] D. R. Reddy, “Speech recognition by machine: A review”, Proc. of the IEEE, vol. 64, no. 4, pp. 501–531, 1976 (doi: 10.1109/PROC.1976.10158).
  • [2] S. Furui, “50 Years of Progress in Speech and Speaker Recognition Research”, ECTI Transact. on Comput. and Infor. Technol., vol. 1, no. 2, pp. 64–74, 2005.
  • [3] J. Campbell, “Speaker recognition: A tutorial”, Proc. of the IEEE, vol. 85, no. 9, pp. 1437–1462, 1997 (doi: 10.1109/5.628714).
  • [4] L. Mary and B. Yegnanarayana, “Extraction and representation of prosodic features for language and speaker recognition”, Speech Communic., vol. 50, no. 10, pp. 782–796, 2008 (doi: 10.1016/j.specom.2008.04.010).
  • [5] I. Bhardwaj, “Speaker dependent and independent isolated Hindi word recognizer using hidden Markov model (HMM)”, Int. J. of Comp. Applic., vol. 52, no. 7, pp. 34–40, 2012.
  • [6] S. Squartini, E. Principi, R. Rotili, and F. Piazza, “Environmental robust speech and speaker recognition through multi-channel histogram equalization”, Neurocomputing, vol. 78, no. 1, pp. 111–120, 2012 (doi: 10.1016/j.neurocom.2011.05.035).
  • [7] N. S. Dey, R. Mohanty, and K. L. Chugh, “Speech and speaker recognition system using artificial neural networks and hidden Markov model”, in Proc. IEEE Int. Conf. on Communic. Sys. and Network Technol. CSNT, Bhopal, Madhya Pradesh, India, 2012, pp. 311–315 (doi: 10.1109/CSNT.2012.221).
  • [8] T. Gaafar, H. Bakr, and M. Abdalla, “An improved method for speech/speaker recognition”, in Int. Conf. on Infor., Electr. and Vision ICIEV, Dhaka, Bangladesh, 2014 (doi: 10.1109/ICIEV.2014.6850693).
  • [9] T. A. Smadi, “An improved real-time speech signal in case of isolated word recognition”, Int. J. of Engineer. Research and Applic., vol. 3, no. 5, pp. 1748–1754, 2013.
  • [10] V. Fontaine and H. Bourlard, “Speaker dependent speech recognition based on phone-like units models application to voice dialing”, in Proc. IEEE Conf. on Acoustics, Speech, and Signal Proces. ICASSP’97, Munich, Bavaria, Germany, 1997, pp. 2–5 (doi: 10.1109/ICASSP.1997.596241).
  • [11] S. J. Wright, D. Kanevsky, and L. Deng, “Optimization algorithms and applications for speech and language processing”, IEEE Transact. on Audio, Speech and Lang. Proces., vol. 21, no. 11, pp. 1527–1530, 2013 (doi: 10.1109/TASL.2013.2283777).
  • [12] M. Mitchell, “Genetic algorithms: An overview”, Complexity, vol. 1, pp. 31–39, 1995 (doi: 10.1002/cplx.6130010108).
  • [13] M. Sarma, “Speech recognition using deep neural network – recent trends”, Int. J. of Int. Sys. Design and Computing, vol. 1, no. 12, pp. 71–86, 2017 (doi: 10.1504/IJISDC.2017.082853).
  • [14] L. Deng, M. Seltzer, D. Yu, A. Acero, A. Mohamed, and G. Hinton, “Binary coding of speech spectrograms using a deep auto-encoder”, in Proc. 11th Int. Conf. on Speech Commun. Assoc., Makuhari, Chiba, Japan, 2010, pp. 1692–1695.
  • [15] F. Guojiang, “A novel isolated speech recognition method based on neural network”, 2nd Int. Conf. on Network. and Infor. Technol., Singapore, 2011, vol. 17, pp. 264–269.
  • [16] I. Lopez-Moreno et al., “On the use of deep feed forward neural networks for automatic language identification”, Computer Speech Lang., vol. 40, no. C, pp. 46–59, 2016 (doi: 10.1016/j.csl.2016.03.001).
  • [17] M. Mimura, S. Sakai, and T. Kawahara, “Reverberant speech recognition combining deep neural networks and deep auto encoders augmented with a phone-class feature”, EURASIP J. on Advances in Signal Proces., vol. 62, p. 13, 2015 (doi: 10.1186/s13634-015-0246-6).
  • [18] M. L. Lan, S. T. Pan, and C. C. Lai, “Using genetic algorithm to improve the performance of speech recognition based on artificial neural network”, in 1st Int. Conf. on Innovative Computing, Infor. and Control – Vol. I ICICIC’06, Beijing, China, 2006, vol. 2, no. 1, pp. 6–9 (doi: 10.1109/ICICIC.2006.372).
  • [19] S. Balochian, E. A. Seidabad, and S. Z. Rad, “Neural network optimization by genetic algorithms for the audio classification to speech and music”, Int. J. of Signal Proces., Image Proces. and Pattern Recog., vol. 6, no. 3, pp. 47–54, 2013.
  • [20] S. King, J. Frankel, K. Livescu, and E. Mcdermott, “Speech production knowledge in automatic speech recognition”, J. of the Acoustic. Soc. of America, vol. 121, no. 2, pp. 723–742, 2007 (doi: 10.1121/1.2404622).
  • [21] S. I. Levitan, T. Mishra, and S. Bangalore, “Automatic identification of gender from speech”, in Proc. Conf. on Speech Prosody, Boston, MA, USA, 2016, pp. 84–88 (doi: 10.21437/SpeechProsody.2016-18).
  • [22] M. Honda, “Human speech production mechanisms”, NTT Technic. Rev., vol. 1, no. 2, pp. 24–29, 2003.
  • [23] N. S. Nehe and R. S. Holambe, “DWT and LPC based feature extraction methods for isolated word recognition”, EURASIP J. on Audio, Speech, and Music Proces., vol. 7, pp. 1–7, 2012 (doi: 10.1186/1687-4722-2012-7).
  • [24] A. Pramanik and R. Raha, “Automatic speech recognition using correlation analysis”, in Proc. World Cong. on Infor. and Commun. Technol. WICT, Trivandnum, Kerala, India, 2012 (doi: 10.1109/WICT.2012.6409160).
  • [25] X. Zhang, Y. Guo, and X. Hou, “A speech recognition method of isolated words based on modified LPC cepstrum”, in IEEE Granular Computing Conf., San Jose, CA, USA, 2007 (doi: 10.1109/GrC.2007.96).
  • [26] I. Hermansky, K. Tsuga, S. Makino, and H. Wakita, “Perceptually based processing in automatic speech recognition”, in Proc. IEEE Conf. on Acoustics, Speech, and Signal Proces. ICASSP’86, Tokyo, Japan, 1986 (doi: 10.1109/ICASSP.1986.1168649).
  • [27] S. Swamy and K. V. Ramakrishnan, “An efficient speech recognition”, Int. J. of Comp. Science and Engineer., vol. 3, no. 4, pp. 21–27, 2013 (doi: 10.5121/cseji.2013.3403).
  • [28] H. Ali, N. Ahmad, X. Zhou, K. Iqbal, and S. M. Ali, “DWT features performance analysis for automatic speech recognition of Urdu”, Springer Plus, vol. 3, pp. 1–10, 2014 (doi: 10.1186/2193-1801-3-204).
  • [29] G. Kaur, R. Khanna, and A. Kumar, “Automatic speech and speaker recognition using MFCC: Review”, Int. J. of Advances in Science and Technol., vol. 2, no. 3, 2014.
  • [30] G. Kaur, R. Khanna, and A. Kumar, “Implementation of Text Dependent Speaker Verification on Matlab”, in Proc. 2nd Conf. on Recent Adv. in Engineer. and Comput. Sciences RAECS, Chandigarh, India, 2015 (doi: 10.1109/RAECS.2015.7453344).
  • [31] R. Price, K. Iso, and K. Shinoda, “Wise teachers train better DNN acoustic models”, EURASIP J. of Audio, Speech, Music Proces., vol. 10, art. no. 88, 2016 (doi: 10.1186/s13636-016-0088-7).
  • [32] D. Reynolds, T. Quatieri, and R. Dunn, “Speaker verification using adapted Gaussian mixture models”, Digital Signal Proces., vol. 10, pp. 19–41, 2000 (doi: 10.1006/dspr.1999.0361).
  • [33] F. Seide, G. Li, and D. Yu, “Conversational speech transcription using context-dependent deep neural networks”, Interspeech, pp. 437–440, 2011.
  • [34] M. L. Seltzer, D. Yu, and Y. Wang, “An investigation of deep neural networks for noise robust speech recognition”, in IEEE Int. Conf. on Acoust. Speech Signal Proces. ICASSP’13, Vancouver, BC, Canada, 2013 (doi: 10.1109/ICASSP.2013.6639100).
  • [35] S. Casale, A. Russo, and S. Serrano, “Classification of speech under stress using features selected by genetic algorithms”, in Proc. 14th European Signal Proces. Conf., Florence, Tuscany, Italy, 2006, pp. 1–4.
  • [36] I. Perikos and I. Hatzilygeroudis, “Recognizing emotions in text using ensemble of classifiers”, Engineer. Applic. of Artif. Intel., vol. 51, pp. 191–201, 2016 (doi: 10.1016/j.engappai.2016.01.012).
Notes
Record developed under agreement 509/P-DUN/2018 with funds of the Ministry of Science and Higher Education (MNiSW) allocated to activities promoting science (2018).
Document type
YADDA identifier
bwmeta1.element.baztech-03c8931a-6dbb-4cb5-af9c-7e82115a9c79