Article title

Genetic Algorithm for Combined Speaker and Speech Recognition using Deep Neural Networks

Full text / Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
The field of speech and speaker recognition has grown rapidly owing to the many artificial intelligence algorithms applied to it. Speech conveys messages through the language spoken, as well as the speaker's emotions, gender and identity. Many real-world healthcare applications rely on speech and speaker recognition, e.g. a voice-controlled wheelchair. In this paper, we use a genetic algorithm (GA) for combined speaker and speech recognition, relying on optimized Mel frequency cepstral coefficient (MFCC) speech features, with classification performed by a deep neural network (DNN). In the first phase, features are extracted using MFCC and then optimized with the GA. In the second phase, training is conducted using the DNN. The proposed model is evaluated and validated in a real environment, and its efficiency is assessed in terms of accuracy, precision, recall, sensitivity and specificity. The paper also evaluates feature extraction methods used for combined speaker and speech recognition systems, namely linear predictive coding coefficients (LPCC), perceptual linear prediction (PLP), Mel frequency cepstral coefficients (MFCC) and relative spectral filtering (RASTA), and compares the different methods with existing techniques in both clean and noisy environments.
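The abstract describes a two-phase pipeline: MFCC feature extraction followed by GA-based feature optimization, with a DNN performing classification. The snippet below is a minimal illustrative sketch of such a pipeline, not the authors' implementation: it uses librosa for MFCC extraction, a toy genetic algorithm that evolves a binary mask over the MFCC coefficients, and a scikit-learn MLP standing in for the DNN. All library choices, hyper-parameters and the synthetic placeholder data are assumptions made purely for illustration.

    # Minimal sketch: MFCC extraction -> GA feature selection -> neural-network classifier.
    # NOT the paper's implementation; hyper-parameters and placeholder data are illustrative.
    import numpy as np
    import librosa
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import cross_val_score

    def extract_mfcc(signal, sr, n_mfcc=13):
        """Return one mean-pooled MFCC vector per utterance."""
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
        return mfcc.mean(axis=1)

    def ga_select(X, y, n_gen=10, pop_size=12, p_mut=0.1, seed=0):
        """Toy GA evolving a binary mask over MFCC coefficients,
        scored by 3-fold cross-validated accuracy of a small MLP."""
        rng = np.random.default_rng(seed)
        n_feat = X.shape[1]
        pop = rng.integers(0, 2, size=(pop_size, n_feat))

        def fitness(mask):
            if mask.sum() == 0:
                return 0.0
            clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=300, random_state=0)
            return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

        for _ in range(n_gen):
            scores = np.array([fitness(ind) for ind in pop])
            # Elitist selection: keep the better half, refill by crossover + mutation.
            parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]
            children = []
            while len(children) < pop_size - len(parents):
                a, b = parents[rng.integers(len(parents), size=2)]
                cut = rng.integers(1, n_feat)                                  # one-point crossover
                child = np.concatenate([a[:cut], b[cut:]])
                child ^= (rng.random(n_feat) < p_mut).astype(child.dtype)      # bit-flip mutation
                children.append(child)
            pop = np.vstack([parents] + children)
        scores = np.array([fitness(ind) for ind in pop])
        return pop[scores.argmax()].astype(bool)

    if __name__ == "__main__":
        # Random signals stand in for recorded utterances of three classes.
        sr = 16000
        X = np.vstack([extract_mfcc(np.random.randn(sr), sr) for _ in range(60)])
        y = np.repeat([0, 1, 2], 20)
        mask = ga_select(X, y)
        dnn = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
        print("selected coefficients:", np.flatnonzero(mask))
        print("cv accuracy:", cross_val_score(dnn, X[:, mask], y, cv=3).mean())

In the paper's setting, the GA fitness would instead be computed on the recorded speech corpus used for evaluation, and the final classifier would be the trained DNN rather than this placeholder MLP.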
Year
Volume
Pages
23–31
Physical description
Bibliography: 36 items, figures, tables
Authors
author
  • I. K. Gujral Punjab Technical University, Kapurthala, Jalandhar, India
  • University Institute of Engineering & Technology, Panjab University, Chandigarh, India
  • Chandigarh Engineering College, Landran, Mohali, Punjab, India
author
  • Central Scientific Instruments Organisation, Chandigarh, India
Bibliography
  • [1] D. R. Reddy, “Speech recognition by machine: A review”, Proc. of the IEEE, vol. 64, no. 4, pp. 501–531, 1976 (doi: 10.1109/PROC.1976.10158).
  • [2] S. Furui, “50 Years of Progress in Speech and Speaker Recognition Research”, ECTI Transact. on Comput. and Infor. Technol., vol. 1, no. 2, pp. 64–74, 2005.
  • [3] J. Campbell, “Speaker recognition: A tutorial”, Proc. of the IEEE, vol. 85, no. 9, pp. 1437–1462, 1997 (doi: 10.1109/5.628714).
  • [4] L. Mary and B. Yegnanarayana, “Extraction and representation of prosodic features for language and speaker recognition”, Speech Communic., vol. 50, no. 10, pp. 782–796, 2008 (doi: 10.1016/j.specom.2008.04.010).
  • [5] I. Bhardwaj, “Speaker dependent and independent isolated Hindi word recognizer using hidden Markov model (HMM)”, Int. J. of Comp. Applic., vol. 52, no. 7, pp. 34–40, 2012.
  • [6] S. Squartini, E. Principi, R. Rotili, and F. Piazza, “Environmental robust speech and speaker recognition through multi-channel histogram equalization”, Neurocomputing, vol. 78, no. 1, pp. 111–120, 2012 (doi: 10.1016/j.neurocom.2011.05.035).
  • [7] N. S. Dey, R. Mohanty, and K. L. Chugh, “Speech and speaker recognition system using artificial neural networks and hidden Markov model”, in Proc. IEEE Int. Conf. on Communic. Sys. and Network Technol. CSNT, Bhopal, Madhya Pradesh, India, 2012, pp. 311–315 (doi: 10.1109/CSNT.2012.221).
  • [8] T. Gaafar, H. Bakr, and M. Abdalla, “An improved method for speech/speaker recognition”, in Int. Conf. on Infor., Electr. and Vision ICIEV, Dhaka, Bangladesh, 2014 (doi: 10.1109/ICIEV.2014.6850693).
  • [9] T. A. Smadi, “An improved real-time speech signal in case of isolated word recognition”, Int. J. of Engineer. Research and Applic., vol. 3, no. 5, pp. 1748–1754, 2013.
  • [10] V. Fontaine and H. Bourlard, “Speaker dependent speech recognition based on phone-like units models application to voice dialing”, in Proc. IEEE Conf. on Acoustics, Speech, and Signal Proces. ICASSP’97, Munich, Bavaria, Germany, 1997, pp. 2–5 (doi: 10.1109/ICASSP.1997.596241).
  • [11] S. J. Wright, D. Kanevsky, and L. Deng, “Optimization algorithms and applications for speech and language processing”, IEEE Transact. on Audio, Speech and Lang. Proces., vol. 21, no. 11, pp. 1527–1530, 2013 (doi: 10.1109/TASL.2013.2283777).
  • [12] M. Mitchell, “Genetic algorithms: An overview”, Complexity, vol. 1, pp. 31–39, 1995 (doi: 10.1002/cplx.6130010108).
  • [13] M. Sarma, “Speech recognition using deep neural network – recent trends”, Int. J. of Int. Sys. Design and Computing, vol. 1, no. 12, pp. 71–86, 2017 (doi: 10.1504/IJISDC.2017.082853).
  • [14] L. Deng, M. Seltzer, D. Yu, A. Acero, A. Mohamed, and G. Hinton, “Binary coding of speech spectrograms using a deep auto-encoder”, in Proc. 11th Int. Conf. on Speech Commun. Assoc., Makuhari, Chiba, Japan, 2010, pp. 1692–1695.
  • [15] F. Guojiang, “A novel isolated speech recognition method based on neural network”, 2nd Int. Conf. on Network. and Infor. Technol., Singapore, 2011, vol. 17, pp. 264–269.
  • [16] I. Lopez-Moreno et al., “On the use of deep feed forward neural networks for automatic language identification”, Computer Speech Lang., vol. 40, no. C, pp. 46–59, 2016 (doi: 10.1016/j.csl.2016.03.001).
  • [17] M. Mimura, S. Sakai, and T. Kawahara, “Reverberant speech recognition combining deep neural networks and deep auto encoders augmented with a phone-class feature”, EURASIP J. on Advances in Signal Proces., vol. 62, p. 13, 2015 (doi: 10.1186/s13634-015-0246-6).
  • [18] M. L. Lan, S. T. Pan, and C. C. Lai, “Using genetic algorithm to improve the performance of speech recognition based on artificial neural network”, in 1st Int. Conf. on Innovative Computing, Infor. and Control – Vol. I ICICIC’06, Beijing, China, 2006, vol. 2, no. 1, pp. 6–9 (doi: 10.1109/ICICIC.2006.372).
  • [19] S. Balochian, E. A. Seidabad, and S. Z. Rad, “Neural network optimization by genetic algorithms for the audio classification to speech and music”, Int. J. of Signal Proces., Image Proces. and Pattern Recog., vol. 6, no. 3, pp. 47–54, 2013.
  • [20] S. King, J. Frankel, K. Livescu, and E. Mcdermott, “Speech production knowledge in automatic speech recognition”, J. of the Acoustic. Soc. of America, vol. 121, no. 2, pp. 723–742, 2007 (doi: 10.1121/1.2404622).
  • [21] S. I. Levitan, T. Mishra, and S. Bangalore, “Automatic identification of gender from speech”, in Proc. Conf. on Speech Prosody, Boston, MA, USA, 2016, pp. 84–88 (doi: 10.21437/SpeechProsody.2016-18).
  • [22] M. Honda, “Human speech production mechanisms”, NTT Technic. Rev., vol. 1, no. 2, pp. 24–29, 2003.
  • [23] N. S. Nehe and R. S. Holambe, “DWT and LPC based feature extraction methods for isolated word recognition”, EURASIP J. on Audio, Speech, and Music Proces., vol. 7, pp. 1–7, 2012 (doi: 10.1186/1687-4722-2012-7).
  • [24] A. Pramanik and R. Raha, “Automatic speech recognition using correlation analysis”, in Proc. World Cong. on Infor. and Commun. Technol. WICT, Trivandnum, Kerala, India, 2012 (doi: 10.1109/WICT.2012.6409160).
  • [25] X. Zhang, Y. Guo, and X. Hou, “A speech recognition method of isolated words based on modified LPC cepstrum”, in IEEE Granular Computing Conf., San Jose, CA, USA, 2007 (doi: 10.1109/GrC.2007.96).
  • [26] I. Hermansky, K. Tsuga, S. Makino, and H. Wakita, “Perceptually based processing in automatic speech recognition”, in Proc. IEEE Conf. on Acoustics, Speech, and Signal Proces. ICASSP’86, Tokyo, Japan, 1986 (doi: 10.1109/ICASSP.1986.1168649).
  • [27] S. Swamy and K. V. Ramakrishnan, “An efficient speech recognition”, Int. J. of Comp. Science and Engineer., vol. 3, no. 4, pp. 21–27, 2013 (doi: 10.5121/cseji.2013.3403).
  • [28] H. Ali, N. Ahmad, X. Zhou, K. Iqbal, and S. M. Ali, “DWT features performance analysis for automatic speech recognition of Urdu”, Springer Plus, vol. 3, pp. 1–10, 2014 (doi: 10.1186/2193-1801-3-204).
  • [29] G. Kaur, R. Khanna, and A. Kumar, “Automatic speech and speaker recognition using MFCC: Review”, Int. J. of Advances in Science and Technol., vol. 2, no. 3, 2014.
  • [30] G. Kaur, R. Khanna, and A. Kumar, “Implementation of Text Dependent Speaker Verification on Matlab”, in Proc. 2nd Conf. on Recent Adv. in Engineer. and Comput. Sciences RAECS, Chandigarh, India, 2015 (doi: 10.1109/RAECS.2015.7453344).
  • [31] R. Price, K. Iso, and K. Shinoda, “Wise teachers train better DNN acoustic models”, EURASIP J. of Audio, Speech, Music Proces., vol. 10, art. no. 88, 2016 (doi: 10.1186/s13636-016-0088-7).
  • [32] D. Reynolds, T. Quatieri, and R. Dunn, “Speaker verification using adapted Gaussian mixture models”, Digital Signal Proces., vol. 10, pp. 19–41, 2000 (doi: 10.1006/dspr.1999.0361).
  • [33] F. Seide, G. Li, and D. Yu, “Conversational speech transcription using context-dependent deep neural networks”, Interspeech, pp. 437–440, 2011.
  • [34] M. L. Seltzer, D. Yu, and Y. Wang, “An investigation of deep neural networks for noise robust speech recognition”, in IEEE Int. Conf. on Acoust. Speech Signal Proces. ICASSP’13, Vancouver, BC, Canada, 2013 (doi: 10.1109/ICASSP.2013.6639100).
  • [35] S. Casale, A. Russo, and S. Serrano, “Classification of speech under stress using features selected by genetic algorithms”, in Proc. 14th European Signal Proces. Conf., Florence, Tuscany, Italy, 2006, pp. 1–4.
  • [36] I. Perikos and I. Hatzilygeroudis, “Recognizing emotions in text using ensemble of classifiers”, Engineer. Applic. of Artif. Intel., vol. 51, pp. 191–201, 2016 (doi: 10.1016/j.engappai.2016.01.012).
Notes
Record developed under agreement 509/P-DUN/2018 with funds of the Ministry of Science and Higher Education (MNiSW) allocated to activities promoting science (2018).
Document type
YADDA identifier
bwmeta1.element.baztech-03c8931a-6dbb-4cb5-af9c-7e82115a9c79