Article title

Speech-Based Vehicle Movement Control Solution

Publication languages
EN
Abstracts
EN
The article describes a speech-based robotic prototype designed to aid the movement of elderly or handicapped individuals. Mel-frequency cepstral coefficients (MFCCs) are used to extract speech features, and a deep belief network (DBN) is trained to recognize the spoken commands. The prototype was tested in a real-world environment and achieved an accuracy rate of 87.4%.
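A minimal Python sketch of the pipeline the abstract describes, using librosa for MFCC extraction and scikit-learn's BernoulliRBM plus logistic regression as a stand-in for the DBN classifier. The file layout, the five command labels, the feature summarisation, and the single RBM layer (a full DBN would stack several greedily pretrained RBM layers) are illustrative assumptions, not details taken from the article.

import numpy as np
import librosa
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression

def mfcc_vector(path, sr=16000, n_mfcc=13):
    """Load one utterance and reduce its MFCC matrix to a fixed-length vector."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, frames)
    # Mean and standard deviation over time yield one 2*n_mfcc feature vector.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical training set: 20 recordings per movement command.
commands = ["forward", "backward", "left", "right", "stop"]
X = np.array([mfcc_vector(f"data/{c}_{i}.wav") for c in commands for i in range(20)])
y = np.array([c for c in commands for _ in range(20)])

# One RBM layer learns features in an unsupervised pass; a DBN would stack
# several such layers before the supervised classifier.
model = Pipeline([
    ("scale", MinMaxScaler()),  # BernoulliRBM expects inputs in [0, 1]
    ("rbm", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=30)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
print(model.predict([mfcc_vector("data/test_utterance.wav")]))

Collapsing each MFCC matrix to its time statistics keeps the example compact at the cost of temporal detail; frame-level features with context windows would be closer to typical DBN acoustic models.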
Pages
72-77
Physical description
Bibliography: 25 items, figures, tables
Bibliography
  • [1] J. H. L. Hansen and T. Hasan, "Speaker recognition by machines and humans: a tutorial review", IEEE Signal Process. Mag., vol. 32, no. 6, pp. 74-99, 2015 (DOI: 10.1109/MSP.2015.2462851).
  • [2] D. R. Reddy, "Speech recognition by machine: A review", Proc. of the IEEE, vol. 64, no. 4, pp. 501-531, 1976 (DOI: 10.1109/PROC.1976.10158).
  • [3] M. Nishimori, T. Saitoh, and R. Konishi, "Voice controlled intelligent wheelchair", in Proc. of the SICE Annual Conf., Takamatsu, Japan, 2007, pp. 336-340 (DOI: 10.1109/SICE.2007.4421003).
  • [4] N. Peixoto, H. G. Nik, and H. Charkhkar, "Voice controlled wheelchairs: Fine control by humming", Computer Methods and Programs in Biomedicine, vol. 112, pp. 156-165, 2013 (DOI: 10.1016/j.cmpb.2013.06.009).
  • [5] V. Partha Saradi and P. Kailasapathi, "Voice-based motion control of a robotic vehicle through visible light communication", Computers and Electrical Engineer., vol. 76, pp. 154-167, 2019 (DOI: 10.1016/j.compeleceng.2019.03.011).
  • [6] G. Kaur, M. Srivastava, and A. Kumar, "Integrated speaker and speech recognition for wheel chair movement using artificial intelligence", Informatica, vol. 42, pp. 587-594, 2018 (DOI: 10.31449/inf.v42i4.2003).
  • [7] S. Squartini, E. Principi, R. Rotili, and F. Piazza, "Environmental robust speech and speaker recognition through multi-channel histogram equalization", Neurocomputing, vol. 78, no. 1, pp. 111-120, 2012 (DOI: 10.1016/j.neucom.2011.05.035).
  • [8] S. Furui, "50 Years of progress in speech and speaker recognition", ECTI Trans. on Computer and Information Technol., pp. 64-74, 2012 (DOI: 10.37936/ecti-cit.200512.51834).
  • [9] G. Kaur, M. Srivastava, and A. Kumar, "Implementation of text dependent speaker verification on Matlab", in Proc. of 2nd Conf. on Recent Adv. in Engineer. and Comput. Sci. RAECS, Chandigarh, India, 2015 (DOI: 10.1109/RAECS.2015.7453344).
  • [10] G. Kaur, M. Srivastava, and A. Kumar, "Analysis of feature extraction methods for speaker dependent speech recognition", Int. J. of Engineer. and Technol. Innovation, vol. 7, pp. 78-88, 2017 [Online]. Available: https://ojs.imeti.org/index.php/IJETI/article/view/382/395
  • [11] S. Narang and D. Gupta, "Speech feature extraction techniques: a review", Int. J. of Computer Sci. and Mobile Comput., vol. 4, no. 3, pp. 107-114, 2015 [Online]. Available: https://www.ijcsmc.com/docs/papers/March2015/V4I3201545.pdf
  • [12] D. Y. Huang, Z. Zhang, and S. S. Ge, "Speaker state classification based on fusion of asymmetric simple partial least squares (SIMPLS) and support vector machines", Computer Speech Language, vol. 28, no. 2, pp. 392-419, 2014 (DOI: 10.1016/j.csl.2013.06.002).
  • [13] S. M. Siniscalchi, T. Svendsen, and C. H. Lee, "An artificial neural network approach to automatic speech processing", Neurocomputing, vol. 140, pp. 326-338, 2014 (DOI: 10.1016/j.neucom.2014.03.005).
  • [14] N. S. Dey, R. Mohanty, and K. L. Chugh, "Speech and speaker recognition system using artificial neural networks and hidden Markov model", in Proc. IEEE Int. Conf. on Communication System and Network Technology CSNT, Rajkot, Gujarat, India, 2012, pp. 311-315 (DOI: 10.1109/CSNT.2012.221).
  • [15] R. Makhijani and R. Gupta, "Isolated word speech recognition system using Dynamic Time Warping", Int. J. of Engineering Sciences & Emerging Technologies, vol. 6, no. 3, pp. 352-363, 2013 [Online]. Available: https://www.ijeset.com/media/0002/9N13-IJESET0603130-v6-iss3-352-363.pdf
  • [16] G. Dede and M. H. Sazli, "Speech recognition with artificial neural networks", Digital Signal Process.: A Review Journal, vol. 20, no. 3, pp. 763-768, 2010 (DOI: 10.1016/j.dsp.2009.10.004).
  • [17] T. Nikoskinen, "From neural network to deep neural network", Aalto University School of Science, pp. 1-27, 2015 [Online]. Available: https://sal.aalto.fi/publications/pdf-files/enik15_public.pdf
  • [18] L. Moreno et al., "On the use of deep feed forward neural networks for automatic language identification", Computer Speech Language, vol. 40, pp. 46-59, 2016 (DOI: 10.1016/j.csl.2016.03.001).
  • [19] T. Alsmadi, H. A. Alissa, E. Trad, and K. Alsmadi, "Artificial intelligence for speech recognition based on neural networks", J. of Signal and Information Process., vol. 6, no. 2, pp. 66-72, 2015 (DOI: 10.4236/jsip.2015.62006).
  • [20] V. Mitra et al., "Hybrid convolutional neural networks for articulatory and acoustic information-based speech recognition", Speech Commun., vol. 89, pp. 103-112, 2017 (DOI: 10.1016/j.specom.2017.03.003).
  • [21] M. Farahat and R. Halavati, "Noise robust speech recognition using Deep Belief Networks", Int. J. of Comput. Intelligence and Applicat., vol. 15, no. 1, pp. 1-17, 2016 (DOI: 10.1142/S146902681650005X).
  • [22] R. Sarikaya, G. E. Hinton, and A. Deoras, "Application of Deep Belief Networks for natural language understanding", IEEE/ACM Trans. on Audio, Speech, and Language Process., vol. 22, no. 4, pp. 778-784, 2014 (DOI: 10.1109/TASLP.2014.2303296).
  • [23] A. R. Mohamed, G. E. Dahl, and G. Hinton, "Acoustic Modeling Using Deep Belief Networks", IEEE Trans. on Audio, Speech and Language Process., vol. 20, no. 1, pp. 14-22, 2011 (DOI: 10.1109/TASL.2011.2109382).
  • [24] X. Chen, X. Liu, Y. Wang, M. J. F. Gales, and P. C. Woodland, "Efficient training and evaluation of recurrent neural network language models for automatic speech recognition", IEEE/ACM Trans. on Audio, Speech, and Language Process., vol. 24, no. 11, pp. 2146-2157, 2016 (DOI: 10.1109/TASLP.2016.2598304).
  • [25] R. Ajgou, S. Sbaa, S. Ghendir, and A. Chemsa, "An efficient approach for MFCC feature extraction for text independent speaker identification system", Int. J. of Commun., vol. 9, pp. 114-122, 2015 [Online]. Available: http://www.naun.org/main/NAUN/communications/2015/a382006-081.pdf
Notes
Record created with funds from the Ministry of Science and Higher Education (MNiSW), agreement No. 461252, under the "Social Responsibility of Science" programme, module: popularisation of science and promotion of sport (2021).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-e452ed9d-7a65-4d9c-a8f6-761835253a3f