Article title
Authors
Content
Full texts:
Identifiers
Title variants
Languages of publication
Abstracts
The article describes a speech-based robotic prototype designed to aid the movement of elderly or handicapped individuals. Mel frequency cepstral coefficients (MFCC) are used to extract speech features, and a deep belief network (DBN) is trained to recognize commands. The prototype was tested in a real-world environment and achieved a recognition accuracy of 87.4%.
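The record contains no code; as a rough illustration of the pipeline the abstract describes (MFCC front end feeding a DBN classifier), the sketch below extracts MFCC features with librosa and stacks scikit-learn BernoulliRBM layers as a greedily pretrained, DBN-style classifier. All parameter choices here (16 kHz sampling, 13 coefficients, per-utterance averaging, layer sizes, the logistic-regression top layer) are assumptions for illustration, not details taken from the paper.

```python
# Illustrative sketch only: the paper's actual frame sizes, coefficient
# counts, command vocabulary, and network topology are not given in this
# record, so every parameter below is an assumption.
import numpy as np
import librosa
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression

def extract_mfcc(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    """Return one fixed-length MFCC feature vector per utterance."""
    # Load the spoken command as 16 kHz mono (a common speech setting).
    signal, sr = librosa.load(wav_path, sr=16000, mono=True)
    # MFCC matrix of shape (n_mfcc, frames).
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    # Average over frames for a simple utterance-level descriptor.
    return mfcc.mean(axis=1)

# DBN-style classifier: RBM layers pretrained greedily by the pipeline,
# topped with logistic regression (a stand-in for the authors' DBN).
model = Pipeline([
    ("scale", MinMaxScaler()),  # BernoulliRBM expects inputs in [0, 1]
    ("rbm1", BernoulliRBM(n_components=256, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=128, n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
# Usage: X = np.stack([extract_mfcc(p) for p in wav_paths]); model.fit(X, y)
```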
Year
Volume
Pages
72-77
Physical description
Bibliography: 25 items, figures, tables
Creators
author
- UIET, Panjab University, Chandigarh, India
author
- CEC Landran, India
author
- NITTTR, Chandigarh, India
Bibliography
- [1] J. H. L. Hansen and T. Hasan, "Speaker recognition by machines and humans: a tutorial review", IEEE Signal Process. Mag., vol. 32, no. 6, pp. 74-99, 2015 (DOI: 10.1109/MSP.2015.2462851).
- [2] D. R. Reddy, "Speech recognition by machine: a review", Proc. of the IEEE, vol. 64, no. 4, pp. 501-531, 1976 (DOI: 10.1109/PROC.1976.10158).
- [3] M. Nishimori, T. Saitoh, and R. Konishi, "Voice controlled intelligent wheelchair", in Proc. of the SICE Annual Conf., Takamatsu, Japan, 2007, pp. 336-340 (DOI: 10.1109/SICE.2007.4421003).
- [4] N. Peixoto, H. G. Nik, and H. Charkhkar, "Voice controlled wheelchairs: fine control by humming", Computer Methods and Programs in Biomedicine, vol. 112, pp. 156-165, 2013 (DOI: 10.1016/j.cmpb.2013.06.009).
- [5] V. Partha Saradi and P. Kailasapathi, "Voice-based motion control of a robotic vehicle through visible light communication", Computers and Electrical Engineer., vol. 76, pp. 154-167, 2019 (DOI: 10.1016/j.compeleceng.2019.03.011).
- [6] G. Kaur, M. Srivastava, and A. Kumar, "Integrated speaker and speech recognition for wheel chair movement using artificial intelligence", Informatica, vol. 42, no. 4, pp. 587-594, 2018 (DOI: 10.31449/inf.v42i4.2003).
- [7] S. Squartini, E. Principi, R. Rotili, and F. Piazza, "Environmental robust speech and speaker recognition through multi-channel histogram equalization", Neurocomputing, vol. 78, no. 1, pp. 111-120, 2012 (DOI: 10.1016/j.neucom.2011.05.035).
- [8] S. Furui, "50 years of progress in speech and speaker recognition", ECTI Trans. on Computer and Information Technol., vol. 1, no. 2, pp. 64-74, 2005 (DOI: 10.37936/ecti-cit.200512.51834).
- [9] G. Kaur, M. Srivastava, and A. Kumar, "Implementation of text dependent speaker verification on Matlab", in Proc. of 2nd Conf. on Recent Adv. in Engineer. and Comput. Sci. RAECS, Chandigarh, India, 2015 (DOI: 10.1109/RAECS.2015.7453344).
- [10] G. Kaur, M. Srivastava, and A. Kumar, "Analysis of feature extraction methods for speaker dependent speech recognition", Int. J. of Engineer. and Technol. Innovation, vol. 7, pp. 78-88, 2017 [Online]. Available: https://ojs.imeti.org/index.php/IJETI/article/view/382/395
- [11] S. Narang and D. Gupta, "Speech feature extraction techniques: a review", Int. J. of Computer Sci. and Mobile Comput., vol. 4, no. 3, pp. 107-114, 2015 [Online]. Available: https://www.ijcsmc.com/docs/papers/March2015/V4I3201545.pdf
- [12] D. Y. Huang, Z. Zhang, and S. S. Ge, "Speaker state classification based on fusion of asymmetric simple partial least squares (SIMPLS) and support vector machines", Computer Speech & Language, vol. 28, no. 2, pp. 392-419, 2014 (DOI: 10.1016/j.csl.2013.06.002).
- [13] S. M. Siniscalchi, T. Svendsen, and C. H. Lee, "An artificial neural network approach to automatic speech processing", Neurocomputing, vol. 140, pp. 326-338, 2014 (DOI: 10.1016/j.neucom.2014.03.005).
- [14] N. S. Dey, R. Mohanty, and K. L. Chugh, "Speech and speaker recognition system using artificial neural networks and hidden Markov model", in Proc. IEEE Int. Conf. on Communication Systems and Network Technologies CSNT, Rajkot, Gujarat, India, 2012, pp. 311-315 (DOI: 10.1109/CSNT.2012.221).
- [15] R. Makhijani and R. Gupta, "Isolated word speech recognition system using dynamic time warping", Int. J. of Engineering Sciences & Emerging Technologies, vol. 6, no. 3, pp. 352-363, 2013 [Online]. Available: https://www.ijeset.com/media/0002/9N13-IJESET0603130-v6-iss3-352-363.pdf
- [16] G. Dede and M. H. Sazli, "Speech recognition with artificial neural networks", Digital Signal Process.: A Review Journal, vol. 20, no. 3, pp. 763-768, 2010 (DOI: 10.1016/j.dsp.2009.10.004).
- [17] T. Nikoskinen, "From neural network to deep neural network", Aalto University School of Science, pp. 1-27, 2015 [Online]. Available: https://sal.aalto.fi/publications/pdf-files/enik15_public.pdf
- [18] L. Moreno et al., "On the use of deep feedforward neural networks for automatic language identification", Computer Speech & Language, vol. 40, pp. 46-59, 2016 (DOI: 10.1016/j.csl.2016.03.001).
- [19] T. Alsmadi, H. A. Alissa, E. Trad, and K. Alsmadi, "Artificial intelligence for speech recognition based on neural networks", J. of Signal and Information Process., vol. 6, no. 2, pp. 66-72, 2015 (DOI: 10.4236/jsip.2015.62006).
- [20] V. Mitra et al., "Hybrid convolutional neural networks for articulatory and acoustic information-based speech recognition", Speech Commun., vol. 89, pp. 103-112, 2017 (DOI: 10.1016/j.specom.2017.03.003).
- [21] M. Farahat and R. Halavati, "Noise robust speech recognition using deep belief networks", Int. J. of Comput. Intelligence and Applicat., vol. 15, no. 1, pp. 1-17, 2016 (DOI: 10.1142/S146902681650005X).
- [22] R. Sarikaya, G. E. Hinton, and A. Deoras, "Application of deep belief networks for natural language understanding", IEEE/ACM Trans. on Audio, Speech, and Language Process., vol. 22, no. 4, pp. 778-784, 2014 (DOI: 10.1109/TASLP.2014.2303296).
- [23] A. R. Mohamed, G. E. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks", IEEE Trans. on Audio, Speech, and Language Process., vol. 20, no. 1, pp. 14-22, 2011 (DOI: 10.1109/TASL.2011.2109382).
- [24] X. Chen, X. Liu, Y. Wang, M. J. F. Gales, and P. C. Woodland, "Efficient training and evaluation of recurrent neural network language models for automatic speech recognition", IEEE/ACM Trans. on Audio, Speech, and Language Process., vol. 24, no. 11, pp. 2146-2157, 2016 (DOI: 10.1109/TASLP.2016.2598304).
- [25] R. Ajgou, S. Sbaa, S. Ghendir, and A. Chemsa, "An efficient approach for MFCC feature extraction for text independent speaker identification system", Int. J. of Commun., vol. 9, pp. 114-122, 2015 [Online]. Available: http://www.naun.org/main/NAUN/communications/2015/a382006-081.pdf
Notes
Record created with funding from the Polish Ministry of Science and Higher Education (MNiSW), agreement No. 461252, under the "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) programme, module: popularisation of science and promotion of sport (2021).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-e452ed9d-7a65-4d9c-a8f6-761835253a3f