Article title

HMM-based phoneme speech recognition system for the control and command of industrial robots

Authors
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
In recent years, the integration of human-robot interaction with speech recognition has gained considerable momentum in the manufacturing industries. Conventional methods of controlling robots include semi-autonomous, fully autonomous, and wired methods. Operating through a teach pendant or a joystick is easy to implement but is not effective when the robot is deployed to perform complex repetitive tasks. Speech and touch are natural ways for humans to communicate, and speech recognition, being the best option, is a heavily researched technology. In this study, we aim to develop a stable and robust speech recognition system that allows humans to communicate with machines (a robotic arm) in a seamless manner. This paper investigates the potential of the linear predictive coding technique to develop a stable and robust HMM-based phoneme speech recognition system for applications in robotics. Our system is divided into three segments: a microphone array, a voice module, and a robotic arm with three degrees of freedom (DOF). To validate our approach, we performed experiments with simple and complex sentences for various robotic activities, such as manipulating a cube and pick-and-place tasks. Moreover, we analyzed the test results to rectify problems concerning accuracy and recognition score.
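The abstract names linear predictive coding (LPC) as the front end for the HMM-based recognizer. As an illustrative sketch only (the paper's own implementation details are not reproduced in this record), LPC coefficients for a speech frame can be computed with the classic autocorrelation method and the Levinson-Durbin recursion; the `frame` and `order` parameters below are hypothetical:

```python
def autocorr(frame, max_lag):
    """Raw (unnormalized) autocorrelation sums r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k))
            for k in range(max_lag + 1)]

def lpc(frame, order=10):
    """LPC coefficients a[0..order] (a[0] = 1) via Levinson-Durbin."""
    r = autocorr(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]                      # prediction-error power, updated per step
    for i in range(1, order + 1):
        # reflection coefficient from the current prediction error
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        a_prev = a[:]
        for j in range(1, i):       # update previous coefficients
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a
```

In a pipeline like the one described, such coefficients (or cepstra derived from them) would form the observation vectors fed to the HMM phoneme models.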
Year
Pages
art. no. e2021002
Physical description
Bibliography: 21 items, illustrations, tables, formulas, charts
Creators
author
  • University of Mumbai, India
Bibliography
  • 1. Alifani, F., Purboyo, T.W., Setianingsih, C. (2019). Implementation of Voice Recognition in Disaster Victim Detection Using Hidden Markov Model (HMM) Method. International Seminar on Intelligent Technology and Its Applications (ISITIA).
  • 2. Alim, S.A., Rashid, N.K. (2018). Some Commonly Used Speech Feature Extraction Algorithms.
  • 3. Ande, S. K., Kuchibotla, M. R., Adavi, B. K. (2020). Robot acquisition, control and interfacing using multimodal feedback. Journal of Ambient Intelligence and Humanized Computing, 1–11.
  • 4. Bahar, P., Makarov, N., Zeyer, A., Schlüter, R., Ney, H. (2020). Exploring A Zero-Order Direct Hmm Based on Latent Attention for Automatic Speech Recognition. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7854–7858.
  • 5. Baranwal, N., Singh, A. K., & Hellstrom, T. (2019). Fusion of Gesture and Speech for Increased Accuracy in Human Robot Interaction. 24th International Conference on Methods and Models in Automation and Robotics (MMAR).
  • 6. Becker, K. (2016). Identifying the Gender of a Voice using Machine Learning. Retrieved from http://www.primaryobjects.com/2016/06/22/identifying-the-gender-of-a-voice-using-machine-learning (access: 29/05/2020).
  • 7. Bendel, O. (2020). Co-Robots as Care Robots. Preprint arXiv arXiv:2004.04374.
  • 8. Bongomin, O., Yemane, A., Kembabazi, B., Malanda, C., Mwape, M. C., Mpofu, N.S., Tigalana, D. (2020). The Hype and Disruptive Technologies of Industry 4.0 in Major Industrial Sectors: A State of the Art.
  • M., Abdelaziz, A. H., Kolossa, D. (2016). Twin-HMM-based non-intrusive speech intelligibility prediction. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  • 9. Charles J., Vishwas M., Ruixi L. (2020). Improved Robust ASR for Social Robots in Public Spaces. Preprint arXiv:2001.04619.
  • 10. Kennedy, J., Lemaignan, S., Montassier, C., Lavalade, P., Irfan, B., Papadopoulos, F., Senft, E., Belpaeme, T. (2017). Child Speech Recognition in Human-Robot Interaction. Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction. HRI ’17. ACM/IEEE International Conference on Human-Robot Interaction.
  • 11. Lakomkin, E., Zamani, M. A., Weber, C., Magg, S., Wermter, S. (2018). On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
  • 12. Naik, A. HMM-based phoneme speech recognition system for control and command of industrial robots. Preprint arXiv:2000.01222, 1–23.
  • 13. Ninh, D. K. (2019). A Speaker-Adaptive HMM-based Vietnamese Text-to-Speech System. 2019 11th International Conference on Knowledge and Systems Engineering (KSE). 11th International Conference on Knowledge and Systems Engineering (KSE).
  • 14. Novoa, J., Wuth, J., Escudero, J. P., Fredes, J., Mahu, R., Yoma, N. B. (2018). DNN-HMM based automatic speech recognition for HRI scenarios. In Proceedings of the 2018 ACM/IEEE International Conference on Human- Robot Interaction (pp. 150-159).
  • 15. Palaz, D., Magimai-Doss, M., Collobert, R. (2019). End-to-end acoustic modeling using convolutional neural networks for HMM-based automatic speech recognition. Speech Communication, 108, 15–32.
  • 16. Sharma, U., Maheshkar, S., Mishra, A. N., Kaushik, R. (2019). Visual Speech Recognition Using Optical Flow and Hidden Markov Model. Wireless Personal Communications, 106(4), 2129–2147.
  • 17. Ting, W. (2019). An Acoustic Recognition Model for English Speech Based on Improved HMM Algorithm. In 2019 11th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), 729–732.
  • 18. Zhou, W., Schlüter, R., Ney, H. (2020). Full-Sum Decoding for Hybrid HMM based Speech Recognition using LSTM Language Model. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7834–7838.
  • 19. http://geetech.com (access: 29/05/2020).
  • 20. http://threegraphs.com (access: 29/05/2020).
  • 21. http://www.creately.com (access: 29/05/2020).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-a00b0b2d-75ee-4fa3-a0bb-151446c734ff