Development of Speaker Voice Identification Using Main Tone Boundary Statistics for Applying To Robot-Verbal Systems

Amirgaliyev, Yedilkhan; Musabayev, Timur; Yedilkhan, Didar; Wojcik, Waldemar; Amirgaliyeva, Zhazira

doi:10.24425/ijet.2020.134015

Abstract

This paper presents a basic speaker identification system. It discusses the application and use of voice interfaces, in particular speaker voice identification during communication between a robot and a human being. It describes an information system for automatic speaker identification by voice, intended for robot-verbal systems. Machine learning algorithms and libraries are reviewed, and ALGLIB is selected as the most appropriate according to the required criteria. The performance of the identification model is assessed for different sets of voice fundamental tone statistics. Accuracy is measured as the percentage of incorrectly classified speaker identification cases.
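
The abstract does not list the exact fundamental tone statistics or the model configuration, so the following Python sketch is only an illustration under stated assumptions. It assumes a feature vector of minimum, maximum, mean, median, and standard deviation of the fundamental frequency (F0) over voiced frames, and it substitutes a simple nearest-neighbour rule for the ALGLIB-based model named in the abstract. All function names and sample values are hypothetical. The last function computes the accuracy criterion the abstract describes, the percentage of incorrectly classified cases.

```python
import math
from statistics import mean, median, pstdev


def f0_statistics(f0_hz):
    """Summarise voiced-frame F0 values (Hz) as a feature vector.

    Assumed feature set: min, max, mean, median, population std. dev.
    """
    return [min(f0_hz), max(f0_hz), mean(f0_hz), median(f0_hz), pstdev(f0_hz)]


def nearest_speaker(features, enrolled):
    """Stand-in classifier (not the paper's model): return the enrolled
    speaker whose feature vector is closest in Euclidean distance."""
    return min(enrolled, key=lambda name: math.dist(features, enrolled[name]))


def error_percentage(true_labels, predicted_labels):
    """Percentage of incorrectly classified identification cases."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * wrong / len(true_labels)


# Hypothetical enrolment data: one F0 track per speaker.
enrolled = {
    "speaker_a": f0_statistics([100.0, 110.0, 120.0]),
    "speaker_b": f0_statistics([200.0, 210.0, 220.0]),
}

# Hypothetical test utterances paired with their true speakers.
tests = [
    ("speaker_a", [105.0, 112.0, 118.0]),
    ("speaker_b", [195.0, 208.0, 221.0]),
]
truth = [name for name, _ in tests]
predicted = [nearest_speaker(f0_statistics(track), enrolled) for _, track in tests]
print(error_percentage(truth, predicted))
```

Changing the list returned by `f0_statistics` is one way to compare different sets of statistics against the same error criterion.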

Keywords

speaker voice identification; voice interface; FXO; human being; human-robot interaction; HRI; speech recognition; statistics of voice fundamental tone; computer-aided learning; neural network

Publisher

Polish Academy of Sciences, Committee of Electronics and Telecommunication

Journal

International Journal of Electronics and Telecommunications

Year

2020

Volume

Vol. 66, No. 3

Pages

583-588

Physical description

Bibliography: 23 items, illustrations.

Authors

author

Amirgaliyev Yedilkhan

amir_ed@mail.ru

Institute of Information and Computing Technologies of the Science Committee of RK MES and Al-Farabi Kazakh National University

author

Musabayev Timur

tmusab@yandex.ru

Institute of Information and Computing Technologies of the Science Committee of RK MES, Astana IT University

author

Yedilkhan Didar

yedilkhan@gmail.com

Institute of Information and Computing Technologies of the Science Committee of RK MES, Astana IT University

author

Wojcik Waldemar

waldemar.wojcik@pollub.pl

Lublin Technical University

author

Amirgaliyeva Zhazira

Institute of Information and Computing Technologies of the Science Committee of RK MES

Notes

Record prepared with funding from MNiSW (the Polish Ministry of Science and Higher Education), agreement No. 461252, under the programme "Social Responsibility of Science", module: Popularisation of science and promotion of sport (2020).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-9dccfc18-90ab-4325-a80f-4717fc188a50