Article title

Investigation of the Lombard effect based on a machine learning approach

Publication languages
EN
Abstracts
EN
The Lombard effect is an involuntary increase in a speaker's pitch, intensity, and duration in the presence of noise. It enables more effective communication in noisy environments. This study aims to investigate an efficient method for detecting the Lombard effect in uttered speech. The influence of interfering noise, room type, and the speaker's gender on the detection process is examined. First, acoustic parameters related to the speech changes produced by the Lombard effect are extracted. Mid-term statistics are computed from these parameters and used to construct self-similarity matrices, which constitute the input data for a convolutional neural network (CNN). The self-similarity-based approach is then compared with two other methods: spectrograms used as input to the CNN, and speech acoustic parameters combined with the k-nearest neighbors (kNN) algorithm. The experimental investigations show that the self-similarity approach outperforms the other two methods in Lombard effect detection. Moreover, the small standard deviation values obtained for the self-similarity approach indicate that its high accuracies are consistent.
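The feature pipeline summarized in the abstract (frame-level acoustic parameters, then mid-term statistics, then a self-similarity matrix used as CNN input) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the window length and hop (`win`, `step`), the choice of mean and standard deviation as the mid-term statistics, the z-score normalization, and cosine similarity as the similarity measure are all assumptions made for this example.

```python
import numpy as np


def mid_term_stats(frames: np.ndarray, win: int, step: int) -> np.ndarray:
    """Summarize frame-level features over sliding mid-term windows (hypothetical helper).

    frames: shape (n_frames, n_features), e.g. per-frame pitch and intensity
    (assumed inputs; the paper's exact parameter set is not reproduced here).
    Returns shape (n_windows, 2 * n_features): per-window mean, then per-window std.
    """
    n_frames = frames.shape[0]
    if n_frames < win:
        raise ValueError("signal shorter than one mid-term window")
    stats = [
        np.concatenate([frames[s:s + win].mean(axis=0), frames[s:s + win].std(axis=0)])
        for s in range(0, n_frames - win + 1, step)
    ]
    return np.vstack(stats)


def self_similarity(vectors: np.ndarray) -> np.ndarray:
    """Cosine self-similarity matrix: S[i, j] compares mid-term vectors i and j (assumed measure)."""
    # z-score each statistic so that large-range features (e.g. intensity) do not dominate
    mu = vectors.mean(axis=0)
    sigma = vectors.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant statistics
    z = (vectors - mu) / sigma
    # scale every row to unit length; the dot product of unit rows is their cosine similarity
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = z / norms
    return unit @ unit.T


# Usage with synthetic data: 500 frames x 3 hypothetical acoustic parameters
rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 3))
mt = mid_term_stats(frames, win=50, step=25)
S = self_similarity(mt)
print(mt.shape, S.shape)  # (19, 6) (19, 19)
```

The resulting matrix `S` is square and symmetric, so it can be treated as a single-channel image and fed to a 2-D CNN; the network itself is omitted here, and the paper's actual architecture and parameters may differ.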
Pages
479-492
Physical description
Bibliography: 54 items; figures, tables, charts
Authors
  • Institute of Data Science and Digital Technologies, Vilnius University, Akademijos str. 4, LT-08412 Vilnius, Lithuania
  • Institute of Data Science and Digital Technologies, Vilnius University, Akademijos str. 4, LT-08412 Vilnius, Lithuania
  • PGS Software, ul. Sucha 3, 50-086 Wrocław, Poland
  • Audio Acoustics Laboratory, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, ul. G. Narutowicza 11/12, 80-233 Gdańsk, Poland
Bibliography
  • [1] Bernardo, L.S., Damaševičius, R., de Albuquerque, V.H.C. and Maskeliūnas, R. (2021). A hybrid two-stage SqueezeNet and support vector machine system for Parkinson’s disease detection based on handwritten spiral patterns, International Journal of Applied Mathematics and Computer Science 31(4): 549-561, DOI: 10.34768/amcs-2021-0037.
  • [2] Berrar, D. (2019). Cross-validation, in S. Ranganathan et al. (Eds), Encyclopedia of Bioinformatics and Computational Biology, Academic Press, Oxford, pp. 542-545.
  • [3] Boril, H. and Hansen, J.H. (2009). Unsupervised equalization of Lombard effect for speech recognition in noisy adverse environments, IEEE Transactions on Audio, Speech, and Language Processing 18(6): 1379-1393.
  • [4] Bottalico, P., Passione, I.I., Graetzer, S. and Hunter, E.J. (2017). Evaluation of the starting point of the Lombard effect, Acta Acustica united with Acustica 103(1): 169-172.
  • [5] Bottalico, P., Piper, R.N. and Legner, B. (2022). Lombard effect, intelligibility, ambient noise, and willingness to spend time and money in a restaurant amongst older adults, Scientific Reports 12(1): 1-9.
  • [6] Chiu, W., Xu, Y., Abel, A., Lin, C. and Tu, Z. (2020). Investigating the visual Lombard effect with Gabor based features, Proceedings of INTERSPEECH, pp. 4606-4610, (online).
  • [7] Choi, K., Fazekas, G., Sandler, M. and Cho, K. (2018). A comparison of audio signal preprocessing methods for deep neural networks on music tagging, 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, pp. 1870-1874.
  • [8] Diamantaras, K.I. (2002). Neural networks and principal component analysis, in Y.H. Hu and J.-N. Hwang (Eds), Handbook of Neural Network Signal Processing, CRC Press, Boca Raton, pp. 8.1-8.38, DOI: 10.1201/9781315220413.
  • [9] Dimoulas, C., Kalliris, G., Papanikolaou, G. and Kalampakas, A. (2006). Novel wavelet domain Wiener filtering de-noising techniques: Application to bowel sounds captured by means of abdominal surface vibrations, Biomedical Signal Processing and Control 1(3): 177-218.
  • [10] Dong, W., Zhang, L., Shi, G. and Li, X. (2012). Nonlocally centralized sparse representation for image restoration, IEEE Transactions on Image Processing 22(4): 1620-1630.
  • [11] Downie, J.S. (2003). Music information retrieval, Annual Review of Information Science and Technology 37(1): 295-340.
  • [12] Esmaili, I., Dabanloo, N.J. and Vali, M. (2016). Automatic classification of speech dysfluencies in continuous speech based on similarity measures and morphological image processing tools, Biomedical Signal Processing and Control 23: 104-114.
  • [13] Foote, J. (1999). Visualizing music and audio using self-similarity, Proceedings of the 7th ACM International Conference on Multimedia (Part 1), Orlando, USA, pp. 77-80.
  • [14] Gama, R., Castro, M.E., van Lith-Bijl, J.T. and Desuter, G. (2021). Does the wearing of masks change voice and speech parameters?, European Archives of Oto-Rhino-Laryngology 279: 1701-1708, DOI: 10.1007/s00405-021-07086-9.
  • [15] Garnier, M. and Henrich, N. (2014). Speaking in noise: How does the Lombard effect improve acoustic contrasts between speech and ambient noise?, Computer Speech & Language 28(2): 580-597.
  • [16] Glorot, X. and Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks, Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, pp. 249-256.
  • [17] Hansen, J.H. (1994). Morphological constrained feature enhancement with adaptive cepstral compensation (MCE-ACC) for speech recognition in noise and Lombard effect, IEEE Transactions on Speech and Audio Processing 2(4): 598-614.
  • [18] Hotchkin, C. and Parks, S. (2013). The Lombard effect and other noise-induced vocal modifications: Insight from mammalian communication systems, Biological Reviews 88(4): 809-824.
  • [19] Huzaifah, M. (2017). Comparison of time-frequency representations for environmental sound classification using convolutional neural networks, arXiv: 1706.07156.
  • [20] Kherif, F. and Latypova, A. (2020). Principal component analysis, in A. Mechelli and S. Vieira (Eds), Machine Learning, Academic Press, Cambridge, pp. 209-225, DOI: 10.1016/B978-0-12-815739-8.00012-2.
  • [21] Kim, H.-G., Moreau, N. and Sikora, T. (2005). MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval, Wiley, Chichester.
  • [22] Kim, J. and Davis, C. (2014). Comparing the consistency and distinctiveness of speech produced in quiet and in noise, Computer Speech & Language 28(2): 598-606.
  • [23] Kingma, D.P. and Ba, J. (2014). Adam: A method for stochastic optimization, arXiv: 1412.6980.
  • [24] Kleczkowski, P., Żak, A. and Król-Nowak, A. (2017). Lombard effect in Polish speech and its comparison in English speech, Archives of Acoustics 42(4): 561-569.
  • [25] Korvel, G., Kąkol, K., Kurasova, O. and Kostek, B. (2020). Evaluation of Lombard speech models in the context of speech in noise enhancement, IEEE Access 8: 155156-155170, DOI: 10.1109/ACCESS.2020.3015421.
  • [26] Korvel, G., Kurowski, A., Kostek, B. and Czyżewski, A. (2019). Speech analytics based on machine learning, in G.A. Tsihrintzis et al. (Eds), Machine Learning Paradigms, Springer, Cham, pp. 129-157.
  • [27] Korvel, G., Treigys, P. and Kostek, B. (2021). Highlighting interlanguage phoneme differences based on similarity matrices and convolutional neural network, Journal of the Acoustical Society of America 149(1): 508-523.
  • [28] Kostek, B., Kupryjanow, A., Zwan, P., Jiang, W., Raś, Z.W., Wojnarski, M. and Swietlicka, J. (2011). Report of the ISMIS 2011 contest: Music information retrieval, International Symposium on Methodologies for Intelligent Systems, Warsaw, Poland, pp. 715-724.
  • [29] Kowal, M. and Korbicz, J. (2019). Refinement of convolutional neural network based cell nuclei detection using Bayesian inference, 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, pp. 7216-7222.
  • [30] Lee, J., Park, J., Kim, K.L. and Nam, J. (2018). SampleCNN: End-to-end deep convolutional neural networks using very small filters for music classification, Applied Sciences 8(1): 1-14.
  • [31] Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A. and Talwalkar, A. (2017). Hyperband: A novel bandit-based approach to hyperparameter optimization, Journal of Machine Learning Research 18(1): 6765-6816.
  • [32] Luo, J., Hage, S.R. and Moss, C.F. (2018). The Lombard effect: From acoustics to neural mechanisms, Trends in Neurosciences 41(12): 938-949.
  • [33] Maheswari, S.U., Shahina, A., Rishickesh, R. and Khan, A.N. (2020). A study on the impact of Lombard effect on recognition of Hindi syllabic units using CNN based multimodal ASR systems, Archives of Acoustics 45(3): 419-431.
  • [34] Manaswi, N.K., Manaswi, N.K. and John, S. (2018). Deep Learning with Applications Using Python, Apress, Berkeley.
  • [35] Marcoux, K., Cooke, M., Tucker, B.V. and Ernestus, M. (2022). The Lombard intelligibility benefit of native and non-native speech for native and non-native listeners, Speech Communication 136: 53-62.
  • [36] Marxer, R., Barker, J., Alghamdi, N. and Maddock, S. (2018). The impact of the Lombard effect on audio and visual speech recognition systems, Speech Communication 100: 58-68.
  • [37] Noé, P.-G., Nautsch, A., Evans, N., Patino, J., Bonastre, J.-F., Tomashenko, N. and Matrouf, D. (2022). Towards a unified assessment framework of speech pseudonymisation, Computer Speech & Language 72: 101299.
  • [38] Nugraha, A.A., Sekiguchi, K. and Yoshii, K. (2020). A flow-based deep latent variable model for speech spectrogram modeling and enhancement, IEEE/ACM Transactions on Audio, Speech, and Language Processing 28: 1104-1117.
  • [39] O’Malley, T., Bursztein, E., Long, J., Chollet, F., Jin, H. and Invernizzi, L. (2019). KerasTuner: A hyperparameter optimization framework, https://github.com/keras-team/keras-tuner.
  • [40] Ouyang, Z., Yu, H., Zhu, W.-P. and Champagne, B. (2019). A fully convolutional neural network for complex spectrogram processing in speech enhancement, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, pp. 5756-5760.
  • [41] Panek, D., Skalski, A., Gajda, J. and Tadeusiewicz, R. (2015). Acoustic analysis assessment in speech pathology detection, International Journal of Applied Mathematics and Computer Science 25(3): 631-643, DOI: 10.1515/amcs-2015-0046.
  • [42] Piotrowska, M., Czyżewski, A., Ciszewski, T., Korvel, G., Kurowski, A. and Kostek, B. (2021). Evaluation of aspiration problems in L2 English pronunciation employing machine learning, Journal of the Acoustical Society of America 150(1): 120-132.
  • [43] Piotrowska, M., Korvel, G., Kostek, B., Ciszewski, T. and Czyżewski, A. (2019). Machine learning-based analysis of English lateral allophones, International Journal of Applied Mathematics and Computer Science 29(2): 393-405, DOI: 10.2478/amcs-2019-0029.
  • [44] Rybka, J. and Janicki, A. (2013). Comparison of speaker dependent and speaker independent emotion recognition, International Journal of Applied Mathematics and Computer Science 23(4): 797-808, DOI: 10.2478/amcs-2013-0060.
  • [45] Saba, J.N. and Hansen, J.H. (2022). The effects of Lombard perturbation on speech intelligibility in noise for normal hearing and cochlear implant listeners, Journal of the Acoustical Society of America 151(2): 1007-1021.
  • [46] Schedl, M., Gómez, E. and Urbano, J. (2014). Music information retrieval: Recent developments and applications, Foundations and Trends® in Information Retrieval 8(2-3): 127-261.
  • [47] Smailis, C., Sarafianos, N., Giannakopoulos, T. and Perantonis, S. (2016). Fusing active orientation models and mid-term audio features for automatic depression estimation, Proceedings of the 9th ACM International Conference on Pervasive Technologies Related to Assistive Environments, Corfu, Greece, pp. 1-4.
  • [48] Stathopoulos, E.T., Huber, J.E., Richardson, K., Kamphaus, J., DeCicco, D., Darling, M., Fulcher, K. and Sussman, J.E. (2014). Increased vocal intensity due to the Lombard effect in speakers with Parkinson’s disease: Simultaneous laryngeal and respiratory strategies, Journal of Communication Disorders 48: 1-17.
  • [49] Summers, W.V., Pisoni, D.B., Bernacki, R.H., Pedlow, R.I. and Stokes, M.A. (1988). Effects of noise on speech production: Acoustic and perceptual analyses, Journal of the Acoustical Society of America 84(3): 917-928.
  • [50] Tsardoulias, E., Thallas, A.G., Symeonidis, A.L. and Mitkas, P.A. (2016). Improving multilingual interaction for consumer robots through signal enhancement in multichannel speech, Journal of the Audio Engineering Society 64(7/8): 514-524.
  • [51] Vlaj, D. and Kacic, Z. (2011). The influence of Lombard effect on speech recognition, in I. Ipšić (Ed), Speech Technologies, INTECH Open Access Publisher, London, pp. 151-168.
  • [52] Wang, S., Wei, Y., Long, K., Zeng, X. and Zheng, M. (2018). Image super-resolution via self-similarity learning and conformal sparse representation, IEEE Access 6: 68277-68287.
  • [53] Wei, I.-C., Wu, C.-W. and Su, L. (2019). Generating structured drum pattern using variational autoencoder and self-similarity matrix, 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands, pp. 847-854.
  • [54] Zhang, S., Li, X., Zong, M., Zhu, X. and Wang, R. (2017). Efficient kNN classification with different numbers of nearest neighbors, IEEE Transactions on Neural Networks and Learning Systems 29(5): 1774-1785.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-eadbb82e-d121-4972-a372-f6281968e179