Automatic recognition of artificial reverberation settings in speech recordings

Kachniarz, Krzysztof; Lewandowski, Marcin

Abstract

The aim of this study is to develop a method for automatically recognising artificial reverberation settings from reference speech recordings. The proposed method uses machine-learning techniques, namely a Gaussian Mixture Model (GMM) approach and a deep Convolutional Neural Network (CNN), VGG13, whose use for this task is a novel approach, to support the sound engineer in finding the ideal settings for the artificial reverberation plugin available in a given Digital Audio Workstation (DAW). The data set used for training and testing comprises 1885 speech signals selected from the EMIME Bilingual Database and processed with 66 artificial reverberation presets selected from the Semantic Audio Labs SAFE Reverb plugin database. The performance of the proposed recognition method was evaluated using similarity measures between features of the reference and analysed speech recordings. The evaluation showed that the classical GMM approach achieves 43.8% recognition accuracy, while the proposed method based on the VGG13 deep CNN achieves 99.94% accuracy.
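The abstract does not state which features, model sizes, or training settings were used. As a rough illustration of how a classical GMM baseline for preset recognition can be set up, the following hypothetical sketch trains one diagonal-covariance GMM per reverberation preset with expectation-maximisation and labels an unknown recording with the preset whose model gives the highest total log-likelihood over its frames, a set-up common in GMM-based speaker recognition. It is not the authors' implementation: the per-frame feature vectors (e.g. MFCCs), the number of components, the iteration count, and the names `DiagGMM`, `train_preset_models`, `recognise_preset`, and `accuracy` are all assumptions made for illustration.

```python
import math
import random

# Hypothetical sketch: "frames" are per-frame feature vectors (e.g. MFCCs).
# The features and GMM settings used in the paper are assumptions here.


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class DiagGMM:
    """Gaussian mixture model with diagonal covariances, fitted by EM."""

    def __init__(self, n_components=2, n_iter=30, var_floor=1e-3, seed=0):
        self.k, self.n_iter, self.var_floor = n_components, n_iter, var_floor
        self.rng = random.Random(seed)

    def _component_logs(self, x):
        # log(weight_j) + log N(x | mean_j, var_j) for every component j
        return [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def fit(self, frames):
        dim = len(frames[0])
        # Initialise: means from random frames, unit variances, equal weights.
        self.means = [list(f) for f in self.rng.sample(frames, self.k)]
        self.vars = [[1.0] * dim for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.n_iter):
            # E-step: posterior responsibility of each component for each frame.
            resp = []
            for x in frames:
                logs = self._component_logs(x)
                norm = logsumexp(logs)
                resp.append([math.exp(lp - norm) for lp in logs])
            # M-step: re-estimate weights, means and floored variances.
            for j in range(self.k):
                nj = sum(r[j] for r in resp) + 1e-10
                self.weights[j] = nj / len(frames)
                self.means[j] = [sum(r[j] * x[i] for r, x in zip(resp, frames)) / nj
                                 for i in range(dim)]
                self.vars[j] = [max(self.var_floor,
                                    sum(r[j] * (x[i] - self.means[j][i]) ** 2
                                        for r, x in zip(resp, frames)) / nj)
                                for i in range(dim)]
        return self

    def log_likelihood(self, frames):
        """Total log-likelihood of a recording's frames under this model."""
        return sum(logsumexp(self._component_logs(x)) for x in frames)


def train_preset_models(frames_by_preset, n_components=2):
    """Fit one GMM per reverberation preset label."""
    return {p: DiagGMM(n_components).fit(f) for p, f in frames_by_preset.items()}


def recognise_preset(models, frames):
    """Return the preset whose GMM gives the recording's frames the highest likelihood."""
    return max(models, key=lambda p: models[p].log_likelihood(frames))


def accuracy(predicted, actual):
    """Fraction of recordings whose preset was recognised correctly."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

In use, `train_preset_models` would receive a dictionary mapping each preset label to the training frames of recordings processed with that preset, and `recognise_preset` would be called once per test recording; `accuracy` then compares the predicted labels with the true ones. A diagonal covariance keeps the per-component parameter count linear in the feature dimension, which is the usual choice for GMM baselines on frame-level audio features.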

Keywords

artificial reverberation; machine learning; digital audio signal processing

Publisher

Poznan University of Technology. Institute of Applied Mechanics

Journal

Vibrations in Physical Systems

Year

2019

Volume

Vol. 30, no. 1

Pages

art. no. 2019125

Physical description

Bibliography: 14 items, charts

Contributors

Author

Kachniarz Krzysztof

kkachni1@mion.elka.pw.edu.pl

Warsaw University of Technology, Faculty of Electronics and Information Technology, Institute of Radioelectronics and Multimedia Technology, Nowowiejska 15/19, 00-665 Warsaw
Promity Sp. z o.o., Wiejska 14/25, 00-490 Warsaw

Author

Lewandowski Marcin

marcin.lewandowski@ire.pw.edu.pl

Warsaw University of Technology, Faculty of Electronics and Information Technology, Institute of Radioelectronics and Multimedia Technology, Nowowiejska 15/19, 00-665 Warsaw

References

1. M. R. Schroeder, B. F. Logan, Colorless artificial reverberation, IRE Transactions on Audio, 6 (1961) 209-214.
2. V. Välimäki, J. D. Parker, L. Savioja, J. O. Smith, J. S. Abel, Fifty years of artificial reverberation, IEEE Transactions on Audio, Speech, and Language Processing, 20(5) (2012) 1421-1448.
3. J. Jullien, E. Kahle, M. Marin, O. Warusfel, Spatializer: a perceptual approach, 94th Convention of the Audio Engineering Society, Preprint 3465 (1993).
4. M. F. Zbyszynski, A. Freed, Control of VST plug-ins using OSC, Proceedings of the International Computer Music Conference, Spain (2005) 263-266.
5. Z. Rafii, B. Pardo, Learning to control a reverberator using subjective perceptual descriptors, 10th International Society for Music Information Retrieval Conference (2009).
6. N. Peters, J. Choi, H. Lei, Matching artificial reverb settings to unknown room recordings: a recommendation system for reverb plugins, 133rd Audio Engineering Society Convention, USA (2012).
7. E. T. Chourdakis, J. D. Reiss, A machine-learning approach to application of intelligent artificial reverberation, Journal of the Audio Engineering Society (2017).
8. D. Reynolds, T. Quatieri, R. Dunn, Speaker verification using adapted Gaussian mixture models, Digital Signal Processing, 10(1-3) (2000) 19-41.
9. T. Iqbal, Q. Kong, M. Plumbley, W. Wang, Stacked convolutional neural networks for general-purpose audio tagging, Centre for Vision, Speech and Signal Processing, University of Surrey (2018).
10. K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, 3rd ICLR (2015).
11. H. Zhang, M. Cisse, Y. N. Dauphin, D. Lopez-Paz, mixup: Beyond empirical risk minimization, 6th ICLR (2018).
12. S. O. Sadjadi, M. Slaney, L. Heck, MSR Identity Toolbox, Microsoft Research (2013).
13. M. Wester, The EMIME Bilingual Database, The University of Edinburgh: Centre for Speech Technology Research (2012).
14. Semantic Audio Labs, [Online], http://www.semanticaudio.co.uk/, [Accessed: 10.07.2019].

Notes

Record prepared with funds from the Polish Ministry of Science and Higher Education (MNiSW), agreement No. 461252, under the programme "Social Responsibility of Science", module: Popularisation of Science and Promotion of Sport (2020).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-38237fc8-d2c4-4dd2-8625-24061a8baf3d