Article title
Content
Full texts:
Identifiers
Title variants
Languages of publication
Abstracts
The aim of this study is to develop a method for the automatic recognition of artificial reverberation settings extracted from reference speech recordings. The proposed method employs machine-learning techniques to support the sound engineer in finding the ideal settings for an artificial reverberation plugin available in a given Digital Audio Workstation (DAW): a Gaussian Mixture Model (GMM) approach and, as a novel contribution, the VGG13 deep Convolutional Neural Network (CNN). The training and test sets comprise 1885 speech signals selected from the EMIME Bilingual Database, processed with 66 artificial reverberation presets selected from Semantic Audio Labs' SAFE Reverb plugin database. The performance of the proposed automatic recognition method was evaluated using similarity measures between the features of the reference and the analysed speech recordings. The evaluation showed that the classical GMM approach achieves a recognition accuracy of 43.8%, while the proposed method based on the VGG13 deep CNN achieves 99.94%.
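The abstract names two recognition pipelines but gives no implementation details. Below is a minimal, purely illustrative sketch of a GMM-based preset recogniser of the kind the abstract describes, assuming MFCC features and one diagonal-covariance mixture per reverb preset; the feature dimensionality, mixture size, and helper names are assumptions for illustration, not the paper's actual configuration.

```python
# Hypothetical sketch of GMM-based reverb-preset recognition:
# fit one Gaussian Mixture Model per preset on MFCC features, then
# assign an unknown recording to the preset whose model yields the
# highest average log-likelihood. All settings here are assumed.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

N_MFCC = 20        # assumed feature dimensionality
N_COMPONENTS = 32  # assumed number of Gaussian components per preset


def mfcc_features(path: str) -> np.ndarray:
    """Return an (n_frames, N_MFCC) matrix of MFCCs for one recording."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC).T


def train_preset_models(
    training_files: dict[str, list[str]]
) -> dict[str, GaussianMixture]:
    """Fit one GMM per preset; keys are preset names, values file lists."""
    models = {}
    for preset, files in training_files.items():
        X = np.vstack([mfcc_features(f) for f in files])
        gmm = GaussianMixture(
            n_components=N_COMPONENTS, covariance_type="diag", random_state=0
        )
        models[preset] = gmm.fit(X)
    return models


def recognise_preset(models: dict[str, GaussianMixture], path: str) -> str:
    """Pick the preset whose GMM gives the highest mean log-likelihood."""
    X = mfcc_features(path)
    return max(models, key=lambda preset: models[preset].score(X))
```

The VGG13 variant reported in the abstract would replace this per-preset likelihood scoring with a single CNN classifier over spectrogram inputs; for instance, torchvision's stock vgg13 model could be adapted by swapping its first convolution to one input channel and its final linear layer to 66 outputs, one per preset.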
Journal
Year
Volume
Pages
art. no. 2019125
Physical description
Bibliography: 14 items, charts.
Authors
author
- Warsaw University of Technology, Faculty of Electronics and Information Technology, Institute of Radioelectronics and Multimedia Technology, Nowowiejska 15/19, 00-665 Warsaw
- Promity Sp. z o.o., Wiejska 14/25, 00-490 Warsaw
author
- Warsaw University of Technology, Faculty of Electronics and Information Technology, Institute of Radioelectronics and Multimedia Technology, Nowowiejska 15/19, 00-665 Warsaw
Bibliography
- 1. M. R. Schroeder, B. F. Logan, Colorless artificial reverberation, IRE Transactions on Audio, 6 (1961) 209-214.
- 2. V. Valimaki, J. D. Parker, L. Savioja, J. O. Smith, J. S. Abel, Fifty years of artificial reverberation, IEEE Transactions on Audio, Speech, and Language Processing, 20(5) (2012) 1421-1448.
- 3. J. Jullien, E. Kahle, M. Marin, O. Warusfel, Spatializer: a perceptual approach, 94th Convention of the Audio Engineering Society, Preprint 3465 (1993).
- 4. M. F. Zbyszynski, A. Freed, Control of VST plug-ins using OSC, Proc. of the International Computer Music Conference, Spain 2005, 263-266.
- 5. Z. Rafii, B. Pardo, Learning to control a reverberator using subjective perceptual descriptors, 10th International Society for Music Information Retrieval Conference (2009).
- 6. N. Peters, J. Choi, H. Lei, Matching artificial reverb settings to unknown room recordings: a recommendation system for reverb plugins, 133rd Audio Engineering Society Convention, USA 2012.
- 7. E. T. Chourdakis, J. D. Reiss, A machine-learning approach to application of intelligent artificial reverberation, Journal of the Audio Engineering Society, 2017.
- 8. D. Reynolds, T. Quatieri, R. Dunn, Speaker verification using adapted Gaussian mixture models, Digital Signal Processing, 10(1-3) (2000) 19-41.
- 9. T. Iqbal, Q. Kong, M. Plumbley, W. Wang, Stacked convolutional neural networks for general-purpose audio tagging, Centre for Vision, Speech and Signal Processing, University of Surrey 2018.
- 10. K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, 3rd ICLR 2015.
- 11. H. Zhang, M. Cisse, Y. N. Dauphin, D. Lopez-Paz, mixup: Beyond empirical risk minimization, 6th ICLR 2018.
- 12. S. O. Sadjadi, M. Slaney, L. Heck, MSR Identity Toolbox, Microsoft Research 2013.
- 13. M. Wester, The EMIME Bilingual Database, The University of Edinburgh: Centre for Speech Technology Research 2012.
- 14. Semantic Audio Labs, [Online], http://www.semanticaudio.co.uk/, [Accessed: 10.07.2019].
Notes
Record developed with funds from the Ministry of Science and Higher Education (MNiSW), agreement No. 461252, under the programme "Social Responsibility of Science" - module: Popularisation of science and promotion of sport (2020).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-38237fc8-d2c4-4dd2-8625-24061a8baf3d