Article title
Content
Full texts:
Identifiers
Title variants
Publication languages
Abstracts
Binaural technology has been known for decades, but it has been widely adopted mainly since the turn of the millennium, helped by advances in software and consumer electronics. As binaural sound becomes more popular, the demand for spatial analysis tools is expected to grow. This paper evaluates three methods for assessing ensemble width in binaural music recordings: (1) an auditory model with decision trees, (2) a neural network model, and (3) a spatial spectrogram approach. Under ideal, anechoic conditions, the auditory model performed best, with a mean absolute error (MAE) of 6.59° (±0.11°), followed by the neural network (8.57° ±0.19°) and the technique based on spatial spectrograms (13.54° ±0.92°). Extending previous work, this study analyzes the methods' robustness to reverberation and noise. The noise tests indicate moderate robustness, with the auditory model yielding an MAE of 12.34° at a signal-to-noise ratio of 10 dB. The reverberation tests, however, show a significant drop in accuracy even at a reverberation time (RT60) of 0.1 s. The findings may help improve models for estimating ensemble width in binaural recordings of music and, in turn, inform the development of binaural sound analysis tools with potential applications in audio production.
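The two quantities the abstract reports, MAE in degrees and a signal-to-noise ratio in dB, can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names are hypothetical, a single channel stands in for a binaural pair, and white Gaussian noise is an assumption, since the record does not state the noise type or how it was mixed.

```python
import math
import random


def mean_absolute_error(true_widths_deg, predicted_widths_deg):
    """Mean absolute error between true and predicted ensemble widths (degrees)."""
    if not true_widths_deg or len(true_widths_deg) != len(predicted_widths_deg):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_widths_deg, predicted_widths_deg))
    return total / len(true_widths_deg)


def mean_power(samples):
    """Average of the squared sample values."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Scale the noise so the signal-to-noise power ratio equals snr_db, then add it."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    signal_power = mean_power(signal)
    noise_power = mean_power(noise)
    if signal_power == 0 or noise_power == 0:
        raise ValueError("signal and noise must have non-zero power")
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for the required noise power.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(signal, noise)]


# Hypothetical example data: a 440 Hz tone at 16 kHz and white Gaussian noise.
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
white_noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(tone, white_noise, snr_db=10.0)

# Recover the achieved SNR from the added noise component.
added_noise = [y - s for y, s in zip(noisy, tone)]
achieved_snr_db = 10 * math.log10(mean_power(tone) / mean_power(added_noise))

# Illustrative width values only; these are not data from the paper.
example_mae = mean_absolute_error([10.0, 20.0, 30.0], [12.0, 17.0, 30.0])
```

In this sketch, `achieved_snr_db` should come out at the requested 10 dB up to floating-point error. `example_mae` averages the absolute errors 2°, 3° and 0°.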
Year
Volume
Pages
5
Physical description
Bibliography: 33 items, figures, tables.
Authors
author
- Bialystok University of Technology, Poland
- Bialystok University of Technology, Poland
Notes
The work was supported by grants from Bialystok University of Technology (WI/WI-IIT/3/2022 and WZ/WI-IIT/5/2023) and funded with resources for research by the Ministry of Science and Higher Education in Poland.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-1c3fa7de-eb52-4692-a30d-79ec4adbf3b8