Head-Related Transfer Function Selection Using Neural Networks

Yao, S.-N.; Collins, T.; Liang, C.

doi:10.1515/aoa-2017-0038

Abstract

In binaural audio systems, an optimal virtual acoustic space requires that the set of head-related transfer functions (HRTFs) used closely matches the listener's own. This study aims to select the most appropriate HRTF dataset from a large database for each user without the need for extensive listening tests. Currently, there is no reliable way to reduce the number of datasets to a smaller, more manageable number without risking discarding potentially good matches. A neural network that estimates the appropriateness of HRTF datasets from input vectors of anthropometric measurements is proposed. The shapes and sizes of listeners' heads and pinnae were measured using digital photography; the measured anthropometric parameters form the feature vectors used by the neural network. A graphical user interface (GUI) was developed for participants to listen to music transformed using different HRTFs and to evaluate the fitness of each HRTF dataset. The recorded listening scores were the target outputs used to train the neural networks. The aim was to learn a mapping between anthropometric parameters and listeners' perception scores. Experimental validations were performed on 30 subjects. It is demonstrated that the proposed system produces a much more reliable HRTF selection than previously used methods.
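The abstract describes learning a mapping from anthropometric feature vectors to listening scores. As a rough illustration only, the pure-Python sketch below trains a one-hidden-layer perceptron by backpropagation to predict one score per candidate HRTF dataset, then picks the highest-scoring set for a new listener. The network shape (tanh hidden layer, linear outputs, one output per dataset), the learning rate, the synthetic data and the names `ScoreMLP` and `select_hrtf` are all assumptions made for illustration; they are not the authors' architecture, features or results.

```python
import math
import random


class ScoreMLP:
    """Hypothetical one-hidden-layer perceptron: anthropometric features -> one score per HRTF set."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Small random initial weights; tanh hidden layer, linear output layer (assumed design).
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, t, lr):
        """One backpropagation (stochastic gradient) step on E = 0.5 * sum((y - t)^2)."""
        h, y = self.forward(x)
        d_out = [yk - tk for yk, tk in zip(y, t)]  # dE/dy
        # Hidden deltas use the pre-update output weights and tanh' = 1 - h^2.
        d_hid = [(1.0 - h[j] ** 2) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j in range(len(h))]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj


def mse(model, data):
    """Mean squared error over all listeners and all candidate HRTF sets."""
    total = 0.0
    for x, t in data:
        y = model.predict(x)
        total += sum((yk - tk) ** 2 for yk, tk in zip(y, t)) / len(t)
    return total / len(data)


def select_hrtf(model, x):
    """Index of the candidate HRTF set with the highest predicted listening score."""
    scores = model.predict(x)
    return max(range(len(scores)), key=lambda k: scores[k])


# Synthetic stand-in data (NOT the study's measurements): 30 listeners,
# 4 normalised anthropometric features, 3 candidate HRTF sets, scores in [0.1, 0.9].
rng = random.Random(1)
n_feat, n_sets = 4, 3
data = []
for _ in range(30):
    x = [rng.uniform(-1.0, 1.0) for _ in range(n_feat)]
    t = [0.5 + 0.4 * math.tanh(x[k] - x[k + 1]) for k in range(n_sets)]
    data.append((x, t))

model = ScoreMLP(n_feat, 6, n_sets)
before = mse(model, data)
for _ in range(200):
    for x, t in data:
        model.train_step(x, t, lr=0.05)
after = mse(model, data)

new_listener = [0.2, -0.3, 0.1, 0.0]
print(f"training MSE: {before:.4f} -> {after:.4f}")
print("selected HRTF set:", select_hrtf(model, new_listener))
```

Selecting a dataset for a new listener then amounts to measuring their features and taking the arg-max of the predicted scores. Sorting the scores instead would yield a short list of candidates, which fits the abstract's goal of shrinking a large database to a manageable number of datasets without a full listening test.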

Keywords

head-related transfer function; neural networks; localisation; music; audio; anthropometry; pinna

Publisher

Institute of Fundamental Technological Research, Polish Academy of Sciences
Committee on Acoustics, Polish Academy of Sciences
Polish Acoustical Society

Journal

Archives of Acoustics

Year

2017

Volume / issue

Vol. 42, No. 3

Pages

365-373

Physical description

Bibliography: 27 items; photographs, figures, tables, graphs

Authors

Author

Yao S.-N.

snyao@gm.ntpu.edu.tw

Department of Electrical Engineering, National Taipei University, No. 151, University Rd., San Shia District, New Taipei City 23741, Taiwan

Author

Collins T.

T.Collins@mmu.ac.uk

School of Engineering, Manchester Metropolitan University, Manchester, M1 5GD, UK

Author

Liang C.

cliang@ntu.edu.tw

Department of Bio-Industry Communication and Development, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan

References

1. Algazi V. R., Duda R. O., Thompson D. M., Avendano C. (2001), The CIPIC HRTF database, Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 99-102.
2. Batteau D. W. (1967), The role of the pinna in human localisation, Proceedings of the Royal Society of London, Series B, 168, 158-180.
3. Benitez J. M., Castro J. L., Requena I. (1997), Are artificial neural networks black boxes?, IEEE Transactions on Neural Networks, 8, 5, 1156-1164.
4. Brown C. P., Duda R. O. (1997), An efficient HRTF model for 3-D sound, Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 19-22.
5. Brown C. P., Duda R. O. (1998), A structural model for binaural sound synthesis, IEEE Transactions on Speech and Audio Processing, 6, 5, 476-488.
6. Choi T., Park Y., Youn D., Lee S. (2011), Virtual sound rendering in a stereophonic loudspeaker setup, IEEE Transactions on Audio, Speech, and Language Processing, 19, 7, 1962-1974.
7. Chun C. J., Kim H. K., Choi S. H., Jang S. J., Lee S. P. (2011), Sound source elevation using spectral notch filtering and directional band boosting in stereo loudspeaker reproduction, IEEE Transactions on Consumer Electronics, 57, 4, 1915-1920.
8. Collins T. (2013), Binaural ambisonic decoding with enhanced lateral localization, Proceedings of Audio Engineering Society 134th Convention.
9. Dave V. S., Dutta K. (2014), Neural network based models for software effort estimation: a review, Artificial Intelligence Review, 42, 2, 295-307.
10. Fechner G. T. (1860), Elements of psychophysics, Holt Rinehart & Winston, New York.
11. Gupta N., Barreto A., Joshi M., Agudelo J. (2010), HRTF database at FIU DSP lab, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 169-172.
12. Gupta N., Barreto A., Ordonez C. (2002), Spectral modification of head-related transfer functions for improved virtual sound spatialization, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1953-1956.
13. Hagan M. T., Demuth H. B., Beale M. (2002), Neural Network Design, CITIC Publishing House, Beijing.
14. Idri A., Abran A., Mbarki S. (2004), Validating and understanding software cost estimation models based on neural networks, Proceedings of IEEE International Conference on Information and Communication Technologies, pp. 433-434.
15. Ircam (2002), Listen HRTF database, http://recherche.ircam.fr/equipes/salles/listen/.
16. Jang J.-S. R., Sun C. T. (1993), Functional equivalence between radial basis function networks and fuzzy inference systems, IEEE Transactions on Neural Networks, 4, 1, 156-159.
17. Masterson C., Kearney G., Gorzel M., Boland F. M. (2012), HRIR order reduction using approximate factorization, IEEE Transactions on Audio, Speech, and Language Processing, 20, 6, 1808-1817.
18. Pett M. A. (1997), Nonparametric statistics for health care research: Statistics for small samples and unusual distributions, Sage Publications, Thousand Oaks, CA.
19. Ranjan R., Gan W.-S. (2015), Natural listening over headphones in augmented reality using adaptive filtering techniques, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23, 11, 1988-2002.
20. Salkind N. J. (2004), Statistics for people who (think they) hate statistics, Sage Publications, Thousand Oaks, CA.
21. Shabtai N. R., Rafaely B. (2014), Generalized spherical array beamforming for binaural speech reproduction, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22, 1, 238-247.
22. Tan C.-J., Gan W.-S. (1998), User-defined spectral manipulation of HRTF for improved localisation in 3D sound systems, Electronics Letters, 34, 25, 2387-2389.
23. Watkins A. J. (1978), Psychoacoustical aspects of synthesized vertical locale cues, Journal of the Acoustical Society of America, 63, 4, 1152-1165.
24. Wythoff B. J. (1993), Backpropagation neural networks: a tutorial, Chemometrics and Intelligent Laboratory Systems, 18, 115-155.
25. Yao S.-N., Chen L. J. (2013), HRTF adjustments with audio quality assessments, Archives of Acoustics, 38, 1, 55-62.
26. Zhang M., Tan K.-C., Er M. H. (1998), A refined algorithm of 3-D sound synthesis, Proceedings of IEEE International Conference on Signal Processing, pp. 1408-1411.
27. Zotkin D. N., Duraiswami R., Davis L. S. (2004), Rendering localized spatial audio in a virtual auditory space, IEEE Transactions on Multimedia, 6, 4, 553-564.

Notes

Prepared with funds from the Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science dissemination activities (2017 tasks).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-4cd4d157-a46b-460b-9f3b-2a560d4dc60c