

Article title

Urban sound classification using long short-term memory neural network

Identifiers
Title variants
Conference
Federated Conference on Computer Science and Information Systems (14 ; 01-04.09.2019 ; Leipzig, Germany)
Publication languages
EN
Abstracts
EN
Environmental sound classification has received increasing attention in recent years. Analysis of environmental sounds is difficult because of their unstructured nature; however, the presence of strong spectro-temporal patterns makes classification possible. Since LSTM neural networks are efficient at learning temporal dependencies, we propose and examine an LSTM model for urban sound classification. The model is trained on magnitude mel-spectrograms extracted from audio in the UrbanSound8K dataset. The proposed network is evaluated using 5-fold cross-validation and compared with a baseline CNN. It is shown that the LSTM model outperforms a set of existing solutions and is more accurate and confident than the CNN.
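For illustration, the sketch below outlines how the pipeline summarised in the abstract could be reproduced; it is not the authors' code. It assumes librosa for computing magnitude mel-spectrograms and Keras for a stacked-LSTM classifier; the mel-band count, clip length, layer sizes and optimizer are placeholder assumptions not stated in this record, while the 10 classes come from the UrbanSound8K dataset.

```python
# Hypothetical sketch of the described pipeline; NOT the authors' implementation.
# Assumptions: librosa for feature extraction, Keras for the LSTM, 128 mel bands,
# 4-second clips, and the 10 classes defined by UrbanSound8K.
import numpy as np
import librosa
from tensorflow.keras import layers, models

N_MELS = 128      # assumed number of mel bands (not stated in the record)
N_CLASSES = 10    # UrbanSound8K defines 10 urban sound classes


def mel_spectrogram(path, sr=22050, duration=4.0):
    """Return a (time_steps, n_mels) magnitude mel-spectrogram in dB for one clip."""
    y, _ = librosa.load(path, sr=sr, duration=duration)
    y = librosa.util.fix_length(y, size=int(sr * duration))          # pad/trim to a fixed length
    spec = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS, power=1.0)  # magnitude, not power
    return librosa.amplitude_to_db(spec, ref=np.max).T               # time-major frames for the LSTM


def build_lstm(time_steps, n_mels=N_MELS, n_classes=N_CLASSES):
    """Assumed architecture: stacked LSTMs over spectrogram frames with a softmax output."""
    model = models.Sequential([
        layers.Input(shape=(time_steps, n_mels)),
        layers.LSTM(128, return_sequences=True),   # placeholder layer sizes
        layers.LSTM(128),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

In the evaluation setting described above, such a model would be trained and tested with 5-fold cross-validation over the dataset folds and compared against a baseline CNN; the concrete hyperparameters shown here are only assumptions.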
Year
Volume
Pages
57--60
Physical description
Bibliography: 28 items, formulas, tables
Authors
  • Institute of Computer Science and Technology, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, 195251, Russia
  • Institute of Computer Science and Technology, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, 195251, Russia
  • Software Engineering Lab, University of Aizu, Aizu-Wakamatsu, 965-8580, Japan
Bibliography
  • 1. R. Radhakrishnan, A. Divakaran, and A. Smaragdis, “Audio analysis for surveillance applications,” in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005. IEEE, 2005, pp. 158–161. [Online]. Available: https://doi.org/10.1109/ASPAA.2005.1540194
  • 2. M. Cristani, M. Bicego, and V. Murino, “Audio-visual event recognition in surveillance video sequences,” IEEE Transactions on Multimedia, vol. 9, no. 2, pp. 257–267, 2007. [Online]. Available: https://doi.org/10.1109/TMM.2006.886263
  • 3. S. Chu, S. Narayanan, C.-C. J. Kuo, and M. J. Mataric, “Where am I? Scene recognition for mobile robots using audio features,” in 2006 IEEE International Conference on Multimedia and Expo. IEEE, 2006, pp. 885–888. [Online]. Available: https://doi.org/10.1109/ICME.2006.262661
  • 4. R. Bardeli, D. Wolff, F. Kurth, M. Koch, K.-H. Tauchert, and K.-H. Frommolt, “Detecting bird sounds in a complex acoustic environment and application to bioacoustic monitoring,” Pattern Recognition Letters, vol. 31, no. 12, pp. 1524–1534, 2010. [Online]. Available: https://doi.org/10.1016/j.patrec.2009.09.014
  • 5. C. Mydlarz, J. Salamon, and J. P. Bello, “The implementation of low-cost urban acoustic monitoring devices,” Applied Acoustics, vol. 117, pp. 207–218, 2017. [Online]. Available: https://doi.org/10.1016/j.apacoust.2016.06.010
  • 6. D. Steele, J. Krijnders, and C. Guastavino, “The sensor city initiative: cognitive sensors for soundscape transformations,” GIS Ostrava, pp. 1–8, 2013.
  • 7. V. Davidovski, “Exponential innovation through digital transformation,” in Proceedings of the 3rd International Conference on Applications in Information Technology. ACM, 2018, pp. 3–5. [Online]. Available: https://doi.org/10.1145/3274856.3274858
  • 8. F. Tappero, R. M. Alsina-Pagès, L. Duboc, and F. Alías, “Leveraging urban sounds: A commodity multi-microphone hardware approach for sound recognition,” in Multidisciplinary Digital Publishing Institute Proceedings, vol. 4, no. 1, 2019, p. 55. [Online]. Available: https://doi.org/10.3390/ecsa-5-05756
  • 9. E. Pyshkin, “Designing human-centric applications: Transdisciplinary connections with examples,” in 2017 3rd IEEE International Conference on Cybernetics (CYBCONF). IEEE, 2017, pp. 1–6. [Online]. Available: https://doi.org/10.1109/CYBConf.2017.7985774
  • 10. E. Pyshkin and A. Kuznetsov, “Approaches for web search user interfaces-how to improve the search quality for various types of information,” JoC, vol. 1, no. 1, pp. 1–8, 2010. [Online]. Available: https://www.earticle.net/Article/A188181
  • 11. M. B. Dias, “Navpal: Technology solutions for enhancing urban navigation for blind travelers,” tech. report CMU-RI-TR-21, Robotics Institute, Carnegie Mellon University, 2014.
  • 12. S. Chu, S. Narayanan, and C.-C. J. Kuo, “Environmental sound recognition with time–frequency audio features,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 6, pp. 1142–1158, 2009. [Online]. Available: https://doi.org/10.1109/TASL.2009.2017438
  • 13. S. Chachada and C.-C. J. Kuo, “Environmental sound recognition: A survey,” vol. 3, 10 2013, pp. 1–9. [Online]. Available: https://doi.org/10.1109/APSIPA.2013.6694338
  • 14. D. Giannoulis, E. Benetos, D. Stowell, M. Rossignol, M. Lagrange, and M. Plumbley, “Detection and classification of acoustic scenes and events: An IEEE AASP challenge,” Oct. 2013, pp. 1–4. [Online]. Available: https://doi.org/10.1109/WASPAA.2013.6701819
  • 15. Z. Kons, O. Toledo-Ronen, and M. Carmel, “Audio event classification using deep neural networks.” in Interspeech, 2013, pp. 1482–1486.
  • 16. K. J. Piczak, “Environmental sound classification with convolutional neural networks,” in 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 2015, pp. 1–6. [Online]. Available: https://doi.org/10.1109/MLSP.2015.7324337
  • 17. J. Salamon and J. P. Bello, “Deep convolutional neural networks and data augmentation for environmental sound classification,” IEEE Signal Processing Letters, vol. 24, no. 3, pp. 279–283, 2017. [Online]. Available: https://doi.org/10.1109/LSP.2017.2657381
  • 18. V. Boddapati, A. Petef, J. Rasmusson, and L. Lundberg, “Classifying environmental sounds using image recognition networks,” Procedia computer science, vol. 112, pp. 2048–2056, 2017. [Online]. Available: https://doi.org/10.1016/j.procs.2017.08.250
  • 19. B. Zhu, K. Xu, D. Wang, L. Zhang, B. Li, and Y. Peng, “Environmental sound classification based on multi-temporal resolution convolutional neural network combining with multi-level features,” in Pacific Rim Conference on Multimedia. Springer, 2018, pp. 528–537. [Online]. Available: https://doi.org/10.1007/978-3-030-00767-6_49
  • 20. Y. Wang, L. Neves, and F. Metze, “Audio-based multimedia event detection using deep recurrent neural networks,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016, pp. 2742–2746. [Online]. Available: https://doi.org/10.1109/ICASSP.2016.7472176
  • 21. S. H. Bae, I. Choi, and N. S. Kim, “Acoustic scene classification using parallel combination of LSTM and CNN,” in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016), 2016, pp. 11–15.
  • 22. J. Sang, S. Park, and J. Lee, “Convolutional recurrent neural networks for urban sound classification using raw waveforms,” in 2018 26th European Signal Processing Conference (EUSIPCO). IEEE, 2018, pp. 2444–2448. [Online]. Available: https://doi.org/10.23919/EUSIPCO.2018.8553247
  • 23. A. Graves, S. Fernández, and J. Schmidhuber, “Bidirectional LSTM networks for improved phoneme classification and recognition,” in International Conference on Artificial Neural Networks. Springer, 2005, pp. 799–804. [Online]. Available: https://doi.org/10.1007/11550907_126
  • 24. A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in 2013 IEEE international conference on acoustics, speech and signal processing. IEEE, 2013, pp. 6645–6649. [Online]. Available: https://doi.org/10.1109/ICASSP.2013.6638947
  • 25. Y. Fan, Y. Qian, F.-L. Xie, and F. K. Soong, “TTS synthesis with bidirectional LSTM based recurrent neural networks,” in Fifteenth Annual Conference of the International Speech Communication Association, 2014.
  • 26. J. Yue-Hei Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici, “Beyond short snippets: Deep networks for video classification,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 4694–4702. [Online]. Available: https://doi.org/10.1109/CVPR.2015.7299101
  • 27. J. Salamon, C. Jacoby, and J. P. Bello, “A dataset and taxonomy for urban sound research,” in Proceedings of the 22nd ACM international conference on Multimedia. ACM, 2014, pp. 1041–1044. [Online]. Available: https://doi.org/10.1145/2647868.2655045
  • 28. J. Salamon and J. P. Bello, “Unsupervised feature learning for urban sound classification,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 171–175. [Online]. Available: https://doi.org/10.1109/ICASSP.2015.7177954
Notes
1. This work was partially supported by grant 17K00509 from the Japan Society for the Promotion of Science (JSPS).
2. Track 1: Artificial Intelligence and Applications
3. Technical Session: 14th International Symposium Advances in Artificial Intelligence and Applications
4. Record prepared with funds from the Polish Ministry of Science and Higher Education (MNiSW), agreement No. 461252, under the programme "Social Responsibility of Science", module: Popularisation of Science and Promotion of Sport (2020).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-dce07654-3091-401d-a84b-8bc7ad34fc73