Journal
Article title
Title variants
Conference
Federated Conference on Computer Science and Information Systems (14 ; 01-04.09.2019 ; Leipzig, Germany)
Publication languages
Abstracts
Environmental sound classification has received increasing attention in recent years. Analysis of environmental sounds is difficult because of their unstructured nature. However, the presence of strong spectro-temporal patterns makes classification possible. Since LSTM neural networks are efficient at learning temporal dependencies, we propose and examine an LSTM model for urban sound classification. The model is trained on magnitude mel-spectrograms extracted from the UrbanSound8K dataset audio. The proposed network is evaluated using 5-fold cross-validation and compared with a baseline CNN. It is shown that the LSTM model outperforms a set of existing solutions and is more accurate and confident than the CNN.
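As a rough illustration of the pipeline described in the abstract, the sketch below extracts a magnitude mel-spectrogram with librosa and feeds the frame sequence to a small stacked-LSTM classifier built with Keras. This is not the authors' implementation: all hyperparameters (mel bands, FFT and hop sizes, layer widths) and the helper names mel_features and build_lstm are assumptions made here for illustration only.

```python
# Minimal sketch, not the paper's code: mel-spectrogram features + LSTM classifier.
import librosa
import tensorflow as tf

N_MELS = 128      # number of mel bands (assumed)
N_CLASSES = 10    # UrbanSound8K defines 10 sound classes

def mel_features(path, sr=22050, n_fft=1024, hop_length=512):
    """Load an audio clip and return a (time_steps, n_mels) magnitude mel-spectrogram."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length,
        n_mels=N_MELS, power=1.0)          # power=1.0 -> magnitude, not power
    return librosa.amplitude_to_db(mel).T  # transpose so frames become time steps

def build_lstm(time_steps):
    """A small stacked-LSTM classifier over mel-spectrogram frames (illustrative only)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(time_steps, N_MELS)),
        tf.keras.layers.LSTM(64, return_sequences=True),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Example use (hypothetical file name):
# feats = mel_features("dog_bark.wav")              # shape: (time_steps, 128)
# model = build_lstm(time_steps=feats.shape[0])
```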
Year
Volume
Pages
57--60
Physical description
Bibliography: 28 items, equations, tables
Authors
author
- Institute of Computer Science and Technology, Peter the Great St.Petersburg Polytechnic University, St.Petersburg, 195251, Russia, lezhenin@kspt.icc.spbstu.ru
author
- Institute of Computer Science and Technology, Peter the Great St.Petersburg Polytechnic University, St.Petersburg, 195251, Russia, bogach@kspt.icc.spbstu.ru
author
- Software Engineering Lab, University of Aizu, Aizu-Wakamatsu, 965-8580, Japan, pyshe@u-aizu.ac.jp
Bibliography
- 1. R. Radhakrishnan, A. Divakaran, and A. Smaragdis, “Audio analysis for surveillance applications,” in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005. IEEE, 2005, pp. 158–161. [Online]. Available: https://doi.org/10.1109/ASPAA.2005.1540194
- 2. M. Cristani, M. Bicego, and V. Murino, “Audio-visual event recognition in surveillance video sequences,” IEEE Transactions on Multimedia, vol. 9, no. 2, pp. 257–267, 2007. [Online]. Available: https://doi.org/10.1109/TMM.2006.886263
- 3. S. Chu, S. Narayanan, C.-C. J. Kuo, and M. J. Mataric, “Where am I? Scene recognition for mobile robots using audio features,” in 2006 IEEE International Conference on Multimedia and Expo. IEEE, 2006, pp. 885–888. [Online]. Available: https://doi.org/10.1109/ICME.2006.262661
- 4. R. Bardeli, D. Wolff, F. Kurth, M. Koch, K.-H. Tauchert, and K.-H. Frommolt, “Detecting bird sounds in a complex acoustic environment and application to bioacoustic monitoring,” Pattern Recognition Letters, vol. 31, no. 12, pp. 1524–1534, 2010. [Online]. Available: https://doi.org/10.1016/j.patrec.2009.09.014
- 5. C. Mydlarz, J. Salamon, and J. P. Bello, “The implementation of low-cost urban acoustic monitoring devices,” Applied Acoustics, vol. 117, pp. 207–218, 2017. [Online]. Available: https://doi.org/10.1016/j.apacoust.2016.06.010
- 6. D. Steele, J. Krijnders, and C. Guastavino, “The sensor city initiative: cognitive sensors for soundscape transformations,” GIS Ostrava, pp. 1–8, 2013.
- 7. V. Davidovski, “Exponential innovation through digital transformation,” in Proceedings of the 3rd International Conference on Applications in Information Technology. ACM, 2018, pp. 3–5. [Online]. Available: https://doi.org/10.1145/3274856.3274858
- 8. F. Tappero, R. M. Alsina-Pagès, L. Duboc, and F. Alías, “Leveraging urban sounds: A commodity multi-microphone hardware approach for sound recognition,” in Multidisciplinary Digital Publishing Institute Proceedings, vol. 4, no. 1, 2019, p. 55. [Online]. Available: https://doi.org/10.3390/ecsa-5-05756
- 9. E. Pyshkin, “Designing human-centric applications: Transdisciplinary connections with examples,” in 2017 3rd IEEE International Conference on Cybernetics (CYBCONF). IEEE, 2017, pp. 1–6. [Online]. Available: https://doi.org/10.1109/CYBConf.2017.7985774
- 10. E. Pyshkin and A. Kuznetsov, “Approaches for web search user interfaces: how to improve the search quality for various types of information,” JoC, vol. 1, no. 1, pp. 1–8, 2010. [Online]. Available: https://www.earticle.net/Article/A188181
- 11. M. B. Dias, “NavPal: Technology solutions for enhancing urban navigation for blind travelers,” Tech. Report CMU-RI-TR-21, Robotics Institute, Carnegie Mellon University, 2014.
- 12. S. Chu, S. Narayanan, and C.-C. J. Kuo, “Environmental sound recognition with time–frequency audio features,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 6, pp. 1142–1158, 2009. [Online]. Available: https://doi.org/10.1109/TASL.2009.2017438
- 13. S. Chachada and C.-C. J. Kuo, “Environmental sound recognition: A survey,” in 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Oct. 2013, pp. 1–9. [Online]. Available: https://doi.org/10.1109/APSIPA.2013.6694338
- 14. D. Giannoulis, E. Benetos, D. Stowell, M. Rossignol, M. Lagrange, and M. Plumbley, “Detection and classification of acoustic scenes and events: An IEEE AASP challenge,” in 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2013, pp. 1–4. [Online]. Available: https://doi.org/10.1109/WASPAA.2013.6701819
- 15. Z. Kons, O. Toledo-Ronen, and M. Carmel, “Audio event classification using deep neural networks.” in Interspeech, 2013, pp. 1482–1486.
- 16. K. J. Piczak, “Environmental sound classification with convolutional neural networks,” in 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 2015, pp. 1–6. [Online]. Available: https://doi.org/10.1109/MLSP.2015.7324337
- 17. J. Salamon and J. P. Bello, “Deep convolutional neural networks and data augmentation for environmental sound classification,” IEEE Signal Processing Letters, vol. 24, no. 3, pp. 279–283, 2017. [Online]. Available: https://doi.org/10.1109/LSP.2017.2657381
- 18. V. Boddapati, A. Petef, J. Rasmusson, and L. Lundberg, “Classifying environmental sounds using image recognition networks,” Procedia computer science, vol. 112, pp. 2048–2056, 2017. [Online]. Available: https://doi.org/10.1016/j.procs.2017.08.250
- 19. B. Zhu, K. Xu, D. Wang, L. Zhang, B. Li, and Y. Peng, “Environmental sound classification based on multi-temporal resolution convolutional neural network combining with multi-level features,” in Pacific Rim Conference on Multimedia. Springer, 2018, pp. 528–537. [Online]. Available: https://doi.org/10.1007/978-3-030-00767-6_49
- 20. Y. Wang, L. Neves, and F. Metze, “Audio-based multimedia event detection using deep recurrent neural networks,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016, pp. 2742–2746. [Online]. Available: https://doi.org/10.1109/ICASSP.2016.7472176
- 21. S. H. Bae, I. Choi, and N. S. Kim, “Acoustic scene classification using parallel combination of LSTM and CNN,” in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016), 2016, pp. 11–15.
- 22. J. Sang, S. Park, and J. Lee, “Convolutional recurrent neural networks for urban sound classification using raw waveforms,” in 2018 26th European Signal Processing Conference (EUSIPCO). IEEE, 2018, pp. 2444–2448. [Online]. Available: https://doi.org/10.23919/EUSIPCO.2018.8553247
- 23. A. Graves, S. Fernández, and J. Schmidhuber, “Bidirectional LSTM networks for improved phoneme classification and recognition,” in International Conference on Artificial Neural Networks. Springer, 2005, pp. 799–804. [Online]. Available: https://doi.org/10.1007/11550907_126
- 24. A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2013, pp. 6645–6649. [Online]. Available: https://doi.org/10.1109/ICASSP.2013.6638947
- 25. Y. Fan, Y. Qian, F.-L. Xie, and F. K. Soong, “TTS synthesis with bidirectional LSTM based recurrent neural networks,” in Fifteenth Annual Conference of the International Speech Communication Association, 2014.
- 26. J. Yue-Hei Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici, “Beyond short snippets: Deep networks for video classification,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 4694–4702. [Online]. Available: https://doi.org/10.1109/CVPR.2015.7299101
- 27. J. Salamon, C. Jacoby, and J. P. Bello, “A dataset and taxonomy for urban sound research,” in Proceedings of the 22nd ACM international conference on Multimedia. ACM, 2014, pp. 1041–1044. [Online]. Available: https://doi.org/10.1145/2647868.2655045
- 28. J. Salamon and J. P. Bello, “Unsupervised feature learning for urban sound classification,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 171–175. [Online]. Available: https://doi.org/10.1109/ICASSP.2015.7177954
Notes
1. This work was partially supported by grant 17K00509 of the Japan Society for the Promotion of Science (JSPS).
2. Track 1: Artificial Intelligence and Applications
3. Technical Session: 14th International Symposium Advances in Artificial Intelligence and Applications
4. Record prepared with funds of the Ministry of Science and Higher Education (MNiSW), agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularisation of Science and Promotion of Sport (2020).
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.baztech-dce07654-3091-401d-a84b-8bc7ad34fc73