Article title

Searching for loops and sound samples with feature learning

Authors
Identifiers
Title variants
Conference
17th Conference on Computer Science and Intelligence Systems
Publication languages
EN
Abstracts
EN
In this paper, we evaluate feature learning in the problem of retrieving subjectively interesting sounds from electronic music tracks. We describe an active learning system designed to find sounds categorized as samples or loops. These retrieval tasks originate from a broader R&D project concerning the use of machine learning to streamline the creation of video game content synchronized with soundtracks. The method is expected to operate with limited data availability and therefore cannot rely on supervised learning of what constitutes an "interesting sound". We apply an active learning procedure that finds sound samples without predefined classes through user interaction, and we evaluate the use of neural network feature extraction in this problem.
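Below is a minimal sketch of an active-learning retrieval loop of the kind the abstract describes: rank sounds by their learned feature embeddings, query the user for labels, and refine the ranking. The embedding source, the logistic-regression relevance model, and the uncertainty-sampling query strategy are illustrative assumptions, not the authors' implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression

def active_retrieval(embeddings, ask_user, n_rounds=10, batch=5, seed=0):
    """Rank sound indices by predicted relevance after interactive labeling.

    embeddings: (n_sounds, d) array of learned audio features, e.g. from a
    pretrained network; ask_user: callable mapping a sound index to a label
    (1 = interesting, 0 = not interesting).
    """
    rng = np.random.default_rng(seed)
    n = len(embeddings)
    # Seed with random examples: no classes are predefined up front.
    labeled = list(rng.choice(n, size=batch, replace=False))
    labels = [ask_user(i) for i in labeled]
    clf = LogisticRegression(max_iter=1000)
    for _ in range(n_rounds):
        if len(set(labels)) > 1:
            clf.fit(embeddings[labeled], labels)
            scores = clf.predict_proba(embeddings)[:, 1]
            # Uncertainty sampling: query sounds the model is least sure about.
            order = np.argsort(np.abs(scores - 0.5))
        else:
            # Only one class seen so far: keep exploring at random.
            order = rng.permutation(n)
        seen = set(labeled)
        queries = [i for i in order if i not in seen][:batch]
        labeled += queries
        labels += [ask_user(i) for i in queries]
    if len(set(labels)) > 1:
        clf.fit(embeddings[labeled], labels)
        return np.argsort(-clf.predict_proba(embeddings)[:, 1])
    return rng.permutation(n)

The loop is agnostic to where the embeddings come from; swapping in features from any pretrained audio network (for example, one trained with the self-supervised methods cited in the bibliography below) leaves it unchanged.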
Year
Volume
Pages
13-18
Physical description
Bibliography: 27 items, charts.
Contributors
author
  • Wroclaw University of Science and Technology, Faculty of Information and Communication Technology, Department of Artificial Intelligence
Bibliography
  • 1. E. J. Humphrey, J. P. Bello, Y. LeCun, “Moving beyond feature design: Deep architectures and automatic feature learning in music informatics,” in ISMIR 2012, pp. 403-408.
  • 2. M. Defferrard, K. Benzi, P. Vandergheynst, X. Bresson, “FMA: A dataset for music analysis,” arXiv preprint https://arxiv.org/abs/1612.01840, 2017, https://doi.org/10.48550/arXiv.1612.01840
  • 3. Y. A. Chen, Y. H. Yang, J. C. Wang, H. Chen, “The AMG1608 dataset for music emotion recognition,” in ICASSP 2015, pp. 693-697, https://doi.org/10.1109/ICASSP.2015.7178058
  • 4. J. W. Kim, J. Salamon, P. Li, J. P. Bello, “Crepe: A convolutional representation for pitch estimation,” in ICASSP 2018, pp. 161-165, https://doi.org/10.1109/ICASSP.2018.8461329
  • 5. J. Jakubik, “Retrieving Sound Samples of Subjective Interest With User Interaction,” in Proc. of the 2020 Federated Conference on Computer Science and Information Systems, 2020, pp. 387-390, https://doi.org/10.15439/2020F82
  • 6. B. McFee, D. Ellis, “Analyzing Song Structure with Spectral Clustering,” in ISMIR 2014, pp. 405-410, https://doi.org/10.5281/zenodo.1415778
  • 7. S. Kothinti, K. Imoto, D. Chakrabarty, G. Sell, S. Watanabe, M. Elhilali, “Joint acoustic and class inference for weakly supervised sound event detection,” in ICASSP 2019, pp. 36-40, https://doi.org/10.1109/ICASSP.2019.8682772
  • 8. H. Xie, T. Virtanen, “Zero-Shot Audio Classification via Semantic Embeddings,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, 2021, pp. 1233-1242, https://doi.org/10.48550/arXiv.2011.12133
  • 9. S. Makino, “Audio Source Separation,” Springer, 2018.
  • 10. J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, M. B. Sandler, “A tutorial on onset detection in music signals,” in IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, 2005, pp. 1035-1047, https://doi.org/10.1109/TSA.2005.851998
  • 11. R. Marxer, J. Janer, “Study of Regularizations and Constraints in NMF-Based Drums Monaural Separation,” in Proc. of the Int. Conference on Digital Audio Effects (DAFx’13), Maynooth, Ireland, 2013.
  • 12. L. Lu, M. Wang, H. J. Zhang, “Repeating pattern discovery and structure analysis from acoustic music data,” in Proc. of the 6th ACM SIGMM Int. Workshop on Multimedia Information Retrieval, 2004, pp. 275-282, https://doi.org/10.1145/1026711.1026756
  • 13. P. López-Serrano, C. Dittmar, J. Driedger, M. Müller, “Towards Modeling and Decomposing Loop-Based Electronic Music,” in ISMIR 2016, pp. 502-508.
  • 14. J. B. L. Smith, M. Goto, “Nonnegative tensor factorization for source separation of loops in audio,” in ICASSP 2018, Calgary, Canada, pp. 171–175.
  • 15. J. B. L. Smith, Y. Kawasaki, M. Goto, “Unmixer: An interface for extracting and remixing loops,” in ISMIR 2019, Delft, Netherlands, pp. 824–831, https://doi.org/10.5281/zenodo.3527938
  • 16. C. Chen, S. Xin, “Combined Transfer and Active Learning for High Accuracy Music Genre Classification Method,” in 2021 IEEE 2nd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), IEEE, 2021, https://doi.org/10.1109/ICBAIE52039.2021.9390062
  • 17. A. Sarasúa, C. Laurier, P. Herrera, “Support vector machine active learning for music mood tagging,” in 9th International Symposium on Computer Music Modeling and Retrieval (CMMR), London, 2012, https://doi.org/10.1007/s00530-006-0032-2
  • 18. W. Li, X. Feng, M. Xue, “Reducing manual labeling in singing voice detection: An active learning approach,” in 2016 IEEE International Conference on Multimedia and Expo (ICME), IEEE, 2016, https://doi.org/10.1109/ICME.2016.7552987
  • 19. Y. Fu, X. Zhu, B. Li, “A survey on instance selection for active learning,” in Knowledge and Information Systems, vol. 35, no. 2, pp. 249-283, 2013, https://doi.org/10.1007/s10115-012-0507-8
  • 20. T. H. Hsieh, L. Su, Y. H. Yang, “A streamlined encoder/decoder architecture for melody extraction,” in ICASSP 2019, pp. 156-160, https://doi.org/10.1109/ICASSP.2019.8682389
  • 21. J. Spijkervet, J. A. Burgoyne, “Contrastive Learning of Musical Representations,” arXiv preprint https://arxiv.org/abs/2103.09410, 2021, https://doi.org/10.48550/arXiv.2103.09410
  • 22. J.-B. Grill, F. Strub, F. Altché, C. Tallec, P. Richemond, E. Buchatskaya, M. Valko, “Bootstrap Your Own Latent: A new approach to self-supervised learning,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 21271-21284, https://doi.org/10.48550/arXiv.2006.07733
  • 23. K. Nguyen, Y. Nguyen, B. Le, “Semi-Supervising Learning, Transfer Learning, and Knowledge Distillation with SimCLR,” arXiv preprint https://arxiv.org/abs/2108.00587, 2021, https://doi.org/10.48550/arXiv.2108.00587
  • 24. B. McFee, C. Raffel, D. Liang, D. P. W. Ellis, M. McVicar, E. Battenberg, O. Nieto, “librosa: Audio and music signal analysis in python,” in Proc. of the 14th python in science conference, pp. 18-25, 2015.
  • 25. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, et al. “PyTorch: An Imperative Style, High-Performance Deep Learning Library,” in Advances in Neural Information Processing Systems, vol. 32, 2019, pp. 8024-8035, https://doi.org/10.48550/arXiv.1912.01703
  • 26. C. R. Harris, K. J. Millman, S. J. van der Walt, “Array programming with NumPy,” Nature, vol. 585, pp. 357-362, 2020, https://doi.org/10.1038/s41586-020-2649-2
  • 27. F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” in Journal of Machine Learning Research, vol. 12, pp. 2825-2830, 2011, https://doi.org/10.48550/arXiv.1201.0490
Notes
Record created with funds from the Ministry of Education and Science (MEiN), agreement no. SONP/SP/546092/2022, under the "Social Responsibility of Science" programme - module: Popularization of science and promotion of sport (2022-2023).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-8fe2c367-aed5-4858-a356-803853b7d01f