Article title

Deep Image Features in Music Information Retrieval

Content
Identifiers
Title variants
Languages of publication
EN
Abstracts
EN
Applications of Convolutional Neural Networks (CNNs) to various problems have been the subject of a number of recent studies, ranging from image classification and object detection to scene parsing, segmentation of 3D volumetric images, and action recognition in videos. CNNs are able to learn a representation of the input data instead of relying on fixed, hand-engineered features. In this study, an image model trained as a CNN was applied to Music Information Retrieval (MIR), in particular to musical genre recognition. The model was trained on ILSVRC-2012 (more than 1 million natural images) to perform image classification and was reused to perform genre classification on spectrogram images. Harmonic/percussive separation was applied, because the balance of these components is characteristic of musical genre. At the final stage, various strategies for merging Support Vector Machines (SVMs) were evaluated on the GTZAN dataset, well known in the MIR community. Even though the model was trained on natural images, the results achieved in this study were close to the state of the art.
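
The abstract describes a transfer-learning pipeline: spectrogram images (split into harmonic and percussive components) are fed to a CNN trained on ILSVRC-2012, and the resulting deep features are classified with SVMs. The following is a minimal sketch of that pipeline, not the authors' implementation (the paper used Caffe [28]); the libraries (librosa, torchvision, scikit-learn), the AlexNet-style backbone, and all parameter values here are illustrative assumptions.

import librosa
import numpy as np
import torch
import torchvision.models as models
from sklearn.svm import SVC

def spectrogram_image(path, size=224):
    """Build a 3-channel log-mel 'image': original, harmonic, percussive."""
    y, sr = librosa.load(path, sr=22050)
    y_harm, y_perc = librosa.effects.hpss(y)  # harmonic/percussive separation
    channels = []
    for sig in (y, y_harm, y_perc):
        mel = librosa.feature.melspectrogram(y=sig, sr=sr, n_mels=size)
        channels.append(librosa.power_to_db(mel, ref=np.max))
    img = np.stack(channels)[:, :, :size]              # crop frames to a square
    img = (img - img.min()) / (img.max() - img.min())  # rescale to [0, 1]
    return torch.tensor(img, dtype=torch.float32)

# Reuse an ImageNet-trained CNN as a fixed feature extractor: drop the final
# 1000-way classification layer and keep the penultimate activations.
cnn = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
cnn.classifier = cnn.classifier[:-1]
cnn.eval()

def deep_features(path):
    with torch.no_grad():
        return cnn(spectrogram_image(path).unsqueeze(0)).squeeze(0).numpy()

# Genre classification on GTZAN-style (path, label) pairs (hypothetical data):
#   X = np.stack([deep_features(p) for p in train_paths])
#   svm = SVC(kernel="linear", probability=True).fit(X, train_labels)

Merging strategies could then be compared, for example by training separate SVMs on the original, harmonic, and percussive spectrograms and averaging their class probabilities, in the spirit of the fusion experiments mentioned in the abstract.
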
Year
Pages
321–326
Physical description
Bibliography: 29 items, illustrations, tables, charts
Authors
author
  • Institute of Radioelectronics, Faculty of Electronics and Information Technology, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland
author
  • Institute of Radioelectronics, Faculty of Electronics and Information Technology, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland
Bibliography
  • [1] M. Kassler, “Toward Musical Information,” Perspectives of New Music, vol. 4, no. 2, pp. 59–66, 1966. [Online]. Available: http://www.jstor.org/discover/10.2307/832213?uid=3738840&uid=2134&uid=2&uid=70&uid=4&sid=21103750075213
  • [2] Y. Song, S. Dixon, and M. Pearce, “A survey of music recommendation systems and future perspectives,” 9th International Symposium on Computer Music Modeling and Retrieval, 2012. [Online]. Available: http://www.mendeley.com/research/survey-music-recommendation-systems-future-perspectives-1/
  • [3] J. Futrelle and J. S. Downie, “Interdisciplinary Communities and Research Issues in Music Information Retrieval,” Library and Information Science, pp. 215–221, 2002. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.133.9456&rep=rep1&type=pdf
  • [4] J. S. Downie, K. West, A. F. Ehmann, and E. Vincent, “The 2005 Music Information Retrieval Evaluation Exchange (MIREX 2005): Preliminary Overview,” in International Conference on Music Information Retrieval, 2005, pp. 320–323.
  • [5] J. S. Downie, A. F. Ehmann, M. Bay, and M. C. Jones, “The music information retrieval evaluation exchange: Some observations and insights,” in Advances in Music Information Retrieval, ser. Studies in Computational Intelligence, Z. W. Ras and A. Wieczorkowska, Eds. Springer, 2010, vol. 274, pp. 93–115. [Online]. Available: http://dblp.uni-trier.de/db/series/sci/sci274.html#DownieEBJ10
  • [6] P. Rao, “Audio signal processing,” in Speech, Audio, Image and Biomedical Signal Processing using Neural Networks, ser. Studies in Computational Intelligence, B. Prasad and S. Prasanna, Eds. Springer Berlin Heidelberg, 2008, vol. 83, pp. 169–189.
  • [7] D. Grzywczak and G. Gwardys, “Audio features in music information retrieval,” in Active Media Technology, ser. Lecture Notes in Computer Science, D. Ślęzak, G. Schaefer, S. Vuong, and Y.-S. Kim, Eds. Springer International Publishing, 2014, vol. 8610, pp. 187–199.
  • [8] B. Zhen, X. Wu, Z. Liu, and H. Chi, “On the importance of components of the MFCC in speech and speaker recognition,” in INTERSPEECH. ISCA, 2000, pp. 487–490.
  • [9] K. Lee, “Automatic chord recognition from audio using enhanced pitch class profile,” in ICMC Proceedings, 2006.
  • [10] X. Yu, J. Zhang, J. Liu, W. Wan, and W. Yang, “An audio retrieval method based on chromagram and distance metrics,” in Audio Language and Image Processing (ICALIP), 2010 International Conference on. IEEE, 2010, pp. 425–428.
  • [11] J. Serrà, E. Gómez, P. Herrera, and X. Serra, “Chroma binary similarity and local alignment applied to cover song identification,” IEEE Trans. on Audio, Speech, and Language Processing, 2008.
  • [12] J. Schmidhuber, “Multi-column deep neural networks for image classification,” in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), ser. CVPR ’12. Washington, DC, USA: IEEE Computer Society, 2012, pp. 3642–3649. [Online]. Available: http://dl.acm.org/citation.cfm?id=2354409.2354694
  • [13] K. Fukushima, “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biological Cybernetics, vol. 36, pp. 193–202, 1980.
  • [14] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” in Proceedings of the IEEE, 1998, pp. 2278–2324.
  • [15] D. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, “Deep neural networks segment neuronal membranes in electron microscopy images,” in Advances in neural information processing systems, 2012, pp. 2843–2851.
  • [16] P. H. Pinheiro and R. Collobert, “Recurrent convolutional neural networks for scene parsing,” arXiv preprint arXiv:1306.2795, 2013.
  • [17] G. W. Taylor, R. Fergus, Y. LeCun, and C. Bregler, “Convolutional learning of spatio-temporal features,” in Computer Vision–ECCV 2010. Springer, 2010, pp. 140–153.
  • [18] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “Overfeat: Integrated recognition, localization and detection using convolutional networks,” arXiv preprint arXiv:1312.6229, 2013.
  • [19] “ILSVRC 2014,” http://image-net.org/challenges/LSVRC/2014/index, accessed: 2014-08-31.
  • [20] “ILSVRC 2012 results,” http://image-net.org/challenges/LSVRC/2012/results.html, accessed: 2014-08-31.
  • [21] A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, pp. 1–9, 2012. [Online]. Available: http://books.nips.cc/papers/files/nips25/NIPS2012_0534.pdf
  • [22] “MNIST dataset,” http://yann.lecun.com/exdb/mnist/, accessed: 2014-08-31.
  • [23] S. J. Pan and Q. Yang, “A survey on transfer learning,” Knowledge and Data Engineering, IEEE Transactions on, vol. 22, no. 10, pp. 1345–1359, Oct 2010.
  • [24] W. Dai, G.-R. Xue, Q. Yang, and Y. Yu, “Transferring naive Bayes classifiers for text classification,” in Proceedings of the 22nd AAAI Conference on Artificial Intelligence, 2007, pp. 540–545.
  • [25] J.-N. Meng, H.-F. Lin, and Y.-H. Yu, “Transfer learning based on SVD for spam filtering,” in Intelligent Computing and Cognitive Informatics (ICICCI), 2010 International Conference on, June 2010, pp. 491–494.
  • [26] H. Wang, F. Nie, H. Huang, and C. Ding, “Dyadic transfer learning for cross-domain image classification,” in Computer Vision (ICCV), 2011 IEEE International Conference on, Nov 2011, pp. 551–556.
  • [27] “A Practical Transfer Learning Algorithm for Face Verification,” in International Conference on Computer Vision (ICCV), 2013. [Online]. Available: http://research.microsoft.com/apps/pubs/default.aspx?id=202192
  • [28] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” arXiv preprint arXiv:1408.5093, 2014.
  • [29] N. Ono, K. Miyamoto, J. Le Roux, H. Kameoka, and S. Sagayama, “Separation of a monaural audio signal into harmonic/percussive components by complementary diffusion on spectrogram,” in Proc. EUSIPCO, 2008.
Document type
YADDA identifier
bwmeta1.element.baztech-d8639001-5f16-479a-99a6-91ecda6372db