Article title

Deep Image Features in Music Information Retrieval

Content
Identifiers
Title variants
Languages of publication
EN
Abstracts
EN
Applications of Convolutional Neural Networks (CNNs) to various problems have been the subject of a number of recent studies, ranging from image classification and object detection to scene parsing, segmentation of 3D volumetric images, and action recognition in videos. CNNs are able to learn a representation of the input data instead of relying on fixed, hand-engineered features. In this study, an image model trained as a CNN was applied to Music Information Retrieval (MIR), in particular to musical genre recognition. The model was trained on ILSVRC-2012 (more than 1 million natural images) to perform image classification and was reused to perform genre classification on spectrogram images. Harmonic/percussive separation was applied, because the balance of these components is characteristic of musical genre. At the final stage, various strategies for merging Support Vector Machines (SVMs) were evaluated on the GTZAN dataset, well known in the MIR community. Even though the model was trained on natural images, the results achieved in this study were close to the state of the art.
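
The abstract describes a transfer-learning pipeline: spectrogram images (split into harmonic and percussive components) are fed to a CNN trained on ILSVRC-2012, and the resulting deep features are classified with SVMs. The following is a minimal sketch of that pipeline, not the authors' implementation (the paper used Caffe [28]); the libraries (librosa, torchvision, scikit-learn), the AlexNet-style backbone, and all parameter values here are illustrative assumptions.

import librosa
import numpy as np
import torch
import torchvision.models as models
from sklearn.svm import SVC

def spectrogram_image(path, size=224):
    """Build a 3-channel log-mel 'image': original, harmonic, percussive."""
    y, sr = librosa.load(path, sr=22050)
    y_harm, y_perc = librosa.effects.hpss(y)  # harmonic/percussive separation
    channels = []
    for sig in (y, y_harm, y_perc):
        mel = librosa.feature.melspectrogram(y=sig, sr=sr, n_mels=size)
        channels.append(librosa.power_to_db(mel, ref=np.max))
    img = np.stack(channels)[:, :, :size]              # crop frames to a square
    img = (img - img.min()) / (img.max() - img.min())  # rescale to [0, 1]
    return torch.tensor(img, dtype=torch.float32)

# Reuse an ImageNet-trained CNN as a fixed feature extractor: drop the final
# 1000-way classification layer and keep the penultimate activations.
cnn = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
cnn.classifier = cnn.classifier[:-1]
cnn.eval()

def deep_features(path):
    with torch.no_grad():
        return cnn(spectrogram_image(path).unsqueeze(0)).squeeze(0).numpy()

# Genre classification on GTZAN-style (path, label) pairs (hypothetical data):
#   X = np.stack([deep_features(p) for p in train_paths])
#   svm = SVC(kernel="linear", probability=True).fit(X, train_labels)

Merging strategies could then be compared, for example by training separate SVMs on the original, harmonic, and percussive spectrograms and averaging their class probabilities, in the spirit of the fusion experiments mentioned in the abstract.
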
Year
Pages
321–326
Physical description
Bibliography: 29 items, illustrations, tables, charts
Authors
author
  • Institute of Radioelectronics, Faculty of Electronics and Information Technology, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland
author
  • Institute of Radioelectronics, Faculty of Electronics and Information Technology, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland
Bibliography
  • [1] M. Kassler, “Toward Musical Information,” Perspectives of New Music, vol. 4, no. 2, pp. 59–66, 1966. [Online]. Available: http://www.jstor.org/discover/10.2307/832213?uid=3738840&uid=2134&uid=2&uid=70&uid=4&sid=21103750075213
  • [2] Y. Song, S. Dixon, and M. Pearce, “A survey of music recommendation systems and future perspectives,” 9th International Symposium on Computer Music Modeling and Retrieval, 2012. [Online]. Available: http://www.mendeley.com/research/survey-music-recommendation-systems-future-perspectives-1/
  • [3] J. Futrelle and J. S. Downie, “Interdisciplinary Communities and Research Issues in Music Information Retrieval,” Library and Information Science, pp. 215–221, 2002. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.133.9456&rep=rep1&type=pdf
  • [4] J. S. Downie, K. West, A. F. Ehmann, and E. Vincent, “The 2005 Music Information Retrieval Evaluation Exchange (MIREX 2005): Preliminary Overview,” in International Conference on Music Information Retrieval, 2005, pp. 320–323.
  • [5] J. S. Downie, A. F. Ehmann, M. Bay, and M. C. Jones, “The music information retrieval evaluation exchange: Some observations and insights,” in Advances in Music Information Retrieval, ser. Studies in Computational Intelligence, Z. W. Ras and A. Wieczorkowska, Eds. Springer, 2010, vol. 274, pp. 93–115. [Online]. Available: http://dblp.uni-trier.de/db/series/sci/sci274.html#DownieEBJ10
  • [6] P. Rao, “Audio signal processing,” in Speech, Audio, Image and Biomedical Signal Processing using Neural Networks, ser. Studies in Computational Intelligence, B. Prasad and S. Prasanna, Eds. Springer Berlin Heidelberg, 2008, vol. 83, pp. 169–189.
  • [7] D. Grzywczak and G. Gwardys, “Audio features in music information retrieval,” in Active Media Technology, ser. Lecture Notes in Computer Science, D. Ślęzak, G. Schaefer, S. Vuong, and Y.-S. Kim, Eds. Springer International Publishing, 2014, vol. 8610, pp. 187–199.
  • [8] B. Zhen, X. Wu, Z. Liu, and H. Chi, “On the importance of components of the MFCC in speech and speaker recognition,” in INTERSPEECH. ISCA, 2000, pp. 487–490.
  • [9] K. Lee, “Automatic chord recognition from audio using enhanced pitch class profile,” in ICMC Proceedings, 2006.
  • [10] X. Yu, J. Zhang, J. Liu, W. Wan, and W. Yang, “An audio retrieval method based on chromagram and distance metrics,” in Audio Language and Image Processing (ICALIP), 2010 International Conference on. IEEE, 2010, pp. 425–428.
  • [11] J. Serrà, E. Gómez, P. Herrera, and X. Serra, “Chroma binary similarity and local alignment applied to cover song identification,” IEEE Trans. on Audio, Speech, and Language Processing, 2008.
  • [12] J. Schmidhuber, “Multi-column deep neural networks for image classification,” in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), ser. CVPR ’12. Washington, DC, USA: IEEE Computer Society, 2012, pp. 3642–3649. [Online]. Available: http://dl.acm.org/citation.cfm?id=2354409.2354694
  • [13] K. Fukushima, “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biological Cybernetics, vol. 36, pp. 193–202, 1980.
  • [14] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” in Proceedings of the IEEE, 1998, pp. 2278–2324.
  • [15] D. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, “Deep neural networks segment neuronal membranes in electron microscopy images,” in Advances in neural information processing systems, 2012, pp. 2843–2851.
  • [16] P. H. Pinheiro and R. Collobert, “Recurrent convolutional neural networks for scene parsing,” arXiv preprint arXiv:1306.2795, 2013.
  • [17] G. W. Taylor, R. Fergus, Y. LeCun, and C. Bregler, “Convolutional learning of spatio-temporal features,” in Computer Vision–ECCV 2010. Springer, 2010, pp. 140–153.
  • [18] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “Overfeat: Integrated recognition, localization and detection using convolutional networks,” arXiv preprint arXiv:1312.6229, 2013.
  • [19] “ILSVRC 2014,” http://image-net.org/challenges/LSVRC/2014/index, accessed: 2014-08-31.
  • [20] “ILSVRC 2012 results,” http://image-net.org/challenges/LSVRC/2012/results.html, accessed: 2014-08-31.
  • [21] A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, pp. 1–9, 2012. [Online]. Available: http://books.nips.cc/papers/files/nips25/NIPS2012_0534.pdf
  • [22] “MNIST dataset,” http://yann.lecun.com/exdb/mnist/, accessed: 2014-08-31.
  • [23] S. J. Pan and Q. Yang, “A survey on transfer learning,” Knowledge and Data Engineering, IEEE Transactions on, vol. 22, no. 10, pp. 1345–1359, Oct 2010.
  • [24] W. Dai, G.-R. Xue, Q. Yang, and Y. Yu, “Transferring naive Bayes classifiers for text classification,” in Proceedings of the 22nd AAAI Conference on Artificial Intelligence, 2007, pp. 540–545.
  • [25] J.-N. Meng, H.-F. Lin, and Y.-H. Yu, “Transfer learning based on SVD for spam filtering,” in Intelligent Computing and Cognitive Informatics (ICICCI), 2010 International Conference on, June 2010, pp. 491–494.
  • [26] H. Wang, F. Nie, H. Huang, and C. Ding, “Dyadic transfer learning for cross-domain image classification,” in Computer Vision (ICCV), 2011 IEEE International Conference on, Nov 2011, pp. 551–556.
  • [27] “A Practical Transfer Learning Algorithm for Face Verification,” in International Conference on Computer Vision (ICCV), 2013. [Online]. Available: http://research.microsoft.com/apps/pubs/default.aspx?id=202192
  • [28] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” arXiv preprint arXiv:1408.5093, 2014.
  • [29] N. Ono, K. Miyamoto, J. Le Roux, H. Kameoka, and S. Sagayama, “Separation of a monaural audio signal into harmonic/percussive components by complementary diffusion on spectrogram,” in Proc. EUSIPCO, 2008.
Document type
YADDA identifier
bwmeta1.element.baztech-d8639001-5f16-479a-99a6-91ecda6372db