Article title

Pre-trained deep neural network using sparse autoencoders and scattering wavelet transform for musical genre recognition

Authors
Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
The research described in this paper combines Deep Neural Networks (DNN) with novel audio features extracted using the Scattering Wavelet Transform (SWT) for classifying musical genres. The SWT uses a sequence of wavelet transforms to compute modulation spectrum coefficients of multiple orders, which have already been shown to be promising for this task. The DNN in this work uses layers pre-trained with Sparse Autoencoders (SAE). Data obtained from the Creative Commons website jamendo.com is used to augment the well-known GTZAN database, a standard benchmark for this task. The final classifier is evaluated using 10-fold cross-validation and achieves results comparable to other state-of-the-art approaches.
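The pre-training scheme named in the abstract, initializing DNN layers as sparse autoencoders, can be sketched roughly as below. This is a minimal single-layer illustration, assuming a squared-error sigmoid autoencoder with a KL-divergence sparsity penalty on the mean hidden activation (as in Ng's lecture notes, reference [16]); the layer size, learning rate, and penalty weights are illustrative choices, not the paper's actual settings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sae_pretrain(X, n_hidden=16, rho=0.05, beta=3.0, lam=1e-4,
                 lr=0.5, epochs=200, seed=0):
    """Train one sparse-autoencoder layer on a feature matrix X
    (n_samples x n_features); returns encoder/decoder weights.
    Hyperparameters here are illustrative, not from the paper."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))   # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))   # decoder weights
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)               # hidden activations
        Xr = sigmoid(H @ W2 + b2)              # reconstruction
        rho_hat = H.mean(axis=0)               # mean activation per hidden unit
        # output-layer delta for squared error with a sigmoid output
        d_out = (Xr - X) * Xr * (1.0 - Xr)
        # gradient of the KL sparsity penalty, pushing rho_hat toward rho
        sparse_grad = beta * (-rho / rho_hat + (1.0 - rho) / (1.0 - rho_hat))
        d_hid = (d_out @ W2.T + sparse_grad) * H * (1.0 - H)
        # gradient descent with small weight decay on the weights only
        W2 -= lr * (H.T @ d_out / n + lam * W2)
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * (X.T @ d_hid / n + lam * W1)
        b1 -= lr * d_hid.mean(axis=0)
    return W1, b1, W2, b2
```

In the stacked setting described in the abstract, the learned encoder (`W1`, `b1`) would initialize one DNN layer, its hidden activations would serve as input for training the next autoencoder, and the whole network would then be fine-tuned on genre labels.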
Publisher
Journal
Year
Pages
133–144
Physical description
Bibliography: 23 items, figures, charts, tables
Contributors
author
Mariusz Kleć
  • Polish-Japanese Academy of Information Technology, Warsaw, Poland
author
Danijel Korzinek
  • Polish-Japanese Academy of Information Technology, Warsaw, Poland
Bibliography
  • [1] Andén J., Mallat S.: Deep Scattering Spectrum. CoRR, vol. abs/1304.6763, 2013, http://arxiv.org/abs/1304.6763.
  • [2] Bengio Y.: Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, vol. 2(1), pp. 1–127, 2009, http://dx.doi.org/10.1561/2200000006.
  • [3] Bengio Y., Lamblin P., Popovici D., Larochelle H.: Greedy layer-wise training of deep networks. Advances in Neural Information Processing Systems, vol. 19, p. 153, 2007.
  • [4] Bishop C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Inc., New York, NY, USA, 1995.
  • [5] Chang K.K., Jang J.S.R., Iliopoulos C.S.: Music Genre Classification via Compressive Sampling. In: ISMIR, pp. 387–392, 2010.
  • [6] Chen X., Ramadge P.J.: Music genre classification using multiscale scattering and sparse representations. In: 2013 47th Annual Conference on Information Sciences and Systems (CISS), pp. 1–6, IEEE, 2013.
  • [7] Glorot X., Bengio Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics, pp. 249–256, 2010.
  • [8] Grimaldi M., Cunningham P., Kokaram A.: A wavelet packet representation of audio signals for music genre classification using different ensemble and feature selection techniques. In: Proceedings of the 5th ACM SIGMM International Workshop on Multimedia Information Retrieval, pp. 102–108, ACM, 2003.
  • [9] Hamel P., Eck D.: Learning Features from Music Audio with Deep Belief Networks. In: ISMIR, pp. 339–344, Utrecht, The Netherlands, 2010.
  • [10] Hinton G., Osindero S., Teh Y.W.: A fast learning algorithm for deep belief nets. Neural Computation, vol. 18(7), pp. 1527–1554, 2006.
  • [11] Kleć M., Korzinek D.: Unsupervised Feature Pre-training of the Scattering Wavelet Transform for Musical Genre Recognition. Procedia Technology, vol. 18, pp. 133–139, 2014.
  • [12] LeCun Y., Bengio Y.: Convolutional Networks for Images, Speech, and Time Series. In: The Handbook of Brain Theory and Neural Networks, pp. 255–258, MIT Press, Cambridge, MA, USA, 1998, http://dl.acm.org/citation.cfm?id=303568.303704.
  • [13] Lee H., Ekanadham C., Ng A.Y.: Sparse deep belief net model for visual area V2. In: Advances in Neural Information Processing Systems, pp. 873–880, MIT Press, 2008.
  • [14] Li T., Ogihara M., Li Q.: A comparative study on content-based music genre classification. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 282–289, ACM, 2003.
  • [15] Mallat S.: Group invariant scattering. Communications on Pure and Applied Mathematics, vol. 65(10), pp. 1331–1398, 2012.
  • [16] Ng A.: Sparse autoencoder. CS294A Lecture Notes, vol. 72, pp. 1–19, 2011.
  • [17] Panagakis Y., Kotropoulos C., Arce G.R.: Music Genre Classification Using Locality Preserving Non-Negative Tensor Factorization and Sparse Representations. In: ISMIR, pp. 249–254, 2009.
  • [18] Poultney C., Chopra S., Cun Y.L., et al.: Efficient learning of sparse representations with an energy-based model. In: Advances in Neural Information Processing Systems, pp. 1137–1144, 2006.
  • [19] Sigtia S., Dixon S.: Improved music feature learning with deep neural networks. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6959–6963, IEEE, 2014.
  • [20] Skajaa A.: Limited memory BFGS for nonsmooth optimization. Master's thesis, Courant Institute of Mathematical Sciences, New York University, 2010.
  • [21] Sturm B.L.: The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use. arXiv preprint arXiv:1306.1461, 2013.
  • [22] Tzanetakis G., Cook P.: Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, vol. 10(5), pp. 293–302, 2002.
  • [23] Vincent P., Larochelle H., Lajoie I., Bengio Y., Manzagol P.A.: Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. The Journal of Machine Learning Research, vol. 11, pp. 3371–3408, 2010.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-f8b7db01-9b1a-4904-bb9f-9daff22fa311