Identifiers
Title variants
Languages of publication
Abstracts
Research described in this paper combines the approach of Deep Neural Networks (DNN) with novel audio features extracted using the Scattering Wavelet Transform (SWT) for classifying musical genres. The SWT uses a sequence of wavelet transforms to compute modulation spectrum coefficients of multiple orders, which has already been shown to be promising for this task. The DNN in this work uses layers pre-trained with Sparse Autoencoders (SAE). Data obtained from the Creative Commons website jamendo.com is used to augment the well-known GTZAN database, a standard benchmark for this task. The final classifier is tested using 10-fold cross-validation and achieves results comparable to other state-of-the-art approaches.
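The SAE pre-training mentioned in the abstract can be illustrated with a minimal single-layer sketch using the KL-divergence sparsity penalty from Ng's lecture notes (ref. [16]). All shapes, hyperparameters, and the random stand-in for the SWT feature matrix below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 40))        # hypothetical stand-in for SWT feature vectors

n_in, n_hid = X.shape[1], 16
W1 = rng.standard_normal((n_in, n_hid)) * 0.1   # encoder weights
b1 = np.zeros(n_hid)
W2 = rng.standard_normal((n_hid, n_in)) * 0.1   # decoder weights
b2 = np.zeros(n_in)

rho, beta, lr = 0.05, 0.1, 0.5            # target sparsity, penalty weight, step size

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step():
    """One gradient step on reconstruction MSE + KL sparsity penalty."""
    n = X.shape[0]
    h = sigmoid(X @ W1 + b1)              # hidden activations in (0, 1)
    Xr = h @ W2 + b2                      # linear reconstruction
    rho_hat = h.mean(axis=0)              # average activation per hidden unit
    mse = np.mean((X - Xr) ** 2)
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    # Backpropagation through decoder and encoder.
    dXr = 2.0 * (Xr - X) / Xr.size
    gW2, gb2 = h.T @ dXr, dXr.sum(axis=0)
    dh = dXr @ W2.T
    dh += beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / n  # sparsity gradient
    dz = dh * h * (1.0 - h)               # sigmoid derivative
    gW1, gb1 = X.T @ dz, dz.sum(axis=0)
    for p, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
        p -= lr * g                       # in-place parameter update
    return mse + beta * kl

losses = [train_step() for _ in range(300)]
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In a layer-wise scheme like the one the abstract implies, the hidden activations of a trained layer would become the input for training the next SAE, after which the stack is fine-tuned as a classifier.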
Publisher
Journal
Year
Volume
Pages
133--144
Physical description
Bibliography: 23 items, figures, charts, tables.
Contributors
author
- Polish-Japanese Academy of Information Technology, Warsaw, Poland
author
- Polish-Japanese Academy of Information Technology, Warsaw, Poland
Bibliography
- [1] Anden J., Mallat S.: Deep Scattering Spectrum. CoRR, vol. abs/1304.6763, 2013, http://arxiv.org/abs/1304.6763.
- [2] Bengio Y.: Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, vol. 2(1), pp. 1–127, 2009, http://dx.doi.org/10.1561/2200000006.
- [3] Bengio Y., Lamblin P., Popovici D., Larochelle H., et al.: Greedy layer-wise training of deep networks. Advances in Neural Information Processing Systems, vol. 19, p. 153, 2007.
- [4] Bishop C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Inc., New York, NY, USA, 1995.
- [5] Chang K.K., Jang J.S.R., Iliopoulos C.S.: Music Genre Classification via Compressive Sampling. In: ISMIR, pp. 387–392, 2010.
- [6] Chen X., Ramadge P.J.: Music genre classification using multiscale scattering and sparse representations. In: Information Sciences and Systems (CISS), 2013 47th Annual Conference on, pp. 1–6, IEEE, 2013.
- [7] Glorot X., Bengio Y.: Understanding the difficulty of training deep feedforward neural networks. In: International conference on artificial intelligence and statistics, pp. 249–256, 2010.
- [8] Grimaldi M., Cunningham P., Kokaram A.: A wavelet packet representation of audio signals for music genre classification using different ensemble and feature selection techniques. In: Proceedings of the 5th ACM SIGMM international workshop on Multimedia information retrieval, pp. 102–108, ACM, 2003.
- [9] Hamel P., Eck D.: Learning Features from Music Audio with Deep Belief Networks. In: ISMIR, pp. 339–344, Utrecht, The Netherlands, 2010.
- [10] Hinton G., Osindero S., Teh Y.W.: A fast learning algorithm for deep belief nets. Neural Computation, vol. 18(7), pp. 1527–1554, 2006.
- [11] Klec M., Korzinek D.: Unsupervised Feature Pre-training of the Scattering Wavelet Transform for Musical Genre Recognition. Procedia Technology, vol. 18, pp. 133–139, 2014.
- [12] LeCun Y., Bengio Y.: Convolutional Networks for Images, Speech, and Time Series. In: The Handbook of Brain Theory and Neural Networks, pp. 255–258, MIT Press, Cambridge, MA, USA, 1998, http://dl.acm.org/citation.cfm?id=303568.303704.
- [13] Lee H., Ekanadham C., Ng A.Y.: Sparse deep belief net model for visual area V2. In: Advances in Neural Information Processing Systems, pp. 873–880, MIT Press, 2008.
- [14] Li T., Ogihara M., Li Q.: A comparative study on content-based music genre classification. In: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 282–289, ACM, 2003.
- [15] Mallat S.: Group invariant scattering. Communications on Pure and Applied Mathematics, vol. 65(10), pp. 1331–1398, 2012.
- [16] Ng A.: Sparse autoencoder. CS294A Lecture Notes, vol. 72, pp. 1–19, 2011.
- [17] Panagakis Y., Kotropoulos C., Arce G.R.: Music Genre Classification Using Locality Preserving Non-Negative Tensor Factorization and Sparse Representations. In: ISMIR, pp. 249–254, 2009.
- [18] Poultney C., Chopra S., Cun Y.L., et al.: Efficient learning of sparse representations with an energy-based model. In: Advances in Neural Information Processing Systems, pp. 1137–1144, 2006.
- [19] Sigtia S., Dixon S.: Improved music feature learning with deep neural networks. In: Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pp. 6959–6963, IEEE, 2014.
- [20] Skajaa A.: Limited memory BFGS for nonsmooth optimization. Master's thesis, Courant Institute of Mathematical Sciences, New York University, 2010.
- [21] Sturm B.L.: The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use. arXiv preprint arXiv:1306.1461, 2013.
- [22] Tzanetakis G., Cook P.: Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, vol. 10(5), pp. 293–302, 2002.
- [23] Vincent P., Larochelle H., Lajoie I., Bengio Y., Manzagol P.A.: Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. The Journal of Machine Learning Research, vol. 11, pp. 3371–3408, 2010.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-f8b7db01-9b1a-4904-bb9f-9daff22fa311