Article title

Segregation of songs and instrumentals: a precursor to voice/accompaniment separation from songs in noisy scenario

Content
Identifiers
Title variants
Languages of publication
EN
Abstracts
EN
The music industry has come a long way since its inception, and music producers have embraced modern technology to infuse life into their creations. Systems capable of separating sounds by source, especially vocals from songs, have always been a necessity and have accordingly drawn the attention of researchers. The challenge of vocal separation grows even greater in a multi-instrument environment. Before attempting source separation, a system must first be able to detect whether a piece of music contains vocals at all. Performing source separation on audio contaminated with noise is also highly challenging. In this paper, such a system is proposed and tested on a database of more than 99 hours of instrumentals and songs. Experiments were performed with both noise-free and noisy audio clips. Using line spectral frequency-based features, we obtained the highest accuracies of 99.78% and 99.34% (in the noise-free and noisy scenarios, respectively) from among six different classifiers, viz. BayesNet, Support Vector Machine, Multilayer Perceptron, LibLinear, Simple Logistic and Decision Table.
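As an illustrative sketch only (not the authors' implementation, whose exact frame sizes and predictor order the abstract does not state), line spectral frequencies can be computed from linear-prediction coefficients via the standard symmetric/antisymmetric polynomial construction. The order-10 predictor and the autocorrelation-method LPC below are assumptions for the example:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lsf_features(signal, order=10):
    """Line spectral frequencies (radians in (0, pi)) of one audio frame.

    Sketch under assumptions: autocorrelation-method LPC followed by the
    P(z)/Q(z) root-angle construction; not the paper's own pipeline.
    """
    # Autocorrelation-method LPC: solve the symmetric Toeplitz normal equations
    r = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    a = solve_toeplitz(r[:order], r[1:order + 1])
    A = np.concatenate([[1.0], -a])  # A(z) = 1 - sum_k a_k z^{-k}

    # Symmetric / antisymmetric polynomials
    #   P(z) = A(z) + z^-(p+1) A(1/z),  Q(z) = A(z) - z^-(p+1) A(1/z);
    # their unit-circle root angles interleave and define the LSFs.
    A1 = np.concatenate([A, [0.0]])
    P, Q = A1 + A1[::-1], A1 - A1[::-1]
    angles = np.angle(np.concatenate([np.roots(P), np.roots(Q)]))
    # Keep one angle per conjugate pair, dropping the trivial roots at 0 and pi
    eps = 1e-9
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])
```

For an order-p predictor this yields p monotonically increasing frequencies per frame; frame-wise statistics of such values could then feed a classifier of the kind compared in the paper.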
Contributors
  • Dept. of Computer Science, West Bengal State University, Kolkata, India
  • Dept. of Computer Science and Engineering, Aliah University, Kolkata, India
author
  • Dept. of Computer Science, The University of South Dakota, SD, USA
  • Dept. of Informatics, University of Evora, Evora, Portugal
  • Dept. of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Kolkata, India
author
  • Dept. of Computer Science, West Bengal State University, Kolkata, India
Bibliography
  • [1] T.‑W. Leung, C.‑W. Ngo, and R. Lau, “ICA‑FX features for classification of singing voice and instrumental sound”. In: Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), vol. 2, 2004, 367–370, 10.1109/ICPR.2004.1334222, ISSN: 1051‑4651.
  • [2] A. Chanrungutai and C. A. Ratanamahatana, “Singing voice separation for mono‑channel music using Non‑negative Matrix Factorization”. In: 2008 International Conference on Advanced Technologies for Communications, 2008, 243–246, 10.1109/ATC.2008.4760565.
  • [3] M. Rocamora and P. Herrera, “Comparing audio descriptors for singing voice detection in music audio files”. In: 11th Brazilian Symposium on Computer Music, São Paulo, Brazil, vol. 26, 2007.
  • [4] C.‑L. Hsu and J.‑S. Jang, “On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR‑1K Dataset”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 2, 2010, 310–319, 10.1109/TASL.2009.2026503.
  • [5] Z. Rafii and B. Pardo, “A simple music/voice separation method based on the extraction of the repeating musical structure”. In: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011, 221–224, 10.1109/ICASSP.2011.5946380.
  • [6] Z. Rafii and B. Pardo, “Repeating Pattern Extraction Technique (REPET): A Simple Method for Music/Voice Separation”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 1, 2013, 73–84, 10.1109/TASL.2012.2213249.
  • [7] A. Liutkus, Z. Rafii, R. Badeau, B. Pardo, and G. Richard, “Adaptive filtering for music/voice separation exploiting the repeating musical structure”. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, 53–56, 10.1109/ICASSP.2012.6287815.
  • [8] A. Ghosal, R. Chakraborty, B. C. Dhara, and S. K. Saha, “Song/instrumental classification using spectrogram based contextual features”. In: Proceedings of the CUBE International Information Technology Conference, New York, NY, USA, 2012, 21–25, 10.1145/2381716.2381722.
  • [9] M. Mauch, H. Fujihara, K. Yoshii, and M. Goto, “Timbre and Melody Features for the Recognition of Vocal Activity and Instrumental Solos in Polyphonic Music”. In: ISMIR, 2011, 233–238.
  • [10] H. Burute and P. B. Mane, “Separation of singing voice from music accompaniment using matrix factorization method”. In: 2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT), 2015, 166–171, 10.1109/ICATCCT.2015.7456876.
  • [11] A. Ghosal, R. Chakraborty, B. C. Dhara, and S. K. Saha, “Instrumental/song classification of music signal using RANSAC”. In: 2011 3rd International Conference on Electronics Computer Technology, vol. 1, 2011, 269–272, 10.1109/ICECTECH.2011.5941603.
  • [12] L. Regnier and G. Peeters, “Singing voice detection in music tracks using direct voice vibrato detection”. In: 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, 1685–1688, 10.1109/ICASSP.2009.4959926.
  • [13] A. Ozerov, P. Philippe, F. Bimbot, and R. Gribonval, “Adaptation of Bayesian Models for Single‑Channel Source Separation and its Application to Voice/Music Separation in Popular Songs”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 5, 2007, 1564–1578, 10.1109/TASL.2007.899291.
  • [14] C.‑L. Hsu, D. Wang, J.‑S. R. Jang, and K. Hu, “A Tandem Algorithm for Singing Pitch Extraction and Voice Separation From Music Accompaniment”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 5, 2012, 1482–1491, 10.1109/TASL.2011.2182510.
  • [15] B. Zhu, W. Li, R. Li, and X. Xue, “Multi‑Stage Non‑Negative Matrix Factorization for Monaural Singing Voice Separation”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 10, 2013, 2096–2107, 10.1109/TASL.2013.2266773.
  • [16] “Youtube”. https://www.youtube.com/, 2020. Accessed on: 2020‑09‑20.
  • [17] “Ethnologue: Languages of the World”. https://www.ethnologue.com/, 2020. Accessed on: 2020‑09‑20.
  • [18] H. Mukherjee, S. M. Obaidullah, S. Phadikar, and K. Roy, “SMIL ‑ A Musical Instrument Identification System”. In: J. K. Mandal, P. Dutta, and S. Mukhopadhyay, eds., Computational Intelligence, Communications, and Business Analytics, Singapore, 2017, 129–140, 10.1007/978‑981‑10‑6427‑2_11.
  • [19] H. Mukherjee, S. Phadikar, P. Rakshit, and K. Roy, “REARC‑a Bangla Phoneme recognizer”. In: 2016 International Conference on Accessibility to Digital World (ICADW), 2016, 177–180, 10.1109/ICADW.2016.7942537.
  • [20] K. K. Paliwal, “On the use of line spectral frequency parameters for speech recognition”, Digital Signal Processing, vol. 2, no. 2, 1992, 80–87, 10.1016/1051‑2004(92)90028‑W.
  • [21] N. Friedman, D. Geiger, and M. Goldszmidt, “Bayesian Network Classifiers”, Machine Learning, vol. 29, no. 2, 1997, 131–163, 10.1023/A:1007465528199.
  • [22] N. Cristianini and J. Shawe‑Taylor, “An Introduction to Support Vector Machines and Other Kernel‑based Learning Methods”. Cambridge University Press, March 2000.
  • [23] R.‑E. Fan, K.‑W. Chang, C.‑J. Hsieh, X.‑R. Wang, and C.‑J. Lin, “LIBLINEAR: A Library for Large Linear Classification”, The Journal of Machine Learning Research, vol. 9, 2008, 1871–1874.
  • [24] H. Mukherjee, C. Halder, S. Phadikar, and K. Roy, “READ—A Bangla Phoneme Recognition System”. In: S. C. Satapathy, V. Bhateja, S. K. Udgata, and P. K. Pattnaik, eds., Proceedings of the 5th International Conference on Frontiers in Intelligent Computing: Theory and Applications, Singapore, 2017, 599–607, 10.1007/978‑981‑10‑3153‑3_59.
  • [25] M. Sumner, E. Frank, and M. Hall, “Speeding Up Logistic Model Tree Induction”. In: A. M. Jorge, L. Torgo, P. Brazdil, R. Camacho, and J. Gama, eds., Knowledge Discovery in Databases: PKDD 2005, Berlin, Heidelberg, 2005, 675–683, 10.1007/11564126_72.
  • [26] R. Kohavi. “The power of decision tables”. In: N. Lavrac and S. Wrobel, eds., Machine Learning: ECML‑95, volume 912, 174–189. Springer, Berlin, Heidelberg, 1995.
  • [27] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA data mining software: an update”, ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, 2009, 10–18, 10.1145/1656274.1656278.
  • [28] J. Demšar, “Statistical Comparisons of Classifiers over Multiple Data Sets”, Journal of Machine Learning Research, vol. 7, 2006, 1–30.
Notes
Record created with funds from the Ministry of Science and Higher Education (MNiSW), agreement No. 461252, under the programme "Social Responsibility of Science" - module: Popularisation of science and promotion of sport (2020).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-c3478ef0-46e1-4f10-91e8-05b067b65e21