Speaker Identification using Data-Driven Score Classification

Gan, H.; Mporas, I.; Safavi, S.; Sotudeh, R.

doi:10.1515/ipc-2016-0011

Article - details

Article title

Speaker Identification using Data-Driven Score Classification

Authors

Gan H., Mporas I., Safavi S., Sotudeh R.

Selected full texts from this journal

http://new-ipc.utp.edu.pl/index.php/ipc

Identifiers

DOI

10.1515/ipc-2016-0011

Abstracts

We present a comparative evaluation of different classification algorithms for a fusion engine used in a speaker identity selection task. The fusion engine combines the scores from a number of classifiers, which use the GMM-UBM approach to match speaker identity. The performance of the evaluated classification algorithms was examined in both the text-dependent and text-independent operation modes. The experimental results indicated a significant improvement in speaker identification accuracy, of approximately 7% and 14.5% for the text-dependent and text-independent scenarios, respectively. Based on these findings, we suggest using fusion with a discriminative algorithm such as a Support Vector Machine in real-world speaker identification applications, where the text-independent scenario predominates.
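The score fusion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the fusion classifier is framed, so the sketch assumes that each candidate speaker yields one score per base classifier, that the fusion classifier separates target from non-target score vectors, and that the identified speaker is the candidate with the highest fused score. In place of a full Support Vector Machine it uses a plain linear classifier trained with the hinge loss and no regularization term. All function names, speaker IDs, and score values are hypothetical.

```python
# Illustrative sketch only: toy scores and hypothetical names, not the authors' code.

def fused_score(weights, bias, scores):
    """Linear fusion: weighted sum of the per-classifier scores plus a bias."""
    return sum(w * s for w, s in zip(weights, scores)) + bias


def train_hinge_fusion(trials, labels, lr=0.1, max_epochs=1000):
    """Fit fusion weights by subgradient steps on the (unregularized) hinge loss.

    trials: list of score vectors, one score per base classifier.
    labels: +1 for a target (true speaker) trial, -1 for a non-target trial.
    Training stops once every trial has a margin of at least 1,
    or after max_epochs passes over the data.
    """
    weights = [0.0] * len(trials[0])
    bias = 0.0
    for _ in range(max_epochs):
        updated = False
        for scores, y in zip(trials, labels):
            if y * fused_score(weights, bias, scores) < 1.0:
                weights = [w + lr * y * s for w, s in zip(weights, scores)]
                bias += lr * y
                updated = True
        if not updated:
            break
    return weights, bias


def identify(weights, bias, candidate_scores):
    """Return the candidate speaker whose fused score is highest."""
    return max(candidate_scores,
               key=lambda spk: fused_score(weights, bias, candidate_scores[spk]))


# Toy training set: each trial holds scores from two hypothetical base classifiers.
train_trials = [(2.0, 1.5), (1.5, 2.0), (1.0, 1.8),        # target trials
                (-1.0, -0.5), (-0.5, -1.5), (-0.2, -1.0)]  # non-target trials
train_labels = [1, 1, 1, -1, -1, -1]

weights, bias = train_hinge_fusion(train_trials, train_labels)

# One test utterance scored against three enrolled speakers (hypothetical IDs).
candidates = {"spk_a": (-1.0, -0.5), "spk_b": (1.5, 2.0), "spk_c": (-0.2, -1.0)}
best_speaker = identify(weights, bias, candidates)
print(best_speaker)
```

In practice the base-classifier scores would be GMM-UBM log-likelihood ratios, and a regularized SVM from an established library would normally replace the hand-written training loop.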

Keywords

GMM-UBM; support vector machine; speaker identification accuracy

Publisher

Instytut Telekomunikacji i Informatyki Uniwersytetu Technologiczno-Przyrodniczego w Bydgoszczy

Journal

Image Processing & Communications

Year

2016

Volume

Vol. 21, no. 2

Pages

55–64

Physical description

Bibliography: 30 items, figures, tables.

Contributors

author

Gan H.

author

Mporas I.

author

Safavi S.

author

Sotudeh R.

Bibliography

[1] Altman, N.S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46(3), 175–185
[2] Beigi, H. (2011). Speaker Recognition, Encyclopedia of Cryptography and Security, Springer, pp. 1232–1242
[3] Bimbot, F., Bonastre, J.F., Fredouille, C., Gravier, G., Magrin-Chagnolleau, I., Meignier, S., Reynolds, D.A. (2004). A tutorial on text-independent speaker verification. EURASIP Journal on Applied Signal Processing, 2004, 430–451
[4] Bishop, C.M. (2008, June). A new framework for machine learning. In IEEE World Congress on Computational Intelligence (pp. 1–24). Springer Berlin Heidelberg
[5] Bouchard, G. (2007). Bias-variance tradeoff in hybrid generative-discriminative models. In Machine Learning and Applications. ICMLA 2007. Sixth International Conference on (pp. 124–129). IEEE
[6] Burges, C.J.C., Ben, J.I., Denker, J.S., LeCun, Y., Nohl, C.R. (1993). Off line recognition of handwritten postal words using neural networks. International Journal of Pattern Recognition and Artificial Intelligence, 7(04), 689–704
[7] Campbell, J.P. (1997). Speaker recognition: a tutorial. Proceedings of the IEEE, 85(9), 1437–1462
[8] Campbell, J.P., Reynolds, D.A. (1999, March). Corpora for the evaluation of speaker recognition systems. In Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on (Vol. 2, pp. 829–832). IEEE
[9] Dehak, N., Kenny, P.J., Dehak, R., Dumouchel, P., Ouellet, P. (2011). Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing, 19(4), 788–798
[10] Damper, R.I., Higgins, J.E. (2003). Improving speaker identification in noise by subband processing and decision fusion. Pattern Recognition Letters, 24(13), 2167–2173
[11] Furui, S. (1981). Cepstral analysis technique for automatic speaker verification. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(2), 254–272
[12] Ganchev, T., Siafarikas, M., Mporas, I., Stoyanova, T. (2014). Wavelet basis selection for enhanced speech parametrization in speaker verification. International Journal of Speech Technology, 17(1), 27–36
[13] Hermansky, H., Morgan, N. (1994). RASTA processing of speech. IEEE transactions on speech and audio processing, 2(4), 578–589
[14] Hsu, C.W., Lin, C.J. (2002). A comparison of methods for multiclass support vector machines. IEEE transactions on Neural Networks, 13(2), 415–425
[15] Kittler, J., Hatef, M., Duin, R.P., Matas, J. (1998). On combining classifiers. IEEE transactions on pattern analysis and machine intelligence, 20(3), 226–239
[16] Kuncheva, L.I., Alpaydin, E. (2007). Combining Pattern Classifiers: Methods and Algorithms, IEEE Transactions on Neural Networks, 18(3), 964–964
[17] Kung, S.Y. (2014). Kernel methods and machine learning. Cambridge University Press. pp. 341–342
[18] Larcher, A., Lee, K.A., Ma, B., Li, H. (2014). Text-dependent speaker verification: Classifiers, databases and RSR2015. Speech Communication, 60, 56–77
[19] Mitchell, H.B. (2007). Multi-sensor data fusion: an introduction. Springer Science & Business Media
[20] Monte-Moreno, E., Chetouani, M., Faundez-Zanuy, M., Sole-Casals, J. (2009). Maximum likelihood linear programming data fusion for speaker recognition. Speech Communication, 51(9), 820–830
[21] Najafian, M., Safavi, S., Weber, P., Russell, M. (2016). Identification of British English regional accents using fusion of i-vector and multi-accent phonotactic systems. ODYSSEY
[22] Nandakumar, K., Jain, A.K. (2008, September). Multibiometric template security using fuzzy vault. In Biometrics: Theory, Applications and Systems, 2008. BTAS 2008. 2nd IEEE International Conference on (pp. 1–6). IEEE
[23] Pal, S.K., Mitra, S. (1996). Noisy fingerprint classification using multilayer perceptron with fuzzy geometrical and textural features. Fuzzy sets and systems, 80(2), 121–132
[24] Ramachandran, R.P., Farrell, K.R., Ramachandran, R., Mammone, R.J. (2002). Speaker recognition–general classifier approaches and data fusion methods. Pattern Recognition, 35(12), 2801–2821
[25] Raudys, Š. (2006). Trainable fusion rules. I. Large sample size case. Neural Networks, 19(10), 1506–1516
[26] Reynolds, D.A., Rose, R.C. (1995). Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE transactions on speech and audio processing, 3(1), 72–83
[27] Reynolds, D.A., Quatieri, T.F., Dunn, R.B. (2000). Speaker verification using adapted Gaussian mixture models. Digital signal processing, 10(1), 19–41
[28] Safavi, S., Gan, H., Mporas, I., Sotudeh, R. Fraud Detection in Voice-based Identity Authentication Applications and Services. In The IEEE International Conference on Data Mining series (ICDM), 2016
[29] Safavi, S., Hanani, A., Russell, M., Jancovic, P., Carey, M.J. (2012). Contrasting the effects of different frequency bands on speaker and accent identification. IEEE Signal Processing Letters, 19(12), 829–832
[30] Safavi, S., Jancovic, P., Russell, M.J., Carey, M.J. (2013). Identification of gender from children’s speech by computers and humans. In INTERSPEECH (pp. 2440–2444)

Notes

Prepared with funds from MNiSW under agreement 812/P-DUN/2016 for science dissemination activities.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-52117dc4-7912-4ea4-ae80-87ff199cbe0f