Identifiers
Title variants
Publication languages
Abstracts
Conventional speaker recognition systems use the Universal Background Model (UBM) as the impostor model for all speakers. In this paper, speaker models are clustered to obtain better impostor model representations for speaker verification. First, a UBM is trained, and speaker models are adapted from the UBM. Then, the k-means algorithm with the Euclidean distance measure is applied to the speaker models. The speakers are divided into two, three, four, and five clusters. The resulting cluster centers are used as background models for their respective speakers. Experiments showed that the proposed method consistently produced lower Equal Error Rates (EER) than the conventional UBM approach for test utterances 3, 10, and 30 seconds long, and also under channel mismatch conditions. The proposed method is also compared with the i-vector approach. The three-cluster model achieved the best performance, with a 12.4% relative EER reduction on average compared to the i-vector method. The statistical significance of the results is also reported.
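The clustering step described in the abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each adapted speaker model has been flattened into a fixed-length vector (called `supervectors` here, for example stacked GMM means), it runs on synthetic random data, and the names `kmeans`, `assign`, and `background` are hypothetical.

```python
import numpy as np


def assign(X, centers):
    """Return, for each row of X, the index of the nearest center (Euclidean)."""
    # Squared Euclidean distance from every speaker vector to every center.
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means with Euclidean distance. X has shape (n_speakers, dim)."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct speaker vectors.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave an empty cluster's center unchanged
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final labels are computed against the final centers.
    return assign(X, centers), centers


# Synthetic stand-in for adapted speaker models (an assumption, not real data):
# 30 speakers, 8-dimensional vectors, drawn around 3 hypothetical group means.
rng = np.random.default_rng(1)
group_means = rng.normal(scale=5.0, size=(3, 8))
supervectors = np.vstack(
    [m + rng.normal(scale=0.5, size=(10, 8)) for m in group_means]
)

labels, centers = kmeans(supervectors, k=3)

# Each speaker's background model is the center of the cluster it belongs to.
background = centers[labels]
```

In this sketch, `background[i]` plays the role that the single UBM plays in the conventional setup: speaker `i` would be scored against its own cluster center instead of a model shared by all speakers. The abstract reports results for two to five clusters, which corresponds to varying `k` here.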
Publisher
Journal
Year
Volume
Pages
127–135
Physical description
Bibliography: 30 items, figures, tables, charts.
Authors
author
- Department of Electrical-Electronics Engineering, Adana Science and Technology University, Adana, Turkey
author
- Department of Computer Engineering, Çukurova University, Adana, Turkey
author
- Department of Electrical-Electronics Engineering, Çukurova University, Adana, Turkey
Bibliography
- 1. Apsingekar V. R., De Leon P. L. (2009), Speaker Model Clustering for Efficient Speaker Identification in Large Population Applications, IEEE Trans. Audio. Speech. Lang. Processing, 17, 848–853.
- 2. Auckenthaler R., Mason J. S. (2001), Gaussian selection applied to text-independent speaker verification, Proc. Speaker Odyssey: The Speaker Recognition Workshop, 83–88, Greece.
- 3. Beigi H. S. M., Maes S. H., Chaudhari U. V., Sorensen S. (1999), A hierarchical approach to large-scale speaker recognition, European Conference on Speech Communication and Technology, 2203–2206, Hungary.
- 4. Bimbot F., Bonastre J.-F., Fredouille C., Gravier G., Magrin-Chagnolleau I., Meignier S., Merlin T., Ortega-Garcia J., Petrovska-Delacretaz D., Reynolds D. A. (2004), A Tutorial on Text-Independent Speaker Verification, EURASIP J. Adv. Signal Process., 2004, 430–451.
- 5. Brew A., Cunningham P. (2009), Combining Cohort and UBM Models in Open Set Speaker Identification, Seventh International Workshop on Content-Based Multimedia Indexing, 62–67, Crete.
- 6. Brew A., Cunningham P. (2010), Combining cohort and UBM models in open set speaker detection, Multimed. Tools Appl., 48, 141–159.
- 7. Campbell J. P. (1997), Speaker recognition: a tutorial, Proc. IEEE, 85, 1437–1462.
- 8. Campbell W. M., Sturim D. E., Reynolds D. A., Solomonoff A. (2006), SVM Based Speaker Verification using a GMM Supervector Kernel and NAP Variability Compensation, IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, I-97-100, France.
- 9. De Leon P. L., Apsingekar V. (2007), Reducing Speaker Model Search Space in Speaker Identification, Biometrics Symposium, 1–6, USA.
- 10. Dehak N., Kenny P. J., Dehak R., Dumouchel P., Ouellet P. (2011), Front-End Factor Analysis for Speaker Verification, IEEE Trans. Audio. Speech. Lang. Processing, 19, 788–798.
- 11. Doddington G., Przybocki M., Martin A., Reynolds D. (2000), The NIST speaker recognition evaluation – Overview, methodology, systems, results, perspective, Speech Communication, 31, 225–254.
- 12. Gillick L., Cox S. (1989), Some statistical issues in the comparison of speech recognition algorithms, International Conference on Acoustics, Speech, and Signal Processing, 532–535.
- 13. Hossa R., Makowski R. (2016), An Effective Speaker Clustering Method using UBM and Ultra-Short Training Utterances, Archives of Acoustics, 41, 107–118.
- 14. Kenny P. (2005), Joint factor analysis of speaker and session variability: Theory and algorithms, CRIM, Montr. CRIM-06/08-13, 1–17.
- 15. Kenny P., Boulianne G., Ouellet P., Dumouchel P. (2007), Joint Factor Analysis Versus Eigenchannels in Speaker Recognition, IEEE Trans. Audio, Speech Lang. Process., 15, 1435–1447.
- 16. Kinnunen T., Li H. (2010), An overview of text-independent speaker recognition: From features to supervectors, Speech Communication, 52, 12–40.
- 17. McClanahan R. D., De Leon P. L. (2012), Mixture Component Clustering for Efficient Speaker Verification, Interspeech, 1086–1090, USA.
- 18. McClanahan R. D., De Leon P. L. (2015), Reducing computation in an i-vector speaker recognition system using a tree-structured universal background model, Speech Communication, 66, 36–46.
- 19. McLaren M., Vogt R., Baker B., Sridharan S. (2010), Data-Driven Background Dataset Selection for SVM-Based Speaker Verification, IEEE Trans. Audio. Speech. Lang. Processing, 18, 1496–1506.
- 20. Pallett D., Fisher W., Fiscus J. (1990), Tools for the analysis of benchmark speech recognition tests, International Conference on Acoustics, Speech, and Signal Processing, 97–100.
- 21. Reynolds D. A. (1995), Speaker Identification and Verification using Gaussian mixture speaker models, Speech Communication, 17, 91–108.
- 22. Reynolds D. A. (1997), Comparison of Background Normalization Methods for Text-Independent Speaker Verification, European Conference on Speech Communication and Technology, Greece.
- 23. Reynolds D. A., Quatieri T. F., Dunn R. B. (2000), Speaker Verification Using Adapted Gaussian Mixture Models, Digital Signal Processing, 10, 19–41.
- 24. Reynolds D. A., Rose R. C. (1995), Robust text-independent speaker identification using Gaussian mixture speaker models, IEEE Trans. Speech Audio Process., 3, 72–83.
- 25. Richardson F., Reynolds D., Dehak N. (2015), Deep Neural Network Approaches to Speaker and Language Recognition, IEEE Signal Processing Letters, 22, 1671–1675.
- 26. Sadjadi S. O., Slaney M., Heck L. (2013), MSR Identity Toolbox v1.0: A MATLAB Toolbox for Speaker Recognition Research, Speech and Language Processing Technical Committee Newsletter, IEEE, 1–4.
- 27. Saeidi R., Kinnunen T., Mohammadi H. R. S., Rodman R., Franti P. (2010), Joint frame and Gaussian selection for text independent speaker verification, IEEE International Conference on Acoustics, Speech and Signal Processing, 4530–4533, USA.
- 28. Xiang B., Berger T. (2003), Efficient text-independent speaker verification with structural Gaussian mixture models and neural network, IEEE Trans. Speech Audio Process., 11, 447–456.
- 29. Xiong Z., Zheng T. F., Song Z., Soong F., Wu W. (2006), A tree-based kernel selection approach to efficient Gaussian mixture model–universal background model based speaker identification, Speech Communication, 48, 1273–1282.
- 30. Zhu D., Ma B., Li H. (2011), Speaker Verification With Feature-Space MAPLR Parameters, IEEE Trans. Audio. Speech. Lang. Processing, 19, 505–515.
Notes
Prepared with funds from the Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science dissemination activities (2017 tasks).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-86358b3a-7a0e-4d4b-b2ff-9340bf7ad127