Speaker Model Clustering to Construct Background Models for Speaker Verification

Dişken, G.; Tüfekci, Z.; Çevik, U.

doi:10.1515/aoa-2017-0014

Article details

Abstract

Conventional speaker recognition systems use the Universal Background Model (UBM) as the imposter model for all speakers. In this paper, speaker models are clustered to obtain better imposter model representations for speaker verification. First, a UBM is trained, and speaker models are adapted from the UBM. Then, the k-means algorithm with the Euclidean distance measure is applied to the speaker models. The speakers are divided into two, three, four, and five clusters. The resulting cluster centers are used as the background models of their respective speakers. Experiments showed that the proposed method consistently produced lower Equal Error Rates (EER) than the conventional UBM approach for test utterances 3, 10, and 30 seconds long, and also under channel mismatch conditions. The proposed method is also compared with the i-vector approach. The three-cluster model achieved the best performance, with a 12.4% average relative EER reduction compared to the i-vector method. The statistical significance of the results is also reported.
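
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each speaker model has already been flattened into a fixed-length vector (for example, stacked GMM means adapted from the UBM), uses small synthetic vectors in place of real models, and all names (`kmeans`, `speakers`, `backgrounds`) are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, k, iterations=50, seed=0):
    """Plain Lloyd's k-means with Euclidean distance; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each speaker model goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: euclidean(v, centers[c])) for v in vectors
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = mean_vector(members)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return centers, labels


# Synthetic stand-ins for speaker models (assumption: 4-dimensional vectors,
# six speakers near 0.0 and six near 5.0 in every dimension).
data_rng = random.Random(1)
speakers = [
    [base + data_rng.uniform(-0.5, 0.5) for _ in range(4)]
    for base in [0.0] * 6 + [5.0] * 6
]

centers, labels = kmeans(speakers, k=2)

# Each speaker's background (imposter) model is the center of its own cluster,
# replacing the single UBM shared by all speakers.
backgrounds = [centers[lab] for lab in labels]
```

In a verification system, a test utterance claimed to come from speaker `i` would then be scored against that speaker's model and against `backgrounds[i]` rather than against the global UBM. The number of clusters (two to five in the abstract) is the main parameter to vary.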

Keywords

Gaussian mixture models; k-means; imposter models; speaker clustering; speaker verification

Publisher

Instytut Podstawowych Problemów Techniki PAN
Polska Akademia Nauk

Journal

Archives of Acoustics

Year

2017

Volume

Vol. 42, No. 1

Pages

127–135

Physical description

Bibliography: 30 items; figures, tables, charts.

Contributors

author

Dişken G.

gdisken@adanabtu.edu.tr

Department of Electrical-Electronics Engineering, Adana Science and Technology University, Adana, Turkey

author

Tüfekci Z.

Department of Computer Engineering, Çukurova University, Adana, Turkey

author

Çevik U.

Department of Electrical-Electronics Engineering, Çukurova University, Adana, Turkey

References

1. Apsingekar V. R., De Leon P. L. (2009), Speaker Model Clustering for Efficient Speaker Identification in Large Population Applications, IEEE Trans. Audio. Speech. Lang. Processing, 17, 848–853.
2. Auckenthaler R., Mason J. S. (2001), Gaussian selection applied to text-independent speaker verification, Proc. Speaker Odyssey: The Speaker Recognition Workshop, 83–88, Greece.
3. Beigi H. S. M., Maes S. H., Chaudhari U. V., Sorensen S. (1999), A hierarchical approach to large-scale speaker recognition, European Conference on Speech Communication and Technology, 2203–2206, Hungary.
4. Bimbot F., Bonastre J.-F., Fredouille C., Gravier G., Magrin-Chagnolleau I., Meignier S., Merlin T., Ortega-Garcia J., Petrovska-Delacretaz D., Reynolds D. A. (2004), A Tutorial on Text-Independent Speaker Verification, EURASIP J. Adv. Signal Process., 2004, 430–451.
5. Brew A., Cunningham P. (2009), Combining Cohort and UBM Models in Open Set Speaker Identification, Seventh International Workshop on Content-Based Multimedia Indexing, 62–67, Crete.
6. Brew A., Cunningham P. (2010), Combining cohort and UBM models in open set speaker detection, Multimed. Tools Appl., 48, 141–159.
7. Campbell J. P. (1997), Speaker recognition: a tutorial, Proc. IEEE, 85, 1437–1462.
8. Campbell W. M., Sturim D. E., Reynolds D. A., Solomonoff A. (2006), SVM Based Speaker Verification using a GMM Supervector Kernel and NAP Variability Compensation, IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, I-97-100, France.
9. De Leon P. L., Apsingekar V. (2007), Reducing Speaker Model Search Space in Speaker Identification, Biometrics Symposium, 1–6, USA.
10. Dehak N., Kenny P. J., Dehak R., Dumouchel P., Ouellet P. (2011), Front-End Factor Analysis for Speaker Verification, IEEE Trans. Audio. Speech. Lang. Processing, 19, 788–798.
11. Doddington G., Przybocki M., Martin A., Reynolds D. (2000), The NIST speaker recognition evaluation – Overview, methodology, systems, results, perspective, Speech Communication, 31, 225–254.
12. Gillick L., Cox S. (1989), Some statistical issues in the comparison of speech recognition algorithms, International Conference on Acoustics, Speech, and Signal Processing, 532–535.
13. Hossa R., Makowski R. (2016), An Effective Speaker Clustering Method using UBM and Ultra-Short Training Utterances, Archives of Acoustics, 41, 107–118.
14. Kenny P. (2005), Joint factor analysis of speaker and session variability: Theory and algorithms, CRIM, Montr. CRIM-06/08-13, 1–17.
15. Kenny P., Boulianne G., Ouellet P., Dumouchel P. (2007), Joint Factor Analysis Versus Eigenchannels in Speaker Recognition, IEEE Trans. Audio, Speech Lang. Process., 15, 1435–1447.
16. Kinnunen T., Li H. (2010), An overview of text-independent speaker recognition: From features to supervectors, Speech Communication, 52, 12–40.
17. McClanahan R. D., De Leon P. L. (2012), Mixture Component Clustering for Efficient Speaker Verification, Interspeech, 1086–1090, USA.
18. McClanahan R. D., De Leon P. L. (2015), Reducing computation in an i-vector speaker recognition system using a tree-structured universal background model, Speech Communication, 66, 36–46.
19. McLaren M., Vogt R., Baker B., Sridharan S. (2010), Data-Driven Background Dataset Selection for SVM-Based Speaker Verification, IEEE Trans. Audio. Speech. Lang. Processing, 18, 1496–1506.
20. Pallet D., Fisher W., Fiscus J. (1990), Tools for the analysis of benchmark speech recognition, International Conference on Acoustics, Speech, and Signal Processing, 97–100.
21. Reynolds D. A. (1995), Speaker Identification and Verification using Gaussian mixture speaker models, Speech Communication, 17, 91–108.
22. Reynolds D. A. (1997), Comparison of Background Normalization Methods for Text-Independent Speaker Verification, European Conference on Speech Communication and Technology, Greece.
23. Reynolds D. A., Quatieri T. F., Dunn R. B. (2000), Speaker Verification Using Adapted Gaussian Mixture Models, Digital Signal Processing, 10, 19–41.
24. Reynolds D. A., Rose R. C. (1995), Robust text-independent speaker identification using Gaussian mixture speaker models, IEEE Trans. Speech Audio Process., 3, 72–83.
25. Richardson F., Reynolds D., Dehak N. (2015), Deep Neural Network Approaches to Speaker and Language Recognition, IEEE Signal Processing Letters, 22, 1671–1675.
26. Sadjadi S. O., Slaney M., Heck L. (2013), MSR Identity Toolbox v1.0: A MATLAB Toolbox for Speaker Recognition Research, Speech and Language Processing Technical Committee Newsletter, IEEE, 1–4.
27. Saeidi R., Kinnunen T., Mohammadi H. R. S., Rodman R., Franti P. (2010), Joint frame and Gaussian selection for text independent speaker verification, IEEE International Conference on Acoustics, Speech and Signal Processing, 4530–4533, USA.
28. Xiang B., Berger T. (2003), Efficient text-independent speaker verification with structural gaussian mixture models and neural network, IEEE Trans. Speech Audio Process., 11, 447–456.
29. Xiong Z., Zheng T. F., Song Z., Soong F., Wu W. (2006), A tree-based kernel selection approach to efficient Gaussian mixture model–universal background model based speaker identification, Speech Communication, 48, 1273–1282.
30. Zhu D., Ma B., Li H. (2011), Speaker Verification With Feature-Space MAPLR Parameters, IEEE Trans. Audio. Speech. Lang. Processing, 19, 505–515.

Notes

Prepared with funds from the Polish Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science dissemination activities (2017 tasks).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-86358b3a-7a0e-4d4b-b2ff-9340bf7ad127