Text Independent Automatic Speaker Recognition System using fusion of features

Majda-Zdancewicz, E.; Dobrowolski, A. P.

Article - details

Article title

Text Independent Automatic Speaker Recognition System using fusion of features

Authors

Majda-Zdancewicz E., Dobrowolski A. P.

Selected full texts from this journal

http://pe.org.pl/

Identifiers

Title variants

Automatyczny system rozpoznawania mówcy niezależnie od wypowiadanego tekstu bazujący na fuzji cech

Publication languages

Abstracts

This paper presents a text-independent speaker recognition system, that is, one that does not depend on the linguistic content of the utterance. The processing chain comprises a preprocessing stage, segmentation of the speech signal, feature extraction, selection of the most important features, and a classification stage that uses a serial combination of classifiers. Sets of descriptors were obtained using three techniques: cepstral coefficients, mel-cepstral coefficients, and the authors' original weighted cepstral coefficients. An optimal, robust "voice print" was determined using Fisher coefficients and PCA. Experiments on the 2002 NIST Speaker Recognition Evaluation corpus show that the proposed system recognises the speaker with high accuracy regardless of the speech content, and even of the language spoken.

The paper presents a speaker recognition system that is independent of the text of the utterance. The problems addressed include: the preprocessing stage, segmentation of the speech signal leading to feature extraction based on three speech-signal analysis techniques, selection of the most significant features, and a classification stage involving the analysis of a cascade of classifiers. The feature set was obtained using three techniques: cepstrum, mel-cepstrum, and the authors' own weighted cepstral features. The optimal feature vector was extracted using Fisher significance coefficients and PCA. Experiments on the 2002 NIST Speaker Recognition Evaluation database show that the presented system recognises the speaker with satisfactory accuracy, regardless of the linguistic constraints of the content and even of the language of the utterance.
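
The feature stages named in the abstract (cepstral coefficients, Fisher-based feature ranking, PCA) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give their formulas, so the Fisher score below is one common multi-class form (between-class over within-class variance), and the function names, the Hamming window, and the default of 12 coefficients are assumptions made for illustration. The mel-cepstral and weighted cepstral features and the classifier cascade are not covered.

```python
import numpy as np


def real_cepstrum(frame, n_coeffs=12):
    """Real cepstrum of one frame: inverse FFT of the log magnitude spectrum.

    The Hamming window and n_coeffs=12 are illustrative assumptions.
    """
    windowed = frame * np.hamming(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    log_spectrum = np.log(spectrum + 1e-12)  # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_spectrum, n=len(frame))
    return cepstrum[1:n_coeffs + 1]  # skip c0, which mostly reflects energy


def fisher_scores(X, y):
    """Per-feature ratio of between-class to within-class variance.

    Assumed form of the Fisher coefficient; the paper may define it differently.
    X has shape (n_samples, n_features); y holds one speaker label per sample.
    """
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)


def pca_project(X, n_components):
    """Project centred data onto its leading principal components (via SVD)."""
    centred = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


def select_and_reduce(X, y, n_keep, n_components):
    """Keep the n_keep highest-scoring features, then apply PCA to them."""
    ranked = np.argsort(fisher_scores(X, y))[::-1]
    keep = np.sort(ranked[:n_keep])
    return pca_project(X[:, keep], n_components), keep
```

In use, each speech frame would be passed through `real_cepstrum`, the resulting vectors stacked into a matrix `X` with speaker labels `y`, and `select_and_reduce(X, y, n_keep, n_components)` applied before classification. Ranking before PCA keeps the discriminative features from being diluted by high-variance but uninformative ones.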

Keywords

automatic speaker recognition; feature extraction; feature selection; PCA

automatic speech recognition; feature extraction; feature selection; PCA

Publisher

Wydawnictwo SIGMA-NOT

Journal

Przegląd Elektrotechniczny

Year

2015

Volume

Vol. 91, no. 10

Pages

247-251

Physical description

Bibliography: 12 items, figures, tables, charts.

Contributors

author

Majda-Zdancewicz E.

Ewelina.Majda@wat.edu.pl

Military University of Technology, Faculty of Electronics, Institute of Electronic System, 2 Kaliskiego street, 00-908 Warsaw

author

Dobrowolski A. P.

Andrzej.Dobrowolski@wat.edu.pl

Military University of Technology, Faculty of Electronics, Institute of Electronic System, 2 Kaliskiego street, 00-908 Warsaw

Bibliography

[1] Furui S., Recent advances in speaker recognition, Pattern Recognition Letters, 18 (1997), no. 9, pp. 859-872
[2] Kinnunen T., Li H., An overview of text-independent speaker recognition: from features to supervectors, Speech Communication, 52 (2010), pp. 12-40
[3] Orman D., Arslan L., Frequency analysis of speaker recognition, Proc. Speaker Odyssey: the Speaker Recognition Workshop, Greece, 2001, pp. 219-222
[4] Lupu E., Emerich S., Speaker identification approach based on time domain extracted features, 52nd International Symposium ELMAR 2010, Croatia, pp. 355-358
[5] Zheng F., Zhang G., Song Z., Comparison of Different Implementations of MFCC, J. Computer Science & Technology, 16 (2001), no. 6, pp. 582-589
[6] Huang X., Acero A., Hon H.W., Spoken language processing: A guide to theory, algorithm, and system development, Prentice Hall PTR, 2001
[7] Hermansky H., Perceptual linear prediction analysis for speech, J. Acoust. Soc. Amer., 87 (1990), pp. 1738-1752
[8] Ming J., Hazen T., Glass J. R., Reynolds D. A., Robust Speaker Recognition in Noisy Conditions, IEEE Transactions on Audio, Speech, and Language Processing, 15 (2007), no. 5, pp. 1711-1723
[9] Majda E., Dobrowolski A. P., Modeling and optimization of the feature generator for speaker recognition systems, Przegląd Elektrotechniczny, 88 (2012), no. 12a, pp. 131-136
[10] Kopparapu S.K., Laxminarayana M., Choice of Mel Filter Bank in Computing MFCC of a Resampled Speech, 10th International Conference on Information Science, Signal Processing and their Applications, Malaysia, 2010, pp. 121-124
[11] Kruk M., Osowski S., Koktysz R., Recognition of Colon Cells Using Ensemble of Classifiers, International Conference on Neural Networks, Orlando, 2007, pp. 345-349
[12] Dobrowolski A. P., Majda E., Application of homomorphic methods of speech signal processing in speakers recognition system, Przegląd Elektrotechniczny, 88 (2012), no. 6, pp. 12-16

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-b63eeac3-a79e-4916-8ee6-367803133254