Real time recognition of speakers from internet audio stream

Weychan, R.; Marciniak, T.; Stankiewicz, A.; Dabrowski, A.

doi:10.1515/fcds-2015-0014

Article details

Abstract

This paper presents an automatic speaker recognition technique that operates on lossy (encoded) speech signal streams from Internet radio. We show how the audio encoder settings (e.g., bitrate) influence the quality of the speaker model. Each speaker was modeled with the Gaussian mixture model (GMM) approach. Both the speaker recognition and the subsequent analysis used short utterances to facilitate real-time processing. The neighborhoods of the speaker models were analyzed with the ISOMAP algorithm. The experiments were based on four 1-hour public debates with 7–8 speakers (including the moderator), acquired from the Polish radio Internet services. The presented software was developed in the MATLAB environment.
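
As a rough illustration of the GMM approach named in the abstract, the sketch below fits one diagonal-covariance GMM per speaker and assigns a short utterance to the model with the highest average log-likelihood. It is not the authors' MATLAB software: the feature frames are synthetic stand-ins for real acoustic features, and the function names, the number of mixture components, and the plain EM training loop are all assumptions made for this example.

```python
import numpy as np


def component_log_probs(frames, weights, means, variances):
    """Return log(w_k * N(x_t | mu_k, diag(var_k))), shape (frames, components)."""
    diff = frames[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (
        np.sum(diff**2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )
    return np.log(weights) + log_gauss


def fit_diag_gmm(frames, n_components=4, n_iter=25, seed=0):
    """Fit a diagonal-covariance GMM to feature frames (rows) with plain EM."""
    rng = np.random.default_rng(seed)
    n = frames.shape[0]
    # Initialise means from randomly chosen frames and use the global variance.
    means = frames[rng.choice(n, size=n_components, replace=False)].copy()
    variances = np.tile(frames.var(axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        log_p = component_log_probs(frames, weights, means, variances)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)
        # M-step: re-estimate weights, means and (floored) variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ frames) / nk[:, None]
        variances = (resp.T @ frames**2) / nk[:, None] - means**2
        variances = np.maximum(variances, 1e-6)
    return weights, means, variances


def average_log_likelihood(frames, model):
    """Mean per-frame log-likelihood of an utterance under one speaker model."""
    log_p = component_log_probs(frames, *model)
    return float(np.mean(np.logaddexp.reduce(log_p, axis=1)))


def identify(frames, models):
    """Return the best-scoring speaker name and all per-speaker scores."""
    scores = {name: average_log_likelihood(frames, m) for name, m in models.items()}
    return max(scores, key=scores.get), scores


# Synthetic 12-dimensional "feature frames" for two hypothetical speakers.
rng = np.random.default_rng(42)
train_a = rng.normal(loc=0.0, scale=1.0, size=(500, 12))
train_b = rng.normal(loc=2.0, scale=1.0, size=(500, 12))
models = {
    "speaker_A": fit_diag_gmm(train_a),
    "speaker_B": fit_diag_gmm(train_b),
}

# A short test utterance (50 frames) drawn from speaker B's distribution.
test_b = rng.normal(loc=2.0, scale=1.0, size=(50, 12))
best, scores = identify(test_b, models)
print(best, scores)
```

Scoring by average per-frame log-likelihood keeps utterances of different lengths comparable, which matters when the decision has to be made on short segments.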

Keywords

speaker recognition; GMM; internet radio; ISOMAP

Publisher

Wydawnictwo Politechniki Poznańskiej

Journal

Foundations of Computing and Decision Sciences

Year

2015

Volume

Vol. 40, No. 3

Pages

223–233

Physical description

Bibliography: 21 items, figures.

Contributors

author

Weychan R.

Faculty of Computing, Poznan University of Technology, Poznan, Poland

author

Marciniak T.

Faculty of Computing, Poznan University of Technology, Poznan, Poland

author

Stankiewicz A.

Faculty of Computing, Poznan University of Technology, Poznan, Poland

author

Dabrowski A.

Faculty of Computing, Poznan University of Technology, Poznan, Poland

Bibliography

[1] S. Araki, T. Hori, M. Fujimoto, S. Watanabe, T. Yoshioka, T. Nakatani, and A. Nakamura. Online meeting recognizer with multichannel speaker diarization. In Signals, Systems and Computers (ASILOMAR), 2010 Conference Record of the Forty Fourth Asilomar Conference on, pages 1697–1701, Nov 2010.
[2] D. Blatt and A. Hero. On tests for global maximum of the log-likelihood function. Information Theory, IEEE Transactions on, 53(7):2510–2525, July 2007.
[3] M. Bosi, K. Brandenburg, S. Quackenbush, L. Fielder, K. Akagiri, H. Fuchs, and M. Dietz. ISO/IEC MPEG-2 Advanced Audio Coding. J. Audio Eng. Soc., 45(10):789–814, 1997.
[4] M. Brookes. VOICEBOX: Speech Processing Toolbox for MATLAB, 2005.
[5] J. Dattorro. Convex optimization and Euclidean distance geometry. Lulu.com, 2008.
[6] J. R. Hershey and R. A. Olsen. Approximating the Kullback-Leibler divergence between Gaussian mixture models. In ICASSP (4), pages 317–320, 2007.
[7] T. Jiang and J. Han. Map-based audio coding compensation for speaker recognition. Journal of Signal and Information Processing, 2:165, 2011.
[8] R. D. Maesschalck, D. Jouan-Rimbaud, and D. Massart. The Mahalanobis distance. Chemometrics and Intelligent Laboratory Systems, 50(1):1–18, 2000.
[9] T. Marciniak, R. Weychan, A. Dabrowski, and A. Krzykowska. Speaker recognition based on short Polish sequences. IEEE SPA: Signal Processing Algorithms, Architectures, Arrangements, and Applications Conference Proceedings, pages 95–98, 2010.
[10] T. Marciniak, R. Weychan, A. Dabrowski, and A. Krzykowska. Influence of silence removal on speaker recognition based on short Polish sequences. IEEE SPA: Signal Processing Algorithms, Architectures, Arrangements, and Applications Conference Proceedings, pages 159–163, 2011.
[11] T. Marciniak, R. Weychan, A. Stankiewicz, and A. Dabrowski. Biometric speech signal processing in a system with digital signal processor. Bulletin of the Polish Academy of Sciences. Technical Sciences, 62(3):589–594, 2014.
[12] S. Molau, M. Pitz, R. Schluter, and H. Ney. Computing Mel-frequency cepstral coefficients on the power spectrum. In Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on, volume 1, pages 73–76, 2001.
[13] K. Park, J.-S. Park, and Y.-H. Oh. GMM adaptation based online speaker segmentation for spoken document retrieval. Consumer Electronics, IEEE Transactions on, 56(2):1123–1129, 2010.
[14] Z. Piotrowski, J. Wojtun, and K. Kaminski. Subscriber authentication using GMM and TMS320C6713 DSP. Przeglad Elektrotechniczny, (12a/2012):127–130, 2012.
[15] A. Plinge and G. A. Fink. Online multi-speaker tracking using multiple microphone arrays informed by auditory scene analysis. In Signal Processing Conference (EUSIPCO), 2013 Proceedings of the 21st European, pages 1–5, Sept 2013.
[16] D. Reynolds. Gaussian mixture models. Encyclopedia of Biometrics, pages 659–663, 2009.
[17] J. B. Tenenbaum, V. D. Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
[18] G. Wen, L. Jiang, and J. Wen. Using locally estimated geodesic distance to optimize neighborhood graph for isometric data embedding. Pattern Recognition, 41(7):2226–2236, 2008.
[19] R. Weychan, T. Marciniak, and A. Dabrowski. Analysis of differences between MFCC after multiple GSM transcodings. Przeglad Elektrotechniczny, pages 24–29, 2012.
[20] R. Weychan, T. Marciniak, A. Stankiewicz, and A. Dabrowski. Real time speaker recognition from internet radio. IEEE SPA: Signal Processing Algorithms, Architectures, Arrangements, and Applications Conference Proceedings, pages 128–132, 2014.
[21] R. Weychan, A. Stankiewicz, T. Marciniak, and A. Dabrowski. Improving of speaker identification from mobile telephone calls. In Multimedia Communications, Services and Security, volume 429 of Communications in Computer and Information Science, pages 254–264. 2014.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-5509bc78-0c8b-481a-b154-0bc327efae33