Article title

Real time recognition of speakers from internet audio stream

Publication languages
EN
Abstracts
EN
In this paper we present an automatic speaker recognition technique that operates on lossy (encoded) speech signal streams from Internet radio. We show the influence of the audio encoder settings (e.g., bitrate) on speaker model quality. The model of each speaker was calculated using the Gaussian mixture model (GMM) approach. Both the speaker recognition and the further analysis were performed on short utterances to facilitate real-time processing. The neighborhoods of the speaker models were analyzed using the ISOMAP algorithm. The experiments were based on four 1-hour public debates with 7–8 speakers (including the moderator), acquired from the Polish radio Internet services. The presented software was developed in the MATLAB environment.
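The GMM-based identification step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' MATLAB software. It assumes diagonal-covariance mixtures, and all model parameters, speaker names, and function names below are hypothetical toy values chosen for illustration. A short utterance (a list of feature vectors, e.g. cepstral coefficients) is scored against each speaker model by its average per-frame log-likelihood, and the best-scoring model is selected.

```python
import math


def frame_log_likelihood(x, weights, means, variances):
    """Log-density of one feature vector under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # Log-sum-exp over mixture components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def utterance_score(frames, model):
    """Average per-frame log-likelihood of a short utterance."""
    weights, means, variances = model
    total = sum(frame_log_likelihood(x, weights, means, variances) for x in frames)
    return total / len(frames)


def identify(frames, models):
    """Return the name of the speaker model with the highest score."""
    return max(models, key=lambda name: utterance_score(frames, models[name]))


# Hypothetical 2-D toy models: (weights, means, variances) per speaker.
MODELS = {
    "speaker_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "speaker_b": ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}

# A short "utterance" of three feature frames.
utterance = [[0.2, -0.1], [0.9, 1.2], [0.4, 0.5]]
print(identify(utterance, MODELS))
```

Averaging over frames keeps scores comparable between utterances of different lengths, which matters when only short segments are available for real-time decisions. In practice the mixture parameters would be estimated from training speech (typically with the EM algorithm) rather than written by hand.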
Pages
223–233
Physical description
Bibliography: 21 items, figures.
Authors
author
  • Faculty of Computing, Poznan University of Technology, Poznan, Poland
author
  • Faculty of Computing, Poznan University of Technology, Poznan, Poland
  • Faculty of Computing, Poznan University of Technology, Poznan, Poland
author
  • Faculty of Computing, Poznan University of Technology, Poznan, Poland
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-5509bc78-0c8b-481a-b154-0bc327efae33