Relaxing the WDO Assumption in Blind Extraction of Speakers from Speech Mixtures

Kasprzak, W.; Ding, N.; Hamada, N.

Powiadomienia systemowe

Sesja wygasła!

Artykuł - szczegóły

Tytuł artykułu

Relaxing the WDO Assumption in Blind Extraction of Speakers from Speech Mixtures

Autorzy

Kasprzak W. , Ding N. , Hamada N.

Treść / Zawartość

Pełne teksty:

httpwww_itl_waw_plczasopismajtit2010450.pdf

Pobierz

Identyfikatory

Warianty tytułu

Języki publikacji

Abstrakty

The time-frequency masking approach in blind speech extraction consists of two main steps: feature clustering in a space spanned over delay-time and attenuation rate, and spectrogram masking in order to reconstruct the sources. Usually a binary mask is generated under the strong W-disjoint orthogonal (WDO) assumption (disjoint orthogonal representations in the frequency domain). In practice, this assumption is most often violated leading to weak quality of reconstructed sources. In this paper we propose the WDO to be relaxed by allowing some frequency bins to be shared by both sources. As we detect instantaneous fundamental frequencies the mask creation is supported by exploring a harmonic structure of speech. The proposed method is proved to be effective and reliable in experiments with both simulated and real acquired mixtures.

Słowa kluczowe

blind source extraction harmonic frequencies histogram clustering time-frequency masking W-disjoint orthogonal

Wydawca

Instytut Łączności - Państwowy Instytut Badawczy

Czasopismo

Journal of Telecommunications and Information Technology

Rocznik

2010

Tom

nr 4

Strony

50--58

Opis fizyczny

Bibliogr. 13 poz., rys., tab.

Twórcy

autor

Kasprzak W.

autor

Ding N.

autor

Hamada N.

Institute of Control and Computation Engineering, Warsaw University of Technology, Nowowiejska st 15/19, 00-665 Warszawa, Poland, W.Kasprzak@ia.pw.edu.pl

Bibliografia

[1] S. Makino, T.-W. Lee, and H. Sawada, Blind Speech Separation. Berlin: Springer, 1997.
[2] A. Hyv¨arinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley, 2001.
[3] A. Cichocki and S. Amari, Adaptive Blind Signal and Image Processing. Chichester: Wiley, 2003.
[4] O. Yilmaz and S. Rickard, “Blind separation of speech mixtures via time-frequency masking”, IEEE Trans. Sig. Proces., vol. 52, no. 7, pp. 1830–1847, 2004.
[5] S. Rickard, “The DUET blind source separation algorithm”, in [1], pp. 217–237.
[6] M. Aoki, M. Okamoto, S. Aoki, H. Matsui, T. Sakurai, and Y. Kaneda, Sound source segregation based on estimating incident angle of each frequency component of input signals acquired by multiple microphones. Acoust. Sci. & Tech, vol. 22, pp. 149–157, no. 2, 2001.
[7] F. Abrard and Y. Deville, “A time-frequency blind signal separation method applicable to underdetermined mixtures of dependent sources”, Sig. Proces., vol. 85, pp. 1389-1403, 2005.
[8] S. Arberet, R. Gribonval, and F. Bimbot, “A robust method to count, locate and separate audio sources in a multichannel underdetermined mixture”, IEEE Trans. Sig. Proces., vol. 58, no. 1, pp. 121–133, 2010.
[9] Z. He, A. Cichocki, Y. Li, S. Xie, and S. Sanei, “K-hyperline clustering learning for sparse component analysis”, Sig. Proces., vol. 89, pp. 1011–1022, 2009.
[10] S. Makino, H. Sawada, R.Mukai, and S. Araki, “Blind source separation of convolutive mixture of speech in frequency domain”, IEICE Trans. Fundament., vol. 88, no. 7, pp. 1830–1847, 2004.
[11] A. Araki, H. Sawada, R. Mukai, and S. Makino, “Underdetermined blind sparse source separation for arbitrarily arranged multiple sensors”, Sig. Proces., vol. 87, pp. 1833–1847, 2007.
[12] H. Ouchi and N. Hamada, “Separation of speech mixture by time-frequency masking utilizing sound harmonics, J. Sig. Proces., vol. 13, no. 4, pp. 331–334, 2009.
[13] T. Kobayasi, S. Itahashi, S. Hayamizu, and T. Takezawa, “ASJ continuous speech corpus for research, J. Acoust. Soc. Japan, vol. 48, no. 12, pp. 888–893, 1992 (in Japanese).

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-article-BAT8-0020-0017