Speech Segmentation Algorithm Based on an Analysis of the Normalized Power Spectral Density

Pekar, D.; Tsikhanenka, S.

Full text

httpwww_itl_waw_plczasopismajtit2010444.pdf

Abstract

This article presents a new approach to speaker-independent phoneme detection. The core of the algorithm measures the distance between the normalized power spectral densities of adjacent short-time segments and verifies each result against the rate of change of the short-time signal energy. Experimental results indicate that the proposed algorithm reveals the phoneme structure of spoken speech with high probability. The algorithm requires no prior information about the signal and no models of phonemes or speakers, which makes it speaker independent and keeps its computational complexity low.
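The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not give the frame length, the distance measure, the thresholds, or the exact verification rule, so the Euclidean distance, the relative energy change, the rule that both must exceed a threshold, and all names and default values below (`normalized_psd`, `segment`, `frame_len`, `psd_threshold`, `energy_threshold`) are assumptions chosen for illustration.

```python
import numpy as np


def normalized_psd(frame):
    """Periodogram of one frame, scaled so that its bins sum to 1."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    total = power.sum()
    if total > 0:
        return power / total
    # Silent frame: fall back to a flat spectrum (assumption).
    return np.full_like(power, 1.0 / len(power))


def segment(signal, frame_len=256, psd_threshold=0.1, energy_threshold=0.2):
    """Return sample indices of candidate boundaries between adjacent frames.

    A boundary is reported when the normalized PSDs of two adjacent frames
    differ by more than psd_threshold AND the relative change in short-time
    energy exceeds energy_threshold. Both thresholds are illustrative.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)

    psds = np.array([normalized_psd(f) for f in frames])
    energy = (frames ** 2).sum(axis=1)

    boundaries = []
    for i in range(1, n_frames):
        # Spectral-shape distance between adjacent frames (Euclidean, assumed).
        distance = np.linalg.norm(psds[i] - psds[i - 1])
        # Relative change of short-time energy between the same two frames.
        denom = max(energy[i], energy[i - 1], 1e-12)
        energy_rate = abs(energy[i] - energy[i - 1]) / denom
        if distance > psd_threshold and energy_rate > energy_threshold:
            boundaries.append(i * frame_len)
    return boundaries
```

Because each PSD is normalized to unit sum, the distance reflects a change in spectral shape rather than loudness, while the energy term is checked separately. The sketch needs no training data or speaker model, in line with the abstract's claim.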

Keywords

phoneme segmentation, power spectral density, short-term signal energy, speaker independent voice systems

Publisher

Instytut Łączności - Państwowy Instytut Badawczy

Journal

Journal of Telecommunications and Information Technology

Year

2010

Volume

no. 4

Pages

44–49

Physical description

Bibliography: 9 items, figures.

Contributors

author

Pekar D.

author

Tsikhanenka S.

Belarusian State University, Nezavisimosti av. 4, Minsk, Belarus, 220030, pekar.dima@gmail.com

Bibliography

[1] A. Saheli and A. Abolfazl, “Speech recognition from PSD using neural network”, in Proc. Int. MultiConf. Engin. Comp. Scient. IMECS 2009, Hong Kong, 2009, vol. 1, pp. 174–176.
[2] B. Gajic and K. Paliwal, “Robust parameters for speech recognition based on subband spectral centroid histograms”, in Proc. 7th Eur. Conf. Speech Commun. Technol. EUROSPEECH 2001, Aalborg, Denmark, 2001.
[3] C. Espy-Wilson and S. Manocha, “A new set of features for text-independent speaker identification”, in Proc. Int. Conf. Spoken Lang. Proces. INTERSPEECH 2006, Pittsburgh, USA, 2006.
[4] P. Labutin and S. Koval, “Speaker identification based on the statistical analysis of f0”, in Proc. 16th Annual Conference IAFPA 2007, Plymouth, UK, 2007.
[5] T. Becker and M. Jessen, “Forensic speaker verification using formant features and Gaussian mixture models”, in Proc. Int. Conf. Spoken Lang. Proces. INTERSPEECH 2008, Brisbane, Australia, 2008.
[6] E. H. Kim and K. H. Hyun, “Robust emotion recognition feature, frequency range of meaningful signal”, in Proc. IEEE Int. Worksh. Robot Human Interact. Commun., Nashville, USA, 2005.
[7] M. A. Al-Alaoui and L. Al-Kanj, “Speech recognition using artificial neural networks and hidden Markov models”, IEEE Multidiscipl. Engin. Educ. Mag., vol. 3, no. 3, 2008.
[8] N. Bhatnagar, “A modified spectral subtraction method combined with perceptual weighting for speech enhancement”, M.Sc. thesis, The University of Texas, Dallas, August 2002.
[9] M. S. Medvedev, “Ispolzovanije vejvlet-preobrazowanija dla postrojenia modelej fonem russkowo jazyka”, Wiestnik Sibirskogo Federalnego Universiteta, no. 9, p. 198, 2006 (in Russian).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BAT8-0020-0016