This paper presents experimental results on whispered speech recognition using linear and mel cepstral coefficients based on the Teager Energy Operator, combined with Cepstral Mean Subtraction normalization. Four feature vectors are compared: Linear Frequency Cepstral Coefficients, Teager Energy based Linear Frequency Cepstral Coefficients, Mel Frequency Cepstral Coefficients, and Teager Energy based Mel Frequency Cepstral Coefficients. A speaker-dependent scenario is used. Recognition is performed with Dynamic Time Warping and Hidden Markov Models. The results show an appreciable improvement in whispered speech recognition when the Teager Energy Operator is used together with Cepstral Mean Subtraction.
The article presents research on automatic whispered speech recognition. The main task was to find the relationship between the number of triphone classes (the number of leaves in the decision tree) and the total number of Gaussian distributions, and thereby to determine the values for which speech recognition quality is best. The work also establishes how this relationship differs between normal and whispered speech, which had not been investigated before and constitutes the novel contribution of the study. The experimental results indicate that the number of triphone classes (number of leaves) for whispered speech should be significantly lower than for normal speech.