Formant frequencies are important cues for characterizing whispered speech. However, they are difficult to estimate accurately with the conventional linear predictive coding (LPC) algorithm, mainly because the formant bandwidths of whispered speech are wider than those of voiced speech. The wider bandwidths cause pole interaction, with the result that one or more real roots are regarded as spurious and deleted from the original LP polynomial. To reduce the degradation caused by pole interaction, an improved root-finding formant estimation algorithm is proposed. In this algorithm, the whisper formant bandwidth is modified so that the spectral energy of the remaining formant polynomial equals that of the original LP polynomial. Experimental results on six Chinese whispered monophthong phonemes show that the proposed algorithm produces a more reliable formant spectrum than an estimate that does not account for pole interaction.
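For orientation, the conventional root-finding baseline that the abstract refers to can be sketched as follows. This is a minimal illustration and not the proposed bandwidth-modification algorithm, whose details the abstract does not give. The function names `lpc_autocorr` and `formants_from_lpc`, the `max_bandwidth` threshold, and the assumption that the input frame is already windowed are all hypothetical choices made for this example.

```python
import numpy as np

def lpc_autocorr(frame, order):
    """Return LP coefficients [1, a1, ..., ap] via the autocorrelation method.

    Assumes `frame` is already windowed (illustrative assumption).
    """
    x = np.asarray(frame, dtype=float)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations: R a = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))

def formants_from_lpc(a, fs, max_bandwidth=None):
    """Convert the roots of the LP polynomial into (frequency, bandwidth) pairs in Hz.

    `max_bandwidth` is a hypothetical threshold: roots with a wider bandwidth
    are treated as spurious and discarded, as in conventional root-finding.
    """
    roots = np.roots(a)
    # Keep one root of each complex-conjugate pair; purely real roots are dropped.
    roots = roots[np.imag(roots) > 0]
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi
    idx = np.argsort(freqs)
    freqs, bws = freqs[idx], bws[idx]
    if max_bandwidth is not None:
        keep = bws < max_bandwidth
        freqs, bws = freqs[keep], bws[keep]
    return freqs, bws
```

With wide whisper bandwidths, a fixed threshold such as `max_bandwidth` is where this baseline can discard roots that belong to genuine formants, which is the failure mode the abstract attributes to pole interaction.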
A speaker recognition system based on joint factor analysis (JFA) is proposed to improve the recognition rate for whispering speakers under channel mismatch. The system estimates the eigenvoice and eigenchannel spaces separately before calculating the corresponding speaker and channel factors. A channel-free speaker model is then built by model compensation to describe each speaker accurately. Tests on whispered speech databases recorded over eight different channels show that the JFA-based system achieves a higher correct recognition rate than the Gaussian Mixture Model–Universal Background Model (GMM-UBM) system. In particular, the recognition rate in the cellphone channel tests increases significantly.
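As background, the standard JFA decomposition of a speaker- and channel-dependent GMM mean supervector is written out below. This assumes the system follows the usual JFA formulation; the abstract does not state its exact model, and the symbols $\mathbf{m}$, $\mathbf{V}$, $\mathbf{U}$, $\mathbf{D}$, $\mathbf{y}$, $\mathbf{x}$, $\mathbf{z}$ are the conventional ones rather than notation taken from the paper.

```latex
% Standard JFA supervector model (assumed, not taken from the paper):
%   m : UBM mean supervector
%   V : eigenvoice matrix,   y : speaker factors
%   U : eigenchannel matrix, x : channel factors
%   D : diagonal residual matrix, z : speaker-specific residual factors
\begin{align}
  \mathbf{M} &= \mathbf{m} + \mathbf{V}\mathbf{y} + \mathbf{U}\mathbf{x} + \mathbf{D}\mathbf{z} \\
  % Channel-free speaker model: the channel term Ux is removed
  \mathbf{s} &= \mathbf{m} + \mathbf{V}\mathbf{y} + \mathbf{D}\mathbf{z}
\end{align}
```

Under this formulation, the channel-free speaker model corresponds to dropping the channel term $\mathbf{U}\mathbf{x}$ from the supervector.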