Synthesis of fundamental frequency contours for Standard Chinese based on superpositional and tone nucleus models

Hirose, K.; Sun, Q.; Minematsu, N.

Article details

Abstract

A method for generating sentence F0 contours of Standard Chinese speech is developed. It is based on superposing tone components on phrase components in the logarithmic frequency domain. While tone components are language specific, phrase components are assumed to be more language universal. Taking this difference into account, the method treats the two kinds of components differently. The tone components are generated by concatenating F0 patterns of tone nuclei, which are predicted by a corpus-based scheme, while the phrase components are generated by rules. Experiments on F0 contour generation were conducted using 100 news utterances by a female speaker. The first experiments addressed the generation of tone components, with the phrase components of the original utterances used unchanged. The results showed that the method could generate F0 contours close to those of the target speech. Speech was synthesized by replacing the original F0 contours with the generated ones using TD-PSOLA. Listening experiments on the quality of the synthetic speech yielded a high average score of 4.5 on a 5-point scale. The second experiments addressed the generation of phrase components, with the tone components extracted from the original utterances. Although the synthetic speech with generated F0 contours sounded mostly natural, there were occasional "degraded sounds" caused by a mismatch between the phrase and tone components. To cope with the mismatch, a two-step method was developed in which information on the phrase contours is used for the prediction of tone components. The validity of the method was shown through perceptual experiments on synthesized speech.
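The superposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the phrase component has the Fujisaki-model impulse-response shape, and that each tone nucleus is a straight line segment in log F0, joined to its neighbours by linear interpolation. The function names, the command values, and the constant `alpha=3.0` are hypothetical choices made only for this example.

```python
import math

def phrase_component(t, onset, amplitude, alpha=3.0):
    """Assumed phrase shape: amplitude * alpha^2 * dt * exp(-alpha * dt) for dt >= 0."""
    dt = t - onset
    if dt < 0:
        return 0.0
    return amplitude * alpha * alpha * dt * math.exp(-alpha * dt)

def tone_component(t, nuclei):
    """Log-F0 offset from concatenated tone nuclei.

    nuclei: list of (start, end, value_at_start, value_at_end), sorted in time.
    Values are interpolated linearly inside and between nuclei; 0 elsewhere.
    """
    points = []
    for start, end, v_start, v_end in nuclei:
        points.append((start, v_start))
        points.append((end, v_end))
    if not points or t < points[0][0] or t > points[-1][0]:
        return 0.0
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return v0
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return 0.0

def f0_contour(times, base_f0, phrase_commands, nuclei):
    """Sum the components in the log domain, then convert back to Hz."""
    contour = []
    for t in times:
        log_f0 = math.log(base_f0)
        log_f0 += sum(phrase_component(t, onset, amp) for onset, amp in phrase_commands)
        log_f0 += tone_component(t, nuclei)
        contour.append(math.exp(log_f0))
    return contour

# Hypothetical inputs: one phrase command and two tone nuclei over one second.
times = [i * 0.05 for i in range(21)]
contour = f0_contour(
    times,
    base_f0=180.0,
    phrase_commands=[(0.0, 0.4)],
    nuclei=[(0.1, 0.3, 0.2, 0.2), (0.5, 0.7, -0.1, 0.3)],
)
```

Because the components are added in log frequency, each one scales the baseline multiplicatively in Hz. This is why a phrase component can be replaced without rescaling the tone component.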

Keywords

speech synthesis; F0 contour generation; Standard Chinese; superpositional model; tone nucleus model

Publisher

Instytut Podstawowych Problemów Techniki PAN
Komitet Akustyki PAN
Polskie Towarzystwo Akustyczne

Journal

Archives of Acoustics

Year

2007

Volume

Vol. 32, No. 1

Pages

41-50

Physical description

Bibliography: 11 items, figures, tables.

Contributors

author

Hirose K.

author

Sun Q.

author

Minematsu N.

University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan, hirose@gavo.t.u-tokyo.ac.jp

References

[1] CHEN S., HWANG S., WANG Y., An RNN-based prosodic information synthesizer for Mandarin text-to-speech, IEEE Trans. on Speech and Audio Processing, 6, 3, 226-239 (1998).
[2] TAO J., CAI L., Clustering and feature learning based F0 prediction for Chinese speech synthesis, Proc. Int. Conference on Speech and Language Processing, pp. 2097. 200, Denver 2002.
[3] NI J., HIROSE K., Synthesis of fundamental frequency contours of standard Chinese sentences from tone sandhi and focus conditions, Proc. Int. Conference on Speech and Language Processing, Beijing, pp. 195-198, 2000.
[4] FUJISAKI H., HIROSE K., Analysis of voice fundamental frequency contours for declarative sentences of Japanese, J. Acoust. Soc. Japan (E), 5, 4, 233-242 (1984).
[5] FUJISAKI H., HIROSE K., HALLE P., LEI H., Analysis and modelling of tonal features in polysyllabic words and sentences of the standard Chinese, Proc. Int. Conference on Speech and Language Processing, Kobe, pp. 841-844, 1990-10.
[6] HIROSE K., SATO K., ASANO Y., MINEMATSU N., Synthesis of F0 contours using generation process model parameters predicted from unlabeled corpora: application to emotional speech synthesis, Speech Communication, 46, 3-4, 385-404 (2005).
[7] GU W., HIROSE K., FUJISAKI H., Automatic extraction of tone command parameters for the model of F0 contour generation for Standard Chinese, IEICE Trans. Information and Systems, E87-D, 5, 1079-1085 (2004).
[8] ZHANG J., HIROSE K., Tone nucleus modeling for Chinese lexical tone recognition, Speech Communication, 42, 3-4, 447-466 (2004).
[9] HIROSE K., FUJISAKI H., A system for the synthesis of high-quality speech from texts on general weather conditions, IEICE Trans. Fundamentals of Electronics, Communications and Computer Sciences, E76-A, 11, 1971-1980 (1993).
[10] NARUSAWA S., MINEMATSU N., HIROSE K., FUJISAKI H., A method for automatic extraction of model parameters from fundamental frequency contours of speech, Proc. IEEE Int. Conference on Acoustics, Speech and Signal Processing, pp. 509-512, Orlando 2002.
[11] SUN Q., HIROSE K., GU W., MINEMATSU N., Rule-based generation of phrase components in two-step synthesis of fundamental frequency contours of Mandarin, Proc. International Conf. on Speech Prosody, pp. 561-564, Dresden 2006.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BAT8-0003-0059