Article title

Prosody annotation for unit selection TTS synthesis

Authors
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
This paper concerns prosody annotation and intonation modeling, particularly for application in corpus-based speech synthesis. To establish rules for automatic intonation modeling, a four-hour, fully annotated speech database was analyzed acoustically and perceptually. The speech material included different text types, dialogs, and prosodically rich phrases. As a result of these analyses, a basic prosodic annotation scheme was established that distinguishes 6 pitch accent types and 5 types of prosodic phrases. Moreover, the analyses made it possible to define rules for semi-automatic stylization and parametrization of intonation contours for application in text-to-speech and speech recognition systems. The paper discusses the assumptions behind the stylization method and the results of a quantitative and qualitative evaluation of stylization accuracy, based on speech material of ca. 1000 phrases from a literary text read by female and male speakers. Finally, a classification of pitch accents and boundary tones based on the parametrization is presented.
Year
Pages
25-40
Physical description
Bibliography: 27 items, figures, tables.
Contributors
author
author
  • Adam Mickiewicz University, Institute of Linguistics, Międzychodzka 5, 60-371 Poznań, Poland, lin@amu.edu.pl
Bibliography
  • [1] ADELL J., BONAFONTE A., Towards phone segmentation for concatenation speech synthesis, 5th Speech Synthesis Workshop, Pittsburgh 2004.
  • [2] BECKMAN M. E., AYERS ELAM G., Guidelines for ToBI labelling, available at: http://www.ling.ohio-state.edu/_tobi/ame_tobi/labelling_guide_v3.pdf
  • [3] BREUER S., STÖBER K., WAGNER P., ABRESCH J., Dokumentation zum Bonn Open Synthesis System BOSS II, Unveröffentlichtes Dokument, IKP, Bonn 2000.
  • [4] BREUER S., FRANCUZIK K., DEMENKO G., Analysis of Polish segmental duration with CART, Proceedings of Speech Prosody 2006, pp. 137-140, Dresden 2006.
  • [5] CLEMENTS G. N., KEYSER S. J., A three-tiered theory of the syllable, Technical Report, Massachusetts Institute of Technology, 1981.
  • [6] DEMENKO G., Analysis of Polish suprasegmentals for needs of speech technology, edited by UAM, Poznań 1999.
  • [7] DEMENKO G., WAGNER A., The stylization of intonation contours, Proceedings of Speech Prosody 2006, Dresden, pp. 141-144, 2006.
  • [8] DEMENKO G., WYPYCH M., BARANOWSKA E., Implementation of grapheme-to-phoneme rules and extended SAMPA alphabet in Polish text-to-speech synthesis, Speech and Language Technology, ed. PTFON, 7, 79-97, Poznań 2003.
  • [9] FUJISAKI H., Dynamic characteristics of voice fundamental frequency in speech and singing, [in:] The Production of Speech, P.F. MacNeilage [Ed.], pp. 39-47, Springer-Verlag, 1983.
  • [10] T'HART J., COLLIER R., COHEN A., A perceptual study of intonation, Cambridge University Press, Cambridge 1990.
  • [11] HESS W., Pitch determination of speech signals, Springer Verlag, New York 1983.
  • [12] HIRST D., VÉRONIS J., IDE N., Analysis of fundamental frequency patterns for multi-lingual synthesis using INTSINT, Proceedings of 2nd ESCA/IEEE Workshop on Speech Synthesis, pp. 77-80, New Paltz, September, 1994.
  • [13] JASSEM W., Fundamentals of the Polish phonetics [in Polish], PWN, Warszawa 1973.
  • [14] MATOUSEK J., TIHELKA D., PSUTKA J., Automatic segmentation for Czech concatenative speech synthesis using statistical approach with boundary-specific correction, Proceedings of Eurospeech, 2003.
  • [15] MERTENS P., The prosogram: Semi-automatic transcription of prosody based on a tonal perception model, B. Bel and I. Marlien [Eds.], Proceedings of Speech Prosody 2004, pp. 549-552, Nara, Japan 2004.
  • [16] MILONE D. H., RUBIO A. J., Prosodic and accentual information for automatic speech recognition, Proceedings of IEEE, 11, 4, 321-333 (2003).
  • [17] MIXDORFF H., A novel approach to the fully automatic extraction of Fujisaki model parameters, Proceedings of ICASSP 2000, 3, 1281-1284, Istanbul 2000.
  • [18] MÖHLER G., Describing intonation with a parametric model, Proceedings of ICSLP98, pp. 2581-2584, Sydney 1998.
  • [19] NARAYANAN S., ALWAN A., Text to speech synthesis, new paradigms and advances, IMSC Press Multimedia Series, New Jersey 2004.
  • [20] OSTENDORF M., DIGALAKIS V. V., KIMBALL O. A., From HMM's to segment models: A unified view of stochastic modeling for speech recognition, IEEE Trans. on Speech and Audio Proc., 4, 5, 360-378 (1996).
  • [21] PIERREHUMBERT J., The phonology and phonetics of English intonation, PhD dissertation, MIT, 1980.
  • [22] STEFFEN-BATOGOWA M., Accentual structure of Polish [in Polish], Wydawnictwo Naukowe PWN, Warszawa 2000.
  • [23] SJÖLANDER K., BESKOW J., WaveSurfer - An open source speech tool, Proceedings ICSLP'00, Beijing, 4, 464-467, 2000.
  • [24] SZYMAŃSKI M., GROCHOLEWSKI S., Semi-automatic segmentation of speech: manual segmentation strategy; problem space analysis, [in:] Advances in Soft Computing, Computer Recognition Systems: Proceedings of 4th Int. Conference on Computer Recognition, M. Kurzyński [Ed.], pp. 747-755, Springer Verlag, 2005.
  • [25] SZYMAŃSKI M., GROCHOLEWSKI S., Transcription-based automatic segmentation of speech, Archives of Control Sciences, 15, 465-472 (2005).
  • [26] TAYLOR P., Analysis and synthesis of intonation using the tilt model, J. Acoust. Soc. Am., 107, 3, 1697-1714 (2000).
  • [27] WELLS J. C., SAMPA computer readable phonetic alphabet, [in:] Handbook of Standards and Resources for Spoken Language Systems (Part IV, Section B), D. Gibbon, R. Moore and R. Winski [Eds.], Mouton de Gruyter, Berlin and New York 1998; Available at: www.phon.ucl.ac.uk/home/sampa/polish.htm
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BAT8-0003-0058