Implementation of Polish speech synthesis for the BOSS system

Demenko, G.; Möbius, B.; Klessa, K.

Abstract

The Bonn Open Synthesis System (BOSS) is open-source software for unit selection speech synthesis that has been used to generate high-quality German and Dutch speech. This article presents ongoing research and development aimed at adapting BOSS to Polish. The first section explains the origins and workings of the unit selection method for speech synthesis. Section two details the structure of the Polish corpus and its segmental and prosodic annotation. The subsequent sections focus on the implementation of Polish TTS modules in the BOSS architecture (duration prediction and the cost function) and on the steps involved in preparing a new speech corpus for BOSS.
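The abstract mentions the unit selection method and a cost function without spelling either out. The following is a minimal sketch of the generic textbook formulation (a target cost plus a concatenation cost, minimised with a Viterbi search, as introduced by Hunt and Black in 1996). It is not the BOSS cost function described in the article: the classes `Unit` and `Target`, the features (segment duration, boundary F0), and the weights are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Unit:
    """Hypothetical candidate unit taken from a speech corpus."""
    phone: str
    duration_ms: float  # segment duration in the corpus
    f0_start: float     # F0 at the left boundary (Hz)
    f0_end: float       # F0 at the right boundary (Hz)


@dataclass(frozen=True)
class Target:
    """Hypothetical target specification produced by the TTS front end."""
    phone: str
    duration_ms: float  # e.g. the output of a duration-prediction module


def target_cost(t: Target, u: Unit) -> float:
    # Illustrative: penalise deviation from the predicted duration.
    return abs(t.duration_ms - u.duration_ms) / 10.0


def join_cost(left: Unit, right: Unit) -> float:
    # Illustrative: penalise an F0 jump at the concatenation point.
    return abs(left.f0_end - right.f0_start) / 20.0


def select_units(targets: Sequence[Target],
                 candidates: Sequence[Sequence[Unit]]) -> List[Unit]:
    """Viterbi search for the unit sequence with the lowest total cost."""
    # best[i][j]: cheapest path cost ending in candidate j of target i
    best = [[target_cost(targets[0], u) for u in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row, ptr = [], []
        for u in candidates[i]:
            # Cost of reaching u from every candidate of the previous target
            costs = [best[i - 1][k] + join_cost(prev, u)
                     for k, prev in enumerate(candidates[i - 1])]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_best] + target_cost(targets[i], u))
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest candidate of the last target.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    return path[::-1]
```

Keeping the target cost and the join cost as separate functions means either one can be reweighted or replaced (for example, by costs derived from a trained duration model) without touching the search itself.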

Keywords

speech synthesis, text-to-speech (TTS), unit selection, duration prediction

Publisher

Polish Academy of Sciences, Division IV of Engineering Sciences (Polska Akademia Nauk, Wydział IV Nauk Technicznych)

Journal

Bulletin of the Polish Academy of Sciences. Technical Sciences

Year

2010

Volume

Vol. 58, no. 3

Pages

371–376

Physical description

Bibliography: 26 items; figures

Contributors

author

Demenko G.

author

Möbius B.

author

Klessa K.

Institute of Linguistics, Department of Phonetics, Adam Mickiewicz University, 4 Niepodległości Ave., 61-874 Poznań, Poland

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BPG8-0039-0003