Transcription-based automatic segmentation of speech

Szymański, M.; Grocholewski, S.

Article details

Selected full texts from this journal

https://journals.pan.pl/acs/

Conference

Human Language Technologies as a challenge for Computer Science and Linguistics (2; 21-23.04.2005; Poznań, Poland)

Abstract

An important element of today's speech systems is a set of recorded wave files annotated with a sequence of phonemes and boundary time points. Manual segmentation of speech is very laborious, hence the need for automatic segmentation algorithms. However, manual segmentation still outperforms automatic segmentation, and the quality of the resulting synthetic voice depends heavily on the accuracy of the phonetic segmentation. This paper describes our methodology and implementation of automatic speech segmentation, emphasizing its new elements.

Keywords

speech segmentation; speech synthesis; unit selection

Publisher

Polish Academy of Sciences, Committee of Automatic Control and Robotics

Journal

Archives of Control Sciences

Year

2005

Volume

Vol. 15, no. 3

Pages

461-468

Physical description

Bibliography: 13 items, figures.

Authors

author

Szymański M.

marcin.szymanski@cs.put.poznan.pl

Poznań University of Technology, Institute of Computing Science, ul. Piotrowo 3a, 60-965 Poznań, Poland

author

Grocholewski S.

stefan.grocholewski@cs.put.poznan.pl

Poznań University of Technology, Institute of Computing Science, ul. Piotrowo 3a, 60-965 Poznań, Poland

References

[1] J. Adell and A. Bonafonte: Towards phone segmentation for concatenative speech synthesis. In 5th Speech Synthesis Workshop, Pittsburgh, (2004).
[2] N. Campbell and A. Black: Prosody and the selection of source units for concatenative synthesis. In J. van Santen, R. Sproat, J. Olive, and J. Hirschberg (Eds.), Progress in Speech Synthesis, New York, Springer Verlag, (1997), 279-292.
[3] S. Grocholewski: Corpora - speech database for Polish diphones. In Eurospeech'97, (1997).
[4] E. Klabbers, K. Stoeber, R. Veldhuis, P. Wagner and S. Breuer: Speech synthesis development made easy: The Bonn Open Synthesis System. In Eurospeech, (2001).
[5] K. Kvale: Segmentation and labelling of speech. PhD thesis. Institute for Teleteknikk, Trondheim, 1993.
[6] J. Matousek, D. Tihelka and J. Psutka: Automatic segmentation for Czech concatenative speech synthesis using statistical approach with boundary-specific correction. In Eurospeech, (2003).
[7] M. Ostendorf, V. V. Digalakis and O. A. Kimball: From HMM's to segment models: A unified view of stochastic modeling for speech recognition. IEEE Trans. on Speech and Audio Proc., 4(5), (1996), 360-378.
[8] M. Szymański and S. Grocholewski: Automatyczna segmentacja nagrań na podstawie transkrypcji (automatic speech segmentation based on phonetic transcription). Technical Report RB-023/03, Poznań University of Technology, Institute of Computing Science, 2003.
[9] M. Szymański and S. Grocholewski: Dynamic programming method for fine-tuning the boundary points in automatic segmentation of speech. In Proc. SASR, Kraków, (2005).
[10] M. Szymański and S. Grocholewski: Implementacja algorytmu segmentacji nagrań mowy ze statystycznymi modelami długości fonemów; dobór i testowanie parametrów modeli (implementation of speech segmentation algorithm with statistical duration models: tuning the model parameters). Technical Report RB-004/05, Poznań University of Technology, Institute of Computing Science, 2005.
[11] M. Szymański and S. Grocholewski: Semi-automatic segmentation of speech: manual segmentation strategy; problem space analysis. In Proc. CORES'05, Wrocław, (2005).
[12] P. A. Taylor and S. D. Isard: Automatic phone segmentation. In Proc. Eurospeech, Genova, Italy, (1991).
[13] S. Young, J. Odell, D. Ollason, V. Valtchev and P. Woodland: The HTK Book (for HTK Version 2.1), Cambridge University, 1997.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BSW3-0021-0019