Polish phoneme statistics obtained on large set of written texts

Ziółko, B.; Gałka, J.; Ziółko, M.

Article details

Title variants

Statystyki polskich fonemów uzyskane z dużych zbiorów tekstów

Abstracts

Phonetic statistics were collected from several Polish corpora. The paper summarizes the data, which consist of phoneme n-grams, and discusses some phenomena observed in the statistics. Triphone statistics describe context-dependent speech units, which play an important role in speech recognition systems and had never been calculated for a large set of Polish written texts. The standard phonetic alphabet for Polish, SAMPA, and methods of providing phonetic transcriptions are described.

This paper presents a description of Polish phoneme statistics collected from a large number of texts. Phoneme triples (triphones) play an important role in speech recognition. Observations concerning the collected statistics are discussed, and lists of the most frequent elements are presented.

Keywords

NLP; triphone statistics; speech processing; Polish

natural language processing; phoneme statistics; speech processing

Publisher

Wydawnictwa AGH

Journal

Computer Science

Year

2009

Volume

Vol. 10

Pages

97–106

Physical description

Bibliography: 14 items, figures, tables.

Authors

author

Ziółko B.

bziolko@agh.edu.pl

Department of Electronics, AGH University of Science and Technology, Krakow, Poland

author

Gałka J.

jgalka@agh.edu.pl

Department of Electronics, AGH University of Science and Technology, Krakow, Poland

author

Ziółko M.

ziolko@agh.edu.pl

Department of Electronics, AGH University of Science and Technology, Krakow, Poland

Bibliography

[1] Agirre E., Ansa O., Martínez D., Hovy E.: Enriching WordNet concepts with topic signatures, Proceedings of the SIGLEX Workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations, 2001
[2] Bellegarda J. R.: Large vocabulary speech recognition with multispan statistical language models, IEEE Transactions on Speech and Audio Processing, vol. 8, no. 1, pp. 76–84, 2000
[3] Denes P. B.: Statistics of spoken English, The Journal of the Acoustical Society of America, vol. 34, pp. 1978–1979, 1962
[4] Yannakoudakis E. J., Hutton P. J.: An assessment of n-phoneme statistics in phoneme guessing algorithms which aim to incorporate phonotactic constraints, Speech Communication, vol. 11, pp. 581–602, 1992
[5] Basztura C.: Rozmawiać z komputerem (Eng. To speak with computers). Format, 1992
[6] Young S., Evermann G., Gales M., Hain T., Kershaw D., Moore G., Odell J., Ollason D., Povey D., Valtchev V., Woodland P.: HTK Book. UK: Cambridge University Engineering Department, 2005
[7] Ziółko B., Gałka J., Manandhar S., Wilson R., Ziółko M.: Triphone statistics for Polish language, Proceedings of 3rd Language and Technology Conference, 2007
[8] Demenko G., Wypych M., Baranowska E.: Implementation of grapheme-to-phoneme rules and extended SAMPA alphabet in Polish text-to-speech synthesis, Speech and Language Technology, PTFon, Poznan, vol. 7, no. 17, 2003
[9] Young S.: Large vocabulary continuous speech recognition: a review, IEEE Signal Processing Magazine, vol. 13(5), pp. 45–57, 1996
[10] Rabiner L., Juang B. H.: Fundamentals of speech recognition. New Jersey: PTR Prentice-Hall, Inc., 1993
[11] Ostaszewska D., Tambor J.: Fonetyka i fonologia współczesnego języka polskiego (Eng. Phonetics and phonology of modern Polish language). PWN, 2000
[12] Steffen-Batóg M., Nowakowski P.: An algorithm for phonetic transcription of orthographic texts in Polish, Studia Phonetica Posnaniensia, vol. 3, 1993
[13] Daelemans W., van den Bosch A.: Language-independent data-oriented grapheme-to-phoneme conversion, Progress in Speech Synthesis, New York: Springer-Verlag, 1997
[14] Jassem K.: A phonemic transcription and syllable division rule engine, Onomastica-Copernicus Research Colloquium, Edinburgh, 1996

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-AGH1-0023-0091