Identifiers
Title variants
Publication languages
Abstracts
During public presentations or interviews, speakers commonly and unconsciously overuse interjections or filled pauses, which interfere with speech fluency and negatively affect listeners' impressions and speech perception. Types of disfluencies and methods of detecting them are reviewed. The authors carried out a survey whose results indicated which elements are most adverse for the audience. The article presents an approach to automatic detection of the most common type of disfluency: filled pauses. A base of patterns of filled pauses (prolonged I, prolonged e, mm, Im, xmm, in SAMPA notation) was collected from 72 minutes of recordings of public presentations and interviews of six speakers (3 male, 3 female). A statistical analysis of the length and frequency of occurrence of such interjections in the recordings is presented. Each pattern from the training set was then described by the mean values of its first and second formants (F1 and F2). Detection was performed on a test set of recordings by recognizing the phonemes using the two formants, with a recognition efficiency of about 68%. The results of research on detecting disfluencies in speech may be applied in a system that analyzes speech and provides feedback on imperfections that occurred during it, in order to support training of oratorical skills. A conceptual prototype of such an application is proposed. Moreover, a base of patterns of the most common disfluencies can be used in speech recognition systems to omit interjections during speech-to-text transcription.
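The formant-based step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how a segment is matched against the stored patterns, so the nearest-pattern rule with Euclidean distance, the rejection threshold `max_dist`, and the function names below are all assumptions. The (F1, F2) values in `PATTERNS` are hypothetical placeholders, not measurements from the article.

```python
import math

# Hypothetical mean (F1, F2) values in Hz for three filled-pause patterns.
# These numbers are illustrative placeholders, NOT data from the article.
PATTERNS = {
    "e": (500.0, 1800.0),
    "I": (350.0, 2100.0),
    "mm": (250.0, 1100.0),
}


def mean_formants(frames):
    """Average F1 and F2 over a list of per-frame (F1, F2) measurements."""
    n = len(frames)
    f1 = sum(frame[0] for frame in frames) / n
    f2 = sum(frame[1] for frame in frames) / n
    return (f1, f2)


def classify(f1, f2, patterns=PATTERNS, max_dist=200.0):
    """Return the label of the nearest pattern in the F1/F2 plane.

    Returns None when even the nearest pattern is farther away than
    max_dist (an assumed rejection threshold, in Hz).
    """
    best_label = None
    best_dist = float("inf")
    for label, (p1, p2) in patterns.items():
        dist = math.hypot(f1 - p1, f2 - p2)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_dist else None


# Usage: describe a segment by its mean formants, then look up the pattern.
segment = [(490.0, 1790.0), (510.0, 1810.0)]
label = classify(*mean_formants(segment))
```

In practice the per-frame formant tracks would come from a phonetic analysis tool, and the threshold would have to be tuned on a training set; both are left out of this sketch.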
Publisher
Journal
Year
Volume
Pages
3–10
Physical description
Bibliography: 13 items, figures, tables, charts.
Authors
author
- Department of Automatic Control and Biomedical Engineering
author
- Department of Electronics, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Kraków
Bibliography
- [1] Roberts P. M., Meltzer A., Wilding J., Disfluencies in non-stuttering adults across sample lengths and topics. Journal of Communication Disorders, 42 (2009) 414–427.
- [2] Stouten F., Duchateau J., Martens J. P., Wambacq P., Coping with disfluencies in spontaneous speech recognition: Acoustic detection and linguistic context manipulation. Speech Communication, 48 (2006) 1590–1606.
- [3] Myers F., Bakker K., St. Louis K. O., Raphael L. J., Disfluencies in cluttered speech. Journal of Fluency Disorders, 37 (2012) 9–19.
- [4] O’Shaughnessy D., Gabrea M., Automatic identification of filled pauses in spontaneous speech. In Proc.: Canadian Conference on Electrical and Computer Engineering, 2 (2000) 620–624.
- [5] Audhkhasi K., Kandhway K., Formant-based technique for automatic filled-pause detection in spontaneous spoken English. In Proc.: International Conference on Acoustics, Speech and Signal Processing, 2009. ICASSP 2009.
- [6] Stouten F., Martens J. P., A feature-based filled pauses detection system for Dutch. In Materials of IEEE Workshop on Automatic Speech Recognition and Understanding, 2003.
- [7] Ziolko B., Miga B., Jadczyk T.: Semisupervised production of speech corpora using existing recordings, International 24. Seminar on Speech Production (ISSP’11) , Montreal, 2011
- [8] Wang X., Pols L.C.W, Ten Bosch L.F.M., Analysis Of Context-Dependent Segmental Duration For Automatic Speech Recognition, Proceedings ICSLP ‘94
- [9] Ziółko B., Ziółko M., Time durations of phonemes in Polish language for speech and speaker recognition, Human language technology : challenges for computer science and linguistics : 4th Language and Technology Conference, LTC 2009 : Poznan, Poland, November 6–8, 2009.
- [10] Boersma P., Praat, a system for doing phonetics by computer. Glot International 5:9/10 (2001) 341–345.
- [11] Tadeusiewicz R., Sygnał mowy, WKiŁ, Warszawa, 1987.
- [12] Ziółko M., Ziółko B., Przetwarzanie mowy, Wydawnictwa AGH, Kraków 2011.
- [13] Ciota Z., Metody przetwarzania sygnałów akustycznych w komputerowej analizie mowy, Wyd. EXIT Warszawa 2010.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-b01a1512-6efe-43cb-8ec5-92068830f852