Article title

Acoustic Features of Filled Pauses in Polish Task-Oriented Dialogues

Publication languages
EN
Abstracts
EN
Filled pauses (FPs) have proved to be highly valuable cues to speech production processes and important units in discourse analysis. Some aspects of their form and occurrence patterns have been shown to be speaker- and language-specific. The present study explores basic acoustic properties of FPs in Polish task-oriented dialogues. A set of FPs was extracted from a corpus of twenty task-oriented dialogues on the basis of available annotations. After initial scrutiny and selection, a subset of the signals underwent a series of pitch, formant frequency and voice quality analyses. The considerable variation found in the realisations of FPs justifies their potential application in speaker recognition systems. Regular monosegmental FPs were confirmed to show relatively stable basic acoustic parameters, which makes them easy to identify and measure but may result in less significant differences among speakers.
Pages
63-73
Physical description
Bibliography: 75 items, tables, charts.
Authors
  • Institute of Linguistics, Adam Mickiewicz University, al. Niepodległości 4, 61-874 Poznań, Poland
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-ff5b5081-5da2-451e-bd5b-e0417bd26f0d