Article title

Exploiting Prosody for Automatic Syntactic Phrase Boundary Detection in Speech

Authors
Contents
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
The relation between syntax and prosody is evident, even if the prosodic structure cannot be directly mapped to the syntactic one and vice versa. Syntax-to-prosody mapping is widely used in text-to-speech applications, but prosody-to-syntax mapping is mostly missing from automatic speech recognition/understanding systems. This paper presents an experiment towards filling this gap, evaluating whether an HMM-based automatic prosodic segmentation tool can be used to support the reconstruction of the syntactic structure directly from speech. Results show that up to 85% of syntactic clause boundaries and up to about 70% of embedded syntactic phrase boundaries could be identified based on the detection of phonological phrases. Recall rates do not further depend on syntactic layering, that is, on whether the phrase is multiply embedded or not. Clause boundaries can be well assigned to the intonational phrase level in read speech and can be well separated from lower-level syntactic phrases based on the type of the aligned phonological phrase(s). These findings can be exploited in speech understanding systems, allowing for the recovery of the skeleton of the syntactic structure, based purely on the speech signal.
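The recall figures quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation procedure: the function name `boundary_recall`, the representation of boundaries as time stamps in seconds, and the 0.1 s tolerance window are all assumptions made for illustration. A reference (syntactic) boundary counts as recalled if any detected (prosodic) boundary falls within the tolerance window; one detected boundary may therefore match more than one reference boundary.

```python
from bisect import bisect_left


def boundary_recall(reference, detected, tolerance=0.1):
    """Return the fraction of reference boundaries matched by a detection.

    reference: boundary times in seconds (hypothetical syntactic boundaries)
    detected:  boundary times in seconds (hypothetical prosodic boundaries)
    tolerance: maximum allowed distance in seconds (assumed value)
    """
    if not reference:
        raise ValueError("reference must contain at least one boundary")
    detected = sorted(detected)
    hits = 0
    for t in reference:
        # Index of the first detected boundary at or after t - tolerance.
        i = bisect_left(detected, t - tolerance)
        # It is a hit if that boundary also lies at or before t + tolerance.
        if i < len(detected) and detected[i] <= t + tolerance:
            hits += 1
    return hits / len(reference)


# Toy data, invented for the example (not taken from the paper).
syntactic = [1.0, 2.0, 3.0, 4.0]
prosodic = [1.05, 2.5, 3.95]
print(boundary_recall(syntactic, prosodic))  # 0.5
```

In this toy example, two of the four reference boundaries have a detection within 0.1 s, so the recall is 0.5. A stricter evaluation would enforce one-to-one matching between detected and reference boundaries.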
Keywords
Year
Pages
143--172
Physical description
Bibliography: 32 items, figures, tables, charts.
Contributors
author
  • Department of Telecommunication and Media Informatics, Budapest University for Technology and Economics, Budapest, Hungary
author
  • Research Institute for Linguistics, Hungarian Academy of Sciences, Budapest, Hungary
Bibliography
  • [1] A. Babarczy, G. Bálint, G. Hamp, A. Rung (2005), Hunpars: a rule-based sentence parser for Hungarian, Proc. of the 6th International Symposium on Computational Intelligence, Budapest, Hungary.
  • [2] A. Batliner, B. Möbius, G. Möhler, A. Schweitzer and E. Nöth (2001), Prosodic models, automatic speech understanding, and speech synthesis: towards the common ground, Proc. Eurospeech 2001, Vol. 4, Aalborg, Denmark, pp. 2285-2288.
  • [3] S. Becker, M. Schröder, W. J. Barry (2006), Rule-based Prosody Prediction for German Text-to-Speech Synthesis, Speech Prosody, Dresden, Germany, p. 31.
  • [4] J. Butzberger, H. Murveit, E. Shriberg and P. Price (1992), Spontaneous speech effects in large vocabulary speech recognition applications, Proceedings of the 1992 DARPA Speech and Natural Language Workshop, pp. 339-343.
  • [5] E. Chang, J.-L. Zhou, S. Di, C. Huang and K.-F. Lee (2000), Large vocabulary Mandarin speech recognition with different approaches in modeling tones, International Conference on Spoken Language Processing.
  • [6] A. Christophe, S. Peperkamp, C. Pallier, E. Block, and J. Mehler (2004), Phonological phrase boundaries constrain lexical access: I. Adult data. Journal of Memory and Language, Vol. 51, pp. 523-547.
  • [7] F. Gallwitz, H. Niemann, E. Nöth, W. Warnke (2002), Integrated recognition of words and prosodic phrase boundaries, Speech Communication, Vol. 36, pp. 81-95.
  • [8] G. Gazdar, E. H. Klein, G. K. Pullum and I. A. Sag (1985), Generalized Phrase Structure Grammar, Oxford: Blackwell, and Cambridge, MA: Harvard University Press.
  • [9] K. Hirose, N. Minematsu, Y. Hashimoto and K. Iwano (2001), Continuous Speech Recognition of Japanese Using Prosodic Word Boundaries Detected by Mora Transition Modeling of Fundamental Frequency Contours, Proceedings of ISCA Tutorial and Research Workshop on Prosody in Speech Recognition and Understanding, Red Bank, NJ, USA, pp. 61-66.
  • [10] J. Hirschberg (1993), Pitch Accent in Context: Predicting Intonation and Prominence from Text, Artificial Intelligence, Vol. 63, No. 1-2.
  • [11] J. Ito and A. Mester (2008), Rhythmic and interface categories in prosody, Ms., UC Santa Cruz. Presented at PRIG (Prosody Interest Group), UCSC.
  • [12] K. Iwano (1999), Prosodic Word Boundary Detection Using Mora Transition Modeling of Fundamental Frequency Contours – Speaker Independent Experiments, Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 99), Budapest, Hungary, Vol. 1, pp. 231-234.
  • [13] E. M. Kaisse (1985), Connected Speech: The Interaction of Syntax and Phonology, Academic Press, San Diego.
  • [14] K. É. Kiss (2002), The Syntax of Hungarian, Cambridge University Press, UK.
  • [15] I. Koutny, G. Olaszy, P. Olaszi (2000), Prosody prediction from text in Hungarian and its realisation in TTS conversion, International Journal of Speech Technology, Vol. 3, No. 3-4, pp. 187-200.
  • [16] X. Li, Y. Yang, Y. Lu (2010), How and when prosodic boundaries influence syntactic parsing under different discourse contexts: An ERP study, Biological Psychology, Vol. 83, No. 3, pp. 250-259.
  • [17] E. Nöth, A. Batliner, A. Kiessling, R. Kompe, and H. Niemann (2000), Verbmobil: the use of prosody in the linguistic components of a speech understanding system, IEEE Trans. ASSP, Vol. 8, pp. 519-532.
  • [18] C. Pollard, I. A. Sag (1994), Head-Driven Phrase Structure Grammar. University of Chicago Press.
  • [19] P. J. Price, M. Ostendorf, S. Shattuck-Hufnagel, C. Fong (1991), The use of prosody for syntactic disambiguation, Journal of the Acoustical Society of America, Vol. 90, No. 6, pp. 2956-2970.
  • [20] P. Roach et al. (1996), BABEL: An Eastern European multi-language database, Proc. of the 4th International Conference on Speech and Language Processing, Philadelphia, USA, Vol. 3, pp. 1892-1893.
  • [21] E. Selkirk (2001), The Syntax-Phonology Interface, in N. J. Smelser and P. B. Baltes (Eds), International Encyclopaedia of the Social and Behavioural Sciences, Oxford: Pergamon, pp. 15407-15412.
  • [22] E. Shriberg, A. Stolcke (2004), Direct modeling of prosody: An overview of applications in automatic speech processing, Proc. ISCA International Conference on Speech Prosody.
  • [23] E. Shriberg, A. Stolcke, D. Hakkani-Tür, G. Tür (2000), Prosody-Based Automatic Segmentation of Speech into Sentences and Topics, Speech Communication, Vol. 32, No. 1-2, pp. 127-154.
  • [24] K. Silverman (1993), On customizing prosody in speech synthesis: names and addresses as a case in point, Proc. ARPA Workshop on Human Language Technology, pp. 317-322.
  • [25] K. N. Strelnikov, V. A. Vorobyev, T. V. Chernigovskaya, S. V. Medvedev (2006), Prosodic clues to syntactic processing – a PET and ERP study, NeuroImage, Vol. 29, No. 4, pp. 1127-1134.
  • [26] M. Szarvas, T. Fegyó, P. Mihajlik, P. Tatai (2000), Automatic Recognition of Hungarian: Theory and Practice, International Journal of Speech Technology, Vol. 3, No. 3-4, pp. 237-251.
  • [27] Gy. Szaszák, K. Nagy and A. Beke (2011), Analysing the correspondence between automatic prosodic segmentation and syntactic structure, Proc. of Interspeech 2011, Florence, Italy, pp. 1057-1061.
  • [28] V. Trón, L. Németh, P. Halácsy, A. Kornai, Gy. Gyepesi, D. Varga (2005), Hunmorph: Open source word analysis. Proceedings of the ACL 2005 Workshop on Software, Ann Arbor, MI, pp. 77-85.
  • [29] N. M. Veilleux, M. Ostendorf (1993), Prosody/parse scoring and its application in ATIS, Proc. ARPA Human Language Technology Workshop ’93, pp. 335-340.
  • [30] K. Vicsi and Gy. Szaszák (2005), Automatic Segmentation of Continuous Speech on Word Level Based on Supra-segmental Features, International Journal of Speech Technology, Vol. 8, No. 4, pp. 363-370.
  • [31] K. Vicsi, Gy. Szaszák (2010), Using prosody to improve automatic speech recognition, Speech Communication, Vol. 52, No. 5, pp. 413-426.
  • [32] M. Wagner (2005), Prosody and recursion, Ph.D. dissertation, MIT.
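Notes
Record prepared with funding from MNiSW (the Polish Ministry of Science and Higher Education), agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularisation of science and promotion of sport (2020).
Document type
Bibliography
YADDA identifier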
bwmeta1.element.baztech-6c3cc61f-ca6b-4f21-8e9a-9a50446331e6