On different approaches to syntactic analysis into bi-lexical dependencies : An empirical comparison of direct, PCFG-based, and HPSG-based parsers

Ivanova, A.; Oepen, S.; Dridan, R.; Flickinger, D.; Øvrelid, L.; Lapponi, E.

doi:10.15398/jlm.v4i1.101

Article - details

Article title

On different approaches to syntactic analysis into bi-lexical dependencies : An empirical comparison of direct, PCFG-based, and HPSG-based parsers

Authors

Ivanova A., Oepen S., Dridan R., Flickinger D., Øvrelid L., Lapponi E.

Content

Full texts:

Ivanova_On different approaches to syntactic_1_2016.pdf

Identifiers

DOI

10.15398/jlm.v4i1.101

Abstracts

We compare three different approaches to parsing into syntactic, bi-lexical dependencies for English: a ‘direct’ data-driven dependency parser, a statistical phrase structure parser, and a hybrid, ‘deep’ grammar-driven parser. The analyses from the latter two are post-converted to bi-lexical dependencies. Through this ‘reduction’ of all three approaches to syntactic dependency parsers, we determine empirically what performance can be obtained for a common set of dependency types for English; in- and out-of-domain experimentation ranges over diverse text types. In doing so, we observe what trade-offs apply along three dimensions: accuracy, efficiency, and resilience to domain variation. Our results suggest that the hand-built grammar in one of our parsers helps in both accuracy and cross-domain parsing performance. When evaluated extrinsically in two downstream tasks – negation resolution and semantic dependency parsing – these accuracy gains do sometimes but not always translate into improved end-to-end performance.

Keywords

syntactic dependency parsing, domain variation

Publisher

Instytut Podstaw Informatyki PAN

Journal

Journal of Language Modelling

Year

2016

Volume

Vol. 4, No. 1

Pages

113-144

Physical description

Bibliography: 49 items, figures, tables, charts.

Contributors

author

Ivanova A.

University of Oslo, Department of Informatics

author

Oepen S.

University of Oslo, Department of Informatics

author

Dridan R.

University of Oslo, Department of Informatics

author

Flickinger D.

Stanford University, Center for the Study of Language and Information

author

Øvrelid L.

University of Oslo, Department of Informatics

author

Lapponi E.

University of Oslo, Department of Informatics

Bibliography

[1] Peter Adolphs, Stephan Oepen, Ulrich Callmeier, Berthold Crysmann, Dan Flickinger, and Bernd Kiefer (2008), Some Fine Points of Hybrid Natural Language Parsing, in Proceedings of the 6th International Conference on Language Resources and Evaluation, Marrakech, Morocco.
[2] Ezra Black, Steve Abney, Dan Flickinger, Claudia Gdaniec, Ralph Grishman, Phil Harrison, Don Hindle, Robert Ingria, Fred Jelinek, Judith Klavans, Mark Liberman, Mitch Marcus, S. Roukos, Beatrice Santorini, and Tomek Strzalkowski (1991), A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars, in Proceedings of the Workshop on Speech and Natural Language, pp. 306-311, Pacific Grove, USA.
[3] Bernd Bohnet and Joakim Nivre (2012), A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing, in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Conference on Natural Language Learning, pp. 1455-1465, Jeju Island, Korea.
[4] Johan Bos, Edward Briscoe, Aoife Cahill, John Carroll, Stephen Clark, Ann Copestake, Dan Flickinger, Josef van Genabith, Julia Hockenmaier, Aravind Joshi, Ronald Kaplan, Tracy Holloway King, Sandra Kübler, Dekang Lin, Jan Tore Lønning, Christopher Manning, Yusuke Miyao, Joakim Nivre, Stephan Oepen, Kenji Sagae, Nianwen Xue, and Yi Zhang, editors (2008), Workshop on Cross-Framework and Cross-Domain Parser Evaluation, Manchester, UK.
[5] Ted Briscoe and John Carroll (2006), Evaluating the Accuracy of an Unlexicalised Statistical Parser on the PARC DepBank, in Proceedings of the 21st International Conference on Computational Linguistics and the 44th Meeting of the Association for Computational Linguistics, pp. 41-48, Sydney, Australia.
[6] Ulrich Callmeier (2002), Preprocessing and Encoding Techniques in PET, in Stephan Oepen, Daniel Flickinger, J. Tsujii, and Hans Uszkoreit, editors, Collaborative Language Engineering. A Case Study in Efficient Grammar-Based Processing, pp. 127-140, CSLI Publications, Stanford, CA.
[7] David Carter (1997), The TreeBanker. A Tool for Supervised Training of Parsed Corpora, in Proceedings of the Workshop on Computational Environments for Grammar Development and Linguistic Engineering, pp. 9-15, Madrid, Spain.
[8] Daniel Cer, Marie-Catherine de Marneffe, Dan Jurafsky, and Chris Manning (2010), Parsing to Stanford Dependencies. Trade-Offs between Speed and Accuracy, in Proceedings of the 7th International Conference on Language Resources and Evaluation, pp. 1628-1632, Valletta, Malta.
[9] Stephen Clark and James R. Curran (2007), Formalism-Independent Parser Evaluation with CCG and DepBank, in Proceedings of the 45th Meeting of the Association for Computational Linguistics, pp. 248-255, Prague, Czech Republic.
[10] Marie-Catherine de Marneffe and Christopher D. Manning (2008), The Stanford Typed Dependencies Representation, in Proceedings of the COLING Workshop on Cross-Framework and Cross-Domain Parser Evaluation, pp. 1-8, Manchester, UK.
[11] Rebecca Dridan (2013), Ubertagging. Joint Segmentation and Supertagging for English, in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1-10, Seattle, WA, USA.
[12] Jacob Elming, Anders Johannsen, Sigrid Klerke, Emanuele Lapponi, Hector Martinez, and Anders Søgaard (2013), Down-Stream Effects of Tree-to-Dependency Conversions, in Proceedings of Human Language Technologies: The 2013 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 617-626, Atlanta, GA, USA.
[13] Dan Flickinger (2000), On Building a More Efficient Grammar by Exploiting Types, Natural Language Engineering, 6 (1): 15-28.
[14] Dan Flickinger, Yi Zhang, and Valia Kordoni (2012), DeepBank. A Dynamically Annotated Treebank of the Wall Street Journal, in Proceedings of the 11th International Workshop on Treebanks and Linguistic Theories, pp. 85-96, Edições Colibri, Lisbon, Portugal.
[15] Jennifer Foster, Özlem Çetinoğlu, Joachim Wagner, Joseph Le Roux, Joakim Nivre, Deirdre Hogan, and Josef van Genabith (2011), From News to Comment. Resources and Benchmarks for Parsing the Language of Web 2.0, in Proceedings of the 2011 International Joint Conference on Natural Language Processing, pp. 893-901, Chiang Mai, Thailand.
[16] Timothy A. D. Fowler and Gerald Penn (2010), Accurate Context-Free Parsing with Combinatory Categorial Grammar, in Proceedings of the 48th Meeting of the Association for Computational Linguistics, pp. 335-344, Uppsala, Sweden.
[17] W. Nelson Francis and Henry Kučera (1982), Frequency Analysis of English Usage. Lexicon and Grammar, Houghton Mifflin, New York, USA.
[18] Daniel Gildea (2001), Corpus Variation and Parser Performance, in Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing, pp. 167-202, Pittsburgh, USA.
[19] Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Ondřej Bojar, Silvie Cinková, Eva Fučíková, Marie Mikulová, Petr Pajas, Jan Popelka, Jiří Semecký, Jana Šindlerová, Jan Štěpánek, Josef Toman, Zdeňka Urešová, and Zdeněk Žabokrtský (2012), Announcing Prague Czech-English Dependency Treebank 2.0, in Proceedings of the 8th International Conference on Language Resources and Evaluation, pp. 3153-3160, Istanbul, Turkey.
[20] Julia Hockenmaier and Mark Steedman (2007), CCGbank. A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank, Computational Linguistics, 33: 355-396.
[21] Angelina Ivanova, Stephan Oepen, and Lilja Øvrelid (2013), Survey on Parsing Three Dependency Representations for English, in Proceedings of the 51st Meeting of the Association for Computational Linguistics, pp. 31-37, Sofia, Bulgaria.
[22] Angelina Ivanova, Stephan Oepen, Lilja Øvrelid, and Dan Flickinger (2012), Who Did What to Whom? A Contrastive Study of Syntacto-Semantic Dependencies, in Proceedings of the Sixth Linguistic Annotation Workshop, pp. 2-11, Jeju, Republic of Korea.
[23] Tracy Holloway King, Richard Crouch, Stefan Riezler, Mary Dalrymple, and Ronald M. Kaplan (2003), The PARC 700 Dependency Bank, in Proceedings of the 4th International Workshop on Linguistically Interpreted Corpora, pp. 1-8, Budapest, Hungary.
[24] Emanuele Lapponi, Jonathon Read, and Lilja Øvrelid (2012a), Representing and Resolving Negation for Sentiment Analysis, in Proceedings of the 2012 ICDM Workshop on Sentiment Elicitation from Natural Text for Information Retrieval and Extraction, Brussels, Belgium.
[25] Emanuele Lapponi, Erik Velldal, Lilja Øvrelid, and Jonathon Read (2012b), UiO2: Sequence-Labeling Negation Using Dependency Features, in Proceedings of the 1st Joint Conference on Lexical and Computational Semantics, pp. 319-327, Montréal, Canada.
[26] Mitchell Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz (1993), Building a Large Annotated Corpus of English. The Penn Treebank, Computational Linguistics, 19: 313-330.
[27] André F. T. Martins and Mariana S. C. Almeida (2014), Priberam: A Turbo Semantic Parser with Second Order Features, in Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pp. 471-476, Association for Computational Linguistics, Dublin, Ireland.
[28] Ryan T. McDonald and Joakim Nivre (2007), Characterizing the Errors of Data-Driven Dependency Parsing Models, in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Conference on Natural Language Learning, pp. 122-131, Prague, Czech Republic.
[29] Makoto Miwa, Sampo Pyysalo, Tadayoshi Hara, and Jun’ichi Tsujii (2010), Evaluating Dependency Representations for Event Extraction, in Proceedings of the 23rd International Conference on Computational Linguistics, pp. 779-787.
[30] Yusuke Miyao, Rune Sætre, Kenji Sagae, Takuya Matsuzaki, and Jun’ichi Tsujii (2008), Task-Oriented Evaluation of Syntactic Parsers and Their Representations, in Proceedings of the 46th Meeting of the Association for Computational Linguistics, pp. 46-54, Columbus, OH, USA.
[31] Yusuke Miyao, Kenji Sagae, and Jun’ichi Tsujii (2007), Towards Framework-Independent Evaluation of Deep Linguistic Parsers, in Proceedings of the 2007 Workshop on Grammar Engineering across Frameworks, pp. 238-258, Palo Alto, California.
[32] Yusuke Miyao and Jun’ichi Tsujii (2008), Feature Forest Models for Probabilistic HPSG Parsing, Computational Linguistics, 34 (1): 35-80.
[33] Diego Mollá and Ben Hutchinson (2003), Intrinsic Versus Extrinsic Evaluations of Parsing Systems, in Proceedings of the EACL 2003 Workshop on Evaluation Initiatives in Natural Language Processing: Are Evaluation Methods, Metrics and Resources Reusable?, pp. 43-50, Budapest, Hungary.
[34] Roser Morante and Eduardo Blanco (2012), *SEM 2012 Shared Task. Resolving the Scope and Focus of Negation, in Proceedings of the 1st Joint Conference on Lexical and Computational Semantics, pp. 265-274, Montréal, Canada.
[35] Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret (2007), The CoNLL 2007 Shared Task on Dependency Parsing, in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Conference on Natural Language Learning, pp. 915-932, Prague, Czech Republic.
[36] Stephan Oepen and John Carroll (2000), Ambiguity Packing in Constraint-Based Parsing. Practical Results, in Proceedings of the 1st Meeting of the North American Chapter of the Association for Computational Linguistics, pp. 162-169, Seattle, WA, USA.
[37] Stephan Oepen, Daniel Flickinger, Kristina Toutanova, and Christopher D. Manning (2004), LinGO Redwoods. A Rich and Dynamic Treebank for HPSG, Research on Language and Computation, 2 (4): 575-596.
[38] Stephan Oepen, Marco Kuhlmann, Yusuke Miyao, Daniel Zeman, Silvie Cinková, Dan Flickinger, Jan Hajič, and Zdeňka Urešová (2015), SemEval 2015 Task 18: Broad-Coverage Semantic Dependency Parsing, in Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 915-926, Denver, CO, USA.
[39] Stephan Oepen, Marco Kuhlmann, Yusuke Miyao, Daniel Zeman, Dan Flickinger, Jan Hajič, Angelina Ivanova, and Yi Zhang (2014), SemEval 2014 Task 8. Broad-Coverage Semantic Dependency Parsing, in Proceedings of the 8th International Workshop on Semantic Evaluation, Dublin, Ireland.
[40] Stephan Oepen and Jan Tore Lønning (2006), Discriminant-Based MRS Banking, in Proceedings of the 5th International Conference on Language Resources and Evaluation, pp. 1250-1255, Genoa, Italy.
[41] Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein (2006), Learning Accurate, Compact, and Interpretable Tree Annotation, in Proceedings of the 21st International Conference on Computational Linguistics and the 44th Meeting of the Association for Computational Linguistics, pp. 433-440, Sydney, Australia.
[42] Barbara Plank and Gertjan van Noord (2010), Grammar-Driven versus Data-Driven. Which Parsing System is more Affected by Domain Shifts?, in Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground, pp. 25-33, Association for Computational Linguistics, Uppsala, Sweden.
[43] Carl Pollard and Ivan A. Sag (1994), Head-Driven Phrase Structure Grammar, Studies in Contemporary Linguistics, The University of Chicago Press, Chicago, USA.
[44] Roy Schwartz, Omri Abend, and Ari Rappoport (2012), Learnability-Based Syntactic Annotation Design, in Proceedings of the 24th International Conference on Computational Linguistics, Mumbai, India.
[45] Wolfgang Wahlster, editor (2000), Verbmobil. Foundations of Speech-to-Speech Translation, Springer, Berlin, Germany.
[46] Gisle Ytrestøl, Stephan Oepen, and Dan Flickinger (2009), Extracting and Annotating Wikipedia Sub-Domains, in Proceedings of the 7th International Workshop on Treebanks and Linguistic Theories, pp. 185-197, Groningen, The Netherlands.
[47] Yi Zhang and Hans-Ulrich Krieger (2011), Large-Scale Corpus-Driven PCFG Approximation of an HPSG, in Proceedings of the 12th International Conference on Parsing Technologies, pp. 198-208, Dublin, Ireland.
[48] Yi Zhang, Stephan Oepen, and John Carroll (2007), Efficiency in Unification-Based N-Best Parsing, in Proceedings of the 10th International Conference on Parsing Technologies, pp. 48-59, Prague, Czech Republic.
[49] Yi Zhang and Rui Wang (2009), Cross-Domain Dependency Parsing Using a Deep Linguistic Grammar, in Proceedings of the 47th Meeting of the Association for Computational Linguistics, pp. 378-386, Suntec, Singapore.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-c5eb960d-0dfd-4cfc-be31-a515dac0a560