A syntactic component for Vietnamese language processing

Le-Hong, P.; Roussanaly, A.; Nguyen, T. M. H.

doi:10.15398/jlm.v3i1.89

Article details

Article title

A syntactic component for Vietnamese language processing

Authors

Le-Hong P., Roussanaly A., Nguyen T. M. H.

Content

Full texts:

Le-Hong_A syntactic component for Vietnamese language_1_2015.pdf

Identifiers

DOI

10.15398/jlm.v3i1.89

Abstract

This paper presents the development of a grammar and a syntactic parser for the Vietnamese language. We first discuss the construction of a lexicalized tree-adjoining grammar using an automatic extraction approach. We then present the construction and evaluation of a deep syntactic parser based on the extracted grammar. This is a complete system that produces syntactic structures for Vietnamese sentences. A dependency annotation scheme for Vietnamese and an algorithm for extracting dependency structures from derivation trees are also proposed. This is the first Vietnamese parsing system capable of producing both constituency and dependency analyses. It offers encouraging performance: accuracy of 69.33% and 73.21% for constituency and dependency analysis, respectively.

Keywords

language parsing segmentation syntactic component tagging tree-adjoining grammar Vietnamese

Publisher

Instytut Podstaw Informatyki PAN

Journal

Journal of Language Modelling

Year

2015

Volume

Vol. 3, No. 1

Pages

145-184

Physical description

Bibliography: 53 items, figures, tables, charts.

Contributors

author

Le-Hong P.

VNU University of Science, Hanoi, Vietnam

author

Roussanaly A.

LORIA, Université de Lorraine, Nancy, France

author

Nguyen T. M. H.

VNU University of Science, Hanoi, Vietnam

Bibliography

[1] Anne Abeillé, Lionel Clément, and François Toussenel (2003), Building a treebank for French, in Anne Abeillé, editor, Treebanks: Building and Using Parsed Corpora, volume 20 of Text, Speech and Language Technology, pp. 165-187, Springer Netherlands.
[2] Mark Alves (1999), What’s so Chinese about Vietnamese?, in Proceedings of the Ninth Annual Meeting of the Southeast Asian Linguistics Society, pp. 221-224, University of California, Berkeley, USA.
[3] Jens Bäcker and Karin Harbusch (2002), Hidden Markov model-based supertagging in a user-initiative dialogue system, in Proceedings of TAG+6, pp. 269-278, Università di Venezia, Italy.
[4] Marie Candito, Benoît Crabbé, and Djamé Seddah (2009a), On statistical parsing of French with supervised and semi-supervised strategies, in Proceedings of EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference, pp. 49-57, Athens, Greece.
[5] Marie Candito, Benoît Crabbé, and Pascal Denis (2010), Statistical French dependency parsing: treebank conversion and first results, in Proceedings of LREC 2010, pp. 19-21, Valletta, Malta.
[6] Marie Candito, Benoît Crabbé, Pascal Denis, and François Guérin (2009b), Analyse syntaxique du français : des constituants aux dépendances (Syntactic Parsing of French: from constituents to dependencies), in Actes de Traitement Automatique des Langues, pp. 40-49, Senlis, France.
[7] John Carroll, Ted Briscoe, and Antonio Sanfilippo (1998), Parser evaluation: a survey and a new proposal, in Proceedings of LREC 1998, Granada, Spain.
[8] Xavier Carreras, Michael Collins, and Terry Koo (2008), TAG, dynamic programming, and the perceptron for efficient, feature-rich parsing, in Proceedings of CoNLL 2008, pp. 9-16, Manchester, UK.
[9] John Chen, Srinivas Bangalore, and K. Vijay-Shanker (2006), Automated extraction of tree-adjoining grammars from treebanks, Natural Language Engineering, 12 (3): 251-299.
[10] John Chen and K. Vijay-Shanker (2000), Automated extraction of TAGs from the Penn treebank, in Proceedings of the Sixth International Workshop on Parsing Technologies.
[11] David Chiang (2000), Statistical parsing with an automatically extracted tree adjoining grammar, in Proceedings of ACL, pp. 456-463, Morristown, New Jersey, USA.
[12] Michael Collins (1997), Three generative, lexicalised models for statistical parsing, in Proceedings of ACL, pp. 16-23, Association for Computational Linguistics, Stroudsburg, Pennsylvania, USA.
[13] Michael Collins (2003), Head-driven statistical models for natural language parsing, Computational Linguistics, 29 (4): 589-637.
[14] Benoît Crabbé, Denys Duchier, Claire Gardent, Joseph Le Roux, and Yannick Parmentier (2013), XMG: eXtensible MetaGrammar, Computational Linguistics, 39 (3): 591-629.
[15] Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning (2006), Generating typed dependency parses from phrase structure parses, in Proceedings of LREC 2006, pp. 449-454, Genoa, Italy.
[16] Yuan Ding and Martha Palmer (2004), Synchronous dependency insertion grammars: a grammar formalism for syntax-based statistical machine translation, in Workshop on Recent Advances in Dependency Grammars, pp. 90-97, Geneva, Switzerland.
[17] Robert Frank (2002), Phrase structure composition and syntactic dependencies, MIT Press, Boston, USA.
[18] Nizar Habash and Owen Rambow (2004), Extracting a tree-adjoining grammar from the Penn Arabic treebank, in Actes de Traitement Automatique des Langues, pp. 50-55, Fez, Morocco.
[19] Cao Xuân Hạo (2000), Vietnamese – Some Questions on Phonetics, Syntax and Semantics (in Vietnamese), NXB GD, Hanoi, Vietnam.
[20] Phê Hoàng (2002), Vietnamese Dictionary, NXB DN, Danang, Vietnam.
[21] Đạt Hữu, Trí Dõi Trần, and Thanh Lan Đào (1998), Basis of Vietnamese (in Vietnamese), NXB GD, Hanoi, Vietnam.
[22] Ane Dybro-Johansen (2004), Extraction des grammaires LTAG à partir d’un corpus étiquetté syntaxiquement, Master’s thesis, Université Paris 7, Paris, France.
[23] Richard Johansson and Pierre Nugues (2008), Dependency-based syntactic-semantic analysis with PropBank and NomBank, in CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning, pp. 183-187, Manchester, UK.
[24] Aravind K. Joshi and Yves Schabes (1997), Tree Adjoining Grammars, in Grzegorz Rozenberg and Arto Salomaa, editors, Handbooks of Formal Languages and Automata, pp. 69-123, Springer-Verlag, New York, USA.
[25] Miriam Kaeshammer (2012), A German treebank and lexicon for tree-adjoining grammars, Master’s thesis, Universität des Saarlandes, Saarbrücken, Germany.
[26] Laura Kallmeyer and Marco Kuhlmann (2012), A formal model for plausible dependencies in lexicalized tree adjoining grammar, in Proceedings of TAG+11, pp. 108-116, Paris, France.
[27] Tracy Holloway King, Richard Crouch, Stefan Riezler, Mary Dalrymple, and Ronald M. Kaplan (2003), The PARC 700 dependency bank, in Proceedings of 4th International Workshop on Linguistically Interpreted Corpora, pp. 1-8, Budapest, Hungary.
[28] Terry Koo and Michael Collins (2010), Efficient third-order dependency parsers, in Proceedings of ACL, pp. 1-11, Uppsala, Sweden.
[29] Sandra Kübler, Ryan McDonald, and Joakim Nivre (2009), Dependency parsing, Morgan & Claypool Publishers.
[30] Phuong Le-Hong, Thi Minh Huyen Nguyen, Azim Roussanaly, and Tuong Vinh Ho (2008), A hybrid approach to word segmentation of Vietnamese texts, in Proceedings of LATA, LNCS 5196, pp. 240-249, Springer.
[31] Phuong Le-Hong, Azim Roussanaly, Thi Minh Huyen Nguyen, and Mathias Rossignol (2010), An empirical study of maximum entropy approach for part-of-speech tagging of Vietnamese texts, in Actes de Traitement Automatique des Langues, pp. 50-61, Montreal, Canada.
[32] Charles N. Li and Sandra A. Thompson (1976), Subject and topic: a new typology of language, in Subject and topic, pp. 457-489, London/New York: Academic Press.
[33] Dekang Lin and Patrick Pantel (2001), Discovery of inference rules for question answering, Natural Language Engineering, 7 (4): 343-360.
[34] David M. Magerman (1995), Statistical decision-tree models for parsing, in Proceedings of ACL, pp. 276-283, Stroudsburg, Pennsylvania, USA.
[35] Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Nuria Bertomeu Castelló, and Jungmee Lee (2013), Universal dependency annotation for multilingual parsing, in Proceedings of ACL, pp. 92-97, Sofia, Bulgaria.
[36] Ryan McDonald and Fernando Pereira (2006), Online learning of approximate dependency parsing algorithms, in Proceedings of EACL, pp. 81-88, Trento, Italy.
[37] Alexis Nasr (2004), Analyse syntaxique probabiliste pour grammaires de dépendances extraites automatiquement, Habilitation à diriger des recherches, Université Paris 7, Paris, France.
[38] Günter Neumann (2003), A uniform method for automatically extracting stochastic lexicalized tree grammar from treebank and HPSG, in Anne Abeillé, editor, Treebanks: Building and Using Parsed Corpora, volume 20 of Text, Speech and Language Technology, pp. 351-365, Springer Netherlands.
[39] Phuong Thai Nguyen, Luong Vu Xuan, Thi Minh Huyen Nguyen, Van Hiep Nguyen, and Phuong Le-Hong (2009), Building a large syntactically-annotated corpus of Vietnamese, in Proceedings of the 3rd Linguistic Annotation Workshop, ACL-IJCNLP, pp. 182-185, Suntec City, Singapore.
[40] Thi Luong Nguyen, My Linh Ha, Viet Hung Nguyen, Thi Minh Huyen Nguyen, and Phuong Le-Hong (2013), Building a treebank for Vietnamese dependency parsing, in The 10th IEEE RIVF, pp. 147-151, IEEE, Hanoi, Vietnam.
[41] Thi Minh Huyen Nguyen, Laurent Romary, Mathias Rossignol, and Xuan Luong Vu (2006), A lexicon for Vietnamese language processing, Language Resources and Evaluation, 40 (3-4).
[42] Joakim Nivre and Ryan McDonald (2008), Integrating graph-based and transition-based dependency parsers, in Proceedings of ACL-08, pp. 950-958, ACL, Columbus, Ohio, USA.
[43] Joakim Nivre (2003), An efficient algorithm for projective dependency parsing, in Proceedings of the 8th International Workshop on Parsing Technologies (IWPT 03), pp. 149-160, Nancy, France.
[44] Jungyeul Park (2006), Extraction of tree adjoining grammars from a treebank for Korean, in Proceedings of COLING-ACL Student Research Workshop, pp. 73-78, Morristown, New Jersey, USA.
[45] Patrick Paroubek, L. G. Pouillot, I. Robba, and Anne Vilnat (2005), EASY: Campagne d’évaluation des analyseurs syntaxiques (EASY: evaluation campaign of syntactic parsers), in Actes de Traitement Automatique des Langues, pp. 3-12, Dourdan, France.
[46] Lewis M. Paul, Gary F. Simons, and Charles D. Fennig (eds.) (2014), Ethnologue: Languages of the World, Seventeenth edition, SIL International, Dallas, Texas, USA.
[47] Owen Rambow and Aravind Joshi (1994), A formal look at dependency grammars and phrase-structure grammars, with special consideration of word-order phenomena, in Current Issues in Meaning-Text Theory, pp. 1-20, Pinter, London, UK.
[48] Azim Roussanaly, Benoît Crabbé, and Jérôme Perrin (2005), Premier bilan de la participation du LORIA à la campagne d’évaluation EASY, in Actes de Traitement Automatique des Langues, pp. 49-52, Dourdan, France.
[49] Yves Schabes (1990), Mathematical and computational aspects of lexicalized grammars, Ph.D. thesis, University of Pennsylvania, Pennsylvania, USA.
[50] Rion Snow, Dan Jurafsky, and Andrew Y. Ng (2005), Learning syntactic patterns for automatic hypernym discovery, in Advances in Neural Information Processing Systems, pp. 1297-1304, Vancouver, Canada.
[51] Vietnam Committee on Social Sciences, editor (1983), Vietnamese Grammar (in Vietnamese), NXB KHXH, Hanoi, Vietnam.
[52] Fei Xia (2001), Automatic grammar generation from two different perspectives, Ph.D. thesis, University of Pennsylvania, Pennsylvania, USA.
[53] Fei Xia, Martha Palmer, and Aravind Joshi (2000), A uniform method of grammar extraction and its applications, in Proceedings of the joint SIGDAT conference on empirical methods in NLP and very large corpora, pp. 53-62, Morristown, New Jersey, USA.

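Notes

Record prepared with funding from the Polish Ministry of Science and Higher Education (MNiSW), agreement No. 461252, under the programme "Social Responsibility of Science" (Społeczna odpowiedzialność nauki), module: Popularisation of science and promotion of sport (2020).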

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-66ba5acd-1251-4a95-a892-9e8a8eab2ad3