Lexicon management and standard formats

Laporte, E.

Article details

Selected full texts from this journal

https://journals.pan.pl/acs/

Conference

Human Language Technologies as a challenge for Computer Science and Linguistics (2; 21-23.04.2005; Poznań, Poland)

Abstract

International standards for lexicon formats are in preparation. To a certain extent, the proposed formats converge with prior results of standardization projects. However, their adequacy for (i) lexicon management and (ii) lexicon-driven applications has been little debated in the past, nor is it being debated as part of the present standardization effort. We examine these issues. IGM has developed XML formats compatible with the emerging international standards, and we report experimental results on large-coverage lexicons.

Keywords

language resource, lexicon management, standardization, inflection, morphology

Publisher

Polish Academy of Sciences, Committee of Automatic Control and Robotics

Journal

Archives of Control Sciences

Year

2005

Volume

Vol. 15, no. 3

Pages

337-348

Physical description

Bibliography: 41 items

Contributors

Author

Laporte E.

Institut Gaspard-Monge (IGM), University of Marne-la-Vallée, France, eric.laporte@univ-mlv.fr

References

[1] A. W. Appel and G. J. Jacobson: The world's fastest Scrabble program. Comm. ACM, 31(5), (1988), 572-578, 585.
[2] S. Bird and E. Loper: NLTK: the Natural Language Toolkit. In Proc. of ACL, (2004).
[3] O. Blanc: Rapport d'avancement Outilex. IGM, 2003.
[4] O. Blanc and A. Dister: Automates lexicaux avec structure de traits. In RECITAL 2004, (2004), 23-32.
[5] Ch. Boitet, M. Mangeot and G. Sérasset: The PAPILLON project: cooperatively building a multilingual lexical data-base to derive open source dictionaries & lexicons. In COLING Workshop on NLP and XML, Taipei, Taiwan, (2002), 93-96.
[6] T. Briscoe: Lexical issues in Natural Language Processing. In E. Klein and F. Veltman (Eds.), Natural Language and Speech. Springer, (1991).
[7] B. Courtois: Un système de dictionnaires électroniques pour les mots simples du français. Langue Française, 87, Paris: Larousse, (1990).
[8] H. Cunningham: GATE, a general architecture for text engineering. Computers and the Humanities, 36, (2002), 223-254.
[9] M. Domenig: Word Manager: A System for the Definition, Access and Maintenance of Lexical Databases. In Proc. of COLING, Budapest, 1, (1988).
[10] W. N. Francis and H. Kucera: Manual of Information to Accompany a Standard Corpus of Present-Day Edited American English, for use with Digital Computers (Corrected and Revised edition). Department of Linguistics, Brown University, Providence, Rhode Island, 1979.
[11] G. Francopoulo: Proposition de norme des lexiques pour le traitement automatique du langage. AFNOR, 21 p. (2003).
[12] M. George: Terminology and other language resources. Lexical Resource Markup Framework. ISO. 16 p. (2003).
[13] D. Gibbon and Th. Trippel: A multi-view hyper-lexicon resource for speech and language system development. In Proc. of LREC. Athens, (2000), 1713-1718.
[14] M. Gross: Lexicon-Grammar. The Representation of Compound Words. In Proc. of COLING, Bonn, (1986), 1-6.
[15] C. Grover and A. Lascarides: XML-Based Data Preparation for Robust Deep Parsing. In Proc. Joint EACL-ACL Meeting, Toulouse. (2001).
[16] L. Hayashi and J. Hatton: Combining UML, XML and relational database technologies. The best of all worlds for robust linguistic databases. In Proc. IRCS Workshop on Linguistic Databases, (2001).
[17] H.-G. Huh and E. Laporte: A resource-based Korean morphological annotation system. In Proc. Int. Joint Conf. on Natural Language Processing, Jeju, Korea, (2005).
[18] N. Ide and L. Romary: Standards for language resources. In Proc. LREC. Las Palmas, (2002), 839-844.
[19] N. Ide and J. Veronis: Text Encoding Initiative: Background and Context. Dordrecht: Kluwer, 1995.
[20] D. Jurafsky and J. Martin: Speech and language processing. Prentice Hall, 2000.
[21] E. Laporte: Symbolic natural language processing. In Applied Combinatorics on Words. Lothaire, Cambridge Univ. Press, (2005), 153-195.
[22] K. Lee, H. Bunt, S. Bauman, L. Burnard, L. Clement, E. de la Clergerie, Th. Declerck, L. Romary, A. Roussanaly and C. Roux: Towards an international standard on feature structure representation. In Proc. of LREC, (2004), 373-376.
[23] W. Lezius: Morphy: German Morphology, Part-of-Speech Tagging and Applications. In Proc. EURALEX, Stuttgart, (2000), 619-623.
[24] Ch. Lieske, S. McCormick and G. Thurmair: The Open Lexicon Interchange Format (OLIF) Comes of Age. Machine Translation Summit VIII, (2001).
[25] E. Loper and S. Bird: NLTK: the Natural Language Toolkit. In Proc. ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, Philadelphia, (2002).
[26] C. Lucchesi and T. Kowaltowski: Applications of finite automata representing large vocabularies. Software - Practice and Experience, 23(1). Wiley & Sons. (1993), 15-30.
[27] M. Marcus, B. Santorini and M. A. Marcinkiewicz: Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), (1993), 313-330.
[28] S. Nirenburg: The Subworld Concept Lexicon and the Lexicon Management System. Computational Linguistics, 13(3-4), (1987).
[29] B. Normier and M. Nossin: GENELEX Project: Eureka for Linguistic Engineering. In Proc. Int. Workshop on Electronic Dictionaries, Oiso, Kanagawa, Japan, (1990), 63-70.
[30] K. Oflazer and Sh. Inkelas: A Finite State Pronunciation Lexicon for Turkish. In Proc. EACL Workshop on Finite State Methods in NLP, Budapest, (2003).
[31] S. Paumier: Unitex. Manuel d'utilisation. Research report, (2002).
[32] H. Poirier: The XELDA framework. (1999). http://www.dcs.shef.ac.uk/~hamish/dalr/baslow/xelda.pdf
[33] M. F. Porter: An algorithm for suffix stripping. Program, 14(3), (1980), 130-137.
[34] U. Quasthoff: Tools for Automatic Lexicon Maintenance: Acquisition, Error Correction, and the Generation of Missing Values. In Proc. LREC, (1998), 853-856.
[35] D. Revuz: Minimization of acyclic deterministic automata in linear time. Theoretical Computer Science, 92(1), (1992), 181-189.
[36] L. Romary: Towards an Abstract Representation of Terminological Data Collections: the TMF model. TAMA, Antwerp, (2001).
[37] M. Silberztein: A new approach to tagging: the use of a large-coverage electronic dictionary. Applied Computer Translation, 1(4), (1991).
[38] M. Silberztein: INTEX: a corpus processing system. In Proc. COLING. Kyoto, (1994).
[39] M. Silberztein: INTEX: an FST toolbox. Theoretical Computer Science, 231(1), (2000), 33-46.
[40] C. Vertan and W. von Hahn: Towards a Generic Architecture for Lexicon Management. In Proc. LREC, (2002), 45-48.
[41] P. Wittenburg, W. Peters and S. Drude: Analysis of Lexical Structures from Field Linguistics and Language Engineering. In Proc. LREC, (2002).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BSW3-0021-0007