http://yadda.icm.edu.pl:80/baztech/element/bwmeta1.element.baztech-article-BATC-0008-0008

Journal

Control and Cybernetics

Article title

Language resources for named entity annotation in the National Corpus of Polish

Authors Savary, A.; Piskorski, J.
Publication languages EN
Abstracts
EN We present the named entity annotation subtask of a project aiming at creating the National Corpus of Polish. We summarize the annotation requirements defined for this corpus, and we discuss how existing lexical resources and grammars for named entity recognition for Polish have been adapted to meet those requirements. We show detailed results of the corpus annotation using the information extraction platform SProUT. We also analyze the errors committed by our knowledge-based method and suggest its further improvements.
Keywords
EN natural language processing; proper names; named entities; corpus annotation; Polish National Corpus; SProUT
Publisher Systems Research Institute, Polish Academy of Sciences
Journal Control and Cybernetics
Year 2011
Volume Vol. 40, no. 2
Pages 361-391
Physical description Bibliography: 55 items
Contributors
author Savary, A.
author Piskorski, J.
  • Université François Rabelais Tours, Laboratoire d'Informatique Blois, France
Bibliography
Abramowicz, W., Filipowska, A., Piskorski, J., Wecel, K. and Wieloch, K. (2006) Linguistic Suite for Polish Cadastral System. In: Proc. of the 5th International Conference on Language Resources and Evaluation, Genoa, Italy. ELRA, 2518-2523.
Acedański, Sz. (2010) A Morphosyntactic Brill Tagger for Inflectional Languages. In: Advances in Natural Language Processing. Proc. of the 7th International Conference on Advances in Natural Language Processing. LNAI 6233, Springer-Verlag, 3-14.
Appelt, D.E., Hobbs, J.R., Bear, J., Israel, D., Kameyama, M., Kehler, A., Martin, D., Myers, K. and Tyson, M. (1995) SRI International FASTUS system: MUC-6 test results and analysis. In: Proc. of the Sixth Message Understanding Conference (MUC-6). NIST, Morgan-Kaufmann Publishers, 237-248.
Bański, P. and Przepiórkowski, A. (2009) Stand-off TEI Annotation: the Case of the National Corpus of Polish. In: Proc. of the Third Linguistic Annotation Workshop (ACL-IJCNLP 2009), Suntec, Singapore. Association for Computational Linguistics, 64-67.
Becker, M., Drożdżyński, W., Krieger, H.U., Piskorski, J., Schäfer, U. and Xu, F. (2002) SProUT - Shallow Processing with Typed Feature Structures and Unification. In: Proceedings of the International Conference on NLP (ICON 2002), Mumbai, India.
Budiscak, J., Piskorski, J. and Ristov, S. (2009) Compressing Gazetteers Revisited. In: Pre-proceedings of the FSMNLP’09, Pretoria, South Africa. University of Pretoria.
Burnard, L. and Bauman, S., eds. (2008) TEI P5: Guidelines for Electronic Text Encoding and Interchange. TEI Consortium, Oxford.
Chinchor, N. (1997) MUC-7 Named Entity Task Definition. In: Proc. of the Message Understanding Conference (MUC-7). Linguistic Data Consortium.
Daciuk, J. and Piskorski, J. (2006) Gazetteer Compression Technique Based on Substructure Recognition. Advances in Soft Computing, 35, 87-96.
Drożdżyński, W., Krieger, H.U., Piskorski, J., Schäfer, U., and Xu, F. (2004) Shallow Processing with Unification and Typed Feature Structures - Foundations and Applications. Künstliche Intelligenz, 1, 17-23.
Farmakiotou, D., Karkaletsis, V., Koutsias, J., Sigletos, G., Spyropoulos, C.D. and Stamatopoulos, P. (2000) Rule-Based Named Entity Recognition For Greek Financial Texts. In: Proc. of the Workshop on Computational Lexicography and Multimedia Dictionaries (COMLEX 2000), Patras, Greece, 75-78.
Finkel, J.R. and Manning, Ch.D. (2009a) Joint Parsing and Named Entity Recognition. In: Proc. of NAACL-2009, Boulder, Colorado, USA. Association for Computational Linguistics, 326-334.
Finkel, J.R. and Manning, Ch.D. (2009b) Nested Named Entity Recognition. In: Proc. EMNLP-2009, Singapore. Association for Computational Linguistics, 141-150.
Freitas, C., Mota, C., Santos, D., Oliveira, H.G. and Carvalho, P. (2010) Second HAREM: Advancing the State of the Art of Named Entity Recognition in Portuguese. In: Proc. of the Seventh conference on International Language Resources and Evaluation (LREC’10). ELRA, 3630-3637.
Friburger, N. and Maurel, D. (2004) Finite-state transducer cascade to extract named entities in texts. Theoretical Computer Science, 313, 94-104.
Gaizauskas, R., Wakao, T., Humphreys, K., Cunningham, H. and Wilks, Y. (1995) University of Sheffield: Description of the LaSIE System as Used for MUC-6. In: Proc. of the Sixth Message Understanding Conference (MUC-6). NIST, Morgan-Kaufmann Publishers, 207-220.
Galicia-Haro, S.N. and Gelbukh, A. (2007) Complex named entities in Spanish texts. In: S. Sekine and E. Ranchhod, eds., Named Entities. Recognition, classification and use. John Benjamins, 71-96.
Galliano, S., Gravier, G. and Chaubard, L. (2009) The ESTER 2 evaluation campaign for the rich transcription of French radio broadcasts. In: Proc. of Interspeech 2009. International Speech Communication Association (ISCA), 2583-2586.
Głowińska, K. and Przepiórkowski, A. (2010) The Design of Syntactic Annotation Levels in the National Corpus of Polish. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation, LREC 2010. ELRA, 1816-1821.
Graliński, F., Jassem, K. and Marcińczuk, M. (2009a) An Environment for Named Entity Recognition and Translation. In: Proc. of the 13th Annual Conference of the EAMT. European Association for Machine Translation, 88-96.
Graliński, F., Jassem, K., Marcińczuk, M. and Wawrzyniak, P. (2009b) Named Entity Recognition in Machine Anonymization. In: Recent Advances in Intelligent Information Systems. Exit, Warsaw, 247-260.
Hobbs, J.R., Appelt, D., Bear, J., Israel, D., Kameyama, M., Stickel, M. and Tyson, M. (1997) FASTUS: A Cascaded Finite-State Transducer for Extracting Information from Natural-Language Text. In: E. Roche and Y. Schabes, eds., Finite-state language processing. MIT Press, 383-406.
Kravalová, J. and Žabokrtský, Z. (2009) Czech Named Entity Corpus and SVM-based Recognizer. In: Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration, Suntec, Singapore. Association for Computational Linguistics, 194-201.
Kubiak-Sokół, A. and Łaziński, M., eds. (2007) Słownik nazw miejscowości i mieszkańców (in Polish). Wydawnictwo Naukowe PWN, Warszawa.
Lubaszewski, W. (2007) Information extraction tools for Polish text. In: Proc. LTC’07, Poznań. Wydawnictwo Poznańskie, Poznań, 567.
Lubaszewski, W. (2009) Słowniki komputerowe i automatyczna ekstrakcja informacji z tekstu (in Polish). AGH Uczelniane Wydawnictwa Naukowo-Dydaktyczne, Kraków.
Marcińczuk, M. and Piasecki, M. (2007) Pattern Extraction for Event Recognition in the Reports of Polish Stockholders. In: Proc. IMCSIT-AAIA’07, Wisła, Poland. Polskie Towarzystwo Informatyczne, 275-284.
Marcińczuk, M. and Piasecki, M. (2010) Named Entity Recognition in the Domain of Polish Stock Exchange Reports. In: Proc. Intelligent Information Systems 2010, Siedlce, Poland. Publishing House of University of Podlasie, 127-140.
Marciniak, M., Rabiega-Wiśniewska, J., Savary, A., Woliński, M. and Heliasz, C. (2009) Constructing an Electronic Dictionary of Polish Urban Proper Names. In: Recent Advances in Intelligent Information Systems. Proceedings of the Balto-Slavonic Natural Language Processing Workshop, Kraków. Exit, Warszawa, 743-749.
Mikheev, A., Moens, M. and Grover, C. (1999) Named Entity Recognition without Gazetteers. In: Proc. of the Ninth Conference of the European Chapter of the Association for Computational Linguistics (EACL’99). Association for Computational Linguistics, Stroudsburg, USA, 1-8.
Mykowiecka, A., Marasek, K., Marciniak, M. and Rabiega-Wiśniewska, J. (2009) Annotated Corpus of Polish Spoken Dialogues. In: Z. Vetulani and H. Uszkoreit, eds., Human Language Technology. Challenges of the Information Society, Third Language and Technology Conference, LTC 2007, Poznan, Poland. LNCS 5603, 50-62.
Nadeau, D. and Sekine, S. (2007) A survey of named entity recognition and classification. Linguisticae Investigationes, 30(1), 3-26.
Nouvel, D., Antoine, J.Y., Friburger, N. and Maurel, D. (2010) An Analysis of the Performances of the CasEN Named Entities Recognition System in the Ester2 Evaluation Campaign. In: N. Calzolari et al., eds., Proc. of the Seventh conference on International Language Resources and Evaluation (LREC’10). ELRA, 523-529.
Osenova, P. and Kolkovska, S. (2002) Combining the named-entity recognition task and NP chunking strategy for robust pre-processing. In: Proceedings of the Workshop on Treebanks and Linguistic Theories, (TLT02), Sozopol, Bulgaria. Bulgarian Academy of Sciences, 167-182.
Piasecki, M. and Wardyński, A. (2006) Multiclassifier Approach to Tagging of Polish. In: Proc. of the International Multiconference on Computer Science and Information Technology (IIS’06). Polskie Towarzystwo Informatyczne, 169-178.
Piskorski, J. (2005) Named-Entity Recognition for Polish with SProUT. Proceedings of IMTCI 2004, Warsaw, Poland. LNCS 3490, Springer, 122-133.
Piskorski, J. (2008) ExPRESS: extraction pattern recognition engine and specification suite. In: Proc. of the International Workshop on Finite-State Methods and Natural Language Processing 2007 (FSMNLP’2007). Potsdam, Germany, Universitaet Potsdam, 166-183.
Piskorski, J., Homola, P., Marciniak, M., Mykowiecka, A., Przepiórkowski, A. and Woliński, M. (2004) Information Extraction for Polish Using the SProUT Platform. In: S. Kłopotek, T. Wierzchoń and K. Trojanowski, eds., Intelligent Information Processing and Web Mining. Springer-Verlag, Berlin, 227-236.
Piskorski, J., Sydow, M. and Kupść, A. (2007) Lemmatization of Polish Person Names. In: Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies (ACL ’07). Association for Computational Linguistics, Stroudsburg, USA, 27-34.
Piskorski, J., Wieloch, K. and Sydow, M. (2009) On knowledge-poor methods for person name matching and lemmatization for highly inflectional languages. Information Retrieval, 12(3), 275-299.
Przepiórkowski, A., Górski, R.L., Łaziński, M. and Pęzik, P. (2009) Recent Developments in the National Corpus of Polish. In: J. Levická and R. Garabík, eds., Proc. of Slovko’09, Smolenice, Slovakia. Tribun, Brno, 302-309.
Przepiórkowski, A. and Woliński, M. (2003) A Flexemic Tagset for Polish. In: Proceedings of the 2003 EACL Workshop on Morphological Processing of Slavic Languages (MorphSlav ’03). Association for Computational Linguistics, 33-40.
Rymut, K. (2002) Dictionary of Surnames in Current Use in Poland at the Beginning of the 21st Century. Polish Academy of Sciences, Polish Language Institute and Polish Genealogical Society of America, Kraków-Chicago.
Rymut, K., ed. (2008) Nazwy wodne Polski. Research project nr 1H01D01029 (electronic database), Polska Akademia Nauk, Instytut Języka Polskiego, Kraków.
Rzetelska-Feleszko, E., ed. (2005) Polskie nazwy własne (in Polish). Instytut Języka Polskiego Polskiej Akademii Nauk, Kraków.
Saloni, Z., Gruszczyński, W., Woliński, M. and Wołosz, R. (2007) Słownik gramatyczny języka polskiego (in Polish). Wiedza Powszechna, Warszawa.
Savary, A., Krstev, C. and Vitas, D. (2007) Inflectional Non Compositionality and Variation of Compounds in French, Polish and Serbian, and Their Automatic Processing. BULAG, 32, 73-93.
Savary, A., Rabiega-Wiśniewska, J. and Woliński, M. (2009) Inflection of Polish Multi-Word Proper Names with Morfeusz and Multiflex. LNAI 5070, Springer, 111-142.
Savary, A., Waszczuk, J. and Przepiórkowski, A. (2010) Towards the Annotation of Named Entities in the Polish National Corpus. In: N. Calzolari et al., eds., Proc. of the Seventh conference on International Language Resources and Evaluation (LREC’10). ELRA, 3622-3629.
Schäfer, U. (2006) OntoNERdIE - Mapping and Linking Ontologies to Named Entity Recognition and Information Extraction Resources. In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC-2006). ELRA, 1756-1761.
Sekine, S., Sudo, K. and Nobata, Ch. (2002) Extended Named Entity Hierarchy. In: Proc. of the 3rd International Conference on Language Resources and Evaluation, Canary Island, Spain. ELRA, 3622-3629.
Wacholder, N., Ravin, Y. and Choi, M. (1997) Disambiguation of proper names in text. In: Proc. of the Fifth Conference on Applied Natural Language Processing (ANLC ’97). Association for Computational Linguistics, Stroudsburg, USA, 202-208.
Waszczuk, J., Głowińska, K., Savary, A. and Przepiórkowski, A. (2010) Tools and Methodologies for Annotating Syntax and Named Entities in the National Corpus of Polish. In: Proc. of IMCSIT-CLA’10 Workshop, Wisła, Poland. Polskie Towarzystwo Informatyczne, 531-539.
Wolinski, F., Vichot, F. and Dillet, B. (1995) Automatic Processing of Proper Names in Texts. In: EACL ’95: Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics. Morgan Kaufmann Publishers Inc., 23-30.
Woliński, M. (2006) Morfeusz - a Practical Tool for the Morphological Analysis of Polish. In: Proc. of IIS:IIPWM’06. Springer, 503-512.
Collection BazTech
YADDA identifier bwmeta1.element.baztech-article-BATC-0008-0008
Identifiers