Journal
Article title
Authors
Title variants
Publication languages
Abstracts
The paper discusses selected issues in indexing documents written in Polish. Algorithms for stemming and part-of-speech tagging, both important in text analysis and indexing, are briefly described, and their suitability for Polish, a language with very extensive inflection, is then discussed. The usefulness of large dictionaries containing inflected forms, such as WordNet and an open-source dictionary of the Polish language, for stemming and part-of-speech tagging is also described. Two dictionary structures that enable efficient word lookup are presented. The final part describes tests of the two implemented dictionary structures, carried out on six real texts and three specially crafted artificial texts, and formulates conclusions from those tests. (original abstract)
Keywords
Journal
Year
Volume
Issue
Pages
284-293
Physical description
Contributors
author
- Warsaw University of Life Sciences - SGGW, Poland
author
- Warsaw University of Life Sciences - SGGW, Poland
References
- [1] Manning C. D., Raghavan P., Schütze H., (2008) Introduction to Information Retrieval, Cambridge University Press.
- [2] Lovins, J., (1968) Development of a Stemming Algorithm. Mechanical Translation and Computational Linguistics 11(1-2), pp. 11-31.
- [3] Paice, C., Husk, G., (1990) Another Stemmer, ACM SIGIR Forum 24(3): 566.
- [4] Porter, M., (1980) An algorithm for suffix stripping. Program 14(3), pp. 130-137.
- [5] http://www.tartarus.org/~martin/PorterStemmer/
- [6] Dolamic L., Savoy J. (2008) Stemming Approaches for East European Languages, In Advances in Multilingual and Multimodal Information Retrieval, Vol. 5152, pp. 37-44.
- [7] Weiss D. (2005) A Survey of Freely Available Polish Stemmers and Evaluation of Their Applicability in Information Retrieval. 2nd Language and Technology Conference, Poznań, Poland, pp. 216-221.
- [8] Voutilainen A. (2003). Part-of-speech tagging. In R. Mitkov, editor, The Oxford handbook of computational linguistics, pp. 219-232. Oxford University Press, New York, USA.
- [9] Manning C. D., (2011) Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? In: Computational Linguistics and Intelligent Text Processing, 12th International Conference, CICLing 2011, Proceedings, Part I.
- [10] http://faculty.washington.edu/dillon/GramResources/GramResources.html
- [11] Galus S. (2005) Dictionary-Based Part-of-Speech Tagging of Polish. In: Kłopotek M.A., Wierzchoń S.T., Trojanowski K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol. 31. Springer, Berlin, Heidelberg.
- [12] Fellbaum C., Miller G. (1998) WordNet. An Electronic Lexical Database. MIT Press.
- [13] Finlayson, M.A. (2014) Java Libraries for Accessing the Princeton Wordnet: Comparison and Evaluation. In: Proceedings of the 7th International Global WordNet Conference (GWC 2014) pp. 78-85. Tartu, Estonia.
- [14] Polish language dictionary, http://www.sjp.pl
- [15] Wrzeciono P., Karwowski W. (2013) Automatic Indexing and Creating Semantic Networks for Agricultural Science Papers in the Polish Language, Computer Software and Applications Conference Workshops (COMPSACW), 2013 IEEE 37th Annual, Kyoto.
- [16] Karwowski W., Wrzeciono P., (2014) Automatic indexer for Polish agricultural texts. Information Systems in Management, Vol. 3, No. 4, pp. 229-238.
- [17] Karwowski W., Wrzeciono P., (2017) Methods of automatic topic mining in publications in agriculture domain. Information Systems in Management, Vol. 6, No. 3, pp. 192-202.
- [18] The AGROVOC thesaurus, http://aims.fao.org/
- [19] plWordNet, http://plwordnet.pwr.wroc.pl/wordnet/
- [20] Drozdek A., Simon D.L., (1995) Data Structures in C, PWS Publishing Co.
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.ekon-element-000171480849