

Article title

Towards the data structure for effective word search

Publication languages
EN
Abstracts
EN
This paper discusses the problem of finding the base forms of words in the Polish language. Polish is highly inflected, so an effective method for finding the base form is important in many NLP tasks, for example text indexing. A search method based on an open-source dictionary of the Polish language is presented. The key design question in this method is the structure that stores all words from the dictionary so that base forms can be found quickly. Two dictionary structures, a ternary search tree and an associative table, are presented and discussed. Tests are performed on six real texts and three crafted artificial texts, and the results are compared with other possible dictionary structures. Finally, conclusions about the effectiveness of the structures are formulated.
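The two structures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class and method names (`TernarySearchTree`, `insert`, `lookup`) and the handful of Polish word forms are assumptions chosen for illustration, and Python's built-in `dict` stands in for the associative table.

```python
from typing import Dict, Optional


class _Node:
    """One character of a stored word, with three child links."""
    __slots__ = ("ch", "lo", "eq", "hi", "base")

    def __init__(self, ch: str) -> None:
        self.ch = ch
        self.lo: Optional["_Node"] = None   # words whose current char is smaller
        self.eq: Optional["_Node"] = None   # next character of the same word
        self.hi: Optional["_Node"] = None   # words whose current char is larger
        self.base: Optional[str] = None     # base form, set only where a word ends


class TernarySearchTree:
    """Hypothetical TST that maps an inflected form to its base form."""

    def __init__(self) -> None:
        self._root: Optional[_Node] = None

    def insert(self, word: str, base: str) -> None:
        if not word:
            raise ValueError("empty word")
        self._root = self._insert(self._root, word, 0, base)

    def _insert(self, node: Optional[_Node], word: str, i: int, base: str) -> _Node:
        ch = word[i]
        if node is None:
            node = _Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i, base)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i, base)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1, base)
        else:
            node.base = base  # last character reached: store the base form
        return node

    def lookup(self, word: str) -> Optional[str]:
        node, i = self._root, 0
        while node is not None and word:
            ch = word[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(word):
                node = node.eq
                i += 1
            else:
                return node.base  # None if only a prefix is stored
        return None


# Toy data written by hand for this sketch, not taken from the dictionary.
forms: Dict[str, str] = {
    "kot": "kot", "koty": "kot", "kota": "kot", "kotem": "kot",
    "dom": "dom", "domu": "dom", "domy": "dom",
}

tst = TernarySearchTree()
for form, base in forms.items():
    tst.insert(form, base)

table = dict(forms)  # associative table: hash-based lookup of the same pairs
```

Both structures answer the same query, `tst.lookup("kotem")` and `table.get("kotem")`, so they can be swapped behind a single lookup function when comparing speed and memory use.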
Pages
227--236
Physical description
Bibliography: 11 items, tables.
Contributors
author
  • Department of Informatics, Warsaw University of Life Sciences (SGGW)
Bibliography
  • [1] Bentley J., Sedgewick R. (1998) Ternary Search Trees. Dr. Dobb's Journal, April 1998
  • [2] Cormen T. H., Leiserson C. E., Rivest R. L., Stein C. (2001) Chapter 11: Hash Tables. Introduction to Algorithms (2nd ed.), MIT Press and McGraw-Hill
  • [3] Karwowski W., Wrzeciono P. (2014) Automatic indexer for Polish agricultural texts. Information Systems in Management 2014, Vol. 3, no. 4, pp. 229-238
  • [4] Karwowski W., Wrzeciono P. (2017) Methods of automatic topic mining in publications in agriculture domain. Information Systems in Management 2016, Vol. 6 (3), pp. 192-202
  • [5] Karwowski W., Wrzeciono P. (2017) The dictionary structure for effective word search. Information Systems in Management 2017, Vol. 6 (4), pp. 284-293
  • [6] Mehlhorn K., Sanders P. (2008) Chapter 4: Hash Tables and Associative Arrays. Algorithms and Data Structures: The Basic Toolbox, Springer
  • [7] Morphosyntactic dictionary for the Polish language https://github.com/morfologik/
  • [8] Polish language dictionary, http://www.sjp.pl
  • [9] Stempel - Algorithmic Stemmer for Polish Language http://getopt.org/stempel/
  • [10] Weiss D. (2005) A Survey of Freely Available Polish Stemmers and Evaluation of Their Applicability in Information Retrieval. 2nd Language and Technology Conference, Poznań, Poland, pp. 216-221
  • [11] Wrzeciono P., Karwowski W. (2013) Automatic Indexing and Creating Semantic Networks for Agricultural Science Papers in the Polish Language, Computer Software and Applications Conference Workshops (COMPSACW), 2013 IEEE 37th Annual, Kyoto
Notes
Record prepared under agreement 509/P-DUN/2018 with funds from the Ministry of Science and Higher Education (MNiSW) allocated for science dissemination activities (2019).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-38e1a66d-42a7-4e9a-8229-4ac846f683fa