Exploit relations between the word letters and their placement in the word for Arabic root extraction

Hawas, F. A.

Article - details

Abstract

This paper presents a new root-extraction approach for Arabic words. The approach tries to assign a unique root to each Arabic word without relying on a database of word roots, a list of word patterns, or a list of all the prefixes and suffixes of Arabic words. Unlike most Arabic rule-based stemmers, it tries to predict the positions of the root letters one by one, based on rules and relations among the word's letters and their placement in the word. This paper focuses on two parts of the approach. The first introduces rules to distinguish between the Arabic definite article and the permanent component that may be found in any Arabic word. The second classifies Arabic letters into groups according to their positions in the word. The proposed approach is a system composed of several modules that together extract the word root. The approach has been evaluated using the words of the Holy Quran. The evaluation results suggest that the root-extraction algorithm is promising.

Keywords

rule-based stemmer; word root; suffixes; prefixes; words patterns

Publisher

Wydawnictwa AGH

Journal

Computer Science

Year

2013

Volume

Vol. 14 (2)

Pages

327--341

Physical description

Bibliography: 12 items, figures, tables.

Contributors

author

Hawas F. A.

fatmih@yu.edu.jo

Faculty of Information Technology and Computer Sciences, Yarmouk University, Irbid, Jordan

Bibliography

[1] Al-Sughaiyer I.A., Al-Kharashi I.A.: Arabic morphological analysis techniques: A comprehensive survey. Journal of the American Society for Information Science and Technology, 55(3):189–213, 2004
[2] Duwairi R.: Machine learning for Arabic text categorization. Journal of the American Society for Information Science and Technology (JASIST), 57(8):1005–1010, 2005
[3] Jurafsky D., Martin J.H.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice-Hall, 2007
[4] Khoja S., Garside R.: Stemming Arabic text. Technical report, Computing Department, Lancaster University, 1999
[5] Krovetz R.: Viewing morphology as an inference process. In Proc. of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 191–202, 1993
[6] Momani M., Faraj J.: A novel algorithm to extract tri-literal Arabic roots. In IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), pp. 309–315, May 2007
[7] Paice C.D.: Another stemmer. SIGIR Forum, 24(3):56–61, 1990
[8] Porter M.F.: An algorithm for suffix stripping. Program, 14(3):130–137, 1980
[9] Savoy J.: Stemming of French words based on grammatical categories. Journal of the American Society for Information Science, 44(1):1–9, 1993
[10] Savoy J.: A stemming procedure and stop word list for general French corpora. Journal of the American Society for Information Science, 50(10):944–952, 1999
[11] Shalabi R.A.: Pattern-based stemmer for finding Arabic roots. Information Technology Journal, 4(1):38–43, 2005
[12] Wikipedia: Arabic language. http://en.wikipedia.org/wiki/Arabic_language, 2013. Online; accessed 18-January-2013

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-50ffcad3-39e6-443f-b3d6-1474660738a7