This paper analyzes the impact of dimensionality reduction techniques on the categorization of text documents written in Basque. Four classification techniques were selected: Naive Bayes, Winnow, SVMs, and k-NN. Singular Value Decomposition (SVD) was used for dimensionality reduction, together with lemmatization and noun selection. The results show that combining SVD and k-NN on a lemmatized corpus gives the best accuracy of all the approaches tested, by a remarkable margin.
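As a rough illustration of the SVD plus k-NN combination, the following sketch reduces a small term-by-document count matrix with a truncated SVD and classifies a new document by majority vote among its nearest neighbours in the reduced space. Everything specific here is an assumption made for the example and is not taken from the paper: the toy matrix (English words standing in for Basque lemmas), raw counts as term weights, the rank of 2, cosine similarity, `k=3`, and the hypothetical helper names `fit_lsi`, `project`, and `knn_predict`.

```python
from collections import Counter

import numpy as np


def fit_lsi(term_doc, rank):
    """Truncated SVD of a (terms x documents) matrix.

    Returns U_k and the document coordinates V_k * S_k in the reduced space.
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    u_k = u[:, :rank]
    doc_vecs = vt[:rank, :].T * s[:rank]
    return u_k, doc_vecs


def project(u_k, term_vec):
    """Map a new term-count vector into the same space (q^T U_k)."""
    return term_vec @ u_k


def knn_predict(doc_vecs, labels, query, k=3):
    """Majority vote among the k documents most cosine-similar to query."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query)
    sims = (doc_vecs @ query) / np.where(norms == 0, 1.0, norms)
    nearest = np.argsort(-sims)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy corpus (assumed data): rows are terms, columns are documents.
# Terms: goal, match, team, bank, market, price
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)
labels = ["sport", "sport", "economy", "economy"]

u_k, doc_vecs = fit_lsi(A, rank=2)

# A new document mentioning "goal" and "match" once each.
new_doc = np.array([1, 1, 0, 0, 0, 0], dtype=float)
predicted = knn_predict(doc_vecs, labels, project(u_k, new_doc), k=3)
```

In this sketch the training documents and the new document are compared in the rank-2 space rather than in the original term space, which is the general idea behind pairing SVD with a distance-based classifier such as k-NN.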