Article title

Analyzing the effect of dimensionality reduction in document categorization for Basque

Conference
Human Language Technologies as a challenge for Computer Science and Linguistics (2; 21-23.04.2005; Poznań, Poland)
Publication languages
EN
Abstracts
EN
This paper analyzes the effect that dimensionality reduction techniques have on the categorization of documents written in Basque. The classification techniques selected are Naive Bayes, Winnow, SVMs and k-NN. Singular Value Decomposition (SVD) is used for dimensionality reduction, together with lemmatization and noun selection. The results show that the approach combining SVD and k-NN on a lemmatized corpus gives the best accuracy of all, by a remarkable margin.
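The SVD plus k-NN combination named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy term counts, the category names, the rank of 2, the use of cosine similarity, and the helper names `fit_svd_basis`, `project` and `knn_predict` are all assumptions, since this record does not state the paper's corpus or parameters.

```python
from collections import Counter

import numpy as np

# Toy documents x terms count matrix. Columns stand in for lemmas; the
# vocabulary, counts, and category names are invented for illustration.
TRAIN = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 0, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 0],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 0, 1, 2],
], dtype=float)
LABELS = ["sports", "sports", "sports", "economy", "economy", "economy"]


def fit_svd_basis(train_matrix, rank):
    """Return the top `rank` right singular vectors as a terms x rank basis."""
    _, _, vt = np.linalg.svd(train_matrix, full_matrices=False)
    return vt[:rank].T


def project(vectors, basis):
    """Map term-space vectors (one document or a matrix of them) into the reduced space."""
    return vectors @ basis


def knn_predict(train_vecs, train_labels, query_vec, k=3):
    """Majority vote among the k training documents most cosine-similar to the query."""
    norms = np.linalg.norm(train_vecs, axis=1) * np.linalg.norm(query_vec)
    sims = (train_vecs @ query_vec) / np.where(norms == 0, 1.0, norms)
    nearest = np.argsort(-sims)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


basis = fit_svd_basis(TRAIN, rank=2)
reduced_train = project(TRAIN, basis)

# A new document that uses only terms seen in the first group of documents.
query = project(np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0]), basis)
print(knn_predict(reduced_train, LABELS, query, k=3))
```

New documents are projected with the same basis learned from the training matrix, so training and query vectors live in the same reduced space before neighbours are compared.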
Year
Pages
703--710
Physical description
Bibliography: 19 items, tables.
Authors
author
  • University of the Basque Country, UPV-EHU, Computer Science Faculty, 649 postakutxa, 20.080 Donostia, Gipuzkoa, Euskal-Heria, Spain
author
  • University of the Basque Country, UPV-EHU, Computer Science Faculty, 649 postakutxa, 20.080 Donostia, Gipuzkoa, Euskal-Heria, Spain
author
  • University of the Basque Country, UPV-EHU, Computer Science Faculty, 649 postakutxa, 20.080 Donostia, Gipuzkoa, Euskal-Heria, Spain
author
  • University of the Basque Country, UPV-EHU, Computer Science Faculty, 649 postakutxa, 20.080 Donostia, Gipuzkoa, Euskal-Heria, Spain
  • University of the Basque Country, UPV-EHU, Computer Science Faculty, 649 postakutxa, 20.080 Donostia, Gipuzkoa, Euskal-Heria, Spain, ccpzejaa@si.ehu.es
Bibliography
  • [1] I. Alegria, X. Artola, K. Sarasola and M. Urkia: Automatic Morphological Analysis of Basque. Literary & Linguistic Computing. 11 (1996).
  • [2] M. W. Berry and M. Browne: Understanding Search Engines: Mathematical Modeling and Text Retrieval. Society for Industrial and Applied Mathematics. ISBN: 0-89871-437-0. Philadelphia, 1999.
  • [3] M. W. Berry, S. T. Dumais and G. W. O'Brien: Using Linear Algebra For Intelligent Information Retrieval. SIAM Review, 37(4), (1995), 573-595.
  • [4] A. J. Carlson, C. M. Cumby, J. L. Rosen and D. Roth, SNoW. UIUC Tech Report UIUC-DCS-R-99-210. 1999, University of Illinois.
  • [5] I. Dagan, Y. Karov, and D. Roth. Mistake-Driven Learning In Text Categorization. In Proceedings of The 2nd Conference On Empirical Methods In Natural Language Processing, Pages 55-63, 1997.
  • [6] B. V. Dasarathy, Nearest neighbor (NN) norms: NN pattern recognition classification techniques. IEEE Computer Society Press, 1991.
  • [7] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science. 41:391-407, 1990.
  • [8] R. Dolin, J. Pierre, M. Butler and R. Avedon. Practical evaluation of IR within automated classification systems. Proceedings of the International Conference on Information and Knowledge Management CIKM, pages 322-329, November 1999.
  • [9] S. Dumais, Latent semantic analysis. ARIST (Annual Review of Information Science and Technology), 38:189-230, 2004.
  • [10] S. T. Dumais: Using LSI for information filtering: TREC-3 experiments. In D. Harman, editor, Third Text REtrieval Conference (TREC3), pages 219-230, 1995.
  • [11] N. Ezeiza, I. Aduriz, I. Alegria, J. M. Arriola and R. Urizar: Combining stochastic and rule-based methods for disambiguation in agglutinative languages. COLING-ACL'98, 1998.
  • [12] I. Inza, P. Larrañaga, R. Etxeberria and B. Sierra: Feature subset selection by Bayesian network-based optimization. Artificial Intelligence, 123:157-184, 2000.
  • [13] T. Joachims: Transductive inference for text classification using support vector machines. Proceedings of ICML-99, 16th International Conference on Machine Learning, pages 200-209, 1999.
  • [14] M. Minsky: Steps toward artificial intelligence. In Proceedings of the Institute of Radio Engineers, volume 49, pages 8-30.
  • [15] P. Nakov, E. Valchanova and G. Angelova: Towards deeper understanding of the LSA performance. In Proc. of the Int. Conference RANLP-03 "Recent Advances in Natural Language Processing", pages 311-318. Bulgaria, 2003.
  • [16] F. Sebastiani: Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1-47, March 2002.
  • [17] I. H. Witten and E. Frank: Data mining, practical machine learning tools and techniques with Java implementations. Morgan Kaufmann Publishers. 1999.
  • [18] D. Wolpert: Stacked generalization. Neural Networks. 5:241-259. 1992.
  • [19] Y. Yang and J. O. Pedersen: A comparative study on feature selection in text categorization. In Morgan Kaufmann, editor. Proceedings of the Fourteenth International Conference on Machine Learning, ICML'97, pages 412-420, 1997.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BSW3-0021-0026