Rare class text categorization with SVM ensemble

Silva, C.; Ribeiro, B.

Article details

Article title

Rare class text categorization with SVM ensemble

Authors

Silva C., Ribeiro B.

Selected full texts from this journal

http://pe.org.pl/

Title variants

Kategoryzacja tekstu klasy rzadkiej w oparciu o zespoły SVM

Conference

PELINCEC Workshop "Bridges Through Time: Intelligent Control, Signal Processing and Real-Time Process Control"

Abstracts

Text classification is the assignment of a class from a predetermined set to a new document. In real-world applications the number of positive examples for most classes is limited, while the overall number of examples is huge. In this setting a classifier's performance can degrade sharply, especially where false negatives are concerned. To handle this problem, we propose a committee of several SVMs, in which the learning strategy uses the separating margin as a differentiating factor on positive classifications. While enabling robustness, the method improves performance by correcting the errors of one classifier using the accurate output of others. We demonstrate the practicality and effectiveness of the method with simulation results on the Reuters-21578 data set.

Text categorization is the assignment of a new text to the appropriate category from a previously defined set. In practical applications the number of examples for most classes is limited, while the total amount of input data is huge. Under these conditions, building a classifier that performs its task well is not trivial. To solve this problem, a committee of several SVM structures is proposed, in which learning is based on maximizing the separation margin between two different classes. The method introduces robustness by correcting the output of one of the classifiers using information from the outputs of the others. The effectiveness of the method is illustrated by simulations on the Reuters-21578 data set.
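
The committee idea described in the abstracts can be illustrated with a small sketch. The record does not give the authors' actual combination rule, so the rule below is an assumption made only for illustration: each committee member supplies a signed decision value (its distance from the separating hyperplane), and a positive vote wins only if its margin is at least as large as the strongest negative margin. The names `committee_predict` and `min_positive_margin` are hypothetical.

```python
from typing import Sequence


def committee_predict(margins: Sequence[float],
                      min_positive_margin: float = 0.0) -> int:
    """Combine signed SVM decision values for one document into a label.

    margins: one decision value per committee member; a value above zero
    means that member votes for the rare (positive) class.
    min_positive_margin: assumed confidence threshold for positive votes.
    Returns +1 (rare class) or -1.
    """
    if not margins:
        raise ValueError("need at least one committee member")

    # Positive votes count only when their margin exceeds the threshold.
    confident_pos = [m for m in margins if m > min_positive_margin]
    if not confident_pos:
        return -1

    # Strongest negative vote, expressed as a non-negative margin.
    strongest_neg = max((-m for m in margins if m <= 0), default=0.0)

    # Assumed rule: the rare class wins when its best margin is at least
    # as large as the best margin on the negative side.
    return 1 if max(confident_pos) >= strongest_neg else -1


if __name__ == "__main__":
    # Two members miss the rare class with small margins; a third member
    # is confident, so the committee recovers the false negatives.
    print(committee_predict([-0.2, 1.5, -0.1]))
```

Under this assumed rule, a single confident member can correct weak false negatives from the others, which is the effect the abstract attributes to the committee; a real system would obtain the decision values from trained SVMs.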

Keywords

text classification, SVM ensemble

text categorization, SVM ensembles

Publisher

Wydawnictwo SIGMA-NOT

Journal

Przegląd Elektrotechniczny

Year

2006

Volume

Vol. 82, no. 1

Pages

28-31

Physical description

Bibliography: 10 items, figures, tables.

Contributors

author

Silva C.

author

Ribeiro B.

Instituto Politécnico de Leiria, Universidade de Coimbra, catarina@dei.uc.pt

Bibliography

[1] Yan-Shi Dong, Ke-Song Han, "Boosting SVM Classifiers by Ensemble", WWW 2005, 1072-1073, 2005.
[2] Y. Yang, X. Liu, "A Re-Examination of Text Categorization Methods", in Proceedings of SIGIR-99, 22nd ACM International Conference on Research and Development in Information Retrieval, ACM Press, pp.42-49, 1999.
[3] Thorsten Joachims, "Learning to Classify Text Using Support Vector Machines", The Kluwer International Series in Engineering and Computer Science, Kluwer Academic Publishers, 2002.
[4] Fabrizio Sebastiani, "Machine Learning in Automated Text Categorization", ACM Computing Surveys, Vol. 34, No. 1, March 2002, pp. 1-47.
[5] Catarina Silva, Bernardete Ribeiro, "Labeled and Unlabeled Data in Text Categorization", IEEE International Joint Conference on Neural Networks, 1661-1666, 2004.
[6] Jian Zhang, Yiming Yang, "Robustness of Regularized Linear Classification Methods in Text Categorization", SIGIR'03, pp. 190-197.
[7] R. Yan, A. Hauptmann, R. Jin, Y. Liu, "On Predicting Rare Class with SVM Ensemble in Scene Classification", IEEE Int. Conference on Acoustics, Speech and Signal Processing (ICASSP'03), 2003.
[8] Yan-Shi Dong, Ke-Song Han, "A Comparison of Several Ensemble Methods for Text Categorization", IEEE International Conference on Services Computing, 2004.
[9] Vladimir Vapnik, The Nature of Statistical Learning Theory, Springer, 1995.
[10] B. Schölkopf, C. Burges, A. Smola, Advances in Kernel Methods - Introduction to Support Vector Learning, MIT Press, pp. 1-15, 1999.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BAR0-0014-0025