Data preprocessing in the classification of the imbalanced data

Borowska, K.; Topczewska, M.

Title variants

Przetwarzanie wstępne w problemie klasyfikacji danych niezrównoważonych

Abstracts

The article concerns the problem of imbalanced data classification. Two algorithms improving the standard SMOTE method have been created and tested. To measure the distance between objects the Euclidean or the HVDM metric was applied, depending on the number of nominal attributes in a dataset.

The article concerns the classification problem in the case of imbalanced classes. To address it, two algorithms were created that improve on the results obtained with the standard SMOTE algorithm. To measure the distance between objects, the Euclidean metric or the HVDM metric was applied, depending on the number of nominal attributes in the dataset.
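
For orientation, the standard SMOTE method that the abstract takes as its baseline can be sketched as below. This is a minimal illustration of the published baseline (Chawla et al.) with the Euclidean metric only; it is not either of the two improved algorithms described in the article, and it does not implement HVDM, which would be needed for nominal attributes. The function names `euclidean` and `smote` and the parameters `per_sample`, `k` and `seed` are assumptions made for this example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, per_sample=1, k=5, seed=0):
    """Create synthetic minority samples by interpolating towards neighbours.

    minority   -- list of numeric feature vectors from the minority class
    per_sample -- number of synthetic points generated per minority sample
    k          -- number of nearest minority neighbours to choose from
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for i, base in enumerate(minority):
        # k nearest minority-class neighbours of the base sample (itself excluded)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: euclidean(base, p))[:k]
        for _ in range(per_sample):
            target = rng.choice(neighbours)
            gap = rng.random()  # random position on the segment base -> target
            synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic


# Toy minority class with two numeric attributes (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, per_sample=2, k=2)
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours, so the new objects stay inside the region already occupied by the minority class.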

Keywords

class imbalance, oversampling, classification

imbalanced classes, classification, new object creation

Publisher

Oficyna Wydawnicza Politechniki Białostockiej

Journal

Advances in Computer Science Research

Year

2014

Volume

No. 11

Pages

31–46

Physical description

Bibliography: 12 items, figures.

Contributors

author

Borowska K.

Student of Faculty of Computer Science, Bialystok University of Technology, Białystok, Poland

author

Topczewska M.

Faculty of Computer Science, Bialystok University of Technology, Białystok, Poland

Bibliography

[1] G. M. Weiss, Mining with Rarity: A Unifying Framework, SIGKDD Explor. Newsl., Springer Berlin Heidelberg, 6(1), 7–19, 2004.
[2] S. Barua, Md. M. Islam, K. Murase, A Novel Synthetic Minority Oversampling Technique for Imbalanced Data Set Learning, Neural Information Processing, Springer Berlin Heidelberg, 7063, 735–744, 2011.
[3] V. Garcia, R. A. Mollineda, J. S. Sanchez, On the k-NN performance in a challenging scenario of imbalance and overlapping, Pattern Analysis and Applications, Springer-Verlag, 11, 269–280, 2008.
[4] H. He, E. A. Garcia, Learning from Imbalanced Data, IEEE Trans. on Knowl. and Data Eng., 21(9), 1263–1284, 2009.
[5] T. Jo, N. Japkowicz, Class Imbalances Versus Small Disjuncts, SIGKDD Explor. Newsl., 6(1), 40–49, 2004.
[6] N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, SMOTE: Synthetic Minority Over-sampling Technique, J. Artif. Int. Res., 16(1), 321–357, 2002.
[7] S. Hu, Y. Liang, L. Ma, Y. He, MSMOTE: Improving Classification Performance When Training Data is Imbalanced, Computer Science and Engineering, 2, 13–17, 2009.
[8] N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, SMOTE: Synthetic Minority Over-sampling Technique, J. Artif. Int. Res., 16(1), 321–357, 2002.
[9] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, F. Herrera, A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 42(4), 463–484, 2012.
[10] G. E. A. P. A. Batista, D. F. Silva, How k-Nearest Neighbor Parameters Affect its Performance, Argentine Symposium on Artificial Intelligence, 1–12, 2009.
[11] K. Napierała, J. Stefanowski, S. Wilk, Learning from Imbalanced Data in Presence of Noisy and Borderline Examples, Proceedings of the 7th International Conference on Rough Sets and Current Trends in Computing, Springer-Verlag, Warsaw, 2010.
[12] Y. Sun, M. S. Kamel, A. K. C. Wong, Y. Wang, Cost-sensitive boosting for classification of imbalanced data, Pattern Recognition, 40(12), 3358–3378, 2007.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-11da4b3d-a470-4e32-abf4-ae69d1b8cdc3