Article title

Using information on class interrelations to improve classification of multiclass imbalanced data: A new resampling algorithm

The relations between multiple imbalanced classes can be handled with a specialized approach that evaluates the difficulty type of each example by analyzing the class distribution in its neighborhood, additionally exploiting information about the similarity of neighboring classes. In this paper, we demonstrate that such an approach can be implemented as a data preprocessing technique and that it can improve the performance of various classifiers on multiclass imbalanced datasets. This has led us to introduce a new resampling algorithm, called Similarity Oversampling and Undersampling Preprocessing (SOUP), which resamples examples according to their difficulty. Its experimental evaluation on real and artificial datasets has shown that it is competitive with the most popular decomposition ensembles and better than specialized preprocessing techniques for multi-imbalanced problems.
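The neighborhood-based difficulty evaluation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration: the function names `class_similarity` and `safe_levels` are hypothetical, and the similarity heuristic (the ratio of the smaller class size to the larger one) is only one plausible choice. The sketch scores examples; it does not perform the oversampling or undersampling step.

```python
import math
from collections import Counter


def class_similarity(counts, a, b):
    """Assumed heuristic: 1.0 for the same class, otherwise the ratio of
    the smaller class size to the larger one (a value in (0, 1])."""
    if a == b:
        return 1.0
    return min(counts[a], counts[b]) / max(counts[a], counts[b])


def safe_levels(points, labels, k=2):
    """Score each example by the mean similarity between its own class and
    the classes of its k nearest neighbours (Euclidean distance).
    Lower scores suggest more difficult examples."""
    counts = Counter(labels)
    scores = []
    for i, p in enumerate(points):
        # Rank every other example by its distance to the current one.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours = others[:k]
        total = sum(class_similarity(counts, labels[i], labels[j]) for j in neighbours)
        scores.append(total / len(neighbours))
    return scores


# Toy data: a majority cluster "A", a minority cluster "B",
# and one "B" example placed inside the "A" cluster.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1),
          (10, 10), (11, 10), (10, 11), (0.5, 0.5)]
labels = ["A"] * 8 + ["B"] * 4
scores = safe_levels(points, labels, k=2)

# The isolated "B" example (last index) receives the lowest score.
hardest = min(range(len(scores)), key=scores.__getitem__)
print(hardest, scores[hardest])
```

A resampling step built on such scores could, for example, replicate low-scoring minority examples and remove low-scoring majority examples; how SOUP actually does this is described in the paper itself, not in this record.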
Physical description
Bibliography: 27 items, figures, tables.
  • Institute of Computing Sciences, Poznan University of Technology, ul. Piotrowo 2, 60-965 Poznań, Poland
Record prepared under agreement 509/P-DUN/2018 with funds from the Ministry of Science and Higher Education (MNiSW) allocated to science dissemination activities (2019).