Post-processing of BRACID Rules Induced from Imbalanced Data

Napierala, K.; Stefanowski, J.

doi:10.3233/FI-2016-1422

Article - details

Selected full texts from this journal

https://fi.episciences.org/

Conference

Rough Set Theory Workshop (RST’2015); (6; 29-06-2015; University of Warsaw)

Abstract

Rule-based classifiers constructed from imbalanced data fail to correctly classify instances from the minority class. Solutions to this problem should address both data-related and algorithmic difficulty factors. The new algorithm BRACID addresses these factors more comprehensively than other proposals. An experimental evaluation of its classification abilities shows that it significantly outperforms other rule-based approaches specialized for imbalanced data. However, it may generate too many rules, which hinders human interpretation of the discovered rules. Therefore, a method for post-processing BRACID rules is presented. It aims to select rules that have high support, in particular for the minority class, and that cover diversified subsets of examples. Experimental studies confirm its usefulness.
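The abstract does not spell out the selection procedure. Purely as an illustration of the stated goal (high support, preference for the minority class, diversified coverage), the following is a minimal sketch of a generic greedy coverage filter. It is not the authors' post-processing method; the names Rule, select_rules and minority_bonus, the weighting scheme and the toy data are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    target: str          # class predicted by the rule
    covered: frozenset   # ids of training examples the rule covers correctly


def select_rules(rules, minority, k, minority_bonus=2.0):
    """Greedily pick up to k rules that add the most not-yet-covered examples.

    Hypothetical scoring: the number of newly covered examples, multiplied
    by minority_bonus when the rule predicts the minority class.
    """
    selected, seen = [], set()
    candidates = list(rules)
    while candidates and len(selected) < k:
        def gain(rule):
            new = len(rule.covered - seen)  # support not explained by earlier picks
            weight = minority_bonus if rule.target == minority else 1.0
            return weight * new

        best = max(candidates, key=gain)
        if gain(best) == 0:
            break  # every remaining rule is redundant with the selection
        selected.append(best)
        seen |= best.covered
        candidates.remove(best)
    return selected


# Toy data (invented): r2 covers only examples that r1 already covers.
rules = [
    Rule("r1", "minority", frozenset({1, 2, 3})),
    Rule("r2", "minority", frozenset({2, 3})),
    Rule("r3", "majority", frozenset({10, 11, 12, 13})),
    Rule("r4", "minority", frozenset({4})),
]
chosen = [r.name for r in select_rules(rules, minority="minority", k=3)]
print(chosen)
```

In this toy run the redundant rule r2 is skipped because it adds no new examples once r1 has been chosen, which is the sense in which a filter of this kind favours diversified coverage.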

Keywords

rule induction, class imbalance, interpretability of rules, filtering of rules

Publisher

IOS Press

Journal

Fundamenta Informaticae

Year

2016

Volume

Vol. 148, no. 1/2

Pages

51–64

Physical description

Bibliography: 33 items, tables.

Contributors

author

Napierala K.

krystyna.napierala@datax.pl

DATAX sp. z o.o., 53-609 Wroclaw, Poland

author

Stefanowski J.

Jerzy.Stefanowski@cs.put.poznan.pl

Institute of Computing Sciences, Poznan University of Technology, 60-965 Poznan, Poland

Bibliography

[1] An A. Learning classification rules from data. Computers and Mathematics with Applications. 2003; 45:737–748. doi:10.1016/S0898-1221(03)00034-8.
[2] Bayardo R, Agrawal R. Mining the most interesting rules. Proc. 5th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining. 1999, pp. 145–154. Available from: http://doi.acm.org/10.1145/312129.312219.
[3] Brzezinska-Szczech I, Greco S, Slowinski R. Mining Pareto-optimal rules with respect to support and antisupport. Engineering Applications of Artificial Intelligence. 2007;20(5):587–600. doi:10.1016/j.engappai.2006.11.015.
[4] Chawla N, Bowyer K, Hall L, Kegelmeyer W. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 2002;16(1):321–357. Available from: http://dl.acm.org/citation.cfm?id=1622407.1622416.
[5] Domingos P. The RISE System: Conquering without Separating. Sixth IEEE International Conference on Tools with Artificial Intelligence. Proceedings Sixth International Conference of ICTAI. 1994; pp. 704–707. doi:10.1109/TAI.1994.346421.
[6] Furnkranz J, Gamberger D, Lavrac N. Foundations of Rule Learning. Cognitive Technologies, Springer Verlag, 2012. ISBN: 978-3-540-75196-0, 978-3-540-75197-7.
[7] Gamberger D, Lavrac N. Expert-guided subgroup discovery: methodology and application. Journal of Artificial Intelligence Research. 2002;17(1):501–527.
[8] Garcia V, Sanchez JS, Mollineda RA. An empirical study of the behaviour of classifiers on imbalanced and overlapped data sets. Proceedings 12th Iberoamerican conference on Progress in pattern recognition, image analysis and applications (CIARP) 2007, LNCS 4756, Springer-Verlag, 2007, pp. 397-406. doi:10.1007/978-3-540-76725-1_42.
[9] Greco S, Slowinski R, Stefanowski J. Evaluating importance of conditions in the set of decision rules. Proceedings 11th International Conference, RSFDGrC 2007, Toronto, Canada, May 14-16, 2007. LNAI 4482, Springer Verlag, 2007, pp. 314–321. doi:10.1007/978-3-540-72530-5_37.
[10] Grzymala-Busse JW. Rule Induction from Rough Approximations, in: Handbook of Computational Intelligence (Kacprzyk J., Pedrycz W. (eds)), Springer Berlin Heidelberg, 2015, pp. 371–385. doi:10.1007/978-3-662-43505-2_23.
[11] Grzymala-Busse JW, Goodwin LK, Grzymala-Busse WJ, Zheng X. An approach to imbalanced data sets based on changing rule strength. Proc. Learning from Imbalanced Data Sets, AAAI Workshop at the 17th Conference on AI, AAAI-2000, 2000, pp. 69–74.
[12] Grzymala-Busse JW, Stefanowski J, Wilk Sz. A comparison of two approaches to data mining from imbalanced data. Journal of Intelligent Manufacturing, 2005;16(6):565–573. doi:10.1007/s10845-005-4362-2.
[13] He H, Ma Y (eds). Imbalanced Learning. Foundations, Algorithms and Applications, Wiley-IEEE Press, 2013. ISBN: 1118074629, 9781118074626.
[14] Hoens TR, Qian Q, Chawla NV, Zhou Z-H. Building Decision Trees for the Multi-class Imbalance Problem. Proceedings 16th Pacific-Asia Conference, PAKDD 2012, Part I, 2012, pp. 122–134. doi:10.1007/978-3-642-30217-6_11.
[15] Japkowicz N. Class imbalance: Are we focusing on the right issue? Proceedings II of Workshop on Learning from Imbalanced Data Sets, ICML Conference, 2003, pp. 17–23.
[16] Klosgen W. Explora: A multipattern and multistrategy discovery assistant. Advances in knowledge discovery and data mining (Fayyad U. et al. (Eds)), 1996;23:249–271. Available from: http://dl.acm.org/citation.cfm?id=257938.257965.
[17] Liu J, Hu Q, Yu D. A weighted rough set based method developed for class imbalance. Information Sciences, 2008;178(4):1235–1256. doi:10.1016/j.ins.2007.10.002.
[18] Lopez V, Fernandez A, Garcia S, Palade V, Herrera F. An Insight into Classification with Imbalanced Data: Empirical Results and Current Trends on Using Data Intrinsic Characteristics. Information Sciences, 2013;250:113–141. doi:10.1016/j.ins.2013.07.007.
[19] Napierala K, Stefanowski J. Addressing imbalanced data with argument based rule learning. Expert Systems With Applications, 2015;42(24):9468–9481. doi:10.1016/j.eswa.2015.07.076.
[20] Napierala K, Stefanowski J. The influence of minority class distribution on learning from imbalance data, Proceedings 7th International Conference, HAIS 2012, Salamanca. Part II, LNAI 7209, Springer 2012, pp. 139–150. doi:10.1007/978-3-642-28931-6_14.
[21] Napierala K, Stefanowski J. BRACID: a comprehensive approach to learning rules from imbalanced data. Journal of Intelligent Information Systems, 2012;39(2):335–373. doi:10.1007/s10844-011-0193-0.
[22] Napierala K, Stefanowski J. Types of Minority Class Examples and Their Influence on Learning Classifiers from Imbalanced Data. Journal of Intelligent Information Systems, 2016;46(3):563–597. doi:10.1007/s10844-015-0368-1.
[23] Own HS, Abd N, Abraham A. A new weighted rough set framework for imbalance class distribution. International Conference of Soft Computing and Pattern Recognition, SoCPaR. Proceedings International Conference, 7-10 December 2010, pp. 29–34. doi:10.1109/SOCPAR.2010.5685849.
[24] Sikora M, Wrobel L. Data-driven adaptive selection of rule quality measures for improving rule induction and filtration algorithms. International Journal of General Systems, 2013;42(6):594–613. doi:10.1080/03081079.2013.798901.
[25] Slowinski K, Stefanowski J, Siwinski D. Application of rule induction and rough sets to verification of magnetic resonance diagnosis. Fundamenta Informaticae, 2002;53(3/4):345-363.
[26] Stefanowski J. On combined classifiers, rule induction and rough sets. Transactions on Rough Sets, 2007; 6:329–350. Available from: http://dl.acm.org/citation.cfm?id=1768306.1768325.
[27] Stefanowski J, Wilk S. Rough sets for handling imbalanced data: combining filtering and rule-based classifiers. Fundamenta Informaticae, 2006;72(1-3):379-391. Available from: http://dl.acm.org/citation.cfm?id=2369376.2369404.
[28] Stefanowski J, Wilk S. Extending rule-based classifiers to improve recognition of imbalanced classes, in Advances in Data Management, Studies in Computational Intelligence (Ras Z, Dardzinska A. (eds)), Springer 2009;223:131–154. doi:10.1007/978-3-642-02190-9_7.
[29] Wang S, Yao X. Multiclass Imbalance Problems: Analysis and Potential Solutions. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2012;42(4):1119–1130. doi:10.1109/TSMCB.2012.2187280.
[30] Weiss GM. Mining with rarity: a unifying framework. ACM SIGKDD Explorations Newsletter, 2004;6(1):7–19. doi:10.1145/1007730.1007734.
[31] Weiss GM, Provost F. Learning when training data are costly: the effect of class distribution on tree induction. Journal of Artificial Intelligence Research, 2003;19(1):315–354. Available from: http://dl.acm.org/citation.cfm?id=1622434.1622445.
[32] Wilson DR, Martinez TR. Improved heterogeneous distance functions. Journal of Artificial Intelligence Research, 1997;6(1):1–34.
[33] Yao YY, Zhong N. An analysis of quantitative measures associated with rules. Methodologies for Knowledge Discovery and Data Mining. Proceedings Third Pacific-Asia Conference, PAKDD-99 Beijing, China. LNAI 1574, Springer, 1999, pp. 479–488. doi:10.1007/3-540-48912-6_64.

Notes

Record prepared with funds from the Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science dissemination activities (tasks for 2017).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-01f6030a-10c2-4b7a-9231-f90f7760d655