Article title

Rule Quality Measures Settings in Classification, Regression and Survival Rule Induction: an Empirical Approach

Identifiers
Title variants
Publication languages
EN
Abstracts
EN
The paper presents the results of research on the efficiency of so-called rule quality measures, which are used to evaluate the quality of rules at each stage of rule induction. The stages of rule growing and pruning were considered, along with the issue of conflict resolution, which may arise during classification. The work continues earlier research on the efficiency of quality measures employed in the sequential covering rule induction algorithm. In this paper we analyse only those quality measures (eight measures) that had been recognized as effective in previously conducted research. The study was conducted on approximately 70 benchmark datasets covering classification, regression and survival analysis problems. In the comparisons we analysed the prognostic abilities of the induced rules as well as the complexity of the resulting rule-based data models.
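The abstract above refers to rule quality measures evaluated on each rule during growing and pruning. As a purely illustrative sketch (not taken from the paper — the eight measures actually studied are defined there), a few classical measures of this kind can be computed from a rule's coverage counts:

```python
# Illustrative sketch only: classical rule quality measures computed from a
# rule's 2x2 contingency counts, where
#   p = positive examples covered by the rule, n = negatives covered,
#   P = all positives in the dataset,          N = all negatives.
# These are textbook measures, not necessarily the eight studied in the paper.

def precision(p, n, P, N):
    """Fraction of examples covered by the rule that are positive."""
    return p / (p + n)

def coverage(p, n, P, N):
    """Fraction of all positive examples that the rule covers."""
    return p / P

def laplace(p, n, P, N):
    """Precision smoothed toward 1/2; penalizes rules covering few examples."""
    return (p + 1) / (p + n + 2)

def m_estimate(p, n, P, N, m=2.0):
    """Precision smoothed toward the class prior P/(P+N) with strength m."""
    return (p + m * P / (P + N)) / (p + n + m)

if __name__ == "__main__":
    # A hypothetical rule covering 40 of 50 positives and 5 of 100 negatives:
    p, n, P, N = 40, 5, 50, 100
    print(precision(p, n, P, N))   # ≈ 0.889
    print(coverage(p, n, P, N))    # 0.8
```

During growing, a measure like this scores each candidate condition; during pruning, it decides whether removing a condition improves the rule; during classification, it can weight rules when resolving conflicts between them.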
Publisher
Year
Pages
419–449
Physical description
Bibliography: 74 items, figures, tables
Authors
author
  • Institute of Informatics, ul. Akademicka 16, 44-100 Gliwice, Poland
  • Institute of Innovative Technologies, EMAG Leopolda 31, 40-189 Katowice, Poland
author
  • Institute of Informatics, ul. Akademicka 16, 44-100 Gliwice, Poland
  • Institute of Innovative Technologies, EMAG Leopolda 31, 40-189 Katowice, Poland
author
  • Institute of Informatics, ul. Akademicka 16, 44-100 Gliwice, Poland
Bibliography
  • [1] Błaszczyński J, Słowiński R, Szeląg M. Sequential covering rule induction algorithm for variable consistency rough set approaches. Information Sciences. 2011;181(5):987–1002. Available from: http://dx.doi.org/10.1016/j.ins.2010.10.030. doi:10.1016/j.ins.2010.10.030.
  • [2] Fürnkranz J. Separate-and-conquer rule learning. Artificial Intelligence Review. 1999;13(1):3–54. Available from: http://dx.doi.org/10.1023/A:1006524209794. doi:10.1023/A:1006524209794.
  • [3] Grzymała-Busse J, Ziarko W. Data mining based on rough sets. In: Data Mining Opportunities and Challenges. Idea Group Publishing; 2003. p. 142–173.
  • [4] Kaufman K, Michalski R. Learning in Inconsistent World, Rule Selection in STAR/AQ18. Machine Learning and Inference Laboratory; 1999. Report P99-2 1999.
  • [5] Stanczyk U. Selection of decision rules based on attribute ranking. Journal of Intelligent and Fuzzy Systems. 2015;29(2):899–915. Available from: http://dx.doi.org/10.3233/IFS-151620. doi:10.3233/IFS-151620.
  • [6] Czogała E, Łęski J. Fuzzy and Neuro-Fuzzy Intelligent Systems. vol. 47 of Studies in Fuzziness and Soft Computing. Physica-Verlag; 2000. Available from: http://dx.doi.org/10.1007/978-3-7908-1853-6. doi:10.1007/978-3-7908-1853-6.
  • [7] Boser B, Guyon I, Vapnik V. A training algorithm for optimal margin classifiers. In: Proc. of the 5th Annual ACM Workshop on Computational Learning Theory; 1992. p. 144–152. Available from: http://dx.doi.org/10.1145/130385.130401. doi:10.1145/130385.130401.
  • [8] Dembczynski K, Kotłowski W, Słowiński R. ENDER: a statistical framework for boosting decision rules. Data Mining and Knowledge Discovery. 2010;21(1):52–90. Available from: http://dx.doi.org/10.1007/s10618-010-0177-7. doi:10.1007/s10618-010-0177-7.
  • [9] Agrawal R, Srikant R. Fast Algorithms for Mining Association Rules in Large Databases. In: Proc. of the 20th VLDB Conference. Santiago, Chile; 1994. p. 487–499.
  • [10] Kavsek B, Lavrac N. APRIORI-SD: Adapting association rule learning to subgroup discovery. Applied Artificial Intelligence. 2006;20(7):543–583. Available from: http://dx.doi.org/10.1007/978-3-540-45231-7_22. doi:10.1007/978-3-540-45231-7_22.
  • [11] Stefanowski J, Vanderpooten D. Induction of Decision Rules in Classification and Discovery-Oriented Perspectives. International Journal of Intelligent Systems. 2001;16(1):13–27.
  • [12] Lavrac N, Kavsek B, Flach P. Subgroup discovery with CN2-SD. Journal of Machine Learning Research. 2004;5:153–188.
  • [13] Geng L, Hamilton H. Interestingness measures for data mining: A survey. ACM Computing Surveys. 2006;38(6):art. no. 9. Available from: http://dx.doi.org/10.1145/1132960.1132963. doi:10.1145/1132960.1132963.
  • [14] McGarry K. A survey of interestingness measures for knowledge discovery. The Knowledge Engineering Review. 2005;20(1):39–61. Available from: http://dx.doi.org/10.1017/S0269888905000408. doi:10.1017/S0269888905000408.
  • [15] Sahar S. Interestingness measures - On determining what is interesting. In: Data Mining and Knowledge Discovery Handbook. Springer-Verlag; 2010. p. 603–612. Available from: http://dx.doi.org/10.1007/978-0-387-09823-4_30. doi:10.1007/978-0-387-09823-4_30.
  • [16] An A, Cercone N. Rule quality measures for rule induction systems: description and evaluation. Computational Intelligence. 2001;17(3):409–424. Available from: http://dx.doi.org/10.1111/0824-7935.00154. doi:10.1111/0824-7935.00154.
  • [17] Bruha I, Tkadlec J. Rule quality for multiple-rules classifier: Empirical expertise and theoretical methodology. Intelligent Data Analysis. 2003;7(2):99–124.
  • [18] Janssen F, Fürnkranz J. On the quest for optimal rule learning heuristics. Machine Learning. 2010;78:343–379. Available from: http://dx.doi.org/10.1007/s10994-009-5162-2. doi:10.1007/s10994-009-5162-2.
  • [19] Sikora M, Wróbel Ł. Data-driven Adaptive Selection of Rule Quality Measures for Improving Rule Induction and Filtration Algorithm. International Journal of General Systems. 2013;42(6):594–613. Available from: http://dx.doi.org/10.1080/03081079.2013.798901. doi:10.1080/03081079.2013.798901.
  • [20] Ślęzak D, Ziarko W, et al. The investigation of the Bayesian rough set model. International Journal of Approximate Reasoning. 2005;40(1):81–91.
  • [21] Michalski RS. Discovering Classification Rules Using Variable-valued Logic System VL. In: Proceedings of the 3rd International Joint Conference on Artificial Intelligence. IJCAI’73. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; 1973. p. 162–172. Available from: http://dl.acm.org/citation.cfm?id=1624775.1624795.
  • [22] Clark P, Niblett T. The CN2 Induction Algorithm. Machine Learning. 1989;3(4):261–283. Available from: http://dx.doi.org/10.1023/A:1022641700528. doi:10.1023/A:1022641700528.
  • [23] Sikora M. Induction and pruning of classification rules for prediction of microseismic hazards in coal mines. Expert Systems with Applications. 2011;38(6):6748–6758. Available from: http://www.sciencedirect.com/science/article/pii/S0957417410012972. doi:10.1016/j.eswa.2010.11.059.
  • [24] Tsumoto S. Mining diagnostic rules from clinical databases using rough sets and medical diagnostic model. Information Sciences. 2004;162(2):65–80. Medical Expert Systems. Available from: http://www.sciencedirect.com/science/article/pii/S0020025504000647. doi:10.1016/j.ins.2004.03.002.
  • [25] Napierala K, Stefanowski J. BRACID: a comprehensive approach to learning rules from imbalanced data. Journal of Intelligent Information Systems. 2012;39(2):335–373. Available from: http://dx.doi.org/10.1007/s10844-011-0193-0. doi:10.1007/s10844-011-0193-0.
  • [26] Hühn J, Hüllermeier E. FURIA: an algorithm for unordered fuzzy rule induction. Data Mining and Knowledge Discovery. 2009;19(3):293–319. Available from: http://dx.doi.org/10.1007/s10618-009-0131-8. doi:10.1007/s10618-009-0131-8.
  • [27] Lavrac N, Flach P, Zupan B. Rule Evaluation Measures: A Unifying View. Lecture Notes in Artificial Intelligence. 1999;1634:174–185. Available from: http://dx.doi.org/10.1007/3-540-48751-4_17. doi:10.1007/3-540-48751-4_17.
  • [28] Bench-Capon T, Dunne P, Možina M, Žabkar J, Bratko I. Argumentation in Artificial Intelligence: Argument based machine learning. Artificial Intelligence. 2007;171(10):922–937. Available from: http://www.sciencedirect.com/science/article/pii/S0004370207000690. doi:10.1016/j.artint.2007.04.007.
  • [29] Sikora M. Redefinition of Decision Rules Based on the Importance of Elementary Conditions Evaluation. Fundamenta Informaticae. 2013 Apr;123(2):171–197. Available from: http://dx.doi.org/10.3233/FI-2013-806. doi:10.3233/FI-2013-806.
  • [30] Riza LS, Janusz A, Bergmeir C, Cornelis C, Herrera F, Ślęzak D, et al. Implementing algorithms of rough set theory and fuzzy rough set theory in the R package – rough sets. Information Sciences. 2014;287:68–89.
  • [31] Amin T, Chikalov I, Moshkov M, Zielosko B. Relationships Between Length and Coverage of Decision Rules. Fundam Inform. 2014;129(1-2):1–13. Available from: http://dx.doi.org/10.3233/FI-2014-956. doi:10.3233/FI-2014-956.
  • [32] Breiman L, Friedman J, Olshen R, Stone C. Classification and Regression Trees. Monterey, CA: Wadsworth and Brooks; 1984.
  • [33] Quinlan JR. Learning with continuous classes. In: Proceedings of the Australian Joint Conference on Artificial Intelligence. Singapore: World Scientific; 1992. p. 343–348.
  • [34] Ženko B, Džeroski S, Struyf J. In: Bonchi F, Boulicaut JF, editors. Learning Predictive Clustering Rules. Berlin, Heidelberg: Springer Berlin Heidelberg; 2006. p. 234–250. Available from: http://dx.doi.org/10.1007/11733492_14. doi:10.1007/11733492_14.
  • [35] Janssen F, Fürnkranz J. Heuristic Rule-based Regression via Dynamic Reduction to Classification. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Two. IJCAI’11; 2011. p. 1330–1335.
  • [36] Friedman JH, Popescu BE. Predictive learning via rule ensembles. Ann Appl Stat. 2008 09;2(3):916–954. Available from: http://dx.doi.org/10.1214/07-AOAS148. doi:10.1214/07-AOAS148.
  • [37] Dembczyński K, Kotłowski W, Słowiński R. In: Rutkowski L, Tadeusiewicz R, Zadeh LA, Zurada JM, editors. Solving Regression by Learning an Ensemble of Decision Rules. Berlin, Heidelberg: Springer Berlin Heidelberg; 2008. p. 533–544. Available from: http://dx.doi.org/10.1007/978-3-540-69731-2_52. doi:10.1007/978-3-540-69731-2_52.
  • [38] Pattaraintakorn P, Cercone N. A foundation of rough sets theoretical and computational hybrid intelligent system for survival analysis. Computers & Mathematics with Applications. 2008;56(7):1699–1708. Available from: http://www.sciencedirect.com/science/article/pii/S0898122108002472. doi:10.1016/j.camwa.2008.04.030.
  • [39] Bazan J, Osmólski A, Skowron A, Ślęzak D, Szczuka M, Wróblewski J. In: Alpigini JJ, Peters JF, Skowron A, Zhong N, editors. Rough Set Approach to the Survival Analysis. Berlin, Heidelberg: Springer Berlin Heidelberg; 2002. p. 522–529. Available from: http://dx.doi.org/10.1007/3-540-45813-1_69. doi:10.1007/3-540-45813-1_69.
  • [40] Sikora M, Wróbel Ł, Mielcarek M, Kalwak K. Application of rule induction to discover survival factors of patients after bone marrow transplantation. Journal of Medical Informatics and Technologies. 2013;22:35–53.
  • [41] Kronek LP, Reddy A. Logical analysis of survival data: prognostic survival models by detecting high-degree interactions in right-censored data. Bioinformatics. 2008;24(16):i248–i253. doi:10.1093/bioinformatics/btn265.
  • [42] Chikalov I, Lozin V, Lozina I, Moshkov M, Nguyen HS, Skowron A, et al. In: Logical Analysis of Data: Theory, Methodology and Applications. Berlin, Heidelberg: Springer Berlin Heidelberg; 2013. p. 147–192. Available from: http://dx.doi.org/10.1007/978-3-642-28667-4_3. doi:10.1007/978-3-642-28667-4_3.
  • [43] Crama Y, Hammer PL, Ibaraki T. Cause-effect relationships and partially defined Boolean functions. Annals of Operations Research. 1988;16(1):299–325. Available from: http://dx.doi.org/10.1007/BF02283750. doi:10.1007/BF02283750.
  • [44] Liu X, Minin V, Huang Y, Seligson DB, Horvath S. Statistical methods for analysing tissue microarray data. Journal of Biopharmaceutical Statistics. 2004;14(3):671.
  • [45] Michael LeBlanc JC. Relative Risk Trees for Censored Survival Data. Biometrics. 1992;48(2):411–425. Available from: http://www.jstor.org/stable/2532300.
  • [46] Wróbel Ł. Tree-based induction of decision list from survival data. Journal of Medical Informatics and Technologies. 2012;20:73–78.
  • [47] Wróbel Ł, Sikora M. Censoring weighted separate-and-conquer rule induction from survival data. Methods of Information in Medicine. 2014;53(2):137.
  • [48] Bruha I. Quality of decision rules: Definitions and classification schemes for multiple rules. In: Machine Learning and Statistics: The Interface. John Wiley; 1997. p. 107–131.
  • [49] Fürnkranz J, Flach P. ROC ‘n’ Rule Learning - Towards a Better Understanding of Covering Algorithms. Machine Learning. 2005;39–77. Available from: http://dx.doi.org/10.1007/s10994-005-5011-x. doi:10.1007/s10994-005-5011-x.
  • [50] Yao Y, Zhong N. An analysis of quantitative measures associated with rules. Lecture Notes in Computer Science. 1999;1574:479–488. Available from: http://dx.doi.org/10.1007/3-540-48912-6_64. doi:10.1007/3-540-48912-6_64.
  • [51] Greco S, Pawlak Z, Słowiński R. Can Bayesian confirmation measures be useful for rough set decision rules? Engineering Applications of Artificial Intelligence. 2004;17(4):345–361. Available from: http://dx.doi.org/10.1016/j.engappai.2004.04.008. doi:10.1016/j.engappai.2004.04.008.
  • [52] Tan P, Kumar V, Srivastava J. Selecting the right interestingness measure for association patterns. In: Proc. of the 8th International Conference on Knowledge Discovery and Data Mining; 2002. p. 32–41. Available from: http://dx.doi.org/10.1145/775047.775053. doi:10.1145/775047.775053.
  • [53] Sikora M. Decision rule-based data models using TRS and NetTRS - methods and algorithms. Transaction on Rough Sets - Lecture Notes on Computer Science. 2010;5946:130–160. Available from: http://dx.doi.org/10.1007/978-3-642-11479-3_8. doi:10.1007/978-3-642-11479-3_8.
  • [54] Sikora M, Wróbel Ł. Data-driven Adaptive Selection of Rule Quality Measures for Improving the Rule Induction Algorithm. Lecture Notes in Computer Science. 2011;6743:279–287. Available from: http://dx.doi.org/10.1007/978-3-642-21881-1_44. doi:10.1007/978-3-642-21881-1_44.
  • [55] Sikora M, Skowron A, Wróbel Ł. In: Ramsay A, Agre G, editors. Rule Quality Measure-Based Induction of Unordered Sets of Regression Rules. Berlin, Heidelberg: Springer Berlin Heidelberg; 2012. p. 162–171. Available from: http://dx.doi.org/10.1007/978-3-642-33185-5_18. doi:10.1007/978-3-642-33185-5_18.
  • [56] Kaplan EL, Meier P. Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association. 1958;53(282):457–481.
  • [57] Wohlrab L, Fürnkranz J. A review and comparison of strategies for handling missing values in separate-and-conquer rule learning. Journal of Intelligent Information Systems. 2011;36(1):73–98. Available from: http://dx.doi.org/10.1007/s10844-010-0121-8. doi:10.1007/s10844-010-0121-8.
  • [58] Fawcett T. An introduction to ROC analysis. Pattern Recognition Letters. 2006;27:861–874. Available from: http://dx.doi.org/10.1016/j.patrec.2005.10.010. doi:10.1016/j.patrec.2005.10.010.
  • [59] Webb G, Brain D. Generality Is Predictive of Prediction Accuracy. Lecture Notes in Computer Science. 2006;3755:1–13. Available from: http://dx.doi.org/10.1007/11677437_1. doi:10.1007/11677437_1.
  • [60] Xiong H, Shekhar S, Tan PN, Kumar V. Exploiting a Support-based Upper Bound of Pearson’s Correlation Coefficient for Efficiently Identifying Strongly Correlated Pairs. In: Proceedings of the 10th ACM SIGKDD; 2004. p. 334–343. Available from: http://dx.doi.org/10.1145/1014052.1014090. doi:10.1145/1014052.1014090.
  • [61] Christensen D. Measuring confirmation. Journal of Philosophy. 1999;96(9):437–461. Available from: http://dx.doi.org/10.2307/2564707. doi:10.2307/2564707.
  • [62] Joyce J. The Foundations of Causal Decision Theory. Cambridge University Press; 1999.
  • [63] Brzezińska I, Słowiński R, Greco S. Mining Pareto-optimal Rules with Respect to Support and Confirmation or Support and Anti-support. Engineering Applications of Artificial Intelligence. 2007;20(5):587–600. Available from: http://dx.doi.org/10.1016/j.engappai.2006.11.015. doi:10.1016/j.engappai.2006.11.015.
  • [64] UCI Machine Learning Repository. Available from: http://archive.ics.uci.edu/ml.
  • [65] Hosmer DW, Lemeshow S, May S. Applied Survival Analysis: Regression Modeling of Time to Event Data (Wiley Series in Probability and Statistics). 2nd ed. Wiley-Interscience; 2008. Available from: http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/0471754994.
  • [66] R Core Team. R: A Language and Environment for Statistical Computing.
  • [67] Lange N, Ryan L, Billard L, Brillinger D, Conquest L, Greenhouse J, et al. Case Studies in Biometry. No. 1 in A Wiley-interscience publication. Wiley; 1994. Available from: https://books.google.pl/books?id=PZcpAQAAMAAJ.
  • [68] Wojnarski M, Janusz A, Nguyen HS, Bazan J, Luo CJ. RSCTC2010 Discovery Challenge: Mining DNA Microarrays Data for Medical Diagnosis and Treatment. Lecture Notes in Artificial Intelligence. 2010;6086:4–19. Available from: http://dx.doi.org/10.1007/978-3-642-13529-3_3. doi:10.1007/978-3-642-13529-3_3.
  • [69] Nemenyi P. Distribution-free Multiple comparisons [PhD thesis].
  • [70] Wilcoxon F. Individual Comparisons by Ranking Methods. Biometrics Bulletin. 1945;1(6):80–83. Available from: http://www.jstor.org/stable/3001968.
  • [71] Wróbel Ł, Sikora M, Skowron A. In: Sombattheera C, Loi NK, Wankar R, Quan T, editors. Algorithms for Filtration of Unordered Sets of Regression Rules. Berlin, Heidelberg: Springer Berlin Heidelberg; 2012. p. 284–295. Available from: http://dx.doi.org/10.1007/978-3-642-35455-7_26. doi:10.1007/978-3-642-35455-7_26.
  • [72] Witten I, Frank E. Data mining: practical machine learning tools and techniques. Morgan Kaufmann; 2005.
  • [73] Graf E, Schmoor C, Sauerbrei W, Schumacher M. Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine. 1999;18(17).
  • [74] Malara W, Sikora M, Wróbel L. An R package for induction and evaluation of classification rules. Studia Informatica. 2013;34(2B):339–352.
Notes
Prepared with funds of the Polish Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science-popularizing activities (tasks for 2017).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-5599b1fc-5b30-4b24-8e84-30bc169bb6a7