Apriori-Based Rule Generation in Incomplete Information Databases and Non-Deterministic Information Systems

Sakai, H.; Wu, M.; Nakata, M.

doi:10.3233/FI-2014-995

Article - details

Article title

Apriori-Based Rule Generation in Incomplete Information Databases and Non-Deterministic Information Systems

Authors

Sakai H., Wu M., Nakata M.

Selected full texts from this journal

https://fi.episciences.org/

Identifiers

DOI

10.3233/FI-2014-995

Title variants

Publication languages

Abstracts

This paper discusses issues related to incomplete information databases and considers a logical framework for rule generation. In our approach, a rule is an implication satisfying specified constraints. The term incomplete information databases covers many types of inexact data, such as non-deterministic information, data with missing values, incomplete information or interval-valued data. In the paper, we start by defining certain and possible rules based on non-deterministic information. We use their mathematical properties to solve computational problems related to rule generation. Then, we reconsider the NIS-Apriori algorithm, which generates a given implication if and only if it is either a certain rule or a possible rule satisfying the constraints. In this sense, NIS-Apriori is logically sound and complete. In this paper, we pay special attention to the soundness and completeness of the considered algorithmic framework, which is not necessarily obvious when switching from exact to inexact data sets. Moreover, we analyze different types of non-deterministic information corresponding to different types of the underlying attributes, i.e., value sets for qualitative attributes and intervals for quantitative attributes, and we discuss various approaches to the construction of descriptors related to particular attributes within the rules' premises. An improved implementation of NIS-Apriori and some demonstrations of an experimental application of our approach to data sets taken from the UCI machine learning repository are also presented. Last but not least, we show simplified proofs of some of our theoretical results.
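
The notions of certain and possible rules mentioned in the abstract can be illustrated with a small brute-force sketch. It is not the NIS-Apriori algorithm, which according to the abstract relies on mathematical properties of the rules to handle the computational problems; it only enumerates every deterministic table derivable from a toy non-deterministic table. The following are assumptions for illustration and do not come from this record: the constraints are minimum support and minimum accuracy thresholds, a certain rule satisfies them in every derived table, a possible rule satisfies them in at least one, and the data and all names (`NIS`, `derived_tables`, `satisfies`, `classify`) are hypothetical.

```python
from itertools import product

# Hypothetical toy table: every cell holds the set of values it might take.
NIS = [
    {"temp": {"high"}, "flu": {"yes"}},
    {"temp": {"high", "normal"}, "flu": {"yes"}},
    {"temp": {"normal"}, "flu": {"no"}},
    {"temp": {"high"}, "flu": {"yes", "no"}},
]


def derived_tables(nis):
    """Yield each deterministic table obtained by fixing one value per cell."""
    attrs = sorted(nis[0])
    cells = [sorted(row[a]) for row in nis for a in attrs]
    for choice in product(*cells):
        values = iter(choice)
        yield [{a: next(values) for a in attrs} for _ in nis]


def satisfies(table, premise, conclusion, min_supp, min_acc):
    """Check the assumed support and accuracy constraints in one table."""
    p_attr, p_val = premise
    c_attr, c_val = conclusion
    matched = [r for r in table if r[p_attr] == p_val]
    both = [r for r in matched if r[c_attr] == c_val]
    if not matched:
        return False
    support = len(both) / len(table)
    accuracy = len(both) / len(matched)
    return support >= min_supp and accuracy >= min_acc


def classify(nis, premise, conclusion, min_supp, min_acc):
    """Label an implication as certain, possible, or neither (assumed semantics)."""
    results = [
        satisfies(t, premise, conclusion, min_supp, min_acc)
        for t in derived_tables(nis)
    ]
    if all(results):
        return "certain"
    if any(results):
        return "possible"
    return "neither"


if __name__ == "__main__":
    print(classify(NIS, ("temp", "high"), ("flu", "yes"), 0.25, 0.6))
```

The number of derived tables grows exponentially with the number of non-deterministic cells, so this enumeration is only workable on toy data; that cost is what motivates an approach that avoids listing the derived tables.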

Keywords

association rules; decision rules; incomplete information; non-deterministic information; set and interval valued data sets; rough set approximations; apriori algorithm; apriori algorithm extensions; apriori algorithm implementation; implementation; soundness; completeness

Publisher

IOS Press

Journal

Fundamenta Informaticae

Year

2014

Volume

Vol. 130, no. 3

Pages

343–376

Physical description

Bibliography: 56 items, figures, tables.

Contributors

author

Sakai H.

sakai@mns.kyutech.ac.jp

Department of Basic Sciences, Faculty of Engineering, Kyushu Institute of Technology, Tobata, Kitakyushu, 804-8550, Japan

author

Wu M.

wumogaku@yahoo.co.jp

Department of Integrated System Engineering, Kyushu Institute of Technology, Tobata, Kitakyushu, 804-8550, Japan

author

Nakata M.

nakatam@ieee.org

Faculty of Management and Information Science, Josai International University, Gumyo, Togane, Chiba 283, Japan

Bibliography

[1] Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases, Proc. VLDB’94 (J.B. Bocca, M. Jarke, C. Zaniolo, Eds.), Morgan Kaufmann, 1994, 487–499.
[2] Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I.: Fast discovery of association rules, in: Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1996, 307–328.
[3] Blackburn, P., de Rijke, M., Venema, Y.: Modal logic, Cambridge University Press, 2001.
[4] Ceglar, A., Roddick, J.F.: Association mining, ACM Computing Surveys, 38(2), 2006.
[5] Chikalov, I., Moshkov, M., Zelentsova, M.: On optimization of decision trees, Transactions on Rough Sets, 4, 2005, 18–36.
[6] Chmielewski, M.R., Grzymała-Busse, J.W.: Global discretization of continuous attributes as preprocessing for machine learning, International Journal of Approximate Reasoning, 15(4), 1996, 319–331.
[7] Codd, E.F.: A relational model of data for large shared data banks, Communications of the ACM, 13(6), 1970, 377–387.
[8] Frank, A., Asuncion, A.: UCI machine learning repository, Irvine, CA: University of California, School of Information and Computer Science, 2010. http://mlearn.ics.uci.edu/MLRepository.html
[9] Greco, S., Matarazzo, B., Słowiński, R.: Granular computing and data mining for ordered data: The dominance-based rough set approach, in: Encyclopedia of Complexity and Systems Science (R.A. Meyers, Ed.), Springer, 2009, 4283–4305.
[10] Grzymała-Busse, J.W., Werbrouck, P.: On the best search method in the LEM1 and LEM2 algorithms, in: Incomplete Information: Rough Set Analysis, Studies in Fuzziness and Soft Computing (E. Orłowska, Ed.), 13, Springer, 1998, 75–91.
[11] Grzymała-Busse, J.W.: Data with missing attribute values: Generalization of indiscernibility relation and rule induction, Transactions on Rough Sets, 1, 2004, 78–95.
[12] Grzymała-Busse, J., Rząsa, W.: A local version of the MLEM2 algorithm for rule induction, Fundamenta Informaticae, 100, 2010, 99–116.
[13] Huynh, V.N., Nakamori, Y., Ono, H., Lawry, J., Kreinovich, V., Nguyen, H.T. (Eds.): Interval / Probabilistic Uncertainty and Non-Classical Logics, Advances in Soft Computing 46, Springer, 2008.
[14] Kaytoue, M., Assaghir, Z., Napoli, A., Kuznetsov, S.O.: Embedding tolerance relations in formal concept analysis: An application in information fusion, in: Information and Knowledge Management, CIKM 2010 (J. Huang, N. Koudas, G.J.F. Jones, X. Wu, K. Collins-Thompson, A. An, Eds.), ACM, 2010, 1689–1692.
[15] Kryszkiewicz, M.: Rough set approach to incomplete information systems, Information Sciences, 112(1-4), 1998, 39–49.
[16] Kryszkiewicz, M.: Rules in incomplete information systems, Information Sciences, 113(3-4), 1999, 271–292.
[17] Leung, Y., Fischer, M.M., Wu, W.-Z., Mi, J.S.: A rough set approach for the discovery of classification rules in interval-valued information systems, International Journal of Approximate Reasoning, 47(2), 2008, 233–246.
[18] Liang, J.Y., Wang, J., Qian, Y.H.: A new measure of uncertainty based on knowledge granulation for rough sets, Information Sciences, 179, 2009, 458–470.
[19] Lipski, W.: On semantic issues connected with incomplete information databases, ACM Transactions on Database Systems, 4(3), 1979, 262–296.
[20] Lipski, W.: On databases with incomplete information, Journal of the ACM, 28(1), 1981, 41–70.
[21] Lloyd, J.W.: Foundations of Logic Programming, Springer Verlag, 1984.
[22] Murai, T., Resconi, G., Nakata, M., Sato, Y.: Operations of zooming in and out on possible worlds for semantic fields, in: KES 2002, Frontiers in Artificial Intelligence and Applications (E. Damiani, R.J. Howlett, L.C. Jain, N. Ichalkaranje, Eds.), 82, IOS Press, 2002, 1083–1087.
[23] Nakata, M., Sakai, H.: Lower and upper approximations in data tables containing possibilistic information, Transactions on Rough Sets, 7, 2007, 170–189.
[24] Nakata, M., Sakai, H.: Applying rough sets to information tables containing possibilistic values, Transactions on Computational Science, 2, 2008, 180–204.
[25] Nguyen, H.S.: Approximate boolean reasoning: Foundations and applications in data mining, Transactions on Rough Sets, 5, 2006, 334–506.
[26] Orłowska, E., Pawlak, Z.: Representation of nondeterministic information, Theoretical Computer Science, 29(1-2), 1984, 27–39.
[27] Orłowska, E.: Introduction: What you always wanted to know about rough sets, in: Incomplete Information: Rough Set Analysis, Studies in Fuzziness and Soft Computing (E. Orłowska, Ed.), 13, Springer, 1998, 1–20.
[28] Pawlak, Z.: Information systems theoretical foundations, Information Systems, 6(3), 1981, 205–218.
[29] Pawlak, Z.: Systemy informacyjne: Podstawy teoretyczne (in Polish), WNT Press, 1983.
[30] Pawlak, Z.: Rough sets: Theoretical aspects of reasoning about data, Kluwer Academic Publishers, 1991.
[31] Pawlak, Z.: Some issues on rough sets, Transactions on Rough Sets, 1, 2004, 1–58.
[32] Pedrycz, W., Skowron, A., Kreinovich, V. (Eds.): Handbook of Granular Computing, Wiley, 2008.
[33] Polkowski, L., Skowron, A. (Eds.): Rough sets in knowledge discovery 1: Methodology and applications, Studies in Fuzziness and Soft Computing, 18, Springer, 1998.
[34] Qian, Y.H., Liang, J.Y., Yao, Y.Y., Dang, C.Y.: MGRS: A multi-granulation rough set, Information Sciences, 180, 2010, 949–970.
[35] Quinlan, J.R.: Improved use of continuous attributes in C4.5, Journal of Artificial Intelligence Research, 4, 1996, 77–90.
[36] RNIA software logs: http://www.mns.kyutech.ac.jp/~sakai/RNIA
[37] Sakai, H.: On a framework for logic programming with incomplete information, Fundamenta Informaticae, 19(3/4), 1993, 223–234.
[38] Sakai, H.: Effective procedures for handling possible equivalence relations in non-deterministic information systems, Fundamenta Informaticae, 48(4), 2001, 343–362.
[39] Sakai, H., Okuma, A.: Basic algorithms and tools for rough non-deterministic information analysis, Transactions on Rough Sets, 1, 2004, 209–231.
[40] Sakai, H.: Possible equivalence relations and their application to hypothesis generation in non-deterministic information systems, Transactions on Rough Sets, 2, 2004, 82–106.
[41] Sakai, H., Ishibashi, R., Koba, K., Nakata, M.: Rules and apriori algorithm in non-deterministic information systems, Transactions on Rough Sets, 9, 2008, 328–350.
[42] Sakai, H., Nakata, M., Ślęzak, D.: Rule generation in Lipski’s incomplete information databases, Proc. Rough Sets and Current Trends in Computing (M.S. Szczuka, M. Kryszkiewicz, S. Ramanna, R. Jensen, Q. Hu, Eds.), Springer, LNAI, Vol. 6086, 2010, 376–385.
[43] Sakai, H., Nakata, M., Ślęzak, D.: A prototype system for rule generation in Lipski’s incomplete information databases, Proc. Rough Sets, Fuzzy Sets, Data Mining and Granular Computing (S.O. Kuznetsov, D. Ślęzak, D.H. Hepting, B. Mirkin, Eds.), Springer, LNAI, Vol. 6743, 2011, 175–182.
[44] Sakai, H., Okuma, H., Wu, M., Nakata, M.: A descriptor-based division chart table in rough non-deterministic information analysis, in: Smart Innovation, Systems and Technologies (J. Watada et al., Eds.), 15, Springer, 2012, 25–34.
[45] Skowron, A., Rauszer, C.: The discernibility matrices and functions in information systems, in: Intelligent Decision Support - Handbook of Advances and Applications of the Rough Set Theory (R. Słowiński, Ed.), Kluwer Academic Publishers, 1992, 331–362.
[46] Ślęzak, D., Janusz, A., Świeboda, W., Nguyen, H.S., Bazan, J., Skowron, A.: Semantic analytics of PubMed content, Information Quality in e-Health (A. Holzinger, K.-M. Simonic, Eds.), Springer, LNCS, Vol. 7058, 2011, 63–74.
[47] Ślęzak, D., Sakai, H.: Automatic extraction of decision rules from non-deterministic data systems: Theoretical foundations and SQL-based implementation, in: Database Theory and Application, Communications in Computer and Information Science (D. Ślęzak, T.H. Kim, Y. Zhang, J. Ma, K.I. Chung, Eds.), Vol. 64, Springer, 2009, 151–162.
[48] Ślęzak, D., Synak, P., Borkowski, J., Wróblewski, J., Toppin, G.: A rough-columnar RDBMS engine – A case study of correlated subqueries, IEEE Data Engineering Bulletin, 2012.
[49] Stefanowski, J., Tsoukiàs, A.: On the extension of rough sets under incomplete information, Proc. New Directions in Rough Sets, Data Mining, and Granular-Soft Computing (N. Zhong, A. Skowron, S. Ohsuga, Eds.), Springer, LNAI, Vol.1711, 1999, 73–81.
[50] Stefanowski, J., Tsoukiàs, A.: Incomplete information tables and rough classification, Computational Intelligence, 17(3), 2001, 545–566.
[51] Szczuka, M., Ślęzak, D.: Representation and evaluation of granular systems, in: Smart Innovation, Systems and Technologies (J. Watada et al., Eds.), 15, Springer, 2012, 277–286.
[52] Tsumoto, S.: Automated extraction of hierarchical decision rules from clinical databases using rough set model, Expert Systems with Applications, 24, 2003, 189–197.
[53] Wu, W.-Z., Zhang, W.-X., Li, H.-Z.: Knowledge acquisition in incomplete fuzzy information systems via the rough set approach, Expert Systems, 20, 2003, 280–286.
[54] Yang, X., Yu, D., Yang, J., Wei, L.: Dominance-based rough set approach to incomplete interval-valued information system, Data & Knowledge Engineering, 68(11), 2009, 1331–1347.
[55] Yao, Y.Y.: Three-way decisions with probabilistic rough sets, Information Sciences, 180, 2010, 314–353.
[56] Zadeh, L.A.: Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, Fuzzy Sets and Systems, 90(2), 1997, 111–127.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-34814a2f-971b-4e4b-b2d8-ac05b18228ab