SQL-based approach to distributed and incremental association rule mining

Kona, H.; Chakravarthy, S.; Arora, A.

Article - details

Conference

ADBIS Workshop on Data Mining and Knowledge Discovery (ADMKD'2005) / symposium [1st; September 15-16, 2005; Tallinn, Estonia]

Abstract

Database mining is the process of extracting interesting and previously unknown patterns and correlations from data stored in Database Management Systems (DBMSs). Association rule mining is the process of discovering items that tend to occur together in transactions. If the data to be mined are stored as relations in multiple databases, a partitioned or distributed approach is more appropriate than moving data from one database to another. Also, incremental addition of data to the data set should not necessitate re-computation of rules for the entire data set. This paper focuses on partitioned and incremental approaches to association rule mining for data stored in relational DBMSs. It proposes a partitioning approach that is very effective for distributed databases compared to the main-memory partitioned approach. The approach uses an SQL-based K-way join algorithm and its optimizations. A second alternative that trades accuracy for performance is also presented. The results indicate that, beyond a certain data set size, this alternative preserves accuracy while delivering better performance. The incremental association rule mining algorithm reduces the work of re-computing the rules each time new data is added to the database. The paper implements the incremental algorithm using the negative border concept with a number of optimizations. Extensive experiments are performed, and results are presented for both partitioned and incremental approaches using IBM DB2/UDB and Oracle 8i.
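
To make the SQL-based K-way join mentioned in the abstract more concrete, here is a minimal sketch of support counting for 2-itemsets (k = 2). It is not the authors' implementation. It assumes a transaction table T(tid, item) and uses SQLite through Python's sqlite3 module instead of DB2/UDB or Oracle 8i. The table names T, F1, and C2 and the function name frequent_pairs are illustrative assumptions, and min_support is taken as an absolute transaction count.

```python
import sqlite3


def frequent_pairs(transactions, min_support):
    """Return {(item1, item2): support} for item pairs meeting min_support.

    transactions: dict mapping a transaction id to an iterable of items.
    min_support: absolute number of transactions (an assumption of this sketch).
    """
    conn = sqlite3.connect(":memory:")
    # T holds one row per (transaction, item); table name is hypothetical.
    conn.execute("CREATE TABLE T (tid INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO T VALUES (?, ?)",
        [(tid, item) for tid, items in transactions.items() for item in set(items)],
    )

    # F1: frequent single items.
    conn.execute("CREATE TABLE F1 (item TEXT)")
    conn.execute(
        "INSERT INTO F1 SELECT item FROM T GROUP BY item HAVING COUNT(*) >= ?",
        (min_support,),
    )

    # C2: candidate pairs, generated by joining F1 with itself.
    conn.execute("CREATE TABLE C2 (item1 TEXT, item2 TEXT)")
    conn.execute(
        "INSERT INTO C2 SELECT a.item, b.item FROM F1 a JOIN F1 b ON a.item < b.item"
    )

    # K-way join for k = 2: join the candidates with two copies of T,
    # one per item position, and require both copies to share the same tid.
    rows = conn.execute(
        """
        SELECT c.item1, c.item2, COUNT(*) AS support
        FROM C2 c
        JOIN T t1 ON t1.item = c.item1
        JOIN T t2 ON t2.item = c.item2 AND t2.tid = t1.tid
        GROUP BY c.item1, c.item2
        HAVING COUNT(*) >= ?
        """,
        (min_support,),
    ).fetchall()
    conn.close()
    return {(a, b): n for a, b, n in rows}
```

Calling frequent_pairs({1: ["bread", "milk"], 2: ["bread", "milk", "eggs"], 3: ["bread", "eggs"], 4: ["milk"]}, 2) counts each candidate pair once per transaction that contains both items. For larger k the same pattern would join k copies of T, which is where the optimizations referred to in the abstract would matter.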

Keywords

database mining; association rules; distributed mining; incremental mining

Publisher

Wydawnictwo Politechniki Poznańskiej

Journal

Foundations of Computing and Decision Sciences

Year

2006

Volume

Vol. 31, No. 1

Pages

5-26

Physical description

Bibliography: 19 items.

Contributors

author

Kona H.

author

Chakravarthy S.

author

Arora A.

CSE Department, The University of Texas at Arlington, kona@cse.uta.edu

Bibliography

1. Zhang, S., X. Wu, and C. Zhang, Multi-Database Mining. IEEE Computational Intelligence Bulletin, Vol. 2, No. 1, June 2003: p. 5-13.
2. Wu, X. and S. Zhang. Synthesizing High-Frequency Rules from Different Data Sources. In KDE 2003, p. 353-367.
3. Han, J. and M. Kamber, Data Mining: Concepts and Techniques. 2001: Morgan Kaufmann Publishers.
4. Thomas, S., et al. An Efficient Algorithm for the Incremental Updation of Association Rules in Large Databases. In Knowledge Discovery and Data Mining. 1997, p. 263-266.
5. Thomas, S., Architectures and optimizations for integrating Data Mining algorithms with Database Systems, in CSE. 1998, University of Florida: Gainesville.
6. Thomas, S. and S. Chakravarthy. Incremental Mining of Constrained Associations. In Proc. of the 7th Intl. Conf. of High Performance Computing (HiPC). 2000, p. 547-558.
7. Thuraisingham, B., A Primer for Understanding and Applying Data Mining. IEEE, 2000. Vol. 2, No. 1: p. 28-31.
8. Agrawal, R., T. Imielinski, and A. Swami. Mining Association Rules between sets of items in large databases. In ACM SIGMOD Intl. Conference on the Management of Data. 1993. Washington, D.C., p. 207-216.
9. Agrawal, R. and R. Srikant. Fast Algorithms for mining association rules. In 20th Intl. Conference on Very Large Databases (VLDB). 1994, p. 487-499.
10. Savasere, A., E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In 21st Intl. Conf. on Very Large Databases (VLDB). 1995. Zurich, Switzerland, p. 432-444.
11. Chen, Y., An Efficient Parallel Algorithm for Mining Association Rules in Large Databases. 1998, Georgia Institute of Technology: Atlanta.
12. Sarawagi, S., S. Thomas, and R. Agrawal. Integrating Association Rule Mining with Relational Database System: Alternatives and Implications. In ACM SIGMOD Intl. Conference on Management of Data. 1998. Seattle, Washington, p. 343-354.
13. Dudgikar, M., A Layered Optimizer for Mining Association Rules over RDBMS, in CSE Department. 2000, University of Florida: Gainesville.
14. Mishra, P. and S. Chakravarthy. Performance Evaluation and Analysis of SQL-92 Approaches for Association Rule Mining. In BNCOD Proc. 2003, p. 95-114.
15. Mishra, P. and S. Chakravarthy, Performance Evaluation of SQL-OR Variants for Association Rule Mining, DaWaK 2003, Prague, Czech Republic, p. 288-298.
16. Cheung, D., J. Han, V. Ng, and C. Wong. Maintenance of discovered association rules in large databases: An incremental updating technique. In Proc. of the 12th IEEE Intl. Conference on Data Engineering, 1996, New Orleans, Louisiana, p. 106-114.
17. Toivonen, H. Sampling Large Databases for Association Rules. In Proc. of Intl. Conf. on Very Large Data Bases. 1996: Morgan Kaufmann.
18. Kona, H. and S. Chakravarthy, Partitioned Approach to Association Rule Mining over Multiple Databases, DaWaK 2004, Zaragoza, Spain, p. 320-330.
19. Kona, H., "Association Rule Mining Over Multiple Databases: Partitioned and Incremental Approaches", Fall 2003. http://www.cse.uta.edu/Research/Publications/Downloads/CSE-2003-40.pdf

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BPP1-0059-0065