Integration of candidate hash trees in concurrent processing of frequent itemset queries using Apriori

Grudziński, P.; Wojciechowski, M.

Content

Full texts:

http://matwbn.icm.edu.pl/ksiazki/cc/cc38/cc3814.pdf [remote]

Abstract

Frequent itemset mining is often regarded as advanced querying, in which a user specifies the source dataset and pattern constraints using a given constraint model. In this paper we address the problem of processing batches of frequent itemset queries using the Apriori algorithm. The best solution to this problem proposed so far is Common Counting, which executes the queries concurrently using Apriori and integrates the scans of the parts of the database shared among the queries. We propose a new method, Common Candidate Tree, which integrates the concurrently processed queries more tightly by sharing in-memory data structures, namely candidate hash trees. The experiments show that Common Candidate Tree outperforms Common Counting in terms of execution time. Moreover, thanks to its lower memory consumption, Common Candidate Tree can be applied to larger batches of queries.
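
The idea of sharing one candidate structure across a batch of queries can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each query selects its part of the database through a predicate on the transaction identifier, and it uses a plain dictionary as a stand-in for the candidate hash tree. The names `count_shared`, `selects`, and `candidates` are hypothetical. The sketch covers a single counting pass for candidates of one size `k`, not the full Apriori loop.

```python
from itertools import combinations


def count_shared(transactions, selects, candidates, k):
    """Count size-k candidates for several queries in one database scan.

    transactions: iterable of (tid, frozenset of items)
    selects:      query id -> predicate(tid), True if the query reads that row
    candidates:   query id -> set of frozenset candidates of size k
    """
    # Shared structure: candidate itemset -> {query id: support count}.
    # A plain dict stands in here for the candidate hash tree, so a
    # candidate generated by several queries is stored only once.
    shared = {}
    for qid, itemsets in candidates.items():
        for itemset in itemsets:
            shared.setdefault(itemset, {})[qid] = 0

    for tid, items in transactions:  # single scan for the whole batch
        active = [qid for qid, pred in selects.items() if pred(tid)]
        if not active:
            continue
        for subset in combinations(sorted(items), k):
            counters = shared.get(frozenset(subset))
            if counters is None:
                continue  # not a candidate of any query
            for qid in active:
                if qid in counters:
                    counters[qid] += 1
    return shared


# Toy batch of two queries with overlapping data and overlapping candidates.
transactions = [
    (1, frozenset("ab")),
    (2, frozenset("abc")),
    (3, frozenset("bc")),
    (4, frozenset("ac")),
]
selects = {"q1": lambda tid: tid <= 3, "q2": lambda tid: tid >= 2}
candidates = {
    "q1": {frozenset("ab"), frozenset("bc")},
    "q2": {frozenset("ab"), frozenset("ac")},
}
counts = count_shared(transactions, selects, candidates, k=2)
```

In this toy batch the candidate {a, b} is stored once and carries a separate counter for each query, which is the kind of memory saving the abstract attributes to sharing candidate structures. Keeping one candidate set per query and scanning the data once for all of them would correspond more closely to Common Counting.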

Keywords

data mining; frequent itemset mining; data mining queries

Publisher

Systems Research Institute, Polish Academy of Sciences

Journal

Control and Cybernetics

Year

2009

Volume

Vol. 38, no. 1

Pages

47-65

Physical description

Bibliography: 24 items, charts.

Contributors

author

Grudziński P.

author

Wojciechowski M.

Adam Mickiewicz University, Faculty of Mathematics and Computer Science, Umultowska 87, 61-614 Poznań, Poland

Bibliography

AGRAWAL, R., IMIELINSKI, T., and SWAMI, A. (1993) Mining Association Rules Between Sets of Items in Large Databases. In: P. Buneman and S. Jajodia, eds., Proceedings of the 1993 ACM SIGMOD Int’l Conf. on Management of Data. ACM Press, New York, 207-216.
AGRAWAL, R., MEHTA, M., SHAFER, J., SRIKANT, R., ARNING, A. and BOLLINGER, T. (1996) The Quest Data Mining System. In: E. Simoudis, J. Han and U.M. Fayyad, eds., Proc. of the 2nd Int’l Conf. on Knowledge Discovery in Databases and Data Mining. AAAI Press, Portland, Oregon, 244-249.
AGRAWAL, R. and SRIKANT, R. (1994) Fast Algorithms for Mining Association Rules. In: J.B. Bocca, M. Jarke and C. Zaniolo, eds., Proc. of the 20th Int’l Conf. on Very Large Data Bases. Morgan Kaufmann, 487-499.
ALSABBAGH, J.R. and RAGHAVAN, V.V. (1994) Analysis of common subexpression exploitation models in multiple-query processing. Proceedings of the 10th International Conference on Data Engineering. IEEE Computer Society, 488-497.
BARALIS, E. and PSAILA, G. (1999) Incremental Refinement of Mining Queries. In: M.K. Mohania and A.M. Tjoa, eds., Data Warehousing and Knowledge Discovery. Proceedings of the 1st DaWaK Conference. LNCS 1676, Springer, 173-182.
BLOCKEEL, H., DEHASPE, L., DEMOEN, B., JANSSENS, G., RAMON, J. and VANDECASTEELE, H. (2002) Improving the Efficiency of Inductive Logic Programming Through the Use of Query Packs. Journal of Artificial Intelligence Research 16, 135-166.
BOIŃSKI, P., WOJCIECHOWSKI, M. and ZAKRZEWICZ, M. (2006) A Greedy Approach to Concurrent Processing of Frequent Itemset Queries. In: A.M. Tjoa and J. Trujillo, eds., Data Warehousing and Knowledge Discovery. Proc. of the 8th DaWaK Conference. LNCS 4081, Springer, 292-301.
CHEUNG, D.W., HAN, J., NG, V. and WONG, C.Y. (1996) Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique. In: S.Y.W. Su, ed., Proceedings of the 12th ICDE Conference. IEEE Computer Society, 106-114.
HAN, J., PEI, J. and YIN, Y. (2000) Mining frequent patterns without candidate generation. In: W. Chen, J.F. Naughton and P.A. Bernstein, eds., Proceedings of the 2000 ACM SIGMOD Int’l Conference on Management of Data. ACM Press, 1-12.
IMIELINSKI, T. and MANNILA, H. (1996) A Database Perspective on Knowledge Discovery. Communications of the ACM 39 (11), 58-64.
JARKE, M. (1985) Common subexpression isolation in multiple query optimization. In: W. Kim, D.S. Reiner and D.S. Batory, eds., Query Processing in Database Systems. Springer, 191-205.
JEUDY, B. and BOULICAUT, J-F. (2002) Using condensed representations for interactive association rule mining. In: T. Elomaa, H. Mannila and H. Toivonen, eds., Principles of Data Mining and Knowledge Discovery. 6th European Conference. LNCS 2431, Springer, 225-236.
JIN, R., SINHA, K. and AGRAWAL, G. (2005) Simultaneous Optimization of Complex Mining Tasks with a Knowledgeable Cache. In: R. Grossman, R.J. Bayardo and K.P. Bennett, eds., Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 600-605.
MEO, R. (2003) Optimization of a Language for Data Mining. Proceedings of the ACM Symposium on Applied Computing. ACM Press, 437-444.
MORZY, T., WOJCIECHOWSKI, M. and ZAKRZEWICZ, M. (2000) Materialized Data Mining Views. In: D.A. Zighed, H.J. Komorowski and J.M. Zytkow, eds., Principles of Data Mining and Knowledge Discovery. 4th European Conference. LNCS 1910, Springer, 65-74.
NAG, B., DESHPANDE, P.M. and DEWITT, D.J. (1999) Using a Knowledge Cache for Interactive Discovery of Association Rules. Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 244-253.
PEI, J. and HAN, J. (2000) Can We Push More Constraints into Frequent Pattern Mining? Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 350-354.
ROY, P., SESHADRI, S., SUDARSHAN, S. and BHOBE, S. (2000) Efficient and Extensible Algorithms for Multi Query Optimization. In: W. Chen, J.F. Naughton and P.A. Bernstein, eds., Proceedings of the 2000 ACM SIGMOD Int’l Conference on Management of Data. ACM, 249-260.
SELLIS, T. (1988) Multiple-query optimization. ACM Transactions on Database Systems 13, 1, 23-52.
WOJCIECHOWSKI, M., GAŁĘCKI, K. and GAWRONEK, K. (2005) Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm. In: T. Morzy, M. Morzy and M. Wojciechowski, eds., Proceedings of the 1st ADBIS Workshop on Data Mining and Knowledge Discovery. Publishing House of Poznan University of Technology, 35-46.
WOJCIECHOWSKI, M., GAŁĘCKI, K. and GAWRONEK, K. (2007) Three Strategies for Concurrent Processing of Frequent Itemset Queries Using FP-growth. In: S. Dzeroski and J. Struyf, eds., Knowledge Discovery in Inductive Databases, 5th International Workshop, KDID 2006. LNCS 4747, Springer, 240-258.
WOJCIECHOWSKI, M. and ZAKRZEWICZ, M. (2003) Evaluation of Common Counting Method for Concurrent Data Mining Queries. In: L.A. Kalinichenko et al., eds., Advances in Databases and Information Systems. 7th East European Conference, ADBIS 2003. LNCS 2798, Springer, 76-87.
WOJCIECHOWSKI, M. and ZAKRZEWICZ, M. (2005) On Multiple Query Optimization in Data Mining. In: T.B. Ho, D.W. Cheung and H. Liu, eds., Advances in Knowledge Discovery and Data Mining. 9th Pacific-Asia Conference, PAKDD 2005. LNCS 3518, Springer, 696-701.
ZHENG, Z., KOHAVI, R. and MASON, L. (2001) Real world performance of association rule algorithms. Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 401-406.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BAT5-0036-0025