Mop: An Efficient Algorithm for Mining Frequent Pattern with Subtree Traversing

Deng, Z-H.; Gao, N.; Xu, X-R.

Artykuł - szczegóły

Tytuł artykułu

Mop: An Efficient Algorithm for Mining Frequent Pattern with Subtree Traversing

Autorzy

Deng Z-H. , Gao N. , Xu X-R.

Wybrane pełne teksty z tego czasopisma

https://fi.episciences.org/

Identyfikatory

Warianty tytułu

Języki publikacji

Abstrakty

Mining frequent patterns in database has emerged as an important task in knowledge discovery and data mining. In this paper, we present an efficient algorithm called Mop for fast frequent pattern discovery. Mop utilizes a new kind of data structure called OP tree (ordered pattern tree) and some particular properties of frequent patterns to facilitate the process of mining frequent patterns. An OP tree is a special frequent pattern tree, where the children of any node are sorted according to the supports of corresponding items. Efficiency of Mop is achieved with three techniques: (1) it adopts OP tree to store a large database to avoid repetitive database scans, (2) it finds all frequent 2-patterns in the construction of OP tree to avoid the costly generation of a large number of candidate 2-patterns, (3) the supports of candidate k-patterns (k>2) can be obtained by traversing a few of specific subtrees of the OP tree, which greatly reduces the search space and avoid multi-scans of a database. We experimentally compare our algorithm with the Apriori algorithm and the FP-growth algorithm on one real database and one synthetical database. The experimental results show that Mop is about an order of magnitude faster than the Apriori algorithm. Mop also outperforms the FP-growth algorithm, especially when support threshold is very low and databases are quite large.

Słowa kluczowe

data mining frequent patterns ordered pattern trees

Wydawca

Polskie Towarzystwo Matematyczne

Czasopismo

Fundamenta Informaticae

Rocznik

2011

Tom

Vol. 111, nr 4

Strony

373--390

Opis fizyczny

Bibliogr. 28 poz., tab., wykr.

Twórcy

autor

Deng Z-H.

autor

Gao N.

autor

Xu X-R.

Room 2318, Science Building 2, Peking University, Beijing, China, zhdeng@cis.pku.edu.cn

Bibliografia

[1] Agrawal R, Imielinski T, and Swami A. Mining Association Rules between Set of Items in Large Databases. In proceedings of 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD'93), pp. 207-216.
[2] Brin S, Motwani R, and Silverstein C. Beyond market basket: Generalizing association rules to correlations. In proceedings of 1997 ACM SIGMOD International Conference on Management of Data (SIGMOD'97), pp. 265-276.
[3] Agrawal R and Srikant R. Mining sequential patterns. In Proceedings of 1995 International Conference on Data Engineering (ICDE'95), pp. 3-14.
[4] Han J, Dong G, and Yin Y. Efficient mining of partial periodic patterns in time series database. In Proceedings of 1999 International Conference on Data Engineering (ICDE'99), pp. 106-115.
[5] Liu B, Hsu W, and Ma Y. Integrating classification and association rule mining. In Pro-ceedings of 1998 International Conference on Knowledge Discovery and Data Mining (KDD'98), pp. 80-86.
[6] Agrawal R and Srikant R. Fast algorithm for mining Association rules. In Proceedings of 1994 International Conference on Very Large Data Bases (VLDB'94), pp. 487-499.
[7] Han J and Kamber M. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2001.
[8] Savasere A, Omiecinski E, and Navathe S. An efficient algorithm for mining association rules in large databases. In Proceedings of 1995 International Conference on Very Large Data Bases (VLDB'95), pp. 432-443.
[9] Park J S, Chen M S, and Yu P S. Using a hash-based method with transaction trimming for mining association rules. IEEE Transactions on Knowledge and Data Engineering, 1997, 9(5): 813-824.
[10] Lent B, Swami A, and Widom J. Clustering association rules. In Proceedings of 1997 International Conference on Data Engineering (ICDE'97), pp. 220-231.
[11] Sarawagi S, Thomas S, and Agrawal R. Integrating association rule mining with relational database systems: Alternatives and implications. In Proceedings of 1998 ACM SIGMOD International Conference on Management of Data (SIGMOD'98), pp. 343-354.
[12] Ng R, Lakshmanan L V S, Han J, and Pang A. Exploratory mining and pruning optimizations of constrained associations rules. In Proceedings of 1998 ACM SIGMOD International Conference on Management of Data (SIGMOD'98), pp. 13-24.
[13] Grahne G., Lakshmanan L, and Wang X. Efficient mining of constrained correlated sets. In Proceeding of 2000 International Conference on Data Engineering (ICDE'00), pp. 512-521.
[14] Shenoy P, Haritsa J R, Sundarshan S, Bhalotia G., Bawa M, and Shah D. Turbo-Charging Vertical Mining of Large Databases. In Proceedings of 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD'00), pp. 22-33.
[15] Zaki M J. Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering, 2000,12(3): 372-390.
[16] Zaki M J and Gouda K. Fast vertical mining using diffsets. In Proceedings of 2003 International Conference on Knowledge Discovery and Data Mining (SIGKDD'03), pp. 326-335.
[17] Han J, Pei J, and Yin Y.Mining frequent patterns without candidate generation. In proceedings of 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD'00), pp. 1-12.
[18] Liu G, Lu H, Xu Y, Yu J X. Ascending Frequency Ordered Prefix-tree: EfficientMining of Frequent Patterns. In Proceedings of 2003 International Conference on Database Systems for Advanced Applications (DASFAA'03), pp: 65-72
[19] Han J, Pei J, Yin Y, andMao R. Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach. Data Mining and Knowledge Discovery, 2004, 8: 53-87.
[20] Woon Y K, Ng W K, and Lim E P. A Support-Ordered Trie for Fast Frequent Itemset Discovery. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(7): 875-879.
[21] http://www.almaden. ibm.com/cs/quest/syndata.html.
[22] http://fimi.cs.helsinki.fi.
[23] Brijs T, Swinnen G, Vanhoof K, and Wets G. The use of association rules for product assortment decisions: a case study. In Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining (1999), pp. 254-260.
[24] Ferenc Bodon, A fast APRIORI implementation, IEEE ICDMWorkshop on Frequent ItemsetMining Implementations (FIMI'03), 2003.
[25] Zaki M J and Hsiao C J. Efficient Algorithm for Mining Closed Itemsets and Their Lattice Structure. IEEE Transactions on Knowledge and Data Engineering, 2005,17(4): 462-478.
[26] Zaki M J and Hsiao C J. CHARM: An efficient algorithm for closed itemset mining. In Proceedings of SDM'02, pp. 12-28.
[27] Wang J Y, Han J, and Pei J. CLOSET+: Searching for the Best Strategies for Mining Fre-quent Closed Itemsets. In Proceedings of 2003 International Conference on Knowledge Discovery and Data Mining (SIGKDD'03), pp. 236-245.
[28] Wang J Y, Han J, Lu Y, and Tzvetkov P. TFP: An Efficient Algorithm for Mining Top-k Frequent Closed Itemsets. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(5): 652-664

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-article-BUS8-0021-0012