Tytuł artykułu
Autorzy
Wybrane pełne teksty z tego czasopisma
Identyfikatory
Warianty tytułu
Języki publikacji
Abstrakty
Hashing schemes are a common technique to improve the performance in mining not only association rules but also sequential patterns or traversal patters. However, the collision problem in hash schemes may result in severe performance degradation. In this paper, we propose perfect hashing schemes for mining traversal patterns to avoid collisions in the hash table. The main idea is to transform each large itemsets into one large 2-itemset by employing a delicate encoding scheme. Then perfect hash schemes designed only for itemsets of length two, rather than varied lengths, are applied. The experimental results show that our method is more than twice as faster than FS algorithm. The results also show our method is scalable to database sizes. One variant of our perfect hash scheme, called partial hash, is proposed to cope with the enormous memory space required by typical perfect hash functions. We also give a comparison of the performances of different perfect hash variants and investigate their properties.
Słowa kluczowe
Wydawca
Czasopismo
Rocznik
Tom
Strony
185--202
Opis fizyczny
tab., wykr., bibliogr. 21 poz.
Twórcy
autor
autor
autor
- Department of Information Engineering and Computer Science Feng Chia University, 100 Wenhwa Rd., Seatwen, Taichung 40724, Taiwan, R.O.C.
Bibliografia
- [1] R. Agrawal and R. Srikant: Fast Algorithms for Mining Association Rules in Large Databases, Proceedings of the 20th International Conference on Very Large Data Bases, Santiago de Chile, Chile, pp. 487-499, 1994.
- [2] R. Agrawal and R. Srikant: Mining Sequential Patterns, Proceedings of the Eleventh International Conference on Data Engineering, Taipei, Taiwan, pp. 3-14, 1995.
- [3] C. C. Chang: A Scheme for Constructing Ordered Minimal Perfect Hashing Functions, Information Sciences, vol. 39, pp. 187-195, 1986.
- [4] C. C. Chang and R. C. T. Lee: A Letter Oriented Minimal Perfect Hashing Scheme, The Computer Journal, vol. 29, no. 3, pp. 277-281, 1986.
- [5] M. S. Chen, J. S. Park, and P. S. Yu: Efficient Data Mining for Path Traversal Patterns, IEEE Transactions on Knowledge and Data Engineering, vol. 10, no. 2, pp. 209-221, 1998.
- [6] M. Dietzfelbinger,A. Karlin, K.Mehlhorn, and F.M. a. d. Heide: Dynamic Perfect Hashing: Upper and Lower Bounds, SIAM Journal on Computing, vol. 23, no. 4, pp. 738-761, 1994.
- [7] M. Fredman, J. Komlos, and E. Szemeredi: Storing a Sparse Table with O(1)Worst Case Access Time, Journal of the ACM, vol. 31, no. 3, pp. 538-544, 1984.
- [8] V. Ganti, J. Gehrke, and R. Ramakrishnan: DEMON: Mining and Monitoring Evolving Data, IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 1, pp. 50-63, 2001.
- [9] M. Garofalakis, R. Rastogi, and K.Shim: Mining Sequential Patterns with Regular Expression Constraints, IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 3, pp. 530-552, 2002.
- [10] J. Han, J. Pei, and Y. Yin: Mining Frequent Patterns without Candidate Generation, Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, Texas, USA, pp. 16-18, 2000.
- [11] J. Han, J. Pei, Y. Yin, and R. Mao: Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach, Data Mining and Knowledge Discovery, vol. 8, pp. 53-87, 2004.
- [12] G. J. Hwang, W. F. Tsai, and J. C. R. Tseng: A Minimal Perfect Hashing Approach for Mining Association Rules from Very Large Databases, The 6th IASTED International Conference on Internet and Multimedia Systems and Applications, Kaua'i, Hawaii, USA, pp. 80-85, 2002.
- [13] D. R. S. M. Atici and R. Wei: A New Practical Algorithm for the Construction of a Perfect Hash Function, Journal of Combinatorial Mathematics and Combinatorial Computing, vol. 35, pp. 127-145, 2000.
- [14] H. Mannila and H. Toivonen: Discovering Generalized Episodes Using Minimal Occurrences, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, Oregon, pp. 146-151, 1996.
- [15] H. Mannila, H. Toivonen, and A. I. Verkamo: Discovering Frequent Episodes in Sequences, Proceedings of the First International Conference on Knowledge Discovery and DataMining,Montreal, Canada, pp. 210-215, 1995.
- [16] H. Mannila, H. Toivonen, and A. I. Verkamo: Discovery of Frequent Episodes in Event Sequences, Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 259-289, 1997.
- [17] J. S. Park, M. S. Chen, and P. S. Yu: Using a Hash-Based Method with Transaction Trimming for Mining Association Rules, IEEE Transactions on Knowledge and Data Engineering, vol. 9, no. 5, pp. 813-825, 1997.
- [18] Y. Xiao and M. H. Dunham: EfficientMining of Traversal Patterns, Data & Knowledge Engineering, vol. 39, no. 2, pp. 191-214, 2001.
- [19] A. H. Youssefi, D. J. Duke, and M. J. Zaki: Visual Web Mining, Proceedings of the 13th international conference on World Wide Web, New York, NY, pp. 394-395, 2004.
- [20] M. J. Zaki: SPADE: An Efficient Algorithm forMining Frequent Sequences,Machine Learning Journal, vol. 42, pp. 31-60, 2001.
- [21] Z. Zheng, R. Kohavi, and L. Mason: Real World Performance of Association Rule Algorithms, Proceedings of the Seventh ACMSIGKDD International Conference on Knowledge Discovery and DataMining, New York, pp. 401-406, 2001.
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-article-BUS2-0009-0054