Perfect Hashing Schemes for Mining Traversal Patterns

Chang, C-C.; Lin, C-Y.; Chou, H.

Powiadomienia systemowe

Sesja wygasła!
Sesja wygasła!

Artykuł - szczegóły

Tytuł artykułu

Perfect Hashing Schemes for Mining Traversal Patterns

Autorzy

Chang C-C. , Lin C-Y. , Chou H.

Wybrane pełne teksty z tego czasopisma

https://fi.episciences.org/

Identyfikatory

Warianty tytułu

Języki publikacji

Abstrakty

Hashing schemes are a common technique to improve the performance in mining not only association rules but also sequential patterns or traversal patters. However, the collision problem in hash schemes may result in severe performance degradation. In this paper, we propose perfect hashing schemes for mining traversal patterns to avoid collisions in the hash table. The main idea is to transform each large itemsets into one large 2-itemset by employing a delicate encoding scheme. Then perfect hash schemes designed only for itemsets of length two, rather than varied lengths, are applied. The experimental results show that our method is more than twice as faster than FS algorithm. The results also show our method is scalable to database sizes. One variant of our perfect hash scheme, called partial hash, is proposed to cope with the enormous memory space required by typical perfect hash functions. We also give a comparison of the performances of different perfect hash variants and investigate their properties.

Słowa kluczowe

data mining traversal patterns perfect hashing performance analysis

Wydawca

IOS Press

Czasopismo

Fundamenta Informaticae

Rocznik

2006

Tom

Vol. 70, nr 3

Strony

185--202

Opis fizyczny

tab., wykr., bibliogr. 21 poz.

Twórcy

autor

Chang C-C.

autor

Lin C-Y.

autor

Chou H.

Department of Information Engineering and Computer Science Feng Chia University, 100 Wenhwa Rd., Seatwen, Taichung 40724, Taiwan, R.O.C.

Bibliografia

[1] R. Agrawal and R. Srikant: Fast Algorithms for Mining Association Rules in Large Databases, Proceedings of the 20th International Conference on Very Large Data Bases, Santiago de Chile, Chile, pp. 487-499, 1994.
[2] R. Agrawal and R. Srikant: Mining Sequential Patterns, Proceedings of the Eleventh International Conference on Data Engineering, Taipei, Taiwan, pp. 3-14, 1995.
[3] C. C. Chang: A Scheme for Constructing Ordered Minimal Perfect Hashing Functions, Information Sciences, vol. 39, pp. 187-195, 1986.
[4] C. C. Chang and R. C. T. Lee: A Letter Oriented Minimal Perfect Hashing Scheme, The Computer Journal, vol. 29, no. 3, pp. 277-281, 1986.
[5] M. S. Chen, J. S. Park, and P. S. Yu: Efficient Data Mining for Path Traversal Patterns, IEEE Transactions on Knowledge and Data Engineering, vol. 10, no. 2, pp. 209-221, 1998.
[6] M. Dietzfelbinger,A. Karlin, K.Mehlhorn, and F.M. a. d. Heide: Dynamic Perfect Hashing: Upper and Lower Bounds, SIAM Journal on Computing, vol. 23, no. 4, pp. 738-761, 1994.
[7] M. Fredman, J. Komlos, and E. Szemeredi: Storing a Sparse Table with O(1)Worst Case Access Time, Journal of the ACM, vol. 31, no. 3, pp. 538-544, 1984.
[8] V. Ganti, J. Gehrke, and R. Ramakrishnan: DEMON: Mining and Monitoring Evolving Data, IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 1, pp. 50-63, 2001.
[9] M. Garofalakis, R. Rastogi, and K.Shim: Mining Sequential Patterns with Regular Expression Constraints, IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 3, pp. 530-552, 2002.
[10] J. Han, J. Pei, and Y. Yin: Mining Frequent Patterns without Candidate Generation, Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, Texas, USA, pp. 16-18, 2000.
[11] J. Han, J. Pei, Y. Yin, and R. Mao: Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach, Data Mining and Knowledge Discovery, vol. 8, pp. 53-87, 2004.
[12] G. J. Hwang, W. F. Tsai, and J. C. R. Tseng: A Minimal Perfect Hashing Approach for Mining Association Rules from Very Large Databases, The 6th IASTED International Conference on Internet and Multimedia Systems and Applications, Kaua'i, Hawaii, USA, pp. 80-85, 2002.
[13] D. R. S. M. Atici and R. Wei: A New Practical Algorithm for the Construction of a Perfect Hash Function, Journal of Combinatorial Mathematics and Combinatorial Computing, vol. 35, pp. 127-145, 2000.
[14] H. Mannila and H. Toivonen: Discovering Generalized Episodes Using Minimal Occurrences, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, Oregon, pp. 146-151, 1996.
[15] H. Mannila, H. Toivonen, and A. I. Verkamo: Discovering Frequent Episodes in Sequences, Proceedings of the First International Conference on Knowledge Discovery and DataMining,Montreal, Canada, pp. 210-215, 1995.
[16] H. Mannila, H. Toivonen, and A. I. Verkamo: Discovery of Frequent Episodes in Event Sequences, Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 259-289, 1997.
[17] J. S. Park, M. S. Chen, and P. S. Yu: Using a Hash-Based Method with Transaction Trimming for Mining Association Rules, IEEE Transactions on Knowledge and Data Engineering, vol. 9, no. 5, pp. 813-825, 1997.
[18] Y. Xiao and M. H. Dunham: EfficientMining of Traversal Patterns, Data & Knowledge Engineering, vol. 39, no. 2, pp. 191-214, 2001.
[19] A. H. Youssefi, D. J. Duke, and M. J. Zaki: Visual Web Mining, Proceedings of the 13th international conference on World Wide Web, New York, NY, pp. 394-395, 2004.
[20] M. J. Zaki: SPADE: An Efficient Algorithm forMining Frequent Sequences,Machine Learning Journal, vol. 42, pp. 31-60, 2001.
[21] Z. Zheng, R. Kohavi, and L. Mason: Real World Performance of Association Rule Algorithms, Proceedings of the Seventh ACMSIGKDD International Conference on Knowledge Discovery and DataMining, New York, pp. 401-406, 2001.

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-article-BUS2-0009-0054