Efficiently Mining Sequential Generator Patterns Using Prefix Trees

Pham, T.-T.

doi:10.3233/FI-2015-1217

Artykuł - szczegóły

Tytuł artykułu

Efficiently Mining Sequential Generator Patterns Using Prefix Trees

Autorzy

Pham T.-T.

Wybrane pełne teksty z tego czasopisma

https://fi.episciences.org/

Identyfikatory

DOI

10.3233/FI-2015-1217

Warianty tytułu

Języki publikacji

Abstrakty

Mining long frequent sequences that contain a combinatorial number of frequent subsequences or using very low support thresholds to mine sequential patterns is usually both time- and memory-consuming. The mining of closed sequential patterns, sequential generator patterns, and maximum sequences has been proposed to overcome this problem. Sequential generator patterns, when used together with closed sequential patterns, can provide additional information that closed sequential patterns alone cannot provide. Mining sequential generator patterns is thus an important task in data mining as well. This paper proposes an algorithm called MSGP-PreTree for mining all sequential generator patterns based on the prefix-tree structure. The algorithm uses the characteristics of sequential generator patterns and sequence extensions to efficiently perform a depth-first search on a prefix tree. It also uses a vertical approach to list and count the supports of sequences based on the prime block encoding approach for representing candidate sequences and determining the frequencies of candidates. Besides, the search space of the MSGP-PreTree algorithm is much smaller than those of other algorithms because two pruning strategies are applied. Experimental results conducted on synthetic and real databases show that the proposed algorithm is effective than a previous one.

Słowa kluczowe

sequential pattern sequential generator pattern sequence database prefix tree

Wydawca

IOS Press

Czasopismo

Fundamenta Informaticae

Rocznik

2015

Tom

Vol. 138, nr 3

Strony

373--386

Opis fizyczny

Bibliogr. 34 poz., tab., wykr.

Twórcy

autor

Pham T.-T.

phamthithiet@hui.edu.vn

Faculty of Information Technology Industrial University of Ho Chi Minh City Ho Chi Minh City, Viet Nam

Bibliografia

[1] Asai, T., Abe, K., Kawasoe, S., Arimura, H., Satamoto, H.,, Arikawa S.: Efficient Substructure Discovery from Large Semi-Structured Data. In Proc. Of the second SIAM Intl Conf. Data Mining, pp. 158-174, 2002.
[2] Agrawal, R., Srikant, R.: Mining sequential patterns. In Proc. of 11th International Conference on Data Engineering, Taipei, Taiwan, pp. 3-14, 1995.
[3] Ayres J., Gehrke J., Yiu T., Flannick J.: Sequential pattern mining using a bitmap representation. In Proc. of ACM SIGKDD International Conference on Knowledge discovery and data mining, Edmonton, Alberta, Canada, pp. 429-435, 2002.
[4] Berry M.J., Linoff G.: Data mining techniques for marketing, sales and customer support. John Wiley & Sons,1997.
[5] Brin S., Motwani R., Ullman J., Tsur S.: Dynamic itemset counting and implication rules for market basket data. In Proc. of the 1997 ACM SIGMOD International Conference on the Management of Data, pp. 255-264, 1997.
[6] Chang Y.I., Wu C.C, Chen J.R, Jeng Y.H.: Mining sequential motifs from protein databases based on a bit pattern approach. International Journal of Innovative Computing, Information and Control, 8(1B), pp.647 - 657, 2012.
[7] Chang L., Wang T., Yang D., Luan H., Tang S.: Efficient algorithms for incremental maintenance of closed sequential patterns in large databases. Data and Knowledge Engineering, 68(1), pp. 68-106, 2009.
[8] Chen, Y., Guo, J., Wang, Y., Xiong, Y., Zhu, Y.: Incremental mining of sequential patterns using prefix tree. Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science Volume 4426, pp. 433-440, 2007.
[9] Gao C.C., Wang J.Y., He Y.K., Zhou L.Z.: Efficient mining of frequent sequence generators. In Proc. of the 17th international conference on World Wide Web, Beijing, China, pp. 1051-1052, 2008.
[10] Garofalakis ,M. N., Rastogi, R., Shim, K.: SPIRIT: Sequential Pattern Mining with Regular Expression Constraints. In Proc. of the 25th International Conference on Very Large Data Bases (VLDB ’99), pp. 223 - 234, 1999.
[11] Gouda K., Hassaan M., Zaki M.J.: PRISM: A primal-encoding approach for frequent sequence mining. Journal of Computer and System Sciences, 76(1), pp. 88-102, 2010.
[12] Hamedanian, M., Nadimi, M., Naderi, M.: An Efficient Prefix Tree for Incremental Frequent Pattern. International Journal of Information and Communication Technology Research, 3(2), pp. 49-55, 2013.
[13] Huan, J.,Wang,W., Prins, J.: Efficient mining of frequent subgraphs in the presence of isomorphism. In Proc of the 3rd IEEE Intl. Conf. on Data Mining ICDM, Piscataway, NJ, USA, IEEE Press, pp. 549-552, 2003.
[14] Huang G., Yang F., Hu C., Ren J.: Fast discovery of frequent closed sequential patterns based on positional data. In Proc. of the 9th International Conference on Machine Learning and Cybernetics, Qingdao, pp. 444 - 449, 2010.
[15] Jesus A.-F., Nicolo F.-P., Andrea B., Francisco H.: Analysis of the effectiveness of the genetic algorithms based on extraction of association rules. Fundamenta Informaticae, 98(1), pp. 1-14, 2010.
[16] Li J., Li H., Wong L., Pei J., G. Dong: Minimum description length (MDL) principle: Generators are preferable to closed patterns. In Proc. of the 21th National Conference on Artificial Intelligence (AAAI’06), Boston, Massachusetts, USA, pp. 409 - 414, 2006.
[17] Lin, M.-Y., Lee, S.-Y.: Fast Discovery of Sequential Patterns through Memory Indexing and Database Partitioning. Journal Of Information Science And Engineering, 21, pp. 109-128, 2005.
[18] Lo D., Khoo S.-C., Li J.: Mining and ranking generators of sequential patterns. SIAM Conference on Data Mining (SDM 2008), Atlanta, Georgia, USA, pp. 553-564, 2008.
[19] Luo C., Chung S.M.: A scalable algorithm for mining maximal frequent sequences using a sample. Knowledge and Information Systems, 15(2), pp. 149-179, 2008.
[20] Masseglia, F., Cathala, F., Poncelet, P.: The PSP Approach for Mining Sequential Patterns. In Proc. of the Second European Symposium on Principles of Data Mining and Knowledge Discovery (PKDD ’98), pp. 176-184, 1998.
[21] Nguyen L.T.T, Vo B., Hong T.P., Thanh H.C.: Classification based on association rules: A lattice-based. Expert Systems with Applications, 39(13), pp. 11357-11366 (2012).
[22] Pandya, M., Trikha, P.: an efficient prefix tree structure to extract frequent pattern. International Journal of Advances in Engineering & Technology, 6(3), pp. 1220-1227, 2013.
[23] Pei J., et al: Mining sequential patterns by pattern-growth: The PrefixSpan approach. IEEE Transaction on Knowledge and Data Engineering, 16(10), pp. 1424-1440, 2004.
[24] Srikant R., Agrawal R.: Mining sequential patterns: generalizations and performance improvements. In Proc. of 5th International Conference on Extending Database Technology, pp.3-17, (1996).
[25] Tanbeer, S.K., Ahmed, C.F., Jeong, B.S., Lee, Y.K.: Efficient single-pass frequent pattern mining using a prefix-tree. Information Sciences, 179(5), pp. 559-583, 2008.
[26] Vo B., Le B.: Interestingness measures for association rules: Combination between lattice and hash tables. Expert Systems with Applications, 38(9), pp. 1630-11 640 (2011).
[27] Wang Y.-T., Lee A.J.T.: Mining Web navigation patterns with a path traversal graph. Expert Systems with Applications, 38(6), pp. 7112-7122 (2011).
[28] Wang J., Han J.: BIDE: Efficient mining of frequent closed sequences. In Proc. of the 20th International Conference on Data Engineering (ICDE’95), IEEE Computer Society Press, pp. 79-91, 2004.
[29] Wang J., Han J., Li C.: Frequent closed sequence mining without candidate maintenance. IEEE Transaction on Knowledge and Data Engineering, 19(8), pp. 1024-1056, 2007.
[30] Yan, X., Han, J.. gSpan: Graph-based substructure pattern mining. In the ICDM ’02 Proceedings of the 2002 IEEE International Conference on Data Mining, pp. 721-724, 2002.
[31] Yan X., Han J., Afshar R.: CloSpan: Mining closed sequential patterns in large datasets. In Proc. of the 3th SIAM International Conference on Data Mining, San Francisco, CA, USA: SIAM Press, pp. 166-177, 2003.
[32] Yia S.W., Zhao T.H., Zhang Y.Y., Ma S.L., Che Z.B.: An effective algorithm for mining sequential generators. In Proc. of 2011 International Conference on Advanced in Control Engineering and Information Science (CEIS 2011), 15, pp. 3653 - 3657, 2011.
[33] Zaki M.J.: SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning Journal, 42(1-2), pp. 31-60, 2001.
[34] Zaki, M.J.: Efficiently Mining Frequent Trees in a Forest. Proc of the Int. Conf. Knowledge Discovery and Data Mining (SIGKDD02), pp. 71-80, 2002.

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-e7e660c2-8e57-494f-82e4-d2a9a29e3ab3