Accuracy of generalized context patterns in the context based sequential patterns mining
A context pattern is a frequent subsequence mined from a context database containing a set of sequences. Such sequential patterns, and all elements inside them, are described by additional sets of context attributes, e.g., continuous ones. The contexts describe the circumstances of transactions and the sources of sequential data. These patterns can be mined with an algorithm for context-based sequential pattern mining. However, this can produce large sets of patterns, because all contexts related to the patterns are taken from the database. The goal of the generalization method is to reduce the set of context patterns by introducing a more compact and descriptive kind of pattern. This is achieved by finding clusters of similar context patterns in the mined set and transforming them into a smaller set of generalized context patterns. This process has to retain as much of the information in the mined context patterns as possible. This paper introduces a definition of the generalized context pattern and the related algorithm. Generalization results may differ depending on the algorithm design and settings, so generalized patterns may reflect the frequent information in the context database in different ways. Therefore, an accuracy measure is also proposed to evaluate the generalized patterns, and this measure is used in the presented experiments. The generalized context patterns are compared with patterns mined by basic sequential pattern mining with prediscretization of context values.
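The abstract does not specify how similarity between context patterns is measured, how clusters are formed, or what form a generalized pattern takes. Purely as an illustration of the general idea (clustering similar context patterns and replacing each cluster with one more compact pattern), the sketch below assumes that a context pattern is an item sequence plus a vector of continuous context values. It merges only patterns with the same item sequence, clusters their contexts with simple leader clustering (Chebyshev distance, threshold `eps`), and summarizes each cluster by per-attribute value ranges. The names `ContextPattern`, `GeneralizedPattern`, `generalize`, and `eps`, as well as the choice of leader clustering, are hypothetical and are not the algorithm or the accuracy measure proposed in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Context = Tuple[float, ...]


@dataclass(frozen=True)
class ContextPattern:
    # Hypothetical representation: items of the frequent subsequence
    # plus one vector of continuous context attribute values.
    sequence: Tuple[str, ...]
    context: Context


@dataclass
class GeneralizedPattern:
    sequence: Tuple[str, ...]
    ranges: List[Tuple[float, float]]  # (min, max) per context attribute
    merged: int                        # number of context patterns covered


def chebyshev(a: Context, b: Context) -> float:
    # Largest per-attribute difference between two context vectors.
    return max(abs(x - y) for x, y in zip(a, b))


def generalize(patterns: List[ContextPattern], eps: float) -> List[GeneralizedPattern]:
    # Assumption: only patterns with the same item sequence may be merged.
    by_sequence: Dict[Tuple[str, ...], List[Context]] = defaultdict(list)
    for p in patterns:
        by_sequence[p.sequence].append(p.context)

    result: List[GeneralizedPattern] = []
    for seq, contexts in by_sequence.items():
        # Leader clustering: a context joins the first cluster whose leader
        # (first member) lies within eps, otherwise it starts a new cluster.
        clusters: List[List[Context]] = []
        for ctx in sorted(contexts):
            for cluster in clusters:
                if chebyshev(cluster[0], ctx) <= eps:
                    cluster.append(ctx)
                    break
            else:
                clusters.append([ctx])
        # Summarize each cluster by the min/max of every context attribute.
        for cluster in clusters:
            ranges = [(min(col), max(col)) for col in zip(*cluster)]
            result.append(GeneralizedPattern(seq, ranges, len(cluster)))
    return result


if __name__ == "__main__":
    mined = [
        ContextPattern(("a", "b"), (1.0, 10.0)),
        ContextPattern(("a", "b"), (1.5, 11.0)),
        ContextPattern(("a", "b"), (8.0, 30.0)),
        ContextPattern(("c",), (2.0, 5.0)),
    ]
    for g in generalize(mined, eps=2.0):
        print(g)
```

With `eps=2.0`, the two close `("a", "b")` contexts merge into one generalized pattern with ranges `(1.0, 1.5)` and `(10.0, 11.0)`, while the distant one and the `("c",)` pattern stay separate, so four context patterns become three generalized ones. Summarizing a cluster only by value ranges discards how the contexts are distributed inside it, which illustrates the kind of information loss that an accuracy measure for generalized patterns could help quantify.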
Bibliography: 15 items, illustrations.
- Agrawal, R. and Srikant, R. (1995) Mining sequential patterns. Proceedings of the 11th International Conference on Data Engineering, IEEE Computer Society, 3-14.
- Guha, S., Rastogi, R. and Shim, K. (2000) ROCK: A Robust Clustering Algorithm for Categorical Attributes. Information Systems, 25, 345-366.
- Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U. and Hsu, M.-C. (2001) PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. Proceedings of the 17th International Conference on Data Engineering, IEEE Computer Society, 215-224.
- Morzy, T., Wojciechowski, M. and Zakrzewicz, M. (1999) Pattern-Oriented Hierarchical Clustering. Proceedings of the Third East-European Symposium on Advances in Databases and Information Systems - ADBIS’99, Slovenia, LNCS 1691, 179-190.
- Ng, R.T. and Han, J. (2002) CLARANS: A Method for Clustering Objects for Spatial Data Mining. IEEE Transactions on Knowledge and Data Engineering, 14, 1003-1016.
- Pilevar, A.H. and Sukumar, M. (2005) GCHL: A grid-clustering algorithm for high-dimensional very large spatial data bases. Pattern Recognition Letters, 26, 999-1010.
- Pinto, H., Han, J., Pei, J., Wang, K., Chen, Q. and Dayal, U. (2001) Multi-dimensional sequential pattern mining. Proceedings of the 10th International Conference on Information and Knowledge Management, ACM, 81-88.
- Plantevit, M., Choong, Y., Laurent, A., Laurent, D. and Teisseire, M. (2005) M2SP: Mining Sequential Patterns Among Several Dimensions. LNAI 3721, Springer, 205-216.
- Plantevit, M., Laurent, A. and Teisseire, M. (2008) Up and Down: Mining Multidimensional Sequential Patterns Using Hierarchies. Data Warehousing and Knowledge Discovery, LNCS 5182, Springer, 156-165.
- Srikant, R. and Agrawal, R. (1996) Mining Sequential Patterns: Generalizations and Performance Improvements. Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology, LNCS 1057, Springer-Verlag, 3-17.
- Stefanowski, J. and Ziembiński, R. (2005) Mining Context Based Sequential Patterns. Proceedings of the 3rd International Atlantic Web Intelligence Conference: Advances in Web Intelligence, LNCS 3528, Springer, 401-407.
- Stefanowski, J. and Ziembiński, R. (2009) An Experimental Evaluation of Two Approaches to Mining Context Based Sequential Patterns. Control and Cybernetics, 31 (1), 27-45.
- Yang, Y., Guan, X. and You, J. (2002) CLOPE: a fast and effective clustering algorithm for transactional data. Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 682-687.
- Yang, J. and Wang, W. (2003) CLUSEQ: Efficient and Effective Sequence Clustering. Proceedings of the 19th International Conference on Data Engineering, IEEE Press, 101-112.
- Ziembiński, R. (2007) Algorithms for Context Based Sequential Pattern Mining. Fundamenta Informaticae, 76 (4), 495-510.