Pruning discovered sequential patterns using minimum improvement threshold

Prinke, S.; Wojciechowski, M.; Zakrzewicz, M.

Conference

ADBIS Workshop on Data Mining and Knowledge Discovery (ADMKD'2005) / symposium [1st; September 15-16, 2005; Tallinn, Estonia]

Abstract

Discovery of sequential patterns is an important data mining problem with numerous applications. Sequential patterns are subsequences that occur frequently in a database of sequences of sets of items. In the basic scenario, the goal of sequential pattern mining is to discover all patterns whose frequency exceeds a user-specified threshold. The problem with this approach is the huge number of sequential patterns likely to be returned for reasonable frequency thresholds. One possible solution is to exclude patterns that do not provide significantly more information than some other patterns in the result set. Two approaches in this category have been studied in the context of sequential patterns: discovery of maximal patterns and discovery of closed patterns. Unfortunately, the set of maximal patterns may omit many important high-frequency patterns, and discovery of closed patterns may not reduce the number of resulting patterns for sparse datasets. Therefore, in this paper we propose and experimentally evaluate the minimum improvement criterion, to be used in the post-processing phase to reduce the number of sequential patterns returned to the user. Our method is an adaptation of one of the methods previously proposed for association rules.
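
For readers unfamiliar with the terminology, the basic scenario described in the abstract (counting how often a pattern occurs as a subsequence in a database of sequences of item sets) can be illustrated with a minimal Python sketch. This is not taken from the paper: the function names `contains` and `support`, the toy database, and the threshold value are all hypothetical, and the sketch does not implement the minimum improvement criterion itself, whose exact definition is not given in this record.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]
Sequence = List[Itemset]


def contains(sequence: Sequence, pattern: Sequence) -> bool:
    """Return True if `pattern` is a subsequence of `sequence`.

    Each pattern element must be a subset of a distinct element of
    `sequence`, and the matched elements must appear in the same order.
    Greedy left-to-right matching is sufficient for this check.
    """
    pos = 0
    for element in pattern:
        # Advance to the first remaining element that contains `element`.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a later position
    return True


def support(database: List[Sequence], pattern: Sequence) -> float:
    """Fraction of database sequences that contain `pattern`."""
    return sum(contains(s, pattern) for s in database) / len(database)


def fs(*items: str) -> Itemset:
    """Small helper to build an item set (hypothetical, for readability)."""
    return frozenset(items)


# Toy database of four sequences of item sets (assumed data).
database = [
    [fs("a"), fs("b", "c"), fs("d")],
    [fs("a", "b"), fs("c")],
    [fs("b"), fs("a"), fs("c", "d")],
    [fs("d")],
]

pattern = [fs("a"), fs("c")]  # "a" followed later by "c"
min_support = 0.5             # user-specified threshold (assumed value)

print(support(database, pattern))                # 0.75
print(support(database, pattern) > min_support)  # True
```

With thousands of sequences and a low threshold, the number of patterns passing this test grows quickly, which is the motivation for the post-processing step the abstract proposes.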

Keywords

data mining; sequential patterns; interestingness measures

Publisher

Wydawnictwo Politechniki Poznańskiej

Journal

Foundations of Computing and Decision Sciences

Year

2006

Volume

Vol. 31, No. 1

Pages

43-57

Physical description

Bibliography: 15 items.

Authors

author

Prinke S.

author

Wojciechowski M.

author

Zakrzewicz M.

Poznan University of Technology, Institute of Computing Science, ul. Piotrowo 2, 60-965 Poznan, Poland

References

[1] Agrawal R., Imielinski T., Swami A., Mining Association Rules Between Sets of Items in Large Databases, in: P. Buneman, S. Jajodia (eds.), Proceedings of the 1993 ACM SIGMOD Conference on Management of Data, ACM Press, Washington, D.C., USA, 1993, 207-216.
[2] Agrawal R., Mehta M., Shafer J., Srikant R., Arning A., Bollinger T., The Quest Data Mining System, in: E. Simoudis, J. Han, U.M. Fayyad (eds.), Proceedings of the 2nd International Conference on Knowledge Discovery in Databases and Data Mining, AAAI Press, Portland, USA, 1996, 244-249.
[3] Agrawal R., Srikant R., Mining Sequential Patterns, in: P.S. Yu, A.L.P. Chen (eds.), Proceedings of the 11th International Conference on Data Engineering, IEEE Computer Society, Taipei, Taiwan, 1995, 3-14.
[4] Bayardo R.J., Agrawal R., Gunopulos D., Constraint-based rule mining in large, dense databases, in: Proceedings of the 15th International Conference on Data Engineering, IEEE Computer Society, Sydney, Australia, 1999, 188-197.
[5] Garofalakis M., Rastogi R., Shim K., SPIRIT: Sequential Pattern Mining with Regular Expression Constraints, in: M.P. Atkinson, M.E. Orlowska, P. Valduriez, S.B. Zdonik, M.L. Brodie (eds.), Proceedings of the 25th International Conference on Very Large Data Bases, Morgan Kaufmann, Edinburgh, Scotland, UK, 1999, 223-234.
[6] Hettich S., Bay S. D., The UCI KDD Archive [http://kdd.ics.uci.edu], Irvine, CA: University of California, Department of Information and Computer Science, 1999.
[7] Pasquier N., Bastide Y., Taouil R., Lakhal L., Discovering frequent closed itemsets for association rules, in: C. Beeri, P. Buneman (eds.), Proceedings of the 7th International Conference on Database Theory, Springer, Jerusalem, Israel, 1999, 398-416.
[8] Pei J., Han J., Wang W., Mining sequential patterns with constraints in large databases, in: Proceedings of the 11th International Conference on Information and Knowledge Management, ACM Press, McLean, Virginia, USA, 2002, 18-25.
[9] Pei J., Dong G., Zou W., Han J., Mining Condensed Frequent-Pattern Bases, in: Proceedings of the IEEE 2002 International Conference on Data Mining, IEEE Computer Society, Maebashi City, Japan, 2002, 378-385.
[10] Pei J., Han J., Mortazavi-Asl B., Pinto H., Chen Q., Dayal U., Hsu M-C., PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth, in: Proceedings of the 17th International Conference on Data Engineering, IEEE Computer Society, Heidelberg, Germany, 2001, 215-224.
[11] Pudi V., Haritsa J.R., Generalized Closed Itemsets for Association Rule Mining, in: U. Dayal, K. Ramamritham, T.M. Vijayaraman (eds.), Proceedings of the 19th International Conference on Data Engineering, IEEE Computer Society, Bangalore, India, 2003, 714-716.
[12] Srikant R., Agrawal R., Mining Sequential Patterns: Generalizations and Performance Improvements, in: P.M.G. Apers, M. Bouzeghoub, G. Gardarin (eds.), Proceedings of the 5th International Conference on Extending Database Technology, Springer, Avignon, France, 1996, 3-17.
[13] Toivonen H., Klemettinen M., Ronkainen P., Hatonen K., Mannila H., Pruning and grouping discovered association rules, in: MLnet Workshop on Statistics, Machine Learning, and Discovery in Databases, MLnet, Heraklion, Greece, 1995, 47-52.
[14] Wojciechowski M., Interactive Constraint-Based Sequential Pattern Mining, in: A. Caplinskas, J. Eder (eds.), Proceedings of the 5th East European Conference on Advances in Databases and Information Systems, Springer, Vilnius, Lithuania, 2001, 169-181.
[15] Yan X., Han J., Afshar R., CloSpan: Mining closed sequential patterns in large datasets, in: D. Barbara, C. Kamath (eds.), Proceedings of SIAM International Conference on Data Mining, SIAM, San Francisco, USA, 2003, 166-177.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BPP1-0059-0067