An empirical study of context based sequential pattern mining algorithms efficiency

Ziembiński, R.

Abstract

Methods for detecting patterns in data sets are useful and sought-after tools in the knowledge discovery process. The problem of searching for patterns in a set of sequences is called Sequential Pattern Mining. It can be defined as finding frequent subsequences in a sequence database. The pattern selection procedure is simple to state: a subsequence becomes a pattern only if it is contained in at least the required number of sequences from the database. The number of sequences that contain a pattern is called the pattern's support. Finding patterns may look trivial, but solving the problem efficiently is not. Efficiency becomes crucial when the required support is lowered, because the number of mined patterns may grow exponentially. Moreover, the situation changes when the Sequential Pattern Mining problem is extended further. In the classic definition, a sequence is an ordered list of elements, each of which is a non-empty set of items. Context Based Sequential Pattern Mining adds uniform, multi-attribute contexts (vectors) to the elements of a sequence and to the sequence itself. Introducing contexts significantly enlarges the search space of the problem. However, it also brings additional opportunities to constrain the mining process. This extension requires new algorithms, because traditional ones cannot cope with non-nominal data directly, and algorithms derived directly from traditional ones proved inefficient. This study evaluates the efficiency of the novel ContextMapping and ContextMappingHeuristic algorithms, which are designed to solve the Context Based Sequential Pattern Mining problem. It examines to what extent the parameterization of the algorithms affects mining cost and accuracy. It also relates the modified problem to the traditional one, pointing out their common and differing properties and outlining perspectives for further research.
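
The classic notions of containment and support described in the abstract can be illustrated with a minimal Python sketch. It covers only the traditional problem (sequences of non-empty item sets, no contexts) and is not the ContextMapping or ContextMappingHeuristic algorithm. The names `contains`, `support`, `is_pattern`, and the toy `DATABASE` are assumptions made for this illustration and do not come from the article.

```python
from typing import List, Set

# A sequence is an ordered list of elements; each element is a
# non-empty set of items (classic definition, no contexts).
Sequence = List[Set[str]]

# Hypothetical toy database, invented for this illustration.
DATABASE: List[Sequence] = [
    [{"a"}, {"b", "c"}, {"d"}],
    [{"a", "b"}, {"c"}, {"d"}],
    [{"b"}, {"a"}, {"d"}],
]


def contains(sequence: Sequence, candidate: Sequence) -> bool:
    """Return True if `candidate` is a subsequence of `sequence`.

    Each candidate element must be a subset of a distinct element of
    `sequence`, and the matched elements must keep their order.
    Matching each candidate element at the earliest possible position
    is sufficient for this check.
    """
    pos = 0
    for wanted in candidate:
        # Skip elements that do not contain all wanted items.
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next candidate element must match later
    return True


def support(database: List[Sequence], candidate: Sequence) -> int:
    """Count the database sequences that contain `candidate`."""
    return sum(1 for sequence in database if contains(sequence, candidate))


def is_pattern(database: List[Sequence], candidate: Sequence,
               min_support: int) -> bool:
    """A candidate is a pattern if its support reaches `min_support`."""
    return support(database, candidate) >= min_support
```

With this toy database, the candidate `[{"a"}, {"c"}]` is contained in the first two sequences only, so it qualifies as a pattern for a required support of 2 but not for 3. Lowering the required support admits more candidates, which is the source of the growth in mined patterns mentioned above.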

Keywords

knowledge discovery; sequence database; context based sequential pattern mining; context attributes

Publisher

Wydawnictwo Politechniki Poznańskiej

Journal

Foundations of Computing and Decision Sciences

Year

2007

Volume

Vol. 32, No. 1

Pages

63-84

Physical description

Bibliography: 14 items.

Contributors

author

Ziembiński R.

Poznań University of Technology, Laboratory of Intelligent Decision Support Systems, ul. Piotrowo 2, 60-965 Poznań, radoslaw.ziembinski@cs.put.poznan.pl

Bibliography

[1] Agrawal R., Srikant R.: "Fast algorithms for mining association rules" In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB'94), pages 487-499, Santiago, Chile, Sept. 1994.
[2] Agrawal R., Srikant R.: "Mining sequential patterns" In Proc. 1995 Int. Conf. Data Engineering (ICDE'95), pages 3-14, Taipei, Taiwan, Mar. 1995.
[3] Han J., Pei J., Mortazavi-Asl B., Chen Q., Dayal U., Hsu M.C.: "Freespan: Frequent pattern-projected sequential pattern mining" In Proc. 2000 Int. Conf. Knowledge Discovery and Data Mining (KDD'00), pages 355-359, Boston, MA, Aug. 2000.
[4] Han J., Pei J., Mortazavi-Asl B., Chen Q., Dayal U., Hsu M.C.: "Prefixspan: Mining sequential patterns efficiently by prefix-projected pattern growth" In Proc. 2001 Int. Conf. Data Engineering (ICDE'01), pages 215-224, Heidelberg, Germany, Apr. 2001.
[5] Yang Z., Wang Y., Kitsuregawa M.: "LAPIN: Effective Sequential Pattern Mining Algorithms by Last Position Induction" Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-Ku, Tokyo 153-8305, Japan, 2005.
[6] Stefanowski J., Ziembiński R.: "Mining Context Based Sequential Patterns" in Szczepaniak P., Kacprzyk J., Niewiadomski A. "Advances in Web Intelligence", Proceedings of the Third International Atlantic Web Intelligence Conference, Lodz, June 2005, Lecture Notes in Artificial Intelligence, vol. 3528, Springer-Verlag, 2005, pp. 401-407.
[7] Weiss G. M.: "Timeweaver: a Genetic Algorithm for Identifying Predictive Patterns in Sequences of Events" Proceedings of the Genetic and Evolutionary Computation Conference, Morgan Kaufmann, Orlando, Florida, USA, 1999.
[8] Pitkow J. E., Pirolli P.: "Mining Longest Repeating Subsequences to Predict World Wide Web Surfing" USENIX Symposium on Internet Technologies and Systems, 1999.
[9] Mobasher B., Dai H., Luo T., Nakagawa M.: "Using sequential and non-sequential patterns in predictive web usage mining tasks" In Proceedings of the IEEE International Conference on Data Mining (ICDM'02), Maebashi City, Japan, 2002.
[10] Ali K., Manganaris S., Srikant R.: "Partial Classification Using Association Rules" Knowledge Discovery and Data Mining, 115-118, 1997.
[11] Lesh N., Zaki M. J., Ogihara M.: "Mining Features for Sequence Classification" Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, San Diego, 1999.
[12] Pinto H., Han J., Pei J., Wang K., Chen Q., Dayal U.: "Multidimensional Sequential Pattern Mining" Intelligent Database Systems Research Lab, School of Computing Science, Simon Fraser University, Burnaby, B.C., Canada V5A 1S6, 2001.
[13] Srikant R., Agrawal R.: "Mining sequential patterns: Generalizations and performance improvements" In Proc. 5th Int. Conf. Extending Database Technology, EDBT, P. M. G. Apers, M. Bouzeghoub, and G. Gardarin, Eds. Vol. 1057. Springer-Verlag, 3-17, 1996.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BPP1-0073-0021