Article title

An empirical study of context based sequential pattern mining algorithms efficiency

Authors
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Methods for detecting patterns in data sets are useful and sought-after tools in the knowledge discovery process. The problem of searching for patterns in a set of sequences is called Sequential Pattern Mining. It can be defined as finding frequent subsequences in a sequence database. The pattern selection procedure is simple to state: to become a pattern, a subsequence must be contained in at least the required number of sequences from the database. The number of sequences containing a pattern is called the pattern's support. Finding patterns may look trivial, but doing so efficiently is not. Efficiency plays a crucial role when the required support is lowered, because the number of mined patterns may grow exponentially. Moreover, the situation may change if the Sequential Pattern Mining problem is extended further. In the classic definition, a sequence is an ordered list of elements, each of which is a non-empty set of items. Context Based Sequential Pattern Mining adds uniform, multi-attribute contexts (vectors) to the elements of the sequence and to the sequence itself. Introducing contexts significantly enlarges the search space of the problem. However, it also brings additional opportunities to constrain the mining process. This extension requires new algorithms, because traditional ones cannot cope directly with non-nominal data, and algorithms derived directly from traditional ones have been shown to be inefficient. This study evaluates the efficiency of the novel ContextMapping and ContextMappingHeuristic algorithms, which are designed to solve the Context Based Sequential Pattern Mining problem. It examines to what extent the parameterization of the algorithms affects mining cost and accuracy. It also relates the modified problem to the traditional one, pointing out their common and differing properties and outlining perspectives for further research.
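The classic (context-free) notions of containment and support described in the abstract can be illustrated with a minimal sketch. It is not the ContextMapping or ContextMappingHeuristic algorithm, and it does not model contexts; the names `contains`, `support`, `is_pattern`, and the toy database `DB` are assumptions introduced here for illustration only.

```python
from typing import FrozenSet, List

# Hypothetical types: an element is a non-empty set of items,
# a sequence is an ordered list of elements.
Element = FrozenSet[str]
Sequence_ = List[Element]


def contains(sequence: Sequence_, candidate: Sequence_) -> bool:
    """Return True if `candidate` is a subsequence of `sequence`.

    Each candidate element must be a subset of a distinct element of
    `sequence`, and the matched elements must keep their order.
    Matching each candidate element as early as possible is sufficient.
    """
    pos = 0
    for wanted in candidate:
        # Skip elements of the sequence that do not include `wanted`.
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next candidate element must match strictly later
    return True


def support(database: List[Sequence_], candidate: Sequence_) -> int:
    """Count the sequences in the database that contain `candidate`."""
    return sum(1 for seq in database if contains(seq, candidate))


def is_pattern(database: List[Sequence_], candidate: Sequence_,
               min_support: int) -> bool:
    """A candidate is a pattern if its support reaches `min_support`."""
    return support(database, candidate) >= min_support


def seq(*elements: str) -> Sequence_:
    """Helper: build a sequence from strings, one string per element."""
    return [frozenset(e) for e in elements]


# Toy database (made up for this example): three sequences of item sets.
DB = [
    seq("a", "bc", "d"),
    seq("ab", "c"),
    seq("b", "a", "cd"),
]
```

With this toy database, `support(DB, seq("a", "c"))` counts the sequences in which an element containing `a` is followed later by an element containing `c`. Lowering `min_support` in `is_pattern` admits more candidates, which is the effect the abstract identifies as the source of the efficiency problem.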
Year
Pages
63--84
Physical description
Bibliography: 14 items.
Contributors
Bibliography
  • [1] Agrawal R., Srikant R.: "Fast algorithms for mining association rules" In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB'94), pages 487-499, Santiago, Chile, Sept. 1994.
  • [2] Agrawal R., Srikant R.: "Mining sequential patterns" In Proc. 1995 Int. Conf. Data Engineering (ICDE'95), pages 3-14, Taipei, Taiwan, Mar. 1995.
  • [3] Han J., Pei J., Mortazavi-Asl B., Chen Q., Dayal U., Hsu M.C.: "FreeSpan: Frequent pattern-projected sequential pattern mining" In Proc. 2000 Int. Conf. Knowledge Discovery and Data Mining (KDD'00), pages 355-359, Boston, MA, Aug. 2000.
  • [4] Han J., Pei J., Mortazavi-Asl B., Chen Q., Dayal U., Hsu M.C.: "PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth" In Proc. 2001 Int. Conf. Data Engineering (ICDE'01), pages 215-224, Heidelberg, Germany, Apr. 2001.
  • [5] Yang Z., Wang Y., Kitsuregawa M.: "LAPIN: Effective Sequential Pattern Mining Algorithms by Last Position Induction" Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-Ku, Tokyo 153-8305, Japan, 2005.
  • [6] Stefanowski J., Ziembiński R.: "Mining Context Based Sequential Patterns" in Szczepaniak P., Kacprzyk J., Niewiadomski A. "Advances in Web Intelligence", Proceedings of the Third International Atlantic Web Intelligence Conference, Lodz, June 2005, Lecture Notes in Artificial Intelligence, vol. 3528, Springer-Verlag, 2005, pp. 401-407.
  • [7] Weiss G. M.: "Timeweaver: a Genetic Algorithm for Identifying Predictive Patterns in Sequences of Events" Proceedings of the Genetic and Evolutionary Computation Conference, Morgan Kaufmann, Orlando, Florida, USA, 1999.
  • [8] Pitkow J. E., Pirolli P.: "Mining Longest Repeating Subsequences to Predict World Wide Web Surfing" USENIX Symposium on Internet Technologies and Systems, 1999.
  • [9] Mobasher B., Dai H., Luo T., Nakagawa M.: "Using sequential and non-sequential patterns in predictive web usage mining tasks" In Proceedings of the IEEE International Conference on Data Mining (ICDM'02), Maebashi City, Japan, 2002.
  • [10] Ali K., Manganaris S., Srikant R.: "Partial Classification Using Association Rules" Knowledge Discovery and Data Mining, 115-118, 1997.
  • [11] Lesh N., Zaki M. J., Ogihara M.: "Mining Features for Sequence Classification" Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, San Diego, 1999.
  • [12] Pinto H., Han J., Pei J., Wang K., Chen Q., Dayal U.: "Multidimensional Sequential Pattern Mining" Intelligent Database Systems Research Lab, School of Computing Science, Simon Fraser University, Burnaby, B.C., Canada V5A 1S6, 2001.
  • [13] Srikant R., Agrawal R.: "Mining sequential patterns: Generalizations and performance improvements" In Proc. 5th Int. Conf. Extending Database Technology (EDBT), P. M. G. Apers, M. Bouzeghoub, and G. Gardarin, Eds., vol. 1057, Springer-Verlag, 3-17, 1996.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BPP1-0073-0021