A context pattern is a frequent subsequence mined from a context database containing a set of sequences. Such sequential patterns, and all elements inside them, are described by additional sets of context attributes, which may for example be continuous. The contexts describe the circumstances of transactions and the sources of sequential data. These patterns can be mined by an algorithm for context-based sequential pattern mining. However, this can produce large sets of patterns, because all contexts related to the patterns are taken from the database. The goal of the generalization method is to reduce the context pattern set by introducing a more compact and descriptive kind of pattern. This is achieved by finding clusters of similar context patterns in the mined set and transforming them into a smaller set of generalized context patterns. This process has to retain as much information as possible from the mined context patterns. This paper introduces a definition of the generalized context pattern and the related algorithm. The results of generalization may differ depending on the algorithm design and settings. Hence, generalized patterns may reflect frequent information from the context database differently. Thus, an accuracy measure is also proposed to evaluate the generalized patterns. This measure is used in the experiments presented. The generalized context patterns are compared with patterns mined by basic sequential pattern mining with prediscretization of context values.
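The idea of clustering similar context patterns and replacing each cluster with one generalized pattern can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper: the representation (one continuous context value per element), the similarity rule (`max_gap`), and the names `generalize` and `bounding` are all assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: an item sequence plus one continuous
# context value per element of that sequence.
Items = Tuple[str, ...]
Context = Tuple[float, ...]
Interval = Tuple[float, float]


def bounding(cluster: List[Context]) -> Tuple[Interval, ...]:
    """Replace a cluster of context vectors by per-attribute [min, max] intervals."""
    return tuple((min(col), max(col)) for col in zip(*cluster))


def generalize(
    patterns: List[Tuple[Items, Context]], max_gap: float
) -> List[Tuple[Items, Tuple[Interval, ...]]]:
    """Merge context patterns that share items and have nearby context values.

    Assumption: two patterns are "similar" when every context value differs
    by at most max_gap from the previous pattern in sorted order.
    """
    groups: Dict[Items, List[Context]] = defaultdict(list)
    for items, ctx in patterns:
        groups[items].append(ctx)

    result = []
    for items, contexts in groups.items():
        contexts.sort()
        cluster = [contexts[0]]
        for ctx in contexts[1:]:
            if all(abs(a - b) <= max_gap for a, b in zip(ctx, cluster[-1])):
                cluster.append(ctx)
            else:
                # Close the current cluster and start a new one.
                result.append((items, bounding(cluster)))
                cluster = [ctx]
        result.append((items, bounding(cluster)))
    return result


# Illustrative data, not taken from the paper.
mined = [
    (("a", "b"), (20.0, 5.0)),
    (("a", "b"), (21.0, 5.5)),
    (("a", "b"), (40.0, 5.2)),
    (("c",), (1.0,)),
]
generalized = generalize(mined, max_gap=2.0)
```

In this toy setting four mined patterns collapse into three generalized ones. How much frequent information survives such a merge depends on the similarity rule chosen, which is why an accuracy measure for the generalized patterns is needed.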
Methods for detecting patterns in data sets are useful and sought-after tools in the knowledge discovery process. The problem of searching for patterns in a set of sequences is called Sequential Patterns Mining. It can be defined as finding frequent subsequences in a sequence database. The pattern selection procedure is simple to state. To become a pattern, a subsequence must be contained in at least the required number of sequences from the database. The number of sequences containing a pattern is called the pattern's support. The process of finding patterns may look trivial, but solving it efficiently is not. Efficiency plays a crucial role when the required support is lowered, because the number of mined patterns may grow exponentially. Moreover, the situation changes if the problem of Sequential Patterns Mining is extended further. In the classic definition, a sequence is an ordered list of elements, each of which is a non-empty set of items. Context Based Sequential Patterns Mining adds uniform, multi-attribute contexts (vectors) to the elements of the sequence and to the sequence itself. Introducing contexts significantly enlarges the search space of the problem. However, it also brings additional opportunities to constrain the mining process. This enhancement requires new algorithms. Traditional ones cannot cope with non-nominal data directly, and algorithms derived directly from traditional ones have been shown to be inefficient. This study evaluates the efficiency of the novel ContextMapping and ContextMappingHeuristic algorithms, which are designed to solve the problem of Context Based Sequential Patterns Mining. It examines to what extent the parameterization of the algorithms affects mining cost and accuracy. It also relates the modified problem to the traditional one, pointing out their common and differing properties and outlining perspectives for further research.
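The classic definitions of containment and support given above can be sketched in a few lines. This is a minimal illustration of the classic, context-free problem only. It is not ContextMapping or ContextMappingHeuristic, and the function names and sample data are assumptions.

```python
from typing import FrozenSet, List

Element = FrozenSet[str]      # a non-empty set of items
Sequence = List[Element]      # an ordered list of elements


def contains(sequence: Sequence, pattern: Sequence) -> bool:
    """True if pattern is a subsequence of sequence.

    Each pattern element must be a subset of a distinct sequence element,
    and the matched elements must appear in the same order.
    """
    pos = 0
    for element in sequence:
        if pos < len(pattern) and pattern[pos] <= element:
            pos += 1
    return pos == len(pattern)


def support(database: List[Sequence], pattern: Sequence) -> int:
    """Number of database sequences that contain the pattern."""
    return sum(1 for seq in database if contains(seq, pattern))


def is_frequent(database: List[Sequence], pattern: Sequence, min_support: int) -> bool:
    """A subsequence is a pattern if its support reaches the required minimum."""
    return support(database, pattern) >= min_support


# Illustrative data, not taken from the paper.
database = [
    [frozenset({"a"}), frozenset({"b", "c"}), frozenset({"d"})],
    [frozenset({"a"}), frozenset({"d"})],
    [frozenset({"b"}), frozenset({"a"})],
]
pattern = [frozenset({"a"}), frozenset({"d"})]
pattern_support = support(database, pattern)
```

Checking a single candidate is cheap. The cost lies in the number of candidates, which grows quickly as `min_support` is lowered and grows further once continuous context vectors are attached to elements and sequences.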