Search results - BazTech

1

Patterns of multirelational data transformation in data mining process

Mazurek M.

Studia Informatica | 2012 | Vol. 33, no. 2A | 581-591

EN

Multirelational data mining requires complex preprocessing of data. Identifying transformation patterns and implementing them as reusable components leads to a more robust process for constructing data-mining flows. This paper presents a concept for implementing selected transformation patterns. The Rapid Miner environment is used to build the transformations, which can later be used to predict customer behavior.

PL

Mining multirelational data in the available data mining environments requires complex data preprocessing. Identifying processing patterns and implementing them as reusable components increases the efficiency of constructing data flows. The paper presents a concept for implementing, in the Rapid Miner environment, selected transformations that are applicable to forecasting customer behavior.
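
One transformation that multirelational preprocessing commonly needs is flattening a one-to-many relation (e.g. customers and their transactions) into per-customer features that a single-table learner can use. The sketch below is a generic illustration of that kind of pattern, not the paper's Rapid Miner implementation; the function name `aggregate_child_table` and the column names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_child_table(parent_ids, child_rows, key, value):
    """Flatten a one-to-many relation into per-parent aggregate features.

    parent_ids: iterable of parent keys (e.g. customer ids)
    child_rows: list of dicts, each holding a foreign key and a numeric value
    Returns {parent_id: {"count", "sum", "mean"}}; parents without child
    rows get count 0, sum 0 and mean 0.0.
    """
    # Group child values by their foreign key.
    groups = defaultdict(list)
    for row in child_rows:
        groups[row[key]].append(row[value])

    # Emit one feature row per parent, including parents with no children.
    features = {}
    for pid in parent_ids:
        vals = groups.get(pid, [])
        features[pid] = {
            "count": len(vals),
            "sum": sum(vals),
            "mean": mean(vals) if vals else 0.0,
        }
    return features


if __name__ == "__main__":
    customers = ["c1", "c2", "c3"]
    transactions = [
        {"customer": "c1", "amount": 10.0},
        {"customer": "c1", "amount": 30.0},
        {"customer": "c2", "amount": 5.0},
    ]
    print(aggregate_child_table(customers, transactions, "customer", "amount"))
```

Keeping customers with no transactions (here `c3`) in the output matters: dropping them would silently bias a model trained on the flattened table.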

2

Similarity-Based Classification in Relational Databases

Hońko P.

Fundamenta Informaticae | 2010 | Vol. 101, no. 3 | 187-213

EN

In this paper, we introduce a method for measuring the similarity of objects in a relational database (relational objects, in short). We also propose and investigate an algorithm, SC, for the classification of relational objects. Classification is carried out based on the similarity of the objects to predefined classes: an object to be classified is assigned to the class to which it is most similar. The similarity of an object to a class is understood as its similarity to a class representative. Several methods for computing the class representative are proposed. We test the algorithm on real and artificial databases and compare its results with those obtained by other algorithms known from the literature. We also present our approach in the context of granular computing.
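
The decision rule described above (assign an object to the class whose representative it is most similar to) can be sketched as follows. This is not the SC algorithm: it assumes relational objects have already been flattened into numeric vectors, uses the attribute-wise mean as one possible class representative, and uses `1 / (1 + Euclidean distance)` as a stand-in similarity measure. All function names are hypothetical.

```python
import math


def centroid(vectors):
    """One possible class representative: the attribute-wise mean."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def similarity(a, b):
    """Map Euclidean distance into (0, 1]; identical vectors give 1.0."""
    return 1.0 / (1.0 + math.dist(a, b))


def classify(obj, training):
    """Assign obj to the class whose representative is most similar to it.

    training: dict mapping class label -> list of feature vectors
    """
    reps = {label: centroid(vs) for label, vs in training.items()}
    return max(reps, key=lambda label: similarity(obj, reps[label]))


if __name__ == "__main__":
    training = {"a": [[0, 0], [0, 2]], "b": [[10, 10], [12, 10]]}
    print(classify([1, 1], training))
```

Swapping `centroid` for another representative (for example, the most central training object) changes only one function, which mirrors the abstract's point that several representative-computation methods can be plugged into the same decision rule.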

3

Inferring graph grammars by detecting overlap in frequent subgraphs

Kukluk J. P., Holder L. B., Cook D. J.

International Journal of Applied Mathematics and Computer Science | 2008 | Vol. 18, no. 2 | 241-250

EN

In this paper, we study the inference of node and edge replacement graph grammars. We search for frequent subgraphs and then check for overlap among the instances of the subgraphs in the input graph. If the subgraph instances overlap by one node, we propose a node replacement graph grammar production. If they overlap by two nodes, or by two nodes and an edge, we propose an edge replacement graph grammar production. We can also infer a hierarchy of productions by compressing the portions of a graph described by a production and then inferring new productions on the compressed graph. We validate the approach in experiments where we generate graphs from known grammars and measure how well the approach recovers the original grammar from the generated graph. We show graph grammars found in biological molecules and biological networks, and we analyze the learning curves of the algorithm.
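
The overlap rule stated above maps directly onto a small decision function. The sketch assumes the frequent-subgraph search has already been done and each instance is given as a set of node ids from the input graph; it illustrates only the one-node / two-node rule, not the full inference algorithm. The names `proposed_production` and `overlap_summary` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def proposed_production(nodes_a, nodes_b):
    """Return the production kind suggested by the overlap of two instances.

    nodes_a, nodes_b: sets of input-graph node ids covered by two instances
    of the same frequent subgraph.
    """
    shared = set(nodes_a) & set(nodes_b)
    if len(shared) == 1:
        return "node replacement"
    if len(shared) == 2:
        # Covers both "two nodes" and "two nodes and an edge": a shared
        # edge always implies its two endpoints are shared as well.
        return "edge replacement"
    return None


def overlap_summary(instances):
    """Count the production kinds suggested across all pairs of instances."""
    counts = Counter()
    for a, b in combinations(instances, 2):
        kind = proposed_production(a, b)
        if kind is not None:
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    # Three instances forming a chain that overlap by single nodes.
    print(overlap_summary([{1, 2, 3}, {3, 4, 5}, {5, 6, 7}]))
```

Counting over all pairs of instances, rather than deciding from one pair, gives a rough signal of which production type the data supports most often.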

4

Learning from Skewed Class Multi-relational Databases

Guo H., Viktor H.L.

Fundamenta Informaticae | 2008 | Vol. 89, no. 1 | 69-94

EN

Relational databases, with vast amounts of data (from financial transactions, marketing surveys and medical records to health informatics observations) and complex schemas, are ubiquitous in our society. Multirelational classification algorithms have been proposed to learn from such relational repositories, where multiple interconnected tables (relations) are involved. These methods search for relevant features both in a target relation (in which each tuple is associated with a class label) and in relations linked to the target, in order to better classify target relation tuples. However, in many practical database applications, such as credit card fraud detection and disease diagnosis, the target tuples are highly imbalanced: the number of examples of one class (the majority class) in the target relation is much higher than that of the others (the minority classes). Consequently, many existing methods tend to produce poor predictive performance on the underrepresented class. This paper presents a strategy for dealing with such imbalanced multirelational data. The method learns from multiple views (feature sets) of relational data in order to construct view learners with different awareness of the imbalance problem. The different observations made by the multiple view learners are then combined to yield a model with better knowledge of both the majority and the minority classes in a relational database. Experiments performed on six benchmark data sets show that the proposed method achieves promising results compared with other popular relational data mining algorithms, in terms of the ROC curve and the AUC value obtained. In particular, the results indicate that the method is superior when the class imbalance is very high.
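
Two ingredients mentioned above, combining the outputs of several view learners and evaluating with AUC, can be sketched briefly. The combination step here is plain score averaging, an assumed stand-in and not the paper's combination strategy; the AUC function uses the standard pairwise (Mann-Whitney) definition, with ties counted as one half. Function names are hypothetical.

```python
def combine_views(view_scores):
    """Average per-example minority-class scores from several view learners.

    view_scores: list of score lists, one per view, aligned to the same
    examples. Simple averaging is an assumption made for illustration.
    """
    n_views = len(view_scores)
    return [sum(s) / n_views for s in zip(*view_scores)]


def auc(scores, labels):
    """Probability that a random positive outranks a random negative.

    labels: 1 for the minority (positive) class, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 0, 0, 0, 0]  # one minority example among five
    view1 = [0.9, 0.2, 0.3, 0.1, 0.4]
    view2 = [0.7, 0.1, 0.8, 0.2, 0.3]
    print(auc(view2, labels), auc(combine_views([view1, view2]), labels))
```

AUC is a natural metric for this setting because, unlike accuracy, it is not dominated by the majority class: a classifier that always predicts "majority" gains nothing from it.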

5

Relational Data and Rough Sets

Stepaniuk J.

Fundamenta Informaticae | 2007 | Vol. 79, no. 3-4 | 525-539

EN

In this paper, we show that approximation spaces are basic structures for knowledge discovery from multi-relational data. The utility of approximation spaces as fundamental objects constructed for concept approximation is emphasized. Examples of basic concepts are given throughout this paper to illustrate how approximation spaces can be beneficially used in many settings.
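
As a concrete anchor for the notion of concept approximation, the sketch below computes the classical Pawlak lower and upper approximations of a concept over a single table. Approximation spaces in the sense of the abstract are more general (and the paper targets multi-relational data), so this is only the simplest special case, stated as an assumption. Function names are hypothetical.

```python
from collections import defaultdict


def indiscernibility_classes(universe):
    """Group objects with identical attribute tuples.

    universe: dict mapping object id -> tuple of attribute values
    """
    groups = defaultdict(set)
    for obj, values in universe.items():
        groups[values].add(obj)
    return list(groups.values())


def approximations(universe, concept):
    """Return (lower, upper) Pawlak approximations of a concept (a set of ids)."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(universe):
        if block <= concept:
            lower |= block  # block lies entirely inside the concept
        if block & concept:
            upper |= block  # block meets the concept at least once
    return lower, upper


if __name__ == "__main__":
    universe = {1: ("a", 0), 2: ("a", 0), 3: ("b", 1), 4: ("b", 1), 5: ("c", 0)}
    print(approximations(universe, {1, 2, 3, 5}))
```

Objects 3 and 4 cannot be told apart, yet only 3 belongs to the concept, so their block falls in the upper but not the lower approximation; the difference between the two sets is the boundary region.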