Wyniki wyszukiwania - Biblioteka Nauki

1

Knowledge Mining from Data: Methodological Problems and Directions for Development

100%

Kulikowski J.

|

2011

|

tom Vol. 15, No 2

227-233

EN

The development of knowledge engineering and, within its framework, of data mining or knowledge mining from data should result in the characteristics or descriptions of objects, events, processes and/or rules governing them, which should satisfy certain quality criteria: credibility, accuracy, verifiability, topicality, mutual logical consistency, usefulness, etc. Choosing suitable mathematical models of knowledge mining from data ensures satisfying only some of the above criteria. This paper presents, also in the context of the aims of The Committee on Data for Science and Technology (CODATA), more general aspects of knowledge mining and popularization, which require applying the rules that enable or facilitate controlling the quality of data.

2

Global Action Rules in Distributed Knowledge Systems

100%

Raś Z. W. , Gupta S.

|

2002

|

tom Vol. 51, nr 1,2

175-184

EN

In papers [4,5], query answering system based on distributed knowledge mining was introduced and investigated. In paper by Ras and Wieczorkowska [3], the notion of an action rule was introduced and for its application domain e-business was taken. In this paper, we generalize the notion of action rules in a similar way to handling global queries in [4,5]. Mainly, when values of attributes for a given customer, used in action rules, can not be easily changed by business user, definitions of these attributes are extracted from other sites of a distributed knowledge system. To be more precise, attributes at every site of a distributed knowledge system are divided into two sets: stable and flexible. Values of flexible attributes, for a given consumer, sometime can be changed and this change can be influenced and controlled by a business user. However, some of these changes (for instance to the attribute ``profit'') can not be done directly to a chosen attribute. In this case, definitions of such an attribute in terms of other attributes have to be learned. These new definitions are used to construct action rules showing what changes in values of flexible attributes, for a given consumer, are needed in order to re-classify this consumer the way business user wants. But, business user may be either unable or unwilling to proceed with actions leading to such changes. In all such cases we may search for definitions of these flexible attributes looking at either local or remote sites for help.

3

Bridging the Gap between Non-Symbolic and Symbolic Processing - How Could Human Being Acquire Language? -

100%

Ohsuga S.

|

2007

|

tom Vol. 75, nr 1-4

385-406

EN

Information becomes more and more sophisticated with its ever-increasing use. Information sophistication relates closely to human intelligence. In order to ensure the common form of information, a symbolic language has been developed. It gradually progressed so that the representation of information of higher-level sophistication becomes possible. However, there is still a lot of information that cannot be captured by a language and has to be represented at very low level. A great effort is necessary for representing such information in a symbolic language because there is a large gap between non-symbolic and symbolic representations. This paper discusses two problems concerning bridging this gap, one from symbolic processing side and the other from non-symbolic processing side. The former is the language aspect of activity called discovery. The latter concerns an evolutional process of language creation. Both are very important topics for explaining the process of sophisticating information.

4

An empirical study of context based sequential pattern mining algorithms efficiency

100%

Ziembiński R.

|

2007

|

tom Vol. 32, No. 1

63-84

EN

Methods of patterns detection in the sets of data are useful and demanded tools in a knowledge discovery process. The problem of searching patterns in set of sequences is named Sequential Patterns Mining. It can be defined as a way of finding frequent subsequences in the sequences database. The patterns selection procedure may be simply understood. Every subsequence must be enclosed in the required number of sequences from the database at least to become a pattern. The number of a pattern enclosing sequences is called a pattern support. The process of finding patterns may look trivial but its efficient solution is not. The efficiency plays a crucial role if the required support is lowered. The number of mined patterns may grow exponentially. Moreover, the situation may change if the problem of Sequential Patterns Mining will be extended further. In the classic definition the sequence is a list of ordered elements containing only non-empty sets of items. The Context Based Sequential Patterns Mining adds uniform and multi-attribute contexts (vectors) to the elements of the sequence and the sequence itself. Introducing contexts significantly enlarges the problem search space. However, it brings some additional occasions to constrain the mining process, too. This enhancement requires new algorithms. Traditional ones are not able to cope with non-nominal data directly. Algorithms derived straightly from traditional algorithms were verified to be inefficient. This study evaluates efficiency of novel ContextMapping and ContextMappingHeuristic algorithms. These innovative algoritnms are designed to solve the problem of Context Based Sequential Pattern Mining. This study answers in what scope the algorithms parameterization impacts on mining costs and accuracy. It also refers the modified problem to the traditional one pointing at the common and uncommon properties and drawing perspective for further research.

5

Toward decisional DNA: developing holistic set of experience knowledge structure

100%

Sanin C. , Szczerbicki E.

|

2008

|

tom No. 9

109-122

EN

Set of Experience Knowledge Structure (SOEKS) is a structure able to collect and manage explicit knowledge of formal decision events on different forms. It was built as part of a platform for transforming information into knowledge named Knowledge Supply Chain System (KSCS). In brief, the KSCS takes information from different technologies that make formal decision events, integrates them and transforms them into knowledge represented by Sets of Experience. SOEKS is a structure that can be source and target of multiple technologies. Moreover, it comprises variables, functions, constraints and rules associated in a DNA shape allowing the construction of Decisional DNA. However, when having various dissimilar Sets of Experience as output of the same formal decision event, a renegotiation and unification of the decision has to be performed. The purpose of this paper is to show the process of renegotiating various dissimilar Sets of Experience collected from the same formal decision event.

6

An experimental evaluation of two approaches to mining context based sequential patterns

100%

Stefanowski J. , Ziembiński R.

|

2009

|

tom Vol. 38, no 1

27-45

EN

The paper discusses the results of experiments with a new context extension of a sequential pattern mining problem. In this extension, two kinds of context attributes are introduced for describing the source of a sequence and for each element inside this sequence. Such context based sequential patterns may be discovered by a new algorithm, called Context Mapping Improved, specific for handling attributes with similarity functions. For numerical attributes an alternative approach could include their pre-discretization, transforming discrete values into artificial items and, then, using an adaptation of an algorithm for mining sequential patterns from nominal items. The aim of this paper is to experimentally compare these two approaches to mine artificially generated sequence databases with numerical context attributes where several reference patterns are hidden. The results of experiments show that the Context Mapping Improved algorithm has led to better re-discovery of reference patterns. Moreover, a new measure for comparing two sets of context based patterns is introduced.

7

Musical Sound Classification based on Wavelet Analysis

100%

Wieczorkowska A.

|

2001

|

tom Vol. 47, nr 1,2

175-188

EN

Contents-based searching through audio data is basically restricted to metadata, which are attached manually to the file. Otherwise, users have to look for the specific musical information alone. Nevertheles, when classifiers based on descriptors extracted from sounds analytically are used, automatic classification can be in some cases possible. For instance, wavelet analysis can be used as a basis for automatic classification of audio data. In this paper, classification of musical instrument sounds based on wavelet parameterization is described. Decision trees and rough set based algorithms are used as classification tools. The parameterization is very simple, but the efficiency of classification proves that automatic classification of these sounds is possible.

8

Rough modeling - a bottom-up approach to model construction

100%

Loken T. , Komorowski J.

International Journal of Applied Mathematics and Computer Science

|

2001

|

tom 11

|

nr 3

675-690

EN

Traditional data mining methods based on rough set theory focus on extracting models which are good at classifying unseen obj-ects. If one wants to uncover new knowledge from the data, the model must have a high descriptive quality-it must describe the data set in a clear and concise manner, without sacrificing classification performance. Rough modeling, introduced by Kowalczyk (1998), is an approach which aims at providing models with good predictive emphand descriptive qualities, in addition to being computationally simple enough to handle large data sets. As rough models are flexible in nature and simple to generate, it is possible to generate a large number of models and search through them for the best model. Initial experiments confirm that the drop in performance of rough models compared to models induced using traditional rough set methods is slight at worst, and the gain in descriptive quality is very large.

9

Autonomous diagnostics: a data mining approach

88%

Kusiak A.

|

2000

|

tom Vol. 76, nr 35

100-108

EN

Data mining offers tools for data analysis, knowledge discovery, and autonomous decision-making. In the paper, a data mining approach is used to extract meaningful features (attributes) from a data set and make accurate predictions for a semiconductor process application. An important property of the approach discussed in the paper is that a decision is made only when it is accurately predicted, otherwise no autonomous decision is recommended. The high accuracy of predictions made by the proposed approach is based on a weak assumption that objects with equivalent values of a subset of attributes produce equivalent outcomes.

10

Mining the Largest Dense Vertexlet in a Weighted Scale-free Graph

88%

Bandyopadhyay S. , Bhattacharyya M.

|

2009

|

tom Vol. 96, nr 1/2

1-25

EN

An important problem of knowledge discovery that has recently evolved in various reallife networks is identifying the largest set of vertices that are functionally associated. The topology of many real-life networks shows scale-freeness, where the vertices of the underlying graph follow a power-law degree distribution. Moreover, the graphs corresponding to most of the real-life networks are weighted in nature. In this article, the problem of finding the largest group or association of vertices that are dense (denoted as dense vertexlet) in a weighted scale-free graph is addressed. Density quantifies the degree of similarity within a group of vertices in a graph. The density of a vertexlet is defined in a novel way that ensures significant participation of all the vertices within the vertexlet. It is established that the problem is NP-complete in nature. An upper bound on the order of the largest dense vertexlet of a weighted graph, with respect to certain density threshold value, is also derived. Finally, an O(n2 log n) (n denotes the number of vertices in the graph) heuristic graph mining algorithm that produces an approximate solution for the problem is presented.

11

Light Region-based Techniques for Process Discovery

88%

Solé M. , Carmona J.

|

2011

|

tom Vol. 113, nr 3/4

343-376

EN

A central problem in the area of Process Mining is to obtain a formal model that represents selected behavior of a system. The theory of regions has been applied to address this problem, enabling the derivation of a Petri net whose language includes a set of traces. However, when dealing with real-life systems, the available tool support for performing such a task is unsatisfactory, due to the complex algorithms that are required. In this paper, the theory of regions is revisited to devise a novel technique that explores the space of regions by combining the elements of a region basis. Due to its light space requirements, the approach can represent an important step for bridging the gap between the theory of regions and its industrial application. Experimental results show that there is improvement in orders of magnitude in comparison with state-of-the-art tools for the same task.

12

The Outline of an Ontology for the Rough Set Theory and its Applications

88%

Grochowalski P. , Pancerz K.

|

2009

|

tom Vol. 93, nr 1-3

143-154

EN

The paper gives the outline of an ontology for the rough set theory and its applications. This ontology will be applied in intelligent searching the Rough Set Database System. A specialized editor from the Protege system is used to define the ontology.

13

Toward Intelligent Searching the Rough Set Database System (RSDS): an Ontological Approach

88%

Suraj Z. , Grochowalski P.

|

2010

|

tom Vol. 101, nr 1/2

115-123

EN

Themain goal of this paper is to give the outline of some approach to intelligent searching the Rough Set Database System (RSDS). RSDS is a bibliographical system containing bibliographical descriptions of publications connected with methodology of rough sets and its applications. The presented approach bases on created ontologies which are models for the considered domain (rough set theory, its applications and related fields) and for information about publications coming from, for example, abstracts.

14

A Comparision of Different Decision Algorithms Used in Volumetric Storm Cells Classification

75%

Suraj Z. , Peters J. F. , Rząsa W.

|

2002

|

tom Vol. 51, nr 1,2

201-214

EN

Decision algorithms useful in classifying meteorological volumetric radar data are the subject of described in the paper experiments. Such data come from the Radar Decision Support System (RDSS) database of Environment Canada and concern summer storms created in this country. Some research groups used the data completed by RDSS for verifying the utility of chosen methods in volumetric storm cells classification. The paper consists of a review of experiments that were made on the data from RDSS database of Environment Canada and presents the quality of particular classifiers. The classification accuracy coefficient is used to express the quality. For five research groups that led their experiments in a similar way it was possible to compare received outputs. Experiments showed that the Support Vector Machine (SVM) method and rough set algorithms which use object oriented reducts for rule generation to classify volumetric storm data perform better than other classifiers.

15

Acquisition of technology knowledge from online information sources

75%

Kluska-Nawarecka S. , Wilk-Kołodziejczyk D. , Dziaduś-Rudnicka J. , Smolarek-Grzyb A.

|

2011

|

tom Vol. 11, iss. 3

107-112

EN

The article discusses problems related with the search of information from open sources, particularly on the Internet. Specific area of concern is searching for technical knowledge in the area of metalcasting. The results of ongoing experiments were given, to serve as a basis in identification of the opportunities to improve the process of searching through determination of own research plans.

16

Algorithms for Context Based Sequential Pattern Mining

75%

Ziembiński R.

|

2007

|

tom Vol. 76, nr 4

495-510

EN

This paper describes practical aspects of a novel approach to the sequential pattern mining named Context Based Sequential Pattern Mining (CBSPM). It introduces a novel ContextMapping algorithm used for the context pattern mining and an illustrative example showing some advantages of the proposed method. The approach presented here takes into consideration some shortcomings of the classic problem of the sequential pattern mining. The significant advantage of the classic sequential patterns mining is simplicity. It introduces simple element construction, built upon set of atomic items. The comparison of sequence's elements utilizes simple inclusion of sets. But many practical problems like web event mining, monitoring, tracking and rules generation often require mining more complex data. The CBSPM takes into account non nominal attributes and similarity of sequence's elements. An approach described here extends traditional problem adding a vector of context attributes of any kind to sequences and sequence’s elements. Context vectors contain details about sequence's and element's origin. The mining process results in context patterns containing additional, valuable context information useful in interpretation of patterns origin.

17

Application of machine learning for prediction a methane concentration in a coal-mine

75%

Sikora M. , Sikora B.

|

2006

|

tom Vol. 51, no 4

475-492

EN

Applications of machine learning methods for creation rule-based data model used in prediction of a methane concentration in the excavation are described in the paper. Data coming from a methane concentration monitoring system and methodology of their transformation into a form acceptable by analytic algorithms that have been used are presented in the second chapter. Next chapter describes the rules induction algorithm used for prediction. Results of the analysis that has been performed on data coming from a coal-mine are presented at the end of the paper.

PL

W artykule przedstawiono pomysł zastosowania inteligentnych technik komputerowych do eksploracyjnej analizy danych pochodzących z systemu monitorowania zagrożeń związanych z wydzielaniem metanu w kopalniach węgla kamiennego. Celem stawianym zastosowanym metodom analitycznym jest predykcja stężenia metanu mierzonego przez wybrany metanomierz z wyprzedzeniem dziesięciominutowym i godzinnym. Spośród różnych metodologii generowania systemów umożliwiających predykcję (m.in. systemy rozmyte, sztuczne sieci neuronowe, metody statystyczne) w artykule wybrano algorytm indukcji reguł o konkluzjach w postaci funkcji liniowych. Przedstawiony algorytm charakteryzuje się jednym z najszybszych czasów analizy oraz dobrymi wynikami predykcji uzyskiwanymi na ogólnodostępnych danych benchmarkowych. Istotną cechą zastosowanego algorytmu jest również to, że wyniki analizy, a więc syntetyczny opis analizowanego zbioru danych jest stosunkowo łatwy do interpretacji przez użytkownika. Z punktu widzenia dziedziny znanej jako odkrywanie wiedzy, w bazach danych jest to bardzo istotna cecha. Dane poddane analizie pochodziły z wyrobiska znajdującego się na kopalni niezagrożonej tąpnięciami. Na rysunku pierwszym przedstawiono schemat rejonu, w którym znajduje się rozważane przez nas wyrobisko, widoczne jest tam również rozmieszczenie czujników. Graficzna analiza szeregów czasowych odzwierciedlających wskazania metanomierzy i anemometrów (rys. 2) wykazała, że największą dynamikę stężenia metanu obserwuje się na wylocie ze ściany. W badaniach podjęto zatem próbę predykcji wskazań metanomierza M32. Dane pomiarowe zbierane były z dziesięciosekundowym interwałem czasowym. Do celów badań dane poddano agregacji tworząc dwa zbiory danych, w których kolejne rekordy zawierały: maksymalne wartości mierzonych wartości w okresach jednominutowych (był to zbiór danych dla predykcji dziesięciominutowej), maksymalne wartości mierzonych wartości w okresach dziesięciominutowych (był to zbiór danych dla predykcji godzinowej). W celu umożliwienia zastosowania metod analitycznych wykorzystujących paradygmat maszynowego uczenia, dostępny zbiór danych należało poddać modyfikacjom. Dane pobrane z systemu monitorowania reprezentowane są przez zbiór rekordów, pomiędzy którymi istnieje związek temporalny, tymczasem zastosowany w artykule algorytm analizuje tabele, w których każdy wiersz jest niezależny. Zatem, informacja o stanie danego procesu w danej chwili czasu (w tym o dynamice zmian parametrów opisujących ten proces) musi być zawarta w jednym wierszu. W rozdziale drugi przedstawiono sposób, w jaki możliwe jest przejście z reprezentacji danych uzyskanych wprost z systemu monitorowania (Tab. l) do reprezentacji akceptowanej przez wykorzystany algorytm analityczny (Tab. 2). W rozdziale drugim wyspecyfikowano także zbiór zmiennych niezależnych: AN31 - wska7JInia anemometru AN31; AN32 - wskazania anemometru AN32; MM32 - wskazania metanomierza MM32; Wydobycie, DAN31 - suma wskazań AN31 za ostatnie dziesięć minut; DAN32 - suma wskazań AN32 za ostatnie dziesięć minut; DMM32 - suma wskazań MM32 za ostatnie dziesięć minut. Zmiennej zależnej nadano nazwę MM32_Pred. W rozdziale trzecim dokładnie opisano zastosowany algorytm analityczny, który umożliwia generowanie reguł o liniowych konkluzjach (I). Algorytm buduje regułę w taki sposób, aby część warunkowa reguły opisywała jak największa liczbę obiektów ze zbioru treningowego (2) przy jednoczesnym ograniczeniu wariancji zmiennej zależnej. Wielowymiarowy model liniowy pozwalający dla danej reguły wyznaczyć wartość zmiennej zależnej znajduje się w jej konkluzji. Algorytm jest heurystyczny i jako kryterium optymalności w czasie budowy reguły wykorzystuje wyrażenie (3). W rozdziale trzecim omówiono także metody optymalizacji (w tym upraszczania) otrzymanego regułowego modelu danych. Rozdział czwarty zawiera wyniki przeprowadzonych analiz. Analizę prowadzono na wydzielonych zbiorach danych, efektywność wyznaczonych modeli sprawdzono na niezależnych zbiorach testowych. Obiektywną miarą efektywności był błąd RMS (4) popełniany przez wyznaczone modele, jako miarę subiektywną przyjęto skomplikowanie (możliwość interpretacji przez użytkownika) wyznaczonego modelu. Metodę proponowaną w artykule porównano z metodami statystycznymi (regresja wielowymiarowa, ARIMA) oraz z metodą stochastyczną (sieci neuronowe). Wyniki eksperymentów dla predykcji dziesięciominutowej podano w tabeli trzeciej, wyniki dla predykcji godzinowej podano w tabeli czwartej. Wizualnie, rzeczywisty szereg czasowy stężenia metanu rejestrowanego przez metanomierz M32 oraz szereg przewidywany przez model pokazano na rysunkach trzecim i czwartym. Przeprowadzone badania wykazały, że zastosowana metoda pozwoliła na uzyskanie najmniejszego błędu predykcji przy jednoczesnym zachowaniu przejrzystości wyznaczonego modelu. Metoda charakteryzowała się również najkrótszym czasem analizy.

18

Rough Modeling---a Bottom-up Approach to Model Construction

75%

Loken T. , Komorowski J.

|

2001

|

tom Vol. 11, no 3

675-690

EN

Traditional data mining methods based on rough set theory focus on extracting models which are good at classifying unseen objects. If one wants to uncover new knowledge from the data, the model must have a high descriptive quality---it must describe the data set in a clear and concise manner, without sacrificing classification performance. Rough modeling, introduced by Kowalczyk (1998), is an approach which aims at providing models with good predictive and descriptive qualities, in addition to being computationally simple enough to handle large data sets. As rough models are flexible in nature and simple to generate, it is possible to generate a large number of models and search through them for the best model. Initial experiments confirm that the drop in performance of rough models compared to models induced using traditional rough set methods is slight at worst, and the gain in descriptive quality is very large.

19

Eksploracja danych w serwisach internetowych

75%

Fabisiak L.

|

2009

|

tom nr 3 (20)

33-42

PL

Eksploracja danych dostarcza bardzo cennej wiedzy o funkcjonowaniu serwisu. Pozwala uzyskać wiedzę o tym kto, kiedy, dlaczego i jak używa serwisu. Dzięki postępowi w komputerach i technologii rejestrowania danych, ogromne zbiory były i są gromadzone. Sztuka eksploracji danych polega na wydobyciu cennych informacji z otaczającej masy nic nie wnoszących liczb po to, żeby właściciele tych danych mogli na nich się wzbogacić. Organizację posiadają cenną, dialektyczną wiedzę o atrakcyjności swojej oferty, wiedzę o tym w jaki sposób kształtować ofertę, aby odpowiadała ona potrzebom klienta itp. Dysponując danymi uzyskanymi w procesie eksploracji można dostosować zawartość serwisów do potrzeb danego użytkownika, poprawić strukturę całego serwisu a także wprowadzić nowe elementy do serwisów internetowych. Eksploracja danych pozwala na wydzielenie grup atrakcyjnych klientów o których serwisy powinny szczególnie dbać. Eksploracja danych pozwala na pełne wykorzystanie posiadanych informacji o klientach i transakcjach, a co za tym idzie odkrycie wiedzy, która może zaważyć o losach i pozycji firmy [14]. Podsumowując, eksploracja może przynieść korzyści organizacji, gdyż dostarcza danych użytecznych w procesach podejmowania decyzji biznesowych i decyzji dotyczących funkcjonowania i rozwoju serwisu. Widoczne są też korzyści dla klienta, gdyż serwis lepiej odpowiada na jego potrzeby, a on sam częściej i chętniej korzysta z serwisu oraz jest zainteresowany jego nowymi funkcjami. Bowiem skuteczne zastosowanie eksploracji danych to takie, które przynosi ogólnie rozumiane zyski dzięki wdrożeniu jej wyników.

EN

In this work the exploration of data for analysis and evaluation of Web sites. The publication brought closer review of the methods of data mining, data mining union websites. Synthesis has been currently used in the methods and techniques.

20

Knowledge discovery in data using formal concept analysis and random projections

75%

Aswani Kumar C.

International Journal of Applied Mathematics and Computer Science

|

2011

|

tom 21

|

nr 4

745-756

EN

In this paper our objective is to propose a random projections based formal concept analysis for knowledge discovery in data. We demonstrate the implementation of the proposed method on two real world healthcare datasets. Formal Concept Analysis (FCA) is a mathematical framework that offers a conceptual knowledge representation through hierarchical conceptual structures called concept lattices. However, during the design of a concept lattice, complexity plays a major role.