Search results
Keyword searched: rule induction
Results found: 18
PL
The article discusses the possibilities of applying logic synthesis methods to data mining tasks. In particular, an attribute reduction method and a decision rule induction method are discussed. It is shown that logic synthesis methods effectively improve these procedures and can successfully be applied to solving more general data mining tasks. To justify the advisability of such an approach, the diagnosis of patients with the possibility of eliminating troublesome tests is discussed.
EN
The article discusses the possibilities of applying logic synthesis methods to data mining tasks. In particular, the method of reducing attributes and the method of inducing decision rules are considered. It is shown that, by applying specialized logic synthesis methods, these procedures can be effectively improved and successfully used for solving more general data mining tasks. To justify the advisability of such an approach, the diagnosis of patients with the possibility of eliminating troublesome tests is discussed.
EN
In this paper a reasoning algorithm for a creative decision support system is proposed. It allows inference and machine learning algorithms to be integrated. Execution of the learning algorithm is automatic because it is formalized as applying a complex inference rule, which generates intrinsically new knowledge using the facts already stored in the knowledge base as training data. This new knowledge may be used in the same inference chain to derive a decision. Such a solution makes the reasoning process more creative and allows reasoning to continue in cases where the knowledge base does not have the appropriate knowledge explicitly encoded. In the paper an appropriate knowledge representation and inference model are proposed. Experimental verification is performed on a decision support system in the casting domain.
EN
The paper presents the results of research related to the efficiency of so-called rule quality measures, which are used to evaluate the quality of rules at each stage of rule induction. The stages of rule growing and pruning were considered, along with the issue of conflict resolution which may occur during classification. The work is a continuation of research on the efficiency of quality measures employed in the sequential covering rule induction algorithm. In this paper we analyse only those quality measures (8 measures) which had been recognized as effective in previously conducted research. The study was conducted on approximately 70 benchmark datasets related to classification, regression and survival analysis problems. In the comparisons we analyzed the prognostic abilities of the induced rules as well as the complexity of the resulting rule-based data models.
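Rule quality measures of the kind discussed above are typically computed from a rule's contingency table. A minimal sketch follows; the three measures shown (precision, coverage, weighted relative accuracy) are common textbook examples, not necessarily the 8 measures evaluated in the paper:

```python
def quality_measures(p, n, P, N):
    """Example rule quality measures from a contingency table:
    p, n = positive/negative examples covered by the rule,
    P, N = all positive/negative examples in the data set."""
    precision = p / (p + n)              # fraction of covered examples that are positive
    coverage = p / P                     # fraction of positives the rule covers
    # Weighted relative accuracy: trades coverage against accuracy gain over the default.
    wra = ((p + n) / (P + N)) * (p / (p + n) - P / (P + N))
    return {"precision": precision, "coverage": coverage, "wra": wra}
```

Measures like these can drive both the growing stage (choosing the next condition) and the pruning stage (deciding which conditions to drop).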
4. Post-processing of BRACID Rules Induced from Imbalanced Data
EN
Rule-based classifiers constructed from imbalanced data fail to correctly classify instances from the minority class. Solutions to this problem should deal with both data-related and algorithmic difficulty factors. The new algorithm BRACID addresses these factors more comprehensively than other proposals. The experimental evaluation of the classification abilities of BRACID shows that it significantly outperforms other rule approaches specialized for imbalanced data. However, it may generate too many rules, which hinders human interpretation of the discovered rules. Thus, a method for post-processing of BRACID rules is presented. It aims at selecting rules characterized by high support, in particular for the minority class, and covering diversified subsets of examples. Experimental studies confirm its usefulness.
PL
The research on the Boolean function complementation algorithm is summarized. Solutions proposed during the research are indicated, including the analysis of the existing implementation of the algorithm and the applied and proposed ways of speeding up the computations. The research concluded with an efficient concurrent implementation of the algorithm. A comparison was made of the computation times of existing data analysis and data mining systems against the authors' implementation. The result of the work is easily extensible experimental software, the fastest of the tested solutions and data mining systems.
EN
The paper provides a summary of three years of research on the Boolean function complement algorithm. It covers computational issues of the algorithm, the analysis of existing implementations, and solutions considered during the research, including both proposed and implemented enhancements to reduce calculation time. As a result of the research, a new algorithm has been developed and implemented as an efficient concurrent procedure. Finally, a comparison of the computation times of existing data mining systems with the authors' implementation is shown. These studies have resulted in experimental software that is the most effective of the tested solutions and data mining systems.
PL
The possibilities of applying logic synthesis methods to data mining tasks are discussed. In particular, the application of the Boolean function complementation method to the most important data mining procedures, such as discretization, rule induction and attribute reduction, is discussed. It is shown that logic synthesis methods effectively improve these procedures and can successfully be applied to solving data mining tasks in medicine and telecommunications.
EN
The article discusses the possibilities of applying logic synthesis methods to data mining tasks. The main idea is to use the Boolean function complement method from logic synthesis in the most important data mining procedures, such as data discretization, rule induction and attribute reduction. It is shown that, by applying specialized logic synthesis methods, these three procedures can be effectively improved and successfully used for solving data mining tasks in medicine and telecommunications.
PL
A new method of decision rule induction is discussed. In contrast to the classical sequential covering method, it uses a two-stage rule selection process in which single objects are generalized in order to obtain a set of minimal rules. The family of all minimal rules is then filtered with efficient heuristic algorithms. The presented experimental results indicate that the method significantly improves the decision rule induction process.
EN
A new method of solving the rule induction problem is discussed. The method differs from the classical approach using the so-called sequential covering strategy. The main idea is to use a two-stage selection process in which single objects are generalized in order to find whole sets of minimal rules. Next, the family of minimal rules is filtered using efficient heuristic algorithms. The presented results of experiments with typical databases indicate that the proposed approach significantly improves the efficiency of the rule induction process.
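The second stage (selecting a small subset from the family of all minimal rules) is naturally a set-cover problem, for which greedy heuristics are the standard choice. The sketch below is an illustrative greedy selection, not the specific heuristic algorithms used in the paper:

```python
def greedy_rule_selection(rule_coverage, universe):
    """Greedy set-cover heuristic: repeatedly pick the rule covering the most
    still-uncovered objects. `rule_coverage` maps a rule id to the set of
    objects it covers; `universe` is the set of objects to cover."""
    uncovered = set(universe)
    selected = []
    while uncovered:
        best = max(rule_coverage, key=lambda r: len(rule_coverage[r] & uncovered))
        gain = rule_coverage[best] & uncovered
        if not gain:
            break                      # remaining objects cannot be covered
        selected.append(best)
        uncovered -= gain
    return selected
```

For example, with rules covering {1,2,3}, {3,4} and {4,5}, the greedy pass selects the first and third rules and skips the redundant middle one.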
PL
The first part of the monograph is devoted to covering rule induction algorithms and objective measures for evaluating the quality of decision rules. Two algorithms are presented that drive rule induction towards maximizing the values of measures intended for the evaluation of decision rules. The properties of measures defined on the basis of the contingency table were analyzed, and on this basis minimal sets of properties were specified that are desired for measures supervising the induction process and for measures evaluating the descriptive abilities of decision rules. An analysis of the equivalence and similarity of measures was carried out. Equivalence was analyzed both with respect to rule ordering and with respect to the way classification conflicts are resolved. In the experimental part, the efficiency of the measures was verified, sets of the most efficient measures were identified, and an adaptive method of measure selection in the rule induction algorithm was proposed. The theoretical properties of the most effective measures were also analyzed. The first part of the work also discusses measures not defined directly on the basis of the contingency table. The possibility of complex rule evaluation was discussed, and a proposal of multi-criteria rule evaluation based on a so-called utility function was presented. The second part of the monograph focuses on selected rule pruning methods. This part presents two rule aggregation algorithms, an algorithm for redefining rules based on information about the importance of their elementary conditions, and four rule filtering algorithms. Thanks to aggregation and redefinition, complex elementary conditions may appear in rule premises, which in particular cases better reflects the dependencies characterizing the data. The effectiveness of all the algorithms proposed in the first and second parts was verified experimentally. The last part of the publication presents examples of new applications of decision rule induction algorithms.
Three new application areas are presented: seismic hazard forecasting, analysis of peri-transplantation data, and the functional description of genes. These applications use the results of the research presented in the first and second parts. The last part of the monograph also presents two domain-oriented modifications of rule induction algorithms. The first enables rule induction guided by user-defined hypotheses. The second adapts the induction algorithm to the hierarchical structure of the analyzed data. A further result of the research on the functional description of genes is an attribute reduction method that takes into account the semantics of attribute values.
EN
The first part of the book is devoted to sequential covering rule induction algorithms and objective rule evaluation measures. Two algorithms that maximize the values of rule evaluation measures are presented. The properties of measures defined on the basis of the contingency table were analyzed, and minimal sets of the properties desired for measures controlling the process of rule induction and for measures evaluating the descriptive quality of decision rules were specified. An analysis of the equivalence and similarity of measures was carried out. Equivalence was analyzed both with respect to rule ordering and with respect to the resolution of classification conflicts. In the experimental part, the efficiency of the measures was verified and sets of the most efficient measures were identified. An adaptive method of measure selection in the sequential covering rule induction algorithm was proposed. Moreover, the theoretical properties of the most effective measures were analyzed. In the first part of the work, measures that are not defined directly from the contingency table were also discussed. Furthermore, the possibility of complex evaluation of rules was discussed, and a proposal of multi-criteria rule assessment on the basis of a so-called utility function was presented. The second part of the book focuses on algorithms for rule pruning. This part presents two algorithms of rule aggregation, an algorithm of rule redefinition based on information about the importance of the rules' elementary conditions, and four algorithms of rule filtration. Through aggregation and redefinition, complex elementary conditions may appear in rule premises, which in specific cases better reflect dependencies in the data. The effectiveness of all the algorithms proposed in the first and second parts was verified experimentally. The last part of the work shows examples of new applications of decision rule induction algorithms.
The following three new areas of application are presented: forecasting of seismic hazards, analysis of bone marrow transplantation data, and the functional description of genes. The results of the studies presented in the first two parts of the book are used there. Two domain-oriented modifications of rule induction algorithms are also proposed. The first allows for rule induction controlled by hypotheses defined by the user. The second adjusts the induction algorithm to the hierarchical structure of the analyzed data. A method of attribute reduction that takes into consideration the semantics of attribute values is also a result of the research on the functional description of genes.
EN
The paper presents an algorithm for the induction of a decision list from survival data. The algorithm uses a survival tree as the inner learner, which is repeatedly executed in order to select the best rule at each iteration. The effectiveness of the algorithm was empirically tested for two implementations of survival trees on 15 benchmark datasets. The results show that the proposed algorithm for survival decision list construction is able to induce more compact models than the corresponding survival tree without loss of prediction accuracy.
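The loop described above is an instance of the generic separate-and-conquer scheme: repeatedly call an inner learner for the single best rule, append it to the list, and remove the examples it covers. A minimal generic sketch (the `learn_rule` callback stands in for the survival-tree learner; the dict-based rule shape is an assumption for illustration):

```python
def separate_and_conquer(data, learn_rule, max_rules=20):
    """Generic decision-list induction: at each iteration the inner learner
    `learn_rule` is run on the remaining examples and must return a rule as a
    dict with a `covers` predicate; covered examples are then removed."""
    rules, remaining = [], list(data)
    while remaining and len(rules) < max_rules:
        rule = learn_rule(remaining)          # inner learner (e.g. a survival tree)
        covered = [ex for ex in remaining if rule["covers"](ex)]
        if not covered:
            break                             # the rule adds nothing; stop
        rules.append(rule)
        remaining = [ex for ex in remaining if not rule["covers"](ex)]
    return rules
```

The resulting list is ordered: at prediction time, the first rule whose condition matches the new case fires.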
10. Factors of software quality - analysis of extended ISBSG dataset
EN
In this paper, we analyze the extended ISBSG dataset, which contains data on a wide range of software projects developed in various companies worldwide. The main aim of this paper is to identify important factors that influence software quality and to investigate the nature of these relationships. The analysis involves various statistical techniques, both analytical and graphical. We provide a rating for each variable to express the strength of its relationship with software quality. Unlike earlier analyses, we focus on the business perspective and its relationship to software quality. The obtained results may be used to support decision making in software projects, specifically by demonstrating the impact of selected software development practices.
EN
The paper presents an attempt to apply rough set theory and the ROSES software to build an automated currency pair trading system. The rule induction module of the ROSES system was used to extract rules from a feature space built upon moving average differences and other technical analysis indicators. The rules obtained were evaluated through simulation of a trading system.
EN
Given a medical data set containing genetic descriptions of sodium-sensitive and non-sensitive patients, we examine it using several techniques: induction of decision rules, the naive Bayes classifier, the voted perceptron classifier, decision trees, and the SVM classifier. We specifically focus on the induction of decision rules and so-called Pareto-optimal rules, which are of large interpretative value for physicians. We find statistically relevant combinations of attributes which affect sodium sensitivity.
EN
One of the most important problems with rule induction methods is that it is very difficult for domain experts to check the millions of rules generated from large datasets, although discovery from these rules requires deep interpretation based on domain knowledge. Although several solutions have been proposed in studies on data mining and knowledge discovery, these studies do not focus on similarities between the obtained rules. When one rule r1 has reasonable features and another rule r2 with high similarity to r1 includes unexpected factors, the relation between these rules can become a trigger for the discovery of knowledge. In this paper, we propose a visualization approach that shows the similarity relations between rules based on multidimensional scaling, which assigns a two-dimensional Cartesian coordinate to each data point from information about the similarities between that data point and the others. We evaluated this method on two medical data sets; the experimental results show that knowledge useful for domain experts can be found.
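Classical multidimensional scaling of the kind used above can be sketched in a few lines of NumPy: double-center the squared distance matrix and take the top eigenpairs as coordinates. This is the standard Torgerson construction, shown here only to illustrate the mechanism, not the paper's exact procedure:

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical (Torgerson) MDS: embed n points in `dim` dimensions from an
    n x n matrix D of pairwise distances (e.g. rule dissimilarities)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                   # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]            # keep the `dim` largest
    L = np.sqrt(np.maximum(w[idx], 0.0))       # clip tiny negative eigenvalues
    return V[:, idx] * L                       # n x dim coordinate matrix
```

With rule-to-rule dissimilarities as input, each rule becomes a point in the plane, and unexpectedly close pairs like r1 and r2 stand out visually.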
EN
The paper addresses the problem of improving the performance of rule-based classifiers constructed from imbalanced data sets, i.e., data sets where the minority class of primary importance is under-represented in comparison to the majority classes. We introduce two techniques to detect and process inconsistent examples from the majority classes in the boundary between the minority and majority classes. The two techniques differ in the way they process inconsistent boundary examples from the majority classes: the first approach removes them, while the other relabels them as belonging to the minority class. The experiments showed that the best results were obtained for the filtering technique in which inconsistent majority class examples were reassigned to the minority class, combined with a classifier composed of decision rules generated by the MODLEM algorithm.
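The relabelling idea can be illustrated with a nearest-neighbour sketch: a majority-class example whose neighbourhood is dominated by the minority class is treated as an inconsistent boundary example and reassigned. This is an illustrative simplification (a simple k-NN majority test), not the exact detection rule used in the paper:

```python
import numpy as np

def relabel_boundary(X, y, minority, k=3):
    """Relabel majority-class examples whose k nearest neighbours are mostly
    minority-class. X: feature matrix, y: labels, minority: minority label."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    new_y = y.copy()
    for i in np.where(y != minority)[0]:
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                          # exclude the example itself
        nn = np.argsort(d)[:k]
        if np.mean(y[nn] == minority) > 0.5:
            new_y[i] = minority                # inconsistent boundary example
    return new_y
```

The removal variant is the same loop with the flagged examples dropped instead of relabelled.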
15. Hyperplane Aggregation of Dominance Decision Rules
EN
In this paper we consider multiple criteria decision aid systems based on decision rules generated from examples. A common problem in such systems is the over-abundance of decision rules, as in many situations the rule generation algorithms produce very large sets of rules. This prolific representation of knowledge provides a great deal of detailed information about the described objects, but is correspondingly difficult to interpret and use. One way of solving this problem is to aggregate the created rules into more general ones, e.g. by forming rules of enriched syntax. The paper presents a generalization of elementary rule conditions into linear combinations. This corresponds to partitioning the preference-ordered condition space of criteria with non-orthogonal hyperplanes. The objective of this paper is to introduce the generalized rules into multiple criteria classification problems and to demonstrate that these problems can be successfully solved using the introduced rules. The usefulness of the introduced solution is finally demonstrated in computational experiments with real-life data sets.
16. Extraction of Structure of Medical Diagnosis from Clinical Data
EN
One of the most important problems with rule induction methods is that they cannot extract rules which plausibly represent expert decision processes. In this paper, the characteristics of experts' rules are closely examined and a new approach to extracting plausible rules is introduced, consisting of the following three procedures. First, the characterization of decision attributes (given classes) is extracted from databases and the concept hierarchy for the given classes is calculated. Second, based on the hierarchy, rules for each hierarchical level are induced from the data. Then, for each given class, the rules for all hierarchical levels are integrated into one rule. The proposed method was evaluated on a medical database; the experimental results show that the induced rules correctly represent experts' decision processes.
EN
The article describes a method combining two widely used empirical approaches to learning from examples: rule induction and instance-based learning. In our algorithm (RIONA), the decision is predicted not on the basis of the whole support set of all rules matching a test case, but on the support set restricted to a neighbourhood of the test case. The size of the optimal neighbourhood is automatically induced during the learning phase. The empirical study shows the interesting fact that it is enough to consider a small neighbourhood to achieve classification accuracy comparable to an algorithm considering the whole learning set. The combination of k-NN and a rule-based algorithm results in a significant acceleration compared with the algorithm using all minimal rules. Moreover, the presented classifier has high accuracy for both kinds of domains: those more suitable for k-NN classifiers and those more suitable for rule-based classifiers.
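The core idea of restricting rule support to a neighbourhood can be sketched as follows: vote only with the training examples that both lie among the k nearest neighbours of the test case and support a rule matching it. This is an illustrative simplification of the RIONA idea, with rules represented as (predicate, label) pairs:

```python
from collections import Counter

def neighbourhood_rule_vote(test_x, train, rules, k=3):
    """Classify `test_x` by voting among training examples that are both in
    its k-NN neighbourhood and covered by a rule matching the test case.
    `train`: list of (x, label); `rules`: list of (condition, label)."""
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    neigh = sorted(train, key=lambda ex: dist(ex[0], test_x))[:k]
    matching = [(cond, label) for cond, label in rules if cond(test_x)]
    votes = Counter()
    for x, y in neigh:
        # neighbour counts only if it supports some rule matching the test case
        if any(cond(x) and y == label for cond, label in matching):
            votes[y] += 1
    if not votes:                              # fall back to a plain k-NN vote
        votes = Counter(y for _, y in neigh)
    return votes.most_common(1)[0][0]
```

In RIONA itself the neighbourhood size k is tuned automatically during learning; here it is left as a parameter.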
EN
Inconsistent information is one of the main difficulties in the explanation and recommendation tasks of decision analysis. We distinguish two kinds of such information inconsistency: the first is related to the indiscernibility of objects described by attributes defined on nominal or ordinal scales, and the other follows from violation of the dominance principle among attributes defined on preference-ordered ordinal or cardinal scales, i.e. among criteria. In this paper we discuss how these two kinds of inconsistency are handled by a new approach based on rough set theory. Combining this theory with inductive learning techniques leads to the generation of decision rules from rough approximations of decision classes. Particular attention is paid to numerical attribute scales and preference-ordered scales of criteria, and to their influence on the syntax of the induced decision rules.