Search results - BazTech

1

Filtering Decision Rules Driven by Sequential Forward and Backward Selection of Attributes: An Illustrative Example in Stylometric Domain

Zielosko Beata, Stańczyk Urszula, Jabloński Kamil

Annals of Computer Science and Information Systems | 2023 | Vol. 35 | pp. 833-842 | EN

The paper presents investigations concerning the decision rule filtering process controlled by the estimated relevance of available attributes. In the conducted study, two search directions were used, sequential forward selection and sequential backward elimination, applied after the knowledge discovery step to the rule sets inferred from a dataset. The steps of sequential search, along with two different strategies of rule selection, were governed by three rankings obtained for variables, all related to characteristics of data and rules that can be induced, as follows: (i) a ranking based on the weighting factor referring to the occurrence of attributes in generated decision reducts, (ii) the OneR ranking exploiting short rule properties, and (iii) the proposed ranking defined through the operation of a greedy algorithm for rule induction. The three rankings were confronted and compared from the perspective of their usefulness for the selection of rules performed in the two directions. The resulting sets of rules were analysed with respect to the properties of the constituent decision rules and from the point of view of performance for all constructed rule-based classifiers. Substantial experiments were carried out in the stylometric domain, treating the task of authorship attribution as classification. The results obtained indicate that for all three rankings and search paths it was possible to obtain a noticeable reduction of attributes while at least maintaining the power of inducers, at the same time improving characteristics of rule sets.
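
The two search directions named in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the rule representation, the function names `forward_filter` and `backward_filter`, the toy rules and the toy ranking are all assumptions made for illustration, and only one simple rule selection strategy is shown (a rule is kept only when none of its condition attributes is outside the currently allowed set).

```python
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical rule representation: (condition attributes, decision label).
Rule = Tuple[FrozenSet[str], str]


def forward_filter(rules: Sequence[Rule], ranking: Sequence[str]) -> List[List[Rule]]:
    """Add attributes from the top of the ranking one by one; at each step
    keep the rules whose conditions use only the attributes selected so far."""
    steps = []
    selected = set()
    for attr in ranking:
        selected.add(attr)
        steps.append([r for r in rules if r[0] <= selected])
    return steps


def backward_filter(rules: Sequence[Rule], ranking: Sequence[str]) -> List[List[Rule]]:
    """Discard attributes from the bottom of the ranking one by one; at each
    step drop every rule that refers to a discarded attribute."""
    steps = []
    discarded = set()
    for attr in reversed(ranking):
        discarded.add(attr)
        steps.append([r for r in rules if not (r[0] & discarded)])
    return steps


# Toy data (assumed): attributes stand in for style markers such as function words.
rules: List[Rule] = [
    (frozenset({"the"}), "author_A"),
    (frozenset({"the", "and"}), "author_B"),
    (frozenset({"of"}), "author_A"),
]
ranking = ["the", "and", "of"]  # most relevant attribute first

forward_sizes = [len(step) for step in forward_filter(rules, ranking)]
backward_sizes = [len(step) for step in backward_filter(rules, ranking)]
```

In a full experiment each intermediate rule set would be evaluated as a classifier, so that the search can stop at the smallest attribute subset that still maintains performance.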

2

Data irregularities in discretisation of test sets used for evaluation of classification systems: A case study on authorship attribution

Stańczyk Urszula, Zielosko Beata

Bulletin of the Polish Academy of Sciences. Technical Sciences | 2021 | Vol. 69, no. 4 | art. no. e137629 | EN

When patterns to be recognised are described by features of continuous type, discretisation becomes either an optional or a necessary step in the initial data pre-processing stage. Characteristics of data, such as the distribution of data points in the input space, can significantly influence the process of transformation from real-valued into nominal attributes, and the resulting performance of classification systems employing them. If data include several separate sets, their discretisation becomes more complex, as varying numbers of intervals and different ranges can be constructed for the same variables. The paper presents research on irregularities in data distribution, observed in the context of discretisation processes. Selected discretisation methods were used and their effect on the performance of decision algorithms, induced in the classical rough set approach, was investigated. The studied input space was defined by measurable style-markers, which, exploited as characteristic features, facilitate treating a task of stylometric authorship attribution as classification.
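
The point about separate sets yielding different intervals for the same variable can be shown with a small sketch. Equal-width binning is used here only as an assumed example of a discretisation method; the abstract does not say which methods were studied, and the helper names and sample values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def equal_width_cuts(values: Sequence[float], bins: int) -> List[float]:
    """Return the interior cut points that split [min, max] into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(1, bins)]


def discretise(values: Sequence[float], cuts: Sequence[float]) -> List[int]:
    """Map each value to the index of the interval it falls into."""
    return [bisect_right(cuts, v) for v in values]


# Assumed toy values of one continuous style marker in two separate sets.
train = [0.0, 2.0, 4.0, 8.0]
test = [2.0, 5.0, 10.0]

train_cuts = equal_width_cuts(train, bins=2)  # learned from the training set
test_cuts = equal_width_cuts(test, bins=2)    # learned independently from the test set

test_with_train_cuts = discretise(test, train_cuts)
test_with_own_cuts = discretise(test, test_cuts)
```

With these values the cut point learned from the training set is 4.0 and the one learned from the test set is 6.0, so the test value 5.0 lands in a different interval depending on which set defined the ranges.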

3

Comparison of Heuristics for Optimization of Association Rules

Alsolami Fawaz, Amin Talha, Moshkov Mikhail, Zielosko Beata, Żabiński Krzysztof

Fundamenta Informaticae | 2019 | Vol. 166, no. 1 | pp. 1-14 | EN

In this paper, seven greedy heuristics for construction of association rules are compared from the point of view of the length and coverage of constructed rules. The obtained rules are also compared with optimal ones constructed by dynamic programming algorithms. The average relative difference between the length of rules constructed by the best heuristic and the minimum length of rules is at most 4%. The same holds for coverage.