In a previous paper, three types of missing attribute values (lost values, attribute-concept values, and "do not care" conditions) were compared using six data sets. Because those experimental results were affected by large variances caused by running experiments on different versions of a given data set, we conducted new experiments using the same pattern of missing attribute values for all three types of missing attribute values and for both certain and possible rules. Additionally, in our new experiments, the process of incrementally replacing specified values with missing attribute values was terminated when entire rows of the data sets were full of missing attribute values. Finally, we created new, incomplete data sets by replacing specified values starting from 5% of all attribute values, instead of 10% as in the previous experiments, with an increment of 5% instead of the previous increment of 10%. As a result, it is becoming clearer that the best approach to missing attribute values is based on lost values, with a small difference between certain and possible rules, and that the worst approach is based on "do not care" conditions with certain rules. With our improved experimental setup it is also clearer that the type of missing attribute values should be selected individually for a given data set.
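The experimental setup described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: cells are chosen at random, the process stops just before a row would become entirely missing, and the markers `?`, `*`, and `-` stand for lost values, "do not care" conditions, and attribute-concept values. The function names `missing_masks` and `apply_mask` are hypothetical.

```python
import random


def missing_masks(n_rows, n_cols, step_pct=5, seed=0):
    """Return [(percent, mask)] where mask is a frozenset of (row, col) cells.

    Masks are nested: the 10% mask contains the 5% mask, and so on.
    Random cell selection is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    rng.shuffle(cells)
    total = len(cells)
    mask = set()
    per_row = [0] * n_rows  # number of missing cells in each row
    level = 1
    out = []
    for r, c in cells:
        # Assumed stopping rule: never let a row become entirely missing.
        if per_row[r] + 1 == n_cols:
            break
        mask.add((r, c))
        per_row[r] += 1
        # Record a snapshot each time a 5%, 10%, ... target is reached.
        while len(mask) >= (level * step_pct * total + 99) // 100:
            out.append((level * step_pct, frozenset(mask)))
            level += 1
    return out


def apply_mask(table, mask, marker):
    """Replace the masked cells of a list-of-lists table with one marker."""
    return [
        [marker if (r, c) in mask else v for c, v in enumerate(row)]
        for r, row in enumerate(table)
    ]
```

Reusing one mask with each of the three markers gives all three interpretations the same pattern of missing cells, which is the property the new experiments rely on.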
2
The indiscernibility relation is a fundamental concept of rough set theory. The original definition of the indiscernibility relation does not capture the situation where some of the attribute values are missing. This paper tries to enhance former works by proposing an individual treatment of missing values at the attribute or value level. The main assumption of this paper is that not all missing values are semantically equal. We propose two different approaches to creating an individual indiscernibility relation for a particular information system. The first relation assumes that the semantics of missing attribute values vary between columns but are fixed within each column. The second relation allows the semantics of missing attribute values to differ more freely, although this variability is limited by the expressive power of formulas that use descriptors. We also provide a comparison of flexible indiscernibility relations and missing value imputation methods. Finally, we present a simple algorithm for inducing sub-optimal relations from data.
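The idea of per-column semantics can be sketched as follows. This is only an illustration and not the relation defined in the paper: the two policies, `"wildcard"` (a missing value matches anything) and `"own_value"` (a missing value matches only another missing value), as well as the names `related` and `neighbourhood`, are assumptions introduced here.

```python
def related(x, y, semantics):
    """Check whether objects x and y are indiscernible.

    `semantics` holds one policy per column (hypothetical names):
      "wildcard"  - a missing value (None) matches any value,
      "own_value" - a missing value matches only another missing value.
    """
    for vx, vy, policy in zip(x, y, semantics):
        if policy == "wildcard" and (vx is None or vy is None):
            continue  # a missing value is compatible with everything here
        if vx != vy:
            return False
    return True


def neighbourhood(table, i, semantics):
    """Indices of all objects related to object i under the given semantics."""
    return {j for j, y in enumerate(table) if related(table[i], y, semantics)}


table = [
    ("high", None),
    ("high", "yes"),
    ("low", "yes"),
    ("high", None),
]
# Same data, two different choices of semantics for the second column.
print(neighbourhood(table, 0, ("own_value", "wildcard")))
print(neighbourhood(table, 0, ("own_value", "own_value")))
```

Changing the policy of a single column changes which objects are considered indiscernible, which is why the choice of semantics can be tuned to a particular information system.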
3
In this paper we present a method of data decomposition that avoids the necessity of reasoning on data with missing attribute values. This method can be applied to any algorithm of classifier induction. The original incomplete data is decomposed into data subsets without missing values. Next, methods for classifier induction are applied to these sets. Finally, a conflict resolving method is used to obtain the final classification from the partial classifiers. We provide an empirical evaluation of the accuracy and model size of the decomposition method, using various decomposition criteria on data with natural missing values. We also present experiments on data with synthetic missing values to examine the properties of the proposed method under a variable ratio of incompleteness.
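The decomposition step can be sketched under one simple, assumed criterion: objects are grouped by the set of attributes they actually have, so each subset is complete on its own attributes. The abstract does not say which decomposition criteria or conflict resolving method the paper uses; the grouping rule, the majority vote, and the names `decompose` and `resolve` are hypothetical.

```python
from collections import Counter, defaultdict


def decompose(rows):
    """Split incomplete rows (dicts with None for missing) into complete subsets.

    Assumed criterion: rows sharing the same set of present attributes
    form one subset, projected onto those attributes.
    """
    groups = defaultdict(list)
    for row in rows:
        present = tuple(sorted(a for a, v in row.items() if v is not None))
        groups[present].append({a: row[a] for a in present})
    return dict(groups)


def resolve(votes):
    """Assumed conflict resolution: majority vote over partial classifiers."""
    return Counter(votes).most_common(1)[0][0]


rows = [
    {"a": 1, "b": None, "d": "x"},
    {"a": 2, "b": 3, "d": "y"},
    {"a": 5, "b": None, "d": "x"},
]
subsets = decompose(rows)
for attributes, subset in subsets.items():
    print(attributes, len(subset))
```

Any classifier induction algorithm could then be trained on each subset separately, with `resolve` combining the answers of the partial classifiers.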