In this paper we present results of experiments on 166 incomplete data sets using three probabilistic approximations: lower, middle, and upper. Two interpretations of missing attribute values were used: lost values and “do not care” conditions. Our main objective was to select the best combination of an approximation and a missing attribute value interpretation. We conclude that the best approach depends on the data set. An additional objective of our research was to study the average number of distinct probabilities associated with characteristic sets for all concepts of a data set. This number is much larger for data sets with “do not care” conditions than for data sets with lost values. Therefore, for data sets with “do not care” conditions the number of distinct probabilistic approximations is also larger.
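The abstract does not define characteristic sets or the probabilities attached to them, so the following is only a minimal sketch under the definitions commonly used in the rough-set literature on incomplete data. It assumes that a lost value (`?`) keeps a case out of every attribute-value block, that a “do not care” condition (`*`) puts the case into every block of that attribute, and that the probability associated with a characteristic set K(x) and a concept X is |X ∩ K(x)| / |K(x)|. The toy table, the concept, and all function names are hypothetical.

```python
from fractions import Fraction

LOST, DONT_CARE = "?", "*"

def characteristic_sets(rows):
    """Return K(x) for every case x; rows is a list of attribute-value tuples."""
    n = len(rows)
    result = []
    for x in range(n):
        k = set(range(n))  # start from the whole universe U
        for a, v in enumerate(rows[x]):
            if v in (LOST, DONT_CARE):
                continue  # a missing value of x places no restriction on K(x)
            # Block [(a, v)]: cases with value v plus cases with "*";
            # cases with "?" for attribute a belong to no block.
            k &= {y for y in range(n) if rows[y][a] in (v, DONT_CARE)}
        result.append(k)
    return result

def distinct_probabilities(char_sets, concept):
    """Set of distinct values Pr(X | K(x)) over all cases x."""
    return {Fraction(len(k & concept), len(k)) for k in char_sets}

def toy_table(missing):
    """Hypothetical five-case table with two missing attribute values."""
    return [
        ("high", "yes"),
        (missing, "yes"),
        ("high", "no"),
        ("low", missing),
        ("low", "no"),
    ]

concept = {0, 1, 3}  # hypothetical concept: cases sharing one decision value

lost_probs = distinct_probabilities(characteristic_sets(toy_table(LOST)), concept)
care_probs = distinct_probabilities(characteristic_sets(toy_table(DONT_CARE)), concept)

print(sorted(lost_probs))  # 3 distinct values: 0, 1/2, 1
print(sorted(care_probs))  # 4 distinct values: 0, 1/2, 2/3, 1
```

On this toy table the “do not care” interpretation yields one more distinct probability than the lost-value interpretation, which is in line with the tendency the abstract reports; a single five-case example is of course no evidence for the 166 data sets.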
The main objective of our research was to test whether probabilistic approximations should be used in rule induction from incomplete data. For our research we designed experiments using six standard data sets. Four of the data sets were incomplete to begin with, and two had missing attribute values that were randomly inserted. In the six data sets, we used two interpretations of missing attribute values: lost values and “do not care” conditions. In addition, we used three definitions of approximations: singleton, subset, and concept. Among the 36 combinations of a data set, type of missing attribute values, and type of approximation, for five combinations the error rate (the result of ten-fold cross validation) was smaller than for ordinary (lower and upper) approximations; for four other combinations, the error rate was larger than for ordinary approximations. For the remaining 27 combinations, the difference between these error rates was not statistically significant.
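The three definitions named above can be illustrated with a short sketch. It assumes the formulations usually given in the rough-set literature, which the abstract itself does not spell out: for a threshold α, the singleton approximation collects the cases x with Pr(X | K(x)) ≥ α, the subset approximation is the union of K(x) over all such cases in the universe, and the concept approximation is the same union restricted to cases x in X. The characteristic sets below are hypothetical and listed directly rather than derived from a table.

```python
def probabilistic_approximations(char_sets, concept, alpha):
    """Return (singleton, subset, concept) approximations of `concept`.

    char_sets[x] is the characteristic set K(x) of case x (assumed given);
    alpha is the probability threshold, 0 < alpha <= 1.
    """
    def pr(x):
        k = char_sets[x]
        return len(k & concept) / len(k)  # Pr(X | K(x))

    universe = range(len(char_sets))
    qualifying = [x for x in universe if pr(x) >= alpha]

    singleton = set(qualifying)
    subset = set().union(*(char_sets[x] for x in qualifying))
    concept_appr = set().union(*(char_sets[x] for x in qualifying if x in concept))
    return singleton, subset, concept_appr

# Hypothetical characteristic sets for five cases, indexed by case number.
K = [{0, 1}, {0, 1, 3}, {0, 2}, {1, 3, 4}, {3, 4}]
X = {0, 1, 3}  # hypothetical concept

print(probabilistic_approximations(K, X, 1.0))
print(probabilistic_approximations(K, X, 0.5))
```

Under these assumed definitions, α = 1 gives the ordinary lower approximations and a sufficiently small positive α gives the ordinary upper approximations, so intermediate thresholds such as 0.5 are the ones that go beyond the ordinary pair. In the example with α = 0.5, the subset approximation contains case 2 while the concept approximation does not, because case 2 lies outside X.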