We present a set of guidelines for improving quality and efficiency in the initial steps of the KDD process by utilizing various kinds of domain knowledge. We discuss how such knowledge may be used to the advantage of the system developer and what kinds of improvements can be achieved. We focus on systems that create and process compound data objects within the RDBMS framework. These basic considerations are illustrated with several examples of implemented database solutions.
Many different software tools for decision support exist today; the same is true for data mining, which can be seen as a particularly challenging sub-area of decision support. Choosing the most suitable tool for a particular industrial data mining application is becoming difficult, especially for industrial decision makers whose expertise lies in a different field. This paper provides a conceptual analysis of the crucial features of current data mining software tools by establishing an abstract view of typical processes in data mining. This yields a common terminology that simplifies the comparison of tools. Based on this analysis, objective decisions about applying decision-support software tools in industrial practice can be made.
Knowledge discovery from real-life databases is a multi-phase process consisting of numerous steps, including attribute selection, discretization of real-valued attributes, and rule induction. In this paper, we discuss a rule discovery process based on rough set theory. The core of the process is a soft hybrid induction system called the Generalization Distribution Table and Rough Set System (GDT-RS), which discovers classification rules from databases with uncertain and incomplete data. The system combines the Generalization Distribution Table (GDT) and rough set methodologies. In the preprocessing stage, two modules, Rough Sets with Heuristics (RSH) and Rough Sets with Boolean Reasoning (RSBR), are used for attribute selection and for discretization of real-valued attributes, respectively. We use a slope-collapse database as an example to show how rules can be discovered from a large, real-life database.
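The abstract does not specify the heuristic used by the RSH module, so the following is only a minimal sketch of the rough-set idea that attribute-selection heuristics of this kind commonly rely on: the dependency degree γ(C, D) = |POS_C(D)| / |U|, driven by a greedy forward search in the spirit of the well-known QuickReduct heuristic. It is not the GDT-RS or RSH implementation. The function names (`partition`, `dependency`, `greedy_reduct`) and the toy table `TOY_ROWS` are hypothetical and chosen for illustration; the table is not taken from the slope-collapse database.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def partition(rows: Sequence[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Group row indices into indiscernibility classes w.r.t. attrs."""
    classes: Dict[Tuple, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def dependency(rows: Sequence[Row], cond: Sequence[str], decision: str) -> float:
    """Dependency degree gamma(cond, decision) = |POS_cond(decision)| / |U|."""
    if not rows:
        return 0.0
    pos = 0
    for block in partition(rows, cond):
        # A class belongs to the positive region if all its objects
        # share a single decision value (the class is consistent).
        if len({rows[i][decision] for i in block}) == 1:
            pos += len(block)
    return pos / len(rows)


def greedy_reduct(rows: Sequence[Row], cond: Sequence[str], decision: str) -> List[str]:
    """Hypothetical QuickReduct-style forward selection: repeatedly add the
    attribute that most increases gamma until it matches the full set."""
    full = dependency(rows, cond, decision)
    selected: List[str] = []
    while dependency(rows, selected, decision) < full:
        best = max(
            (a for a in cond if a not in selected),
            key=lambda a: dependency(rows, selected + [a], decision),
        )
        selected.append(best)
    return selected


# Hypothetical toy decision table (illustrative only).
TOY_ROWS: List[Row] = [
    {"rain": "high", "slope": "steep", "soil": "clay", "collapse": "yes"},
    {"rain": "high", "slope": "gentle", "soil": "clay", "collapse": "no"},
    {"rain": "low", "slope": "steep", "soil": "sand", "collapse": "no"},
    {"rain": "low", "slope": "gentle", "soil": "sand", "collapse": "no"},
    {"rain": "high", "slope": "steep", "soil": "sand", "collapse": "yes"},
]

if __name__ == "__main__":
    print(greedy_reduct(TOY_ROWS, ["rain", "slope", "soil"], "collapse"))
    # prints ['rain', 'slope']
```

On this toy table, `rain` and `slope` together already give γ = 1.0, so `soil` is discarded as redundant. A greedy search like this is cheap but is not guaranteed to find a minimal reduct, which is one reason practical systems pair rough sets with additional heuristics.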