PL EN


Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Tytuł artykułu

Constructing a Decision Tree for Graph-Structured Data and its Applications

Wybrane pełne teksty z tego czasopisma
Identyfikatory
Warianty tytułu
Języki publikacji
EN
Abstrakty
EN
A machine learning technique called Graph-Based Induction (GBI) efficiently extracts typical patterns from graph-structured data by stepwise pair expansion (pairwise chunking). It is very efficient because of its greedy search. Meanwhile, a decision tree is an effective means of data classification from which rules that are easy to understand can be obtained. However, a decision tree could not be constructed for the data which is not explicitly expressed with attribute-value pairs. This paper proposes a method called Decision Tree Graph-Based Induction (DT-GBI), which constructs a classifier (decision tree) for graph-structured data while simultaneously constructing attributes for classification using GBI. Substructures (patterns) are extracted at each node of a decision tree by stepwise pair expansion in GBI to be used as attributes for testing. Since attributes (features) are constructed while a classifier is being constructed, DT-GBI can be conceived as a method for feature construction. The predictive accuracy of a decision tree is affected by which attributes (patterns) are used and how they are constructed. A beam search is employed to extract good enough discriminative patterns within the greedy search framework. Pessimistic pruning is incorporated to avoid overfitting to the training data. Experiments using a DNA dataset were conducted to see the effect of the beam width and the number of chunking at each node of a decision tree. The results indicate that DT-GBI that uses very little prior domain knowledge can construct a decision tree that is comparable to other classifiers constructed using the domain knowledge. DT-GBI was also applied to analyze a real-world hepatitis dataset as a part of evidence-based medicine. Four classification tasks of the hepatitis data were conducted using only the time-series data of blood inspection and urinalysis. The preliminary results of experiments, both constructed decision trees and their predictive accuracies as well as extracted patterns, are reported in this paper. Some of the patterns match domain experts' experience and the overall results are encouraging.
Wydawca
Rocznik
Strony
131--160
Opis fizyczny
Bibliogr. 30 poz., tab., wykr.
Twórcy
autor
  • Institute of Scientific and Industrial Research Osaka University, Japan
autor
  • Institute of Scientific and Industrial Research Osaka University, Japan
autor
  • Institute of Scientific and Industrial Research Osaka University, Japan
autor
  • Institute of Scientific and Industrial Research Osaka University, Japan
autor
  • Division for Medical Informatics Chiba University Hospital, Japan
  • Division for Medical Informatics Chiba University Hospital, Japan
Bibliografia
  • [1] Blake, C. L., Keogh, E., Merz, C.: UCI Repository of Machine Leaning Database, 1998, Http://www.ics.uci.edu/􀀀mlearn/MLRepository.html.
  • [2] Blockeel, H., De Raedt, L.: Top-down induction of first-order logical decision tree, Artificial Intelligence, 101, 1998, 285–297.
  • [3] Blockeel, H., Sebag, M.: Scalability and efficiency in multi-relational data mining, ACM SIGKDD Explorations Newsletter, 5(1), 2003, 17–30.
  • [4] Breiman, L., Friedman, J. H., Olshen, R. A., Stone, C. J.: Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, 1984.
  • [5] Breiman, L., Friedman, J. H., Olshen, R. A., Stone, C. J.: The CN2 Induction Algorithm, Machine Learning, 3, 1989, 261–283.
  • [6] Cook, D. J., Holder, L. B.: Graph-Based Data Mining, IEEE Intelligent Systems, 15(2), 2000, 32–41.
  • [7] Džeroski, S.: Relational Data Mining Applications: An Overview, Relational Data Mining, 2001, 63–102.
  • [8] Fortin, S.: The graph isomorphism problem, 1996.
  • [9] Hirano, S., Tsumoto, S.: Mining Similar Temporal Patterns in Long Time-series Data and Its Application to Medicine, Proceedings of 2002 IEEE International Conference on Data Mining (ICDM 2002), December 2002.
  • [10] Ho, T. B., Nguyen, T. D., Kawasaki, S., Le, S., Nguyen, D. D., Yokoi, H., Takabayashi, K.: Mining Hepatitis Data with Temporal Abstraction, Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2003.
  • [11] Inokuchi, A., Washio, T., Motoda, H.: An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data, Proc. of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, 2000.
  • [12] Inokuchi, A., Washio, T., Motoda, H.: Complete Mining of Frequent Patterns from Graphs: Mining Graph Data, Machine Learning, 50(3), 2003, 321–354.
  • [13] Kramer, S., Windmer, G.: Inducing Classification and Regression Trees in First Order Logic, Relational Data Mining, 2001, 140–159.
  • [14] Matsuda, T., Horiuchi, T., Motoda, H., Washio, T.: Extension of Graph-Based Induction for General Graph Structured Data, Knowledge Discovery and Data Mining: Current Issues and New Applications (Springer Verlag LNAI 1805), 2000.
  • [15] Matsuda, T., Yoshida, T., Motoda, H., Washio, T.: Knowledge Discovery from Structured Data by Beamwise Graph-Based Induction, Proc. of the 7th Pacific Rim International Confernce on Artificial Intelligence (Springer Verlag LNAI2417), 2002.
  • [16] Matsuda, T., Yoshida, T., Motoda, H., Washio, T.: Mining Patterns from Structured Data by Beam-wise Graph-Based Induction, Proc. of The Fifth International Conference on Discovery Science, 2002.
  • [17] Michalski, R. S.: Learning Flexible Concepts: Fundamental Ideas and a Method Based on Two-Tiered Representaion, In Machine Learning, An Artificial Intelligence Approiach, 3, 1990, 63–102.
  • [18] Muggleton, S.: Inductive Logic Programming: Issues, results and challenge of Learning Language in Logic, Artificial Intelligence, 114, 1999, 283–296.
  • [19] Muggleton, S., de Raedt, L.: Inductive Logic Programming: Theory and Methods, Journal of Logic Programming, 19(20), 1994, 629–679.
  • [20] Nédellec, C., Adé, H., Bergadano, F., Tausend, B.: Declarative bias in ILP, Advances in Inductive Logic Programming, 1996, 82–103.
  • [21] Ohsaki, M., Sato, Y., Kitaguchi, S., Yamaguchi, T.: A Rule Discovery Support System, Project “Realization of Active Mining in the Era of Information Flood” Report, March 2003.
  • [22] Page, D., Srinivasan, A.: ILP: A Short Look Back and a Longer Forward, Journal of Machine Learning Research, 4, 2003, 415–430.
  • [23] Quinlan, J. R.: Induction of decision trees, Machine Learning, 1, 1986, 81–106.
  • [24] Quinlan, J. R.: C4.5:Programs For Machine Learning, Morgan Kaufmann Publishers, 1993.
  • [25] Read, R. C., Corneil, D. G.: The graph isomorphism disease, Journal of Graph Theory, 1, 1977, 339–363.
  • [26] Towell, G. G., Shavlik, J. W.: Extracting Refined Rules from Knowledge-Based Neural Networks, Machine Learning, 13, 1993, 71–101.
  • [27] Warodom, G., Matsuda, T., Yoshida, T., Motoda, H., Washio, T.: Classifier Construction by Graph-Based Induction for Graph-Structured Data, Proc. of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining (Springer Verlag LNAI2637), 2003.
  • [28] Warodom, G., Matsuda, T., Yoshida, T., Motoda, H., Washio, T.: Performance Evaluation of Decision Tree Graph-Based Induction, Proc. of the International Conference on Discovery Science (Springer Verlag LNAI2843), 2003.
  • [29] Yamada, Y., Suzuki, E., Yokoi, H., Takabayashi, K.: Decision-tree Induction from Time-series Data Based on a Standard-example Split Test, Proc. of the 12th International Conference on Machine Learning, August 2003.
  • [30] Yoshida, K., Motoda, H.: CLIP : Concept Learning from Inference Pattern, Journal of Artificial Intelligence, 75(1), 1995, 63–92.
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-article-BUS2-0007-0024
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.