Constructing a Decision Tree for Graph-Structured Data and its Applications

Authors:
Warodom Geamsakul;Tetsuya Yoshida;Kouzou Ohara;Hiroshi Motoda;Hideto Yokoi;Katsuhiko Takabayashi
Affiliations:
Institute of Scientific and Industrial Research, Osaka Univ., Japan. warodom@ar.sanken.osaka-u.ac.jp;Institute of Scientific and Industrial Research, Osaka University, Japan. yoshida@ar.sanken.osaka-u.ac.jp;Institute of Scientific and Industrial Research, Osaka University, Japan. ohara@ar.sanken.osaka-u.ac.jp;Institute of Scientific and Industrial Research, Osaka University, Japan. motoda@ar.sanken.osaka-u.ac.jp (Corresp. Inst. of Sci. and Ind. Res., Osaka Univ., 8-1 Mihogaoka, Ibaraki, Osaka 567-0047, ...;Division for Medical Informatics, Chiba University Hospital, Japan. yokoi@telemed.ho.chiba-u.ac.jp/takaba@ho.chiba-u.ac.jp;Division for Medical Informatics, Chiba University Hospital, Japan. yokoi@telemed.ho.chiba-u.ac.jp/takaba@ho.chiba-u.ac.jp
Venue:
Fundamenta Informaticae - Advances in Mining Graphs, Trees and Sequences
Year:
2004

Abstract

A machine learning technique called Graph-Based Induction (GBI) extracts typical patterns from graph-structured data by stepwise pair expansion (pairwise chunking). It is very efficient because it searches greedily. A decision tree, meanwhile, is an effective means of data classification from which easily understood rules can be obtained. However, a decision tree cannot be constructed for data that is not explicitly expressed with attribute-value pairs. This paper proposes a method called Decision Tree Graph-Based Induction (DT-GBI), which constructs a classifier (decision tree) for graph-structured data while simultaneously constructing attributes for classification using GBI. Substructures (patterns) are extracted at each node of a decision tree by stepwise pair expansion in GBI and used as attributes for testing. Since attributes (features) are constructed while a classifier is being constructed, DT-GBI can be regarded as a method for feature construction. The predictive accuracy of a decision tree is affected by which attributes (patterns) are used and how they are constructed. A beam search is employed to extract sufficiently discriminative patterns within the greedy search framework. Pessimistic pruning is incorporated to avoid overfitting to the training data. Experiments using a DNA dataset were conducted to examine the effect of the beam width and the number of chunkings at each node of a decision tree. The results indicate that DT-GBI, which uses very little prior domain knowledge, can construct a decision tree comparable to other classifiers constructed using domain knowledge. DT-GBI was also applied to analyze a real-world hepatitis dataset as part of evidence-based medicine. Four classification tasks on the hepatitis data were conducted using only the time-series data of blood inspection and urinalysis. The preliminary experimental results, including the constructed decision trees, their predictive accuracies, and the extracted patterns, are reported in this paper. Some of the patterns match domain experts' experience, and the overall results are encouraging.
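
As a rough illustration of the attribute-selection idea in the abstract, the Python sketch below scores candidate node pairs and picks the most discriminative one as the test at a single tree node. It is a simplification under stated assumptions, not the authors' implementation. Each graph is reduced to a set of adjacent node-label pairs, and information gain is assumed as the selection criterion. Chunking (rewriting the graph after a pair is chosen), beam search, and pruning are all omitted. The names `entropy`, `information_gain`, and `best_pair` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(graphs, labels, pair):
    """Gain from splitting the graphs on whether they contain `pair`.

    Assumption: each graph is a set of (label, label) tuples, one per edge.
    """
    with_pair = [y for g, y in zip(graphs, labels) if pair in g]
    without_pair = [y for g, y in zip(graphs, labels) if pair not in g]
    total = len(labels)
    remainder = (
        (len(with_pair) / total) * entropy(with_pair)
        + (len(without_pair) / total) * entropy(without_pair)
    )
    return entropy(labels) - remainder


def best_pair(graphs, labels):
    """Return the candidate pair with the highest information gain."""
    candidates = sorted(set().union(*graphs))
    return max(candidates, key=lambda p: information_gain(graphs, labels, p))


# Toy data (invented for illustration only).
graphs = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("c", "d")},
    {("b", "c"), ("c", "d")},
    {("c", "d")},
]
labels = ["pos", "pos", "neg", "neg"]
chosen = best_pair(graphs, labels)
```

In this toy data the pair `("a", "b")` occurs in exactly the positive graphs, so it separates the two classes perfectly and is the one selected. In a fuller version the chosen pair would be chunked into a single node and the search repeated, so that larger substructures could become test attributes.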