Classifier construction by graph-based induction for graph-structured data

  • Authors:
  • Warodom Geamsakul;Takashi Matsuda;Tetsuya Yoshida;Hiroshi Motoda;Takashi Washio

  • Affiliations:
  • Institute of Scientific and Industrial Research, Osaka University, Ibaraki, Osaka, Japan;Institute of Scientific and Industrial Research, Osaka University, Ibaraki, Osaka, Japan;Institute of Scientific and Industrial Research, Osaka University, Ibaraki, Osaka, Japan;Institute of Scientific and Industrial Research, Osaka University, Ibaraki, Osaka, Japan;Institute of Scientific and Industrial Research, Osaka University, Ibaraki, Osaka, Japan

  • Venue:
  • PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
  • Year:
  • 2003

Abstract

A machine learning technique called Graph-Based Induction (GBI) efficiently extracts typical patterns from graph-structured data by stepwise pair expansion (pairwise chunking). Its efficiency comes from its greedy search. A decision tree, on the other hand, is an effective means of classifying data, and the rules obtained from it are easy to understand. However, a decision tree cannot be constructed for data that are not explicitly expressed as attribute-value pairs. In this paper, we propose a method of constructing a classifier (decision tree) for graph-structured data by GBI. In our approach, the attributes, namely substructures useful for the classification task, are constructed by GBI on the fly while the decision tree is being built. We call this technique Decision Tree - Graph-Based Induction (DT-GBI). DT-GBI was tested on a DNA dataset from the UCI repository. Since DNA data is a sequence of symbols, representing each sequence as attribute-value pairs by simply assigning its symbols to the values of ordered attributes does not make sense. The sequences were therefore transformed into graph-structured data, and the attributes (substructures) were extracted by GBI to construct a decision tree. We evaluate how the number of times GBI is run at each node of the decision tree affects predictive accuracy. The results indicate the effectiveness of DT-GBI for constructing a classifier for graph-structured data.
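The abstract does not specify the chunking criterion, the attribute test, or how the DNA sequences were encoded as graphs. The minimal Python sketch below illustrates the core idea under stated assumptions. It enumerates labeled node pairs (the unit that GBI chunks) and treats "the graph contains this pair" as a binary attribute. It then selects the pair with the highest information gain as the test at one decision-tree node. The names `sequence_to_graph`, `pair_patterns`, `information_gain`, and `best_pair`, the `"next"` edge encoding, and the use of information gain with frequency tie-breaking are all hypothetical choices for illustration, not the authors' implementation. Real GBI also repeatedly chunks the selected pairs into larger substructures, and this sketch omits that step.

```python
import math
from collections import Counter


def sequence_to_graph(seq):
    """Hypothetical encoding: one node per symbol, chained by 'next' edges.
    A graph is (node_labels, edges) with edges as (src, dst, edge_label)."""
    labels = dict(enumerate(seq))
    edges = [(i, i + 1, "next") for i in range(len(seq) - 1)]
    return labels, edges


def pair_patterns(graph):
    """Set of (src_label, edge_label, dst_label) pairs, i.e. size-2 substructures."""
    labels, edges = graph
    return {(labels[s], e, labels[d]) for s, d, e in edges}


def entropy(class_labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(class_labels)
    if n == 0:
        return 0.0
    counts = Counter(class_labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def information_gain(graphs, classes, pattern):
    """Gain of the binary test 'graph contains pattern' (assumed attribute test)."""
    with_p = [c for g, c in zip(graphs, classes) if pattern in pair_patterns(g)]
    without_p = [c for g, c in zip(graphs, classes) if pattern not in pair_patterns(g)]
    n = len(classes)
    remainder = (len(with_p) / n) * entropy(with_p) + (len(without_p) / n) * entropy(without_p)
    return entropy(classes) - remainder


def best_pair(graphs, classes):
    """Count how many graphs contain each pair (a simplified GBI counting step),
    then return the pair whose presence best separates the classes.
    Ties are broken by frequency, then lexicographically, for determinism."""
    freq = Counter(p for g in graphs for p in pair_patterns(g))
    return max(freq, key=lambda p: (information_gain(graphs, classes, p), freq[p], p))


# Toy, made-up data: two "pos" and two "neg" sequences.
seqs = ["ACGT", "ACGA", "TTGA", "TTGC"]
classes = ["pos", "pos", "neg", "neg"]
graphs = [sequence_to_graph(s) for s in seqs]
print(best_pair(graphs, classes))  # prints ('T', 'next', 'T')
```

In this toy data, several pairs split the classes perfectly (gain 1.0), while a pair such as `('G', 'next', 'A')` appears once in each class and has gain 0.0. A DT-GBI-style learner would recurse on the two subsets produced by the chosen test. The number of chunking rounds run at each node, which the abstract reports as a parameter affecting accuracy, would control how large the candidate substructures can grow.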