Optimized Substructure Discovery for Semi-structured Data

  • Authors:
  • Kenji Abe; Shinji Kawasoe; Tatsuya Asai; Hiroki Arimura; Setsuo Arikawa

  • Venue:
  • PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
  • Year:
  • 2002

Abstract

In this paper, we consider the problem of discovering interesting substructures in a large collection of semi-structured data within the framework of optimized pattern discovery. We model semi-structured data and patterns as labeled ordered trees, and present an efficient algorithm that discovers the labeled ordered trees that optimize a given statistical measure, such as information entropy or classification accuracy, over a collection of semi-structured data. We give theoretical analyses of the computational complexity of the algorithm for patterns of bounded and unbounded size. Experiments show that the algorithm performs well and discovers interesting patterns on real datasets.
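
To make the problem statement concrete, the following is a minimal sketch of what "optimizing a statistical measure" over labeled ordered tree patterns can look like. It is not the algorithm of the paper: the candidate patterns are a small hand-written list rather than an enumerated search space, and every name here (`matches_at`, `occurs_in`, `split_entropy`, `docs`, `candidates`) is hypothetical. The sketch assumes that a pattern matches a data tree when it can be embedded so that labels, parent-child relations, and the left-to-right order of siblings are preserved, and that each data tree carries a binary class label.

```python
from math import log2

# A labeled ordered tree is a (label, children) pair; children is a list of trees.

def matches_at(pattern, node):
    """True if pattern embeds with its root mapped onto node.

    Pattern children must map, in order, onto a subsequence of the node's
    children. Leftmost-first matching is sufficient for this check.
    """
    p_label, p_children = pattern
    n_label, n_children = node
    if p_label != n_label:
        return False
    i = 0  # index of the next pattern child still to be placed
    for child in n_children:
        if i < len(p_children) and matches_at(p_children[i], child):
            i += 1
    return i == len(p_children)

def occurs_in(pattern, tree):
    """True if pattern matches at the root of tree or at any descendant."""
    return matches_at(pattern, tree) or any(occurs_in(pattern, c) for c in tree[1])

def entropy(pos, neg):
    """Binary entropy (in bits) of a set with pos positive and neg negative items."""
    total = pos + neg
    h = 0.0
    for k in (pos, neg):
        if k:
            p = k / total
            h -= p * log2(p)
    return h

def split_entropy(pattern, docs):
    """Weighted entropy after splitting docs into matched and unmatched sets."""
    matched = [label for tree, label in docs if occurs_in(pattern, tree)]
    unmatched = [label for tree, label in docs if not occurs_in(pattern, tree)]
    score = 0.0
    for part in (matched, unmatched):
        pos = sum(part)
        score += len(part) / len(docs) * entropy(pos, len(part) - pos)
    return score

# Toy data (assumed for illustration): positive trees have "b" before "c" under "a".
docs = [
    (("a", [("b", []), ("c", [])]), True),
    (("a", [("b", []), ("c", [("d", [])])]), True),
    (("a", [("c", []), ("b", [])]), False),
    (("a", [("b", [])]), False),
]

# Hand-picked candidate patterns; a real miner would enumerate these.
candidates = [
    ("a", []),
    ("b", []),
    ("a", [("b", []), ("c", [])]),
]

best = min(candidates, key=lambda p: split_entropy(p, docs))
```

On this toy data the single-node patterns occur in every tree and so separate nothing, while the pattern with "b" before "c" under "a" splits the positive and negative trees cleanly. The ordered-sibling requirement is what excludes the third tree, in which the same labels appear in the opposite order.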