Online Algorithms for Mining Semi-structured Data Stream

  • Authors:
  • Tatsuya Asai;Hiroki Arimura;Kenji Abe;Shinji Kawasoe;Setsuo Arikawa

  • Venue:
  • ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
  • Year:
  • 2002

Abstract

In this paper, we study an online data mining problem from streams of semi-structured data such as XML data. Modeling semi-structured data and patterns as labeled ordered trees, we present an online algorithm StreamT that receives fragments of unseen, possibly infinite semi-structured data in the document order through a data stream, and can return the current set of frequent patterns immediately on request at any time. A crucial part of our algorithm is the incremental maintenance of the occurrences of possibly frequent patterns using a tree sweeping technique. We give modifications of the algorithm to other online mining models. We present theoretical and empirical analyses to evaluate the performance of the algorithm.
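
To make the streaming setting concrete, the following is a minimal sketch of the general idea only, not the StreamT algorithm itself. It assumes the stream delivers one node at a time as a (depth, label) pair in document order, keeps only the current root-to-node branch in memory, and counts the simplest possible patterns, parent-child label pairs. The class name `BranchSweeper`, the method names, and the frequency definition (pair count divided by the number of nodes seen so far) are all hypothetical choices made for illustration.

```python
from collections import Counter


class BranchSweeper:
    """Hypothetical sketch: count parent-child label pairs over a node stream."""

    def __init__(self, min_support):
        self.min_support = min_support  # relative threshold in (0, 1], an assumption
        self.branch = []                # labels on the current root-to-node path
        self.counts = Counter()         # occurrences of each (parent, child) pair
        self.seen = 0                   # number of nodes received so far

    def push(self, depth, label):
        """Receive the next node in document order."""
        # Nodes at this depth or deeper are closed: in document order,
        # nothing that arrives later can be attached beneath them.
        del self.branch[depth:]
        if len(self.branch) != depth:
            raise ValueError("node depth skips a level")
        if self.branch:
            self.counts[(self.branch[-1], label)] += 1
        self.branch.append(label)
        self.seen += 1

    def frequent(self):
        """Return the currently frequent pairs; may be called at any time."""
        threshold = self.min_support * self.seen
        return {pair for pair, n in self.counts.items() if n >= threshold}


# Usage: the tree a(b(c), b(c), d) delivered node by node.
sweeper = BranchSweeper(min_support=0.3)
for depth, label in [(0, "a"), (1, "b"), (2, "c"), (1, "b"), (2, "c"), (1, "d")]:
    sweeper.push(depth, label)
print(sweeper.frequent())
```

Memory use in this sketch is bounded by the depth of the tree plus the number of distinct label pairs, which is what allows a query to be answered at any point in the stream. Handling larger tree patterns requires tracking their occurrences along the branch, which is the part the abstract attributes to the tree sweeping technique and which this sketch does not attempt.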