Sequential pattern mining for structure-based XML document classification

  • Authors:
  • Calin Garboni;Florent Masseglia;Brigitte Trousse

  • Affiliations:
  • West University of Timisoara, Romania;INRIA Sophia Antipolis, AxIS Research Team 2004, Sophia Antipolis, France;INRIA Sophia Antipolis, AxIS Research Team 2004, Sophia Antipolis, France

  • Venue:
  • INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval
  • Year:
  • 2005

Abstract

This article presents an original supervised classification technique for XML documents that is based on structure only. Each XML document is viewed as an ordered labeled tree, represented by its tags only. Our method has three steps: after a cleaning step, we characterize each predefined cluster in terms of frequent structural subsequences, and then we classify the XML documents based on the mined patterns of each cluster.
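
The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the cleaning rule, the mining algorithm, or the scoring function, so everything below is an assumption made for illustration. Here, cleaning is a hypothetical `ignore` set of tags to drop, mining is a brute-force enumeration of subsequences up to a hypothetical `max_len`, and a document is assigned to the cluster whose patterns it matches in the highest proportion. All function and parameter names are invented for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations


def tag_sequence(xml_text, ignore=frozenset()):
    """Flatten a document into its tags in document (preorder) order.

    `ignore` stands in for the cleaning step; the actual cleaning rule
    is not given in the abstract, so this is only a placeholder.
    """
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter() if el.tag not in ignore]


def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    return all(tag in it for tag in pattern)


def frequent_subsequences(sequences, min_support, max_len=3):
    """Brute-force mining (assumed, not the paper's algorithm).

    Returns every subsequence of length <= max_len that occurs in at
    least `min_support` (a fraction) of the given sequences.
    """
    counts = Counter()
    for seq in sequences:
        seen = set()
        for n in range(1, max_len + 1):
            # combinations() keeps positional order, so each tuple
            # is a genuine subsequence of seq.
            seen.update(combinations(seq, n))
        counts.update(seen)  # count each pattern once per document
    threshold = min_support * len(sequences)
    return {p for p, c in counts.items() if c >= threshold}


def classify(sequence, patterns_by_cluster):
    """Pick the cluster whose patterns match the document best.

    The score (fraction of a cluster's patterns found in the document)
    is an assumption chosen for simplicity.
    """
    def score(patterns):
        if not patterns:
            return 0.0
        hits = sum(is_subsequence(p, sequence) for p in patterns)
        return hits / len(patterns)

    return max(patterns_by_cluster,
               key=lambda c: score(patterns_by_cluster[c]))
```

To use the sketch, mine one pattern set per predefined cluster from its training documents with `frequent_subsequences`, then pass the tag sequence of an unseen document to `classify`. The brute-force enumeration grows quickly with `max_len` and document length; a dedicated sequential pattern mining algorithm would be needed for collections of realistic size.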