Clust-XPaths: clustering of XML paths

  • Authors: Amina Madani; Omar Boussaid; Djamel Eddine Zegour

  • Affiliations: Algiers University, Law Faculty, Algiers, Algeria; Lumière Lyon2 University, Lyon, France; National High School of Computer Science, Algiers, Algeria

  • Venue: MLDM'11 Proceedings of the 7th international conference on Machine learning and data mining in pattern recognition
  • Year: 2011

Abstract

This paper presents a new approach to clustering XML documents. We use a flexible document representation that considers both structure and content: each XML document is represented by the set of its paths. We exploit the semantic similarity between the terms (tags and text) that compose XML paths by unifying them with a thesaurus created in advance. Clustering then organizes the documents into clusters based on the similarity of their paths. Experiments were conducted on a large set of documents made available as part of INEX 2007 (INitiative for the Evaluation of XML Retrieval).
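
The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the clustering algorithm, or the thesaurus format, so everything concrete below is an assumption made for illustration: the `THESAURUS` dictionary, the Jaccard coefficient used as path-set similarity, the greedy threshold grouping, and the function names `unify`, `xml_paths`, `jaccard`, and `cluster` are all hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical thesaurus (assumption): maps a term to its canonical form.
THESAURUS = {"writer": "author", "name": "title"}


def unify(term):
    """Lower-case a tag or text word and replace it by its thesaurus entry."""
    term = term.strip().lower()
    return THESAURUS.get(term, term)


def xml_paths(xml_text):
    """Return the set of root-to-leaf paths of a document.

    Tags and leaf text words are both unified, so a path mixes structure
    and content. Text in elements that also have children is ignored here.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        current = prefix + [unify(node.tag)]
        children = list(node)
        if children:
            for child in children:
                walk(child, current)
            return
        words = (node.text or "").split()
        if words:
            for word in words:
                paths.add("/".join(current + [unify(word)]))
        else:
            paths.add("/".join(current))

    walk(root, [])
    return paths


def jaccard(a, b):
    """Jaccard coefficient of two path sets (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs, threshold=0.5):
    """Greedy grouping (assumption): a document joins the first cluster
    whose first member is at least `threshold` similar, else starts a new one."""
    clusters = []
    for name, paths in docs.items():
        for members in clusters:
            if jaccard(paths, members[0][1]) >= threshold:
                members.append((name, paths))
                break
        else:
            clusters.append([(name, paths)])
    return [[name for name, _ in members] for members in clusters]


docs = {
    "a": xml_paths("<book><author>Smith</author><title>XML</title></book>"),
    "b": xml_paths("<book><writer>Smith</writer><name>XML</name></book>"),
    "c": xml_paths("<article><title>Graphs</title></article>"),
}
print(cluster(docs))
```

In this toy input, documents `a` and `b` use different tag vocabularies, but the thesaurus step maps them to identical path sets, so they fall into the same group, while `c` shares no path with them. A real evaluation on a collection such as INEX 2007 would need a proper thesaurus and a scalable clustering algorithm.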