XML documents clustering using a tensor space model

  • Authors:
  • Sangeetha Kutty;Richi Nayak;Yuefeng Li

  • Affiliations:
  • Faculty of Science and Technology, Queensland University of Technology, Brisbane, Australia;Faculty of Science and Technology, Queensland University of Technology, Brisbane, Australia;Faculty of Science and Technology, Queensland University of Technology, Brisbane, Australia

  • Venue:
  • PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
  • Year:
  • 2011

Abstract

The traditional Vector Space Model (VSM) cannot represent both the structure and the content of XML documents. This paper introduces a novel method that represents XML documents in a Tensor Space Model (TSM) and then uses that representation for clustering. Empirical analysis shows that the proposed method scales to large datasets. In addition, the factorized matrices it produces help improve cluster quality, because the enriched document representation captures both structure and content information.
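
To make the idea of a tensor representation concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a third-order tensor indexed by document, structural path, and term, filled with raw co-occurrence counts. As a stand-in for the paper's factorization, it unfolds the tensor along the document mode and takes a truncated SVD to obtain low-dimensional document factors. The toy corpus and the names `build_tensor` and `document_factors` are hypothetical.

```python
import numpy as np

# Hypothetical toy corpus: each document is a list of (path, term) pairs,
# where the path stands for structure and the term for content.
docs = [
    [("article/title", "tensor"), ("article/body", "clustering")],
    [("article/title", "tensor"), ("article/body", "tensor")],
    [("book/chapter", "recipe"), ("book/chapter", "cooking")],
]


def build_tensor(docs):
    """Build a document x path x term count tensor (an assumed layout)."""
    paths = sorted({p for d in docs for p, _ in d})
    terms = sorted({t for d in docs for _, t in d})
    p_idx = {p: i for i, p in enumerate(paths)}
    t_idx = {t: i for i, t in enumerate(terms)}
    tensor = np.zeros((len(docs), len(paths), len(terms)))
    for d, pairs in enumerate(docs):
        for p, t in pairs:
            tensor[d, p_idx[p], t_idx[t]] += 1.0
    return tensor, paths, terms


def document_factors(tensor, rank):
    """Unfold along the document mode and keep the top `rank` SVD components.

    This is a simple substitute for a full tensor decomposition.
    """
    unfolded = tensor.reshape(tensor.shape[0], -1)
    u, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    return u[:, :rank] * s[:rank]


tensor, paths, terms = build_tensor(docs)
factors = document_factors(tensor, rank=2)

# Pairwise distances between document factors; a clustering algorithm
# such as k-means could be run on `factors` instead of on a flat VSM.
dist_01 = np.linalg.norm(factors[0] - factors[1])
dist_02 = np.linalg.norm(factors[0] - factors[2])
```

In this toy setting the two "article" documents share a (path, term) pair and end up close together in the factor space, while the "book" document sits far from both. A bag-of-words vector would lose the path information that the second tensor mode retains.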