Extended VSM for XML document classification using frequent subtrees

  • Authors:
  • Jianwu Yang; Songlin Wang

  • Affiliations:
  • Institute of Computer Sci. & Tech., Peking University, Beijing, China; Institute of Computer Sci. & Tech., Peking University, Beijing, China

  • Venue:
  • INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval
  • Year:
  • 2009

Abstract

The structured link vector model (SLVM) is a representation for XML documents that extends the conventional vector space model (VSM) by incorporating document structure. In this paper, we describe the SLVM-based classification approach for XML documents used in the Document Mining Challenge of INEX 2009, in which closed frequent subtrees serve as the structural units for extracting content from XML documents and the Chi-square test is used for feature selection.
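
The abstract does not give the exact form of the Chi-square test used, so the following is only a minimal sketch of Chi-square feature scoring in its common text-classification form, based on a 2x2 contingency table of feature presence against class membership. The function names (`chi_square`, `chi_square_scores`), the representation of each document as a set of features, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for one feature and one class.

    a: documents in the class that contain the feature
    b: documents outside the class that contain the feature
    c: documents in the class that lack the feature
    d: documents outside the class that lack the feature
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. the feature occurs in every document).
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def chi_square_scores(docs, labels, target):
    """Score every feature against the class `target`.

    docs: list of sets of features (hypothetical representation)
    labels: list of class labels, aligned with docs
    """
    features = set().union(*docs)
    scores = {}
    for f in features:
        a = sum(1 for doc, y in zip(docs, labels) if f in doc and y == target)
        b = sum(1 for doc, y in zip(docs, labels) if f in doc and y != target)
        c = sum(1 for doc, y in zip(docs, labels) if f not in doc and y == target)
        d = sum(1 for doc, y in zip(docs, labels) if f not in doc and y != target)
        scores[f] = chi_square(a, b, c, d)
    return scores


# Toy data: each document is a set of (hypothetical) features.
docs = [{"x", "z"}, {"x"}, {"y", "z"}, {"y"}]
labels = ["A", "A", "B", "B"]
scores = chi_square_scores(docs, labels, target="A")
```

Features with the highest scores are the ones most strongly associated with the class, positively or negatively, and a selection step would keep the top-ranked ones. In this toy data, "z" occurs equally often inside and outside class "A", so it scores zero and would be dropped first.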