UJM at INEX 2009 XML mining track

  • Authors:
  • Christine Largeron; Christophe Moulin; Mathias Géry

  • Affiliations:
  • Université de Lyon, Saint-Étienne, France and CNRS, UMR, Laboratoire Hubert Curien, Université de Saint-Étienne Jean Monnet, France (all three authors)

  • Venue:
  • INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval
  • Year:
  • 2009

Abstract

This paper reports the experiments we carried out for the INEX 2009 XML Mining track, which consists of developing categorization methods for multi-labeled XML documents. We represent XML documents as vectors of indexed terms. The purpose of our experiments is twofold. First, we compare strategies that reduce the index size using an improved feature selection criterion, CCD. Second, we compare MCut, a thresholding strategy we proposed, with the common RCut and PCut strategies. The index size was reduced in such a way that the results were worse than expected. However, we obtained good improvements with the MCut thresholding strategy.
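The abstract does not define the thresholding strategies, so the following is only a minimal sketch of how two of them are commonly described, not the authors' implementation. It assumes that RCut keeps the t best-scoring labels of a document, and that MCut sorts a document's label scores in decreasing order and places the threshold in the middle of the largest gap between two successive scores. The function names `rcut` and `mcut`, the parameter `t`, and the example scores are all hypothetical.

```python
def rcut(scores, t):
    """Rank-based cut (assumed definition): keep the t highest-scoring labels."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:t])


def mcut(scores):
    """Maximum-gap cut (assumed definition): threshold at the midpoint of the
    largest gap between successive scores, sorted in decreasing order."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        # With zero or one label there is no gap to cut on; keep what exists.
        return {label for label, _ in ranked}
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    threshold = (ranked[cut][1] + ranked[cut + 1][1]) / 2
    return {label for label, score in ranked if score > threshold}


# Hypothetical classifier scores for one document.
doc_scores = {"sports": 0.9, "politics": 0.8, "science": 0.3, "art": 0.2}

print(sorted(rcut(doc_scores, 1)))  # ['sports']
print(sorted(mcut(doc_scores)))     # ['politics', 'sports']
```

Under these assumptions, RCut needs the number of labels per document fixed in advance, whereas MCut derives the number of labels from the score distribution of each document, which is why it fits multi-label categorization without a tuned parameter.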