This paper reports the experiments we carried out for the INEX 2009 XML Mining track, which consists of developing categorization methods for multi-labeled XML documents. We represent XML documents as vectors of indexed terms. Our experiments have two purposes. First, we compare strategies that reduce the index size using CCD, an improved feature selection criterion. Second, we compare MCut, a thresholding strategy we proposed, with the common RCut and PCut strategies. Although the index size was reduced, the resulting categorization performance was lower than expected. However, the MCut thresholding strategy yielded good improvements.
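The abstract compares three ways of turning per-category classifier scores into a set of labels. The sketch below illustrates them under stated assumptions. RCut and PCut follow their usual definitions in the text-categorization literature. RCut assigns the k top-ranked categories to each document. PCut assigns each category c_j to the k_j highest-scoring test documents, where k_j = round(P(c_j) * x * N), P(c_j) is the category's training prior, N is the number of test documents, and x is a tuning parameter. The abstract does not define MCut. Here it is *assumed* to cut a document's sorted score list at its largest gap between consecutive scores, which may differ from the authors' exact definition. The function names `rcut`, `pcut`, and `mcut` are hypothetical.

```python
from typing import List, Sequence


def rcut(scores: Sequence[float], k: int) -> List[int]:
    """RCut: assign the k top-ranked categories to a single document."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def pcut(score_matrix: Sequence[Sequence[float]],
         train_priors: Sequence[float], x: float) -> List[List[int]]:
    """PCut: category j goes to its k_j best-scoring test documents,
    with k_j = round(P(c_j) * x * N)."""
    n_docs = len(score_matrix)
    assigned: List[List[int]] = [[] for _ in range(n_docs)]
    for j, prior in enumerate(train_priors):
        k_j = round(prior * x * n_docs)
        # Rank test documents by their score for category j.
        top = sorted(range(n_docs), key=lambda d: score_matrix[d][j], reverse=True)[:k_j]
        for d in top:
            assigned[d].append(j)
    return [sorted(labels) for labels in assigned]


def mcut(scores: Sequence[float]) -> List[int]:
    """Assumed MCut reading: threshold at the midpoint of the largest gap
    between consecutive scores sorted in descending order."""
    if len(scores) < 2:
        return list(range(len(scores)))
    ordered = sorted(scores, reverse=True)
    gaps = [ordered[i] - ordered[i + 1] for i in range(len(ordered) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    threshold = (ordered[i] + ordered[i + 1]) / 2
    # If all scores are equal, the largest gap is 0 and nothing exceeds the threshold.
    return [j for j, s in enumerate(scores) if s > threshold]


if __name__ == "__main__":
    doc_scores = [0.9, 0.85, 0.2, 0.1]
    print(rcut(doc_scores, 1))  # [0]
    print(mcut(doc_scores))     # [0, 1]
```

One design difference is visible in the example. RCut needs a fixed k for every document, so with k = 1 it drops category 1 even though that category scores almost as high as category 0. The gap-based rule adapts the number of labels to each document's own score distribution. PCut, by contrast, works per category across the whole test set rather than per document.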