Overview of the INEX 2008 XML Mining Track

Authors:
Ludovic Denoyer; Patrick Gallinari
Affiliations:
LIP6 - University of Paris 6; LIP6 - University of Paris 6
Venue:
Advances in Focused Retrieval
Year:
2009

Citing 0
Cited 6

A cluster-based approach to XML similarity joins
IDEAS '09 Proceedings of the 2009 International Database Engineering & Applications Symposium

Overview of the INEX 2009 XML mining track: clustering and classification of XML documents
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval

Link-based text classification using Bayesian networks
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval

Utilising semantic tags in XML clustering
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval

UJM at INEX 2009 XML mining track
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval

Overview of the INEX 2010 XML mining track: clustering and classification of XML documents
INEX'10 Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval

Abstract

We describe the XML Mining Track at INEX 2008. The track was launched to explore two main ideas: first, identifying the key problems and new challenges in mining semi-structured documents, an emerging field; and second, studying and assessing the potential of machine learning (ML) techniques for generic ML tasks in the structured domain, i.e., classification and clustering of semi-structured documents. This year, the track focuses on the supervised classification and unsupervised clustering of XML documents using link information. We consider a corpus of about 100,000 Wikipedia pages with their associated hyperlinks. Participants developed models using the content of the XML documents, their internal structure, and the links between documents.