Clust-XPaths: clustering of XML paths

Authors:
Amina Madani;Omar Boussaid;Djamel Eddine Zegour
Affiliations:
Algiers University, Law Faculty, Algiers, Algeria;Lumière Lyon2 University, Lyon, France;National High School of Computer Science, Algiers, Algeria
Venue:
MLDM'11 Proceedings of the 7th international conference on Machine learning and data mining in pattern recognition
Year:
2011

Abstract

This paper presents a new approach to clustering XML documents. We use a flexible document representation that takes both structure and content into account: each XML document is represented by the set of its paths. We exploit the semantic similarity between the terms (tags and text) that compose XML paths by unifying them with a thesaurus created in advance. Clustering is then used to organize documents into clusters based on the similarity of their paths. Experiments were conducted on a large set of documents made available as part of INEX 2007 (INitiative for the Evaluation of XML Retrieval).
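
The path-based representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `THESAURUS` dictionary, the helper names `unify`, `extract_paths` and `jaccard`, and the choice of Jaccard similarity over path sets are all assumptions made for illustration, since the abstract does not specify the similarity measure or the clustering algorithm.

```python
import xml.etree.ElementTree as ET

# Hypothetical thesaurus: maps synonymous terms to one canonical term.
# The paper builds its thesaurus in advance; these entries are made up.
THESAURUS = {"writer": "author", "name": "title"}


def unify(term):
    """Lowercase a tag or text term and replace it by its canonical form."""
    term = term.strip().lower()
    return THESAURUS.get(term, term)


def extract_paths(xml_text):
    """Return the set of root-to-leaf paths, with tags and leaf text unified."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        current = prefix + [unify(node.tag)]
        children = list(node)
        if not children:
            # Leaf element: append its text content (if any) as the last step.
            text = (node.text or "").strip()
            if text:
                current = current + [unify(text)]
            paths.add("/".join(current))
        for child in children:
            walk(child, current)

    walk(root, [])
    return paths


def jaccard(a, b):
    """Jaccard similarity of two path sets (an assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


doc1 = "<book><author>Smith</author><title>XML</title></book>"
doc2 = "<book><writer>Smith</writer><name>XML</name></book>"
doc3 = "<movie><director>Lee</director></movie>"

print(sorted(extract_paths(doc1)))
print(jaccard(extract_paths(doc1), extract_paths(doc2)))
print(jaccard(extract_paths(doc1), extract_paths(doc3)))
```

In this sketch `doc1` and `doc2` use different tags but yield the same path set once the thesaurus has unified them, while `doc3` shares no path with either. Pairwise similarities of this kind could then be passed to any standard clustering algorithm to group the documents.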