This paper presents a new approach to clustering XML documents. We use a flexible document representation that takes both structure and content into account: each XML document is represented by the set of its paths. We exploit the semantic similarity between the terms (tags and text) that make up these paths by unifying them with a thesaurus built in advance. Clustering then groups documents according to the similarity of their paths. Experiments were conducted on a large set of documents made available as part of INEX 2007 (INitiative for the Evaluation of XML Retrieval).
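The path-based representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the THESAURUS dictionary, the helper names (unify, document_paths, jaccard), and the choice of Jaccard overlap as the path similarity are all assumptions made for illustration, since the abstract does not specify the thesaurus, the similarity measure, or the clustering algorithm.

```python
import xml.etree.ElementTree as ET

# Hypothetical thesaurus (an assumption): maps variant terms to one canonical term.
THESAURUS = {"writer": "author", "name": "title"}


def unify(term):
    """Lower-case a tag or word and replace it with its canonical thesaurus term."""
    term = term.strip().lower()
    return THESAURUS.get(term, term)


def document_paths(xml_text):
    """Return the set of root-to-leaf paths of a document.

    Each path is a '/'-joined string of unified tags, extended by one
    unified word of the leaf's text when the leaf has text. Text that
    appears in elements which also have child elements is ignored in
    this sketch.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        current = prefix + [unify(node.tag)]
        children = list(node)
        if children:
            for child in children:
                walk(child, current)
            return
        words = [unify(w) for w in (node.text or "").split()]
        if words:
            for word in words:
                paths.add("/".join(current + [word]))
        else:
            paths.add("/".join(current))

    walk(root, [])
    return paths


def jaccard(a, b):
    """Jaccard overlap of two path sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


doc_a = "<book><author>Smith</author><title>XML Mining</title></book>"
doc_b = "<book><writer>Smith</writer><name>XML Clustering</name></book>"
similarity = jaccard(document_paths(doc_a), document_paths(doc_b))
```

In this sketch the two documents use different tags (author/writer, title/name), but the thesaurus maps them to the same canonical terms, so their paths overlap. A pairwise similarity of this kind could then be passed to any standard clustering algorithm.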