XML documents clustering by structures

Authors:
Richi Nayak; Sumei Xu
Affiliations:
School of Information Systems, Queensland University of Technology, Brisbane, Australia; School of Information Systems, Queensland University of Technology, Brisbane, Australia
Venue:
INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval
Year:
2005

Abstract

XCLS is a novel clustering algorithm that groups heterogeneous XML documents by measuring their level similarity under a global criterion function. XCLS does not require pairwise similarity to be computed between individual documents; instead, it measures similarity at the cluster level, using the structural information of the XML documents. The quality of the clustering solution depends on how the level similarity is calculated and on whether it correctly represents the documents’ structural similarity. In this paper, we present the performance of XCLS for clustering the structural descriptions (ordered labeled trees) of XML documents. We report results for five sub-tasks, corresponding to the five corpora provided by the INEX 2005 document mining track.
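
The idea of comparing each document against a cluster-level summary, rather than against every other document, can be illustrated with a minimal Python sketch. This is not the XCLS algorithm: the abstract does not give the level similarity formula, so the weighted overlap below is an assumption made purely for illustration, as are the names `level_structure`, `level_similarity`, `merge_levels`, `cluster_documents`, and the `base` and `threshold` parameters. For simplicity the sketch also ignores sibling order and treats each level as a set of tag names.

```python
import xml.etree.ElementTree as ET


def level_structure(xml_text):
    """Return one set of element tag names per depth of the document tree."""
    root = ET.fromstring(xml_text)
    levels = []
    frontier = [root]
    while frontier:
        levels.append({node.tag for node in frontier})
        # Move one level down: collect the children of every node at this depth.
        frontier = [child for node in frontier for child in node]
    return levels


def _level(levels, i):
    """Tag set at depth i, or an empty set when the tree is shallower."""
    return levels[i] if i < len(levels) else set()


def level_similarity(doc_levels, cluster_levels, base=2.0):
    """Illustrative similarity (assumed, not the published formula).

    Shared tags at each level are weighted so that levels closer to the
    root count more, then normalised by the weighted union size.
    """
    depth = max(len(doc_levels), len(cluster_levels))
    shared = 0.0
    total = 0.0
    for i in range(depth):
        d, c = _level(doc_levels, i), _level(cluster_levels, i)
        weight = base ** (depth - i - 1)
        shared += weight * len(d & c)
        total += weight * len(d | c)
    return shared / total if total else 0.0


def merge_levels(cluster_levels, doc_levels):
    """Fold a document into the cluster summary by level-wise union."""
    depth = max(len(cluster_levels), len(doc_levels))
    return [_level(cluster_levels, i) | _level(doc_levels, i) for i in range(depth)]


def cluster_documents(xml_texts, threshold=0.5):
    """Single pass: assign each document to its most similar cluster summary,
    or open a new cluster when no summary is similar enough."""
    clusters = []  # each cluster: {"levels": [...], "members": [document indices]}
    for idx, text in enumerate(xml_texts):
        levels = level_structure(text)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = level_similarity(levels, cluster["levels"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["levels"] = merge_levels(best["levels"], levels)
            best["members"].append(idx)
        else:
            clusters.append({"levels": levels, "members": [idx]})
    return clusters
```

In this sketch each incoming document is compared only with the existing cluster summaries, so the number of comparisons grows with the number of clusters rather than with the number of document pairs. The result depends on the order in which documents arrive and on the chosen threshold, both of which are simplifications of this example.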