Fast and effective clustering of XML data using structural information

Authors:
Richi Nayak
Affiliations:
Queensland University of Technology, School of Information Systems, Brisbane, Australia
Venue:
Knowledge and Information Systems
Year:
2008

Citing 0
Cited 14

Multilevel Conditional Fuzzy C-Means Clustering of XML Documents
PKDD 2007 Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases

A schema matching-based approach to XML schema clustering
Proceedings of the 10th International Conference on Information Integration and Web-based Applications & Services

Knowledge Discovery over the Deep Web, Semantic Web and XML
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications

Improving XML schema matching performance using Prüfer sequences
Data & Knowledge Engineering

Word Sense Disambiguation for XML Structure Feature Generation
ESWC 2009 Heraklion Proceedings of the 6th European Semantic Web Conference on The Semantic Web: Research and Applications

Structural and semantic aspects of similarity of Document Type Definitions and XML schemas
Information Sciences: an International Journal

A new multiobjective clustering technique based on the concepts of stability and symmetry
Knowledge and Information Systems

Element similarity measures in XML schema matching
Information Sciences: an International Journal

BusSEngine: a business search engine
Knowledge and Information Systems

Double-layered schema integration of heterogeneous XML sources
Journal of Systems and Software

The hidden web, XML and the Semantic Web: scientific data management perspectives
Proceedings of the 14th International Conference on Extending Database Technology

XML data clustering: An overview
ACM Computing Surveys (CSUR)

Exploring dictionary-based semantic relatedness in labeled tree data
Information Sciences: an International Journal

Discovering interesting information with advances in web technology
ACM SIGKDD Explorations Newsletter

Abstract

This paper presents an incremental clustering algorithm, XML documents Clustering with Level Similarity (XCLS), that groups XML documents according to their structural similarity. A level structure format is introduced to represent the structure of XML documents for efficient processing. A global criterion function is developed that measures the similarity between a new document and the existing clusters. This avoids computing the pair-wise similarity between individual documents and hence saves a large amount of computation. XCLS is further modified to incorporate the semantic meanings of XML tags in order to investigate the trade-off between accuracy and efficiency. The empirical analysis shows that structural similarity outweighs semantic similarity when clustering structured data such as XML. The experiments also show that XCLS is fast and accurate in clustering heterogeneous documents by structure.
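
The idea of comparing each incoming document against cluster-level summaries, rather than against every other document, can be illustrated with a minimal Python sketch. It is not the XCLS criterion function, which the abstract does not specify: the per-level Jaccard overlap, the `decay` weight that favours shallow levels, the `threshold` value, and all function names are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def level_structure(xml_text):
    """Return a list whose i-th entry is the set of tag names found at depth i."""
    root = ET.fromstring(xml_text)
    levels = []
    frontier = [root]
    while frontier:
        levels.append({node.tag for node in frontier})
        frontier = [child for node in frontier for child in node]
    return levels


def level_similarity(doc_levels, cluster_levels, decay=0.5):
    """Illustrative score in [0, 1]: per-level Jaccard overlap of tag sets,
    weighted by decay**depth so shallow levels count more (an assumption,
    not the paper's formula)."""
    depth = max(len(doc_levels), len(cluster_levels))
    score = total = 0.0
    for i in range(depth):
        a = doc_levels[i] if i < len(doc_levels) else set()
        b = cluster_levels[i] if i < len(cluster_levels) else set()
        weight = decay ** i
        score += weight * len(a & b) / len(a | b)
        total += weight
    return score / total if total else 0.0


def cluster_incrementally(xml_docs, threshold=0.6):
    """Single pass: compare each document with cluster summaries only,
    never with other individual documents."""
    clusters = []  # each cluster: {"levels": [set, ...], "members": [doc indices]}
    for idx, text in enumerate(xml_docs):
        levels = level_structure(text)
        best, best_score = None, 0.0
        for cluster in clusters:
            s = level_similarity(levels, cluster["levels"])
            if s > best_score:
                best, best_score = cluster, s
        if best is not None and best_score >= threshold:
            best["members"].append(idx)
            # Fold the document's tags into the cluster summary, level by level.
            for i, tags in enumerate(levels):
                if i < len(best["levels"]):
                    best["levels"][i] |= tags
                else:
                    best["levels"].append(set(tags))
        else:
            clusters.append({"levels": [set(t) for t in levels], "members": [idx]})
    return [c["members"] for c in clusters]
```

With n documents and k clusters, this pass performs on the order of n·k document-to-cluster comparisons instead of the n² document-to-document comparisons a pair-wise method would need, which is the saving the abstract refers to. In this sketch the result depends on the order in which documents arrive.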