XML document clustering using structure-preserving flat representation of XML content and structure

Authors:
Fedja Hadzic; Michael Hecker; Andrea Tagarelli
Affiliations:
Digital Ecosystems and Business Intelligence Institute, Curtin University, Australia; Digital Ecosystems and Business Intelligence Institute, Curtin University, Australia; Dept. of Electronics, Computer and Systems Sciences, University of Calabria, Italy
Venue:
ADMA'11 Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part II
Year:
2011

Abstract

With the increasing use of XML in many domains, XML document clustering has become a central research topic in semistructured data management and mining. The semistructured nature of XML data makes the clustering problem particularly challenging, mainly because structural similarity measures designed for tree- or graph-shaped data can be quite expensive to compute. Specialized clustering techniques are being developed to address this difficulty; however, most of them still assume that XML documents are represented using a semistructured data model. In this paper we take a simpler approach: XML structural aspects are extracted from the documents to generate a flat data format to which well-established clustering methods can be directly applied. The expensive process of tree/graph data mining is thus avoided, while the structural properties are still preserved. Our experimental evaluation, which uses a number of real-world datasets and compares against existing structural clustering methods, demonstrates the significance of our approach.
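
To make the idea of a flat, structure-preserving representation concrete, the following is a minimal sketch and not the authors' actual encoding, which the abstract does not specify. It assumes, purely for illustration, that each distinct root-to-node tag path becomes one binary column, so every document turns into a fixed-length row. The helper names `label_paths`, `flatten`, and `jaccard` are hypothetical.

```python
import xml.etree.ElementTree as ET


def label_paths(xml_text):
    """Return the set of root-to-node tag paths found in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def flatten(docs):
    """Build a flat binary table: one row per document, one column per path.

    Assumption: path presence/absence is the structural feature. The paper's
    real flat format may differ.
    """
    per_doc = [label_paths(d) for d in docs]
    columns = sorted(set().union(*per_doc))
    rows = [[1 if c in p else 0 for c in columns] for p in per_doc]
    return columns, rows


def jaccard(a, b):
    """Jaccard similarity between two binary rows of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<article><title/><journal/></article>",
]
columns, rows = flatten(docs)
for row in rows:
    print(row)
print(jaccard(rows[0], rows[1]), jaccard(rows[0], rows[2]))
```

Once the documents are rows in such a table, any standard vector-based clustering method can be run on them without computing tree edit distances. In this toy input the two `book` documents share most of their columns, while the `article` document shares none with them.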