With the increasing use of XML in many domains, XML document clustering has become a central research topic in semistructured data management and mining. The semistructured nature of XML data makes the clustering problem particularly challenging, mainly because structural similarity measures designed for tree- or graph-shaped data can be quite expensive to compute. Specialized clustering techniques are being developed to address this difficulty; however, most of them still assume that XML documents are represented using a semistructured data model. In this paper we take a simpler approach: structural aspects are extracted from the XML documents to generate a flat data format to which well-established clustering methods can be applied directly. The expensive process of tree/graph data mining is thus avoided, while the structural properties are still preserved. Our experimental evaluation, which uses a number of real-world datasets and compares against existing structural clustering methods, demonstrates the significance of our approach.
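To make the idea of a flat structural representation concrete, here is a minimal sketch. It is not the paper's method: the abstract does not say which structural aspects are extracted, so the sketch assumes root-to-node tag paths as features, and the helper names (`tag_paths`, `flatten`, `jaccard`) and the sample documents are hypothetical. Each document becomes one row of a binary table, which any standard clustering method could then consume.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths in one XML document.

    Assumption: tag paths stand in for the 'structural aspects' the
    abstract mentions; the paper may extract something different.
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def flatten(docs):
    """Build a flat binary table: one row per document, one column per path."""
    per_doc = [tag_paths(d) for d in docs]
    columns = sorted(set().union(*per_doc))
    rows = [[1 if c in p else 0 for c in columns] for p in per_doc]
    return columns, rows


def jaccard(a, b):
    """Jaccard similarity between two binary rows of the flat table."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


# Hypothetical sample documents, for illustration only.
docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<log><entry><time/></entry></log>",
]
columns, rows = flatten(docs)
print(columns)
print(jaccard(rows[0], rows[1]))  # the two book documents share most paths
print(jaccard(rows[0], rows[2]))  # book vs. log share no paths
```

Once the documents are rows in a table like this, no tree or graph comparison is needed at clustering time; the rows can be handed to an off-the-shelf algorithm such as k-means or hierarchical clustering.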