Algorithms for clustering data
Algorithms for clustering data
Min-wise independent permutations (extended abstract)
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Similarity Search in High Dimensions via Hashing
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Streaming-Data Algorithms for High-Quality Clustering
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
An Efficient and Scalable Algorithm for Clustering XML Documents by Structure
IEEE Transactions on Knowledge and Data Engineering
The VLDB Journal — The International Journal on Very Large Data Bases
A framework for clustering evolving data streams
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Introduction to Information Retrieval
Introduction to Information Retrieval
A Framework for Clustering Massive-Domain Data Streams
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Sketch-Based Summarization of Ordered XML Streams
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Adaptive XML Tree Classification on Evolving Data Streams
ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I
A methodology for clustering XML documents by structure
Information Systems
Cardinality estimation and dynamic length adaptation for Bloom filters
Distributed and Parallel Databases
Transforming XML trees for efficient classification and clustering
INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval
Eliminating the redundancy in blocking-based entity resolution methods
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
Hi-index | 0.00 |
XML clustering finds many applications, ranging from storage to query processing. However, existing clustering algorithms focus on static XML collections, whereas modern information systems frequently deal with streaming XML data that needs to be processed online. Streaming XML clustering is a challenging task because of the high computational and space efficiency requirements implicated for online approaches. In this paper we propose XStreamCluster, which addresses the two challenges using a two-layered optimization. The bottom layer employs Bloom filters to encode the XML documents, providing a spaceefficient solution to memory usage. The top layer is based on Locality Sensitive Hashing and contributes to the computational efficiency. The theoretical analysis shows that the approximate solution of XStream-Cluster generates similarly good clusters as the exact solution, with high probability. The experimental results demonstrate that XStreamCluster improves both memory efficiency and computational time by at least an order of magnitude without affecting clustering quality, compared to its variants and a baseline approach.