Algorithms for clustering data
Algorithms for clustering data
Fast and effective text mining using linear-time document clustering
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Concept decompositions for large sparse text data using clustering
Machine Learning
Modern Information Retrieval
A Data-Clustering Algorithm on Distributed Memory Multiprocessors
Revised Papers from Large-Scale Parallel Data Mining, Workshop on Large-Scale Parallel KDD Systems, SIGKDD
An Efficient and Scalable Algorithm for Clustering XML Documents by Structure
IEEE Transactions on Knowledge and Data Engineering
A normal form for XML documents
ACM Transactions on Database Systems (TODS)
A tree-based approach to clustering XML documents by structure
PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Peer-to-peer management of XML data: issues and research challenges
ACM SIGMOD Record
Xproj: a framework for projected structural clustering of xml documents
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Structure and value synopses for XML data graphs
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
XEdge: clustering homogeneous and heterogeneous XML documents using edge summaries
Proceedings of the 2008 ACM symposium on Applied computing
Clustering XML Documents Using Closed Frequent Subtrees: A Structural Similarity Approach
Focused Access to XML Documents
Peer-to-peer collaboration over XML documents
CDVE '08 Proceedings of the 5th international conference on Cooperative Design, Visualization, and Engineering
XML processing in DHT networks
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Utilizing XML Clustering for Efficient XML Data Management on P2P Networks
DEXA '09 Proceedings of the 20th International Conference on Database and Expert Systems Applications
Semantic clustering of XML documents
ACM Transactions on Information Systems (TOIS)
Locating XML Documents in a Peer-to-Peer Network Using Distributed Hash Tables
IEEE Transactions on Knowledge and Data Engineering
Communications of the ACM
XCLS: a fast and effective clustering algorithm for heterogenous XML documents
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
A flexible structured-based representation for XML document mining
INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval
Transforming XML trees for efficient classification and clustering
INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval
A framework for distributed XML data management
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Combining structure and content similarities for XML document clustering
AusDM '08 Proceedings of the 7th Australasian Data Mining Conference - Volume 87
Hi-index | 0.00 |
Clustering XML documents is extensively used to organize large collections of XML documents in groups that are coherent according to structure and/or content features. The growing availability of distributed XML sources and the variety of high-demand environments raise the need for clustering approaches that can exploit distributed processing techniques. Nevertheless, existing methods for clustering XML documents are designed to work in a centralized way. In this paper, we address the problem of clustering XML documents in a collaborative distributed framework. XML documents are first decomposed based on semantically cohesive subtrees, then modeled as transactional data that embed both XML structure and content information. The proposed clustering framework employs a centroid-based partitional clustering method that has been developed for a peer-to-peer network. Each peer in the network is allowed to compute a local clustering solution over its own data, and to exchange its cluster representatives with other peers. The exchanged representatives are used to compute representatives for the global clustering solution in a collaborative way. We evaluated effectiveness and efficiency of our approach on real XML document collections varying the number of peers. Results have shown that major advantages with respect to the corresponding centralized clustering setting are obtained in terms of runtime behavior, although clustering solutions can still be accurate with a moderately low number of nodes in the network. Moreover, the collaborativeness characteristic of our approach has revealed to be a convenient feature in distributed clustering as found in a comparative evaluation with a distributed non-collaborative clustering method.