Collaborative clustering of XML documents

Authors:
Sergio Greco;Francesco Gullo;Giovanni Ponti;Andrea Tagarelli
Affiliations:
Dept. of Electronics, Computer and Systems Sciences (DEIS), University of Calabria, Via P. Bucci, 41C, 87036 Arcavacata di Rende (CS), Italy (all authors)
Venue:
Journal of Computer and System Sciences
Year:
2011

Abstract

Clustering is widely used to organize large collections of XML documents into groups that are coherent with respect to structure and/or content features. The growing availability of distributed XML sources and the variety of high-demand environments raise the need for clustering approaches that can exploit distributed processing techniques. Nevertheless, existing methods for clustering XML documents are designed to work in a centralized way. In this paper, we address the problem of clustering XML documents in a collaborative distributed framework. XML documents are first decomposed into semantically cohesive subtrees, then modeled as transactional data that embed both XML structure and content information. The proposed clustering framework employs a centroid-based partitional clustering method developed for a peer-to-peer network. Each peer in the network computes a local clustering solution over its own data and exchanges its cluster representatives with other peers. The exchanged representatives are then used to compute, collaboratively, the representatives of the global clustering solution. We evaluated the effectiveness and efficiency of our approach on real XML document collections, varying the number of peers. The results show major runtime advantages over the corresponding centralized clustering setting, while clustering solutions remain accurate when the number of nodes in the network is moderately low. Moreover, a comparative evaluation against a distributed non-collaborative clustering method shows that collaborativeness is a beneficial feature in distributed clustering.
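The local/global exchange described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes each XML document has already been reduced to a transaction (a set of items), uses Jaccard similarity as the proximity measure, and builds a representative from the items with strict weighted majority support. All function names (`jaccard`, `representative`, `local_step`, `collaborative_clustering`) and the fixed number of rounds are hypothetical choices made for illustration only.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two item sets (an assumed proximity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def representative(weighted_sets):
    """Items whose weighted support exceeds half of the total weight (assumed rule)."""
    total = sum(weight for _, weight in weighted_sets)
    support = Counter()
    for items, weight in weighted_sets:
        for item in items:
            support[item] += weight
    return frozenset(i for i, s in support.items() if 2 * s > total)


def local_step(transactions, global_reps):
    """One peer: assign its own transactions to the closest global representative,
    then summarize each local cluster as (local representative, cluster size)."""
    clusters = [[] for _ in global_reps]
    for t in transactions:
        best = max(range(len(global_reps)), key=lambda j: jaccard(t, global_reps[j]))
        clusters[best].append(t)
    return [(representative([(t, 1) for t in c]), len(c)) for c in clusters]


def collaborative_clustering(peers, init_reps, rounds=5):
    """Alternate local clustering and exchange of representatives.
    Peers share only (representative, size) pairs, never their raw data."""
    global_reps = list(init_reps)
    for _ in range(rounds):
        summaries = [local_step(data, global_reps) for data in peers]
        new_reps = []
        for j, old in enumerate(global_reps):
            # Combine the non-empty local clusters for cluster j, weighted by size.
            weighted = [s[j] for s in summaries if s[j][1] > 0]
            new_reps.append(representative(weighted) if weighted else old)
        if new_reps == global_reps:  # representatives are stable: stop early
            break
        global_reps = new_reps
    return global_reps
```

Calling `collaborative_clustering(peers, init_reps)` with one list of item sets per peer returns one global representative per cluster. In this sketch the combination step is simulated in a single process; in a peer-to-peer setting the `(representative, size)` pairs would be the messages exchanged among peers.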