For the past few decades, mainstream data clustering technologies have been fundamentally based on centralized operation: data sets were of small, manageable size and usually resided at one site belonging to one organization. Today, data sets are enormous and are usually spread across distributed sites, the primary example being the Web. This has created a need for performing clustering in distributed environments. Distributed clustering addresses two problems: the infeasibility of collecting data at a central site, due to technical or privacy limitations, and the intractability of traditional clustering algorithms on huge data sets. In this paper we propose a distributed collaborative clustering approach for clustering Web documents in distributed environments. We adopt a peer-to-peer model in which nodes in a network first form independent opinions of local document clusterings and then collaborate with peers to enhance those local clusterings. Information exchanged between peers is minimized through the use of cluster summaries in the form of keyphrases extracted from the clusters. This summarized view of peer data enables nodes to selectively request the merging of remote data to enhance local clusters. Both the initial clustering and the merging of peer data with local clusters use a method called similarity histogram-based clustering, which is based on keeping a tight similarity distribution within clusters. This approach achieves significant improvement in local clustering solutions without the cost of centralized clustering, while maintaining the initial local clustering structure. Results show that larger networks exhibit larger improvements, up to 15% in clustering quality, albeit with lower absolute clustering quality than smaller networks.
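The abstract names similarity histogram-based clustering but does not spell out its mechanics, so the following is only a minimal sketch of the general idea of "keeping a tight similarity distribution within clusters". Everything concrete in it is an assumption made for illustration: documents are sparse term-weight dictionaries compared by cosine similarity, tightness is measured by a hypothetical `histogram_ratio` (the fraction of pairwise similarities at or above a threshold), each document joins at most one cluster, and the parameter names and default values (`threshold`, `hr_min`, `eps`) are invented here rather than taken from the paper.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def histogram_ratio(docs, threshold):
    """Fraction of pairwise similarities in a cluster at or above threshold."""
    sims = [cosine(a, b) for a, b in combinations(docs, 2)]
    if not sims:
        return 1.0  # assumption: a singleton cluster counts as perfectly tight
    return sum(s >= threshold for s in sims) / len(sims)


def shc(docs, threshold=0.3, hr_min=0.5, eps=0.1):
    """Incrementally assign documents to clusters (illustrative sketch only).

    A document joins the first cluster whose histogram ratio does not drop,
    or drops by less than eps while staying at or above hr_min. Otherwise it
    starts a new cluster. All parameter values are hypothetical.
    """
    clusters = []
    for doc in docs:
        placed = False
        for cluster in clusters:
            old = histogram_ratio(cluster, threshold)
            new = histogram_ratio(cluster + [doc], threshold)
            if new >= old or (new >= hr_min and old - new < eps):
                cluster.append(doc)
                placed = True
                break
        if not placed:
            clusters.append([doc])
    return clusters
```

Under the same assumptions, merging peer data could reuse the acceptance test in `shc`: a remote document requested on the basis of a keyphrase summary would be admitted into a local cluster only if it keeps that cluster's similarity distribution tight, which is one way the initial local structure could be preserved.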