We propose a new method, called SimClus, for clustering with a lower bound on similarity. Instead of taking the number of clusters k as input, this similarity-based approach imposes a lower bound on the similarity between each object and its cluster representative (with one representative per cluster). SimClus achieves an O(log n) approximation bound on the number of clusters, whereas the bound for the best previous algorithm can be as poor as O(n). Experiments on real and synthetic datasets show that our algorithm produces more than 40% fewer representative objects while offering the same or better clustering quality. We also propose a dynamic variant of the algorithm that can be used effectively in an on-line setting.
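
To make the problem setting concrete, the sketch below picks representatives with a generic greedy set-cover heuristic: it repeatedly chooses the object that covers the most still-unassigned objects at or above the similarity bound. This is an illustration under stated assumptions, not the SimClus algorithm itself, whose details the abstract does not give. The function name `greedy_representatives`, the toy similarity function, and the threshold value are all hypothetical. Greedy set cover is a standard technique known to give a logarithmic approximation for covering problems of this kind.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_representatives(
    objects: Sequence[T],
    similarity: Callable[[T, T], float],
    threshold: float,
) -> Dict[int, List[int]]:
    """Map each chosen representative index to the indices it is assigned.

    Hypothetical helper: a generic greedy cover, not the paper's method.
    """
    n = len(objects)
    # covers[i] holds every index j whose similarity to object i meets the bound.
    covers = [
        {j for j in range(n) if similarity(objects[i], objects[j]) >= threshold}
        for i in range(n)
    ]
    for i in range(n):
        covers[i].add(i)  # an object can always represent itself

    uncovered = set(range(n))
    clusters: Dict[int, List[int]] = {}
    while uncovered:
        # Pick the object covering the most still-uncovered objects
        # (ties go to the lowest index).
        best = max(range(n), key=lambda i: len(covers[i] & uncovered))
        newly_covered = covers[best] & uncovered
        clusters[best] = sorted(newly_covered)
        uncovered -= newly_covered
    return clusters


# Toy data: points on a line, with similarity decaying with distance (assumed).
points = [0.0, 1.0, 2.0, 10.0, 11.0]
clusters = greedy_representatives(
    points,
    similarity=lambda a, b: 1.0 / (1.0 + abs(a - b)),
    threshold=0.5,  # equivalent to |a - b| <= 1 for this similarity
)
print(clusters)
```

Note that the number of clusters is an output here rather than an input: tightening the threshold yields more representatives, and loosening it yields fewer.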