We consider the problem of partitioning, accurately and efficiently, a set of n documents lying in a metric space into k non-overlapping clusters. We augment the well-known furthest-point-first algorithm for k-center clustering in metric spaces with a filtering scheme based on the triangle inequality. We apply this algorithm to Web snippet clustering, comparing it against strong baselines consisting of recent, fast variants of the classical iterative k-means algorithm. Our main conclusion is that our method attains solutions of better or comparable accuracy in a fraction of the time required by the baselines. Our algorithm is thus valuable when, as in Web snippet clustering, either the real-time nature of the task or the large amount of data makes the poorly scalable traditional clustering methods unsuitable.
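To make the idea concrete, here is a minimal Python sketch of furthest-point-first with one common form of triangle-inequality filtering. It is an illustration under stated assumptions, not necessarily the exact scheme of the paper: the function name `fpf_filtered`, the choice of the first point as the initial center, and the Euclidean default metric are all hypothetical. The filter relies on the following fact: if a point p is currently assigned to center c, and the new center c' satisfies d(c, c') >= 2 d(p, c), then d(p, c') >= d(c, c') - d(p, c) >= d(p, c), so p cannot move to c' and the distance d(p, c') need not be computed.

```python
import math


def fpf_filtered(points, k, dist=math.dist):
    """Furthest-point-first k-center clustering with a triangle-inequality filter.

    Returns (centers, assigned, computed), where centers holds indices into
    points, assigned[i] is the position in centers of the center nearest to
    points[i], and computed counts distance evaluations actually performed.
    Assumption: the first point is used as the initial center.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(points)")

    centers = [0]
    assigned = [0] * n
    # Distance from every point to its current nearest center.
    d_near = [dist(p, points[0]) for p in points]
    computed = n

    for _ in range(1, k):
        # The next center is the point furthest from its nearest center.
        new = max(range(n), key=d_near.__getitem__)
        # Distances from the new center to each existing center.
        center_d = [dist(points[new], points[c]) for c in centers]
        computed += len(centers)
        centers.append(new)
        new_pos = len(centers) - 1

        for i in range(n):
            # Filter: d(c, c') >= 2 d(p, c) implies d(p, c') >= d(p, c),
            # so the point keeps its current center.
            if center_d[assigned[i]] >= 2 * d_near[i]:
                continue
            d = dist(points[i], points[new])
            computed += 1
            if d < d_near[i]:
                d_near[i] = d
                assigned[i] = new_pos

    return centers, assigned, computed


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20)]
    centers, assigned, computed = fpf_filtered(pts, 3)
    print(centers, assigned, computed)
```

Each round costs one distance per existing center plus one distance per point that survives the filter, so the saving grows as clusters become well separated. Any metric can be passed as `dist`, provided it satisfies the triangle inequality; otherwise the filter may skip points incorrectly.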