A divide-and-merge methodology for clustering

Authors:
David Cheng;Santosh Vempala;Ravi Kannan;Grant Wang
Affiliations:
Massachusetts Institute of Technology, Cambridge, MA;Massachusetts Institute of Technology, Cambridge, MA;Yale University, New Haven, CT;Massachusetts Institute of Technology, Cambridge, MA
Venue:
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2005

Abstract

We present a divide-and-merge methodology for clustering a set of objects that combines a top-down "divide" phase with a bottom-up "merge" phase. In contrast, previous algorithms use either top-down or bottom-up methods to construct a hierarchical clustering, or produce a flat clustering using local search (e.g., k-means). Our divide phase produces a tree whose leaves are the elements of the set. For this phase, we use an efficient spectral algorithm. The merge phase quickly finds an optimal tree-respecting partition for many natural objective functions, e.g., k-means, min-diameter, min-sum, and correlation clustering. We present a meta-search engine that uses this methodology to cluster results from web searches. We also give empirical results on text-based data where the algorithm performs better than or competitively with existing clustering algorithms.
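
To make the merge phase concrete, the following is a minimal sketch of one way to find an optimal tree-respecting partition for the k-means objective, using dynamic programming over the tree. It is an illustration under stated assumptions, not the authors' implementation: the tree is assumed to be binary and given as nested tuples (a leaf is an integer index into `points`), and the names `leaves`, `kmeans_cost`, and `best_tree_partition` are hypothetical. A partition is "tree-respecting" here in the sense that every cluster is the full leaf set of some subtree.

```python
def leaves(node):
    """Return the leaf indices under a node (a leaf is an int, an internal node a pair)."""
    if isinstance(node, int):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def kmeans_cost(points, members):
    """Sum of squared distances from the member points to their centroid."""
    dim = len(points[0])
    n = len(members)
    centroid = [sum(points[i][d] for i in members) / n for d in range(dim)]
    return sum((points[i][d] - centroid[d]) ** 2
               for i in members for d in range(dim))


def best_tree_partition(tree, points, k):
    """Best partition into exactly k clusters, each the leaf set of a subtree.

    Returns (cost, clusters). Raises KeyError if the tree has fewer than k leaves.
    """
    def solve(node):
        # table[j] = (cost, clusters) of the best j-cluster partition of this subtree.
        members = leaves(node)
        table = {1: (kmeans_cost(points, members), [members])}
        if isinstance(node, int):
            return table
        left, right = solve(node[0]), solve(node[1])
        # Combine a clusters from the left child with b clusters from the right child.
        for a, (cost_a, part_a) in left.items():
            for b, (cost_b, part_b) in right.items():
                j = a + b
                if j <= k and (j not in table or cost_a + cost_b < table[j][0]):
                    table[j] = (cost_a + cost_b, part_a + part_b)
        return table

    return solve(tree)[k]


# Hypothetical toy input: four 1-D points and a tree that pairs near neighbours.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
tree = ((0, 1), (2, 3))
cost, clusters = best_tree_partition(tree, points, 2)
print(cost, clusters)  # prints: 1.0 [[0, 1], [2, 3]]
```

The sketch recomputes leaf sets and cluster costs at every node for clarity rather than speed. Other additive objectives could be substituted for `kmeans_cost`, while objectives such as min-diameter would combine child costs with a maximum instead of a sum.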