We present a divide-and-merge methodology for clustering a set of objects that combines a top-down “divide” phase with a bottom-up “merge” phase. In contrast, previous algorithms either use top-down or bottom-up methods to construct a hierarchical clustering, or produce a flat clustering using local search (e.g., k-means). For the divide phase, which produces a tree whose leaves are the elements of the set, we suggest an efficient spectral algorithm. When the data is a sparse document-term matrix, we show how to modify the algorithm so that it maintains sparsity and runs in linear space. For many natural objective functions, including k-means, min-diameter, min-sum and correlation clustering, the merge phase quickly finds the optimal partition that respects the tree. We present a thorough experimental evaluation of the methodology, describe the implementation of a meta-search engine that uses it to cluster the results of web searches, and give comparative empirical results on several real datasets.
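A minimal sketch of a spectral divide phase follows. It assumes that each split sorts the objects by the second eigenvector of the normalized similarity matrix D^{-1/2} A A^T D^{-1/2} (D being the diagonal matrix of row sums of A A^T) and then takes the prefix cut of lowest conductance. This normalization, the conductance criterion, and the names `second_eigvec`, `sweep_cut` and `divide` are illustrative assumptions, not necessarily the paper's exact procedure. What the sketch shows is how sparsity can be kept: A A^T is never formed, only products with A^T and A, and the within-cut similarity of each prefix is updated from a running sum of rows. For brevity it uses a dense NumPy array and assumes no row of A is all zeros.

```python
import numpy as np


def second_eigvec(A, rho, iters=200, seed=0):
    """Power iteration for the 2nd eigenvector of M = D^{-1/2} A A^T D^{-1/2}.

    M = B B^T with B = D^{-1/2} A, so it is positive semidefinite and its top
    eigenvector (eigenvalue 1) is D^{1/2} 1. We deflate that vector and iterate.
    A A^T is never formed: each step multiplies by A^T, then by A.
    """
    d = 1.0 / np.sqrt(rho)                   # assumes rho > 0 (no zero rows)
    top = np.sqrt(rho) / np.linalg.norm(np.sqrt(rho))
    v = np.random.default_rng(seed).standard_normal(A.shape[0])
    v -= top * (top @ v)
    for _ in range(iters):
        v = d * (A @ (A.T @ (d * v)))        # v <- M v
        v -= top * (top @ v)                 # stay orthogonal to the top eigenvector
        norm = np.linalg.norm(v)
        if norm == 0.0:
            break
        v /= norm
    return d * v                             # rescaled vector used for sorting


def sweep_cut(A, rho, u):
    """Split rows of A at the prefix (in the order of u) with lowest conductance."""
    order = np.argsort(u)
    total = rho.sum()
    s = np.zeros(A.shape[1])                 # running sum of rows in the prefix T
    vol = internal = 0.0
    best, best_k = np.inf, 1
    for k, i in enumerate(order[:-1], start=1):
        a = A[i]
        # sum_{i,j in T} (A A^T)_ij = ||sum of rows in T||^2, updated incrementally
        internal += 2.0 * (a @ s) + a @ a
        s += a
        vol += rho[i]
        cut = vol - internal                 # similarity crossing the cut
        phi = cut / min(vol, total - vol)
        if phi < best:
            best, best_k = phi, k
    return order[:best_k], order[best_k:]


def divide(A, idx=None):
    """Recursively split the rows of A; returns a nested-tuple tree of row indices."""
    A = np.asarray(A, dtype=float)
    if idx is None:
        idx = np.arange(A.shape[0])
    if len(idx) == 1:
        return int(idx[0])
    sub = A[idx]
    rho = sub @ (sub.T @ np.ones(len(idx)))  # row sums of sub sub^T
    u = second_eigvec(sub, rho)
    left, right = sweep_cut(sub, rho, u)
    return (divide(A, idx[left]), divide(A, idx[right]))
```

On a term matrix whose first three rows use only the first two terms and whose last three rows use only the last two, the top split of `divide` separates rows {0, 1, 2} from rows {3, 4, 5}, because that cut has zero crossing similarity.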
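For objectives that add up over clusters, such as the k-means cost, finding the best partition that respects the tree reduces to a dynamic program over the tree: a node either becomes a single cluster, or its budget of k clusters is split between its two children. The sketch below assumes a binary tree given as nested 2-tuples with Python `int` leaves (row indices of the point matrix) and the k-means cost, i.e. the sum of squared distances to each cluster's centroid; `merge`, `kmeans_cost` and `leaves` are hypothetical names. An objective such as min-diameter would combine the two children's values with a maximum instead of a sum.

```python
from functools import lru_cache

import numpy as np


def kmeans_cost(X, idx):
    """Sum of squared distances from the points X[idx] to their centroid."""
    P = X[list(idx)]
    return float(((P - P.mean(axis=0)) ** 2).sum())


def leaves(t):
    """Leaf indices under node t (leaves must be Python ints, not NumPy ints)."""
    return (t,) if isinstance(t, int) else leaves(t[0]) + leaves(t[1])


def merge(X, tree, k):
    """Best k-clustering, under the k-means cost, whose clusters are subtrees.

    Returns (cost, clusters), each cluster a sorted list of row indices of X.
    """
    @lru_cache(maxsize=None)
    def opt(t, j):
        pts = leaves(t)
        if j == 1:
            return kmeans_cost(X, pts), (pts,)  # node t forms one cluster
        if isinstance(t, int) or j > len(pts):
            return float("inf"), ()             # infeasible: too few points
        best = (float("inf"), ())
        for j1 in range(1, j):                  # give j1 clusters to the left child
            c1, p1 = opt(t[0], j1)
            c2, p2 = opt(t[1], j - j1)
            if c1 + c2 < best[0]:
                best = (c1 + c2, p1 + p2)
        return best

    cost, parts = opt(tree, k)
    return cost, [sorted(p) for p in parts]
```

For example, with points (0, 0), (0, 1), (10, 10), (10, 11) and the tree `((0, 1), (2, 3))`, `merge(X, tree, 2)` returns a cost of 1.0 with clusters `[[0, 1], [2, 3]]`.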