Iterative Clustering of High Dimensional Text Data Augmented by Local Search

Authors:
Inderjit S. Dhillon; Yuqiang Guan; J. Kogan
Venue:
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Year:
2002

Citing: 0
Cited by: 22

A divisive information theoretic feature clustering algorithm for text classification. The Journal of Machine Learning Research.
Information Theoretic Clustering of Sparse Co-Occurrence Data. ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining.
Generative model-based clustering of directional data. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining.
Kernel k-means: spectral clustering and normalized cuts. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.
Hierarchical Clustering Algorithms for Document Datasets. Data Mining and Knowledge Discovery.
A fast kernel-based multilevel algorithm for graph clustering. Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining.
Practical solutions to the problem of diagonal dominance in kernel document clustering. ICML '06 Proceedings of the 23rd international conference on Machine learning.
Unsupervised minor prototype detection using an adaptive population partitioning algorithm. Pattern Recognition.
QCS: A system for querying, clustering and summarizing documents. Information Processing and Management: an International Journal.
Weighted Graph Cuts without Eigenvectors: A Multilevel Approach. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Novel Algorithm for Coexpression Detection in Time-Varying Microarray Data Sets. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
KPCA for semantic object extraction in images. Pattern Recognition.
Model-based document clustering with a collapsed Gibbs sampler. Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining.
Coclustering of Human Cancer Microarrays Using Minimum Sum-Squared Residue Coclustering. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
A spectral approach to clustering numerical vectors as nodes in a network. Pattern Recognition.
Document clustering using synthetic cluster prototypes. Data & Knowledge Engineering.
Lateen EM: unsupervised training with multiple objectives, applied to dependency grammar induction. EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Biclustering and feature selection techniques in bioinformatics. ICDEM'10 Proceedings of the Second international conference on Data Engineering and Management.
Scalable clustering of signed networks using balance normalized cut. Proceedings of the 21st ACM international conference on Information and knowledge management.
Unsupervised data processing for classifier-based speech translator. Computer Speech and Language.
A novel self-adaptive clustering algorithm for dynamic data. ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part III.
Enhanced cross-domain document clustering with a semantically enhanced text stemmer (SETS). International Journal of Knowledge-based and Intelligent Engineering Systems - Selected papers of KES2012-Part 2 of 2.

Abstract

The k-means algorithm with cosine similarity, also known as the spherical k-means algorithm, is a popular method for clustering document collections. However, spherical k-means can often yield qualitatively poor results, especially when cluster sizes are small, say 25-30 documents per cluster, where it tends to get stuck at a local maximum far away from the optimal solution. In this paper, we present a local search procedure, which we call "first-variation", that refines a given clustering by incrementally moving data points between clusters, thus achieving a higher objective function value. An enhancement of first variation allows a chain of such moves in a Kernighan-Lin fashion and leads to a better local maximum. Combining the enhanced first-variation with spherical k-means yields a powerful "ping-pong" strategy that often qualitatively improves k-means clustering and is computationally efficient. We present several experimental results to highlight the improvement achieved by our proposed algorithm in clustering high-dimensional and sparse text data.
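To make the three ideas in the abstract concrete, here is a minimal Python/NumPy sketch written under stated assumptions. It is not the authors' implementation. Documents are assumed to be rows of a dense, nonnegative array (for example tf-idf weights) that are scaled to unit length. The objective is taken to be the quality that spherical k-means maximizes: the sum over clusters of the norm of each cluster's sum vector s. Moving a point x from cluster i to cluster j changes this objective by (||s_i - x|| - ||s_i||) + (||s_j + x|| - ||s_j||). The chain step is modeled loosely on Kernighan-Lin, and it assumes each point moves at most once per chain. All of the following are hypothetical illustration choices, not parameters from the paper: the function names, `chain_len`, the iteration caps, and the tolerances. Sparse matrices and any efficiency tricks from the paper are not reproduced.

```python
import numpy as np


def normalize_rows(X):
    """Scale each row to unit L2 norm so documents lie on the unit sphere."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def cluster_sums(X, labels, k):
    """Row c is the sum of the points assigned to cluster c (zeros if empty)."""
    return np.array([X[labels == c].sum(axis=0) for c in range(k)])


def objective(X, labels, k):
    """Spherical k-means quality: sum over clusters of ||sum of members||."""
    return float(np.linalg.norm(cluster_sums(X, labels, k), axis=1).sum())


def spherical_kmeans(X, labels, k, max_iter=100):
    """Batch spherical k-means, started from an initial assignment."""
    labels = np.asarray(labels).copy()
    for _ in range(max_iter):
        concepts = normalize_rows(cluster_sums(X, labels, k))  # unit concept vectors
        new_labels = np.argmax(X @ concepts.T, axis=1)  # highest cosine wins
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def first_variation_chain(X, labels, k, chain_len=5):
    """Hypothetical Kernighan-Lin style chain of single-point moves.

    Each step applies the best move of a not-yet-moved point, even if its
    gain is negative; afterwards only the best-scoring prefix is kept.
    """
    labels = np.asarray(labels).copy()
    sums = cluster_sums(X, labels, k)
    norms = np.linalg.norm(sums, axis=1)
    moved = np.zeros(len(X), dtype=bool)
    history, gains = [], []
    for _ in range(chain_len):
        best = None  # (gain, point, source cluster, target cluster)
        for i in np.flatnonzero(~moved):
            src = labels[i]
            loss = np.linalg.norm(sums[src] - X[i]) - norms[src]
            for dst in range(k):
                if dst == src:
                    continue
                gain = loss + np.linalg.norm(sums[dst] + X[i]) - norms[dst]
                if best is None or gain > best[0]:
                    best = (gain, i, src, dst)
        if best is None:
            break
        gain, i, src, dst = best
        sums[src] -= X[i]
        sums[dst] += X[i]
        norms[src] = np.linalg.norm(sums[src])
        norms[dst] = np.linalg.norm(sums[dst])
        labels[i] = dst
        moved[i] = True
        history.append((i, src))
        gains.append(gain)
    if not gains:
        return labels
    cumulative = np.cumsum(gains)
    best_len = int(np.argmax(cumulative)) + 1
    if cumulative[best_len - 1] <= 1e-12:
        best_len = 0  # no prefix improves the objective: undo every move
    for i, src in history[best_len:]:
        labels[i] = src
    return labels


def ping_pong(X, k, init_labels, chain_len=5, max_rounds=20):
    """Alternate spherical k-means and the first-variation chain."""
    X = normalize_rows(np.asarray(X, dtype=float))
    labels = spherical_kmeans(X, init_labels, k)
    for _ in range(max_rounds):
        refined = first_variation_chain(X, labels, k, chain_len)
        if objective(X, refined, k) <= objective(X, labels, k) + 1e-12:
            break  # the chain found no improvement: stop
        labels = spherical_kmeans(X, refined, k)
    return labels
```

The chain accepts individual moves that lower the objective, then keeps only the prefix with the largest cumulative gain. This is what can lift the clustering out of a local maximum where every single move looks unprofitable to spherical k-means. The exhaustive search over all (point, cluster) pairs costs O(n·k) norm evaluations per move. It was chosen here for readability, not speed.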