A divisive information theoretic feature clustering algorithm for text classification. The Journal of Machine Learning Research.
Information Theoretic Clustering of Sparse Co-Occurrence Data. ICDM '03: Proceedings of the Third IEEE International Conference on Data Mining.
Generative model-based clustering of directional data. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining.
Kernel k-means: spectral clustering and normalized cuts. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.
Hierarchical Clustering Algorithms for Document Datasets. Data Mining and Knowledge Discovery.
A fast kernel-based multilevel algorithm for graph clustering. Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining.
Practical solutions to the problem of diagonal dominance in kernel document clustering. ICML '06: Proceedings of the 23rd international conference on Machine learning.
QCS: A system for querying, clustering and summarizing documents. Information Processing and Management: an International Journal.
Weighted Graph Cuts without Eigenvectors: A Multilevel Approach. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Novel Algorithm for Coexpression Detection in Time-Varying Microarray Data Sets. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
KPCA for semantic object extraction in images. Pattern Recognition.
Model-based document clustering with a collapsed Gibbs sampler. Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining.
Coclustering of Human Cancer Microarrays Using Minimum Sum-Squared Residue Coclustering. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
A spectral approach to clustering numerical vectors as nodes in a network. Pattern Recognition.
Document clustering using synthetic cluster prototypes. Data & Knowledge Engineering.
Lateen EM: unsupervised training with multiple objectives, applied to dependency grammar induction. EMNLP '11: Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Biclustering and feature selection techniques in bioinformatics. ICDEM'10: Proceedings of the Second international conference on Data Engineering and Management.
Scalable clustering of signed networks using balance normalized cut. Proceedings of the 21st ACM international conference on Information and knowledge management.
Unsupervised data processing for classifier-based speech translator. Computer Speech and Language.
A novel self-adaptive clustering algorithm for dynamic data. ICONIP'12: Proceedings of the 19th international conference on Neural Information Processing - Volume Part III.
Enhanced cross-domain document clustering with a semantically enhanced text stemmer SETS. International Journal of Knowledge-based and Intelligent Engineering Systems - Selected papers of KES2012-Part 2 of 2.
The k-means algorithm with cosine similarity, also known as the spherical k-means algorithm, is a popular method for clustering document collections. However, spherical k-means can often yield qualitatively poor results, especially when clusters are small (say, 25-30 documents per cluster). In that regime it tends to get stuck at a local maximum far from the optimal solution. In this paper, we present a local search procedure, which we call "first variation", that refines a given clustering by incrementally moving data points between clusters, thus achieving a higher objective function value. An enhancement of first variation allows a chain of such moves in a Kernighan-Lin fashion and leads to a better local maximum. Combining the enhanced first variation with spherical k-means yields a powerful "ping-pong" strategy that often qualitatively improves k-means clustering and is computationally efficient. We present several experimental results to highlight the improvement achieved by our proposed algorithm in clustering high-dimensional and sparse text data.
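The abstract combines two ingredients: batch spherical k-means and a single-point "first variation" move, applied alternately. The sketch below illustrates that interplay. It relies on a standard fact about spherical k-means: if s_j is the sum of the unit vectors in cluster j, the objective (total cosine similarity to the concept vectors s_j / ||s_j||) equals the sum of ||s_j||. The exact gain of moving one point can therefore be computed from the cluster sums. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses dense Python lists, scores every single move, and omits the Kernighan-Lin chain of moves. All function names (`kmeans_pass`, `first_variation`, `ping_pong`, etc.) and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(a * a for a in v))


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit L2 norm; a zero vector is returned unchanged."""
    n = norm(v)
    return [a / n for a in v] if n > 0 else list(v)


def cluster_sums(X: List[Vector], labels: List[int], k: int) -> List[Vector]:
    """s_j = sum of the (unit) vectors currently assigned to cluster j."""
    sums = [[0.0] * len(X[0]) for _ in range(k)]
    for x, j in zip(X, labels):
        sums[j] = [a + b for a, b in zip(sums[j], x)]
    return sums


def objective(X: List[Vector], labels: List[int], k: int) -> float:
    """Spherical k-means objective: sum_j ||s_j|| (total cosine similarity)."""
    return sum(norm(s) for s in cluster_sums(X, labels, k))


def kmeans_pass(X: List[Vector], labels: List[int], k: int) -> List[int]:
    """One batch spherical k-means step: send each point to the most similar concept vector."""
    concepts = [normalize(s) for s in cluster_sums(X, labels, k)]
    return [max(range(k), key=lambda j: dot(x, concepts[j])) for x in X]


def first_variation(X: List[Vector], labels: List[int], k: int,
                    tol: float = 1e-12) -> Tuple[List[int], float]:
    """Apply the single point move with the largest objective gain, if it exceeds tol.

    Moving x from cluster i to cluster j changes the objective by
        (||s_i - x|| - ||s_i||) + (||s_j + x|| - ||s_j||).
    """
    sums = cluster_sums(X, labels, k)
    best_gain, best_move = tol, None
    for idx, x in enumerate(X):
        i = labels[idx]
        removal_change = norm([a - b for a, b in zip(sums[i], x)]) - norm(sums[i])
        for j in range(k):
            if j == i:
                continue
            gain = removal_change + norm([a + b for a, b in zip(sums[j], x)]) - norm(sums[j])
            if gain > best_gain:
                best_gain, best_move = gain, (idx, j)
    if best_move is None:
        return list(labels), 0.0
    new_labels = list(labels)
    new_labels[best_move[0]] = best_move[1]
    return new_labels, best_gain


def ping_pong(X: List[Vector], labels: List[int], k: int,
              tol: float = 1e-9, max_rounds: int = 100) -> List[int]:
    """Alternate spherical k-means and first variation until neither improves."""
    X = [normalize(x) for x in X]
    labels = list(labels)
    for _ in range(max_rounds):
        q = objective(X, labels, k)
        while True:  # run spherical k-means until the objective stalls
            new_labels = kmeans_pass(X, labels, k)
            new_q = objective(X, new_labels, k)
            if new_q <= q + tol:
                break
            labels, q = new_labels, new_q
        labels, gain = first_variation(X, labels, k, tol)
        if gain <= 0.0:  # no improving single move is left
            break
    return labels


if __name__ == "__main__":
    def angle(deg: float) -> Vector:
        return [math.cos(math.radians(deg)), math.sin(math.radians(deg))]

    # Hypothetical toy data: three copies of one direction, one point at 60 deg, one at 107 deg.
    X = [angle(0)] * 3 + [angle(60), angle(107)]
    start = [0, 0, 0, 0, 1]
    print(kmeans_pass(X, start, 2))  # [0, 0, 0, 0, 1]: k-means alone does not move anything
    print(ping_pong(X, start, 2))    # [0, 0, 0, 1, 1]: first variation moves the 60 deg point
```

On the toy data, the 60-degree point has cosine about 0.693 with the large cluster's concept vector and about 0.682 with the singleton, so spherical k-means keeps it where it is. Moving it anyway raises the objective from about 4.61 to about 4.83, because adding a point to a small cluster increases ||s_j|| by more than the cosine alone suggests. This is a tiny-scale illustration of the small-cluster local maxima the abstract describes. The sketch evaluates all n·k candidate moves per step with dense vectors. Since the paper targets sparse, high-dimensional text, a practical version would presumably use sparse representations.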