This paper presents a detailed empirical study of 12 generative approaches to text clustering, obtained by applying four types of document-to-cluster assignment strategies (hard, stochastic, soft and deterministic annealing (DA) based assignments) to each of three base models, namely mixtures of multivariate Bernoulli, multinomial, and von Mises-Fisher (vMF) distributions. A large variety of text collections, both with and without feature selection, are used for the study, which yields several insights, including (a) showing situations wherein the vMF-centric approaches, which are based on directional statistics, fare better than multinomial model-based methods, and (b) quantifying the trade-off between increased performance of the soft and DA assignments and their increased computational demands. We also compare all the model-based algorithms with two state-of-the-art discriminative approaches to document clustering based, respectively, on graph partitioning (CLUTO) and a spectral coclustering method. Overall, DA and CLUTO perform the best but are also the most computationally expensive. The vMF models provide good performance at low cost while the spectral coclustering algorithm fares worse than vMF-based methods for a majority of the datasets.
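To make the four document-to-cluster assignment strategies concrete, the following is a minimal sketch and not the paper's implementation. It assumes that, for a single document, the log-likelihood under each cluster's model (Bernoulli, multinomial, or vMF) has already been computed and that cluster priors are uniform or already folded into those values. The function names `posteriors` and `assign`, the `temperature` parameter, and the example numbers are hypothetical and chosen for illustration only.

```python
import math
import random


def posteriors(log_liks, temperature=1.0):
    """Softmax over per-cluster log-likelihoods, scaled by 1/temperature.

    Assumption: priors are uniform or already included in log_liks.
    """
    scaled = [ll / temperature for ll in log_liks]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def assign(log_liks, strategy, temperature=1.0, rng=None):
    """Return one document's assignment weights over clusters.

    hard:       all weight on the most likely cluster
    stochastic: all weight on one cluster sampled from the posterior
    soft:       the posterior itself (fractional membership)
    da:         the posterior at the current annealing temperature;
                high temperature is near uniform, low temperature
                approaches the hard assignment
    """
    n = len(log_liks)
    if strategy == "hard":
        best = max(range(n), key=lambda k: log_liks[k])
        return [1.0 if k == best else 0.0 for k in range(n)]
    if strategy == "stochastic":
        rng = rng or random.Random()
        p = posteriors(log_liks)
        chosen = rng.choices(range(n), weights=p, k=1)[0]
        return [1.0 if k == chosen else 0.0 for k in range(n)]
    if strategy == "soft":
        return posteriors(log_liks)
    if strategy == "da":
        return posteriors(log_liks, temperature)
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical log-likelihoods of one document under three clusters.
example = [-10.0, -12.0, -11.0]
for name in ("hard", "stochastic", "soft"):
    print(name, assign(example, name, rng=random.Random(0)))
# A DA run would lower the temperature step by step between model updates.
for t in (100.0, 1.0, 0.01):
    print("da", t, assign(example, "da", temperature=t))
```

Pairing each of these four strategies with each of the three base models gives the 12 approaches the study compares. The sketch also hints at the cost trade-off noted above: soft and DA assignments keep fractional weights for every cluster, and DA additionally repeats the fit over a sequence of temperatures.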