Document clustering has become an increasingly important technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. This paper proposes a Dirichlet process mixture (DPM) model for clustering directional data based on the von Mises-Fisher (vMF) distribution, which arises naturally for data distributed on the unit hypersphere. We develop a mean-field variational inference algorithm for the DPM model of vMFs and apply it to clustering text documents. With this model, the number of clusters is determined automatically rather than specified in advance. We conducted extensive experiments to evaluate the proposed approach on a large number of high-dimensional text datasets. Experimental results on the NMI (Normalized Mutual Information) and Purity evaluation measures show that our approach outperforms four state-of-the-art clustering algorithms.
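As an illustration of the two evaluation measures named in the abstract, the following is a minimal sketch of how Purity and NMI can be computed from a set of reference class labels and predicted cluster labels. It is not the authors' evaluation code. The function names are hypothetical, and the NMI normalization used here (mutual information divided by the geometric mean of the two entropies) is an assumption, since the abstract does not say which variant the paper uses.

```python
import math
from collections import Counter


def purity(labels_true, labels_pred):
    """Fraction of documents that belong to the majority class of their cluster."""
    pairs = Counter(zip(labels_pred, labels_true))
    best = {}  # cluster id -> size of its largest class
    for (cluster, _cls), count in pairs.items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(labels_true)


def entropy(labels):
    """Shannon entropy (natural log) of a labelling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_true, labels_pred):
    """Mutual information normalized by sqrt(H(true) * H(pred)) (assumed variant)."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    mi = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    h_true, h_pred = entropy(labels_true), entropy(labels_pred)
    if h_true == 0 or h_pred == 0:
        # Degenerate case: at least one labelling has a single group.
        return 1.0 if h_true == h_pred else 0.0
    return mi / math.sqrt(h_true * h_pred)


# Toy example: six documents, two reference classes, two predicted clusters.
truth = ["a", "a", "a", "b", "b", "b"]
pred = [0, 0, 1, 1, 1, 1]
print(purity(truth, pred), nmi(truth, pred))
```

Both measures are invariant to how cluster ids are named, which matters for a model that chooses the number of clusters itself: the predicted labelling may contain more or fewer groups than the reference classes.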