One essential issue in document clustering is estimating the appropriate number of clusters into which a document collection should be partitioned. In this paper, we propose a novel approach, DPMFS, to address this issue. The proposed approach is designed 1) to group documents into a set of clusters, with the number of clusters determined automatically by the Dirichlet process mixture model; and 2) to identify the discriminative words and separate them from irrelevant noise words via a stochastic search variable selection technique. We evaluate the performance of the proposed approach on a synthetic dataset and several realistic document datasets. A comparison with state-of-the-art document clustering approaches indicates that our approach is robust and effective for document clustering.
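
To illustrate how a Dirichlet process prior can leave the number of clusters open, the sketch below draws cluster labels from a Chinese restaurant process, one standard representation of the Dirichlet process. It is not the DPMFS algorithm: it samples from the prior only, without any document likelihood or feature selection. The function name `crp_assignments` and the concentration parameter `alpha` are assumptions introduced here for illustration.

```python
import random


def crp_assignments(n_docs, alpha, seed=0):
    """Draw cluster labels for n_docs items from a Chinese restaurant process.

    Illustrative prior-only sketch: no document content is used.
    alpha (> 0) is a hypothetical concentration parameter; larger values
    make opening a new cluster more likely.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of items currently in cluster k
    for _ in range(n_docs):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_docs=200, alpha=1.0)
    print("number of clusters drawn:", len(counts))
```

The number of clusters is an outcome of the sampling rather than an input, which is the property the abstract relies on. A full mixture model would also weight each choice by how well the document fits the cluster.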