Selecting relevant features is hard in unsupervised text clustering because there are no class labels to guide the search. This paper proposes a new mixture-model method for unsupervised text clustering, the multinomial mixture model with feature selection (M3FS). M3FS introduces a component-dependent "feature saliency" into the mixture model: a feature is considered relevant to a given mixture component if its saliency value exceeds a predefined threshold. Feature selection is thereby treated as a parameter estimation problem, and the Expectation-Maximization (EM) algorithm is used to estimate the model. Experimental results on commonly used text datasets show that M3FS achieves good clustering performance and feature selection capability.
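The two ingredients named in the abstract, EM estimation of a multinomial mixture and thresholding of per-component saliency values, can be illustrated with a minimal sketch. This is not the M3FS algorithm itself: the abstract does not give the saliency update equations, so the EM loop below fits a plain multinomial mixture without saliency parameters, and the saliency matrix passed to `select_features` is a hypothetical input. The function names, the smoothing constant `alpha`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def fit_multinomial_mixture(X, n_components, n_iter=50, alpha=1e-2, seed=0):
    """EM for a plain multinomial mixture (no saliency parameters).

    X is a (documents x vocabulary) matrix of term counts.
    Returns mixing weights, per-component word distributions,
    and the posterior responsibilities of each document.
    """
    rng = np.random.default_rng(seed)
    n_docs, _ = X.shape
    # Start from random soft assignments of documents to components.
    resp = rng.dirichlet(np.ones(n_components), size=n_docs)
    for _ in range(n_iter):
        # M-step: re-estimate mixing weights and word distributions.
        weights = resp.sum(axis=0) + 1e-12
        weights /= weights.sum()
        word_counts = resp.T @ X + alpha  # additive smoothing (assumed)
        theta = word_counts / word_counts.sum(axis=1, keepdims=True)
        # E-step: posterior over components, computed in log space.
        log_post = np.log(weights) + X @ np.log(theta).T
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return weights, theta, resp


def select_features(saliency, threshold):
    """Per component, indices of features whose saliency exceeds threshold.

    `saliency` is a (components x vocabulary) matrix; in M3FS it would be
    estimated by EM, here it is simply supplied by the caller.
    """
    return [np.flatnonzero(row > threshold) for row in saliency]


# Toy corpus (hypothetical): documents 0-2 use words 0-1, documents 3-5 use words 2-3.
X_toy = np.array([
    [10, 10, 0, 0],
    [12, 8, 0, 0],
    [9, 11, 0, 0],
    [0, 0, 10, 10],
    [0, 0, 7, 13],
    [0, 0, 11, 9],
], dtype=float)

weights, theta, resp = fit_multinomial_mixture(X_toy, n_components=2)
labels = resp.argmax(axis=1)

# Hypothetical saliency values, only to demonstrate the threshold rule.
saliency_toy = np.array([[0.9, 0.1, 0.6],
                         [0.2, 0.8, 0.5]])
relevant = select_features(saliency_toy, threshold=0.5)
```

Because the threshold is applied per component, a word can be relevant to one cluster and irrelevant to another, which is the sense of "component-dependent" used in the abstract.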