Multinomial mixture model with feature selection for text clustering

Authors:
Minqiang Li;Liang Zhang
Affiliations:
School of Management, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, China;School of Management, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, China
Venue:
Knowledge-Based Systems
Year:
2008

Abstract

Selecting relevant features is a hard problem in unsupervised text clustering because there are no class labels to guide the search. This paper proposes a new mixture model method for unsupervised text clustering, named the multinomial mixture model with feature selection (M3FS). M3FS introduces the concept of component-dependent "feature saliency" into the mixture model: a feature is considered relevant to a given mixture component if its saliency value for that component exceeds a predefined threshold. Feature selection is thus treated as a parameter estimation problem, and the Expectation-Maximization (EM) algorithm is used to estimate the model. Experimental results on commonly used text datasets show that M3FS achieves good clustering performance and feature selection capability.
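
For orientation, the following is a minimal sketch of the two ingredients the abstract names: EM estimation of a multinomial mixture over document-term counts, and the thresholding rule that marks a feature as relevant to a component. It is not the M3FS estimator itself. The abstract does not specify how saliency enters the likelihood or how it is updated, so the EM loop below fits a plain multinomial mixture without saliency parameters, and the saliency matrix passed to `relevant_features` is assumed to come from elsewhere. The function names, the smoothing constant `alpha`, and the random initialization are all assumptions made for illustration.

```python
import numpy as np


def fit_multinomial_mixture(X, n_components, n_iter=50, alpha=1.0, seed=0):
    """EM for a plain multinomial mixture (no saliency parameters).

    X is an (n_docs, n_words) array of word counts. Returns the mixing
    weights (K,), word probabilities (K, n_words), and the posterior
    responsibilities (n_docs, K).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n_docs = X.shape[0]
    # Assumed initialization: random soft assignments, each row sums to 1.
    resp = rng.dirichlet(np.ones(n_components), size=n_docs)
    for _ in range(n_iter):
        # M-step: mixing weights and additively smoothed word probabilities.
        weights = resp.sum(axis=0) / n_docs
        counts = resp.T @ X + alpha
        theta = counts / counts.sum(axis=1, keepdims=True)
        # E-step: responsibilities computed in log space for stability.
        log_post = np.log(weights + 1e-12) + X @ np.log(theta).T
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return weights, theta, resp


def relevant_features(saliency, threshold):
    """Boolean (K, n_words) mask: True where saliency exceeds the threshold."""
    return np.asarray(saliency) > threshold


# Toy corpus: documents 0-2 use words 0-1, documents 3-5 use words 2-3.
X_toy = np.array([
    [5, 4, 0, 0],
    [6, 3, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 3, 6],
    [0, 0, 4, 5],
])
weights, theta, resp = fit_multinomial_mixture(X_toy, n_components=2)
labels = resp.argmax(axis=1)

# Hypothetical saliency values, used only to show the thresholding rule.
mask = relevant_features([[0.9, 0.1], [0.4, 0.6]], threshold=0.5)
```

Because the mask is computed per component, the same word can be relevant to one cluster and irrelevant to another, which is what "component-dependent" refers to in the abstract.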