Non-parametric mixture models for clustering

Authors:
Pavan Kumar Mallapragada;Rong Jin;Anil Jain
Affiliations:
Department of Computer Science and Engineering, Michigan State University, East Lansing, MI;Department of Computer Science and Engineering, Michigan State University, East Lansing, MI;Department of Computer Science and Engineering, Michigan State University, East Lansing, MI
Venue:
SSPR&SPR'10 Proceedings of the 2010 joint IAPR international conference on Structural, syntactic, and statistical pattern recognition
Year:
2010

Abstract

Mixture models have been widely used for data clustering. However, commonly used mixture models are generally of a parametric form (e.g., a mixture of Gaussian distributions, or GMM), which significantly limits their capacity to fit the diverse multidimensional data distributions encountered in practice. We propose a non-parametric mixture model (NMM) for data clustering in order to detect clusters generated from arbitrary unknown distributions, using non-parametric kernel density estimates. The proposed model is non-parametric since the generative distribution of each data point depends only on the rest of the data points and the chosen kernel. A leave-one-out likelihood maximization is performed to estimate the parameters of the model. The NMM approach, when applied to clustering high-dimensional text datasets, significantly outperforms state-of-the-art and classical approaches such as K-means, Gaussian mixture models, spectral clustering, and linkage methods.
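
The leave-one-out idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the NMM algorithm itself: it covers only a single kernel density estimate, with no mixture components or cluster assignments. The Gaussian kernel, the shared `bandwidth` parameter, and all function names below are assumptions made for illustration and do not come from the paper.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Isotropic Gaussian kernel between two equal-length points (assumed kernel)."""
    dim = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    norm = (2.0 * math.pi * bandwidth ** 2) ** (dim / 2.0)
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2)) / norm


def loo_density(points, i, bandwidth):
    """Density at points[i], estimated from every point except points[i]."""
    others = [p for j, p in enumerate(points) if j != i]
    total = sum(gaussian_kernel(points[i], p, bandwidth) for p in others)
    return total / len(others)


def loo_log_likelihood(points, bandwidth):
    """Sum of log leave-one-out densities over the data set.

    Raises ValueError if a density underflows to zero (e.g. a tiny
    bandwidth with widely separated points).
    """
    return sum(
        math.log(loo_density(points, i, bandwidth))
        for i in range(len(points))
    )


# Hypothetical 1-D data with two well-separated groups.
data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
candidates = [0.5, 50.0]
best = max(candidates, key=lambda h: loo_log_likelihood(data, h))
```

Excluding each point from its own density estimate prevents the likelihood from being inflated by a kernel centered on the point being scored. This is one common motivation for leave-one-out objectives in kernel-based models.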