For discrete co-occurrence data such as documents and words, computing optimal projections and clustering the data are two distinct but related tasks. The goal of projection is to find a low-dimensional latent space for words, while clustering aims at grouping documents based on their feature representations. In general, projection and clustering are studied independently, yet both capture the intrinsic structure of the data and should reinforce each other. In this paper we introduce a probabilistic clustering-projection (PCP) model for discrete data, in which both tasks are represented in a unified framework. Clustering is performed in the projected space, and the projection explicitly takes the clustering structure into account. Iterating the two operations turns out to be exactly the variational EM algorithm under Bayesian model inference, and is therefore guaranteed to improve the data likelihood. The model is evaluated on two text data sets, with very encouraging results on both.
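The alternation the abstract describes can be illustrated with a toy sketch: a projection step that updates a low-dimensional word representation, followed by a clustering step on the projected document coordinates, with the cluster assignments fed back into the projection. This is only a schematic stand-in for the paper's variational EM updates; the KL-divergence NMF multiplicative updates and the k-means-style clustering used here are assumptions chosen for simplicity, not the model's actual inference equations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy document-word count matrix: two blocks of documents that use
# disjoint halves of the vocabulary.
X = np.zeros((8, 12))
X[:4, :6] = rng.poisson(3.0, (4, 6))
X[4:, 6:] = rng.poisson(3.0, (4, 6))
X += 0.1  # keep entries strictly positive for the multiplicative updates

n_docs, n_words = X.shape
n_topics, n_clusters = 2, 2

W = rng.random((n_docs, n_topics)) + 0.1   # document coordinates in latent space
H = rng.random((n_topics, n_words)) + 0.1  # word projection (topic-word weights)
centers = W[rng.choice(n_docs, n_clusters, replace=False)].copy()

for _ in range(50):
    # Projection step: KL-divergence NMF multiplicative updates,
    # standing in for the paper's variational updates of the projection.
    W *= ((X / (W @ H)) @ H.T) / H.sum(axis=1)
    H *= (W.T @ (X / (W @ H))) / W.sum(axis=0)[:, None]

    # Clustering step: k-means-style assignment in the projected space.
    dists = ((W[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    for k in range(n_clusters):
        if np.any(labels == k):
            centers[k] = W[labels == k].mean(axis=0)

    # Feed the clustering back into the projection: shrink each document's
    # coordinates slightly toward its cluster centre, so the projection
    # "explicitly considers clustering structure".
    W = 0.9 * W + 0.1 * centers[labels]
```

In the full PCP model both steps arise from a single Bayesian objective, so each iteration provably improves the data likelihood; in this sketch they are merely heuristically coupled.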