GaP: a factor model for discrete data

Authors:
John Canny
Affiliations:
University of California, Berkeley, CA
Venue:
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2004

Abstract

We present a probabilistic model for a document corpus that combines many of the desirable features of previous models. The model is called "GaP" for Gamma-Poisson, the distributions of its first and last random variables. GaP is a factor model; that is, it gives an approximate factorization of the document-term matrix into a product of matrices Λ and X. These factors have strictly non-negative terms. GaP is a generative probabilistic model that assigns finite probabilities to documents in a corpus, and it can be computed with a simple and efficient EM recurrence. For a suitable choice of parameters, the GaP factorization maximizes independence between the factors, so it can be used as an independent-component algorithm adapted to document data. The form of the GaP model is empirically as well as analytically motivated. It gives very accurate results as a probabilistic model (measured via perplexity) and as a retrieval model. The GaP model projects documents and terms into a low-dimensional space of "themes," and models texts as "passages" of terms on the same theme.
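
The generative reading of the abstract can be illustrated with a minimal sketch. It assumes, as the name Gamma-Poisson suggests, that the theme weights X are Gamma-distributed and the observed term counts are Poisson-distributed around the product ΛX. The function name `sample_gap_corpus`, the shape and scale values, the terms-by-documents orientation, and the Dirichlet draw used to fill Λ are all hypothetical choices made for illustration; none of them comes from the paper, and the sketch does not implement the EM recurrence.

```python
import numpy as np

def sample_gap_corpus(n_terms, n_docs, n_themes, shape=1.2, scale=40.0, seed=0):
    """Draw a toy corpus from a Gamma-Poisson factor model (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Lambda: non-negative (n_terms x n_themes) matrix. Each column is a
    # distribution over terms for one theme. The Dirichlet draw is an
    # assumption made only to obtain some non-negative matrix.
    Lambda = rng.dirichlet(np.ones(n_terms), size=n_themes).T

    # X: non-negative (n_themes x n_docs) matrix of theme weights,
    # drawn from a Gamma distribution (the "first" random variable).
    X = rng.gamma(shape, scale, size=(n_themes, n_docs))

    # Expected term counts are given by the product of the two factors.
    Y = Lambda @ X

    # Observed counts are Poisson around Y (the "last" random variable).
    F = rng.poisson(Y)
    return Lambda, X, F

Lambda, X, F = sample_gap_corpus(n_terms=50, n_docs=8, n_themes=3)
print(F.shape)        # term-by-document count matrix
print(F.sum(axis=0))  # sampled document lengths
```

Because both factors are non-negative, every entry of ΛX is a valid Poisson rate, which is what lets the model assign a finite probability to any matrix of counts.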