We present a probabilistic model for a document corpus that combines many of the desirable features of previous models. The model is called "GaP" for Gamma-Poisson, the distributions of the first and last random variables. GaP is a factor model; that is, it gives an approximate factorization of the document-term matrix into a product of matrices Λ and X. These factors have strictly non-negative terms. GaP is a generative probabilistic model that assigns finite probabilities to documents in a corpus. It can be computed with an efficient and simple EM recurrence. For a suitable choice of parameters, the GaP factorization maximizes independence between the factors, so it can be used as an independent-component algorithm adapted to document data. The form of the GaP model is empirically as well as analytically motivated. It gives very accurate results as a probabilistic model (measured via perplexity) and as a retrieval model. The GaP model projects documents and terms into a low-dimensional space of "themes," and models texts as "passages" of terms on the same theme.
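
To make the generative reading concrete, here is a minimal sketch of how one document's term counts might be drawn under a Gamma-Poisson factor model. The abstract names only the two distributions and the factors Λ and X, so the details are assumptions made for illustration: per-theme weights x are drawn from Gamma distributions, each term count is Poisson with mean given by the matching entry of Λx, and the names `sample_document`, `theme_term`, `shape`, and `scale` are hypothetical. It uses only the Python standard library and does not implement the EM recurrence.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method.

    Adequate for the small means used in this sketch; it is slow and
    numerically fragile for large means.
    """
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_document(rng, theme_term, shape, scale):
    """Sample theme weights and term counts for one document.

    theme_term[k][j] is the non-negative weight of term j in theme k
    (an assumed row-per-theme layout of the factor Λ).
    shape[k], scale[k] are assumed Gamma parameters for theme k.
    """
    # Assumed first step: one non-negative weight per theme.
    x = [rng.gammavariate(a, b) for a, b in zip(shape, scale)]
    n_terms = len(theme_term[0])
    # Expected count of each term is the matching entry of Λx.
    means = [
        sum(x[k] * theme_term[k][j] for k in range(len(x)))
        for j in range(n_terms)
    ]
    # Assumed last step: observed counts are Poisson around those means.
    counts = [sample_poisson(rng, m) for m in means]
    return x, counts


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical themes over four terms; the last term has zero weight.
    theme_term = [[0.5, 0.5, 0.0, 0.0],
                  [0.0, 0.2, 0.8, 0.0]]
    x, counts = sample_document(rng, theme_term, shape=[2.0, 2.0], scale=[3.0, 3.0])
    print("theme weights:", x)
    print("term counts:", counts)
```

Because every quantity in the sketch is non-negative, both the sampled theme weights and the expected counts stay non-negative, which mirrors the non-negativity of the factors described in the abstract.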