Generalized component analysis for text with heterogeneous attributes

Authors:
Xuerui Wang; Chris Pal; Andrew McCallum
Affiliations:
University of Massachusetts; University of Massachusetts; University of Massachusetts
Venue:
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2007

Abstract

We present a class of richly structured, undirected hidden variable models suitable for simultaneously modeling text along with other attributes encoded in different modalities. Our model generalizes techniques such as principal component analysis to heterogeneous data types. In contrast to other approaches, this framework allows modalities such as words, authors, and timestamps to be captured in their natural, probabilistic encodings. With our method, a latent space representation for a previously unseen document can be obtained through a fast matrix multiplication. We demonstrate the effectiveness of our framework on two tasks: author prediction using 13 years of the NIPS conference proceedings, and recipient prediction using a 10-month academic email archive of a researcher. Our approach should be more broadly applicable to real-world applications where one wishes to efficiently make predictions for a large number of potential outputs using dimensionality reduction in a well-defined probabilistic framework.
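
The matrix-multiplication claim in the abstract can be illustrated with a minimal sketch. It assumes that the expected latent vector for a document is a sum of linear projections, one per modality, which is how inference typically works in harmonium-style undirected models with Gaussian latent units. The abstract does not give the paper's equations, so this is not the authors' exact model. The weight matrices, the feature vectors, and the function names `matvec` and `latent_representation` are hypothetical.

```python
# Hypothetical sketch: project an unseen document into a latent space
# with one matrix-vector product per modality (an assumed linear form,
# not the paper's exact model).

def matvec(matrix, vector):
    """Multiply a matrix, given as a list of rows, by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def latent_representation(weights, features):
    """Sum the linear projections of each modality's feature vector.

    weights:  dict mapping modality name -> (latent_dim x feature_dim) matrix
    features: dict mapping modality name -> feature vector for one document
    """
    latent_dim = len(next(iter(weights.values())))
    latent = [0.0] * latent_dim
    for modality, matrix in weights.items():
        projected = matvec(matrix, features[modality])
        latent = [a + b for a, b in zip(latent, projected)]
    return latent

# Toy example: 2 latent dimensions, word counts over 3 terms, 1 timestamp value.
weights = {
    "words": [[0.5, 0.0, 1.0],
              [0.0, 2.0, 0.0]],
    "time":  [[1.0],
              [-1.0]],
}
features = {"words": [2, 1, 0], "time": [0.5]}
print(latent_representation(weights, features))
```

No iterative inference is needed for the new document under this assumption: the cost is a single pass over the weight matrices, which is what makes prediction over many candidate outputs cheap.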