Topic models provide a powerful tool for analyzing large text collections by representing high-dimensional data in a low-dimensional subspace. Fitting a topic model given a set of training documents requires approximate inference techniques that are computationally expensive. With today's large-scale, constantly expanding document collections, it is useful to be able to infer topic distributions for new documents without retraining the model. In this paper, we empirically evaluate the performance of several methods for topic inference in previously unseen documents, including methods based on Gibbs sampling, variational inference, and a new method inspired by text classification. The classification-based inference method produces results similar to those of iterative inference methods, but requires only a single matrix multiplication. In addition to these inference methods, we present SparseLDA, an algorithm and data structure for evaluating Gibbs sampling distributions. Empirical results indicate that SparseLDA can be approximately 20 times faster than traditional LDA and provide twice the speedup of previously published fast sampling methods, while also using substantially less memory.
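To make the classification-style inference concrete, the sketch below (Python/NumPy) is a minimal illustration, not the paper's implementation: it assumes the trained model is summarized by a topic-word probability matrix (named `topic_word` here; the name, the prior parameter, and the toy numbers are illustrative assumptions), and estimates a new document's topic distribution with a single matrix-vector product followed by normalization.

```python
import numpy as np

def infer_topics(doc_word_counts, topic_word, doc_topic_prior=0.0):
    """Estimate a topic distribution for an unseen document with a single
    matrix multiplication, in the spirit of the classification-based
    inference described above.

    doc_word_counts : (V,) array of word counts for the new document
    topic_word      : (K, V) matrix of per-topic word probabilities
                      taken from a trained model (illustrative name)
    doc_topic_prior : optional smoothing added to every topic score
    """
    # One matrix-vector product yields an unnormalized score per topic.
    scores = topic_word @ doc_word_counts + doc_topic_prior
    # Normalize the scores to obtain a distribution over topics.
    return scores / scores.sum()

# Toy example: 3 topics over a 5-word vocabulary (numbers are assumptions).
topic_word = np.array([
    [0.4, 0.3, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.3, 0.1],
    [0.1, 0.1, 0.1, 0.2, 0.5],
])
doc = np.array([3, 2, 0, 0, 1])       # word counts of a new document
print(infer_topics(doc, topic_word))  # -> approximately [0.54, 0.17, 0.29]
```

The appeal of this style of inference is that, once the matrix is precomputed from the trained model, scoring a new document costs one multiplication rather than repeated sampling or variational iterations.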