In this paper we introduce FastLDA, a novel collapsed Gibbs sampling method for the widely used latent Dirichlet allocation (LDA) model. Our new method yields significant speedups on real-world text corpora. Conventional Gibbs sampling schemes for LDA require O(K) operations per sample, where K is the number of topics in the model. Our proposed method draws equivalent samples but requires, on average, significantly fewer than K operations per sample. On real-world corpora, FastLDA can be as much as 8 times faster than the standard collapsed Gibbs sampler for LDA. No approximations are necessary, and we show that our fast sampling scheme produces exactly the same results as the standard (but slower) sampling scheme. Experiments on four real-world data sets demonstrate speedups for a wide range of collection sizes. For the PubMed collection of over 8 million documents, for which LDA requires 6 CPU months of computation, our speedup of 5.7 can save 5 CPU months of computation.
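To make the O(K) cost concrete, here is a minimal sketch of the standard collapsed Gibbs sampler that the abstract uses as its baseline. It is not the fast method itself, whose details the abstract does not give. The function names (`init_state`, `gibbs_sweep`), the count-array layout, and the symmetric hyperparameters `alpha` and `beta` are illustrative assumptions. The list comprehension that builds `weights` touches every one of the K topics for each token, which is the per-sample cost a faster scheme would have to avoid.

```python
import random


def init_state(docs, K, V, rng):
    """Randomly assign a topic to every token and build the count arrays.

    docs: list of documents, each a list of word ids in range(V).
    Names and layout are assumptions made for this sketch.
    """
    z = [[rng.randrange(K) for _ in doc] for doc in docs]
    n_dk = [[0] * K for _ in docs]       # document-topic counts
    n_wk = [[0] * K for _ in range(V)]   # word-topic counts
    n_k = [0] * K                        # total tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d][k] += 1
            n_wk[w][k] += 1
            n_k[k] += 1
    return z, n_dk, n_wk, n_k


def gibbs_sweep(docs, z, n_dk, n_wk, n_k, alpha, beta, V, rng):
    """One pass of the standard collapsed Gibbs sampler over all tokens."""
    K = len(n_k)
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k_old = z[d][i]
            # Remove the token's current assignment from the counts.
            n_dk[d][k_old] -= 1
            n_wk[w][k_old] -= 1
            n_k[k_old] -= 1
            # O(K) step: unnormalized conditional probability of each topic.
            weights = [
                (n_dk[d][k] + alpha) * (n_wk[w][k] + beta) / (n_k[k] + V * beta)
                for k in range(K)
            ]
            # Draw the new topic in proportion to the weights.
            k_new = rng.choices(range(K), weights=weights)[0]
            # Record the new assignment.
            z[d][i] = k_new
            n_dk[d][k_new] += 1
            n_wk[w][k_new] += 1
            n_k[k_new] += 1


if __name__ == "__main__":
    rng = random.Random(0)
    toy_docs = [[0, 1, 2, 1], [2, 3, 3, 0], [1, 1, 4]]  # made-up word ids
    state = init_state(toy_docs, K=3, V=5, rng=rng)
    for _ in range(10):
        gibbs_sweep(toy_docs, *state, alpha=0.1, beta=0.01, V=5, rng=rng)
    print(state[3])  # tokens per topic after 10 sweeps
```

Because each token costs one pass over all K topics, the work per sweep grows linearly with the number of topics. A sampler that draws from the same distribution while inspecting fewer topics on average would reduce exactly this inner loop.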