The Dirichlet compound multinomial (DCM) distribution, also called the multivariate Polya distribution, is a model for text documents that takes into account burstiness: the fact that if a word occurs once in a document, it is likely to occur repeatedly. We derive a new family of distributions that are approximations to DCM distributions and constitute an exponential family, unlike DCM distributions. We use these so-called EDCM distributions to obtain insights into the properties of DCM distributions, and then derive an algorithm for EDCM maximum-likelihood training that is many times faster than the corresponding method for DCM distributions. Next, we investigate expectation-maximization with EDCM components and deterministic annealing as a new clustering algorithm for documents. Experiments show that the new algorithm is competitive with the best methods in the literature, and superior from the point of view of finding models with low perplexity.
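The abstract gives no formulas, so the following is only a minimal sketch of the two quantities it contrasts, under stated assumptions. It assumes the standard DCM probability of a count vector x with parameters alpha, and it assumes the approximation is the small-parameter limit Gamma(x + a) / Gamma(a) ≈ Gamma(x) * a for x >= 1, which may differ in detail from the paper's exact EDCM definition. The function names `dcm_log_prob` and `edcm_log_score` are hypothetical, and the approximate score is left unnormalized.

```python
from math import lgamma, log


def dcm_log_prob(counts, alpha):
    """Log-probability of a word-count vector under the standard DCM density."""
    n = sum(counts)   # document length
    s = sum(alpha)    # sum of the Dirichlet parameters
    # Multinomial coefficient: n! / prod_w x_w!
    lp = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Gamma(s) / Gamma(s + n)
    lp += lgamma(s) - lgamma(s + n)
    # prod_w Gamma(x_w + alpha_w) / Gamma(alpha_w)
    lp += sum(lgamma(x + a) - lgamma(a) for x, a in zip(counts, alpha))
    return lp


def edcm_log_score(counts, beta):
    """Assumed approximation: replace Gamma(x + b) / Gamma(b) by Gamma(x) * b.

    Combined with the multinomial coefficient, each word that occurs
    contributes a factor b / x, and absent words contribute nothing.
    This score is not normalized over all count vectors.
    """
    n = sum(counts)
    s = sum(beta)
    ls = lgamma(n + 1) + lgamma(s) - lgamma(s + n)
    ls += sum(log(b) - log(x) for x, b in zip(counts, beta) if x >= 1)
    return ls


if __name__ == "__main__":
    alpha = [0.1, 0.1]
    # Burstiness: repeating one word is more probable than using two words once.
    print(dcm_log_prob([2, 0], alpha) > dcm_log_prob([1, 1], alpha))
    # With small parameters the approximate score tracks the DCM closely.
    small = [0.01, 0.02, 0.005]
    print(dcm_log_prob([3, 0, 1], small), edcm_log_score([3, 0, 1], small))
```

Under this assumed form, the score depends on the parameters only through their sum and through which words are present. That would be consistent with the exponential-family structure and faster training described above, though the abstract itself does not spell this out.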