Multinomial distributions are often used to model text documents. However, they fail to capture the phenomenon that words in a document tend to appear in bursts: if a word appears once, it is more likely to appear again. In this paper, we propose the Dirichlet compound multinomial (DCM) model as an alternative to the multinomial. The DCM model has one additional degree of freedom, which allows it to capture burstiness. We show experimentally that the DCM is substantially better than the multinomial at modeling text data, as measured by perplexity. We also show, using three standard document collections, that the DCM leads to better classification than the multinomial model. DCM performance is comparable to that obtained with multiple heuristic changes to the multinomial model.
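To make the burstiness claim concrete, here is a minimal sketch that compares the two models on a toy two-word vocabulary. It assumes the standard Dirichlet compound multinomial (Pólya) probability mass function, p(x | α) = n!/∏x_w! · Γ(A)/Γ(n+A) · ∏ Γ(x_w+α_w)/Γ(α_w) with A = Σα_w and n = Σx_w. The function names, the toy α values, and the matched multinomial θ = α/A are illustrative choices, not taken from the paper. The extra degree of freedom is the scale A: θ fixes the expected word proportions, while a small A makes repeated occurrences of the same word more probable.

```python
from math import exp, lgamma, log


def log_multinomial_coef(counts):
    """log(n! / prod(x_w!)) for a vector of word counts."""
    n = sum(counts)
    return lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)


def multinomial_logpmf(counts, theta):
    """Log-probability of a count vector under a multinomial with word probabilities theta."""
    # Words with zero count contribute nothing, so skip them (avoids log(0) when theta_w == 0).
    return log_multinomial_coef(counts) + sum(
        x * log(t) for x, t in zip(counts, theta) if x > 0
    )


def dcm_logpmf(counts, alpha):
    """Log-probability of a count vector under the Dirichlet compound multinomial (Polya) with parameters alpha."""
    n = sum(counts)
    a_sum = sum(alpha)  # the scale A = sum(alpha): smaller values mean burstier documents
    return (
        log_multinomial_coef(counts)
        + lgamma(a_sum) - lgamma(n + a_sum)
        + sum(lgamma(x + a) - lgamma(a) for x, a in zip(counts, alpha))
    )


# Hypothetical toy setting: two words, small alpha (strong burstiness).
alpha = [0.1, 0.1]
theta = [a / sum(alpha) for a in alpha]  # multinomial with the same mean proportions (0.5, 0.5)

bursty = [4, 0]  # one word repeated four times
spread = [2, 2]  # occurrences spread evenly

print(exp(dcm_logpmf(bursty, alpha)), exp(multinomial_logpmf(bursty, theta)))  # ≈ 0.424 vs 0.0625
print(exp(dcm_logpmf(spread, alpha)), exp(multinomial_logpmf(spread, theta)))  # ≈ 0.043 vs 0.375
```

With the same expected word proportions, the DCM gives the bursty document far more probability than the multinomial does, and gives the evenly spread document correspondingly less. As A grows large, the DCM approaches the multinomial with θ = α/A. This is why the single extra scale parameter is enough to express burstiness.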