ICML '06 Proceedings of the 23rd international conference on Machine learning
This paper addresses the fundamental problem of document classification, focusing on problems in which the classes are mutually exclusive. We advocate an approximate sampling distribution for word counts in documents and demonstrate that the resulting model can outperform both the simple multinomial and more recently proposed extensions on the classification task. We also compare the classifiers to a linear SVM and show that, provided certain conditions are met, the new model achieves performance that exceeds that of the SVM and ranks among the very best published results on the Newsgroups classification task.