Data Mining and Knowledge Discovery
Many important application areas of text classifiers demand high precision, and it is common to compare prospective solutions to the performance of Naive Bayes. This baseline is usually easy to improve upon, but in this work we demonstrate that appropriate document representation can make outperforming this classifier much more challenging. Most importantly, we provide a link between Naive Bayes and the logarithmic opinion pooling of the mixture-of-experts framework, which dictates a particular type of document length normalization. Motivated by document-specific feature selection, we propose monotonic constraints on document term weighting, which are shown to be an effective method of fine-tuning document representation. The discussion is supported by experiments using three large email corpora corresponding to the problem of spam detection, where high precision is of particular importance.
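The link between Naive Bayes and logarithmic opinion pooling can be illustrated with a minimal sketch. It is not the formulation used in the paper; it only assumes that each token occurrence acts as one "expert" contributing a log-odds opinion, and that the pool weights are equal and sum to one, which amounts to dividing the usual Naive Bayes score by document length. The function names `train_log_odds` and `pooled_score`, the Laplace smoothing constant, and the equal class priors are all illustrative assumptions.

```python
import math
from collections import Counter


def train_log_odds(docs, labels, alpha=1.0):
    """Per-term log-odds log P(t|spam) - log P(t|ham), Laplace-smoothed.

    Assumption: labels are 1 (spam) and 0 (ham); class priors are ignored.
    """
    counts = {0: Counter(), 1: Counter()}
    for tokens, y in zip(docs, labels):
        counts[y].update(tokens)
    vocab = set(counts[0]) | set(counts[1])
    totals = {y: sum(counts[y].values()) + alpha * len(vocab) for y in (0, 1)}
    return {
        t: math.log((counts[1][t] + alpha) / totals[1])
        - math.log((counts[0][t] + alpha) / totals[0])
        for t in vocab
    }


def pooled_score(tokens, log_odds):
    """Logarithmic opinion pool with equal weights 1/n over n known tokens.

    Equivalent to the Naive Bayes log-odds sum divided by document length.
    """
    known = [t for t in tokens if t in log_odds]
    if not known:
        return 0.0  # no evidence either way
    return sum(log_odds[t] for t in known) / len(known)


# Toy data, purely illustrative.
docs = [["cheap", "pills"], ["meeting", "notes"]]
labels = [1, 0]
weights = train_log_odds(docs, labels)

short_doc = ["cheap", "pills"]
long_doc = short_doc * 3  # same content, three times as long
print(pooled_score(short_doc, weights), pooled_score(long_doc, weights))
```

Under this reading, repeating a document's content leaves the pooled score unchanged, whereas the unnormalized Naive Bayes sum would grow with length and push the score toward the extremes.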