Naive Bayes and logistic regression perform well in different regimes. The former is a very simple generative model that is efficient to train and performs well empirically in many applications; the latter is a discriminative model that often achieves better accuracy and can be shown to outperform naive Bayes asymptotically. In this paper, we propose a novel hybrid model, partitioned logistic regression, which has several advantages over both naive Bayes and logistic regression. The model separates the original feature space into several disjoint feature groups, learns an individual logistic regression model on each group, and combines their predictions using the naive Bayes principle to produce a robust final estimate. We show that our model is better both theoretically and empirically. Moreover, when applied to a practical task, email spam filtering, it improves the normalized AUC score at a 10% false-positive rate by 28.8% and 23.6% over naive Bayes and logistic regression, respectively, using exactly the same training examples.
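The combination scheme the abstract describes can be sketched in a few lines: train one logistic regression per disjoint feature group, then sum the per-group log-odds and subtract the prior log-odds (k - 1) times, which is the naive Bayes combination rule under conditional independence of the groups. The helper names (`train_lr`, `partitioned_lr_log_odds`) and the gradient-descent trainer are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def train_lr(X, y, lr=0.1, epochs=500):
    """Plain logistic regression fit by batch gradient descent
    (an illustrative stand-in for any LR trainer)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(y=1|x)
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient of log loss w.r.t. w
        b -= lr * np.mean(p - y)                 # gradient w.r.t. bias
    return w, b

def partitioned_lr_log_odds(groups_Xy, prior_pos):
    """Train one LR per disjoint feature group and return a scorer that
    combines the per-group log-odds with the naive Bayes principle:
        log-odds(x) = sum_i log-odds_i(x_i) - (k - 1) * prior-log-odds.
    `prior_pos` is the prior probability of the positive class."""
    models = [train_lr(X, y) for X, y in groups_Xy]
    prior_lo = np.log(prior_pos / (1.0 - prior_pos))
    k = len(models)

    def score(group_feats):
        # Sum the log-odds of each per-group model on its own features.
        lo = sum(X @ w + b for (w, b), X in zip(models, group_feats))
        return lo - (k - 1) * prior_lo
    return score
```

A positive combined score corresponds to predicting the positive class; with two groups and a balanced prior, the correction term vanishes and the scorer simply adds the two models' log-odds.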