Spam emails increasingly plague both individuals and organizations. In response, most prior research treats spam filtering as a classical text categorization task, in which training examples must include both spam (positive examples) and legitimate emails (negative examples). However, in many spam filtering scenarios, obtaining legitimate emails for training purposes can be more difficult than collecting spam and unclassified emails. Hence, it is more appropriate to construct a classification model for spam filtering that uses only positive training examples (i.e., spam) and unlabeled instances, and that does not require legitimate emails as negative training examples. Several single-class learning techniques, such as PNB and PEBL, have been proposed in the literature. However, they have inherent limitations with regard to spam filtering. In this study, we propose and develop an ensemble approach, referred to as E2, to address these limitations. Specifically, we follow the two-stage framework of PEBL but extend each stage with an ensemble strategy. The empirical evaluation results from two spam filtering corpora suggest that our proposed E2 technique generally outperforms the benchmark techniques (i.e., PNB and PEBL) and exhibits more stable performance than its counterparts.
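To make the positive-and-unlabeled setting concrete, the following is a minimal sketch of a generic two-stage scheme: first extract likely negatives from the unlabeled emails, then retrain iteratively on positives versus those negatives. It is not the E2 technique, nor a faithful PEBL or PNB implementation, since the abstract does not specify their details. The Naive Bayes learner, the function names (`train_nb`, `is_spam`, `two_stage_pu`), the stopping rule, and the toy data are all assumptions made for illustration, and the ensemble extension of each stage is omitted.

```python
import math
from collections import Counter


def train_nb(pos_docs, neg_docs):
    """Train a multinomial Naive Bayes model with Laplace smoothing.

    Each document is a list of tokens; both lists must be non-empty.
    """
    vocab = {w for d in pos_docs + neg_docs for w in d}
    total = len(pos_docs) + len(neg_docs)
    model = {"vocab": vocab}
    for label, docs in (("pos", pos_docs), ("neg", neg_docs)):
        counts = Counter(w for d in docs for w in d)
        n = sum(counts.values())
        model[label] = {
            "prior": math.log(len(docs) / total),
            "loglik": {
                w: math.log((counts[w] + 1) / (n + len(vocab))) for w in vocab
            },
        }
    return model


def is_spam(model, doc):
    """Return True if the positive (spam) class scores strictly higher."""
    score = {}
    for label in ("pos", "neg"):
        m = model[label]
        # Tokens outside the training vocabulary are ignored.
        score[label] = m["prior"] + sum(
            m["loglik"][w] for w in doc if w in model["vocab"]
        )
    return score["pos"] > score["neg"]


def two_stage_pu(positives, unlabeled, max_iter=10):
    """Hypothetical two-stage learner for positive and unlabeled data."""
    # Stage 1 (assumed heuristic): treat all unlabeled emails as provisional
    # negatives; those the model still rejects become "reliable" negatives.
    model = train_nb(positives, unlabeled)
    negatives = [d for d in unlabeled if not is_spam(model, d)]
    remaining = [d for d in unlabeled if is_spam(model, d)]

    # Stage 2: retrain on positives vs. reliable negatives, moving newly
    # rejected emails into the negative set until nothing changes.
    for _ in range(max_iter):
        if not negatives:
            break
        model = train_nb(positives, negatives)
        newly_negative = [d for d in remaining if not is_spam(model, d)]
        if not newly_negative:
            break
        negatives += newly_negative
        remaining = [d for d in remaining if is_spam(model, d)]
    return model


# Toy data (invented for illustration only).
positives = [
    ["win", "cash", "now"],
    ["free", "cash", "prize"],
    ["win", "free", "prize"],
]
unlabeled = [
    ["meeting", "agenda", "monday"],
    ["project", "report", "due"],
    ["free", "cash", "win"],
    ["lunch", "monday", "agenda"],
]
model = two_stage_pu(positives, unlabeled)
```

An ensemble variant in the spirit of the abstract would presumably replace the single model in each stage with several models whose decisions are combined, but how E2 does this is not stated here.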