In recent years, Unsolicited Bulk Email (UBE) has become an increasingly important problem with a large economic impact. In this paper, we discuss cost-sensitive Text Categorization methods for UBE filtering. Specifically, we have evaluated a range of Machine Learning methods for the task (C4.5, Naive Bayes, PART, Support Vector Machines and Rocchio), made cost-sensitive through several methods (Threshold Optimization, Instance Weighting, and MetaCost). For the evaluation we have used the Receiver Operating Characteristic Convex Hull (ROCCH) method, which best suits classification problems in which the target conditions (class distribution and misclassification costs) are not known, as is the case here. Our results show neither a dominant learning algorithm nor a dominant method for making algorithms cost-sensitive, but they are the best reported on the test collection used, and they approach the accuracy of real-world hand-crafted classifiers.
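To make the ROC Convex Hull idea concrete, here is a minimal Python sketch. It is not the paper's implementation. The function names, the example operating points, and the expected-cost formula (class prior times misclassification cost times error rate) are assumptions chosen for illustration. Each classifier is reduced to one (false positive rate, true positive rate) point, and only points on the upper convex hull can be optimal for some combination of class distribution and costs.

```python
def _cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Return the upper convex hull of ROC points, ordered from (0, 0) to (1, 1).

    points: iterable of (false_positive_rate, true_positive_rate) pairs,
    one per classifier (hypothetical inputs, not results from the paper).
    """
    # The two trivial classifiers (always negative, always positive)
    # are always available, so they are added to the candidate set.
    candidates = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in candidates:
        # Drop the last hull point while it lies on or below the segment
        # joining its predecessor to the new point p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def best_operating_point(hull, p_pos, cost_fp, cost_fn):
    """Pick the hull point with the lowest expected cost.

    Assumed cost model: (1 - p_pos) * cost_fp * FPR + p_pos * cost_fn * (1 - TPR).
    """
    def expected_cost(point):
        fpr, tpr = point
        return (1.0 - p_pos) * cost_fp * fpr + p_pos * cost_fn * (1.0 - tpr)

    return min(hull, key=expected_cost)


# Hypothetical operating points for four classifiers.
classifiers = [(0.1, 0.6), (0.2, 0.65), (0.3, 0.9), (0.4, 0.7)]
hull = roc_convex_hull(classifiers)
```

With these made-up points, the classifiers at (0.2, 0.65) and (0.4, 0.7) fall below the hull and can never be the best choice. Which of the remaining points is best depends on the target conditions: with equal costs and balanced classes, `best_operating_point(hull, 0.5, 1, 1)` selects (0.3, 0.9), while raising the false positive cost to 3 shifts the choice to (0.1, 0.6).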