This paper presents a system for classifying e-mails into two categories: legitimate and fraudulent. The classifier applies three filters in series: a Bayesian filter that classifies the textual content of e-mails; a rule-based filter that classifies their non-grammatical content; and, finally, a filter based on an emulator of fictitious accesses, which classifies the responses from websites referenced by links in the e-mails. The approach is hybrid, because it uses different classification methods, and integrated, because it takes into account all kinds of data and information contained in e-mails. It aims to provide classification that is both effective and efficient: the system first applies fast and reliable classification methods, and applies more complex analysis and classification methods only when the resulting classification decision is imprecise.
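The serial escalation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function `classify_cascade`, the thresholds `low` and `high`, and the three toy filters are all hypothetical stand-ins, and the abstract does not say how an "imprecise" decision is detected. Here it is assumed that each filter returns a score in [0, 1] and that a score strictly between the two thresholds counts as imprecise.

```python
import re
from typing import Callable, List, Tuple

# Assumption: every filter maps a raw e-mail string to a score in [0, 1],
# where values near 0 suggest "legitimate" and values near 1 "fraudulent".
Filter = Callable[[str], float]


def classify_cascade(
    email: str,
    filters: List[Tuple[str, Filter]],
    low: float = 0.2,   # hypothetical threshold: at or below means legitimate
    high: float = 0.8,  # hypothetical threshold: at or above means fraudulent
) -> Tuple[str, str]:
    """Run filters in order; return (label, name of the filter that decided)."""
    if not filters:
        raise ValueError("at least one filter is required")
    score, name = 0.5, ""
    for name, flt in filters:
        score = flt(email)
        if score <= low:
            return "legitimate", name
        if score >= high:
            return "fraudulent", name
        # Otherwise the decision is imprecise: escalate to the next filter.
    # No filter was confident, so fall back to the last filter's score.
    return ("fraudulent" if score >= 0.5 else "legitimate"), name


# --- Toy filters, purely illustrative placeholders ---

SUSPICIOUS = {"verify", "password", "urgent"}


def toy_text_filter(email: str) -> float:
    """Stand-in for the Bayesian text filter: counts suspicious words."""
    words = (w.strip(".,!?:") for w in email.lower().split())
    hits = sum(w in SUSPICIOUS for w in words)
    return min(1.0, hits / 3)


def toy_rule_filter(email: str) -> float:
    """Stand-in for the rule-based filter: inspects link structure only."""
    if re.search(r"https?://\d{1,3}(\.\d{1,3}){3}", email):
        return 0.9  # link to a raw IP address
    if not re.search(r"https?://", email):
        return 0.1  # no links at all
    return 0.5      # links present, but nothing conclusive


def toy_site_filter(email: str) -> float:
    """Stand-in for the access emulator; a real one would query the site."""
    return 0.6 if "login" in email.lower() else 0.4


FILTERS = [
    ("text", toy_text_filter),
    ("rules", toy_rule_filter),
    ("site", toy_site_filter),
]
```

Calling `classify_cascade(message, FILTERS)` returns both the label and the name of the filter that settled it, which makes it easy to see how often the more expensive later stages are actually reached. Ordering the filters from cheapest to most expensive is what gives the cascade its efficiency.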