This paper describes a two-phase method for filtering spam mails based on textual information and hyperlinks. Since the body of a spam mail has little text information, it provides insufficient hints to distinguish spam mails from legitimate mails. To resolve this problem, we follow the hyperlinks contained in the email body, fetch the contents of the remote webpages, and extract hints (i.e., features) from the original email body and the fetched webpages. We divide the hints into two kinds of information: definite information and less definite textual information. In our experiment, the method of fetching webpages improved the F-measure by 9.4% over the method that uses only the original email header and body.
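The link-following step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`extract_links`, `tokenize`, `build_features`), the URL regular expression, and the `body:`/`page:` feature prefixes are all assumptions made for illustration, and the split into definite and less definite information is not modeled. The page fetcher is passed in as a parameter so that the sketch runs without network access.

```python
import re
from typing import Callable, Dict, List

# Assumed, deliberately simple URL pattern; real email bodies need more care.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_links(body: str) -> List[str]:
    """Return the hyperlinks found in an email body, in order, without duplicates."""
    links: List[str] = []
    for url in URL_PATTERN.findall(body):
        url = url.rstrip(".,;:)")  # drop trailing sentence punctuation
        if url not in links:
            links.append(url)
    return links


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_features(body: str, fetch_page: Callable[[str], str]) -> Dict[str, int]:
    """Count tokens from the email body and from every page it links to.

    `fetch_page` is a hypothetical callable mapping a URL to page text.
    Pages that cannot be fetched are skipped.
    """
    counts: Dict[str, int] = {}
    # Tokens from the body itself, with the URLs removed.
    for token in tokenize(URL_PATTERN.sub(" ", body)):
        key = "body:" + token
        counts[key] = counts.get(key, 0) + 1
    # Tokens from the fetched webpages.
    for url in extract_links(body):
        try:
            page_text = fetch_page(url)
        except Exception:
            continue  # unreachable page: fall back to body-only features
        for token in tokenize(page_text):
            key = "page:" + token
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    # Stand-in for a real HTTP fetch, so the example is self-contained.
    fake_web = {"http://example.com/offer": "Cheap pills, cheap prices"}
    mail = "Click here: http://example.com/offer."
    print(build_features(mail, fake_web.__getitem__))
```

Keeping body tokens and page tokens under separate prefixes lets a downstream classifier weight the two sources differently. In practice, `fetch_page` could wrap an HTTP client with a timeout and HTML-to-text conversion.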