Junk e-mail detection and filtering can be considered a cost-sensitive classification problem. However, the preprocessing methods and noise-reduction strategies used to improve computational efficiency in text classification may not be as effective in e-mail filtering. This paper demonstrates this through a comparative study of stopword removal, stemming, and different tokenising schemes. The goal is to preprocess the training e-mail corpora used by several content-based spam-filtering techniques (machine learning approaches and case-based systems). Sound conclusions are drawn from experiments that cover several different scenarios.
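
The three preprocessing steps compared in the study (tokenising, stopword removal, and stemming) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the tiny stopword list, the regular expressions, and the crude suffix stripper (a stand-in for a real stemmer such as Porter's) are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword list; real lists are much larger.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "is", "in", "for", "you"}

# Two illustrative tokenising schemes (assumed, not taken from the paper):
WORDS_ONLY = r"[a-z0-9]+"              # keep alphanumeric runs only
WORDS_AND_MARKS = r"[a-z0-9]+|[$!]+"   # also keep runs of '$' and '!'


def tokenise(text, pattern=WORDS_ONLY):
    """Lower-case the text and extract tokens matching `pattern`."""
    return re.findall(pattern, text.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in stopwords]


def crude_stem(token):
    """Naive suffix stripping (NOT the Porter algorithm).

    Removes at most one suffix and keeps at least three characters.
    """
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, use_stopwords=True, use_stemming=True, pattern=WORDS_ONLY):
    """Apply the selected preprocessing steps and return the token list."""
    tokens = tokenise(text, pattern)
    if use_stopwords:
        tokens = remove_stopwords(tokens)
    if use_stemming:
        tokens = [crude_stem(t) for t in tokens]
    return tokens


if __name__ == "__main__":
    message = "Claim the prizes you WON, offers ending today!"
    print(preprocess(message))
    print(preprocess(message, use_stopwords=False, use_stemming=False))
    print(tokenise("FREE $$$ now!!", WORDS_AND_MARKS))
```

Switching the `use_stopwords`, `use_stemming`, and `pattern` arguments gives the kind of scenario grid such a comparison needs. The second tokenising scheme keeps runs of `$` and `!`, which shows how a preprocessing choice can discard or preserve tokens that might matter for spam detection.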