Many linear statistical models have recently been proposed in the text classification literature and evaluated on the Unsolicited Bulk Email (spam) filtering problem. These models are popular because they are simple and relatively easy to interpret. However, their linearity assumption about the data samples is inappropriate in practice, because it cannot capture the apparent non-linear relationships that characterize these samples. In this paper, we propose SF-HME, a Hierarchical Mixture-of-Experts system that attempts to overcome limitations common to other machine-learning-based approaches to spam mail classification. We first reduced the dimensionality of the data with the Simba feature-selection algorithm. We then evaluated SF-HME on a publicly available email corpus in which legitimate and bulk emails are highly similar, and therefore have low discriminative potential. On this corpus, traditional rule-based filtering approaches achieve considerably lower precision. The results show that SF-HME outperforms the other machine learning approaches evaluated, which achieved lower recall.
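The abstract names two components, Simba feature selection and a Hierarchical Mixture-of-Experts classifier, but does not spell out how either works. Below is a minimal NumPy sketch of both. It is not the SF-HME implementation, and it rests on several assumptions:

- Labels are binary (1 = spam).
- The Simba part follows the general form of its margin-gradient weight update, with the nearest hit and nearest miss found under the current feature weighting.
- The HME has two levels, softmax gates, logistic-regression experts and no bias terms.
- The HME parameters are supplied by the caller rather than fitted with EM.

The function names `simba` and `hme_predict` and the array layouts are hypothetical.

```python
import numpy as np


def simba(X, y, n_iter=200, seed=0):
    """Margin-based feature weighting in the style of Simba (sketch).

    X: (m, n) feature matrix; y: (m,) binary labels.
    Returns non-negative feature weights scaled so that max(w) == 1.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.ones(n)
    for _ in range(n_iter):
        k = rng.integers(m)                      # random training sample
        x = X[k]
        dist = np.sqrt(((w * (X - x)) ** 2).sum(axis=1))  # weighted distances
        dist[k] = np.inf                         # exclude x itself
        same = y == y[k]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class
        d_hit, d_miss = dist[hit], dist[miss]
        if not (0 < d_hit < np.inf and 0 < d_miss < np.inf):
            continue                             # skip degenerate draws
        z_hit, z_miss = x - X[hit], x - X[miss]
        # Gradient ascent on the margin 0.5 * (||z_miss||_w - ||z_hit||_w).
        w = w + 0.5 * w * (z_miss ** 2 / d_miss - z_hit ** 2 / d_hit)
    w = w ** 2
    return w / w.max()


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def hme_predict(X, top_gate, sub_gates, experts):
    """Forward pass of a two-level HME for binary classification (sketch).

    top_gate:  (n, I)    weights of the top-level softmax gate
    sub_gates: (I, n, J) weights of the I lower-level softmax gates
    experts:   (I, J, n) weights of the I*J logistic-regression experts
    Returns P(y = 1 | x) = sum_i g_i(x) sum_j g_{j|i}(x) sigmoid(w_ij . x).
    """
    g_top = softmax(X @ top_gate)                             # (m, I)
    g_sub = softmax(np.einsum("mn,inj->mij", X, sub_gates))   # (m, I, J)
    p_exp = sigmoid(np.einsum("mn,ijn->mij", X, experts))     # (m, I, J)
    return np.einsum("mi,mij,mij->m", g_top, g_sub, p_exp)
```

In a pipeline like the one described above, one would keep the top-k columns ranked by `simba` weight and then fit the HME parameters on those columns, for example with EM. The fitting step is omitted here. The gates make the model non-linear. Each expert is a linear classifier, but the input-dependent softmax weights let different experts dominate different regions of the feature space.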