On the behavior of SVM and some older algorithms in binary text classification tasks

Authors:
Fabrice Colas; Pavel Brazdil
Affiliations:
LIACS, Leiden University, The Netherlands; LIACC-NIAAD, University of Porto, Portugal
Venue:
TSD'06 Proceedings of the 9th international conference on Text, Speech and Dialogue
Year:
2006

Abstract

Document classification has already been widely studied. In fact, some studies compared feature selection techniques or feature space transformations, whereas others compared the performance of different algorithms. Recently, following the rising interest in the Support Vector Machine, various studies showed that the SVM outperforms other classification algorithms. So should we just not bother about other classification algorithms and always opt for SVM? We decided to investigate this issue and compared SVM to kNN and naive Bayes on binary classification tasks. An important issue is to compare optimized versions of these algorithms, which is what we have done. Our results show that all the classifiers achieved comparable performance on most problems. One surprising result is that SVM was not a clear winner, despite quite good overall performance. If suitable preprocessing is used with kNN, this algorithm continues to achieve very good results and scales up well with the number of documents, which is not the case for SVM. As for naive Bayes, it also achieved good performance.