On the behavior of SVM and some older algorithms in binary text classification tasks

  • Authors:
  • Fabrice Colas; Pavel Brazdil

  • Affiliations:
  • LIACS, Leiden University, The Netherlands; LIACC-NIAAD, University of Porto, Portugal

  • Venue:
  • TSD'06 Proceedings of the 9th international conference on Text, Speech and Dialogue
  • Year:
  • 2006

Abstract

Document classification has already been widely studied. In fact, some studies compared feature selection techniques or feature space transformation, whereas some others compared the performance of different algorithms. Recently, following the rising interest towards the Support Vector Machine, various studies showed that the SVM outperforms other classification algorithms. So should we just not bother about other classification algorithms and opt always for SVM? We have decided to investigate this issue and compared SVM to kNN and naive Bayes on binary classification tasks. An important issue is to compare optimized versions of these algorithms, which is what we have done. Our results show all the classifiers achieved comparable performance on most problems. One surprising result is that SVM was not a clear winner, despite quite good overall performance. If a suitable preprocessing is used with kNN, this algorithm continues to achieve very good results and scales up well with the number of documents, which is not the case for SVM. As for naive Bayes, it also achieved good performance.