Does SVM really scale up to large bag of words feature spaces?

Authors:
Fabrice Colas; Pavel Paclík; Joost N. Kok; Pavel Brazdil
Affiliations:
LIACS, Leiden University, The Netherlands; ICT Group, Delft University of Technology, The Netherlands; LIACS, Leiden University, The Netherlands; LIACC, NIAAD, University of Porto, Portugal
Venue:
IDA'07 Proceedings of the 7th international conference on Intelligent data analysis
Year:
2007

Abstract

We are concerned with the problem of learning classification rules in text categorization, where many authors have presented Support Vector Machines (SVM) as the leading classification method. A number of studies, however, have repeatedly pointed out that in some situations SVM is outperformed by simpler methods such as naive Bayes or the nearest-neighbor rule. In this paper, we aim to develop a better understanding of SVM behaviour in typical text categorization problems represented by sparse bag of words feature spaces. We study in detail the performance and the number of support vectors when varying the training set size, the number of features and, unlike existing studies, also the SVM free parameter C, which is the upper bound on the Lagrange multipliers in the SVM dual. We show that SVM solutions with small C are high performers. However, most training documents are then bounded support vectors sharing the same weight C. Thus, SVM reduces to a nearest mean classifier; this raises an interesting question about the merits of SVM in sparse bag of words feature spaces. Additionally, SVM suffers from performance deterioration for particular combinations of training set size and number of features.
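
The reduction to a nearest mean classifier can be illustrated with a short derivation. This is only a sketch under assumptions that the abstract does not state: a linear kernel, a binary task with labels $y_i \in \{-1,+1\}$, and the idealized case in which every training document is a bounded support vector (the abstract says only "most"). The symbols $n_+$, $n_-$, $\mu_+$ and $\mu_-$ (class sizes and class means) are introduced here for illustration.

```latex
% Standard soft-margin SVM dual with box constraint C:
\begin{align}
  \max_{\alpha}\;& \sum_{i} \alpha_i
      - \tfrac{1}{2} \sum_{i}\sum_{j} \alpha_i \alpha_j y_i y_j \, x_i^{\top} x_j \\
  \text{s.t.}\;& 0 \le \alpha_i \le C, \qquad \sum_{i} \alpha_i y_i = 0 .
\end{align}

% Primal weight vector recovered from the dual solution:
\begin{equation}
  w = \sum_{i} \alpha_i y_i x_i .
\end{equation}

% Idealized case: every document is bounded, i.e. alpha_i = C for all i.
% The equality constraint then gives C (n_+ - n_-) = 0, so n_+ = n_- = n, and
\begin{equation}
  w = C \Big( \sum_{i:\, y_i = +1} x_i \;-\; \sum_{i:\, y_i = -1} x_i \Big)
    = C \, n \, (\mu_+ - \mu_-) .
\end{equation}

% Nearest mean rule: assign x to the positive class when
% ||x - mu_+||^2 < ||x - mu_-||^2, which is equivalent to
\begin{equation}
  (\mu_+ - \mu_-)^{\top} x \;-\; \tfrac{1}{2}\big( \|\mu_+\|^2 - \|\mu_-\|^2 \big) > 0 .
\end{equation}
```

Under these assumptions the SVM hyperplane has the same normal direction, $\mu_+ - \mu_-$, as the nearest mean rule, and the two classifiers can differ only in the bias term. When only most of the multipliers sit at the bound $C$, the correspondence is approximate rather than exact.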