Is unlabeled data suitable for multiclass SVM-based web page classification?

Authors:
Arkaitz Zubiaga; Víctor Fresno; Raquel Martínez
Affiliations:
NLP & IR Group at UNED; NLP & IR Group at UNED; NLP & IR Group at UNED
Venue:
SemiSupLearn '09 Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing
Year:
2009

References (12):
A training algorithm for optimal margin classifiers. COLT '92 Proceedings of the fifth annual workshop on Computational learning theory.
Support-Vector Networks. Machine Learning.
Machine learning in automated text categorization. ACM Computing Surveys (CSUR).
Machine Learning. Machine Learning.
The use of bigrams to enhance text categorization. Information Processing and Management: an International Journal.
Text Categorization with Support Vector Machines: Learning with Many Relevant Features. ECML '98 Proceedings of the 10th European Conference on Machine Learning.
Transductive Inference for Text Classification using Support Vector Machines. ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning.
A continuation method for semi-supervised SVMs. ICML '06 Proceedings of the 23rd international conference on Machine learning.
Optimization Approaches for Semi-Supervised Multiclass Classification. ICDMW '06 Proceedings of the Sixth IEEE International Conference on Data Mining - Workshops.
Optimization Techniques for Semi-Supervised Support Vector Machines. The Journal of Machine Learning Research.
Unsupervised and semi-supervised multi-class support vector machines. AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 2.
A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks.

Cited by (1):
Tags vs shelves: from social tagging to social classification. Proceedings of the 22nd ACM conference on Hypertext and hypermedia.

Abstract

Support Vector Machines (SVMs) offer an effective approach to automated classification tasks. Although SVMs natively handle only binary, supervised problems, several works have extended them to multiclass and semi-supervised settings. A previous study of supervised and semi-supervised SVM classification over binary taxonomies showed that the semi-supervised approach clearly outperforms the supervised one, demonstrating that unlabeled data are useful in the learning phase for this kind of task. However, whether unlabeled data are also useful for multiclass SVM tasks has not been tested before. In this work, we study whether unlabeled data can improve results for multiclass web page classification using Support Vector Machines. We conclude by recommending that only labeled data be used, both because doing so improves (or at least matches) performance and because it reduces computational cost.
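The abstract notes that SVMs are binary by nature and must be extended for multiclass problems. One common extension is one-vs-rest: train one binary classifier per class (that class against all others) and assign a page to the class with the highest score. The Python sketch below illustrates that reduction under stated assumptions. Each binary classifier is a bias-free linear SVM trained with a Pegasos-style stochastic subgradient method, which is not necessarily the solver or multiclass scheme used in this paper. Pages are represented as hypothetical term-count vectors. The function names `train_binary_svm`, `train_one_vs_rest`, and `predict` are illustrative only. The sketch trains on labeled data alone, matching the supervised setting the abstract recommends. Semi-supervised variants that also exploit unlabeled pages are not shown.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_binary_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.1, epochs: int = 50, seed: int = 0) -> Vector:
    """Bias-free linear SVM (labels in {-1, +1}) trained with Pegasos-style
    stochastic subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            violated = y[i] * dot(w, X[i]) < 1.0  # hinge loss active for this example
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 regularizer step (shrink w)
            if violated:
                # Hinge-loss subgradient step toward the violating example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def train_one_vs_rest(X: Sequence[Sequence[float]], labels: Sequence[str],
                      lam: float = 0.1, epochs: int = 50,
                      seed: int = 0) -> Dict[str, Vector]:
    """One binary SVM per class: that class is +1, every other class is -1."""
    models: Dict[str, Vector] = {}
    for cls in sorted(set(labels)):
        y = [1 if lab == cls else -1 for lab in labels]
        models[cls] = train_binary_svm(X, y, lam, epochs, seed)
    return models


def predict(models: Dict[str, Vector], x: Sequence[float]) -> str:
    """Assign the class whose binary classifier gives the highest score."""
    return max(models, key=lambda cls: dot(models[cls], x))


if __name__ == "__main__":
    # Hypothetical term counts over ["goal", "match", "election", "vote", "cpu", "software"]
    X = [[2, 1, 0, 0, 0, 0], [1, 2, 0, 0, 0, 0],
         [0, 0, 2, 1, 0, 0], [0, 0, 1, 3, 0, 0],
         [0, 0, 0, 0, 1, 2], [0, 0, 0, 0, 3, 1]]
    labels = ["sports", "sports", "news", "news", "tech", "tech"]
    models = train_one_vs_rest(X, labels)
    # A hypothetical new page mentioning "cpu" and "software"
    print(predict(models, [0, 0, 0, 0, 2, 2]))  # prints "tech"
```

The toy vocabularies are disjoint, so each class is trivially separable. Real web page term vectors overlap and are far higher-dimensional.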