Many learning algorithms have been used for data mining applications, including Support Vector Classifiers (SVC), which have shown advantages over other approaches because they provide a natural mechanism for implementing Structural Risk Minimization (SRM), yielding machines with good generalization properties. SVC leads to the optimal hyperplane (maximal margin) criterion for separable datasets; in the nonseparable case, SVC minimizes the L1 norm of the training errors plus a regularizing term that controls machine complexity. The L1 norm is chosen because it allows the minimization to be solved with a Quadratic Programming (QP) scheme, as in the separable case. However, the L1 norm is not a true "error counting" term of the kind the Empirical Risk Minimization (ERM) inductive principle calls for, and it therefore leads to a biased solution. This effect is especially severe in low-complexity machines, such as linear classifiers or machines with few nodes (neurons, kernels, basis functions). Since one of the main goals in data mining is explanation, these reduced architectures are of great interest because they underlie other techniques such as input selection and rule extraction. Training SVMs as accurately as possible in these situations (i.e., without this bias) is therefore an interesting goal.

We propose here an unbiased implementation of SVC by introducing a more appropriate "error counting" term. This way, the number of classification errors is truly minimized, while the maximal margin solution is still obtained in the separable case. QP can no longer be used to solve the new minimization problem, so we apply instead an iterated Weighted Least Squares (WLS) procedure. This modification of the Support Vector Machine cost function to solve ERM was not feasible with the Quadratic or Linear Programming techniques commonly used to date, but it becomes possible with the iterated WLS formulation. Computer experiments show that the proposed method is superior to the classical approach in the sense that it truly solves the ERM problem.
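For reference, the standard soft-margin SVC formulation referred to above is the QP problem

\[
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2}\|\mathbf{w}\|^{2} + C \sum_{i=1}^{n} \xi_{i}
\quad \text{s.t.} \quad y_{i}\left(\mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}_{i}) + b\right) \ge 1 - \xi_{i}, \qquad \xi_{i} \ge 0,
\]

where $C \sum_{i} \xi_{i}$ is the L1 norm of the slack variables. This term only upper-bounds the number of training errors, $\sum_{i} \mathbb{1}[\xi_{i} \ge 1]$: a point misclassified by a large margin is penalized in proportion to its slack rather than counted once, which is the source of the bias described in the abstract.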
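The abstract does not give the exact penalty or the WLS update equations, so the following Python sketch is only illustrative: it assumes a sigmoidal surrogate for the 0/1 "error counting" loss and a linear machine, and the function name and parameters (k, lam) are hypothetical, not the paper's actual formulation. At each iteration the loss is replaced by a weighted quadratic whose gradient matches it at the current solution, and the resulting weighted ridge regression is solved in closed form.

import numpy as np

def irwls_error_counting_svc(X, y, lam=1e-2, k=10.0, n_iter=100, tol=1e-6):
    """Illustrative sketch (not the paper's exact method): linear classifier
    f(x) = w.x + b fit by iterated WLS on a sigmoidal 'error counting' loss
    L(u) = 1 / (1 + exp(-k*u)) of the margin slack u_i = 1 - y_i f(x_i).
    Labels y must be in {-1, +1}."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])     # absorb the bias b into w
    w = np.zeros(d + 1)
    for _ in range(n_iter):
        u = 1.0 - y * (Xb @ w)               # margin slacks
        s = 1.0 / (1.0 + np.exp(-k * u))     # sigmoidal loss value
        dL = k * s * (1.0 - s)               # its derivative at the slacks
        # Quadratic-majorant weights a_i = L'(u_i) / (2 u_i): the weighted
        # quadratic a_i * u_i^2 has the same gradient as L at the current
        # iterate. Points past the margin (u_i <= 0) get zero weight, a
        # common choice in IRWLS-style SVM solvers; clip to avoid blow-up.
        a = np.where(u > 1e-8, dL / (2.0 * np.maximum(u, 1e-8)), 0.0)
        a = np.minimum(a, 1e6)
        # Since y_i^2 = 1, a_i (1 - y_i f_i)^2 = a_i (y_i - f_i)^2, i.e. a
        # weighted ridge regression with targets y_i and weights a_i.
        H = Xb.T @ (a[:, None] * Xb) + lam * np.eye(d + 1)
        w_new = np.linalg.solve(H, Xb.T @ (a * y))
        if np.linalg.norm(w_new - w) < tol:
            w = w_new
            break
        w = w_new
    return w[:-1], w[-1]                     # (weights, bias)

In this sketch the steepness k trades off tractability against fidelity to ERM: small k gives a smooth, nearly hinge-like surrogate, while large k approaches the true 0/1 error count that the unbiased SVC targets.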