Online training on a budget of support vector machines using twin prototypes

Authors:
Zhuang Wang; Slobodan Vucetic
Affiliations:
Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA; Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
Venue:
Statistical Analysis and Data Mining
Year:
2010

Abstract

This paper proposes the twin prototype support vector machine (TVM), a constant-space, sublinear-time support vector machine (SVM) algorithm for online learning. TVM achieves its favorable scaling by memorizing only a fixed-size data summary during training, in the form of example prototypes and their associated information. In addition, TVM guarantees that the SVM solution remains optimal on the current set of prototypes at all times. To maximize accuracy, prototypes are constructed to approximate the data distribution near the decision boundary. Given a new training example, TVM is updated in three steps. First, the new example is added as a new prototype if it is near the decision boundary. Second, if this happens, the budget is maintained either by removing the prototype farthest from the decision boundary or by selecting two nearby prototypes and merging them into a single one. Finally, TVM is updated using incremental and decremental techniques to account for the change. Several methods for prototype merging are proposed and experimentally evaluated. TVM algorithms with hinge loss and ramp loss were implemented and thoroughly tested on 12 large datasets. In most cases, the accuracy of low-budget TVMs was comparable with that of resource-unconstrained SVMs. Additionally, TVM accuracy was substantially higher than that of an SVM trained on a random sample of the same size. An even larger difference in accuracy was observed in comparison with Forgetron, a popular budgeted kernel perceptron algorithm. As expected, the difference in accuracy between hinge loss and ramp loss TVMs was negligible, and the hinge loss version is preferable due to its lower computational cost. The results illustrate that highly accurate online SVMs can be trained from arbitrarily large data streams using devices with severely limited memory budgets.

Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 149-169, 2010
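The first two steps of the update described in the abstract (add the example if it is near the boundary, then restore the budget by removal or merging) can be illustrated with a minimal sketch. This is not the authors' implementation. It omits step 3, the incremental/decremental SVM re-solve, so the coefficients `alpha` stay at placeholder values. Several details are assumptions made for illustration only: the Gaussian kernel, the "near the boundary" test y·f(x) < 1, the hypothetical `far_margin` threshold that chooses between removal and merging, and merging as a count-weighted mean of the two closest same-label prototypes (the paper evaluates several merging methods, which this does not reproduce).

```python
import math
from dataclasses import dataclass


@dataclass
class Prototype:
    x: list[float]  # prototype location in input space
    y: int          # class label, +1 or -1
    n: float        # number of examples this prototype summarizes
    alpha: float    # kernel-expansion coefficient (placeholder: no SVM re-solve here)


def rbf(a, b, gamma=1.0):
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2) (assumed kernel)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def decision(protos, x, bias=0.0):
    """Kernel expansion f(x) = sum_j alpha_j * y_j * k(p_j, x) + b."""
    return sum(p.alpha * p.y * rbf(p.x, x) for p in protos) + bias


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def merge(p, q):
    """Hypothetical merge rule: count-weighted mean of two same-label prototypes."""
    n = p.n + q.n
    x = [(p.n * a + q.n * b) / n for a, b in zip(p.x, q.x)]
    return Prototype(x, p.y, n, (p.n * p.alpha + q.n * q.alpha) / n)


def budget_step(protos, x, y, budget, far_margin=2.0):
    """Steps 1 and 2 of a TVM-style update; step 3 (SVM re-solve) is not implemented."""
    # With two labels and more than budget >= 2 prototypes, a same-label pair always exists.
    assert budget >= 2
    # Step 1: keep the example only if it is near the boundary (margin y*f(x) < 1).
    if protos and y * decision(protos, x) >= 1:
        return protos
    protos = protos + [Prototype(list(x), y, 1.0, 1.0)]
    if len(protos) <= budget:
        return protos
    # Step 2: restore the budget. Margin y*f(p) measures distance from the boundary.
    margins = [p.y * decision(protos, p.x) for p in protos]
    far = max(range(len(protos)), key=lambda i: margins[i])
    if margins[far] > far_margin:
        # Drop the prototype farthest from the boundary on its correct side.
        return protos[:far] + protos[far + 1:]
    # Otherwise merge the closest same-label pair into a single prototype.
    pairs = [(sq_dist(protos[i].x, protos[j].x), i, j)
             for i in range(len(protos)) for j in range(i + 1, len(protos))
             if protos[i].y == protos[j].y]
    _, i, j = min(pairs)
    merged = merge(protos[i], protos[j])
    return [p for k, p in enumerate(protos) if k not in (i, j)] + [merged]


# Usage: stream four 2-D examples through a budget of two prototypes.
protos = []
for xi, yi in [([0.0, 0.0], 1), ([0.1, 0.0], 1), ([3.0, 3.0], -1), ([0.0, 0.2], 1)]:
    protos = budget_step(protos, xi, yi, budget=2)
```

Merging rather than discarding lets each prototype's `n` keep count of how many examples it stands for, which is the kind of "associated information" a fixed-size summary can carry; in the usage loop the two nearby positive examples end up merged while the isolated negative one survives as its own prototype.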
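The abstract compares hinge-loss and ramp-loss versions of TVM. A minimal sketch of the two losses, written as functions of the margin z = y·f(x), uses the standard definitions: the hinge loss H_1(z) = max(0, 1 - z), and the ramp loss R_s(z) = H_1(z) - H_s(z) with H_s(z) = max(0, s - z) for some s < 1. The default s = -1 is an assumption for illustration; the paper's exact parameterization is not given in this abstract.

```python
def hinge(z):
    """Hinge loss H_1(z) = max(0, 1 - z), where z = y * f(x) is the margin."""
    return max(0.0, 1.0 - z)


def ramp(z, s=-1.0):
    """Ramp loss R_s(z) = H_1(z) - H_s(z), with H_s(z) = max(0, s - z) and s < 1.

    Equivalent to min(1 - s, max(0, 1 - z)): the loss is capped at 1 - s, so
    badly misclassified examples (z < s) contribute a constant penalty.
    """
    return hinge(z) - max(0.0, s - z)


# Usage: compare the two losses over a range of margins.
for z in [-5.0, -1.0, 0.0, 0.5, 1.0, 3.0]:
    print(f"z={z:+.1f}  hinge={hinge(z):.1f}  ramp={ramp(z):.1f}")
```

The cap makes the ramp loss robust to outliers but also non-convex, so it is generally more expensive to optimize than the convex hinge loss, which is consistent with the abstract's recommendation of the hinge-loss version when accuracy is similar.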