Efficient Text Classification by Weighted Proximal SVM

Authors:
Dong Zhuang;Benyu Zhang;Qiang Yang;Jun Yan;Zheng Chen;Ying Chen
Affiliations:
Beijing Institute of Technology;Microsoft Research Asia;Hong Kong University of Science and Technology;Peking University;Microsoft Research Asia;Beijing Institute of Technology
Venue:
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Year:
2005

Citing 8

Making large-scale support vector machine learning practical
Advances in kernel methods

Fast training of support vector machines using sequential minimal optimization
Advances in kernel methods

Algorithm 583: LSQR: Sparse Linear Equations and Least Squares Problems
ACM Transactions on Mathematical Software (TOMS)

A study of thresholding strategies for text categorization
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval

Proximal support vector machine classifiers
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining

Modern Information Retrieval
Modern Information Retrieval

A Tutorial on Support Vector Machines for Pattern Recognition
Data Mining and Knowledge Discovery

RCV1: A New Benchmark Collection for Text Categorization Research
The Journal of Machine Learning Research

Cited 5

A novel pattern recognition algorithm: Combining ART network with SVM to reconstruct a multi-class classifier
Computers & Mathematics with Applications

An improved support vector machine with soft decision-making boundary
AIA '08 Proceedings of the 26th IASTED International Conference on Artificial Intelligence and Applications

Urdu text classification
Proceedings of the 7th International Conference on Frontiers of Information Technology

Improved response modeling based on clustering, under-sampling, and ensemble
Expert Systems with Applications: An International Journal

The Effect of Stemming on Arabic Text Classification: An Empirical Study
International Journal of Information Retrieval Research

Abstract

In this paper, we present an algorithm that classifies large-scale text data with high classification quality and fast training speed. Our method is based on a novel extension of the proximal SVM model [3]. Previous studies on proximal SVM have focused on classification of low-dimensional data and did not consider unbalanced data. Such methods run into difficulties when classifying unbalanced, high-dimensional data sets such as text documents. In this work, we extend the original proximal SVM by learning a weight for each training error. We show that the classification algorithm based on this model is capable of handling high-dimensional and unbalanced data. In the experiments, we compare our method with the original proximal SVM (a special case of our algorithm) and the standard SVM (such as SVMlight) on the recently published RCV1-v2 dataset. The results show that our proposed method achieves classification quality comparable to that of the standard SVM, while both its time and memory consumption are lower than those of the standard SVM.
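
The idea of attaching a weight to each training error can be illustrated with a small sketch. The abstract does not give the paper's exact objective, weighting rule, or solver, so everything below is an assumption made for illustration: the objective is taken to be the proximal SVM least-squares objective with a per-example weight delta_i on each error, the default weights simply balance the two classes, and the names `train_weighted_psvm`, `predict`, and `make_toy_data` are hypothetical. The dense closed-form solve is only suitable for small problems; the reference list includes LSQR, which suggests an iterative sparse solver may be used at scale, but that is a guess.

```python
import numpy as np


def train_weighted_psvm(A, y, nu=1.0, weights=None):
    """Sketch of a weighted proximal SVM (assumed formulation, not the paper's).

    Minimizes 0.5 * (||w||^2 + gamma^2) + (nu / 2) * sum_i (delta_i * xi_i)^2
    subject to y_i * (a_i . w - gamma) + xi_i = 1.
    Because y_i is +1 or -1, this is a weighted ridge regression of the labels
    on E = [A, -1], with the closed form
        (I / nu + E^T Delta^2 E) z = E^T Delta^2 y,   z = [w; gamma].
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = A.shape
    if weights is None:
        # Assumed default: give both classes the same total squared weight,
        # so the rare class is not swamped by the frequent one.
        n_pos = max(int(np.sum(y > 0)), 1)
        n_neg = max(m - n_pos, 1)
        weights = np.sqrt(np.where(y > 0, m / (2.0 * n_pos), m / (2.0 * n_neg)))
    d2 = np.asarray(weights, dtype=float) ** 2
    E = np.hstack([A, -np.ones((m, 1))])
    lhs = np.eye(n + 1) / nu + E.T @ (d2[:, None] * E)
    rhs = E.T @ (d2 * y)
    z = np.linalg.solve(lhs, rhs)
    return z[:n], z[n]


def predict(A, w, gamma):
    """Label each row of A by the sign of a . w - gamma."""
    return np.where(np.asarray(A, dtype=float) @ w - gamma >= 0, 1, -1)


def make_toy_data(seed=0):
    """Hypothetical unbalanced 2-D data: 5 positives, 95 negatives."""
    rng = np.random.default_rng(seed)
    pos = rng.normal(loc=3.0, scale=0.3, size=(5, 2))
    neg = rng.normal(loc=-3.0, scale=0.3, size=(95, 2))
    A = np.vstack([pos, neg])
    y = np.concatenate([np.ones(5, dtype=int), -np.ones(95, dtype=int)])
    return A, y


if __name__ == "__main__":
    A, y = make_toy_data()
    w, gamma = train_weighted_psvm(A, y, nu=1.0)
    print("training accuracy:", np.mean(predict(A, w, gamma) == y))
```

Passing `weights=np.ones(len(y))` recovers the unweighted proximal SVM closed form, which matches the abstract's remark that the original proximal SVM is a special case.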