Boosting support vector machines for text classification through parameter-free threshold relaxation

Authors:
James G. Shanahan; Norbert Roma
Affiliations:
Clairvoyance Corporation; Clairvoyance Corporation
Venue:
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Year:
2003

References (9):
The nature of statistical learning theory
Support-Vector Networks. Machine Learning
Fast training of support vector machines using sequential minimal optimization. Advances in kernel methods
A study of thresholding strategies for text categorization. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to Modern Information Retrieval
Text Categorization with Support Vector Machines: Learning with Many Relevant Features. ECML '98 Proceedings of the 10th European Conference on Machine Learning
The Perceptron Algorithm with Uneven Margins. ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Combining Statistical Learning with a Knowledge-Based Approach - A Case Study in Intensive Care Monitoring. ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Training Support Vector Machines: an Application to Face Detection. CVPR '97 Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR '97)

Cited by (10):
Extraction and search of chemical formulae in text documents on the web. Proceedings of the 16th international conference on World Wide Web
Improvement of behavior detection by dynamic threshold. DNCOCO'07 Proceedings of the 9th WSEAS International Conference on Data Networks, Communications, Computers
Personalizing Threshold Values on Behavior Detection with Collaborative Filtering. UIC '08 Proceedings of the 5th international conference on Ubiquitous Intelligence and Computing
Dynamic threshold determination for stable behavior detection. WSEAS Transactions on Computers
On strategies for imbalanced text classification using SVM: A comparative study. Decision Support Systems
Behavior detection based on touched objects with dynamic threshold determination model. EuroSSC'07 Proceedings of the 2nd European conference on Smart sensing and context
Classifying Wikipedia articles into NE's using SVM's with threshold adjustment. NEWS '10 Proceedings of the 2010 Named Entities Workshop
K-farthest-neighbors-based concept boundary determination for support vector data description. CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Identifying, Indexing, and Ranking Chemical Formulae and Chemical Names in Digital Documents. ACM Transactions on Information Systems (TOIS)
Learning to advertise: how many ads are enough? PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II

Abstract

Support vector machine (SVM) learning algorithms seek the hyperplane that maximizes the margin (the distance from the separating hyperplane to the nearest examples), since this criterion provides a good upper bound on the generalization error. When applied to text classification, these algorithms yield SVMs with excellent precision but poor recall. Various relaxation approaches have been proposed to counter this problem, including asymmetric SVM learning algorithms (soft SVMs with asymmetric misclassification costs), uneven-margin-based learning, and thresholding. This paper reviews these approaches and describes a new threshold relaxation algorithm that builds on previous thresholding work based on the beta-gamma algorithm. The proposed thresholding strategy is parameter free: it relies on a process of retrofitting and cross validation to set algorithm parameters empirically, whereas our previous approach required the specification of two parameters (beta and gamma). The proposed approach is more efficient, requires no user-specified parameters, and, like the parameter-based approach, boosts the performance of baseline SVMs by at least 20% on standard information retrieval measures.
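
The general idea of threshold relaxation can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper, nor the beta-gamma algorithm; it is a generic illustration that assumes an SVM's decision scores are already available and simply picks the cut point that maximizes F1 on labeled data instead of using the default threshold of 0. The function names, the toy scores, and the choice of F1 as the utility measure are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def f1_at_threshold(scores: Sequence[float], labels: Sequence[int],
                    threshold: float) -> float:
    """F1 when every example with score >= threshold is predicted positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relax_threshold(scores: Sequence[float],
                    labels: Sequence[int]) -> Tuple[float, float]:
    """Return the (threshold, F1) pair with the highest F1.

    Candidates are the default SVM threshold 0.0 and every observed score.
    """
    candidates = [0.0] + sorted(set(scores))
    best = max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))
    return best, f1_at_threshold(scores, labels, best)


# Hypothetical decision scores: several positives fall just below 0,
# so the default threshold gives perfect precision but only 50% recall.
scores = [1.2, 0.4, -0.1, -0.3, -0.5, -0.9, -1.4, -2.0]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

default_f1 = f1_at_threshold(scores, labels, 0.0)
best_t, best_f1 = relax_threshold(scores, labels)
print(f"default threshold 0.0: F1 = {default_f1:.3f}")
print(f"relaxed threshold {best_t}: F1 = {best_f1:.3f}")
```

Choosing the threshold on the same examples used to score it will tend to overfit; in practice the cut point would be selected on held-out data, for example via cross validation as the abstract describes.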