Confidence-weighted linear classification for text categorization

Authors:
Koby Crammer; Mark Dredze; Fernando Pereira
Affiliations:
Department of Electrical Engineering, The Technion, Haifa, Israel; Human Language Technology Center of Excellence, Johns Hopkins University, Baltimore, MD; Google, Inc., Mountain View, CA
Venue:
The Journal of Machine Learning Research
Year:
2012

Abstract

Confidence-weighted online learning is a generalization of margin-based learning of linear classifiers in which the margin constraint is replaced by a probabilistic constraint based on a distribution over classifier weights that is updated online as examples are observed. The distribution captures a notion of confidence in classifier weights, and in some cases it can also be interpreted as replacing a single learning rate by adaptive per-weight rates. Confidence-weighted learning was motivated by the statistical properties of natural-language classification tasks, where most of the informative features are relatively rare. We investigate several versions of confidence-weighted learning that use a Gaussian distribution over weight vectors, updated at each observed example to achieve high probability of correct classification for the example. Empirical evaluation on a range of text-categorization tasks shows that our algorithms improve over other state-of-the-art online and batch methods, learn faster in the online setting, and lead to better classifier combination for a type of distributed training commonly used in cloud computing.
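
As a rough illustration of the probabilistic constraint described in the abstract, the sketch below computes the probability that a weight vector drawn from a Gaussian classifies an example correctly, and checks it against a confidence threshold. It is not the authors' update rule: the diagonal covariance, the function names, and the threshold name `eta` are all assumptions made for the example. It relies only on the fact that if w ~ N(mu, Sigma), the signed margin y(w · x) is itself Gaussian with mean y(mu · x) and variance x^T Sigma x.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_correct(mu, sigma_diag, x, y):
    """Probability that y * (w . x) > 0 when w ~ N(mu, diag(sigma_diag)).

    mu         -- mean weight vector (list of floats)
    sigma_diag -- per-weight variances (assumed diagonal covariance)
    x          -- feature vector
    y          -- label, +1 or -1
    """
    mean_margin = y * sum(m * xi for m, xi in zip(mu, x))
    var_margin = sum(s * xi * xi for s, xi in zip(sigma_diag, x))
    if var_margin == 0.0:
        # No uncertainty: the margin is deterministic (ties count as wrong).
        return 1.0 if mean_margin > 0 else 0.0
    return normal_cdf(mean_margin / math.sqrt(var_margin))


def satisfies_constraint(mu, sigma_diag, x, y, eta=0.9):
    """True if the example is classified correctly with probability >= eta."""
    return prob_correct(mu, sigma_diag, x, y) >= eta


# Same mean weights, two different levels of confidence in the first weight.
x, y = [1.0, 0.0], 1
uncertain = prob_correct([1.0, 0.0], [1.0, 1.0], x, y)   # variance 1.0
confident = prob_correct([1.0, 0.0], [0.25, 1.0], x, y)  # variance 0.25
print(round(uncertain, 4), round(confident, 4))
```

With the mean held fixed, shrinking the variance of the weight on an active feature raises the probability of a correct prediction. In this sketch, that is the sense in which the distribution encodes confidence: rarely seen features keep a large variance, so their weights remain free to change substantially.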