Single-pass online learning: performance, voting schemes and online feature selection

Authors:
Vitor R. Carvalho; William W. Cohen
Affiliations:
Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA
Venue:
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2006

Abstract

To learn concepts over massive data streams, it is essential to design inference and learning methods that operate in real time with limited memory. Online learning methods such as the perceptron or Winnow are naturally suited to stream processing; in practice, however, multiple passes over the same training data are required to achieve accuracy comparable to state-of-the-art batch learners. In this work we address the problem of training an online learner with a single pass over the data. We evaluate several existing methods and also propose a new modification of Margin Balanced Winnow, which has performance comparable to a linear SVM. We also explore the effect of averaging, a.k.a. voting, on online learning. Finally, we describe how the new Modified Margin Balanced Winnow algorithm can be naturally adapted to perform feature selection. This scheme performs comparably to widely used batch feature selection methods such as information gain or chi-square, with the advantage of being able to select features on the fly. Taken together, these techniques allow single-pass online learning to be competitive with batch techniques while still maintaining the advantages of online learning.
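
As a rough illustration of the single-pass setting and of averaging, the sketch below trains a plain perceptron in one pass over a stream and also returns the average of the weight vectors seen after each example. It is not the Modified Margin Balanced Winnow algorithm proposed in the paper, whose update rule the abstract does not give. The names `train_single_pass`, `predict`, and `dot`, the sparse-dictionary feature encoding, and the labels in {-1, +1} are all assumptions made for this example.

```python
from typing import Dict, Iterable, Tuple

# Assumed encoding: sparse feature dict and a label in {-1, +1}.
Example = Tuple[Dict[str, float], int]


def dot(w: Dict[str, float], x: Dict[str, float]) -> float:
    """Sparse dot product between a weight vector and a feature vector."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def predict(w: Dict[str, float], x: Dict[str, float]) -> int:
    """Linear threshold prediction; a score of exactly 0 maps to +1."""
    return 1 if dot(w, x) >= 0 else -1


def train_single_pass(
    stream: Iterable[Example],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """One pass of the perceptron; returns (final weights, averaged weights)."""
    w: Dict[str, float] = {}      # current weight vector
    w_sum: Dict[str, float] = {}  # running sum of weight vectors
    n = 0                         # number of examples seen

    for x, y in stream:
        # Mistake-driven update: change w only when the example is
        # misclassified (or lies exactly on the decision boundary).
        if y * dot(w, x) <= 0:
            for f, v in x.items():
                w[f] = w.get(f, 0.0) + y * v
        # Naive averaging: add the current weights after every example.
        # This costs O(|w|) per example; it is kept simple for clarity.
        for f, v in w.items():
            w_sum[f] = w_sum.get(f, 0.0) + v
        n += 1

    w_avg = {f: v / n for f, v in w_sum.items()} if n else {}
    return w, w_avg


if __name__ == "__main__":
    toy_stream = [
        ({"a": 1.0}, 1),
        ({"b": 1.0}, -1),
        ({"a": 1.0, "b": 0.5}, 1),
        ({"a": 0.2, "b": 1.0}, -1),
    ]
    final_w, avg_w = train_single_pass(toy_stream)
    print(final_w, avg_w)
```

Each example is seen once and then discarded, so memory grows only with the number of distinct features. The averaged vector weights early hypotheses by how long they survived, which is one common way to approximate voting with a single linear classifier.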