In many multiclass learning scenarios, the number of classes is large (in the thousands or more), and the space and time efficiency of the learning system can be crucial. We investigate two online update techniques especially suited to such problems. These updates share a sparsity-preserving property: they allow the number of prediction connections each feature can make to be constrained. We show that one method, the exponential moving average, solves a "discrete" regression problem for each feature, moving the weights in the direction that minimizes the quadratic loss. We design the other method to reduce a hinge loss subject to constraints, for better accuracy. We explore the methods empirically and compare their performance to previous indexing techniques developed with the same goals, as well as to other online algorithms based on prototype learning. We observe that the classification accuracies are very promising, improving on previous indexing techniques, while the scalability benefits are preserved.
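The idea of a sparsity-preserving per-feature update can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the class name, the indicator-style quadratic-loss target, and the magnitude-based pruning rule are all assumptions made for the example; the paper's methods may differ in their targets and constraints.

```python
# Hedged sketch: an exponential-moving-average update for a sparse
# multiclass linear predictor, where each feature may keep prediction
# connections to at most `max_links` classes. Illustrative only.
from collections import defaultdict

class SparseEMAClassifier:
    def __init__(self, n_classes, alpha=0.1, max_links=5):
        self.n_classes = n_classes
        self.alpha = alpha          # EMA smoothing rate
        self.max_links = max_links  # cap on connections per feature
        self.w = defaultdict(dict)  # feature -> {class: weight}

    def scores(self, x):
        # x is a sparse example: dict mapping feature -> value.
        s = [0.0] * self.n_classes
        for f, v in x.items():
            for c, wt in self.w.get(f, {}).items():
                s[c] += wt * v
        return s

    def update(self, x, y):
        # Move each active feature's weights toward an indicator target
        # for the true class y: a per-feature "discrete" regression step
        # that decreases the quadratic loss (an assumed concrete target).
        for f, v in x.items():
            row = self.w[f]
            for c in set(row) | {y}:
                target = v if c == y else 0.0
                row[c] = (1 - self.alpha) * row.get(c, 0.0) + self.alpha * target
            # Enforce the sparsity budget: keep only the largest weights.
            if len(row) > self.max_links:
                keep = sorted(row, key=lambda c: abs(row[c]),
                              reverse=True)[: self.max_links]
                self.w[f] = {c: row[c] for c in keep}

    def predict(self, x):
        s = self.scores(x)
        return max(range(self.n_classes), key=lambda c: s[c])
```

Because each feature's row never exceeds `max_links` entries, both memory and per-example prediction time stay bounded regardless of the total number of classes, which is the scalability property the abstract emphasizes.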