We propose a novel classification technique whose aim is to select an appropriate representation for each datapoint, in contrast to the usual approach of selecting one representation for the whole dataset. This datum-wise representation is found by minimizing a sparsity-inducing empirical risk, which is a relaxation of the standard L0-regularized risk. The classification problem is modeled as a sequential decision process that, for each datapoint, chooses which features to use before classifying. Datum-Wise Classification extends naturally to multi-class tasks, and we describe a specific case where our inference has complexity equivalent to a traditional linear classifier while still using a variable number of features. We compare our classifier to classical L1-regularized linear models (L1-SVM and LARS) on a set of common binary and multi-class datasets and show that, for an equal average number of features used, our method achieves improved performance.
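The sequential decision process described above can be illustrated with a minimal sketch. This is not the authors' algorithm: the learned stopping policy is replaced here by a hypothetical confidence threshold on a partial linear score, and features are acquired greedily by weight magnitude, so only the datum-wise idea is shown, that each datapoint may use a different number of features before being classified.

```python
import numpy as np

def sequential_classify(x, weights, bias=0.0, threshold=1.0):
    """Classify x by acquiring features one at a time.

    Features are added in order of decreasing |weight| (a stand-in for a
    learned acquisition policy); we stop as soon as the partial score is
    confident enough (|score| >= threshold, a stand-in for the learned
    stopping action). Returns the label and the features actually used.
    """
    order = np.argsort(-np.abs(weights))  # most informative feature first
    score = bias
    used = []
    for j in order:
        score += weights[j] * x[j]        # "acquire" feature j
        used.append(int(j))
        if abs(score) >= threshold:       # stop-and-classify decision
            break
    return (1 if score >= 0 else -1), used

# Toy example: a confident datapoint is classified from a single feature,
# so the effective representation is datum-wise sparse.
w = np.array([2.0, 0.1, -1.5, 0.05])
x = np.array([1.0, 1.0, -1.0, 1.0])
label, used = sequential_classify(x, w, threshold=1.0)
```

In this toy run the first acquired feature already pushes the score past the threshold, so the point is classified using one feature out of four; a point nearer the decision boundary would keep acquiring features, which is the variable-sparsity behavior the abstract describes.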