Automatic induction of Bellman-error features for probabilistic planning
Journal of Artificial Intelligence Research
A form of temporal-difference learning is presented that learns the relative utility of states rather than their absolute utility. This formulation backs up decisions instead of values, making it possible to learn a simpler function for defining a decision-making policy. A nonlinear relative value function can be learned without increasing the dimensionality of the inputs.
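The abstract's central observation, that a greedy policy depends only on comparisons between backed-up values and never on their absolute scale, can be illustrated with ordinary tabular TD(0). The sketch below is not the paper's algorithm; the chain environment, state count, and learning parameters are illustrative assumptions chosen to show why relative utilities suffice for decision-making.

```python
import random

random.seed(0)

# Hypothetical 5-state chain: actions -1 (left) / +1 (right),
# reward 1.0 only on reaching the rightmost (absorbing) state.
N_STATES = 5
GOAL = N_STATES - 1

def step(s, a):
    """Return (next_state, reward, done) for action a in state s."""
    s2 = max(0, min(GOAL, s + a))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

V = [0.0] * N_STATES          # tabular value estimates
alpha, gamma = 0.1, 0.95

# Standard TD(0) under a uniformly random behavior policy.
for _ in range(2000):
    s, done = 0, False
    while not done:
        a = random.choice([-1, 1])
        s2, r, done = step(s, a)
        target = r + (0.0 if done else gamma * V[s2])
        V[s] += alpha * (target - V[s])
        s = s2

def greedy(s):
    """Pick the action whose one-step backup is larger.

    Only the *comparison* between the two backed-up values matters:
    adding any constant to every V[s] leaves this decision unchanged,
    which is the sense in which a relative value function is enough.
    """
    def backup(a):
        s2, r, done = step(s, a)
        return r + (0.0 if done else gamma * V[s2])
    return 1 if backup(1) > backup(-1) else -1
```

After training, `greedy` points toward the goal from every interior state even though the learned values are only meaningful up to the policy-invariant transformations the comparison ignores.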