A stochastic approach to sensor fusion and perception control
IEA/AIE '90 Proceedings of the 3rd international conference on Industrial and engineering applications of artificial intelligence and expert systems - Volume 1
Computational & Mathematical Organization Theory
A Unified Analysis of Value-Function-Based Reinforcement Learning Algorithms
Neural Computation
DEA: An Architecture for Goal Planning and Classification
Neural Computation
Input generalization in delayed reinforcement learning: an algorithm and performance comparisons
IJCAI'91 Proceedings of the 12th international joint conference on Artificial intelligence - Volume 2
Anytime problem solving using dynamic programming
AAAI'91 Proceedings of the ninth National conference on Artificial intelligence - Volume 2
Complexity analysis of real-time reinforcement learning
AAAI'93 Proceedings of the eleventh national conference on Artificial intelligence
In this report we show how the class of adaptive prediction methods that Sutton called "temporal difference", or TD, methods are related to the theory of sequential decision making. TD methods have been used as "adaptive critics" in connectionist learning systems, and have been proposed as models of animal learning in classical conditioning experiments. Here we relate TD methods to decision tasks formulated in terms of a stochastic dynamical system whose behavior unfolds over time under the influence of a decision maker's actions. Strategies are sought for selecting actions so as to maximize a measure of long-term payoff gain. Mathematically, tasks such as this can be formulated as Markovian decision problems, and numerous methods have been proposed for learning how to solve such problems. We show how a TD method can be understood as a novel synthesis of concepts from the theory of stochastic dynamic programming, which comprises the standard method for solving such tasks when a model of the dynamical system is available, and the theory of parameter estimation, which provides the appropriate context for studying learning rules in the form of equations for updating associative strengths in behavioral models, or connection weights in connectionist networks. Because this report is oriented primarily toward the non-engineer interested in animal learning, it presents tutorials on stochastic sequential decision tasks, stochastic dynamic programming, and parameter estimation