Greedy Utile Suffix Memory for Reinforcement Learning with Perceptually-Aliased States

  • Authors:
  • Leonard Breslow

  • Affiliations:
  • -

  • Venue:
  • -
  • Year:
  • 1996

Abstract

Reinforcement learning agents face the problem of perceptual aliasing when two or more states are perceptually identical but require different actions. Purely reactive policies cannot achieve optimal performance in such situations. To address this problem, various researchers have incorporated memory of preceding events into the definition of states in order to distinguish perceptually-aliased states. Approaches to differentiating aliased states engage in two concurrent, interacting learning processes: learning the correct state representation, and reinforcement learning of the correct policy of actions to take from each state.

Recently, McCallum (1995b) offered Utile Suffix Memory (USM), an instance-based algorithm that uses a tree to store instances and to represent states for reinforcement learning. USM's online, instance-based state learning permits state definitions to be updated quickly based on the latest results of reinforcement learning. USM uses statistical tests to determine the relevance of history information considered for inclusion in state definitions. However, USM conducts many unnecessary statistical comparisons, making it vulnerable to false-positive errors that produce state distinctions that are not useful and over-branching of the state tree. The algorithm cannot correct such errors, since it does not prune the state tree. Over-branching is particularly serious when the algorithm is applied to tasks in which some aliased states cannot be differentiated on the basis of the event immediately preceding the current observation (i.e., at time t-1) but only on the basis of earlier events (e.g., at t-2 or t-3).

Greedy Utile Suffix Memory (GUSM) addresses these concerns through several modifications of USM: greedy state splitting, incremental state splitting, and the restriction of statistical comparisons to potentially useful differences. GUSM is shown to learn action policies faster than USM and to generate smaller state spaces (i.e., more correctly-sized trees).
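The suffix-memory idea underlying USM and GUSM can be illustrated with a minimal sketch: each state is identified by a suffix of the recent observation history, and "splitting" a state deepens that suffix one step at a time, as in GUSM's incremental splitting. All names below are illustrative assumptions, not the authors' implementation; the sketch omits the instance tree and the statistical tests that decide when a split is actually useful.

```python
from collections import defaultdict


class SuffixMemory:
    """Toy suffix-based state representation (illustrative only).

    A state is keyed by a tuple suffix of the observation history.
    Deepening the suffix for an observation is how suffix-memory
    methods separate perceptually-aliased observations.
    """

    def __init__(self):
        # Per-observation suffix depth; depth 1 means purely reactive
        # (the state is just the current observation).
        self.depths = {}
        # Q-values per state key (single-action toy placeholder).
        self.q = defaultdict(float)

    def state(self, history):
        """Map a history (oldest first, newest last) to a state key."""
        obs = history[-1]
        depth = self.depths.get(obs, 1)
        return tuple(history[-depth:])

    def split(self, obs):
        """Greedily deepen the suffix for one observation by a single
        step of history (incremental splitting)."""
        self.depths[obs] = self.depths.get(obs, 1) + 1


# Two histories ending in the same observation are aliased until split.
sm = SuffixMemory()
h1 = ["left-door", "junction"]
h2 = ["right-door", "junction"]
print(sm.state(h1) == sm.state(h2))  # aliased before splitting
sm.split("junction")
print(sm.state(h1) == sm.state(h2))  # distinguished after splitting
```

After the split, the two histories map to distinct states (`("left-door", "junction")` versus `("right-door", "junction")`), so reinforcement learning can assign each its own action.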