Efficient learning of relational models for sequential decision making

  • Authors:
  • Michael L. Littman; Thomas J. Walsh

  • Affiliations:
  • Rutgers The State University of New Jersey - New Brunswick; Rutgers The State University of New Jersey - New Brunswick

  • Venue:
  • Efficient learning of relational models for sequential decision making
  • Year:
  • 2010

Abstract

The exploration-exploitation tradeoff is crucial to reinforcement-learning (RL) agents, and a significant number of sample complexity results have been derived for agents in propositional domains. These results guarantee, with high probability, near-optimal behavior in all but a polynomial number of timesteps in the agent’s lifetime. In this work, we prove similar results for certain relational representations, primarily a class we call “relational action schemas”. These generalized models allow us to specify state transitions in a compact form, for instance describing the effect of picking up a generic block instead of picking up 10 different specific blocks. We present theoretical results on crucial subproblems in action-schema learning using the KWIK framework, which allows us to characterize the sample efficiency of an agent learning these models in a reinforcement-learning setting.

These results are extended in an apprenticeship learning paradigm where an agent has access not only to its environment, but also to a teacher that can demonstrate traces of state/action/state sequences. We show that the class of action schemas that are efficiently learnable in this paradigm is strictly larger than those learnable in the online setting. We link the class of efficiently learnable dynamics in the apprenticeship setting to a rich class of models derived from well-known learning frameworks.

As an application, we present theoretical and empirical results on learning relational models of web-service descriptions using a dataflow model called a Task Graph to capture the important connections between inputs and outputs of services in a workflow, with experiments constructed using publicly available web services. This application shows that compact relational models can be efficiently learned from limited amounts of basic data.

Finally, we present several extensions of the main results in the thesis, including expansions of the languages with Description Logics. We also explore the use of sample-based planners to speed up the computation time of our algorithms.