Evolving policies for multi-reward partially observable Markov decision processes (MR-POMDPs)

  • Authors:
  • Harold Soh; Yiannis Demiris

  • Affiliations:
  • Imperial College London, London, United Kingdom; Imperial College London, London, United Kingdom

  • Venue:
  • Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation (GECCO '11)
  • Year:
  • 2011

Abstract

Plans and decisions in many real-world scenarios are made under uncertainty and to satisfy multiple, possibly conflicting, objectives. In this work, we contribute the multi-reward partially observable Markov decision process (MR-POMDP) as a general modelling framework. To solve MR-POMDPs, we present two hybrid (memetic) multi-objective evolutionary algorithms that generate non-dominated sets of policies (in the form of stochastic finite-state controllers). Performance comparisons between the methods on multi-objective problems in robotics (with 2, 3 and 5 objectives), web advertising (with 3, 4 and 5 objectives) and infectious disease control (with 3 objectives) revealed that the memetic variants outperformed their original counterparts. We anticipate that the MR-POMDP, along with multi-objective evolutionary solvers, will prove useful in a variety of theoretical and real-world applications.
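
The abstract's two central ingredients, Pareto non-dominance over vector-valued policy returns and policies represented as stochastic finite-state controllers, can be made concrete with a short sketch. This is an illustrative assumption-laden reading, not the paper's implementation: the names (`pareto_front`, `StochasticFSC`), array shapes, and the example values below are all hypothetical.

```python
import numpy as np

def dominates(u, v):
    """True if value vector u Pareto-dominates v: u is no worse in every
    objective and strictly better in at least one."""
    u, v = np.asarray(u), np.asarray(v)
    return bool(np.all(u >= v) and np.any(u > v))

def pareto_front(values):
    """Filter per-policy value vectors down to the non-dominated set."""
    return [u for i, u in enumerate(values)
            if not any(dominates(v, u) for j, v in enumerate(values) if j != i)]

class StochasticFSC:
    """Sketch of a stochastic finite-state controller (hypothetical layout):
    each node holds a distribution over actions, and each (node, observation)
    pair holds a distribution over successor nodes."""
    def __init__(self, action_probs, node_trans, rng=None):
        self.action_probs = np.asarray(action_probs)  # shape (N, |A|)
        self.node_trans = np.asarray(node_trans)      # shape (N, |O|, N)
        self.rng = rng or np.random.default_rng()
        self.node = 0  # start node

    def act(self):
        # Sample an action from the current node's action distribution.
        return self.rng.choice(self.action_probs.shape[1],
                               p=self.action_probs[self.node])

    def observe(self, obs):
        # Stochastically transition to a successor node given the observation.
        self.node = self.rng.choice(self.node_trans.shape[2],
                                    p=self.node_trans[self.node, obs])

# Example: three candidate policies evaluated on two objectives.
values = [np.array([1.0, 3.0]), np.array([2.0, 2.0]), np.array([0.5, 1.0])]
print(pareto_front(values))  # third vector is dominated, so two remain
```

In an evolutionary solver of the kind the abstract describes, the controller parameters would be varied by the search operators, each candidate's value vector estimated per objective, and a filter like `pareto_front` used to retain the non-dominated set.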