Learning a decision maker's utility function from (possibly) inconsistent behavior

  • Authors:
  • Thomas D. Nielsen; Finn V. Jensen

  • Affiliations:
  • Department of Computer Science, Aalborg University, Fredrik Bajers Vej 7E, DK-9220 Aalborg Ø, Denmark (both authors)

  • Venue:
  • Artificial Intelligence
  • Year:
  • 2004

Abstract

When modeling a decision problem using the influence diagram framework, the quantitative part rests on two principal components: probabilities for representing the decision maker's uncertainty about the domain, and utilities for representing preferences. Over the last decade, several methods have been developed for learning the probabilities from a database. However, methods for learning the utilities have received only limited attention in the computer science community. A promising approach for learning a decision maker's utility function is to start from the decision maker's observed behavioral patterns and then find a utility function which (together with a domain model) can explain this behavior; that is, it is assumed that the decision maker's preferences are reflected in the behavior. Standard learning algorithms also assume that the decision maker is behaviorally consistent, i.e., given a model of the decision problem, there exists a utility function which can account for all the observed behavior. Unfortunately, this assumption is rarely valid in real-world decision problems, and in such situations existing learning methods may identify only a trivial utility function. In this paper we relax this consistency assumption and propose two algorithms for learning a decision maker's utility function from possibly inconsistent behavior; inconsistent behavior is interpreted as random deviations from an underlying (true) utility function. The main difference between the two algorithms is that the first facilitates a form of batch learning, whereas the second focuses on adaptation and is particularly well suited for scenarios where the decision maker's preferences change over time. Empirical results demonstrate the tractability of the algorithms and show that they converge toward the true utility function even for very small sets of observations.
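To make the core idea concrete, here is a minimal sketch of learning a utility function from possibly inconsistent choices. It assumes a softmax (Boltzmann) noise model and batch maximum-likelihood estimation, which is one common way to formalize "random deviations from an underlying (true) utility function"; this is a toy illustration under those assumptions, not the paper's actual algorithms.

```python
# Toy sketch: recover a hidden utility function from noisy observed choices.
# Assumptions (not from the paper): deviations follow a softmax (Boltzmann)
# noise model, and utilities are fit by batch maximum likelihood.
import numpy as np

rng = np.random.default_rng(0)

n_outcomes = 4
true_u = rng.normal(size=n_outcomes)  # hidden "true" utility function

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Simulate observed behavior: the decision maker usually (but not always)
# chooses the outcome with the highest utility.
def simulate_choices(u, n_obs, temperature=1.0):
    return rng.choice(len(u), size=n_obs, p=softmax(u / temperature))

choices = simulate_choices(true_u, n_obs=50)

# Batch learning: gradient ascent on the log-likelihood of the choices.
u_hat = np.zeros(n_outcomes)
lr = 0.1
for _ in range(500):
    p = softmax(u_hat)
    counts = np.bincount(choices, minlength=n_outcomes)
    grad = counts - len(choices) * p  # d log-likelihood / d u
    u_hat += lr * grad / len(choices)

# Under this noise model utilities are identified only up to an additive
# shift, so compare centered values.
print("true (centered):   ", np.round(true_u - true_u.mean(), 2))
print("learned (centered):", np.round(u_hat - u_hat.mean(), 2))
```

As in the paper's empirical findings, even a modest number of observations tends to pull the estimate toward the underlying utility function; the adaptive setting described in the abstract would instead update the estimate incrementally as new choices arrive.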