Learning a decision maker's utility function from (possibly) inconsistent behavior

  • Authors:
  • Thomas D. Nielsen; Finn V. Jensen

  • Affiliations:
  • Department of Computer Science, Aalborg University, Fredrik Bajers Vej 7E, DK-9220 Aalborg Ø, Denmark (both authors)

  • Venue:
  • Artificial Intelligence
  • Year:
  • 2004

Abstract

When modeling a decision problem using the influence diagram framework, the quantitative part rests on two principal components: probabilities for representing the decision maker's uncertainty about the domain, and utilities for representing preferences. Over the last decade, several methods have been developed for learning the probabilities from a database. However, methods for learning the utilities have received only limited attention in the computer science community. A promising approach for learning a decision maker's utility function is to start from the decision maker's observed behavioral patterns and then find a utility function which (together with a domain model) can explain this behavior; that is, it is assumed that the decision maker's preferences are reflected in the behavior. Standard learning algorithms also assume that the decision maker is behaviorally consistent, i.e., given a model of the decision problem, there exists a utility function which can account for all the observed behavior. Unfortunately, this assumption is rarely valid in real-world decision problems, and in such situations existing learning methods may identify only a trivial utility function. In this paper we relax this consistency assumption and propose two algorithms for learning a decision maker's utility function from possibly inconsistent behavior; inconsistent behavior is interpreted as random deviations from an underlying (true) utility function. The main difference between the two algorithms is that the first facilitates a form of batch learning, whereas the second focuses on adaptation and is particularly well suited for scenarios where the decision maker's preferences change over time. Empirical results demonstrate the tractability of the algorithms and show that they converge toward the true utility function even for very small sets of observations.
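To make the core idea concrete, here is a minimal sketch of learning a utility function from possibly inconsistent choices. It assumes a softmax (Boltzmann) noise model and batch maximum-likelihood estimation, which is one common way to formalize "random deviations from an underlying (true) utility function"; this is a toy illustration under those assumptions, not the paper's actual algorithms.

```python
# Toy sketch: recover a hidden utility function from noisy observed choices.
# Assumptions (not from the paper): deviations follow a softmax (Boltzmann)
# noise model, and utilities are fit by batch maximum likelihood.
import numpy as np

rng = np.random.default_rng(0)

n_outcomes = 4
true_u = rng.normal(size=n_outcomes)  # hidden "true" utility function

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Simulate observed behavior: the decision maker usually (but not always)
# chooses the outcome with the highest utility.
def simulate_choices(u, n_obs, temperature=1.0):
    return rng.choice(len(u), size=n_obs, p=softmax(u / temperature))

choices = simulate_choices(true_u, n_obs=50)

# Batch learning: gradient ascent on the log-likelihood of the choices.
u_hat = np.zeros(n_outcomes)
lr = 0.1
for _ in range(500):
    p = softmax(u_hat)
    counts = np.bincount(choices, minlength=n_outcomes)
    grad = counts - len(choices) * p  # d log-likelihood / d u
    u_hat += lr * grad / len(choices)

# Under this noise model utilities are identified only up to an additive
# shift, so compare centered values.
print("true (centered):   ", np.round(true_u - true_u.mean(), 2))
print("learned (centered):", np.round(u_hat - u_hat.mean(), 2))
```

As in the paper's empirical findings, even a modest number of observations tends to pull the estimate toward the underlying utility function; the adaptive setting described in the abstract would instead update the estimate incrementally as new choices arrive.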