Efficient market mechanisms and simulation-based learning for multi-agent systems

  • Authors:
  • Rahul Jain; Pravin P. Varaiya

  • Affiliations:
  • University of California, Berkeley; University of California, Berkeley

  • Venue:
  • Ph.D. dissertation, University of California, Berkeley
  • Year:
  • 2004


Abstract

This dissertation comprises two independent theses. In the first part, we study the design of auction-based distributed mechanisms for resource allocation in multi-agent systems such as bandwidth allocation, network routing, electronic marketplaces, robot teams, and air-traffic control. The work is motivated by a resource-allocation problem in communication networks with independent, selfish buyers and sellers of bandwidth: buyers want routes, while sellers offer bandwidth on individual links (we call such markets combinatorial). We first investigate the existence of competitive equilibrium in combinatorial markets and show how network topology affects it. We then adopt Aumann's continuum exchange economy as a model of perfect competition and show the existence of competitive equilibrium in it when money is also a good, assuming preferences are continuous and monotonic in money. The existence of competitive equilibrium in the continuum combinatorial market is then used to establish the existence of various enforceable and non-enforceable approximate competitive equilibria in finite markets. We next propose a combinatorial market mechanism, c-SeBiDA, and study the interaction between buyers and sellers when they act strategically and may not be truthful. We show that a Nash equilibrium exists in the c-SeBiDA auction game with complete information and, more surprisingly, that the resulting allocation is efficient. Since in reality the players may have incomplete information, we also consider the Bayesian-Nash equilibrium: when there is only one type of good, we show that the mechanism is asymptotically Bayesian incentive-compatible under the ex post individual rationality constraint, and hence asymptotically efficient. In the second part, we take the multi-agent pursuit-evasion game as the motivating problem and study simulation-based learning for partially observable Markov decision processes (MDPs) and games.
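The single-good special case discussed in the first part can be illustrated with a toy double-auction clearing rule: sort buy bids descending and sell asks ascending, and match pairs as long as the bid covers the ask. This sketch only shows the matching step; the payment rule of the actual c-SeBiDA mechanism is not reproduced here, and the function name and interface are illustrative assumptions.

```python
def clear_double_auction(bids, asks):
    """Match buy bids against sell asks for a single type of good.

    bids: list of buyer limit prices; asks: list of seller limit prices.
    Greedily pairs the highest remaining bid with the lowest remaining
    ask while the bid is at least the ask, and returns the matched
    (bid, ask) pairs. This maximizes the number of trades with
    non-negative surplus in the single-good case.
    """
    matched = []
    for bid, ask in zip(sorted(bids, reverse=True), sorted(asks)):
        if bid >= ask:
            matched.append((bid, ask))
        else:
            break  # remaining bids are lower, remaining asks higher
    return matched

# Example: bids 10 and 7 clear against asks 2 and 5; the bid of 3
# cannot cover the remaining ask of 8 and goes unmatched.
trades = clear_double_auction([10, 7, 3], [2, 5, 8])
# trades == [(10, 2), (7, 5)]
```

In the combinatorial setting, a buyer's bid is instead for a bundle of links, and clearing becomes a surplus-maximizing matching problem rather than a simple sort-and-pair.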
The value function of a Markov decision process assigns to each policy its expected discounted reward, which can be estimated as the empirical average of the reward over many independent simulation runs. We derive bounds on the number of runs needed for the empirical average to converge to the expected reward uniformly over a class of policies, in terms of the Vapnik-Chervonenkis dimension or pseudo-dimension of the policy class. These results are extended to partially observed processes and to Markov games. Uniform convergence results are also obtained for the average-reward case, the only such results known in the literature. (Abstract shortened by UMI.)
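The simulation-based estimator described above can be sketched as follows: roll out a policy for a finite horizon, accumulate discounted reward, and average over independent runs. The function names, the finite-horizon truncation, and the toy chain in the usage example are illustrative assumptions, not the dissertation's construction.

```python
def simulate_policy(policy, transition, reward, start, gamma=0.9, horizon=50):
    """Run one episode and return its discounted cumulative reward."""
    state, total, discount = start, 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        total += discount * reward(state, action)
        state = transition(state, action)
        discount *= gamma
    return total

def estimate_value(policy, transition, reward, start, n_runs=1000, **kw):
    """Empirical average of the discounted reward over independent runs."""
    runs = [simulate_policy(policy, transition, reward, start, **kw)
            for _ in range(n_runs)]
    return sum(runs) / n_runs

# Toy deterministic chain: staying in state 1 yields reward 1 every step,
# so the estimate equals the geometric series (1 - 0.9**50) / (1 - 0.9).
v = estimate_value(policy=lambda s: "stay",
                   transition=lambda s, a: s,
                   reward=lambda s, a: 1.0 if s == 1 else 0.0,
                   start=1, n_runs=100)
```

The uniform-convergence results then bound, via the pseudo-dimension of the policy class, how many such runs suffice for the estimate to be close to the true value simultaneously for every policy in the class, not just the one being simulated.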