Toward guidelines for modeling learning agents in multiagent-based simulation: implications from Q-learning and Sarsa agents

  • Authors: Keiki Takadama; Hironori Fujita
  • Affiliations: Tokyo Institute of Technology, Yokohama, Japan; Hitotsubashi University, Tokyo, Japan
  • Venue: MABS'04 Proceedings of the 2004 international conference on Multi-Agent and Multi-Agent-Based Simulation
  • Year: 2004

Abstract

This paper focuses on how sensitive simulation results are to agent modeling in multiagent-based simulation (MABS) and investigates this sensitivity by comparing results from agents with different learning mechanisms, namely Q-learning and Sarsa, in the context of reinforcement learning. Through an analysis of simulation results in a bargaining game, one of the canonical examples in game theory, the following implications are revealed: (1) even a slight difference in the learning mechanism has an essential influence on simulation results; (2) testing in static and dynamic environments reveals different tendencies in the results; and (3) both Q-learning and Sarsa agents pass through three stages (i.e., (a) competition, (b) cooperation, and (c) learning impossible) in the dynamic environment, whereas no such stages are found in the static environment. From these three implications, the following very rough guidelines for modeling agents are derived: (1) cross-element validation for specifying the key factors that affect simulation results; (2) a comparison of results between the static and dynamic environments for determining which candidates should be investigated in detail; and (3) sensitivity analysis for specifying the applicable range of learning agents.
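For context on the "slight difference" referred to above, the standard update rules of the two mechanisms are sketched below; they differ only in the temporal-difference target, which uses the greedy next action in Q-learning and the action actually selected by the behavior policy in Sarsa. The learning rate $\alpha$ and discount factor $\gamma$ are the usual textbook symbols, not parameter values taken from this paper.

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right] \quad \text{(Q-learning)}$$

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right] \quad \text{(Sarsa)}$$

Because Sarsa's target depends on the action $a_{t+1}$ actually chosen (e.g., under an $\varepsilon$-greedy policy), its learned values reflect the exploration behavior, whereas Q-learning's target is always the greedy estimate; this is the kind of small modeling choice whose impact the paper examines.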