Time-based reward shaping in real-time strategy games

Authors:
Martin Midtgaard;Lars Vinther;Jeppe R. Christiansen;Allan M. Christensen;Yifeng Zeng
Affiliations:
Aalborg University, Denmark;Aalborg University, Denmark;Aalborg University, Denmark;Aalborg University, Denmark;Aalborg University, Denmark
Venue:
ADMI'10 Proceedings of the 6th international conference on Agents and data mining interaction
Year:
2010

Abstract

Real-Time Strategy (RTS) games are a challenging domain for AI: they involve not only a large state space but also durative actions that agents execute concurrently. General Q-learning techniques cannot solve this problem optimally, so we propose a solution based on a Semi-Markov Decision Process (SMDP). We present a time-based reward shaping technique, TRS, that speeds up learning in reinforcement learning. In particular, we show that the technique preserves solution optimality for some SMDP problems. We evaluate the method in the Spring game Balanced Annihilation and report benchmark results on its performance.
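For orientation, the standard SMDP form of the Q-learning update can be sketched as follows. This is a generic textbook-style illustration, not the TRS technique itself, whose details the abstract does not give. The function name `smdp_q_update`, the state and action labels, and the parameter values are all hypothetical. The sketch assumes `reward` is the total reward accumulated while the action ran and `tau` is the number of time steps it took.

```python
from collections import defaultdict


def smdp_q_update(q, state, action, reward, next_state, next_actions,
                  tau, alpha=0.1, gamma=0.9):
    """Apply one SMDP Q-learning update for an action lasting `tau` steps.

    Assumption: `reward` is the reward accumulated over the whole action.
    The only change from one-step Q-learning is that the bootstrap term is
    discounted by gamma ** tau instead of gamma.
    """
    # Value of the best follow-up action (0.0 if the next state is terminal).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + (gamma ** tau) * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: a "move" action that took 3 time steps.
q = defaultdict(float)
q[("s1", "build")] = 10.0
new_value = smdp_q_update(q, "s0", "move", reward=1.0, next_state="s1",
                          next_actions=["build"], tau=3, alpha=0.5, gamma=0.5)
```

Because the discount depends on `tau`, two actions that reach the same state are valued differently when one takes longer, which is the property that makes action duration matter in an SMDP setting.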