Learning situation dependent success rates of actions in a RoboCup scenario

  • Authors:
  • Sebastian Buck; Martin Riedmiller

  • Affiliations:
  • Munich University of Technology, Computer Science Department, München, FRG; University of Karlsruhe, Institute for Logic, Complexity and Deductive Systems, Karlsruhe, FRG

  • Venue:
  • PRICAI'00 Proceedings of the 6th Pacific Rim International Conference on Artificial Intelligence
  • Year:
  • 2000

Abstract

A quickly changing, unpredictable environment complicates autonomous decision making in a system of mobile robots. To simplify action selection we suggest a suitable reduction of the decision space by restricting the number of executable actions the agent can choose from. We use supervised neural learning to automatically learn success rates of actions and thereby facilitate decision making. To determine probabilities of success, each agent relies on its sensory data. We show that with our approach it is possible to compute probabilities of success close to the real success rates of actions, and we further give some results of games played by a RoboCup simulation team based on this approach.

The RoboCup soccer server offers a small set of low-level commands from which soccer agents choose every 100 ms. Essentially they have the following options: turn(angle), dash(power), kick(power, angle). If we treat the task of playing soccer as an optimization problem, the aim is to control our agents in the given environment such that they score more goals than the opponent team does. We can estimate the number of possible policies by discretising the angle and power values of the low-level commands: assuming 72 possible angles (5-degree steps) to turn to or kick to and 10 power levels to dash or kick with, we get 802 different commands to choose from for a player possessing the ball at one time step. Over a period of five minutes, i.e. 3000 time steps of 100 ms, this yields up to 802^3000 different policies for a single agent (the short computation below verifies these figures). This forces us to reduce the number of possible choices per time step.

To do this we introduce a number of actions such as pass, shoot2goal or go2ball from which the agent can choose. We compute explicit situation-dependent success rates for these actions using neural networks (one for each action). From all promising actions (those whose estimated success rate exceeds a threshold), the one ranked highest in a priority list (shoot2goal is ranked higher than pass...) is chosen to be executed; a minimal sketch of this selection scheme follows below.

In order to evaluate our concept we compared estimated success rates with real success rates and played simulation games against different teams. In addition to these statistics, our concept proved quite successful in official games of our team Karlsruhe Brainstormers against several simulator-league teams of 1999. For further information please contact Sebastian Buck.
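
A back-of-the-envelope check of the decision-space figures quoted in the abstract (72 turn angles, 72 kick angles, 10 power levels, one command every 100 ms over five minutes):

```python
# Verify the decision-space arithmetic from the abstract.
TURN_ANGLES = 72          # 360 degrees in 5-degree steps
KICK_ANGLES = 72
POWER_LEVELS = 10

turn_cmds = TURN_ANGLES                  # turn(angle)
dash_cmds = POWER_LEVELS                 # dash(power)
kick_cmds = KICK_ANGLES * POWER_LEVELS   # kick(power, angle)

commands = turn_cmds + dash_cmds + kick_cmds
assert commands == 802                   # 72 + 10 + 720

# One decision every 100 ms over five minutes gives 3000 time steps,
# hence up to 802**3000 raw policies for a single agent.
time_steps = 5 * 60 * 1000 // 100
assert time_steps == 3000
print(f"{commands} commands per step, {commands}^{time_steps} policies")
```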
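The paper does not spell out the selection code; the following is a hypothetical Python sketch of the described scheme, in which each action has a learned estimator mapping the agent's sensory features to a success probability, actions below a threshold are discarded, and a fixed priority list decides among the rest. The function name `select_action`, the threshold value 0.5, and the ordering beyond "shoot2goal outranks pass" are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, Mapping, Optional, Sequence

# Illustrative priority list; the abstract only states that shoot2goal
# is ranked higher than pass.
PRIORITY = ["shoot2goal", "pass", "go2ball"]

def select_action(
    features: Sequence[float],
    estimators: Mapping[str, Callable[[Sequence[float]], float]],
    threshold: float = 0.5,  # assumed cutoff for "promising" actions
) -> Optional[str]:
    """Return the highest-priority action whose estimated success rate
    exceeds the threshold, or None if no action is promising."""
    scores = {name: net(features) for name, net in estimators.items()}
    for name in PRIORITY:                # priority list breaks ties
        if scores.get(name, 0.0) > threshold:
            return name
    return None

# Usage with stand-in estimators (the real ones would be the trained
# per-action neural networks evaluating the agent's sensor data):
nets = {
    "shoot2goal": lambda x: 0.2,   # dummy constant predictors
    "pass":       lambda x: 0.8,
    "go2ball":    lambda x: 0.9,
}
print(select_action([0.0, 1.0], nets))  # -> "pass"
```

Although go2ball scores highest here, pass is executed because it outranks go2ball in the priority list, matching the preference for higher-value actions described in the abstract.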