Policy sharing between multiple mobile robots using decision trees

  • Authors:
  • Yu-Jen Chen;Kao-Shing Hwang;Wei-Cheng Jiang

  • Affiliations:
  • Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung 80424, Taiwan;Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung 80424, Taiwan;Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung 80424, Taiwan

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2013

Quantified Score

Hi-index 0.07

Visualization

Abstract

Reinforcement learning is one of the more prominent machine learning technologies, because of its unsupervised learning structure and its ability to produce continual learning, even in a dynamic operating environment. Applying this learning to cooperative multi-agent systems not only allows each individual agent to learn from its own experience, but also offers the opportunity for the individual agents to learn from other agents in the system, in order to increase the speed of learning. In the proposed learning algorithm, an agent stores its experience in terms of a state aggregation, by use of a decision tree, such that policy sharing between multiple agents is eventually accomplished by merging the different decision trees of peers. Unlike lookup tables, which have a homogeneous structure for state aggregation, decision trees carried with in agents have a heterogeneous structure. The method detailed in this study allows policy sharing between cooperative agents by means merging their trees into a hyper-structure, instead of forcefully merging entire trees. The proposed scheme initially allows the entire decision tree to be translated from one agent to others. Based on the evidence, only partial leaf nodes have useful experience for use in policy sharing. The proposed method induces a hyper decision tree by using a large amount of samples that are sampled from the shared nodes. The results from simulations in a multi-agent cooperative domain illustrate that the proposed algorithms perform better than the algorithm that does not allow sharing.