Induced states in a decision tree constructed by Q-learning

  • Authors:
  • Kao-Shing Hwang; Yu-Jen Chen; Wei-Cheng Jiang; Tsung-Wen Yang

  • Affiliations:
  • Department of Electrical Engineering, National Sun Yat-sen University, Taiwan; Department of Electrical Engineering, National Chung Cheng University, Taiwan; Department of Electrical Engineering, National Chung Cheng University, Taiwan; Department of Electrical Engineering, National Chung Cheng University, Taiwan

  • Venue:
  • Information Sciences: an International Journal

  • Year:
  • 2012

Abstract

This paper develops a tree-construction method based on the framework of reinforcement learning (RL). The induction of a decision tree is regarded as an RL problem, in which the optimal policy is the one that obtains the maximal accumulated information gain. The proposed approach consists of two stages. In the first, the emulation/demonstration stage, sensory-action data from a mechatronic system, or samples of training patterns, are generated by an operator or a simulator. The records of these emulation data are aggregated into components of the state space represented by a decision tree. State aggregation for discretization of the state space consists of two phases: split estimation and tree growing. In the split-estimation phase, an inducer estimates long-term evaluations of splits at visited nodes. In the tree-growing phase, the inducer grows the tree according to the predicted long-term evaluations, which are approximated by a neural network model. In the second stage, the learned behavior or classifier is shaped by the RL scheme over the discretized state space constructed by the decision tree derived from the first stage. Unlike the conventional greedy procedure for constructing and pruning a tree, the proposed method casts the sequential process of tree induction as policy iteration, in which policies for node splitting are evaluated and improved repeatedly until an optimal or near-optimal policy is obtained. The splitting criterion, regarded as an action policy, is based on long-term evaluations of payoff rather than on immediate evaluations of impurity. A comparison with CART (classification and regression trees) and C4.5 on several benchmark datasets is presented. Furthermore, to demonstrate its application to learning control, the proposed method is used to construct a tree-based reinforcement learning method, in which the learning mechanism operates on a discrete state space derived from the proposed method. The results show the feasibility and high performance of the proposed system as a state-partitioning mechanism in comparison with the well-known Adaptive Heuristic Critic (AHC) model.
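
The abstract describes the split-selection mechanism only verbally; below is a minimal sketch, in Python, of the core idea of treating candidate node splits as Q-learning actions whose reward is information gain. It assumes axis-aligned binary splits on numeric features, a tabular Q-function, and a one-step lookahead at the children standing in for the paper's neural-network approximation of long-term evaluations; all names and parameters here are illustrative, not the authors' implementation.

```python
import numpy as np
from collections import defaultdict

def entropy(y):
    """Shannon entropy of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(X, y, feature, threshold):
    """Immediate reward for a binary split: reduction in entropy."""
    left = X[:, feature] <= threshold
    if left.all() or not left.any():
        return 0.0
    w = left.mean()
    return entropy(y) - w * entropy(y[left]) - (1 - w) * entropy(y[~left])

def candidate_splits(X):
    """Candidate actions: midpoints between consecutive feature values."""
    actions = []
    for f in range(X.shape[1]):
        v = np.unique(X[:, f])
        actions += [(f, t) for t in (v[:-1] + v[1:]) / 2]
    return actions

def q_learned_split(X, y, episodes=100, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Pick a split at one node by Q-learning over (feature, threshold)
    actions. A one-step lookahead at the children substitutes for the
    paper's neural-network estimate of long-term evaluations."""
    rng = np.random.default_rng(seed)
    actions = candidate_splits(X)
    Q = defaultdict(float)
    for _ in range(episodes):
        # epsilon-greedy selection of a split action at this node
        if rng.random() < eps:
            f, t = actions[rng.integers(len(actions))]
        else:
            f, t = max(actions, key=lambda a: Q[a])
        r = information_gain(X, y, f, t)   # immediate payoff of the split
        left = X[:, f] <= t
        future = 0.0                       # best gain achievable in children
        for m in (left, ~left):
            if m.sum() > 1:
                gains = [information_gain(X[m], y[m], g, u)
                         for g, u in candidate_splits(X[m])]
                if gains:
                    future = max(future, max(gains))
        # Q-learning update toward the accumulated (discounted) gain
        Q[(f, t)] += alpha * (r + gamma * future - Q[(f, t)])
    return max(actions, key=lambda a: Q[a])
```

Calling q_learned_split(X, y) at a node returns the (feature, threshold) pair with the highest learned Q-value. A greedy inducer such as CART or C4.5 would instead take the split maximizing information_gain alone; folding the discounted future term into the update is what distinguishes the long-term payoff criterion sketched here from the immediate impurity criterion the abstract contrasts it with.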