A comparative study of reinforcement learning techniques on dialogue management
EACL '12 Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics
The slow convergence and inefficient use of experience in SARSA(λ) learning are analyzed. A least-squares approximation model of the state-action value function is then constructed from current and previous experience, and a set of linear equations is derived that the weight vector of the function approximator must satisfy over a set of basis functions. On this basis, a fast and practical least-squares SARSA(λ) algorithm and an improved recursive variant are proposed. Experiments on an inverted pendulum demonstrate that these algorithms effectively improve convergence speed and the efficiency of experience exploitation.
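The abstract does not give the equations themselves, so the following is only a minimal sketch of how a least-squares SARSA(λ) solver and a recursive counterpart are commonly set up. It assumes the standard LSTD(λ)-style system A w = b, with A accumulating z (φ − γ φ')ᵀ and b accumulating z r, where z is an eligibility trace over state-action features. The function and class names, the ridge term, and the initial value of `P` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def ls_sarsa_lambda(transitions, n_features, gamma=0.95, lam=0.6, ridge=1e-6):
    """Batch sketch: accumulate A and b from on-policy samples, then solve A w = b.

    transitions: iterable of (phi, reward, phi_next, done), where phi and
    phi_next are feature vectors of the current and next state-action pairs.
    """
    A = ridge * np.eye(n_features)  # assumed small ridge term to keep A invertible
    b = np.zeros(n_features)
    z = np.zeros(n_features)        # eligibility trace over features
    for phi, r, phi_next, done in transitions:
        z = gamma * lam * z + phi
        nxt = np.zeros(n_features) if done else phi_next
        A += np.outer(z, phi - gamma * nxt)
        b += z * r
        if done:
            z = np.zeros(n_features)  # reset the trace between episodes
    return np.linalg.solve(A, b)


class RecursiveLSSarsa:
    """Recursive sketch: keep P ~ A^{-1} and update it with Sherman-Morrison."""

    def __init__(self, n_features, gamma=0.95, lam=0.6, p_init=1e3):
        self.gamma, self.lam = gamma, lam
        self.P = p_init * np.eye(n_features)  # assumed initial inverse, i.e. A_0 = I / p_init
        self.w = np.zeros(n_features)
        self.z = np.zeros(n_features)

    def update(self, phi, r, phi_next, done):
        self.z = self.gamma * self.lam * self.z + phi
        nxt = np.zeros_like(phi) if done else phi_next
        d = phi - self.gamma * nxt              # feature difference for this sample
        Pz = self.P @ self.z
        k = Pz / (1.0 + d @ Pz)                 # gain vector
        self.w = self.w + k * (r - d @ self.w)  # correct weights by the residual
        self.P = self.P - np.outer(k, d @ self.P)
        if done:
            self.z = np.zeros_like(self.z)
        return self.w
```

In this formulation the batch version reuses every stored sample in a single solve, while the recursive version avoids the matrix inversion by updating the inverse one sample at a time, at a cost quadratic in the number of features per step.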