k-nearest neighbor Monte-Carlo control algorithm for POMDP-based dialogue systems

Authors:
F. Lefèvre;M. Gašić;F. Jurčíček;S. Keizer;F. Mairesse;B. Thomson;K. Yu;S. Young
Affiliations:
Cambridge University, Cambridge, UK;Cambridge University, Cambridge, UK;Cambridge University, Cambridge, UK;Cambridge University, Cambridge, UK;Cambridge University, Cambridge, UK;Cambridge University, Cambridge, UK;Cambridge University, Cambridge, UK;Cambridge University, Cambridge, UK
Venue:
SIGDIAL '09 Proceedings of the SIGDIAL 2009 Conference: The 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Year:
2009

Abstract

In real-world applications, modelling dialogue as a POMDP requires a summary space for the dialogue state representation to keep the problem tractable. A sub-optimal estimate of the value function governing the selection of system responses can then be obtained using a grid-based approach on the belief space. In this work, the Monte-Carlo control technique is extended to reduce over-fitting during training and to improve robustness to semantic noise in the user input. The technique uses a database of belief vector prototypes to choose the optimal system action. A locally weighted k-nearest neighbor scheme is introduced to smooth the decision process by interpolating the value function, resulting in higher performance in user simulation.
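
The interpolation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance metric, the weighting kernel, or the summary-space features, so the Euclidean distance, the Gaussian kernel, the `bandwidth` parameter, the two-dimensional belief vectors, the action names, and the Q-values below are all assumptions chosen for illustration.

```python
import math

# Hypothetical prototype database: (belief vector, Q-value per action).
# The beliefs, actions, and values are made up for illustration.
PROTOTYPES = [
    ((0.9, 0.1), {"confirm": 0.2, "inform": 1.0}),
    ((0.5, 0.5), {"confirm": 0.8, "inform": 0.3}),
    ((0.1, 0.9), {"confirm": 0.4, "inform": -0.5}),
]


def knn_q_values(belief, prototypes, k=3, bandwidth=0.5):
    """Interpolate Q-values at `belief` from its k nearest prototypes."""
    # Euclidean distance to every stored prototype (assumed metric).
    dists = [(math.dist(belief, b), q) for b, q in prototypes]
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]

    # Gaussian kernel weights (assumed kernel): closer prototypes count more.
    weights = [math.exp(-((d / bandwidth) ** 2)) for d, _ in nearest]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed; fall back to a plain average.
        weights = [1.0] * len(nearest)
        total = float(len(nearest))

    actions = nearest[0][1].keys()
    return {
        a: sum(w * q[a] for w, (_, q) in zip(weights, nearest)) / total
        for a in actions
    }


def select_action(belief, prototypes, k=3, bandwidth=0.5):
    """Pick the action with the highest interpolated Q-value."""
    q = knn_q_values(belief, prototypes, k, bandwidth)
    return max(q, key=q.get)
```

With `k=1` this reduces to a nearest-prototype lookup, so the chosen action can switch abruptly as the belief crosses a cell boundary. With larger `k`, the Q-values vary more smoothly with the belief (changing abruptly only when the set of nearest prototypes changes), which is the smoothing effect the abstract describes.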