Using paraphrases of deep semantic representations to support regression testing in spoken dialogue systems

  • Authors:
  • Beth Ann Hockey; Manny Rayner

  • Affiliations:
  • UC Santa Cruz and BAHRC LLC, NASA Ames Research Center, CA; University of Geneva, Geneva, Switzerland

  • Venue:
  • SETQA-NLP '09 Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing
  • Year:
  • 2009

Abstract

Rule-based spoken dialogue systems require a good regression testing framework if they are to be maintainable. We argue that there is a tension between two extreme positions when constructing the database of test examples. On the one hand, if the examples consist of input/output tuples representing many levels of internal processing, they are fine-grained enough to catch most processing errors, but unstable under most system modifications. On the other hand, if the examples are pairs of user input and final system output, they are much more stable, but too coarse-grained to catch many errors. In either case, there are fairly severe difficulties in judging examples correctly. We claim that a good compromise can be reached by implementing a paraphrasing mechanism which maps internal semantic representations into surface forms, and carrying out regression testing using paraphrases of semantic forms rather than the semantic forms themselves. We describe an implementation of the idea using the Open Source Regulus toolkit, where paraphrases are produced using Regulus grammars compiled in generation mode. Paraphrases can also be used at run-time to produce confirmations. By compiling the paraphrase grammar a second time, as a recogniser, it is possible in a simple and natural way to guarantee that confirmations are always within system coverage.
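
To make the testing strategy concrete, the sketch below shows a minimal regression harness that stores, for each test utterance, the expected paraphrase of its semantic form rather than the semantic form itself. This is an illustrative assumption, not the Regulus toolkit's actual (Prolog-based) API: the functions `analyse` and `generate_paraphrase` are hypothetical stand-ins for the system's parsing component and for a grammar compiled in generation mode.

```python
# Hypothetical regression-test harness illustrating the paraphrase-based
# approach described in the abstract. `analyse` and `generate_paraphrase`
# are assumed stand-ins for the real system's analysis and generation
# components; they are not part of any actual Regulus API.
from typing import Callable, List, Tuple

def run_regression(
    test_db: List[Tuple[str, str]],                 # (user input, expected paraphrase)
    analyse: Callable[[str], object],               # utterance -> internal semantic form
    generate_paraphrase: Callable[[object], str],   # semantic form -> surface paraphrase
) -> List[Tuple[str, str, str]]:
    """Return failing cases as (input, expected, actual) triples.

    Comparing paraphrases rather than raw semantic forms keeps the test
    database stable under changes to the internal representation, while
    remaining fine-grained enough to expose most processing errors.
    """
    failures = []
    for utterance, expected in test_db:
        semantics = analyse(utterance)
        actual = generate_paraphrase(semantics)
        if actual.strip().lower() != expected.strip().lower():
            failures.append((utterance, expected, actual))
    return failures


if __name__ == "__main__":
    # Toy stand-ins so the harness runs end to end; a real deployment would
    # plug in the dialogue system's own analysis and generation components.
    toy_db = [("switch on the light in the kitchen",
               "turn on the kitchen light")]
    toy_analyse = lambda u: {"act": "switch_on", "device": "light", "loc": "kitchen"}
    toy_generate = lambda sem: f"turn on the {sem['loc']} {sem['device']}"
    print(run_regression(toy_db, toy_analyse, toy_generate))
```

The same generation component can serve double duty at run-time: the paraphrase it produces is spoken back to the user as a confirmation, and because the paraphrase grammar can also be compiled as a recogniser, those confirmations are guaranteed to lie within system coverage.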