Rapid bootstrapping of statistical spoken dialogue systems

Authors:
Ruhi Sarikaya
Affiliations:
IBM T.J. Watson Research Center, 1101 Kitchawan Road/Route 134, Yorktown Heights, NY 10598, United States
Venue:
Speech Communication
Year:
2008

Citing 11
Cited 0

Natural language parsing as statistical pattern recognition

Natural language parsing as statistical pattern recognition
The nature of statistical learning theory

The nature of statistical learning theory
Inducing Features of Random Fields

IEEE Transactions on Pattern Analysis and Machine Intelligence
An introduction to support Vector Machines: and other kernel-based learning methods

An introduction to support Vector Machines: and other kernel-based learning methods
Semiautomatic Acquisition of Semantic Structures for Understanding Domain-Specific Natural Language Queries

IEEE Transactions on Knowledge and Data Engineering
The indexable web is more than 11.5 billion pages

WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Getting more mileage from web text sources for conversational speech language modeling using class-dependent mixtures

NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
Target word detection and semantic role chunking using support vector machines

NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
Use of support vector learning for chunk identification

ConLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
Data-driven strategies for an automated dialogue system

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics

Quantified Score

Hi-index	0.00

Visualization

Abstract

Rapid deployment of statistical spoken dialogue systems poses portability challenges for building new applications. We discuss the challenges that arise and focus on two main problems: (i) fast semantic annotation for statistical speech understanding and (ii) reliable and efficient statistical language modeling using limited in-domain resources. We address the first problem by presenting a new bootstrapping framework that uses a majority-voting based combination of three methods for the semantic annotation of a ''mini-corpus'' that is usually manually annotated. The three methods are a statistical decision tree based parser, a similarity measure and a support vector machine classifier. The bootstrapping framework results in an overall cost reduction of about a factor of two in the annotation effort compared to the baseline method. We address the second problem by devising a method to efficiently build reliable statistical language models for new spoken dialog systems, given limited in-domain data. This method exploits external text resources that are collected for other speech recognition tasks as well as dynamic text resources acquired from the World Wide Web. The proposed method is applied to a spoken dialog system in a financial transaction domain and a natural language call-routing task in a package shipment domain. The experiments demonstrate that language models built using external resources, when used jointly with the limited in-domain language model, result in relative word error rate reductions of 9-18%. Alternatively, the proposed method can be used to produce a 3-to-10 fold reduction for the in-domain data requirement to achieve a given performance level.