Natural language parsing as statistical pattern recognition
Natural language parsing as statistical pattern recognition
The nature of statistical learning theory
The nature of statistical learning theory
Inducing Features of Random Fields
IEEE Transactions on Pattern Analysis and Machine Intelligence
An introduction to support Vector Machines: and other kernel-based learning methods
An introduction to support Vector Machines: and other kernel-based learning methods
IEEE Transactions on Knowledge and Data Engineering
The indexable web is more than 11.5 billion pages
WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
Target word detection and semantic role chunking using support vector machines
NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
Use of support vector learning for chunk identification
ConLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
Data-driven strategies for an automated dialogue system
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Hi-index | 0.00 |
Rapid deployment of statistical spoken dialogue systems poses portability challenges for building new applications. We discuss the challenges that arise and focus on two main problems: (i) fast semantic annotation for statistical speech understanding and (ii) reliable and efficient statistical language modeling using limited in-domain resources. We address the first problem by presenting a new bootstrapping framework that uses a majority-voting based combination of three methods for the semantic annotation of a ''mini-corpus'' that is usually manually annotated. The three methods are a statistical decision tree based parser, a similarity measure and a support vector machine classifier. The bootstrapping framework results in an overall cost reduction of about a factor of two in the annotation effort compared to the baseline method. We address the second problem by devising a method to efficiently build reliable statistical language models for new spoken dialog systems, given limited in-domain data. This method exploits external text resources that are collected for other speech recognition tasks as well as dynamic text resources acquired from the World Wide Web. The proposed method is applied to a spoken dialog system in a financial transaction domain and a natural language call-routing task in a package shipment domain. The experiments demonstrate that language models built using external resources, when used jointly with the limited in-domain language model, result in relative word error rate reductions of 9-18%. Alternatively, the proposed method can be used to produce a 3-to-10 fold reduction for the in-domain data requirement to achieve a given performance level.