We propose three methods for extending the Boosting family of classifiers, motivated by real-life problems we have encountered. First, we propose a semisupervised learning method for exploiting unlabeled data in Boosting. We then present a novel classification model adaptation method; the goal of adaptation is to optimize an existing model for a new target application that is similar to the previous one but may have different classes or class distributions. Finally, we present an efficient and effective cost-sensitive classification method that extends Boosting to allow for weighted classes. We evaluated these methods for call classification in the AT&T VoiceTone® spoken language understanding system. Our results indicate that the same classification performance can be obtained with 30% less labeled data when unlabeled data is exploited through semisupervised learning. Using model adaptation, we achieve the same classification accuracy with less than half of the labeled data from the new application. With the proposed cost-sensitive classification method, we obtain significant improvements on the "important" (i.e., more heavily weighted) classes without a significant loss in overall performance.
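To make two of these ideas concrete, the sketch below shows one common way to approximate them with an off-the-shelf boosted classifier: cost-proportionate example weighting (each training example's weight is scaled by the cost of its class before fitting) and a simple self-training loop that pseudo-labels confident unlabeled examples. This is only an illustrative sketch under assumed inputs (`X_lab`, `y_lab`, `X_unlab`, `class_costs`), using scikit-learn's `AdaBoostClassifier` rather than the BoosTexter-style learner evaluated in the paper; it is not the paper's exact semisupervised or cost-sensitive Boosting algorithms.

```python
# Illustrative sketch only: cost-proportionate example weighting and a simple
# self-training loop around a boosted classifier. These approximate the ideas
# described in the abstract; they are NOT the paper's exact algorithms.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier


def cost_sensitive_boosting(X_lab, y_lab, class_costs, n_estimators=100):
    """Fit AdaBoost with per-example weights proportional to class cost.

    class_costs: dict mapping class label -> misclassification cost
    (higher cost = "more important" class). Assumed input, for illustration.
    """
    weights = np.array([class_costs[y] for y in y_lab], dtype=float)
    weights /= weights.sum()  # normalize into a distribution
    clf = AdaBoostClassifier(n_estimators=n_estimators)
    # Weighted examples skew Boosting toward the costlier ("important") classes.
    clf.fit(X_lab, y_lab, sample_weight=weights)
    return clf


def self_training(X_lab, y_lab, X_unlab, threshold=0.9, rounds=5):
    """Augment the labeled set with confidently machine-labeled examples."""
    X, y = np.array(X_lab), np.array(y_lab)
    pool = np.array(X_unlab)
    clf = AdaBoostClassifier(n_estimators=100).fit(X, y)
    for _ in range(rounds):
        if len(pool) == 0:
            break
        proba = clf.predict_proba(pool)
        keep = proba.max(axis=1) >= threshold  # keep only confident predictions
        if not keep.any():
            break
        X = np.vstack([X, pool[keep]])
        y = np.concatenate([y, clf.predict(pool)[keep]])
        pool = pool[~keep]  # remove pseudo-labeled examples from the pool
        clf = AdaBoostClassifier(n_estimators=100).fit(X, y)
    return clf
```

The quantitative results reported above (30% less labeled data, half the labeled data under adaptation) come from the paper's call-classification experiments, not from this sketch; the model adaptation method has no simple off-the-shelf analogue and is omitted here.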